Hello Basketbawful readers! Due to some circumstances, I will be revealing my end of the season fantasy basketball analysis a bit early. Although my writing style is typically humor and sarcasm with some stats thrown in (okay, more like sarcasm and pessimism), today I’d like to take a more refined look at the stats of fantasy basketball.
The Intro: So another year of fantasy basketball is winding to a close. Maybe your team got pounded by injuries; maybe your team had Dirk, Nash, and David Lee and cruised to victory (like mine). There are many different methods out there to look at and evaluate player performance, and there are lots of ranking systems. Sure, LeBron was obviously #1, but what about down the list? Do you really trust those pre-rankings? Today I’m going to talk about a method of evaluating the numbers, so hopefully during next year’s draft you can use your 90 seconds scrambling for injury and team information while having some confidence in the numbers to expect.
The Method: We are already quite comfortable with using averages in sports stats. LeBron scored 29.9 points per game. Dwight grabbed 13.6 rebounds per 36 minutes. So instead of jumping to PER or RAPM or some other complex analysis, why not go to just the next step with standard deviation? In fantasy, we have the entire population (all players that have logged minutes in an NBA game), and all we really care about is choosing the guy that is better than what the other teams have. Standard Deviation could fit this need!
Well let’s not get ahead of ourselves. For example with Yahoo!, you get all the raw number totals and averages, and even their special “O-Rank” and “Rank”. Why expand beyond that? Well the problem is, when you sort by FG%, or TOV, things start looking strange. Is Marc Gasol’s 58.3% going to help your team more than David Lee’s 55.2%? Just how bad is Dwight Howard’s 60.3% FT shooting going to kill that category? Kinda hard to tell by eyeballing it. Even with the raw numbers: just how much will having Steve Nash on my team dominate the assists category?
The Good News: Enter: Standardize. If you really don't want to do math, then I’ve still got good news for you: this is all done by ESPN’s Fantasy Basketball Player Rater. In fact, if you are quite satisfied using just the ESPN Player Rater, you probably can stop reading the article now. Here you can see all the Standard Scores in each category, and they are added up to make the final column as a composite score (yes, this makes more sense than adding percentages together randomly *coughHOLLINGERcough*).
For example: LeBron has scored 2033 points as of this post. The league average is 472.5 and the standard deviation is 410.25 pts. So (2033 – 472.5) / 410.25 = 3.8, meaning LeBron is 3.8 standard deviations above the league average. For those not familiar with standard deviations, and assuming a normal distribution, a score of 1 puts you above ~84.1% of the population, 2 puts you above ~97.7%, 3 puts you above ~99.9%, and 4+ is outstanding. Isn’t this what you really want to know on draft day? You can find overall contributors with a glance, see which needs you are lacking, and pick up specialists without having to guesstimate from the raw numbers.
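As a quick check on the arithmetic above, here is a minimal Python sketch. The function names are my own, and the percentile helper assumes a normal distribution, which is exactly the assumption behind the 84.1% / 97.7% / 99.9% figures.

```python
import math

def standard_score(value, mean, std_dev):
    """Number of standard deviations a value sits above (or below) the mean."""
    return (value - mean) / std_dev

def normal_percentile(z):
    """Share of a normally distributed population that falls below score z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Figures quoted in the post: LeBron's points, league mean, league std dev.
lebron_z = standard_score(2033, 472.5, 410.25)
print(round(lebron_z, 2))  # 3.8

for z in (1, 2, 3):
    print(z, f"{normal_percentile(z):.1%}")  # 84.1%, 97.7%, 99.9%
```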
Another benefit of Standardizing is the use of negative standard deviations, so you can see when a player is really hurting your team!
The Workarounds: Okay, so the bad news here is ESPN only shows the 8 categories. If you’re playing with TOVs, how does that fit in? Also, how do I calculate the FG% and FT% numbers since they aren’t raw numbers?
Well, here’s where we start doing things for ourselves. Pick up your scripting language of choice, or start copying and pasting CSV text from basketball-reference/your favorite website. Now then, turnovers are easier: since they work as a negative statistic, I simply found all the Standard Scores and then flipped the signs.
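A rough sketch of that turnover step in Python, using only the standard library. The CSV rows and column names are made up for illustration, and I use the population standard deviation (`pstdev`) on the assumption that the pasted data covers every player.

```python
import csv
import io
import statistics

# Hypothetical CSV text; real rows would be pasted from your stats site.
CSV_TEXT = """player,tov
Player A,250
Player B,120
Player C,40
Player D,10
"""

def standard_scores(values):
    """Standard Score of each value relative to the whole list."""
    mean = statistics.fmean(values)
    std_dev = statistics.pstdev(values)  # population std dev
    return [(v - mean) / std_dev for v in values]

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
turnovers = [float(row["tov"]) for row in rows]

# Turnovers hurt you, so flip the sign: fewer turnovers -> higher score.
tov_scores = [-z for z in standard_scores(turnovers)]

for row, score in zip(rows, tov_scores):
    print(f'{row["player"]:10s} {score:+.2f}')
```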
For FG% and FT%: I personally believe ESPN doesn’t give enough weight to the amount shot. Shouldn’t LeBron shooting 50.0% at 20.2 FGA/g have more impact than Varejao shooting 57.1% on only 6.4 FGA/g? Well I think so, which is why I normalized first, then weighted by shots taken before dividing by standard deviation. My FGscore is defined as:
FGscore = (player FG% - league average FG%) × FGA
And I standardize the FGscore (average is already zero, so really I’m just dividing by standard deviation). So LeBron ends up with 2.37, and Varejao with 2.07, not that anyone would think of drafting the latter over the former. But in any case, now we can properly rank players by their FG%, so all the lacktators with a 1.000 FG% filter to the bottom.
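Here is a minimal sketch of the FGscore idea on a made-up four-player league. Two assumptions of mine: the players and totals are hypothetical, and the league FG% is taken as total makes over total attempts, which is the choice that makes the FGscores average to zero as described.

```python
import statistics

# Hypothetical (field goals made, field goals attempted) season totals.
players = {
    "High volume, good pct": (500, 1000),
    "Low volume, great pct": (120, 200),
    "High volume, poor pct": (380, 1000),
    "One lucky shot": (1, 1),
}

total_made = sum(made for made, _ in players.values())
total_attempted = sum(att for _, att in players.values())
league_pct = total_made / total_attempted  # attempt-weighted league FG%

def fg_score(made, attempted):
    """Distance from league FG%, weighted by the number of shots taken."""
    return (made / attempted - league_pct) * attempted

raw = {name: fg_score(made, att) for name, (made, att) in players.items()}

# The raw scores already sum to zero, so standardizing is just a division.
std_dev = statistics.pstdev(list(raw.values()))
scores = {name: value / std_dev for name, value in raw.items()}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:24s} {score:+.2f}")
```

In this toy league the 1-for-1 shooter lands close to zero instead of topping the list, and the high-volume 50% shooter edges out the low-volume 60% shooter.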
Same thing with FT%: Is Nash's 94.1% (2.7 FTA/g) or Carmelo's 83.1% (9.3 FTA/g) helping you win the category more? ESPN puts Nash over ‘Melo, but using my FTscore, Carmelo scores a 2.67 while Nash scores a 2.36. Of course, Durant and Dirk still dominate the category.
The Advanced Bad News: Okay, I’ve been far too positive towards ESPN. This sounds almost too good to be true. What are the limitations of this method? Like I said, Z-Score happens to work well since we have the entire population of data. However, a simple glance at the data will show you that we are NOT working with normally distributed data, one of the assumptions in Standard Scores! Going one step further, I looked at the skew and kurtosis of each category, and they are off the charts, with the worst skew on blocks at 2.20 and the worst kurtosis on FT% at 16.81.
In simpler terms, this means some standard deviations at the far ends may be inflated more than they should be. For example, Dwight gets a near 6 score in blocks, which statistically should not happen in only ~450 people. Under a normal distribution, that would be rarer than one in a million. So as with all advanced statistics, use them carefully!
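For anyone who wants to check skew, kurtosis, and tail odds themselves, here is a small standard-library sketch. The sample block totals are invented, and the kurtosis here is the plain fourth standardized moment (a normal distribution gives 3); whether the figures quoted in the post use that convention or the "excess" one is not stated.

```python
import math
import statistics

def standardized(values):
    mean = statistics.fmean(values)
    std_dev = statistics.pstdev(values)
    return [(v - mean) / std_dev for v in values]

def skewness(values):
    """Third standardized moment: 0 for a symmetric distribution."""
    return statistics.fmean([z ** 3 for z in standardized(values)])

def kurtosis(values):
    """Fourth standardized moment: 3 for a normal distribution."""
    return statistics.fmean([z ** 4 for z in standardized(values)])

def upper_tail(z):
    """Probability that a normal variable exceeds z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical block totals: most players near zero, one shot-blocking freak.
blocks = [0, 0, 1, 1, 2, 3, 5, 8, 12, 40, 200]
print(f"skew {skewness(blocks):.2f}, kurtosis {kurtosis(blocks):.2f}")

# Expected number of players at or beyond a score of 6 in a ~450-man league.
print(450 * upper_tail(6))
```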
In addition, I did a Principal Component Analysis (PCA) on the 9 factors. Turns out there’s such a strong negative correlation between Points and Turnovers, and modestly strong correlations between Points and other categories, it’s not even worth bothering looking at the TOV category! Stupid Yahoo!!
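This is not a full PCA, just a sketch of the pairwise Pearson correlation that PCA builds on, written in plain Python. The five-player numbers are invented, and the turnover signs are flipped first, as in the workaround described earlier.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical season totals for five players.
points = [2000, 1500, 900, 400, 100]
turnovers = [270, 200, 150, 60, 20]

# Turnovers count against you, so score them with the sign flipped.
turnover_value = [-t for t in turnovers]

r = pearson(points, turnover_value)
print(f"points vs. sign-flipped turnovers: r = {r:.2f}")
```

When the two columns move together this tightly, the second one adds very little information that the first does not already carry.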
The Advanced Customization: So maybe you hate Turnovers. Maybe you hate my FGscore and FTscore. I think it’s also perfectly valid to try and dominate the 6 raw stat categories! It’s very intuitive, and any wins in FG%, FT%, or TOV are just gravy. In fact, I’ve done just this...
Putting it all together: We can analyze total season numbers, per game numbers, or per (36) minute numbers. I’ve proposed looking at standard scores of the 9 categories like Yahoo!, the 8 categories like ESPN, or the 6 raw stat categories. Well since we’re already working with Standard Scores... why not just add composite scores together? I did this with the completed 2008-09 data. So total season numbers help show how much a player contributed, per game numbers account for some injuries and such, and per minute numbers account for varying playing time. Since they’re all standardized now, I just add them together to get a super-composite score, for a really quick look at who did the best (i.e. which players I should really be comparing during my 90 seconds to draft)!
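A sketch of that super-composite in Python, under my own assumptions: a made-up four-player league, only three categories to keep it short, and per-game and per-36 numbers derived from the season totals.

```python
import statistics

CATEGORIES = ("pts", "reb", "ast")  # shortened list, for illustration only

# Hypothetical season totals.
season = {
    "A": {"games": 80, "minutes": 3000, "pts": 2000, "reb": 600, "ast": 600},
    "B": {"games": 40, "minutes": 1400, "pts": 900, "reb": 300, "ast": 200},
    "C": {"games": 82, "minutes": 1200, "pts": 400, "reb": 300, "ast": 80},
    "D": {"games": 10, "minutes": 60, "pts": 20, "reb": 10, "ast": 5},
}

def standardize(values):
    mean = statistics.fmean(values)
    std_dev = statistics.pstdev(values)
    return [(v - mean) / std_dev for v in values]

def composite(table):
    """Sum of Standard Scores across categories for each player."""
    names = list(table)
    totals = dict.fromkeys(names, 0.0)
    for category in CATEGORIES:
        z_scores = standardize([table[name][category] for name in names])
        for name, z in zip(names, z_scores):
            totals[name] += z
    return totals

def rescale(table, divisor_key, factor=1.0):
    """Turn season totals into per-game or per-36-minute rates."""
    return {
        name: {c: row[c] / row[divisor_key] * factor for c in CATEGORIES}
        for name, row in table.items()
    }

views = [season, rescale(season, "games"), rescale(season, "minutes", 36.0)]
super_composite = dict.fromkeys(season, 0.0)
for view in views:
    for name, score in composite(view).items():
        super_composite[name] += score

for name in sorted(super_composite, key=super_composite.get, reverse=True):
    print(name, f"{super_composite[name]:+.2f}")
```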
After doing all this, and comparing it to 9, 8, and 6 category, it turns out there’s lots of correlation among them, but the analysis that made overall sense was... 6 category! Are you serious?! After all that work I did messing with TOV and FG% and FT%, you could essentially ignore them?
Well sorta. Mostly, it’s Steven Hill’s fault. Because he played in only one game with 2 minutes played, his fantasy impact scales to the absurd (remember all that stuff about skew and kurtosis). But looking at only the 6 raw stats, even he can’t escape a more proper ranking!
Another way to avoid this, and possibly help further normalize the data: just take the top 200 players, or top 100, or whatever, and treat that as your total population, because let’s be honest: no one’s putting Mario West or JamesOn Curry on their fantasy teams. Hell, to simplify things, take the top 4 players and do a pretend 2 person draft. From there, you can see what categories you’re taking, which you’re giving away, and which analysis to use.
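A minimal sketch of the "shrink the population" idea, with invented point totals and a single category. Ranking by the raw total before cutting is my own simplification; you could just as well rank by a composite score first.

```python
import statistics

def z_scores(values_by_player):
    """Standard Scores computed within whatever pool is passed in."""
    values = list(values_by_player.values())
    mean = statistics.fmean(values)
    std_dev = statistics.pstdev(values)
    return {name: (v - mean) / std_dev for name, v in values_by_player.items()}

def top_n(values_by_player, n):
    """Keep only the n players with the highest values."""
    keep = sorted(values_by_player, key=values_by_player.get, reverse=True)[:n]
    return {name: values_by_player[name] for name in keep}

# Hypothetical season point totals.
points = {
    "Star": 2000,
    "Starter": 1200,
    "Sixth man": 800,
    "Bench": 300,
    "Garbage time": 20,
    "Two-minute guy": 2,
}

full_league = z_scores(points)
draft_pool = z_scores(top_n(points, 4))  # treat the top 4 as the population

for name, score in draft_pool.items():
    print(f"{name:10s} pool {score:+.2f}  league {full_league[name]:+.2f}")
```

Dropping the end-of-bench guys raises the pool average, so the star's score comes down a bit relative to the players you would actually draft.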
The End: I hope this gave you some insight into stats and fantasy basketball. Of course, when it comes to injuries and rookies etc., you’re still on your own. The method I presented is highly useful for roto or h2h, and can be expanded or contracted to your liking, despite its limitations. Want to look at the past 3 years combined? The past month only? Go for it. Don’t just trust those pre-built rankings anymore; grab your favorite programming language/spreadsheet/abacus and find those undervalued steals!
Other random notes: Yes, I graphed a ton of stuff while doing this. Tips I picked up:
- With the top picks, don’t over-value 3pt shooting. It is easy to pick that up later in the draft, or with waivers.
- As I implied before, FG% and FT% are pretty even down the board. Use Standard Scores to slightly suggest one guy over the other. e.g. If you pick up Dwight Howard, concentrate on FG% guys cause there’s probably no combination of players you can pick up to make up the FT% column.
- Blocks are sparse, but spread out down the board.
- Of the remaining stats, Steals and Points have the strongest correlation, and Steals is usually a close category (so every little standard score counts!). This is probably why drafting bots tend to eat up all the point guards early.
And finally, in good Basketbawful fashion, how does my ranking compare to ESPN’s for the worst fantasy player of the season so far?
-6.05 Primoz Brezec
-6.05 Jarron Collins
-6.01 Kwame Brown
-5.98 Eddy Curry
-5.97 Lindsey Hunter
For reference, Mario West has a -5.54.
Labels: fantasy sports, guest author, standard deviation, statistics, ugh this sucks I'm just going to use ESPN's Player Rater
Also, if I had to guess two of the worst five fantasy players, I would have probably said Eddy Curry and Kwame Brown. This pleases me to no end.
Primoz Brezec FTL!
Is Lindsey Hunter even still playing at this point?
Say there's Team A and Team B. Both teams have 2 players, one shooting 25% and one shooting 75% each, just like Yahoo! tells us. Who wins the FG% category?
Well the answer is not enough information. Just comparing 25% and 75% to the league average 50% seems to imply the teams are equal, but this may not be the case. Both teams probably did NOT shoot 50%.
Say Team A has one player who shot 300/1200 (25%) and one who shot 300/400 (75%). Team B has one who shot 150/600 (25%) and one who shot 450/600 (75%). Now which team wins the FG% category?
The answer is Team B, shown by adding each team's makes and attempts together. Team A shot 600/1600 (37.5%) and Team B shot 600/1200 (50.0%). Using my FGscore method, Team A had scores of (.25 - .50)*(1200) = -300, and (.75 - .50)*(400) = 100, while Team B had -150 and 150. Adding them together, Team A totals -200 and Team B totals 0, so Team A was under average and Team B was spot on. Find the FGscore of everyone in the league, get the stddev, divide, and move on.
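The Team A / Team B example can be checked in a few lines of Python. The 50% league average is the one assumed in the example itself; the function names are mine.

```python
LEAGUE_FG_PCT = 0.50  # league average assumed in the example

# (field goals made, field goals attempted) for each player.
team_a = [(300, 1200), (300, 400)]
team_b = [(150, 600), (450, 600)]

def team_pct(team):
    """Team FG%: total makes over total attempts, not the average of FG%s."""
    return sum(made for made, _ in team) / sum(att for _, att in team)

def fg_score(made, attempted):
    """Distance from league FG%, weighted by shots taken."""
    return (made / attempted - LEAGUE_FG_PCT) * attempted

def team_fg_score(team):
    return sum(fg_score(made, att) for made, att in team)

print(team_pct(team_a), team_pct(team_b))            # 0.375 0.5
print(team_fg_score(team_a), team_fg_score(team_b))  # -200.0 0.0
```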
If anything, from all this, even though the ESPN player ranker is fairly solid, I hope I raised enough questions about its disadvantages to show that no one ranking system should be outright trusted. I should have added that yes, it is biased towards fellating LeBron James's fantasy impact, as is standard practice at ESPN. (Note how limited the scores can reach on the negative end, around -6, yet LeBron can get a 17+ score. Skew-tastic!)
Are we assuming a normal distribution? Also, how does the median compare with the mean? Wouldn't we expect more people to be clustered at the bottom? I think that's a Poisson distribution, but I only know the name, not how to use it...I would link to wikipedia, but it's seriously dense and technical.
I was thinking you could measure individual players' consistency using standard deviations -- let's say Dwight Howard gets an average of 13.1 rpg, but the standard deviation is 4, whereas Carlos Boozer gets 11.2 rpg but the standard deviation is only 1.8 (I'm making the standard deviations up completely, btw). Wouldn't Boozer be more valuable because of consistency? You just win categories, you don't get extra for margin of victory...does this make sense?
If you looked at the basketball talents of the entire world, the NBA players would surely rate much higher than most random high schoolers and low level eastern European team players, etc. You would expect a fairly skewed distribution then. However, looking at just the players who are good enough to make it to the NBA, I would assume that there's a lot of "mediocre" talents in the Association("average" compared to other NBA players, but far and above 99% of the world's population of basketball players). There are several "good" players, and several "bad" players. And of course you have the outliers, such as LeBron James (who is far superior to most NBA players) and of course the extremely untalented by NBA standards lacktators, many of whom are also outliers. But outside of these two far ends of the spectrum, I would think the distribution would be fairly normal. A lot of average guys, a few below average, a few above average, and a handful of really bad and really good players.
If scoring was distributed normally, according to the data in the post, you'd expect 2.3% of the players to have scored -348 points or less this season...and as hard as some of them try, you can't really score negative points.
If you must know, here's what I got for 09-10 up to today:
Skew Kurt. Names
==== ===== =====
0.98 6.41 FG%
1.30 3.80 3P
-0.33 16.81 FT%
1.25 4.25 TRB
2.18 9.28 AST
1.06 3.88 STL
2.20 8.90 BLK
-1.00 3.60 TOV
0.97 3.51 PTS
Also, individual standard deviations would help H2H teams more than roto, but there's so many other factors to week-to-week performance, like number of games, home/away, days rest, etc., that it's not really as helpful as you think.
http://farm5.static.flickr.com/4028/4455139086_58c702356c_o.jpg
Thought that was a nice coincidence. :)
I think that may be the same technique, only using the past 3 games of course, and scaled from 0 to 1...
BTW, since this post I've expanded the scores to look at popular punt categories, such as FG and TO (Granger), FT and TO (Howard), steals (Lee), or blocks.
for example:
total score = 0.5*points + 0.8*rebounds - 2*TO
I'm wondering if you've come up with a formula to determine the coefficients for each category.