Basketbawful

Monday, March 22, 2010

Fantasy Basketball Stats: Standard Deviation and You

Hello Basketbawful readers! Due to some circumstances, I will be revealing my end-of-season fantasy basketball analysis a bit early. Although my writing style is typically humor and sarcasm with some stats thrown in (okay, more like sarcasm and pessimism), today I’d like to take a more refined look at the stats of fantasy basketball.

The Intro:

So another year of fantasy basketball is winding to a close. Maybe your team got pounded by injuries; maybe your team had Dirk, Nash, and David Lee and cruised to victory (like mine). There are many methods out there for evaluating player performance, and lots of ranking systems. Sure, LeBron was obviously #1, but what about further down the list? Do you really trust those pre-rankings? Today I’m going to talk about one method of evaluating the numbers, so that during next year’s draft you can spend your 90 seconds scrambling for injury and team news while already having some confidence in what numbers to expect.

The Method:

We are already quite comfortable using averages in sports stats: LeBron scored 29.9 points per game, and Dwight grabbed 13.6 rebounds per 36 minutes. So instead of jumping to PER or RAPM or some other complex analysis, why not take just the next step: standard deviation? In fantasy, we have the entire population (every player who has logged minutes in an NBA game), and all we really care about is choosing the guy who is better than what the other teams have. Standard Deviation could fit this need!
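Because the fantasy "population" is every player who logged minutes, the population standard deviation (divide by N) is the matching choice rather than the sample version (divide by N - 1). A minimal sketch with Python's standard `statistics` module; the point totals are made-up placeholders, not real 2009-10 data:

```python
import statistics

# Hypothetical season point totals for a made-up five-player "league".
points = [2033, 900, 450, 120, 5]

mean = statistics.fmean(points)

# pstdev treats the list as the whole population (divides by N), which
# matches fantasy, where every player who logged minutes is included.
pop_sd = statistics.pstdev(points)

# stdev is the sample estimate (divides by N - 1), shown only for contrast.
sample_sd = statistics.stdev(points)

print(f"mean={mean:.1f}  population sd={pop_sd:.1f}  sample sd={sample_sd:.1f}")
```

With a pool of ~450 players the two versions differ by only about 0.1%, so the choice won't reshuffle any rankings.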

Well, let’s not get ahead of ourselves. With Yahoo!, for example, you get all the raw totals and averages, and even their special “O-Rank” and “Rank”. Why expand beyond that? Well, the problem is that when you sort by FG% or TOV, things start looking strange. Is Marc Gasol’s 58.3% going to help your team more than David Lee’s 55.2%? Just how badly is Dwight Howard’s 60.3% FT shooting going to kill that category? Kinda hard to tell by eyeballing it. Even with the raw numbers: just how much will having Steve Nash on my team dominate the assists category?

The Good News:

Enter: Standardize. If you really don't want to do math, I’ve still got good news for you: ESPN’s Fantasy Basketball Player Rater already does all of this. In fact, if you are quite satisfied using just the ESPN Player Rater, you can probably stop reading now. There you can see the Standard Score for each category, and they are added up in the final column as a composite score (yes, this makes more sense than adding percentages together randomly *coughHOLLINGERcough*).
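A minimal sketch of the general idea (not ESPN's actual implementation): standardize each category across the player pool, then add the category scores with equal weight. The three players and their numbers are hypothetical placeholders.

```python
import statistics

# Hypothetical per-game averages for a made-up three-player pool
# (placeholder numbers, not real 2009-10 data).
players = {
    "Player A": {"PTS": 28.0, "REB": 7.0, "AST": 8.0},
    "Player B": {"PTS": 15.0, "REB": 11.0, "AST": 2.0},
    "Player C": {"PTS": 8.0, "REB": 3.0, "AST": 5.0},
}
categories = ["PTS", "REB", "AST"]


def standard_scores(values):
    """Convert raw values to z-scores using the population mean and sd."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


names = list(players)
composite = {name: 0.0 for name in names}
for cat in categories:
    z_scores = standard_scores([players[name][cat] for name in names])
    for name, z in zip(names, z_scores):
        composite[name] += z  # every category counts equally

for name in sorted(composite, key=composite.get, reverse=True):
    print(f"{name}: {composite[name]:+.2f}")
```

Since each category's z-scores sum to zero across the pool, the composites do too: a positive composite means above-average overall value.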

For example: LeBron has scored 2033 points as of this post. The league average is 472.5 and the standard deviation is 410.25 pts. So (2033 – 472.5) / 410.25 = 3.8, meaning LeBron is 3.8 standard deviations above the league average. For those not familiar with standard deviations: assuming a normal distribution, a score of 1 puts you above ~84.1% of the population, 2 puts you above ~97.7%, 3 puts you above ~99.9%, and 4+ is outstanding. Isn’t this what you really want to know on draft day? You can spot overall contributors at a glance, see which categories you are lacking, and pick up specialists without having to guesstimate from the raw numbers.
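The same arithmetic in a few lines of Python, using the numbers quoted above. The percentile figures come from the standard normal CDF, so they only hold if a category is actually normally distributed.

```python
from statistics import NormalDist

# Numbers quoted above for LeBron's season point total.
lebron_pts = 2033
league_mean = 472.5
league_sd = 410.25

z = (lebron_pts - league_mean) / league_sd
print(f"z = {z:.2f}")  # about 3.80

# Share of a *normal* population that falls below 1, 2 and 3 standard deviations.
std_normal = NormalDist()
for k in (1, 2, 3):
    print(f"z = {k}: above {std_normal.cdf(k):.1%} of the population")
```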

Another benefit of Standardizing is that scores can go negative, so you can see when a player is really hurting your team!

The Workarounds:

Okay, so the bad news here is that ESPN only shows the 8 categories. If you’re playing with TOVs, how does that fit in? Also, how do I calculate the FG% and FT% numbers, since they aren’t raw counts?

Well, here’s where we start doing things for ourselves. Pick up your scripting language of choice, or start copying and pasting CSV text from basketball-reference or your favorite website. Turnovers are the easier case: since TOV works as a negative statistic, I simply found all the Standard Scores and then flipped the signs.
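A minimal sketch of that turnover handling, with hypothetical totals: standardize as usual, then negate so that fewer turnovers score higher.

```python
import statistics


def standard_scores(values):
    """Population z-scores: (x - mean) / population sd."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical season turnover totals (placeholders, not real data).
turnovers = [280, 150, 90, 40]

# Turnovers count against you, so flip the sign: fewer turnovers -> higher score.
tov_scores = [-z for z in standard_scores(turnovers)]
print([round(s, 2) for s in tov_scores])
```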

For FG% and FT%: I personally believe ESPN doesn’t give enough weight to the number of shots taken. Shouldn’t LeBron shooting 50.0% on 20.2 FGA/g have more impact than Varejao shooting 57.1% on only 6.4 FGA/g? I think so, which is why I first subtracted the league average, then weighted by shots taken, before dividing by the standard deviation. My FGscore is defined as:

FGscore = (FG% - league average FG%) * FGA

I then standardize the FGscore (the average is already zero, so really I’m just dividing by the standard deviation). LeBron ends up with 2.37 and Varejao with 2.07, not that anyone would think of drafting the latter over the former. In any case, now we can properly rank players by FG%, so all the lacktators shooting a perfect 1.000 filter to the bottom.
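A sketch of the FGscore idea as described here and in the comments: center each player's FG% on the league FG%, weight by attempts, then divide by the standard deviation. Using the pooled league percentage (total makes over total attempts) is an assumption on my part; it is what makes the average FGscore exactly zero. The player lines are hypothetical.

```python
import statistics

# Hypothetical (made, attempted) field goal totals; placeholders, not real data.
shooting = {
    "Volume scorer": (800, 1600),
    "Efficient big": (300, 525),
    "Lacktator": (1, 1),
    "Bricklayer": (250, 650),
}

# League FG% pooled over all attempts (total makes / total attempts).
total_made = sum(m for m, a in shooting.values())
total_att = sum(a for m, a in shooting.values())
league_pct = total_made / total_att

# FGscore = (player FG% - league FG%) * attempts
fg_raw = {name: (m / a - league_pct) * a for name, (m, a) in shooting.items()}

# With the pooled average the FGscores sum to zero, so standardizing
# reduces to dividing by the population standard deviation.
sd = statistics.pstdev(list(fg_raw.values()))
fg_score = {name: v / sd for name, v in fg_raw.items()}

for name in sorted(fg_score, key=fg_score.get, reverse=True):
    print(f"{name:14s} {fg_score[name]:+.2f}")
```

The 1-for-1 lacktator lands near zero instead of topping the list, because a perfect percentage on one attempt barely moves a team's totals.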

Same thing with FT%: which helps you win the category more, Nash's 94.1% (2.7 FTA/g) or Carmelo's 83.1% (9.3 FTA/g)? ESPN puts Nash over ’Melo, but using my FTscore, Carmelo scores a 2.67 while Nash scores a 2.36. Of course, Durant and Dirk still dominate the category.
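A quick check of why volume can beat percentage, using the per-game lines quoted above. The league-average FT% here is a hypothetical placeholder, since the post doesn't give the real figure.

```python
# Per-game free throw lines quoted above: (FT%, FTA per game).
lines = {
    "Nash": (0.941, 2.7),
    "Carmelo": (0.831, 9.3),
}

# Hypothetical league-average FT%; the real 2009-10 figure is not given in the post.
LEAGUE_FT_PCT = 0.75

# Unstandardized FTscore: (FT% - league FT%) * attempts. Dividing every player
# by the same standard deviation would not change this ordering.
ft_raw = {name: (pct - LEAGUE_FT_PCT) * fta for name, (pct, fta) in lines.items()}

for name, score in ft_raw.items():
    print(f"{name}: {score:+.3f}")
```

The ordering does depend on the baseline: with these inputs the two break even at a league FT% of about 0.786, and above that Nash would come out ahead.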

The Advanced Bad News:

Okay, I’ve been far too positive towards ESPN. This sounds almost too good to be true. What are the limitations of this method? As I said, Z-scores happen to work well here since we have the entire population of data. However, a simple glance at the data will show you that it is NOT normally distributed, and normality is one of the assumptions behind reading Standard Scores as percentiles! Going one step further, I looked at the skew and kurtosis of each category, and they are off the charts, with the worst skew in blocks (2.2) and the worst kurtosis in FT% (16.81).
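The post doesn't say which skew and kurtosis formulas were used. This sketch uses the plain moment-based (population) versions, under which a normal distribution scores 0 and 3; spreadsheet functions like SKEW and KURT apply sample corrections (and KURT reports excess kurtosis), so their numbers will differ. The blocks totals are hypothetical.

```python
import statistics


def skew_and_kurtosis(values):
    """Population skewness and (non-excess) kurtosis from central moments.

    Under this convention a normal distribution has skewness 0 and kurtosis 3.
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical blocks totals: most players near zero, one big outlier.
blocks = [0, 1, 2, 3, 5, 8, 10, 15, 20, 30, 45, 180]
skew, kurt = skew_and_kurtosis(blocks)
print(f"skew={skew:.2f}  kurtosis={kurt:.2f}")
```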

In simpler terms, this means that scores at the far ends can be more extreme than a normal distribution would allow. For example, Dwight gets a score of nearly 6 in blocks, which statistically should not happen among only ~450 players: under a normal distribution, a score that high is roughly a one-in-a-billion event. So, as with all advanced statistics, use them carefully!
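How rare a 6-standard-deviation score would be if blocks really were normally distributed, sketched with the standard library:

```python
from statistics import NormalDist

# Probability that one value from a normal distribution lands 6+ sd above the
# mean (computed as the lower tail at -6, which is the same by symmetry).
p_tail = NormalDist().cdf(-6)

# Expected number of such players in a pool of ~450, if blocks were normal.
expected = 450 * p_tail
print(f"P(z >= 6) = {p_tail:.2e}, expected count in 450 players = {expected:.1e}")
```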

In addition, I did a Principal Component Analysis (PCA) on the 9 categories. It turns out there’s such a strong negative correlation between Points and Turnovers, and modestly strong correlations between Points and the other categories, that it’s not even worth bothering with the TOV category! Stupid Yahoo!!
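A full PCA needs a linear algebra library; this sketch only shows the pairwise Pearson correlation check underneath it, on hypothetical totals. It also shows why the sign flip from earlier turns a positive Points/TOV relationship into a negative one.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical season totals: high scorers also tend to turn it over more.
points = [2000, 1500, 1100, 700, 300, 50]
turnovers = [250, 220, 150, 90, 40, 5]

# Raw totals correlate positively; after flipping the TOV sign (so fewer
# turnovers score higher), the correlation with points becomes negative.
r_raw = pearson(points, turnovers)
r_flipped = pearson(points, [-t for t in turnovers])
print(f"raw r = {r_raw:.2f}, sign-flipped r = {r_flipped:.2f}")
```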

The Advanced Customization:

So maybe you hate Turnovers. Maybe you hate my FGscore and FTscore. I think it’s also perfectly valid to try to dominate the 6 raw stat categories! It’s very intuitive, and any wins in FG%, FT%, or TOV are just gravy. In fact, I’ve done just this...

Putting it all together: we can analyze total season numbers, per-game numbers, or per-36-minute numbers. I’ve proposed looking at standard scores for the 9 categories like Yahoo!, the 8 categories like ESPN, or the 6 raw stat categories. Well, since we’re already working with Standard Scores, why not just add composite scores together? I did this with the completed 2008-09 data. Total season numbers show how much a player contributed, per-game numbers account for some injuries and such, and per-minute numbers account for varying playing time. Since they’re all standardized, I just add them together to get a super-composite score for a really quick look at who did the best (i.e., which players I should really be comparing during my 90 seconds to draft)!
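A minimal sketch of the super-composite: because each view's composite is already on the standard-score scale, the super-composite is just their sum. The three players and their composites are hypothetical, and the `super_composite` helper is an illustrative name, not anything from the original analysis.

```python
# Hypothetical composite scores (each already a sum of category z-scores)
# for three made-up players under three views of the same season.
season_totals = {"Player A": 9.4, "Player B": 4.0, "Player C": -2.5}
per_game = {"Player A": 7.1, "Player B": 5.2, "Player C": -1.0}
per_36 = {"Player A": 5.8, "Player B": 6.3, "Player C": 0.4}


def super_composite(*views):
    """Sum each player's composite across every view (all on the z-score scale)."""
    players = set.intersection(*(set(v) for v in views))
    return {p: sum(v[p] for v in views) for p in players}


ranking = sorted(super_composite(season_totals, per_game, per_36).items(),
                 key=lambda item: item[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:+.1f}")
```

Only players present in every view are scored, so someone missing from one table doesn't get a silently partial total.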

After doing all this and comparing the 9-, 8-, and 6-category versions, it turns out there’s lots of correlation among them, but the analysis that made the most overall sense was... 6 categories! Are you serious?! After all that work I did messing with TOV, FG%, and FT%, you could essentially ignore them?

Well, sorta. Mostly, it’s Steven Hill’s fault. Because he played in only one game, for 2 minutes, his per-minute fantasy impact scales to the absurd (remember all that stuff about skew and kurtosis?). But looking at only the 6 raw stats, even he can’t escape a more proper ranking!

Another way to avoid this, and possibly help further normalize the data: just take the top 200 players, or top 100, or whatever, and treat that as your total population, because let’s be honest: no one’s putting Mario West or JamesOn Curry on their fantasy team. Hell, to simplify things, take the top 4 players and do a pretend 2-person draft. From there, you can see which categories you’re taking, which you’re giving away, and which analysis to use.
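A sketch of how restricting the population tames a Steven Hill-style outlier in per-36 numbers. The season lines, the `TOP_N` cutoff, and the use of minutes played to pick the top of the pool are all hypothetical choices for illustration.

```python
import statistics


def standard_scores(values):
    """Population z-scores for one category."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical (name, minutes played, blocks) season lines.
pool = [
    ("Starter A", 2900, 230),
    ("Starter B", 2700, 90),
    ("Starter C", 2500, 40),
    ("Rotation D", 1400, 60),
    ("Two-minute guy", 2, 1),  # one block in 2 minutes -> 18 per 36
]


def per36_block_scores(players):
    """Standard scores of blocks per 36 minutes within the given pool."""
    per36 = [blk * 36 / mins for _, mins, blk in players]
    return dict(zip((name for name, _, _ in players), standard_scores(per36)))


TOP_N = 4  # hypothetical cutoff; the post suggests something like 100 or 200
everyone = per36_block_scores(pool)
top_only = per36_block_scores(sorted(pool, key=lambda p: p[1], reverse=True)[:TOP_N])

for name in top_only:
    print(f"{name:12s} all: {everyone[name]:+.2f}  top {TOP_N}: {top_only[name]:+.2f}")
```

In the full pool the two-minute player drags the mean up so far that even the best real shot-blocker scores below zero; within the top 4 the scores spread out sensibly again.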

The End:

I hope this gave you some insight into stats and fantasy basketball. Of course, when it comes to injuries, rookies, etc., you’re still on your own. The method I presented is highly useful for both roto and H2H, and can be expanded or contracted to your liking, despite its limitations. Want to look at the past 3 years combined? The past month only? Go for it. Don’t just trust those pre-built rankings anymore: grab your favorite programming language/spreadsheet/abacus and find those undervalued players and steal picks!

Other random notes: Yes, I graphed a ton of stuff while doing this. Tips I picked up:

With the top picks, don’t overvalue 3pt shooting. It’s easy to pick that up later in the draft or off waivers.

As I implied before, FG% and FT% are pretty even down the board, so use Standard Scores only to nudge you toward one guy over another. E.g., if you pick up Dwight Howard, concentrate on FG% guys, because there’s probably no combination of players you can pick up to make up the FT% column.

Blocks are sparse, but spread out down the board.

Of the remaining stats, Steals and Points have the strongest correlation, and Steals is usually a close category (so every little standard score counts!). This is probably why drafting bots tend to eat up all the point guards early.

And finally, in good Basketbawful fashion, how does my ranking compare to ESPN’s for the worst fantasy player of the season so far?

-6.05 Primoz Brezec
-6.05 Jarron Collins
-6.01 Kwame Brown
-5.98 Eddy Curry
-5.97 Lindsey Hunter
For reference, Mario West has a -5.54.

Labels: fantasy sports, guest author, standard deviation, statistics, ugh this sucks I'm just going to use ESPN's Player Rater

20 Comments:

Dan B. said...

Well, at least this makes more sense than the stuff we covered in my Statistics class in college.

Also, if I had to guess two of the worst five fantasy players, I would have probably said Eddy Curry and Kwame Brown. This pleases me to no end.

3/22/2010 8:28 AM

chris said...

That bottom 5 fantasy player list looks like a who's who from the lacktion report, doesn't it? (And Mario West's Warios are what prevent him from being up there.)

Primoz Brezec FTL!

Is Lindsey Hunter even still playing at this point?

3/22/2010 9:26 AM

AnacondaHL said...

Crap, I forgot to put in another test case to justify my seemingly random handling of FG% and FT%.

Say there's Team A and Team B. Each team has 2 players, one shooting 25% and one shooting 75%, just as Yahoo! shows. Who wins the FG% category?

Well, the answer is: not enough information. Just comparing 25% and 75% to a league average of 50% seems to imply the teams are equal, but this may not be the case. Both teams probably did NOT shoot 50%.

Say Team A has one player who shot 300/1200 (25%) and another who shot 300/400 (75%). Team B has one who shot 150/600 (25%) and another who shot 450/600 (75%). Now which team wins the FG% category?

The answer is Team B, as shown by adding the makes and attempts together: Team A shot 600/1600 (37.5%) and Team B shot 600/1200 (50.0%). Using my FGscore method, Team A's players had scores of (.25 - .50)*(1200) = -300 and (.75 - .50)*(400) = 100, while Team B's had -150 and 150. Adding them together, Team A totals -200, which is less than 0, so Team A was under average, while Team B totals 0 and was spot on. Find the FGscore of everyone in the league, get the stddev, divide, and move on.
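The two-team example above, checked in Python. The 50% league average is the one assumed in the example.

```python
# The two hypothetical teams from the example, as (made, attempted) per player.
team_a = [(300, 1200), (300, 400)]
team_b = [(150, 600), (450, 600)]
LEAGUE_PCT = 0.50  # the league average assumed in the example


def team_pct(team):
    """Team FG% = total makes / total attempts, not the mean of player percentages."""
    return sum(m for m, a in team) / sum(a for m, a in team)


def team_fgscore(team, league_pct=LEAGUE_PCT):
    """Sum of (player FG% - league FG%) * attempts over the roster."""
    return sum((m / a - league_pct) * a for m, a in team)


for label, team in (("Team A", team_a), ("Team B", team_b)):
    print(f"{label}: {team_pct(team):.1%}, FGscore {team_fgscore(team):+.0f}")
```

Summing (FG% - league%) * FGA over a roster equals total makes minus league% times total attempts, so its sign always matches whether the team shoots above or below the league average. That is why weighting by attempts works for a category that is decided on team totals.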

If anything, even though the ESPN Player Rater is fairly solid, I hope I raised enough awareness of its disadvantages, and of the fact that no one ranking system should be trusted outright. I should have added that yes, it is biased towards fellating LeBron James's fantasy impact, as is standard practice at ESPN. (Note how little room the scores have on the negative end, bottoming out around -6, yet LeBron can reach a 17+ score. Skew-tastic!)

3/22/2010 10:24 AM

Anonymous said...

Does the -6's of our favorite lacktators (esp. my boy Jarron Collins) mean that they are all one in a million too?

3/22/2010 10:26 AM

Unknown said...

Ah, fantasy sports. Also known as "D&D for Jocks".

3/22/2010 10:30 AM

Onandonymous said...

"Meaning LeBron is 3.8 standard deviations above the league average. For those not familiar with standard deviations, a score of 1 puts you above ~84.1% of the population, 2 puts you ~97.7% above, and 3 puts you ~99.9% above, and 4+ is outstanding."

Are we assuming a normal distribution? Also, how does the median compare with the mean? Wouldn't we expect more people to be clustered at the bottom? I think that's a Poisson distribution, but I only know the name, not how to use it...I would link to Wikipedia, but it's seriously dense and technical.

I was thinking you could measure individual players' consistency using standard deviations -- let's say Dwight Howard gets an average of 13.1 rpg, but the standard deviation is 4, whereas Carlos Boozer gets 11.2 rpg but the standard deviation is only 1.8 (I'm making the standard deviations up completely, btw). Wouldn't Boozer be more valuable because of consistency? You just win categories, you don't get extra for margin of victory...does this make sense?

3/22/2010 11:23 AM

Dan B. said...

I actually think it makes sense to assume the population of NBA players would be relatively normal. (Not exactly, but close enough)

If you looked at the basketball talents of the entire world, the NBA players would surely rate much higher than most random high schoolers and low-level eastern European team players, etc. You would expect a fairly skewed distribution then. However, looking at just the players who are good enough to make it to the NBA, I would assume that there's a lot of "mediocre" talent in the Association ("average" compared to other NBA players, but far above 99% of the world's population of basketball players). There are several "good" players, and several "bad" players. And of course you have the outliers, such as LeBron James (who is far superior to most NBA players) and of course the extremely untalented by NBA standards lacktators, many of whom are also outliers. But outside of these two far ends of the spectrum, I would think the distribution would be fairly normal. A lot of average guys, a few below average, a few above average, and a handful of really bad and really good players.

3/22/2010 11:31 AM

Onandonymous said...

Right, so in terms of talent the distribution could be normal, but in terms of say, scoring, it isn't. "Normal" is a fancy way of saying bell curve -- and the per game statistics are skewed pretty hard towards the bottom, so it's not a bell curve.

If scoring were distributed normally, then according to the data in the post, you'd expect 2.3% of the players to have scored -348 points or fewer this season...and as hard as some of them try, you can't really score negative points.
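Checking that arithmetic against the post's own mean (472.5) and standard deviation (410.25); this is a sketch of what the normal model implies, which is exactly the model being questioned here.

```python
from statistics import NormalDist

# Season points modeled as normal with the post's mean and standard deviation.
points_if_normal = NormalDist(mu=472.5, sigma=410.25)

cutoff = 472.5 - 2 * 410.25  # two standard deviations below the mean
share_below = points_if_normal.cdf(cutoff)
share_negative = points_if_normal.cdf(0)
print(f"cutoff={cutoff}, below cutoff={share_below:.1%}, below zero={share_negative:.1%}")
```

The same model also puts roughly one player in eight below zero points, which makes the non-normality point even more bluntly.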

3/22/2010 11:50 AM

starang said...

Oh, people can come up with statistics to prove anything. 14% of people know that.

3/22/2010 12:07 PM

Dan B. said...

Misunderstood you. I would imagine, yes, there should be some right-skewing to a total points scored charting of all NBA players, mostly thanks to the bench players getting less time to score, and the revolving door of garbage time guys.

3/22/2010 12:19 PM

AnacondaHL said...

Onandonymous: If you read my "Advanced" section, I describe how I did look into skew and kurtosis risk, to show that we (and ESPN) are assuming normal distribution, but it's really not.

If you must know, here's what I got for 09-10 up to today:

 Skew  Kurt.  Category
=====  =====  ========
 0.98   6.41  FG%
 1.30   3.80  3P
-0.33  16.81  FT%
 1.25   4.25  TRB
 2.18   9.28  AST
 1.06   3.88  STL
 2.20   8.90  BLK
-1.00   3.60  TOV
 0.97   3.51  PTS

Also, individual standard deviations would help H2H teams more than roto, but there are so many other factors in week-to-week performance, like number of games, home/away, days of rest, etc., that it's not really as helpful as you'd think.

3/22/2010 12:27 PM

chris said...

OT: Team Nellieball is for sale, and grand master of The Oracle organization, Larry Ellison (he of the flight curfew violations at San Jose International - and I THINK his family got their name on a wing of the UC Davis hospital near Sacramento's Oak Park neighborhood) is showing interest.

3/22/2010 1:52 PM

Dan B. said...

AnacondaHL -- I just got my latest issue of eWeek Magazine delivered to my desk (I work in IT). In a review of the latest version of OpenOffice, I noticed the following (as seen in the online version of the article):

http://farm5.static.flickr.com/4028/4455139086_58c702356c_o.jpg

Thought that was a nice coincidence. :)

3/22/2010 1:55 PM

AnacondaHL said...

That is uncanny as fffff-

I think that may be the same technique, only using the past 3 games of course, and scaled from 0 to 1...

3/22/2010 2:01 PM

Barry said...

Positively brilliant!

3/22/2010 5:40 PM

Team Captain said...

Awesome. I've always wondered why the concept of standard deviation wasn't used very much in sports. I always thought that it would be a great measuring stick for certain situations, i.e., evaluating a goal-line back in football. Measuring his consistency and the % chance that he'll gain positive yards could be very useful figures in comparing similar players.

3/23/2010 3:48 PM

Anonymous said...

So no more Worst of the Weekend/Night? I miss those posts...

3/23/2010 5:13 PM

victor said...

this is a great post...any chance you can share your formula for calculating the final player rating?

10/12/2010 7:40 PM

AnacondaHL said...

Victor - Simply add together the scores from whatever category you want to count up.

BTW, since this post I've expanded the scores to look at popular punt categories, such as FG and TO (Granger), FT and TO (Howard), steals (Lee), or blocks.

10/13/2010 7:42 AM

victor said...

actually, what i meant was a final score for a player that takes into account all the categories.

for example:

total score = 0.5*points + 0.8*rebounds - 2*TO

I'm wondering if you've come up with a formula to determine the coefficients for each category.

10/13/2010 5:31 PM