Tuesday, February 15, 2011

NBA League Size and Competitiveness, Part the Last: Calculating Competition

Today we embark on a journey through the perilous land of inventing your own statistics.  As a wrap-up for this series on NBA competitiveness and league size, I wanted to create a kind of competitiveness index based upon the regular season results in any given season.  That has proved to be a more difficult task than I originally anticipated, for reasons which will become clear.

First off, though, why would I want to do something like this?  Well, as I discussed in Part One, I'm reading The Book of Basketball, by Bill Simmons, and was struck by how definitive he is in assessing which NBA seasons were competitive and which weren't.  In particular, some of the early seasons in the NBA sparked comments like, "Everyone had a good team back then."  I wanted to try to figure out if he was right because, as sabermetrics has taught us, even passionate and well-informed fans of a sport often don't really understand what's going on.

For example, in baseball it was long believed that carrying a .300 batting average alone was sufficient to make you a good hitter.  "A .300 hitter" was - still is - an honorable appellation, as well as a sine qua non of baseball success.  What about a player like Juan Pierre, though, whose career .298 average puts him close enough to be called a .300 hitter?  Is he really any good?  Old-time baseball wisdom would say yes.  He's fast, he hits for a high average, and he's the kind of guy that people assume is a good fielder, whether he is or not.  But even offensively, you can dive deeper into his batting lines and see that he's a deeply, deeply flawed player.

You see, Juan Pierre does not really draw walks.  Nor does he hit for power.  So despite a career average just shy of .300, he sports a .347 OBP - not bad, but not good enough for someone who aspires to be an integral part of a team's success.  Moreover, his .366 career slugging percentage means that he's a singles hitter.  Those many hits he does generate aren't in the gaps or over the fence (as evidenced by his 14 career homers in almost 1600 games).  Now, the traditional baseball viewpoint would be that all of Pierre's singles are made up for by his stolen bases...  Which is fair, except he has led the league in getting caught stealing 6 times, and in stolen bases only 3 times.

Which is all to say that being a .300 hitter alone used to look great, and still looks great.  But looks can be deceiving.  No one should confuse Juan Pierre with a great hitter.  Similarly, sometimes a league might look competitive without actually being competitive.  And so I embarked on this little blog-project to prove Simmons right and/or wrong.

In Part Two I found that, while defining competition - let alone assessing it - might be very difficult, we can at least see that, as the league gets larger, so too does the standard deviation of winning percentage.  From one perspective that means that the league is getting less competitive - in the sense that teams are less jumbled together - but from another it means the league is getting more competitive - in the sense that there are more elite teams in any given season.  And that's exactly what my work for today's post shows.
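To make those two readings of standard deviation concrete, here is a rough Python sketch.  The two ten-team leagues are invented for illustration (they are not real NBA seasons), the .600 cutoff for "elite" is an arbitrary assumption, and the population standard deviation is used, which may not match a spreadsheet's default.

```python
from statistics import pstdev

# Hypothetical winning percentages for two 10-team leagues (made-up numbers).
bunched = [0.55, 0.54, 0.52, 0.51, 0.50, 0.50, 0.49, 0.48, 0.46, 0.45]
top_heavy = [0.75, 0.72, 0.70, 0.68, 0.65, 0.35, 0.32, 0.30, 0.28, 0.25]


def spread(wpcts):
    """League-wide standard deviation of winning percentage."""
    return pstdev(wpcts)


def elite_count(wpcts, cutoff=0.600):
    """Number of teams at or above an (assumed) 'elite' cutoff."""
    return sum(1 for w in wpcts if w >= cutoff)


for name, league in (("bunched", bunched), ("top_heavy", top_heavy)):
    print(name, round(spread(league), 3), elite_count(league))
```

The top-heavy league has the larger spread and also the larger number of elite teams, which is the tension described above: the same statistic can be read as "less jumbled" or as "more great teams."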

What I did was develop a formula for "competitiveness," using the percentage of teams above .500, the mean of their winning percentages, and the standard deviation of their winning percentages.  My reasoning was this: if a higher percentage of teams are above .500 in a given season, the league is more competitive.  Similarly, the higher the average winning percentage of those teams, the more competitive the league is.  Lastly, the more condensed those winning percentages are (the lower the standard deviation), the more competitive the league is.  The advantage of this approach, of course, is that we can completely ignore any team that finished .500 or worse.  Those teams, I reasoned, don't really count (even if many of them do make the playoffs, thanks to the NBA's "everybody makes it" attitude towards the postseason).  The disadvantage, as the statistically acute among you will see, is that the components I have selected here are all closely related to the league-wide standard deviation of winning percentage.

What does that mean?  Well, let me show you.

"Competitiveness" (Y) by number of teams (X)

My formula for competitiveness is messy, but worth sharing.  Brackets indicate the separate components, which I tried to normalize so that 1 was more or less "average":

[(0.5 + Percentage of teams above .500)] x [10 x (mean above .500 - .5)] x [(mean above .500 - stdev above .500) / (mean above .500 + stdev above .500)] x 50

I multiplied the whole thing by 50 just to pull it up into a more readable and intuitive range.  Basically, 50 is normal (as you can see, the trendline above is close to, though not quite at, 50), while anything above 50 is a particularly competitive season, and anything below 40 is uncompetitive.
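For anyone who would rather read code than brackets, here is a minimal Python sketch of the index.  A few things in it are assumptions rather than a record of the workbook: the function name is made up, the standard deviation is the sample version, and the third bracket is taken as (mean - stdev) / (mean + stdev), the ordering that keeps it positive and makes it grow as the above-.500 teams bunch together.

```python
from statistics import mean, stdev


def competitiveness(wpcts):
    """Sketch of the index from one season's team winning percentages.

    Only teams strictly above .500 feed the mean and standard deviation.
    """
    above = [w for w in wpcts if w > 0.5]
    if len(above) < 2:
        raise ValueError("need at least two teams above .500")

    share = len(above) / len(wpcts)   # fraction of teams above .500
    m = mean(above)                   # mean winning pct of those teams
    s = stdev(above)                  # sample stdev (an assumption)

    part_share = 0.5 + share          # about 1 when half the league is above .500
    part_quality = 10 * (m - 0.5)     # about 1 when those teams average .600
    part_tightness = (m - s) / (m + s)  # nearer 1 when those teams are bunched

    return part_share * part_quality * part_tightness * 50


# Hypothetical five-team league, purely for illustration.
print(round(competitiveness([0.7, 0.6, 0.5, 0.4, 0.3]), 1))
```

Keeping the three parts as separate variables makes it easy to see which one is driving a given season's score.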

Now this graph alone doesn't show you anything problematic.  Like our graph from Part Two, it has a weak, but present trend going upwards, and...  Wait.  It looks very similar to that graph.

So I graphed "competitiveness" by year, and made the following line graph:

"Competitiveness" (Y) by season (X)

I then did the same with Standard Deviation of winning percentage:

STDEV of Wpct (Y) by season (X)

Now you may notice that these two graphs look almost exactly the same.  With a sinking feeling - starting to realize the folly of my ways - I graphed the two against each other:

Competitiveness (Y) against STDEV of Wpct (X)

The result is unambiguous.  My "Competitiveness" ranking basically tells me that when the league-wide standard deviation of winning percentage is high, the competitiveness is also high.  Which, of course, is the opposite of what I was suggesting in Part Three.  Yeah.

The result is hardly surprising, as I said, because of the components in my formula.  While the percentage of better-than-.500 teams may not have much bearing on the league-wide standard deviation of winning percentage, a higher mean winning percentage among the teams above .500 obviously goes along with a higher standard deviation of winning percentage across all teams.  Meanwhile, the final component of my formula - accounting for standard deviation of above-.500 winning percentages - will be inversely related to the league-wide standard deviation, but not enough, obviously, to disrupt the high correlation between "competitiveness" and league-wide SD of winning percentage.
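A quick way to see why the mean of the above-.500 teams and the league-wide standard deviation travel together is to take one league and stretch it away from .500.  The Python sketch below does that with a made-up six-team league; the `stretch` helper and its factor `k` are hypothetical devices for illustration, and real seasons obviously do not differ from each other by a clean stretch.

```python
from statistics import mean, pstdev


def stretch(wpcts, k):
    """Push every team's winning percentage k times as far from .500."""
    return [0.5 + k * (w - 0.5) for w in wpcts]


def mean_above_500(wpcts):
    """Mean winning percentage of the teams strictly above .500."""
    return mean([w for w in wpcts if w > 0.5])


# Hypothetical league averaging exactly .500 (not real data).
base = [0.60, 0.56, 0.53, 0.47, 0.44, 0.40]

for k in (1.0, 1.5, 2.0):
    league = stretch(base, k)
    gap = mean_above_500(league) - 0.5   # feeds the second bracket
    print(k, round(gap, 3), round(pstdev(league), 3))
```

Both printed columns scale with `k`: stretching the league raises the league-wide standard deviation and the "mean above .500 minus .5" term by the same factor, so any index built on that term will track the standard deviation.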

But really, this only goes to show that "competitiveness" is a highly ambiguous term.  Where one fan might think the most competitive season is the one where all of the teams are bunched together, another might prefer the one with five great teams and five terrible ones.  It's really a matter of perspective.

Where Simmons makes his determination, then, is probably the best place: skill of players in the league.  While you do have to be careful here - because any evaluation of a player's skill is heavily influenced by the relative skills of that player's contemporaries, and things like changing league sizes mess with our understanding of what is good and what is great - probably the best way to assess the competitiveness of the league at any point is to assess the overall skill of the players in the league at that time.  That's a much more challenging project, but I can imagine going through players and seeing where great careers overlap, and figuring out when talent has been at its apex and nadir.  Of course, Simmons does that kind of thing for a living - though without relying too much on numerical analysis and going more with his perception, a more-than-fair, if perilous, approach.  John Hollinger also does that for a living, relying absolutely on numbers.  So between the two of them, you can probably get a good sense of what's going on.

Finally, if you want to see the nuts and bolts of my work - messy as it is - I've posted my workbook to Google Docs.  Do with it what you will.
