Wednesday, August 31, 2011

Win Probability and Tennis

As a huge baseball statistics nerd, I love fangraphs.  Well, I don't really read many of their articles, nor do I engage in their often contentious comment threads.  Rather, I love the thing that made it famous: win probability graphs.  If you haven't seen one, it looks like this.

Hey look, the Royals lost a game they should have won!

The premise is simple.  Given any game situation, you can calculate - based on the number of expected runs scored for both the remainder of the inning and the remainder of the game - how likely each team is to win.  Certain events, like home runs, tend to increase your odds significantly, while others, like strikeouts with the bases loaded and two outs, tend to hurt them.  Of course, context becomes extremely important, and the graph will fluctuate more in closer games like the Royals versus Tigers game above.

Now the cool part is not even the graph itself, but the "Leverage Index" beneath it.  Basically, the amount of change in win probability that is possible/likely in any given situation is mapped beneath the graph.  This leads to a clear, quantitative mapping of high and low leverage situations.  This, in turn, leads to cool calculations of things like "clutch," which fangraphs calculates by comparing a player's performance in low leverage situations to his performance in high leverage ones (instead of the standard, and unsatisfying, "close and late").

All of this is old hat to baseball fans, and my point isn't to recap what you already know or could find on fangraphs.  Rather, I want to put out a call to coders, web developers, and tennis fans to do this for tennis.  I've seen a man named Jeff Sackmann produce a "Tennis Win Expectancy Graph" for a particular match, but as far as I know, there is no widely available tool to let tennis fans calculate win probability on their own, much less a live scoreboard like on fangraphs.

If anything, win probability for tennis should be simpler than for baseball.  Rather than having to calculate run expectancies, your set of variables is much, much smaller.  If you use the tour-wide statistic, as Sackmann does, that the server will win 64% of points, you can, theoretically, easily produce a win probability algorithm.  Turning that into a graph is obviously not too difficult, given Sackmann's own work.
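To give a sense of what such an algorithm might look like, here is a minimal Python sketch for a single game.  It is only a sketch under a big assumption: every point is independent and the server wins each one with the same probability (the 64% figure above).  The function name `game_win_prob` and the way the score is encoded (points won by each player, so 40-30 is 3 and 2) are my own illustrative choices, not Sackmann's method.

```python
from functools import lru_cache

P_SERVE = 0.64  # tour-wide share of points won by the server (figure cited above)


def game_win_prob(server_pts, returner_pts, p=P_SERVE):
    """Chance the server wins the game from the given point count.

    Assumes every point is independent and won by the server with
    probability p. Scores are raw point counts: 0, 1, 2, 3 = love, 15, 30, 40.
    """
    q = 1.0 - p

    @lru_cache(maxsize=None)
    def win(a, b):
        if a >= 4 and a - b >= 2:      # server has won the game
            return 1.0
        if b >= 4 and b - a >= 2:      # returner has won the game
            return 0.0
        if a >= 3 and a == b:          # deuce: closed form of the infinite series
            return p * p / (p * p + q * q)
        # Otherwise, weight the two possible next scores by their probability.
        return p * win(a + 1, b) + q * win(a, b + 1)

    return win(server_pts, returner_pts)


if __name__ == "__main__":
    print(round(game_win_prob(0, 0), 3))   # start of the game
    print(round(game_win_prob(2, 3), 3))   # 30-40, break point
```

Under these assumptions the server holds roughly 81% of the time from love-all.  The same recursion idea extends to sets and matches by treating the game win probability as the "point" of the next level up, though tiebreaks and alternating serve need extra care.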

So let's make it happen! Tennis would benefit from win probability graphs as much as baseball.  Questions that win probability (and leverage index) could help answer include:
1) Do great players "raise their level" on key points, or are they just always better?
2) How much of a difference in point-by-point leverage is there between a three-set and a five-set match?
3) What is the "most important" point in a given match?
There are countless others, so I won't list them all.  But even these should whet the appetite of the statistically-minded tennis fan.
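As a rough illustration of how the "most important point" question could be attacked, here is a small Python sketch that scores each point in a single game by how far the server's chance of holding swings depending on who wins it.  This is a simplified stand-in of my own, not the formula fangraphs uses for Leverage Index, and it assumes independent points with the server winning 64% of them.  The names `game_win_prob` and `point_leverage` are hypothetical.

```python
from functools import lru_cache

P_SERVE = 0.64  # assumed probability that the server wins any given point


@lru_cache(maxsize=None)
def game_win_prob(a, b, p=P_SERVE):
    """Chance the server wins the game with a points won to the returner's b."""
    q = 1.0 - p
    if a >= 4 and a - b >= 2:
        return 1.0
    if b >= 4 and b - a >= 2:
        return 0.0
    if a >= 3 and a == b:              # deuce
        return p * p / (p * p + q * q)
    return p * game_win_prob(a + 1, b, p) + q * game_win_prob(a, b + 1, p)


def point_leverage(a, b, p=P_SERVE):
    """Swing in the server's game win probability riding on the next point."""
    return game_win_prob(a + 1, b, p) - game_win_prob(a, b + 1, p)


if __name__ == "__main__":
    names = ["0", "15", "30", "40"]
    # Rank every score from love-all through deuce by how much is at stake.
    scores = [(a, b) for a in range(4) for b in range(4)]
    for a, b in sorted(scores, key=lambda s: -point_leverage(*s)):
        print(f"{names[a]}-{names[b]}: {point_leverage(a, b):.3f}")
```

With a strong server, the points where the returner is one point from breaking (30-40) come out on top, which matches intuition.  Doing this at the match level would mean swapping the game win probability for a match win probability, which is where the real work lies.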

Now you're probably thinking: "great, so do it yourself."  If only I could.  I have spent most of the day watching the U.S. Open and playing around with Excel.  But while the mathematics are not beyond me (though they are hairier than you might think), the coding is.  Perhaps someone else wants to take the baton?  If so, I'd be a happy contributor, cheerleader, or partner in the process, to whatever degree I can.

Hey, fangraphs isn't just graphs, it's also writers.  If someone starts "tennisgraphs," I'll be happy to write for it.
