Sunday, September 27, 2009

The Origin of Sabermetrics

This is an interesting time to be a baseball fan, and I don't mean because it's September and the playoffs are about to start. In the last ten or so years baseball has quietly undergone a revolution, the true depth of which is only now becoming apparent in mainstream coverage of the game. Most broadcasts have started to carry On-Base Percentage (OBP) alongside the old mainstays of Batting Average (Avg), Home Runs (HR), and Runs Batted In (RBI). This is a subtle addition, but it reflects a fundamental change in baseball culture. The younger generation of baseball fans (and, more importantly, baseball executives, scouts, and even some players) understands statistics differently. Most teams have come to realize that RBI are not really indicative of who's a good player, and that even widely accepted statistics like Earned Run Average (ERA) are limited when evaluating pitchers.

You may have heard the term “Sabermetrics.” This is where it comes from. Back in the 70s, a man named Bill James wrote an annual series of books called The Bill James Baseball Abstract. The books were self-published and had almost no initial readership, but James's ideas caught on, and he became something of a cult figure among baseball fans who were dissatisfied with traditional ways of thinking about the game. Perhaps the most fundamental argument in the Abstracts was that OBP was more important not only than Batting Average, but than everything else. Fans and players who had grown up without ever hearing their favorite announcer utter the words “on base percentage” found it hard to believe that this statistic could really matter that much, but to some people it made sense.

The argument, for the uninitiated, goes something like this. What is the object of the game of baseball? Of course, to score more runs than your opponent. Since we're concerned with offense for the moment, let's think about how runs are scored. In order to win a game, you must score at least one run, which means at least one hitter must reach base and then travel around the diamond before three outs are recorded. So what's the key part of that run-scoring process? Traditional thinking would have said that traveling around the diamond is the key. “We want fast players, and players who can get to second, or third, or even home, without anyone else coming up. We also want players who are good at driving runners in.”

That's not an unreasonable position, but James had a more nuanced take: “before three outs are recorded” is way more important than moving around the diamond. The best thing a hitter can do may be hitting a home run, but really, as long as a batter doesn't make an out, he's done his job. In baseball, a team only has 27 outs, divided into nine innings, to score as many runs as it can. Each and every out is precious, because once they are used up, no runners can advance anymore, and no hitters can hit home runs. The end of an inning is a step closer to the end of the world.

So what statistic measures a hitter's ability to not get out? If a player doesn't make an out, he must reach base, whether by a hit, a walk, an error, or getting hit by a pitch. Regardless, a player who does one of those things has done his job. It turns out that On Base Percentage measures almost exactly this (the official formula credits hits, walks, and hit by pitches, but not reaching on an error). A player with an OBP of .400 gets on base 40% of the time, and makes an out roughly 60% of the time. OBP is the complement of “Out Percentage,” which James suggested was the most important statistic in all of baseball.
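For anyone who likes to see the arithmetic, here is a minimal Python sketch of OBP and its complement. It uses the standard modern formula (hits, walks, and hit by pitches, divided by at bats plus walks, hit by pitches, and sacrifice flies). The function name and the stat line are made up for illustration; they don't belong to any real player or to James's books.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies=0):
    """Standard OBP: times on base divided by the chances that count."""
    times_on_base = hits + walks + hit_by_pitch
    chances = at_bats + walks + hit_by_pitch + sacrifice_flies
    if chances == 0:
        raise ValueError("no plate appearances to measure")
    return times_on_base / chances


# Hypothetical season line, chosen so the numbers come out round.
obp = on_base_percentage(hits=150, walks=85, hit_by_pitch=5,
                         at_bats=500, sacrifice_flies=10)
out_percentage = 1 - obp  # the "Out Percentage" is just what's left over

print(f"OBP: {obp:.3f}, Out Percentage: {out_percentage:.3f}")
```

Note that reaching on an error still counts as an at bat without a hit, so in the official numbers it lowers OBP a little rather than raising it.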

Over time, James gave this kind of study a name, Sabermetrics, built on the acronym of the Society for American Baseball Research, or SABR, which had been founded in 1971 (the extra "e" added for quite reasonable linguistic reasons). This group of statisticians and thinkers was, unfortunately, composed almost entirely of fans. Hence, for a long time, they were kept out of the culture of the game, and the few baseball writers or scouts that joined up remained an oddity. It's not that teams were leery of this attempt to better quantify the game; it's that they didn't even notice. The primacy of Batting Average and RBI was so ingrained that anything else was an absurdity.

The origin of Batting Average is worth a quick look at this point, because it demonstrates the power of habit. When baseball first started being played, attempts were made to quantify who was good and who was not. The reason for doing this, as in any sport where statistics are kept, was to help determine which players contributed more to their teams' ability to win games. Early scorekeepers agreed that walks - the main difference between Batting Average and OBP - didn't really help a team win, because they were kind of a neutral outcome. The pitcher didn't get the hitter out, but he also didn't really give up anything. Walks were also much rarer - as were strikeouts - because the modern conventions of four balls to a walk and three strikes to an out had not been established. While the numbers varied over the evolution of the game, it took as many as nine balls to walk a hitter.

So when the first statistics were being crafted, walks simply didn't count. You got a hit, or you didn't. Anything else (sacrifices, walks, hit by pitches) was not an "at bat." The idea was, at the time, that the purpose of the game was to put the ball in play, and that it reflected poorly on the hitter if he struck out or if he walked, because he didn't do his job.
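To see what leaving walks out actually hides, here is a small Python sketch comparing two invented hitters. Both stat lines are hypothetical, and the OBP function uses the standard modern formula rather than anything from the early scorekeepers.

```python
def batting_average(hits, at_bats):
    """Hits per at bat; walks, hit by pitches, and sacrifices are not at bats."""
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies=0):
    """Standard OBP: walks and hit by pitches count in the hitter's favor."""
    times_on_base = hits + walks + hit_by_pitch
    return times_on_base / (at_bats + walks + hit_by_pitch + sacrifice_flies)


# Two hypothetical hitters with identical hits and at bats.
free_swinger = {"hits": 150, "at_bats": 500, "walks": 20, "hit_by_pitch": 0}
patient_hitter = {"hits": 150, "at_bats": 500, "walks": 100, "hit_by_pitch": 0}

for name, line in (("free swinger", free_swinger),
                   ("patient hitter", patient_hitter)):
    avg = batting_average(line["hits"], line["at_bats"])
    obp = on_base_percentage(**line)
    print(f"{name}: Avg {avg:.3f}, OBP {obp:.3f}")
```

By Batting Average these two are the same player. By OBP the patient hitter makes far fewer outs, which is exactly the difference the old statistic can't see.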

If the purpose of statistics is to figure out who's a better player, however, OBP does a better job. Besides the conceptual arguments James offered, he also did some research. Not surprisingly, he found that the statistics that best correlate to wins and losses over the course of a season were runs scored and runs given up (namely the difference between the two). Which statistic correlated best to runs? I offer some data:

1976 MLB Team Leaders in Runs:

  1. Cincinnati - 857
  2. Philadelphia - 770
  3. Minnesota - 743
  4. New York - 730
  5. Boston - 715
  6. Kansas City - 713

1976 MLB Team Leaders in Batting Average:

  1. Cincinnati - .280
  2. Minnesota - .274
  3. Philadelphia - .272
  4. Kansas City - .269
  5. New York - .269
  6. Pittsburgh - .267

So where's Boston? Here's the 1976 MLB Team Leaders in OBP:

  1. Cincinnati - .357
  2. Minnesota - .341
  3. Philadelphia - .338
  4. New York - .328
  5. Kansas City - .327
  6. Boston - .324

This small example hardly proves the correlation, especially because OBP correlates to Batting Average (since most of OBP is Batting Average). But even here you can see that the OBP list has a lot more in common with the runs list than the Batting Average list does. Taken over every team in the league, over years and years of data, James found that the correlation was about as close as you can get. It turns out that getting hits doesn't win games. Not making outs wins games.
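If you'd rather count than eyeball it, here is a rough Python tally using only the three top-six lists above. It is a crude comparison of list membership and position, not a real correlation, and the two helper functions are names I made up for this sketch.

```python
# The 1976 top-six lists, in rank order, as given above.
runs = ["Cincinnati", "Philadelphia", "Minnesota",
        "New York", "Boston", "Kansas City"]
batting_average = ["Cincinnati", "Minnesota", "Philadelphia",
                   "Kansas City", "New York", "Pittsburgh"]
obp = ["Cincinnati", "Minnesota", "Philadelphia",
       "New York", "Kansas City", "Boston"]


def shared_teams(list_a, list_b):
    """How many teams appear on both lists, regardless of rank."""
    return len(set(list_a) & set(list_b))


def same_rank(list_a, list_b):
    """How many teams sit at exactly the same position on both lists."""
    return sum(a == b for a, b in zip(list_a, list_b))


for label, stat in (("Batting Average", batting_average), ("OBP", obp)):
    print(f"{label} vs runs: {shared_teams(runs, stat)} shared teams, "
          f"{same_rank(runs, stat)} in the same spot")
```

The OBP list shares all six teams with the runs list and the Batting Average list shares five, which is the point about Boston. Six teams in one season is far too little to prove anything, so treat it as an illustration only.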

As I said in the opening, the last ten years have seen this kind of thinking gain wider acceptance, largely because of the success of one team around the turn of the millennium: the Oakland Athletics. There's a whole book about that called Moneyball, by Michael Lewis, and it's a good read even if you're not a baseball fan. In short, the A's began to use sabermetric principles, combined with some good old-fashioned economics (find what is most undervalued and buy it, whilst selling what is most overvalued). As a result they won lots and lots of games, and other teams started emulating them, notably the Boston Red Sox.

The most telling moment in this revolution in baseball culture was when the Red Sox hired none other than Bill James as an analyst and consultant for the team. A few years later they won their first World Series in almost a century, and then won another in 2007 (at the expense of my Rockies, sadly). They're heading for the playoffs again this year, because they - unlike the A's - have the deadly combination of almost unlimited money and people who know how to spend it.

Most teams, however, now employ statisticians, and many even pay attention to what their statisticians say. Over time I'll inevitably talk about some of the modern statistics, how they work (or don't) and why they work (or don't). But I'm at the fringe of all of this, with only enough knowledge to appreciate the changes going on inside of the press boxes and clubhouses and, above all, general managers' brains. In the 1970s and '80s it would have been safe to say that baseball fans knew more about what really counts in the game than managers, players, or scouts. These days, that's not true, even though fans know more than they did in the '70s and '80s. The level of sophistication in analysis of the modern game would have been unimaginable just two decades ago. Some people argue that this ruins the game. I would say it makes it richer. But that's a discussion for another time.
