## Futility and Sabermetrics

So, there is a new blog at the (Minneapolis) *Star-Tribune* website called Sabermetrics 101.

The first column discussed Pythagorean projections, and, boy, what an uninspiring response.

The first comment, by callmestupid, reads:

> I don’t get why that’s cool…..If you already know the outcome….Runs scored for the year and runs against….Then you already know the games won/lost…So why do the math? It seems to me you can tweak it till you come up with something that works. Give us something that can predict wins and losses before they happen and I’ll be impressed

I responded to that comment with this long ’graph:

> To callmestupid: it matters and is cool for a couple of reasons. (1) This sort of “pythagorean” projection is a better predictor of a team’s record in the following season than their actual record is (see the 2009-10 Seattle Mariners; in 2009 they way overperformed their runs scored/runs allowed numbers, and thus their “regression to the mean” in 2010 should have been expected rather than having been a surprise). (2) At any point in the season, if you wish to make a projection of how well a team is actually playing, this formula gives you/us a basis for evaluation, though the later in the year you wait, the more accurate the projection (small sample size caveats apply); such a run-differential-based projection predicts the relative quality of teams better than their actual record does–due to, among other things, luck. Also, John, the formula is more accurate if you use 1.83 as the exponent rather than 2 (squaring). That is, you raise runs scored and runs allowed to the power of 1.83 rather than simply squaring them. This is established in the literature, but see the Hardball Times website for a better explanation.
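For anyone who wants to see the arithmetic behind that comment, here is a minimal Python sketch of the Pythagorean projection: expected winning percentage is RS^x / (RS^x + RA^x), with x = 2 for the classic version and x = 1.83 for the refinement mentioned above. The run totals are hypothetical, and the mid-season helper `project_final_wins` is a name I made up; it simply assumes a team plays the rest of its schedule at its Pythagorean pace, which is one reasonable way (not the only way) to turn the formula into an in-season projection.

```python
def pythag_pct(runs_scored: float, runs_allowed: float, exponent: float = 1.83) -> float:
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def project_final_wins(wins: int, losses: int, runs_scored: float,
                       runs_allowed: float, season_games: int = 162,
                       exponent: float = 1.83) -> float:
    """Hypothetical helper: current wins plus remaining games at the Pythagorean pace."""
    remaining = season_games - wins - losses
    return wins + remaining * pythag_pct(runs_scored, runs_allowed, exponent)


if __name__ == "__main__":
    # Hypothetical full season: 700 runs scored, 650 runs allowed.
    for x in (2.0, 1.83):
        pct = pythag_pct(700, 650, x)
        print(f"exponent {x}: {pct:.3f} -> {pct * 162:.1f} wins over 162 games")

    # Hypothetical midseason team: 45-36 with 380 runs scored and 390 allowed.
    print(f"projected final wins: {project_final_wins(45, 36, 380, 390):.1f}")
```

With these made-up totals the two exponents land within about a win of each other; the refinement matters more for teams with lopsided run differentials.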

My responses elicited the following comment from MyjahLeesa:

> How does this actually predict anything? All I see here is a measure of things that have already happened–that’s not a prediction! It doesn’t give me any new information, it’s just putting the information in a slightly different form… I think that is the inherent limitation of statistics, they are all backward looking. They can’t tell you how all the tangible aspects of the game are causing these numbers just by multiplying the numbers together in different ways. And I think understanding the tangible aspects are key to making more useful predictions anyway. To me, calling these numbers predictions is a little like the tail wagging the dog, no?

Okay, I guess. But what kills me is that my explanation gets a thumbs down and both of the other comments get thumbs up. (!?!)

As you know, a team’s Pythagorean projection predicts its performance over the rest of the season better than its actual winning percentage does (at least, those really smart kids over at the always awesome *Baseball Prospectus* tell me this is so; their info tends to pan out, so I trust them; there is also some formal statistical support for this–see the section called “Theoretical Explanation”).

Hey, I can’t tell you *how* electricity makes my computer operate, but I *can* tell you that my computer works a lot better when it’s plugged in than when it’s not. In the same way, I can’t explain *how* the Pythagorean projection produces accurate predictions (within a standard error of about +/- 4 wins), but I *can* tell you that ever since I started playing around with it in, oh, 1987, it has done a pretty good job of predicting wins and losses.
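The “+/- 4 wins” figure is a statement about the typical gap between projected and actual wins. Below is a minimal sketch of how one might measure that gap for a set of team-seasons, as a root-mean-square error. The `TeamSeason` type and the sample rows are hypothetical placeholders, not real data; plug in real run totals and records to check the claim yourself. It assumes Python 3.9+ for the `list[...]` annotation.

```python
import math
from dataclasses import dataclass


@dataclass
class TeamSeason:
    # Hypothetical record layout; swap in real team-season data.
    runs_scored: int
    runs_allowed: int
    wins: int
    games: int = 162


def pythag_wins(team: TeamSeason, exponent: float = 1.83) -> float:
    """Projected wins: Pythagorean winning percentage times games played."""
    rs = team.runs_scored ** exponent
    ra = team.runs_allowed ** exponent
    return team.games * rs / (rs + ra)


def rmse_wins(seasons: list[TeamSeason], exponent: float = 1.83) -> float:
    """Root-mean-square gap, in wins, between actual and projected wins."""
    squared_errors = [(s.wins - pythag_wins(s, exponent)) ** 2 for s in seasons]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up rows for illustration only.
sample = [
    TeamSeason(700, 650, 88),
    TeamSeason(640, 690, 80),
    TeamSeason(750, 700, 84),
]
print(f"RMSE: {rmse_wins(sample):.2f} wins")
```

Squaring the errors before averaging means a single team that beats its projection by ten wins counts for much more than several teams that miss by one or two, which is the usual convention when people quote a standard error.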

The response to my comments is sort of funny and sort of sad. The commenter says “it’s all backward looking,” which is kind of true, but–deep breath–*so are ALL empirical explanations and the predictions they generate.* That is how social scientific theories are justified, for example: you collect your data set, evaluate it, and then check whether historical conditions conform to your hypotheses. If they do, the hypotheses are considered more likely to hold in the future. And in any event, statistics are *descriptions*, and with accurate descriptions you have a better chance of making a valid prediction. Oh, whatever….

Where the rubber really hits the road, *arguing that the past isn’t relevant to the future completely invalidates the collection of ANY baseball statistics.* That is, every time you hear someone–a commentator, someone in your fantasy baseball league, whomever–say, “This guy is only hitting .250 this year, but he’s a .320 career hitter,” they are assuming that past performance predicts future performance. Any time someone says, “This pitcher had a 2.35 ERA last year, but this year it’s 4.20, so what gives,” s/he is assuming that the past tells us something about what we should expect in the future. So, if you stick to the whole “the past is meaningless for the future” position, you reject ANY sort of statistical analysis, like, say, actuarial tables, economic forecasts, expectations that today will be like yesterday, etc. No one lives their lives that way.

[Oh, regarding that pitcher with the ERA that blew up: one should probably look at the pitcher’s BABIP (batting average on balls in play) to see if his defense is letting him down, and then take a look at his defense-independent pitching stats, like FIP (fielding independent pitching) and xFIP.]
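Both checks in that aside are easy to compute yourself. Here is a minimal Python sketch using the commonly published formulas: BABIP = (H - HR) / (AB - K - HR + SF), and FIP = (13·HR + 3·(BB + HBP) - 2·K) / IP + C, where C is a league-wide constant recomputed each season to put FIP on the ERA scale. The pitcher line and the value 3.10 for C are assumptions for illustration, not real figures. xFIP, which swaps actual home runs for fly balls times a league-average HR/FB rate, is left out to keep things short.

```python
def babip(hits: int, home_runs: int, at_bats: int, strikeouts: int,
          sac_flies: int) -> float:
    """Batting average on balls in play allowed: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    return (hits - home_runs) / balls_in_play


def fip(home_runs: int, walks: int, hit_by_pitch: int, strikeouts: int,
        innings: float, fip_constant: float) -> float:
    """Fielding Independent Pitching on the ERA scale.

    innings must be a true fraction (180 1/3 IP -> 180.333..., not 180.1).
    fip_constant is a league-and-season value supplied by the caller.
    """
    raw = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return raw / innings + fip_constant


# Hypothetical pitcher line; 3.10 is an assumed constant, not a real season's value.
print(f"BABIP: {babip(hits=190, home_runs=20, at_bats=700, strikeouts=150, sac_flies=5):.3f}")
print(f"FIP:   {fip(home_runs=20, walks=55, hit_by_pitch=5, strikeouts=150, innings=180.0, fip_constant=3.10):.2f}")
```

A BABIP well above the league norm alongside a FIP much lower than the pitcher’s ERA is often read as a hint that luck or defense, rather than the pitcher himself, explains the blown-up ERA.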

While the map is not the territory, and statistics can’t provide the ability to make perfect predictions, there is a difference between a wild-ass guess and making an inference rooted in some knowledge about how things are and how they’ve been.

Therefore, I consider the response to my response kind of, to be as nice as possible, *ill-conceived*. To say that I–and a host of other people way, way, way more talented than I will ever be–am letting the tail wag the dog is simply, uh, silly. No, those are not just some random numbers that we’re multiplying (first of all, a very well-trained spreadsheet does all the math). They are runs scored and runs allowed, the building blocks of wins and losses.

I am now getting off my soapbox and going to listen to a podcast with Jonah Keri and Dave Cameron, two guys who are down with sabermetrics and worth every baseball fan’s attention.
