Blissful Knowledge
March 06, 2008
SUPER CRUNCH, POPPED
I am currently reading Ian Ayres' Super Crunchers, and it's quite good. But in the introduction, he makes some mistakes regarding one of his examples of the analysis of large databases (the "Super Crunching" of the title). Specifically, he cites the now-famous example from Moneyball about how Billy Beane instituted a new draft philosophy for the 2002 draft, allegedly de-emphasizing the opinions of scouts in favor of a more statistically driven approach. In particular, Ayres points to Jeremy Brown, the infamous "fat catcher" whom scouts hated but who was drafted anyway on Beane's orders because of his great hitting stats in college. (This blog may have discussed Moneyball once or twice.)
First, while Michael Lewis' account of the A's approach to the 2002 draft, especially regarding Brown, was an all-time classic of journalism, the philosophy of that draft has not held up well. Ayres writes that Brown "has progressed faster than anyone else the A's drafted that year," and then cites his brief 2006 callup. Those two sentences create a very misleading impression (especially the present-tense "has"). Brown did rise through the high minors quickly and was (according to Moneyball's epilogue) the first 2002 draftee invited to the A's major-league spring training, but his career stalled soon thereafter. Fellow 2002 draftees Nick Swisher, Joe Blanton and Mark Teahen all reached the major leagues long before Brown's 2006 debut. Those three have gone on to major-league careers of varying success, but Brown appeared in only 5 games before being designated for assignment in 2007 (probably around the time Ayres' book was going to press). In fact, Brown just announced his retirement.
More generally, the data-driven approach the A's took in the 2002 draft was not particularly successful. Another major part of their philosophy, as Lewis details, was the near-categorical rejection of high school players in favor of college players, supported by older research from Bill James, among others. Well, we now know that those conclusions haven't held true for some time. ($$) The A's themselves have in recent years drafted many high school pitchers, generally regarded as the riskiest category of prospect. Moreover, as Derek Jacques notes ($$), most of the other prospects Moneyball singles out as draft targets chosen through the A's statistical analysis did not come close to making the majors. It is incorrect to say that scouts have become less important to identifying prospects in the years since Moneyball's publication; if anything, the opposite is true.
Finally, I'm not sure that Ayres picks the right example to illustrate the data-mining at the heart of his book. While thousands of baseball prospects are considered for the draft every year, the differences in their playing contexts (high school vs. college, different regions of the country and levels of competition, etc.) work against the idea that a large database of common baseline experiences can be built and analyzed. Baseball people do look at prospects' statistics, but the contexts are so different that the numbers are hard to analyze usefully in the aggregate, which is what "Super Crunching" is about.
But sabermetrics does offer a really good example of what Ayres is looking for: the recent efforts to build better defensive metrics. Whether it's David Pinto's "Probabilistic Model of Range," Bill James and John Dewan's "Plus-Minus System" or an alternative model, the new measures of defensive performance rely on analyzing thousands of plays in the field. So let's pretend this was the example Ayres meant to cite.
Posted by Dr. Manhattan at 12:29 AM