Baseball's All-Time Best Hitters
Michael J. Schell
Princeton University Press, 295 pages.
(Available from amazon.com for $16.07 plus shipping.)
When I was a child I loved baseball. I played it, went to as many games as possible (this was rural South Carolina in the 1930's, so I didn't get to very many) and religiously listened to broadcasts. All of the local textile mills had semi-pro teams and I even went, and listened, to those games.
Since baseball is all about statistics, I conjecture that my love of the game engendered my later love of mathematics. The author of this book, who is an Associate Professor of Biostatistics at the University of North Carolina, may have been propelled into his career in mathematics in similar fashion. According to the jacket blurb he was already wondering how to rank hitters when he was a child and an avid fan of the Cincinnati Reds.
The idea of ranking players in a more accurate fashion than can be done by appeal to the raw statistics found on sports pages is illustrated by what happened in the late 1960's. In 1968, Carl Yastrzemski of the Boston Red Sox led the American League in batting with a ridiculously low average of .301. In the National League, the situation was not much better. The game was being dominated by pitching. After the season the owners decided to bring hitting back to the game (after all, that's what the fans pay for, they reasoned). So they lowered the pitching mound from 15 to 10 inches, and shrank the strike zone (so that pitches above the belly button were now called balls). The result was that in 1969, and thereafter, batting averages (and the number of home runs hit) soared. As the author states in his Preface, "The 1969 season provided direct evidence about how rule changes can affect batting averages from one year to the next. Thus, it seemed possible to me that batting averages decades apart could also differ for reasons other than absolute player ability."
After some deliberation, the author chose his criterion for defining the best hitter: lifetime batting average. This is only one possible measure, of course, since it excludes runs batted in, extra base hits (especially home runs) and other hitting criteria from consideration. However, this was the author's choice, and whether we accept it or reject it as the sole measure of hitting ability, the book has a great deal to teach us. He chose 4000 at bats as the minimum number for inclusion on the list, although a good case could be made for choosing 4000 plate appearances instead. (These differ from at bats in that walks and sacrifices are included.)
According to simple, unadjusted, lifetime batting average, the top five hitters of all time were Ty Cobb (.366), Rogers Hornsby (.358), Shoeless Joe Jackson (.356), Ed Delahanty (.346) and Ted Williams (.344). Babe Ruth's lifetime average was .342, placing him ninth on the list; number 15, Lou Gehrig, batted .340; Stan Musial, number 24, had an average of .331; and number 33, Joe DiMaggio, had a lifetime average of .325. Interestingly enough, Willie Mays, Mickey Mantle, Pete Rose and Hank Aaron don't even appear in the top 100, while Jackie Robinson, with a .311 lifetime average, ranks 71st. Tony Gwynn, at .340 and still counting, is 16th. (Schell doesn't explain the discrepancy between Cobb's generally accepted average of .367, as given in my CD-ROM encyclopedia and elsewhere, and the .366 figure which he quotes.)
Schell then goes on to correct the raw averages for four mitigating factors. The first of these he calls "Adjusting for late career declines." The idea is that players with very long careers, for example Mays and Aaron, tend to lose physical ability in their later years and thus they see their averages decline. Adjusting for this effect alone, Cobb, Hornsby and Jackson remain in the top three spots, but (Happy) Nap Lajoie and Tris Speaker take over the fourth and fifth places. Cobb's adjusted average moves up to .370, incidentally. This correction is made simply by including in the lifetime batting averages only the first 8000 at bats.
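The 8000-at-bat cutoff is easy to sketch in code. The following is only an illustration of the idea as described above, not the author's own procedure: the function name, the season-by-season input format and the decision to prorate hits in the season that crosses the cutoff are all assumptions made for the example.

```python
from typing import Iterable, Tuple


def truncated_average(seasons: Iterable[Tuple[int, int]],
                      max_at_bats: int = 8000) -> float:
    """Batting average over the first `max_at_bats` at bats of a career.

    `seasons` is a chronological sequence of (at_bats, hits) pairs.
    Assumption: when a season straddles the cutoff, its hits are
    prorated, since hit-by-hit ordering is not available here.
    """
    at_bats_used = 0
    hits_used = 0.0
    for at_bats, hits in seasons:
        remaining = max_at_bats - at_bats_used
        if remaining <= 0:
            break  # cutoff already reached; ignore later seasons
        if at_bats <= remaining:
            # The whole season falls inside the cutoff.
            at_bats_used += at_bats
            hits_used += hits
        else:
            # Only part of this season counts; prorate its hits.
            hits_used += hits * remaining / at_bats
            at_bats_used += remaining
    if at_bats_used == 0:
        raise ValueError("no at bats to average")
    return hits_used / at_bats_used
```

With made-up career blocks of (5000, 1750), (3000, 1050) and (2000, 400), the full career average is .320, while the first 8000 at bats alone give .350, which is the kind of upward shift the late-career adjustment produces.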
The second correction is called "Adjusting for hitting feasts and famines." Here the idea is that in some eras and some leagues players were better than in others. A player in a poor league would face poor pitching and poor fielding and thus have a higher average. Other factors affecting the batting average include rule changes as mentioned above and changes in equipment, for example "juicing up the ball" as well as the proliferation of night games and the introduction of artificial turf. (Hitters typically perform less well at night because it is more difficult to see the ball. The effect of artificial turf is not so clear.) The "feasts and famines" correction is made by comparing the yearly batting averages with the league averages and renormalizing. This modification when applied to the longevity-corrected list changes the top five hitters to Ty Cobb, Joe Jackson, Nap Lajoie, Pete Browning and Tris Speaker. Hornsby drops out of the top 10.
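One simple way to picture "comparing with the league average and renormalizing" is to rescale each season's hits so that the league average for that season matches a common reference level. This is a minimal sketch under that assumption; the review does not give the author's exact formula, and the function name, input layout and reference level are hypothetical.

```python
from typing import Iterable, Tuple


def league_adjusted_average(seasons: Iterable[Tuple[int, int, float]],
                            reference_average: float) -> float:
    """Career average after rescaling each season to a common league level.

    `seasons` holds (at_bats, hits, league_average) for each year.
    Assumption: a season's hits are multiplied by
    reference_average / league_average, so a .300 hitter in a .300
    league and a .250 hitter in a .250 league come out equal.
    """
    total_at_bats = 0
    adjusted_hits = 0.0
    for at_bats, hits, league_average in seasons:
        if league_average <= 0:
            raise ValueError("league average must be positive")
        scale = reference_average / league_average
        adjusted_hits += hits * scale
        total_at_bats += at_bats
    if total_at_bats == 0:
        raise ValueError("no at bats to average")
    return adjusted_hits / total_at_bats
```

For instance, a player who bats .300 in 500 at bats in a .300 league (a "feast") and .300 again in a .250 league (a "famine") has a raw average of .300; against a hypothetical reference level of .270 the adjusted figure comes out near .297, because the feast season is discounted and the famine season is marked up.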
The third correction, "Adjusting for league batting talent," is based on the premise that in the early years of baseball high batting averages were less indicative of skill than they became later. This is due to changes in the game: more emphasis on the long ball, a greater tendency to play for the "big inning," racial integration, etc. (Averages going back to 1875 enter into baseball's official statistics.) Here for the first time the author applies sophisticated statistical methods like moving averages, standard deviations and variances, all of which are carefully explained in sidebars oriented towards the layman. These explanations are, of course, superfluous for the readers of this journal, but would be useful if some of our readers decide to pass their copies of the book on to their children (or, as in my case, grandchildren). I do recommend this book for all ages, by the way, starting with high-schoolers.
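For the younger readers just mentioned, the two basic tools named above can be shown in a few lines. This sketch only illustrates what a moving average and a standard deviation are; it does not reproduce how the author combines them, and the function names are chosen for the example.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Average of each run of `window` consecutive values.

    Smooths a year-by-year series (for example, league batting
    averages) so that long-term trends stand out from yearly noise.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [mean(values[i:i + window])
            for i in range(len(values) - window + 1)]


def spread(values: Sequence[float]) -> float:
    """Population standard deviation: the typical distance of the values
    from their mean. The variance is simply this quantity squared."""
    return pstdev(values)
```

A small standard deviation among the regular hitters in a league means the players are closely bunched, so standing far above the pack is harder than the same gap would be in a league with a wide spread.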
After the league batting talent adjustment, the top hitter is still Ty Cobb, but his average drops from .370 to .346. He is followed in the top 10 by Tony Gwynn (still active), Rod Carew, Stan Musial, Ted Williams, Rogers Hornsby, Joe Jackson, Nap Lajoie and Honus Wagner.
We now come to the final correction, "Adjusting for the ballpark." The rationale here is that some ballparks are easier to hit in than others; since teams play half of their games at home, members of teams playing in the easy ballparks will have an unfair advantage. (Consider the case of the Colorado Rockies, who have been in existence for only eight seasons and have already won four batting championships: Andres Galarraga, Larry Walker (2) and Todd Helton. The Rockies' home ballparks are considered easy to hit in because of the high altitude and, thus, thinner air. Indeed, the Rockies' batting average (and won-lost percentage) is considerably higher at home than on the road. The ballpark effect would have lowered Galarraga's 1993 league-leading batting average of .370 to .334 and given the title to Tony Gwynn.) The author uses, and explains, linear regression models for this analysis.
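To give a flavor of how a regression can isolate a ballpark effect, here is a toy example: a straight line is fitted by least squares to team batting averages, with a 0/1 variable marking road and home. The review does not describe the author's actual models, so the data, the function names and the "subtract half the home effect" adjustment are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def neutral_average(observed: float, home_effect: float) -> float:
    """Remove the park's boost from a player's season average.

    Assumption: the player has equal at bats at home and on the road,
    so only half of the home effect is built into his average.
    """
    return observed - home_effect / 2


# Made-up team batting averages: three seasons at home, three on the road.
is_home = [1, 1, 1, 0, 0, 0]
team_average = [0.310, 0.300, 0.320, 0.260, 0.250, 0.270]
home_effect, road_baseline = fit_line(is_home, team_average)
```

With a single home/road indicator the fitted slope is just the difference between the mean home and mean road averages (50 points in this invented data set), and the intercept is the road average. The value of a regression model is that further explanatory variables can be added alongside the indicator.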
The result of all of this is a list of the top "adjusted" hitters. There are some surprises here. Tony Gwynn moves into first place (recall that he was in 16th place on the raw data list); Ty Cobb drops to second, and he is followed by Rod Carew (up from 28th), Joe Jackson, Rogers Hornsby, Ted Williams, Honus Wagner (originally 29th), Stan Musial, Wade Boggs (from 23rd), Nap Lajoie and Tris Speaker. Lajoie and Speaker originally ranked 18th and seventh respectively. Willie Mays now makes the list (No. 13) as do Hank Aaron (No. 26), Mickey Mantle (No. 24) and Pete Rose (No. 34). Also, Jackie Robinson moves up 31 places, to No. 40.
There is much more in the book than I have been able to discuss, including such topics as team batting averages; batting averages by position; left-handed batters vis-à-vis right-handed batters; on-base percentages; etc. One of the most interesting portions of the book is Chapter 11. In this chapter the author, using his own assessment of the players as well as criteria invented by other authors (which he explains), suggests changes in the Hall of Fame membership.
I thoroughly enjoyed this book in a quasi-professional way. At one time I dropped out of graduate school briefly and became a professional sports writer, but decided against that as a career due to the extremely poor pay ($40 a week in 1952). I have often wondered if I made the right decision. But whatever, I do strongly recommend this book to the sports fans in our readership, and especially those who love statistics.