What Do April's Stat Leaders Tell Us About the Rest of the Season?
We’re getting perilously close to the most relieving time of the baseball season. It’s not when Target Field is guaranteed* to not be snow-covered at game time; that is a magical time as well, but it doesn’t arrive until well into May. No, it’s when baseball nerds can stop uttering their own unique ode to the statistical gods: “granted, it’s a small sample size.”
Granted, baseball fans only really say this for two reasons: either they’re trying to disguise their own overzealous and unsubstantiated idea, or they’re trying to ridicule someone else’s. But the sentiment comes from a good place, right? Lots of crazy things can happen after only a few games. The Cubs might not be terrible. Miguel Cabrera could be leading the AL in stolen bases. (I know it sounds ridiculous, but conceivably there is some universe where that might happen with a non-zero probability.) But as April turns to May, most of those crazy things slowly fall away, forgotten amid the 100-loss seasons and perennial all-stars that replace them.
But then there’s the occasional surprise. An Evan Gattis or two will beat the odds and continue mashing. A team of rag-tag misfits will stare down the Yankees in October. That’s why we love sports. We know exactly what’s going to happen, because the numbers and our guts tell us it’s the only rational outcome, but then precisely the opposite happens.
This year, the treacherous sample size question is hanging directly above Chris Colabello’s head. He was leading the AL in doubles and RBIs in late April, and he’s a 30-year-old (essentially) rookie. His is one of those feel-good stories that come along every few years: a player fights his way to the major leagues, where it’s an accomplishment that he even made it that far, and then usually drifts away again. But Colabello, who played his way through the beer leagues on the East Coast for almost no money for years until the Twins picked him up for their minor league system, has started 2014 in a big way. Do we think he can last and be an impact player for the rest of 2014?
For that matter, how much can we count on league leaders at the end of April to be at the top of the list at the end of the season? Historically, are April’s leaders finished once the summer heat sets in? To answer that question, I checked the average end-of-season ranking of players who led the league in a statistic at the end of April. If they don’t end up near the top at the end of the year, then we can attribute it to the small sample size. If the April leaders keep coming up at the end of the year, we can think of it as getting a good head start, or that those players are simply very good.
For clarity’s sake (the economics grad student in me knows that methods are as important as results), I’ve got a few notes on where these numbers came from. I looked at the league leaders for the month of April each year from 2000-2013. If the April leader didn’t have enough plate appearances to qualify for the batting title (3.1 per game, or 502 total), I removed them from the sample.
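For anyone who wants to try the same calculation, here is a minimal sketch of the method described above, not the exact script behind these numbers. The data layout, the function names, and the toy figures are all made up for illustration. It also assumes the season rank is taken among every player in the data and that a tie for the April lead goes to whoever is listed first; the real analysis may have handled both differently.

```python
from statistics import mean

# 3.1 plate appearances per team game over a 162-game season, rounded.
QUALIFYING_PA = 502


def season_rank(player, season_totals):
    """Rank of `player` by season total (1 = league leader).

    Uses competition ranking: one plus the number of players strictly ahead.
    """
    target = season_totals[player]
    return 1 + sum(1 for total in season_totals.values() if total > target)


def average_final_rank(years):
    """Average end-of-season rank of each year's April leader."""
    ranks = []
    for year in years:
        # April leader for this stat; ties go to the first player listed.
        leader = max(year["april"], key=year["april"].get)
        if year["plate_appearances"][leader] < QUALIFYING_PA:
            continue  # leader did not qualify for the batting title, so drop him
        ranks.append(season_rank(leader, year["season"]))
    return mean(ranks)


# Hypothetical three-player "league" over three seasons (invented numbers).
toy_years = [
    {"april": {"A": 9, "B": 7, "C": 5},
     "season": {"A": 30, "B": 41, "C": 35},
     "plate_appearances": {"A": 610, "B": 650, "C": 580}},
    {"april": {"A": 4, "B": 6, "C": 10},
     "season": {"A": 38, "B": 25, "C": 44},
     "plate_appearances": {"A": 600, "B": 520, "C": 640}},
    {"april": {"A": 8, "B": 3, "C": 2},
     "season": {"A": 12, "B": 20, "C": 28},
     "plate_appearances": {"A": 310, "B": 560, "C": 590}},
]

print(average_final_rank(toy_years))
```

In the toy data, the first April leader finishes third, the second finishes first, and the third is dropped for falling short of 502 plate appearances, so the average final rank is 2.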
The list of names is quite interesting. There are some you’d expect to see: Albert Pujols, Alex Rodriguez, Barry Bonds, Miguel Cabrera. There are some I’d never heard of, like Brad Fullmer and Mike Bordick. There are also the players that got off to wickedly fast starts, only to be injured. Had Matt Kemp not been taken down by hamstring and rotator cuff injuries, he could have given Miguel Cabrera some competition for the Triple Crown. He hit 12 home runs, knocked in 25 runs, and had a .417 batting average. It was a stratospheric month.
April’s home run leader ends up, on average, 11th in MLB at the end of the season, which makes home runs the most predictive of these stats when it comes to season-long performance. RBI is the next most predictive.
On the other hand, the early leader in batting average only ends the season, on average, in 19th place. wOBA (weighted on-base average, a catch-all statistic that measures different types of hits and weights them based on their likelihood of becoming runs) also ends up on the less predictive side.
Intuitively, these results make sense. Once a hitter has had a terrific month, hitting 10 or 15 home runs, those home runs don’t go anywhere. It’s not like Paul Konerko, April 2010’s leader, could lose April’s home runs in May or June. A short hot streak can propel a player towards the top, where even if they hit no more home runs, they only slowly slide back down the rankings.
Batting average and wOBA are much less of a guarantee. They must be sustained throughout the season, or else they will not last. After a hot April, an 0-for-20 slump can drop a batter from the top of the rankings very quickly. Practically speaking, opposing teams also begin to adjust to avoid these hitters’ strengths.
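To put rough numbers on that difference, here is a quick back-of-the-envelope sketch. The stat line is hypothetical, not any real player's April: 36 hits in 90 at-bats with 10 home runs, followed by an 0-for-20 slump.

```python
def batting_average(hits, at_bats):
    """Batting average is a rate: hits divided by at-bats."""
    return hits / at_bats


# Hypothetical hot April (invented numbers).
hits, at_bats, home_runs = 36, 90, 10
before = batting_average(hits, at_bats)

# An 0-for-20 slump adds 20 at-bats but no hits and no home runs.
at_bats += 20
after = batting_average(hits, at_bats)

print(f"Batting average: {before:.3f} -> {after:.3f}")
print(f"Home runs: {home_runs} -> {home_runs}")
```

The average falls from .400 to .327 in the space of about a week, while the home run total stays at 10 and only loses ground as other hitters catch up.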
Even 20th in the league in batting average still delivers over a .300 average. That’s nothing to sneeze at, but it’s just not Joe Mauer- or Miguel Cabrera-good. Inevitably, leading in the ‘averages’ statistics is more difficult than leading in ‘counting’ statistics. This in itself doesn’t mean that you should put more weight on some statistics than others (although the case for not paying much attention to RBIs is pretty good), but it just changes how the race for league leader works throughout the year.
For everybody’s favorite future-movie-topic Chris Colabello, this has got to be encouraging. He’s in the company of Chris Davis (2013), Miguel Cabrera (2010), and Albert Pujols (2009) as an RBI leader. If Colabello ends the season anywhere near where those guys did, with 100+ RBI, I think both he and the Twins will be more than happy.