Friday, January 10, 2014

Hall of Fame Post Mortem

If you're anything like me, after Wednesday's Hall of Fame election, you asked yourself one question:

What just happened?!

Or maybe a better question would be, what didn't just happen?
  • For the first time since 1999, three players were elected to the Hall of Fame.
  • A fourth, Craig Biggio, missed out on induction by just two votes.
  • Jack Morris's time on the ballot expired, ending one of the great debates in baseball Twitter history.
  • Sixteen people didn't vote for Greg Maddux.
  • Armando Benítez and Jacque Jones got one vote each.
  • A man with 3,000 hits and 500 home runs didn't even get enough votes to survive on the ballot next year.
It was an unpredictable day, although that didn't stop us from trying. Twitter user @RRepoz aggregated all the public ballots prior to the announcement, and I took a stab at projecting final vote totals based on these "exit polls." Both proved relatively accurate on Election Day.

The exit polls deviated from the actual results by an average of 3.3 percentage points; the median error was 2.9 points. They had some big misses: Tim Raines, whom they overestimated by 8.2 points; Barry Bonds, whom they overestimated by 7.6; and Curt Schilling, whom they overestimated by 7.3. But the exit polls, for the first time ever, also nailed two players' vote totals exactly: Morris and Sammy Sosa.
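
For anyone who wants to follow the arithmetic, here is a minimal Python sketch of how an average and median polling error can be computed. It assumes the errors are averaged as absolute values; the player names, percentages, and the function name `poll_errors` are made up for illustration and are not the real 2014 figures.

```python
from statistics import mean, median

def poll_errors(polled, actual):
    """Return (mean, median) absolute error in percentage points."""
    abs_errors = [abs(polled[name] - actual[name]) for name in polled]
    return mean(abs_errors), median(abs_errors)

# Hypothetical percentages, NOT the real 2014 numbers.
polled = {"Player A": 60.0, "Player B": 42.0, "Player C": 15.0}
actual = {"Player A": 55.0, "Player B": 43.0, "Player C": 15.0}

avg_err, med_err = poll_errors(polled, actual)
print(f"average error: {avg_err:.1f}, median error: {med_err:.1f}")
```

Reporting both numbers is useful because a few big misses can pull the average up while leaving the median alone.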

Otherwise, many of the deviations from the exit polls occurred in predictable directions, if not predictable magnitudes. That was the rationale behind my Hall of Fame vote-projection model, which sought to apply an adjustment to the exit polls based on their margins of error in previous years. (For example, Lee Smith is always underestimated by exit polls, while Raines is always overestimated; this reflects which kinds of voters tend not to release their ballots.)
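
The basic idea can be sketched in a few lines of Python: take a player's past polling errors (actual percentage minus exit-poll percentage), average them, and add that adjustment to this year's exit-poll number. The function names and the numbers below are hypothetical, chosen only to show the mechanics.

```python
from statistics import mean

def adjustment_factor(past_errors):
    """Simple average of past (actual - polled) errors, in points."""
    return mean(past_errors)

def project(exit_poll_pct, past_errors):
    """Exit-poll percentage nudged by the player's historical error."""
    return exit_poll_pct + adjustment_factor(past_errors)

# Hypothetical player who has beaten his polls by 4, 6, and 5 points.
past = [4.0, 6.0, 5.0]
projection = project(30.0, past)
print(projection)  # 35.0
```

A player who is habitually overestimated by the polls would simply have negative past errors, which pulls his projection down instead.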

My model was better at predicting the final vote than the raw exit polls, although not by as much as it could have been. The average error of my projections was 3.0 points; the median deviation was 2.6 points. Due to the nature of the analysis, my biggest misses were on first-time candidates who had no polling-error history to go off. Using a pseudoscientific process that looked at the performances of similar players, I created adjustment factors for these candidates as well, though in retrospect this probably did more harm than good. I overestimated Mike Mussina by 8.4 points, overestimated Tom Glavine by 6.1 points, and underestimated Jeff Kent by 6.4 points. It was a valiant effort, but I think I'll be scrapping this method of predicting first-time candidates in 2015.

Remove the five first-timers, though, and my basic prediction method actually validated quite well. Among the 17 candidates with vote histories to go off, my average deviation was a much better 2.3 points, and my median error was all the way down to 1.6 points. The only two who gave me trouble were Schilling and Morris, and no calculation method could have nailed the very unusual performances of these two on Election Day. Morris, as mentioned above, hit his exit-poll percentage exactly, which is very uncharacteristic for him; he has always outperformed his polls, and by bigger and bigger margins every year. Schilling, on the other hand, had the opposite problem; he had only one year of data to base a prediction on, so there simply wasn't enough information to know how nonpublic voters truly felt about him.

Nevertheless, there are always ways to improve, and we should never stop looking for them. This post mortem would be pointless if it didn't answer the question, "How can I make this model better next year?"

One excellent idea, suggested to me by multiple people, is to calculate the adjustment factors differently: specifically, to weight recent data more heavily rather than relying on a simple average. This makes a lot of sense, and I was enthusiastic about the idea until I took a closer look after the election. Comparing my actual model with an alternative one in which the 2013 error counted for more revealed that recent error was no more predictive than error over time! Of the 11 candidates on the 2014 ballot who had more than one year of past voting data, six actually moved further away from their ultimate vote total under the alternative, weighted model. In other words, although clearly neither is perfect, the long-term average of a player's deviation from the polls is more predictive than what they did in the year prior.
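
To illustrate the difference between the two approaches, here is a rough Python sketch comparing a simple average with a recency-weighted one. The weights (double weight on the most recent year) and the error values are assumptions for the example; they are not the weights used in the comparison described above.

```python
def simple_adjustment(past_errors):
    """Every year counts equally."""
    return sum(past_errors) / len(past_errors)

def weighted_adjustment(past_errors, weights):
    """Weighted average; a larger weight means that year counts for more."""
    total_weight = sum(weights)
    return sum(e * w for e, w in zip(past_errors, weights)) / total_weight

# Hypothetical errors, oldest to newest, in percentage points.
errors = [2.0, 4.0, 9.0]

simple = simple_adjustment(errors)                # 5.0
weighted = weighted_adjustment(errors, [1, 1, 2])  # 6.0
print(simple, weighted)
```

The weighted version chases the most recent year, which helps when a trend is real and hurts when last year was a fluke.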

However, this just shows that new data aren't any better than an entire data set; it says nothing about the oldest data or when they start to go bad. Next year I'll have the choice of whether to calculate adjustment factors from two-year polling error, three-year polling error, four-year polling error, or even five-year polling error. Using numbers from 2010 to predict something about voting in 2015 definitely seems like a bit of a stretch; could using four-year-old numbers, as I did this year, also mean relying on information that is past its expiration date?

I recalculated my projections using adjustment factors averaged from just the past three years of voting history and made a startling discovery: the average error dropped to 2.9 points, and the median error plummeted to 2.1. Using three-year polling error this year would have produced more accurate results, suggesting that four-year-old data have indeed outlived their usefulness. (Curious to see if I could do even better, I also tested a version that averaged just the previous two years; its average error matched the four-year version's at 3.0 points, while its median error was lower at 2.2 points. That's still worse than the three-year calculation, though, indicating that three years is the sweet spot.) Therefore, I will use three-year calculations for my 2015 projections, accounting for the exit-poll error in 2012, 2013, and 2014 only.
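
This kind of comparison amounts to a small backtest: rebuild every projection using only the most recent N years of error, then score the results against the actual vote. A minimal Python sketch follows; the players, years, and percentages are invented, and `windowed_adjustment` and `backtest` are names I made up for the example.

```python
from statistics import mean, median

def windowed_adjustment(errors_by_year, n_years):
    """Average the n most recent yearly errors (fewer if unavailable)."""
    recent_years = sorted(errors_by_year)[-n_years:]
    return mean(errors_by_year[year] for year in recent_years)

def backtest(history, polls, actual, n_years):
    """Return (mean, median) absolute error for a given window size."""
    abs_errors = []
    for name, errors_by_year in history.items():
        projected = polls[name] + windowed_adjustment(errors_by_year, n_years)
        abs_errors.append(abs(projected - actual[name]))
    return mean(abs_errors), median(abs_errors)

# Hypothetical data: past (actual - polled) errors keyed by year.
history = {
    "Player A": {2010: 8.0, 2011: 2.0, 2012: 3.0, 2013: 1.0},
    "Player B": {2012: -4.0, 2013: -2.0},
}
polls = {"Player A": 50.0, "Player B": 40.0}
actual = {"Player A": 52.0, "Player B": 37.0}

for window in (4, 3, 2):
    print(window, backtest(history, polls, actual, window))
```

In this toy data set, Player A's stale 2010 error is what throws off the four-year window, which is the same effect described above.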

Another tweak I will make is a subtle yet important change in how I calculate the polling error itself. Currently, I subtract a player's percentage on public ballots (the polls) from his percentage on all ballots (the actual vote). However, it would be more precise to subtract his percentage on public ballots from his percentage on private ballots (i.e., all ballots minus public ballots). This is because "turnout" for the exit polls (i.e., the number of ballots publicly revealed) has increased dramatically in recent years. In 2012, we knew of only 114 public ballots out of 573 total (19.9%); in 2014, a whopping 208 people out of 571 (36.4%) revealed their ballots beforehand. This necessarily creates some error because, for instance, some of the +7.9-point error for Jack Morris in 2012 was eaten into by the 94 additional ballots that were public in 2014 but private in 2012. Put another way, if someday 98% of voters make their ballots public before Election Day, my adjustment factors will be pretty useless; we'll already know that there can be only a minuscule amount of error in those polls. The very error represented by my adjustment factors by definition gets smaller with a larger sample size in @RRepoz's exit poll.
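
A quick Python sketch shows how much the two definitions of error can differ. The ballot and vote counts here are round, hypothetical numbers picked to keep the arithmetic clean, and the function names are placeholders of my own.

```python
def pct(votes, ballots):
    """Share of ballots naming the player, as a percentage."""
    return 100.0 * votes / ballots

def error_vs_all(total_votes, total_ballots, public_votes, public_ballots):
    """Current method: all-ballot percentage minus public percentage."""
    return pct(total_votes, total_ballots) - pct(public_votes, public_ballots)

def error_vs_private(total_votes, total_ballots, public_votes, public_ballots):
    """Proposed method: private-ballot percentage minus public percentage."""
    private_votes = total_votes - public_votes
    private_ballots = total_ballots - public_ballots
    return pct(private_votes, private_ballots) - pct(public_votes, public_ballots)

# Hypothetical election: 500 ballots, 200 of them public.
# The player is named on 100 public ballots and 280 ballots overall.
old_error = error_vs_all(280, 500, 100, 200)      # 56.0 - 50.0 = 6.0
new_error = error_vs_private(280, 500, 100, 200)  # 60.0 - 50.0 = 10.0
print(old_error, new_error)
```

The private-ballot error is larger because the all-ballot percentage is diluted by the very public ballots it is being compared against, and that dilution grows as more voters reveal their ballots.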

Finally, one common denominator in both my projections' errors and the polls' errors was that we both guessed too high. Most players ended up getting fewer votes than we projected they would—whether because of the controversial Rule of 10, or old-school voters protesting the Steroid Era, or for some other reason. This suggests automatically building in a one- to two-point negative adjustment factor in addition to the player-specific one. However, a closer examination reveals that it is disproportionately the first-time candidates dragging the ballot in this direction; they over-performed in exit polls by an average of 3.7 points. Rather than try to devise my own method of projecting these fickle ballot rookies—an endeavor that has failed two years in a row now—perhaps I should simply dock them each a few points next year and call it a day.
