Tuesday, January 7, 2014

For Hall of Fame Candidates, It Helps to Be a Pitcher, Clean, and White

In my last post predicting the outcome of tomorrow's Hall of Fame vote, I applied an "adjustment factor" to existing exit polls of the Hall election to arrive at projected vote totals for each player. However, there was one critical shortcoming of my method: because it was based on the historical error of past such "exit polls," there was no way to predict how first-time candidates would perform. In 2014, Greg Maddux, Tom Glavine, Frank Thomas, Mike Mussina, and Jeff Kent are on the ballot for the first time, and we have no clue how they will fare beyond their current polling numbers. Will they, like so many players before them, over- or under-perform those polls by a statistically significant degree?

Last year, I attempted a linkage analysis to answer this question, but it was inconclusive. This year, we try something simpler; while we can't look at actual historical results for these players, we can look at historical results for players who are similar to them. We have trustworthy exit-poll data for each of the past four Hall of Fame elections; here is how each player's final vote total differed from his exit poll, sorted by his year on the ballot.

In the far-right column, you can see the average deviations for individual players that we've already used as adjustment factors for the 2014 vote. In the bottom-most row are average deviations not by player, but for specific years on the ballot. A clear pattern jumps out: candidates under-perform their exit polls in the early stages of their candidacy, but, after their sixth year on the ballot, the switch is flipped—they begin to over-perform their expected totals.
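The averaging described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout and the function name `average_deviation` are hypothetical, a deviation is taken to mean the final vote percentage minus the exit-poll percentage, and the only data points included are the two Mark McGwire deviations quoted later in this post, not the full table.

```python
from collections import defaultdict
from statistics import mean


def average_deviation(records, key):
    """Average the (final - exit poll) deviations, grouped by the given field."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec["deviation"])
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical record layout. Only the two McGwire deviations quoted in this
# post are included, so the output is a partial illustration, not the table.
records = [
    {"player": "Mark McGwire", "year_on_ballot": 4, "deviation": -13.2},
    {"player": "Mark McGwire", "year_on_ballot": 5, "deviation": -0.7},
]

# Bottom row of the table: one average per year on the ballot.
by_year = average_deviation(records, "year_on_ballot")
for year, avg in sorted(by_year.items()):
    print(f"Year {year} on ballot: {avg:+.1f}")
```

Calling the same function with `key="player"` would give the far-right column instead, provided every season for each player is in `records`.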

This makes logical sense. The voters who aren't covered by the exit polls tend to be more conservative; they're writers who are no longer covering baseball or don't believe in the transparency of releasing their ballot publicly. Retired voters especially are likely to prefer players they witnessed and covered—too curmudgeonly, or not familiar enough with more recent players, to recognize their greatness. There is also probably a "distance makes the heart grow fonder" aspect to it; it's easier to misremember players who played 15 or 20 years ago as better than they were.

So already we have a useful tidbit of information. Players who are appearing on the Hall ballot for the first time tend to do a little bit worse than polling suggests—an average of 2.0 points worse. That's an adjustment we can apply to all first-time candidates.

But, other than all being first-timers, Maddux, Glavine, Thomas, Mussina, and Kent have little else in common. To learn more, we must separate players into categories. Let's start with the simplest: hitters and pitchers.

The average of all the exit-poll deviations for each category is in that category's lower-right corner, in the highlighted cell. Pitchers, it turns out, are looked upon more favorably than hitters by voters not accounted for in the exit polls. But we can get even more specific, by breaking the players down by position:

Relief pitchers, with their gaudy totals of the deeply flawed save statistic, are the most beloved by the "old-school" nonpublic voters; they get an average boost of 6.1 percentage points above and beyond their polling numbers. Starters are no slouches either, though; they earn a 2.2-point boost. Most offensive positions do not have significant changes, and some even come with sharp penalties—notably the middle infield (though it is worth noting that the sample size we're calculating from is very small). These numbers are, at worst, an indicator of which direction (up or down) we can expect a first-time candidate to move; at best, they're position-specific adjustment factors.

What other categories can we separate players into? Well, since this is Hall of Fame voting, we'd be remiss not to compare PED users and PED non-users. Since voters also penalize some players for suspected PED use—despite little to no evidence for it—I'll also put them into a category of their own. Apologies in advance for these categories; they're necessarily subjective, because most of the Steroid Era is based on hearsay.

Unsurprisingly, nonpublic voters are not kind to PED users. Exit polls typically overestimate them by 2.5 points. Suspected users Mike Piazza and Jeff Bagwell are also docked some points, though not as many; there is probably less universal condemnation of them even among conservative voters. "Clean" players (and I put this in scare quotes because there really is no way to know whether they were truly PED-free) are actually slightly underestimated by exit polls.

(There could be some contamination in these data, though. "Clean" players are also exclusively the players who have been on the ballot for eight years or more. Old-school voters might be voting for them at higher rates because of the age bias discussed earlier; alternatively, maybe the age bias is due to retired voters' aversion to steroids. That said, Mark McGwire's evolving margin of error suggests that time may heal all wounds. From a –13.2% adjustment factor in his fourth year on the ballot, McGwire cut that to –0.7% in his fifth year, and in his sixth and seventh years he has actually benefited from nonpublic ballots. This may be a ray of hope for steroids-tainted candidates; voters' nostalgia may win out after they get tired of feeling vindictive. That will certainly be something to watch for in Clemens's and Bonds's numbers this year.)

There's one other way we can categorize players that's been nagging at me: by race. Despite its best efforts since integration, baseball has had trouble becoming a truly colorblind sport, still questionably throwing around terms like "horse" and "scrappy." I wondered if old-school voters' dislike for Tim Raines or Barry Larkin had anything more to it than the types of players they were.

Without assigning blame or making accusations, the numbers do suggest that nonpublic ballots are skewed toward white players. They gain an average of 1.4 points on top of their exit polls, while African Americans lose 1.1 points and Hispanics lose 1.0. There are certainly prominent exceptions within each race (e.g., Lee Smith), and thus I think factors like position are more predictive. However, it is not inconceivable that a strand of racism remains among the BBWAA's oldest and crustiest members.

So, finally, let's apply these findings to our first-time candidates. Through some quick-and-dirty arithmetic, we can arrive at a back-of-the-napkin adjustment factor for each of them. These will be incorporated into my final Hall of Fame election forecast on Wednesday, but the crude nature of this analysis will mean there's a greater margin of error for these adjustment factors than for the ones based on historical fact. Without further ado:

Greg Maddux
–2.0% (first-time candidate) + 2.2% (starting pitcher) + 0.7% (clean) + 1.4% (white) = adjustment factor of +2.3% [note: because Maddux is so close to 100%, we cannot add the full 2.3 points; his new vote projection will simply bring his total as high as it can be given that we know there is one ballot against him]

Tom Glavine
–2.0% (first-time candidate) + 2.2% (starting pitcher) + 0.7% (clean) + 1.4% (white) = adjustment factor of +2.3%

Frank Thomas
–2.0% (first-time candidate) + 0.0% (designated hitter) + 0.7% (clean) – 1.1% (black) = adjustment factor of –2.4%

Mike Mussina
–2.0% (first-time candidate) + 2.2% (starting pitcher) + 0.7% (clean) + 1.4% (white) = adjustment factor of +2.3%

Jeff Kent
–2.0% (first-time candidate) – 7.2% (second baseman) + 0.7% (clean) + 1.4% (white) = adjustment factor of –7.1%
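
The arithmetic above can be reproduced with a short Python sketch. The component values are the ones quoted in this post; the dictionary layout and the names `adjustment_factor` and `project` are hypothetical, chosen only for illustration. The `ceiling` argument is likewise an assumption: it stands in for the highest total a player like Maddux can reach given the known ballot against him, and it has to be supplied by the caller because the ballot count is not given here.

```python
# Component adjustments in percentage points, as quoted in this post.
FIRST_TIME = -2.0
POSITION = {"starting pitcher": 2.2, "designated hitter": 0.0, "second baseman": -7.2}
PED_STATUS = {"clean": 0.7}
RACE = {"white": 1.4, "black": -1.1}

# (position, PED category, race) for each first-time candidate.
CANDIDATES = {
    "Greg Maddux": ("starting pitcher", "clean", "white"),
    "Tom Glavine": ("starting pitcher", "clean", "white"),
    "Frank Thomas": ("designated hitter", "clean", "black"),
    "Mike Mussina": ("starting pitcher", "clean", "white"),
    "Jeff Kent": ("second baseman", "clean", "white"),
}


def adjustment_factor(position, ped_status, race):
    """Sum the four components, rounded to one decimal place."""
    total = FIRST_TIME + POSITION[position] + PED_STATUS[ped_status] + RACE[race]
    return round(total, 1)


def project(poll_pct, adjustment, ceiling=100.0):
    """Apply an adjustment to a polled percentage, clamped to [0, ceiling].

    `ceiling` is a hypothetical parameter: the caller must work out the
    highest attainable percentage (e.g. when a ballot against is known).
    """
    return max(0.0, min(ceiling, poll_pct + adjustment))


factors = {name: adjustment_factor(*traits) for name, traits in CANDIDATES.items()}
for name, factor in factors.items():
    print(f"{name}: {factor:+.1f}")
```

Summing the components this way treats them as independent effects, which is the same quick-and-dirty assumption the hand calculation makes.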

Throwing these in with the adjustment factors that we already calculated, we have a new and up-to-date vote projection for 9pm ET on January 7:
