Monday, July 28, 2014

One Hall of a Bad Idea

This weekend, the Baseball Hall of Fame announced some election-law changes. Effective immediately, players will only have 10 years on the ballot to make their case before dropping out of consideration—not 15 years as it has been for most of the Hall's history.

Let's get the editorializing out of the way: I think this was the coward's way out for the Hall of Fame. The calls for the Hall to reform its voting process have gotten ever louder in recent years, and it didn't seem like they were listening. Now they've finally made a change, but it was a passive-aggressive one—the exact opposite of confronting the very real issues facing the election process. It was a change that allowed them to dodge the issues even more.

By cutting off one-third of the eligibility period, the Hall of Fame is essentially targeting individual players to make it harder for them to get elected. Most of these players are those tainted with PED use. Roger Clemens and Barry Bonds, who will be on the ballot this winter for the third time each, now have much less time to earn forgiveness from voters. Although I would have disagreed, the Hall could have come out and simply issued an election advisory—one that several voters have publicly said they wished for—that said steroid users were cheaters and should be ineligible for the Hall of Fame. Instead they invented a new rule that does the same thing, except it lets them pretend their hands are still clean, and it creates a permanent fix for a temporary problem.

I want to harp on that last point. In the Hall's obsession with appearing above the steroid fray, they've created a mechanism that will affect generations of players to come, as well as take out collateral victims in the present. Tim Raines, who will be on the ballot for the eighth time, has never been associated with steroids and will now probably fall off the ballot without being elected. They could have avoided this by making the change effective starting for candidates who are new to the ballot this year. But this wouldn't have spared them the embarrassment of the Bonds/Clemens/McGwire/Sosa debates for five more years. They valued their supposed decorum enough that they were willing to sacrifice a ballot full of players.

More than that: the Hall of Fame's actions implicitly said to some of its members, "We don't think you belong." Bert Blyleven, Jim Rice, Bruce Sutter, Duke Snider, Bob Lemon, Ralph Kiner, Dazzy Vance, Gabby Hartnett, Rabbit Maranville, Bill Terry, and Harry Heilmann were all elected in between their 10th and 15th years of eligibility. Under the new rules, they would have dropped off the ballot instead. One of the reasons the Hall made the change was to de-clog the coming logjam of candidates on the ballot. They could have instead decreed that a voter can vote for as many candidates as he or she wants, not just a maximum of 10, as the current rules state. Instead they decided to go more exclusive and solve the logjam by kicking more people off, not letting more in. This is not only an admission that Blyleven, Rice, Sutter, and their kin were mistakes, but also a denial of the size of Hall they already have. (This despite the fact that post-1990 baseball is dramatically underrepresented in the Hall of Fame already.) Like it or not, players elected from now on must meet a different, higher standard, making them somehow "purer" choices for the Hall than those guys. The Hall of Fame has succeeded in creating "tiers" of Hall of Famers.

There is one way—and one way only—that these election changes could actually be a good thing. There is one way that I could owe the Hall of Fame a massive apology.

That would be if the Hall of Fame is outsmarting all of us right now. If they've done their research and dug into the data and know for a fact the ripple effect that this change will have.

This change could redeem itself if it forces Hall voters (the BBWAA, in case you forgot) to change their behavior. As a close watcher of elections of any form, I've spent some time breaking down the trends of past Hall of Fame votes, and I'd say I know them pretty well. What I don't know is how those trends will hold up under different election laws. The new rules put us in new territory for predicting results. Will voters panic, knowing there is suddenly less time to elect players, and suddenly become more generous in allocating their votes? If so, will this be a one-time spike in this year's election, or will voters become permanently more open-minded?

A basic fact of Hall of Fame voting trends is that it often takes the full 15 years for players to build up to the 75% needed for election. Many eventual inductees started on the ballot at levels of support around 20% and added five or ten points each year to gradually climb. Take Blyleven, who attracted just 17.5% of the vote his first year on the ballot in 1998. He was finally elected in his 14th year of eligibility with 79.7% of the vote.

If the new rules had been in place while Blyleven was on the ballot, he'd have fallen off after 2007, when he got 47.7% of the vote.
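For anyone who wants to check the arithmetic, here is a rough sketch in Python using only the Blyleven figures quoted above. It assumes one ballot per calendar year and treats his climb as a straight line, which is a simplification of mine, not how real vote totals behave; the function names are made up for illustration.

```python
# Figures quoted in the post for Bert Blyleven.
FIRST_BALLOT_YEAR = 1998
FIRST_YEAR_SHARE = 17.5   # percent of the vote on his first ballot
ELECTED_SHARE = 79.7      # percent of the vote when he was elected
ELECTED_ON_BALLOT = 14    # elected in his 14th year of eligibility
THRESHOLD = 75.0          # percent needed for election


def ballot_year(first_year, nth_ballot):
    """Calendar year of the nth ballot, assuming one ballot per year."""
    return first_year + nth_ballot - 1


def yearly_gain(start_share, end_share, ballots):
    """Average points gained per year across `ballots` ballots (straight-line assumption)."""
    return (end_share - start_share) / (ballots - 1)


last_year_new_rule = ballot_year(FIRST_BALLOT_YEAR, 10)
actual_pace = yearly_gain(FIRST_YEAR_SHARE, ELECTED_SHARE, ELECTED_ON_BALLOT)
needed_pace = yearly_gain(FIRST_YEAR_SHARE, THRESHOLD, 10)

print(f"10th and final ballot under the new rule: {last_year_new_rule}")
print(f"Actual average gain: {actual_pace:.1f} points per year")
print(f"Gain needed to hit {THRESHOLD}% by the 10th ballot: {needed_pace:.1f} points per year")
```

His actual average works out to a little under five points a year, while a 10-year window would have demanded well over six. At his real pace, he would still have been short of 75% when his 10th ballot came around.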

Or would he? Maybe his timetable would have been accelerated as voters felt more rushed to consider his record. Could the Hall of Fame's new election rules cause steeper climbs than we're accustomed to seeing from candidates? Something like this?

I am honestly not sure. On one hand, I can't not believe that Tim Raines being two years away instead of seven from falling off the ballot will jolt more writers into taking his Hall case seriously. At the very least, more voters will sit down this winter and do some research into Raines's career where before they might have passed over his name, based perhaps on outdated memories or long-obsolete first impressions. And the accelerated timetable will surely force some voters to see that, now more than ever, a non-vote or a blank ballot is as affirmative an action as a "yes" vote. If the rule change makes writers comfortable with a 10-man ballot being the norm, rather than the conservative six votes per ballot that is the average today, then it will have been worth it.

On the other hand, many of the candidates on the ballot today are so polarizing that no degree of extra consideration will convince 75% of writers to vote for them. There is also little evidence that players currently get an abnormally large bump in support in their 15th year on the ballot, so why should they get one on their 10th now? Unfortunately, it doesn't look like writers consciously withhold their vote, but rather that the full 15 years are sometimes needed for a critical mass to "evolve" their thinking.

Maybe the Hall of Fame knows something we don't, and voters will change their behavior.

But given both the Hall's and the BBWAA's maddening track records, I'm not holding my breath.

Thursday, July 17, 2014

How Many Fans Does Each MLB Team Have?

This week, Harris Interactive came out with its annual baseball poll. Among many interesting findings (thoughts on instant replay, the racial and gender breakdown of MLB fans, etc.), the poll ranks franchises in order of their popularity with the public. (Some team from New York is number one.) What it doesn't do, much to my disappointment, is give us hard numbers on how popular each team is. Even with a sample size of 2,241 adults, it's impossible to get a large enough data set to be meaningful when there are 30 possible answers. Still, it would be nice to know: how far ahead of the number-two Red Sox are the number-one Yankees? Are we talking 25% support nationwide for the third-place Giants, or closer to 6%? Are there actual real live Miami Marlins fans out there? These are the burning questions.

These ponderings become relevant every year around this time when America votes on whom to send to the All-Star Game. Especially for the Final Vote, a strong regional base of support is as important to winning an All-Star berth as it is to winning a GOP primary. MLB's county-by-county results maps confirm that geography is destiny; the candidates with ties to the nation's most populous areas always seem to win. Is it really any wonder that this year's Final Vote winners both hail from Chicagoland, or that last year's had the Deep South and an entire country to themselves?

This got me wondering: what are the most powerful voting blocs in baseball? Which teams own the most turf, or hold sway over the most people? We know this generally, of course, thanks to measures like the Harris poll. But to my knowledge, no one has ever undertaken the ambitious project of quantifying how many millions of fans each team has. What team has the largest fan base, and how large is it?

Ultimately, the question is unanswerable—at least until the U.S. Census starts asking about baseball fandom. But we can approximate using two data sets that I know are out there: Facebook data and polling data.

Facebook data simply means people who have "liked" a given Major League team on Facebook. Facebook provided this data to the New York Times, where the Upshot created an amazing tool showing fandom by geography. Their map includes the top three teams, by percentage of MLB team "likes," by county and even down to zipcode. It's a beautiful and rich data set, but it's not perfect for our purposes.
  • The public-facing data posted on the Times website, at least, only provides the top three teams for each geography, leaving potentially hundreds of thousands of fans uncounted.
  • The map doesn't tell us how many baseball fans live in each county or zipcode—just the percentage of total baseball fans there that swear allegiance to X team and Y team.
  • An easy solution would be to multiply these percentages by each county or zipcode's population. That would assume everyone in the country is a baseball fan, however. As much as this should be true, it sadly isn't.
  • We could always scale the population figures down to 37% (a.k.a. the percentage of adults who told Harris that they are baseball fans). However, not all counties are created equal. Suffolk County in Massachusetts is about the same size as Oklahoma County in Oklahoma, but there are almost certainly more baseball fans in Suffolk. In short, deriving absolute numbers from the Upshot map requires a healthy dose of speculation.
  • In addition, Facebook data can be unreliable. Not everyone is on Facebook—it might skew to a younger demographic. People also don't always "like" the things they like, and many people will "like" a zillion things that they don't even like all that much.
  • Finally, and most importantly for our purposes, it would just be too damn hard to apply the data to this project. There are 3,141 counties in the United States, and it would take forever to manually multiply each county's Facebook data from the Upshot map by its population. I've contacted the Times to see if they have the data in exportable form but have yet to hear back.
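
To make the population-scaling idea from the bullets above concrete, here is a minimal sketch in Python of what the calculation would look like for a single county. The function name, the county population, and the team shares are all made up for illustration; they are not real Upshot or Census figures. It also applies the 37% Harris figure uniformly, which is exactly the shaky assumption flagged above.

```python
ADULT_FAN_RATE = 0.37  # share of adults who told Harris they are baseball fans


def estimate_team_fans(population, team_shares, fan_rate=ADULT_FAN_RATE):
    """Estimate fans per team for one county or zipcode.

    population  -- total residents (hypothetical input)
    team_shares -- {team: share of local MLB "likes"}, e.g. 0.80 for 80%
    fan_rate    -- assumed share of residents who are baseball fans at all
    """
    baseball_fans = population * fan_rate
    return {team: round(baseball_fans * share) for team, share in team_shares.items()}


# Made-up example inputs: a county of 700,000 with three listed teams.
example_population = 700_000
example_shares = {"Team A": 0.80, "Team B": 0.10, "Team C": 0.02}

estimates = estimate_team_fans(example_population, example_shares)
for team, fans in estimates.items():
    print(f"{team}: {fans:,}")
```

Because only the top three teams are listed, the shares need not add up to 100%, and whatever is left over is simply uncounted. Repeating this for every county is the part that would take forever by hand.
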
That leaves us with polling data. While Harris was unable to provide result breakdowns by team nationally, there is a pollster that publishes them state by state: the good sports-obsessed folks at Public Policy Polling (PPP). Over the past few years, PPP has asked voters in 32 states about their baseball allegiances. I collected the data in this Google spreadsheet and multiplied PPP's percentage findings by the greater population that the poll's sample represents. In most cases, this was registered voters, except for Mississippi, Virginia, and West Virginia, where the polls surveyed likely voters. This yielded good estimates of the fan breakdown among the 131,204,273 Americans who are described by the 32 polls we have.

Team Fans
Chicago Cubs 9,960,809
Boston Red Sox 9,694,711
Atlanta Braves 9,496,603
New York Yankees 8,062,618
Detroit Tigers 6,086,696
Texas Rangers 5,158,274
St. Louis Cardinals 5,023,469
San Francisco Giants 4,836,995
Cincinnati Reds 3,912,900
Houston Astros 3,362,355
Los Angeles Dodgers 3,334,120
Philadelphia Phillies 3,080,515
Los Angeles Angels 2,828,442
Colorado Rockies 2,791,248
Pittsburgh Pirates 2,699,190
Milwaukee Brewers 2,597,501
Minnesota Twins 2,553,951
Cleveland Indians 2,283,969
Kansas City Royals 2,204,291
Arizona Diamondbacks 2,189,763
Chicago White Sox 2,088,561
Miami Marlins 1,701,702
Oakland Athletics 1,641,087
San Diego Padres 1,459,699
Baltimore Orioles 1,191,765
Tampa Bay Rays 1,173,900
Seattle Mariners 910,586
Washington Nationals 837,904
New York Mets 643,554
Toronto Blue Jays 0
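
The calculation behind the table is simple enough to sketch in Python. This is not the actual spreadsheet: the states, populations, and percentages below are invented placeholders, the function name is my own, and rounding each state's product before summing is an assumption about a detail the method doesn't specify.

```python
from collections import defaultdict


def fans_by_team(polls):
    """Sum (population represented x poll percentage) for each team across states."""
    totals = defaultdict(int)
    for poll in polls:
        for team, pct in poll["team_pct"].items():
            totals[team] += round(poll["population"] * pct / 100)
    return dict(totals)


# Hypothetical inputs: "population" is the group the sample represents
# (registered or likely voters), "team_pct" is the poll result in percent.
example_polls = [
    {"state": "State X", "population": 4_000_000,
     "team_pct": {"Team A": 30, "Team B": 12}},
    {"state": "State Y", "population": 1_500_000,
     "team_pct": {"Team A": 8, "Team C": 40}},
]

totals = fans_by_team(example_polls)
ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)
for team, fans in ranking:
    print(f"{team} {fans:,}")
```

A team that was not among the choices in a given state's poll contributes nothing from that state, which is how a team can end up with a total of zero.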

Take these numbers for what they are—an incomplete answer to an unanswerable question, and a project that will always be a work in progress. Obviously, 18 states remain unpolled as to their MLB preferences, including rather important ones like Georgia and New York. They will seriously alter the numbers above, such as boosting the Mets' abysmal total and, most likely, launching the Yankees and Braves past the Red Sox and Cubs into a battle for America's most electorally powerful fan base. The 32 states we have polling data for are shaded in black:

As you can see, the Mariners, Orioles, Yankees, Mets, and Braves are going to be underrepresented in the current numbers. The Red Sox and Nationals probably are too, given the absence of several New England states and two-thirds of the DMV. The Phillies are similarly probably feeling the non-inclusion of New Jersey and Delaware. However, it's bad news for teams like the Rays, Marlins, Brewers, A's, Padres, and others—there are not a lot of places left for them to accumulate more fans.

Some other flaws with the polling approach:
  • Polls look only at people who are registered to vote (and, in this case, at only people who DID vote in Mississippi, Virginia, and West Virginia—even smaller universes). This is a majority of people in the country but by no means all of them. Millions of people not registered to vote are likely baseball fans, and their preferences not only aren't included, but we also wouldn't know how to extrapolate them. It's very possible that, because of the kind of person who registers to vote vs. doesn't, their tastes are materially different from those polled.
  • These polls only survey voters in the United States. This means foreign fans go uncounted, including—crucially—Canadian fans of the Toronto Blue Jays. (This is also a problem with the Upshot's baseball map—which limits itself to the US—although not necessarily with Facebook data inherently.)
  • A poll can only provide eight or so possible choices to the question, "What is your favorite MLB team?" before the question becomes too long and loses people. That means some fans' preferences won't be counted, although it's not as bad as the three-team limit in the Facebook data. Using the necessary discretion in choosing which teams to ask about also runs the risk of missing, say, a hidden pocket of Orioles fans in Minnesota.
  • Conversely, PPP almost always asks about the four teams with major national fan bases: the Cubs, Red Sox, Yankees, and Braves. This explains why they are so far ahead; if we were able to ask about all 30 teams, each one would gain a sprinkling of a few thousand to a few tens of thousands of fans in each state—enough to add up. Put another way, we really don't know what to do with the "Unknown/Non-Fan" group.
  • One advantage to polling vs. Facebook is that it lets people say they don't have a favorite baseball team, presumably because they are not fans of the sport. However, very few people actually take this option in the survey, certainly fewer than the 63% we would expect. Therefore, these polls are probably counting people as fans who are only casual partisans or don't even care about baseball. A Bostonian may consider themselves pro-Red Sox even if they don't really care for the sport because, hey, why shouldn't they want the team to do well? There's thus a concern that these numbers are higher across the board than the real number of "true" fans.
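
That last concern can be sanity-checked against the table itself. The sketch below, in Python, adds up the 30 team totals listed above and compares them with the 131,204,273 people the polls describe and with the 37% Harris figure. The variable names are mine, and it assumes the table captures everyone who named a team, so treat the result as a floor rather than a precise figure.

```python
POLLED_POPULATION = 131_204_273  # Americans described by the 32 polls
ADULT_FAN_RATE = 0.37            # share of adults who told Harris they are fans

# Team totals copied from the table above.
TEAM_FANS = {
    "Chicago Cubs": 9_960_809, "Boston Red Sox": 9_694_711,
    "Atlanta Braves": 9_496_603, "New York Yankees": 8_062_618,
    "Detroit Tigers": 6_086_696, "Texas Rangers": 5_158_274,
    "St. Louis Cardinals": 5_023_469, "San Francisco Giants": 4_836_995,
    "Cincinnati Reds": 3_912_900, "Houston Astros": 3_362_355,
    "Los Angeles Dodgers": 3_334_120, "Philadelphia Phillies": 3_080_515,
    "Los Angeles Angels": 2_828_442, "Colorado Rockies": 2_791_248,
    "Pittsburgh Pirates": 2_699_190, "Milwaukee Brewers": 2_597_501,
    "Minnesota Twins": 2_553_951, "Cleveland Indians": 2_283_969,
    "Kansas City Royals": 2_204_291, "Arizona Diamondbacks": 2_189_763,
    "Chicago White Sox": 2_088_561, "Miami Marlins": 1_701_702,
    "Oakland Athletics": 1_641_087, "San Diego Padres": 1_459_699,
    "Baltimore Orioles": 1_191_765, "Tampa Bay Rays": 1_173_900,
    "Seattle Mariners": 910_586, "Washington Nationals": 837_904,
    "New York Mets": 643_554, "Toronto Blue Jays": 0,
}

named_a_team = sum(TEAM_FANS.values())
share_naming_team = named_a_team / POLLED_POPULATION
expected_fans = round(POLLED_POPULATION * ADULT_FAN_RATE)

print(f"Named a listed team: {named_a_team:,} ({share_naming_team:.0%} of those polled)")
print(f"Expected at a {ADULT_FAN_RATE:.0%} fan rate: {expected_fans:,}")
```

Roughly 79% of the people described by these polls end up assigned to a team, more than double the 37% who call themselves baseball fans, which is about what you would expect if casual partisans are being counted.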