Saturday, September 6, 2014

What's Wrong With Rafael Soriano?

Last night, Nationals closer Rafael Soriano blew a three-run lead, allowing two home runs to the Phillies (to add insult to injury, one was hit by Ben Revere). The outing before that, he blew another save. The outing before that, he got the save, but gave up a run in a rocky ninth inning.

Soriano's recent struggles have Nats fans screaming bloody murder about removing him from the closer's role. But they're failing to see Soriano for what he is: one of the game's best closers going through a rough patch over an infinitesimal sample size.

What's wrong with Rafael Soriano? The short answer is nothing. The longer answer is random variance, factors out of his control, and the rest of this blog post.

In the first half, Soriano had a 0.97 ERA and a 0.811 WHIP. Since the All-Star break, he has had a 6.98 ERA and a 1.759 WHIP. These numbers measure results—which are important, to be sure. But pitching involves a lot of luck; it's been shown that pitchers have little to no control of what happens to a ball after it makes contact with a bat. So to measure the actual quality of Soriano's pitching, we need to fall back on what are known as his peripheral stats.

The main thing pitchers do have control over is throwing balls and strikes, and by extension walks and strikeouts. In the first half, when he was pitching lights-out, Soriano allowed 2.68 walks per nine innings and 8.76 strikeouts per nine innings—both quite good, if not elite, numbers. So far in the second half, Soriano has allowed 3.26 walks per nine innings and 8.84 strikeouts per nine innings. As a function of innings, then, his walks have increased slightly—which is not ideal, but 3.26 BB/9 is still above average—and his strikeouts have actually increased—which is obviously desirable.

Perhaps a better way to measure it is the percentage of batters Soriano has walked and struck out. In the first half, Soriano walked 8.2% of batters and struck out 26.7%. That's an average walk rate and an excellent strikeout rate. In the second half thus far, he's walked 7.5% of batters—an improvement—and struck out 20.4% of batters, which still rates as pretty good.
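How can BB/9 rise while BB% falls? If you divide a per-nine-innings rate by the matching per-batter rate, you get the number of batters faced per nine innings. Here is a quick sketch using the rates quoted above. The published figures are rounded, so the results are approximate:

```python
def batters_per_nine(rate_per_nine: float, rate_per_batter: float) -> float:
    """Batters faced per 9 innings implied by one event measured two ways.

    (events / IP * 9) / (events / TBF) simplifies to TBF / IP * 9.
    """
    return rate_per_nine / rate_per_batter

# Rates quoted in the post: (per 9 innings, per batter faced)
first_half = {"BB": (2.68, 0.082), "K": (8.76, 0.267)}
second_half = {"BB": (3.26, 0.075), "K": (8.84, 0.204)}

for label, half in (("First half", first_half), ("Second half", second_half)):
    from_walks = batters_per_nine(*half["BB"])
    from_ks = batters_per_nine(*half["K"])
    print(f"{label}: ~{from_walks:.1f} batters per 9 IP (walks), ~{from_ks:.1f} (strikeouts)")
```

The walk pair and the strikeout pair agree. Both imply roughly 33 batters per nine innings in the first half and roughly 43 since the break. More hitters coming to the plate each inning inflates the per-inning rates, even though Soriano's per-batter walk rate improved.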

So Soriano isn't pitching significantly worse when it comes to balls and strikes. What has been worse (from Soriano's perspective) is hitters' contact with those pitches. In the first half, opposing batters had a .153 average, .222 on-base percentage, and .226 slugging percentage off Soriano and hit 0.24 home runs off him every nine innings. Since the All-Star break, opposing batters have hit .321/.387/.530 and 1.4 HR/9.

What accounts for this huge difference? As it turns out, mostly luck. We can use two stats to measure how unlucky a pitcher has been: BABIP (batting average on balls in play) and HR/FB (home runs allowed divided by fly balls allowed). These stats jump around a lot among players and among seasons for a given player, but with no discernible pattern: in other words, they are random variance, and they tend to normalize at a league-average rate over time. (This is because things like the quality and positioning of fielders are more important in deciding whether a ball drops in for a hit, and wind or the dimensions of a ballpark can turn a lazy fly ball into a home run—or vice versa—fairly easily.) Typically, batters "should" hit .300 on balls in play, and 9.5% of fly balls "should" be home runs. Truly talented pitchers can skew these numbers downward—particularly HR/FB—but not significantly.
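For the curious, here is how the two luck stats are computed. This minimal sketch uses the standard BABIP formula, (H - HR) / (AB - K - HR + SF), and a plain HR/FB ratio. The pitcher line is hypothetical and does not reflect Soriano's actual counts:

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play: non-homer hits divided by
    at-bats that ended with a ball in play (plus sacrifice flies)."""
    return (h - hr) / (ab - k - hr + sf)

def hr_per_fb(hr: int, fb: int) -> float:
    """Share of fly balls allowed that left the park."""
    return hr / fb

LEAGUE_BABIP = 0.300  # rough long-run norm cited in the post
LEAGUE_HR_FB = 0.095  # rough long-run norm cited in the post

# Hypothetical pitcher line (illustrative only, not real data)
line = dict(h=20, hr=3, ab=70, k=15, sf=1)
fly_balls = 25

print(f"BABIP {babip(**line):.3f} vs. ~{LEAGUE_BABIP:.3f} norm")
print(f"HR/FB {hr_per_fb(line['hr'], fly_balls):.1%} vs. ~{LEAGUE_HR_FB:.1%} norm")
```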

Soriano's BABIP in the first half was .207 and his HR/FB was 2.2%. These numbers are almost impossibly good and suggest that Soriano was due for some regression; a player can't sustain a 0.97 ERA when it's built on unsustainable secondary stats like those.

But here in the second half, Soriano's BABIP has been .387, and his HR/FB has been 10.0%. Those numbers are also outliers, just like his first-half numbers—just in the opposite direction. A .387 BABIP is particularly unsustainable, and there is no way hitters can keep being this lucky off Soriano in the long term. Meanwhile, the HR/FB rate is also above the rate at which batters normally hit home runs—and it's also well above Soriano's career HR/FB rate of 7.7%. As we saw with the 1.4 HR/9, a big problem for Soriano in the second half has been the increased number of home runs off him, but if we dig deeper we can see that this is atypical for MLB hitters and downright aberrant for batsmen hitting off Soriano.

A final stat that can affect pitchers' luck is the rate at which they strand runners on base. League average LOB% (left-on-base percentage) is about 71% of runners stranded; it too can bounce around a lot for individual seasons for no apparent reason, but it always ends up around 71% in the long haul. In the first half, Soriano stranded 90.9% of runners, which is really lucky—many more of those runners should have crossed the plate. But in the second half, he has stranded just 62.9% of runners, which is fairly unlucky. A lot of those runners had no business scoring.
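LOB% can be sketched the same way. This version uses the FanGraphs-style formula, (H + BB + HBP - R) / (H + BB + HBP - 1.4 x HR), and a made-up pitching line for illustration:

```python
LEAGUE_LOB = 0.71  # rough long-run norm cited in the post

def lob_pct(h: int, bb: int, hbp: int, r: int, hr: int) -> float:
    """Left-on-base percentage: share of baserunners who did not score.
    Home runs are subtracted from the denominator with an empirical 1.4
    weight, since a batter who homers can never be stranded."""
    return (h + bb + hbp - r) / (h + bb + hbp - 1.4 * hr)

# Hypothetical line: 20 H, 7 BB, 1 HBP, 10 R, 3 HR (not real data)
lob = lob_pct(h=20, bb=7, hbp=1, r=10, hr=3)
print(f"LOB% {lob:.1%} vs. ~{LEAGUE_LOB:.0%} norm")
```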

For Nats fans, it's certainly torture to see Soriano struggling like this—and doubtlessly it has been for Soriano too. But he can take solace in the knowledge that it's not really his fault—from the pitching side of things, he's continued to do things about as well as he's done for his entire career. When luck enters into the equation as much as it does with pitching, though, there are going to be stretches when even the best execution leads to crummy results. Small sample sizes (Soriano has pitched just 19.1 innings during the second half) can produce huge outliers; that's why baseball plays 162 games a season.

Although he has materially pitched the same throughout the year, the disparity between Soriano's extremely lucky results early on and his unlucky ones today is naturally going to catch the eye—and ire—of the fan base. Advanced stats show that, no, Soriano was never as good as he appeared to be in the first half—but he's also not nearly as bad a pitcher as he looks right now. The real Rafael Soriano is somewhere in between: a solid relief pitcher whose true value indeed lies around the average of his first and second half performances. Need proof? His ERA as it stands today (3.04) is almost a dead ringer for his FIP (3.16).
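FIP (fielding independent pitching) ignores everything except home runs, walks, hit batters, and strikeouts. Below is a sketch of the usual formula. The league constant changes every season so that FIP lands on the ERA scale, which makes the 3.10 below a placeholder assumption. The pitching line is also hypothetical. The sketch additionally converts box-score innings such as 19.1 (19 innings plus one out) into true numbers:

```python
from fractions import Fraction

def innings(ip_notation: str) -> Fraction:
    """Convert box-score innings to a true count: '19.1' means 19 innings
    plus one out, i.e. 19 1/3."""
    whole, _, outs = ip_notation.partition(".")
    extra_outs = int(outs) if outs else 0
    if extra_outs not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip_notation!r}")
    return Fraction(int(whole)) + Fraction(extra_outs, 3)

def fip(hr: int, bb: int, hbp: int, k: int, ip, constant: float = 3.10) -> float:
    """Fielding independent pitching. `constant` is season-specific;
    3.10 is a placeholder, not any particular year's value."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / float(ip) + constant

# Hypothetical reliever line over 19.1 innings (not Soriano's actual stats)
ip = innings("19.1")
print(f"{float(ip):.2f} true innings, FIP {fip(hr=3, bb=7, hbp=1, k=19, ip=ip):.2f}")
```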

Wednesday, September 3, 2014

Primary Endorsements Show Where Massachusetts Dems Are Drawing Their Battle Lines

As any member of an organized political party will tell you, they're really not all that organized. My home state of Massachusetts is one of several places where one-party rule has caused the political divides that really matter to be intra-party ones. (Kansas is another example.) When outsiders make cracks about how liberal and/or monolithic Massachusetts is, they're showing how little they know about the Democratic Party's extreme factionalism there. In actuality, the party's seemingly giant legislative majorities (36–4 in the State Senate, 125–29 in the State House) are more the result of British-style coalition building between several smaller parties. These parties sometimes have very different identities and policies, yet all of them consider themselves Democrats, for reasons ranging from the practical to the historical to the programmatic.

Hoping to pen a more ambitious analysis of these subparties in my home state, I canvassed every endorsement issued in the Massachusetts Democratic primary this cycle for the four contested statewide races: governor, lieutenant governor, attorney general, and treasurer. (In a state with one-party rule, after all, the primary dictates which clan sits on the throne for the November coronation.) Unfortunately, the data ended up being too thin to expose the detailed fault lines I wanted to map out, so my original idea for this post fell by the wayside. However, there's still a place for the data that do exist, so I'm leaving them here for Massachusetts political junkies to play with.

Any Democratic politician or organization that endorsed this cycle is listed in this Google doc—also embedded at the bottom of this post. You can see whom they endorsed in each of the four major races. Most only endorsed in one or maybe two races (that's why the data were too thin), but if they endorsed in three or four, their full voting slate is laid bare in that spreadsheet.

Just for fun, we can cross-analyze how much endorsements line up with each other. Do all Tom Conroy for treasurer supporters also support Warren Tolman for attorney general? (Yes, it turns out, but not all Tolman supporters support Conroy.) This gives us a tease of some of the aforementioned Massachusetts Democratic factions—witness, for instance, how endorsements for progressives Don Berwick and Maura Healey line up pretty well—but mostly it's just a confusing mess.

In the chart below, the rows represent the universe of people we are considering to calculate the percentages (i.e., the denominator), and the columns represent the characteristic being measured within that universe (i.e., the numerator). Read the chart by asking, "What percentage of [row] are [column]?" So, for instance, you can find what percentage of Steve Grossman endorsers also endorsed Steve Kerrigan by going to the Grossman row and the Kerrigan column. Percentages are calculated only among people who endorsed in both of the races being compared, not among everyone who endorsed in the row's race. As a result, each set of percentages adds up to 100%, and there is no cell for "[Candidate] endorsers supporting no one" in the other race. Endorsements of withdrawn candidates, like Joe Avellone, don't count for chart purposes (even though they're listed in the spreadsheet for comprehensiveness).

In the chart, read down a column to see the percentage support that a given candidate got among universes of different endorsers (and compare it to the share of the race's total endorsements received by that candidate, displayed in the bottom row); read across a row to see how endorsers of a given candidate split their support in the other three races (useful if you're undecided in one of the races and want elite support to guide your vote).
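The chart is a conditional cross-tab: only endorsers who weighed in on both races count, so each row sums to 100%. Here is a minimal sketch of that logic. The endorsers and candidate names are hypothetical placeholders, not the spreadsheet's real data:

```python
from collections import defaultdict

# Hypothetical slates: endorser -> {race: candidate}. Placeholder names only.
slates = {
    "Endorser A": {"Governor": "X", "Lt. Governor": "P"},
    "Endorser B": {"Governor": "X", "Lt. Governor": "Q"},
    "Endorser C": {"Governor": "Y"},
    "Endorser D": {"Governor": "X", "Lt. Governor": "P", "Treasurer": "T"},
}

def crosstab(slates, row_race, col_race):
    """For each row_race candidate, the % of their endorsers backing each
    col_race candidate, counting only endorsers active in BOTH races."""
    counts = defaultdict(lambda: defaultdict(int))
    for picks in slates.values():
        if row_race in picks and col_race in picks:
            counts[picks[row_race]][picks[col_race]] += 1
    return {
        row: {col: 100 * n / sum(cols.values()) for col, n in cols.items()}
        for row, cols in counts.items()
    }

print(crosstab(slates, "Governor", "Lt. Governor"))
```

Endorser C drops out of the Governor-by-Lt.-Governor table because they endorsed in only one of the two races. That is why no "supporting no one" cell exists.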

Spot any interesting patterns? Leave them in the comments!

Sunday, August 10, 2014

Two Missing Hawaii Precincts Probably Can't Save Hanabusa

On Saturday night (or Sunday morning for those of us on the East Coast), Senator Brian Schatz and Congresswoman Colleen Hanabusa fought to a draw in the Democratic primary for Hawaii senator. As of 4pm ET on Sunday, the Associated Press had the results at 113,789 for Schatz and 112,154 for Hanabusa, with 245 of 247 precincts reporting.

So what's the matter with those last two precincts? Well, they're not going to report for a very long time, since they didn't actually vote on Saturday. Hawaii Paradise Community Center and Keonepoko Elementary School polling places had to be closed after the roads to them were blocked, rendering them inaccessible on Saturday. (The two precincts are in the Puna district of the Big Island, which sustained heavy damage from Tropical Storm Iselle on Friday.) Their voters who didn't get the chance to vote early will now vote by mail over the next couple of weeks instead of going to the polling places—setting up a Swing Vote–type scenario where this small group of voters could determine who goes to the US Senate.

The question everyone is asking today is simple: who has the advantage with this last group of voters? If you go by demographics, it's Hanabusa. Hawaii politics has long been defined by its racial tensions; in a state where Republicans are a mere nuisance to be brushed aside, the real dividing line is between white politicians and Asian and Native Hawaiian politicians. If you put stock in the theory of identity politics, that means we should look at the racial demographics of the outstanding areas. According to the US Census, the communities in the two precincts are almost perfectly evenly divided:

Census-Designated Place | White | Asian | Native Hawaiian/Pacific Islander | Two or More Races | Other
Hawaiian Paradise Park | 3,958 | 2,148 | 1,341 | 3,679 | 278
Hawaiian Beaches | 1,360 | 637 | 634 | 1,534 | 115
Total | 5,318 (33.9%) | 2,785 (17.8%) | 1,975 (12.6%) | 5,213 (33.2%) | 393 (2.5%)

These data are inconclusive, since there are a lot of multiracial residents. (Portions of the precincts also lie in unincorporated areas, so these population statistics are not totally complete.) However, election data might break the tie by revealing which race of candidate they are inclined to vote for. In 2012, Hawaii saw a situation similar to the current election: a close Democratic primary for a US Senate seat that featured one white, male candidate and one Asian, female candidate. Here's how the voting broke down in the two precincts:

Precinct | Ed Case | Mazie Hirono | Other
04-01 Hawaii Paradise Community Center | 696 | 1,203 | 24
04-02 Keonepoko Elementary School | 554 | 1,145 | 47
Total | 1,250 (34.1%) | 2,348 (64.0%) | 71 (1.9%)

Hirono, the Asian candidate, crushed Case 64.0% to 34.1%—indicating an electorate 6–7 points more "pro-Asian" (to put it bluntly) than the state as a whole, where Hirono won the primary 56.8% to 40.3%. If we assume these precincts will behave the same way in 2014, we can estimate that Hanabusa will win them approximately 56% to 43%.
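The 6–7 point figure comes straight out of the 2012 precinct table. The sketch below reproduces it. The rough 56-43 projection comes from applying the same shifts to Hanabusa's and Schatz's statewide shares. Those 2014 statewide shares aren't listed here, so the sketch stops at the swing:

```python
# 2012 Democratic Senate primary in the two Puna precincts (from the table above)
precincts = {"Case": 696 + 554, "Hirono": 1203 + 1145, "Other": 24 + 47}
total = sum(precincts.values())  # ballots with a Senate choice in the two precincts
local = {name: votes / total for name, votes in precincts.items()}

statewide = {"Case": 0.403, "Hirono": 0.568}  # statewide 2012 result cited in the post

for name in ("Hirono", "Case"):
    swing = (local[name] - statewide[name]) * 100
    print(f"{name}: {local[name]:.1%} locally vs {statewide[name]:.1%} statewide ({swing:+.1f} pts)")
```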

However, that is not a big enough margin for Hanabusa to overcome the deficit she is already running: 1,635 votes, remember, statewide. If the 56%–43% projection is accurate, Hanabusa would need 12,577 voters to turn out in these two precincts in order to gain a net total of 1,635 and pull her into a tie statewide. However, the precincts are only home to about 8,255 registered voters. In the 2012 primary, only 4,429 turned out to vote—and just 3,741 of those participated in the Democratic primary. Even if you assume that the Swing Vote circumstances, the ease of voting by mail, and the campaigns' intense focus on the precincts in the next couple weeks will all drive turnout through the roof, it seems unlikely to exceed turnout in the 2012 general election (a presidential high-water mark), which was 6,556. (Hawaii has a notorious turnout problem.)
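Here is the turnout math from the paragraph above. With a 13-point projected margin, each ballot nets Hanabusa 0.13 votes. Dividing the deficit by that margin therefore gives the turnout needed to tie. The loop then checks each turnout benchmark mentioned above:

```python
import math

deficit = 113_789 - 112_154                 # Schatz's statewide lead
hanabusa_share, schatz_share = 0.56, 0.43   # the post's projection for the two precincts
net_per_voter = hanabusa_share - schatz_share

needed_turnout = math.ceil(deficit / net_per_voter)
print(f"Turnout needed just to tie: {needed_turnout:,}")

# Net votes the same margin would produce under the turnout benchmarks cited above
for label, turnout in [("2012 Dem primary", 3_741), ("2012 primary", 4_429),
                       ("2012 general", 6_556), ("registered voters", 8_255)]:
    print(f"{label}: {turnout:,} voters -> net {turnout * net_per_voter:,.0f}")
```

Even if every one of the 8,255 registered voters cast a ballot at that margin, Hanabusa would net only about 1,073 votes.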

Then you have to take into account that early voting means a lot of the precincts' votes have already been banked. The universe of voters who haven't yet voted and thus can still be persuaded could be as small as 3,880 (those who waited to vote in person on Election Day in the 2012 general), 2,531 (those who waited to vote in person on Election Day in the 2012 primary), or 2,149 (those who waited to vote in person on Election Day in the 2012 Democratic primary).

All in all, our calculations look grim for Hanabusa. If we go by 2012 Democratic primary turnout (3,741 voters, the most likely model in my opinion), Hanabusa would need to win the two precincts 72% to 28% to erase her 1,635-vote deficit. Despite the precincts' favorable demographics, that's a daunting task.
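We can also flip the question around. For a fixed turnout T, Hanabusa's two-way share s must satisfy (2s - 1) x T = 1,635. The sketch below runs that across the turnout scenarios mentioned above. It assumes a pure two-way vote and ignores minor candidates and blank ballots:

```python
DEFICIT = 1_635

def share_needed(turnout: int, deficit: int = DEFICIT) -> float:
    """Two-way share s such that (s - (1 - s)) * turnout == deficit."""
    return (1 + deficit / turnout) / 2

scenarios = {
    "2012 Dem primary turnout": 3_741,
    "2012 general, Election Day only": 3_880,
    "2012 primary, Election Day only": 2_531,
    "2012 Dem primary, Election Day only": 2_149,
}
for label, turnout in scenarios.items():
    s = share_needed(turnout)
    print(f"{label} ({turnout:,}): {s:.0%} to {1 - s:.0%}")
```

The 2012 Democratic primary model reproduces the 72-28 figure. The smaller Election Day universes push the requirement to roughly 82% and 88%.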

UPDATE: A local viewpoint takes issue with my conclusions:

Lots of localities exhibit such idiosyncrasies, and if Puna's progressivism overcomes the racial demographics, my projections could easily be wrong. We'll find out the final totals in a couple of weeks, but regardless, the situation doesn't look any less dire for Hanabusa.

Friday, August 8, 2014

Whipping Votes for Baseball Commissioner

It’s essentially an oligarchy. When the MLB owners gather in Baltimore next Thursday to choose the next baseball commissioner, 30 of the richest men in America will vote on a new leader in an election filled with backroom deals, simmering political divisions, and potential backstabbing. The only election less transparent is for pope.

At first we thought the vote might be a formality, another unanimous pick in Commissioner Bud Selig’s era of consensus. Rob Manfred, MLB’s chief operating officer and Selig’s second in command, has been the heir to the throne for a while now. It looked like the commissioner was comfortably guiding him to a coronation in an effort to continue Selig’s policies and legacies uninterrupted. Think of Manfred as the “establishment” candidate; if this is like a House leadership election, he’s John Boehner.

But a breakaway faction of owners isn’t so keen on seeing the status quo continue. The White Sox’ Jerry Reinsdorf, Red Sox’ John Henry, and Angels’ Arte Moreno believe that MLB needs to be tougher on the baseball players’ union, the MLBPA, and that the sport needs a fresh direction to overcome creeping demographic problems. Strongly opposed to Manfred, they’ve set into motion a gambit reminiscent of House Republicans’ past (and future?) attempts at Boehner’s political life.

It allegedly began in May, when Reinsdorf—long an ally of Selig—revealed he did not support Manfred’s continuation of the Selig regime. Spurning his decades-long friendship with the commissioner, he began plotting with the other owners who shared his concerns. They secretly went around shopping for an insurgent candidate who could sink Manfred’s campaign—the options allegedly ranging from Red Sox Chairman Tom Werner to MLB Executive Vice President Tim Brosnan to Yale President Rick Levin.

The trio had little luck coalescing support around any of them until a few weeks ago, when Werner gave a particularly good interview to MLB’s search committee, apparently winning over some hesitant owners. The mutinous faction now believes they have enough votes to block Manfred from winning the 75% of the vote (23 of 30 owners) he needs. Their hope is that an inconclusive first ballot will lead to a chaotic constitutional crisis—a free-for-all in which all previous allegiances go out the window. In that sense, Werner is more a stand-in candidate than a legitimate contender; on subsequent ballots, most people think the owners would have to settle on a third, dark-horse candidate (mystery team!) in order to build a consensus of 23 votes. If they cannot do so after multiple ballots, however, the vote will be postponed until November. This would be a big blow to Selig but also could open the door for Selig to stay on one more year if the owners want to stop the political bloodshed.

The owners just found out on August 4 that the vote will take place on August 14—earlier than expected, perhaps a move by Selig to blunt Werner’s momentum. That means Reinsdorf, Henry, and Moreno had scarcely a week to whip votes for Werner (or Brosnan, the third official candidate, who is also looking to block Manfred but does not have nearly as much support) while Selig’s allies target the same voters for Manfred. Like an old-fashioned party convention or House leadership scrum, it’s got us chattering classes on the edge of our seats.

Although the sensitive competition is expected to unfold outside the public eye, Baseballot will monitor the news for leaks and information about each side’s vote counts; check back often. Here’s what the rumor mill currently says about how your team will vote:

Pro-Werner (4): According to the New York Times (and pretty much every other published report), the White Sox, Red Sox, and Angels are definitely voting for Werner. CBS Sports states that the Brewers’ Mark Attanasio is also "solidly behind Werner."

Likely pro-Werner (3): The New York Daily News reports that the Athletics’ Lew Wolff and John Fisher, the Blue Jays’ Paul Beeston, and the Diamondbacks’ Ken Kendrick are also in the Werner camp. However, CBS is more cautious, saying those three clubs are merely leaning toward Werner—hence this softer category for them. Oakland's and Arizona's preferences were first noted by the New York Times, which reported on August 7 that the pro-Werner insurgents were counting on their support. (It's speculated, however, that the Diamondbacks may just be voting for Werner to throw the nomination process open, in the hopes that their celebrated president, Derrick Hall, will eventually nab the post.) The Toronto defection was first made public by an earlier report in the Daily News. Previous reports said that the Werner campaign had offered one of Manfred’s close friends a top job in the Werner administration if he betrayed Manfred. Separate reporting from CBS implied that that friend is Beeston, who spent five years alongside Manfred as an MLB executive. The Daily News suggested that, if Werner wins, he will make Beeston his right-hand man, as well as keep Brosnan on as executive vice president for business, forming a unified opposition ticket.

Likely anti-Manfred (1): The Times also listed the Reds’ Bob Castellini as a possibility to vote against Manfred, and now the Daily News and CBS are both reporting that Cincinnati will vote for Brosnan, a close friend of Castellini’s. Even though Werner is emerging as the clear opposition candidate, it makes sense that the Reds would prefer not to vote for him. As a small-market team, they would not be well served by having a big-market executive in baseball’s most powerful post. UPDATE: Jon Heyman reports that Brosnan has dropped out, so it's unknown whom the Reds will vote for. Maybe Werner, if the rumors of a Werner-Brosnan ticket are true.

Tossup (2): CBS describes the Nationals’ Lerner family and the Rays’ Stuart Sternberg as "wild cards who could go either way." However, the Daily News disagrees; it reports that the Nationals are “leaning toward Manfred” and that the Rays are in fact a safe vote for Manfred. The Times had previously reported that the Nationals were one of the votes in play for Werner but was silent on the Rays. For now, these two owners are the biggest source of suspense.

Likely pro-Manfred (2): The Daily News lists the Royals’ David Glass and the Rangers’ Ray Davis as Manfred votes, but they hold political beliefs that may align them more with Werner. According to the Sunlight Foundation, the Royals and Rangers are the most conservative baseball clubs, based on their political donations. Glass, the former CEO of Wal-Mart, is notably anti-labor. If part of Werner’s platform is being tougher on the MLBPA, that could be appealing to those two clubs. For now, however, intelligence indicates they are on Manfred's side of the fence.

Pro-Manfred (18): CBS Sports and the Daily News agree that the Mets’ Fred Wilpon, Yankees’ Hal Steinbrenner, Dodgers’ Mark Walter, Rockies’ Dick Monfort, Pirates’ Bob Nutting, Phillies’ David Montgomery, Twins’ Jim Pohlad, Giants’ Charles Johnson, Cardinals’ Bill DeWitt, and Cubs’ Tom Ricketts are in the bag for Manfred. The Daily News adds the Orioles’ Peter Angelos, Braves’ Liberty Media, Indians’ Larry Dolan, Tigers’ Mike Ilitch, Astros’ Jim Crane, Marlins’ Jeff Loria, Padres’ Ron Fowler, and Mariners’ Nintendo to that list for a total of 18.
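Here is the rumor-mill tally checked against the 75% threshold. With 30 owners, 23 votes win and 8 votes block. The sketch uses the category counts above, which are only as good as the reporting behind them, like any whip count:

```python
import math

OWNERS = 30
needed = math.ceil(0.75 * OWNERS)  # votes required to elect
blocking = OWNERS - needed + 1     # "no" votes that keep a candidate short

# Category counts from the rumor mill above
tally = {
    "Pro-Werner": 4,
    "Likely pro-Werner": 3,
    "Likely anti-Manfred": 1,
    "Tossup": 2,
    "Likely pro-Manfred": 2,
    "Pro-Manfred": 18,
}
assert sum(tally.values()) == OWNERS

opposition = tally["Pro-Werner"] + tally["Likely pro-Werner"] + tally["Likely anti-Manfred"]
manfred_ceiling = OWNERS - opposition  # Manfred's best case, sweeping both tossups

print(f"Needed to win: {needed}; blocking minority: {blocking}")
print(f"Anti-Manfred votes: {opposition} (block reached: {opposition >= blocking})")
print(f"Manfred's ceiling: {manfred_ceiling}")
```

On these numbers, the opposition has exactly the eight votes it needs. That leaves Manfred at most 22 even if he sweeps both tossups, so a single defection either way matters.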

Monday, July 28, 2014

One Hall of a Bad Idea

This weekend, the Baseball Hall of Fame announced some election-law changes. Effective immediately, players will only have 10 years on the ballot to make their case before dropping out of consideration—not 15 years as it has been for most of the Hall's history.

Let's get the editorializing out of the way: I think this was the coward's way out for the Hall of Fame. The calls for the Hall to reform its voting process have gotten ever louder in recent years, and it didn't seem like they were listening. Now they've finally made a change, but it was a passive-aggressive one—the exact opposite of confronting the very real issues facing the election process. It was a change that allowed them to dodge the issues even more.

By cutting off one-third of the eligibility period, the Hall of Fame is essentially targeting individual players to make it harder for them to get elected. Most of these players are those tainted with PED use. Roger Clemens and Barry Bonds, who will be on the ballot this winter for the third time each, now have much less time to earn forgiveness from voters. Although I would have disagreed, the Hall could have come out and simply issued an election advisory—one that several voters have publicly said they wished for—that said steroid users were cheaters and should be ineligible for the Hall of Fame. Instead they invented a new rule that does the same thing, except it lets them pretend their hands are still clean, and it creates a permanent fix for a temporary problem.

I want to harp on that last point. In the Hall's obsession with appearing above the steroid fray, they've created a mechanism that will affect generations of players to come, as well as take out collateral victims in the present. Tim Raines, who will be on the ballot for the eighth time, has never been associated with steroids and will now probably fall off the ballot without being elected. They could have avoided this by applying the change only to candidates new to the ballot this year. But that wouldn't have spared them the embarrassment of the Bonds/Clemens/McGwire/Sosa debates for five more years. They valued their supposed decorum enough that they were willing to sacrifice a ballot full of players.

More than that: the Hall of Fame's actions implicitly said to some of its members, "We don't think you belong." Bert Blyleven, Jim Rice, Bruce Sutter, Duke Snider, Bob Lemon, Ralph Kiner, Dazzy Vance, Gabby Hartnett, Rabbit Maranville, Bill Terry, and Harry Heilmann were all elected in between their 10th and 15th years of eligibility. Under the new rules, they would have dropped off the ballot instead. One of the reasons the Hall made the change was to de-clog the coming logjam of candidates on the ballot. They could have instead decreed that a voter can vote for as many candidates as he or she wants, not just a maximum of 10, as the current rules state. Instead they decided to go more exclusive and solve the logjam by kicking more people off, not letting more in. This is not only an admission that Blyleven, Rice, Sutter, and their kin were mistakes, but also a repudiation of the Hall's existing size. (This despite the fact that post-1990 baseball is dramatically underrepresented in the Hall of Fame already.) Like it or not, players elected from now on must meet a different, higher standard, making them somehow "purer" choices for the Hall than those guys. The Hall of Fame has succeeded in creating "tiers" of Hall of Famers.

There is one way—and one way only—that these election changes could actually be a good thing. There is one way that I could owe the Hall of Fame a massive apology.

That would be if the Hall of Fame is outsmarting all of us right now. If they've done their research and dug into the data and know for a fact the ripple effect that this change will have.

This change could redeem itself if it forces Hall voters (the BBWAA, in case you forgot) to change their behavior. As a close watcher of elections of any form, I've spent some time breaking down the trends of past Hall of Fame votes, and I'd say I know them pretty well. What I don't know is how those trends will hold up under different election laws. The new rules put us in new territory for predicting results. Will voters panic, knowing there is suddenly less time to elect players, and suddenly become more generous in allocating their votes? If so, will this be a one-time spike in this year's election, or will voters become permanently more open-minded?

A basic fact of Hall of Fame voting trends is that it often takes the full 15 years for players to build up to the 75% needed for election. Many eventual inductees started on the ballot at levels of support around 20% and added five or ten points each year to gradually climb. Take Blyleven, who attracted just 17.5% of the vote his first year on the ballot in 1998. He was finally elected in his 14th year of eligibility with 79.7% of the vote.

If the new rules had been in place while Blyleven was on the ballot, he'd have fallen off after 2007, when he got 47.7% of the vote.
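Here is a back-of-the-envelope model of those climbs. It assumes a steady linear gain each year, which is a simplification, since real trajectories are lumpy:

```python
import math

THRESHOLD = 75.0  # percent of the vote needed for election

def years_to_election(start: float, gain_per_year: float) -> int:
    """Ballot year in which a candidate debuting at `start` percent and
    gaining a steady `gain_per_year` points first reaches 75%."""
    if start >= THRESHOLD:
        return 1
    return 1 + math.ceil((THRESHOLD - start) / gain_per_year)

# Typical climbs described in the post: ~20% debut, +5 or +10 points a year
for gain in (5, 10):
    print(f"20% start, +{gain}/yr -> elected in year {years_to_election(20, gain)}")

# Blyleven: 17.5% in year 1 (1998), 79.7% in year 14 (2011)
avg_gain = (79.7 - 17.5) / 13
needed_gain = (THRESHOLD - 17.5) / 9  # pace required to reach 75% by year 10
print(f"Blyleven's average pace: {avg_gain:.1f} pts/yr; pace needed for a 10-year limit: {needed_gain:.1f}")
```

A candidate who debuts at 20% and climbs five points a year doesn't reach 75% until year 12, past the new cutoff. Blyleven averaged about 4.8 points a year; to make it by year 10, he'd have needed roughly 6.4.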

Or would he? Maybe his timetable would have been accelerated as voters felt more rushed to consider his record. Could the Hall of Fame's new election rules cause steeper climbs than we're accustomed to seeing from candidates? Something like this?

I am honestly not sure. On one hand, I can't not believe that Tim Raines being two years away instead of seven from falling off the ballot will jolt more writers into taking his Hall case seriously. At the very least, more voters will sit down this winter and do some research into Raines's career where before they might have passed over his name, based perhaps on outdated memories or long-obsolete first impressions. And the accelerated timetable will surely force some voters to see that, now more than ever, a non-vote or a blank ballot is as affirmative an action as a "yes" vote. If the rule change makes writers comfortable with a 10-man ballot being the norm, rather than the conservative six votes per ballot that is the average today, then it will have been worth it.

On the other hand, many of the candidates on the ballot today are so polarizing that no degree of extra consideration will convince 75% of writers to vote for them. There is also little evidence that players currently get an abnormally large bump in support in their 15th year on the ballot, so why should they get one on their 10th now? Unfortunately, it doesn't look like writers consciously withhold their vote, but rather that the full 15 years are sometimes needed for a critical mass to "evolve" their thinking.

Maybe the Hall of Fame knows something we don't, and voters will change their behavior.

But given both the Hall's and the BBWAA's maddening track records, I'm not holding my breath.

Thursday, July 17, 2014

How Many Fans Does Each MLB Team Have?

This week, Harris Interactive came out with its annual baseball poll. Among many interesting findings (thoughts on instant replay, the racial and gender breakdown of MLB fans, etc.), the poll ranks franchises in order of their popularity with the public. (Some team from New York is number one.) What it doesn't do, much to my disappointment, is give us hard numbers on how popular each team is. Even with a sample size of 2,241 adults, it's impossible to get a large enough data set to be meaningful when there are 30 possible answers. Still, it would be nice to know—how far ahead are the Yankees, at number one, from the Red Sox at number two? Are we talking 25% support nationwide for the third-place Giants, or closer to 6%? Are there actual real live Miami Marlins fans out there? These are the burning questions.
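To see why 30 possible answers swamp a 2,241-person sample, here is a rough margin-of-error sketch. It assumes, hypothetically, that only the roughly 37% of respondents who call themselves fans name a favorite team. It also uses the normal approximation for a proportion, which gets shakier for small shares:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

respondents = 2_241
fan_share = 0.37                       # share of adults who told Harris they're fans
fans = round(respondents * fan_share)  # assumed number answering the favorite-team question

for p in (0.03, 0.05, 0.10):
    print(f"Team at {p:.0%} of fans: +/- {margin_of_error(p, fans):.1%}")
```

A team at 5% of fans comes with roughly ±1.5 points of uncertainty, nearly a third of its own estimate.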

These ponderings become relevant every year around this time when America votes on whom to send to the All-Star Game. Especially for the Final Vote, a strong regional base of support is as important to winning an All-Star berth as it is to winning a GOP primary. MLB's county-by-county results maps confirm that geography is destiny; the candidates with ties to the nation's most populous areas always seem to win. Is it really any wonder that this year's Final Vote winners both hail from Chicagoland, or that last year's had the Deep South and an entire country to themselves?

This got me wondering—what are the most powerful voting blocs in baseball? Which teams own the most turf, or hold sway over the most people? We know this generally, of course, thanks to measures like the Harris poll. But to my knowledge, no one has ever undertaken the ambitious project of quantifying how many millions of fans each team has. What team has the largest fan base—and how large is it?

Ultimately, the question is unanswerable—at least until the U.S. Census starts asking about baseball fandom. But we can approximate using two data sets that I know are out there: Facebook data and polling data.

Facebook data simply means people who have "liked" a given Major League team on Facebook. Facebook provided this data to the New York Times, where the Upshot created an amazing tool showing fandom by geography. Their map includes the top three teams, by percentage of MLB team "likes," by county and even down to zipcode. It's a beautiful and rich data set, but it's not perfect for our purposes.
  • The public-facing data posted on the Times website, at least, only provides the top three teams for each geography, leaving potentially hundreds of thousands of fans uncounted.
  • The map doesn't tell us how many baseball fans live in each county or zipcode—just the percentage of total baseball fans there that swear allegiance to X team and Y team.
  • An easy solution would be to multiply these percentages by each county or zipcode's population. That would assume everyone in the country is a baseball fan, however. As much as this should be true, it sadly isn't.
  • We could always scale the population figures down to 37% (a.k.a. the percentage of adults who told Harris that they are baseball fans). However, not all counties are created equal. Suffolk County in Massachusetts is about the same size as Oklahoma County in Oklahoma, but there are almost certainly more baseball fans in Suffolk. In short, deriving absolute numbers from the Upshot map requires a healthy dose of speculation.
  • In addition, Facebook data can be unreliable. Not everyone is on Facebook—it might skew to a younger demographic. People also don't always "like" the things they like, and many people will "like" a zillion things that they don't even like all that much.
  • Finally, and most importantly for our purposes, it would just be too damn hard to apply the data to this project. There are 3,141 counties in the United States, and it would take forever to manually multiply each county's Facebook data from the Upshot map by its population. I've contacted the Times to see if they have the data in exportable form but have yet to hear back.
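If the Times ever does release the data in exportable form, the multiplication itself would take seconds rather than forever. Here is a minimal Python sketch, assuming a hypothetical export with one row per county holding its population and the Upshot's top-three team shares. The county names and numbers are made up for illustration, and the flat 37% fan rate from the Harris poll is applied uniformly, which is exactly the Suffolk vs. Oklahoma County assumption I worry about above.

```python
# Hypothetical sketch: estimate team fan counts from county-level "like" shares.
# Assumptions (not from the Upshot data): each row holds a county's population
# and the share of MLB "likes" for its top three teams; a flat 37% fan rate
# (the Harris figure) is applied to every county.
from collections import defaultdict

FAN_RATE = 0.37  # share of adults who told Harris they are baseball fans

# Made-up rows: (county, population, {team: share of MLB "likes"})
counties = [
    ("County A", 1_000_000, {"Red Sox": 0.60, "Yankees": 0.20, "Mets": 0.05}),
    ("County B", 500_000, {"Yankees": 0.50, "Mets": 0.30, "Red Sox": 0.10}),
]


def estimate_fans(rows, fan_rate=FAN_RATE):
    """Sum population * fan_rate * team share over all counties."""
    totals = defaultdict(float)
    for _name, population, shares in rows:
        fans_in_county = population * fan_rate  # assumed number of baseball fans
        for team, share in shares.items():
            totals[team] += fans_in_county * share
    # Round to whole fans for display.
    return {team: round(n) for team, n in totals.items()}


print(estimate_fans(counties))
```

Anything outside a county's top three simply drops out of the totals, so a sketch like this inherits the undercounting problem as well.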
That leaves us with polling data. While Harris was unable to provide result breakdowns by team nationally, there is a pollster that publishes them state by state: the good sports-obsessed folks at Public Policy Polling (PPP). Over the past few years, PPP has asked voters in 32 states about their baseball allegiances. I collected the data in this Google spreadsheet and multiplied PPP's percentage findings by the greater population that the poll's sample represents. In most cases, this was registered voters, except for Mississippi, Virginia, and West Virginia, where the polls surveyed likely voters. This yielded good estimates of the fan breakdown among the 131,204,273 Americans who are described by the 32 polls we have.
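The state-level arithmetic works the same way. Below is a minimal sketch with two made-up states standing in for PPP's 32 real ones; the state names, represented populations, and poll shares are hypothetical, not PPP's figures.

```python
# Hypothetical sketch of the poll-based estimate: for each polled state,
# multiply each team's poll share by the population the sample represents
# (registered or likely voters), then add up the results across states.
from collections import Counter

# Made-up inputs: (state, represented population, {team: poll share})
polls = [
    ("State X", 4_000_000, {"Cubs": 0.30, "White Sox": 0.15, "Cardinals": 0.10}),
    ("State Y", 2_000_000, {"Cardinals": 0.45, "Royals": 0.25, "Cubs": 0.10}),
]


def national_estimate(polls):
    """Return (team, estimated fans) pairs, largest fan base first."""
    totals = Counter()
    for _state, population, shares in polls:
        for team, share in shares.items():
            totals[team] += round(population * share)  # whole fans per state
    return totals.most_common()


for team, fans in national_estimate(polls):
    print(f"{team} {fans:,}")
```

Summing each team's per-state totals and sorting them is all it takes to produce a ranking like the one below.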

Team Fans
Chicago Cubs 9,960,809
Boston Red Sox 9,694,711
Atlanta Braves 9,496,603
New York Yankees 8,062,618
Detroit Tigers 6,086,696
Texas Rangers 5,158,274
St. Louis Cardinals 5,023,469
San Francisco Giants 4,836,995
Cincinnati Reds 3,912,900
Houston Astros 3,362,355
Los Angeles Dodgers 3,334,120
Philadelphia Phillies 3,080,515
Los Angeles Angels 2,828,442
Colorado Rockies 2,791,248
Pittsburgh Pirates 2,699,190
Milwaukee Brewers 2,597,501
Minnesota Twins 2,553,951
Cleveland Indians 2,283,969
Kansas City Royals 2,204,291
Arizona Diamondbacks 2,189,763
Chicago White Sox 2,088,561
Miami Marlins 1,701,702
Oakland Athletics 1,641,087
San Diego Padres 1,459,699
Baltimore Orioles 1,191,765
Tampa Bay Rays 1,173,900
Seattle Mariners 910,586
Washington Nationals 837,904
New York Mets 643,554
Toronto Blue Jays 0

Take these numbers for what they are—an incomplete answer to an unanswerable question, and a project that will always be a work in progress. Obviously, 18 states remain unpolled as to their MLB preferences, including rather important ones like Georgia and New York. They will seriously alter the numbers above, such as boosting the Mets' abysmal total and, most likely, launching the Yankees and Braves past the Red Sox and Cubs into a battle for America's most electorally powerful fan base. The 32 states we have polling data for are shaded in black:

As you can see, the Mariners, Orioles, Yankees, Mets, and Braves are going to be underrepresented in the current numbers. The Red Sox and Nationals probably are too, given the absence of several New England states and two-thirds of the DMV. The Phillies are probably similarly feeling the exclusion of New Jersey and Delaware. However, it's bad news for teams like the Rays, Marlins, Brewers, A's, Padres, and others: there are not a lot of places left for them to accumulate more fans.

Some other flaws with the polling approach:
  • Polls look only at people who are registered to vote (and, in this case, at only people who DID vote in Mississippi, Virginia, and West Virginia—even smaller universes). This is a majority of people in the country but by no means all of them. Millions of people not registered to vote are likely baseball fans, and their preferences not only aren't included, but we also wouldn't know how to extrapolate them. It's very possible that, because of the kind of person who registers to vote vs. doesn't, their tastes are materially different from those polled.
  • These polls only survey voters in the United States. This means foreign fans go uncounted, including—crucially—Canadian fans of the Toronto Blue Jays. (This is also a problem with the Upshot's baseball map—which limits itself to the US—although not necessarily with Facebook data inherently.)
  • A poll can only provide eight or so possible choices to the question, "What is your favorite MLB team?" before the question becomes too long and loses people. That means some fans' preferences won't be counted, although it's not as bad as the three-team limit in the Facebook data. Using the necessary discretion in choosing which teams to ask about also runs the risk of missing, say, a hidden pocket of Orioles fans in Minnesota.
  • Conversely, PPP almost always asks about the four teams with major national fan bases: the Cubs, Red Sox, Yankees, and Braves. This explains why they are so far ahead; if we were able to ask about all 30 teams, each one would gain a sprinkling of a few thousand to a few tens of thousands of fans in each state—enough to add up. Put another way, we really don't know what to do with the "Unknown/Non-Fan" group.
  • One advantage of polling over Facebook is that it lets people say they don't have a favorite baseball team, presumably because they are not fans of the sport. However, very few people actually take this option in the survey, certainly fewer than the 63% we would expect to. Therefore, these polls are probably counting as fans people who are only casual partisans or don't even care about baseball. A Bostonian may consider himself pro-Red Sox even if he doesn't really care for the sport because, hey, why shouldn't he want them to do well? There's thus a concern that these numbers are higher across the board than the real number of "true" fans.

Monday, June 9, 2014

Separating 2016 Narrative from 2016 Fact

I've made no secret of my apparently controversial theory that Hillary Clinton is not going to run for president. The theory rests on the fact that Clinton herself hasn't signaled a desire to run or done any of the things that candidates in the invisible primary tend to do. Instead, this widespread assumption that she's running stems from a snowballing groupthink in the DC bubble—especially in major media outlets, which have ignored her many denials of her candidacy.

Two recent articles, I think, are especially egregious examples of this. The first is from the Washington Post's Chris Cillizza:
Hillary Clinton is running for president.

That simple sentence is one that the political-media complex seems incapable of uttering though evidence is sprinkled absolutely everywhere — including in comments from Clinton herself — that she will be a candidate in 2016.
I pick on this piece in particular (I could have chosen many others) for the irony: the political-media complex has done nothing but perpetuate the notion that Clinton is running for president. Cillizza is more typical of the genre when he goes on to cite the evidence he refers to: the existence of Ready for Hillary, public endorsements, etc. The problem is that none of it is evidence that Hillary Rodham Clinton has herself made plans to run for president. All of it is actually just evidence that other people believe she'll run. This, of course, is not in doubt, and citing other Washington insiders as evidence of what's inside the brain of a third party is the very definition of what I'm talking about.

But the fixation on speculation doesn't end with Hillary. And, say what you will about Cillizza's column, he doesn't contradict any known facts (just misinterprets them, in my opinion). That's not the case with the second article, this one on Elizabeth Warren. Warren announced several months ago that she was flattered, but she is absolutely, positively not running for president. So why did we see an article by Byron York in April speculating about a Warren candidacy? In the article, York addresses head-on, and even quotes, Warren’s Shermanesque statement, but still launches into an exploration of how she could be lying and planning to run!

The whole article reveals a staggering blindness to fact and a slavish devotion to narrative, at all costs—even accuracy. Unlike Clinton's still up-in-the-air status,** Warren has categorically ruled herself out as a candidate. But it’s still fun to speculate about a populist candidate carrying the banner of the Democratic Party’s left wing… Or better yet, about an insurgent who would dare to take on the juggernaut Clinton for the nomination.

The sexy image of what the 2016 campaign could look like has been built up by total speculation for years, even though it was only based on the flimsiest of facts. In Warren's case, now we have facts that directly contradict it—and so that image has gone from speculative to definitively false. But York and others continue to erroneously write reports that Warren could be a candidate. They put so much time and care into building the narrative that they can't bring themselves to tear it down when they have to.

This is a shtick well worn in baseball, where it causes enough trouble in the form of new- vs. old-school culture wars. But baseball is, by and large, an endeavor of entertainment only; politics is important and affects the lives of everyone living in this country, whether they vote or not. Whereas narrative-building is ridiculous in baseball, it's seriously troubling and misleading in political journalism.

(**A note on Clinton: I acknowledge that statements she has made recently have opened the door to her candidacy. In her book being released this week, she writes, “Will I run for president in 2016? I haven’t decided yet.” That’s the clearest indicator yet that she may indeed run. Since writing that Clinton post last year, I’ve become OK with the media narrative of her potential 2016 dominance. It’s a solid fact now that she’s considering a run, and she would indeed be the strongest candidate if she did so. What I take issue with is speculation that becomes reported as fact. In that sense, any journalism about “presumptive Democratic nominee Hillary Clinton” continues to be intellectually dishonest. And even though I now recognize that there’s a very real chance that I’m way wrong about this, I’m still predicting Clinton surprises us all and takes a pass in the end.)