Thursday, July 17, 2014

How Many Fans Does Each MLB Team Have?

This week, Harris Interactive came out with its annual baseball poll. Among many interesting findings (thoughts on instant replay, the racial and gender breakdown of MLB fans, etc.), the poll ranks franchises in order of their popularity with the public. (Some team from New York is number one.) What it doesn't do, much to my disappointment, is give us hard numbers on how popular each team is. Even with a sample size of 2,241 adults, it's impossible to get a large enough data set to be meaningful when there are 30 possible answers. Still, it would be nice to know: how far ahead of the number-two Red Sox are the number-one Yankees? Are we talking 25% support nationwide for the third-place Giants, or closer to 6%? Are there actual real live Miami Marlins fans out there? These are the burning questions.

These ponderings become relevant every year around this time, when America votes on whom to send to the All-Star Game. Especially for the Final Vote, a strong regional base of support is as important to winning an All-Star berth as it is to winning a GOP primary. MLB's county-by-county results maps confirm that geography is destiny; the candidates with ties to the nation's most populous areas always seem to win. Is it really any wonder that this year's Final Vote winners both hail from Chicagoland, or that last year's had the Deep South and an entire country to themselves?

This got me wondering: what are the most powerful voting blocs in baseball? Which teams own the most turf, or hold sway over the most people? We know this generally, of course, thanks to measures like the Harris poll. But to my knowledge, no one has ever undertaken the ambitious project of quantifying how many millions of fans each team has. What team has the largest fan base, and how large is it?

Ultimately, the question is unanswerable—at least until the U.S. Census starts asking about baseball fandom. But we can approximate using two data sets that I know are out there: Facebook data and polling data.

Facebook data simply means people who have "liked" a given Major League team on Facebook. Facebook provided this data to the New York Times, where the Upshot created an amazing tool showing fandom by geography. Their map includes the top three teams, by percentage of MLB team "likes," by county and even down to zipcode. It's a beautiful and rich data set, but it's not perfect for our purposes.
  • The public-facing data posted on the Times website, at least, only provides the top three teams for each geography, leaving potentially hundreds of thousands of fans uncounted.
  • The map doesn't tell us how many baseball fans live in each county or zipcode—just the percentage of total baseball fans there that swear allegiance to X team and Y team.
  • An easy solution would be to multiply these percentages by each county or zipcode's population. That would assume everyone in the country is a baseball fan, however. As much as this should be true, it sadly isn't.
  • We could always scale the population figures down to 37% (a.k.a. the percentage of adults who told Harris that they are baseball fans). However, not all counties are created equal. Suffolk County in Massachusetts is about the same size as Oklahoma County in Oklahoma, but there are almost certainly more baseball fans in Suffolk. In short, deriving absolute numbers from the Upshot map requires a healthy dose of speculation.
  • In addition, Facebook data can be unreliable. Not everyone is on Facebook—it might skew to a younger demographic. People also don't always "like" the things they like, and many people will "like" a zillion things that they don't even like all that much.
  • Finally, and most importantly for our purposes, it would just be too damn hard to apply the data to this project. There are 3,141 counties in the United States, and it would take forever to manually multiply each county's Facebook data from the Upshot map by its population. I've contacted the Times to see if they have the data in exportable form but have yet to hear back.
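
For illustration, here is a minimal Python sketch of the county-level arithmetic described in the list above, using the flat 37% scaling that the list itself warns is too crude. The county names, populations, and "like" shares are made-up placeholders, not Upshot or Census figures, and the function names are hypothetical.

```python
# Share of adults who told Harris they are baseball fans (figure from the post).
FAN_RATE = 0.37

# Hypothetical rows: placeholder populations and top-three "like" shares.
# These are NOT real Upshot or Census numbers.
counties = [
    {"name": "County A", "population": 700_000,
     "top3": {"Red Sox": 0.80, "Yankees": 0.10, "Cubs": 0.02}},
    {"name": "County B", "population": 700_000,
     "top3": {"Rangers": 0.50, "Cardinals": 0.15, "Yankees": 0.10}},
]


def county_fan_estimates(county, fan_rate=FAN_RATE):
    """Estimate fans per team in one county: share x population x fan rate."""
    fans_in_county = county["population"] * fan_rate
    return {team: round(share * fans_in_county)
            for team, share in county["top3"].items()}


def national_estimates(counties, fan_rate=FAN_RATE):
    """Sum the per-county estimates into one total per team."""
    totals = {}
    for county in counties:
        for team, fans in county_fan_estimates(county, fan_rate).items():
            totals[team] = totals.get(team, 0) + fans
    return totals


totals = national_estimates(counties)
for team, fans in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{team:<12}{fans:>10,}")
```

Because only three teams are listed per county, the shares in each row sum to less than 1, and the remainder is exactly the uncounted group mentioned in the first bullet. Applying the same `FAN_RATE` everywhere is also the assumption the Suffolk-versus-Oklahoma example argues against.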
That leaves us with polling data. While Harris was unable to provide result breakdowns by team nationally, there is a pollster that publishes them state by state: the good sports-obsessed folks at Public Policy Polling (PPP). Over the past few years, PPP has asked voters in 32 states about their baseball allegiances. I collected the data in this Google spreadsheet and multiplied PPP's percentage findings by the greater population that the poll's sample represents. In most cases, this was registered voters, except for Mississippi, Virginia, and West Virginia, where the polls surveyed likely voters. This yielded good estimates of the fan breakdown among the 131,204,273 Americans who are described by the 32 polls we have.

Team Fans
Chicago Cubs 9,960,809
Boston Red Sox 9,694,711
Atlanta Braves 9,496,603
New York Yankees 8,062,618
Detroit Tigers 6,086,696
Texas Rangers 5,158,274
St. Louis Cardinals 5,023,469
San Francisco Giants 4,836,995
Cincinnati Reds 3,912,900
Houston Astros 3,362,355
Los Angeles Dodgers 3,334,120
Philadelphia Phillies 3,080,515
Los Angeles Angels 2,828,442
Colorado Rockies 2,791,248
Pittsburgh Pirates 2,699,190
Milwaukee Brewers 2,597,501
Minnesota Twins 2,553,951
Cleveland Indians 2,283,969
Kansas City Royals 2,204,291
Arizona Diamondbacks 2,189,763
Chicago White Sox 2,088,561
Miami Marlins 1,701,702
Oakland Athletics 1,641,087
San Diego Padres 1,459,699
Baltimore Orioles 1,191,765
Tampa Bay Rays 1,173,900
Seattle Mariners 910,586
Washington Nationals 837,904
New York Mets 643,554
Toronto Blue Jays 0
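
The tallies above come from multiplying each poll's team percentages by the population the poll represents, then summing across states. The Python sketch below shows that arithmetic in miniature. The states, universe sizes, and percentages are invented placeholders rather than actual PPP results, and the data layout and function name are assumptions made for illustration, not the structure of the actual spreadsheet.

```python
from collections import defaultdict

# Hypothetical poll records: NOT real PPP findings or real voter counts.
# "universe" is the population the sample represents (registered or likely voters).
polls = [
    {"state": "State A", "universe": 1_000_000,
     "shares": {"Cubs": 0.40, "Cardinals": 0.25, "Unknown/Non-Fan": 0.35}},
    {"state": "State B", "universe": 2_000_000,
     "shares": {"Cubs": 0.10, "Braves": 0.50, "Unknown/Non-Fan": 0.40}},
]


def estimate_fans(polls):
    """Multiply each team's poll share by the poll's universe and sum by team."""
    totals = defaultdict(int)
    for poll in polls:
        for team, share in poll["shares"].items():
            totals[team] += round(share * poll["universe"])
    return dict(totals)


fans = estimate_fans(polls)
for team, count in sorted(fans.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{team:<18}{count:>12,}")
```

A team that is not offered as a choice in any poll never appears in `totals` at all, which is how a real fan base can end up with a count of zero.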

Take these numbers for what they are—an incomplete answer to an unanswerable question, and a project that will always be a work in progress. Obviously, 18 states remain unpolled as to their MLB preferences, including rather important ones like Georgia and New York. They will seriously alter the numbers above, such as boosting the Mets' abysmal total and, most likely, launching the Yankees and Braves past the Red Sox and Cubs into a battle for America's most electorally powerful fan base. The 32 states we have polling data for are shaded in black:


As you can see, the Mariners, Orioles, Yankees, Mets, and Braves are going to be underrepresented in the current numbers. The Red Sox and Nationals probably are too, given the absence of several New England states and two-thirds of the DMV. The Phillies are probably also hurt by the non-inclusion of New Jersey and Delaware. However, it's bad news for teams like the Rays, Marlins, Brewers, A's, Padres, and others: there are not a lot of places left for them to accumulate more fans.

Some other flaws with the polling approach:
  • Polls look only at people who are registered to vote (and, in Mississippi, Virginia, and West Virginia, only at likely voters, an even smaller universe). This is a majority of people in the country but by no means all of them. Millions of people not registered to vote are likely baseball fans; their preferences aren't included, and we wouldn't know how to extrapolate them. It's very possible that, because the kind of person who registers to vote differs from the kind who doesn't, their tastes are materially different from those of the people polled.
  • These polls only survey voters in the United States. This means foreign fans go uncounted, including—crucially—Canadian fans of the Toronto Blue Jays. (This is also a problem with the Upshot's baseball map—which limits itself to the US—although not necessarily with Facebook data inherently.)
  • A poll can only provide eight or so possible choices to the question, "What is your favorite MLB team?" before the question becomes too long and loses people. That means some fans' preferences won't be counted, although it's not as bad as the three-team limit in the Facebook data. Using the necessary discretion in choosing which teams to ask about also runs the risk of missing, say, a hidden pocket of Orioles fans in Minnesota.
  • Conversely, PPP almost always asks about the four teams with major national fan bases: the Cubs, Red Sox, Yankees, and Braves. This explains why they are so far ahead; if we were able to ask about all 30 teams, each one would gain a sprinkling of a few thousand to a few tens of thousands of fans in each state—enough to add up. Put another way, we really don't know what to do with the "Unknown/Non-Fan" group.
  • One advantage to polling vs. Facebook is that it lets people say they don't have a favorite baseball team, presumably because they are not fans of the sport. However, very few people actually take this option in the survey, certainly fewer than the 63% we would expect. Therefore, these polls are probably counting people as fans who are only casual partisans or don't even care about baseball. A Bostonian may consider themselves pro-Red Sox even if they don't really care for the sport because, hey, why wouldn't they want the home team to do well? There's thus a concern that these numbers are higher across the board than the real number of "true" fans.

