Underestimating Trump: the US polling industry under fire


In a fictional America, elections are decided by Multivac, a supercomputer that requires only the input of one “representative” voter to statistically model the outcomes of thousands of national, state and local contests.

This is the 2008 that science-fiction author Isaac Asimov imagined in his 1955 short story Franchise, published three years after Univac, one of the earliest commercial computers, successfully predicted Dwight Eisenhower’s landslide victory on US television network CBS.

Asimov’s dystopian democracy has not yet materialised. As it turns out — particularly in the two most recent US presidential races — the electorate is not so easy to predict reliably. Yet it is not for lack of trying.

“People have an expectation that all the information in the world is at their fingertips now,” says Natalie Jackson, director of research at the Public Religion Research Institute, a non-partisan polling organisation. “And that includes who’s going to win an election.”

In the wake of the 2020 race, pollsters and forecasters are once again facing a reckoning after underestimating support for Donald Trump and the Republican party.

Many forecasters got the presidential outcome right: the Economist and FiveThirtyEight models both favoured Joe Biden to win; a Financial Times average of state polls also put the president-elect ahead. But Mr Biden’s narrower-than-expected margin of victory in key states, as well as Republican upsets in at least three Senate races, has cast doubt on the credibility of the polling industry for the second time in two presidential election years.

[Chart: polling in battleground states of the 2020 US presidential election consistently underestimated Trump’s support]

Multivac aside, Asimov’s technocratic vision of 2008 is oddly prescient. The real 2008 heralded a new age of data analytics in campaigns, but also the rise of human election forecasters, such as Nate Silver, who used sophisticated poll averaging methods to correctly predict Barack Obama’s victory in the Democratic primary as well as the winner in 49 out of 50 states. Since then, the obsession over polls has only grown.

It now covers nearly every aspect of daily life, from the serious — American views of the Black Lives Matter movement or people’s willingness to take a Covid-19 vaccine — to the silly, such as whether the British still love fish and chips. A key question is whether the 2020 and 2016 misses mean these other kinds of polls are also skewed, and, more fundamentally, whether the public can still trust the accuracy of polls.

“I can’t convey to you how different it is [from even two decades ago] that we have polls on everything all the time now,” says veteran pollster Scott Rasmussen.

Nate Silver made his name as a forecaster in 2008 by predicting Barack Obama’s victory in the Democratic primary as well as the winner in 49 out of 50 states in the subsequent presidential election © Brian Cahn/ZUMA/Alamy
Joe Biden supporters in Florida, where the Democratic candidate was beaten by Donald Trump after polls predicted the opposite outcome © Joe Raedle/Getty

Mr Rasmussen says the launch of the poll aggregator Real Clear Politics in 2000 was the “first big revolution” in how polls were covered by the media. For the first time, the results from various pollsters were available in one place. “That naturally led to the gamification of the polling data,” he adds, spurring what Ms Jackson calls an “ecosystem” of pollsters, aggregators, forecasters and media who report on polls prior to an election.

Technically, a poll — unlike a forecast provided by FiveThirtyEight — is not a prediction, but a snapshot in time. But that explanation falls a little flat when its final margin deviates perceptibly from the result. “If Joe Biden beat Donald Trump by four [percentage points] and you had it at 8.5 or 9, you got it wrong,” says Republican pollster Frank Luntz.

The Univac computer system in 1951. It predicted Dwight Eisenhower’s landslide victory the following year © US Airforce/Getty
The CBS election night studio in 1956. Four years earlier the crushing victory for Eisenhower was called on the TV station by the Univac computer © CBS/Getty

Underestimating Trump

The error in national polling is the reason some believe the 2020 misses are more serious than those of four years ago.

Despite state polling errors in 2016, national polls were very accurate. On average, they estimated Hillary Clinton would win the popular vote by three points, and she won by two. This year, most national poll averages indicated Mr Biden would win by eight points. He won by four. A four-point error would be as much as double the historical average by some measures, according to Will Jennings, a political-science professor and co-author of a 2018 study examining election polling errors.
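The arithmetic behind that claim is simple: a signed polling error is the polled margin minus the actual one. A minimal sketch in Python, using the rounded national figures above:

```python
# Signed national polling error for 2020, using the approximate figures
# cited above; positive values mean the Democrat was overestimated.
polled_margin = 8.0   # final national poll average: Biden +8 points
actual_margin = 4.0   # actual national result: Biden +4 points

signed_error = polled_margin - actual_margin
print(f"Signed error: {signed_error:+.1f} points towards the Democrat")
# Roughly double the historical average, by some measures
```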

An American Association for Public Opinion Research postmortem on the 2016 polls found that many state pollsters failed to adjust their samples by education, resulting in an undercount of non-college educated voters. This made a difference in upper Midwest states, where non-college educated white voters in particular — who overwhelmingly supported Mr Trump — made up large swathes of the electorate.
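The adjustment the AAPOR report points to is standard survey weighting: respondents in under-represented groups count for more, so the sample’s education mix matches the electorate’s. A minimal sketch, assuming illustrative (not real) electorate shares:

```python
# A toy post-stratification weight by education. Targets are hypothetical;
# real pollsters would use census or voter-file benchmarks.
from collections import Counter

sample = ["college"] * 550 + ["non_college"] * 450   # raw respondents
targets = {"college": 0.40, "non_college": 0.60}     # assumed electorate

counts = Counter(sample)
n = len(sample)

# weight = target share / sample share for each education group
weights = {group: targets[group] / (counts[group] / n) for group in counts}
print(weights)
# {'college': 0.73, 'non_college': 1.33}: non-college respondents,
# undercounted in the raw sample, now count for more in the topline
```

Skipping this step in 2016 meant non-college voters, who leaned heavily towards Mr Trump in the upper Midwest, were implicitly weighted down.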

By 2020, state pollsters had generally corrected this oversight, but they still underestimated the Republican vote in November, suggesting that something else caused the error in the same places as in 2016. “There are very few areas where we have a direct check on what polls say,” Ms Jackson says. “Anytime we see that a substantial number of polls are off consistently in the same direction we should pay attention to it.”

[Chart: average error and bias (overestimation of Republicans or Democrats) in US national presidential polls over time. In 2020, national polling misses were worse than in 2016, when national polls were very accurate]

One theory is that election pollsters did not count enough Trump supporters as “likely voters” in their samples and incorrectly screened out Trump backers who didn’t express a strong enough preference. Though there were fewer late deciders than in 2016, there is some evidence that they once again broke for Mr Trump.

Others suggest the effect of this was magnified by polls that overestimated turnout for Mr Biden: Democrats disproportionately voted by mail, which could have further skewed pollsters’ “likely voter” models.

Problems with “likely voter” models may matter less for the rest of the polling industry, however: unlike other kinds of surveys, pre-election polls need to simulate who will actually turn out to vote. Patrick Murray, director of the Monmouth University Polling Institute, says “election polling violates one of the key principles of probability polling — you don’t know who the population is”. A survey of US adults, for example, need only be representative of the population, while a pre-election survey of likely voters must accurately represent the electorate and its turnout in the race — a much bigger challenge.
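In its crudest form, that turnout simulation is a cut-off screen. The sketch below assumes a hypothetical 0-10 self-reported likelihood of voting; if the cut-off systematically excludes one candidate’s softer supporters, as the theories above suggest happened to Mr Trump’s, the topline shifts:

```python
# A toy cut-off likely-voter screen; field names and scores are
# hypothetical, and real models blend several turnout questions.
respondents = [
    {"candidate": "Biden", "turnout_score": 10},
    {"candidate": "Trump", "turnout_score": 6},   # falls below the cut-off
    {"candidate": "Biden", "turnout_score": 9},
    {"candidate": "Trump", "turnout_score": 8},
]

LIKELY_CUTOFF = 7  # only respondents at or above this count as likely voters
likely = [r for r in respondents if r["turnout_score"] >= LIKELY_CUTOFF]

all_share = sum(r["candidate"] == "Biden" for r in respondents) / len(respondents)
likely_share = sum(r["candidate"] == "Biden" for r in likely) / len(likely)
print(f"Biden share, all respondents: {all_share:.0%}")    # 50%
print(f"Biden share, likely voters:   {likely_share:.0%}")  # 67%
```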

[Diagram: the US presidential election polling ecosystem. Using ‘the polls’ as a catch-all conflates several different parts of the election polling industry: individual pollsters, poll aggregators, election forecasters, the media and the public]

There is a third, more worrying theory for the industry: that loyal supporters of Mr Trump and the Republican party declined to take part in polls. Survey response rates did tick up slightly as the pandemic spread across the US, but surveys may still have been missing the most ardent supporters of the president and his party. This would affect all kinds of surveys, not just election polls.

This is different from the longstanding idea that “shy” Trump supporters lied to pollsters about their preference. The AAPOR report and subsequent research found little evidence of this in 2016.

Instead, 2020 offered further evidence that polls were missing a certain type of Trump supporter, says Brian Schaffner, a political scientist at Tufts University who helps lead the Co-operative Election Study, a large-scale online survey that interviewed 70,000 people in the month before the election.

The CES, says Mr Schaffner, was adjusted to ensure the sample was weighted not just for party registration but also for respondents’ 2016 vote. It didn’t work. “We had the right number of Trump voters in our sample, just [that] the Trump voters we do have — or the Republicans we do have in our samples — are basically softer supporters of the party than the population of Trump voters.”
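Mechanically, weighting a sample to match targets on two dimensions at once is usually done by raking (iterative proportional fitting). A minimal sketch with hypothetical respondents and targets; note the limitation Mr Schaffner describes: raking can fix how many past Trump voters the sample contains, but not which ones agreed to be interviewed.

```python
# Toy raking: alternately rescale weights so the sample matches targets
# on party registration and recalled 2016 vote. All data is hypothetical.
respondents = [
    {"party": "R", "vote16": "Trump"},
    {"party": "R", "vote16": "Clinton"},
    {"party": "D", "vote16": "Clinton"},
    {"party": "D", "vote16": "Trump"},
    {"party": "D", "vote16": "Clinton"},
]
targets = {
    "party":  {"R": 0.5, "D": 0.5},
    "vote16": {"Trump": 0.5, "Clinton": 0.5},
}

weights = [1.0] * len(respondents)
for _ in range(50):  # alternate between dimensions until the weights settle
    for dim, target in targets.items():
        total = sum(weights)
        current = {
            level: sum(w for w, r in zip(weights, respondents) if r[dim] == level) / total
            for level in target
        }
        weights = [w * target[r[dim]] / current[r[dim]]
                   for w, r in zip(weights, respondents)]

print([round(w, 2) for w in weights])  # weighted sample now matches both margins
```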

Supporters cheer President Donald Trump at a rally in Florida. Brian Schaffner, a political scientist at Tufts University, says it is possible that some Trump voters do not respond to polls © John Raoux/AP

He and other academics are increasingly convinced that the discrepancy might be the result of a strong overlap between individuals voting for Mr Trump and those who harbour a deepening distrust of institutions and are thus less likely to respond to polls, especially those administered by news organisations or academic groups.

The issue of “partisan non-response” — or voters with a particular viewpoint disproportionately refusing to respond to surveys — could be one of the most difficult for pollsters to solve.

“When you have a populist candidate and now a populist party,” says Mr Schaffner, “and the rhetoric is basically distrust of institutions, the most loyal supporters of that party . . . [are] probably not going to be taking polls.”

[Chart: 2020 and 2016 polling misses were larger in states with more non-college educated white voters; in 2012 the correlation was less pronounced]

A more diverse electorate

It is not all bad news for pollsters. As Mr Jennings, who lectures at the University of Southampton, noted in his paper analysing election polls from 45 countries between 1942 and 2017: “if anything, polling errors are getting smaller on average, not bigger”.

But, says Mr Rasmussen, while polls are arguably more accurate now than a generation ago, the US polling industry is dealing with a much more diverse electorate and systemic changes that were not there two, three, or four decades ago, including a growing non-English-speaking population and many people no longer working 9-5 jobs. “In the 1980s . . . most people still watched one of three television networks . . . there was a common culture and language, and everybody still talked on the telephone landline,” he says.

“We’re [now] at a time when it’s far more difficult to conduct a poll, because there is no common language, there is no sense of common sources. People live in bubbles, which leads to distrust because when something pops up outside of your bubble, you are more suspicious of it,” he adds.

The internet has also given rise to a surplus of polls — many of which are administered without sufficient checks, such as matching respondents with voter registration files, or making sure the sample of voters is reflective of the broader voting population.

The problem with websites that average and aggregate poll numbers, says Matt Barreto, co-founder of the polling and research firm Latino Decisions, is that while those aggregations can sometimes produce more accurate results, they can be thrown off by low-quality polls that, for instance, do not offer fully bilingual interviews in states with a large Spanish-speaking population, such as Florida or Arizona.
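The aggregation issue is easy to see in a toy example: a plain average gives a junk poll the same voice as a rigorous one, while a quality-weighted average damps it. The polls and quality scores below are hypothetical:

```python
# Plain vs quality-weighted poll averaging; all figures are made up
# for illustration, not real polls or ratings.
polls = [
    {"biden_margin": 2.0, "quality": 0.9},   # rigorous, bilingual survey
    {"biden_margin": 7.0, "quality": 0.2},   # low-quality online poll
    {"biden_margin": 1.5, "quality": 0.8},
]

plain = sum(p["biden_margin"] for p in polls) / len(polls)
weighted = (sum(p["biden_margin"] * p["quality"] for p in polls)
            / sum(p["quality"] for p in polls))

print(f"Plain average:    Biden +{plain:.1f}")     # +3.5, pulled up by junk
print(f"Quality-weighted: Biden +{weighted:.1f}")  # +2.3, closer to good polls
```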

“With online samples there is a huge amount of variability in the quality of those samples, and people don’t like to admit it,” says Mr Barreto, who worked on the Biden campaign.

Republican pollster Frank Luntz says: ‘If Joe Biden beat Donald Trump by four [percentage points] and you had it at 8.5 or 9, you got it wrong.’ © J Lawler/The Washington Post/Getty

Less certainty in the future

So how can polling improve? Mr Luntz, the Republican pollster, stresses the importance of focus groups running in parallel with traditional surveys. “What [a survey] doesn’t tell you” is the “intensity” of the voter’s feelings. A focus group “is expensive to do and it’s time consuming. But that’s the only way you’re going to measure it,” he adds.

Henry Fernandez of the African American Research Collaborative notes that proxy questions can offer a better guide to how a segment of the electorate is going to vote, such as posing a question about race relations as a way to measure support for Mr Trump.

Sam Wang, director of the Princeton Election Consortium, goes one step further, arguing that a machine-learning algorithm could provide a more accurate way of inferring how respondents will vote, particularly the undecided ones.

The questions could range from “Did you go to college?” and “What do you do for a living?” to “How do you feel about brown people?”, says Mr Wang, a neuroscientist. “[If] you dump all that stuff into a machine-learning model . . . you basically find which variables predict what’s going to happen with what that person is going to do.”
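A minimal sketch of that idea, using scikit-learn and entirely hypothetical toy data (the features, labels and “resentment scale” are illustrative, not Mr Wang’s actual model): fit a classifier on answered questions, then read off a probability for an otherwise undecided respondent.

```python
# Toy vote-choice classifier in the spirit Mr Wang describes. A real
# model would be trained on thousands of verified-voter records.
from sklearn.linear_model import LogisticRegression

# Columns: college degree (0/1), age bracket (1-4), attitude scale (0-10)
X = [
    [1, 3, 2], [0, 4, 8], [1, 2, 3], [0, 3, 9],
    [1, 1, 1], [0, 4, 7], [1, 2, 4], [0, 3, 10],
]
y = [0, 1, 0, 1, 0, 1, 0, 1]  # 0 = Democratic vote, 1 = Republican vote

model = LogisticRegression().fit(X, y)

# Estimated probability that an undecided-looking respondent votes Republican
undecided = [[0, 3, 6]]
print(f"P(Republican) = {model.predict_proba(undecided)[0][1]:.2f}")
```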

President-elect Joe Biden attends a rally in Georgia for the Democratic Senate candidates Jon Ossoff and Rev Raphael Warnock. The race against their Republican rivals is so close that several pollsters have shied away from predicting the result © Drew Angerer/Getty
Democratic Senate candidate Raphael Warnock, voting rights activist Stacey Abrams and Democratic Senate candidate Jon Ossoff listen to US President-elect Joe Biden speak at a campaign rally in Atlanta © Drew Angerer/Getty

Even a population’s Google search history could be a good predictor of its likely voting habits, he says. That was the case during the 2016 Republican primaries, when data released by Google showed that the number of searches for each candidate in a given state on the day before the primary closely mirrored each candidate’s eventual performance there.

Most pollsters say the flaws in the industry, although significant, have more to do with how polls are communicated to the public.

Poll averages to the nearest decimal point can convey false precision. For the general public, probabilities are also difficult to internalise, despite the pains forecasters take to explain them. In 2016, FiveThirtyEight predicted a 29 per cent chance of a Trump victory, but many thought of it as zero. This year, the Economist’s final forecast gave Mr Biden a 97 per cent chance of victory; looking back, that “may have misled readers into expecting a landslide win for Mr Biden”, the forecasters wrote.
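Part of the problem is that a headline probability hides an assumption about how wrong the polls might be. A minimal sketch under a simple normal error model (the error figures are illustrative, not any forecaster’s actual parameters):

```python
# Converting a polled lead into a win probability under a normal error
# model: the answer hinges on the assumed size of the polling error.
from statistics import NormalDist

polled_lead = 8.0       # points, e.g. a Biden +8 national average
polling_error_sd = 4.0  # assumed standard deviation of polling error

# P(actual margin > 0) given polls say +8 with ~4 points of uncertainty
win_prob = 1 - NormalDist(mu=polled_lead, sigma=polling_error_sd).cdf(0)
print(f"Win probability: {win_prob:.0%}")  # ~98%, near-certain sounding

# The same lead with double the assumed error looks far less safe
win_prob_wide = 1 - NormalDist(mu=polled_lead, sigma=8.0).cdf(0)
print(f"With larger error: {win_prob_wide:.0%}")  # ~84%
```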

Mr Rasmussen argues there should be less emphasis on the horse-race aspect of the polls, and more on the underlying questions and issues the polls are asking. “Our purpose is to explain how the voters view the world. It’s about the voters — not the politicians.”

In the run-up to next week’s Georgia Senate run-off races, Mr Rasmussen distributed results showing why people said they were voting and their level of enthusiasm, before putting out a final number. He stopped polling two weeks before election day.

Several other public pollsters have also shied away from polling in Georgia. Although Georgia polls on average were relatively accurate in the presidential race — they were off by about a percentage point — both races are expected to be very close, limiting the predictive power of polling margins. “If the race is that close, it is absurd to use a poll to predict [it],” says Mr Rasmussen.

“Public pollsters are like bartenders who serve another shot of whiskey to a customer who shouldn’t have it,” he says. “I would hope that some analytical sites and political sites would display a little less certainty about what the polls [show].”
