Reading about the debate over Nate Silver’s model, I came across a very revealing quote from Joe Scarborough that shows how statistical innumeracy is hampering the discussion of polls, statistical models, and election predictions. According to Politico’s Byers, Scarborough said this on television:

“Nate Silver says this is a 73.6 percent chance that the president is going to win? Nobody in that campaign thinks they have a 73 percent chance — they think they have a 50.1 percent chance of winning. And you talk to the Romney people, it’s the same thing. … And anybody that thinks that this race is anything but a tossup right now is such an ideologue, they should be kept away from typewriters, computers, laptops and microphones for the next 10 days, because they’re jokes.”

In the same article, Byers says that NYT columnist David Brooks said this on PBS earlier this month: “The pollsters tell us what’s happening now. When they start projecting, they’re getting into silly land.”

David Brooks is mistaken and Joe Scarborough is wrong. Pollsters can’t project, but statistical models can and do, and they make some predictions very well. We rely on statistical models for many decisions every single day, including, crucially, weather, medicine, and pretty much any complex system in which there is an element of uncertainty to the outcome. Dismissing them is not only incorrect; in the case of electoral politics it is also politically harmful, for two reasons. One, it perpetuates the faux “horse-race” coverage, which pulls election discussions away from substantive issues and turns them into a silly, often unfounded, time-wasting exercise in fake punditry about who is 0.1% ahead. (For example, there may well be reasons to consider Ohio a toss-up state, but, contrary to Chris Cillizza’s argument, the “absolute necessity for Romney to win the state if he wants to be president” is not one of them.)

There is a fundamental confusion here. The election can indeed be won with 50.1% of the national vote, which is what Scarborough is talking about (more precisely, it is won with 270 electoral votes, which can be secured with even less). But, at the same time, the chance of getting past that 270-electoral-vote threshold can be 80%. Heck, the odds of Obama passing 270 electoral votes can be 90% and the election can still be close in terms of the margin of victory. The first (how many electoral votes Obama or Romney wins, and the vote percentage) is the outcome of the election. The second is the odds, the probability, of a particular outcome happening. Polls and statistical models are not predictions about the same thing.
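To make the outcome-versus-probability distinction concrete, here is a minimal sketch. It assumes, purely for illustration, that a model’s uncertainty about the national margin is a normal distribution centered on a 2-point lead with a 2-point standard deviation, and it ignores the Electoral College entirely. Neither number comes from Silver’s model. The expected margin is small, yet the probability that the margin ends up positive is about 84%.

```python
import math

def win_probability(mean_margin: float, sd: float) -> float:
    """P(margin > 0) when margin ~ Normal(mean_margin, sd), via the normal CDF."""
    z = mean_margin / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical inputs: a 2-point expected lead with 2 points of uncertainty.
expected_margin = 2.0
uncertainty = 2.0
p = win_probability(expected_margin, uncertainty)

print(f"Expected margin: +{expected_margin:.1f} points (a close race)")
print(f"Probability of winning: {p:.1%}")  # about 84%
```

A close expected margin and a lopsided win probability come out of the same distribution; they simply answer different questions.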

In his column last week, David Brooks says “If there’s one thing we know, it’s that even experts with fancy computer models are terrible at predicting human behavior.” He gives examples of stock market predictions by corporate financial officers. He makes certain points I agree with: yes, CFOs are not very good at predictions, and yes, there is no point in checking *individual* polls every few hours. However, experts with fancy computer models *are* good at predicting many things in the aggregate, including the results of elections. Elections are not about predicting a single person’s behavior (yes, there is great variance there); they lend themselves well to statistical analysis, using the same kinds of methods that let us tell many days in advance that a hurricane is about to hit the United States. This isn’t wizardry; this is the sound science of complex systems. Uncertainty is an integral part of it, but uncertainty does not mean we know nothing, that we are completely in the dark, or that everything is a toss-up.
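The individual-versus-aggregate point can be illustrated with a toy simulation. It assumes (hypothetically) 100,000 voters who each independently vote for candidate A with probability 0.52. Predicting any one voter’s choice is barely better than a coin flip, yet the overall vote share lands very close to 52% in every run. Real voters are not independent coin flips, so this only illustrates the law of large numbers, not how any actual election model works.

```python
import random

def simulate_vote_share(n_voters: int, p_a: float, rng: random.Random) -> float:
    """Fraction of n_voters choosing A when each independently does so with prob p_a."""
    votes_for_a = sum(1 for _ in range(n_voters) if rng.random() < p_a)
    return votes_for_a / n_voters

rng = random.Random(2012)
p_a = 0.52          # hypothetical individual-level probability of voting for A
n_voters = 100_000  # hypothetical electorate size

# Individual level: always guessing "A" is right only when that voter picks A.
single_guesses = [rng.random() < p_a for _ in range(10_000)]
individual_accuracy = sum(single_guesses) / len(single_guesses)
print(f"Accuracy predicting one voter: {individual_accuracy:.1%}")

# Aggregate level: the vote share barely moves across repeated elections.
shares = [simulate_vote_share(n_voters, p_a, rng) for _ in range(20)]
print(f"Vote share across 20 runs: min {min(shares):.2%}, max {max(shares):.2%}")
```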

Polls tell you the likely outcome with some uncertainty and some sources of error (known and unknown). Statistical models take a bunch of factors (in the case of elections: lots of polls; structural factors such as how the economy is doing; what we know about turnout, demographics, etc.) and run lots of simulated “elections,” varying those inputs according to what we know, and what we think we can reasonably infer, about the range of uncertainty given historical precedents and our logical models. What they produce is a probability distribution of outcomes.
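Here is a minimal sketch of that simulate-many-elections idea. Everything in it is a hypothetical assumption chosen for illustration, not Silver’s or anyone else’s actual model: the state list, the polling margins, the “safe” electoral-vote totals, and the error sizes. Each simulated election draws one national polling error shared by every state (so states can miss in the same direction together) plus an independent error per state, awards each state’s electoral votes winner-take-all, and records the total. The output is a distribution of results, not a single forecast.

```python
import random
import statistics

# Hypothetical inputs for illustration only (not real 2012 model values).
SAFE_EV_A = 237  # electoral votes assumed safe for candidate A
SAFE_EV_B = 191  # electoral votes assumed safe for candidate B
# state: (electoral votes, assumed polling margin for A in points)
SWING_STATES = {
    "Ohio": (18, 2.0), "Florida": (29, -1.0), "Virginia": (13, 1.0),
    "Colorado": (9, 1.5), "Iowa": (6, 2.0), "New Hampshire": (4, 2.5),
    "Nevada": (6, 3.0), "Wisconsin": (10, 4.0), "North Carolina": (15, -2.0),
}
NATIONAL_ERROR_SD = 2.0  # polling error shared by every state in one run
STATE_ERROR_SD = 3.0     # independent state-specific error

def simulate_election(rng: random.Random) -> int:
    """Return candidate A's electoral votes in one simulated election."""
    national_error = rng.gauss(0.0, NATIONAL_ERROR_SD)
    ev = SAFE_EV_A
    for votes, margin in SWING_STATES.values():
        simulated_margin = margin + national_error + rng.gauss(0.0, STATE_ERROR_SD)
        if simulated_margin > 0:  # winner-take-all: any lead takes every vote
            ev += votes
    return ev

def run_model(n_sims: int = 20_000, seed: int = 0) -> list[int]:
    rng = random.Random(seed)
    return [simulate_election(rng) for _ in range(n_sims)]

results = run_model()
win_prob = sum(ev >= 270 for ev in results) / len(results)
print(f"P(A reaches 270): {win_prob:.1%}")
print(f"Median electoral votes for A: {statistics.median_low(results)}")
```

The shared national error is a deliberate design choice: if every state’s error were independent, misses would cancel out and the model would be overconfident.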

So, Nate Silver takes all the polls we have, adds to his model certain factors that have been shown to affect election outcomes in the past, runs lots and lots of simulated elections, and looks at the probability distribution of the results. What his model says is that currently, given what we know, if we run a gabazillion modeled “elections,” Obama wins 80% of the time. Note that this isn’t saying that if we held all those elections on the same day we’d get different results (we wouldn’t); rather, we are running many simulated elections that reflect the range of uncertainty in our data. The election itself will “collapse” this probability distribution and there will be a single result. [*Last two sentences have been added for clarity with much thanks to Nathan Jurgenson for the suggestion and edits.*]

Since there will be only one election, it’s possible that Obama will lose. Even so, Nate Silver’s and others’ statistical models would remain sound and worth keeping and expanding. This matters because refusing to use statistical models just because they produce probability distributions rather than absolute certainty is irresponsible. For many important issues (climate change!), statistical models are all we have, and all we can have. We still need to take them seriously and act on them (well, that is if you care about life on earth as we know it, blah, blah, blah).
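One way to see why a single loss would not discredit a probabilistic model is to score forecasts across many events. The sketch below uses the Brier score (the mean squared difference between a forecast probability and the 0/1 outcome; lower is better). The forecasts and outcomes are simulated, hypothetical data: a perfectly calibrated forecaster who says “80%” many times should see the event happen about 80% of the time, and should still be “wrong” about one time in five.

```python
import random

def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

rng = random.Random(42)
n_events = 10_000
forecasts = [0.8] * n_events
# Hypothetical outcomes drawn so that the 80% forecasts are exactly calibrated.
outcomes = [1 if rng.random() < 0.8 else 0 for _ in range(n_events)]

hit_rate = sum(outcomes) / n_events
misses = n_events - sum(outcomes)
calibrated_brier = brier_score(forecasts, outcomes)
# Baseline: calling every event a 50/50 toss-up always scores exactly 0.25.
tossup_brier = brier_score([0.5] * n_events, outcomes)

print(f"Event happened {hit_rate:.1%} of the time; the forecast 'lost' {misses} times")
print(f"Brier score, calibrated 80% forecasts: {calibrated_brier:.3f}")
print(f"Brier score, toss-up 50% forecasts:    {tossup_brier:.3f}")
```

The calibrated forecaster misses roughly 2,000 times and still clearly beats the everything-is-a-toss-up forecaster.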

Statistical models are a standard and well-established method in many sciences and are absolutely key to reasonable risk analysis of complex events. Nate Silver may be the face of electoral statistical modeling, but there’s also a site run by people at Princeton. This kind of modeling is important work that requires expertise and care, but it is not some dark science of wizards. (Also, frankly, Silver gives a lot of information about his model and it all sounds reasonable, but it would be great if it became more open source at some point, allowing more peer review. :-))

So when Nate Silver’s model gives Obama an 80% chance of passing 270 electoral votes, this is not a prediction of a landslide; it is not even overwhelming odds. A one-in-five chance is pretty close odds. A one-in-five chance of getting hit by a bus today would not make me very happy to step outside the house, nor would I stop treatment for an illness if I were told I had a one-in-five chance of survival. If I were Romney’s campaign manager, I’d continue to believe I had a small but reasonable chance of winning, and I’d realize that GOTV efforts can swing an election this close. Again, the election remains pretty close, but the odds that Obama will win also remain pretty high; those statements are not in conflict. This kind of modeling is scientifically and methodologically sound and well-established.
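The arithmetic behind the one-in-five framing, using the 80% figure above:

```latex
P(\text{Romney wins}) = 1 - P(\text{Obama wins}) = 1 - 0.8 = 0.2 = \tfrac{1}{5},
\qquad
\text{odds for Obama} = \frac{0.8}{0.2} = 4{:}1 .
```

In odds terms, a one-in-five chance is 1 to 4 against, not 1 to 5.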

One reason for the gap between Obama’s odds of winning and the closeness of the vote percentages is that the US electoral system is “winner-takes-all” in nearly every state, which means that 50.1% of a state’s vote gets 100% of that state’s Electoral College votes. And there are many states in which the polls suggest the candidates are only a few percentage points apart. Polls have known sources of error (even if you poll perfectly, you will get results outside the margin of error approximately one in twenty times for a 95% confidence interval) as well as unknown ones (cell phones? likely-voter screens?), and polls do not measure factors such as Get-Out-the-Vote efforts, which can make a huge difference in close elections in winner-take-all systems. Given all that, it remains a very close election. It also remains strongly and significantly tilted towards an Obama win.
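The one-in-twenty figure can be checked with a small simulation. This sketch assumes an idealized poll: simple random sampling of 1,000 respondents from a population in which 51% (a hypothetical value) support a candidate, with no nonresponse, weighting, or likely-voter-screen problems. It builds the standard normal-approximation 95% interval, p̂ ± 1.96·√(p̂(1 − p̂)/n), around each poll and counts how often that interval misses the true value. At n = 1,000 the margin of error is about ±3.1 points, larger than the few-point gaps in many close states.

```python
import math
import random

def poll(true_share: float, n: int, rng: random.Random) -> float:
    """Share of n randomly sampled respondents who support the candidate."""
    return sum(rng.random() < true_share for _ in range(n)) / n

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation 95% confidence interval."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

rng = random.Random(7)
true_share, n, n_polls = 0.51, 1_000, 2_000  # hypothetical idealized setup
misses = 0
for _ in range(n_polls):
    p_hat = poll(true_share, n, rng)
    if abs(p_hat - true_share) > margin_of_error(p_hat, n):
        misses += 1  # the interval around this poll excludes the true value
miss_rate = misses / n_polls

print(f"Margin of error at n={n}: +/-{margin_of_error(true_share, n):.1%}")
print(f"Polls missing the truth: {miss_rate:.1%} (about 1 in 20 expected)")
```

Real polls add the unknown error sources mentioned above on top of this idealized sampling error, so actual misses are more frequent.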

In fact, I share a wish with Sam Wang of Princeton that sound statistical models, done the way they should be done, replace the horse-race coverage of every single poll, which drowns out the important policy conversations we should be having. As Wang explains, he started doing statistical modeling thinking his results “should be a useful tool to get rid of media noise about individual polls. … This [meta-analysis of polls] in hand could provide a common set of facts. Space would be opened up for discussion of what really mattered in the campaign – or even discussion of policies.”
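Wang’s point about replacing poll-by-poll noise with a common set of facts can be illustrated by comparing individual polls with a simple aggregate. This is not Wang’s actual meta-analysis; it is a hypothetical sketch assuming ten independent polls of the same race, each with about 3 points of noise around a true 2-point lead. Any single poll bounces around enough to generate a “new” horse-race story every day, while the median of the ten stays much closer to the truth.

```python
import random
import statistics

TRUE_MARGIN = 2.0    # hypothetical true lead, in points
POLL_NOISE_SD = 3.0  # hypothetical per-poll noise, in points

def simulate_polls(n_polls: int, rng: random.Random) -> list[float]:
    """Draw n_polls independent noisy readings of the true margin."""
    return [rng.gauss(TRUE_MARGIN, POLL_NOISE_SD) for _ in range(n_polls)]

rng = random.Random(1)
trials = [simulate_polls(10, rng) for _ in range(5_000)]

# Average absolute error of one poll vs. the median of ten polls.
single_error = statistics.mean(abs(t[0] - TRUE_MARGIN) for t in trials)
median_error = statistics.mean(abs(statistics.median(t) - TRUE_MARGIN) for t in trials)

print(f"Average error, single poll:        {single_error:.2f} points")
print(f"Average error, median of 10 polls: {median_error:.2f} points")
```

The median is used here because it is robust to a single outlier poll; a plain mean would shrink the error similarly for well-behaved noise like this.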

In short, if Brooks wants to move away from checking polls all the time, he should be supporting more statistical modeling, and we should hope for more people like Nate Silver and Sam Wang to produce models that can be tested and improved over time. And we should defend statistical models, because confusing uncertainty and variance with “oh, we don’t know anything, it could go any which way” does a disservice to important discussions we should be having on many topics.

well done.

Well said!

Certainly correct and to the point. However, it seems to me that it is overkill. Two points:

[1] If Mr. Silver were showing a 70% probability of a Rmoney victory, there would be no criticism whatsoever from either Bobo or Scarborough. Likely, we would be treated to some OT misquote from Napier or Farr by Bobo that he thought raised the 70% to the level of scripture.

[2] Bobo has been at pains to demonstrate his innumeracy over several years and both men are technologically illiterate.

In short, neither is worthy of serious thought or argument, as neither is interested in fact or capable of understanding the argument.
