The Plebian's Guide to Polls - Margin of Error

Welcome back to The Plebian's Guide to Polls.  In last week's issue, "The Four Polls", we addressed key distinctions among the types of polls that are conducted.  Today's issue, "Margin of Error", will be the first in a number of entries looking at the mechanics of polls and what they really mean.

---------------------------------------------------

Someone told me, once, that if you're at a cocktail party and you admit to being a pollster, the one question you're sure to hear is, "So what does that whole 'Margin of Error' thing really mean, anyway?"

The Margin of Error (or MoE as I like to call it - to the sure amusement of my Japanese friends) is one of those concepts where all too many people understand it none too well, and none too many people understand it all too well.  Before we examine what the MoE really means, let me run through a few common myths about it.  I'll use political polling as the standard example for this discussion since, as I discussed last time, political polling is the area in which accuracy is most critical.

  • (MYTH): If the difference between two candidates is smaller than the margin of error, then for all intents and purposes, those candidates are tied.  (FACT): Whichever candidate is ahead in the polls is probably ahead in reality, no matter how small the difference.
  • (MYTH): A candidate's real support can never be more than ±MoE away from the numbers reported in a poll.  (FACT): The MoE is an arbitrary value used to express what's likely and unlikely, not what's possible and impossible.
  • (MYTH): There is a 5% chance that the numbers in any poll are "wrong" - i.e. the true score is more than ±MoE away from the poll results.  (FACT): MoE fails to account for significant sources of error in polling estimates that may seriously affect the accuracy of the results.

So what is the Margin of Error?  Put simply, MoE is a measure of how variable the estimates in any poll are.  The more people included in the poll, the less variability there is in the numbers the poll generates.  Mind you, there's a very big difference between 'variability' and 'accuracy' (which I'll address in just a minute).
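To put rough numbers on "more people, less variability", here is a minimal sketch in Python.  It assumes a simple random sample and the usual normal approximation for a proportion, which is my assumption and not something every pollster uses as-is.  The function name margin_of_error and the sample sizes are made up for illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a share p estimated from n respondents.

    Assumes a simple random sample and the normal approximation.
    p=0.5 is the worst case (widest margin); z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical sample sizes, chosen only for illustration.
for n in (100, 400, 1000, 2500):
    print(f"n = {n:>4}: MoE = ±{margin_of_error(n) * 100:.1f} points")
```

Because the sample size sits under a square root, you have to quadruple the number of respondents to cut the margin in half.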

A poll is trying to estimate the true opinions of the population at large, by looking at a small sample of people from that population.  Like I said before, a poll is only ever an estimate.  For political candidates, it's an estimate of how well those candidates are doing among voters.  The only way to MEASURE how the candidates are doing is by voting (like on Election Day).  But voting is a big, expensive, time-consuming procedure - so usually we want to estimate the true opinion rather than trying to measure it.

The following graphic, shamelessly lifted from Wikipedia, helps show the relationship between sample size (the number of people in your poll) and variability (how sure you can be of the estimates you get with your poll).

As you can see, the more people in the poll, the narrower the curve becomes.  The curve represents the confidence we have in a poll's estimates.  When the curve dips below a certain point, we consider it unlikely that the true score is that far away from our estimate - so we are most confident that the true score lies where the curve is highest.  The interval between the two sides where the curve dips low is called the confidence interval: the interval in which we're confident the true score lies.  The true score might still be way out to one side or the other, but this is unlikely given our polling data.

(For the math nerds out there, most confidence intervals are drawn so that we think there's a 95% chance the true value is within that range from our estimated value.  That means we pick two points on the low ends of the curve so that 95% of the area under the curve is between those two points.  Our estimated value, of course, sits right at the middle of the confidence interval; it's the single point we're most confident about.)

The Margin of Error is the distance from our estimated value - at the middle of the interval - out to one end of the interval.  So if we take a poll and it estimates Obama has 51% support with an MoE of ±3 points, then we're 95% confident that the REAL value, if we could MEASURE his support, would fall between 48% and 54%.
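The arithmetic in that example can be sketched in a few lines of Python.  The helper names confidence_interval and sample_size_for are hypothetical, and the sample-size figure assumes a simple random sample, the normal approximation, and the worst case of 50% support, none of which a real poll is guaranteed to satisfy.

```python
import math

def confidence_interval(estimate, moe):
    """Return (low, high): the estimate minus and plus the margin of error."""
    return estimate - moe, estimate + moe

def sample_size_for(moe, p=0.5, z=1.96):
    """Smallest simple random sample giving at most this margin (normal approximation)."""
    return math.ceil((z / moe) ** 2 * p * (1 - p))

low, high = confidence_interval(0.51, 0.03)
print(f"95% interval: {low:.0%} to {high:.0%}")
print(f"Respondents needed for ±3 points: about {sample_size_for(0.03)}")
```

Under those assumptions, a ±3 margin corresponds to a poll of a little over a thousand respondents.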

What's important to remember in all this is that the MoE is a measure of confidence, not surety.  I personally have a very strong dislike for reporting the Margin of Error, because it tricks some people into a sense of false equivalency.  All the MoE tells you is where the borders of the 95% confidence interval lie.  What if we feel okay with being 90% confident?  Or 75% confident?  What if we want to be 99.99% confident?
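To see how much the "margin" moves when you pick a different confidence level, here is a rough Python sketch using the standard library's NormalDist.  The sample size of 1,000 is hypothetical, and the calculation again assumes a simple random sample and the normal approximation.

```python
import math
from statistics import NormalDist

def z_for(confidence):
    """Two-sided critical value of the standard normal for a confidence level in (0, 1)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def margin_of_error(n, confidence, p=0.5):
    """Margin of error at the given confidence level (simple random sample assumed)."""
    return z_for(confidence) * math.sqrt(p * (1 - p) / n)

n = 1000  # hypothetical sample size
for confidence in (0.75, 0.90, 0.95, 0.9999):
    print(f"{confidence:.2%} confident: ±{margin_of_error(n, confidence) * 100:.1f} points")
```

Same poll, same data, four different margins.  The only thing that changed is how confident we asked to be.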

The MoE is an arbitrary choice about what is and isn't likely.  The simple truth is, whoever is up in the polls is always more likely than not to win an election, if it's held right then.  That's what it means to be ahead.  Margin of Error is one way of stating our confidence in that prediction, but it's not the only way.  I personally prefer to see the curves, like those I showed above, so you can get a sense for the amount of variability in your estimates without simplifying everything down to a single number.
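For a rough feel of what "more likely than not" looks like, here is a sketch in Python.  The function prob_leader_ahead is hypothetical, and it leans on several assumptions of mine: a simple random sample, no methodological bias, and a normal approximation for the gap between two shares measured in the same poll.  Treat the output as a ballpark, not a forecast.

```python
import math
from statistics import NormalDist

def prob_leader_ahead(p_a, p_b, n):
    """Approximate chance that candidate A truly leads B, given poll shares p_a and p_b.

    Assumes a simple random sample of n respondents, no methodological bias,
    and a normal approximation for the difference between the two shares.
    """
    diff = p_a - p_b
    # Standard error of the difference between two shares from the same sample.
    se = math.sqrt((p_a + p_b - diff ** 2) / n)
    return NormalDist().cdf(diff / se)

# Hypothetical poll: 51% vs 49% among 1,000 respondents (a lead well inside a ±3 MoE).
leader_chance = prob_leader_ahead(0.51, 0.49, 1000)
print(f"Chance the leader is really ahead: {leader_chance:.0%}")
```

Under these assumptions a 2-point lead comes out at roughly a three-in-four chance of being real: nowhere near certain, but a long way from a coin flip.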

In polling, MoE also misses out on a lot of other errors that can decrease the accuracy of any estimate.  Here I'm talking about accuracy, not variability.  Variability is based on how many people you sample.  Accuracy is based on your variability, but also on whether you made good choices in how to sample for your poll and how to analyze the data.  Political polls show an almost constant bias, based on the pollsters conducting the polls.  Some polls score systematically better for Democrats than Republicans, and vice-versa.  This is a methodological error, an error in how the poll was constructed or conducted, that misrepresents the true scores for what the poll is estimating.  MoE is a measure of accuracy if and only if a poll has NO methodological error - if and only if the pollster's assumptions about racial demographics and voter turnout are perfect.  If there is methodological error, then this will compound any error in estimation from the MoE.
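A small simulation makes the gap between variability and accuracy concrete.  This is only a sketch: the function coverage, the 50% true support, and the 3-point bias are all made-up values, and the "bias" is modeled in the simplest way I could think of, by having the poll sample from a population that leans the wrong way.

```python
import math
import random

def coverage(true_support, bias, n=1000, trials=1000, seed=0):
    """Fraction of simulated polls whose 95% interval contains the true support.

    `bias` is a hypothetical methodological error: the poll effectively samples
    from a population whose support is true_support + bias.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each respondent says "yes" with the (possibly biased) probability.
        yes = sum(rng.random() < true_support + bias for _ in range(n))
        p_hat = yes / n
        moe = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
        if abs(p_hat - true_support) <= moe:
            hits += 1
    return hits / trials

unbiased = coverage(0.50, 0.00)
biased = coverage(0.50, 0.03)
print(f"no bias:      {unbiased:.0%} of intervals contain the truth")
print(f"3-point bias: {biased:.0%} of intervals contain the truth")
```

With no bias, the intervals should catch the true value close to 95% of the time, as advertised.  With the bias, the reported MoE is exactly the same size, but the intervals miss far more often.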

So the take-home lesson about the Margin of Error is this: it sums up how much variability there is in the sample estimates being reported.  MoE doesn't consider problems in polling methodology.  It gives only a crude representation of the variability in polling data.  Whoever is ahead in the polls is, more likely than not, ahead in reality as well.  But MoE is still better than nothing.  It gives us some understanding of the spread of polling data.  We just need to remember that all the MoE really tells us is which values we're 95% confident could be reality, based on our polling data.