The Plebian's Guide to Polls - Why We Poll

Welcome back to The Plebian's Guide to Polls.  In last week's issue, "A Brief History of Polls", we discussed the origins of modern opinion polling and the start of the Gallup poll, the first scientifically sophisticated polling company.  Today's issue, "Why We Poll", will address the fundamental reasons why polling provides such a valuable service to the political community.

Why We Poll

I was somewhat disturbed this morning to see the following comment attached to one of my Daily Kos posts about poll sampling problems:

Best way to void this discussion
is for more of us to randomly lie to pollsters. Then the results of all polls will be assumed to be useless and we will be allowed to live our lives without the daily horserace results that have descended from tiresome to insipid.

If you have a love for good data, I'm sure this comment makes your teeth hurt.  But the fact of the matter is, many people don't understand the benefits of polling.  Simply put, polling provides us with information about what people think.  In fact, well-executed polling is the only practical, scientific method for learning what a wide variety of geographically diverse people think.  Voting is the ultimate form of opinion polling.

Today, I'm not going to touch on why we care about what other people think.  Although this philosophical point sits at the heart of polling, it also sits at the heart of democracy.  Essentially, anyone who believes in the sovereignty of the people should believe in the importance of knowing what the people think.

There are those who, I am sure, would be content to ask the opinions of a couple of "representative individuals" in lieu of polling.  This doesn't work for the same reason that case studies don't work in real science.  Let's come up with an example to demonstrate.

Let's say you want to know what people think about a presidential candidate.  Imagine a spectrum of opinions - from very negative opinions to very positive opinions - such that everyone's opinion about the candidate can be represented on this spectrum.  Now go out and ask someone what they think, and record their answer somewhere on your good-and-bad opinions spectrum.

You've just collected one data point.  This is now your best estimate of what people think about the candidate.  But how good an estimate is it?  Say the first person you talk to is positively in love with the candidate.  Does this mean everyone else loves the candidate too?  To find out, you interview three more people.

It turns out that all three of the new interviewees have very negative views of the candidate.  You record their answers on your opinion spectrum, but now you find that most of the opinions you have are negative.  The "average opinion" is now very different from what it was when you had only one interviewee.

More generally, any single individual can have a good or bad opinion of a candidate.  That opinion will be influenced by a number of factors: how much information the individual has about the candidate, what the candidate's policies will do to help or hurt that individual, friends' perceptions of the candidate, etc.  We can never predict the precise opinions of a single individual about a candidate, a policy, or anything else.  Individual opinions are irreducibly complex.

We find, however, that the more opinions we collect, the better sense we have of what most people think.  In essence, the more opinions - the more data - we already have, the less the distribution of that data will change when we add one or two new opinions to those we've already recorded.  In our example above, when we only had one very good impression to go on, adding three very negative impressions changed the picture of public opinion drastically.  If we had one thousand very good impressions, three very negative ones would hardly move the distribution at all.
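For readers who like to see the arithmetic, here is a minimal sketch of that comparison.  The scoring scale (-1.0 for a very negative opinion, +1.0 for a very positive one) and the function name mean_shift are my own assumptions for illustration, not part of any polling standard.

```python
def mean_shift(existing, new):
    """Return how far the average opinion moves when `new` opinions join `existing`."""
    before = sum(existing) / len(existing)
    combined = existing + new
    after = sum(combined) / len(combined)
    return after - before

# Hypothetical scale: -1.0 is a very negative opinion, +1.0 is a very positive one.
three_negatives = [-1.0] * 3

# One glowing opinion on record, then three very negative ones arrive.
shift_small = mean_shift([1.0], three_negatives)

# One thousand glowing opinions on record, then the same three negatives arrive.
shift_large = mean_shift([1.0] * 1000, three_negatives)

print(f"1 opinion on record:    average moves by {shift_small:+.3f}")
print(f"1000 opinions on record: average moves by {shift_large:+.3f}")
```

On this made-up scale, the one-opinion average falls by 1.5 (from +1.0 to -0.5), while the thousand-opinion average moves by only about 0.006.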

The size of a polling sample is intimately related to a poll's reported margin of error.  We will address this topic at length in another issue a few weeks from now.  The most important thing to know is that the stability of a poll's estimates is proportional to the square root of the sample size - the number of data points (or individual opinions) included in the sample.  Put another way, you have to quadruple the sample just to cut the margin of error in half.  The more opinions you include, the more stable your results - but beyond a certain point, it becomes increasingly hard to stabilize the estimate any more.  A one-opinion sample is very, very unstable.  A ten-opinion sample is better (though still not good), and a 1000-opinion sample is considered about standard in the polling industry.  A 10,000-opinion sample, however, requires ten times as many interviews as the 1000-opinion sample and only narrows the margin of error by a factor of about three.
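The square-root relationship can be sketched with the standard textbook approximation for the 95% margin of error on a percentage.  This is only a rough illustration under assumptions of my own choosing: a simple random sample, the worst case of a 50/50 split (p=0.5), and a z-value of 1.96.  The function name margin_of_error is hypothetical, and the approximation is crude for tiny samples like one or ten opinions.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error (as a fraction) for a simple random sample of size n.

    Assumes the normal approximation to a proportion; p=0.5 is the worst case.
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (1, 10, 1000, 10000):
    print(f"n={n:>6}: +/- {100 * margin_of_error(n):.1f} points")
```

With these assumptions, the 1000-opinion sample comes out at roughly plus or minus 3.1 points and the 10,000-opinion sample at roughly plus or minus 1.0 point: ten times the interviews for about a third of the error.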

A quick aside, here: I'd like to recall your attention to the Daily Kos comment I quoted above, where the author suggested that we all lie to pollsters.  This cuts to the heart of the need for stability in polling estimates.  There are always one or two people who will happily lie to pollsters.  In a nicely stable 1000-opinion sample, one or two bad data points (lies) won't have much effect on the final outcome.  In a ten-opinion or 100-opinion sample, however, these bad data points carry a lot more influence.  Stability is critical to any estimate of public sentiment.
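To put rough numbers on that, here is a back-of-the-envelope sketch.  It assumes the simplest possible case, a yes/no question where every liar's answer lands on the wrong side, and the function name max_shift_from_liars is hypothetical.

```python
def max_shift_from_liars(sample_size, liars):
    """Largest possible change, in percentage points, if `liars` respondents flip a yes/no answer."""
    return 100.0 * liars / sample_size

# Two liars in samples of different sizes.
for n in (10, 100, 1000):
    shift = max_shift_from_liars(n, liars=2)
    print(f"sample of {n:>4}: two liars can move the result by up to {shift:.1f} points")
```

Two liars can swing a ten-opinion sample by up to 20 points, a 100-opinion sample by up to 2 points, and a 1000-opinion sample by no more than 0.2 points.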

Let's turn our attention back to the underlying reason for this discussion now.  Why do we poll?  The answer is, we poll to get a stable estimate of what people actually think about a topic.  Asking one or two people on the street gives us an idea of what people think, but the sample size is too small to provide a stable estimate.

Asking all our friends what they think gives a better sample size, but runs into the same methodological problems as the Literary Digest poll I mentioned last week.  If your sample isn't representative of the population at large, you won't come to a good understanding of what most people think.  You'll have a very good understanding of what your friends think, and perhaps other people like your friends, but you'll have no information about what people who AREN'T like your friends think.  The recognition of this problem is what let George Gallup successfully predict the outcome of the 1936 election, when the Literary Digest's (very stable, but nonetheless unrepresentative) poll of 2,300,000 individuals could not.
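A small simulation can illustrate why a huge unrepresentative sample loses to a modest representative one.  Every number below is invented for the sake of the example (the share of "people like your friends", the support levels in each group, and the sample sizes), and none of it is drawn from the 1936 polls.

```python
import random

# Hypothetical population: 30% are "like your friends" and 80% of them support
# the candidate; the other 70% support the candidate at only 40%.
FRIEND_SHARE = 0.30
FRIEND_SUPPORT = 0.80
OTHER_SUPPORT = 0.40
TRUE_SUPPORT = FRIEND_SHARE * FRIEND_SUPPORT + (1 - FRIEND_SHARE) * OTHER_SUPPORT

def friends_only(rng):
    """Draw one respondent from the friend-like group only (a biased sample)."""
    return rng.random() < FRIEND_SUPPORT

def whole_population(rng):
    """Draw one respondent at random from the entire population."""
    is_friend_like = rng.random() < FRIEND_SHARE
    support = FRIEND_SUPPORT if is_friend_like else OTHER_SUPPORT
    return rng.random() < support

def poll(sample_size, pick_respondent, rng):
    """Return the share of sampled respondents who support the candidate."""
    supporters = sum(pick_respondent(rng) for _ in range(sample_size))
    return supporters / sample_size

rng = random.Random(1936)  # fixed seed so the run is repeatable
big_biased = poll(200_000, friends_only, rng)
small_random = poll(1_000, whole_population, rng)

print(f"true support:              {TRUE_SUPPORT:.1%}")
print(f"200,000 friends-only poll: {big_biased:.1%}")
print(f"1,000 random-sample poll:  {small_random:.1%}")
```

In this made-up population the true level of support is 52%.  The enormous friends-only poll settles very precisely on a number near 80%, which is the wrong answer, while the much smaller random sample lands within a few points of the truth.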

That's it for this week's installment of The Plebian's Guide.  I'll be back next week with a discussion of what I consider four key types of polls: political polls, policy polls, parse polls, and push polls.  Each of these four types has its own distinct uses.  We'll go over what differentiates them and what we can learn from each type of poll.