Building a Voter File: News You Can Use
Editor's note: Today's post is part 1 of a special two-day Building a Voter File extravaganza! Quite frankly, this topic is just too awesome to tackle in one entry, so check back tomorrow for the thrilling conclusion.
Nate Silver had an interesting conjecture recently, in a post about how exactly Norm Coleman got his list of "improperly rejected ballots" in the never-ending Minnesota Senate race:
What I suspect Coleman did to come up with his list of 650 is something like this:
- Create a database of all ballots that were rejected for a non-matching signature ... maybe there were 1500 of these or something statewide.
- Run some algorithm to determine the likelihood of each of these 1500 ballots being a vote for Coleman as based on things like the precinct the ballot was cast in, any information Coleman has about the voter in his voter file, and perhaps even the voter's name (you can tell more than you'd think about someone based on their first and last name).
- All ballots determined by this algorithm to have a >50% likelihood of being Coleman votes were included on his list ... there turned out to be about 650 of these.
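The three steps Silver describes can be sketched in a few lines of Python. This is only an illustration of the shape of such a process, not Coleman's actual method: the `RejectedBallot` fields, the equal weighting in `coleman_likelihood`, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RejectedBallot:
    ballot_id: str
    # Hypothetical input: Coleman's share of the two-way vote in the precinct (0 to 1).
    precinct_coleman_share: float
    # Hypothetical input: a voter-file support score (0 to 1), or None if we know nothing.
    voter_file_score: Optional[float] = None


def coleman_likelihood(ballot: RejectedBallot) -> float:
    """Estimate the chance a ballot is a Coleman vote.

    Assumption: an equal-weight blend of precinct and voter-file signals.
    A real model would fit these weights from data.
    """
    if ballot.voter_file_score is None:
        return ballot.precinct_coleman_share
    return 0.5 * ballot.precinct_coleman_share + 0.5 * ballot.voter_file_score


def build_list(ballots: List[RejectedBallot], threshold: float = 0.5) -> List[str]:
    """Keep only ballots scored strictly above the threshold (Silver's >50%)."""
    return [b.ballot_id for b in ballots if coleman_likelihood(b) > threshold]


# Made-up ballots, purely for illustration.
sample = [
    RejectedBallot("A", 0.60),
    RejectedBallot("B", 0.40, 0.70),
    RejectedBallot("C", 0.45, 0.30),
    RejectedBallot("D", 0.50),
]
selected = build_list(sample)
```

Note that ballot "D" sits exactly at 50% and is left off, since the cutoff in Silver's description is "greater than," not "at least."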
Silver's conjecture is an interesting example of how voter files are used in the real world. So let's throw it out there: given only the information on a modern voter file, what could we use to predict how a person would vote?
The obvious place to start is any contact we've had with the voter. If they maxed out to Coleman and gave nothing to Franken, we can count on their vote. If they told our canvasser they were a strong Coleman supporter, that's good too. Direct contact with the voter trumps all other sources of information about their intentions.
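One way to express "direct contact trumps everything else" is as a simple override rule. The sketch below is an assumption about how that might look, not a description of any campaign's system: the function name, the response labels, and the choice to check canvass responses before donations are all hypothetical.

```python
from typing import Optional


def support_estimate(
    canvass_response: Optional[str],
    gave_to_coleman: bool,
    gave_to_franken: bool,
    modeled_score: float,
) -> float:
    """Return an estimated probability (0 to 1) that the voter backs Coleman.

    Direct contact, when we have it, overrides the modeled score.
    Treating it as certainty (1.0 or 0.0) is a simplification.
    """
    # What the voter told a canvasser (hypothetical labels).
    if canvass_response == "strong_coleman":
        return 1.0
    if canvass_response == "strong_franken":
        return 0.0
    # One-sided donation history.
    if gave_to_coleman and not gave_to_franken:
        return 1.0
    if gave_to_franken and not gave_to_coleman:
        return 0.0
    # No usable direct contact: fall back to whatever the model says.
    return modeled_score


donor = support_estimate(None, True, False, modeled_score=0.2)
uncontacted = support_estimate(None, False, False, modeled_score=0.2)
```

Here the donor is scored as a Coleman voter despite a low modeled score, while the uncontacted voter keeps the model's 0.2.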
But that's not going to be enough. After all, nationwide only 33% of voters were contacted by a campaign. Obama got roughly 69 million votes, but had only about 3 million donors. What else can we use to predict a vote?
To answer that, let's think about what predicts voting patterns more broadly. The easiest predictor to work with is race, so I'll focus on that in this particular case--there will be discussion of other factors tomorrow. Unfortunately, we don't have a single reliable source for race in Minnesota. Some states, mainly in the South, record race on the voter file thanks to the Voting Rights Act. Minnesota does not. So let's see if we can approach this from a different angle.
First of all, we could use commercial data appends to estimate race. Assuming we've purchased good data, this is likely to be accurate. But it might also be incomplete, especially in a lily-white state like Minnesota. The next step might be looking at names and location to estimate race--for example, Minneapolis is home to a large community of Somali immigrants. Think we can guess who they are by name?
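The fallback order described here (use the commercial append when it exists, otherwise estimate from name and location) can be sketched as below. Everything in it is an assumption for illustration: the record keys, the `name_location_model` callable, and the stub model are hypothetical, and no real name or location statistics are used.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signature: takes a voter record, returns a label or None.
Estimator = Callable[[dict], Optional[str]]


def estimate_race(record: dict, name_location_model: Estimator) -> Tuple[Optional[str], str]:
    """Return (estimate, source) so later steps know how much to trust it."""
    # Step 1: prefer the purchased commercial append when it is present.
    appended = record.get("commercial_race")
    if appended:
        return appended, "commercial_append"
    # Step 2: otherwise fall back to a name-and-location estimate.
    guess = name_location_model(record)
    if guess is not None:
        return guess, "name_location_model"
    # Step 3: admit we don't know rather than force a guess.
    return None, "unknown"


def stub_model(record: dict) -> Optional[str]:
    """Placeholder standing in for a real name/location model."""
    if record.get("city") == "Minneapolis" and record.get("surname"):
        return "model_estimate"
    return None


with_append = estimate_race({"commercial_race": "group_a", "city": "Duluth"}, stub_model)
from_model = estimate_race({"surname": "Example", "city": "Minneapolis"}, stub_model)
no_data = estimate_race({"city": "Duluth"}, stub_model)
```

Returning the source alongside the estimate is a design choice: a guess from names and location is shakier than a purchased append, and whatever consumes the estimate should be able to tell the two apart.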
Bear in mind, this is the thought process behind only one variable that we'll use to project likely vote. Tomorrow, we'll put it all together and really see what Nate is talking about.













