Statistics for Political Bloggers
by ohwilleke
Mon Nov 24, 2003 at 10:46:54 PM PDT
Disclaimer: The most important kind of survey error comes from picking a sample that is in some way skewed or biased. Essentially all the surveys you see in newspapers, press releases and blogs are telephone surveys. Pollsters do their best, but non-answers, unlisted numbers, people without phones, people with multiple phones and more can skew these samples. Also, sometimes people lie to pollsters. None of this is captured by the analysis below.
The analysis below looks at a basic problem. You have a target group. For the general electorate, this is typically likely voters. For a Democratic primary, it is typically likely primary voters or caucus attenders, as the case may be. There are millions of these critters out there. The survey typically asks 300 to 1500 people in the relevant group some questions. The questions we care most about ask respondents to choose from a list of candidates for a particular office.
No matter how perfect your sampling methods are, random chance will cause the 300 to 1500 people you pick to prefer the candidates in proportions that differ somewhat from those of the entire target population. But it is possible to show, with some fancy mathematics, that as your sample gets larger, your results will look more and more like the general population. Random fluke results tend to average out in larger surveys. The same mathematics also shows that if a survey of the same population were repeated many times, the results would cluster around a single value (the true population figure), and that results far from that value would be quite unlikely.
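To see this averaging-out effect without the fancy mathematics, here is a minimal simulation sketch. It assumes a simple random sample in which each respondent independently backs the candidate with a fixed probability; the 32% "true" share is borrowed from the Dean example below, and the function name `simulate_polls` is hypothetical.

```python
import random
import statistics

def simulate_polls(true_share, sample_size, trials, seed=0):
    """Return the candidate's share in each of `trials` simulated polls."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    shares = []
    for _ in range(trials):
        # Each respondent independently backs the candidate with probability true_share.
        hits = sum(1 for _ in range(sample_size) if rng.random() < true_share)
        shares.append(hits / sample_size)
    return shares

for n in (100, 400, 1500):
    shares = simulate_polls(0.32, n, trials=1000)
    # Print sample size, average result, and spread (standard deviation) of results.
    print(n, round(statistics.mean(shares), 3), round(statistics.stdev(shares), 3))
```

The average result should stay near 0.32 for every sample size, while the spread of the results narrows as the sample grows.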
The flukiness of survey results that comes from random differences between the sample and the population is very well defined mathematically. When the target population is much larger than the survey sample, only two formulas really matter, plus a couple of corollaries.
Every individual result in a survey has its own margin of error. For example, in a survey with a sample size of 408, if Dean polls 32%, the margin of error of this result at the 95% confidence level is 4.5%. By popular convention, the margin of error is reported at the 95% confidence level, which means that if the survey were repeated over and over again, 95% of the results would fall within the margin of error of the true population value.
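The "95% of results" claim can be checked by brute force. This is a minimal sketch, not a pollster's method: it assumes Dean's true support really is 32%, simulates many independent 408-person surveys, and counts how often each result lands within the margin of error. For simplicity it computes the margin from the true share rather than from each simulated poll's own result. The function name `coverage` is hypothetical.

```python
import math
import random

def coverage(true_share, sample_size, trials, z=1.96, seed=1):
    """Fraction of simulated polls landing within the margin of error of true_share."""
    moe = z * math.sqrt(true_share * (1 - true_share) / sample_size)
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    inside = 0
    for _ in range(trials):
        # One simulated survey: count respondents who back the candidate.
        hits = sum(1 for _ in range(sample_size) if rng.random() < true_share)
        if abs(hits / sample_size - true_share) <= moe:
            inside += 1
    return inside / trials

print(coverage(0.32, 408, trials=5000))  # roughly 0.95
```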
This convention is arbitrary. A 95% confidence level corresponds to results within 1.96 standard deviations of the "mean" result. A 99% confidence level corresponds to results within 2.576 standard deviations of the "mean" result. A 90% confidence level corresponds to results within 1.645 standard deviations of the "mean". A single standard deviation from the mean is a 68% confidence level: the results are within that range about two-thirds of the time. Since the 95% margin of error is about two standard deviations, a useful rule of thumb is that two-thirds of the time, a survey result will be within half the margin of error.
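If you want to find the Z value for some other confidence level, Python's standard library can do it. This sketch uses `statistics.NormalDist` (a standard normal curve with mean 0 and standard deviation 1) and assumes a two-sided interval, as in the margin of error convention.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

for level in (0.90, 0.95, 0.99):
    # Two-sided interval: leave (1 - level) / 2 of the probability in each tail.
    z = std_normal.inv_cdf(1 - (1 - level) / 2)
    print(f"{level:.0%} confidence: z = {z:.3f}")

# Share of results within one standard deviation of the mean.
print(f"within 1 SD: {std_normal.cdf(1) - std_normal.cdf(-1):.1%}")

# Share within half the 95% margin of error (1.96 / 2 = 0.98 SD), the rule of thumb above.
print(f"within half the MOE: {std_normal.cdf(0.98) - std_normal.cdf(-0.98):.1%}")
```

The last two lines both come out at roughly two-thirds, which is where the rule of thumb comes from.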
The margin of error formula for large target population sizes is as follows:
MOE=Z*SQRT(P*(1-P)/N)
Where MOE is the margin of error at the confidence level corresponding to the chosen Z, Z is the number of standard deviations that defines that confidence level (1.96 for 95%), P is the percentage result expressed as a decimal, and N is the sample size. SQRT means "square root of" and is the key that looks like a checkmark on your calculator. Hence, in the Dean example above, it looks like this:
MOE=1.96*SQRT(.32*(1-.32)/408) which is 4.5%.
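The same formula, written as a small function, saves punching it into a calculator each time. This is a sketch assuming, as the formula does, a simple random sample from a target population much larger than the sample; the name `margin_of_error` is hypothetical.

```python
import math

def margin_of_error(p, n, z=1.96):
    """MOE = Z * SQRT(P * (1 - P) / N), with p as a decimal and n the sample size."""
    return z * math.sqrt(p * (1 - p) / n)

dean = margin_of_error(0.32, 408)
print(f"{dean:.1%}")  # 4.5%

# The same result at the 99% confidence level (Z = 2.576) has a wider margin.
print(f"{margin_of_error(0.32, 408, z=2.576):.1%}")
```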
When the margin of error for an entire survey is presented, the "P" figure used is 50%, which is the point at which a survey is least accurate and hence gives a conservative estimate. The margin of error for individual results is generally lower. At a given confidence level, the MOE of a survey is purely a function of sample size. It is as follows:
Survey Size MOE
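A table like this can be regenerated for any sample sizes you care about. The sketch below plugs P = 0.5 into the margin of error formula at the 95% level; the sample sizes listed are illustrative picks spanning the ranges discussed in this diary, and the function name `survey_moe` is hypothetical.

```python
import math

def survey_moe(n, z=1.96):
    """Whole-survey margin of error, using the conservative P = 0.5."""
    return z * math.sqrt(0.5 * 0.5 / n)

# Illustrative sample sizes, from small subsamples up to a very large survey.
for n in (100, 300, 400, 600, 1000, 1500, 50000):
    print(f"{n:>6}  {survey_moe(n):.1%}")
```

Because N sits under a square root, quadrupling the sample size only cuts the margin of error in half, which is why pollsters rarely go much beyond 1500 people.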
Most political surveys are conducted with samples of 400-1500. Subsamples are often 100-300 in size. The largest survey I use on a regular basis is the American Religious Identification Survey, which has a sample size of 50,000 and subsamples of 1,000.
Now on to the issue of comparing two results.