We Cannot Weight Our Way Out of Bad Data
How weighting improves accuracy in survey research, and why it can’t fix bad data
This week I’ve gone through a lot of resources on the theoretical assumptions and trade-offs involved in proper weighting in survey research. Let’s have a look!
In the world of statistics and survey research, weighting is a common technique used to ensure that the data collected more accurately represents the target population. When conducting surveys, researchers often encounter situations where certain groups are over-represented or under-represented in the sample compared to the actual population. Weighting helps correct these imbalances by assigning different levels of influence, or "weights", to responses from different groups, making the overall results more reflective of the population's true characteristics.
Weighting makes polls much, much more accurate than they would otherwise be. However, while weighting can be a powerful tool, it is not a magic fix-all solution. Weighting can only address specific, known statistical biases1 within the data, and it relies heavily on the assumption that the adjustments made truly reflect the underlying population dynamics. When data is fundamentally flawed – due to issues like poor survey design, inaccurate responses, or severe sampling errors – weighting alone cannot rectify these deeper problems. In such cases, the limitations of weighting become apparent, reminding us that good data quality starts with careful planning and execution of the survey process itself. This entry goes through the specific reasons why we can't simply weight our way out of bad data, discussing the boundaries of what weighting can – and cannot – fix.
What does weighting do?
To illustrate how weighting works in survey research, consider a simple example: imagine a survey is conducted to gauge public opinion on a new policy, but the sample ends up with 70% female respondents and only 30% male respondents, while the actual population is evenly split between men and women. To correct for this imbalance, researchers would apply a weight to male responses to increase their influence in the analysis and reduce the weight of female responses accordingly. If each female response is given a weight of about 0.71 (50/70) and each male response a weight of about 1.67 (50/30), the adjusted sample would reflect the 50/50 gender distribution of the broader population.
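For the programmatically inclined, here is a minimal sketch of that adjustment in Python. The sample sizes and the assumed support levels per gender are invented for illustration; the point is only the mechanics of computing post-stratification weights and comparing the raw and weighted estimates.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sample: 70% women, 30% men, with assumed support levels of
# 60% among women and 40% among men (both figures invented for illustration).
gender = np.array(["F"] * 700 + ["M"] * 300)
support = np.concatenate([rng.random(700) < 0.60,
                          rng.random(300) < 0.40])

# Post-stratification weight = population share / sample share for each group
population_share = {"F": 0.50, "M": 0.50}
sample_share = {g: np.mean(gender == g) for g in ("F", "M")}
weights = np.array([population_share[g] / sample_share[g] for g in gender])

print(f"Unweighted support: {support.mean():.3f}")                        # pulled towards women
print(f"Weighted support:   {np.average(support, weights=weights):.3f}")  # reflects the 50/50 split
```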
When it comes to an opinion poll, we can identify a few kinds of errors it might have. Chief among these is so-called sampling error, perhaps the best known of them and a necessary feature of any opinion poll. It happens when the sample is not quite in line with the known demographic proportions of the whole population, as in the example above2. Rectifying this is what weighting is especially equipped for. If certain segments of the population are over-sampled (such as women in our above-discussed national survey), their responses might disproportionately influence the results. By applying weights, researchers can adjust these over-represented groups so that their impact on the final analysis aligns with their actual (known) share in the population.
Another frequent use of weighting is to correct for non-response error. In surveys, not all selected participants respond, and the characteristics of those who do respond can differ from those who do not. For instance, if younger people are less likely to participate in a survey than older individuals, without adjustment, the survey results might skew towards older demographics. Weighting helps to balance this by giving more influence to the under-represented younger respondents, thus ensuring that the survey results more accurately mirror the demographic makeup of the entire population.
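A common way to express this kind of non-response adjustment is to weight each respondent by the inverse of the response rate in their group, a so-called weighting-class adjustment. A minimal sketch, with entirely made-up age groups and response rates:

```python
# Hypothetical weighting-class non-response adjustment: each respondent is
# weighted by the inverse of their group's response rate (figures invented).
invited   = {"18-34": 1000, "35-54": 1000, "55+": 1000}
responded = {"18-34": 200,  "35-54": 450,  "55+": 800}

nonresponse_weight = {g: invited[g] / responded[g] for g in invited}
print(nonresponse_weight)
# {'18-34': 5.0, '35-54': 2.22..., '55+': 1.25} -- each young respondent now
# "stands in" for five invitees, restoring the intended age balance.
```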
One suspected example of such non-response effects was the “shy Trump voters” theory in the opinion polling of the 2016 U.S. presidential election, implying that respondents were, for one reason or another, hesitant to tell the interviewer their stance. This theory was, however, convincingly put to rest by the statistical analysis outlet FiveThirtyEight as early as 2016. For some reason the theory still persists, which is perhaps attributable to wishful thinking.
By making these adjustments, weighting helps to counteract the effects of over- or under-sampling and non-response, enhancing the accuracy and representativeness of survey results. However, it’s important to remember that while weighting can address these specific known biases, it is not a cure-all for all data quality issues, especially those arising from more fundamental flaws in the survey design or data collection process.
What weighting cannot fix: understanding its limits
Weighting is an extremely useful tool in survey research for adjusting skewed samples. Done correctly, it improves the accuracy of the results by better representing the target population. Weighting, however, has its limitations, particularly when it comes to deeper data quality issues.
A poll might, on top of the two error types covered so far, have measurement error embedded in it. Measurement errors occur when the data collected does not accurately reflect what it is intended to measure. These errors can stem from various sources, such as poorly worded survey questions, respondent misunderstandings, or even mistakes in data entry. For example, if a survey question is ambiguous or confusing, respondents may interpret it differently, leading to inconsistent or incorrect answers. Similarly, if respondents provide socially desirable responses rather than truthful ones, the data will not accurately represent their true opinions or behaviours.
Weighting cannot correct these kinds of inaccuracies because it does not alter the content or quality of individual responses. Instead, weighting adjusts the influence of the data points based on the assumption that the responses themselves are valid. If the underlying data is flawed due to measurement errors, no amount of weighting will make the results accurate. The core information remains faulty.
Missing data is another problem that weighting alone cannot fix. Missing responses or incomplete data can occur for various reasons, such as skipped questions, technical issues during data collection, or participants opting out of answering certain questions. These gaps create holes in the dataset, which can lead to biased or unreliable results if not properly addressed.
Weighting cannot fill in these missing values or accurately predict what the missing data would have been. While advanced techniques like data imputation can estimate missing values based on patterns in the data, weighting itself only adjusts the contribution of existing responses and cannot generate new data to replace what is missing. As a result, datasets with significant amounts of missing information can still produce misleading or incomplete analyses, even after weighting.
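To make the distinction concrete, here is a toy sketch (with invented values) showing that weighting only rescales the rows that exist, while even the crudest imputation has to manufacture a value for the missing entries before any estimate can use them:

```python
import numpy as np

# Hypothetical responses on a 1-5 scale; np.nan marks skipped questions
responses = np.array([4.0, np.nan, 3.0, 5.0, np.nan, 2.0])
weights   = np.array([1.2, 0.8, 1.0, 0.9, 1.1, 1.0])

# Weighting can only rescale the observed rows; the missing ones contribute nothing
observed = ~np.isnan(responses)
weighted_mean_observed = np.average(responses[observed], weights=weights[observed])

# A (deliberately crude) mean imputation fills the gaps before weighting.
# Better methods, such as multiple imputation, are likewise a separate step from weighting.
imputed = np.where(observed, responses, responses[observed].mean())
weighted_mean_imputed = np.average(imputed, weights=weights)

print(weighted_mean_observed, weighted_mean_imputed)
```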
A further component of the total error a poll might have is non-random error, or systematic bias. It occurs when certain groups are consistently affected in a way that skews the data. These biases are not random but are linked to specific characteristics or behaviours within the population. For example, if a survey consistently under-reports the views of a particular demographic, such as low-income individuals or those without internet access, these systematic biases cannot be corrected simply by weighting the data.
Weighting relies on the assumption that errors are random and can be offset by adjusting the sample to more accurately reflect the population’s known characteristics. However, when errors are systematic, they introduce biases that are resistant to correction. Weights adjust the representation of groups within the data but do not change the underlying biases in how data from those groups was collected. As a result, systematic errors remain embedded in the results, and the final analysis can still be skewed despite the application of weights.
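A small simulation (all numbers invented) illustrates the point. Suppose one group can only be reached through its online half, and that online half holds systematically different views from the offline half. Weighting restores the group's share of the sample, but the weighted estimate still misses the true population mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: 50% group A (mean opinion 0.60) and 50% group B (mean 0.40).
# Within group B only the online half can respond, and that half holds
# systematically different views (mean 0.50) than the offline half (0.30).
n_a, n_b_online = 800, 200
answers_a = rng.normal(0.60, 0.05, n_a)
answers_b = rng.normal(0.50, 0.05, n_b_online)   # only group B's online half responds

answers = np.concatenate([answers_a, answers_b])
weights = np.concatenate([np.full(n_a, 0.5 / 0.8),          # down-weight over-sampled group A
                          np.full(n_b_online, 0.5 / 0.2)])  # up-weight under-sampled group B

true_mean = 0.5 * 0.60 + 0.5 * 0.40                          # 0.50
print(f"True population mean: {true_mean:.3f}")
print(f"Weighted estimate:    {np.average(answers, weights=weights):.3f}")   # ~0.55
# Weighting restores group B's share of the sample, but it can only amplify
# the group B members who were reachable -- and they do not speak for the
# offline half, so the systematic bias survives the weighting.
```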
Practical challenges in weighting
There are also practical problems, not strictly related to the theoretical error components of an opinion poll, that weighting cannot rectify and that might further cloud the picture. Effective weighting depends on accurate models that reflect the true structure of the population. When researchers apply weights, they do so based on assumptions about how the sample should represent the population, often using demographic information or other known characteristics. However, if these assumptions are incorrect or oversimplified, the weights applied can introduce new errors rather than correcting existing ones. This is known as an incorrect model assumption.
For example, if a survey uses outdated census data to weight responses, or if it incorrectly assumes that all non-respondents are similar to respondents in certain key aspects, the weights will be based on faulty premises. This can lead to further distortions in the results, as the weighting process amplifies inaccuracies rather than mitigates them. Accurate models are essential for weighting to be effective, and any errors in these models can undermine the entire weighting process, making the results less reliable.
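A back-of-the-envelope sketch, with invented group means and shares, of what weighting to outdated targets does to an estimate:

```python
# Hypothetical group means and population shares, all invented for illustration
mean_x, mean_y = 0.30, 0.60                  # average support within each group

true_share     = {"X": 0.40, "Y": 0.60}      # the population as it actually is today
outdated_share = {"X": 0.55, "Y": 0.45}      # stale census targets used for weighting
sample_share   = {"X": 0.50, "Y": 0.50}      # the raw, unweighted sample

def estimate(shares):
    return shares["X"] * mean_x + shares["Y"] * mean_y

print(f"True value:                {estimate(true_share):.3f}")      # 0.480
print(f"Raw sample estimate:       {estimate(sample_share):.3f}")    # 0.450
print(f"Weighted to stale targets: {estimate(outdated_share):.3f}")  # 0.435
# Weighting to the wrong targets moves the estimate further from the truth
# than the unweighted sample was.
```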
Another practical downside of weighting is variance inflation. Applying weights, especially when there are large disparities between the sample and the population, can increase the variance of estimates. This means that while weights adjust for representation, they also make the data less precise, as each data point’s influence is scaled up or down, potentially by large amounts.
Variance inflation reduces the confidence in survey results because it makes them more volatile and less stable. For example, heavily weighting responses from under-represented groups increases the impact of each individual response from that group, which can lead to greater variability in the results. This increased variance makes it harder to draw precise conclusions from the data, as the range of possible outcomes widens.
In practical terms, this means that even if weighting successfully adjusts for some biases, the resulting estimates may still lack precision, leading to a trade-off between representativeness and reliability. As variance increases, so too does the uncertainty around the estimates, which can complicate decision-making based on the data.
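A standard way to quantify this trade-off is Kish's approximate design effect, which compares the nominal sample size with the "effective" sample size once weights are applied: n_eff = (sum of weights)^2 / (sum of squared weights). The weights below are invented; the point is only how unequal weights shrink the effective sample.

```python
import numpy as np

def kish_effective_n(weights):
    """Kish's approximation of the effective sample size under weighting."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

equal   = np.ones(1000)                                              # no weighting at all
mild    = np.concatenate([np.full(700, 0.71), np.full(300, 1.67)])   # the gender example above
extreme = np.concatenate([np.full(999, 1.0), [30.0]])                # one extreme weight

for name, w in [("equal", equal), ("mild", mild), ("extreme", extreme)]:
    print(f"{name:8s} n = {len(w):4d}   effective n = {kish_effective_n(w):6.1f}")
# Equal weights keep the full sample; the more unequal the weights become,
# the fewer "effective" respondents the estimate is really based on.
```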
This specific effect has led to some absurd weights and subsequent conclusions in the past. Ahead of the 2016 U.S. presidential election, a solitary black man in the state of Illinois was “weighted as much as 30 times more than the average respondent, and as much as 300 times more than the least-weighted respondent”. This led to that specific poll severely overestimating support for Donald Trump in that demographic category and, as a result, in its topline.
In summary, while weighting is an effective method for adjusting known biases in survey data, it cannot correct for deeper data quality issues like measurement errors, severe sampling biases, or missing data. To ensure reliable and valid results, it is crucial to address these problems at the source through careful survey design, thorough testing, and robust data collection practices. Weighting should be seen as a complement to good data quality practices, not a substitute for them.
These challenges highlight the critical importance of prioritising high-quality data collection and survey design from the outset. Weighting should be viewed as just one of several tools in a researcher’s statistical toolkit, not a substitute for robust methodology. By focusing on rigorous survey design, clear and unbiased (again, in the statistical sense) questions, and effective sampling strategies, researchers can minimise the need for heavy reliance on weighting and improve the overall reliability and validity of their results.
All this said, professional pollsters are often exceedingly good at weighting and at determining where it helps and where it does not. Polls are regularly blamed for shortcomings that actually belong to the analysis and inference that pundits and the media build on top of them.
Ultimately, the goal is to collect data that is as accurate and representative as possible, reducing the need for extensive adjustments. Weighting can be an effective technique when used appropriately, but it works best when it complements solid data collection practices. By recognising its limitations and combining it with other quality control measures, researchers can ensure that their analyses are both credible and meaningful, leading to better insights and more informed decision-making.
When I talk about “bias” in this piece, it is not used in its every-day conversational and derogatory sense. Bias here relates explicitly and strictly to statistical bias, which refers to a systematic error that causes an estimated value to consistently differ from the true value of the population it is meant to represent.
In reality, there is something severely wrong in the fundamental sampling design if a national survey ends up with a gender imbalance that big; consider it just an illustrative example.


