This is a blog post in our data culture series, where we explain how to become a data-driven company. One of the main points of the series is that you need to understand how to interpret data. In this article we outline common mistakes made by people without a statistics background.
Introduction
In an era where data rules the world, it is more important than ever to be able to analyse data and derive insights from it. However, like any powerful instrument, data is easily misused, leading to false conclusions. Whether you’re new to data or simply want a refresher, here are some common mistakes to avoid, each illustrated with an example.
1. Mistaking Correlation for Causation
This is one of the most common errors in data interpretation. When two events appear to occur together, it does not follow that one caused the other.
Example: If it rains every time you wear a red shirt, the shirt is not necessarily causing the rain. In a similar vein, a rise in ice cream sales is not to blame for a rise in drownings during the same period. Maybe both are just more prevalent in the summertime!
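The ice cream example can be sketched with synthetic data. This is a minimal illustration, not real figures: the temperatures, the coefficients and the `pearson` helper are all assumptions made for the demo. Both series are generated from temperature alone, yet they end up strongly correlated with each other.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)

# Hypothetical daily temperatures (degrees Celsius) for one year.
temperature = [rng.uniform(0, 35) for _ in range(365)]

# Both quantities depend on temperature only, never on each other.
ice_cream_sales = [20 * t + rng.gauss(0, 50) for t in temperature]
drownings = [max(0.0, 0.1 * t + rng.gauss(0, 0.5)) for t in temperature]

r_sales_drownings = pearson(ice_cream_sales, drownings)
print(f"correlation(sales, drownings) = {r_sales_drownings:.2f}")
```

The correlation is high even though, by construction, neither series influences the other.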
2. Ignoring Confounding Variables
Confounding variables are factors that a study is not trying to measure but that still influence its outcomes.
Example: Look at how well children with larger shoe sizes read. It may be tempting to conclude that having bigger feet makes you a better reader. However, age is a confounding variable here, because older kids have both bigger feet and better reading comprehension!
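One common way to check for a confounder is to hold it fixed and see whether the effect survives. The sketch below uses made-up children whose shoe size and reading score both depend only on age; every number and the `reading_gap` helper are assumptions for illustration.

```python
import random
from statistics import mean

rng = random.Random(1)

# Synthetic children: shoe size and reading score both grow with age.
children = []
for _ in range(2000):
    age = rng.randint(6, 12)
    children.append({
        "age": age,
        "shoe": 0.8 * age + rng.gauss(0, 1),
        "reading": 10 * age + rng.gauss(0, 10),
    })

def reading_gap(group):
    """Mean reading score of the bigger-footed half minus the smaller-footed half."""
    ordered = sorted(group, key=lambda c: c["shoe"])
    half = len(ordered) // 2
    bigger, smaller = ordered[half:], ordered[:half]
    return mean(c["reading"] for c in bigger) - mean(c["reading"] for c in smaller)

gap_all_ages = reading_gap(children)

# Hold the confounder fixed: compare only children of the same age.
ten_year_olds = [c for c in children if c["age"] == 10]
gap_same_age = reading_gap(ten_year_olds)

print(f"gap across all ages:      {gap_all_ages:.1f} points")
print(f"gap among ten-year-olds:  {gap_same_age:.1f} points")
```

Across all ages the bigger-footed half reads much better; among children of the same age the gap shrinks to roughly zero.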
3. Cherry Picking Data
Cherry picking means selecting only the data that supports your hypothesis and ignoring the data that does not. This leads to biased results.
Example: What if a business displayed only positive client testimonials and disregarded any unfavourable ones? The result would be an incomplete picture of client satisfaction.
4. Using Inappropriate Methods or Metrics
There are many different methods and metrics that can be used to analyse data, and it is important to choose ones that suit the data you are working with. For example, if you are working with a small dataset, you may not be able to use statistical tests that require a large sample size. The metrics you use must also be appropriate for what you are measuring.
Example: Suppose your goal was to determine the ‘average’ height of a population. In a small group containing one very tall person, the mean could give a distorted impression. In such situations the median, or middle value, may provide more useful information.
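A quick check with Python's standard `statistics` module shows the difference. The heights below are invented for the illustration.

```python
from statistics import mean, median

# Hypothetical heights in centimetres; the last person is unusually tall.
heights = [165, 168, 170, 172, 175, 230]

mean_height = mean(heights)
median_height = median(heights)

print(f"mean:   {mean_height:.1f} cm")    # mean:   180.0 cm
print(f"median: {median_height:.1f} cm")  # median: 171.0 cm
```

The mean is taller than five of the six people in the group, while the median sits in the middle of the typical heights.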
5. Overfitting the Data
This occurs when an analysis or model attempts to match the data it was developed on too closely, which reduces its applicability to new data.
Example: Consider customising a garment to fit a single person’s posture so well on a given day that it becomes uncomfortable when they stand in a different way.
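Overfitting can be sketched by comparing a model that memorises its training data with a simple straight-line fit. Everything here is an assumption chosen for the demo: the underlying relationship y = 2x + 1, the noise level, and the `memorizer` and `line` models.

```python
import random

rng = random.Random(2)

def make_data(n):
    """Noisy samples of the hypothetical relationship y = 2x + 1."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 2)) for x in xs]

train, test = make_data(50), make_data(200)

def memorizer(x):
    """Overfit model: return the y value of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

# Simple model: least-squares straight line fitted to the training data.
n = len(train)
mean_x = sum(x for x, _ in train) / n
mean_y = sum(y for _, y in train) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))
intercept = mean_y - slope * mean_x

def line(x):
    return slope * x + intercept

def mse(model, data):
    """Mean squared error of a model on a list of (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

for name, model in [("memorizer", memorizer), ("line", line)]:
    print(f"{name:9s} train error = {mse(model, train):6.2f}   "
          f"new-data error = {mse(model, test):6.2f}")
```

The memorising model is perfect on the data it was built from, because it has also memorised the noise, and that is exactly why it does worse than the plain line on new data.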
6. Not Taking Context into Account
Context is important since data is never created in a vacuum.
Example: It sounds good to read in a report that a company’s sales doubled last month. However, the picture changes once you learn that they sold only four items last month and two items the month before.
7. Sampling Bias
This occurs when the group under investigation is not a true representation of the wider population in question.
Example: A survey about favourite ice cream flavours that only asks attendees of a chocolate convention would be skewed towards chocolate.
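The effect can be simulated. In this sketch the 30% share of chocolate fans and the attendance probabilities are invented numbers; the point is only that the convention sample and a random sample of the same population give very different answers.

```python
import random

rng = random.Random(3)

# Hypothetical population: True means "prefers chocolate ice cream" (about 30%).
population = [rng.random() < 0.30 for _ in range(100_000)]

# Assumption: chocolate fans are far more likely to attend the convention.
attendees = [likes for likes in population
             if rng.random() < (0.20 if likes else 0.01)]

# A simple random sample of the whole population, for comparison.
random_sample = rng.sample(population, 1000)

def chocolate_share(group):
    """Fraction of a group that prefers chocolate."""
    return sum(group) / len(group)

print(f"whole population:     {chocolate_share(population):.0%}")
print(f"convention attendees: {chocolate_share(attendees):.0%}")
print(f"random sample:        {chocolate_share(random_sample):.0%}")
```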
8. Data Dredging (or P-hacking)
This involves sifting through data to detect patterns without a specific hypothesis, which might lead to erroneous discoveries.
Example: It is comparable to going fishing without a target in mind and celebrating everything you pull up as a catch, whether it is a fish, a branch, or a worn-out boot.
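To see why this is risky, the sketch below runs 200 comparisons in which there is nothing to find: both samples always come from the same distribution. The test used (a z-test with a known standard deviation of 1) and the 0.05 threshold are assumptions for the demo, and the Bonferroni correction shown at the end is just one common remedy, not something the rest of this article depends on.

```python
import random
from statistics import NormalDist, mean

rng = random.Random(4)

def p_value_same_population(n=50):
    """Two-sided z-test p-value for two samples drawn from the SAME N(0, 1) population."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    z = (mean(a) - mean(b)) / (2 / n) ** 0.5  # standard deviation is known to be 1
    return 2 * (1 - NormalDist().cdf(abs(z)))

# 200 comparisons where no real effect exists.
p_values = [p_value_same_population() for _ in range(200)]

naive_hits = sum(p < 0.05 for p in p_values)
# Bonferroni correction: divide the threshold by the number of tests.
corrected_hits = sum(p < 0.05 / len(p_values) for p in p_values)

print(f"'significant' results at p < 0.05:  {naive_hits} of {len(p_values)}")
print(f"after Bonferroni correction:        {corrected_hits}")
```

Roughly one comparison in twenty looks significant by chance alone, so anyone who tests enough patterns will find some.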
9. Over-reliance on Statistical Significance
A finding may not always be significant in the real world, even though it appears to be statistically strong.
Example: A new medication may shorten the duration of flu symptoms by ten minutes. This might be statistically significant, but is it practically significant?
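With a large enough sample, even a ten-minute improvement produces a tiny p-value. The trial below is entirely hypothetical: the group size, the one-day spread in symptom duration and the use of a simple z-test are assumptions chosen to make the arithmetic visible.

```python
from statistics import NormalDist

# Hypothetical trial: all numbers below are assumptions for illustration.
n_per_group = 1_000_000   # patients in each arm of the trial
mean_difference_min = 10  # symptoms end 10 minutes sooner on average
std_dev_min = 1440        # spread of symptom duration: one day, in minutes

# Two-sample z-test for a difference in means.
standard_error = std_dev_min * (2 / n_per_group) ** 0.5
z = mean_difference_min / standard_error
p_value = 2 * (1 - NormalDist().cdf(z))

# Standardised effect size (Cohen's d): the difference measured in standard deviations.
effect_size = mean_difference_min / std_dev_min

print(f"z = {z:.2f}, p = {p_value:.1e}")
print(f"effect size d = {effect_size:.4f}")
```

The p-value is far below 0.05, yet the effect is less than one hundredth of a standard deviation. Statistical significance says the difference is unlikely to be zero; it says nothing about whether the difference matters.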
10. Not Taking Regression to the Mean into Account
Extreme results tend to be followed by more average ones, without any outside influence.
Example: If a student performs very well on one exam and only moderately on the next, it doesn’t necessarily follow that they didn’t study the second time around.
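A small simulation makes the effect visible. It assumes, purely for illustration, that an exam score is a student's stable skill plus some luck on the day, and that nobody's skill changes between the two exams.

```python
import random
from statistics import mean

rng = random.Random(5)

# Each student has a fixed skill; each exam adds independent luck.
students = []
for _ in range(10_000):
    skill = rng.gauss(70, 8)
    exam1 = skill + rng.gauss(0, 8)
    exam2 = skill + rng.gauss(0, 8)
    students.append((exam1, exam2))

# Select the top 10% on the first exam and follow them to the second.
students.sort(key=lambda s: s[0], reverse=True)
top = students[:1000]

top_exam1 = mean(s[0] for s in top)
top_exam2 = mean(s[1] for s in top)
overall_exam2 = mean(s[1] for s in students)

print(f"top group, exam 1:    {top_exam1:.1f}")
print(f"top group, exam 2:    {top_exam2:.1f}")
print(f"all students, exam 2: {overall_exam2:.1f}")
```

The top group still beats the overall average on the second exam, but by less than before: part of their first result was luck, and luck does not repeat on demand.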
11. Confirmation Bias
As a natural tendency, people prefer information that supports their preexisting opinions.
Example: If someone has the opinion that cats are not friendly, they may only see the times when cats are distant and miss the times when they show affection.
12. Overgeneralization
Drawing a broad conclusion from a restricted set of data.
Example: Meeting three engineers who like playing chess and concluding all engineers must like chess.
13. False Perception of Randomness
People frequently perceive patterns even in the absence of them.
Example: After getting heads several times in succession in a coin toss, believing the next toss is ‘due’ to be tails.
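This belief is easy to test on a simulated fair coin. The sketch below, which assumes independent tosses and a streak length of three, looks at every toss that follows three heads in a row and checks how often it comes up heads.

```python
import random

rng = random.Random(6)

# Simulate a fair coin: True means heads.
flips = [rng.random() < 0.5 for _ in range(200_000)]

# Collect the toss that follows every run of three heads in a row.
after_streak = [
    flips[i]
    for i in range(3, len(flips))
    if flips[i - 3] and flips[i - 2] and flips[i - 1]
]

share_heads = sum(after_streak) / len(after_streak)
print(f"tosses following three heads: {len(after_streak)}")
print(f"share of those that are heads: {share_heads:.3f}")
```

The share stays close to one half. The coin has no memory, so tails is never ‘due’.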
14. Base Rate Fallacy
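Ignoring how likely something is overall (its base rate) when judging it from a specific piece of information.
Example: If a condition is incredibly rare and the test for it isn’t flawless, a positive result is often more likely to be a false positive than a true one, because the few real cases are outnumbered by false alarms among the many healthy people tested. This is not always the case, though: it depends on how rare the condition is and how accurate the test is.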
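Bayes' rule makes this concrete. The prevalence and accuracy figures below are assumed values for illustration, not properties of any real test.

```python
# Assumed figures for a hypothetical condition and test.
prevalence = 0.001           # 1 person in 1,000 has the condition
sensitivity = 0.99           # P(test positive | condition present)
false_positive_rate = 0.01   # P(test positive | condition absent)

# Bayes' rule: share of positive results that are true positives.
true_positives = prevalence * sensitivity
false_positives = (1 - prevalence) * false_positive_rate
p_condition_given_positive = true_positives / (true_positives + false_positives)

print(f"P(condition | positive test) = {p_condition_given_positive:.1%}")  # 9.0%
```

Even with a test that is right 99% of the time, only about one positive result in eleven belongs to someone who actually has the condition, because healthy people vastly outnumber sick ones.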
15. Not Having a Clear Goal
When evaluating data, it’s crucial to know what you’re searching for.
Example: Analysing website data without a clear idea of what you want to improve upon can result in a disorganised set of insights and unclear recommendations for action.
16. Post Hoc Fallacy
Thinking that merely because one event follows another, the first event caused the second.
Example: Thinking that a rain dance produced rain only because it rained after a village performed one.
17. Ecological Fallacy
Believing that what’s true for a group as a whole is true for each individual in that group.
Example: Assuming there are no impoverished individuals in a prosperous country based on the country’s average income.
18. Simpson's Paradox
A trend appears in each of several groups but vanishes or reverses when the groups are combined.
Example: A medication seemed to work well for men and for women separately, but not when the two groups were combined.
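The paradox is easier to believe once you see numbers that produce it. The recovery counts below are hypothetical and chosen to show the reversal; the trick is that the drug was mostly given to the group with the lower recovery rate.

```python
# Hypothetical trial results: (patients recovered, patients treated).
results = {
    "men":   {"drug": (81, 87),   "control": (234, 270)},
    "women": {"drug": (192, 263), "control": (55, 80)},
}

def rate(recovered, treated):
    """Recovery rate as a fraction."""
    return recovered / treated

# Within each group the drug looks better than the control.
for group, arms in results.items():
    print(f"{group:8s} drug {rate(*arms['drug']):.0%}   "
          f"control {rate(*arms['control']):.0%}")

def combined(arm):
    """Recovery rate for one arm with men and women pooled together."""
    recovered = sum(results[g][arm][0] for g in results)
    treated = sum(results[g][arm][1] for g in results)
    return rate(recovered, treated)

# Pooled together, the ordering flips.
print(f"{'combined':8s} drug {combined('drug'):.0%}   "
      f"control {combined('control'):.0%}")
```

Most drug patients were women, who recover less often in this made-up data regardless of treatment, so pooling the groups drags the drug's overall rate below the control's.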
19. Misusing Averages
Relying only on averages can ignore the diversity in the data.
Example: Because of a few billionaires, the average income in a town may be high even though most of its citizens earn low salaries.
Conclusion
Starting a data interpretation journey is exciting but fraught with opportunities for error. If you are familiar with these typical problems, you can view and analyse data through a clearer lens. Remember that deriving meaningful insights requires careful consideration of context and methods, not just the numbers themselves.