A Beginner’s Guide to Preventing Mistakes in Data Interpretation

August 10, 2023
Sebastian Walther
Co-founder & CFO at ValueWorks

Reading Time: 4 minutes

This post is part of our data culture series, in which we explain how to become a data-driven company. One of the key points of the series is that you need to understand how to interpret data. In this article we outline common mistakes made by people without a statistics background.

Introduction

In an era where data rules the world, the ability to analyse data and derive insights from it is more important than ever. However, like any powerful instrument, data is easily misused, and misuse leads to false conclusions. Whether you’re new to data or simply want a refresher, here are common mistakes to avoid, each illustrated with an example.

1. Mistaking Correlation for Causation

This is one of the most common errors in data interpretation: just because two events tend to occur together does not mean that one caused the other.

Example: If it rains every time you wear a red shirt, that doesn’t mean your shirt is causing the rain. Similarly, if ice cream sales rise at the same time as drownings, ice cream is not to blame. Both may simply be more common in the summertime!
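To see how a hidden common cause can produce a strong correlation, here is a minimal Python sketch under made-up assumptions. It simulates hypothetical daily temperatures and makes both ice cream sales and drownings depend on temperature plus independent noise, so by construction neither affects the other. It then computes the partial correlation, a standard way of removing the linear effect of a third variable (here temperature). All numbers and helper names are illustrative, not real data.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / ((1 - r_xz**2) * (1 - r_yz**2)) ** 0.5

rng = random.Random(42)
# Hypothetical daily temperatures (degrees C) for one year
temps = [rng.uniform(0, 35) for _ in range(365)]
# Both series depend only on temperature plus independent noise
ice_cream = [20 + 5 * t + rng.gauss(0, 15) for t in temps]
drownings = [0.1 * t + rng.gauss(0, 0.5) for t in temps]

r_raw = pearson(ice_cream, drownings)
r_partial = partial_corr(r_raw, pearson(ice_cream, temps), pearson(drownings, temps))
print(f"raw correlation: {r_raw:.2f}")
print(f"controlling for temperature: {r_partial:.2f}")
```

In this simulation the raw correlation is strongly positive, while the partial correlation is close to zero: the signature of a shared cause rather than a direct effect.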

2. Ignoring Confounding Variables

Confounding variables are factors that are not the focus of a study but influence both the thing being studied and the outcome, which can create a misleading association between the two.

Example: Suppose you notice that children with larger shoe sizes read better. It may be tempting to conclude that bigger feet make better readers. However, age is a confounding factor here, because older children have both bigger feet and better reading comprehension!
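One simple way to check for a confounder is to stratify: look at the relationship separately within each level of the suspected confounder. The Python sketch below assumes hypothetical children aged 6 to 12 whose shoe size and reading score both grow with age but are otherwise unrelated; the numbers and helper names are made up for illustration.

```python
import random
from collections import defaultdict

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(7)
by_age = defaultdict(lambda: ([], []))  # age -> (shoe sizes, reading scores)
all_shoe, all_reading = [], []
for _ in range(2000):
    age = rng.randint(6, 12)                    # hypothetical child's age
    shoe = 20 + 1.5 * age + rng.gauss(0, 1.0)   # shoe size grows with age
    reading = 10 * age + rng.gauss(0, 8)        # reading score grows with age
    by_age[age][0].append(shoe)
    by_age[age][1].append(reading)
    all_shoe.append(shoe)
    all_reading.append(reading)

r_overall = pearson(all_shoe, all_reading)
r_within = {age: pearson(sizes, scores)
            for age, (sizes, scores) in sorted(by_age.items())}
print(f"all children: {r_overall:.2f}")
for age, r in r_within.items():
    print(f"age {age}: {r:.2f}")
```

Across all children the correlation is strong, but within each age group it is close to zero, which points to age as the confounder.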

3. Cherry Picking Data

Cherry picking data is when you only select data that supports your hypothesis and ignore data that does not support your hypothesis. This can lead to biased results.

Example: Suppose a business displayed only positive client testimonials and disregarded all unfavourable ones. The result would be a distorted picture of client satisfaction.
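The arithmetic behind cherry-picking is easy to show. In this short Python sketch, the star ratings are hypothetical; the point is how much the average moves when only the favourable ones are kept.

```python
from statistics import mean

# Hypothetical star ratings (1 to 5) from all clients
all_ratings = [5, 4, 2, 1, 5, 3, 4, 1, 2, 5]
# Cherry-picked: only the 4- and 5-star testimonials are displayed
displayed = [r for r in all_ratings if r >= 4]

print(f"all clients: {mean(all_ratings):.1f}")          # 3.2
print(f"displayed testimonials: {mean(displayed):.1f}")  # 4.6
```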

4. Using Inappropriate Methods or Metrics

There are many methods and metrics for analysing data, and it is important to choose the ones that fit the data you are working with. For example, with a small dataset you may not be able to use statistical tests that require a large sample size. Likewise, make sure that your metrics are appropriate for what you are measuring.

Example: Suppose your goal is to determine the ‘average’ height of a population. In a small group that contains one very tall person, the mean can give a distorted impression. In such cases the median, or middle value, often provides more useful information.
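Here is a minimal Python illustration with hypothetical heights, using the standard library’s `statistics` module:

```python
from statistics import mean, median

# Hypothetical heights (cm) of a small group with one very tall person
heights = [165, 168, 170, 172, 230]

print(f"mean:   {mean(heights):.1f} cm")    # 181.0, pulled upward by the outlier
print(f"median: {median(heights):.1f} cm")  # 170.0, the middle value
```

The single outlier lifts the mean above the height of four of the five people, while the median stays with the typical group member.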

5. Overfitting the Data

This occurs when an analysis or model matches the data it was developed on so closely that it no longer works well on new data.

Example: Imagine tailoring a garment to one person’s posture on a single day so precisely that it becomes uncomfortable as soon as they stand differently.
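An extreme form of overfitting is a model that simply memorises its training data. The Python sketch below compares such a hypothetical "memoriser" (it returns the value of the nearest training point) with a straight-line fit, on made-up data that truly follows a straight line plus noise. Names and numbers are assumptions for illustration only.

```python
import random

rng = random.Random(0)

def make_data(n):
    """Hypothetical data: y = 2x + 1 plus random noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_memoriser(xs, ys):
    """Overfitted 'model': returns the y of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared prediction error."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(50)
test_x, test_y = make_data(200)
line = fit_line(train_x, train_y)
memoriser = fit_memoriser(train_x, train_y)

for name, model in [("straight line", line), ("memoriser", memoriser)]:
    print(f"{name}: train error {mse(model, train_x, train_y):.2f}, "
          f"test error {mse(model, test_x, test_y):.2f}")
```

The memoriser scores a perfect zero on its own training data yet does worse than the simple line on new data, just like the garment that fits only one posture.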

6. Not Taking Context into Account

Context is important because data is never created in a vacuum.

Example: A report stating that a company’s revenues doubled last month sounds good. However, the picture changes if you learn that the company sold only two items the month before and four items last month.

7. Sampling Bias

This occurs when the group under investigation is not representative of the wider population in question.

Example: A survey about people’s favourite kind of ice cream conducted exclusively among attendees of a chocolate convention would produce results skewed towards chocolate.
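A small Python simulation makes the effect visible. It assumes a hypothetical population in which 10% of people are chocolate enthusiasts (who name chocolate as their favourite ice cream 80% of the time) and everyone else names chocolate 20% of the time. These proportions are invented for illustration.

```python
import random

rng = random.Random(1)
# Hypothetical population of 10,000 people: (is_enthusiast, prefers_chocolate)
population = []
for _ in range(10_000):
    enthusiast = rng.random() < 0.10
    prefers_chocolate = rng.random() < (0.80 if enthusiast else 0.20)
    population.append((enthusiast, prefers_chocolate))

def chocolate_share(sample):
    """Fraction of the sample whose favourite ice cream is chocolate."""
    return sum(likes for _, likes in sample) / len(sample)

random_sample = rng.sample(population, 500)
# Convention attendees come only from the enthusiasts
convention = rng.sample([p for p in population if p[0]], 500)

print(f"true population share:  {chocolate_share(population):.2f}")
print(f"random sample estimate: {chocolate_share(random_sample):.2f}")
print(f"convention estimate:    {chocolate_share(convention):.2f}")
```

The random sample lands close to the true population share, while the convention sample overstates it several times over, even though both samples have the same size.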

8. Data Dredging (or P-hacking)

This involves sifting through data for patterns without a specific hypothesis, which can lead to spurious discoveries. The more comparisons you run, the more likely it is that some of them look significant purely by chance.

Example: It is like going fishing without a target and celebrating every branch, worn-out boot, or fish you happen to catch.
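The Python sketch below shows why dredging is dangerous. It tests 200 variables of pure random noise for correlation with an equally random outcome, so every "significant" result is a false positive. The p-values use the Fisher z-transform as a normal approximation, and the Bonferroni correction (dividing the threshold by the number of tests) is shown as one common remedy; the sample sizes are arbitrary assumptions.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def p_value(r, n):
    """Two-sided p-value for 'no correlation' via the Fisher z-transform."""
    z = abs(math.atanh(r)) * math.sqrt(n - 3)
    return math.erfc(z / math.sqrt(2))

rng = random.Random(3)
n = 100  # observations per variable
m = 200  # candidate variables, all pure noise
outcome = [rng.gauss(0, 1) for _ in range(n)]
p_values = []
for _ in range(m):
    noise = [rng.gauss(0, 1) for _ in range(n)]
    p_values.append(p_value(pearson(noise, outcome), n))

naive_hits = sum(p < 0.05 for p in p_values)
bonferroni_hits = sum(p < 0.05 / m for p in p_values)
print(f"'significant' at p < 0.05: {naive_hits} of {m}")
print(f"after Bonferroni correction: {bonferroni_hits} of {m}")
```

With a 0.05 threshold, roughly one in twenty pure-noise variables is expected to look "significant", which is exactly the worn-out boot in the fishing net.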

9. Over-reliance on Statistical Significance

A finding can be statistically significant without being meaningful in the real world.

Example: A new medication may shorten the duration of flu symptoms by ten minutes. This might be statistically significant, but is it practically significant?
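Statistical significance depends heavily on sample size, while practical significance depends on the size of the effect. This Python sketch uses hypothetical trial figures (500,000 patients per group, a symptom-duration standard deviation of one day, and a 10-minute improvement) and a simple normal-approximation test for a difference in means; it also reports Cohen’s d, a common standardised effect size.

```python
import math

# Hypothetical trial summary
n_per_group = 500_000
sd_minutes = 1440          # standard deviation of symptom duration: 1 day
difference_minutes = 10    # average reduction with the new medication

standard_error = sd_minutes * math.sqrt(2 / n_per_group)
z = difference_minutes / standard_error
p = math.erfc(z / math.sqrt(2))             # two-sided p-value
cohens_d = difference_minutes / sd_minutes  # standardised effect size

print(f"p-value: {p:.4f}")
print(f"effect size (Cohen's d): {cohens_d:.4f}")
```

With enough patients the p-value is far below 0.05, yet the effect is well under 1% of a standard deviation: statistically significant, practically negligible.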

10. Not Taking Regression to the Mean into Account

Extreme results tend to be followed by more average ones, even without any outside influence, because part of what made them extreme was chance.

Example: If a student performs exceptionally well on one exam and only averagely on the next, it doesn’t necessarily follow that they didn’t study the second time around.
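The Python sketch below simulates this under a simple assumed model: each hypothetical student has a fixed true ability, and each exam score is that ability plus independent luck on the day. Nobody changes how much they study between exams.

```python
import random
from statistics import mean

rng = random.Random(5)
# Hypothetical model: score = fixed ability + luck on the day
abilities = [rng.gauss(70, 10) for _ in range(5000)]
exam1 = [a + rng.gauss(0, 10) for a in abilities]
exam2 = [a + rng.gauss(0, 10) for a in abilities]

# Select the top 10% of students by their first exam score
cutoff = sorted(exam1)[int(0.9 * len(exam1))]
top = [i for i, s in enumerate(exam1) if s >= cutoff]

top_exam1 = mean(exam1[i] for i in top)
top_exam2 = mean(exam2[i] for i in top)
print(f"top group, exam 1:    {top_exam1:.1f}")
print(f"top group, exam 2:    {top_exam2:.1f}")
print(f"all students, exam 2: {mean(exam2):.1f}")
```

The top group’s second score drops noticeably yet stays above the overall average: their good luck on exam 1 did not repeat, but their higher ability did.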

11. Confirmation Bias

People naturally tend to favour information that supports their existing beliefs.

Example: Someone who believes cats are unfriendly may notice only the times cats are distant and overlook the times they show affection.

12. Overgeneralization

Drawing a broad conclusion from a restricted set of data.

Example: Meeting three engineers who like playing chess and concluding all engineers must like chess.

13. False Perception of Randomness

People frequently perceive patterns even where none exist.

Example: After getting heads several times in a row in a coin toss, believing the next toss is ‘due’ to be tails.
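A quick Python simulation of a fair coin shows that a streak does not make tails any more likely. It looks at every flip that immediately follows three heads in a row; the number of flips and the streak length are arbitrary choices.

```python
import random

rng = random.Random(11)
flips = [rng.choice("HT") for _ in range(200_000)]

# Every flip that immediately follows three heads in a row
after_streak = [flips[i] for i in range(3, len(flips))
                if flips[i - 3:i] == ["H", "H", "H"]]
share_tails = after_streak.count("T") / len(after_streak)
print(f"streaks found: {len(after_streak)}, share of tails next: {share_tails:.3f}")
```

The share of tails after a streak stays close to 0.5, because each flip is independent of the ones before it.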

14. Base Rate Fallacy

Ignoring how common something is overall (the base rate) when judging how likely it is, given a specific piece of information.

Example: If a condition is extremely rare and a test is accurate but not flawless, a positive result may well be a false positive. In fact, most people who test positive might not have the condition at all.
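Bayes’ theorem shows how this happens. The Python sketch below assumes hypothetical figures: a condition affecting 1 in 1,000 people, and a test that detects 99% of true cases but also flags 1% of healthy people.

```python
# Hypothetical test characteristics
prevalence = 0.001          # 1 in 1,000 people have the condition
sensitivity = 0.99          # P(positive | condition)
false_positive_rate = 0.01  # P(positive | no condition)

# Bayes' theorem: P(condition | positive)
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
p_condition_given_positive = sensitivity * prevalence / p_positive
print(f"P(condition | positive test) = {p_condition_given_positive:.1%}")
```

Under these assumptions only about 9% of positive results are true positives, because the healthy majority generates far more false alarms than the rare true cases generate correct detections.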

15. Not Having a Clear Goal

When evaluating data, it’s crucial to know what you’re searching for.

Example: Analysing website data without a clear idea of what you want to improve upon can result in a disorganised set of insights and unclear recommendations for action.

16. Post Hoc Fallacy

Assuming that, merely because one event follows another, the first event caused the second.

Example: Believing that a rain dance produced rain simply because it rained after a village performed one.

17. Ecological Fallacy

Believing that what’s true for a group as a whole is true for each individual in that group.

Example: Assuming there are no impoverished individuals in a prosperous country based on the country’s average income.

18. Simpson's Paradox

A trend that appears in several groups disappears or reverses when the groups are combined.

Example: A medication appears to work better than a placebo for men and for women separately, yet worse than the placebo when both groups are combined.
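The Python sketch below reproduces the paradox with hypothetical trial counts. The key assumption is that the medication was given mostly to women, who recover less often than men regardless of treatment, so combining the groups mixes the treatment effect with the group effect.

```python
# Hypothetical trial counts: (recovered, treated) per group
results = {
    "drug":    {"men": (81, 87),   "women": (192, 263)},
    "placebo": {"men": (234, 270), "women": (55, 80)},
}

def recovery_rates(groups):
    """Per-group and combined recovery rates for one treatment arm."""
    per_group = {g: rec / n for g, (rec, n) in groups.items()}
    combined = (sum(rec for rec, _ in groups.values())
                / sum(n for _, n in groups.values()))
    return per_group, combined

drug_groups, drug_combined = recovery_rates(results["drug"])
placebo_groups, placebo_combined = recovery_rates(results["placebo"])
for g in drug_groups:
    print(f"{g}: drug {drug_groups[g]:.0%} vs placebo {placebo_groups[g]:.0%}")
print(f"combined: drug {drug_combined:.0%} vs placebo {placebo_combined:.0%}")
```

With these counts the drug wins among men (93% vs 87%) and among women (73% vs 69%), but loses in the combined figures (78% vs 83%).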

19. Misusing Averages

Relying only on averages can hide the variation in the data.

Example: Because of a few billionaires, the average income in a town may be high even though most of its residents earn low salaries.
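A short Python sketch with a hypothetical town of 99 residents on modest incomes plus one billionaire shows how far the mean can drift from the typical resident:

```python
from statistics import mean, median

# Hypothetical town: 99 residents with modest incomes plus one billionaire
incomes = [25_000 + 500 * i for i in range(99)] + [1_000_000_000]

avg = mean(incomes)
mid = median(incomes)
share_below_avg = sum(x < avg for x in incomes) / len(incomes)
print(f"mean income:   {avg:,.0f}")    # 10,049,005
print(f"median income: {mid:,.0f}")    # 49,750
print(f"residents earning less than the mean: {share_below_avg:.0%}")  # 99%
```

Here 99% of residents earn less than the "average" income, which is why reporting the median, or the spread of the data, alongside the mean is often more honest.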

Conclusion

Starting out in data interpretation is exciting, but it is fraught with potential for error. Being familiar with these common pitfalls helps you view and analyse data through a clearer lens. Remember that deriving meaningful insights requires careful consideration of context and methods, not just the numbers themselves.
