Observational Studies, Confounders, and Stratification - Data Science Discovery (2024)


Unlike in controlled experiments, the researcher in an observational study has NO control over who is assigned to the treatment and control groups. In observational studies, oftentimes either:

  • The subjects themselves decide if they get the treatment or not.
    — OR —
  • Fate determines who gets the treatment and who does not.

In both cases, the researcher just observes what happens.

Main Problem with Observational Studies

Observational studies are done out of necessity. Whenever possible, it’s better to do a randomized controlled double-blind experiment. Why is this the case?

Observational studies can show correlation, but they do not imply causation.

Since the treatment and control groups just "happened", they are often very different from each other in ways other than the treatment. That is why an association found in an observational study is hard to turn into a conclusion about causality.

Confounding Variables

Differences between the treatment and control groups that also affect the response are called confounders. Confounders can lead you to incorrect conclusions about the relationship between the treatment and the response. They are very common in observational studies. It’s good to be aware of possible confounders so that you don’t lie with your statistics!

When an observational study shows an association, there are 2 scenarios:

  1. The treatment truly caused the response. If so, there will be a causal link explaining how or why the treatment itself is causing the response. There is a direct link between the treatment and response as shown below.
    [Diagram: Treatment → Response]

  2. The treatment did not cause the response. Instead, there is a confounder that’s related to the treatment and that is also causing the response. The confounder makes it look like the treatment caused the response when, in reality, the confounder did.
    [Diagram: a confounder linked to both the treatment and the response]
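The second scenario can be illustrated with a small simulation. This is only a sketch under invented assumptions: the probabilities, the `simulate` and `injury_rate` helpers, and the choice of exercise as the confounder are all hypothetical and not taken from any real study. In the simulated data, energy drinks have no effect on injuries at all, yet drinkers still end up with a noticeably higher injury rate.

```python
import random


def simulate(n=50_000, seed=0):
    """Simulate hypothetical subjects as (exercises, drinks, injured) tuples.

    Exercise is the confounder: it raises both the chance of drinking
    energy drinks and the chance of injury. The drink itself has NO effect.
    All probabilities are made up for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        exercises = rng.random() < 0.5
        drinks = rng.random() < (0.7 if exercises else 0.2)
        # Injury depends only on exercise, never on `drinks`.
        injured = rng.random() < (0.3 if exercises else 0.1)
        rows.append((exercises, drinks, injured))
    return rows


def injury_rate(rows):
    """Fraction of the given subjects who were injured."""
    return sum(injured for _, _, injured in rows) / len(rows)


rows = simulate()
drinkers = [r for r in rows if r[1]]
non_drinkers = [r for r in rows if not r[1]]

# Naive comparison that ignores the confounder.
naive_gap = injury_rate(drinkers) - injury_rate(non_drinkers)
print(f"Injury rate, drinkers:     {injury_rate(drinkers):.3f}")
print(f"Injury rate, non-drinkers: {injury_rate(non_drinkers):.3f}")
print(f"Naive gap:                 {naive_gap:.3f}")
```

The gap appears only because the drinkers group contains many more exercisers than the non-drinkers group.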

How To Handle Confounding Variables

Observational studies are done out of necessity, when it’s not possible to do a randomized controlled experiment. The question is, can we still make conclusions from an observational study?
Good studies take great care to reduce confounding variables and there are many ways to do this! One common way is through a technique called stratification.

Stratification

Statisticians adjust for these confounding variables by dividing the treatment and control groups into smaller, more homogeneous subgroups, where the confounding factor is the same. This is called stratification. Stratification plays a similar role in observational studies as blocking does in randomized experiments: it helps us deal with confounders.

With stratification, we can compare subjects in the treatment group with subjects in the control group who are similar on the confounding variable. If you think a variable could confound your results, you should stratify on that variable. Here is a visual below:

[Figure: treatment and control groups divided into subgroups by the confounding variable]
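As a rough sketch of what stratification looks like in practice, the simulation below compares treatment and control separately within each level of a confounder. Everything here is hypothetical: the subjects are simulated, the probabilities are invented, and the field names (`exercises`, `drinks`, `injured`) and the `rate` helper are chosen only for illustration. By construction the treatment has no effect, so the gap within each stratum should be close to zero.

```python
import random

rng = random.Random(1)

# Hypothetical subjects: exercise (the confounder) drives both the
# treatment (energy drinks) and the response (injury).
subjects = []
for _ in range(50_000):
    exercises = rng.random() < 0.5
    subjects.append({
        "exercises": exercises,
        "drinks": rng.random() < (0.7 if exercises else 0.2),
        "injured": rng.random() < (0.3 if exercises else 0.1),
    })


def rate(group):
    """Fraction of the group that was injured."""
    return sum(s["injured"] for s in group) / len(group)


# Stratify: compare treatment vs. control only among subjects
# who share the same value of the confounder.
gaps = {}
for stratum in (True, False):
    in_stratum = [s for s in subjects if s["exercises"] == stratum]
    treated = [s for s in in_stratum if s["drinks"]]
    control = [s for s in in_stratum if not s["drinks"]]
    gaps[stratum] = rate(treated) - rate(control)
    label = "exercisers" if stratum else "non-exercisers"
    print(f"{label}: treated {rate(treated):.3f}, "
          f"control {rate(control):.3f}, gap {gaps[stratum]:+.3f}")
```

Within each stratum the two groups are alike on the confounder, so any remaining gap can no longer be explained by exercise.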

With observational studies, you need a much bigger sample size than you do with randomized experiments, because each time you stratify on a possible confounder, the comparison groups get smaller. You can stratify on as many variables as needed, as long as each subgroup stays large enough to compare.
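The arithmetic behind the shrinking groups can be sketched as follows. This assumes, purely for illustration, that every confounder is binary and that subjects split evenly across all strata and across treatment and control, which real data rarely does. The function name `average_group_size` is hypothetical.

```python
def average_group_size(n_subjects, n_binary_confounders):
    """Average size of one treatment-or-control group inside one stratum.

    Assumes binary confounders and perfectly even splits (an
    idealization): k confounders give 2**k strata, and each stratum
    holds one treatment group and one control group.
    """
    strata = 2 ** n_binary_confounders
    return n_subjects / (strata * 2)


for k in range(5):
    size = average_group_size(1000, k)
    print(f"{k} confounders stratified: about {size:g} subjects per group")
```

With uneven splits, some groups will be smaller than this average, which is why the comparison can break down even sooner.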


Q1: Some observational studies show that people who drink energy drinks tend to get hurt more often. State whether the following is a confounder, causal link, neither, or both: Genetics - Some people are more genetically prone to injuries than others.

Q2: Some observational studies show that people who drink energy drinks tend to get hurt more often. State whether the following is a confounder, causal link, neither, or both: Harmful ingredients - Energy drinks have harmful ingredients that decrease white blood cells, which can lead to injuries.

Q3: Some observational studies show that people who drink energy drinks tend to get hurt more often. State whether the following is a confounder, causal link, neither, or both: Exercise - People who exercise are more likely to drink energy drinks and exercising makes people more prone to injuries.

Q4: Suppose we want to do an observational study to test the effectiveness of a drug that is supposed to improve focus. There are 30 subjects in this observational study; however, 20 subjects work from home and 10 work in the office. We think this could affect the response. How would you design this study and use stratification to compare the treatment and control groups at the end?

Q5: An observational study is best defined as:

Q6: Stratification is done at:

Author: Tish Haag
