Cause and Effect

Usually, the main reason we do a correlational study is to find evidence of a cause-and-effect relationship. Understanding cause and effect is crucial in data analysis, scientific research, and decision-making.

Types of Relationships

There are five types of relationships that can exist when a correlation is present:

  • A cause-and-effect relationship happens when a change in X produces a change in Y. For example, spending more time studying tends to raise exam scores.

  • A common-cause factor relationship happens when an external variable causes two variables to change in the same way. For example, a study finds a correlation between increased air conditioner sales and higher rates of drowning in the summer months. At first, this might suggest that air conditioners somehow increase the likelihood of drowning. However, the real common cause is hot weather: higher temperatures lead both to more people buying air conditioners and to more people swimming, which increases the risk of drowning.

  • A reverse cause-and-effect relationship happens when the dependent and independent variables are mistakenly switched, leading to an incorrect conclusion. For example, someone finds a strong correlation between low self-esteem and social media usage. At first, they might assume that spending more time on social media lowers self-esteem because users compare themselves to others too much. However, further analysis shows that people with low self-esteem are more likely to spend excessive time on social media as a means of escape or validation.

  • An accidental relationship happens when two variables show a correlation, but there is no actual connection between them. Such correlations occur purely by chance and should not be interpreted as meaningful. For example, the number of women enrolling in engineering programs and the number of reality TV shows increased over the same period. While these variables show a positive correlation, there is no logical connection between them.

  • A presumed relationship happens when two variables seem logically connected, but no clear cause-and-effect relationship or common-cause factor can be identified. For example, a study might find a correlation between playing chess and higher academic performance. The link seems reasonable, but no direct cause has been established. Presumed relationships are hard to prove but may still suggest interesting areas for further research.
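The accidental relationship described in the list can be illustrated with a short Python simulation. This is a sketch using made-up numbers, not real enrollment or TV data: the two series are generated independently of each other, and each simply grows over time with its own random noise. Because both trend upward over the same 20 hypothetical years, their correlation coefficient still comes out strongly positive. The helper pearson_r is written out by hand (an illustrative name, not a library function) so the example needs only the standard library.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
years = range(20)

# Hypothetical series generated INDEPENDENTLY of each other:
# each one just grows over time, plus its own random noise.
enrollment = [100 + 5 * t + rng.gauss(0, 10) for t in years]
tv_shows = [20 + 2 * t + rng.gauss(0, 4) for t in years]

r = pearson_r(enrollment, tv_shows)
print(f"r = {r:.2f}")  # strongly positive, yet neither series affects the other
```

Changing the seed changes the exact value, but a shared upward trend will usually keep r high. This is why a correlation between two quantities that both grow over time is weak evidence of any real connection.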


A study finds a correlation between the number of books people own and their intelligence levels. The researcher initially believes that owning books makes people smarter, but later discovers that highly intelligent people tend to buy more books. What type of relationship is this?

This is an example of a reverse cause-and-effect relationship. The researcher initially believes that owning more books makes people more intelligent, when in reality, more intelligent people tend to buy more books.


Correlation Versus Causation

You may have noticed the phrase "correlation does not imply causation" in previous lessons. It comes up often because one of the biggest mistakes in data interpretation is assuming that correlation implies causation.

  • Correlation means two variables move together, but one does not necessarily cause the other
  • Causation means one variable directly influences the other

An example of this would be data showing that ice cream sales and drowning rates increase together. This doesn't mean eating ice cream causes drowning! The real cause is summer heat, which leads people both to swim more and to buy more ice cream. The two variables rise together, but neither one causes the other.
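The ice cream example can be sketched as a small Python simulation. The model is an assumption made for illustration: daily temperature drives both ice cream sales and a hypothetical drowning rate, and neither of those two affects the other. The raw correlation between them comes out clearly positive, but the partial correlation (the correlation left after removing the linear effect of temperature) is close to zero. The helper names pearson_r and partial_r are illustrative, not from any library.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(42)
n = 365  # one hypothetical year of daily observations

# Assumed model: temperature drives BOTH variables, and
# ice cream sales have no effect at all on drownings.
temperature = [rng.uniform(10, 35) for _ in range(n)]
ice_cream = [3 * t + rng.gauss(0, 10) for t in temperature]
drowning_rate = [2 + 0.2 * t + rng.gauss(0, 1) for t in temperature]

raw = pearson_r(ice_cream, drowning_rate)
controlled = partial_r(ice_cream, drowning_rate, temperature)
print(f"r(ice cream, drownings)       = {raw:.2f}")         # clearly positive
print(f"same, controlling temperature = {controlled:.2f}")  # near zero
```

The partial correlation is one statistical way to account for a common-cause factor; it only removes the linear part of that factor's influence, so it is a rough check rather than proof.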

To show that one variable causes another, you need controlled experiments and appropriate statistical techniques, not just evidence that the two variables trend upward together.


What is the difference between correlation and causation?

Correlation means that two variables trend together, but not necessarily because one affects the other. Causation means that a change in one variable directly produces a change in the other.


Extraneous Variables

In simple terms, an extraneous variable is an outside factor that affects the dependent variable, the independent variable, or both, potentially misleading the results. These variables can make it seem like there is a causal relationship when, in reality, there isn't. For example, you might see a correlation between higher education and higher income. However, family wealth could be an extraneous variable, since it can influence both education and income levels.
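One way to see the effect of an extraneous variable is to hold it constant and look again. The Python sketch below uses invented data and an assumed model in which family wealth (coded 0 or 1) raises both years of education and income, while education has no direct effect on income. Across everyone, education and income are strongly correlated; within each wealth group, the correlation nearly disappears.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(7)

# Hypothetical people. Assumed model: family wealth (0 = lower, 1 = higher)
# raises both years of education and income (in thousands), while
# education has NO direct effect on income.
people = []
for _ in range(400):
    wealth = rng.choice([0, 1])
    education = 12 + 4 * wealth + rng.gauss(0, 1)
    income = 40 + 30 * wealth + rng.gauss(0, 5)
    people.append((wealth, education, income))

overall = pearson_r([p[1] for p in people], [p[2] for p in people])
print(f"everyone:   r = {overall:.2f}")  # strong positive correlation

# Hold the extraneous variable constant by looking within each wealth group.
within = {}
for level in (0, 1):
    group = [p for p in people if p[0] == level]
    within[level] = pearson_r([p[1] for p in group], [p[2] for p in group])
    print(f"wealth = {level}: r = {within[level]:.2f}")  # close to zero
```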


Control Groups

To reduce the effect of extraneous variables, researchers often compare an experimental group to a control group. The two groups should be as similar as possible, so that extraneous variables have about the same effect on both. The researchers then change the independent variable for the experimental group but not for the control group, and look for any difference in the dependent variable between the two groups. If a difference appears, it is likely due to the change in the independent variable.

For example, a study might test a new weight-loss pill. The experimental (treatment) group takes the actual pill, while the control group takes a placebo (fake) pill. If the treatment group loses noticeably more weight, the pill likely works.
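The pill example can be simulated to show why random assignment and a control group help. In this Python sketch every number is hypothetical: each volunteer has a baseline amount of weight they would lose anyway, the 100 volunteers are randomly split into two groups of 50, and the pill is assumed to add exactly 1 kg of extra loss. Because the random split spreads extraneous factors roughly evenly across both groups, the difference in group means lands near the assumed effect.

```python
import random
from statistics import mean

rng = random.Random(3)
N = 100

# Hypothetical volunteers: weight (kg) each would lose over the study
# period anyway, from diet, activity and other extraneous factors.
baseline = [rng.gauss(2.0, 1.5) for _ in range(N)]

# Random assignment spreads those extraneous factors evenly across groups.
ids = list(range(N))
rng.shuffle(ids)
treatment_ids, control_ids = ids[: N // 2], ids[N // 2 :]

ASSUMED_PILL_EFFECT = 1.0  # hypothetical extra kg lost because of the pill

treatment = [baseline[i] + ASSUMED_PILL_EFFECT for i in treatment_ids]
control = [baseline[i] for i in control_ids]  # placebo adds nothing

difference = mean(treatment) - mean(control)
print(f"treatment group: {mean(treatment):.2f} kg lost on average")
print(f"control group:   {mean(control):.2f} kg lost on average")
print(f"difference:      {difference:.2f} kg")  # near the assumed 1.0 kg
```

The difference will not equal 1.0 exactly, because the two random groups never have identical baselines; that leftover gap is the random variation a real study has to account for.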


Cause and Effect Techniques

When looking at a correlation, you need background knowledge and insight to recognize any causal relationships (where a change in X directly causes a change in Y) that may be present. To determine whether a correlation is the result of a cause-and-effect relationship, we can use the following techniques:

  • Use sampling methods that hold extraneous variables constant, or as constant as possible
  • Conduct similar investigations with different samples and check for consistency in the results
  • Remove, or account for, possible common-cause factors
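The second technique, repeating an investigation with different samples, can be sketched in Python. The study here is hypothetical: run_study (an illustrative name) draws a fresh sample of study hours and exam scores under an assumed model (about 3 extra points per hour, plus noise) and returns the correlation. Running it on five independent samples, then checking that every estimate is positive and that they fall within a narrow band, is one simple consistency check; the 0.4 band is an arbitrary choice for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def run_study(seed, n=50):
    """One hypothetical study of study time vs. exam score.

    Assumed model: each extra hour adds about 3 points, plus random noise.
    Returns the correlation between hours studied and exam score.
    """
    rng = random.Random(seed)
    hours = [rng.uniform(0, 10) for _ in range(n)]
    scores = [55 + 3 * h + rng.gauss(0, 8) for h in hours]
    return pearson_r(hours, scores)


# Repeat the same investigation on five independent samples.
results = [run_study(seed) for seed in range(5)]
for seed, r in enumerate(results):
    print(f"sample {seed}: r = {r:.2f}")

# Simple consistency check (the 0.4 band is arbitrary):
# every sample points the same way, and the estimates are similar in size.
consistent = all(r > 0 for r in results) and max(results) - min(results) < 0.4
print("consistent across samples:", consistent)
```

Consistent results across samples make a chance correlation less likely, but on their own they do not rule out a common-cause factor.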

A fitness researcher wants to determine whether a newly developed exercise regimen intended to increase cardiovascular endurance is effective. The study has 80 volunteers, all of whom begin with comparable levels of fitness. The researcher divides them into two groups, making sure each group has an equal distribution of ages and genders. One group follows the new exercise regimen, while the other keeps up its regular exercise routine. After eight weeks, the endurance of the first group has increased by 15% on average, while the second group shows an average increase of 7%.

  1. Identify the experimental group, the control group, the independent variable, and the dependent variable
  2. Can the researcher conclude the new workout program is effective? Why or why not?

i. We can identify the groups and variables as follows:

  • Experimental Group: The group that follows the new workout program.
  • Control Group: The group that follows the standard workout routine.
  • Independent Variable: The type of workout program (new program vs. standard routine).
  • Dependent Variable: Cardiovascular endurance improvement (measured as percentage increase).

ii. Since the new regimen produced a larger average improvement than the standard routine (15% versus 7%), the results suggest the program may be more effective.

However, the sample is small enough that the results could be affected by random fluctuations or by extraneous variables, such as the volunteers' past fitness habits.
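One way to check whether the 8-point gap could be due to chance is a permutation test, which asks how often randomly relabeling volunteers between the two groups produces a gap at least as large as the observed one. The study reports only group averages, so the Python sketch below generates hypothetical per-volunteer results: an even 40/40 split of the 80 volunteers and a standard deviation of 10 percentage points are both assumptions. A small p-value would suggest the gap is unlikely to come from chance alone, though it would not rule out extraneous variables.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(a, b, n_perm=5_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly relabel who is in which group
        diff = mean(pooled[: len(a)]) - mean(pooled[len(a) :])
        if abs(diff) >= abs(observed):
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never 0.
    return observed, (hits + 1) / (n_perm + 1)


# HYPOTHETICAL per-volunteer endurance gains (%). The study reports only the
# group averages (15% and 7%); the 40/40 split and SD of 10 are assumptions.
data_rng = random.Random(8)
new_program = [data_rng.gauss(15, 10) for _ in range(40)]
standard = [data_rng.gauss(7, 10) for _ in range(40)]

diff, p = permutation_test(new_program, standard)
print(f"observed gap: {diff:.1f} points, permutation p-value: {p:.4f}")
```

With real measurements, you would pass the recorded improvements to permutation_test in place of the simulated lists. How small p turns out depends heavily on the spread within each group, which is exactly the information the averages alone do not provide.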