Introduction
In the previous chapter, we learned how correlations reveal relationships between variables. We saw that when two variables are correlated, knowing someone's score on one variable helps predict their score on the other. But predicting something isn't the same as knowing what caused it.
Just because two variables are correlated doesn't mean that changes in one variable cause changes in the other. When behavioral scientists say, "A causes B," they mean that manipulating variable A will create changes in variable B through some underlying mechanism.
Understanding these mechanisms is the key to changing behavior. If anxiety causes depression, for instance, then treating anxiety might prevent depression. But, if anxiety and depression are merely correlated because both are caused by some other factor, then treating anxiety might not affect depression.
Behavioral scientists encounter tricky questions about causation all the time. For example, many studies have found correlations between increased social media use and higher rates of depression and anxiety among teenagers. This research has led to calls for limiting teens' access to social media. However, the causal relationship is complex (e.g., Valkenburg, 2022). Does social media use cause mental health problems? Or do teens who struggle with mental health use social media more? Perhaps both variables are influenced by other factors like social isolation, family dynamics, or personality.
The stakes of answering these questions are high. Parents and schools invest millions of dollars in programs limiting teenagers' screen time and tech companies face calls for regulation. If the relationship between social media and mental health is not causal, these interventions will waste resources and fail to address teen mental health.
There are two reasons why correlational research struggles to establish causation. The first is the directionality problem. When two variables are correlated, it is often unclear which variable influences the other. The second problem is the third-variable problem; two variables might be correlated because both are affected by something else—a third variable.
In this chapter, we will explore how behavioral scientists move beyond simple correlations to build a case for causation when experiments are not an option. Throughout the chapter, we will apply the techniques of causal inference in a series of guided analyses with the same clinical dataset you used in past chapters.
In Module 6.1, we will dig into the third-variable problem. We will learn how to use covariates in statistical analyses to examine relationships while holding potential confounding variables constant. Then in Module 6.2, we will tackle the directionality problem, introducing longitudinal research. We will learn how measuring variables at multiple points in time helps researchers establish temporal precedence, something that helps build the case for a causal relationship.
Finally, Module 6.3 will give you the opportunity to design a correlational study that incorporates techniques of causal inference. You will develop hypotheses about causal relationships, plan appropriate statistical controls, and consider how these designs can strengthen your causal claims. By the end of the chapter, you will understand how behavioral scientists build evidence for causality in correlational research, and you will have practical experience applying these techniques to real data.
Chapter Outline
Controlling for Third Variables
Explore how statistical controls help researchers rule out third variables and make stronger claims about causal relationships in data.
The Directionality Problem
When behavioral scientists discover a correlation between two variables, it raises an important question: Does this relationship reflect a causal connection or is it explained by other factors? Internal validity refers to how confident researchers can be that the effect they have observed is due to a cause-and-effect relationship, rather than the result of unmeasured or confounding variables.
Internal validity is often at the heart of debates about how to apply the results of research to problems in daily life, as in this example.
Decades of research have found a correlation between playing violent video games and aggressive behavior in adolescents (e.g., Anderson & Bushman, 2001). Based on this research, some groups have called for regulating video games. But, the issue of internal validity raises important questions. First, do video games lead teenagers to behave aggressively or, perhaps, do teenagers who are prone to aggressive behavior prefer to play violent games? This question is referred to as the directionality problem (see Figure 6.1).
Whenever researchers observe a correlation between two variables, the causal effect could go in either direction. The mere existence of a correlation between playing violent video games and aggression does not allow us to disentangle which variable is the cause and which is the effect. Indeed, research on this topic articulates two competing hypotheses. The socialization hypothesis states that playing violent video games increases aggression over time. Meanwhile, the selection hypothesis states that teenagers who are already aggressive are more likely to choose violent video games.
Research Portfolio
Portfolio Entry #19: Digging into the Directionality Problem
- Describe the differences in how the socialization and selection hypotheses explain the correlation between playing violent video games and violent behavior. How is this an example of the directionality problem?
- Conduct a Google Scholar search to find an article that supports either the socialization hypothesis or the selection hypothesis. Briefly describe the methodology used in the article that led to these conclusions. Paste a link or reference to the article in your portfolio.
The Third Variable Problem
Separate from the directionality problem, there is another reason that the correlation between violent video games and aggressive behavior cannot establish causation: the third-variable problem. The third-variable problem is the idea that something besides violent video games and aggressive behavior, such as a lack of parental involvement, could cause the correlation (see Figure 6.2). If such a third variable exists, then there is no causal relationship between violent video games and aggressive behavior—they are only correlated because both are caused by a lack of parental involvement. And, of course, if playing violent video games does not cause aggressive behavior, then regulating violent games will do little to reduce aggressive behavior (Przybylski & Weinstein, 2019).
Consider how parental involvement might explain the observed correlation between violent video games and aggressive behavior. Parents who are less involved in their children's lives may be less likely to monitor their children's media consumption, allowing them to play violent video games without supervision or time limits. These same parents may also be less likely to teach emotional regulation skills, set consistent behavioral boundaries, or notice early signs of aggressive behavior that could be addressed through intervention. Additionally, children with less parental involvement may experience more stress, less emotional support, and fewer structured activities, all factors that could independently contribute to aggressive behavior.
This pattern reflects a broader phenomenon known as polyvictimization. Children who experience one risk factor (like inadequate supervision) tend to experience multiple, co-occurring challenges (Finkelhor et al., 2007). These might include exposure to family conflict, peer rejection, academic struggles, neighborhood violence, and bullying (Litman et al., 2015).
In this scenario, both the exposure to violent video games and the aggressive behavior, in addition to numerous other factors such as bullying, being the victim of neighborhood violence, academic struggles, and family conflict, would all be correlated together and would be symptoms of the underlying cause of insufficient parental involvement, rather than causing each other. This illustrates why researchers must carefully consider third variables when interpreting correlational findings—what may appear to be a direct relationship between two variables may actually be the result of an unmeasured cause.
Research Portfolio
Portfolio Entry #20: Digging into the Third Variable Problem
- Describe how parental involvement can cause children to both play more violent video games and be more aggressive. How does this demonstrate the third-variable problem?
- Conduct a Google Scholar search to find an article that argues that there is no causal link between violent video games and aggression. What is their evidence?
Why Control Matters: Strengthening Internal Validity
There are two main strategies researchers can use to strengthen the internal validity of correlational findings. The first is to systematically identify and control for potential third variables. Controlling third variables helps rule out alternative explanations and builds a stronger case for causality. The second is to conduct longitudinal research. Longitudinal research gathers data from participants at several points over time, helping establish which variable is more likely to cause the other. We will examine how researchers control for third variables in this module before turning to longitudinal research in the next module.
A Thought Experiment: How to Control for Third Variables
One way to understand the logic of controlling for third variables is to engage in a thought experiment. Let's take the example of violent video games and aggression.
Suppose you think parental involvement explains this relationship. To rule out that possibility, you could design a study that includes only teenagers whose parents are highly involved in their lives. In other words, every child in the study would have parents who are maximally involved, as measured by a validated scale.
Now, if you find that violent video games predict higher aggression among this group of teenagers, you would have reason to believe the relationship is not due to a lack of parental supervision. That is because transforming parental involvement from a variable into a constant eliminates its ability to explain differences in aggression. It's sort of like trying to figure out why some plants in your garden grow taller than others. If they all get the same amount of water, then water cannot be the reason for height differences. In addition, if parental involvement were the cause of the games-aggression relationship, then removing its influence by making it identical for everyone should make that relationship disappear. If violent games still predict higher aggression even when parental involvement is the same across all teens, then parental involvement wasn't the real explanation. Something else was.
While it is rarely practical to control variables by selecting only children with maximally involved parents in real life, researchers can achieve the same goal statistically. Instead of selecting only participants who are equal on some variable, such as parental involvement, statistical techniques allow researchers to ask: What would the relationship between two variables look like if everyone were equal on a third variable? That is the principle behind the analysis we will conduct next.
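One way to see this logic in action is with a small simulation. The Python sketch below uses entirely made-up numbers, not data from any real study: it builds a world where low parental involvement causes both violent-game play and aggression. The two outcomes are strongly correlated overall, but when involvement is held nearly constant, the correlation all but vanishes:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Invented variables: low parental involvement drives BOTH outcomes,
# and the outcomes have no direct effect on each other.
N = 4000
involvement = [random.gauss(0, 1) for _ in range(N)]
games = [-p + random.gauss(0, 1) for p in involvement]       # more play when involvement is low
aggression = [-p + random.gauss(0, 1) for p in involvement]  # more aggression when involvement is low

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

r_overall = pearson(games, aggression)

# "Hold involvement constant": keep only cases with nearly identical involvement
idx = [i for i in range(N) if abs(involvement[i]) < 0.15]
r_constant = pearson([games[i] for i in idx], [aggression[i] for i in idx])

print(f"overall r = {r_overall:.2f}; r with involvement held constant = {r_constant:.2f}")
```

The variable names and effect sizes are hypothetical; the point is only that making a common cause constant removes the correlation it created, which is exactly what the thought experiment predicts.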
Controlling Third Variables with ANCOVA: Marital Status, Depression, and Age
Let's examine how this works using a finding from the clinical dataset we worked with in past chapters. The finding involves the relationship between marital status and depression. In the dataset, single people reported higher depression (M = 7.4) than those who were married (M = 5.8). This difference is statistically significant (t = 2.5, p < .05), indicating an association between depression and marital status (Figure 6.3). You can reproduce these results by following the instructions in Box 6.1.
Box 6.1: Instructions for conducting the t-test on depression and marital status
Open the Dataset
- Open SPSS and load the "RITC_DATA_CH06_ClinicalStudy.sav" dataset from the Chapter 6 folder on OSF
Filter the Data to Include Only Single and Married Participants
- Click on Data → Select Cases
- Select "If condition is satisfied" and click on "If..."
- In the formula box, enter: Mar = 1 | Mar = 6. This selects cases where marital status is either 1 [Single] or 6 [Married].
- Click "Continue" and then "OK"
Run an Independent Samples t-test
- Click on Analyze → Compare Means → Independent-Samples T Test
- Move Depression SCORES to the "Test Variable(s)" box
- Move mar (MaritalStatus) to the "Grouping Variable" box
- Click "Define Groups" and set groups (1 = Single, 6 = Married)
- Click OK to run the analysis
Create a Bar Chart
- Click on Graphs → Chart Builder → Bar
- Move mar (MaritalStatus) to the "X Axis" box
- Move Depression SCORES to the "Y Axis" box and select "Mean" as the statistic
- Click "Display Error Bars" and select "Standard Error." Change the multiplier to 1
- Click "OK" to create the chart
Remove Empty Categories in Chart Editor
- Double-click on the chart to open the Chart Editor
- Click on any of the category labels on the x-axis to select them
- Double-click on a label again and select "Categories..." from the pop-up menu
- In the Categories dialog box, select each empty category (e.g., divorced) and then click the red "X" to delete it.
- Remove all categories except for Married and Single
- Click "Apply" and then "Close" to update the chart
At face value, you might interpret this result to mean that something about being married protects people from depression. In other words, you might assign a causal explanation.
After observing any association, however, it is important to think about potential third variables. In the case of marriage and depression, what third variables might explain this effect? Before reading on, see if you can come up with a plausible third variable explanation.
Stop and Discuss!
- Look at Figure 6.3. What does it mean to say that there is an apparent causal relationship between marriage and depression?
- Thinking of the relationship between marriage and depression, what third variable(s) could explain the relationship between marriage and depression? (Examine Figure 3.2 for a hint).
Here is a potential explanation for the relationship between marriage and reduced depression that does not invoke marriage as a cause of lower depression. You may remember from Chapter 3 that the National Institute of Mental Health survey found older adults tend to be less depressed than younger adults (see Figure 3.2). We also saw something similar in the previous chapter where age was negatively correlated with depression (see Figure 5.5).
At the same time, we know age is correlated with marital status; people start life single and some later get married. Indeed, the average age of married people in the analysis shown in Figure 6.3 was 44.5 years old while the average age of single people was 36.5 years old, an eight-year difference. Given that people are less depressed as they get older and they get married as they get older, it is possible married people are less depressed because they are older, not because they are married.
The relationship in Figure 6.4 identifies the problem: age is a plausible third variable that might explain why married people report lower depression than single people. In this situation, age is referred to as a covariate—a third variable that influences both marital status and depression. If that is the case, we cannot be confident that marriage is the cause of the observed difference in depression. So how do we figure out whether age is driving the effect? The answer lies in the logic we applied to the video game example.
Let's imagine a controlled version of a study in which researchers hold age constant. If researchers conducted a study and sampled only people who are exactly 40 years old, then they could compare depression scores between married and single people. Because everyone in the study would be the same age, any differences in depression could not be due to age. If the researchers found that married 40-year-olds were less depressed than single 40-year-olds, they would know age does not account for differences in depression. If, on the other hand, they found that married 40-year-olds were just as depressed as single 40-year-olds, it would suggest that age—and not marriage—explains the original relationship.
Figure 6.5 shows a visual representation of the hypothetical design. The key to this design is understanding that age has been made into a constant rather than a variable. Everyone in the study is the same age. As you may remember from Chapter 3, constants do not change across people in a study. By making age a constant, researchers can remove it as an explanation for any differences observed within the study.
Research Activity 6.1: Statistically Controlling for Third Variables
Designing studies the way we imagined in the thought experiment—where researchers select only people of the same age—is almost never done in practice. That is because recruiting people who are exactly the same age or exactly the same on any other characteristic would be extremely difficult. Instead, researchers use statistical techniques that mathematically transform a variable into a constant, allowing them to examine relationships as if everyone in the sample were the same age.
When age is added as a covariate, the statistical software calculates what the results would look like if everyone in the sample were the same age. Let's look at how this works in an analysis.
The analysis required to examine whether married people are less depressed than single people while statistically controlling for age is called an ANCOVA—Analysis of Covariance. An ANCOVA is appropriate when the predictor is categorical (marital status), the outcome is continuous (depression), and the variable being controlled for is also continuous (age). ANCOVA will tell us what the mean difference between the married and single groups would be if everyone in the sample were the same age and whether that adjusted difference is statistically significant.
You can follow the steps in HOW TO Box 6.2 to run the ANCOVA and create the bar chart required for this activity or you can watch the accompanying video: https://bit.ly/CH6ancova.
When age is controlled for in the analysis, the difference in depression between married (M = 6.10) and single (M = 6.97) people shrinks and is no longer statistically significant (p > .05).
Figure 6.6 shows what depression scores would look like if everyone in the sample were exactly forty-one and a half years old (the average age of the sample). Notice there is still a difference in depression between groups, but the difference is much smaller than in the original analysis. The smaller difference between groups and loss of statistical significance suggests that age, rather than marital status itself, might better explain why married people are less depressed. Specifically, our results suggest that married people in our sample are less depressed because they are older.
This example illustrates a broader point about correlational research. While behavioral scientists cannot establish causation through correlational methods, controlling for third variables allows them to test hypotheses about whether specific third variables are playing a role. Sometimes, controlling for a third variable reveals that what appeared to be a meaningful relationship is better explained by other factors (as in the example above). At other times, the original relationship remains strong even after controlling for third variables.
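For readers curious about the arithmetic behind the ANCOVA adjustment, the Python sketch below reproduces its core logic on a fabricated eight-person example (not the clinical dataset). The adjusted group difference an ANCOVA reports is equivalent to relating the parts of depression and marital status that are left over after age has been regressed out of each (the Frisch-Waugh logic):

```python
# Fabricated illustration: depression falls with age and has NO direct
# link to marriage; married people simply tend to be older.
age     = [25, 30, 35, 40, 45, 50, 55, 60]
married = [0, 0, 0, 0, 1, 1, 1, 1]
dep     = [7.5, 7.0, 6.6, 6.0, 5.4, 5.1, 4.4, 4.0]

def residuals(y, x):
    """Residuals of y after a simple regression on x (the part of y not explained by x)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]

# Raw group difference in depression (single minus married)
singles = [d for d, m in zip(dep, married) if m == 0]
marrieds = [d for d, m in zip(dep, married) if m == 1]
raw_diff = sum(singles) / len(singles) - sum(marrieds) / len(marrieds)

# Age-adjusted "marriage effect": regress age out of BOTH variables,
# then relate the leftovers to each other
res_dep = residuals(dep, age)
res_mar = residuals(married, age)
adj_coef = sum(rm * rd for rm, rd in zip(res_mar, res_dep)) / sum(rm ** 2 for rm in res_mar)

print(f"raw difference: {raw_diff:.2f}; age-adjusted marriage effect: {adj_coef:.2f}")
```

Because the fabricated depression scores depend only on age, the raw difference is large while the age-adjusted effect is close to zero, mirroring what the ANCOVA shows for the real data.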
Box 6.2: Control for Third Variables Using ANCOVA in SPSS
Open the Dataset
- Open SPSS and load the "RITC_DATA_CH06_ClinicalStudy.sav" dataset if it is not already open
Filter the Data to Include Only Single and Married Participants
- Click on Data → Select Cases
- Select "If condition is satisfied" and click on "If..."
- In the formula box, enter: Mar = 1 | Mar = 6. This selects cases where marital status is either 1 [Single] or 6 [Married].
- Click "Continue" and then "OK"
Run an ANCOVA analysis
- Click on Analyze in the top menu
- Select "General Linear Model" → "Univariate"
- Move Depression SCORES into the Dependent Variable box
- Move mar (MaritalStatus) into the "Fixed Factor(s)" box
- Move "Age" into the "Covariate(s)" box
- Next, select "EM Means..."
- Choose "mar" (Marital Status) and move it into the "Display Means for:" box
- Select "Compare main effects" and then within the "Confidence interval adjustment:" box use the dropdown to select "Bonferroni"
- Before running the analysis, create a bar chart by selecting "Plots..."
- Move "mar" from the "Factors:" box over to the "Horizontal Axis": box
- Select "Add"
- Change the chart type to "Bar Chart"
- Click the box to "Include Error Bars" and select "Standard Error"
- Change the standard error multiplier to "1." Then click "Continue"
- Click OK to run the analysis
Each time the original relationship withstands the scrutiny of controlling for another third variable, it increases the researcher's confidence that there is a meaningful effect. For example, behavioral scientists studying whether violent video games cause aggressive behavior in adolescents have controlled for many third variables, including parental supervision, socioeconomic status, age and developmental stage, competitiveness, mental health, the influence of peers, and several others (e.g., Adachi & Willoughby, 2011; Anderson & Bushman, 2002; Greitemeyer & Mügge, 2014; Markey & Markey, 2010). Over the years, the association between violent video games and aggressive behavior has remained. And, the ability to control for third variables in this way is a fundamental part of the research process because it helps build a stronger case for causality in correlational research.
Research Portfolio
Portfolio Entry #21: Describing the Relationship Between Marital Status and Depression after Controlling for Age
Once you have conducted the analysis above, paste the ANCOVA output to your portfolio. Also paste the bar graph of the original difference in depression between married and single people and the bar graph where age was controlled for.
Write a few sentences describing what the output shows about the relationship between age, depression, and marital status. Is depression associated with marital status after age is controlled for? Why?
Controlling for Third Variables with Regression
In the previous section, we saw how controlling for age led to a better understanding of the relationship between marital status (a categorical variable) and depression (a continuous variable). The same principle applies when examining correlations between two continuous variables, but a different statistical analysis is used.
Let's return to the correlation between anxiety and depression (r = .82), which are both continuous variables. While it might be tempting to conclude that anxiety causes depression, or perhaps that depression causes anxiety, it is possible that some other factor causes both anxiety and depression.
For instance, one plausible explanation for the anxiety-depression relationship involves trauma (Figure 6.7). People who live through traumatic events often experience distress and develop both anxiety and depression as a result. Common traumatic events include divorce, job loss, serious financial problems, academic failure, relationship breakups, family conflict, moving away from home, serious illness, being bullied or socially excluded, and the end of a close friendship. If trauma causes both anxiety and depression, then the correlation between depression and anxiety might arise because both are caused by a history of trauma.
To test this idea, we could use a statistical technique called multiple regression. Multiple regression is like a correlational analysis but with an additional capability: it allows researchers to examine how two variables correlate while controlling for other, third variables. Just as controlling for age in the previous example showed what the marriage-depression relationship would look like if everyone were the same age, regression allows researchers to see what the relationship between anxiety and depression would look like if everyone in the sample had experienced the same level of trauma. In this example, we will examine what the correlation between anxiety and depression would be if everyone had the same trauma score (for instance, no trauma at all). In other words, the analysis calculates the anxiety-depression correlation when trauma is not playing a role. It does so with something called a partial correlation.
Partial Correlations Explained
When researchers conduct a regression analysis, an important piece of information they get is called a partial correlation. A partial correlation tells us how strongly two variables are related after statistically holding other variables constant. Sometimes the partial correlation becomes much smaller than the original correlation and may no longer be statistically significant. This indicates that the variable the researcher is controlling for explains a lot of why the two variables were related. Other times, the partial correlation remains close to the original correlation, suggesting that the variable the researcher is controlling for does not explain much about why the two variables are related.
Looking at the size of the original correlation compared to the partial correlation helps explain whether the covariates are playing an important role in the relationship. If controlling for a third variable (trauma) causes the original correlation (anxiety–depression) to become much weaker or disappear entirely, it suggests the third variable might be the real reason the other two variables are related. If the correlation remains strong, even after controlling for other variables, it suggests there is a relationship that exists independently of the third variables that were controlled for. Let's look at an example.
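For the statistically curious, a first-order partial correlation can be computed directly from three ordinary correlations using a standard formula. In the Python sketch below, the .82 is the chapter's anxiety-depression correlation, but the .50 correlations with trauma are invented for illustration (and chosen so the result lands at the chapter's rp = .76):

```python
from math import sqrt

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation: the x-y correlation holding z constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# r_xy = anxiety-depression (.82, from the chapter);
# r_xz and r_yz = hypothetical correlations of each with trauma (invented)
r_partial = partial_corr(0.82, 0.50, 0.50)
print(f"partial r = {r_partial:.2f}")  # prints: partial r = 0.76
```

The formula makes the logic visible: the product r_xz × r_yz is the part of the x-y correlation that trauma could account for, and it is subtracted out before rescaling.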
Research Activity 6.2: Multiple Regression in Action
In this activity, we will see how the anxiety-depression relationship changes when controlling for people's past traumatic experiences. To run the regression analysis, you can either follow the steps in HOW TO Box 6.3 or watch the video for this activity: https://bit.ly/CH6multr.
Box 6.3: Control for Third Variables Using Regression in SPSS
Open the Dataset
- Open SPSS and load the "RITC_DATA_CH06_ClinicalStudy.sav" dataset if it is not already open
Run a linear regression analysis
- Click on "Analyze" in the top menu
- Select "Regression" → "Linear"
- Move Depression SCORES into the Dependent Variable box
- Move both the predictor variable "Anxiety" and the control variable "Trauma" into the "Independent(s)" box
Request partial correlations
- In the Linear Regression dialog box, click on "Statistics"
- Check "Part and partial correlations"
- Click "Continue"
- Back in the main dialog box, click "OK" to run the analysis
Interpret your results
- In the "Coefficients" table, look at the significance values (Sig.) for each predictor
- If your predictor variable (e.g., Anxiety) remains significant (p < .05) even with the control variable in the model, this suggests a robust relationship
- Look at the "Correlations" section of the output, which shows:
  - Zero-order correlations (the original correlation without controlling for other variables)
  - Partial correlations (the relationship after controlling for other variables)
- Compare the zero-order correlation with the partial correlation to see how much the relationship changes after controlling for the third variable
As a reminder, the original correlation between anxiety and depression was r = .82. When we conduct the regression analysis and receive the partial correlations—what the correlation would be if everyone in the sample had no trauma-related distress—we see the anxiety-depression correlation is .76, which remains statistically significant (Figure 6.8). This tells us two important things.
First, trauma explains some of the relationship between anxiety and depression. You can see this in the reduced correlation, from .82 to .76. However, the partial correlation remains quite strong (.76), which tells us the second important thing: anxiety and depression have a robust relationship that exists even after controlling for trauma. This means that even if everyone in the sample had no trauma-related distress, there would still be a strong correlation between anxiety and depression.
Reporting Results: What Does the Regression Show?
Here is an example of how to report the results of this analysis. You can use this as a template for reporting your own results.
"I conducted a multiple regression analysis to examine the relationship between anxiety and depression while controlling for trauma-related distress. The zero-order correlation between anxiety and depression was strong, r = .82, p < .001. After statistically controlling for trauma-related distress, the partial correlation between anxiety and depression decreased slightly but remained significant, rp = .76, p < .001. These results suggest that while trauma explains part of the association, a strong relationship between anxiety and depression persists even after accounting for trauma."
Controlling for Multiple Third Variables at Once
Of course, trauma-related distress is not the only variable that might explain the relationship between anxiety and depression. Poor sleep could play a role: people who sleep poorly often experience both anxiety and depression. Age might also be important. As we saw earlier, age relates to both depression and anxiety. Finally, socioeconomic factors like income and education might contribute, since they affect both anxiety and depression.
Multiple regression allows researchers to control for all these variables simultaneously. What does this mean? Rather than controlling for each variable one at a time, it is possible to statistically calculate what the correlation between anxiety and depression would look like if everyone in the sample had the same trauma score AND the same sleep quality AND the same age AND the same income AND the same education level (Figure 6.9). It is just like the examples we have seen, but now the analysis holds several variables constant at once. Let's give it a try.
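The arithmetic generalizes in a neat way: to hold two variables constant, partial the first covariate out of every correlation, then apply the first-order formula again to those partialed values. The Python sketch below uses invented input correlations, not values from the clinical dataset:

```python
from math import sqrt

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation: the x-y correlation holding z constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def partial_corr2(r_xy, r_xz, r_yz, r_xw, r_yw, r_zw):
    """Second-order partial: the x-y correlation holding BOTH z and w constant,
    computed by applying the first-order formula to first-order partials."""
    r_xy_z = partial_corr(r_xy, r_xz, r_yz)  # x-y, holding z
    r_xw_z = partial_corr(r_xw, r_xz, r_zw)  # x-w, holding z
    r_yw_z = partial_corr(r_yw, r_yz, r_zw)  # y-w, holding z
    return partial_corr(r_xy_z, r_xw_z, r_yw_z)  # now also hold w

# All input correlations below are hypothetical illustrations
# (x = anxiety, y = depression, z and w = two covariates)
r = partial_corr2(0.82, 0.50, 0.50, 0.40, 0.40, 0.30)
print(f"anxiety-depression r, holding two covariates constant: {r:.2f}")
```

Statistical software repeats this kind of adjustment for every covariate in the model, which is why the output can report the anxiety-depression relationship with trauma, sleep, income, and education all held constant at once.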
Research Activity 6.3: Anxiety and Depression, Controlling for Trauma, Sleep, and More
To conduct this analysis, return to the Clinical dataset. Then, you can follow the steps in HOW TO Box 6.4 or watch the video for this activity: https://bit.ly/Ch6cmtv.
When you control for all variables in the analysis simultaneously, the partial correlation between anxiety and depression drops to .656 (Figure 6.10). This drop is larger than when controlling for trauma alone—the partial correlation then was .76. The reduction in the correlation indicates that the additional variables in the analysis explain some of why anxiety and depression are related.
Box 6.4: Control for Multiple Third Variables Using Regression in SPSS
Open the Dataset
- Open SPSS and load the "RITC_DATA_CH06_ClinicalStudy.sav" dataset if it is not already open
Run a multiple regression analysis
- Click on "Analyze" in the top menu
- Select "Regression" → "Linear"
- Move "Depression" into the "Dependent" box
- Move all variables into the "Independent(s)" box:
  - Anxiety (main predictor of interest)
  - Trauma (third variable #1)
  - Sleep (third variable #2)
  - Income (third variable #3)
  - Education (third variable #4)
Request partial correlations
- In the Linear Regression dialog box, click on "Statistics"
- Check "Part and partial correlations"
- Click "Continue"
- Back in the main dialog box, click "OK" to run the analysis
Interpret your results
- In the "Coefficients" table, look at the significance values (Sig.) for each predictor. Focus on the partial correlation value for Anxiety (your main predictor of interest)
- Compare this partial correlation to the zero-order correlation to see how much the relationship changes after controlling for all these variables simultaneously
The analysis also reveals that several control variables have their own relationships with depression. Sleep quality, for instance, shows a significant partial correlation of .419 with depression, suggesting that poor sleep is independently related to depression even after controlling for everything else, including trauma and anxiety. Trauma has a smaller but still significant relationship with depression (partial correlation = .157). Income, however, is not significant once the other factors are controlled for.
Yet even after accounting for all the control variables in this analysis—essentially asking what the anxiety-depression relationship would look like if everyone were identical on all the control characteristics—the partial correlation of .656 remains high and statistically significant. This suggests that while trauma, sleep, income, and education together explain part of the relationship between anxiety and depression, there is something robust about the relationship between anxiety and depression that exists independently of these other factors.
Research Portfolio
Portfolio Entry #22: Reporting What these Controls Reveal about Anxiety and Depression
Once you have conducted the analysis, paste the regression output into your portfolio. The output should include the original (zero-order) correlation and the partial correlation.
Write a few sentences describing what the output shows about the relationship between depression, anxiety, and trauma-related distress. Interpret the partial correlation in your own words. Is trauma-related distress the likely cause of depression and anxiety?
Why Statistical Control Strengthens Causal Claims
Throughout this module, we have explored how behavioral scientists address the third-variable problem in correlational research. Whenever researchers discover a relationship between two variables—like marriage and depression or anxiety and depression—they must consider whether other factors might explain the relationship. By statistically controlling for potential third variables, it is possible to build stronger evidence for causal claims.
We have examined how researchers do this with two examples. First, we found that the relationship between marital status and depression became non-significant after controlling for age. This suggests that age, rather than marriage itself, might explain why married people in our sample reported lower levels of depression. Second, we discovered that while trauma, sleep quality, income, and education explain some of the relationship between anxiety and depression, a strong association remains even after controlling for these factors.
The statistical techniques introduced thus far are powerful tools. They allow researchers to explore the potential causes of the associations they observe. Would married and single people still differ in depression if they were the same age? Would anxiety and depression still be related if everyone had the same level of trauma or the same quality of sleep?
The answers to these questions bring researchers closer to understanding causal relationships, even within correlational designs. When a relationship disappears after controlling for a third variable (as with marriage and depression), it suggests the original relationship may have been spurious. When a relationship remains robust after controlling for several alternative explanations (as with anxiety and depression), however, researchers gain more confidence in the findings' importance—even though they still cannot definitively claim causation.
Ruling out third variables is essential when conducting research. Whether researchers are studying the effects of social media on teen mental health, the relationship between video games and aggressive behavior, or the connection between exercise and mood, statistical control techniques help separate genuine relationships from those better explained by other factors. In the next module, we will build on these methods by exploring how to address another key challenge in establishing causality: determining which variable comes first.
The Directionality of Cause and Effect
Examine the directionality problem by learning about temporal precedence and how longitudinal designs can strengthen causal interpretations.
The Directionality Problem: Which Comes First?
In the previous section, we learned how researchers address the problem of third variables. But correlational research also faces the directionality problem. When two variables are correlated, it is often unclear which variable causes changes in the other. Let's explore how researchers tackle directionality by returning to the relationship between anxiety and depression.
As we have discussed, there is a strong positive correlation between anxiety and depression. Yet the problem of directionality asks: does anxiety lead to depression, or does depression lead to anxiety (Figure 6.11)?
Think about how this might play out in the life of a friend. Maybe your friend experienced anxiety about an upcoming exam, which led to difficulty sleeping and concentrating. As their anxiety persisted, they started feeling hopeless about their academic performance, lost interest in activities they usually enjoy, and developed symptoms of depression. In this case, anxiety preceded and potentially contributed to depression.
But the opposite is equally plausible. Perhaps your friend first experienced depression, feeling unmotivated and struggling to keep up with coursework. As the assignments piled up, they became increasingly anxious about falling behind, developing symptoms of anxiety that were not present before. In this case, depression led to anxiety.
The question of directionality has practical implications because it determines which interventions are likely to be effective. If anxiety leads to depression, then preventing or treating anxiety early might help prevent depression. But if the causal direction runs the other way, then early intervention for anxiety will do little to prevent depression.
Behavioral scientists face the question of directionality whenever they find a correlation between two variables. For example, does social media use lead to loneliness or does loneliness lead to social media use (Keles et al., 2020)? Do violent video games increase aggression or do people who are naturally aggressive choose violent games (Anderson & Bushman, 2001)? Does exercise improve mood or are people in better moods more likely to exercise (Hyde et al., 2011)? A correlation cannot answer these questions alone.
Why Temporal Precedence Matters
For one thing to cause another, the cause must come before the effect. This simple idea—called temporal precedence—is a requirement of causality. If researchers want to say depression causes anxiety, they need to show that increases in depression precede increases in anxiety. Similarly, if anxiety causes depression, increases in anxiety must precede increases in depression.
A simple correlation between anxiety and depression measured at one point in time does not tell us anything about what came first. All that a correlation reveals is that two variables tend to occur together. It says nothing about which variable developed first or whether changes in one variable preceded changes in another.
To establish temporal precedence, variables need to be measured at different points in time. To do that, behavioral scientists often turn to longitudinal designs.
Establishing Temporal Precedence with Longitudinal Research
Longitudinal research examines how variables relate to each other over time. Rather than measuring variables just once, longitudinal studies measure the same variables multiple times over a period of days, weeks, months, or even years. Longitudinal research makes it possible to untangle how the variables relate to each other over time.
Let's return to the example of anxiety and depression. Instead of measuring these variables once, imagine a researcher measured them twice—once at the beginning of the semester (Time 1) and again at the end of the semester (Time 2). The same people would participate in both Time 1 and Time 2 measurements. This design opens powerful possibilities for understanding how anxiety and depression relate to each other.
To demonstrate the possibilities, we gathered data from 480 Connect participants over a one-year period. Each participant completed measures of anxiety and depression at two timepoints, one year apart. With this dataset, we are interested in whether depression at Time 1 predicts increased anxiety at Time 2 and vice versa: does anxiety at Time 1 predict depression at Time 2?
Correlations between two different variables measured at different time points are called cross-lag correlations, while a correlation between two variables at a single point in time is called a cross-sectional correlation (see Figure 6.12). When we examine the cross-lag correlation between depression at Time 1 and anxiety at Time 2, we see that depression at Time 1 predicts anxiety one year later. As Figure 6.13 shows, in a simple correlational analysis there is a strong correlation between depression at Time 1 and anxiety at Time 2.
The cross-lag correlation establishes that depression scores predict anxiety scores one year later. This is important because prediction is a requirement of causation: for one variable to cause another, it must predict that variable. But the correlation alone doesn't show that depression at Time 1 actually caused the anxiety at Time 2. Here's why: people who are depressed at Time 1 also tend to be anxious at Time 1. So, when we see that Time 1 depression predicts Time 2 anxiety, we cannot tell whether it's because (a) depression leads to anxiety over time, or (b) people who were already anxious at Time 1 simply stayed anxious at Time 2. In other words, the relationship might just reflect the fact that anxious people tend to stay anxious, not that depression causes anxiety.
To answer the questions above requires statistical controls. By controlling for anxiety at Time 1, we can examine whether higher levels of depression predict future increases in anxiety among people who started with no anxiety (Figure 6.14).
Let's consider a simplified version of the study to clarify the statistical analysis. Imagine that in September of 2024, we recruited participants who varied in their levels of depression but everyone reported no symptoms of anxiety. In other words, every participant had an anxiety score of zero as measured by the GAD-7. One year later, in September of 2025, we brought the same participants back and measured their levels of depression and anxiety. Suppose we found that many of the participants who had no anxiety in 2024 now showed signs of anxiety. We could then conduct a cross-lag correlation to examine whether depression in 2024 predicted the rise in anxiety one year later.
If we found that people who were more depressed in 2024 were more likely to develop anxiety in 2025 under these conditions, we would have strong evidence of temporal precedence. In this scenario, depression was present before anxiety developed. Importantly, if we statistically hold anxiety constant in 2024—meaning we compare people who all started out with no anxiety—and still find that depression predicts anxiety in 2025, then we can conclude that depression both precedes and predicts anxiety. That is, people who began with similar levels of anxiety but differed in depression show different anxiety outcomes a year later, with those higher in initial depression experiencing greater increases in anxiety. This pattern would provide strong evidence that depression precedes the development of future anxiety.
This is what the regression analysis we will conduct below accomplishes. It calculates what the cross-lag correlation between depression in 2024 and anxiety in 2025 would be if everyone had no anxiety in 2024.
Research Activity 6.4: Correlation Between Anxiety and Depression One Year Later: Cross-Lagged Correlations
You can download the dataset for this activity from the Research in the Cloud OSF page: https://osf.io/a8kev/. In the folder labeled "Chapter 6 – Causal Inference" you will find a file titled "RITC_DATA_CH06_Longitudinal.sav". Download this file and open it in SPSS. Then, follow the steps below or watch the video for this project to conduct the analyses: https://bit.ly/CH6crossl.
To conduct the analysis, we will turn to regression. In SPSS, select "Regression" > "Linear" and move Time 2 anxiety into the "Dependent" box. Then, in the "Independent(s)" box, add Time 1 depression and Time 1 anxiety. This will calculate the cross-lag correlation between depression at Time 1 and anxiety one year later, while acting as if everyone had the same level of anxiety at Time 1.
As displayed in Figure 6.15, the association between depression at Time 1 and anxiety at Time 2 remains statistically significant.
What do these results say about the relationship between depression and anxiety? First, they reveal that depression at Time 1 predicted anxiety one year later. Second, and more importantly, the results show this relationship held even after controlling for initial anxiety. This means among people who started with no anxiety in 2024, those with higher depression scores showed a larger increase in anxiety over the following year than those with lower depression scores. Such a pattern suggests that depression's ability to predict anxiety one year later cannot be explained simply by pre-existing anxiety levels. In other words, we have established temporal precedence for depression.
Overall, this example illustrates how longitudinal research can help scientists better understand the complex relationships between psychological variables. By measuring variables at multiple time points and controlling for people's initial levels of something like anxiety, researchers can build a stronger case for how one variable might influence another over time.
Reporting the Results of a Cross-Lagged Regression
Here is an example of how to report the results of this analysis. You can use this as a template for your own results.
"I conducted a cross-lagged regression analysis to examine whether depression at Time 1 (September 2024) predicted anxiety at Time 2 (September 2025), controlling for baseline anxiety. The results indicated that Time 1 depression significantly predicted increases in anxiety one year later, even after accounting for initial levels of anxiety, B = .12, t(477) = 2.1, p < .05. This suggests that higher depression at baseline was associated with higher anxiety at follow-up, independent of initial anxiety levels."
Research Portfolio
Portfolio Entry #23: Describing the Longitudinal Relationship Between Anxiety and Depression
Once you have conducted the analysis, paste the regression output into your portfolio. The output should include the original cross-lag correlation between depression and anxiety and the partial correlation controlling for Time 1 anxiety.
Write a few sentences describing what the output shows about the relationship between depression, anxiety, and temporal precedence. Explain the finding in your own words.
Finally, conduct an analysis establishing the temporal precedence of anxiety relative to depression (i.e., the reverse of the analysis above). Is there evidence for bidirectional causality?
Combining Approaches: Multiple Controls in Longitudinal Research
At the beginning of this chapter, we discussed two challenges in establishing causality: the third-variable problem and the directionality problem. You have now seen how statistical controls can address the third-variable problem, and how longitudinal designs can address the directionality problem. But the most compelling evidence for causality in correlational research comes from combining these approaches. By controlling for multiple third variables in a longitudinal study, researchers can build a compelling case about the causal relationships between variables. For instance, scientists studying social media's effects on teen mental health might control for pre-existing conditions, family factors, and other variables while tracking changes over time to make a stronger case for causality (see Przybylski & Weinstein, 2019).
Let's see how this plays out with our example of anxiety and depression. While a longitudinal study helps researchers understand which variable comes first, third variables can still play a role. Even when depression predicts future anxiety after controlling for initial anxiety levels, as we found in the previous example, third variables might still be causing both. Remember the third variables discussed earlier—trauma, sleep quality, and income? These could still influence both depression and future anxiety.
To rule out these explanations, it is possible to combine approaches. Just as the previous study controlled for initial anxiety levels, it could also control for other important variables measured at Time 1. For instance, we could statistically control for people's initial levels of traumatic stress, making everyone's stress level the same at the start. We could also do the same with sleep quality and social support. By controlling for all these variables at Time 1, we not only demonstrate the temporal precedence of depression relative to anxiety, but also rule out the possibility that both depression and anxiety are being caused by these third variables (Figure 6.16).
This approach is particularly valuable because psychological variables rarely operate in isolation. Mental health, like most aspects of human behavior, involves complex relationships between many variables. By controlling for multiple factors at once, researchers can better understand the unique role that each variable plays in predicting future outcomes.
Designing Your Own Causal Inference Study
Design a study that controls for confounds and makes a compelling case for causality.
In the previous modules, we discussed how behavioral scientists address the third-variable problem through statistical controls and the directionality problem through longitudinal research. Now it's time to apply these techniques in your own correlational study that incorporates the methods of causal inference.
From Correlation to Causal Inference: Your Research Project
Throughout this book, you have been building research skills step by step. In Chapter 3, you conducted descriptive research, programming an online survey and conducting basic statistical analyses. In Chapter 4, you learned to measure psychological constructs and how to find and develop scale instruments. In Chapter 5 you examined correlations between variables. Now you will take your research to the next level by addressing potential third variables that might explain the correlations you observe.
This project gives you the opportunity to design and conduct a correlational study that controls for a relevant third variable. You will collect real data on Connect, analyze it using the techniques you have learned, and interpret your findings by comparing the original uncontrolled effect to the effect that is observed after controlling for a covariate.
Designing and Conducting a Causal Inference Study
For this project, you will design and conduct a correlational study that examines a potential causal relationship between two variables while controlling for at least one plausible third variable.
Step 1: Choose your research question
Start by selecting a research question that interests you. Your question should focus on a potential causal relationship between two variables. You have several options for approaching this project.
First, you might build on your Chapter 5 project. If you conducted a correlational study in the previous chapter, you can expand it here by identifying and measuring potential third variables that might explain the relationship you found. This approach allows you to deepen your investigation of a topic you have already started exploring.
Second, you can draw from measures we have explored in previous chapters. These include personality traits from Chapter 1; moral foundations from Chapter 3; clinical variables such as anxiety, depression, or sleep quality from Chapters 5 and 6; or variables that can predict responses to moral dilemmas like the Heinz dilemma. Using established measures simplifies your design process and connects your work to the broader research literature.
Third, you might create your own measures. Using techniques from Chapter 4, particularly AI-assisted scale development, you can create measures for constructs not covered by existing scales. This approach gives you more flexibility to investigate what interests you.
When forming your research question, focus on how variables relate to each other when accounting for other factors. Your question should examine relationships that are both theoretically interesting and practically measurable using a measurement instrument of between five and ten questions. For example, you might ask: is openness to experience associated with people's willingness to engage with opposing political viewpoints when controlling for political orientation? Or is the care/harm moral foundation related to donation intentions toward different charitable causes when controlling for agreeableness or empathy? Any of these questions, or hundreds of others, will work.
Step 2: Identify Potential Third Variables
Once you select a research question, identify at least one potential third variable that might explain the relationship between your main variables. For each third variable, consider why it might be related to your predictor variable, why it might be related to your outcome variable, and how it could potentially explain the relationship between the variables.
The process of identifying third variables requires both creativity and knowledge of your topic. Think about what factors might influence both of your main variables. Consider demographic characteristics, environmental factors, or psychological traits that could give rise to both variables you are interested in.
Step 3: Design Your Study
After you have identified your variables, plan your study. Your plan should specify how you will operationalize each variable. For instance, which scale or questions will you use to measure your predictor variable, outcome variable, and third variables? Be specific. If you are creating new measures using AI, document your process following the guidelines from Chapter 4.
Next, consider participant recruitment. As in previous projects, you can recruit participants from your university's participant pool using SONA, send your study to friends and family, or use Connect. You should aim for around 100 participants, if possible.
Using the skills you developed in previous chapters, create your survey in Qualtrics or Engage. Organize your survey into clear, logical blocks that guide participants through the survey experience.
You do not need to build your survey from scratch. Take an existing correlational study from the previous chapter and add another block for the control variable. You can swap the variables in that study for the ones you will use in this one.
Step 4: Collect and Analyze Your Data
If you are collecting data from Connect, you can follow the step-by-step instructions for setting up a project from Chapter 3. Once your data collection is complete, download your data for analysis. Using SPSS or another statistical package, begin your analysis by examining the bivariate correlations between your predictor and outcome variable. Note the strength, direction, and statistical significance of these correlations.
Next, control for your identified third variables using multiple regression or ANCOVA, depending on your variable types. For continuous outcome variables, multiple regression is typically appropriate. For categorical predictors and continuous outcomes, ANCOVA would be the method of choice. The statistical techniques we practiced in this chapter will guide your analysis.
After running these analyses, examine how the relationship between your main variables changes when controlling for third variables. Does the correlation become stronger, weaker, or stay about the same? Does it remain statistically significant, or does the significance disappear once you control for third variables? These changes provide clues about the nature of the relationship between your variables.
Step 5: Interpret Your Findings
Based on your analysis, interpret what your findings suggest about potential causal relationships. If controlling for third variables substantially weakened or eliminated the relationship between your main variables, this suggests that the third variables might explain much of the original correlation. The apparent relationship between your main variables might be spurious rather than causal.
Conversely, if the relationship remains strong despite controlling for plausible third variables, this provides more confidence in a causal connection, though it still does not prove causation. There might be other unmeasured variables that explain the relationship, or the direction of causality might run in the opposite direction from what you hypothesized.
Consider what other explanations might exist for the patterns you observed. What other variables might you want to control for in future research? How might experimental methods address some of the limitations of your correlational approach? Critically thinking about alternative explanations is central to scientific reasoning.
Research Portfolio
Portfolio Entry #24: Writing It Up: Telling the Story of Your Study
After completing your study, prepare a 2-to-3-page research report that presents your project. Follow the instructions in the Part I Appendix for writing research reports. Your results section should report your findings, including both the simple correlations and the relationships after controlling for third variables.
In your discussion section, interpret your results in terms of potential causal relationships. Discuss what your findings suggest about whether one variable influences the other. Acknowledge limitations of your approach, such as the inability to manipulate variables experimentally or potential unmeasured third variables. Suggest directions for future research that might address these limitations.
Summary
Throughout this chapter, we have explored how behavioral scientists use sophisticated correlational techniques to build a case for cause-and-effect relationships. By statistically controlling for third variables, conducting longitudinal research to establish temporal precedence, or combining both approaches, researchers can build stronger evidence about how variables may be influencing each other in the moment or over time. For instance, when behavioral scientists find that depression predicts future anxiety even after controlling for initial anxiety, stress, sleep problems, and other variables, they are building a pattern that is consistent with depression being the cause of anxiety.
But correlational research, no matter how sophisticated, can never establish causation. There is always the possibility that some variable that was not measured or controlled for might explain the observed relationship. Because it is not possible to control for every possible third variable, researchers can never rule out all alternative explanations through correlational research alone.
This limitation is precisely why experiments are so valuable in behavioral research. In experimental studies, which we explore in the next chapter, researchers don't just measure and control for variables statistically. They manipulate variables to see what happens. They might randomly assign some participants to complete a stress-reduction program while others serve as a control group, then measure how this intervention affects both depression and anxiety. This kind of manipulation, combined with random assignment, provides the strongest possible evidence for causal relationships.
Yet this does not mean correlational research isn't valuable. Many important variables in behavioral science—like depression, anxiety, aggression, personality traits, or life experiences—cannot be easily manipulated in an experiment for either ethical or practical reasons, if they can be manipulated at all. In these cases, carefully designed correlational studies that combine longitudinal measurement with appropriate controls provide the best window into understanding cause and effect. These studies help researchers develop theories about how psychological processes work, which can then be tested more rigorously through experimental methods, when possible.
While correlational research cannot definitively prove causation, the statistical control techniques you have learned in this chapter can substantially strengthen causal inference. By thoughtfully controlling for third variables, you can build more compelling evidence for potential causal relationships.
Frequently Asked Questions
What is the third variable problem in research?
The third variable problem occurs when an unmeasured factor causes both variables in a correlation, making the apparent relationship spurious. For example, a correlation between violent video games and aggressive behavior might actually be explained by a third variable like lack of parental supervision, which independently influences both gaming habits and aggression.
How do researchers control for third variables?
Researchers use statistical techniques like ANCOVA (Analysis of Covariance) for categorical predictors and multiple regression for continuous variables. These methods statistically hold the third variable constant, allowing researchers to examine relationships as if everyone in the sample had the same score on the variable being controlled for.
What is temporal precedence and why is it important for causation?
Temporal precedence means that a cause must occur before its effect. It is essential for establishing causation because if researchers want to claim that variable A causes variable B, they must show that changes in A precede changes in B. Longitudinal research designs help establish temporal precedence by measuring variables at multiple time points.
What is a partial correlation?
A partial correlation shows how strongly two variables are related after statistically holding other variables constant. When the partial correlation is much smaller than the original correlation, it indicates that the controlled variable explains much of the relationship. When the partial correlation remains close to the original, the controlled variable does not explain the relationship.
Key Takeaways
- Internal validity refers to how confident researchers can be that an observed effect reflects a true causal relationship rather than the influence of confounding variables.
- The directionality problem occurs when a correlation exists between two variables but it is unclear which variable causes changes in the other.
- The third-variable problem arises when an unmeasured factor (such as parental involvement or trauma) might explain the correlation between two variables, making the apparent relationship spurious.
- A covariate is a third variable that researchers statistically control for to examine whether a relationship between two variables persists after accounting for the covariate's influence.
- ANCOVA (Analysis of Covariance) is used when examining relationships between categorical predictors and continuous outcomes while controlling for third variables.
- Multiple regression allows researchers to examine correlations between continuous variables while simultaneously controlling for multiple third variables.
- A partial correlation shows the relationship between two variables after statistically holding other variables constant, helping identify whether covariates explain the original relationship.
- Temporal precedence is the requirement that a cause must occur before its effect, establishing which variable came first in a potential causal relationship.
- Longitudinal research measures variables at multiple time points, enabling researchers to examine how variables relate to each other over time and establish temporal precedence.
- Cross-lag correlations examine relationships between different variables measured across time points, while cross-sectional correlations examine relationships at a single point in time.
- Combining statistical controls with longitudinal designs provides the strongest evidence for causality in correlational research by addressing both third-variable and directionality problems.
- Even sophisticated correlational techniques cannot definitively establish causation because unmeasured variables might still explain observed relationships.