Introduction
Describing people's behavior is often the starting point for behavioral research. Correlational research steps in where descriptive research leaves off by aiming to uncover potential patterns of relationships between variables.
For example, in the previous chapter, we talked about measuring anxiety with the GAD-7. In a correlational study, researchers might move beyond describing anxiety to investigate how anxiety is associated with other variables such as depression, trauma, sleep, or self-esteem. Are anxious people more likely to be depressed? Are married people less likely to be anxious than unmarried people? Do people become less anxious as they get older? These are questions for correlational research.
Understanding associations, or patterns of relationships between variables, is what moves behavioral research from description to prediction. If a researcher establishes that variable A is related to variable B, then it is possible to predict values of variable B simply by knowing values of variable A. An example comes from the survey of mental health we examined in Chapter 3. The survey, conducted by the National Institute of Mental Health, reported that younger people experienced mental illness at a higher rate than older people. In other words, knowing a person's age tells us something about their risk of developing mental illness. The ability to predict risk is a big step forward in understanding.
In this chapter, we will learn how to analyze relationships between variables and discover meaningful patterns in behavioral data. In Module 5.1, we will explore the fundamentals of correlation, including how to measure, visualize, and interpret relationships between continuous variables like anxiety and depression. We will learn to distinguish between positive and negative correlations, understand the strength of a correlation, and create correlation matrices that reveal patterns across multiple variables.
In Module 5.2, we will expand our analytical toolkit by examining different types of associations. We will discover how to analyze associations between categorical variables (like gender or employment status) and continuous variables (like depression scores), using techniques such as t-tests and chi-square.
Module 5.3 offers a hands-on research project that applies correlational methods to the Heinz dilemma we encountered in Chapter 3. Using Moral Foundations Theory, you will develop and test hypotheses about how people's moral intuitions predict their ethical judgments. This guided project will walk you through each step of correlational research, from forming hypotheses to creating a study on Qualtrics, to analyzing and reporting the results.
Finally, Module 5.4 empowers you to design and conduct your own correlational study. You will develop a research question, create a survey to test the relationship between variables, and collect and analyze your own data. This independent project will allow you to apply everything you have learned so far and give you the chance to make your own discoveries about human behavior.
By the end of this chapter, you will understand how correlational research helps scientists move beyond simple description to identify meaningful patterns and make predictions about human behavior. You will also have practical experience conducting correlational studies, an essential skill for anyone interested in understanding the complex relationships that shape how people think, feel, and act.
Chapter Outline
What Do Correlations Tell Us?
Learn how to measure and interpret correlations using real-world examples like anxiety and depression.
In Chapter 3, we learned about descriptive research and the power of observing people's behavior. With systematic observation, researchers can not only describe what people do but also identify patterns, trends, and associations. An association occurs when changes in one variable correspond to changes in another variable in a systematic way. Here's an example from everyday life.
Many years ago, some psychology professors at big schools with competitive football teams observed something interesting: on Mondays after the football team won (the games were always played on Saturdays in those days), lots of students attended class in clothing that displayed the school's name or logo. On Mondays after the football team lost, however, school-affiliated clothing appeared less often. Curious to know whether these observations were reliable, and if so, how strong the association between winning and wearing the school's clothing was, the professors decided to conduct a study (Cialdini et al., 1976). The study is now a classic in the field of social psychology, showing how people try to affiliate with successful teams or people in order to boost their self-esteem. For our purposes, it's also an introduction to correlational research.
Correlational research examines whether two variables are related to each other and quantifies their association with statistics. Instead of manipulating or controlling variables like in an experiment, researchers simply measure both variables as they naturally occur and look for patterns. To understand how correlational research works, we will examine a real dataset several times throughout this chapter. The data comes from over 500 participants on Connect and is actually the same dataset you used in the previous chapter to understand the basics of measurement. Now, however, we will explore the relationships between variables rather than just examining them individually.
As a reminder, the study was created in Qualtrics and included several validated measures. You can examine the survey by downloading the "RITC_SURVEY_CH05_ClinicalStudy.qsf" file from the "Chapter 5 – Correlational Research" folder on the OSF project page: https://osf.io/a8kev/. Once you have this file, upload it into Qualtrics or Engage to view the survey.
Within the survey, participants completed the Generalized Anxiety Disorder scale (GAD-7), as well as the Patient Health Questionnaire (PHQ-9), a widely used measure of depression (Kroenke et al., 2001). They then completed measures of sleep quality and experiences of trauma. In the last chapter, we calculated total scores for each of these variables. You can either use the data file you worked with previously or download a new data file from the Chapter 5 project folder, in which these total scores have been calculated for you. The file name is: RITC_DATA_CH05_ClinicalStudy.sav.
Understanding Correlations
The correlation between variables is expressed as a statistic, called a correlation coefficient. Pearson's r is the most common correlation coefficient, and it provides information about both the direction and strength of the relationship between two variables.
Positive Relationships
Positive correlations occur when an increase in one variable predicts an increase in another variable. Lots of variables are positively correlated. People with more education tend to earn more money (e.g., Day & Neuberger, 2002). People who spend more time on social media tend to feel more socially isolated (e.g., Primack et al., 2017). And people who have high levels of anxiety tend to also have higher levels of depression (e.g., Löwe et al., 2008; Spitzer et al., 2006). As these examples show, in positive correlations, people who score high on one measure also tend to score high on the other. This kind of relationship is often visually depicted with a scatterplot where the trend line ascends from left to right (Figure 5.1) and is mathematically represented by a positive correlation coefficient.
Research Activity 5.1: Measuring Positive Correlations
Let's examine the relationship between anxiety and depression in the clinical dataset. But first, a word of encouragement. Many people without a strong mathematical background feel intimidated by statistics. In this case, however, the most challenging parts of the research—operationally defining variables, creating good measures, designing a study, and collecting quality data—are already complete. All you need to do is conduct some simple statistical tests with modern software that we will guide you through. There is nothing to fear.
Anxiety and Depression in the Real World
HOW TO Box 5.1 describes how to conduct the correlation between anxiety and depression and create a scatterplot to visualize the results. You can follow the instructions in the box or watch the video that goes with this project: https://bit.ly/Ch5_Cbad.
The output shows a strong positive correlation (r = .82, p < .05). This means that as people's anxiety scores increased, their depression scores tended to increase as well. The scatterplot in Figure 5.2 shows that the points form a pattern moving upward from left to right, indicating that higher scores on one measure are associated with higher scores on the other.
HOW TO: Conduct a Correlational Analysis in SPSS
These steps will allow you to conduct a correlational analysis.
Open the dataset
- Open SPSS and navigate to File → Open → Data
- Find the "RITC_DATA_CH05_ClinicalStudy.sav" file from where you downloaded it
Run the Correlational Analysis
- Click on "Analyze" in the top menu
- Select "Correlate → Bivariate"
- Move the Anxiety and Depression variables into the "Variables" box
- Make sure "Pearson" is selected under "Correlation Coefficients"
- Click "OK" to run the analysis
Create a scatterplot to visualize the relationship
- Click on "Graphs" and select "Chart Builder"
- In the gallery, click "Scatter/Dot"
- Drag the Simple Scatter icon into the canvas area
- Drag the Anxiety variable to the x-axis
- Drag the Depression variable to the y-axis
- Click "OK" to create the scatterplot
Add a trend line to your scatterplot
- Double-click on the scatterplot to open the Chart Editor
- Click on "Elements" in the menu bar
- Select "Fit Line at Total"
- Click "Close" to exit the Chart Editor and view your completed scatterplot
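If you also work in Python, the same analysis can be reproduced outside of SPSS. The sketch below is illustrative rather than part of the book's workflow: it assumes the dataset has been exported to a CSV file (clinical_study.csv is a hypothetical name) with total-score columns named Anxiety and Depression, and it uses SciPy for the correlation and Matplotlib for the scatterplot with a fitted line.

```python
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt

# Hypothetical file and column names; match them to your own exported data.
df = pd.read_csv("clinical_study.csv")[["Anxiety", "Depression"]].dropna()

# Pearson's r and its p-value, the same statistics SPSS reports.
r, p = stats.pearsonr(df["Anxiety"], df["Depression"])
print(f"r = {r:.2f}, p = {p:.3f}")

# Scatterplot with a simple linear trend line (similar to "Fit Line at Total").
plt.scatter(df["Anxiety"], df["Depression"], alpha=0.4)
slope, intercept = np.polyfit(df["Anxiety"], df["Depression"], 1)
xs = np.linspace(df["Anxiety"].min(), df["Anxiety"].max(), 100)
plt.plot(xs, slope * xs + intercept)
plt.xlabel("Anxiety (GAD-7 total)")
plt.ylabel("Depression (PHQ-9 total)")
plt.title("Anxiety and Depression")
plt.show()
```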
Research Portfolio
Portfolio Entry #12: Reporting the Positive Correlation between Anxiety and Depression
Once you have generated the scatterplot, paste it in your portfolio and write a few sentences about what the scatterplot shows about the relationship between anxiety and depression. Why would anxiety and depression tend to be positively correlated? In other words, what underlying factors might explain why people who experience more anxiety also tend to experience more depression?
Stop and Discuss!
Think about the positive correlational relationships you observe in your own life and in psychology. Then, discuss the following questions with classmates or friends.
- What examples of positive correlation do you see in daily life?
- What psychological characteristics do you think might be positively related to each other?
- Consider how you might design a study to test whether the relationships you've identified actually exist. What would you need to measure? How would you collect the data? This kind of scientific thinking is at the heart of this chapter.
Correlations, Trends, and Prediction
When examining correlations, the predictions researchers make are not perfect for every person. You can see this by examining the scatterplot in Figure 5.2. Notice that some people have depression scores above ten but anxiety scores of zero. For these people, high depression scores do not predict high anxiety scores. Nevertheless, across the entire sample, people with high anxiety scores generally have higher depression scores and vice versa.
Clinical psychologists know that anxiety and depression usually have a strong ability to predict one another because the correlation between these variables is one of the most fundamental findings within the field (e.g., Kalin, 2020; Spitzer et al., 2006). This strong association helps explain why many mental health treatments target both anxiety and depression simultaneously.
Negative Relationships
Not all correlations are positive; some are negative. A negative correlation indicates that as scores on one variable increase, scores on the other variable tend to decrease. This is also sometimes called an inverse relationship.
A negative correlation is how we would describe the relationship between anxiety and self-esteem. As people's anxiety increases, their self-esteem generally decreases (e.g., Löwe et al., 2008). Thus, these two variables are negatively correlated.
Other examples of negative correlations abound. For example, as people's time spent multitasking increases, their satisfaction with what they are doing tends to decrease (Mark et al., 2008; Mark et al., 2016). The less time people spend sleeping, the more health problems they tend to encounter (Luyster et al., 2012). And, as people's sense of control over their lives decreases, the frequency of depression tends to increase. When a negative correlation is graphed, the result is a line that moves down and to the right as in Figure 5.3.
Research Activity 5.2: Measuring a Negative Correlation
Let's look at a negative correlation. In Chapter 3, we saw that the National Institute of Mental Health conducted a survey that found younger adults reported higher rates of mental illness than older adults (see Figure 3.2). Based on this finding, we might expect a similar pattern in the clinical dataset. Specifically, depression and anxiety scores might decrease with age.
To test this relationship, you can follow the same steps within SPSS that you used for positive correlation. HOW TO Box 5.1 describes how to conduct the correlation, except this time you will add age into the correlation window as well.
Age and Emotional Distress
When you conduct this correlation, you will find a negative relationship between age and depression (r = -.20, p < .05) and between age and anxiety (r = -.23, p < .05). The minus sign indicates that older people tend to have lower levels of depression and anxiety. While neither of these relationships is as strong as the one between anxiety and depression, the negative correlations show that older participants in the sample generally reported lower levels of depression and anxiety than younger participants.
Research Portfolio
Portfolio Entry #13: Reporting the Negative Correlations of Age with Anxiety and Depression
Once you have conducted the analysis, paste the SPSS output to your portfolio. Write a few sentences describing what the output shows about the relationship between age and depression, and age and anxiety. Interpret the correlations in your own words. Why do anxiety and depression tend to be negatively correlated with age? What underlying factors might explain why people who are younger, such as teenagers, might experience more anxiety and depression compared to older people?
When a Correlation Counts: Magnitude and Statistical Significance
The Magnitude of a Correlation
While the sign of a correlation coefficient tells us the direction of the relationship between two variables, it does not say anything about the strength of that relationship. For that information, we must examine the numerical value of the Pearson's r statistic. The closer the value is to 1 or -1, the stronger the relationship; the closer the value is to zero, the weaker the relationship. A correlation at or near zero indicates little or no relationship between the variables (Figure 5.4).
Across the behavioral sciences, researchers share conventions for interpreting the size of a correlation coefficient. These conventions come from the work of Jacob Cohen (1992), a prominent figure in psychology and statistics. The conventions apply to the size of a correlation regardless of its direction.
A correlation between 0.1 and 0.3 is considered small, indicating a weak association between the variables. A coefficient above 0.3 and below 0.5 is considered moderate, suggesting a more substantial but still modest relationship. Finally, a correlation of 0.5 or higher is considered large, indicating a strong relationship between the variables. From the examples we have seen, the correlation between anxiety and depression would be considered large and the correlation between depression and age would be considered small. Table 5.1 presents other examples of small, moderate, and large correlations.
| r value (in absolute terms) | Effect size | Examples |
|---|---|---|
| < .30 | Small | Conscientiousness and medication adherence (r = .15; Molloy et al., 2014) |
| .30 to .50 | Medium | Job satisfaction and job performance (r = .30; Judge et al., 2001) |
| > .50 | Large | Parental education and child's academic attainment (r = .50-.60; Dubow et al., 2009) |
Table 5.1. Guidelines for the strength of correlation coefficients. These conventions are the same for positive and negative values.
Statistical Significance
A correlation that is close to zero indicates no relationship between variables. Yet when behavioral scientists calculate the correlation between any two variables, they rarely get exactly zero. Even if two variables should not be related—say, shoe size and empathy—there will probably be some small correlation, maybe .03 or -.04.
The question then becomes: when is a correlation large enough to be meaningful rather than just random noise? The answer comes from statistical significance. Researchers use tests of statistical significance to determine if a correlation is larger than what would occur by chance. The result of these tests is expressed as a probability or "p-value".
By convention, if p is less than .05 the correlation is statistically significant. When p < .05, behavioral scientists have good reason to believe the correlation represents a real relationship rather than random variation in the data. A p-value below .05 means that if there were truly no relationship between the variables, a correlation as large as what is observed (or larger) would be found less than 5% of the time by chance alone.
In the data you examined earlier, all the correlations were statistically significant. The strong positive correlation between anxiety and depression (r = .82, p < .05) and the negative correlations between age and depression (r = -.20, p < .05) and age and anxiety (r = -.23, p < .05) all represent reliable patterns in the data, rather than chance findings.
The logic of statistical significance applies to all the statistical tests you will encounter in this book. Whether looking at correlations or any other statistical test, behavioral scientists use p < .05 as a guideline for determining whether the results are meaningful or more likely due to chance.
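The logic behind "larger than what would occur by chance" can be made concrete with a short simulation. The sketch below is purely illustrative and uses made-up data: it repeatedly shuffles one variable so that any real pairing is destroyed, then counts how often a correlation at least as large as the observed one appears anyway.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two hypothetical variables with a modest built-in relationship.
x = rng.normal(size=200)
y = 0.3 * x + rng.normal(size=200)

observed_r, _ = stats.pearsonr(x, y)

# Shuffle y many times; with the pairing broken, any correlation is pure chance.
chance_rs = []
for _ in range(5000):
    shuffled = rng.permutation(y)
    r, _ = stats.pearsonr(x, shuffled)
    chance_rs.append(abs(r))

# Proportion of chance correlations at least as large as the observed one.
p_by_shuffling = np.mean(np.array(chance_rs) >= abs(observed_r))
print(f"observed r = {observed_r:.2f}, chance proportion = {p_by_shuffling:.4f}")
```

In this simulated example the chance proportion comes out very small, which is the same idea a p-value below .05 expresses.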
Research Activity 5.3: Examining a Correlation Matrix
When there are multiple variables in a study, researchers often want to examine how all the variables relate to each other. Rather than calculating correlations one at a time, it is possible to generate a correlation matrix—a table that shows all possible correlations at once.
To create a correlation matrix in SPSS, follow the steps in HOW TO Box 5.1 or watch the video associated with this activity: https://bit.ly/Ch5_cm. The process is similar to what you did earlier, but with one additional step. Instead of placing two variables into the "Variables" box in the correlation analysis, move all the variables of interest into the box. The resulting output will show every possible correlation between the variables in one table.
Figure 5.5 shows a correlation matrix with seven variables from the clinical dataset. The variables in the matrix include depression, anxiety, sleep quality, trauma, age, income, and education. Each cell in the matrix shows the correlation between the variables in that row and column. For instance, to find the correlation between depression and anxiety, look at where the depression row intersects with the anxiety column. The value there is r = .82.
A correlation matrix has several key features. First, along the diagonal line running from top-left to bottom-right, we see all 1's—these represent each variable's correlation with itself.
Second, above and below the diagonal line, the matrix is symmetrical. This means the correlation between any two variables appears twice. When reading the table, we can ignore everything below the diagonal. Third, the "Sig. (2-tailed)" value is the p-value for each correlation. Asterisks mark statistically significant correlations, with one asterisk indicating p < .05 and two asterisks indicating p < .01. Finally, the "N" rows show the sample size for each correlation.
Looking at the matrix reveals patterns in how these variables correlate. For example, depression shows a strong positive correlation with both anxiety (r = .82) and sleep problems (r = .71), a moderate positive correlation with trauma (r = .54), and a small negative correlation with both age (r = -.20) and income (r = -.16). The correlation between disordered sleep and education is not significant.
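If you want to double-check a matrix like this outside of SPSS, pandas can compute one in a single step. This is a sketch under the same assumptions as before: the data have been exported to CSV, and the column names below are hypothetical stand-ins for the seven total scores.

```python
import pandas as pd

# Hypothetical column names; match them to the variables in your own file.
cols = ["Depression", "Anxiety", "Sleep", "Trauma", "Age", "Income", "Education"]
df = pd.read_csv("clinical_study.csv")[cols]

# Pearson correlations between every pair of variables, rounded for readability.
matrix = df.corr(method="pearson").round(2)
print(matrix)
```

The result is the same grid of pairwise Pearson correlations, although SPSS additionally reports the p-value and sample size for each pair.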
Reporting the Results of a Correlation Analysis
Writing about statistical results is an important part of behavioral research. A good write-up not only communicates the results of a study clearly to others; it often helps clarify a researcher's own thinking about what was found. It presents the results in clear, non-technical language that any reader can understand, while also including the essential statistical information. Let's look at an example.
To describe the correlation between anxiety and depression you might write:
"I conducted a Pearson correlation analysis to examine the relationship between anxiety and depression. I found a significant positive correlation between anxiety (as measured by the GAD-7) and depression (as measured by the PHQ-9), r(524) = .82, p < .05. This result suggests that anxiety and depression are strongly related. Specifically, the results suggest that people who experience higher levels of anxiety also tend to experience higher levels of depression."
In this write-up, the correlation coefficient (r = .82) tells readers how strong the relationship is and in what direction. The number in parentheses (524) represents the degrees of freedom, which is related to the sample size (sample size [N] minus 2). The p-value (p < .05) indicates the relationship is statistically significant and larger than what would be expected by chance. Together, these elements give readers the information they need. Plus, the plain language interpretation helps everyone understand what the results mean in practical terms.
Stop and Discuss!
Take a minute to examine the correlation matrix in Figure 5.5. It shows the relationships among depression, anxiety, sleep problems, trauma, age, income, and education. After you've examined the correlations, discuss each of the questions below.
- For each correlation in the matrix, practice interpreting and reporting the findings. As a class activity, go around the room with each person interpreting one of the correlations. Start with the r coefficient and its direction, note its size using Cohen's conventions, and then explain what it means in plain language.
- After writing your results, search Google Scholar for studies that have examined these same relationships. For instance, you can search "depression anxiety correlation meta-analysis" or "sleep depression correlation meta-analysis." How do the findings from our example study compare with what other researchers have found? Are the correlations similar in direction and magnitude? What might explain any differences you find?
- Discuss your interpretations and literature findings with your classmates. Some interesting correlations to discuss include depression and trauma, depression and age, depression and income, anxiety and sleep, anxiety and trauma, and sleep and trauma. What patterns do you notice in how these variables relate to each other? Do these relationships align with what you might have predicted?
Research Portfolio
Portfolio Entry #14: Writing the Results of a Pearson's Correlation Analysis in a Formal Scientific Format
Once you have created the correlation matrix, paste it to your portfolio. Select two correlations, one that is significant and one that is not. Write up the results using the template for reporting correlations that was provided above. Make sure to report the r coefficient and its direction, note its size using Cohen's conventions, and then explain what it means in plain language.
Different Types of Associations
Explore how correlations vary across categorical and continuous variables.
Types of Associations
Many behavioral studies examine how different variables are associated with one another. In conducting these studies, researchers are interested in whether changes in one variable correspond to changes in another variable. As we saw in the last chapter, however, not all variables are measured on the same scale. Variables measured on different scales produce different kinds of associations.
Recall that the previous chapter described four scales of measurement: nominal, ordinal, interval, and ratio (remember the acronym 'measurement NOIR'). For practical purposes, these measurement scales can be simplified into two broad categories: categorical and continuous. So far, this chapter has focused on associations between continuous variables like anxiety and depression. However, behavioral scientists often want to understand relationships between variables that are not continuous. For example, is anxiety (continuous) related to gender (not continuous)? Examining relationships between variables that are not both continuous requires different statistical tools than the ones we have seen so far.
When examining associations, we can treat both interval and ratio scales as continuous variables because they exist on a continuum and are analyzed the same way. We can also treat ordinal variables, like Likert scales, as continuous variables (even though there are specialized techniques for ordinal data that we do not cover here). Finally, we can treat nominal variables as categorical data. This yields two main types of variables—categorical and continuous—that can be associated with each other in three possible combinations (Figure 5.6).
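In code, this simplification often shows up as a choice of data type. The small sketch below is only an illustration (the values are invented): it marks one variable as categorical and one as continuous in a pandas DataFrame, which mirrors how SPSS distinguishes nominal from scale variables.

```python
import pandas as pd

# Invented example data: one categorical variable, one continuous variable.
df = pd.DataFrame({
    "gender": ["man", "woman", "woman", "man"],  # categorical (nominal)
    "anxiety": [5.0, 12.0, 9.0, 7.0],            # continuous (interval/ratio)
})

df["gender"] = df["gender"].astype("category")  # treated as categories, not numbers
print(df.dtypes)
```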
Each type of relationship requires a different analytical approach. The type of relationship also determines how researchers visualize the results. Let's examine each type of relationship using examples from the anxiety dataset.
Associations between Categorical and Continuous Variables
Many research questions examine how different groups of people differ on a continuous measure. A school administrator, for instance, might want to know if students at private schools have higher test scores than students at public schools. In this case, school type (public vs. private) is categorical while test scores are continuous. Public health researchers might investigate whether people living in cities report higher stress than those in rural areas. Medical researchers might examine whether participants in exercise programs show lower blood pressure than people not in such programs. Or developmental psychologists might want to know whether first-born children score differently on personality traits than later-born children. In each of these cases, the researchers want to know if one group of people scores differently on a measure than another group.
To analyze these kinds of relationships, researchers compare the average scores for each group. For example, in the anxiety dataset, we can examine whether the average depression score is higher for one gender group than another.
Research Activity 5.4: Comparing Gender Differences in Depression
In Chapter 3, we described the National Institute of Mental Health survey that found higher rates of mental illness among women than men. While that survey compared self-reported rates of mental illness (yes vs. no), our clinical dataset offers the opportunity to examine whether a similar relationship exists between a categorical variable (gender) and a continuous variable (mental health symptoms). When examining this type of relationship, researchers calculate the average score for each group and then compare the two. Let's see how this works with the clinical dataset.
Figure 5.7 shows the average depression scores for men (M = 6.10) and women (M = 7.27). The difference in average scores represents an association between gender (the categorical variable) and depression (the continuous variable). We can call this an association because knowing someone's gender gives us information to predict their depression score. As a group, women score about one point higher than men. If there were no association between gender and depression, we would expect similar levels of depression across groups.
Conducting a t-test
Just as we used Pearson's r to describe the relationship between continuous variables, researchers use a t-test to determine whether differences between group averages are meaningful. The t-test assesses whether the difference between two groups is larger than what would be expected by chance alone. As with correlations, p < .05 is the criterion for statistical significance.
HOW TO Box 5.2 describes how to conduct the t-test examining whether men's and women's depression scores differ on average. You can follow the instructions in the box or watch the video for this project: https://bit.ly/Ch5_CtT.
HOW TO: Conduct an Independent Samples t-test in SPSS
These steps will allow you to compare the means of two independent groups, such as the depression scores between men and women.
Open the dataset
- Open the "RITC_DATA_CH05_ClinicalStudy.sav" file
Access the t-test function
- Click "Analyze" in the top menu
- Select "Compare Means" from the dropdown menu
- Click on "Independent Samples T-Test"
Select variables for analysis
- Move the depression score into the "Test Variable(s)" box
- Move the categorical variable gender into the "Grouping Variable" box
Define the groups to compare
- After moving gender into the grouping variable box, click the "Define Groups" button
- Enter "1" into the "Group 1" box, representing men
- Enter "2" into the "Group 2" box, representing women
- Click "Continue" to return to the main dialog box
Run the analysis
- Click "OK" to execute the t-test
- Results will appear in the output viewer window
- The output includes descriptive statistics for each group and the t-test results showing whether differences between groups are statistically significant
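The SPSS output gives you the group means and the t statistic; for readers who also use Python, the same comparison can be sketched with SciPy. As before, this assumes the data were exported to CSV with hypothetical column names: Gender coded 1 for men and 2 for women, and Depression holding the PHQ-9 total.

```python
import pandas as pd
from scipy import stats

# Hypothetical file and column names; adjust to your own exported data.
df = pd.read_csv("clinical_study.csv")

men = df.loc[df["Gender"] == 1, "Depression"].dropna()
women = df.loc[df["Gender"] == 2, "Depression"].dropna()

# Independent samples t-test (equal variances assumed, matching SPSS's
# "Equal variances assumed" row).
t, p = stats.ttest_ind(men, women)
print(f"Men:   M = {men.mean():.2f}, SD = {men.std():.2f}")
print(f"Women: M = {women.mean():.2f}, SD = {women.std():.2f}")
print(f"t = {t:.2f}, p = {p:.3f}")
```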
The t-test shows a significant difference between men's and women's depression scores (t = 2.38, p < .05), meaning the difference is larger than what would be expected by chance (Figure 5.8). This significant result indicates a reliable association between gender and depression in this sample.
Just as with correlations, researchers communicate the findings of t-tests in a scientific report. An example of how to do this might read:
"I conducted an independent samples t-test to examine differences in depression between men (M = 6.10, SD = 5.31) and women (M = 7.27, SD = 6.49). There was a significant difference, t(525) = 2.38, p < .05, suggesting that gender is associated with depression. More specifically, my results indicated that women tend to experience higher levels of depression than men, even though the effect size (Cohen's d = 0.21) suggests the difference is small in magnitude."
If you are wondering where that last bit about effect sizes came from and why d = .21 is considered small, continue to the next section.
Making Sense of Effect Sizes
Just as researchers have guidelines for interpreting the size of correlations, they have guidelines for interpreting the size of differences between group means. The most common statistic for this is Cohen's d, which measures the difference between two group means relative to the standard deviation of both groups.
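In formula terms, Cohen's d is the difference between the two group means divided by their pooled standard deviation. The short function below is a sketch of that calculation (it is not an SPSS feature), called on a made-up pair of groups to show how it works.

```python
import numpy as np

def cohens_d(group1, group2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    g1, g2 = np.asarray(group1, dtype=float), np.asarray(group2, dtype=float)
    n1, n2 = len(g1), len(g2)
    # The pooled SD weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)
    return (g1.mean() - g2.mean()) / np.sqrt(pooled_var)

# Invented scores for two small groups; the sign of d reflects the direction
# of the difference.
print(round(cohens_d([6, 8, 5, 7, 9], [9, 10, 8, 11, 12]), 2))
```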
A Cohen's d around 0.2 represents a small difference between groups. Values around 0.4 represent medium-sized differences. And values of 0.6 or larger represent large differences. These guidelines help researchers communicate not just whether group differences are statistically significant, but how meaningful they are in practical terms. Table 5.2 contains examples of research findings with different effect sizes.
In our sample project, we found a Cohen's d of 0.2, indicating a small but reliable difference between men's and women's depression scores. This aligns with the survey from Chapter 3, where women showed somewhat higher rates of mental illness than men. While the difference is small, the consistency of this pattern across different studies—from large national surveys down to our sample of 500 participants—suggests the gender difference in depression is a reliable phenomenon, even if its effects are modest.
| Cohen's d | Effect Size | Examples from Research |
|---|---|---|
| 0.2 | Small | Gender differences in depression (d = 0.2; Salk et al., 2017) |
| 0.4 | Medium | Social media use and anxiety (d = 0.45; Hunt et al., 2018) |
| 0.6 | Large | Exercise intervention effects on stress (d = 0.84; Anderson & Durstine, 2019) |
Table 5.2. Conventions for interpreting Cohen's d effect sizes.
Overall, the key to understanding categorical-continuous relationships is recognizing that they are still just a pattern of association. However, instead of looking at how two continuous variables move together (like anxiety and depression), researchers compare the average scores of different groups. The statistical tools change—from correlation coefficients to t-tests—but the goal remains the same: examining whether scores on one variable help predict scores on another.
Research Portfolio
Portfolio Entry #15: Reporting on Gender Differences in Depression with a t-test
Once you have conducted your t-test, copy and paste the results into your portfolio. Then, following the template above, report the results of the t-test. Finally, write a few sentences explaining the results in your own words.
Associations between Two Categorical Variables
Many questions in behavioral research involve relationships between categorical variables. For example, researchers might want to know whether employed people are more likely to register as voters than unemployed people. In this case, both variables—employment status (employed vs. unemployed) and voter registration (registered vs. not registered)—are categorical. Similarly, researchers might investigate whether college graduates are more likely to own homes than non-graduates, or whether people living in cities are more likely to use public transportation than those living in rural areas.
When examining relationships between categorical variables, researchers compare percentages between groups: what percentage of employed people are registered to vote vs what percentage of unemployed people are registered to vote? To examine whether percentages differ between groups, researchers use a statistic called a chi-square (χ²). The idea behind the chi-square is the same as Pearson's r and the t-test: we want to know if the p-value associated with the statistic is smaller than .05. If so, there is an association between the variables.
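To make the logic concrete, here is a sketch of a chi-square test in Python on a small, invented contingency table of employment status and voter registration. The counts are made up purely for illustration; the point is that the test compares the observed counts with what would be expected if the two variables were unrelated.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Made-up counts: rows = employed / unemployed, columns = registered / not registered.
table = np.array([[180, 70],
                  [90, 60]])

chi2, p, dof, expected = chi2_contingency(table)

# Percentage registered within each employment group.
registered_pct = table[:, 0] / table.sum(axis=1) * 100
print(f"Registered: employed {registered_pct[0]:.1f}%, unemployed {registered_pct[1]:.1f}%")
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```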
Research Activity 5.5: Comparing Categorical Variables
Many variables are naturally categorical (voter registration: registered vs. not-registered). But researchers sometimes create a categorical variable from a continuous measure to answer a specific research question.
For example, while we previously examined the difference in average depression scores between men and women, the same data could be used to test whether women are more likely than men to experience severe depression. To do so, we would draw upon clinical cutoffs for the PHQ-9, where a score of 20 or higher indicates severe depression. Using the cutoffs, we could create two groups: one with anyone who has a PHQ-9 score of 20 or greater (the 'severe depression' group) and a second with anyone who has a score below 20 (the 'no severe depression' group). Creating these groups transforms the continuous measure into a categorical one that we can use to test gender differences in severe depression.
To create these groups and conduct the Chi-square test, you can follow the instructions in HOW TO Box 5.3 or watch the accompanying video for this exercise: https://bit.ly/Ch5_Cchit.
In our sample project, 5.6% of women experienced severe depression compared to just 0.7% of men. The chi-square value was 10.73, p < .05, indicating that the difference in percentages was statistically significant (Figure 5.9).
HOW TO: Compare Categorical Variables Using Chi-Square in SPSS
These steps help you analyze whether a relationship exists between two categorical variables such as gender and severe depression.
Open the dataset
- Open the "RITC_DATA_CH05_ClinicalStudy.sav" file
Create the categorical variable
- To create categories from a continuous variable like depression scores, click "Transform" in the menu bar
- Select "Recode into Different Variables..."
- Move the Depression variable into the "Input Variable" box
- In the "Output Variable" box, enter a name for the new variable such as "SevereDepression"
- Click "Old and New Values" to define your categories
- Click "Range, LOWEST through value:" and enter "19" in the box. Then, in the "New Value" box to the right enter "1." Then click "Add." All scores below the cutoff of 20 will now be scored as a "1"
- Click "Range, value through HIGHEST:" and enter "20" in the box. Then, in the "New Value" box to the right enter "2." Then click "Add." All scores of 20 or above will now be scored as a "2."
- Click "Continue" and then "OK"
Run a Chi-Square analysis
- Click on "Analyze" in the top menu
- Select "Descriptive Statistics → Crosstabs"
- Move one categorical variable, Gender, to the "Row(s)" box
- Move the other categorical variable, SevereDepression, to the "Column(s)" box
- Click "Statistics" and check "Chi-square"
- Click "Continue"
- Click "Cells" and check "Row percentages"
- Click "Continue" and then "OK"
Interpret your results
- Look at the "Chi-Square Tests" table to find the Pearson Chi-Square value and significance level
- Look at the Crosstabulation table to compare percentages between groups
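The same recode-and-test sequence can be sketched in Python. As with the earlier examples, this assumes the data were exported to CSV and uses hypothetical column names: Gender coded 1 for men and 2 for women, and Depression holding the PHQ-9 total, which is split at the clinical cutoff of 20.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical file and column names; adjust to your own exported data.
df = pd.read_csv("clinical_study.csv")

# Recode the continuous score into two categories using the clinical cutoff of 20.
df["SevereDepression"] = (df["Depression"] >= 20).map({True: "Severe", False: "Not severe"})

# Crosstab of counts, then the chi-square test of independence.
table = pd.crosstab(df["Gender"], df["SevereDepression"])
chi2, p, dof, expected = chi2_contingency(table)

print(table)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```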
To report the results of a chi-square analysis, you might write:
"I conducted a chi-square test to examine if there was a difference in the percentage of men and women who scored 20 or above on the PHQ-9. There was a significant difference, χ² (2) = 10.71, p < .05, suggesting that gender is associated with severe depression. More specifically, my results indicated that women tend to experience severe depression at a higher rate than men."
Research Portfolio
Portfolio Entry #16: Reporting on Differences in Depression with a Chi-square test
Once you have conducted the chi-square, paste the results into your portfolio. Report on the results using the template above, and then write a few sentences explaining why you think this difference in severe depression exists.
Guided Research Project: Morality and the Heinz Dilemma
Walk through a correlational research project examining moral decision making.
Guided Project: Moral Foundations and the Heinz Dilemma
Throughout this chapter, we have explored how researchers examine the associations between variables. Now, we will put this knowledge to use in a guided project.
Like the project we completed together in Chapter 3, this project will involve the Heinz dilemma. Instead of describing people's moral decisions, however, we will examine what might predict those decisions. Specifically, we will use something called Moral Foundations Theory (e.g., Graham et al., 2013) to investigate whether differences in people's moral intuitions help explain their judgments in the Heinz dilemma. All the study materials, the data file, and the instructions for what to do will be provided to you.
The project will give you hands-on experience with the key elements of correlational research: developing theoretically driven hypotheses, analyzing relationships between variables, and interpreting results. The accompanying video for this project provides a step-by-step guide for what you need to do: https://bit.ly/Ch5hcs.
What We are Studying: Project Goals and Big Questions
This project consists of three tasks. First, you will generate research hypotheses. This will involve reading about Moral Foundations Theory and developing two specific hypotheses about how moral foundations might predict reactions to the Heinz dilemma. For each hypothesis, you will write approximately one paragraph explaining your rationale based on Moral Foundations Theory.
Second, you will analyze the data. After examining the Qualtrics survey, you will download the SPSS data file and calculate scores for the five moral foundations subscales. You will then create a correlation matrix showing relationships between all five moral foundations and the moral acceptability ratings. Next, you will conduct t-tests to examine whether moral foundation scores differ between people who think Heinz should have stolen the drug and those who think he should not have. For your hypothesized relationships, you will create appropriate figures, including a bar graph comparing moral foundation scores between groups.
Finally, you will write up the results. Your write-up should focus on the specific hypotheses you generated, following the examples from earlier in the chapter that show how to report statistical analyses.
Part 1: Frame Your Hypotheses
In Chapter 3, we conducted a descriptive study examining how people respond to the Heinz dilemma. We found that when people were asked whether Heinz should steal the drug, a slight majority said he should not. We also found that when people were asked how morally acceptable stealing would be, most found it acceptable.
In this project, we will explore what might predict people's responses. To answer this question, we will draw upon Moral Foundations Theory. Just as the Big Five personality traits provide a framework for understanding personality, Moral Foundations Theory suggests that moral judgments can be understood through five basic moral intuitions that are described in Box 5.4.
Brief Background Reading
This project begins where all research begins: by looking at existing theory. Take 20-30 minutes to read about Moral Foundations Theory. You can start at the www.moralfoundations.org website. Then, you can explore the peer-reviewed literature on Google Scholar (e.g., Graham et al., 2013). During your reading, pay attention to how this theory has been used to understand people's moral judgments in different situations.
Develop Your Hypotheses
After familiarizing yourself with Moral Foundations Theory, take a moment to develop two of your own hypotheses. Looking at the five moral foundations, which ones do you think might predict whether someone finds stealing the drug acceptable or unacceptable? Why?
Write down your predictions and your reasoning in your portfolio. Specifically, one hypothesis should relate to why people find Heinz's decision morally acceptable or unacceptable. The second hypothesis should relate to whether Heinz should have stolen the drug. For example, do people who say that Heinz should steal the drug have higher Authority scores than those who think he should not?
The Moral Foundations Theory
Care/Harm: This foundation focuses on our sensitivity to others' suffering and desire to protect the vulnerable. People high in care are compassionate, empathetic, and disturbed by cruelty. They prioritize alleviating suffering and providing care for those in need.
Fairness/Cheating: This foundation relates to justice, rights, and proportional treatment. People high in fairness value equality, reciprocity, and impartiality. They're sensitive to cheating, discrimination, and injustice, with strong reactions to those who violate these principles.
Loyalty/Betrayal: This foundation emphasizes obligations to in-groups like family, community, or nation. People high in loyalty value group cohesion, patriotism, and self-sacrifice for collective good. They strongly disapprove of betrayal and those who abandon their group commitments.
Authority/Subversion: This foundation relates to tradition, hierarchy, and leadership respect. People high in authority value social order, deference to legitimate authorities, and role fulfillment. They're concerned with maintaining institutions and respecting established hierarchies.
Sanctity/Degradation: This foundation involves physical and spiritual purity concerns. People high in sanctity are guided by disgust toward "unnatural" behaviors and value restraint, cleanliness, and treating certain things as sacred. They're concerned with avoiding contamination.
Part 2: Design, Materials, and Methods
Now that you have hypotheses, let's examine how to test them. The first step is selecting the measures to assess each construct.
Moral Foundations Questionnaire
To measure moral foundations, we will use the Moral Foundations Questionnaire (MFQ-30), which assesses how strongly people endorse each of the five moral foundations (Graham et al., 2008). The MFQ-30 contains 30 items total, with six items measuring each foundation. For instance, an item that measures the Care/Harm foundation asks people to rate how relevant "whether or not someone suffered emotionally" is when deciding if something is right or wrong. All items are answered on a 1 to 5 scale, although the answer labels vary across the measure.
Heinz Dilemma
We will present participants with the same moral dilemma and follow-up questions used in Chapter 3. The questions asked whether Heinz should steal the drug (yes/no) and how morally acceptable stealing would be (rated from 1-7).
Data Quality
The Moral Foundations Questionnaire (MFQ-30) includes a few items that check whether participants are paying attention. For example, one item asks participants how important "whether or not someone was good at math" is when deciding between right and wrong. In addition to the items embedded within the MFQ, we added one more attention check. Chapters 10, 11, and 12 cover data quality in depth, but as an optional exercise, you can identify which participants failed the attention checks and remove them from the data analyses after performing the steps below.
Accessing study materials
To see how the measures were implemented online, download the Qualtrics survey file from the OSF project page: https://osf.io/a8kev/. Find the folder named "Ch. 5 – Correlational Research" and download the "RITC_SURVEY_CH05_HeinzDilemma.qsf" file. Import this file to your Qualtrics or Engage account and explore its structure.
Notice how the survey is organized into blocks, how it uses matrix-style questions to present the MFQ-30 items, and how it randomizes the order in which people receive the MFQ-30 and the Heinz dilemma (you can see this within the survey flow). The instructional video for this assignment walks you through the key features of the survey design in more detail.
Data collection
After programming the survey, we gathered data from 200 Connect participants. Each person was paid $1.00 for their time, and the study took about 7 minutes to complete. Once the study was launched, data collection completed in under an hour.
Part 3: Analyze the Data
To analyze the data, download the SPSS file from the OSF page. In the "Ch. 5 – Correlational Research" folder find the file named "RITC_DATA_CH05_HeinzDilemma.sav" and download it.
Once you have the file open, calculate the scores for each moral foundation subscale. HOW TO Box 5.5 provides instructions or you can watch the video for this project. The MFQ-30 consists of five moral foundations with six items each. You need to average the items to create a single score for each foundation.
Once you have created a score for each subscale, create a correlation matrix to show the correlations between all five moral foundation subscales and the moral acceptability judgments. Then, conduct t-tests to see whether people who thought it was acceptable for Heinz to steal the drug differ from those who thought it was unacceptable. These analyses will tell you whether your hypotheses about the moral foundations and the Heinz dilemma were supported.
Create a Figure
After the t-tests, create a bar graph that compares moral foundation scores between those who thought Heinz should steal the drug and those who thought he should not. You can use either HOW TO Box 5.5 or the video for this assignment to guide you through this process. Make sure you put the yes/no groups on the x-axis and the moral foundation score on the y-axis. Also make sure your graph includes error bars (± 1 standard error). Label both axes and give the graph a title. You will include this figure with the write-up of your results.
HOW TO: Analyze Data for the Heinz Dilemma Correlational Study
Download and open the dataset
- Navigate to the OSF page (https://osf.io/a8kev/)
- Download the "RITC_DATA_CH05_HeinzDilemma.sav" data file
- Open SPSS and load the data file (File → Open → Data)
Calculate moral foundation scores
- Click "Transform" in the top menu, then select "Compute Variable"
- For each moral foundation, create an average score from its six items.
- For example, to create the "Harm" variable, write "Harm" in the "Target Variable:" box. Then in "Numeric Expression" enter: MEAN(Harm1, Harm2, Harm3, Harm4, Harm5, Harm6)
- Click "OK"
- Repeat this process for the other foundations (Fairness, Loyalty, Authority, Sanctity)
Examine correlations with moral acceptability
- Click "Analyze" in the top menu, then select "Correlate → Bivariate"
- Move all five moral foundation scores and the "Acceptability" variable to the box
- Ensure "Pearson" is selected under "Correlation Coefficients"
- Click "OK" to produce the correlation matrix
- Identify which moral foundations correlate significantly with moral acceptability
Compare groups based on the yes/no decision
- Click "Analyze" in the top menu, then select "Compare Means → Independent Samples T-Test"
- Move the moral foundation scores to the "Test Variable(s)" box
- Move the "Steal" variable to the "Grouping Variable" box
- Click "Define Groups" and enter "1" for "Group 1" (Yes) and "2" for "Group 2" (No)
- Click "Continue" and then "OK"
Create a bar graph for your hypothesized relationship
- Click "Graphs" in the menu bar, then select "Chart Builder"
- In the gallery, select "Bar" chart and drag the simple bar chart to the canvas area
- Drag "Steal" to the x-axis and your foundation of interest to the y-axis
- Make sure the "Steal" variable is listed as a "Nominal" measure; otherwise, SPSS will not recognize the categories for your chart
- Click the "Element Properties" button
- In the dialog box, select the "Error Bars" tab
- Check "Display error bars" and select "1 Standard Error" from the dropdown
- Click "Apply" and then "OK" to create your graph with error bars
- Repeat these steps for each of the Moral Foundations variables you hypothesized a difference for
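For readers who want a code-based companion to these steps, the sketch below mirrors the same analysis in Python. It is illustrative only and rests on several assumptions: the data have been exported to a CSV file (heinz_dilemma.csv is a hypothetical name), the MFQ items follow the naming pattern used above (Harm1 through Harm6, and so on), Acceptability holds the 1-7 rating, and Steal is coded 1 for yes and 2 for no. Authority is used for the bar graph purely as an example; substitute whichever foundation your hypothesis names.

```python
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt

# Hypothetical file name; item and variable names follow the HOW TO box above.
df = pd.read_csv("heinz_dilemma.csv")

foundations = ["Harm", "Fairness", "Loyalty", "Authority", "Sanctity"]
for f in foundations:
    items = [f"{f}{i}" for i in range(1, 7)]
    df[f] = df[items].mean(axis=1)  # average of the six items, like SPSS's MEAN()

# Keep only complete cases for the analyses below.
df = df.dropna(subset=foundations + ["Acceptability", "Steal"])

# Correlations between each foundation and moral acceptability ratings.
for f in foundations:
    r, p = stats.pearsonr(df[f], df["Acceptability"])
    print(f"{f}: r = {r:.2f}, p = {p:.3f}")

# t-tests comparing foundation scores for the yes (1) and no (2) groups.
yes = df[df["Steal"] == 1]
no = df[df["Steal"] == 2]
for f in foundations:
    t, p = stats.ttest_ind(yes[f], no[f])
    print(f"{f}: t = {t:.2f}, p = {p:.3f}")

# Bar graph with +/- 1 standard error bars for one example foundation.
means = [yes["Authority"].mean(), no["Authority"].mean()]
sems = [yes["Authority"].sem(), no["Authority"].sem()]
plt.bar(["Should steal", "Should not steal"], means, yerr=sems, capsize=5)
plt.ylabel("Authority score")
plt.title("Authority scores by Heinz decision")
plt.show()
```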
Part 4: Interpret the Findings
How do the results compare to your hypotheses? Were your predictions supported? What surprised you about the findings? Why do you think some moral foundations predicted reactions to the dilemma while others didn't?
Write up the results relating to your hypotheses about moral foundations and acceptability ratings using the style below:
"I conducted a Pearson correlation analysis to examine the relationship between [your predicted moral foundation] and judgments of moral acceptability in the Heinz dilemma. There was a [significant/non-significant] [positive/negative] correlation between [foundation name] and moral acceptability ratings, r(198) = [value], p = [value]."
Add one sentence explaining what your result means in plain language. Include your figure and a brief statement about whether your hypotheses were supported.
Group Differences
For your hypothesis about moral foundations and people's yes/no decisions, write up your findings with the following structure:
"I conducted an independent samples t-test to compare [foundation name] scores between participants who said Heinz should steal the drug and those who said he should not. There was a [significant/non-significant] difference in scores for yes (M = [value], SD = [value]) and no (M = [value], SD = [value]) groups; t(198) = [value], p = [value]."
Add one sentence explaining what this means in plain language.
Research Portfolio
Part 5: Portfolio Entry #17 - Report the Relationship Between Moral Foundations and People's Decisions
For your portfolio, describe your two hypotheses about moral foundations and people's judgments in the Heinz dilemma, with one paragraph explaining the rationale for each hypothesis.
Include your analyses and the write up from above. Present the correlation matrix showing relationships between moral foundations and acceptability ratings, followed by the t-test results comparing moral foundation scores between yes/no groups, and then the bar graph showing group differences for your effect.
Write up your results, with one paragraph each for the correlation and t-test results. End with a brief interpretation of what your findings mean.
Designing Your Own Correlational Study
Apply your knowledge by designing, conducting, analyzing, and reporting your own correlational study.
Now that you have worked through an example, you are ready to investigate your own correlational research question.
Throughout this book, we have explored several different domains of research. You have measured personality traits, examined clinical variables like anxiety and depression, and investigated stages of moral development. Now it is time to use what you have learned to investigate a question that interests you.
Your correlational study could build on any of the measures explored previously. For instance, you might wonder how personality traits relate to mental health: are more conscientious people less likely to experience anxiety? Or you might be curious about how moral foundations connect to other aspects of behavior: do people who score high on the care/harm foundation show more empathy in everyday situations?
You could also venture into new territory. Many students are curious about patterns they have observed in their own lives or questions about human behavior that intrigue them. For example: Do people who spend more time on social media report feeling more socially connected or more isolated? Are students who exercise regularly less stressed during exam periods? Do people who maintain structured daily routines sleep better? Is procrastination related to perfectionism?
You have experience finding validated measurement tools, from the TIPI for assessing personality to the GAD-7 for anxiety and the Moral Foundations Questionnaire for moral intuitions. You can use these measures, find new measures in the instrument databases we discussed, or create your own measure following the process outlined in Chapter 4 for working with AI. Whichever route you take, this project presents an opportunity to pull together many of the things you have learned to investigate a question of your choosing.
Step 1: Craft Your Question and Study Design
Using Qualtrics or Engage, design a short survey to collect data for a correlational research project. Keep it simple: measuring two or three variables is plenty. The best way to begin is to take the Moral Foundations study in Qualtrics and modify it. Include clear instructions for participants and organize your measurements into blocks.
The first block should include instructions and a welcome message. After that, you should measure one variable per block, placing all the items that are part of your measurement instrument into their own block. The last block should include basic demographic questions.
Step 2: Collect Data
For this project, you should aim to collect data from at least 100 participants. These could be students in your school (if your instructor sets up a class data collection system through a platform like Sona), friends and family (using the anonymous survey link from Qualtrics or Engage), or CloudResearch Connect participants.
Step 3: Analyze Your Results
Once you have the data, analyze it using the statistical tools you learned about in this chapter. For relationships between continuous variables, use correlations. For comparing groups on continuous measures, use t-tests. It is less common to examine relationships between categorical variables for a first attempt at a correlational study, but if this fits the question you have chosen, use chi-square tests. Create appropriate visualizations using the techniques you practiced in the chapter.
Research Portfolio
Step 4: Portfolio Entry #18 - Interpret and Share What You Found
The Appendix for Part I provides instructions for reporting research. Follow those guidelines and add your results to your portfolio. Include your research question and why you chose it, how you measured your variables, your statistical findings, a figure showing your main result, and your interpretation of what the results mean.
Stop and Discuss!
Before starting your project, discuss these questions with your class.
- What research questions interest you and why?
- What challenges do you anticipate in measuring your variables?
- How will you recruit participants for your study?
- What type of statistical analysis will best answer your research question?
Remember, the goal is to learn something about human behavior through systematic investigation.
Summary
In this chapter, we explored correlational research. We learned how to analyze correlations, interpret their strength, and understand when relationships are statistically significant. We also saw how behavioral scientists assess relationships between different types of variables: continuous-continuous associations are measured with Pearson's r, categorical-continuous associations with t-tests, and categorical-categorical associations with chi-square tests.
The hands-on projects have given you practical experience with correlational analyses. By exploring relationships between anxiety, depression, and demographic variables, you had the chance to work with a real dataset. The Heinz dilemma project further demonstrated how psychological theories like Moral Foundations Theory can explain individual differences in moral judgments.
The goal of correlational research, as should be clear by now, is to identify patterns and make predictions. When behavioral scientists find that two variables are correlated, they know that one variable can predict the other. However, correlations alone cannot tell researchers whether one variable causes change in the other.
This limitation brings us to the next chapter, where we will explore how researchers address questions of causality in correlational research. Chapter 6 will introduce you to advanced techniques for investigating cause and effect relationships when experiments are not possible. You will learn about statistical control, multiple regression, and longitudinal designs. These are common methods that help researchers build stronger evidence for causal relationships while still acknowledging the inherent limitations of correlational approaches.
By combining what you have learned about correlational research with these advanced techniques of causal inference, you will develop a more sophisticated understanding of how behavioral scientists investigate associations between variables and draw meaningful conclusions about human behavior.
Frequently Asked Questions
What is the difference between positive and negative correlations?
Positive correlations occur when an increase in one variable predicts an increase in another variable (e.g., anxiety and depression, r = .82). Negative correlations (inverse relationships) occur when an increase in one variable predicts a decrease in another (e.g., age and anxiety, r = -.23). The direction is indicated by the sign of the correlation coefficient.
How do you interpret the size of a correlation coefficient?
According to Cohen's conventions, a correlation between 0.1 and 0.3 is considered small, indicating a weak association. A coefficient between 0.3 and 0.5 is moderate, suggesting a more substantial relationship. A correlation of 0.5 or higher is large, indicating a strong relationship. These conventions apply regardless of whether the correlation is positive or negative.
When should you use a t-test versus a chi-square test?
Use a t-test when comparing group averages for a categorical variable versus a continuous variable (e.g., comparing depression scores between men and women). Use a chi-square test when examining relationships between two categorical variables by comparing percentages between groups (e.g., comparing rates of severe depression between genders).
What does statistical significance (p < .05) mean for correlations?
Statistical significance (p < .05) indicates that a correlation is larger than what would occur by chance alone. It means that if there were truly no relationship between the variables, a correlation as large as observed would be found less than 5% of the time by chance. However, statistical significance should be considered alongside effect size to determine practical meaningfulness.
Key Takeaways
- Correlational research examines whether two variables are related and quantifies their association with statistics.
- Pearson's r measures both the direction (positive/negative) and strength of relationships between continuous variables.
- Positive correlations occur when increases in one variable predict increases in another (e.g., anxiety and depression, r = .82).
- Negative correlations occur when increases in one variable predict decreases in another (e.g., age and anxiety, r = -.23).
- Cohen's conventions for correlation size: Small (r = .1-.3), Medium (r = .3-.5), Large (r > .5).
- Statistical significance (p < .05) indicates a relationship is larger than expected by chance.
- Correlation matrices display all relationships between multiple variables at once.
- T-tests compare group averages when examining categorical vs. continuous variables.
- Chi-square tests compare percentages when examining categorical vs. categorical variables.
- Cohen's d measures effect size for group differences: Small (0.2), Medium (0.4), Large (0.6+).
- Both statistical significance AND effect size matter for interpreting results.
- Correlations identify patterns and enable prediction, but cannot establish causation.









