Introduction to Multilevel Model for Intraindividual Covariation
Overview
The MultiLevel Model (MLM) was developed to accommodate the interdependence in nested data. It is particularly useful for examining individual differences in bivariate associations - thus providing for more robust statistical inference about within-person associations (intraindividual covariation) and the between-person differences therein.
This tutorial covers how the multilevel model can be used to examine within-person associations (intraindividual covariation) and how those associations are moderated by between-person differences.
Our example is developed using experience sampling data (repeated occasions nested within persons), but also applies to other kinds of nested data.
Introduction to the Multilevel Model
The basic linear multilevel model is written as
\[y_{it} = \beta_{0i} + \beta_{1i}x_{it} +
e_{it}\] where \(\beta_{1i}\) is
individual i’s level of
\[\beta_{0i} = \gamma_{00} + \gamma_{01}z_{i} + u_{0i}\] \[\beta_{1i} = \gamma_{10} + \gamma_{11}z_{i} + u_{1i}\]
where \[e_{it} \tilde{} N(0,\mathbf{R})\]
and \[\mathbf{U}_{i} \tilde{} N(0,\mathbf{G})\]
\[\mathbf{R} = \mathbf{I}\sigma^{2}_{e}\]
where where \(I\) is the identity matrix (diagonal matrix of 1s) and \(\sigma^{2}_{e}\) is the residual (within-person) variance.
\[\mathbf{G} = \left[\begin{array} {rr} \sigma^{2}_{u0} & \sigma_{u0u1} \\ \sigma_{u1u0} & \sigma^{2}_{u1} \end{array}\right]\]
Preliminaries
Loading Libraries
library(ggplot2) # data visualization
library(psych) # describing data
library(plyr) # data manipulation
library(tidyr) # tidy data
library(dplyr) # data manipulation
library(interactions) # probing/plotting interactions
library(backports) # revive the isFALSE() function for sim_slopes()
library(effects) # probing interactions
library(lme4) # multilevel models
library(lmerTest) # p-valuesReading in Repeated Measures Data
Our example makes use of the person-level and day-level (EMA-type) AMIB Data, a multiple time-scale data set that has been shared for teaching purposes.The data are shared with intent to facilitate learning about analysis of intensive longitudinal data (from experience sampling, daily diary, or EMA designs). The data are posted only for teaching purposes and should not be used for research.
Use requires citation and acknowledgement of both this website and the following paper:
We make use of three variables:
Outcome: daily negative affect
(continuous), the negaff variable in the
day-level data file;
Time-varying Predictor: daily stress, a stress
variable obtained through reverse coding of pss in the
day-level data file;
Person-level Predictor/Moderator: trait neuroticism, the
bfi_n variable in the person-level data file.
Loading person-level file (N = 190) and subsetting to variables of interest
#set filepath for data file
filepath <- "https://raw.githubusercontent.com/The-Change-Lab/collaborations/main/AMIB/AMIB_persons.csv"
#read in the .csv file using the url() function
AMIB_persons <- read.csv(file=url(filepath), header=TRUE)
#subsetting to variables of interest
AMIB_persons <- AMIB_persons %>%
select(id, bfi_n)Loading day-level file (T = 8) and subsetting to variables of interest.
#set filepath for data file
filepath <- "https://raw.githubusercontent.com/The-Change-Lab/collaborations/refs/heads/main/AMIB/AMIB_daily.csv"
#read in the .csv file using the url() function
AMIB_daily <- read.csv(file=url(filepath),header=TRUE)
#subsetting to variables of interest
AMIB_daily <- AMIB_daily %>%
select(id, day, negaff, pss)Data Preparation and Description
In this example data are prepared through some recoding (idiosyncratic for this data set), and separation of time-varying predictor variables into between-person and within-person components (typical for all multilevel analysis)
Data Recoding
Reverse code pss into a new stress variable
where higher values indicate higher perceived stress.
#reverse coding the pss variable into a new stress variable
AMIB_daily <- AMIB_daily %>%
mutate(stress = 4-pss)
#describing new variable
describe(AMIB_daily$stress)## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1445 1.39 0.68 1.25 1.36 0.74 0 4 4 0.35 0.13 0.02
#histogram
AMIB_daily %>%
ggplot(aes(x=stress)) +
geom_histogram(fill="white", color="black",bins=20) +
labs(x = "Stress (high = more stressed)")Between- and Within- Components
We now split the time-varying predictor into “trait” (between-person
differences) and “state” (within-person deviations) components.
Specifically, the daily variable stress is split into two
variables: stress_trait is the sample-mean centered
between-person component, and stress_state is the
person-centered within-person component.
Although not formally needed, we do this for all the time-varying variables (predictors, outcomes) for easier plotting later.
Making trait variables
#calculating intraindividual means
AMIB_imeans <- AMIB_daily %>%
group_by(id) %>%
summarize(stress_trait = mean(stress, na.rm=TRUE),
negaff_trait = mean(negaff, na.rm=TRUE))
describe(AMIB_imeans)## vars n mean sd median trimmed mad min max range
## id 1 190 318.29 130.44 321.50 318.99 151.23 101.00 532.00 431.00
## stress_trait 2 190 1.40 0.48 1.41 1.39 0.51 0.19 2.56 2.38
## negaff_trait 3 190 2.48 0.73 2.41 2.43 0.72 1.11 5.09 3.98
## skew kurtosis se
## id -0.04 -1.09 9.46
## stress_trait -0.04 -0.23 0.03
## negaff_trait 0.68 0.45 0.05
#merging into person-level file
AMIB_persons <- AMIB_persons %>%
left_join(AMIB_imeans, by="id")
#make centered versions of the person-level scores
AMIB_persons <- AMIB_persons %>%
mutate(bfi_n_c = scale(bfi_n, center=TRUE, scale=FALSE),
stress_trait_c = scale(stress_trait, center=TRUE, scale=FALSE))
#describe person-level data
describe(AMIB_persons)## vars n mean sd median trimmed mad min max
## id 1 190 318.29 130.44 321.50 318.99 151.23 101.00 532.00
## bfi_n 2 190 2.98 0.96 3.00 3.00 1.48 1.00 5.00
## stress_trait 3 190 1.40 0.48 1.41 1.39 0.51 0.19 2.56
## negaff_trait 4 190 2.48 0.73 2.41 2.43 0.72 1.11 5.09
## bfi_n_c 5 190 0.00 0.96 0.02 0.02 1.48 -1.98 2.02
## stress_trait_c 6 190 0.00 0.48 0.01 0.00 0.51 -1.21 1.17
## range skew kurtosis se
## id 431.00 -0.04 -1.09 9.46
## bfi_n 4.00 -0.09 -0.82 0.07
## stress_trait 2.38 -0.04 -0.23 0.03
## negaff_trait 3.98 0.68 0.45 0.05
## bfi_n_c 4.00 -0.09 -0.82 0.07
## stress_trait_c 2.38 -0.04 -0.23 0.03
Making state variables in long data (as deviations from uncentered trait variable)
#merging person-level data into daily data
daily_long <- AMIB_daily %>%
left_join(AMIB_persons, by="id")
#calculating state variables
daily_long <- daily_long %>%
mutate(stress_state = stress - stress_trait,
negaff_state = negaff - negaff_trait)
#describing data
describe(daily_long)## vars n mean sd median trimmed mad min max
## id 1 1458 322.53 129.08 324.00 324.20 151.23 101.00 532.00
## day 2 1458 3.48 2.30 3.00 3.47 2.97 0.00 7.00
## negaff 3 1441 2.45 1.04 2.20 2.34 1.04 1.00 6.90
## pss 4 1445 2.61 0.68 2.75 2.64 0.74 0.00 4.00
## stress 5 1445 1.39 0.68 1.25 1.36 0.74 0.00 4.00
## bfi_n 6 1458 2.97 0.96 3.00 2.98 1.48 1.00 5.00
## stress_trait 7 1458 1.39 0.47 1.41 1.39 0.51 0.19 2.56
## negaff_trait 8 1458 2.45 0.71 2.38 2.41 0.67 1.11 5.09
## bfi_n_c 9 1458 -0.01 0.96 0.02 0.00 1.48 -1.98 2.02
## stress_trait_c 10 1458 -0.01 0.47 0.01 -0.01 0.51 -1.21 1.17
## stress_state 11 1445 0.00 0.49 -0.03 -0.02 0.46 -1.75 2.12
## negaff_state 12 1441 0.00 0.76 -0.10 -0.04 0.63 -3.62 3.16
## range skew kurtosis se
## id 431.00 -0.07 -1.06 3.38
## day 7.00 0.01 -1.24 0.06
## negaff 5.90 0.96 0.77 0.03
## pss 4.00 -0.35 0.13 0.02
## stress 4.00 0.35 0.13 0.02
## bfi_n 4.00 -0.08 -0.79 0.03
## stress_trait 2.38 -0.08 -0.21 0.01
## negaff_trait 3.98 0.66 0.52 0.02
## bfi_n_c 4.00 -0.08 -0.79 0.03
## stress_trait_c 2.38 -0.08 -0.21 0.01
## stress_state 3.88 0.36 0.79 0.01
## negaff_state 6.78 0.50 1.70 0.02
Great! Now the data are all prepared as needed.
Setting up the Multilevel Model
Set-up for basic multilevel model with continuous outcome
Outlining the Substantive Inquiry
We define stress reactivity (a person-level dynamic characteristic; Ram & Gerstorf, 2009) as the extent to which an individual’s daily negative affect is related to daily stress. That is, stress reactivity is the within-person association between daily negative affect and daily stress.
Lets examine some of our prepared data to see what stress reactivity looks like.
Plotting within-person associations for a subset of individuals.
#faceted plot
daily_long %>%
filter(id <= 125) %>%
ggplot(aes(x=stress_state,y=negaff)) +
geom_point() +
stat_smooth(method="lm", fullrange=TRUE) +
xlab("Stress State") + ylab("Negative Affect (Continuous)") +
facet_wrap( ~ id) +
theme(axis.title=element_text(size=14),
axis.text=element_text(size=12),
strip.text=element_text(size=12))To constrain the regression line within the range of stress scores
for each person (instead of extrapolating to the full range of stress
scores), specify fullrange=FALSE in the
stat_smooth layer above.
From the panel of plots we get a sense that individuals’ negative affect is higher on days where their stress is higher, but to a different extent for each person. These differences in stress reactivity are indicated by differences in the slope of the regression lines.
Our substantive interest is in describing individual differences in stress reactivity and determining if those differences are related to differences in neuroticism.
Outlining the Mathematical Model
The basic linear multilevel model is written as
\[y_{it} = \beta_{0i} + \beta_{1i}x_{it} + e_{it}\] where the time-varying outcome variable, \(y_{it}\) (negative affect) is modeled as a function of a person-specific intercept, \(\beta_{0i}\), a person-specific slope, \(\beta_{1i}\), that captures the within-person association of interest (in our case stress reactivity), and residual error, \(e_{it}\). The person-specific intercepts and slopes are modeled in turn as
\[\beta_{0i} = \gamma_{00} + \gamma_{01}z_{i} + u_{0i}\] \[\beta_{1i} = \gamma_{10} + \gamma_{11}z_{i} + u_{1i}\]
where the fixed effects \(\gamma_{00}\) and \(\gamma_{01}\) describe the prototypical individual’s intercept and slope; \(\gamma_{10}\) and \(\gamma_{11}\) indicate how individual differences in the person-specific intercepts and slopes are related to a between-person differences variable, \(z_{i}\); and the random effects \(u_{0i}\) and \(u_{1i}\) are residual unexplained differences in intercepts and slopes.
Importantly, \[e_{it} \tilde{} N(0,\mathbf{R})\], and \[\mathbf{U}_{i} \tilde{} MVN(0,\mathbf{G})\]
\[\mathbf{R} = \mathbf{I}\sigma^{2}_{e}\], where where \(I\) is the identity matrix (diagonal matrix of 1s) and \(\sigma^{2}_{e}\) is the residual (within-person) variance.
\[\mathbf{G} = \left[\begin{array} {rr} \sigma^{2}_{u0} & \sigma_{u0u1} \\ \sigma_{u1u0} & \sigma^{2}_{u1} \end{array}\right]\]
Together, the full set of parameters provide a parsimonious description of the data that allows for inference about within-person associations and between-person differeneces in those associations.
Checking a distributional assumption
Formally, the model assumes that the errors (conditional on all the predictors) are normally distributed. The residuals could be checked after the mode is run.
We can check out the distribution of the outcome variable
(negaff) now:
AMIB_daily %>%
ggplot(aes(x=negaff)) +
geom_histogram(aes(y=..density..),
binwidth=.25,
colour="black", fill="white") +
labs(x = "Negative Affect", y="Density") ## Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(density)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Removed 17 rows containing non-finite outside the scale range
## (`stat_bin()`).
We see that the negative affect outcome variable looks like a skewed distribution. It is almost always like this. However, the literature has mostly ignored that and gone forward with normality assumptions. Without condoning that practice (but having done it often), we also treat the outcome as relatively normally distributed. Other models we cover later will treat the variable in different ways (binary, count).
Coding the Multilevel Model (Unconditional Means for ICC)
Now that we have defined the dynamic characteristic of interest - stress reactivity - we can operationalize it using the multilevel model.
We make use of the lme4 package for fitting mixed
effects models, and some supplmentary packages: lmerTest
provides tools for obtaining p-vlaues; effects provides
tools for calculating and plotting model-based predictions; and
interactions provides tools for plotting and probing
interactions.
The lmer() function fits the MLMs The data
argument specifies the data sources The na.action argument
specifies what to do with missing data
To start, we often fit an unconditional means model that provides us information about how much of the total variance in the outcome varaible is within-person variance and how much is between-person variance.
Fitting the unconditional means model.
#unconditional means model
model0_fit <- lmer(formula = negaff ~ 1 + (1|id),
data=daily_long,
na.action=na.exclude)
summary(model0_fit)## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: negaff ~ 1 + (1 | id)
## Data: daily_long
##
## REML criterion at convergence: 3833.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.8739 -0.6123 -0.1608 0.4658 3.9394
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 0.4270 0.6535
## Residual 0.6627 0.8141
## Number of obs: 1441, groups: id, 190
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 2.46368 0.05229 185.80793 47.12 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We extract the random effects with the VarCorr() function:
## Groups Name Std.Dev.
## id (Intercept) 0.65347
## Residual 0.81408
And then compute the intra-class correlation (ICC) as the ratio of the random intercept variance (between-person) to the total variance, defined as the sum of the random intercept variance and residual variance (between + within). Specifically, \[ICC_{between} = \frac{\sigma^{2}_{u0}}{\sigma^{2}_{u0} + \sigma^{2}_{e}}\]
Calculating the ICC
Store the random effect variances, which will be the first column of the VarCorr object (see above).
## grp var1 var2 vcov sdcor
## 1 id (Intercept) <NA> 0.4270294 0.6534749
## 2 Residual <NA> <NA> 0.6627260 0.8140798
Next, compute the ICC. It is the ratio of the random intercept variance (between-person var) over the total variance (between + within var):
## [1] 0.391858
From the unconditional means model, the ICC was calculated, which indicated that of the total variance in negative affect, 39.19% is attributable to between-person variation whereas 60.81% is attributatable to within-person variation.
This means there is a good portion of within-person variance to model using time-varying predictor.
A Mutlilevel Model for Examining Between-Person Differences in Within-Person Associations
OK - let’s add the predictor, which was split into two components
stress_trait (between-person component) and
stress_state (within-person component), that latter gives
us possibility to quantify each individual’s stress reactivity.
Note that we also include day as a time-varying predictor,
but simply as a control variable to account for any systematic time
trends in the data.
The model as would be included in the Data Analysis section of a paper becomes \[negaff_{it} = \beta_{0i}1_{it} + \beta_{1i}stressstate_{it} + \beta_{2i}day_{it} + e_{it}\] where the time-varying outcome variable, \(y_{it}\) (negative affect) is modeled as a function of a person-specific intercept, \(\beta_{0i}\), a person-specific slope, \(\beta_{1i}\), that captures the within-person association of interest (in our case stress reactivity), a person-specific slope, \(\beta_{2i}\), that captures the linear trend across days, and residual error, \(e_{it}\) that is assumed normally distributed. The person-specific intercepts and slopes are modeled in turn as
\[\beta_{0i} = \gamma_{00} + \gamma_{01}stresstrait_{i} + u_{0i}\] \[\beta_{1i} = \gamma_{10} + \gamma_{11}stresstrait_{i} + u_{1i}\] \[\beta_{2i} = \gamma_{20}\] where the fixed effects \(\gamma_{00}\) and \(\gamma_{01}\) describe the prototypical individual’s baseline level of negative affect and stress reactivity; \(\gamma_{10}\) and \(\gamma_{11}\) indicate how individual differences in the person-specific those characteristics are related to a between-person differences in stress exposure; and \(\gamma_{20}\) indicates the general trend across days, which is treated as a nuicance and assumed equivalent across persons. The random effects \(u_{0i}\) and \(u_{1i}\) are residual unexplained differences that are assumed multivariate normal and may be correlated.
We need to expand that model through algebraic substitution and multiplying out to get a single-equation model that converts into R code. Ordered to match the code below the singel-equation model becomes \[negaff_{it} = \gamma_{00}1_{it} + \gamma_{20}day_{ti} + \gamma_{01}stresstrait_{i} + \gamma_{10}stressstate_{it} + \gamma_{11}stressstate_{it}stresstrait_{i} + (u_{0i}1_{ti} + u_{1i}stressstate_{it}) + e_{it}\] The model is then implemented in lmer as …
# fit model
model1_fit <- lmer(formula = negaff ~ 1 + day + stress_trait_c +
stress_state + stress_state:stress_trait_c +
(1 + stress_state|id),
data=daily_long,
na.action=na.exclude)
summary(model1_fit)## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula:
## negaff ~ 1 + day + stress_trait_c + stress_state + stress_state:stress_trait_c +
## (1 + stress_state | id)
## Data: daily_long
##
## REML criterion at convergence: 3162.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.5368 -0.6127 -0.0729 0.5093 4.4164
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## id (Intercept) 0.2135 0.4621
## stress_state 0.1257 0.3546 0.53
## Residual 0.4038 0.6355
## Number of obs: 1438, groups: id, 190
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 2.695e+00 4.583e-02 3.922e+02 58.806 <2e-16
## day -6.580e-02 7.552e-03 1.250e+03 -8.713 <2e-16
## stress_trait_c 1.038e+00 7.946e-02 1.859e+02 13.067 <2e-16
## stress_state 7.647e-01 4.561e-02 1.664e+02 16.765 <2e-16
## stress_trait_c:stress_state 1.550e-01 9.780e-02 1.584e+02 1.585 0.115
##
## (Intercept) ***
## day ***
## stress_trait_c ***
## stress_state ***
## stress_trait_c:stress_state
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) day strs__ strss_
## day -0.569
## strss_trt_c 0.004 0.007
## stress_stat 0.216 0.012 0.002
## strss_tr_:_ 0.034 -0.057 0.268 -0.118
Interpreting Results (model parameters)
The results indicate that there is an association between perceived stress and negative affect, both across persons (between-person association) and within persons (average within-person association). These are described by the Fixed Effects parameters.
- Fixed Effects:
(Intercept): The expected value of negative affect for the prototypical person on a prototypical day is 2.7.day: negative affect decreased over the course of the study days - for every unit increase in day, negative affect decreased by -0.07 (p = 0).stress_trait_c: Individuals with higher perceived stress tended to experience higher negative affect, 1.04, which is statistically significantly different from zero (p = 0).stress_state: On days when the prototypical individual’s perceived stress was higher than usual, their negative affect also tended to be higher than usual, 0.76, which is statistically significantly different from zero (p = 0).stress_trait_c:stress_state: differences in the within-person association between state stress and negative affect are not moderated by trait stress level (0.16, p = 0.11).
- Random Effects:
Variance((Intercept)): There are substantial between-person differences in individuals’ expected value of negative affect (0.21).Variance(stress_state): There are substantial between-person differences in the within-person association between perceived stress and negative affect, stress reactivity, (0.13).Corr((Intercept),stress_state): The correlation between the random intercept and random slope was 0.53, which indicates that those who had higher intercepts for negative affect were also more likely to have greater (more positive) associations between negative affect and perceived stress.
Note that often now the Std Dev values are getting reported instead
of the variances. Interpretation is the same.
We can also get confidence intervals for the fixed and random effects.
Depending on model complexity, the confint() can sometimes
take a while to run.
## Computing profile confidence intervals ...
## 2.5 % 97.5 %
## .sig01 0.40389534 0.52247750
## .sig02 0.27635471 0.77129745
## .sig03 0.25230526 0.45069359
## .sigma 0.60962930 0.66244352
## (Intercept) 2.60530435 2.78468853
## day -0.08058845 -0.05099124
## stress_trait_c 0.88249219 1.19390741
## stress_state 0.67339998 0.85449747
## stress_trait_c:stress_state -0.03788643 0.34703329
The labels for the random effects are not entirely intuitive (e.g.,
sig01, sig02, sig03,
sigma) - especially for models with several random effects.
So, need to check carefully how to interpret the output.
In the above output:
sig01= random effect CI for random intercept (given in SD units - need to square the values if you want the variances)sig02= correlation between the random intercept and random slope (for stress_state)sig03= random effect CI for random slope (stress_state; given in SD units - need to square the values if you want the variances)sigma= residual error CI (given in SD units - need to square the values if you want the variances)
In general, the output given from the confint() will
give you the confidence intervals (given in SD units - will need to
square these values if you want the variances) for the first random
effect (e.g., the random intercept). Then it will give you the
correlations between that random effect and all other random effects it
is correlated with. The next row of CI values will correspond to the
second random effect (e.g., the random slope for stress_state), with the
following rows pertaining to all random effects it is correlated with
(but not include any correlations already given - because for example,
the correlation for the random intercept and random slope for
stress_state was already given). The last row pertaining to random
effects is always called sigma, and pertains to the
residual error (given in SD units - will need to square these values if
you want the variances).
Plotting the Between-Person and Within-Person Associations
Next, we can plot between-person associations.
AMIB_imeans %>%
ggplot(aes(x=stress_trait, y=negaff_trait, group=factor(id)), legend=FALSE) +
geom_point(colour="gray40") +
geom_smooth(aes(group=1), method=lm, se=FALSE, fullrange=FALSE, lty=1,
linewidth=2, color="blue") +
xlab("Trait Stress") + ylab("Trait Negative Affect") +
theme_classic() +
theme(axis.title=element_text(size=16),
axis.text=element_text(size=12),
plot.title=element_text(size=16, hjust=.5)) +
ggtitle("Between-Person Association Plot\nTrait Stress & Negative Affect")## `geom_smooth()` using formula = 'y ~ x'
Note that the plot is not actually using the model output - so it is just an approximation of the exact model (using geom_smooth embedded within ggplot). However, it is a very useful display.
Plotting predicted within-person associations using the predicted scores.
daily_long %>%
ggplot(aes(x=stress_state, y=pred_m1, group=factor(id), colour="gray"), legend=FALSE) +
geom_smooth(method=lm, se=FALSE, fullrange=FALSE, lty=1,
linewidth=.5, color="gray40") +
geom_smooth(aes(group=1), method=lm, se=FALSE, fullrange=FALSE, lty=1,
linewidth=2, color="blue") +
xlab("State Stress") + ylab("Predicted Negative Affect") +
theme_classic() +
theme(axis.title=element_text(size=18),
axis.text=element_text(size=14),
plot.title=element_text(size=18, hjust=.5)) +
ggtitle("Within-Person Association\nState Stress & Negative Affect")## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 20 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 20 rows containing non-finite outside the scale range
## (`stat_smooth()`).
Note that in this plot, although the predictor on the x-axis is a within-person state variable, the negative affect outcome variable on the y-axis still has both the within- and the between-person components in it.
We can alternatively “purify” the plot by using the negative affect state variable as the outcome. Then we have plotted only the within-person associations. Plotting predicted within-person associations.
daily_long %>%
ggplot(aes(x=stress_state, y=negaff_state, group=factor(id), colour="gray"),
legend=FALSE) +
geom_smooth(method=lm, se=FALSE, fullrange=FALSE, lty=1,
linewidth=.5, color="gray40") +
geom_smooth(aes(group=1), method=lm, se=FALSE, fullrange=FALSE, lty=1,
linewidth=2, color="blue") +
xlab("Stress State") + ylab("Predicted State Negative Affect") +
theme_classic() +
theme(axis.title=element_text(size=18),
axis.text=element_text(size=14),
plot.title=element_text(size=18, hjust=.5)) +
ggtitle("Within-Person Association Plot\nPerceived Stress & Negative Affect")## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 20 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 20 rows containing non-finite outside the scale range
## (`stat_smooth()`).
Note that the plot is not actually using the model output - so it is just an approximation of the exact model (using geom_smooth with state scores embedded within ggplot). However, it is a very useful display.
That was awesome!!!
A First Extension
Adding Another Person-level Predictor
We now add trait neuroticism to the model, to see if and how between-person differences in neuroticism are related to stress reactivity.
Running the expanded model.
# fit model
model2_fit <- lmer(formula = negaff ~ 1 + day + stress_trait_c +
bfi_n_c + stress_trait_c:bfi_n_c +
stress_state + stress_state:stress_trait_c +
stress_state:bfi_n_c + stress_state:stress_trait_c:bfi_n_c +
(1 + stress_state|id),
data=daily_long,
na.action=na.exclude)
#Look at results
summary(model2_fit)## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula:
## negaff ~ 1 + day + stress_trait_c + bfi_n_c + stress_trait_c:bfi_n_c +
## stress_state + stress_state:stress_trait_c + stress_state:bfi_n_c +
## stress_state:stress_trait_c:bfi_n_c + (1 + stress_state | id)
## Data: daily_long
##
## REML criterion at convergence: 3161.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.4271 -0.6011 -0.0749 0.5045 4.4732
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## id (Intercept) 0.1955 0.4422
## stress_state 0.1238 0.3518 0.51
## Residual 0.4040 0.6356
## Number of obs: 1438, groups: id, 190
##
## Fixed effects:
## Estimate Std. Error df t value
## (Intercept) 2.690e+00 4.556e-02 3.944e+02 59.055
## day -6.545e-02 7.572e-03 1.247e+03 -8.644
## stress_trait_c 9.695e-01 7.878e-02 1.835e+02 12.307
## bfi_n_c 1.543e-01 3.917e-02 1.809e+02 3.939
## stress_state 7.687e-01 4.682e-02 1.673e+02 16.418
## stress_trait_c:bfi_n_c 3.715e-02 7.832e-02 1.824e+02 0.474
## stress_trait_c:stress_state 1.254e-01 1.015e-01 1.692e+02 1.235
## bfi_n_c:stress_state 7.595e-02 4.845e-02 1.549e+02 1.568
## stress_trait_c:bfi_n_c:stress_state -3.167e-02 1.024e-01 1.722e+02 -0.309
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## day < 2e-16 ***
## stress_trait_c < 2e-16 ***
## bfi_n_c 0.000117 ***
## stress_state < 2e-16 ***
## stress_trait_c:bfi_n_c 0.635818
## stress_trait_c:stress_state 0.218476
## bfi_n_c:stress_state 0.119006
## stress_trait_c:bfi_n_c:stress_state 0.757563
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) day strs__ bf_n_c strss_ st__:__ st__:_ bf__:_
## day -0.576
## strss_trt_c 0.003 0.005
## bfi_n_c -0.008 0.007 -0.224
## stress_stat 0.204 0.004 0.000 0.001
## strss_t_:__ -0.179 0.008 0.008 0.040 -0.054
## strss_tr_:_ 0.042 -0.073 0.248 -0.056 -0.087 0.003
## bf_n_c:str_ -0.033 0.058 -0.058 0.258 0.016 0.009 -0.244
## strs__:__:_ -0.061 0.032 0.004 0.008 -0.233 0.243 -0.103 -0.059
- Fixed Effects:
(Intercept): The expected value of negative affect for the prototypical person on a prototypical day is 2.69.day: For every unit increase in day, negative affect is expected to decrease by -0.07 (p = 0).stress_trait_c: Individuals with higher perceived stress tended to experience higher negative affect, 0.97, which is statistically significantly different from zero (p = 0).bfi_n_c: On average, individuals with higher neuroticism also tended to experience higher negative affect (0.15, p = 0)stress_state: On days when the prototypical individual’s perceived stress was higher than usual, their negative affect also tended to be higher than usual, 0.77, which was statistically significantly different from zero (p = 0).stress_trait_c:bfi_n_c:trait neuroticism did not moderate the between-person association between stress and negative affect (0.04, p = 0.64).stress_trait_c:stress_state: trait stress does not moderate the within-person association between state stress and negative affect (0.13, p = 0.22).bfi_n_c:stress_state: trait neuroticism does not moderate the within-person association between stress and negative affect (0.08, p = 0.12).stress_trait_c:bfi_n_c:stress_state: neuroticism does not moderate the trait stress moderator (-0.03, p = 0.76).
- Random Effects:
Variance((Intercept)): There was a substantial extent of between-person differences in the average value of negative affect (0.2).Variance(stress_state): There was a substantial extent of between-person differences in the average within-person association between perceived stress and negative affect (0.12).Corr((Intercept),stress_state): The correlation between the random intercept and random slope was 0.51, which indicates that those who had higher intercept values for negative affect were also more likely to exhibit stress reactivity (more positive within-person associations between negative affect and perceived stress).
Plotting and Probing the Cross-Level Interactions
Probing and Plotting the Interaction
The moderation is not significant. However, if it were, we would want
to plot and probe the interaction term, specifically the
stress_state:bfi_n_c term.
We can use the effect function.
We examine what the effect is as various levels of the predictors.
Standard practice is to use +1 and -1 SD.
#xlevels = is the list of values that we want to evaluate at.
#obtaining the standard deviation of the between-person moderator
describe(daily_long$bfi_n_c)## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 1458 -0.01 0.96 0.02 0 1.48 -1.98 2.02 4 -0.08 -0.79
## se
## X1 0.03
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1445 0 0.49 -0.03 -0.02 0.46 -1.75 2.12 3.88 0.36 0.79 0.01
#calculate effect
effects_model2 <- effect(term="bfi_n_c*stress_state", mod=model2_fit,
xlevels=list(bfi_n_c=c(-0.96, +0.96), stress_state=c(-0.49,+0.49)))## NOTE: bfi_n_c:stress_state is not a high-order term in the model
## Warning in Analyze.model(focal.predictors, mod, xlevels, default.levels, : the
## predictors stress_trait_c, bfi_n_c are one-column matrices that were converted
## to vectors
##
## bfi_n_c*stress_state effect
## stress_state
## bfi_n_c -0.49 0.49
## -0.96 1.964450 2.644774
## 0.96 2.188157 3.011998
##
## Lower 95 Percent Confidence Limits
## stress_state
## bfi_n_c -0.49 0.49
## -0.96 1.858012 2.510688
## 0.96 2.080620 2.876440
##
## Upper 95 Percent Confidence Limits
## stress_state
## bfi_n_c -0.49 0.49
## -0.96 2.070888 2.778860
## 0.96 2.295694 3.147556
Everything we need is in there!
We can convert to a data frame and plot it!
#convert to dataframe
effectsdata <- as.data.frame(effects_model2)
#plotting the effect evaluation (with standard error ribbon)
effectsdata %>%
ggplot(aes(x=stress_state, y=fit, group=bfi_n_c), legend=FALSE) +
geom_point() +
geom_line() +
geom_errorbar(aes(ymin=lower, ymax=upper), width=.15) +
xlab("Stress State") + xlim(-2,2) +
ylab("Predicted Negative Affect") + ylim(1,7) +
ggtitle("Differences in Stress Reactivity across Neuroticism")Note that the non-significant interaction appears as two practically parallel lines.
Johnson-Neyman Technique
Let’s go a bit further …
We would like to identify the range of values of the moderator at which the within-person slope is significant. Formally, the Johnson-Neyman technique is used to probe significant interactions in order to determine for what values of the moderator the focal predictor is significantly related to the outcome - i.e., to identify the region of significance. This is quickly becoming standard practice in reporting about interactions in multilevel models.
In our case, we are interested to know the range of neuroticism scores at which we would expect individuals to have significant stress reactivity (within-person association between daily stress and daily negative affect).
We make use of the johnson-neyman() function in the
interactions package. Note that the backports
package may also be needed to get a deprecated isFALSE()
function that this package uses.
Also, when using the interactions package we need to do
a little data formatting (to make sure that all variables are vectors),
and to run the model using lmer() without the lmerTest() overlay (to
obtain a merMod object).
Fixing data.
## 'data.frame': 1458 obs. of 14 variables:
## $ id : int 101 101 101 101 101 101 101 101 102 102 ...
## $ day : int 0 1 2 3 4 5 6 7 0 1 ...
## $ negaff : num 3 2.3 1 1.3 1.1 1 1.2 1.1 1.4 1.6 ...
## $ pss : num 2.5 2.75 3.5 3 2.75 2.75 3.5 2.75 3.5 4 ...
## $ stress : num 1.5 1.25 0.5 1 1.25 1.25 0.5 1.25 0.5 0 ...
## $ bfi_n : num 2 2 2 2 2 2 2 2 2 2 ...
## $ stress_trait : num 1.06 1.06 1.06 1.06 1.06 ...
## $ negaff_trait : num 1.5 1.5 1.5 1.5 1.5 ...
## $ bfi_n_c : num [1:1458, 1] -0.982 -0.982 -0.982 -0.982 -0.982 ...
## ..- attr(*, "scaled:center")= num 2.98
## $ stress_trait_c: num [1:1458, 1] -0.333 -0.333 -0.333 -0.333 -0.333 ...
## ..- attr(*, "scaled:center")= num 1.4
## $ stress_state : num 0.4375 0.1875 -0.5625 -0.0625 0.1875 ...
## $ negaff_state : num 1.5 0.8 -0.5 -0.2 -0.4 ...
## $ pred_m1 : Named num 2.12 1.91 1.4 1.63 1.71 ...
## ..- attr(*, "names")= chr [1:1458] "1" "2" "3" "4" ...
## $ pred_m2 : Named num 2.1 1.89 1.39 1.61 1.69 ...
## ..- attr(*, "names")= chr [1:1458] "1" "2" "3" "4" ...
#fixing a few variables that were matrices to be vectors
daily_long <- daily_long %>%
mutate(bfi_n_c = as.vector(bfi_n_c),
stress_trait_c = as.vector(stress_trait_c))Specify and fit model as specific lme4::lmer to get
merMod object (rather than a lmerModLmerTest
object).
# fit model
model2a_fit <- lme4::lmer(formula = negaff ~ 1 + day + stress_trait_c +
bfi_n_c + stress_trait_c:bfi_n_c +
stress_state + stress_state:stress_trait_c +
stress_state:bfi_n_c + stress_state:stress_trait_c:bfi_n_c +
(1 + stress_state|id),
data=daily_long,
na.action=na.exclude)
# Look at results
summary(model2a_fit)## Linear mixed model fit by REML ['lmerMod']
## Formula:
## negaff ~ 1 + day + stress_trait_c + bfi_n_c + stress_trait_c:bfi_n_c +
## stress_state + stress_state:stress_trait_c + stress_state:bfi_n_c +
## stress_state:stress_trait_c:bfi_n_c + (1 + stress_state | id)
## Data: daily_long
##
## REML criterion at convergence: 3161.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.4271 -0.6011 -0.0749 0.5045 4.4732
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## id (Intercept) 0.1955 0.4422
## stress_state 0.1238 0.3518 0.51
## Residual 0.4040 0.6356
## Number of obs: 1438, groups: id, 190
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 2.690415 0.045558 59.055
## day -0.065452 0.007572 -8.644
## stress_trait_c 0.969539 0.078780 12.307
## bfi_n_c 0.154266 0.039168 3.939
## stress_state 0.768705 0.046821 16.418
## stress_trait_c:bfi_n_c 0.037152 0.078322 0.474
## stress_trait_c:stress_state 0.125401 0.101524 1.235
## bfi_n_c:stress_state 0.075952 0.048450 1.568
## stress_trait_c:bfi_n_c:stress_state -0.031668 0.102428 -0.309
##
## Correlation of Fixed Effects:
## (Intr) day strs__ bf_n_c strss_ st__:__ st__:_ bf__:_
## day -0.576
## strss_trt_c 0.003 0.005
## bfi_n_c -0.008 0.007 -0.224
## stress_stat 0.204 0.004 0.000 0.001
## strss_t_:__ -0.179 0.008 0.008 0.040 -0.054
## strss_tr_:_ 0.042 -0.073 0.248 -0.056 -0.087 0.003
## bf_n_c:str_ -0.033 0.058 -0.058 0.258 0.016 0.009 -0.244
## strs__:__:_ -0.061 0.032 0.004 0.008 -0.233 0.243 -0.103 -0.059
# Get confidence intervals for both fixed and random effects
#confint(model2a_fit)
# Save predicted scores
daily_long <- daily_long %>%
mutate(pred_m2a = predict(model2a_fit))
# Fit statistics
AIC(logLik(model2a_fit)) ## [1] 3187.776
## [1] 3256.299
## 'log Lik.' -1580.888 (df=13)
Plotting and probing the simple slopes (within-person association
between stress_state and negaff) across the
full range of the moderator (bfi_n_c).
## JOHNSON-NEYMAN INTERVAL
##
## When bfi_n_c is INSIDE the interval [-4.45, 40.14], the slope of
## stress_state is p < .05.
##
## Note: The range of observed values of bfi_n_c is [-1.98, 2.02]
We see from the output and the plot that the within-person stress reactivity slopes are significant at all observed values of neuroticism - as we would expect given the non-significance of the interaction term.
We can also probe the 3-way interaction, provided we choose specific values for the 2nd moderator.
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 1458 -0.01 0.47 0.01 -0.01 0.51 -1.21 1.17 2.38 -0.08 -0.21
## se
## X1 0.01
#probing 3-way interaction
simpleslopes_model2a <- sim_slopes(model=model2a_fit, pred=stress_state, modx=bfi_n_c,
mod2=stress_trait_c,mod2.values = c(-0.47,+0.47))
plot(simpleslopes_model2a)From the output we see that the value of the within-person stress reactivity slopes (i.e., the simple slopes) are similar across all values of the moderators and have overlapping confidence intervals - as we would expect given the non-significance of the interaction term.
As you try these techniques out with the AMIB data, please let us know if you find a set of variables that has the significant interactions that make for interesting results and theory. =:-]
Conclusion
This tutorial illustrated application of the basic multilevel model (where the outcome variable is assumed normally distributed) for examining between-person differences in within-person associations (Intraindividual Covariation). Follow-ups include making beautifully informative plots and the probing and plotting of interactions.
Good luck! =:-]
Citations
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67, 1–48. https://doi.org/10.18637/jss.v067.i01
Fox, J., & Hong, J. (2010). Effect Displays in R for Multinomial and Proportional-Odds Logit Models: Extensions to the effects Package. Journal of Statistical Software, 32, 1–24. https://doi.org/10.18637/jss.v032.i01
Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest Package: Tests in Linear Mixed Effects Models. Journal of Statistical Software, 82, 1–26. https://doi.org/10.18637/jss.v082.i13
Lang, M., Murdoch, D., & R Core Team. (2024). backports: Reimplementations of Functions Introduced Since R-3.0.0 (Version 1.5.0). https://CRAN.R-project.org/package=backports
Long, J. A. (2024). interactions: Comprehensive, User-Friendly Toolkit for Probing Interactions (Version 1.2.0). https://doi.org/10.32614/CRAN.package.interactions
R Core Team. (2024). R: A Language and Environment for Statistical Computing. Foundation for Statistical Computing. https://www.R-project.org/
Revelle, W. (2024). psych: Procedures for Psychological, Psychometric, and Personality Research Northwestern University. https://CRAN.R-project.org/package=psych
Wickham, H. (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29.
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag.
Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). dplyr: A Grammar of Data Manipulation (Version 1.1.4). https://CRAN.R-project.org/package=dplyr
Wickham, H., Vaughan, D., & Girlich, M. (2024). tidyr: Tidy Messy Data (Version 1.3.1). https://CRAN.R-project.org/package=tidyr