Introduction to Multilevel Model for Intraindividual Covariation

Overview

The MultiLevel Model (MLM) was developed to accommodate the interdependence in nested data. It is particularly useful for examining individual differences in bivariate associations - thus providing for more robust statistical inference about within-person associations (intraindividual covariation) and the between-person differences therein.

This tutorial covers how the multilevel model can be used to examine within-person associations (intraindividual covariation) and how those associations are moderated by between-person differences.

Our example is developed using experience sampling data (repeated occasions nested within persons), but also applies to other kinds of nested data.

Introduction to the Multilevel Model

The basic linear multilevel model is written as

\[y_{it} = \beta_{0i} + \beta_{1i}x_{it} + e_{it}\] where \(\beta_{1i}\) is individual i’s level of … -ity, and where

\[\beta_{0i} = \gamma_{00} + \gamma_{01}z_{i} + u_{0i}\] \[\beta_{1i} = \gamma_{10} + \gamma_{11}z_{i} + u_{1i}\]

where \[e_{it} \tilde{} N(0,\mathbf{R})\]

and \[\mathbf{U}_{i} \tilde{} N(0,\mathbf{G})\]

\[\mathbf{R} = \mathbf{I}\sigma^{2}_{e}\]

where where \(I\) is the identity matrix (diagonal matrix of 1s) and \(\sigma^{2}_{e}\) is the residual (within-person) variance.

\[\mathbf{G} = \left[\begin{array} {rr} \sigma^{2}_{u0} & \sigma_{u0u1} \\ \sigma_{u1u0} & \sigma^{2}_{u1} \end{array}\right]\]

Preliminaries

Loading Libraries

library(ggplot2)       # data visualization
library(psych)         # describing  data
library(plyr)          # data manipulation
library(tidyr)         # tidy data
library(dplyr)         # data manipulation
library(interactions)  # probing/plotting interactions
library(backports)     # revive the isFALSE() function for sim_slopes()
library(effects)       # probing interactions
library(lme4)          # multilevel models
library(lmerTest)      # p-values

Reading in Repeated Measures Data

Our example makes use of the person-level and day-level (EMA-type) AMIB Data, a multiple time-scale data set that has been shared for teaching purposes.The data are shared with intent to facilitate learning about analysis of intensive longitudinal data (from experience sampling, daily diary, or EMA designs). The data are posted only for teaching purposes and should not be used for research.

Use requires citation and acknowledgement of both this website and the following paper:

Ram, N., Conroy, D. E., Pincus, A. L., Hyde, A. L., & Molloy, L. E. (2012). Tethering theory to method: Using measures of intraindividual variability to operationalize individuals’ dynamic characteristics. In G. Hancock & J. Harring (Eds.), Advances in longitudinal methods in the social and behavioral sciences (pp.81-110). New York: Information Age.

We make use of three variables:

Outcome: daily negative affect (continuous), the negaff variable in the day-level data file;

Time-varying Predictor: daily stress, a stress variable obtained through reverse coding of pss in the day-level data file;

Person-level Predictor/Moderator: trait neuroticism, the bfi_n variable in the person-level data file.

Loading person-level file (N = 190) and subsetting to variables of interest

#set filepath for data file
filepath <- "https://raw.githubusercontent.com/The-Change-Lab/collaborations/main/AMIB/AMIB_persons.csv"
#read in the .csv file using the url() function
AMIB_persons <- read.csv(file=url(filepath), header=TRUE)

#subsetting to variables of interest
AMIB_persons <- AMIB_persons %>%
  select(id, bfi_n)

Loading day-level file (T = 8) and subsetting to variables of interest.

#set filepath for data file
filepath <- "https://raw.githubusercontent.com/The-Change-Lab/collaborations/refs/heads/main/AMIB/AMIB_daily.csv"
#read in the .csv file using the url() function
AMIB_daily <- read.csv(file=url(filepath),header=TRUE)

#subsetting to variables of interest
AMIB_daily <- AMIB_daily %>%
  select(id, day, negaff, pss)

Data Preparation and Description

In this example data are prepared through some recoding (idiosyncratic for this data set), and separation of time-varying predictor variables into between-person and within-person components (typical for all multilevel analysis)

Data Recoding

Reverse code pss into a new stress variable where higher values indicate higher perceived stress.

#reverse coding the pss variable into a new stress variable
AMIB_daily <- AMIB_daily %>%
  mutate(stress = 4-pss)

#describing new variable
describe(AMIB_daily$stress)

##    vars    n mean   sd median trimmed  mad min max range skew kurtosis   se
## X1    1 1445 1.39 0.68   1.25    1.36 0.74   0   4     4 0.35     0.13 0.02

#histogram
AMIB_daily %>%
  ggplot(aes(x=stress)) +
  geom_histogram(fill="white", color="black",bins=20) +
  labs(x = "Stress (high = more stressed)")

Between- and Within- Components

We now split the time-varying predictor into “trait” (between-person differences) and “state” (within-person deviations) components. Specifically, the daily variable stress is split into two variables: stress_trait is the sample-mean centered between-person component, and stress_state is the person-centered within-person component.

Although not formally needed, we do this for all the time-varying variables (predictors, outcomes) for easier plotting later.

Making trait variables

#calculating intraindividual means
AMIB_imeans <- AMIB_daily %>% 
  group_by(id) %>%
  summarize(stress_trait = mean(stress, na.rm=TRUE),
            negaff_trait = mean(negaff, na.rm=TRUE))

describe(AMIB_imeans)

##              vars   n   mean     sd median trimmed    mad    min    max  range
## id              1 190 318.29 130.44 321.50  318.99 151.23 101.00 532.00 431.00
## stress_trait    2 190   1.40   0.48   1.41    1.39   0.51   0.19   2.56   2.38
## negaff_trait    3 190   2.48   0.73   2.41    2.43   0.72   1.11   5.09   3.98
##               skew kurtosis   se
## id           -0.04    -1.09 9.46
## stress_trait -0.04    -0.23 0.03
## negaff_trait  0.68     0.45 0.05

#merging into person-level file
AMIB_persons <- AMIB_persons %>%
  left_join(AMIB_imeans, by="id")
  
#make centered versions of the person-level scores
AMIB_persons <- AMIB_persons %>%
  mutate(bfi_n_c = scale(bfi_n, center=TRUE, scale=FALSE),
         stress_trait_c = scale(stress_trait, center=TRUE, scale=FALSE))

#describe person-level data
describe(AMIB_persons)

##                vars   n   mean     sd median trimmed    mad    min    max
## id                1 190 318.29 130.44 321.50  318.99 151.23 101.00 532.00
## bfi_n             2 190   2.98   0.96   3.00    3.00   1.48   1.00   5.00
## stress_trait      3 190   1.40   0.48   1.41    1.39   0.51   0.19   2.56
## negaff_trait      4 190   2.48   0.73   2.41    2.43   0.72   1.11   5.09
## bfi_n_c           5 190   0.00   0.96   0.02    0.02   1.48  -1.98   2.02
## stress_trait_c    6 190   0.00   0.48   0.01    0.00   0.51  -1.21   1.17
##                 range  skew kurtosis   se
## id             431.00 -0.04    -1.09 9.46
## bfi_n            4.00 -0.09    -0.82 0.07
## stress_trait     2.38 -0.04    -0.23 0.03
## negaff_trait     3.98  0.68     0.45 0.05
## bfi_n_c          4.00 -0.09    -0.82 0.07
## stress_trait_c   2.38 -0.04    -0.23 0.03

Making state variables in long data (as deviations from uncentered trait variable)

#merging person-level data into daily data
daily_long <- AMIB_daily %>%
  left_join(AMIB_persons, by="id")

#calculating state variables
daily_long <- daily_long %>%
  mutate(stress_state = stress - stress_trait,
         negaff_state = negaff - negaff_trait)

#describing data
describe(daily_long)

##                vars    n   mean     sd median trimmed    mad    min    max
## id                1 1458 322.53 129.08 324.00  324.20 151.23 101.00 532.00
## day               2 1458   3.48   2.30   3.00    3.47   2.97   0.00   7.00
## negaff            3 1441   2.45   1.04   2.20    2.34   1.04   1.00   6.90
## pss               4 1445   2.61   0.68   2.75    2.64   0.74   0.00   4.00
## stress            5 1445   1.39   0.68   1.25    1.36   0.74   0.00   4.00
## bfi_n             6 1458   2.97   0.96   3.00    2.98   1.48   1.00   5.00
## stress_trait      7 1458   1.39   0.47   1.41    1.39   0.51   0.19   2.56
## negaff_trait      8 1458   2.45   0.71   2.38    2.41   0.67   1.11   5.09
## bfi_n_c           9 1458  -0.01   0.96   0.02    0.00   1.48  -1.98   2.02
## stress_trait_c   10 1458  -0.01   0.47   0.01   -0.01   0.51  -1.21   1.17
## stress_state     11 1445   0.00   0.49  -0.03   -0.02   0.46  -1.75   2.12
## negaff_state     12 1441   0.00   0.76  -0.10   -0.04   0.63  -3.62   3.16
##                 range  skew kurtosis   se
## id             431.00 -0.07    -1.06 3.38
## day              7.00  0.01    -1.24 0.06
## negaff           5.90  0.96     0.77 0.03
## pss              4.00 -0.35     0.13 0.02
## stress           4.00  0.35     0.13 0.02
## bfi_n            4.00 -0.08    -0.79 0.03
## stress_trait     2.38 -0.08    -0.21 0.01
## negaff_trait     3.98  0.66     0.52 0.02
## bfi_n_c          4.00 -0.08    -0.79 0.03
## stress_trait_c   2.38 -0.08    -0.21 0.01
## stress_state     3.88  0.36     0.79 0.01
## negaff_state     6.78  0.50     1.70 0.02

Great! Now the data are all prepared as needed.

Setting up the Multilevel Model

Set-up for basic multilevel model with continuous outcome

Outlining the Substantive Inquiry

We define stress reactivity (a person-level dynamic characteristic; Ram & Gerstorf, 2009) as the extent to which an individual’s daily negative affect is related to daily stress. That is, stress reactivity is the within-person association between daily negative affect and daily stress.

Lets examine some of our prepared data to see what stress reactivity looks like.

Plotting within-person associations for a subset of individuals.

#faceted plot
daily_long %>%
  filter(id <= 125) %>%
  ggplot(aes(x=stress_state,y=negaff)) +
  geom_point() +
  stat_smooth(method="lm", fullrange=TRUE) +
  xlab("Stress State") + ylab("Negative Affect (Continuous)") + 
  facet_wrap( ~ id) +
  theme(axis.title=element_text(size=14),
        axis.text=element_text(size=12),
        strip.text=element_text(size=12))

To constrain the regression line within the range of stress scores for each person (instead of extrapolating to the full range of stress scores), specify fullrange=FALSE in the stat_smooth layer above.

From the panel of plots we get a sense that individuals’ negative affect is higher on days where their stress is higher, but to a different extent for each person. These differences in stress reactivity are indicated by differences in the slope of the regression lines.

Our substantive interest is in describing individual differences in stress reactivity and determining if those differences are related to differences in neuroticism.

Outlining the Mathematical Model

The basic linear multilevel model is written as

\[y_{it} = \beta_{0i} + \beta_{1i}x_{it} + e_{it}\] where the time-varying outcome variable, \(y_{it}\) (negative affect) is modeled as a function of a person-specific intercept, \(\beta_{0i}\), a person-specific slope, \(\beta_{1i}\), that captures the within-person association of interest (in our case stress reactivity), and residual error, \(e_{it}\). The person-specific intercepts and slopes are modeled in turn as

\[\beta_{0i} = \gamma_{00} + \gamma_{01}z_{i} + u_{0i}\] \[\beta_{1i} = \gamma_{10} + \gamma_{11}z_{i} + u_{1i}\]

where the fixed effects \(\gamma_{00}\) and \(\gamma_{01}\) describe the prototypical individual’s intercept and slope; \(\gamma_{10}\) and \(\gamma_{11}\) indicate how individual differences in the person-specific intercepts and slopes are related to a between-person differences variable, \(z_{i}\); and the random effects \(u_{0i}\) and \(u_{1i}\) are residual unexplained differences in intercepts and slopes.

Importantly, \[e_{it} \tilde{} N(0,\mathbf{R})\], and \[\mathbf{U}_{i} \tilde{} MVN(0,\mathbf{G})\]

\[\mathbf{R} = \mathbf{I}\sigma^{2}_{e}\], where where \(I\) is the identity matrix (diagonal matrix of 1s) and \(\sigma^{2}_{e}\) is the residual (within-person) variance.

\[\mathbf{G} = \left[\begin{array} {rr} \sigma^{2}_{u0} & \sigma_{u0u1} \\ \sigma_{u1u0} & \sigma^{2}_{u1} \end{array}\right]\]

Together, the full set of parameters provide a parsimonious description of the data that allows for inference about within-person associations and between-person differeneces in those associations.

Checking a distributional assumption

Formally, the model assumes that the errors (conditional on all the predictors) are normally distributed. The residuals could be checked after the mode is run.

We can check out the distribution of the outcome variable (negaff) now:

AMIB_daily %>%
  ggplot(aes(x=negaff)) +
  geom_histogram(aes(y=..density..),      
                   binwidth=.25,
                   colour="black", fill="white") +
  labs(x = "Negative Affect", y="Density")

## Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(density)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Warning: Removed 17 rows containing non-finite outside the scale range
## (`stat_bin()`).

We see that the negative affect outcome variable looks like a skewed distribution. It is almost always like this. However, the literature has mostly ignored that and gone forward with normality assumptions. Without condoning that practice (but having done it often), we also treat the outcome as relatively normally distributed. Other models we cover later will treat the variable in different ways (binary, count).

Coding the Multilevel Model (Unconditional Means for ICC)

Now that we have defined the dynamic characteristic of interest - stress reactivity - we can operationalize it using the multilevel model.

We make use of the lme4 package for fitting mixed effects models, and some supplmentary packages: lmerTest provides tools for obtaining p-vlaues; effects provides tools for calculating and plotting model-based predictions; and interactions provides tools for plotting and probing interactions.

The lmer() function fits the MLMs The data argument specifies the data sources The na.action argument specifies what to do with missing data

To start, we often fit an unconditional means model that provides us information about how much of the total variance in the outcome varaible is within-person variance and how much is between-person variance.

Fitting the unconditional means model.

#unconditional means model
model0_fit <- lmer(formula = negaff ~ 1 + (1|id), 
              data=daily_long,
              na.action=na.exclude)
summary(model0_fit)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: negaff ~ 1 + (1 | id)
##    Data: daily_long
## 
## REML criterion at convergence: 3833.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8739 -0.6123 -0.1608  0.4658  3.9394 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  id       (Intercept) 0.4270   0.6535  
##  Residual             0.6627   0.8141  
## Number of obs: 1441, groups:  id, 190
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)   2.46368    0.05229 185.80793   47.12   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We extract the random effects with the VarCorr() function:

VarCorr(model0_fit)

##  Groups   Name        Std.Dev.
##  id       (Intercept) 0.65347 
##  Residual             0.81408

And then compute the intra-class correlation (ICC) as the ratio of the random intercept variance (between-person) to the total variance, defined as the sum of the random intercept variance and residual variance (between + within). Specifically, \[ICC_{between} = \frac{\sigma^{2}_{u0}}{\sigma^{2}_{u0} + \sigma^{2}_{e}}\]

Calculating the ICC

Store the random effect variances, which will be the first column of the VarCorr object (see above).

RandomEffects <- as.data.frame(VarCorr(model0_fit))
RandomEffects

##        grp        var1 var2      vcov     sdcor
## 1       id (Intercept) <NA> 0.4270294 0.6534749
## 2 Residual        <NA> <NA> 0.6627260 0.8140798

Next, compute the ICC. It is the ratio of the random intercept variance (between-person var) over the total variance (between + within var):

ICC_between <- RandomEffects[1,4]/(RandomEffects[1,4]+RandomEffects[2,4]) 
ICC_between

## [1] 0.391858

From the unconditional means model, the ICC was calculated, which indicated that of the total variance in negative affect, 39.19% is attributable to between-person variation whereas 60.81% is attributatable to within-person variation.

This means there is a good portion of within-person variance to model using time-varying predictor.

A Mutlilevel Model for Examining Between-Person Differences in Within-Person Associations

OK - let’s add the predictor, which was split into two components stress_trait (between-person component) and stress_state (within-person component), that latter gives us possibility to quantify each individual’s stress reactivity. Note that we also include day as a time-varying predictor, but simply as a control variable to account for any systematic time trends in the data.

The model as would be included in the Data Analysis section of a paper becomes \[negaff_{it} = \beta_{0i}1_{it} + \beta_{1i}stressstate_{it} + \beta_{2i}day_{it} + e_{it}\] where the time-varying outcome variable, \(y_{it}\) (negative affect) is modeled as a function of a person-specific intercept, \(\beta_{0i}\), a person-specific slope, \(\beta_{1i}\), that captures the within-person association of interest (in our case stress reactivity), a person-specific slope, \(\beta_{2i}\), that captures the linear trend across days, and residual error, \(e_{it}\) that is assumed normally distributed. The person-specific intercepts and slopes are modeled in turn as

\[\beta_{0i} = \gamma_{00} + \gamma_{01}stresstrait_{i} + u_{0i}\] \[\beta_{1i} = \gamma_{10} + \gamma_{11}stresstrait_{i} + u_{1i}\] \[\beta_{2i} = \gamma_{20}\] where the fixed effects \(\gamma_{00}\) and \(\gamma_{01}\) describe the prototypical individual’s baseline level of negative affect and stress reactivity; \(\gamma_{10}\) and \(\gamma_{11}\) indicate how individual differences in the person-specific those characteristics are related to a between-person differences in stress exposure; and \(\gamma_{20}\) indicates the general trend across days, which is treated as a nuicance and assumed equivalent across persons. The random effects \(u_{0i}\) and \(u_{1i}\) are residual unexplained differences that are assumed multivariate normal and may be correlated.

We need to expand that model through algebraic substitution and multiplying out to get a single-equation model that converts into R code. Ordered to match the code below the singel-equation model becomes \[negaff_{it} = \gamma_{00}1_{it} + \gamma_{20}day_{ti} + \gamma_{01}stresstrait_{i} + \gamma_{10}stressstate_{it} + \gamma_{11}stressstate_{it}stresstrait_{i} + (u_{0i}1_{ti} + u_{1i}stressstate_{it}) + e_{it}\] The model is then implemented in lmer as …

# fit model
model1_fit <- lmer(formula = negaff ~ 1 + day + stress_trait_c + 
                      stress_state + stress_state:stress_trait_c + 
                      (1 + stress_state|id), 
                    data=daily_long,
                    na.action=na.exclude)
summary(model1_fit)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## negaff ~ 1 + day + stress_trait_c + stress_state + stress_state:stress_trait_c +  
##     (1 + stress_state | id)
##    Data: daily_long
## 
## REML criterion at convergence: 3162.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.5368 -0.6127 -0.0729  0.5093  4.4164 
## 
## Random effects:
##  Groups   Name         Variance Std.Dev. Corr
##  id       (Intercept)  0.2135   0.4621       
##           stress_state 0.1257   0.3546   0.53
##  Residual              0.4038   0.6355       
## Number of obs: 1438, groups:  id, 190
## 
## Fixed effects:
##                               Estimate Std. Error         df t value Pr(>|t|)
## (Intercept)                  2.695e+00  4.583e-02  3.922e+02  58.806   <2e-16
## day                         -6.580e-02  7.552e-03  1.250e+03  -8.713   <2e-16
## stress_trait_c               1.038e+00  7.946e-02  1.859e+02  13.067   <2e-16
## stress_state                 7.647e-01  4.561e-02  1.664e+02  16.765   <2e-16
## stress_trait_c:stress_state  1.550e-01  9.780e-02  1.584e+02   1.585    0.115
##                                
## (Intercept)                 ***
## day                         ***
## stress_trait_c              ***
## stress_state                ***
## stress_trait_c:stress_state    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) day    strs__ strss_
## day         -0.569                     
## strss_trt_c  0.004  0.007              
## stress_stat  0.216  0.012  0.002       
## strss_tr_:_  0.034 -0.057  0.268 -0.118

# save predicted scores (for later plotting)
daily_long <- daily_long %>%
  mutate(pred_m1 = predict(model1_fit))

Interpreting Results (model parameters)

The results indicate that there is an association between perceived stress and negative affect, both across persons (between-person association) and within persons (average within-person association). These are described by the Fixed Effects parameters.

Fixed Effects:
- (Intercept): The expected value of negative affect for the prototypical person on a prototypical day is 2.7.
- day: negative affect decreased over the course of the study days - for every unit increase in day, negative affect decreased by -0.07 (p = 0).
- stress_trait_c: Individuals with higher perceived stress tended to experience higher negative affect, 1.04, which is statistically significantly different from zero (p = 0).
- stress_state: On days when the prototypical individual’s perceived stress was higher than usual, their negative affect also tended to be higher than usual, 0.76, which is statistically significantly different from zero (p = 0).
- stress_trait_c:stress_state: differences in the within-person association between state stress and negative affect are not moderated by trait stress level (0.16, p = 0.11).
Random Effects:
- Variance((Intercept)): There are substantial between-person differences in individuals’ expected value of negative affect (0.21).
- Variance(stress_state): There are substantial between-person differences in the within-person association between perceived stress and negative affect, stress reactivity, (0.13).
- Corr((Intercept),stress_state): The correlation between the random intercept and random slope was 0.53, which indicates that those who had higher intercepts for negative affect were also more likely to have greater (more positive) associations between negative affect and perceived stress.

Note that often now the Std Dev values are getting reported instead of the variances. Interpretation is the same.
We can also get confidence intervals for the fixed and random effects. Depending on model complexity, the confint() can sometimes take a while to run.

# Get confidence intervals for both fixed and random effects
confint(model1_fit)

## Computing profile confidence intervals ...

##                                   2.5 %      97.5 %
## .sig01                       0.40389534  0.52247750
## .sig02                       0.27635471  0.77129745
## .sig03                       0.25230526  0.45069359
## .sigma                       0.60962930  0.66244352
## (Intercept)                  2.60530435  2.78468853
## day                         -0.08058845 -0.05099124
## stress_trait_c               0.88249219  1.19390741
## stress_state                 0.67339998  0.85449747
## stress_trait_c:stress_state -0.03788643  0.34703329

The labels for the random effects are not entirely intuitive (e.g., sig01, sig02, sig03, sigma) - especially for models with several random effects. So, need to check carefully how to interpret the output.

In the above output:

sig01 = random effect CI for random intercept (given in SD units - need to square the values if you want the variances)
sig02 = correlation between the random intercept and random slope (for stress_state)
sig03 = random effect CI for random slope (stress_state; given in SD units - need to square the values if you want the variances)
sigma = residual error CI (given in SD units - need to square the values if you want the variances)

In general, the output given from the confint() will give you the confidence intervals (given in SD units - will need to square these values if you want the variances) for the first random effect (e.g., the random intercept). Then it will give you the correlations between that random effect and all other random effects it is correlated with. The next row of CI values will correspond to the second random effect (e.g., the random slope for stress_state), with the following rows pertaining to all random effects it is correlated with (but not include any correlations already given - because for example, the correlation for the random intercept and random slope for stress_state was already given). The last row pertaining to random effects is always called sigma, and pertains to the residual error (given in SD units - will need to square these values if you want the variances).

Plotting the Between-Person and Within-Person Associations

Next, we can plot between-person associations.

AMIB_imeans %>%
  ggplot(aes(x=stress_trait, y=negaff_trait, group=factor(id)), legend=FALSE) +
  geom_point(colour="gray40") +
  geom_smooth(aes(group=1), method=lm, se=FALSE, fullrange=FALSE, lty=1, 
              linewidth=2, color="blue") +
  xlab("Trait Stress") + ylab("Trait Negative Affect") +
  theme_classic() +
  theme(axis.title=element_text(size=16),
        axis.text=element_text(size=12),
        plot.title=element_text(size=16, hjust=.5)) +
  ggtitle("Between-Person Association Plot\nTrait Stress & Negative Affect")

## `geom_smooth()` using formula = 'y ~ x'

Note that the plot is not actually using the model output - so it is just an approximation of the exact model (using geom_smooth embedded within ggplot). However, it is a very useful display.

Plotting predicted within-person associations using the predicted scores.

daily_long %>%
  ggplot(aes(x=stress_state, y=pred_m1, group=factor(id), colour="gray"), legend=FALSE) +
  geom_smooth(method=lm, se=FALSE, fullrange=FALSE, lty=1, 
              linewidth=.5, color="gray40") +
  geom_smooth(aes(group=1), method=lm, se=FALSE, fullrange=FALSE, lty=1, 
              linewidth=2, color="blue") +
  xlab("State Stress") + ylab("Predicted Negative Affect") +
  theme_classic() +
  theme(axis.title=element_text(size=18),
        axis.text=element_text(size=14),
        plot.title=element_text(size=18, hjust=.5)) +
  ggtitle("Within-Person Association\nState Stress & Negative Affect")

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 20 rows containing non-finite outside the scale range
## (`stat_smooth()`).

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 20 rows containing non-finite outside the scale range
## (`stat_smooth()`).

Note that in this plot, although the predictor on the x-axis is a within-person state variable, the negative affect outcome variable on the y-axis still has both the within- and the between-person components in it.

We can alternatively “purify” the plot by using the negative affect state variable as the outcome. Then we have plotted only the within-person associations. Plotting predicted within-person associations.

daily_long %>%
  ggplot(aes(x=stress_state, y=negaff_state, group=factor(id), colour="gray"), 
         legend=FALSE) +
  geom_smooth(method=lm, se=FALSE, fullrange=FALSE, lty=1, 
              linewidth=.5, color="gray40") +
  geom_smooth(aes(group=1), method=lm, se=FALSE, fullrange=FALSE, lty=1, 
              linewidth=2, color="blue") +
  xlab("Stress State") + ylab("Predicted State Negative Affect") +
  theme_classic() +
  theme(axis.title=element_text(size=18),
        axis.text=element_text(size=14),
        plot.title=element_text(size=18, hjust=.5)) +
  ggtitle("Within-Person Association Plot\nPerceived Stress & Negative Affect")

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 20 rows containing non-finite outside the scale range
## (`stat_smooth()`).

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 20 rows containing non-finite outside the scale range
## (`stat_smooth()`).

Note that the plot is not actually using the model output - so it is just an approximation of the exact model (using geom_smooth with state scores embedded within ggplot). However, it is a very useful display.

That was awesome!!!

A First Extension

Adding Another Person-level Predictor

We now add trait neuroticism to the model, to see if and how between-person differences in neuroticism are related to stress reactivity.

Running the expanded model.

# fit model
model2_fit <- lmer(formula = negaff ~ 1 + day + stress_trait_c + 
                      bfi_n_c + stress_trait_c:bfi_n_c +
                      stress_state + stress_state:stress_trait_c + 
                      stress_state:bfi_n_c + stress_state:stress_trait_c:bfi_n_c + 
                      (1 + stress_state|id),
                    data=daily_long,
                    na.action=na.exclude)
#Look at results
summary(model2_fit)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## negaff ~ 1 + day + stress_trait_c + bfi_n_c + stress_trait_c:bfi_n_c +  
##     stress_state + stress_state:stress_trait_c + stress_state:bfi_n_c +  
##     stress_state:stress_trait_c:bfi_n_c + (1 + stress_state |      id)
##    Data: daily_long
## 
## REML criterion at convergence: 3161.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.4271 -0.6011 -0.0749  0.5045  4.4732 
## 
## Random effects:
##  Groups   Name         Variance Std.Dev. Corr
##  id       (Intercept)  0.1955   0.4422       
##           stress_state 0.1238   0.3518   0.51
##  Residual              0.4040   0.6356       
## Number of obs: 1438, groups:  id, 190
## 
## Fixed effects:
##                                       Estimate Std. Error         df t value
## (Intercept)                          2.690e+00  4.556e-02  3.944e+02  59.055
## day                                 -6.545e-02  7.572e-03  1.247e+03  -8.644
## stress_trait_c                       9.695e-01  7.878e-02  1.835e+02  12.307
## bfi_n_c                              1.543e-01  3.917e-02  1.809e+02   3.939
## stress_state                         7.687e-01  4.682e-02  1.673e+02  16.418
## stress_trait_c:bfi_n_c               3.715e-02  7.832e-02  1.824e+02   0.474
## stress_trait_c:stress_state          1.254e-01  1.015e-01  1.692e+02   1.235
## bfi_n_c:stress_state                 7.595e-02  4.845e-02  1.549e+02   1.568
## stress_trait_c:bfi_n_c:stress_state -3.167e-02  1.024e-01  1.722e+02  -0.309
##                                     Pr(>|t|)    
## (Intercept)                          < 2e-16 ***
## day                                  < 2e-16 ***
## stress_trait_c                       < 2e-16 ***
## bfi_n_c                             0.000117 ***
## stress_state                         < 2e-16 ***
## stress_trait_c:bfi_n_c              0.635818    
## stress_trait_c:stress_state         0.218476    
## bfi_n_c:stress_state                0.119006    
## stress_trait_c:bfi_n_c:stress_state 0.757563    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) day    strs__ bf_n_c strss_ st__:__ st__:_ bf__:_
## day         -0.576                                                  
## strss_trt_c  0.003  0.005                                           
## bfi_n_c     -0.008  0.007 -0.224                                    
## stress_stat  0.204  0.004  0.000  0.001                             
## strss_t_:__ -0.179  0.008  0.008  0.040 -0.054                      
## strss_tr_:_  0.042 -0.073  0.248 -0.056 -0.087  0.003               
## bf_n_c:str_ -0.033  0.058 -0.058  0.258  0.016  0.009  -0.244       
## strs__:__:_ -0.061  0.032  0.004  0.008 -0.233  0.243  -0.103 -0.059

# save predicted scores
daily_long <- daily_long %>%
  mutate(pred_m2 = predict(model2_fit))

Fixed Effects:
- (Intercept): The expected value of negative affect for the prototypical person on a prototypical day is 2.69.
- day: For every unit increase in day, negative affect is expected to decrease by -0.07 (p = 0).
- stress_trait_c: Individuals with higher perceived stress tended to experience higher negative affect, 0.97, which is statistically significantly different from zero (p = 0).
- bfi_n_c: On average, individuals with higher neuroticism also tended to experience higher negative affect (0.15, p = 0)
- stress_state: On days when the prototypical individual’s perceived stress was higher than usual, their negative affect also tended to be higher than usual, 0.77, which was statistically significantly different from zero (p = 0).
- stress_trait_c:bfi_n_c: trait neuroticism did not moderate the between-person association between stress and negative affect (0.04, p = 0.64).
- stress_trait_c:stress_state: trait stress does not moderate the within-person association between state stress and negative affect (0.13, p = 0.22).
- bfi_n_c:stress_state: trait neuroticism does not moderate the within-person association between stress and negative affect (0.08, p = 0.12).
- stress_trait_c:bfi_n_c:stress_state: neuroticism does not moderate the trait stress moderator (-0.03, p = 0.76).
Random Effects:
- Variance((Intercept)): There was a substantial extent of between-person differences in the average value of negative affect (0.2).
- Variance(stress_state): There was a substantial extent of between-person differences in the average within-person association between perceived stress and negative affect (0.12).
- Corr((Intercept),stress_state): The correlation between the random intercept and random slope was 0.51, which indicates that those who had higher intercept values for negative affect were also more likely to exhibit stress reactivity (more positive within-person associations between negative affect and perceived stress).

Plotting and Probing the Cross-Level Interactions

Probing and Plotting the Interaction

The moderation is not significant. However, if it were, we would want to plot and probe the interaction term, specifically the stress_state:bfi_n_c term.

We can use the effect function.
We examine what the effect is as various levels of the predictors. Standard practice is to use +1 and -1 SD.

#xlevels = is the list of values that we want to evaluate at.
#obtaining the standard deviation of the between-person moderator
describe(daily_long$bfi_n_c)

##    vars    n  mean   sd median trimmed  mad   min  max range  skew kurtosis
## X1    1 1458 -0.01 0.96   0.02       0 1.48 -1.98 2.02     4 -0.08    -0.79
##      se
## X1 0.03

#obtaining the standard deviation of the time-varying predictor
describe(daily_long$stress_state)

##    vars    n mean   sd median trimmed  mad   min  max range skew kurtosis   se
## X1    1 1445    0 0.49  -0.03   -0.02 0.46 -1.75 2.12  3.88 0.36     0.79 0.01

#calculate effect
effects_model2 <- effect(term="bfi_n_c*stress_state", mod=model2_fit, 
                         xlevels=list(bfi_n_c=c(-0.96, +0.96), stress_state=c(-0.49,+0.49)))

## NOTE: bfi_n_c:stress_state is not a high-order term in the model

## Warning in Analyze.model(focal.predictors, mod, xlevels, default.levels, : the
## predictors stress_trait_c, bfi_n_c are one-column matrices that were converted
## to vectors

summary(effects_model2)

## 
##  bfi_n_c*stress_state effect
##        stress_state
## bfi_n_c    -0.49     0.49
##   -0.96 1.964450 2.644774
##   0.96  2.188157 3.011998
## 
##  Lower 95 Percent Confidence Limits
##        stress_state
## bfi_n_c    -0.49     0.49
##   -0.96 1.858012 2.510688
##   0.96  2.080620 2.876440
## 
##  Upper 95 Percent Confidence Limits
##        stress_state
## bfi_n_c    -0.49     0.49
##   -0.96 2.070888 2.778860
##   0.96  2.295694 3.147556

Everything we need is in there!

We can convert to a data frame and plot it!

#convert to dataframe
effectsdata <- as.data.frame(effects_model2)
#plotting the effect evaluation (with standard error ribbon)
effectsdata %>%
  ggplot(aes(x=stress_state, y=fit, group=bfi_n_c), legend=FALSE) + 
  geom_point() +
  geom_line() +
  geom_errorbar(aes(ymin=lower, ymax=upper), width=.15) +
  xlab("Stress State") + xlim(-2,2) +
  ylab("Predicted Negative Affect") + ylim(1,7) +
  ggtitle("Differences in Stress Reactivity across Neuroticism")

Note that the non-significant interaction appears as two practically parallel lines.

Johnson-Neyman Technique

Let’s go a bit further …

We would like to identify the range of values of the moderator at which the within-person slope is significant. Formally, the Johnson-Neyman technique is used to probe significant interactions in order to determine for what values of the moderator the focal predictor is significantly related to the outcome - i.e., to identify the region of significance. This is quickly becoming standard practice in reporting about interactions in multilevel models.

In our case, we are interested to know the range of neuroticism scores at which we would expect individuals to have significant stress reactivity (within-person association between daily stress and daily negative affect).

We make use of the johnson-neyman() function in the interactions package. Note that the backports package may also be needed to get a deprecated isFALSE() function that this package uses.

Also, when using the interactions package we need to do a little data formatting (to make sure that all variables are vectors), and to run the model using lmer() without the lmerTest() overlay (to obtain a merMod object).

Fixing data.

#check for embedded matrices
str(daily_long)

## 'data.frame':    1458 obs. of  14 variables:
##  $ id            : int  101 101 101 101 101 101 101 101 102 102 ...
##  $ day           : int  0 1 2 3 4 5 6 7 0 1 ...
##  $ negaff        : num  3 2.3 1 1.3 1.1 1 1.2 1.1 1.4 1.6 ...
##  $ pss           : num  2.5 2.75 3.5 3 2.75 2.75 3.5 2.75 3.5 4 ...
##  $ stress        : num  1.5 1.25 0.5 1 1.25 1.25 0.5 1.25 0.5 0 ...
##  $ bfi_n         : num  2 2 2 2 2 2 2 2 2 2 ...
##  $ stress_trait  : num  1.06 1.06 1.06 1.06 1.06 ...
##  $ negaff_trait  : num  1.5 1.5 1.5 1.5 1.5 ...
##  $ bfi_n_c       : num [1:1458, 1] -0.982 -0.982 -0.982 -0.982 -0.982 ...
##   ..- attr(*, "scaled:center")= num 2.98
##  $ stress_trait_c: num [1:1458, 1] -0.333 -0.333 -0.333 -0.333 -0.333 ...
##   ..- attr(*, "scaled:center")= num 1.4
##  $ stress_state  : num  0.4375 0.1875 -0.5625 -0.0625 0.1875 ...
##  $ negaff_state  : num  1.5 0.8 -0.5 -0.2 -0.4 ...
##  $ pred_m1       : Named num  2.12 1.91 1.4 1.63 1.71 ...
##   ..- attr(*, "names")= chr [1:1458] "1" "2" "3" "4" ...
##  $ pred_m2       : Named num  2.1 1.89 1.39 1.61 1.69 ...
##   ..- attr(*, "names")= chr [1:1458] "1" "2" "3" "4" ...

#fixing a few variables that were matrices to be vectors
daily_long <- daily_long %>%
  mutate(bfi_n_c = as.vector(bfi_n_c),
         stress_trait_c = as.vector(stress_trait_c))

Specify and fit model as specific lme4::lmer to get merMod object (rather than a lmerModLmerTest object).

# fit model
model2a_fit <- lme4::lmer(formula = negaff ~ 1 + day + stress_trait_c + 
                      bfi_n_c + stress_trait_c:bfi_n_c +
                      stress_state + stress_state:stress_trait_c + 
                      stress_state:bfi_n_c + stress_state:stress_trait_c:bfi_n_c + 
                      (1 + stress_state|id),
                    data=daily_long,
                    na.action=na.exclude)

# Look at results
summary(model2a_fit)

## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## negaff ~ 1 + day + stress_trait_c + bfi_n_c + stress_trait_c:bfi_n_c +  
##     stress_state + stress_state:stress_trait_c + stress_state:bfi_n_c +  
##     stress_state:stress_trait_c:bfi_n_c + (1 + stress_state |      id)
##    Data: daily_long
## 
## REML criterion at convergence: 3161.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.4271 -0.6011 -0.0749  0.5045  4.4732 
## 
## Random effects:
##  Groups   Name         Variance Std.Dev. Corr
##  id       (Intercept)  0.1955   0.4422       
##           stress_state 0.1238   0.3518   0.51
##  Residual              0.4040   0.6356       
## Number of obs: 1438, groups:  id, 190
## 
## Fixed effects:
##                                      Estimate Std. Error t value
## (Intercept)                          2.690415   0.045558  59.055
## day                                 -0.065452   0.007572  -8.644
## stress_trait_c                       0.969539   0.078780  12.307
## bfi_n_c                              0.154266   0.039168   3.939
## stress_state                         0.768705   0.046821  16.418
## stress_trait_c:bfi_n_c               0.037152   0.078322   0.474
## stress_trait_c:stress_state          0.125401   0.101524   1.235
## bfi_n_c:stress_state                 0.075952   0.048450   1.568
## stress_trait_c:bfi_n_c:stress_state -0.031668   0.102428  -0.309
## 
## Correlation of Fixed Effects:
##             (Intr) day    strs__ bf_n_c strss_ st__:__ st__:_ bf__:_
## day         -0.576                                                  
## strss_trt_c  0.003  0.005                                           
## bfi_n_c     -0.008  0.007 -0.224                                    
## stress_stat  0.204  0.004  0.000  0.001                             
## strss_t_:__ -0.179  0.008  0.008  0.040 -0.054                      
## strss_tr_:_  0.042 -0.073  0.248 -0.056 -0.087  0.003               
## bf_n_c:str_ -0.033  0.058 -0.058  0.258  0.016  0.009  -0.244       
## strs__:__:_ -0.061  0.032  0.004  0.008 -0.233  0.243  -0.103 -0.059

# Get confidence intervals for both fixed and random effects
#confint(model2a_fit)
  
# Save predicted scores
daily_long <- daily_long %>%
  mutate(pred_m2a = predict(model2a_fit))

# Fit statistics  
AIC(logLik(model2a_fit))

## [1] 3187.776

BIC(logLik(model2a_fit))

## [1] 3256.299

logLik(logLik(model2a_fit))

## 'log Lik.' -1580.888 (df=13)

Plotting and probing the simple slopes (within-person association between stress_state and negaff) across the full range of the moderator (bfi_n_c).

#probing 2-way interaction
johnson_neyman(model=model2a_fit, pred=stress_state, modx=bfi_n_c)

## JOHNSON-NEYMAN INTERVAL
## 
## When bfi_n_c is INSIDE the interval [-4.45, 40.14], the slope of
## stress_state is p < .05.
## 
## Note: The range of observed values of bfi_n_c is [-1.98, 2.02]

We see from the output and the plot that the within-person stress reactivity slopes are significant at all observed values of neuroticism - as we would expect given the non-significance of the interaction term.

We can also probe the 3-way interaction, provided we choose specific values for the 2nd moderator.

#obtaining the standard deviation of the 2nd moderator 
describe(daily_long$stress_trait_c)

##    vars    n  mean   sd median trimmed  mad   min  max range  skew kurtosis
## X1    1 1458 -0.01 0.47   0.01   -0.01 0.51 -1.21 1.17  2.38 -0.08    -0.21
##      se
## X1 0.01

#probing 3-way interaction
simpleslopes_model2a <- sim_slopes(model=model2a_fit, pred=stress_state, modx=bfi_n_c,
                                   mod2=stress_trait_c,mod2.values = c(-0.47,+0.47))
plot(simpleslopes_model2a)

From the output we see that the value of the within-person stress reactivity slopes (i.e., the simple slopes) are similar across all values of the moderators and have overlapping confidence intervals - as we would expect given the non-significance of the interaction term.

As you try these techniques out with the AMIB data, please let us know if you find a set of variables that has the significant interactions that make for interesting results and theory. =:-]

Conclusion

This tutorial illustrated application of the basic multilevel model (where the outcome variable is assumed normally distributed) for examining between-person differences in within-person associations (Intraindividual Covariation). Follow-ups include making beautifully informative plots and the probing and plotting of interactions.

Good luck! =:-]

Citations

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67, 1–48. https://doi.org/10.18637/jss.v067.i01

Fox, J., & Hong, J. (2010). Effect Displays in R for Multinomial and Proportional-Odds Logit Models: Extensions to the effects Package. Journal of Statistical Software, 32, 1–24. https://doi.org/10.18637/jss.v032.i01

Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest Package: Tests in Linear Mixed Effects Models. Journal of Statistical Software, 82, 1–26. https://doi.org/10.18637/jss.v082.i13

Lang, M., Murdoch, D., & R Core Team. (2024). backports: Reimplementations of Functions Introduced Since R-3.0.0 (Version 1.5.0). https://CRAN.R-project.org/package=backports

Long, J. A. (2024). interactions: Comprehensive, User-Friendly Toolkit for Probing Interactions (Version 1.2.0). https://doi.org/10.32614/CRAN.package.interactions

R Core Team. (2024). R: A Language and Environment for Statistical Computing. Foundation for Statistical Computing. https://www.R-project.org/

Revelle, W. (2024). psych: Procedures for Psychological, Psychometric, and Personality Research Northwestern University. https://CRAN.R-project.org/package=psych

Wickham, H. (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29.

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag.

Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). dplyr: A Grammar of Data Manipulation (Version 1.1.4). https://CRAN.R-project.org/package=dplyr

Wickham, H., Vaughan, D., & Girlich, M. (2024). tidyr: Tidy Messy Data (Version 1.3.1). https://CRAN.R-project.org/package=tidyr