Multilevel Model with Heterogeneous Variance (Location-Scale models)
nlme implementation with AMIB
Overview
In the previous tutorial we explored the multilevel model with heterogeneous variance (location-scale models) and illustrated how they are constructed using some simulated data.
In this tutorial, we work through some empirical examples. Along the way we replicate some of the IIV findings that were obtained using a 2-stage approach where we examined interindividual differences in IIV (specifically the iSD).Our example makes use of the AMIB data - specifically examining differences in intraindividual variability of daily affect. Oooh so fun!
Preliminaries
Loading libraries used in this script.
AMIB Data
Our examples make use of The AMIB Data, a multiple time-scale data set that has been shared for teaching purposes. The data include person-level dispositions, daily diary assessments, and ecological momentary assessments that were obtained after everyday social interactions (event-contingent sampling).
The data are shared with intent to facilitate learning about analysis of intensive longitudinal data (from experience sampling, daily diary, or EMA designs). The data are posted only for teaching purposes and should not be used for research.
Use requires citation and acknowledgement of both this website and the following paper:
#Daily data
#set filepath for data file
filepath <- "https://raw.githubusercontent.com/The-Change-Lab/collaborations/refs/heads/main/AMIB/AMIB_daily.csv"
#read in the .csv file using the url() function
daily <- read.csv(file=url(filepath), header=TRUE)
#subsetting data
daily1 <- daily %>%
select(id, day, posaff, negaff)
#Person data
#set filepath for data file
filepath <- "https://raw.githubusercontent.com/The-Change-Lab/collaborations/main/AMIB/AMIB_persons.csv"
#read in the .csv file using the url() function
person <- read.csv(file=url(filepath), header=TRUE)
#subsetting data
person <- person %>%
select(id, sex, bfi_n)
person <- person %>%
mutate(bfi_nc = bfi_n - mean(bfi_n, na.rm=TRUE), #make a centered variable
sex_f = factor(sex, labels = c("male", "female"))) #make a factor variable
#Merging person and daily together
data_all <- merge(person, daily1, by="id")
# Examine first few rows of the data set
head(data_all, 10)## id sex bfi_n bfi_nc sex_f day posaff negaff
## 1 101 2 2 -0.9815789 female 0 3.9 3.0
## 2 101 2 2 -0.9815789 female 1 3.8 2.3
## 3 101 2 2 -0.9815789 female 2 5.1 1.0
## 4 101 2 2 -0.9815789 female 3 5.6 1.3
## 5 101 2 2 -0.9815789 female 4 4.3 1.1
## 6 101 2 2 -0.9815789 female 5 3.9 1.0
## 7 101 2 2 -0.9815789 female 6 5.1 1.2
## 8 101 2 2 -0.9815789 female 7 4.8 1.1
## 9 102 2 2 -0.9815789 female 0 6.3 1.4
## 10 102 2 2 -0.9815789 female 1 7.0 1.6
Ok - lets consider two individual differences (notice that we do this
by subsetting to one row per person):
Sex (factor version)
##
## male female
## 65 125
Neuroticism
## vars n mean sd median trimmed mad min max range skew kurtosis
## bfi_n 1 190 2.98 0.96 3.00 3.00 1.48 1.00 5.00 4 -0.09 -0.82
## bfi_nc 2 190 0.00 0.96 0.02 0.02 1.48 -1.98 2.02 4 -0.09 -0.82
## se
## bfi_n 0.07
## bfi_nc 0.07
Some Simplistic Hypotheses
We can formulate some simple hypotheses about variance heterogeneity based on stereotypes and construct definitions. Apologies for the content and its binary-ness. Our move to better examples is not complete.
A. Individuals who self-identify as Female have more variable
emotions (e.g., positive affect) than those who identify as
Male.
B. Individuals higher in Neuroticism have higher “intrinsic IIV” in
negative affect.
Ideally, we would have some raw data plots here that would give us useful information about how the data are mapping to the hypotheses. We have not yet developed a good framework that we’d like the field to adopt for simultaneous display of raw data and the modeling results. For raw data we can color-code the individuals and plot the trajectories.
For the categorical between-person differences variable.
# plotting intraindividual change
data_all %>%
ggplot(aes(x = day, y=posaff, group=id, color=factor(sex_f))) +
# first variable
geom_line() +
# geom_point(aes(y=agval), color="black") +
# plot layouts
scale_x_continuous(breaks=c(0,1,2,3,4,5,6,7), name="Day") +
scale_y_continuous(breaks=c(1,2,3,4,5,6,7), name="Positive Affect") +
theme_classic() +
theme(axis.title=element_text(size=14),
axis.text=element_text(size=14),
plot.title=element_text(size=14, hjust=.5)) +
ggtitle("Intraindividual Variability Across Days\nGroup Differences")## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
That works ok, but needs better coloring and labeling.
For the continuous between-person differences variable.
# plotting intraindividual change
data_all %>%
ggplot(aes(x = day, y=posaff, group=id, color=bfi_nc)) +
# first variable
geom_line() +
# geom_point(aes(y=agval), color="black") +
# plot layouts
scale_x_continuous(breaks=c(0,1,2,3,4,5,6,7), name="Day") +
scale_y_continuous(breaks=c(1,2,3,4,5,6,7), name="Positive Affect") +
theme_classic() +
theme(axis.title=element_text(size=14),
axis.text=element_text(size=14),
plot.title=element_text(size=14, hjust=.5)) +
ggtitle("Intraindividual Variability Across Days\nIndividual Differences")## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
That might work but needs some better coloring.
Research Question A
The hypothesis is that individuals who self-identify as Female have more variable emotions (e.g., positive affect) than those who identify as Male.
Model 1: Common models for between-person and within-person variance
We first run a homogeneous variance model.
model_A1 = lme(fixed = posaff ~ 1 + sex_f,
random = list(id = pdSymm(form = ~ 1)),
data = data_all,
na.action = na.exclude,
method = 'REML')
summary(model_A1)## Linear mixed-effects model fit by REML
## Data: data_all
## AIC BIC logLik
## 4036.608 4057.695 -2014.304
##
## Random effects:
## Formula: ~1 | id
## (Intercept) Residual
## StdDev: 0.6342518 0.8807278
##
## Fixed effects: posaff ~ 1 + sex_f
## Value Std.Error DF t-value p-value
## (Intercept) 4.352704 0.08889986 1251 48.96187 0e+00
## sex_ffemale -0.369189 0.10938668 188 -3.37509 9e-04
## Correlation:
## (Intr)
## sex_ffemale -0.813
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.34222753 -0.57183337 0.05917556 0.59744941 3.67294434
##
## Number of Observations: 1441
## Number of Groups: 190
The between-person variances are given by the upper portion of …
## id = pdSymm(1)
## Variance StdDev
## (Intercept) 0.4022753 0.6342518
## Residual 0.7756815 0.8807278
Specifically, variance of the intercepts is 0.40.
Model 2: Heterogeneous variance for between-person and common within-person variance
We then run heterogeneous variance model that allows for different amounts of intercept variance across groups.
model_A2 = lme(fixed = posaff ~ 1 + sex_f,
random = list(id = pdDiag(form = ~ 0 + sex_f)),
data = data_all,
na.action = na.exclude,
method = 'REML')
summary(model_A2)## Linear mixed-effects model fit by REML
## Data: data_all
## AIC BIC logLik
## 4038.51 4064.869 -2014.255
##
## Random effects:
## Formula: ~0 + sex_f | id
## Structure: Diagonal
## sex_fmale sex_ffemale Residual
## StdDev: 0.6521665 0.6249241 0.8807269
##
## Fixed effects: posaff ~ 1 + sex_f
## Value Std.Error DF t-value p-value
## (Intercept) 4.352339 0.09089571 1251 47.88278 0.000
## sex_ffemale -0.368757 0.11058883 188 -3.33449 0.001
## Correlation:
## (Intr)
## sex_ffemale -0.822
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.3359166 -0.5739653 0.0617249 0.6015832 3.6714934
##
## Number of Observations: 1441
## Number of Groups: 190
The between-person variances are given by the upper portion of …
## id = pdDiag(0 + sex_f)
## Variance StdDev
## sex_fmale 0.4253211 0.6521665
## sex_ffemale 0.3905301 0.6249241
## Residual 0.7756798 0.8807269
Hmmm… the variances are …
\(\sigma^{2}_{u0g=male} =
0.425\),
\(\sigma^{2}_{u0g=female} =
0.391\),
These are not so different. Lets check the relative fits of the two
models
## Model df AIC BIC logLik Test L.Ratio p-value
## model_A1 1 4 4036.608 4057.695 -2014.304
## model_A2 2 5 4038.510 4064.869 -2014.255 1 vs 2 0.09745325 0.7549
Not different - we go with the more parsimonious model for the homogenous between-person variance.
Model 3: Common between-person variance and heterogeneous within-person variance
We can then look at differences in the prototypical within-person variance within each group.
model_A3 = lme(fixed = posaff ~ 1 + sex_f,
random = list(id = pdSymm(form = ~ 1)),
weights = varIdent(form = ~ 1 | sex_f),
data = data_all,
na.action = na.exclude,
method = 'REML')
summary(model_A3)## Linear mixed-effects model fit by REML
## Data: data_all
## AIC BIC logLik
## 4027.781 4054.14 -2008.891
##
## Random effects:
## Formula: ~1 | id
## (Intercept) Residual
## StdDev: 0.6355109 0.9196675
##
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | sex_f
## Parameter estimates:
## female male
## 1.0000000 0.8686141
## Fixed effects: posaff ~ 1 + sex_f
## Value Std.Error DF t-value p-value
## (Intercept) 4.351398 0.08738773 1251 49.79416 0e+00
## sex_ffemale -0.367695 0.10856624 188 -3.38682 9e-04
## Correlation:
## (Intr)
## sex_ffemale -0.805
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.47441822 -0.57332919 0.06120628 0.60932796 3.51348151
##
## Number of Observations: 1441
## Number of Groups: 190
The between-person variances are given by the upper portion of …
## id = pdSymm(1)
## Variance StdDev
## (Intercept) 0.4038741 0.6355109
## Residual 0.8457883 0.9196675
The common between-person variance is …
\(\sigma^{2}_{u0} = 0.404\),
For the within-person variances we need to do some multiplication (and squaring to get variances). The Residual StdDev is here ..
## [1] 0.9196675
and the weights are here … (yeah, I’m not sure why they are so hard to extract).
And be careful about the labels - they are easily missed.
## male
## 0.8686141
So the estimated Residual variance for group = female, \(\sigma^{2}_{eg=female}\) is …
## [1] 0.8457883
and the estimated variance for group = male, \(\sigma^{2}_{eg=male}\) is …
## male
## 0.6381393
Those look like they may be different, with females having greater within-person variance than males.
Lets check the relative fits of the two models before deciding about our hypothesis.
## Model df AIC BIC logLik Test L.Ratio p-value
## model_A1 1 4 4036.608 4057.695 -2014.304
## model_A3 2 5 4027.781 4054.140 -2008.891 1 vs 2 10.82627 0.001
Yes! There is a significant difference in fit between the models. The heterogeneity in within-person variance is needed. We conclude that there is a significant difference in IIV between males and females. In this sample, females have more IIV in daily positive affect than males.
Research Question B
The hypothesis is that individuals higher in neuroticism have higher “intrinsic IIV” in negative affect.
Model 1: Common models for between-person and within-person variance
We again start with a base model with homogeneity assumptions.
model_B1 = lme(fixed = negaff ~ 1 + bfi_nc,
random = list(id = pdSymm(form = ~ 1)),
data = data_all,
na.action = na.exclude,
method = 'REML')
summary(model_B1)## Linear mixed-effects model fit by REML
## Data: data_all
## AIC BIC logLik
## 3820.718 3841.805 -1906.359
##
## Random effects:
## Formula: ~1 | id
## (Intercept) Residual
## StdDev: 0.6055919 0.8139183
##
## Fixed effects: negaff ~ 1 + bfi_nc
## Value Std.Error DF t-value p-value
## (Intercept) 2.4640809 0.04913707 1251 50.14709 0
## bfi_nc 0.2642412 0.05149343 188 5.13155 0
## Correlation:
## (Intr)
## bfi_nc 0.006
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.8045055 -0.6157753 -0.1600364 0.4738359 3.8266991
##
## Number of Observations: 1441
## Number of Groups: 190
The between-person variances are given by the upper portion of …
## id = pdSymm(1)
## Variance StdDev
## (Intercept) 0.3667416 0.6055919
## Residual 0.6624629 0.8139183
Model 2: Hetergeneous between-person variance and common within-person variance (Skipped)
We skip Model 2 as its not clear how a continuous between-person predictor maps to differences in between-person variance. That is not possible to examine.
Model 3: Common between-person variance and heterogeneous within-person variance
We can then look at whether differences in within-person variance are related to differences in neuroticism.
model_B3 = lme(fixed = negaff ~ 1 + bfi_nc,
random = list(id = pdSymm(form = ~ 1)),
weights = varExp(form = ~ bfi_nc),
data = data_all,
na.action = na.exclude,
method = 'REML')
summary(model_B3)## Linear mixed-effects model fit by REML
## Data: data_all
## AIC BIC logLik
## 3798.319 3824.677 -1894.159
##
## Random effects:
## Formula: ~1 | id
## (Intercept) Residual
## StdDev: 0.6031915 0.807518
##
## Variance function:
## Structure: Exponential of variance covariate
## Formula: ~bfi_nc
## Parameter estimates:
## expon
## 0.09838318
## Fixed effects: negaff ~ 1 + bfi_nc
## Value Std.Error DF t-value p-value
## (Intercept) 2.4634419 0.04900460 1251 50.26960 0
## bfi_nc 0.2617456 0.05133223 188 5.09905 0
## Correlation:
## (Intr)
## bfi_nc 0.042
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.8296045 -0.6260128 -0.1640791 0.4841943 4.0175758
##
## Number of Observations: 1441
## Number of Groups: 190
The between-person variances are given by the upper portion of …
## id = pdSymm(1)
## Variance StdDev
## (Intercept) 0.3638400 0.6031915
## Residual 0.6520853 0.8075180
For the within-person variances we need to do some multiplication
(and squaring to get variances). Recall that the equation implemented
here in lme() … see ?varExp … is \[\sigma^{2}_{ei} =
\alpha_{0}^{2}exp(2\alpha_{1}x_i)\]
The Residual StdDev, \(\sqrt{\alpha_{0}^2} = \alpha_{0}\) is here …
## [1] 0.807518
And the parameter in the exponential variance function is here …
## expon
## 0.09838318
So for a given value of bfi_Nc we expect the residual
within-person variance to be …
bfi_nc <- seq(-2.5,2.5,.1)
varpred <- summary(model_B3)$sigma^2 * exp(coef(2*model_B3$modelStruct$varStruct, uncons=FALSE)*bfi_nc)We can make a plot of how the residual variances are related to the predictor
#make a data frame with predictions and back-transform centered variable to original scale.
plotvars <- as.data.frame(cbind(bfi_nc, varpred))
plotvars <- plotvars %>%
mutate(bfi_n = bfi_nc + mean(person$bfi_n, na.rm=TRUE))
#plotting
plotvars %>%
ggplot(aes(x=bfi_n,y = varpred), legend=FALSE) +
geom_point() +
geom_line() +
xlab("BFI Neuroticism") +
ylab("Predicted IIV in Daily NA (residual variance)") + ylim(0,1) +
theme_bw()Lets check the relative fits of the two models before deciding if there is actually a relation between Neuroticism and between-person differences in intraindividual variability.
## Model df AIC BIC logLik Test L.Ratio p-value
## model_B1 1 4 3820.718 3841.805 -1906.359
## model_B3 2 5 3798.319 3824.677 -1894.159 1 vs 2 24.39873 <.0001
Yes! There is a significant difference in fit between the models. The heterogeneity in within-person variance is needed. We conclude that higher Neuroticism is related to higher IIV in daily Negative Affect.
We are still looking for an intuitive way to display the variance heterogeneity results that respects both the IIV metrics and multilevel modeling approaches. Not there yet, but we will find a good one!
Conclusion
Ok - we have successfully pushed into the variance heterogeneity
models. Although, these models are not yet widely used in the analysis
of experience sampling data, they are promising and will become
increasingly useful as the data streams get longer and more temporally
dense. Of note, the implementation using nlme is both
flexible and constrained, so there are good reasons to move into a
Bayesian modeling framework where we can introduce even more flexibility
(that accommodates the reality that there are unexplained between-person
differences).
As always, Be Careful! These implementations are not always 2*straightforward. [Sorry, I just cannot let that joke go! =:-]
Citations
Bürkner, P.-C. (2017). brms: An R Package for Bayesian Multilevel Models Using Stan. Journal of Statistical Software, 80, 1–28. https://doi.org/10.18637/jss.v080.i01
Pinheiro, J. C., Bates, D., & R Core Team. (2025). nlme: Linear and Nonlinear Mixed Effects Models (Version 3.1-167). https://doi.org/10.1007/b98882
R Core Team. (2024). R: A Language and Environment for Statistical Computing. Foundation for Statistical Computing. https://www.R-project.org/
Ram, N., Conroy, D. E., Pincus, A. L., Hyde, A. L., & Molloy, L. E. (2012). Tethering theory to method: Using measures of intraindividual variability to operationalize individuals’ dynamic characteristics. In G. Hancock & J. Harring (Eds.), Advances in longitudinal methods in the social and behavioral sciences (pp. 81-110). New York: Information Age.
Revelle, W. (2024). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University. https://CRAN.R-project.org/package=psych
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag.
Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). dplyr: A Grammar of Data Manipulation (Version 1.1.4). https://CRAN.R-project.org/package=dplyr