Intensive Longitudinal Data: Modeling Count Outcome using the Generalized Multilevel Model
Overview
This tutorial illustrates set-up and fitting of a generalized multilevel model to experience sampling data where the outcome variable is a count variable. The code for a Poisson multilevel model illustrated here can be adapted easily to accommodate other types of outcomes (e.g., to a logistic multilevel model for analysis of a binary outcome variable).
Our example makes use of the person-level and day-level (EMA-type) AMIB Data, a multiple time-scale data set that has been shared for teaching purposes. The data are shared with the intent to facilitate learning about analysis of intensive longitudinal data (from experience sampling, daily diary, or EMA designs); they are posted only for teaching purposes and should not be used for research.
Use requires citation and acknowledgement of both this website and the following paper:
Parallel to the tutorial on the basic multilevel model, this tutorial treats the daily measure of Negative Affect as a count variable. Note, however, that while the basic model was fit using the lme() function in the nlme library, the generalized model is fit using the glmer() function in the lme4 library.
Outline
A. Overview of the generalized multilevel model
B. Set up the Generalized Multilevel Model (Poisson MLM for count data)
C. Model visualizations
D. Addressing example research questions
E. Adding another person-level predictor
A. Introduction to the Generalized Multilevel Model
The basic linear multilevel model is written as
\[y_{it} = \beta_{0i} + \beta_{1i}x_{it} + e_{it}\] where \(\beta_{0i}\) is individual i’s intercept and \(\beta_{1i}\) is individual i’s slope, with person-level (level-2) equations
\[\beta_{0i} = \gamma_{00} + \gamma_{01}z_{i} + u_{0i}\] \[\beta_{1i} = \gamma_{10} + \gamma_{11}z_{i} + u_{1i}\]
where \[e_{it} \sim N(0,\mathbf{R})\] and \[\mathbf{U}_{i} \sim N(0,\mathbf{G})\]
\[\mathbf{R} = \mathbf{I}\sigma^{2}_{e}\], where \(\mathbf{I}\) is the identity matrix (a diagonal matrix of 1s) and \(\sigma^{2}_{e}\) is the residual (within-person) variance, and
\[\mathbf{G} = \left[\begin{array} {rr} \sigma^{2}_{u0} & \sigma_{u0u1} \\ \sigma_{u1u0} & \sigma^{2}_{u1} \end{array}\right]\]
The generalized linear multilevel model is an extension of the linear multilevel model that allows response variables to come from distributions other than the Gaussian (see also https://stats.idre.ucla.edu/other/mult-pkg/introduction-to-generalized-linear-mixed-models/). To do this, we introduce a link function.
Let the linear predictor (the right hand side of the regression equation) be called \(\mathbf{\eta}\). That is (writing the above model in a general way common in statistics), \[\mathbf{\eta = X\beta + Z\gamma}\] Thus, \(\mathbf{\eta}\) is the combination of the fixed and random effects excluding the residuals.
We introduce a generic link function, \(g(\cdot)\), that relates the outcome \(\mathbf{y}\) to the linear predictor \(\mathbf{\eta}\), along with its inverse, \(h(\cdot)=g^{-1}(\cdot)\), the inverse link function.
We use the link function to formalize that the conditional expectation of \(\mathbf{y}\) (conditional because it is the expected value depending on the level of the predictors) is
\[g(E(\mathbf{y}))=\mathbf{\eta}\] So, basically, the link function “transforms” the outcome variable \(\mathbf{y}\) onto a scale on which the model is linear.
We could also model the expectation of \(\mathbf{y}\) as \[E(\mathbf{y})=h(\mathbf{\eta})=\mathbf{\mu}\]
With \(\mathbf{y}\) itself equal to
\[\mathbf{y}=h(\mathbf{\eta}) + \mathbf{e}\]
Link functions and Families
There are many different link functions. In this tutorial we focus on the Poisson family, which is useful for modeling outcome variables that are measured as counts.
Count Outcomes
For a count outcome, we use a log link function and the probability mass function (PMF) for the Poisson distribution. These are:
\[g(\cdot) = log_{e}(\cdot)\] \[h(\cdot)=e^{(\cdot)}\] \[PMF = Pr(X = k) = \frac{\lambda^{k}e^{-\lambda}}{k!}\] \[E(X)=\lambda\] \[Var(X)=\lambda\]
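As a quick check of these formulas (a base R sketch, with illustrative values for \(\lambda\) and \(k\)), we can compute the Poisson PMF by hand and compare it to dpois(), and confirm that the log link and its inverse undo one another:

```r
# Poisson PMF by hand vs. base R's dpois(), using illustrative values
lambda <- 4
k <- 2
pmf_by_hand <- lambda^k * exp(-lambda) / factorial(k)
pmf_by_hand              # same value as dpois(k, lambda)
dpois(k, lambda)
# the inverse link undoes the link: h(g(lambda)) = lambda
exp(log(lambda))
```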
Model Fitting
When we use an “identity” link function, we are in our usual specification of means and variances for the normal (Gaussian) distribution. “Generalized” is good because it covers the usual case and expands to other situations (binary, count, etc.). However, the estimation algorithms are different when working on the more general problem, and so the usual (normal continuous) case can be slow when using the software for generalized linear modeling. Stick with the non-generalized software for the normal case, and use the specialized programs when needed.
Model Interpretation
After inclusion of the link function, the modeling proceeds as usual, and interpretation of results is similar. On the linearized metric (after using the link function) interpretation is done in a standard way - interpreting significance and sign of parameters. However, for substantive interpretation, it is often easier to back-transform the results to the original metric. For example, in a random effects logistic model, one might want to talk about the probability of an event given some specific values of the predictors. Likewise, in a Poisson (count) model, one might want to talk about the expected count rather than the expected log count. These transformations complicate matters because they are nonlinear, and so even random intercepts no longer play a strictly additive role and instead can have a multiplicative effect. Good explanations can be found, among other places, at https://stats.idre.ucla.edu/other/mult-pkg/introduction-to-generalized-linear-mixed-models/ and a rich resource of practical questions is here http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html.
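The multiplicative role of random intercepts can be seen with a few lines of base R (hypothetical numbers, not from any fitted model): a random intercept \(u_{0i}\) shifts the linear predictor additively, but after applying the inverse link it multiplies the expected count by \(e^{u_{0i}}\).

```r
# hypothetical fixed intercept and three people's random intercepts
gamma00 <- 2.5
u0 <- c(-0.5, 0, 0.5)
counts <- exp(gamma00 + u0)   # expected counts for the three people
counts / exp(gamma00)         # per-person multipliers: exactly exp(u0)
```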
We will walk through an example together in the Addressing Example Research Questions section below.
B. Set up the Generalized Multilevel Model (Poisson MLM for count data)
Loading Libraries
Loading libraries used in this script.
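The specific library() calls are not shown in this extract; the list below is inferred from the functions used later in the script (%>%, select(), and mutate() from dplyr; ggplot() from ggplot2; describe() from psych; glmer() and fixef() from lme4).

```r
# libraries inferred from the functions used in this script
library(dplyr)    # data manipulation: %>%, select, mutate, group_by
library(ggplot2)  # plotting
library(psych)    # describe()
library(lme4)     # glmer() for generalized multilevel models
```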
Loading Data
Our example makes use of the person-level and day-level (EMA-type) AMIB data files. Specifically, we make use of three variables:
Outcome: daily negative affect count, obtained through transformation of
negaffin the day-level file;Predictor: daily stress, obtained through reverse coding of
pssin the day-level file;Person-level predictor: trait neuroticism,
bfi_nin the person-level file
Loading person-level file and subsetting to variables of interest
#set filepath for data file
filepath <- "https://raw.githubusercontent.com/The-Change-Lab/collaborations/main/AMIB/AMIB_persons.csv"
#read in the .csv file using the url() function
AMIB_persons <- read.csv(file=url(filepath), header=TRUE)
#subsetting to variables of interest
AMIB_persons <- AMIB_persons %>%
  select(id, bfi_n)
Loading day-level file (T = 8) and subsetting to variables of interest.
#set filepath for data file
filepath <- "https://raw.githubusercontent.com/The-Change-Lab/collaborations/refs/heads/main/AMIB/AMIB_daily.csv"
#read in the .csv file using the url() function
AMIB_daily <- read.csv(file=url(filepath), header=TRUE)
#subsetting to variables of interest
AMIB_daily <- AMIB_daily %>%
  select(id, day, negaff, pss)
Data Preparation
For our Poisson illustration we convert negaff to a count variable, which must take only integer values. Our new variable is called negaffcount.
#making a negaff count variable that ranges from 0 to 60
AMIB_daily <- AMIB_daily %>%
mutate(negaffcount = round(negaff-1, 1)*10)
#describing new variable
table(AMIB_daily$negaffcount)
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## 39 45 42 49 56 56 82 69 58 57 72 51 65 47 26 50 47 40 39 52 38 38 37 32 20 22
## 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 48 49 51 54 59
## 27 22 17 14 12 10 13 10 9 10 9 5 8 6 9 6 6 2 1 2 3 3 3 2 1 2
describe(AMIB_daily$negaffcount)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1441 14.47 10.41 12 13.38 10.38 0 59 59 0.96 0.77 0.27
#histogram of the count variable
AMIB_daily %>%
ggplot(aes(x=negaffcount)) +
geom_histogram(fill="royalblue2", color="black") +
  theme_bw()
We also reverse code pss to a continuous stress variable.
#reverse coding the pss variable into a new stress variable
AMIB_daily <- AMIB_daily %>%
mutate(stress = 4 - pss)
#describing new variable
describe(AMIB_daily$stress)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1445 1.39 0.68 1.25 1.36 0.74 0 4 4 0.35 0.13 0.02
#histogram
AMIB_daily %>%
ggplot(aes(x=stress)) +
geom_histogram(fill="royalblue2", color="black") +
labs(x = "Stress (high = more stressed)") +
  theme_bw()
As usual, we split the predictor into “trait” (between-person differences) and “state” (within-person deviations) components. Specifically, the daily variable stress is split into two variables: stress_trait is the sample-mean centered between-person component, and stress_state is the person-centered within-person component.
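The trait/state split can be illustrated with a few lines of base R on toy data (hypothetical values for two people): the person mean is the trait component, and each day's deviation from that mean is the state component.

```r
# toy data: two people, three days each (hypothetical values)
d <- data.frame(id     = c(1, 1, 1, 2, 2, 2),
                stress = c(1, 2, 3, 2, 4, 6))
# person mean = between-person "trait" component
d$stress_trait <- ave(d$stress, d$id, FUN = mean)
# deviation from own mean = within-person "state" component
d$stress_state <- d$stress - d$stress_trait
d
```

Note that within each person the state component sums to zero, which is what separates within-person variation from between-person differences.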
Making trait variables
#calculating intraindividual means
AMIB_imeans <- AMIB_daily %>%
dplyr::group_by(id) %>%
dplyr::summarize(stress_trait=mean(stress, na.rm=TRUE))
describe(AMIB_imeans)
## vars n mean sd median trimmed mad min max range
## id 1 190 318.29 130.44 321.50 318.99 151.23 101.00 532.00 431.00
## stress_trait 2 190 1.40 0.48 1.41 1.39 0.51 0.19 2.56 2.38
## skew kurtosis se
## id -0.04 -1.09 9.46
## stress_trait -0.04 -0.23 0.03
#merging into person-level file
AMIB_persons <- merge(AMIB_persons, AMIB_imeans, by="id")
#make centered versions of the person-level scores
AMIB_persons <- AMIB_persons %>%
mutate(bfi_n_c = scale(bfi_n, center=TRUE, scale=FALSE),
stress_trait_c = scale(stress_trait, center=TRUE, scale=FALSE))
#describe person-level data
describe(AMIB_persons)
## vars n mean sd median trimmed mad min max
## id 1 190 318.29 130.44 321.50 318.99 151.23 101.00 532.00
## bfi_n 2 190 2.98 0.96 3.00 3.00 1.48 1.00 5.00
## stress_trait 3 190 1.40 0.48 1.41 1.39 0.51 0.19 2.56
## bfi_n_c 4 190 0.00 0.96 0.02 0.02 1.48 -1.98 2.02
## stress_trait_c 5 190 0.00 0.48 0.01 0.00 0.51 -1.21 1.17
## range skew kurtosis se
## id 431.00 -0.04 -1.09 9.46
## bfi_n 4.00 -0.09 -0.82 0.07
## stress_trait 2.38 -0.04 -0.23 0.03
## bfi_n_c 4.00 -0.09 -0.82 0.07
## stress_trait_c 2.38 -0.04 -0.23 0.03
Making state variables in long data
#merging person-level data into daily data
daily_long <- merge(AMIB_daily, AMIB_persons,by="id")
#calculating state variables
daily_long <- daily_long %>%
mutate(stress_state = stress - stress_trait)
#describing data
describe(daily_long)
## vars n mean sd median trimmed mad min max
## id 1 1458 322.53 129.08 324.00 324.20 151.23 101.00 532.00
## day 2 1458 3.48 2.30 3.00 3.47 2.97 0.00 7.00
## negaff 3 1441 2.45 1.04 2.20 2.34 1.04 1.00 6.90
## pss 4 1445 2.61 0.68 2.75 2.64 0.74 0.00 4.00
## negaffcount 5 1441 14.47 10.41 12.00 13.38 10.38 0.00 59.00
## stress 6 1445 1.39 0.68 1.25 1.36 0.74 0.00 4.00
## bfi_n 7 1458 2.97 0.96 3.00 2.98 1.48 1.00 5.00
## stress_trait 8 1458 1.39 0.47 1.41 1.39 0.51 0.19 2.56
## bfi_n_c 9 1458 -0.01 0.96 0.02 0.00 1.48 -1.98 2.02
## stress_trait_c 10 1458 -0.01 0.47 0.01 -0.01 0.51 -1.21 1.17
## stress_state 11 1445 0.00 0.49 -0.03 -0.02 0.46 -1.75 2.12
## range skew kurtosis se
## id 431.00 -0.07 -1.06 3.38
## day 7.00 0.01 -1.24 0.06
## negaff 5.90 0.96 0.77 0.03
## pss 4.00 -0.35 0.13 0.02
## negaffcount 59.00 0.96 0.77 0.27
## stress 4.00 0.35 0.13 0.02
## bfi_n 4.00 -0.08 -0.79 0.03
## stress_trait 2.38 -0.08 -0.21 0.01
## bfi_n_c 4.00 -0.08 -0.79 0.03
## stress_trait_c 2.38 -0.08 -0.21 0.01
## stress_state 3.88 0.36 0.79 0.01
Great! Now the data are all prepared as needed.
Setting up for Generalized Multilevel Model with Count Outcome
Outlining the substantive inquiry.
As we did previously, we define stress reactivity (a person-level dynamic characteristic) as the extent to which a person’s level of daily negative affect is related to daily stress level. Stress reactivity is the intraindividual covariation of daily negative affect and daily stress. This time, though, we explicitly model negative affect as a count variable.
Formally,
\[log_{e}\left( E(y_{it}) \right) = \beta_{0i} + \beta_{1i}x_{it}\]
where the expected outcome (left side of equation) has been “transformed” using the log link function, \(g(\cdot) = log_{e}(\cdot)\), and where \(\beta_{1i}\) is individual i’s level of stress reactivity. Note that, in line with the general formulation above, there is no separate residual term.
Let’s first look at a subset of persons to see what stress reactivity looks like.
Preliminary Visualization
Let’s take a look at scatterplots of the raw data for a subset of participants.
#faceted plot
panelplot <-
daily_long %>%
filter(id <= 125) %>%
ggplot(aes(x=stress_state,y=negaffcount)) +
geom_point(alpha = .8, color = "royalblue") + #
xlab("Stress") + ylab("Negative Affect (Count)") +
scale_x_continuous(limits=c(-2,2), breaks=seq(-2,2,by=1)) +
scale_y_continuous(limits=c(0,60), breaks=c(0,30,60)) +
facet_wrap( ~ id) +
theme_bw()
panelplot
We can already begin to see hints of nonlinearity in these associations (e.g., participant 106), and we can also note the between-person differences.
Running the Model
We use the glmer() function in the lme4 package.
To start, let’s fit the usual unconditional means model …
#unconditional means model
model0_fit <- glmer(formula = negaffcount ~ 1 + (1|id),
family="poisson",
data=daily_long,
na.action=na.exclude)
summary(model0_fit)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: negaffcount ~ 1 + (1 | id)
## Data: daily_long
##
## AIC BIC logLik -2*log(L) df.resid
## 12556.4 12567.0 -6276.2 12552.4 1439
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.8837 -1.4131 -0.3372 1.0990 8.9100
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 0.3014 0.549
## Number of obs: 1441, groups: id, 190
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.55757 0.04065 62.91 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The output is similar to what we are used to - although, note that there is no residual term (as per the equation above). Remember that the model describes \(log_{e}\) of the expected negaffcount.
OK - let’s add the predictors, stress_trait_c and stress_state, to examine stress reactivity.
#simple model
model1_fit <- glmer(formula = negaffcount ~ 1 + stress_trait_c +
stress_state + stress_state:stress_trait_c +
(1 + stress_state|id),
family="poisson",
data=daily_long,
na.action=na.exclude)
summary(model1_fit)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula:
## negaffcount ~ 1 + stress_trait_c + stress_state + stress_state:stress_trait_c +
## (1 + stress_state | id)
## Data: daily_long
##
## AIC BIC logLik -2*log(L) df.resid
## 10558.8 10595.7 -5272.4 10544.8 1431
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.6742 -1.0652 -0.1746 0.8470 7.3430
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## id (Intercept) 0.1591 0.3989
## stress_state 0.1500 0.3874 -0.34
## Number of obs: 1438, groups: id, 190
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.50209 0.03018 82.898 < 2e-16 ***
## stress_trait_c 0.83509 0.06396 13.057 < 2e-16 ***
## stress_state 0.55984 0.03469 16.137 < 2e-16 ***
## stress_trait_c:stress_state -0.29855 0.07564 -3.947 7.91e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) strs__ strss_
## strss_trt_c -0.032
## stress_stat -0.305 0.021
## strss_tr_:_ 0.021 -0.300 -0.139
#other useful outputs:
#confint(model1_fit) # 95% CI for the coefficients
#exp(coef(model1_fit)) # exponentiated coefficients
#exp(confint(model1_fit, method="Wald")) # 95% CI for exponentiated coefficients
#predict(model1_fit, type="response") # predicted values
#residuals(model1_fit, type="deviance") # residuals
Interpreting Results
Within this model the intercept \(\gamma_{00}\) is the expected ‘log’ of negative affect for a prototypical person on a prototypical day. We exponentiate to obtain an estimate on the original ‘count’ scale.
exp(fixef(model1_fit)[["(Intercept)"]])
## [1] 12.20798
Therefore, the prototypical person is expected to have a negative affect count of 12.21 on the prototypical (average stress) day.
Similarly, \(\gamma_{10}\) is the prototypical within-person association between daily stress and log negative affect. We exponentiate to obtain an estimate on the original ‘count’ scale.
exp(fixef(model1_fit)[["stress_state"]])
## [1] 1.750375
In a moment we will go over how to interpret this slope and obtain predicted count values at varying levels of stress_state.
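As a preview (a sketch using the fixed-effect estimates shown in the output above), we can compute expected counts at a few values of stress_state for a person at average stress_trait_c. On the count scale, each 1-unit increase in stress_state multiplies the expected count by \(e^{\gamma_{10}}\) rather than adding a constant amount.

```r
# fixed-effect estimates copied from the model output above
gamma00 <- 2.50209
gamma10 <- 0.55984
# expected counts at stress_state = -1, 0, +1 (average stress_trait_c)
exp(gamma00 + gamma10 * c(-1, 0, 1))
# the constant multiplicative factor per 1-unit increase
exp(gamma10)
```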
C. Model Visualizations
Generating Predictions
Let’s look at predicted scores. This can be done in two ways.
1. We can predict on the link scale (for us, that is the log count), or
2. We can predict on the response scale (for us, that is the count).
#obtaining predicted scores for individuals
daily_long <- daily_long %>%
mutate(pred_m1log = predict(model1_fit, type="link"),
         pred_m1original = predict(model1_fit, type="response"))
#create new data for prototypical person with average person-level predictor
avedata <- data.frame(
# fill in a sequence of values between the min and max observed in dataset
stress_state = seq(min(daily_long$stress_state, na.rm = T),
max(daily_long$stress_state, na.rm = T), by = .1),
stress_trait_c = 0)
# generate model prediction (in log units) for prototypical person using fixed effects
fixedpredictlog <- cbind(avedata, predict(model1_fit, avedata, type = "link", re.form=~0))
colnames(fixedpredictlog) <- c("stress_state", "stress_trait_c", "fit")
# generate model prediction (in original units) for prototypical person using fixed effects
fixedpredict <- cbind(avedata, predict(model1_fit, avedata, type = "response", re.form =~0))
colnames(fixedpredict) <- c("stress_state", "stress_trait_c", "fit")
Plotting Results
Plotting predicted within-person associations.
fixedplotlog <-
daily_long %>%
ggplot(aes(x = stress_state, y = pred_m1log, group = id)) +
  geom_line(color="black", alpha = .6) +
  geom_line(data = fixedpredictlog, aes(x = stress_state, y = fit, group = NULL),
            color = "red", linewidth = 2) +
  xlab("Stress State (centered)") +
  ylab("Predicted Log Negative Affect") +
  theme_bw()
fixedplotlog
fixedplot <-
daily_long %>%
ggplot(aes(x = stress_state, y = pred_m1original, group = id)) +
geom_line(alpha = .6) +
geom_line(data = fixedpredict, aes(x = stress_state, y = fit, group = NULL),
color = "red", linewidth = 2) +
xlab("Stress State (centered)") +
ylab("Predicted Negative Affect Count") +
theme_bw()
fixedplot
## Warning: Removed 16 rows containing missing values or values outside the scale range
## (`geom_line()`).
Interpolate the Data
Notice, however, that the lines don’t all look smooth. Some have a
clear break in them. This can happen when ggplot is drawing
a line between sparse data points. If the line is supposed to be
straight, the sparseness of the data won’t matter. But if it is supposed
to be curved, then having gaps in the data can generate these kinds of
discontinuities. One particularly noteworthy example is indicated with
the red circle.
daily_long %>%
ggplot(aes(x = stress_state, y = pred_m1original, group = id)) +
geom_line(alpha = .6) +
xlab("Stress State (centered)") +
ylab("Predicted Negative Affect Count") +
geom_point() +
geom_point(aes(x = .9, y =54), pch = 1, size = 9, fill = "ghostwhite", color = "red") +
  theme_bw()
This is mostly an aesthetic consideration, although it could have implications for eye-balling predicted values. To address this issue, we can interpolate data for plotting. To be clear, we are NOT adding more data to the model; the model only uses our actual observations to generate estimates. Instead, we are just filling in more data points for the purposes of plotting, and we will do this only within each person’s observed range. In other words, we will not add data that falls outside each person’s minimum or maximum observed value.
To accomplish this, we will need to create a new dataframe, called newdata2. We will begin by setting this dataframe to NULL so that it is just a placeholder waiting to “catch” the data we will populate it with.
Next, we will set up a for loop in which we ask R to take each subject’s minimum and maximum observed value and generate values within that range in small increments, in this case increments of .01 (as shown with the seq() function). This will enable us to have a high degree of granularity so that our curves will be smooth when we plot them. We will also carry along each person’s value on the other predictor (stress_trait_c) so that predictions can be generated.
newdata2 <- NULL
for (i in unique(daily_long$id)) {
cseq <- data.frame(
stress_state = seq(min(subset(daily_long, id==i)$stress_state, na.rm=T),
max(subset(daily_long, id==i)$stress_state, na.rm=T), .01),
id = i,
stress_trait_c = unique(subset(daily_long, id==i)$stress_trait_c))
newdata2 <- rbind(newdata2, cseq)
}
proto_m1_interp <- cbind(newdata2, predict(model1_fit, newdata2,
type="response", re.form=NULL))
colnames(proto_m1_interp) <- c("stress_state", "id", "stress_trait_c", "fit")
#Count scale predictions - interpolated data
fixedplot2 <-
daily_long %>%
ggplot(aes(x = stress_state, y = pred_m1original, group = id)) +
xlab("Stress State (centered)") + ylab("Predicted Negative Affect (Count)") +
geom_line(data = proto_m1_interp, aes(stress_state, fit, group = id), color = "blue", alpha =.6) +
geom_line(data = fixedpredict, aes(x = stress_state, y = fit, group = NULL),
color = "red", linewidth = 2) +
theme_bw()
fixedplot2
#Count scale predictions - interpolated vs. non interpolated data
fixedplot2both <-
daily_long %>%
ggplot(aes(x = stress_state, y = pred_m1original, group = id)) +
geom_line(color="black", alpha = .6) +
xlab("Stress State (centered)") + ylab("Predicted Negative Affect (Count)") +
geom_line(data = proto_m1_interp, aes(stress_state, fit, group = id), color = "blue", alpha =.6) +
geom_line(data = fixedpredict, aes(x = stress_state, y = fit, group = NULL),
color = "red", size = 2) +
  theme_bw()
fixedplot2both
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Removed 16 rows containing missing values or values outside the scale range
## (`geom_line()`).
D. Addressing Example Research Questions
What is the predicted Negative Affect Count for a person who is +1 unit higher on stress_state and average stress_trait_c?
# Method 1: Weights
plus1unit <- 1*lme4::fixef(model1_fit)["(Intercept)"] +
0*lme4::fixef(model1_fit)["stress_trait_c"] +
1*lme4::fixef(model1_fit)["stress_state"] +
0*lme4::fixef(model1_fit)["stress_trait_c:stress_state"]
exp(plus1unit)
## (Intercept)
## 21.36876
# Method 2: Predict
predict(model1_fit,
newdata = data.frame(
stress_trait_c = 0,
stress_state = 1),
        type="response", re.form=~0)
## 1
## 21.36876
What is the predicted difference in Negative Affect Count for a person who is +2 units higher (vs. +1 unit higher) on stress_state and average stress_trait_c?
Our knowledge of regression might make us think that we can get the answer to this question by just exponentiating the slope for stress_state. But, as you will see, it’s not so simple when working with the GMLM.
exp(lme4::fixef(model1_fit)["stress_state"])
## stress_state
## 1.750389
Instead, we can compute the predicted negative affect count for these two scenarios. We have already computed the predicted negative affect count for a person who is +1 unit higher on stress_state and average stress_trait_c in the step above.
exp(plus1unit)
## (Intercept)
## 21.36876
Here, we do the same thing, but set stress_state to 2 units above the mean.
# Method 1: Weights
plus2units <- 1*lme4::fixef(model1_fit)["(Intercept)"] +
0*lme4::fixef(model1_fit)["stress_trait_c"] +
2*lme4::fixef(model1_fit)["stress_state"] + # We will multiply by 2 now instead of 1
0*lme4::fixef(model1_fit)["stress_trait_c:stress_state"]
exp(plus2units)
## (Intercept)
## 37.40363
# Method 2: Predict
predict(model1_fit,
newdata = data.frame(
stress_trait_c = 0,
    stress_state = 2), # We will set this to 2 instead of 1
  type="response", re.form=~0)
## 1
## 37.40363
We can then take the difference. We see that for a person +2 units higher on stress_state, the predicted negative affect count is 37.4 in the original metric, whereas for a person +1 unit higher on stress_state, the predicted negative affect count is 21.37. This shows that the difference does not correspond to simply exponentiating the slope of stress_state (exp(fixef(model1_fit)["stress_state"]) = 1.75).
exp(plus2units) - exp(plus1unit)
## (Intercept)
## 16.03487
We can also visualize this difference, pinpointing the two values of our X variable (stress_state) we want to examine. Notice how the predicted difference in negative affect count between stress_state = 1 and stress_state = 2 is larger than the exponentiated slope of stress_state (1.75 negative affect counts). We can also see that it is larger than the predicted difference between other pairs of stress_state values that are 1 unit apart (e.g., stress_state = -1 vs. stress_state = 0).
fixedplot2 +
geom_segment(aes(x = 1, y = -Inf, xend = 1, yend = exp(plus1unit)), size = .5, color = "red") +
geom_segment(aes(x = -Inf, y = exp(plus1unit), xend = 1, yend = exp(plus1unit)),
size = .5, color = "red")+
geom_segment(aes(x = 2, y = -Inf, xend = 2, yend = exp(plus2units)), size = .5,
lty = "dashed", color = "red") +
geom_segment(aes(x = -Inf, y = exp(plus2units), xend = 2, yend = exp(plus2units)),
size = .5, lty = "dashed",color = "red") +
  theme_bw()
E. Adding another Person-level Predictor
As in the basic multilevel model example, we can add trait neuroticism to the model, to see if and how between-person differences in neuroticism (bfi_n_c) are related to stress reactivity.
Running the expanded model.
#simple model
model2_fit <- glmer(formula = negaffcount ~ 1 + stress_trait_c +
bfi_n_c + stress_trait_c:bfi_n_c +
stress_state + stress_state:stress_trait_c +
stress_state:bfi_n_c + stress_state:stress_trait_c:bfi_n_c +
(1 + stress_state|id),
family="poisson",
data=daily_long,
na.action=na.exclude)
summary(model2_fit)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: negaffcount ~ 1 + stress_trait_c + bfi_n_c + stress_trait_c:bfi_n_c +
## stress_state + stress_state:stress_trait_c + stress_state:bfi_n_c +
## stress_state:stress_trait_c:bfi_n_c + (1 + stress_state | id)
## Data: daily_long
##
## AIC BIC logLik -2*log(L) df.resid
## 10537.8 10595.8 -5257.9 10515.8 1427
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.6747 -1.0714 -0.1799 0.8393 7.3668
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## id (Intercept) 0.1384 0.3720
## stress_state 0.1476 0.3841 -0.36
## Number of obs: 1438, groups: id, 190
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.52086 0.02893 87.142 < 2e-16 ***
## stress_trait_c 0.78492 0.06161 12.740 < 2e-16 ***
## bfi_n_c 0.12742 0.03046 4.183 2.87e-05 ***
## stress_state 0.55101 0.03494 15.769 < 2e-16 ***
## stress_trait_c:bfi_n_c -0.19234 0.06263 -3.071 0.00213 **
## stress_trait_c:stress_state -0.31526 0.07722 -4.083 4.45e-05 ***
## bfi_n_c:stress_state 0.01424 0.03635 0.392 0.69519
## stress_trait_c:bfi_n_c:stress_state 0.10430 0.07927 1.316 0.18823
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) strs__ bf_n_c strss_ st__:__ st__:_ bf__:_
## strss_trt_c -0.023
## bfi_n_c -0.011 -0.213
## stress_stat -0.321 0.017 0.002
## strss_t_:__ -0.204 -0.045 -0.002 0.061
## strss_tr_:_ 0.017 -0.316 0.062 -0.103 0.021
## bf_n_c:str_ 0.002 0.065 -0.326 -0.014 0.011 -0.183
## strs__:__:_ 0.059 0.021 0.011 -0.170 -0.316 -0.128 -0.103
We see that, while neuroticism is related to level of negative affect count, and moderates the association between stress_trait and negative affect count, neuroticism does not moderate the within-person association between stress_state and negative affect count. That is, neuroticism is not related to stress reactivity in these data.
F. Conclusion
This tutorial illustrated application of the generalized multilevel model in cases where the outcome variable is not normally distributed. Our example focused on a count variable, but the same setup applies to other types of outcomes as well (e.g., binomial, Gamma). We encourage taking some care with the number of random effects included in these models, and with interpretation. Converting results back and forth using link functions and inverse link functions is complicated, but well worth the effort for interpretation and communication. Readers are most appreciative!
Good luck! =:-]
Citations
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67, 1–48. https://doi.org/10.18637/jss.v067.i01
Introduction to Generalized Linear Mixed Models. (n.d.). UCLA: Statistical Consulting Group. Retrieved July 7, 2025, from https://stats.oarc.ucla.edu/other/mult-pkg/introduction-to-generalized-linear-mixed-models/
R Core Team. (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. https://www.R-project.org/
Revelle, W. (2024). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University. https://CRAN.R-project.org/package=psych
Wickham, H. (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1–29. https://doi.org/10.18637/jss.v040.i01
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag.
Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). dplyr: A Grammar of Data Manipulation (Version 1.1.4). https://CRAN.R-project.org/package=dplyr