Intensive Longitudinal Data: Modeling Count Outcome using the Generalized Multilevel Model
Overview
This tutorial illustrates set-up and fitting of a generalized multilevel model to experience sampling data where the outcome variable is a count variable. The code for a Poisson multilevel model illustrated here can be adapted easily to accommodate other types of outcomes (e.g., to a logistic multilevel model for analysis of a binary outcome variable).
Our example makes use of the person-level and day-level (EMA-type) AMIB Data, a multiple time-scale data set that has been shared for teaching purposes. The data are shared with the intent to facilitate learning about analysis of intensive longitudinal data (from experience sampling, daily diary, or EMA designs); they are posted only for teaching purposes and should not be used for research.
Use requires citation and acknowledgement of both this website and the following paper:
Parallel to the tutorial on the basic multilevel model, this tutorial treats the daily measure of Negative Affect as a count variable. Note, however, that while the basic model was fit using the lme() function in the nlme library, the generalized model is fit using the glmer() function in the lme4 library.
Outline
A. Overview of the generalized multilevel model
B. Set up the Generalized Multilevel Model (Poisson MLM for count data)
C. Model visualizations
D. Addressing example research questions
E. Adding another person-level predictor
A. Introduction to the Generalized Multilevel Model
The basic linear multilevel model is written as
\[y_{it} = \beta_{0i} + \beta_{1i}x_{it} + e_{it}\] where \(\beta_{0i}\) is individual i’s intercept and \(\beta_{1i}\) is individual i’s slope, with person-level (level-2) equations
\[\beta_{0i} = \gamma_{00} + \gamma_{01}z_{i} + u_{0i}\] \[\beta_{1i} = \gamma_{10} + \gamma_{11}z_{i} + u_{1i}\]
where \[e_{it} \sim N(0,\mathbf{R})\] and \[\mathbf{U}_{i} \sim N(0,\mathbf{G})\]
\[\mathbf{R} = \mathbf{I}\sigma^{2}_{e}\], where \(\mathbf{I}\) is the identity matrix (a diagonal matrix of 1s) and \(\sigma^{2}_{e}\) is the residual (within-person) variance, and
\[\mathbf{G} = \left[\begin{array} {rr} \sigma^{2}_{u0} & \sigma_{u0u1} \\ \sigma_{u1u0} & \sigma^{2}_{u1} \end{array}\right]\]
The generalized linear multilevel model is an extension of the linear multilevel model that allows response variables to come from distributions other than the Gaussian (see also https://stats.idre.ucla.edu/other/mult-pkg/introduction-to-generalized-linear-mixed-models/). To do this, we introduce a link function.
Let the linear predictor (the right hand side of the regression equation) be called \(\mathbf{\eta}\). That is (writing the above model in a general way common in statistics), \[\mathbf{\eta = X\beta + Z\gamma}\] Thus, \(\mathbf{\eta}\) is the combination of the fixed and random effects excluding the residuals.
We introduce a generic link function, \(g(\cdot)\), that relates the outcome \(\mathbf{y}\) to the linear predictor \(\mathbf{\eta}\), along with its inverse, \(h(\cdot)=g^{-1}(\cdot)\), the inverse link function.
We use the link function to formalize that the conditional expectation of \(\mathbf{y}\) (conditional because it is the expected value depending on the level of the predictors) is
\[g(E(\mathbf{y}))=\mathbf{\eta}\] So, basically, the link function “transforms” the outcome variable \(\mathbf{y}\) onto a scale on which the model is linear.
We could also model the expectation of \(\mathbf{y}\) as \[E(\mathbf{y})=h(\mathbf{\eta})=\mathbf{\mu}\]
With \(\mathbf{y}\) itself equal to
\[\mathbf{y}=h(\mathbf{\eta}) + \mathbf{e}\]
Link functions and Families
There are many different link functions. In this tutorial we focus on the Poisson family, which is useful for modeling outcome variables that are measured as counts.
Count Outcomes
For a count outcome, we use a log link function and the probability mass function (PMF) for the Poisson distribution. These are:
\[g(\cdot) = log_{e}(\cdot)\] \[h(\cdot)=e^{(\cdot)}\] \[PMF = Pr(X = k) = \frac{\lambda^{k}e^{-\lambda}}{k!}\] \[E(X)=\lambda\] \[Var(X)=\lambda\]
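As a quick check of these formulas (a base R sketch, with illustrative values for \(\lambda\) and \(k\)), we can compute the Poisson PMF by hand and compare it to dpois(), and confirm that the log link and its inverse undo one another:

```r
# Poisson PMF by hand vs. base R's dpois(), using illustrative values
lambda <- 4
k <- 2
pmf_by_hand <- lambda^k * exp(-lambda) / factorial(k)
pmf_by_hand              # same value as dpois(k, lambda)
dpois(k, lambda)
# the inverse link undoes the link: h(g(lambda)) = lambda
exp(log(lambda))
```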
Model Fitting
When we use an “identity” link function, we are in our usual specification of means and variances for the normal (Gaussian) distribution. “Generalized” is good because it covers the usual case and expands to other situations (binary, count, etc.). However, the estimation algorithms are different when working on the more general problem, and so the usual (normal continuous) case can be slow when using the software for generalized linear modeling. Stick with the non-generalized software for the normal case, and use the specialized programs when needed.
Model Interpretation
After inclusion of the link function, the modeling proceeds as usual, and interpretation of results is similar. On the linearized metric (after using the link function) interpretation is done in a standard way - interpreting significance and sign of parameters. However, for substantive interpretation, it is often easier to back-transform the results to the original metric. For example, in a random effects logistic model, one might want to talk about the probability of an event given some specific values of the predictors. Likewise, in a Poisson (count) model, one might want to talk about the expected count rather than the expected log count. These transformations complicate matters because they are nonlinear, and so even random intercepts no longer play a strictly additive role and instead can have a multiplicative effect. Good explanations can be found, among other places, at https://stats.idre.ucla.edu/other/mult-pkg/introduction-to-generalized-linear-mixed-models/ and a rich resource of practical questions is here http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html.
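The multiplicative role of random intercepts can be seen with a few lines of base R (hypothetical numbers, not from any fitted model): a random intercept \(u_{0i}\) shifts the linear predictor additively, but after applying the inverse link it multiplies the expected count by \(e^{u_{0i}}\).

```r
# hypothetical fixed intercept and three people's random intercepts
gamma00 <- 2.5
u0 <- c(-0.5, 0, 0.5)
counts <- exp(gamma00 + u0)   # expected counts for the three people
counts / exp(gamma00)         # per-person multipliers: exactly exp(u0)
```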
We will walk through an example together in the Addressing Example Research Questions section below.
B. Set up the Generalized Multilevel Model (Poisson MLM for count data)
Loading Libraries
Loading libraries used in this script.
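The specific library() calls are not shown in this extract; the list below is inferred from the functions used later in the script (%>%, select(), and mutate() from dplyr; ggplot() from ggplot2; describe() from psych; glmer() and fixef() from lme4).

```r
# libraries inferred from the functions used in this script
library(dplyr)    # data manipulation: %>%, select, mutate, group_by
library(ggplot2)  # plotting
library(psych)    # describe()
library(lme4)     # glmer() for generalized multilevel models
```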
Loading Data
Our example makes use of the person-level and day-level (EMA-type) AMIB data files. Specifically, we make use of three variables:
Outcome: daily negative affect count, obtained through transformation of
negaffin the day-level file;Predictor: daily stress, obtained through reverse coding of
pssin the day-level file;Person-level predictor: trait neuroticism,
bfi_nin the person-level file
Loading person-level file and subsetting to variables of interest
#set filepath for data file
filepath <- "https://raw.githubusercontent.com/The-Change-Lab/collaborations/main/AMIB/AMIB_persons.csv"
#read in the .csv file using the url() function
AMIB_persons <- read.csv(file=url(filepath), header=TRUE)
#subsetting to variables of interest
AMIB_persons <- AMIB_persons %>%
  select(id, bfi_n)
Loading day-level file (T = 8) and subsetting to variables of interest.
#set filepath for data file
filepath <- "https://raw.githubusercontent.com/The-Change-Lab/collaborations/refs/heads/main/AMIB/AMIB_daily.csv"
#read in the .csv file using the url() function
AMIB_daily <- read.csv(file=url(filepath), header=TRUE)
#subsetting to variables of interest
AMIB_daily <- AMIB_daily %>%
  select(id, day, negaff, pss)
Data Preparation
For our Poisson illustration we convert negaff to a count variable, which must take only integer values. Our new variable is called negaffcount.
#making a negaff count variable that ranges from 0 to 60
AMIB_daily <- AMIB_daily %>%
mutate(negaffcount = round(negaff-1, 1)*10)
#describing new variable
table(AMIB_daily$negaffcount)
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## 39 45 42 49 56 56 82 69 58 57 72 51 65 47 26 50 47 40 39 52 38 38 37 32 20 22
## 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 48 49 51 54 59
## 27 22 17 14 12 10 13 10 9 10 9 5 8 6 9 6 6 2 1 2 3 3 3 2 1 2
describe(AMIB_daily$negaffcount)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1441 14.47 10.41 12 13.38 10.38 0 59 59 0.96 0.77 0.27
#histogram of the count variable
AMIB_daily %>%
ggplot(aes(x=negaffcount)) +
geom_histogram(fill="royalblue2", color="black") +
  theme_bw()
We also reverse code pss to a continuous stress variable.
#reverse coding the pss variable into a new stress variable
AMIB_daily <- AMIB_daily %>%
mutate(stress = 4 - pss)
#describing new variable
describe(AMIB_daily$stress)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1445 1.39 0.68 1.25 1.36 0.74 0 4 4 0.35 0.13 0.02
#histogram
AMIB_daily %>%
ggplot(aes(x=stress)) +
geom_histogram(fill="royalblue2", color="black") +
labs(x = "Stress (high = more stressed)") +
  theme_bw()
As usual, we split the predictor into “trait” (between-person differences) and “state” (within-person deviations) components. Specifically, the daily variable stress is split into two variables: stress_trait is the sample-mean centered between-person component, and stress_state is the person-centered within-person component.
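The trait/state split can be illustrated with a few lines of base R on toy data (hypothetical values for two people): the person mean is the trait component, and each day's deviation from that mean is the state component.

```r
# toy data: two people, three days each (hypothetical values)
d <- data.frame(id     = c(1, 1, 1, 2, 2, 2),
                stress = c(1, 2, 3, 2, 4, 6))
# person mean = between-person "trait" component
d$stress_trait <- ave(d$stress, d$id, FUN = mean)
# deviation from own mean = within-person "state" component
d$stress_state <- d$stress - d$stress_trait
d
```

Note that within each person the state component sums to zero, which is what separates within-person variation from between-person differences.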
Making trait variables
#calculating intraindividual means
AMIB_imeans <- AMIB_daily %>%
dplyr::group_by(id) %>%
dplyr::summarize(stress_trait=mean(stress, na.rm=TRUE))
describe(AMIB_imeans)
## vars n mean sd median trimmed mad min max range
## id 1 190 318.29 130.44 321.50 318.99 151.23 101.00 532.00 431.00
## stress_trait 2 190 1.40 0.48 1.41 1.39 0.51 0.19 2.56 2.38
## skew kurtosis se
## id -0.04 -1.09 9.46
## stress_trait -0.04 -0.23 0.03
#merging into person-level file
AMIB_persons <- merge(AMIB_persons, AMIB_imeans, by="id")
#make centered versions of the person-level scores
AMIB_persons <- AMIB_persons %>%
mutate(bfi_n_c = scale(bfi_n, center=TRUE, scale=FALSE),
stress_trait_c = scale(stress_trait, center=TRUE, scale=FALSE))
#describe person-level data
describe(AMIB_persons)
## vars n mean sd median trimmed mad min max
## id 1 190 318.29 130.44 321.50 318.99 151.23 101.00 532.00
## bfi_n 2 190 2.98 0.96 3.00 3.00 1.48 1.00 5.00
## stress_trait 3 190 1.40 0.48 1.41 1.39 0.51 0.19 2.56
## bfi_n_c 4 190 0.00 0.96 0.02 0.02 1.48 -1.98 2.02
## stress_trait_c 5 190 0.00 0.48 0.01 0.00 0.51 -1.21 1.17
## range skew kurtosis se
## id 431.00 -0.04 -1.09 9.46
## bfi_n 4.00 -0.09 -0.82 0.07
## stress_trait 2.38 -0.04 -0.23 0.03
## bfi_n_c 4.00 -0.09 -0.82 0.07
## stress_trait_c 2.38 -0.04 -0.23 0.03
Making state variables in long data
#merging person-level data into daily data
daily_long <- merge(AMIB_daily, AMIB_persons,by="id")
#calculating state variables
daily_long <- daily_long %>%
mutate(stress_state = stress - stress_trait)
#describing data
describe(daily_long)
## vars n mean sd median trimmed mad min max
## id 1 1458 322.53 129.08 324.00 324.20 151.23 101.00 532.00
## day 2 1458 3.48 2.30 3.00 3.47 2.97 0.00 7.00
## negaff 3 1441 2.45 1.04 2.20 2.34 1.04 1.00 6.90
## pss 4 1445 2.61 0.68 2.75 2.64 0.74 0.00 4.00
## negaffcount 5 1441 14.47 10.41 12.00 13.38 10.38 0.00 59.00
## stress 6 1445 1.39 0.68 1.25 1.36 0.74 0.00 4.00
## bfi_n 7 1458 2.97 0.96 3.00 2.98 1.48 1.00 5.00
## stress_trait 8 1458 1.39 0.47 1.41 1.39 0.51 0.19 2.56
## bfi_n_c 9 1458 -0.01 0.96 0.02 0.00 1.48 -1.98 2.02
## stress_trait_c 10 1458 -0.01 0.47 0.01 -0.01 0.51 -1.21 1.17
## stress_state 11 1445 0.00 0.49 -0.03 -0.02 0.46 -1.75 2.12
## range skew kurtosis se
## id 431.00 -0.07 -1.06 3.38
## day 7.00 0.01 -1.24 0.06
## negaff 5.90 0.96 0.77 0.03
## pss 4.00 -0.35 0.13 0.02
## negaffcount 59.00 0.96 0.77 0.27
## stress 4.00 0.35 0.13 0.02
## bfi_n 4.00 -0.08 -0.79 0.03
## stress_trait 2.38 -0.08 -0.21 0.01
## bfi_n_c 4.00 -0.08 -0.79 0.03
## stress_trait_c 2.38 -0.08 -0.21 0.01
## stress_state 3.88 0.36 0.79 0.01
Great! Now the data are all prepared as needed.
Setting up for Generalized Multilevel Model with Count Outcome
Outlining the substantive inquiry.
As we did previously, we define stress reactivity (a person-level dynamic characteristic) as the extent to which a person’s level of daily negative affect is related to daily stress level. Stress reactivity is the intraindividual covariation of daily negative affect and daily stress. This time, though, we explicitly model negative affect as a count variable.
Formally,
\[log_{e}\left( E(y_{it}) \right) = \beta_{0i} + \beta_{1i}x_{it}\]
where the expected outcome (left side of equation) has been “transformed” using the log link function, \(g(\cdot) = log_{e}(\cdot)\), and where \(\beta_{1i}\) is individual i’s level of stress reactivity. Note that, in line with the general formulation above, there is no separate residual term.
Let’s first look at a subset of persons to see what stress reactivity looks like.
Preliminary Visualization
Let’s take a look at scatterplots of the raw data for a subset of participants.
#faceted plot
panelplot <-
daily_long %>%
filter(id <= 125) %>%
ggplot(aes(x=stress_state,y=negaffcount)) +
geom_point(alpha = .8, color = "royalblue") + #
xlab("Stress") + ylab("Negative Affect (Count)") +
scale_x_continuous(limits=c(-2,2), breaks=seq(-2,2,by=1)) +
scale_y_continuous(limits=c(0,60), breaks=c(0,30,60)) +
facet_wrap( ~ id) +
theme_bw()
panelplot
We can already begin to see hints of nonlinearity in these associations (e.g., participant 106), and we can also note the between-person differences.
Running the Model
We use the glmer() function in the lme4 package.
To start, let’s fit the usual unconditional means model …
#unconditional means model
model0_fit <- glmer(formula = negaffcount ~ 1 + (1|id),
family="poisson",
data=daily_long,
na.action=na.exclude)
summary(model0_fit)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: negaffcount ~ 1 + (1 | id)
## Data: daily_long
##
## AIC BIC logLik -2*log(L) df.resid
## 12556.4 12567.0 -6276.2 12552.4 1439
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.8837 -1.4131 -0.3372 1.0990 8.9100
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 0.3014 0.549
## Number of obs: 1441, groups: id, 190
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.55757 0.04065 62.91 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The output is similar to what we are used to - although, note that there is no residual term (as per the equation above). Remember that the model describes \(log_{e}\) of the expected negaffcount.
OK - let’s add the predictors, stress_trait_c and stress_state, to examine stress reactivity.
#simple model
model1_fit <- glmer(formula = negaffcount ~ 1 + stress_trait_c +
stress_state + stress_state:stress_trait_c +
(1 + stress_state|id),
family="poisson",
data=daily_long,
na.action=na.exclude)
summary(model1_fit)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula:
## negaffcount ~ 1 + stress_trait_c + stress_state + stress_state:stress_trait_c +
## (1 + stress_state | id)
## Data: daily_long
##
## AIC BIC logLik -2*log(L) df.resid
## 10558.8 10595.7 -5272.4 10544.8 1431
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.6742 -1.0652 -0.1746 0.8470 7.3430
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## id (Intercept) 0.1591 0.3989
## stress_state 0.1500 0.3874 -0.34
## Number of obs: 1438, groups: id, 190
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.50209 0.03018 82.898 < 2e-16 ***
## stress_trait_c 0.83509 0.06396 13.057 < 2e-16 ***
## stress_state 0.55984 0.03469 16.137 < 2e-16 ***
## stress_trait_c:stress_state -0.29855 0.07564 -3.947 7.91e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) strs__ strss_
## strss_trt_c -0.032
## stress_stat -0.305 0.021
## strss_tr_:_ 0.021 -0.300 -0.139
#other useful outputs:
#confint(model1_fit) # 95% CI for the coefficients
#exp(coef(model1_fit)) # exponentiated coefficients
#exp(confint(model1_fit, method="Wald")) # 95% CI for exponentiated coefficients
#predict(model1_fit, type="response") # predicted values
#residuals(model1_fit, type="deviance") # residuals
Interpreting Results
Within this model the intercept \(\gamma_{00}\) is the expected ‘log’ of negative affect for a prototypical person on a prototypical day. We exponentiate to obtain an estimate on the original ‘count’ scale.
exp(fixef(model1_fit)[["(Intercept)"]])
## [1] 12.20798
Therefore, the prototypical person is expected to have a negative affect count of 12.21 on the prototypical (average stress) day.
Similarly, \(\gamma_{10}\) is the prototypical within-person association between daily stress and log negative affect. We exponentiate to obtain an estimate on the original ‘count’ scale.
exp(fixef(model1_fit)[["stress_state"]])
## [1] 1.750375
In a moment we will go over how to interpret this slope and obtain predicted count values at varying levels of stress_state.
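As a preview (a sketch using the fixed-effect estimates shown in the output above), we can compute expected counts at a few values of stress_state for a person at average stress_trait_c. On the count scale, each 1-unit increase in stress_state multiplies the expected count by \(e^{\gamma_{10}}\) rather than adding a constant amount.

```r
# fixed-effect estimates copied from the model output above
gamma00 <- 2.50209
gamma10 <- 0.55984
# expected counts at stress_state = -1, 0, +1 (average stress_trait_c)
exp(gamma00 + gamma10 * c(-1, 0, 1))
# the constant multiplicative factor per 1-unit increase
exp(gamma10)
```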
C. Model Visualizations
Generating Predictions
Let’s look at predicted scores. This can be done in two ways.
1. We can predict on the link scale (for us, that is the log count), or
2. We can predict on the response scale (for us, that is the count).
#obtaining predicted scores for individuals
daily_long <- daily_long %>%
mutate(pred_m1log = predict(model1_fit, type="link"),
         pred_m1original = predict(model1_fit, type="response"))
#create new data for prototypical person with average person-level predictor
avedata <- data.frame(
# fill in a sequence of values between the min and max observed in dataset
stress_state = seq(min(daily_long$stress_state, na.rm = T),
max(daily_long$stress_state, na.rm = T), by = .1),
stress_trait_c = 0)
# generate model prediction (in log units) for prototypical person using fixed effects
fixedpredictlog <- cbind(avedata, predict(model1_fit, avedata, type = "link", re.form=~0))
colnames(fixedpredictlog) <- c("stress_state", "stress_trait_c", "fit")
# generate model prediction (in original units) for prototypical person using fixed effects
fixedpredict <- cbind(avedata, predict(model1_fit, avedata, type = "response", re.form =~0))
colnames(fixedpredict) <- c("stress_state", "stress_trait_c", "fit")
Plotting Results
Plotting predicted within-person associations.
fixedplotlog <-
daily_long %>%
ggplot(aes(x = stress_state, y = pred_m1log, group = id)) +
  geom_line(color="black", alpha = .6) +
  geom_line(data = fixedpredictlog, aes(x = stress_state, y = fit, group = NULL),
            color = "red", linewidth = 2) +
  xlab("Stress State (centered)") +
  ylab("Predicted Log Negative Affect") +
  theme_bw()
fixedplotlog
fixedplot <-
daily_long %>%
ggplot(aes(x = stress_state, y = pred_m1original, group = id)) +
geom_line(alpha = .6) +
geom_line(data = fixedpredict, aes(x = stress_state, y = fit, group = NULL),
color = "red", linewidth = 2) +
xlab("Stress State (centered)") +
ylab("Predicted Negative Affect Count") +
theme_bw()
fixedplot
## Warning: Removed 16 rows containing missing values or values outside the scale range
## (`geom_line()`).
Interpolate the Data
Notice, however, that the lines don’t all look smooth. Some have a
clear break in them. This can happen when ggplot is drawing
a line between sparse data points. If the line is supposed to be
straight, the sparseness of the data won’t matter. But if it is supposed
to be curved, then having gaps in the data can generate these kinds of
discontinuities. One particularly noteworthy example is indicated with
the red circle.
daily_long %>%
ggplot(aes(x = stress_state, y = pred_m1original, group = id)) +
geom_line(alpha = .6) +
xlab("Stress State (centered)") +
ylab("Predicted Negative Affect Count") +
geom_point() +
geom_point(aes(x = .9, y =54), pch = 1, size = 9, fill = "ghostwhite", color = "red") +
  theme_bw()
This is mostly an aesthetic consideration, although it could have implications for eye-balling predicted values. To address this issue, we can interpolate data for plotting. To be clear, we are NOT adding more data to the model; the model only uses our actual observations to generate estimates. Instead, we are just filling in more data points for the purposes of plotting, and we will do this only within each person’s observed range. In other words, we will not add data that falls outside each person’s minimum or maximum observed value.
To accomplish this, we will need to create a new dataframe, called newdata2. We will begin by setting this dataframe to NULL so that it is just a placeholder waiting to “catch” the data we will populate it with.
Next, we will set up a for loop in which we ask R to take each subject’s minimum and maximum observed value and generate values within that range in small increments, in this case increments of .01 (as shown with the seq() function). This will enable us to have a high degree of granularity so that our curves will be smooth when we plot them. We will also carry along each person’s value on the other predictor (stress_trait_c) so that predictions can be generated.
newdata2 <- NULL
for (i in unique(daily_long$id)) {
cseq <- data.frame(
stress_state = seq(min(subset(daily_long, id==i)$stress_state, na.rm=T),
max(subset(daily_long, id==i)$stress_state, na.rm=T), .01),
id = i,
stress_trait_c = unique(subset(daily_long, id==i)$stress_trait_c))
newdata2 <- rbind(newdata2, cseq)
}
proto_m1_interp <- cbind(newdata2, predict(model1_fit, newdata2,
type="response", re.form=NULL))
colnames(proto_m1_interp) <- c("stress_state", "id", "stress_trait_c", "fit")
#Count scale predictions - interpolated data
fixedplot2 <-
daily_long %>%
ggplot(aes(x = stress_state, y = pred_m1original, group = id)) +
xlab("Stress State (centered)") + ylab("Predicted Negative Affect (Count)") +
geom_line(data = proto_m1_interp, aes(stress_state, fit, group = id), color = "blue", alpha =.6) +
geom_line(data = fixedpredict, aes(x = stress_state, y = fit, group = NULL),
color = "red", linewidth = 2) +
theme_bw()
fixedplot2
#Count scale predictions - interpolated vs. non interpolated data
fixedplot2both <-
daily_long %>%
ggplot(aes(x = stress_state, y = pred_m1original, group = id)) +
geom_line(color="black", alpha = .6) +
xlab("Stress State (centered)") + ylab("Predicted Negative Affect (Count)") +
geom_line(data = proto_m1_interp, aes(stress_state, fit, group = id), color = "blue", alpha =.6) +
geom_line(data = fixedpredict, aes(x = stress_state, y = fit, group = NULL),
color = "red", size = 2) +
  theme_bw()
fixedplot2both
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Removed 16 rows containing missing values or values outside the scale range
## (`geom_line()`).
D. Addressing Example Research Questions
What is the predicted Negative Affect Count for a person who is +1 unit higher on stress_state and average stress_trait_c?
# Method 1: Weights
plus1unit <- 1*lme4::fixef(model1_fit)["(Intercept)"] +
0*lme4::fixef(model1_fit)["stress_trait_c"] +
1*lme4::fixef(model1_fit)["stress_state"] +
0*lme4::fixef(model1_fit)["stress_trait_c:stress_state"]
exp(plus1unit)
## (Intercept)
## 21.36876
# Method 2: Predict
predict(model1_fit,
newdata = data.frame(
stress_trait_c = 0,
stress_state = 1),
        type="response", re.form=~0)
## 1
## 21.36876
What is the predicted difference in Negative Affect Count for a person who is +2 units higher (vs. +1 unit higher) on stress_state and average stress_trait_c?
Our knowledge of regression might make us think that we can get the answer to this question by just exponentiating the slope for stress_state. But, as you will see, it’s not so simple when working with the GMLM.
exp(lme4::fixef(model1_fit)["stress_state"])
## stress_state
## 1.750389
Instead, we can compute the predicted negative affect count for these two scenarios. We have already computed the predicted negative affect count for a person who is +1 unit higher on stress_state and average stress_trait_c in the step above.
exp(plus1unit)
## (Intercept)
## 21.36876
Here, we do the same thing, but set stress_state to 2 units above the mean.
# Method 1: Weights
plus2units <- 1*lme4::fixef(model1_fit)["(Intercept)"] +
0*lme4::fixef(model1_fit)["stress_trait_c"] +
2*lme4::fixef(model1_fit)["stress_state"] + # We will multiply by 2 now instead of 1
0*lme4::fixef(model1_fit)["stress_trait_c:stress_state"]
exp(plus2units)
## (Intercept)
## 37.40363
# Method 2: Predict
predict(model1_fit,
newdata = data.frame(
stress_trait_c = 0,
    stress_state = 2), # We will set this to 2 instead of 1
  type="response", re.form=~0)
## 1
## 37.40363
We can then take the difference. We see that for a person +2 units higher on stress_state, the predicted negative affect count is 37.4 in the original metric, whereas for a person +1 unit higher on stress_state, the predicted negative affect count is 21.37. This shows that the difference does not correspond to simply exponentiating the slope of stress_state (exp(fixef(model1_fit)["stress_state"]) = 1.75).
exp(plus2units) - exp(plus1unit)
## (Intercept)
## 16.03487
We can also visualize this difference, pinpointing the two values of our X variable (stress_state) we want to examine. Notice how the predicted difference in negative affect count between stress_state = 1 and stress_state = 2 is larger than the exponentiated slope of stress_state (1.75 negative affect counts). We can also see that it is larger than the predicted difference between other pairs of stress_state values that are 1 unit apart (e.g., stress_state = -1 vs. stress_state = 0).
fixedplot2 +
geom_segment(aes(x = 1, y = -Inf, xend = 1, yend = exp(plus1unit)), size = .5, color = "red") +
geom_segment(aes(x = -Inf, y = exp(plus1unit), xend = 1, yend = exp(plus1unit)),
size = .5, color = "red")+
geom_segment(aes(x = 2, y = -Inf, xend = 2, yend = exp(plus2units)), size = .5,
lty = "dashed", color = "red") +
geom_segment(aes(x = -Inf, y = exp(plus2units), xend = 2, yend = exp(plus2units)),
size = .5, lty = "dashed",color = "red") +
  theme_bw()
E. Adding another Person-level Predictor
As in the basic multilevel model example, we can add trait neuroticism to the model, to see if and how between-person differences in neuroticism (bfi_n_c) are related to stress reactivity.
Running the expanded model.
#simple model
model2_fit <- glmer(formula = negaffcount ~ 1 + stress_trait_c +
bfi_n_c + stress_trait_c:bfi_n_c +
stress_state + stress_state:stress_trait_c +
stress_state:bfi_n_c + stress_state:stress_trait_c:bfi_n_c +
(1 + stress_state|id),
family="poisson",
data=daily_long,
na.action=na.exclude)
summary(model2_fit)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: negaffcount ~ 1 + stress_trait_c + bfi_n_c + stress_trait_c:bfi_n_c +
## stress_state + stress_state:stress_trait_c + stress_state:bfi_n_c +
## stress_state:stress_trait_c:bfi_n_c + (1 + stress_state | id)
## Data: daily_long
##
## AIC BIC logLik -2*log(L) df.resid
## 10537.8 10595.8 -5257.9 10515.8 1427
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.6747 -1.0714 -0.1799 0.8393 7.3668
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## id (Intercept) 0.1384 0.3720
## stress_state 0.1476 0.3841 -0.36
## Number of obs: 1438, groups: id, 190
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.52086 0.02893 87.142 < 2e-16 ***
## stress_trait_c 0.78492 0.06161 12.740 < 2e-16 ***
## bfi_n_c 0.12742 0.03046 4.183 2.87e-05 ***
## stress_state 0.55101 0.03494 15.769 < 2e-16 ***
## stress_trait_c:bfi_n_c -0.19234 0.06263 -3.071 0.00213 **
## stress_trait_c:stress_state -0.31526 0.07722 -4.083 4.45e-05 ***
## bfi_n_c:stress_state 0.01424 0.03635 0.392 0.69519
## stress_trait_c:bfi_n_c:stress_state 0.10430 0.07927 1.316 0.18823
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) strs__ bf_n_c strss_ st__:__ st__:_ bf__:_
## strss_trt_c -0.023
## bfi_n_c -0.011 -0.213
## stress_stat -0.321 0.017 0.002
## strss_t_:__ -0.204 -0.045 -0.002 0.061
## strss_tr_:_ 0.017 -0.316 0.062 -0.103 0.021
## bf_n_c:str_ 0.002 0.065 -0.326 -0.014 0.011 -0.183
## strs__:__:_ 0.059 0.021 0.011 -0.170 -0.316 -0.128 -0.103
We see that, while neuroticism is related to level of negative affect count, and moderates the association between stress_trait and negative affect count, neuroticism does not moderate the within-person association between stress_state and negative affect count. That is, neuroticism is not related to stress reactivity in these data.
F. Conclusion
This tutorial illustrated application of the generalized multilevel model in cases where the outcome variable is not normally distributed. Our example focused on a count variable, but the same setup applies to other types of outcomes as well (e.g., binomial, Gamma). We encourage taking some care with the number of random effects included in these models, and with interpretation. Converting results back and forth using link functions and inverse link functions is complicated, but well worth the effort for interpretation and communication. Readers are most appreciative!
Good luck! =:-]
Citations
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67, 1–48. https://doi.org/10.18637/jss.v067.i01
Introduction to Generalized Linear Mixed Models. (n.d.). UCLA: Statistical Consulting Group. Retrieved July 7, 2025, from https://stats.oarc.ucla.edu/other/mult-pkg/introduction-to-generalized-linear-mixed-models/
R Core Team. (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. https://www.R-project.org/
Revelle, W. (2024). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University. https://CRAN.R-project.org/package=psych
Wickham, H. (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1–29. https://doi.org/10.18637/jss.v040.i01
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag.
Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). dplyr: A Grammar of Data Manipulation (Version 1.1.4). https://CRAN.R-project.org/package=dplyr