ESM Data Mining using GLMER Trees
APA-ATI Intensive Longitudinal Data
Overview
Repeated measures data, often obtained from experience sampling or daily diary studies, require analytic methods that accommodate the inherent nesting in the data. Multilevel and structural equation modeling approaches are typically used to analyze repeated measures data. However, data mining and machine learning methods, particularly decision trees, are now being modified to accommodate repeated measures data (e.g., SEM trees, Brandmaier et al., 2013; mixed-effects regression trees, Hajjem et al., 2011, 2014; nonlinear longitudinal recursive partitioning, Stegmann et al., 2018).
In this tutorial, we illustrate how generalized linear mixed effects regression trees (glmer trees; Fokkema, Smits, Zeileis, Hothorn & Kelderman, 2018) can be applied to EMA-type data for knowledge discovery.
Generalized linear mixed effects regression trees have a continuous outcome variable (e.g., affect measured on a Likert or slider scale) and can accommodate a sizable number of predictor variables; these predictors can be categorical, interval, or continuous.
Ultimately, the model aims to sort each instance (or data point) into a node, with the instances within a node being as similar as possible. Note that nodes are composed of observations, not individuals. To determine the composition of a node, generalized linear mixed effects regression trees use a recursive partitioning algorithm that chooses splits to minimize the within-node sum of squares.
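The split search can be sketched in R as a toy illustration (hypothetical data; the actual glmertree algorithm additionally uses parameter-stability tests and estimates random effects on top of this basic idea): for each candidate cut point on a predictor, compute the summed within-node sum of squares and keep the cut that minimizes it.

```r
# Hypothetical predictor and outcome with two clear clusters
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 1.9, 2.2, 2.0, 5.1, 4.8, 5.2, 4.9)

# Within-node sum of squares around the node mean
node_ss <- function(v) sum((v - mean(v))^2)

# Scan all candidate cut points and total the two resulting nodes' SS
cuts <- head(sort(unique(x)), -1)
sse  <- sapply(cuts, function(cut) {
  node_ss(y[x <= cut]) + node_ss(y[x > cut])
})
best <- cuts[which.min(sse)]
best  # the split x <= 4 separates the low- and high-affect clusters
```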
In our example, we will use data from an experience sampling study that examines daily interactions, emotions, and behaviors. Specifically, we will use the AMIB data set that followed 190 adults for eight days - thus repeated measures across days are nested within individuals.
Outline
In this tutorial, we will cover…
The Research Questions
The Data
Generalized Linear Mixed Effects Regression Tree Model
Affect Reactivity Model
Conclusion
1. The Research Questions
We are going to address:
- Are people’s personality and patterns of daily behaviors (e.g., sleep hours and quality, perceived stress, physical activity) predictive of their daily positive affect?
- What variables are best at distinguishing differences in daily positive affect?
- Furthermore, is it possible to specify a random slope multilevel model using generalized linear mixed effects regression trees?
Notice that no pattern of association is hypothesized between the predictors and the outcome; this is one advantage of regression trees - the different splits allow for higher-order interactions and thus non-linear associations between the predictors and outcome variables.
2. The Data
The data are organized as follows:
- There are N (number of individuals) = 190 individuals, a majority of whom completed all eight days of the daily diary study. In total, we have 1,418 rows of data - an average of 7.46 reports per individual.
The data set has additional columns we will not use, so we will subset to only the variables we need: an ID variable, a time variable, predictor variables, and the outcome variable. Specifically…
- Columns:
  - Participant ID (id)
  - Time (e.g., day within the daily diary study; day)
  - Big Five Inventory: Openness (bfi_o), Conscientiousness (bfi_c), Extraversion (bfi_e), Agreeableness (bfi_a), Neuroticism (bfi_n)
  - Sleep hours (slphrs)
  - Sleep quality (slpwell)
  - Physical activity (lteq)
  - Perceived stress (pss)
  - Positive affect (posaff)

for a total of 12 columns in this reduced data set.
Loading Libraries
Loading libraries used in this script.
library(psych) #describing the data
library(ggplot2) #data visualization
library(party) #tree models; alternative decision tree algorithm
library(partykit) #tree models; updated party functions
library(rattle) #tree models; fancy tree plot
library(glmertree) #generalized linear mixed effects regression tree models
library(dplyr) #data manipulation
Loading Data
Load and merge person-level and day-level data.
# Set filepath for person-level data file
filepath <- "https://raw.githubusercontent.com/The-Change-Lab/collaborations/main/AMIB/AMIB_persons.csv"
# Read in the .csv file using the url() function
AMIB_persons <- read.csv(file = url(filepath), header = TRUE)
# Set filepath for daily data file
filepath1 <- "https://raw.githubusercontent.com/The-Change-Lab/collaborations/refs/heads/main/AMIB/AMIB_daily.csv"
# Read in the .csv file using the url() function
AMIB_daily <- read.csv(file = url(filepath1), header = TRUE)
# Merge daily and person-level data
amib <- merge(AMIB_persons, AMIB_daily, by = "id")
Data Preparation
Reduce to the variables of interest.
amib <- amib %>%
select(id, day, bfi_e, bfi_a, bfi_c, bfi_n, bfi_o, slphrs, slpwell, pss, lteq, posaff)
head(amib, 10)
## id day bfi_e bfi_a bfi_c bfi_n bfi_o slphrs slpwell pss lteq posaff
## 1 101 0 3.5 1.5 4.0 2 4.0 6.0 1 2.50 10 3.9
## 2 101 1 3.5 1.5 4.0 2 4.0 2.0 1 2.75 10 3.8
## 3 101 2 3.5 1.5 4.0 2 4.0 9.0 3 3.50 10 5.1
## 4 101 3 3.5 1.5 4.0 2 4.0 7.5 3 3.00 9 5.6
## 5 101 4 3.5 1.5 4.0 2 4.0 8.0 4 2.75 18 4.3
## 6 101 5 3.5 1.5 4.0 2 4.0 8.0 3 2.75 19 3.9
## 7 101 6 3.5 1.5 4.0 2 4.0 8.0 4 3.50 21 5.1
## 8 101 7 3.5 1.5 4.0 2 4.0 7.0 3 2.75 14 4.8
## 9 102 0 5.0 4.5 3.5 2 2.5 7.0 3 3.50 12 6.3
## 10 102 1 5.0 4.5 3.5 2 2.5 6.0 3 4.00 20 7.0
Reverse code pss so higher values indicate higher perceived stress.
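The reverse-coding code itself is not shown; a minimal sketch, assuming pss uses a 0-4 response scale (consistent with the descriptives below, where the reversed variable appears as stress):

```r
# Reverse code perceived stress (0-4 scale assumed) so that higher
# values indicate higher perceived stress
reverse_pss <- function(pss) 4 - pss

# Applied to the tutorial's data frame this would be:
# amib$stress <- reverse_pss(amib$pss)
reverse_pss(c(0, 2.5, 4))  # 4.0 1.5 0.0
```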
Split the variables into “trait” (between-person differences) and “state” (within-person deviations) components.
Specifically, the time-varying predictors, sleep hours, sleep quality, perceived stress, physical activity, and positive affect are split into two variables:
VARIABLE_trait is the sample-mean centered between-person component,
VARIABLE_state is the person-centered within-person component.
For convenience of plotting, we also split the outcome variable, posaff.
Calculate the Trait Variables
# Calculate intraindividual means
amib_imeans <- amib %>%
group_by(id) %>%
summarize(posaff_trait = mean(posaff, na.rm=TRUE),
slphrs_trait = mean(slphrs, na.rm=TRUE),
slpwell_trait = mean(slpwell, na.rm=TRUE),
stress_trait = mean(stress, na.rm=TRUE),
lteq_trait = mean(lteq, na.rm=TRUE))
# Merge into long data set
amib <- merge(amib, amib_imeans, by="id")
Calculate the State Variables
#calculate state variables
amib <- amib %>%
mutate(posaff_state = posaff - posaff_trait,
slphrs_state = slphrs - slphrs_trait,
slpwell_state = slpwell - slpwell_trait,
stress_state = stress - stress_trait,
lteq_state = lteq - lteq_trait)
# Describe data
describe(amib)
## vars n mean sd median trimmed mad min max
## id 1 1458 322.53 129.08 324.00 324.20 151.23 101.00 532.00
## day 2 1458 3.48 2.30 3.00 3.47 2.97 0.00 7.00
## bfi_e 3 1458 3.39 1.00 3.50 3.41 0.74 1.00 5.00
## bfi_a 4 1458 3.61 0.88 3.50 3.69 0.74 1.00 5.00
## bfi_c 5 1458 3.76 0.85 4.00 3.78 0.74 1.50 5.00
## bfi_n 6 1458 2.97 0.96 3.00 2.98 1.48 1.00 5.00
## bfi_o 7 1458 3.60 0.96 3.50 3.65 0.74 1.00 5.00
## slphrs 8 1428 7.17 1.81 7.00 7.18 1.48 0.00 18.00
## slpwell 9 1439 2.49 1.09 3.00 2.56 1.48 0.00 4.00
## pss 10 1445 2.61 0.68 2.75 2.64 0.74 0.00 4.00
## lteq 11 1433 12.44 10.37 9.00 11.18 8.90 0.00 58.00
## posaff 12 1441 4.12 1.10 4.20 4.15 1.19 1.00 7.00
## stress 13 1445 1.39 0.68 1.25 1.36 0.74 0.00 4.00
## posaff_trait 14 1458 4.11 0.73 4.10 4.11 0.63 2.22 6.04
## slphrs_trait 15 1458 7.16 0.94 7.20 7.20 0.91 4.12 9.31
## slpwell_trait 16 1458 2.49 0.65 2.50 2.50 0.56 0.38 4.00
## stress_trait 17 1458 1.39 0.47 1.41 1.39 0.51 0.19 2.56
## lteq_trait 18 1458 12.60 8.35 10.75 11.88 8.52 0.00 45.33
## posaff_state 19 1441 0.00 0.82 0.05 0.01 0.70 -3.17 3.29
## slphrs_state 20 1428 0.00 1.55 0.00 -0.01 1.30 -8.75 9.25
## slpwell_state 21 1439 0.00 0.88 0.12 0.03 0.74 -3.29 2.57
## stress_state 22 1445 0.00 0.49 -0.03 -0.02 0.46 -1.75 2.12
## lteq_state 23 1433 0.00 6.44 -0.25 -0.11 5.37 -23.75 28.88
## range skew kurtosis se
## id 431.00 -0.07 -1.06 3.38
## day 7.00 0.01 -1.24 0.06
## bfi_e 4.00 -0.20 -0.58 0.03
## bfi_a 4.00 -0.69 0.10 0.02
## bfi_c 3.50 -0.11 -0.90 0.02
## bfi_n 4.00 -0.08 -0.79 0.03
## bfi_o 4.00 -0.36 -0.43 0.03
## slphrs 18.00 0.10 1.83 0.05
## slpwell 4.00 -0.52 -0.36 0.03
## pss 4.00 -0.35 0.13 0.02
## lteq 58.00 1.07 0.94 0.27
## posaff 6.00 -0.25 -0.33 0.03
## stress 4.00 0.35 0.13 0.02
## posaff_trait 3.81 0.05 0.09 0.02
## slphrs_trait 5.19 -0.40 0.28 0.02
## slpwell_trait 3.62 -0.31 0.55 0.02
## stress_trait 2.38 -0.08 -0.21 0.01
## lteq_trait 45.33 0.85 0.60 0.22
## posaff_state 6.46 -0.14 0.82 0.02
## slphrs_state 18.00 0.19 3.31 0.04
## slpwell_state 5.86 -0.31 0.34 0.02
## stress_state 3.88 0.36 0.79 0.01
## lteq_state 52.62 0.18 1.52 0.17
Note that we did not sample-center the person-level variables. Tree models generally work with the raw data. The state variables are by definition person-centered.
However, the tree algorithm cannot handle missing data. Thus, we reduce the data set to the complete cases.
Reduce to Complete Cases
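The subsetting code for this step is not shown; a minimal sketch using complete.cases() (na.omit() would achieve the same). The toy data frame here is purely illustrative - in the tutorial, the same call would be applied to amib:

```r
# Keep only rows with no missing values on any variable
toy <- data.frame(id = c(101, 101, 102), slphrs = c(6, NA, 7))
toy_complete <- toy[complete.cases(toy), ]
nrow(toy_complete)  # 2 rows remain

# In the tutorial: amib <- amib[complete.cases(amib), ]
```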
## id day bfi_e bfi_a bfi_c bfi_n bfi_o slphrs slpwell pss lteq posaff stress
## 1 101 0 3.5 1.5 4 2 4 6.0 1 2.50 10 3.9 1.50
## 2 101 1 3.5 1.5 4 2 4 2.0 1 2.75 10 3.8 1.25
## 3 101 2 3.5 1.5 4 2 4 9.0 3 3.50 10 5.1 0.50
## 4 101 3 3.5 1.5 4 2 4 7.5 3 3.00 9 5.6 1.00
## 5 101 4 3.5 1.5 4 2 4 8.0 4 2.75 18 4.3 1.25
## 6 101 5 3.5 1.5 4 2 4 8.0 3 2.75 19 3.9 1.25
## posaff_trait slphrs_trait slpwell_trait stress_trait lteq_trait posaff_state
## 1 4.5625 6.9375 2.75 1.0625 13.875 -0.6625
## 2 4.5625 6.9375 2.75 1.0625 13.875 -0.7625
## 3 4.5625 6.9375 2.75 1.0625 13.875 0.5375
## 4 4.5625 6.9375 2.75 1.0625 13.875 1.0375
## 5 4.5625 6.9375 2.75 1.0625 13.875 -0.2625
## 6 4.5625 6.9375 2.75 1.0625 13.875 -0.6625
## slphrs_state slpwell_state stress_state lteq_state
## 1 -0.9375 -1.75 0.4375 -3.875
## 2 -4.9375 -1.75 0.1875 -3.875
## 3 2.0625 0.25 -0.5625 -3.875
## 4 0.5625 0.25 -0.0625 -4.875
## 5 1.0625 1.25 0.1875 4.125
## 6 1.0625 0.25 0.1875 5.125
Plotting the Data
Before we begin running our models, it is always a good idea to look at our data.
Our generalized linear mixed effects regression tree has a number of predictors, but as an example let’s examine the association between positive affect and perceived stress for a subset of participants. We begin by plotting the outcome variable, posaff.
#Intraindividual change plot
amib %>%
ggplot(aes(x = day, group = id, color = factor(id),
legend = FALSE)) +
geom_line(aes(x = day, y = posaff), lty = 1,
linewidth = 0.5, alpha = .5) +
xlab("Day") +
ylab("Positive Affect (outcome)") +
scale_x_continuous(breaks=seq(0,7,by=1)) +
theme_classic() +
theme(legend.position = "none")
Plotting select predictor-outcome relations
#time-invariant predictor
amib %>%
ggplot(aes(x = bfi_n, y = posaff_trait,
group = id, color = factor(id),
legend = FALSE)) +
geom_point(alpha = .5) +
geom_smooth(aes(group=1), method=lm, se=FALSE, fullrange=TRUE,
lty=1, linewidth=1, color="black") +
xlab("Neuroticism") +
ylab("Positive Affect (outcome)") +
theme_classic() +
theme(legend.position = "none")
## `geom_smooth()` using formula = 'y ~ x'
#time-invariant predictor
amib %>%
ggplot(aes(x = slphrs_trait, y = posaff_trait,
group = id, color = factor(id),
legend = FALSE)) +
geom_point(alpha = .5) +
geom_smooth(aes(group=1), method=lm, se=FALSE, fullrange=TRUE, lty=1,
linewidth=1, color="black") +
xlab("Sleep Hours (Trait)") +
ylab("Positive Affect (outcome)") +
theme_classic() +
theme(legend.position = "none")
## `geom_smooth()` using formula = 'y ~ x'
#time-varying predictor
amib %>%
ggplot(aes(x = stress_state, y = posaff,
group = id, color = factor(id),
legend = FALSE)) +
geom_point(color="gray40",alpha = .2) +
geom_smooth(method=lm, se=FALSE, fullrange=FALSE, lty=1,
linewidth=.5, color="gray40") +
xlab("Stress (State)") +
ylab("Positive Affect (outcome)") +
theme_classic() +
theme(legend.position = "none")
## `geom_smooth()` using formula = 'y ~ x'
#time-varying predictor
amib %>%
ggplot(aes(x = lteq_state, y = posaff,
group = id, color = factor(id),
legend = FALSE)) +
geom_point(color="gray40",alpha = .2) +
geom_smooth(method=lm, se=FALSE, fullrange=FALSE, lty=1,
linewidth=.5, color="gray40") +
xlab("Physical Activity (State)") +
ylab("Positive Affect (outcome)") +
theme_classic() +
theme(legend.position = "none")
## `geom_smooth()` using formula = 'y ~ x'
There are many more predictors, both time-invariant and time-varying, that could be plotted. All will be used as predictors in the tree model.
3. Generalized Linear Mixed Effects Regression Tree Model
We’ll construct a generalized linear mixed effects regression tree examining how people’s personality (as measured by the Big Five Inventory) and daily behaviors (sleep hours and quality, perceived stress, physical activity) predict their daily positive affect.
The general form used to specify the model is:
model_object <- lmertree(response ~ node-specific predictors | random effects | partitioning/splitting variables, data = your_data_frame)
Other things one can specify and should consider include:
- Parameters that inhibit tree growth (e.g., the minimum number of observations a node must contain in order to be split, set via minsplit, and the significance level for the splitting tests, set via alpha)
Now, let’s run lmertree() on the AMIB data.
#setting seed for replication
set.seed(1234)
# Compared to the code in the lecture, we removed day as a level in the MLM and instead included day as a potential splitting variable
model <- lmertree(posaff ~ 1 | (1 | id) |
day +
slphrs_trait + stress_trait +
slphrs_state + stress_state +
bfi_e + bfi_a + bfi_c +
bfi_n + bfi_o,
data = amib,
REML = TRUE,
alpha = 1 - .999999, # i.e., alpha = 1e-6, a very conservative splitting criterion
minsplit = 150)
# Print statistical output
print(model)
## Linear mixed model tree
##
## Model formula:
## posaff ~ 1 | day + slphrs_trait + stress_trait + slphrs_state +
## stress_state + bfi_e + bfi_a + bfi_c + bfi_n + bfi_o
##
## Fitted party:
## [1] root
## | [2] stress_state <= 0.27083
## | | [3] stress_state <= -0.46429: n = 223
## | | (Intercept)
## | | 4.811067
## | | [4] stress_state > -0.46429
## | | | [5] day <= 2: n = 292
## | | | (Intercept)
## | | | 4.444144
## | | | [6] day > 2: n = 522
## | | | (Intercept)
## | | | 4.060112
## | [7] stress_state > 0.27083
## | | [8] stress_state <= 0.60714: n = 208
## | | (Intercept)
## | | 3.665668
## | | [9] stress_state > 0.60714: n = 161
## | | (Intercept)
## | | 3.161255
##
## Number of inner nodes: 4
## Number of terminal nodes: 5
## Number of parameters per node: 1
## Objective function (residual sum of squares): 634.208
##
## Random effects:
## $id
## (Intercept)
## 101 0.279080373
## 102 0.573327740
## 103 -0.509685380
## 104 -0.444145669
## 105 -0.038992879
## 106 0.611704117
## 107 0.206803902
## 108 -1.603481556
## 109 -0.072286400
## 110 0.864504241
## 111 -0.211285401
## 112 0.711664941
## 113 1.550677435
## 114 0.152220818
## 115 -0.361778009
## 116 0.642562232
## 117 0.348451842
## 118 -1.035254113
## 119 0.776023453
## 120 0.224981359
## 121 0.002676463
## 122 -0.278687528
## 123 1.215575273
## 124 0.527537378
## 125 -0.856325414
## 126 -0.132107725
## 127 0.140745566
## 128 -1.104624914
## 129 0.534245596
## 130 0.774495025
## 201 0.396882559
## 202 -0.977779686
## 203 0.060392288
## 204 -0.793537887
## 205 0.140556117
## 206 -0.581779615
## 207 0.085905291
## 208 0.168138651
## 209 -0.100589268
## 210 -0.560559644
## 211 -0.577790263
## 212 0.454111617
## 213 0.457378179
## 214 0.058489525
## 215 1.161071001
## 218 1.429349276
## 219 -0.786499632
## 220 0.102659734
## 221 -0.096449854
## 222 1.084362268
## 223 -0.110403683
## 224 -0.246015408
## 225 -0.149682376
## 226 -0.245275853
## 227 -0.089237829
## 228 -1.638429006
## 229 0.388753794
## 230 0.535004181
## 231 0.223747773
## 233 -0.370967299
## 234 0.877158289
## 237 -1.185061624
## 238 -0.056424078
## 239 -0.074493372
## 240 -1.192727212
## 241 -0.646751568
## 242 -0.107743378
## 243 -0.534470031
## 244 0.584945831
## 245 1.396550509
## 246 -0.480548822
## 247 -0.075468028
## 248 -1.542859823
## 249 0.982558653
## 301 -0.051411467
## 302 -0.085408557
## 303 1.834887241
## 304 -0.595071054
## 305 -0.165561567
## 306 -0.648027400
## 307 -0.879989234
## 308 -0.007009034
## 309 -0.128470832
## 310 0.520972842
## 311 -0.672311532
## 312 -0.475982881
## 313 -0.886008026
## 314 -0.969813123
## 315 0.025491746
## 316 0.161850554
## 317 0.263593075
## 318 1.188587615
## 319 -0.118694536
## 320 -0.611822128
## 321 -1.098620517
## 322 -0.495226179
## 323 -1.116388635
## 324 0.049457850
## 325 0.109682712
## 326 0.103091085
## 327 0.002795710
## 328 -0.239499393
## 329 0.070445440
## 330 -0.832353546
## 331 -0.946142432
## 332 -0.975890413
## 333 0.057514870
## 334 0.613929964
## 335 -1.176381488
## 336 0.034540513
## 337 -0.493623307
## 338 0.131151866
## 339 -0.260582678
## 340 0.952960734
## 341 0.531502157
## 342 -0.280288673
## 343 -0.363729665
## 344 -0.307335275
## 345 0.035769194
## 346 -0.118694536
## 401 -0.095787463
## 402 -0.869340677
## 403 0.067280681
## 404 0.347302124
## 405 1.021182484
## 406 0.104543988
## 407 -0.365717374
## 408 0.132252635
## 409 -0.365093079
## 410 -0.504911677
## 411 -0.037380128
## 412 -0.196481777
## 413 -0.511542479
## 414 -0.339454055
## 415 0.242791722
## 417 -0.241213603
## 418 -1.196102922
## 419 1.026110402
## 420 0.802371921
## 421 0.037808875
## 422 0.056428682
## 423 0.224897491
## 424 0.405547256
## 425 0.696760751
## 426 1.418844256
## 427 0.010485752
## 428 1.793585980
## 429 -0.055936750
## 430 0.528151247
## 431 1.067688185
## 432 0.547808638
## 433 0.230050876
## 434 -0.544365780
## 435 0.332547996
## 436 0.008210956
## 437 -0.359677834
## 438 0.253345692
## 439 0.486975753
## 440 0.376461602
## 441 0.637655356
## 442 -0.058975395
## 501 -0.339081877
## 502 0.184595194
## 503 0.258634825
## 504 0.988383717
## 505 0.325219529
## 506 0.211536393
## 507 0.099254855
## 508 0.123636541
## 509 0.183882552
## 510 -0.348646376
## 511 0.405547256
## 512 0.115308764
## 513 -0.587531780
## 514 -0.700195242
## 515 0.614056423
## 516 0.640278431
## 517 0.989947864
## 518 0.669750238
## 519 -0.072393293
## 520 0.656734974
## 521 -1.340866671
## 524 -0.188531520
## 525 -0.215804047
## 526 -0.491559963
## 527 -0.536615115
## 528 0.380283443
## 530 -0.708444243
## 531 -0.819549435
## 532 -0.155921868
##
## with conditional variances for "id"
## [1] 3477.964
## [1] 3535.697
Based upon the output above (the fitted tree can also be visualized with plot(model, which = "tree")), it appears that daily perceived stress (stress_state) and day (which is a proxy for day of week) are predictive of daily positive affect. The values in the terminal nodes represent the estimated average positive affect for the observations in that node.
Also, note this tree has a depth of 3 (the root node is counted as zero) and 5 terminal nodes.
4. Affect Reactivity Model
Now let’s try a more complicated model - one with a within-person predictor included in each node. We use a model of “affect reactivity”, in which daily positive affect in each node is regressed on daily stress (stress_state).
model2 <- lmertree(posaff ~ 1 + stress_state | ((1 + stress_state) | id) |
day +
slphrs_trait + stress_trait +
slphrs_state +
bfi_e + bfi_a + bfi_c +
bfi_n + bfi_o,
data = amib,
REML = TRUE,
alpha = 1 - .999999,
minsplit = 150)
# Print statistical output and visualize the tree results
plot(model2, which = "tree")
print(model2)
## Linear mixed model tree
##
## Model formula:
## posaff ~ 1 + stress_state | day + slphrs_trait + stress_trait +
## slphrs_state + bfi_e + bfi_a + bfi_c + bfi_n + bfi_o
##
## Fitted party:
## [1] root
## | [2] stress_trait <= 1.4375
## | | [3] bfi_n <= 3.5
## | | | [4] stress_trait <= 1.125
## | | | | [5] day <= 3: n = 178
## | | | | (Intercept) stress_state
## | | | | 4.9786503 -0.8702966
## | | | | [6] day > 3: n = 177
## | | | | (Intercept) stress_state
## | | | | 4.464807 -1.007018
## | | | [7] stress_trait > 1.125: n = 253
## | | | (Intercept) stress_state
## | | | 4.2218336 -0.8098648
## | | [8] bfi_n > 3.5: n = 154
## | | (Intercept) stress_state
## | | 3.8938327 -0.6968044
## | [9] stress_trait > 1.4375
## | | [10] stress_trait <= 1.84375: n = 450
## | | (Intercept) stress_state
## | | 3.947909 -1.125616
## | | [11] stress_trait > 1.84375: n = 194
## | | (Intercept) stress_state
## | | 3.3738649 -0.9798695
##
## Number of inner nodes: 5
## Number of terminal nodes: 6
## Number of parameters per node: 2
## Objective function (residual sum of squares): 549.9184
##
## Random effects:
## $id
## (Intercept) stress_state
## 101 -0.135337162 -0.063284084
## 102 -0.065358275 -0.361673520
## 103 -0.581648716 -0.221017144
## 104 -0.349063359 -0.203780992
## 105 0.104286959 -0.042316654
## 106 -0.083880451 -0.243778920
## 107 0.095725501 -0.018292324
## 108 -0.950842209 0.180663632
## 109 0.013424980 -0.117809780
## 110 0.169649634 0.009780413
## 111 0.324701736 0.097359385
## 112 0.154058888 -0.080920540
## 113 0.866331242 0.129909810
## 114 0.091574861 0.068154772
## 115 -0.075615143 -0.005746730
## 116 0.381832915 0.123914604
## 117 0.463474547 0.075810678
## 118 -0.810695403 -0.079638599
## 119 0.083395381 0.197833516
## 120 0.355949841 0.095034776
## 121 -0.037679717 0.075297310
## 122 0.427672428 -0.020895591
## 123 1.219351167 0.123285674
## 124 0.523080661 -0.030882025
## 125 -0.282009196 0.221972655
## 126 0.127836893 0.087645834
## 127 0.271585980 0.128333621
## 128 -0.443825003 0.137778365
## 129 -0.080874133 0.064006949
## 130 0.261786502 0.027983859
## 201 0.586419495 0.022387156
## 202 -0.840830278 -0.039403985
## 203 0.087135210 -0.190042806
## 204 -0.387336158 0.175271247
## 205 0.269230629 0.170809188
## 206 -0.671049775 0.097051000
## 207 0.281438393 -0.119984721
## 208 0.171669229 0.252448427
## 209 -0.148029328 0.141617293
## 210 -1.044302640 0.024276688
## 211 -0.978363060 -0.073705797
## 212 -0.250777940 0.183222857
## 213 0.402737141 0.098512232
## 214 0.064374357 0.002060094
## 215 0.729158294 0.222812190
## 218 0.841167437 0.066953885
## 219 -0.182968935 -0.024951552
## 220 0.386770268 0.035346352
## 221 -0.161720924 0.060421646
## 222 0.706648648 0.146647053
## 223 0.632662119 0.155152387
## 224 -0.213710379 -0.432147485
## 225 -0.103923581 -0.035348973
## 226 -0.063243221 -0.060644720
## 227 0.502646935 0.093713165
## 228 -0.858251422 0.150091666
## 229 0.500102959 0.467972344
## 230 -0.017462687 -0.065614703
## 231 -0.278870942 0.129439178
## 233 -0.387979522 0.049293773
## 234 0.916210236 0.102679396
## 237 -0.426229812 0.005714452
## 238 0.122042489 0.139879483
## 239 -0.001730961 0.172712344
## 240 -0.643308547 -0.366049110
## 241 0.059757077 -0.092869946
## 242 0.329455496 -0.385246545
## 243 -0.584092027 -0.203058232
## 244 -0.024167681 -0.024033469
## 245 0.777089907 0.035505631
## 246 -0.565106119 -0.077432087
## 247 0.095090895 -0.011785347
## 248 -1.255456455 0.019139898
## 249 0.861671873 0.104843711
## 301 -0.232722572 -0.149694057
## 302 0.068955894 -0.143155507
## 303 1.096543966 0.158182976
## 304 -0.969974000 -0.117710322
## 305 -0.023132434 0.158984028
## 306 -0.467475884 -0.087350695
## 307 -0.753695419 -0.172181017
## 308 0.181780134 0.043281894
## 309 0.088196027 -0.202851362
## 310 -0.150159023 0.093093076
## 311 -0.517200504 -0.202352241
## 312 0.090943546 -0.003862108
## 313 -0.596844371 0.149988026
## 314 -0.271365087 -0.031650553
## 315 0.605407109 -0.004053210
## 316 0.041565230 -0.068931222
## 317 -0.285139363 0.070894421
## 318 1.264724588 0.149856026
## 319 0.322279658 0.024493163
## 320 -0.579928255 -0.144296937
## 321 -0.955297624 0.050111035
## 322 -0.385797279 0.052598950
## 323 -0.408649995 -0.075536386
## 324 0.287013775 0.033319670
## 325 -0.415619634 0.048019523
## 326 0.016820330 0.187491934
## 327 -0.169061941 -0.122023802
## 328 -0.083104825 -0.120581502
## 329 0.316222158 0.043284179
## 330 -0.571122327 -0.072463675
## 331 -0.309443202 0.058318491
## 332 -0.749251146 -0.010639817
## 333 -0.051909081 0.198762599
## 334 0.570334254 0.056303889
## 335 -0.496862922 0.077591603
## 336 0.108023125 -0.014232135
## 337 -0.352290537 -0.398257461
## 338 -0.025626005 0.074732837
## 339 -0.386142527 -0.076422341
## 340 0.359740380 0.064060448
## 341 0.242267774 -0.003414273
## 342 -0.324582329 -0.100953906
## 343 -0.280505512 0.120514264
## 344 -0.125203007 -0.046044069
## 345 0.133820797 0.089045748
## 346 -0.303755588 -0.023085339
## 401 0.116503354 0.002216007
## 402 -0.660775192 0.115602774
## 403 0.223778004 0.077500392
## 404 -0.191842142 -0.083641389
## 405 0.293451740 0.066214116
## 406 0.240924146 -0.014528548
## 407 -0.212329752 0.038713683
## 408 -0.445678831 0.255310547
## 409 0.247328420 -0.094621211
## 410 -0.334201927 -0.030384334
## 411 -0.505051749 -0.057709234
## 412 0.428571970 0.070213665
## 413 -0.271560054 -0.056318707
## 414 -0.253195685 0.067037190
## 415 0.386522366 -0.023112937
## 417 0.319122703 -0.123886252
## 418 -0.876098230 -0.138674309
## 419 0.457333304 0.195500097
## 420 0.333289864 0.095374473
## 421 0.239118162 0.199667557
## 422 0.013053111 0.023453393
## 423 0.361750208 -0.120612496
## 424 -0.127075373 0.009213648
## 425 0.683661949 0.085143797
## 426 1.206978842 0.137077634
## 427 0.089321972 -0.155159719
## 428 1.089631008 0.112624473
## 429 0.113459717 0.101174198
## 430 0.403463123 0.116554981
## 431 0.562925249 -0.030115585
## 432 0.042755560 -0.041526393
## 433 -0.252503466 -0.184578932
## 434 -0.298666748 -0.370532622
## 435 0.204460107 -0.165803813
## 436 0.635447695 0.224381980
## 437 -0.373390480 -0.018577185
## 438 0.453186349 0.079199300
## 439 -0.068562908 0.218877452
## 440 0.487074830 0.057021708
## 441 0.457945558 0.040789973
## 442 -0.163927317 0.005586409
## 501 -0.055579299 0.129825861
## 502 0.202813133 -0.184457117
## 503 0.432106947 -0.185604647
## 504 0.899841447 0.039426446
## 505 -0.202929471 0.070412139
## 506 -0.328648124 -0.207470488
## 507 -0.378908304 0.050330353
## 508 0.137369623 0.002642945
## 509 0.229590854 -0.179604432
## 510 -0.745346029 -0.142075482
## 511 0.289351770 0.130438816
## 512 -0.513294232 -0.300252766
## 513 -0.464747433 0.064937842
## 514 -0.572465700 -0.280451713
## 515 0.751026310 -0.031938839
## 516 0.773099401 0.257561136
## 517 0.421350397 0.034183894
## 518 0.100934815 -0.282915717
## 519 -0.692978505 -0.258251651
## 520 0.189619498 -0.138319333
## 521 -0.647059850 0.217211224
## 524 0.304105220 -0.149869587
## 525 -0.304460879 -0.119037858
## 526 -0.230788080 -0.079337580
## 527 -0.410864353 0.108687770
## 528 0.912344453 -0.250163950
## 530 -0.445171037 0.285295402
## 531 -0.221630386 -0.090999830
## 532 -0.311116035 -0.284437663
##
## with conditional variances for "id"
## Linear mixed model fit by REML ['lmerMod']
## Formula: posaff ~ .tree + .tree:stress_state + ((1 + stress_state) | id)
## Data: data
## Weights: .weights
## REML criterion at convergence: 3293.491
## Random effects:
## Groups Name Std.Dev. Corr
## id (Intercept) 0.5412
## stress_state 0.3031 0.14
## Residual 0.6767
## Number of obs: 1406, groups: id, 190
## Fixed Effects:
## (Intercept) .tree6 .tree7
## 4.9787 -0.5138 -0.7568
## .tree8 .tree10 .tree11
## -1.0848 -1.0307 -1.6048
## .tree5:stress_state .tree6:stress_state .tree7:stress_state
## -0.8703 -1.0070 -0.8099
## .tree8:stress_state .tree10:stress_state .tree11:stress_state
## -0.6968 -1.1256 -0.9799
## [1] 3335.491
## [1] 3445.709
5. Conclusion
In sum, this tutorial outlines how to run a generalized linear mixed effects regression tree using the glmertree package. We provide a brief explanation of the underlying model, the code to run this model in R, and the interpretation of the results.
More detailed information about these (and related analyses) can be found in Fokkema, Smits, Zeileis, Hothorn & Kelderman (2018).
Citations
Brandmaier, A. M., von Oertzen, T., McArdle, J. J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71–86. https://doi.org/10.1037/a0030001
Fokkema, M., Smits, N., Zeileis, A., Hothorn, T., & Kelderman, H. (2018). Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees. Behavior Research Methods, 50(5), 2016–2034. https://doi.org/10.3758/s13428-017-0971-x
Hajjem, A., Bellavance, F., & Larocque, D. (2011). Mixed effects regression trees for clustered data. Statistics & Probability Letters, 81(4), 451–459. https://doi.org/10.1016/j.spl.2010.12.003
Hajjem, A., Bellavance, F., & Larocque, D. (2014). Mixed-effects random forest for clustered data. Journal of Statistical Computation and Simulation, 84(6), 1313–1328. https://doi.org/10.1080/00949655.2012.741599
Hothorn, T., Bühlmann, P., Dudoit, S., Molinaro, A., & Van Der Laan, M. J. (2006). Survival ensembles. Biostatistics, 7(3), 355–373. https://doi.org/10.1093/biostatistics/kxj011
Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15(3), 651–674. https://doi.org/10.1198/106186006X133933
Hothorn, T., & Zeileis, A. (2015). partykit: A Modular Toolkit for Recursive Partytioning in R. Journal of Machine Learning Research, 16(118), 3905–3909.
Milborrow, S. (2024). Rpart.plot: Plot “rpart” Models: An Enhanced Version of “plot.rpart” (Version 3.1.2). https://CRAN.R-project.org/package=rpart.plot
R Core Team. (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. https://www.R-project.org/
Revelle, W. (2024). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University. https://CRAN.R-project.org/package=psych
Stegmann, G., Jacobucci, R., Serang, S., & Grimm, K. J. (2018). Recursive Partitioning with Nonlinear Models of Change. Multivariate Behavioral Research, 53(4), 559–570. https://doi.org/10.1080/00273171.2018.1461602
Strobl, C., Boulesteix, A.-L., Kneib, T., Augustin, T., & Zeileis, A. (2008). Conditional variable importance for random forests. BMC Bioinformatics, 9(1), 307. https://doi.org/10.1186/1471-2105-9-307
Strobl, C., Boulesteix, A.-L., Zeileis, A., & Hothorn, T. (2007). Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics, 8(1), 25. https://doi.org/10.1186/1471-2105-8-25
Therneau, T., & Atkinson, B. (2025). rpart: Recursive Partitioning and Regression Trees (Version 4.1.24). https://CRAN.R-project.org/package=rpart
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag. https://ggplot2.tidyverse.org/
Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). dplyr: A Grammar of Data Manipulation (Version 1.1.4). https://CRAN.R-project.org/package=dplyr
Williams, G. (2011). Data Mining with Rattle and R: The art of excavating data for knowledge discovery. Springer. https://rd.springer.com/book/10.1007/978-1-4419-9890-3
Zeileis, A., Hothorn, T., & Hornik, K. (2008). Model-Based Recursive Partitioning. Journal of Computational and Graphical Statistics, 17(2), 492–514. https://doi.org/10.1198/106186008X319331