ESM Data Mining using GLMER Trees
APA-ATI Intensive Longitudinal Data
Overview
Repeated measures data, often obtained from experience sampling or daily diary studies, require analytic methods that accommodate the inherent nesting in the data. Multilevel and structural equation modeling approaches are typically used to analyze repeated measures data. However, data mining and machine learning methods, particularly decision trees, are now being modified to accommodate repeated measures data (e.g., SEM trees, Brandmaier et al., 2013; mixed-effects regression trees, Hajjem et al., 2011, 2014; nonlinear longitudinal recursive partitioning, Stegmann et al., 2018).
In this tutorial, we illustrate how generalized linear mixed effects regression trees (glmer trees; Fokkema, Smits, Zeileis, Hothorn & Kelderman, 2018) can be applied to EMA-type data for knowledge discovery.
Generalized linear mixed effects regression trees have a continuous outcome variable (e.g., affect measured on a Likert or slider scale) and can accommodate a sizable number of predictor variables; these predictors can be categorical, interval, or continuous.
Ultimately, the model aims to sort each instance (or data point) into a node, with the instances within a node being as similar as possible. Note that nodes are composed of observations, not individuals. To determine the composition of a node, generalized linear mixed effects regression trees use a recursive partitioning algorithm that chooses splits to minimize the within-node sum of squares.
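The split search can be sketched in R as a toy illustration (hypothetical data; the actual glmertree algorithm additionally uses parameter-stability tests and estimates random effects on top of this basic idea): for each candidate cut point on a predictor, compute the summed within-node sum of squares and keep the cut that minimizes it.

```r
# Hypothetical predictor and outcome with two clear clusters
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 1.9, 2.2, 2.0, 5.1, 4.8, 5.2, 4.9)

# Within-node sum of squares around the node mean
node_ss <- function(v) sum((v - mean(v))^2)

# Scan all candidate cut points and total the two resulting nodes' SS
cuts <- head(sort(unique(x)), -1)
sse  <- sapply(cuts, function(cut) {
  node_ss(y[x <= cut]) + node_ss(y[x > cut])
})
best <- cuts[which.min(sse)]
best  # the split x <= 4 separates the low- and high-affect clusters
```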
In our example, we will use data from an experience sampling study that examines daily interactions, emotions, and behaviors. Specifically, we will use the AMIB data set that followed 190 adults for eight days - thus repeated measures across days are nested within individuals.
Outline
In this tutorial, we will cover…
The Research Questions
The Data
Generalized Linear Mixed Effects Regression Tree Model
Affect Reactivity Model
Conclusion
1. The Research Questions
We are going to address:
- Are people’s personality and patterns of daily behaviors (e.g., sleep hours and quality, perceived stress, physical activity) predictive of their daily positive affect?
- What variables are best at distinguishing differences in daily positive affect?
- Furthermore, is it possible to specify a random slope multilevel model using generalized linear mixed effects regression trees?
Notice that no pattern of association is hypothesized between the predictors and the outcome; this is one advantage of regression trees - the different splits allow for higher-order interactions and thus non-linear associations between the predictors and outcome variables.
2. The Data
The data are organized as follows:
- There are N (number of individuals) = 190 individuals, a majority of whom completed all eight days of the daily diary study. In total, we have 1,418 rows of data - an average of 7.46 reports per individual.
The data set has additional columns we will not use, so we will subset to only the variables we need: an ID variable, a time variable, predictor variables, and the outcome variable. Specifically…
- Columns:
  - Participant ID (id)
  - Time (e.g., day within the daily diary study; day)
  - Big Five Inventory: Openness (bfi_o), Conscientiousness (bfi_c), Extraversion (bfi_e), Agreeableness (bfi_a), Neuroticism (bfi_n)
  - Sleep hours (slphrs)
  - Sleep quality (slpwell)
  - Physical activity (lteq)
  - Perceived stress (pss)
  - Positive affect (posaff)

for a total of 12 columns in this reduced data set.
Loading Libraries
Loading libraries used in this script.
library(psych) #describing the data
library(ggplot2) #data visualization
library(party) #tree models; alternative decision tree algorithm
library(partykit) #tree models; updated party functions
library(rattle) #tree models; fancy tree plot
library(glmertree) #generalized linear mixed effects regression tree models
library(dplyr) #data manipulation
Loading Data
Load and merge person-level and day-level data.
# Set filepath for person-level data file
filepath <- "https://raw.githubusercontent.com/The-Change-Lab/collaborations/main/AMIB/AMIB_persons.csv"
# Read in the .csv file using the url() function
AMIB_persons <- read.csv(file = url(filepath), header = TRUE)
# Set filepath for daily data file
filepath1 <- "https://raw.githubusercontent.com/The-Change-Lab/collaborations/refs/heads/main/AMIB/AMIB_daily.csv"
# Read in the .csv file using the url() function
AMIB_daily <- read.csv(file = url(filepath1), header = TRUE)
# Merge daily and person-level data
amib <- merge(AMIB_persons, AMIB_daily, by = "id")
Data Preparation
Reduce to the variables of interest.
amib <- amib %>%
select(id, day, bfi_e, bfi_a, bfi_c, bfi_n, bfi_o, slphrs, slpwell, pss, lteq, posaff)
head(amib, 10)
## id day bfi_e bfi_a bfi_c bfi_n bfi_o slphrs slpwell pss lteq posaff
## 1 101 0 3.5 1.5 4.0 2 4.0 6.0 1 2.50 10 3.9
## 2 101 1 3.5 1.5 4.0 2 4.0 2.0 1 2.75 10 3.8
## 3 101 2 3.5 1.5 4.0 2 4.0 9.0 3 3.50 10 5.1
## 4 101 3 3.5 1.5 4.0 2 4.0 7.5 3 3.00 9 5.6
## 5 101 4 3.5 1.5 4.0 2 4.0 8.0 4 2.75 18 4.3
## 6 101 5 3.5 1.5 4.0 2 4.0 8.0 3 2.75 19 3.9
## 7 101 6 3.5 1.5 4.0 2 4.0 8.0 4 3.50 21 5.1
## 8 101 7 3.5 1.5 4.0 2 4.0 7.0 3 2.75 14 4.8
## 9 102 0 5.0 4.5 3.5 2 2.5 7.0 3 3.50 12 6.3
## 10 102 1 5.0 4.5 3.5 2 2.5 6.0 3 4.00 20 7.0
Reverse code pss so higher values indicate higher perceived stress.
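The reverse-coding code itself is not shown; a minimal sketch, assuming pss uses a 0-4 response scale (consistent with the descriptives below, where the reversed variable appears as stress):

```r
# Reverse code perceived stress (0-4 scale assumed) so that higher
# values indicate higher perceived stress
reverse_pss <- function(pss) 4 - pss

# Applied to the tutorial's data frame this would be:
# amib$stress <- reverse_pss(amib$pss)
reverse_pss(c(0, 2.5, 4))  # 4.0 1.5 0.0
```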
Split the variables into “trait” (between-person differences) and “state” (within-person deviations) components.
Specifically, the time-varying predictors, sleep hours, sleep quality, perceived stress, physical activity, and positive affect are split into two variables:
VARIABLE_trait is the sample-mean centered between-person component,
VARIABLE_state is the person-centered within-person component.
For convenience of plotting, we also split the outcome variable, posaff.
Calculate the Trait Variables
# Calculate intraindividual means
amib_imeans <- amib %>%
group_by(id) %>%
summarize(posaff_trait = mean(posaff, na.rm=TRUE),
slphrs_trait = mean(slphrs, na.rm=TRUE),
slpwell_trait = mean(slpwell, na.rm=TRUE),
stress_trait = mean(stress, na.rm=TRUE),
lteq_trait = mean(lteq, na.rm=TRUE))
# Merge into long data set
amib <- merge(amib, amib_imeans, by="id")
Calculate the State Variables
#calculate state variables
amib <- amib %>%
mutate(posaff_state = posaff - posaff_trait,
slphrs_state = slphrs - slphrs_trait,
slpwell_state = slpwell - slpwell_trait,
stress_state = stress - stress_trait,
lteq_state = lteq - lteq_trait)
# Describe data
describe(amib)
## vars n mean sd median trimmed mad min max
## id 1 1458 322.53 129.08 324.00 324.20 151.23 101.00 532.00
## day 2 1458 3.48 2.30 3.00 3.47 2.97 0.00 7.00
## bfi_e 3 1458 3.39 1.00 3.50 3.41 0.74 1.00 5.00
## bfi_a 4 1458 3.61 0.88 3.50 3.69 0.74 1.00 5.00
## bfi_c 5 1458 3.76 0.85 4.00 3.78 0.74 1.50 5.00
## bfi_n 6 1458 2.97 0.96 3.00 2.98 1.48 1.00 5.00
## bfi_o 7 1458 3.60 0.96 3.50 3.65 0.74 1.00 5.00
## slphrs 8 1428 7.17 1.81 7.00 7.18 1.48 0.00 18.00
## slpwell 9 1439 2.49 1.09 3.00 2.56 1.48 0.00 4.00
## pss 10 1445 2.61 0.68 2.75 2.64 0.74 0.00 4.00
## lteq 11 1433 12.44 10.37 9.00 11.18 8.90 0.00 58.00
## posaff 12 1441 4.12 1.10 4.20 4.15 1.19 1.00 7.00
## stress 13 1445 1.39 0.68 1.25 1.36 0.74 0.00 4.00
## posaff_trait 14 1458 4.11 0.73 4.10 4.11 0.63 2.22 6.04
## slphrs_trait 15 1458 7.16 0.94 7.20 7.20 0.91 4.12 9.31
## slpwell_trait 16 1458 2.49 0.65 2.50 2.50 0.56 0.38 4.00
## stress_trait 17 1458 1.39 0.47 1.41 1.39 0.51 0.19 2.56
## lteq_trait 18 1458 12.60 8.35 10.75 11.88 8.52 0.00 45.33
## posaff_state 19 1441 0.00 0.82 0.05 0.01 0.70 -3.17 3.29
## slphrs_state 20 1428 0.00 1.55 0.00 -0.01 1.30 -8.75 9.25
## slpwell_state 21 1439 0.00 0.88 0.12 0.03 0.74 -3.29 2.57
## stress_state 22 1445 0.00 0.49 -0.03 -0.02 0.46 -1.75 2.12
## lteq_state 23 1433 0.00 6.44 -0.25 -0.11 5.37 -23.75 28.88
## range skew kurtosis se
## id 431.00 -0.07 -1.06 3.38
## day 7.00 0.01 -1.24 0.06
## bfi_e 4.00 -0.20 -0.58 0.03
## bfi_a 4.00 -0.69 0.10 0.02
## bfi_c 3.50 -0.11 -0.90 0.02
## bfi_n 4.00 -0.08 -0.79 0.03
## bfi_o 4.00 -0.36 -0.43 0.03
## slphrs 18.00 0.10 1.83 0.05
## slpwell 4.00 -0.52 -0.36 0.03
## pss 4.00 -0.35 0.13 0.02
## lteq 58.00 1.07 0.94 0.27
## posaff 6.00 -0.25 -0.33 0.03
## stress 4.00 0.35 0.13 0.02
## posaff_trait 3.81 0.05 0.09 0.02
## slphrs_trait 5.19 -0.40 0.28 0.02
## slpwell_trait 3.62 -0.31 0.55 0.02
## stress_trait 2.38 -0.08 -0.21 0.01
## lteq_trait 45.33 0.85 0.60 0.22
## posaff_state 6.46 -0.14 0.82 0.02
## slphrs_state 18.00 0.19 3.31 0.04
## slpwell_state 5.86 -0.31 0.34 0.02
## stress_state 3.88 0.36 0.79 0.01
## lteq_state 52.62 0.18 1.52 0.17
Note that we did not sample-center the person-level variables. Tree models generally work with the raw data. The state variables are by definition person-centered.
However, the tree algorithm cannot handle missing data. Thus, we reduce the data set to the complete cases.
Reduce to Complete Cases
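The subsetting code for this step is not shown; a minimal sketch using complete.cases() (na.omit() would achieve the same). The toy data frame here is purely illustrative - in the tutorial, the same call would be applied to amib:

```r
# Keep only rows with no missing values on any variable
toy <- data.frame(id = c(101, 101, 102), slphrs = c(6, NA, 7))
toy_complete <- toy[complete.cases(toy), ]
nrow(toy_complete)  # 2 rows remain

# In the tutorial: amib <- amib[complete.cases(amib), ]
```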
## id day bfi_e bfi_a bfi_c bfi_n bfi_o slphrs slpwell pss lteq posaff stress
## 1 101 0 3.5 1.5 4 2 4 6.0 1 2.50 10 3.9 1.50
## 2 101 1 3.5 1.5 4 2 4 2.0 1 2.75 10 3.8 1.25
## 3 101 2 3.5 1.5 4 2 4 9.0 3 3.50 10 5.1 0.50
## 4 101 3 3.5 1.5 4 2 4 7.5 3 3.00 9 5.6 1.00
## 5 101 4 3.5 1.5 4 2 4 8.0 4 2.75 18 4.3 1.25
## 6 101 5 3.5 1.5 4 2 4 8.0 3 2.75 19 3.9 1.25
## posaff_trait slphrs_trait slpwell_trait stress_trait lteq_trait posaff_state
## 1 4.5625 6.9375 2.75 1.0625 13.875 -0.6625
## 2 4.5625 6.9375 2.75 1.0625 13.875 -0.7625
## 3 4.5625 6.9375 2.75 1.0625 13.875 0.5375
## 4 4.5625 6.9375 2.75 1.0625 13.875 1.0375
## 5 4.5625 6.9375 2.75 1.0625 13.875 -0.2625
## 6 4.5625 6.9375 2.75 1.0625 13.875 -0.6625
## slphrs_state slpwell_state stress_state lteq_state
## 1 -0.9375 -1.75 0.4375 -3.875
## 2 -4.9375 -1.75 0.1875 -3.875
## 3 2.0625 0.25 -0.5625 -3.875
## 4 0.5625 0.25 -0.0625 -4.875
## 5 1.0625 1.25 0.1875 4.125
## 6 1.0625 0.25 0.1875 5.125
Plotting the Data
Before we begin running our models, it is always a good idea to look at our data.
Our generalized linear mixed effects regression tree has a number of predictors, but as an example let’s examine the association between positive affect and perceived stress for a subset of participants. We begin by plotting the outcome variable, posaff.
#Intraindividual change plot
amib %>%
ggplot(aes(x = day, group = id, color = factor(id),
legend = FALSE)) +
geom_line(aes(x = day, y = posaff), lty = 1,
linewidth = 0.5, alpha = .5) +
xlab("Day") +
ylab("Positive Affect (outcome)") +
scale_x_continuous(breaks=seq(0,7,by=1)) +
theme_classic() +
theme(legend.position = "none")
Plotting select predictor-outcome relations
#time-invariant predictor
amib %>%
ggplot(aes(x = bfi_n, y = posaff_trait,
group = id, color = factor(id),
legend = FALSE)) +
geom_point(alpha = .5) +
geom_smooth(aes(group=1), method=lm, se=FALSE, fullrange=TRUE,
lty=1, linewidth=1, color="black") +
xlab("Neuroticism") +
ylab("Positive Affect (outcome)") +
theme_classic() +
theme(legend.position = "none")
## `geom_smooth()` using formula = 'y ~ x'
#time-invariant predictor
amib %>%
ggplot(aes(x = slphrs_trait, y = posaff_trait,
group = id, color = factor(id),
legend = FALSE)) +
geom_point(alpha = .5) +
geom_smooth(aes(group=1), method=lm, se=FALSE, fullrange=TRUE, lty=1,
linewidth=1, color="black") +
xlab("Sleep Hours (Trait)") +
ylab("Positive Affect (outcome)") +
theme_classic() +
theme(legend.position = "none")
## `geom_smooth()` using formula = 'y ~ x'
#time-varying predictor
amib %>%
ggplot(aes(x = stress_state, y = posaff,
group = id, color = factor(id),
legend = FALSE)) +
geom_point(color="gray40",alpha = .2) +
geom_smooth(method=lm, se=FALSE, fullrange=FALSE, lty=1,
linewidth=.5, color="gray40") +
xlab("Stress (State)") +
ylab("Positive Affect (outcome)") +
theme_classic() +
theme(legend.position = "none")
## `geom_smooth()` using formula = 'y ~ x'
#time-varying predictor
amib %>%
ggplot(aes(x = lteq_state, y = posaff,
group = id, color = factor(id),
legend = FALSE)) +
geom_point(color="gray40",alpha = .2) +
geom_smooth(method=lm, se=FALSE, fullrange=FALSE, lty=1,
linewidth=.5, color="gray40") +
xlab("Physical Activity (State)") +
ylab("Positive Affect (outcome)") +
theme_classic() +
theme(legend.position = "none")
## `geom_smooth()` using formula = 'y ~ x'
There are many more predictors, both time-invariant and time-varying, that could be plotted. All will be used as predictors in the tree model.
3. Generalized Linear Mixed Effects Regression Tree Model
We’ll construct a generalized linear mixed effects regression tree examining how people’s personality (as measured by the Big Five Inventory) and daily behaviors (sleep hours and quality, perceived stress, physical activity) predict their daily positive affect.
The general form used to specify the model is:
model_object <- lmertree(response ~ node-specific predictors | random effects | partitioning/splitting variables, data = your_data_frame)
Other things one can specify and should consider include:
- Parameters that inhibit tree growth (e.g., the minimum number of observations a node must contain in order to be split, set via minsplit, and the significance level for the splitting tests, set via alpha)
Now, let’s run lmertree() on the AMIB data.
#setting seed for replication
set.seed(1234)
# Compared to the code in the lecture, we removed day as a level in the MLM and instead included day as a potential splitting variable
model <- lmertree(posaff ~ 1 | (1 | id) |
day +
slphrs_trait + stress_trait +
slphrs_state + stress_state +
bfi_e + bfi_a + bfi_c +
bfi_n + bfi_o,
data = amib,
REML = TRUE,
alpha = 1 - .999999, # i.e., alpha = 1e-6, a very conservative splitting criterion
minsplit = 150)
# Print statistical output
print(model)
## Linear mixed model tree
##
## Model formula:
## posaff ~ 1 | day + slphrs_trait + stress_trait + slphrs_state +
## stress_state + bfi_e + bfi_a + bfi_c + bfi_n + bfi_o
##
## Fitted party:
## [1] root
## | [2] stress_state <= 0.27083
## | | [3] stress_state <= -0.46429: n = 223
## | | (Intercept)
## | | 4.811067
## | | [4] stress_state > -0.46429
## | | | [5] day <= 2: n = 292
## | | | (Intercept)
## | | | 4.444144
## | | | [6] day > 2: n = 522
## | | | (Intercept)
## | | | 4.060112
## | [7] stress_state > 0.27083
## | | [8] stress_state <= 0.60714: n = 208
## | | (Intercept)
## | | 3.665668
## | | [9] stress_state > 0.60714: n = 161
## | | (Intercept)
## | | 3.161255
##
## Number of inner nodes: 4
## Number of terminal nodes: 5
## Number of parameters per node: 1
## Objective function (residual sum of squares): 634.208
##
## Random effects:
## $id
## (Intercept)
## 101 0.279080373
## 102 0.573327740
## 103 -0.509685380
## 104 -0.444145669
## 105 -0.038992879
## 106 0.611704117
## 107 0.206803902
## 108 -1.603481556
## 109 -0.072286400
## 110 0.864504241
## 111 -0.211285401
## 112 0.711664941
## 113 1.550677435
## 114 0.152220818
## 115 -0.361778009
## 116 0.642562232
## 117 0.348451842
## 118 -1.035254113
## 119 0.776023453
## 120 0.224981359
## 121 0.002676463
## 122 -0.278687528
## 123 1.215575273
## 124 0.527537378
## 125 -0.856325414
## 126 -0.132107725
## 127 0.140745566
## 128 -1.104624914
## 129 0.534245596
## 130 0.774495025
## 201 0.396882559
## 202 -0.977779686
## 203 0.060392288
## 204 -0.793537887
## 205 0.140556117
## 206 -0.581779615
## 207 0.085905291
## 208 0.168138651
## 209 -0.100589268
## 210 -0.560559644
## 211 -0.577790263
## 212 0.454111617
## 213 0.457378179
## 214 0.058489525
## 215 1.161071001
## 218 1.429349276
## 219 -0.786499632
## 220 0.102659734
## 221 -0.096449854
## 222 1.084362268
## 223 -0.110403683
## 224 -0.246015408
## 225 -0.149682376
## 226 -0.245275853
## 227 -0.089237829
## 228 -1.638429006
## 229 0.388753794
## 230 0.535004181
## 231 0.223747773
## 233 -0.370967299
## 234 0.877158289
## 237 -1.185061624
## 238 -0.056424078
## 239 -0.074493372
## 240 -1.192727212
## 241 -0.646751568
## 242 -0.107743378
## 243 -0.534470031
## 244 0.584945831
## 245 1.396550509
## 246 -0.480548822
## 247 -0.075468028
## 248 -1.542859823
## 249 0.982558653
## 301 -0.051411467
## 302 -0.085408557
## 303 1.834887241
## 304 -0.595071054
## 305 -0.165561567
## 306 -0.648027400
## 307 -0.879989234
## 308 -0.007009034
## 309 -0.128470832
## 310 0.520972842
## 311 -0.672311532
## 312 -0.475982881
## 313 -0.886008026
## 314 -0.969813123
## 315 0.025491746
## 316 0.161850554
## 317 0.263593075
## 318 1.188587615
## 319 -0.118694536
## 320 -0.611822128
## 321 -1.098620517
## 322 -0.495226179
## 323 -1.116388635
## 324 0.049457850
## 325 0.109682712
## 326 0.103091085
## 327 0.002795710
## 328 -0.239499393
## 329 0.070445440
## 330 -0.832353546
## 331 -0.946142432
## 332 -0.975890413
## 333 0.057514870
## 334 0.613929964
## 335 -1.176381488
## 336 0.034540513
## 337 -0.493623307
## 338 0.131151866
## 339 -0.260582678
## 340 0.952960734
## 341 0.531502157
## 342 -0.280288673
## 343 -0.363729665
## 344 -0.307335275
## 345 0.035769194
## 346 -0.118694536
## 401 -0.095787463
## 402 -0.869340677
## 403 0.067280681
## 404 0.347302124
## 405 1.021182484
## 406 0.104543988
## 407 -0.365717374
## 408 0.132252635
## 409 -0.365093079
## 410 -0.504911677
## 411 -0.037380128
## 412 -0.196481777
## 413 -0.511542479
## 414 -0.339454055
## 415 0.242791722
## 417 -0.241213603
## 418 -1.196102922
## 419 1.026110402
## 420 0.802371921
## 421 0.037808875
## 422 0.056428682
## 423 0.224897491
## 424 0.405547256
## 425 0.696760751
## 426 1.418844256
## 427 0.010485752
## 428 1.793585980
## 429 -0.055936750
## 430 0.528151247
## 431 1.067688185
## 432 0.547808638
## 433 0.230050876
## 434 -0.544365780
## 435 0.332547996
## 436 0.008210956
## 437 -0.359677834
## 438 0.253345692
## 439 0.486975753
## 440 0.376461602
## 441 0.637655356
## 442 -0.058975395
## 501 -0.339081877
## 502 0.184595194
## 503 0.258634825
## 504 0.988383717
## 505 0.325219529
## 506 0.211536393
## 507 0.099254855
## 508 0.123636541
## 509 0.183882552
## 510 -0.348646376
## 511 0.405547256
## 512 0.115308764
## 513 -0.587531780
## 514 -0.700195242
## 515 0.614056423
## 516 0.640278431
## 517 0.989947864
## 518 0.669750238
## 519 -0.072393293
## 520 0.656734974
## 521 -1.340866671
## 524 -0.188531520
## 525 -0.215804047
## 526 -0.491559963
## 527 -0.536615115
## 528 0.380283443
## 530 -0.708444243
## 531 -0.819549435
## 532 -0.155921868
##
## with conditional variances for "id"
## [1] 3477.964
## [1] 3535.697
Based upon the output above (the fitted tree can also be visualized with plot(model, which = "tree")), it appears that daily perceived stress (stress_state) and day (which is a proxy for day of week) are predictive of daily positive affect. The values in the terminal nodes represent the estimated average positive affect for the observations in that node.
Also, note this tree has a depth of 3 (the root node is counted as zero) and 5 terminal nodes.
4. Affect Reactivity Model
Now let’s try a more complicated model - one with a within-person predictor included in each node. We use a model of “affect reactivity”, in which daily positive affect in each node is regressed on daily stress (stress_state).
model2 <- lmertree(posaff ~ 1 + stress_state | ((1 + stress_state) | id) |
day +
slphrs_trait + stress_trait +
slphrs_state +
bfi_e + bfi_a + bfi_c +
bfi_n + bfi_o,
data = amib,
REML = TRUE,
alpha = 1 - .999999,
minsplit = 150)
# Print statistical output and visualize the tree results
plot(model2, which = "tree")
print(model2)
## Linear mixed model tree
##
## Model formula:
## posaff ~ 1 + stress_state | day + slphrs_trait + stress_trait +
## slphrs_state + bfi_e + bfi_a + bfi_c + bfi_n + bfi_o
##
## Fitted party:
## [1] root
## | [2] stress_trait <= 1.4375
## | | [3] bfi_n <= 3.5
## | | | [4] stress_trait <= 1.125
## | | | | [5] day <= 3: n = 178
## | | | | (Intercept) stress_state
## | | | | 4.9786503 -0.8702966
## | | | | [6] day > 3: n = 177
## | | | | (Intercept) stress_state
## | | | | 4.464807 -1.007018
## | | | [7] stress_trait > 1.125: n = 253
## | | | (Intercept) stress_state
## | | | 4.2218336 -0.8098648
## | | [8] bfi_n > 3.5: n = 154
## | | (Intercept) stress_state
## | | 3.8938327 -0.6968044
## | [9] stress_trait > 1.4375
## | | [10] stress_trait <= 1.84375: n = 450
## | | (Intercept) stress_state
## | | 3.947909 -1.125616
## | | [11] stress_trait > 1.84375: n = 194
## | | (Intercept) stress_state
## | | 3.3738649 -0.9798695
##
## Number of inner nodes: 5
## Number of terminal nodes: 6
## Number of parameters per node: 2
## Objective function (residual sum of squares): 549.9184
##
## Random effects:
## $id
## (Intercept) stress_state
## 101 -0.135337162 -0.063284084
## 102 -0.065358275 -0.361673520
## 103 -0.581648716 -0.221017144
## 104 -0.349063359 -0.203780992
## 105 0.104286959 -0.042316654
## 106 -0.083880451 -0.243778920
## 107 0.095725501 -0.018292324
## 108 -0.950842209 0.180663632
## 109 0.013424980 -0.117809780
## 110 0.169649634 0.009780413
## 111 0.324701736 0.097359385
## 112 0.154058888 -0.080920540
## 113 0.866331242 0.129909810
## 114 0.091574861 0.068154772
## 115 -0.075615143 -0.005746730
## 116 0.381832915 0.123914604
## 117 0.463474547 0.075810678
## 118 -0.810695403 -0.079638599
## 119 0.083395381 0.197833516
## 120 0.355949841 0.095034776
## 121 -0.037679717 0.075297310
## 122 0.427672428 -0.020895591
## 123 1.219351167 0.123285674
## 124 0.523080661 -0.030882025
## 125 -0.282009196 0.221972655
## 126 0.127836893 0.087645834
## 127 0.271585980 0.128333621
## 128 -0.443825003 0.137778365
## 129 -0.080874133 0.064006949
## 130 0.261786502 0.027983859
## 201 0.586419495 0.022387156
## 202 -0.840830278 -0.039403985
## 203 0.087135210 -0.190042806
## 204 -0.387336158 0.175271247
## 205 0.269230629 0.170809188
## 206 -0.671049775 0.097051000
## 207 0.281438393 -0.119984721
## 208 0.171669229 0.252448427
## 209 -0.148029328 0.141617293
## 210 -1.044302640 0.024276688
## 211 -0.978363060 -0.073705797
## 212 -0.250777940 0.183222857
## 213 0.402737141 0.098512232
## 214 0.064374357 0.002060094
## 215 0.729158294 0.222812190
## 218 0.841167437 0.066953885
## 219 -0.182968935 -0.024951552
## 220 0.386770268 0.035346352
## 221 -0.161720924 0.060421646
## 222 0.706648648 0.146647053
## 223 0.632662119 0.155152387
## 224 -0.213710379 -0.432147485
## 225 -0.103923581 -0.035348973
## 226 -0.063243221 -0.060644720
## 227 0.502646935 0.093713165
## 228 -0.858251422 0.150091666
## 229 0.500102959 0.467972344
## 230 -0.017462687 -0.065614703
## 231 -0.278870942 0.129439178
## 233 -0.387979522 0.049293773
## 234 0.916210236 0.102679396
## 237 -0.426229812 0.005714452
## 238 0.122042489 0.139879483
## 239 -0.001730961 0.172712344
## 240 -0.643308547 -0.366049110
## 241 0.059757077 -0.092869946
## 242 0.329455496 -0.385246545
## 243 -0.584092027 -0.203058232
## 244 -0.024167681 -0.024033469
## 245 0.777089907 0.035505631
## 246 -0.565106119 -0.077432087
## 247 0.095090895 -0.011785347
## 248 -1.255456455 0.019139898
## 249 0.861671873 0.104843711
## 301 -0.232722572 -0.149694057
## 302 0.068955894 -0.143155507
## 303 1.096543966 0.158182976
## 304 -0.969974000 -0.117710322
## 305 -0.023132434 0.158984028
## 306 -0.467475884 -0.087350695
## 307 -0.753695419 -0.172181017
## 308 0.181780134 0.043281894
## 309 0.088196027 -0.202851362
## 310 -0.150159023 0.093093076
## 311 -0.517200504 -0.202352241
## 312 0.090943546 -0.003862108
## 313 -0.596844371 0.149988026
## 314 -0.271365087 -0.031650553
## 315 0.605407109 -0.004053210
## 316 0.041565230 -0.068931222
## 317 -0.285139363 0.070894421
## 318 1.264724588 0.149856026
## 319 0.322279658 0.024493163
## 320 -0.579928255 -0.144296937
## 321 -0.955297624 0.050111035
## 322 -0.385797279 0.052598950
## 323 -0.408649995 -0.075536386
## 324 0.287013775 0.033319670
## 325 -0.415619634 0.048019523
## 326 0.016820330 0.187491934
## 327 -0.169061941 -0.122023802
## 328 -0.083104825 -0.120581502
## 329 0.316222158 0.043284179
## 330 -0.571122327 -0.072463675
## 331 -0.309443202 0.058318491
## 332 -0.749251146 -0.010639817
## 333 -0.051909081 0.198762599
## 334 0.570334254 0.056303889
## 335 -0.496862922 0.077591603
## 336 0.108023125 -0.014232135
## 337 -0.352290537 -0.398257461
## 338 -0.025626005 0.074732837
## 339 -0.386142527 -0.076422341
## 340 0.359740380 0.064060448
## 341 0.242267774 -0.003414273
## 342 -0.324582329 -0.100953906
## 343 -0.280505512 0.120514264
## 344 -0.125203007 -0.046044069
## 345 0.133820797 0.089045748
## 346 -0.303755588 -0.023085339
## 401 0.116503354 0.002216007
## 402 -0.660775192 0.115602774
## 403 0.223778004 0.077500392
## 404 -0.191842142 -0.083641389
## 405 0.293451740 0.066214116
## 406 0.240924146 -0.014528548
## 407 -0.212329752 0.038713683
## 408 -0.445678831 0.255310547
## 409 0.247328420 -0.094621211
## 410 -0.334201927 -0.030384334
## 411 -0.505051749 -0.057709234
## 412 0.428571970 0.070213665
## 413 -0.271560054 -0.056318707
## 414 -0.253195685 0.067037190
## 415 0.386522366 -0.023112937
## 417 0.319122703 -0.123886252
## 418 -0.876098230 -0.138674309
## 419 0.457333304 0.195500097
## 420 0.333289864 0.095374473
## 421 0.239118162 0.199667557
## 422 0.013053111 0.023453393
## 423 0.361750208 -0.120612496
## 424 -0.127075373 0.009213648
## 425 0.683661949 0.085143797
## 426 1.206978842 0.137077634
## 427 0.089321972 -0.155159719
## 428 1.089631008 0.112624473
## 429 0.113459717 0.101174198
## 430 0.403463123 0.116554981
## 431 0.562925249 -0.030115585
## 432 0.042755560 -0.041526393
## 433 -0.252503466 -0.184578932
## 434 -0.298666748 -0.370532622
## 435 0.204460107 -0.165803813
## 436 0.635447695 0.224381980
## 437 -0.373390480 -0.018577185
## 438 0.453186349 0.079199300
## 439 -0.068562908 0.218877452
## 440 0.487074830 0.057021708
## 441 0.457945558 0.040789973
## 442 -0.163927317 0.005586409
## 501 -0.055579299 0.129825861
## 502 0.202813133 -0.184457117
## 503 0.432106947 -0.185604647
## 504 0.899841447 0.039426446
## 505 -0.202929471 0.070412139
## 506 -0.328648124 -0.207470488
## 507 -0.378908304 0.050330353
## 508 0.137369623 0.002642945
## 509 0.229590854 -0.179604432
## 510 -0.745346029 -0.142075482
## 511 0.289351770 0.130438816
## 512 -0.513294232 -0.300252766
## 513 -0.464747433 0.064937842
## 514 -0.572465700 -0.280451713
## 515 0.751026310 -0.031938839
## 516 0.773099401 0.257561136
## 517 0.421350397 0.034183894
## 518 0.100934815 -0.282915717
## 519 -0.692978505 -0.258251651
## 520 0.189619498 -0.138319333
## 521 -0.647059850 0.217211224
## 524 0.304105220 -0.149869587
## 525 -0.304460879 -0.119037858
## 526 -0.230788080 -0.079337580
## 527 -0.410864353 0.108687770
## 528 0.912344453 -0.250163950
## 530 -0.445171037 0.285295402
## 531 -0.221630386 -0.090999830
## 532 -0.311116035 -0.284437663
##
## with conditional variances for "id"
## Linear mixed model fit by REML ['lmerMod']
## Formula: posaff ~ .tree + .tree:stress_state + ((1 + stress_state) | id)
## Data: data
## Weights: .weights
## REML criterion at convergence: 3293.491
## Random effects:
## Groups Name Std.Dev. Corr
## id (Intercept) 0.5412
## stress_state 0.3031 0.14
## Residual 0.6767
## Number of obs: 1406, groups: id, 190
## Fixed Effects:
## (Intercept) .tree6 .tree7
## 4.9787 -0.5138 -0.7568
## .tree8 .tree10 .tree11
## -1.0848 -1.0307 -1.6048
## .tree5:stress_state .tree6:stress_state .tree7:stress_state
## -0.8703 -1.0070 -0.8099
## .tree8:stress_state .tree10:stress_state .tree11:stress_state
## -0.6968 -1.1256 -0.9799
## [1] 3335.491
## [1] 3445.709
5. Conclusion
In sum, this tutorial outlines how to run a generalized linear mixed effects regression tree using the glmertree package. We provide a brief explanation of the underlying model, the code to run this model in R, and the interpretation of the results.
More detailed information about these (and related analyses) can be found in Fokkema, Smits, Zeileis, Hothorn & Kelderman (2018).
Citations
Brandmaier, A. M., von Oertzen, T., McArdle, J. J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71–86. https://doi.org/10.1037/a0030001
Fokkema, M., Smits, N., Zeileis, A., Hothorn, T., & Kelderman, H. (2018). Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees. Behavior Research Methods, 50(5), 2016–2034. https://doi.org/10.3758/s13428-017-0971-x
Hajjem, A., Bellavance, F., & Larocque, D. (2011). Mixed effects regression trees for clustered data. Statistics & Probability Letters, 81(4), 451–459. https://doi.org/10.1016/j.spl.2010.12.003
Hajjem, A., Bellavance, F., & Larocque, D. (2014). Mixed-effects random forest for clustered data. Journal of Statistical Computation and Simulation, 84(6), 1313–1328. https://doi.org/10.1080/00949655.2012.741599
Hothorn, T., Bühlmann, P., Dudoit, S., Molinaro, A., & Van Der Laan, M. J. (2006). Survival ensembles. Biostatistics, 7(3), 355–373. https://doi.org/10.1093/biostatistics/kxj011
Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15(3), 651–674. https://doi.org/10.1198/106186006X133933
Hothorn, T., & Zeileis, A. (2015). partykit: A Modular Toolkit for Recursive Partytioning in R. Journal of Machine Learning Research, 16(118), 3905–3909.
Milborrow, S. (2024). Rpart.plot: Plot “rpart” Models: An Enhanced Version of “plot.rpart” (Version 3.1.2). https://CRAN.R-project.org/package=rpart.plot
R Core Team. (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. https://www.R-project.org/
Revelle, W. (2024). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University. https://CRAN.R-project.org/package=psych
Stegmann, G., Jacobucci, R., Serang, S., & Grimm, K. J. (2018). Recursive Partitioning with Nonlinear Models of Change. Multivariate Behavioral Research, 53(4), 559–570. https://doi.org/10.1080/00273171.2018.1461602
Strobl, C., Boulesteix, A.-L., Kneib, T., Augustin, T., & Zeileis, A. (2008). Conditional variable importance for random forests. BMC Bioinformatics, 9(1), 307. https://doi.org/10.1186/1471-2105-9-307
Strobl, C., Boulesteix, A.-L., Zeileis, A., & Hothorn, T. (2007). Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics, 8(1), 25. https://doi.org/10.1186/1471-2105-8-25
Therneau, T., & Atkinson, B. (2025). rpart: Recursive Partitioning and Regression Trees (Version 4.1.24). https://CRAN.R-project.org/package=rpart
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag. https://ggplot2.tidyverse.org/
Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). dplyr: A Grammar of Data Manipulation (Version 1.1.4). https://CRAN.R-project.org/package=dplyr
Williams, G. (2011). Data Mining with Rattle and R: The art of excavating data for knowledge discovery. Springer. https://rd.springer.com/book/10.1007/978-1-4419-9890-3
Zeileis, A., Hothorn, T., & Hornik, K. (2008). Model-Based Recursive Partitioning. Journal of Computational and Graphical Statistics, 17(2), 492–514. https://doi.org/10.1198/106186008X319331