Linear Growth Model - Multilevel Modeling Implementation in R
Overview
This tutorial illustrates fitting of linear growth models in the
multilevel framework in R using both the nlme and
lme4 packages. Knowing how to fit the models in different
packages can be helpful when working with more complex models because
each package has both advantages and limitations.
Example data and code are drawn from Chapter 3 of Grimm, Ram, and Estabrook (2017). Specifically, we examine change in children’s mathematics achievement through elementary and middle school using the NLSY-CYA Dataset. We fit both No Growth and Linear Growth moodels. Please see the book chapter for additional interpretations and insights about the analyses.
Preliminaries - Loading libraries used in this script.
library(psych) #for basic functions
library(ggplot2) #for plotting
library(nlme) #for mixed effects models
library(lme4) #for mixed effects modelsPreliminaries - Data Preparation and Description
For our examples, we use the mathematics achievement scores from the NLSY-CYA Long Data.
Load the repeated measures data
#set filepath for data file
filepath <- "https://raw.githubusercontent.com/LRI-2/Data/main/GrowthModeling/nlsy_math_long_R.dat"
#read in the text data file using the url() function
dat <- read.table(file=url(filepath),
na.strings = ".") #indicates the missing data designator
#copy data with new name
nlsy_math_long <- dat
#Add names the columns of the data set
names(nlsy_math_long) = c('id' , 'female', 'lb_wght',
'anti_k1', 'math' , 'grade' ,
'occ' , 'age' , 'men' ,
'spring' , 'anti')
#view the first few observations in the data set
head(nlsy_math_long, 10)## id female lb_wght anti_k1 math grade occ age men spring anti
## 1 201 1 0 0 38 3 2 111 0 1 0
## 2 201 1 0 0 55 5 3 135 1 1 0
## 3 303 1 0 1 26 2 2 121 0 1 2
## 4 303 1 0 1 33 5 3 145 0 1 2
## 5 2702 0 0 0 56 2 2 100 NA 1 0
## 6 2702 0 0 0 58 4 3 125 NA 1 2
## 7 2702 0 0 0 80 8 4 173 NA 1 2
## 8 4303 1 0 0 41 3 2 115 0 0 1
## 9 4303 1 0 0 58 4 3 135 0 1 2
## 10 5002 0 0 4 46 4 2 117 NA 1 4
Our specific interest is in how the repeated measures of
math change across grade.
As noted in Chapter 2 , it is important to plot the data to obtain a better understanding of the structure and form of the observed phenomenon. Here, we want to examine the data to make sure a growth model would be an appropriate analysis for the data (i.e., we need to check that there is in fact growth to model).
Longitudinal Plot of Math across Grade at Testing
#intraindividual change trajetories
ggplot(data=nlsy_math_long, #data set
aes(x = grade, y = math, group = id)) + #setting variables
geom_point(size=.5) + #adding points to plot
geom_line() + #adding lines to plot
theme_bw() + #changing style/background
#setting the x-axis with breaks and labels
scale_x_continuous(limits=c(2,8),
breaks = c(2,3,4,5,6,7,8),
name = "Grade at Testing") +
#setting the y-axis with limits breaks and labels
scale_y_continuous(limits=c(10,90),
breaks = c(10,30,50,70,90),
name = "PIAT Mathematics")We see both intraindividual change and interindividual differences in change.
Of note, the data begin at Grade 2 - suggesting that we rescale the time variable so that time = 0 is located at the Grade 2 assessment.
Creating another time variable for use in some models. Specifically we use k_1 = 2 and k_2 = 1 to recenter at Grade 2 and keep scaling in years.
#creating new, recentered time variable
nlsy_math_long$grade_c2 <- (nlsy_math_long$grade - 2)/1
#Looking at first few observations with variables of interest
head(nlsy_math_long[ ,c("id", "math", "grade", "grade_c2")], 10)## id math grade grade_c2
## 1 201 38 3 1
## 2 201 55 5 3
## 3 303 26 2 0
## 4 303 33 5 3
## 5 2702 56 2 0
## 6 2702 58 4 2
## 7 2702 80 8 6
## 8 4303 41 3 1
## 9 4303 58 4 2
## 10 5002 46 4 2
No Growth Model
There appears to be systematic growth in the mathematics scores over time, as can be seen in our plot above. To begin the analysis we will fit a no growth model. This no growth model will act as our “null” model to which to compare later models. In the multilevel literature this model is often termed the unconditional means model.
Using the notation from the lecture, the model equations are …
\[y_{it} = b_{1i}1 + u_{it}\] \[b_{1i} = \beta_{1} + d_{1i}\] Note the explicit inclusion of the “silent” vector of 1s.
Substitution of one equation into the other we get … \[y_{it} = \beta_{1}1 + d_{1i}1 + u_{it}\]
where \[u_{it} \tilde{} N(0,\sigma^{2}_{u})\], and \[d_{1i} \tilde{} N(0,\sigma^{2}_{d_1})\].
This is then the model that gets coded into the multilevel modeling functions. This can be done in a variety of ways. We concentrate on use of the nlme:nlme() function (but also provide some alternatives below).
No Growth Model using the nlme() function
#fitting no growth model and assigning it to an object
ng_math_nlme <- nlme(math ~ beta_1 + d_1i, #model equation
data=nlsy_math_long, #data set
fixed=beta_1~1, #fixed parameters
random=d_1i~1, #random coefficients
group=~id, #clustering variable
start=c(beta_1=40), #starting values
na.action = na.exclude) #missing data treatment
#obtaining summary of the model using the object we just created
summary(ng_math_nlme)## Nonlinear mixed-effects model fit by maximum likelihood
## Model: math ~ beta_1 + d_1i
## Data: nlsy_math_long
## AIC BIC logLik
## 17497.9 17515.02 -8745.952
##
## Random effects:
## Formula: d_1i ~ 1 | id
## d_1i Residual
## StdDev: 6.849581 10.80196
##
## Fixed effects: beta_1 ~ 1
## Value Std.Error DF t-value p-value
## beta_1 45.91468 0.323796 1289 141.8013 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.79834509 -0.57827919 0.05408079 0.57885592 2.52690026
##
## Number of Observations: 2221
## Number of Groups: 932
We get information about how the model fits the data, as well as some information about the data itself. Interpetation is covered in more detail in the accompanying video.
Plotting the predicted trajectories.
#obtaining predicted scores for individuals
nlsy_math_long$pred_ng <- predict(ng_math_nlme)
#obtaining predicted scores for prototype
nlsy_math_long$proto_ng <- predict(ng_math_nlme, level=0)
#plotting predicted trajectories
#intraindividual change trajetories
ggplot(data = nlsy_math_long, aes(x = grade, y = pred_ng, group = id)) +
#geom_point(color="black") +
geom_line(color="black") +
geom_line(aes(x = grade, y = proto_ng), color="red",size=2) +
theme_bw() + #changing style/background
#setting the x-axis with breaks and labels
scale_x_continuous(limits=c(2,8),
breaks = c(2,3,4,5,6,7,8),
name = "Grade at Testing") +
#setting the y-axis with limits breaks and labels
scale_y_continuous(limits=c(10,90),
breaks = c(10,30,50,70,90),
name = "Predicted PIAT Mathematics")We see from the plot that this model characterized everyone’s trajectory as a straight, flat line. We can examine the residuals to see what was missed by this model.
Plotting the predicted trajectories.
#obtaining residual scores for individuals
nlsy_math_long$resid_ng <- residuals(ng_math_nlme)
#plotting residuals trajectories
ggplot(data = nlsy_math_long, aes(x = grade, y = resid_ng, group = id)) +
#geom_point(color="black") +
geom_line(color="lightblue") +
theme_bw() + #changing style/background
#setting the x-axis with breaks and labels
scale_x_continuous(limits=c(2,8),
breaks = c(2,3,4,5,6,7,8),
name = "Grade at Testing") +
#removed constraints on the y-axis
scale_y_continuous(name = "Residual PIAT Mathematics")We see from the plot that the residuals have some clear structure to
them. They increase systematically across time. Thus, we move on to the
linear growth model, where we explcitly use grade as a
predictor of the PIAT mathematics score.
Linear Growth Model
To capture the systematic growth in the mathematics scores over time,
we exapnd the model to include a “time” variable. Here we use
grade as the time variable, so that the rate of growth
indicates (linear) school-related change in mathematics scores.
Using the notation from the lecture, the model equations are …
\[y_{it} = b_{1i}1 + b_{2i}grade_{it} + u_{it}\] \[b_{1i} = \beta_{1} + d_{1i}\] and \[b_{2i} = \beta_{2} + d_{2i}\]
Substitution of one equation into the other we get … \[y_{it} = (\beta_{1} + d_{1i}) + (\beta_{2} + d_{2i})grade_{it} + u_{it}\]
where \[u_{it} \tilde{} N(0,\sigma^{2}_{u})\], and \[\left[\begin{array} {r} d_{1i} \\ d_{2i} \end{array}\right] \tilde{} N(\left[\begin{array} {r} 0 \\ 0 \end{array}\right] ,\left[\begin{array} {r} \sigma^{2}_{d_1} & \sigma_{d_1d_2} \\ \sigma_{d_1d_2} & \sigma^{2}_{d_2} \end{array}\right]\].
This is then the model that gets coded into the multilevel modeling functions. This can be done in a variety of ways. We concentrate on use of the nlme:nlme() function (but also provide some alternatives below).
Linear Growth Model using nlme() function
Note that in fitting the linear growth model, we recenter the time variable, so that the intercept is the expected score at Grade 2 (the beginning of the data).
#fitting linear growth model and assigning it to an object
lg_math_nlme <- nlme(math~(beta_1+d_1i)+(beta_2+d_2i)*(grade-2),
data=nlsy_math_long,
fixed=beta_1+beta_2~1,
random=d_1i+d_2i~1,
group=~id,
start=c(beta_1=35,beta_2=4),
na.action = na.exclude)
#obtaining summary of the model using the object we just created
summary(lg_math_nlme)## Nonlinear mixed-effects model fit by maximum likelihood
## Model: math ~ (beta_1 + d_1i) + (beta_2 + d_2i) * (grade - 2)
## Data: nlsy_math_long
## AIC BIC logLik
## 15949.39 15983.62 -7968.693
##
## Random effects:
## Formula: list(d_1i ~ 1, d_2i ~ 1)
## Level: id
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## d_1i 8.0350377 d_1i
## d_2i 0.8559002 -0.026
## Residual 6.0191136
##
## Fixed effects: beta_1 + beta_2 ~ 1
## Value Std.Error DF t-value p-value
## beta_1 35.26736 0.3551567 1288 99.30083 0
## beta_2 4.33933 0.0873767 1288 49.66231 0
## Correlation:
## beta_1
## beta_2 -0.532
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.030277915 -0.525876524 0.001632888 0.540751622 2.542250378
##
## Number of Observations: 2221
## Number of Groups: 932
#obtaining confidence intervals for parameters
intervals(lg_math_nlme)## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## beta_1 34.570923 35.26736 35.963793
## beta_2 4.167991 4.33933 4.510669
## attr(,"label")
## [1] "Fixed effects:"
##
## Random Effects:
## Level: id
## lower est. upper
## sd(d_1i) 7.5093038 8.03503772 8.59757875
## sd(d_2i) 0.6514098 0.85590016 1.12458402
## cor(d_1i,d_2i) -0.1474695 -0.02638833 0.09547213
##
## Within-group standard error:
## lower est. upper
## 5.758502 6.019114 6.291519
We get information about how the model fits the data, as well as some information about the linear growth process. We see from the model that there is evidence of linear growth in children’s mathematics scores aross grade. More detailed interpetation is given in the accompanying video.
Plotting the predicted trajectories.
#obtaining predicted scores for individuals
nlsy_math_long$pred_lg <- predict(lg_math_nlme)
#obtaining predicted scores for prototype
nlsy_math_long$proto_lg <- predict(lg_math_nlme, level=0)
#plotting predicted trajectories
#intraindividual change trajetories
ggplot(data = nlsy_math_long, aes(x = (grade-2), y = pred_lg, group = id)) +
#geom_point(color="black") +
geom_line(color="black") +
geom_line(aes(x = (grade-2), y = proto_lg), color="red",size=2) +
theme_bw() + #changing style/background
#setting the x-axis with breaks and labels
scale_x_continuous(limits=c(2-2,8-2),
breaks = c(2-2,3-2,4-2,5-2,6-2,7-2,8-2),
name = "Grade at Testing (centered at Grade 2)") +
#setting the y-axis with limits breaks and labels
scale_y_continuous(limits=c(10,90),
breaks = c(10,30,50,70,90),
name = "Predicted PIAT Mathematics")Note that the x-axis here is scaled to match the specific analysis.
For interpretation we may want to replot on the original time scaling, and plot the protytpical predicted trajectory derived from the linear growth model against the raw data.
#intraindividual change trajetories
#plotting predicted trajectory
ggplot(data = nlsy_math_long, aes(x = grade, y = math, group = id)) +
geom_point(color="gray", size=.5) +
geom_line(color="gray") +
geom_line(aes(x = grade, y = proto_lg), color="red",size=2) +
theme_bw() + #changing style/background
#setting the x-axis with breaks and labels
scale_x_continuous(limits=c(2,8),
breaks = c(2,3,4,5,6,7,8),
name = "Grade at Testing") +
#setting the y-axis with limits breaks and labels
scale_y_continuous(limits=c(10,90),
breaks = c(10,30,50,70,90),
name = "Predicted PIAT Mathematics")Alternative Model Specifications
There are a variety of multilevel modeling functions and coding
styles that can be used to fit the growth models. Here we illustrate
some alternatives that use the lme() and
nlme() functions in the nlme package and the
lmer() function in the lme4 package.
Linear growth model using the lme() function
#fitting no growth model and assigning it to an object
lg_math_lme <- nlme::lme(math ~ 1 + grade_c2,
random= ~1 + grade_c2|id,
data = nlsy_math_long,
na.action = na.exclude,
method="ML")
#obtaining summary of the model using the model object we just created
summary(lg_math_lme)## Linear mixed-effects model fit by maximum likelihood
## Data: nlsy_math_long
## AIC BIC logLik
## 15949.39 15983.62 -7968.693
##
## Random effects:
## Formula: ~1 + grade_c2 | id
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 8.035040 (Intr)
## grade_c2 0.855900 -0.026
## Residual 6.019114
##
## Fixed effects: math ~ 1 + grade_c2
## Value Std.Error DF t-value p-value
## (Intercept) 35.26736 0.3551568 1288 99.30081 0
## grade_c2 4.33933 0.0873767 1288 49.66231 0
## Correlation:
## (Intr)
## grade_c2 -0.532
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.030277420 -0.525876489 0.001633084 0.540751577 2.542250490
##
## Number of Observations: 2221
## Number of Groups: 932
Linear growth model using alternative specification in
nlme() function
#fitting linear growth model and assigning it to an object
lg2_math_nlme <- nlme::nlme(math~b_1i+b_2i*(grade-2),
data=nlsy_math_long,
fixed=b_1i+b_2i~1,
random=b_1i+b_2i~1|id,
start=c(b_1i=35, b_2i=4),
na.action = na.exclude)
#obtaining summary of the model using the object we just created
summary(lg2_math_nlme)## Nonlinear mixed-effects model fit by maximum likelihood
## Model: math ~ b_1i + b_2i * (grade - 2)
## Data: nlsy_math_long
## AIC BIC logLik
## 15949.39 15983.62 -7968.693
##
## Random effects:
## Formula: list(b_1i ~ 1, b_2i ~ 1)
## Level: id
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## b_1i 8.0350377 b_1i
## b_2i 0.8558995 -0.026
## Residual 6.0191139
##
## Fixed effects: b_1i + b_2i ~ 1
## Value Std.Error DF t-value p-value
## b_1i 35.26736 0.3551567 1288 99.30083 0
## b_2i 4.33933 0.0873767 1288 49.66231 0
## Correlation:
## b_1i
## b_2i -0.532
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.030277731 -0.525876493 0.001632888 0.540751589 2.542250224
##
## Number of Observations: 2221
## Number of Groups: 932
Linear growth model using the lmer() function in the
“lme4” package
#fitting linear growth model and assigning it to an object
lg_math_lmer <- lme4::lmer(math ~ 1 + grade_c2 + (1 + grade_c2 | id),
data = nlsy_math_long,
REML = TRUE,
na.action = na.exclude)
#obtaining summary of the model using the object we just created
summary(lg_math_lmer)## Linear mixed model fit by REML ['lmerMod']
## Formula: math ~ 1 + grade_c2 + (1 + grade_c2 | id)
## Data: nlsy_math_long
##
## REML criterion at convergence: 15941
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.02699 -0.52567 0.00173 0.54024 2.54141
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## id (Intercept) 64.69 8.0430
## grade_c2 0.74 0.8602 -0.03
## Residual 36.23 6.0192
## Number of obs: 2221, groups: id, 932
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 35.26717 0.35522 99.28
## grade_c2 4.33958 0.08741 49.65
##
## Correlation of Fixed Effects:
## (Intr)
## grade_c2 -0.532
Conclusion
This tutorial has presented several ways to set up and fit no growth and linear growth models in the multilevel modeling framework in R, as well as plot the predicted trajectories (and residuals).
Yay for Growth Modeling! It’s sooo fun!