Linear Growth Model - Multilevel Modeling Implementation in R

Overview

This tutorial illustrates fitting of linear growth models in the multilevel framework in R using both the nlme and lme4 packages. Knowing how to fit the models in different packages can be helpful when working with more complex models because each package has both advantages and limitations.

Example data and code are drawn from Chapter 3 of Grimm, Ram, and Estabrook (2017). Specifically, we examine change in children’s mathematics achievement through elementary and middle school using the NLSY-CYA Dataset. We fit both No Growth and Linear Growth moodels. Please see the book chapter for additional interpretations and insights about the analyses.

Preliminaries - Loading libraries used in this script.

library(psych)  #for basic functions
library(ggplot2)  #for plotting
library(nlme) #for mixed effects models
library(lme4) #for mixed effects models

Preliminaries - Data Preparation and Description

For our examples, we use the mathematics achievement scores from the NLSY-CYA Long Data.

Load the repeated measures data

#set filepath for data file
filepath <- "https://raw.githubusercontent.com/LRI-2/Data/main/GrowthModeling/nlsy_math_long_R.dat"
#read in the text data file using the url() function
dat <- read.table(file=url(filepath),
                  na.strings = ".")  #indicates the missing data designator
#copy data with new name 
nlsy_math_long <- dat  

#Add names the columns of the data set
names(nlsy_math_long) = c('id'     , 'female', 'lb_wght', 
                          'anti_k1', 'math'  , 'grade'  ,
                          'occ'    , 'age'   , 'men'    ,
                          'spring' , 'anti')

#view the first few observations in the data set 
head(nlsy_math_long, 10)
##      id female lb_wght anti_k1 math grade occ age men spring anti
## 1   201      1       0       0   38     3   2 111   0      1    0
## 2   201      1       0       0   55     5   3 135   1      1    0
## 3   303      1       0       1   26     2   2 121   0      1    2
## 4   303      1       0       1   33     5   3 145   0      1    2
## 5  2702      0       0       0   56     2   2 100  NA      1    0
## 6  2702      0       0       0   58     4   3 125  NA      1    2
## 7  2702      0       0       0   80     8   4 173  NA      1    2
## 8  4303      1       0       0   41     3   2 115   0      0    1
## 9  4303      1       0       0   58     4   3 135   0      1    2
## 10 5002      0       0       4   46     4   2 117  NA      1    4

Our specific interest is in how the repeated measures of math change across grade.

As noted in Chapter 2 , it is important to plot the data to obtain a better understanding of the structure and form of the observed phenomenon. Here, we want to examine the data to make sure a growth model would be an appropriate analysis for the data (i.e., we need to check that there is in fact growth to model).

Longitudinal Plot of Math across Grade at Testing

#intraindividual change trajetories
ggplot(data=nlsy_math_long,                    #data set
       aes(x = grade, y = math, group = id)) + #setting variables
  geom_point(size=.5) + #adding points to plot
  geom_line() +  #adding lines to plot
  theme_bw() +   #changing style/background
  #setting the x-axis with breaks and labels
  scale_x_continuous(limits=c(2,8),
                     breaks = c(2,3,4,5,6,7,8), 
                     name = "Grade at Testing") +    
  #setting the y-axis with limits breaks and labels
  scale_y_continuous(limits=c(10,90), 
                     breaks = c(10,30,50,70,90), 
                     name = "PIAT Mathematics")

We see both intraindividual change and interindividual differences in change.

Of note, the data begin at Grade 2 - suggesting that we rescale the time variable so that time = 0 is located at the Grade 2 assessment.

Creating another time variable for use in some models. Specifically we use k_1 = 2 and k_2 = 1 to recenter at Grade 2 and keep scaling in years.

#creating new, recentered time variable 
nlsy_math_long$grade_c2 <- (nlsy_math_long$grade - 2)/1

#Looking at first few observations with variables of interest 
head(nlsy_math_long[ ,c("id", "math", "grade", "grade_c2")], 10)
##      id math grade grade_c2
## 1   201   38     3        1
## 2   201   55     5        3
## 3   303   26     2        0
## 4   303   33     5        3
## 5  2702   56     2        0
## 6  2702   58     4        2
## 7  2702   80     8        6
## 8  4303   41     3        1
## 9  4303   58     4        2
## 10 5002   46     4        2

No Growth Model

There appears to be systematic growth in the mathematics scores over time, as can be seen in our plot above. To begin the analysis we will fit a no growth model. This no growth model will act as our “null” model to which to compare later models. In the multilevel literature this model is often termed the unconditional means model.

Using the notation from the lecture, the model equations are …

\[y_{it} = b_{1i}1 + u_{it}\] \[b_{1i} = \beta_{1} + d_{1i}\] Note the explicit inclusion of the “silent” vector of 1s.

Substitution of one equation into the other we get … \[y_{it} = \beta_{1}1 + d_{1i}1 + u_{it}\]

where \[u_{it} \tilde{} N(0,\sigma^{2}_{u})\], and \[d_{1i} \tilde{} N(0,\sigma^{2}_{d_1})\].

This is then the model that gets coded into the multilevel modeling functions. This can be done in a variety of ways. We concentrate on use of the nlme:nlme() function (but also provide some alternatives below).

No Growth Model using the nlme() function

#fitting no growth model and assigning it to an object
ng_math_nlme <- nlme(math ~ beta_1 + d_1i,    #model equation
                     data=nlsy_math_long,     #data set                   
                     fixed=beta_1~1,          #fixed parameters              
                     random=d_1i~1,           #random coefficients
                     group=~id,               #clustering variable         
                     start=c(beta_1=40),      #starting values
                     na.action = na.exclude)  #missing data treatment                     

#obtaining summary of the model using the object we just created                     
summary(ng_math_nlme)
## Nonlinear mixed-effects model fit by maximum likelihood
##   Model: math ~ beta_1 + d_1i 
##   Data: nlsy_math_long 
##       AIC      BIC    logLik
##   17497.9 17515.02 -8745.952
## 
## Random effects:
##  Formula: d_1i ~ 1 | id
##             d_1i Residual
## StdDev: 6.849581 10.80196
## 
## Fixed effects:  beta_1 ~ 1 
##           Value Std.Error   DF  t-value p-value
## beta_1 45.91468  0.323796 1289 141.8013       0
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.79834509 -0.57827919  0.05408079  0.57885592  2.52690026 
## 
## Number of Observations: 2221
## Number of Groups: 932

We get information about how the model fits the data, as well as some information about the data itself. Interpetation is covered in more detail in the accompanying video.

Plotting the predicted trajectories.

#obtaining predicted scores for individuals
nlsy_math_long$pred_ng <- predict(ng_math_nlme)

#obtaining predicted scores for prototype
nlsy_math_long$proto_ng <- predict(ng_math_nlme, level=0)

#plotting predicted trajectories
#intraindividual change trajetories
ggplot(data = nlsy_math_long, aes(x = grade, y = pred_ng, group = id)) +
  #geom_point(color="black") + 
  geom_line(color="black") +
  geom_line(aes(x = grade, y = proto_ng), color="red",size=2) + 
  theme_bw() +   #changing style/background
  #setting the x-axis with breaks and labels
  scale_x_continuous(limits=c(2,8),
                     breaks = c(2,3,4,5,6,7,8), 
                     name = "Grade at Testing") +    
  #setting the y-axis with limits breaks and labels
  scale_y_continuous(limits=c(10,90), 
                     breaks = c(10,30,50,70,90), 
                     name = "Predicted PIAT Mathematics")

We see from the plot that this model characterized everyone’s trajectory as a straight, flat line. We can examine the residuals to see what was missed by this model.

Plotting the predicted trajectories.

#obtaining residual scores for individuals
nlsy_math_long$resid_ng <- residuals(ng_math_nlme)

#plotting residuals trajectories
ggplot(data = nlsy_math_long, aes(x = grade, y = resid_ng, group = id)) +
  #geom_point(color="black") + 
  geom_line(color="lightblue") +
  theme_bw() +   #changing style/background
  #setting the x-axis with breaks and labels
  scale_x_continuous(limits=c(2,8),
                     breaks = c(2,3,4,5,6,7,8), 
                     name = "Grade at Testing") +    
  #removed constraints on the y-axis
  scale_y_continuous(name = "Residual PIAT Mathematics")

We see from the plot that the residuals have some clear structure to them. They increase systematically across time. Thus, we move on to the linear growth model, where we explcitly use grade as a predictor of the PIAT mathematics score.

Linear Growth Model

To capture the systematic growth in the mathematics scores over time, we exapnd the model to include a “time” variable. Here we use grade as the time variable, so that the rate of growth indicates (linear) school-related change in mathematics scores.

Using the notation from the lecture, the model equations are …

\[y_{it} = b_{1i}1 + b_{2i}grade_{it} + u_{it}\] \[b_{1i} = \beta_{1} + d_{1i}\] and \[b_{2i} = \beta_{2} + d_{2i}\]

Substitution of one equation into the other we get … \[y_{it} = (\beta_{1} + d_{1i}) + (\beta_{2} + d_{2i})grade_{it} + u_{it}\]

where \[u_{it} \tilde{} N(0,\sigma^{2}_{u})\], and \[\left[\begin{array} {r} d_{1i} \\ d_{2i} \end{array}\right] \tilde{} N(\left[\begin{array} {r} 0 \\ 0 \end{array}\right] ,\left[\begin{array} {r} \sigma^{2}_{d_1} & \sigma_{d_1d_2} \\ \sigma_{d_1d_2} & \sigma^{2}_{d_2} \end{array}\right]\].

This is then the model that gets coded into the multilevel modeling functions. This can be done in a variety of ways. We concentrate on use of the nlme:nlme() function (but also provide some alternatives below).

Linear Growth Model using nlme() function

Note that in fitting the linear growth model, we recenter the time variable, so that the intercept is the expected score at Grade 2 (the beginning of the data).

#fitting linear growth model and assigning it to an object
lg_math_nlme <- nlme(math~(beta_1+d_1i)+(beta_2+d_2i)*(grade-2),  
                   data=nlsy_math_long,                      
                   fixed=beta_1+beta_2~1,                      
                   random=d_1i+d_2i~1,
                   group=~id,                     
                   start=c(beta_1=35,beta_2=4),
                   na.action = na.exclude)

#obtaining summary of the model using the object we just created
summary(lg_math_nlme)
## Nonlinear mixed-effects model fit by maximum likelihood
##   Model: math ~ (beta_1 + d_1i) + (beta_2 + d_2i) * (grade - 2) 
##   Data: nlsy_math_long 
##        AIC      BIC    logLik
##   15949.39 15983.62 -7968.693
## 
## Random effects:
##  Formula: list(d_1i ~ 1, d_2i ~ 1)
##  Level: id
##  Structure: General positive-definite, Log-Cholesky parametrization
##          StdDev    Corr  
## d_1i     8.0350377 d_1i  
## d_2i     0.8559002 -0.026
## Residual 6.0191136       
## 
## Fixed effects:  beta_1 + beta_2 ~ 1 
##           Value Std.Error   DF  t-value p-value
## beta_1 35.26736 0.3551567 1288 99.30083       0
## beta_2  4.33933 0.0873767 1288 49.66231       0
##  Correlation: 
##        beta_1
## beta_2 -0.532
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -3.030277915 -0.525876524  0.001632888  0.540751622  2.542250378 
## 
## Number of Observations: 2221
## Number of Groups: 932
#obtaining confidence intervals for parameters
intervals(lg_math_nlme)
## Approximate 95% confidence intervals
## 
##  Fixed effects:
##            lower     est.     upper
## beta_1 34.570923 35.26736 35.963793
## beta_2  4.167991  4.33933  4.510669
## attr(,"label")
## [1] "Fixed effects:"
## 
##  Random Effects:
##   Level: id 
##                     lower        est.      upper
## sd(d_1i)        7.5093038  8.03503772 8.59757875
## sd(d_2i)        0.6514098  0.85590016 1.12458402
## cor(d_1i,d_2i) -0.1474695 -0.02638833 0.09547213
## 
##  Within-group standard error:
##    lower     est.    upper 
## 5.758502 6.019114 6.291519

We get information about how the model fits the data, as well as some information about the linear growth process. We see from the model that there is evidence of linear growth in children’s mathematics scores aross grade. More detailed interpetation is given in the accompanying video.

Plotting the predicted trajectories.

#obtaining predicted scores for individuals
nlsy_math_long$pred_lg <- predict(lg_math_nlme)

#obtaining predicted scores for prototype
nlsy_math_long$proto_lg <- predict(lg_math_nlme, level=0)

#plotting predicted trajectories
#intraindividual change trajetories
ggplot(data = nlsy_math_long, aes(x = (grade-2), y = pred_lg, group = id)) +
  #geom_point(color="black") + 
  geom_line(color="black") +
  geom_line(aes(x = (grade-2), y = proto_lg), color="red",size=2) + 
  theme_bw() +   #changing style/background
  #setting the x-axis with breaks and labels
  scale_x_continuous(limits=c(2-2,8-2),
                     breaks = c(2-2,3-2,4-2,5-2,6-2,7-2,8-2), 
                     name = "Grade at Testing (centered at Grade 2)") +    
  #setting the y-axis with limits breaks and labels
  scale_y_continuous(limits=c(10,90), 
                     breaks = c(10,30,50,70,90), 
                     name = "Predicted PIAT Mathematics")

Note that the x-axis here is scaled to match the specific analysis.

For interpretation we may want to replot on the original time scaling, and plot the protytpical predicted trajectory derived from the linear growth model against the raw data.

#intraindividual change trajetories
#plotting predicted trajectory
ggplot(data = nlsy_math_long, aes(x = grade, y = math, group = id)) +
  geom_point(color="gray", size=.5) + 
  geom_line(color="gray") +
  geom_line(aes(x = grade, y = proto_lg), color="red",size=2) + 
  theme_bw() +   #changing style/background
  #setting the x-axis with breaks and labels
  scale_x_continuous(limits=c(2,8),
                     breaks = c(2,3,4,5,6,7,8), 
                     name = "Grade at Testing") +    
  #setting the y-axis with limits breaks and labels
  scale_y_continuous(limits=c(10,90), 
                     breaks = c(10,30,50,70,90), 
                     name = "Predicted PIAT Mathematics")

Alternative Model Specifications

There are a variety of multilevel modeling functions and coding styles that can be used to fit the growth models. Here we illustrate some alternatives that use the lme() and nlme() functions in the nlme package and the lmer() function in the lme4 package.

Linear growth model using the lme() function

#fitting no growth model and assigning it to an object
lg_math_lme <- nlme::lme(math ~ 1 + grade_c2, 
                         random= ~1 + grade_c2|id, 
                         data = nlsy_math_long,
                         na.action = na.exclude,
                         method="ML")

#obtaining summary of the model using the model object we just created
summary(lg_math_lme)
## Linear mixed-effects model fit by maximum likelihood
##   Data: nlsy_math_long 
##        AIC      BIC    logLik
##   15949.39 15983.62 -7968.693
## 
## Random effects:
##  Formula: ~1 + grade_c2 | id
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev   Corr  
## (Intercept) 8.035040 (Intr)
## grade_c2    0.855900 -0.026
## Residual    6.019114       
## 
## Fixed effects:  math ~ 1 + grade_c2 
##                Value Std.Error   DF  t-value p-value
## (Intercept) 35.26736 0.3551568 1288 99.30081       0
## grade_c2     4.33933 0.0873767 1288 49.66231       0
##  Correlation: 
##          (Intr)
## grade_c2 -0.532
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -3.030277420 -0.525876489  0.001633084  0.540751577  2.542250490 
## 
## Number of Observations: 2221
## Number of Groups: 932

Linear growth model using alternative specification in nlme() function

#fitting linear growth model and assigning it to an object
lg2_math_nlme <- nlme::nlme(math~b_1i+b_2i*(grade-2),
                            data=nlsy_math_long,
                            fixed=b_1i+b_2i~1,
                            random=b_1i+b_2i~1|id,
                            start=c(b_1i=35, b_2i=4),
                            na.action = na.exclude)

#obtaining summary of the model using the object we just created
summary(lg2_math_nlme)
## Nonlinear mixed-effects model fit by maximum likelihood
##   Model: math ~ b_1i + b_2i * (grade - 2) 
##   Data: nlsy_math_long 
##        AIC      BIC    logLik
##   15949.39 15983.62 -7968.693
## 
## Random effects:
##  Formula: list(b_1i ~ 1, b_2i ~ 1)
##  Level: id
##  Structure: General positive-definite, Log-Cholesky parametrization
##          StdDev    Corr  
## b_1i     8.0350377 b_1i  
## b_2i     0.8558995 -0.026
## Residual 6.0191139       
## 
## Fixed effects:  b_1i + b_2i ~ 1 
##         Value Std.Error   DF  t-value p-value
## b_1i 35.26736 0.3551567 1288 99.30083       0
## b_2i  4.33933 0.0873767 1288 49.66231       0
##  Correlation: 
##      b_1i  
## b_2i -0.532
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -3.030277731 -0.525876493  0.001632888  0.540751589  2.542250224 
## 
## Number of Observations: 2221
## Number of Groups: 932

Linear growth model using the lmer() function in the “lme4” package

#fitting linear growth model and assigning it to an object
lg_math_lmer <- lme4::lmer(math ~ 1 + grade_c2 + (1 + grade_c2 | id), 
                           data = nlsy_math_long, 
                           REML = TRUE,
                           na.action = na.exclude)

#obtaining summary of the model using the object we just created
summary(lg_math_lmer)
## Linear mixed model fit by REML ['lmerMod']
## Formula: math ~ 1 + grade_c2 + (1 + grade_c2 | id)
##    Data: nlsy_math_long
## 
## REML criterion at convergence: 15941
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.02699 -0.52567  0.00173  0.54024  2.54141 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  id       (Intercept) 64.69    8.0430        
##           grade_c2     0.74    0.8602   -0.03
##  Residual             36.23    6.0192        
## Number of obs: 2221, groups:  id, 932
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 35.26717    0.35522   99.28
## grade_c2     4.33958    0.08741   49.65
## 
## Correlation of Fixed Effects:
##          (Intr)
## grade_c2 -0.532

Conclusion

This tutorial has presented several ways to set up and fit no growth and linear growth models in the multilevel modeling framework in R, as well as plot the predicted trajectories (and residuals).

Yay for Growth Modeling! It’s sooo fun!