Linear Growth Model - SEM Implementation in R

Overview

This tutorial illustrates fitting of linear growth models in the SEM framework in R using the lavaan package.

Example data and code are drawn from Chapter 3 of Grimm, Ram, and Estabrook (2017). Specifically, we examine change in children’s mathematics achievement through elementary and middle school using the NLSY-CYA Dataset. We fit both No Growth and Linear Growth models. Please see the book chapter for additional interpretations and insights about the analyses.

Preliminaries - Loading libraries used in this script.

library(psych)      #for basic functions
library(tidyverse)  #ggplot2, stringr, dplyr, readr, forcats, tidyr, purr
library(lavaan)     #SEM 
library(semPlot)    #SEM diagrams

Preliminaries - Data Preparation and Description

For our examples, we use the mathematics achievement scores from the NLSY-CYA Wide Data.

Load the repeated measures data

#set filepath for data file
filepath <- "https://raw.githubusercontent.com/The-Change-Lab/collaborations/main/GrowthModeling/nlsy_math_wide_R.dat"
#read in the text data file using the url() function
dat <- read.table(file=url(filepath),
                  na.strings = ".")  #indicates the missing data designator
#copy data with new name 
nlsy_math_wide <- dat  

# Give the variable names
names(nlsy_math_wide)=c('id'     ,'female' ,'lb_wght','anti_k1',
                        'math2'  ,'math3'  ,'math4'  ,'math5'  ,'math6'  ,'math7'  ,'math8'  ,
                        'age2'   ,'age3'   ,'age4'   ,'age5'   ,'age6'   ,'age7'   ,'age8'   ,
                        'men2'   ,'men3'   ,'men4'   ,'men5'   ,'men6'   ,'men7'   ,'men8'   ,
                        'spring2','spring3','spring4','spring5','spring6','spring7','spring8',
                        'anti2'  ,'anti3'  ,'anti4'  ,'anti5'  ,'anti6'  ,'anti7'  ,'anti8')


#view the first few observations (and columns) in the data set 
head(nlsy_math_wide[ ,1:11], 10)
##      id female lb_wght anti_k1 math2 math3 math4 math5 math6 math7 math8
## 1   201      1       0       0    NA    38    NA    55    NA    NA    NA
## 2   303      1       0       1    26    NA    NA    33    NA    NA    NA
## 3  2702      0       0       0    56    NA    58    NA    NA    NA    80
## 4  4303      1       0       0    NA    41    58    NA    NA    NA    NA
## 5  5002      0       0       4    NA    NA    46    NA    54    NA    66
## 6  5005      1       0       0    35    NA    50    NA    60    NA    59
## 7  5701      0       0       2    NA    62    61    NA    NA    NA    NA
## 8  6102      0       0       0    NA    NA    55    67    NA    81    NA
## 9  6801      1       0       0    NA    54    NA    62    NA    66    NA
## 10 6802      0       0       0    NA    55    NA    66    NA    68    NA

Our specific interest is in change across the the repeated measures of math, math2 to math8.

As noted in Chapter 2, it is important to plot the data to obtain a better understanding of the structure and form of the observed phenomenon. Here, we want to examine the data to make sure a growth model would be an appropriate analysis for the data (i.e., we need to check that there is in fact growth to model).

Longitudinal Plot of Math across Grade at Testing

We have to reshape from wide to long for plotting.

#subsetting to variables of interest
nlsy_math_sub <- nlsy_math_wide %>%
  select(id, math2, math3, math4, math5, math6, math7, math8)

#reshaping wide to long
nlsy_math_long <- nlsy_math_sub %>%
  pivot_longer(!id,                        #reshape all vars but the ID col
               names_to = "grade",         #create grade col using math cols names
               names_prefix = "math",      #remove the names prefix when creating col
               values_to = "math") %>%     #store the values from the math cols
    mutate(grade = as.numeric(grade),      #convert char vars to numeric
          math = as.numeric(math))

#sorting for easy viewing
nlsy_math_long <- nlsy_math_long %>%
  arrange(id, grade) %>%                   #order by id and time
  filter(!is.na(math))                     #remove rows with NA for math 
                                           #(needed to get trajectory lines connected)

#intraindividual change trajectories
nlsy_math_long %>%                         #data set 
  ggplot(aes(x=grade, y=math, group=id)) + #setting variables
  geom_point(size=.5) +                    #adding points to plot
  geom_line() +                            #adding lines to plot
  theme_bw() +                             #changing style & background
  #setting the x-axis with breaks and labels
  scale_x_continuous(limits=c(2,8),
                     breaks=c(2,3,4,5,6,7,8), 
                     name="Grade at Testing") +    
  #setting the y-axis with limits breaks and labels
  scale_y_continuous(limits=c(10,90), 
                     breaks=c(10,30,50,70,90), 
                     name="PIAT Mathematics")

We see both intraindividual change and interindividual differences in change.

Of note, the data begin at Grade 2 - suggesting that we rescale the time variable so that time = 0 is located at the Grade 2 assessment.

No Growth Model

There appears to be systematic growth in the mathematics scores over time, as can be seen in our plot above. To begin the analysis we will fit a no growth model. This no growth model will act as our “null” model to which to compare later models.

Now, we set up the no growth model as a lavaan model

#writing out no growth model in full SEM way 
ng_math_lavaan_model <- ' 
  # latent variable definitions
      #intercept
      eta_1 =~ 1*math2
      eta_1 =~ 1*math3
      eta_1 =~ 1*math4
      eta_1 =~ 1*math5
      eta_1 =~ 1*math6
      eta_1 =~ 1*math7
      eta_1 =~ 1*math8

  # factor variances
      eta_1 ~~ eta_1

  # covariances among factors 
      #none (only 1 factor)

  # factor means 
      eta_1 ~ start(30)*1

  # manifest variances (made equivalent by naming theta)
      math2 ~~ theta*math2
      math3 ~~ theta*math3
      math4 ~~ theta*math4
      math5 ~~ theta*math5
      math6 ~~ theta*math6
      math7 ~~ theta*math7
      math8 ~~ theta*math8
  # manifest means (fixed at zero)
      math2 ~ 0*1
      math3 ~ 0*1
      math4 ~ 0*1
      math5 ~ 0*1
      math6 ~ 0*1
      math7 ~ 0*1
      math8 ~ 0*1
' #end of model definition

And fit the model to the data …

#estimating the model using sem() function
ng_math_lavaan_fit <- sem(ng_math_lavaan_model, 
                          data = nlsy_math_wide,
                          meanstructure = TRUE,
                          estimator = "ML",
                          missing = "fiml")
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: some cases are empty and will be ignored:
##   741
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     due to missing values, some pairwise combinations have less than
##     10% coverage; use lavInspect(fit, "coverage") to investigate.
## Warning in lav_mvnorm_missing_h1_estimate_moments(Y = X[[g]], wt = WT[[g]], : lavaan WARNING:
##     Maximum number of iterations reached when computing the sample
##     moments using EM; use the em.h1.iter.max= argument to increase the
##     number of iterations

We can look more specifically at the numerical summaries of the model.

#obtaining summary of the model using the object we just created
summary(ng_math_lavaan_fit, fit.measures=TRUE)
## lavaan 0.6.16 ended normally after 18 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         9
##   Number of equality constraints                     6
## 
##                                                   Used       Total
##   Number of observations                           932         933
##   Number of missing patterns                        60            
## 
## Model Test User Model:
##                                                       
##   Test statistic                              1759.002
##   Degrees of freedom                                32
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               862.334
##   Degrees of freedom                                21
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.000
##   Tucker-Lewis Index (TLI)                      -0.347
##                                                       
##   Robust Comparative Fit Index (CFI)             0.000
##   Robust Tucker-Lewis Index (TLI)                0.093
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -8745.952
##   Loglikelihood unrestricted model (H1)      -7866.451
##                                                       
##   Akaike (AIC)                               17497.903
##   Bayesian (BIC)                             17512.415
##   Sample-size adjusted Bayesian (SABIC)      17502.888
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.241
##   90 Percent confidence interval - lower         0.231
##   90 Percent confidence interval - upper         0.250
##   P-value H_0: RMSEA <= 0.050                    0.000
##   P-value H_0: RMSEA >= 0.080                    1.000
##                                                       
##   Robust RMSEA                                   0.467
##   90 Percent confidence interval - lower         0.402
##   90 Percent confidence interval - upper         0.534
##   P-value H_0: Robust RMSEA <= 0.050             0.000
##   P-value H_0: Robust RMSEA >= 0.080             1.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.480
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   eta_1 =~                                            
##     math2             1.000                           
##     math3             1.000                           
##     math4             1.000                           
##     math5             1.000                           
##     math6             1.000                           
##     math7             1.000                           
##     math8             1.000                           
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     eta_1            45.915    0.324  141.721    0.000
##    .math2             0.000                           
##    .math3             0.000                           
##    .math4             0.000                           
##    .math5             0.000                           
##    .math6             0.000                           
##    .math7             0.000                           
##    .math8             0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     eta_1            46.917    4.832    9.709    0.000
##    .math2   (thet)  116.682    4.548   25.656    0.000
##    .math3   (thet)  116.682    4.548   25.656    0.000
##    .math4   (thet)  116.682    4.548   25.656    0.000
##    .math5   (thet)  116.682    4.548   25.656    0.000
##    .math6   (thet)  116.682    4.548   25.656    0.000
##    .math7   (thet)  116.682    4.548   25.656    0.000
##    .math8   (thet)  116.682    4.548   25.656    0.000

We get information about how the model fits the data, as well as some information about the data itself. Interpretation is covered in more detail in the accompanying video.

No Growth Diagram

We can use the fitted model object to make a diagram (and also check if we specified and estimated the model as intended).

#diagram of fitted model
semPaths(ng_math_lavaan_fit, what = "path", whatLabels = "par")

No Growth Predicted Trajectories

Pulling predicted trajectories out of lavaan takes a bit of work, but is possible.

#obtaining predicted factor scores for individuals
nlsy_math_predicted <- as.data.frame(cbind(nlsy_math_wide$id,
                                           lavPredict(ng_math_lavaan_fit)))

#naming columns
names(nlsy_math_predicted) <- c("id", "eta_1")

#looking at data
head(nlsy_math_predicted) 
##     id    eta_1
## 1  201 46.17558
## 2  303 38.59816
## 3 2702 56.16725
## 4 4303 47.51278
## 5 5002 51.06429
## 6 5005 49.05038
#calculating implied manifest scores
nlsy_math_predicted$math2 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math3 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math4 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math5 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math6 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math7 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math8 <- 1*nlsy_math_predicted$eta_1

#reshaping wide to long
nlsy_math_predicted_long <- nlsy_math_predicted %>%
  pivot_longer(cols = starts_with("math"),   
               names_to = "grade",    
               names_prefix = "math", 
               values_to = "math") %>%
  mutate(grade = as.numeric(grade),      #convert char vars to numeric
          math = as.numeric(math)) %>%
  arrange(id, grade)                     #order by id and time

#intraindividual change trajectories
nlsy_math_predicted_long %>%                      #data
  ggplot(aes(x = grade, y = math, group = id)) +  #setting variables
  geom_line() +                                   #adding lines to plot
  theme_bw() +   #changing style/background
  #setting the x-axis with breaks and labels
  scale_x_continuous(limits=c(2,8),
                     breaks=c(2,3,4,5,6,7,8), 
                     name="Grade at Testing") +    
  #setting the y-axis with limits breaks and labels
  scale_y_continuous(limits=c(10,90), 
                     breaks=c(10,30,50,70,90), 
                     name="Predicted PIAT Mathematics")

We see from the plot that this no growth model characterizes each individual’s trajectory as a straight, flat line.

Linear Growth Model

To capture the systematic growth in the mathematics scores over time, we exapnd the model to include a linear “slope” factor.

Specifying linear growth model

#writing out linear growth model in full SEM way 
lg_math_lavaan_model <- '
  # latent variable definitions
      #intercept (note intercept is a reserved term)
      eta_1 =~ 1*math2
      eta_1 =~ 1*math3
      eta_1 =~ 1*math4
      eta_1 =~ 1*math5
      eta_1 =~ 1*math6
      eta_1 =~ 1*math7
      eta_1 =~ 1*math8

      #linear slope 
      eta_2 =~ 0*math2
      eta_2 =~ 1*math3
      eta_2 =~ 2*math4
      eta_2 =~ 3*math5
      eta_2 =~ 4*math6
      eta_2 =~ 5*math7
      eta_2 =~ 6*math8

  # factor variances
      eta_1 ~~ eta_1
      eta_2 ~~ eta_2

  # covariances among factors 
      eta_1 ~~ eta_2

  # factor means 
      eta_1 ~ start(35)*1
      eta_2 ~ start(4)*1

  # manifest variances (made equivalent by naming theta)
      math2 ~~ theta*math2
      math3 ~~ theta*math3
      math4 ~~ theta*math4
      math5 ~~ theta*math5
      math6 ~~ theta*math6
      math7 ~~ theta*math7
      math8 ~~ theta*math8
  # manifest means (fixed at zero)
      math2 ~ 0*1
      math3 ~ 0*1
      math4 ~ 0*1
      math5 ~ 0*1
      math6 ~ 0*1
      math7 ~ 0*1
      math8 ~ 0*1
' #end of model definition

And fit the model to the data …

#estimating the model using sem() function
lg_math_lavaan_fit <- sem(lg_math_lavaan_model, 
                          data = nlsy_math_wide,
                          meanstructure = TRUE,
                          estimator = "ML",
                          missing = "fiml")
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: some cases are empty and will be ignored:
##   741
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     due to missing values, some pairwise combinations have less than
##     10% coverage; use lavInspect(fit, "coverage") to investigate.
## Warning in lav_mvnorm_missing_h1_estimate_moments(Y = X[[g]], wt = WT[[g]], : lavaan WARNING:
##     Maximum number of iterations reached when computing the sample
##     moments using EM; use the em.h1.iter.max= argument to increase the
##     number of iterations

We can look more specifically at the numerical summaries of the model.

#obtaining summary of the model using the object we just created
summary(lg_math_lavaan_fit, fit.measures=TRUE)
## lavaan 0.6.16 ended normally after 27 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        12
##   Number of equality constraints                     6
## 
##                                                   Used       Total
##   Number of observations                           932         933
##   Number of missing patterns                        60            
## 
## Model Test User Model:
##                                                       
##   Test statistic                               204.484
##   Degrees of freedom                                29
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               862.334
##   Degrees of freedom                                21
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.791
##   Tucker-Lewis Index (TLI)                       0.849
##                                                       
##   Robust Comparative Fit Index (CFI)             0.896
##   Robust Tucker-Lewis Index (TLI)                0.925
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -7968.693
##   Loglikelihood unrestricted model (H1)      -7866.451
##                                                       
##   Akaike (AIC)                               15949.386
##   Bayesian (BIC)                             15978.410
##   Sample-size adjusted Bayesian (SABIC)      15959.354
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.081
##   90 Percent confidence interval - lower         0.070
##   90 Percent confidence interval - upper         0.091
##   P-value H_0: RMSEA <= 0.050                    0.000
##   P-value H_0: RMSEA >= 0.080                    0.550
##                                                       
##   Robust RMSEA                                   0.134
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.233
##   P-value H_0: Robust RMSEA <= 0.050             0.136
##   P-value H_0: Robust RMSEA >= 0.080             0.792
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.121
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   eta_1 =~                                            
##     math2             1.000                           
##     math3             1.000                           
##     math4             1.000                           
##     math5             1.000                           
##     math6             1.000                           
##     math7             1.000                           
##     math8             1.000                           
##   eta_2 =~                                            
##     math2             0.000                           
##     math3             1.000                           
##     math4             2.000                           
##     math5             3.000                           
##     math6             4.000                           
##     math7             5.000                           
##     math8             6.000                           
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   eta_1 ~~                                            
##     eta_2            -0.181    1.150   -0.158    0.875
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     eta_1            35.267    0.355   99.229    0.000
##     eta_2             4.339    0.088   49.136    0.000
##    .math2             0.000                           
##    .math3             0.000                           
##    .math4             0.000                           
##    .math5             0.000                           
##    .math6             0.000                           
##    .math7             0.000                           
##    .math8             0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     eta_1            64.562    5.659   11.408    0.000
##     eta_2             0.733    0.327    2.238    0.025
##    .math2   (thet)   36.230    1.867   19.410    0.000
##    .math3   (thet)   36.230    1.867   19.410    0.000
##    .math4   (thet)   36.230    1.867   19.410    0.000
##    .math5   (thet)   36.230    1.867   19.410    0.000
##    .math6   (thet)   36.230    1.867   19.410    0.000
##    .math7   (thet)   36.230    1.867   19.410    0.000
##    .math8   (thet)   36.230    1.867   19.410    0.000

We get information about how the model fits the data, as well as some information about the data itself. Interpretation is covered in more detail in the accompanying video.

Linear Growth Model Diagram

We can make a diagram to check if we specified and estimated the model as intended.

#diagram of fitted model
semPaths(lg_math_lavaan_fit, what = "path", whatLabels = "par")

Linear Growth Model Predicted Trajectories Plot

Plotting the predicted trajectories again takes some work, but is possible. The prototypical trajectory would need to be made from the model output.

#obtaining predicted factor scores for individuals
nlsy_math_predicted <- as.data.frame(cbind(nlsy_math_wide$id,
                                           lavPredict(lg_math_lavaan_fit)))

#naming columns
names(nlsy_math_predicted) <- c("id", "eta_1", "eta_2")

head(nlsy_math_predicted)
##     id    eta_1    eta_2
## 1  201 36.94675 4.534084
## 2  303 26.03589 4.050780
## 3 2702 49.70187 4.594148
## 4 4303 41.04200 4.548063
## 5 5002 37.01241 4.496745
## 6 5005 37.68809 4.324197
#calculating implied manifest scores
nlsy_math_predicted$math2 <- 1*nlsy_math_predicted$eta_1 + 0*nlsy_math_predicted$eta_2
nlsy_math_predicted$math3 <- 1*nlsy_math_predicted$eta_1 + 1*nlsy_math_predicted$eta_2
nlsy_math_predicted$math4 <- 1*nlsy_math_predicted$eta_1 + 2*nlsy_math_predicted$eta_2
nlsy_math_predicted$math5 <- 1*nlsy_math_predicted$eta_1 + 3*nlsy_math_predicted$eta_2
nlsy_math_predicted$math6 <- 1*nlsy_math_predicted$eta_1 + 4*nlsy_math_predicted$eta_2
nlsy_math_predicted$math7 <- 1*nlsy_math_predicted$eta_1 + 5*nlsy_math_predicted$eta_2
nlsy_math_predicted$math8 <- 1*nlsy_math_predicted$eta_1 + 6*nlsy_math_predicted$eta_2

#reshaping wide to long
nlsy_math_predicted_long <- nlsy_math_predicted %>%
  pivot_longer(cols = starts_with("math"),   
               names_to = "grade",    
               names_prefix = "math", 
               values_to = "math") %>%
  mutate(grade = as.numeric(grade),      #convert char vars to numeric
          math = as.numeric(math)) %>%
  arrange(id, grade)                     #order by id and time

#intraindividual change trajectories
nlsy_math_predicted_long %>%                     #data set
  filter(id < 80000) %>%                         #filter IDs
  ggplot(aes(x = grade, y = math, group = id)) + #setting variables
  geom_line() +                                  #adding lines to plot
  theme_bw() +                                   #changing style/background
  #setting the x-axis with breaks and labels
  scale_x_continuous(limits=c(2,8),
                     breaks=c(2,3,4,5,6,7,8), 
                     name="Grade at Testing") +    
  #setting the y-axis with limits breaks and labels
  scale_y_continuous(limits=c(10,90), 
                     breaks=c(10,30,50,70,90), 
                     name="Predicted PIAT Mathematics")

We see from the plot that this model characterized everyone’s trajectory as a straight line, with some interindividual differences in the rate of intraindividual change.

Conclusion

This tutorial has presented how to set up and fit no growth and linear growth models in the SEM modeling framework using the lavaan package in R, as well as calculate and plot the predicted trajectories.

Yay for Growth Modeling! It’s sooo fun!