Linear Growth Model - SEM Implementation in R
Overview
This tutorial illustrates fitting of linear growth models in the SEM
framework in R using the lavaan package.
Example data and code are drawn from Chapter 3 of Grimm, Ram, and Estabrook (2017). Specifically, we examine change in children’s mathematics achievement through elementary and middle school using the NLSY-CYA Dataset. We fit both No Growth and Linear Growth models. Please see the book chapter for additional interpretations and insights about the analyses.
Preliminaries - Loading libraries used in this script.
library(psych) #for basic functions
library(tidyverse) #ggplot2, stringr, dplyr, readr, forcats, tidyr, purr
library(lavaan) #SEM
library(semPlot) #SEM diagramsPreliminaries - Data Preparation and Description
For our examples, we use the mathematics achievement scores from the NLSY-CYA Wide Data.
Load the repeated measures data
#set filepath for data file
filepath <- "https://raw.githubusercontent.com/The-Change-Lab/collaborations/main/GrowthModeling/nlsy_math_wide_R.dat"
#read in the text data file using the url() function
dat <- read.table(file=url(filepath),
na.strings = ".") #indicates the missing data designator
#copy data with new name
nlsy_math_wide <- dat
# Give the variable names
names(nlsy_math_wide)=c('id' ,'female' ,'lb_wght','anti_k1',
'math2' ,'math3' ,'math4' ,'math5' ,'math6' ,'math7' ,'math8' ,
'age2' ,'age3' ,'age4' ,'age5' ,'age6' ,'age7' ,'age8' ,
'men2' ,'men3' ,'men4' ,'men5' ,'men6' ,'men7' ,'men8' ,
'spring2','spring3','spring4','spring5','spring6','spring7','spring8',
'anti2' ,'anti3' ,'anti4' ,'anti5' ,'anti6' ,'anti7' ,'anti8')
#view the first few observations (and columns) in the data set
head(nlsy_math_wide[ ,1:11], 10)## id female lb_wght anti_k1 math2 math3 math4 math5 math6 math7 math8
## 1 201 1 0 0 NA 38 NA 55 NA NA NA
## 2 303 1 0 1 26 NA NA 33 NA NA NA
## 3 2702 0 0 0 56 NA 58 NA NA NA 80
## 4 4303 1 0 0 NA 41 58 NA NA NA NA
## 5 5002 0 0 4 NA NA 46 NA 54 NA 66
## 6 5005 1 0 0 35 NA 50 NA 60 NA 59
## 7 5701 0 0 2 NA 62 61 NA NA NA NA
## 8 6102 0 0 0 NA NA 55 67 NA 81 NA
## 9 6801 1 0 0 NA 54 NA 62 NA 66 NA
## 10 6802 0 0 0 NA 55 NA 66 NA 68 NA
Our specific interest is in change across the the repeated measures
of math, math2 to math8.
As noted in Chapter 2, it is important to plot the data to obtain a better understanding of the structure and form of the observed phenomenon. Here, we want to examine the data to make sure a growth model would be an appropriate analysis for the data (i.e., we need to check that there is in fact growth to model).
Longitudinal Plot of Math across Grade at Testing
We have to reshape from wide to long for plotting.
#subsetting to variables of interest
nlsy_math_sub <- nlsy_math_wide %>%
select(id, math2, math3, math4, math5, math6, math7, math8)
#reshaping wide to long
nlsy_math_long <- nlsy_math_sub %>%
pivot_longer(!id, #reshape all vars but the ID col
names_to = "grade", #create grade col using math cols names
names_prefix = "math", #remove the names prefix when creating col
values_to = "math") %>% #store the values from the math cols
mutate(grade = as.numeric(grade), #convert char vars to numeric
math = as.numeric(math))
#sorting for easy viewing
nlsy_math_long <- nlsy_math_long %>%
arrange(id, grade) %>% #order by id and time
filter(!is.na(math)) #remove rows with NA for math
#(needed to get trajectory lines connected)
#intraindividual change trajectories
nlsy_math_long %>% #data set
ggplot(aes(x=grade, y=math, group=id)) + #setting variables
geom_point(size=.5) + #adding points to plot
geom_line() + #adding lines to plot
theme_bw() + #changing style & background
#setting the x-axis with breaks and labels
scale_x_continuous(limits=c(2,8),
breaks=c(2,3,4,5,6,7,8),
name="Grade at Testing") +
#setting the y-axis with limits breaks and labels
scale_y_continuous(limits=c(10,90),
breaks=c(10,30,50,70,90),
name="PIAT Mathematics")We see both intraindividual change and interindividual differences in change.
Of note, the data begin at Grade 2 - suggesting that we rescale the time variable so that time = 0 is located at the Grade 2 assessment.
No Growth Model
There appears to be systematic growth in the mathematics scores over time, as can be seen in our plot above. To begin the analysis we will fit a no growth model. This no growth model will act as our “null” model to which to compare later models.
Now, we set up the no growth model as a lavaan model
#writing out no growth model in full SEM way
ng_math_lavaan_model <- '
# latent variable definitions
#intercept
eta_1 =~ 1*math2
eta_1 =~ 1*math3
eta_1 =~ 1*math4
eta_1 =~ 1*math5
eta_1 =~ 1*math6
eta_1 =~ 1*math7
eta_1 =~ 1*math8
# factor variances
eta_1 ~~ eta_1
# covariances among factors
#none (only 1 factor)
# factor means
eta_1 ~ start(30)*1
# manifest variances (made equivalent by naming theta)
math2 ~~ theta*math2
math3 ~~ theta*math3
math4 ~~ theta*math4
math5 ~~ theta*math5
math6 ~~ theta*math6
math7 ~~ theta*math7
math8 ~~ theta*math8
# manifest means (fixed at zero)
math2 ~ 0*1
math3 ~ 0*1
math4 ~ 0*1
math5 ~ 0*1
math6 ~ 0*1
math7 ~ 0*1
math8 ~ 0*1
' #end of model definitionAnd fit the model to the data …
#estimating the model using sem() function
ng_math_lavaan_fit <- sem(ng_math_lavaan_model,
data = nlsy_math_wide,
meanstructure = TRUE,
estimator = "ML",
missing = "fiml")## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: some cases are empty and will be ignored:
## 741
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## due to missing values, some pairwise combinations have less than
## 10% coverage; use lavInspect(fit, "coverage") to investigate.
## Warning in lav_mvnorm_missing_h1_estimate_moments(Y = X[[g]], wt = WT[[g]], : lavaan WARNING:
## Maximum number of iterations reached when computing the sample
## moments using EM; use the em.h1.iter.max= argument to increase the
## number of iterations
We can look more specifically at the numerical summaries of the model.
#obtaining summary of the model using the object we just created
summary(ng_math_lavaan_fit, fit.measures=TRUE)## lavaan 0.6.16 ended normally after 18 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 9
## Number of equality constraints 6
##
## Used Total
## Number of observations 932 933
## Number of missing patterns 60
##
## Model Test User Model:
##
## Test statistic 1759.002
## Degrees of freedom 32
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 862.334
## Degrees of freedom 21
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.000
## Tucker-Lewis Index (TLI) -0.347
##
## Robust Comparative Fit Index (CFI) 0.000
## Robust Tucker-Lewis Index (TLI) 0.093
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -8745.952
## Loglikelihood unrestricted model (H1) -7866.451
##
## Akaike (AIC) 17497.903
## Bayesian (BIC) 17512.415
## Sample-size adjusted Bayesian (SABIC) 17502.888
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.241
## 90 Percent confidence interval - lower 0.231
## 90 Percent confidence interval - upper 0.250
## P-value H_0: RMSEA <= 0.050 0.000
## P-value H_0: RMSEA >= 0.080 1.000
##
## Robust RMSEA 0.467
## 90 Percent confidence interval - lower 0.402
## 90 Percent confidence interval - upper 0.534
## P-value H_0: Robust RMSEA <= 0.050 0.000
## P-value H_0: Robust RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.480
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## eta_1 =~
## math2 1.000
## math3 1.000
## math4 1.000
## math5 1.000
## math6 1.000
## math7 1.000
## math8 1.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## eta_1 45.915 0.324 141.721 0.000
## .math2 0.000
## .math3 0.000
## .math4 0.000
## .math5 0.000
## .math6 0.000
## .math7 0.000
## .math8 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## eta_1 46.917 4.832 9.709 0.000
## .math2 (thet) 116.682 4.548 25.656 0.000
## .math3 (thet) 116.682 4.548 25.656 0.000
## .math4 (thet) 116.682 4.548 25.656 0.000
## .math5 (thet) 116.682 4.548 25.656 0.000
## .math6 (thet) 116.682 4.548 25.656 0.000
## .math7 (thet) 116.682 4.548 25.656 0.000
## .math8 (thet) 116.682 4.548 25.656 0.000
We get information about how the model fits the data, as well as some information about the data itself. Interpretation is covered in more detail in the accompanying video.
No Growth Diagram
We can use the fitted model object to make a diagram (and also check if we specified and estimated the model as intended).
No Growth Predicted Trajectories
Pulling predicted trajectories out of lavaan takes a bit of work, but is possible.
#obtaining predicted factor scores for individuals
nlsy_math_predicted <- as.data.frame(cbind(nlsy_math_wide$id,
lavPredict(ng_math_lavaan_fit)))
#naming columns
names(nlsy_math_predicted) <- c("id", "eta_1")
#looking at data
head(nlsy_math_predicted) ## id eta_1
## 1 201 46.17558
## 2 303 38.59816
## 3 2702 56.16725
## 4 4303 47.51278
## 5 5002 51.06429
## 6 5005 49.05038
#calculating implied manifest scores
nlsy_math_predicted$math2 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math3 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math4 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math5 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math6 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math7 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math8 <- 1*nlsy_math_predicted$eta_1
#reshaping wide to long
nlsy_math_predicted_long <- nlsy_math_predicted %>%
pivot_longer(cols = starts_with("math"),
names_to = "grade",
names_prefix = "math",
values_to = "math") %>%
mutate(grade = as.numeric(grade), #convert char vars to numeric
math = as.numeric(math)) %>%
arrange(id, grade) #order by id and time
#intraindividual change trajectories
nlsy_math_predicted_long %>% #data
ggplot(aes(x = grade, y = math, group = id)) + #setting variables
geom_line() + #adding lines to plot
theme_bw() + #changing style/background
#setting the x-axis with breaks and labels
scale_x_continuous(limits=c(2,8),
breaks=c(2,3,4,5,6,7,8),
name="Grade at Testing") +
#setting the y-axis with limits breaks and labels
scale_y_continuous(limits=c(10,90),
breaks=c(10,30,50,70,90),
name="Predicted PIAT Mathematics")We see from the plot that this no growth model characterizes each individual’s trajectory as a straight, flat line.
Linear Growth Model
To capture the systematic growth in the mathematics scores over time, we exapnd the model to include a linear “slope” factor.
Specifying linear growth model
#writing out linear growth model in full SEM way
lg_math_lavaan_model <- '
# latent variable definitions
#intercept (note intercept is a reserved term)
eta_1 =~ 1*math2
eta_1 =~ 1*math3
eta_1 =~ 1*math4
eta_1 =~ 1*math5
eta_1 =~ 1*math6
eta_1 =~ 1*math7
eta_1 =~ 1*math8
#linear slope
eta_2 =~ 0*math2
eta_2 =~ 1*math3
eta_2 =~ 2*math4
eta_2 =~ 3*math5
eta_2 =~ 4*math6
eta_2 =~ 5*math7
eta_2 =~ 6*math8
# factor variances
eta_1 ~~ eta_1
eta_2 ~~ eta_2
# covariances among factors
eta_1 ~~ eta_2
# factor means
eta_1 ~ start(35)*1
eta_2 ~ start(4)*1
# manifest variances (made equivalent by naming theta)
math2 ~~ theta*math2
math3 ~~ theta*math3
math4 ~~ theta*math4
math5 ~~ theta*math5
math6 ~~ theta*math6
math7 ~~ theta*math7
math8 ~~ theta*math8
# manifest means (fixed at zero)
math2 ~ 0*1
math3 ~ 0*1
math4 ~ 0*1
math5 ~ 0*1
math6 ~ 0*1
math7 ~ 0*1
math8 ~ 0*1
' #end of model definitionAnd fit the model to the data …
#estimating the model using sem() function
lg_math_lavaan_fit <- sem(lg_math_lavaan_model,
data = nlsy_math_wide,
meanstructure = TRUE,
estimator = "ML",
missing = "fiml")## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: some cases are empty and will be ignored:
## 741
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## due to missing values, some pairwise combinations have less than
## 10% coverage; use lavInspect(fit, "coverage") to investigate.
## Warning in lav_mvnorm_missing_h1_estimate_moments(Y = X[[g]], wt = WT[[g]], : lavaan WARNING:
## Maximum number of iterations reached when computing the sample
## moments using EM; use the em.h1.iter.max= argument to increase the
## number of iterations
We can look more specifically at the numerical summaries of the model.
#obtaining summary of the model using the object we just created
summary(lg_math_lavaan_fit, fit.measures=TRUE)## lavaan 0.6.16 ended normally after 27 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 12
## Number of equality constraints 6
##
## Used Total
## Number of observations 932 933
## Number of missing patterns 60
##
## Model Test User Model:
##
## Test statistic 204.484
## Degrees of freedom 29
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 862.334
## Degrees of freedom 21
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.791
## Tucker-Lewis Index (TLI) 0.849
##
## Robust Comparative Fit Index (CFI) 0.896
## Robust Tucker-Lewis Index (TLI) 0.925
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -7968.693
## Loglikelihood unrestricted model (H1) -7866.451
##
## Akaike (AIC) 15949.386
## Bayesian (BIC) 15978.410
## Sample-size adjusted Bayesian (SABIC) 15959.354
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.081
## 90 Percent confidence interval - lower 0.070
## 90 Percent confidence interval - upper 0.091
## P-value H_0: RMSEA <= 0.050 0.000
## P-value H_0: RMSEA >= 0.080 0.550
##
## Robust RMSEA 0.134
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.233
## P-value H_0: Robust RMSEA <= 0.050 0.136
## P-value H_0: Robust RMSEA >= 0.080 0.792
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.121
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## eta_1 =~
## math2 1.000
## math3 1.000
## math4 1.000
## math5 1.000
## math6 1.000
## math7 1.000
## math8 1.000
## eta_2 =~
## math2 0.000
## math3 1.000
## math4 2.000
## math5 3.000
## math6 4.000
## math7 5.000
## math8 6.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## eta_1 ~~
## eta_2 -0.181 1.150 -0.158 0.875
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## eta_1 35.267 0.355 99.229 0.000
## eta_2 4.339 0.088 49.136 0.000
## .math2 0.000
## .math3 0.000
## .math4 0.000
## .math5 0.000
## .math6 0.000
## .math7 0.000
## .math8 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## eta_1 64.562 5.659 11.408 0.000
## eta_2 0.733 0.327 2.238 0.025
## .math2 (thet) 36.230 1.867 19.410 0.000
## .math3 (thet) 36.230 1.867 19.410 0.000
## .math4 (thet) 36.230 1.867 19.410 0.000
## .math5 (thet) 36.230 1.867 19.410 0.000
## .math6 (thet) 36.230 1.867 19.410 0.000
## .math7 (thet) 36.230 1.867 19.410 0.000
## .math8 (thet) 36.230 1.867 19.410 0.000
We get information about how the model fits the data, as well as some information about the data itself. Interpretation is covered in more detail in the accompanying video.
Linear Growth Model Diagram
We can make a diagram to check if we specified and estimated the model as intended.
Linear Growth Model Predicted Trajectories Plot
Plotting the predicted trajectories again takes some work, but is possible. The prototypical trajectory would need to be made from the model output.
#obtaining predicted factor scores for individuals
nlsy_math_predicted <- as.data.frame(cbind(nlsy_math_wide$id,
lavPredict(lg_math_lavaan_fit)))
#naming columns
names(nlsy_math_predicted) <- c("id", "eta_1", "eta_2")
head(nlsy_math_predicted)## id eta_1 eta_2
## 1 201 36.94675 4.534084
## 2 303 26.03589 4.050780
## 3 2702 49.70187 4.594148
## 4 4303 41.04200 4.548063
## 5 5002 37.01241 4.496745
## 6 5005 37.68809 4.324197
#calculating implied manifest scores
nlsy_math_predicted$math2 <- 1*nlsy_math_predicted$eta_1 + 0*nlsy_math_predicted$eta_2
nlsy_math_predicted$math3 <- 1*nlsy_math_predicted$eta_1 + 1*nlsy_math_predicted$eta_2
nlsy_math_predicted$math4 <- 1*nlsy_math_predicted$eta_1 + 2*nlsy_math_predicted$eta_2
nlsy_math_predicted$math5 <- 1*nlsy_math_predicted$eta_1 + 3*nlsy_math_predicted$eta_2
nlsy_math_predicted$math6 <- 1*nlsy_math_predicted$eta_1 + 4*nlsy_math_predicted$eta_2
nlsy_math_predicted$math7 <- 1*nlsy_math_predicted$eta_1 + 5*nlsy_math_predicted$eta_2
nlsy_math_predicted$math8 <- 1*nlsy_math_predicted$eta_1 + 6*nlsy_math_predicted$eta_2
#reshaping wide to long
nlsy_math_predicted_long <- nlsy_math_predicted %>%
pivot_longer(cols = starts_with("math"),
names_to = "grade",
names_prefix = "math",
values_to = "math") %>%
mutate(grade = as.numeric(grade), #convert char vars to numeric
math = as.numeric(math)) %>%
arrange(id, grade) #order by id and time
#intraindividual change trajectories
nlsy_math_predicted_long %>% #data set
filter(id < 80000) %>% #filter IDs
ggplot(aes(x = grade, y = math, group = id)) + #setting variables
geom_line() + #adding lines to plot
theme_bw() + #changing style/background
#setting the x-axis with breaks and labels
scale_x_continuous(limits=c(2,8),
breaks=c(2,3,4,5,6,7,8),
name="Grade at Testing") +
#setting the y-axis with limits breaks and labels
scale_y_continuous(limits=c(10,90),
breaks=c(10,30,50,70,90),
name="Predicted PIAT Mathematics")We see from the plot that this model characterized everyone’s trajectory as a straight line, with some interindividual differences in the rate of intraindividual change.
Conclusion
This tutorial has presented how to set up and fit no growth and
linear growth models in the SEM modeling framework using the
lavaan package in R, as well as calculate and plot the
predicted trajectories.
Yay for Growth Modeling! It’s sooo fun!