Growth Model with Continuous Time Variable
Multilevel & SEM Implementation in R
Overview
This tutorial illustrates fitting of linear growth models with a continuous time variable in the multilevel and SEM frameworks in R.
Example data and code are drawn from Chapter 4 of Grimm, Ram, and Estabrook (2017). Specifically, we examine change in children’s mathematics achievement across age using the NLSY-CYA Dataset. We fit both No Growth and Linear Growth models. Please see the book chapter for additional interpretations and insights about the analyses.
Preliminaries
Loading libraries used in this script.
Data Preparation and Description
For our examples, we use the mathematics achievement scores from the NLSY-CYA Long Data.
Load the repeated measures data
#set filepath for data file
filepath <- "https://raw.githubusercontent.com/The-Change-Lab/collaborations/main/GrowthModeling/nlsy_math_long_R.dat"
#read in the text data file using the url() function
dat <- read.table(file=url(filepath),
na.strings = ".") #indicates the missing data designator
#copy data with new name
nlsy_math_long <- dat
#Add names the columns of the data set
names(nlsy_math_long) = c('id' , 'female', 'lb_wght',
'anti_k1', 'math' , 'grade' ,
'occ' , 'age' , 'men' ,
'spring' , 'anti')
#subset to the variables of interest
nlsy_math_long <- nlsy_math_long[ ,c("id", "math", "grade", "age")]
#view the first few observations in the data set
head(nlsy_math_long, 10)## id math grade age
## 1 201 38 3 111
## 2 201 55 5 135
## 3 303 26 2 121
## 4 303 33 5 145
## 5 2702 56 2 100
## 6 2702 58 4 125
## 7 2702 80 8 173
## 8 4303 41 3 115
## 9 4303 58 4 135
## 10 5002 46 4 117
Our specific interest is in how the repeated measures of
math change across age. Note that
age is measured in months.
Longitudinal Plot of Math across Grade at Testing
As noted in Chapter 2 , it is important to plot the data to obtain a
better understanding of the structure and form of the observed
phenomenon. Here, we want to examine the data to see how the repeated
measures of math are structured with respect to
age.
#intraindividual change trajectories
nlsy_math_long %>% #data set
ggplot(aes(x = age, y = math, group = id)) + #setting variables
geom_point(size=.5) + #adding points to plot
geom_line() + #adding lines to plot
theme_bw() + #changing style/background
#setting the x-axis with breaks and labels
scale_x_continuous(name = "Age (Months) at Testing") +
#setting the y-axis with limits breaks and labels
scale_y_continuous(limits=c(10,90),
breaks = c(10,30,50,70,90),
name = "PIAT Mathematics")Note that each individual provides data at unique ages and for a unique span of age.
Of note, the data do not go back to age = 0 months. Thus, we will
rescale age into years and center at age 8 years.
Multilevel: Continuous Time Linear Growth Model
To capture the systematic growth in the mathematics scores over age.
We use the continuous time age variable, so that the rate
of growth indicates (linear) age-related change in mathematics
scores.
Using the notation from the lecture, the model equations are …
\[y_{it} = b_{1i}1 + b_{2i}((age_{it}/12)-8) + u_{it}\] \[b_{1i} = \beta_{1} + d_{1i}\] and \[b_{2i} = \beta_{2} + d_{2i}\]
Substitution of one equation into the other we get … \[y_{it} = (\beta_{1} + d_{1i}) + (\beta_{2} + d_{2i})((age_{it}/12)-8) + u_{it}\]
where \[u_{it} \tilde{} N(0,\sigma^{2}_{u})\], and \[\left[\begin{array} {r} d_{1i} \\ d_{2i} \end{array}\right] \tilde{} N(\left[\begin{array} {r} 0 \\ 0 \end{array}\right] ,\left[\begin{array} {r} \sigma^{2}_{d_1} & \sigma_{d_1d_2} \\ \sigma_{d_1d_2} & \sigma^{2}_{d_2} \end{array}\right])\].
This is then the model that gets coded into the multilevel modeling functions. This can be done in a variety of ways. We concentrate on use of the nlme:nlme() function, with an explicit coding style.
Linear Growth Model using nlme() function
Note that in fitting the linear growth model, we recenter the time variable within the model so that the intercept is the expected score at Age 8 years.
#fitting linear growth model and assigning it to an object
lg_math_age_nlme <- nlme::nlme(math~(beta_1+d_1i)+(beta_2+d_2i)*((age/12)-8),
data=nlsy_math_long,
fixed=beta_1+beta_2~1,
random=d_1i+d_2i~1,
group=~id,
start=c(beta_1=35,beta_2=4),
na.action = na.exclude)
#obtaining summary of the model using the object we just created
summary(lg_math_age_nlme)## Nonlinear mixed-effects model fit by maximum likelihood
## Model: math ~ (beta_1 + d_1i) + (beta_2 + d_2i) * ((age/12) - 8)
## Data: nlsy_math_long
## AIC BIC logLik
## 15855.14 15889.37 -7921.568
##
## Random effects:
## Formula: list(d_1i ~ 1, d_2i ~ 1)
## Level: id
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## d_1i 8.0516834 d_1i
## d_2i 0.8216866 0.2
## Residual 5.6705745
##
## Fixed effects: beta_1 + beta_2 ~ 1
## Value Std.Error DF t-value p-value
## beta_1 35.17313 0.3463334 1288 101.55858 0
## beta_2 4.25656 0.0804631 1288 52.90071 0
## Correlation:
## beta_1
## beta_2 -0.453
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.382126438 -0.516299284 0.003860197 0.528128481 2.651445554
##
## Number of Observations: 2221
## Number of Groups: 932
## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## beta_1 34.493998 35.173131 35.852265
## beta_2 4.098775 4.256557 4.414339
##
## Random Effects:
## Level: id
## lower est. upper
## sd(d_1i) 7.4015676 8.0516834 8.7589019
## sd(d_2i) 0.5480003 0.8216866 1.2320592
## cor(d_1i,d_2i) -0.1870258 0.1997317 0.5328806
##
## Within-group standard error:
## lower est. upper
## 5.383789 5.670574 5.972637
We get information about how the model fits the data, as well as some information about the linear growth process. We see from the model that there is evidence of linear growth in children’s mathematics scores across age. More detailed interpretation is given in the accompanying video.
Plotting the predicted trajectories
#obtaining predicted scores for individuals
nlsy_math_long$pred_lg <- predict(lg_math_age_nlme)
#obtaining predicted scores for prototype
nlsy_math_long$proto_lg <- predict(lg_math_age_nlme, level=0)
#plotting predicted trajectories
#intraindividual change trajectories
nlsy_math_long %>%
ggplot(aes(x = age/12-8, y = pred_lg, group = id)) +
geom_line(color="black") +
geom_line(aes(x = (age/12-8), y = proto_lg),
color="red", linewidth=2) +
theme_bw() + #changing style/background
#setting the x-axis with breaks and labels
scale_x_continuous(name = "Age at Testing (centered at 8 years)") +
#setting the y-axis with limits breaks and labels
scale_y_continuous(limits=c(10,90),
breaks=c(10,30,50,70,90),
name="Predicted PIAT Mathematics")Note that the x-axis here is scaled to match the specific analysis.
For interpretation we may want to replot on the original time scaling, and plot the prototypical predicted trajectory derived from the linear growth model against the raw data.
#intraindividual change trajectories
#plotting predicted trajectory
nlsy_math_long %>%
ggplot(aes(x = age/12, y = math, group = id)) +
geom_point(color="gray", size=.5) +
geom_line(color="gray") +
geom_line(aes(x = age/12, y = proto_lg),
color="red", linewidth=2) +
theme_bw() + #changing style/background
#setting the x-axis with breaks and labels
scale_x_continuous(name = "Age (Years)") +
#setting the y-axis with limits breaks and labels
scale_y_continuous(limits=c(10,90),
breaks = c(10,30,50,70,90),
name = "Predicted PIAT Mathematics")SEM: Time Window Approach Linear Growth Model
Analysis of continuous time metrics is not so straightforward in SEM.
The time window approach is one potential method for analyzing data with individually varying measurement occasions. Essentially, the time window approach aims to approximate the individually varying time metric on a discrete scale. For example, this can be achieved by rounding time/age to the nearest half- or quarter-year.
This method is of course still an approximation of time. One can achieve greater precision by using smaller windows, but if the data matrix becomes too sparse, estimation is difficult.
In this example, the time windows are defined as half-years. So, we take our long data, round age to the nearest half-year, and convert to wide format for use in SEM framework.
Preparation of Time-Windowed Data
Implementing the time-windowing and reshaping of data
#creating new age variable scaled in years
nlsy_math_long$ageyr <- (nlsy_math_long$age/12)
head(nlsy_math_long)## id math grade age pred_lg proto_lg ageyr
## 1 201 38 3 111 41.85670 40.49383 9.250000
## 2 201 55 5 135 50.77440 49.00694 11.250000
## 3 303 26 2 121 29.07883 44.04096 10.083333
## 4 303 33 5 145 36.24612 52.55407 12.083333
## 5 2702 56 2 100 49.77825 36.59198 8.333333
## 6 2702 58 4 125 59.56828 45.45981 10.416667
#rounding to nearest half-year
#multiplied by 10 to remove decimal for easy conversion to wide
nlsy_math_long$agewindow <- plyr::round_any(nlsy_math_long$ageyr*10, 5)
head(nlsy_math_long)## id math grade age pred_lg proto_lg ageyr agewindow
## 1 201 38 3 111 41.85670 40.49383 9.250000 90
## 2 201 55 5 135 50.77440 49.00694 11.250000 110
## 3 303 26 2 121 29.07883 44.04096 10.083333 100
## 4 303 33 5 145 36.24612 52.55407 12.083333 120
## 5 2702 56 2 100 49.77825 36.59198 8.333333 85
## 6 2702 58 4 125 59.56828 45.45981 10.416667 105
#reshaping long to wide (just variables of interest)
nlsy_math_wide <- nlsy_math_long %>%
select(id, math, agewindow) %>%
pivot_wider(id_cols="id",
names_from = "agewindow",
names_prefix = "math",
values_from = "math")
#reordering columns for easy viewing
nlsy_math_wide <- nlsy_math_wide[ ,c("id", "math70", "math75",
"math80", "math85", "math90", "math95",
"math100", "math105", "math110", "math115",
"math120", "math125", "math130", "math135",
"math140", "math145")]
#looking at the data
head(nlsy_math_wide)## # A tibble: 6 × 17
## id math70 math75 math80 math85 math90 math95 math100 math105 math110
## <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 201 NA NA NA NA 38 NA NA NA 55
## 2 303 NA NA NA NA NA NA 26 NA NA
## 3 2702 NA NA NA 56 NA NA NA 58 NA
## 4 4303 NA NA NA NA NA 41 NA NA 58
## 5 5002 NA NA NA NA NA NA 46 NA NA
## 6 5005 NA NA 35 NA NA 50 NA NA NA
## # ℹ 7 more variables: math115 <int>, math120 <int>, math125 <int>,
## # math130 <int>, math135 <int>, math140 <int>, math145 <int>
The data are now time-windowed and ready for use in SEM framework. Typically one would explore a few different time-windows and check to see what the coverage looks like in order to find a suitable window-size that maintains specificity but still has sufficient coverage for accurate model estimation.
Linear Growth Model with Time-Windowed Age
Specifying the SEM model
#writing out linear growth model in full SEM way
lg_math_age_lavaan_model <- '
# latent variable definitions
#intercept (note intercept is a reserved term)
eta_1 =~ 1*math70 +
1*math75 +
1*math80 +
1*math85 +
1*math90 +
1*math95 +
1*math100 +
1*math105 +
1*math110 +
1*math115 +
1*math120 +
1*math125 +
1*math130 +
1*math135 +
1*math140 +
1*math145
#linear slope (note intercept is a reserved term)
eta_2 =~ -1*math70 +
-0.5*math75 +
0*math80 +
0.5*math85 +
1*math90 +
1.5*math95 +
2*math100 +
2.5*math105 +
3*math110 +
3.5*math115 +
4*math120 +
4.5*math125 +
5*math130 +
5.5*math135 +
6*math140 +
6.5*math145
# factor variances
eta_1 ~~ start(65)*eta_1
eta_2 ~~ start(.75)*eta_2
# covariances among factors
eta_1 ~~ start(1.2)*eta_2
# manifest variances (made equivalent by naming theta)
math70 ~~ start(35)*theta*math70
math75 ~~ theta*math75
math80 ~~ theta*math80
math85 ~~ theta*math85
math90 ~~ theta*math90
math95 ~~ theta*math95
math100 ~~ theta*math100
math105 ~~ theta*math105
math110 ~~ theta*math110
math115 ~~ theta*math115
math120 ~~ theta*math120
math125 ~~ theta*math125
math130 ~~ theta*math130
math135 ~~ theta*math135
math140 ~~ theta*math140
math145 ~~ theta*math145
# manifest means (fixed at zero)
math70 ~ 0*1
math75 ~ 0*1
math80 ~ 0*1
math85 ~ 0*1
math90 ~ 0*1
math95 ~ 0*1
math100 ~ 0*1
math105 ~ 0*1
math110 ~ 0*1
math115 ~ 0*1
math120 ~ 0*1
math125 ~ 0*1
math130 ~ 0*1
math135 ~ 0*1
math140 ~ 0*1
math145 ~ 0*1
# factor means (estimated freely)
eta_1 ~ start(35)*1
eta_2 ~ start(4)*1
' #end of model definitionAnd fit the model to the data … using maximum likelihood estimator and full information maximum likelihood to handle missing values.
#estimating the model using sem() function
lg_math_age_lavaan_fit <- sem(lg_math_age_lavaan_model,
data = nlsy_math_wide,
meanstructure = TRUE,
estimator = "ML",
missing = "fiml")## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
## due to missing values, some pairwise combinations have 0%
## coverage; use lavInspect(fit, "coverage") to investigate.
## Warning in lav_mvnorm_missing_h1_estimate_moments(Y = X[[g]], wt = WT[[g]], : lavaan WARNING:
## Maximum number of iterations reached when computing the sample
## moments using EM; use the em.h1.iter.max= argument to increase the
## number of iterations
We can look more specifically at the numerical summaries of the model.
#obtaining summary of the model using the object we just created
summary(lg_math_age_lavaan_fit, fit.measures=TRUE)## Warning in pchisq(X2, df = df, ncp = ncp): NaNs produced
## Warning in pchisq(X2, df = df, ncp = ncp): NaNs produced
## lavaan 0.6.16 ended normally after 27 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 21
## Number of equality constraints 15
##
## Number of observations 932
## Number of missing patterns 139
##
## Model Test User Model:
##
## Test statistic 295.028
## Degrees of freedom 146
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 1053.342
## Degrees of freedom 120
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.840
## Tucker-Lewis Index (TLI) 0.869
##
## Robust Comparative Fit Index (CFI) 0.003
## Robust Tucker-Lewis Index (TLI) 0.181
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -7928.559
## Loglikelihood unrestricted model (H1) -7781.045
##
## Akaike (AIC) 15869.117
## Bayesian (BIC) 15898.141
## Sample-size adjusted Bayesian (SABIC) 15879.086
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.033
## 90 Percent confidence interval - lower 0.028
## 90 Percent confidence interval - upper 0.039
## P-value H_0: RMSEA <= 0.050 1.000
## P-value H_0: RMSEA >= 0.080 0.000
##
## Robust RMSEA 4.193
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value H_0: Robust RMSEA <= 0.050 NaN
## P-value H_0: Robust RMSEA >= 0.080 NaN
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.314
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## eta_1 =~
## math70 1.000
## math75 1.000
## math80 1.000
## math85 1.000
## math90 1.000
## math95 1.000
## math100 1.000
## math105 1.000
## math110 1.000
## math115 1.000
## math120 1.000
## math125 1.000
## math130 1.000
## math135 1.000
## math140 1.000
## math145 1.000
## eta_2 =~
## math70 -1.000
## math75 -0.500
## math80 0.000
## math85 0.500
## math90 1.000
## math95 1.500
## math100 2.000
## math105 2.500
## math110 3.000
## math115 3.500
## math120 4.000
## math125 4.500
## math130 5.000
## math135 5.500
## math140 6.000
## math145 6.500
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## eta_1 ~~
## eta_2 1.157 1.010 1.146 0.252
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## .math70 0.000
## .math75 0.000
## .math80 0.000
## .math85 0.000
## .math90 0.000
## .math95 0.000
## .math100 0.000
## .math105 0.000
## .math110 0.000
## .math115 0.000
## .math120 0.000
## .math125 0.000
## .math130 0.000
## .math135 0.000
## .math140 0.000
## .math145 0.000
## eta_1 35.236 0.347 101.512 0.000
## eta_2 4.229 0.081 51.910 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## eta_1 65.063 5.503 11.824 0.000
## eta_2 0.725 0.277 2.616 0.009
## .math70 (thet) 32.337 1.695 19.083 0.000
## .math75 (thet) 32.337 1.695 19.083 0.000
## .math80 (thet) 32.337 1.695 19.083 0.000
## .math85 (thet) 32.337 1.695 19.083 0.000
## .math90 (thet) 32.337 1.695 19.083 0.000
## .math95 (thet) 32.337 1.695 19.083 0.000
## .math100 (thet) 32.337 1.695 19.083 0.000
## .math105 (thet) 32.337 1.695 19.083 0.000
## .math110 (thet) 32.337 1.695 19.083 0.000
## .math115 (thet) 32.337 1.695 19.083 0.000
## .math120 (thet) 32.337 1.695 19.083 0.000
## .math125 (thet) 32.337 1.695 19.083 0.000
## .math130 (thet) 32.337 1.695 19.083 0.000
## .math135 (thet) 32.337 1.695 19.083 0.000
## .math140 (thet) 32.337 1.695 19.083 0.000
## .math145 (thet) 32.337 1.695 19.083 0.000
We get information about how the model fits the data, as well as some information about the data itself. Interpretation is covered in more detail in the accompanying video.
Linear Growth Fitted Model Diagram
We can use the fitted model object to make a diagram (and also check if we specified and estimated the model as intended).
Conclusion
This tutorial has presented how continuous time metrics can be used in both the multilevel and SEM frameworks.
Yay for Continuous Time! It keeps things moving!