Growth Model with Continuous Time Variable

Multilevel & SEM Implementation in R

Overview

This tutorial illustrates fitting of linear growth models with a continuous time variable in the multilevel and SEM frameworks in R.

Example data and code are drawn from Chapter 4 of Grimm, Ram, and Estabrook (2017). Specifically, we examine change in children’s mathematics achievement across age using the NLSY-CYA Dataset. We fit both No Growth and Linear Growth models. Please see the book chapter for additional interpretations and insights about the analyses.

Preliminaries

Loading libraries used in this script.

library(psych)      #basic functions
library(tidyverse)  #ggplot2, stringr, dplyr, readr, forcats, tidyr, purr
library(nlme)       #mixed effects models
library(lme4)       #mixed effects models
library(lavaan)     #SEM 
library(semPlot)    #SEM diagrams

Data Preparation and Description

For our examples, we use the mathematics achievement scores from the NLSY-CYA Long Data.

Load the repeated measures data

#set filepath for data file
filepath <- "https://raw.githubusercontent.com/The-Change-Lab/collaborations/main/GrowthModeling/nlsy_math_long_R.dat"
#read in the text data file using the url() function
dat <- read.table(file=url(filepath),
                  na.strings = ".")  #indicates the missing data designator
#copy data with new name 
nlsy_math_long <- dat  

#Add names the columns of the data set
names(nlsy_math_long) = c('id'     , 'female', 'lb_wght', 
                          'anti_k1', 'math'  , 'grade'  ,
                          'occ'    , 'age'   , 'men'    ,
                          'spring' , 'anti')

#subset to the variables of interest
nlsy_math_long <- nlsy_math_long[ ,c("id", "math", "grade", "age")]
#view the first few observations in the data set 
head(nlsy_math_long, 10)

##      id math grade age
## 1   201   38     3 111
## 2   201   55     5 135
## 3   303   26     2 121
## 4   303   33     5 145
## 5  2702   56     2 100
## 6  2702   58     4 125
## 7  2702   80     8 173
## 8  4303   41     3 115
## 9  4303   58     4 135
## 10 5002   46     4 117

Our specific interest is in how the repeated measures of math change across age. Note that age is measured in months.

Longitudinal Plot of Math across Grade at Testing

As noted in Chapter 2 , it is important to plot the data to obtain a better understanding of the structure and form of the observed phenomenon. Here, we want to examine the data to see how the repeated measures of math are structured with respect to age.

#intraindividual change trajectories
nlsy_math_long %>%                              #data set
  ggplot(aes(x = age, y = math, group = id)) +  #setting variables
  geom_point(size=.5) +                         #adding points to plot
  geom_line() +                                 #adding lines to plot
  theme_bw() +                                  #changing style/background
  #setting the x-axis with breaks and labels
  scale_x_continuous(name = "Age (Months) at Testing") +    
  #setting the y-axis with limits breaks and labels
  scale_y_continuous(limits=c(10,90), 
                     breaks = c(10,30,50,70,90), 
                     name = "PIAT Mathematics")

Note that each individual provides data at unique ages and for a unique span of age.

Of note, the data do not go back to age = 0 months. Thus, we will rescale age into years and center at age 8 years.

Multilevel: Continuous Time Linear Growth Model

To capture the systematic growth in the mathematics scores over age. We use the continuous time age variable, so that the rate of growth indicates (linear) age-related change in mathematics scores.

Using the notation from the lecture, the model equations are …

\[y_{it} = b_{1i}1 + b_{2i}((age_{it}/12)-8) + u_{it}\] \[b_{1i} = \beta_{1} + d_{1i}\] and \[b_{2i} = \beta_{2} + d_{2i}\]

Substitution of one equation into the other we get … \[y_{it} = (\beta_{1} + d_{1i}) + (\beta_{2} + d_{2i})((age_{it}/12)-8) + u_{it}\]

where \[u_{it} \tilde{} N(0,\sigma^{2}_{u})\], and \[\left[\begin{array} {r} d_{1i} \\ d_{2i} \end{array}\right] \tilde{} N(\left[\begin{array} {r} 0 \\ 0 \end{array}\right] ,\left[\begin{array} {r} \sigma^{2}_{d_1} & \sigma_{d_1d_2} \\ \sigma_{d_1d_2} & \sigma^{2}_{d_2} \end{array}\right])\].

This is then the model that gets coded into the multilevel modeling functions. This can be done in a variety of ways. We concentrate on use of the nlme:nlme() function, with an explicit coding style.

Linear Growth Model using `nlme()` function

Note that in fitting the linear growth model, we recenter the time variable within the model so that the intercept is the expected score at Age 8 years.

#fitting linear growth model and assigning it to an object
lg_math_age_nlme <- nlme::nlme(math~(beta_1+d_1i)+(beta_2+d_2i)*((age/12)-8),  
                         data=nlsy_math_long,                      
                         fixed=beta_1+beta_2~1,                      
                         random=d_1i+d_2i~1,
                         group=~id,                     
                         start=c(beta_1=35,beta_2=4),
                         na.action = na.exclude)

#obtaining summary of the model using the object we just created
summary(lg_math_age_nlme)

## Nonlinear mixed-effects model fit by maximum likelihood
##   Model: math ~ (beta_1 + d_1i) + (beta_2 + d_2i) * ((age/12) - 8) 
##   Data: nlsy_math_long 
##        AIC      BIC    logLik
##   15855.14 15889.37 -7921.568
## 
## Random effects:
##  Formula: list(d_1i ~ 1, d_2i ~ 1)
##  Level: id
##  Structure: General positive-definite, Log-Cholesky parametrization
##          StdDev    Corr
## d_1i     8.0516834 d_1i
## d_2i     0.8216866 0.2 
## Residual 5.6705745     
## 
## Fixed effects:  beta_1 + beta_2 ~ 1 
##           Value Std.Error   DF   t-value p-value
## beta_1 35.17313 0.3463334 1288 101.55858       0
## beta_2  4.25656 0.0804631 1288  52.90071       0
##  Correlation: 
##        beta_1
## beta_2 -0.453
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -3.382126438 -0.516299284  0.003860197  0.528128481  2.651445554 
## 
## Number of Observations: 2221
## Number of Groups: 932

#obtaining confidence intervals for parameters
nlme::intervals(lg_math_age_nlme)

## Approximate 95% confidence intervals
## 
##  Fixed effects:
##            lower      est.     upper
## beta_1 34.493998 35.173131 35.852265
## beta_2  4.098775  4.256557  4.414339
## 
##  Random Effects:
##   Level: id 
##                     lower      est.     upper
## sd(d_1i)        7.4015676 8.0516834 8.7589019
## sd(d_2i)        0.5480003 0.8216866 1.2320592
## cor(d_1i,d_2i) -0.1870258 0.1997317 0.5328806
## 
##  Within-group standard error:
##    lower     est.    upper 
## 5.383789 5.670574 5.972637

We get information about how the model fits the data, as well as some information about the linear growth process. We see from the model that there is evidence of linear growth in children’s mathematics scores across age. More detailed interpretation is given in the accompanying video.

Plotting the predicted trajectories

#obtaining predicted scores for individuals
nlsy_math_long$pred_lg <- predict(lg_math_age_nlme)

#obtaining predicted scores for prototype
nlsy_math_long$proto_lg <- predict(lg_math_age_nlme, level=0)

#plotting predicted trajectories
#intraindividual change trajectories
nlsy_math_long %>%
  ggplot(aes(x = age/12-8, y = pred_lg, group = id)) +
  geom_line(color="black") +
  geom_line(aes(x = (age/12-8), y = proto_lg), 
            color="red", linewidth=2) + 
  theme_bw() +   #changing style/background
  #setting the x-axis with breaks and labels
  scale_x_continuous(name = "Age at Testing (centered at 8 years)") +    
  #setting the y-axis with limits breaks and labels
  scale_y_continuous(limits=c(10,90), 
                     breaks=c(10,30,50,70,90), 
                     name="Predicted PIAT Mathematics")

Note that the x-axis here is scaled to match the specific analysis.

For interpretation we may want to replot on the original time scaling, and plot the prototypical predicted trajectory derived from the linear growth model against the raw data.

#intraindividual change trajectories
#plotting predicted trajectory
nlsy_math_long %>%
  ggplot(aes(x = age/12, y = math, group = id)) +
  geom_point(color="gray", size=.5) + 
  geom_line(color="gray") +
  geom_line(aes(x = age/12, y = proto_lg), 
            color="red", linewidth=2) + 
  theme_bw() +   #changing style/background
  #setting the x-axis with breaks and labels
  scale_x_continuous(name = "Age (Years)") +    
  #setting the y-axis with limits breaks and labels
  scale_y_continuous(limits=c(10,90), 
                     breaks = c(10,30,50,70,90), 
                     name = "Predicted PIAT Mathematics")

SEM: Time Window Approach Linear Growth Model

Analysis of continuous time metrics is not so straightforward in SEM.

The time window approach is one potential method for analyzing data with individually varying measurement occasions. Essentially, the time window approach aims to approximate the individually varying time metric on a discrete scale. For example, this can be achieved by rounding time/age to the nearest half- or quarter-year.

This method is of course still an approximation of time. One can achieve greater precision by using smaller windows, but if the data matrix becomes too sparse, estimation is difficult.

In this example, the time windows are defined as half-years. So, we take our long data, round age to the nearest half-year, and convert to wide format for use in SEM framework.

Preparation of Time-Windowed Data

Implementing the time-windowing and reshaping of data

#creating new age variable scaled in years 
nlsy_math_long$ageyr <- (nlsy_math_long$age/12)
head(nlsy_math_long)

##     id math grade age  pred_lg proto_lg     ageyr
## 1  201   38     3 111 41.85670 40.49383  9.250000
## 2  201   55     5 135 50.77440 49.00694 11.250000
## 3  303   26     2 121 29.07883 44.04096 10.083333
## 4  303   33     5 145 36.24612 52.55407 12.083333
## 5 2702   56     2 100 49.77825 36.59198  8.333333
## 6 2702   58     4 125 59.56828 45.45981 10.416667

#rounding to nearest half-year 
#multiplied by 10 to remove decimal for easy conversion to wide
nlsy_math_long$agewindow <- plyr::round_any(nlsy_math_long$ageyr*10, 5)
head(nlsy_math_long)

##     id math grade age  pred_lg proto_lg     ageyr agewindow
## 1  201   38     3 111 41.85670 40.49383  9.250000        90
## 2  201   55     5 135 50.77440 49.00694 11.250000       110
## 3  303   26     2 121 29.07883 44.04096 10.083333       100
## 4  303   33     5 145 36.24612 52.55407 12.083333       120
## 5 2702   56     2 100 49.77825 36.59198  8.333333        85
## 6 2702   58     4 125 59.56828 45.45981 10.416667       105

#reshaping long to wide (just variables of interest)
nlsy_math_wide <- nlsy_math_long %>%
  select(id, math, agewindow) %>%
  pivot_wider(id_cols="id",
              names_from = "agewindow",
              names_prefix = "math",
              values_from = "math")

#reordering columns for easy viewing
nlsy_math_wide <- nlsy_math_wide[ ,c("id", "math70", "math75", 
                                     "math80", "math85", "math90", "math95", 
                                     "math100", "math105", "math110", "math115", 
                                     "math120", "math125", "math130", "math135", 
                                     "math140", "math145")]
#looking at the data
head(nlsy_math_wide)

## # A tibble: 6 × 17
##      id math70 math75 math80 math85 math90 math95 math100 math105 math110
##   <int>  <int>  <int>  <int>  <int>  <int>  <int>   <int>   <int>   <int>
## 1   201     NA     NA     NA     NA     38     NA      NA      NA      55
## 2   303     NA     NA     NA     NA     NA     NA      26      NA      NA
## 3  2702     NA     NA     NA     56     NA     NA      NA      58      NA
## 4  4303     NA     NA     NA     NA     NA     41      NA      NA      58
## 5  5002     NA     NA     NA     NA     NA     NA      46      NA      NA
## 6  5005     NA     NA     35     NA     NA     50      NA      NA      NA
## # ℹ 7 more variables: math115 <int>, math120 <int>, math125 <int>,
## #   math130 <int>, math135 <int>, math140 <int>, math145 <int>

The data are now time-windowed and ready for use in SEM framework. Typically one would explore a few different time-windows and check to see what the coverage looks like in order to find a suitable window-size that maintains specificity but still has sufficient coverage for accurate model estimation.

Linear Growth Model with Time-Windowed Age

Specifying the SEM model

#writing out linear growth model in full SEM way 
lg_math_age_lavaan_model <- '
  # latent variable definitions
      #intercept (note intercept is a reserved term)
      eta_1 =~ 1*math70 +
                1*math75 +
                1*math80 +
                1*math85 +
                1*math90 +
                1*math95 +
                1*math100 +
                1*math105 +
                1*math110 +
                1*math115 +
                1*math120 +
                1*math125 +
                1*math130 +
                1*math135 +
                1*math140 +
                1*math145 

      #linear slope (note intercept is a reserved term)
      eta_2 =~ -1*math70 +
                -0.5*math75 +
                0*math80 +
                0.5*math85 +
                1*math90 +
                1.5*math95 +
                2*math100 +
                2.5*math105 +
                3*math110 +
                3.5*math115 +
                4*math120 +
                4.5*math125 +
                5*math130 +
                5.5*math135 +
                6*math140 +
                6.5*math145

  # factor variances
      eta_1 ~~ start(65)*eta_1
      eta_2 ~~ start(.75)*eta_2

  # covariances among factors 
      eta_1 ~~ start(1.2)*eta_2

  # manifest variances (made equivalent by naming theta)
      math70 ~~ start(35)*theta*math70
      math75 ~~ theta*math75
      math80 ~~ theta*math80
      math85 ~~ theta*math85
      math90 ~~ theta*math90
      math95 ~~ theta*math95
      math100 ~~ theta*math100
      math105 ~~ theta*math105
      math110 ~~ theta*math110
      math115 ~~ theta*math115
      math120 ~~ theta*math120
      math125 ~~ theta*math125
      math130 ~~ theta*math130
      math135 ~~ theta*math135
      math140 ~~ theta*math140
      math145 ~~ theta*math145
      
  # manifest means (fixed at zero)
      math70 ~ 0*1
      math75 ~ 0*1
      math80 ~ 0*1
      math85 ~ 0*1
      math90 ~ 0*1
      math95 ~ 0*1
      math100 ~ 0*1
      math105 ~ 0*1
      math110 ~ 0*1
      math115 ~ 0*1
      math120 ~ 0*1
      math125 ~ 0*1
      math130 ~ 0*1
      math135 ~ 0*1
      math140 ~ 0*1
      math145 ~ 0*1

  # factor means (estimated freely)
      eta_1 ~ start(35)*1
      eta_2 ~ start(4)*1
' #end of model definition

And fit the model to the data … using maximum likelihood estimator and full information maximum likelihood to handle missing values.

#estimating the model using sem() function
lg_math_age_lavaan_fit <- sem(lg_math_age_lavaan_model, 
                          data = nlsy_math_wide,
                          meanstructure = TRUE,
                          estimator = "ML",
                          missing = "fiml")

## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING:
##     due to missing values, some pairwise combinations have 0%
##     coverage; use lavInspect(fit, "coverage") to investigate.

## Warning in lav_mvnorm_missing_h1_estimate_moments(Y = X[[g]], wt = WT[[g]], : lavaan WARNING:
##     Maximum number of iterations reached when computing the sample
##     moments using EM; use the em.h1.iter.max= argument to increase the
##     number of iterations

We can look more specifically at the numerical summaries of the model.

#obtaining summary of the model using the object we just created
summary(lg_math_age_lavaan_fit, fit.measures=TRUE)

## Warning in pchisq(X2, df = df, ncp = ncp): NaNs produced

## Warning in pchisq(X2, df = df, ncp = ncp): NaNs produced

## lavaan 0.6.16 ended normally after 27 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        21
##   Number of equality constraints                    15
## 
##   Number of observations                           932
##   Number of missing patterns                       139
## 
## Model Test User Model:
##                                                       
##   Test statistic                               295.028
##   Degrees of freedom                               146
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                              1053.342
##   Degrees of freedom                               120
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.840
##   Tucker-Lewis Index (TLI)                       0.869
##                                                       
##   Robust Comparative Fit Index (CFI)             0.003
##   Robust Tucker-Lewis Index (TLI)                0.181
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -7928.559
##   Loglikelihood unrestricted model (H1)      -7781.045
##                                                       
##   Akaike (AIC)                               15869.117
##   Bayesian (BIC)                             15898.141
##   Sample-size adjusted Bayesian (SABIC)      15879.086
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.033
##   90 Percent confidence interval - lower         0.028
##   90 Percent confidence interval - upper         0.039
##   P-value H_0: RMSEA <= 0.050                    1.000
##   P-value H_0: RMSEA >= 0.080                    0.000
##                                                       
##   Robust RMSEA                                   4.193
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.000
##   P-value H_0: Robust RMSEA <= 0.050               NaN
##   P-value H_0: Robust RMSEA >= 0.080               NaN
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.314
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   eta_1 =~                                            
##     math70            1.000                           
##     math75            1.000                           
##     math80            1.000                           
##     math85            1.000                           
##     math90            1.000                           
##     math95            1.000                           
##     math100           1.000                           
##     math105           1.000                           
##     math110           1.000                           
##     math115           1.000                           
##     math120           1.000                           
##     math125           1.000                           
##     math130           1.000                           
##     math135           1.000                           
##     math140           1.000                           
##     math145           1.000                           
##   eta_2 =~                                            
##     math70           -1.000                           
##     math75           -0.500                           
##     math80            0.000                           
##     math85            0.500                           
##     math90            1.000                           
##     math95            1.500                           
##     math100           2.000                           
##     math105           2.500                           
##     math110           3.000                           
##     math115           3.500                           
##     math120           4.000                           
##     math125           4.500                           
##     math130           5.000                           
##     math135           5.500                           
##     math140           6.000                           
##     math145           6.500                           
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   eta_1 ~~                                            
##     eta_2             1.157    1.010    1.146    0.252
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .math70            0.000                           
##    .math75            0.000                           
##    .math80            0.000                           
##    .math85            0.000                           
##    .math90            0.000                           
##    .math95            0.000                           
##    .math100           0.000                           
##    .math105           0.000                           
##    .math110           0.000                           
##    .math115           0.000                           
##    .math120           0.000                           
##    .math125           0.000                           
##    .math130           0.000                           
##    .math135           0.000                           
##    .math140           0.000                           
##    .math145           0.000                           
##     eta_1            35.236    0.347  101.512    0.000
##     eta_2             4.229    0.081   51.910    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     eta_1            65.063    5.503   11.824    0.000
##     eta_2             0.725    0.277    2.616    0.009
##    .math70  (thet)   32.337    1.695   19.083    0.000
##    .math75  (thet)   32.337    1.695   19.083    0.000
##    .math80  (thet)   32.337    1.695   19.083    0.000
##    .math85  (thet)   32.337    1.695   19.083    0.000
##    .math90  (thet)   32.337    1.695   19.083    0.000
##    .math95  (thet)   32.337    1.695   19.083    0.000
##    .math100 (thet)   32.337    1.695   19.083    0.000
##    .math105 (thet)   32.337    1.695   19.083    0.000
##    .math110 (thet)   32.337    1.695   19.083    0.000
##    .math115 (thet)   32.337    1.695   19.083    0.000
##    .math120 (thet)   32.337    1.695   19.083    0.000
##    .math125 (thet)   32.337    1.695   19.083    0.000
##    .math130 (thet)   32.337    1.695   19.083    0.000
##    .math135 (thet)   32.337    1.695   19.083    0.000
##    .math140 (thet)   32.337    1.695   19.083    0.000
##    .math145 (thet)   32.337    1.695   19.083    0.000

We get information about how the model fits the data, as well as some information about the data itself. Interpretation is covered in more detail in the accompanying video.

Linear Growth Fitted Model Diagram

We can use the fitted model object to make a diagram (and also check if we specified and estimated the model as intended).

#diagram of fitted model
semPaths(lg_math_age_lavaan_fit, what = "path", whatLabels = "par")

Conclusion

This tutorial has presented how continuous time metrics can be used in both the multilevel and SEM frameworks.

Yay for Continuous Time! It keeps things moving!