Confirmatory Factor Analysis - Basic

Overview

In this tutorial we walk through the very basics of conducting confirmatory factor analysis (CFA) in R. This is not a comprehensive coverage, just enough to get one started.

Prelim - Loading libraries used in this script.

library(psych)       #psychometrics basic functions

## Warning: package 'psych' was built under R version 4.3.3

library(corrplot)    #plotting correlation matrices
library(lavaan)      #fitting structural equation models
library(semPlot)     #automatically making diagrams 
library(tidyverse)   #ggplot2, dplyr, tidyr, readr, purr, tibble, stringr, forcats

Introduction to the Factor Analyis Model

Recall that the basic factor analysis model is written as series of equations of the form …

\[y_{pi} = \lambda_{pq} f_{qi} + u_{pi}\] where \(y_{pi}\) is individual i’s score on the pth observed variable, \(f_{qi}\) is individual i’s score on the qth latent common factor, \(u_{pi}\) is individual i’s score on the pth latent unique factor, and \(\lambda_{pq}\) is the factor loading that indicates the relation between the pth observed variable and the qth latent common factor.

Typically, we have multiple observed variables and one or more common factors. For instance in the 4 variable, 2 factor case we would have …

\[y_{1i} = \lambda_{11} f_{1i} + \lambda_{12} f_{2i} + u_{1i}\] \[y_{2i} = \lambda_{21} f_{1i} + \lambda_{22} f_{2i} + u_{2i}\] \[y_{3i} = \lambda_{31} f_{1i} + \lambda_{32} f_{2i} + u_{3i}\] \[y_{4i} = \lambda_{41} f_{1i} + \lambda_{42} f_{2i} + u_{4i}\]

which can be written in a compact matrix form as

\[ \boldsymbol{Y_{i}} = \boldsymbol{\Lambda}\boldsymbol{F_{i}} + \boldsymbol{U_{i}} \] where \(\boldsymbol{Y_{i}}\) is a \(p\) x 1 vector of observed variable scores, \(\boldsymbol{\Lambda}\) is a p x q matrix of factor loadings, \(\boldsymbol{F_{i}}\) is a \(q\) x 1 vector of common factor scores, and \(\boldsymbol{U_{i}}\) is a p x 1 vector of unique factor scores.

Extension to multiple persons provided for mapping to the observed correlation matrix, \(\boldsymbol{\Sigma} = \boldsymbol{Y}'\boldsymbol{Y}\) and the common factor model becomes

\[ \boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}' + \boldsymbol{\Theta} \] where \(\boldsymbol{\Sigma}\) is a p x p covariance (or correlation) matrix of the observed variables, \(\boldsymbol{\Lambda}\) is a p x q matrix of factor loadings, \(\boldsymbol{\Psi}\) is a q x q covariance matrix of the latent factor variables, and \(\boldsymbol{\Theta}\) is a diagonal matrix of unique factor variances.

Confirmatory Factor Analysis is usually done within a Structural Equation Modeling (SEM) framework. There are a number of SEM packages in R. We currently tend to be using the lavaan package.
Lots of good information and instruction can be found on the package website … http://lavaan.ugent.be , including one of the examples we use here.

Prelim - Reading in Multiobservation Multivariate Data

For this example, we use classic data from Osborne & Sudick (1972) that has been the basis for many examples in Prof. Jack McArdle and colleagues’ papers, courses, and workshops.
The data were obtained from 204 participants (rows) who completed a variety of cognitive ability tests that are part of the WISC (columns).

#set filepath for data file
filepath <- "https://raw.githubusercontent.com/The-Change-Lab/collaborations/refs/heads/main/GrowthModeling/WISC_CFAexample.csv"

#read in the .csv file using the url() function
dat <- read.csv(file=url(filepath), header=TRUE)

#subsetting to the variables of interest 
dat <- dat[,c("id","info_06","comp_06","simi_06","voca_06","picc_06","pica_06","bloc_06","obje_06")]

Lets have a quick look at the data file and the descriptives.

#data structure
head(dat, 10)

##    id  info_06   comp_06   simi_06  voca_06   picc_06  pica_06    bloc_06
## 1   1 31.28655 25.626566 22.932331 22.21491 25.877193 8.310249  9.4098884
## 2   2 13.80117 14.786967  7.581454 15.37281 10.526316 7.417667  3.8915470
## 3   3 34.96970 34.675325 28.051948 26.84091 42.818182 6.443381  4.9586777
## 4   4 24.79532 31.390977  8.208020 20.19737 50.000000 2.708526  8.8676236
## 5   5 25.26316 30.263158 15.977444 35.41667 44.298246 9.295168  8.7400319
## 6   6 15.40230 23.399015 11.453202 20.56034 18.103448 8.832426  1.5047022
## 7   7 15.38012 -1.253133  2.318296 13.00439  8.596491 1.662050 -0.3827751
## 8   8 19.88304  6.704261 14.160401 14.86842 39.912281 6.463527  7.0813397
## 9   9 12.63158 13.847118 10.275689 23.22368 19.736842 6.401970 -0.3189793
## 10 10 23.69048 25.446429 11.415816 21.78571 35.178571 7.675439  7.8571429
##      obje_06
## 1  39.164087
## 2   4.179567
## 3  41.764706
## 4  63.054696
## 5  43.085655
## 6  13.793103
## 7   4.437564
## 8  19.762642
## 9  25.902993
## 10 12.710084

#descriptives
psych::describe(dat)

##         vars   n   mean    sd median trimmed   mad   min    max  range  skew
## id         1 204 102.50 59.03 102.50  102.50 75.61  1.00 204.00 203.00  0.00
## info_06    2 204  19.78  6.12  19.13   19.77  5.53  1.04  34.97  33.93 -0.06
## comp_06    3 204  21.80  9.74  21.90   21.98 10.32 -1.25  46.49  47.74 -0.13
## simi_06    4 204  14.90  7.56  14.54   14.57  7.83 -4.52  37.76  42.28  0.39
## voca_06    5 204  20.40  6.29  19.84   20.11  6.52  2.77  39.66  36.89  0.37
## picc_06    6 204  28.25 12.15  29.20   28.60 12.68 -3.42  55.18  58.60 -0.24
## pica_06    7 204   9.31  8.41   7.40    7.97  5.77 -1.85  54.30  56.14  1.97
## bloc_06    8 204   6.94  6.60   5.81    6.12  4.62 -5.58  42.19  47.78  2.05
## obje_06    9 204  25.26 16.39  22.34   23.89 15.60 -4.54  71.10  75.64  0.66
##         kurtosis   se
## id         -1.22 4.13
## info_06     0.17 0.43
## comp_06    -0.49 0.68
## simi_06    -0.14 0.53
## voca_06     0.02 0.44
## picc_06    -0.40 0.85
## pica_06     5.36 0.59
## bloc_06     6.77 0.46
## obje_06    -0.24 1.15

Example 1: Basic CFA Orientation & Interpretation

Our goal is to code a model that matches an a priori hypothesis about the structure of the data, and evaluate the match between that model, specifically the mean and variance-covariance expectations, and the observed data (i.e., the observed means and variance-covariance matrix).

We go through a series of models.

Of particular interest for what we are doing is the correlation matrix of the first 4 substantive variables in the data set. We can look at both the numeric and visual versions of the correlation matrix.

round(cor(dat[,c("info_06","voca_06","comp_06","simi_06")]),2)

##         info_06 voca_06 comp_06 simi_06
## info_06    1.00    0.56    0.51    0.54
## voca_06    0.56    1.00    0.58    0.44
## comp_06    0.51    0.58    1.00    0.45
## simi_06    0.54    0.44    0.45    1.00

And a compact visual version from the corrplot package.

#visual correlation matrix
corrplot(cor(dat[,c("info_06","voca_06","comp_06","simi_06")]), 
         order = "original", tl.col='black', tl.cex=.75)

We will consider a set of 3 models …
* 0-Factor model * 1-Factor model * 2-Factor model

Each time we engage with … Model Specification, Model Estimation, and Model Evaluation.

0-Factor Model

We start with a 0-factor model - basically a null model. But it is still a set of assumptions. Namely, that scores on one test are NOT related to scores on any of the other tests - i.e., that there are no covariances among the 4 tests.

Model Specification

First, code the model into a lavaan model object. Note that there are a variety of short-hand ways to program these models (i.e, defaults). We take a very explicit and comprehensive approach - where we basically ‘draw’ every single arrow into the model.

Basically, there are four formula types that correspond to different kinds of arrows.

latent variable definition =~ “is measured by” regression ~ “is regressed on” (residual) (co)variance ~~ “is correlated with” intercept ~ 1 “intercept” A complete lavaan model syntax is simply a combination of these formula types, enclosed between single quotes.

We set up a simple zero factor model …

#Model with 0 common factors 
verbal_0factor <- ' #start of model

# latent variable definitions (common factors)
# latent variable variances
# latent variable covariances
# latent variable means

# manifest variable variances (uniquenesses)
  info_06 ~~ info_06
  voca_06 ~~ voca_06
  comp_06 ~~ comp_06
  simi_06 ~~ simi_06
            
# manifest variable covariances (uniquenesses)

#manifest variable means 
  info_06 ~ 1
  voca_06 ~ 1
  comp_06 ~ 1
  simi_06 ~ 1
' #end of model

Model Fitting

Second, we fit the specified model to the data to obtain parameter estimates.

fit0 <- lavaan(verbal_0factor, data=dat, mimic = "mplus")
summary(fit0, standardized=TRUE, fit.measures=TRUE)

## lavaan 0.6.16 ended normally after 8 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         8
## 
##   Number of observations                           204
##   Number of missing patterns                         1
## 
## Model Test User Model:
##                                                       
##   Test statistic                               261.106
##   Degrees of freedom                                 6
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               261.106
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.000
##   Tucker-Lewis Index (TLI)                      -0.000
##                                                       
##   Robust Comparative Fit Index (CFI)             0.000
##   Robust Tucker-Lewis Index (TLI)                0.000
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -2777.649
##   Loglikelihood unrestricted model (H1)      -2647.095
##                                                       
##   Akaike (AIC)                                5571.297
##   Bayesian (BIC)                              5597.842
##   Sample-size adjusted Bayesian (SABIC)       5572.496
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.457
##   90 Percent confidence interval - lower         0.410
##   90 Percent confidence interval - upper         0.505
##   P-value H_0: RMSEA <= 0.050                    0.000
##   P-value H_0: RMSEA >= 0.080                    1.000
##                                                       
##   Robust RMSEA                                   0.457
##   90 Percent confidence interval - lower         0.410
##   90 Percent confidence interval - upper         0.505
##   P-value H_0: Robust RMSEA <= 0.050             0.000
##   P-value H_0: Robust RMSEA >= 0.080             1.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.337
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     info_06          19.776    0.427   46.273    0.000   19.776    3.240
##     voca_06          20.396    0.439   46.416    0.000   20.396    3.250
##     comp_06          21.797    0.680   32.036    0.000   21.797    2.243
##     simi_06          14.903    0.528   28.224    0.000   14.903    1.976
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     info_06          37.261    3.689   10.100    0.000   37.261    1.000
##     voca_06          39.390    3.900   10.100    0.000   39.390    1.000
##     comp_06          94.434    9.350   10.100    0.000   94.434    1.000
##     simi_06          56.881    5.632   10.100    0.000   56.881    1.000

Model Evaluation

Some basic questions to ask after fitting a model -
Did it work?
Does it make sense? Are the parameters viable?

Lets look at a diagram of the model to check if it matches what we intended (this should also be done with the summary information.)

semPaths(fit0, what="std", 
         sizeLat = 7, sizeMan = 7, edge.label.cex = .75)

Note that we asked for standardized estimates in the diagram.

We can also look more closely at how well the model fit, and interpret the model more generally. * Note that there are 6 degrees of freedom in this model. * Note how bad the fit of the model is. * There is room for improvement. Do we have another theory?

1-Factor Model

We have another theory - that the 4 tests are indicators of a single latent factor we’ll call Verbal ability.

Model Specification

We specify a 1-factor model by defining a latent variable and adding in some additional arrows. The factor must be indicated by manifest variables, and must have a variance (maybe covariance and mean).

Basically, there are four formula types that correspond to different kinds of arrows.

We set up a 1-factor model …

#Model with 1 common factor 
verbal_1factor <- ' #start of model

# latent variable definitions (common factors)
  verbal =~ NA*info_06 + 
            NA*voca_06 +
            NA*comp_06 +
            NA*simi_06
  
# latent variable variances
  verbal ~~ 1*verbal

# latent variable covariances
# latent variable means

# manifest variable variances (uniquenesses)
  info_06 ~~ info_06
  voca_06 ~~ voca_06
  comp_06 ~~ comp_06
  simi_06 ~~ simi_06
            
# manifest variable covariances (uniquenesses)

#manifest variable means 
  info_06 ~ 1
  voca_06 ~ 1
  comp_06 ~ 1
  simi_06 ~ 1
' #end of model

Model Fitting

Second, we fit the specified model to the data to obrain parameter estimates.

fit1 <- lavaan(verbal_1factor, data=dat, mimic = "mplus")
summary(fit1, standardized=TRUE, fit.measures=TRUE)

## lavaan 0.6.16 ended normally after 17 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        12
## 
##   Number of observations                           204
##   Number of missing patterns                         1
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 7.147
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.028
## 
## Model Test Baseline Model:
## 
##   Test statistic                               261.106
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.980
##   Tucker-Lewis Index (TLI)                       0.939
##                                                       
##   Robust Comparative Fit Index (CFI)             0.980
##   Robust Tucker-Lewis Index (TLI)                0.939
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -2650.669
##   Loglikelihood unrestricted model (H1)      -2647.095
##                                                       
##   Akaike (AIC)                                5325.338
##   Bayesian (BIC)                              5365.155
##   Sample-size adjusted Bayesian (SABIC)       5327.136
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.112
##   90 Percent confidence interval - lower         0.032
##   90 Percent confidence interval - upper         0.206
##   P-value H_0: RMSEA <= 0.050                    0.088
##   P-value H_0: RMSEA >= 0.080                    0.791
##                                                       
##   Robust RMSEA                                   0.112
##   90 Percent confidence interval - lower         0.032
##   90 Percent confidence interval - upper         0.206
##   P-value H_0: Robust RMSEA <= 0.050             0.088
##   P-value H_0: Robust RMSEA >= 0.080             0.791
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.025
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   verbal =~                                                             
##     info_06           4.582    0.412   11.112    0.000    4.582    0.751
##     voca_06           4.714    0.421   11.192    0.000    4.714    0.751
##     comp_06           7.013    0.659   10.642    0.000    7.013    0.722
##     simi_06           4.843    0.527    9.190    0.000    4.843    0.642
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .info_06          19.776    0.427   46.273    0.000   19.776    3.240
##    .voca_06          20.396    0.439   46.416    0.000   20.396    3.250
##    .comp_06          21.797    0.680   32.036    0.000   21.797    2.243
##    .simi_06          14.903    0.528   28.224    0.000   14.903    1.976
##     verbal            0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     verbal            1.000                               1.000    1.000
##    .info_06          16.271    2.419    6.725    0.000   16.271    0.437
##    .voca_06          17.167    2.517    6.820    0.000   17.167    0.436
##    .comp_06          45.258    6.176    7.328    0.000   45.258    0.479
##    .simi_06          33.426    4.031    8.293    0.000   33.426    0.588

Model Evaluation

Some basic questions to ask after fitting a model - are Did it work?
Does it make sense? Are the parameters viable?

Lets look at a diagram of the model to check if it matches what we intended (this should also be done with the summary information.)

semPaths(fit1, what="std", 
         sizeLat = 7, sizeMan = 7, edge.label.cex = .75)

Then, we can look more closely at how well the model fit, and interpret the model more generally.

Note that there are 2 degrees of freedom in this model.
Note how much better the fit of the model is.
We can formalize the improvement in fit

We do model comparison…
Looking at fit statistics

#Compare model fit statistics
round(cbind(m1=inspect(fit0, 'fit.measures'), 
            m2=inspect(fit1, 'fit.measures')), 3)

##                                     m1        m2
## npar                             8.000    12.000
## fmin                             0.640     0.018
## chisq                          261.106     7.147
## df                               6.000     2.000
## pvalue                           0.000     0.028
## baseline.chisq                 261.106   261.106
## baseline.df                      6.000     6.000
## baseline.pvalue                  0.000     0.000
## cfi                              0.000     0.980
## tli                              0.000     0.939
## cfi.robust                       0.000     0.980
## tli.robust                       0.000     0.939
## nnfi                             0.000     0.939
## rfi                              1.000     0.918
## nfi                              0.000     0.973
## pnfi                             0.000     0.324
## ifi                              0.000     0.980
## rni                              0.000     0.980
## nnfi.robust                      0.000     0.939
## rni.robust                       0.000     0.980
## logl                         -2777.649 -2650.669
## unrestricted.logl            -2647.095 -2647.095
## aic                           5571.297  5325.338
## bic                           5597.842  5365.155
## ntotal                         204.000   204.000
## bic2                          5572.496  5327.136
## rmsea                            0.457     0.112
## rmsea.ci.lower                   0.410     0.032
## rmsea.ci.upper                   0.505     0.206
## rmsea.ci.level                   0.900     0.900
## rmsea.pvalue                     0.000     0.088
## rmsea.close.h0                   0.050     0.050
## rmsea.notclose.pvalue            1.000     0.791
## rmsea.notclose.h0                0.080     0.080
## rmsea.robust                     0.457     0.112
## rmsea.ci.lower.robust            0.410     0.032
## rmsea.ci.upper.robust            0.505     0.206
## rmsea.pvalue.robust              0.000     0.088
## rmsea.notclose.pvalue.robust     1.000     0.791
## rmr                             18.441     1.294
## rmr_nomean                      21.820     1.531
## srmr                             0.337     0.025
## srmr_bentler                     0.337     0.025
## srmr_bentler_nomean              0.399     0.029
## crmr                             0.399     0.029
## crmr_nomean                      0.515     0.038
## srmr_mplus                       0.337     0.025
## srmr_mplus_nomean                0.399     0.029
## cn_05                           10.838   172.016
## cn_01                           14.135   263.893
## gfi                              0.953     0.998
## agfi                             0.889     0.984
## pgfi                             0.408     0.143
## mfi                              0.535     0.987
## ecvi                             1.358     0.153

Formal model test

#Chi-square difference test for nested models 
anova(fit0, fit1)

## 
## Chi-Squared Difference Test
## 
##      Df    AIC    BIC   Chisq Chisq diff   RMSEA Df diff Pr(>Chisq)    
## fit1  2 5325.3 5365.2   7.147                                          
## fit0  6 5571.3 5597.8 261.106     253.96 0.55346       4  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We reject the null hypothesis that the models fit the same.

2-Factor Model

We have yet another theory - that the 4 tests are indicators of two correlated latent factors we will call f1 and f2.

Model Specification

We specify a 2-factor model by defining the latent variables and adding in some additional arrows. Each factor must be indicated by manifest variables, and must have a variance (maybe covariance and mean).
We won’t go into it here, but identification constraints are needed and important.

We set up a 2-factor model …

#Model with 1 common factor 
verbal_2factor <- ' #start of model

# latent variable definitions (common factors)
  f1 =~ NA*info_06 + 
        NA*voca_06 
  f2 =~ NA*comp_06 +
        NA*simi_06
  
# latent variable variances
  f1 ~~ 1*f1
  f2 ~~ 1*f2

# latent variable covariances
  f1 ~~ f2
# latent variable means

# manifest variable variances (uniquenesses)
  info_06 ~~ info_06
  voca_06 ~~ voca_06
  comp_06 ~~ comp_06
  simi_06 ~~ simi_06
            
# manifest variable covariances (uniquenesses)

#manifest variable means 
  info_06 ~ 1
  voca_06 ~ 1
  comp_06 ~ 1
  simi_06 ~ 1
' #end of model

Model Fitting

Second, we fit the specified model to the data to obtain parameter estimates.

fit2 <- lavaan(verbal_2factor, data=dat, mimic = "mplus")

## Warning in lav_object_post_check(object): lavaan WARNING: covariance matrix of latent variables
##                 is not positive definite;
##                 use lavInspect(fit, "cov.lv") to investigate.

summary(fit2, standardized=TRUE, fit.measures=TRUE)

## lavaan 0.6.16 ended normally after 24 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        13
## 
##   Number of observations                           204
##   Number of missing patterns                         1
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 6.805
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.009
## 
## Model Test Baseline Model:
## 
##   Test statistic                               261.106
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.977
##   Tucker-Lewis Index (TLI)                       0.863
##                                                       
##   Robust Comparative Fit Index (CFI)             0.977
##   Robust Tucker-Lewis Index (TLI)                0.863
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -2650.498
##   Loglikelihood unrestricted model (H1)      -2647.095
##                                                       
##   Akaike (AIC)                                5326.996
##   Bayesian (BIC)                              5370.132
##   Sample-size adjusted Bayesian (SABIC)       5328.944
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.169
##   90 Percent confidence interval - lower         0.067
##   90 Percent confidence interval - upper         0.298
##   P-value H_0: RMSEA <= 0.050                    0.030
##   P-value H_0: RMSEA >= 0.080                    0.929
##                                                       
##   Robust RMSEA                                   0.169
##   90 Percent confidence interval - lower         0.067
##   90 Percent confidence interval - upper         0.298
##   P-value H_0: Robust RMSEA <= 0.050             0.030
##   P-value H_0: Robust RMSEA >= 0.080             0.929
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.024
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   f1 =~                                                                 
##     info_06           4.553    0.413   11.032    0.000    4.553    0.746
##     voca_06           4.675    0.424   11.015    0.000    4.675    0.745
##   f2 =~                                                                 
##     comp_06           6.882    0.696    9.889    0.000    6.882    0.708
##     simi_06           4.778    0.538    8.886    0.000    4.778    0.634
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   f1 ~~                                                                 
##     f2                1.035    0.061   17.067    0.000    1.035    1.035
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .info_06          19.776    0.427   46.273    0.000   19.776    3.240
##    .voca_06          20.396    0.439   46.416    0.000   20.396    3.250
##    .comp_06          21.797    0.680   32.036    0.000   21.797    2.243
##    .simi_06          14.903    0.528   28.224    0.000   14.903    1.976
##     f1                0.000                               0.000    0.000
##     f2                0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     f1                1.000                               1.000    1.000
##     f2                1.000                               1.000    1.000
##    .info_06          16.529    2.423    6.821    0.000   16.529    0.444
##    .voca_06          17.537    2.562    6.846    0.000   17.537    0.445
##    .comp_06          47.070    6.912    6.810    0.000   47.070    0.498
##    .simi_06          34.047    4.174    8.158    0.000   34.047    0.599

Model Evaluation

Some basic questions to ask after fitting a model -
Did it work?
Does it make sense? Are the parameters viable?

NOTICE THERE WAS A WARNING Something is wrong with this model. Look at the correlation between the factors, r = 1.03. That is not viable.

Lets look at a diagram of the model to check if it matches what we intended (this should also be done with the summary information.)

semPaths(fit2, what="std", 
         sizeLat = 7, sizeMan = 7, edge.label.cex = .75)

Then, we can look more closely at how well the model fit, and interpret the model more generally.

Note that there are 2 degrees of freedom in this model.
Note how much better the fit of the model is.
We can formalize the improvement in fit

We do model comparison…
Looking at fit statistics

#Compare model fit statistics
round(cbind(m1=inspect(fit0, 'fit.measures'), m2=inspect(fit1, 'fit.measures'),
            m3=inspect(fit2, 'fit.measures')), 3)

##                                     m1        m2        m3
## npar                             8.000    12.000    13.000
## fmin                             0.640     0.018     0.017
## chisq                          261.106     7.147     6.805
## df                               6.000     2.000     1.000
## pvalue                           0.000     0.028     0.009
## baseline.chisq                 261.106   261.106   261.106
## baseline.df                      6.000     6.000     6.000
## baseline.pvalue                  0.000     0.000     0.000
## cfi                              0.000     0.980     0.977
## tli                              0.000     0.939     0.863
## cfi.robust                       0.000     0.980     0.977
## tli.robust                       0.000     0.939     0.863
## nnfi                             0.000     0.939     0.863
## rfi                              1.000     0.918     0.844
## nfi                              0.000     0.973     0.974
## pnfi                             0.000     0.324     0.162
## ifi                              0.000     0.980     0.978
## rni                              0.000     0.980     0.977
## nnfi.robust                      0.000     0.939     0.863
## rni.robust                       0.000     0.980     0.977
## logl                         -2777.649 -2650.669 -2650.498
## unrestricted.logl            -2647.095 -2647.095 -2647.095
## aic                           5571.297  5325.338  5326.996
## bic                           5597.842  5365.155  5370.132
## ntotal                         204.000   204.000   204.000
## bic2                          5572.496  5327.136  5328.944
## rmsea                            0.457     0.112     0.169
## rmsea.ci.lower                   0.410     0.032     0.067
## rmsea.ci.upper                   0.505     0.206     0.298
## rmsea.ci.level                   0.900     0.900     0.900
## rmsea.pvalue                     0.000     0.088     0.030
## rmsea.close.h0                   0.050     0.050     0.050
## rmsea.notclose.pvalue            1.000     0.791     0.929
## rmsea.notclose.h0                0.080     0.080     0.080
## rmsea.robust                     0.457     0.112     0.169
## rmsea.ci.lower.robust            0.410     0.032     0.067
## rmsea.ci.upper.robust            0.505     0.206     0.298
## rmsea.pvalue.robust              0.000     0.088     0.030
## rmsea.notclose.pvalue.robust     1.000     0.791     0.929
## rmr                             18.441     1.294     1.252
## rmr_nomean                      21.820     1.531     1.481
## srmr                             0.337     0.025     0.024
## srmr_bentler                     0.337     0.025     0.024
## srmr_bentler_nomean              0.399     0.029     0.029
## crmr                             0.399     0.029     0.029
## crmr_nomean                      0.515     0.038     0.037
## srmr_mplus                       0.337     0.025     0.024
## srmr_mplus_nomean                0.399     0.029     0.029
## cn_05                           10.838   172.016   116.154
## cn_01                           14.135   263.893   199.891
## gfi                              0.953     0.998     0.998
## agfi                             0.889     0.984     0.971
## pgfi                             0.408     0.143     0.071
## mfi                              0.535     0.987     0.986
## ecvi                             1.358     0.153     0.161

Formal model test

#Chi-square difference test for nested models 
anova(fit1, fit2)

## 
## Chi-Squared Difference Test
## 
##      Df    AIC    BIC  Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
## fit2  1 5327.0 5370.1 6.8053                                    
## fit1  2 5325.3 5365.2 7.1470    0.34172     0       1     0.5588

We cannot reject the null hypothesis that the models fit the same. And - the 2-factor model is bad anyway.

We conclude that the 1-factor model is the optimal model, of those evaluated.

Example 2: A Different Set of Variables

Again, our goal is to code a model that matches an a priori hypothesis about the structure of the data, and evaluate the match between that model, specifically the mean and variance-covariance expectations, and the observed data (i.e., the observed means and variance-covaraince matrix).

We go through a series of models.

Of particular interest for what we are doing is the correlation matrix of 4 substantive variables in the data set (CO, VO, BL, OB). We can look at both the numeric and visual versions of the correlation matrix.

round(cor(dat[,c("comp_06","voca_06","bloc_06","obje_06")]), 2)

##         comp_06 voca_06 bloc_06 obje_06
## comp_06    1.00    0.58    0.32    0.31
## voca_06    0.58    1.00    0.39    0.45
## bloc_06    0.32    0.39    1.00    0.55
## obje_06    0.31    0.45    0.55    1.00

And a compact visual version from the corrplot package.

#visusal correlation matrix
corrplot(cor(dat[,c("comp_06","voca_06","bloc_06","obje_06")]), order = "original", tl.col='black', tl.cex=.75)

We will consider a set of 3 models …
* 0-Factor model * 1-Factor model * 2-Factor model

Each time we engage with … Model Specification, Model Estimation, and Model Evaluation.

0-Factor Model

Model Specification

First, code the model into a lavaan model object. Note that there are a variety of short-hand ways to program these models (i.e, defualts). We take a very explicit and comprehensive approach - where we basically ‘draw’ every single arrow into the model.

Basically, there are four formula types that correspond to different kinds of arrows.

We set up a simple zero factor model …

#Model with 0 common factors 
General_0factor <- ' #start of model

# latent variable definitions (common factors)
# latent variable variances
# latent variable covariances
# latent variable means

# manifest variable variances (uniquenesses)
  comp_06 ~~ comp_06
  voca_06 ~~ voca_06
  bloc_06 ~~ bloc_06
  obje_06 ~~ obje_06
            
# manifest variable covariances (uniquenesses)

#manifest variable means 
  comp_06 ~ 1
  voca_06 ~ 1
  bloc_06 ~ 1
  obje_06 ~ 1
' #end of model

Model Fitting

Second, we fit the specified model to the data to obrain parameter estimates.

fit0 <- lavaan(General_0factor, data=dat, mimic = "mplus")
summary(fit0, standardized=TRUE, fit.measures=TRUE)

## lavaan 0.6.16 ended normally after 8 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         8
## 
##   Number of observations                           204
##   Number of missing patterns                         1
## 
## Model Test User Model:
##                                                       
##   Test statistic                               214.189
##   Degrees of freedom                                 6
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               214.189
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.000
##   Tucker-Lewis Index (TLI)                       0.000
##                                                       
##   Robust Comparative Fit Index (CFI)             0.000
##   Robust Tucker-Lewis Index (TLI)                0.000
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -2950.942
##   Loglikelihood unrestricted model (H1)      -2843.848
##                                                       
##   Akaike (AIC)                                5917.885
##   Bayesian (BIC)                              5944.430
##   Sample-size adjusted Bayesian (SABIC)       5919.084
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.412
##   90 Percent confidence interval - lower         0.366
##   90 Percent confidence interval - upper         0.461
##   P-value H_0: RMSEA <= 0.050                    0.000
##   P-value H_0: RMSEA >= 0.080                    1.000
##                                                       
##   Robust RMSEA                                   0.412
##   90 Percent confidence interval - lower         0.366
##   90 Percent confidence interval - upper         0.461
##   P-value H_0: Robust RMSEA <= 0.050             0.000
##   P-value H_0: Robust RMSEA >= 0.080             1.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.292
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     comp_06          21.797    0.680   32.036    0.000   21.797    2.243
##     voca_06          20.396    0.439   46.416    0.000   20.396    3.250
##     bloc_06           6.942    0.461   15.063    0.000    6.942    1.055
##     obje_06          25.258    1.145   22.059    0.000   25.258    1.544
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     comp_06          94.434    9.350   10.100    0.000   94.434    1.000
##     voca_06          39.390    3.900   10.100    0.000   39.390    1.000
##     bloc_06          43.333    4.291   10.100    0.000   43.333    1.000
##     obje_06         267.462   26.483   10.100    0.000  267.462    1.000

Model Evaluation

Some basic questions to ask after fitting a model -
Did it work?
Does it make sense? Are the parameters viable?

Lets look at a diagram of the model to check if it matches what we intended (this should also be done with the summary information.)

semPaths(fit0, what="std", 
         sizeLat = 7, sizeMan = 7, edge.label.cex = .75)

Note that we asked for standardized estimates in the diagram.

1-Factor Model

We have another theory - that the 4 tests are indicators of a single latent factor we’ll call G (general ability).

Model Specification

Basically, there are four formula types that correspond to different kinds of arrows.

We set up a 1-factor model …

#Model with 1 common factor 
General_1factor <- ' #start of model

# latent variable definitions (common factors)
  G =~ NA*comp_06 + 
       NA*voca_06 +
       NA*bloc_06 +
       NA*obje_06
  
# latent variable variances
  G ~~ 1*G

# latent variable covariances
# latent variable means

# manifest variable variances (uniquenesses)
  comp_06 ~~ comp_06
  voca_06 ~~ voca_06
  bloc_06 ~~ bloc_06
  obje_06 ~~ obje_06
            
# manifest variable covariances (uniquenesses)

#manifest variable means 
  comp_06 ~ 1
  voca_06 ~ 1
  bloc_06 ~ 1
  obje_06 ~ 1
' #end of model

Model Fitting

Second, we fit the specified model to the data to obtain parameter estimates.

fit1 <- lavaan(General_1factor, data=dat, mimic = "mplus")
summary(fit1, standardized=TRUE, fit.measures=TRUE)

## lavaan 0.6.16 ended normally after 21 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        12
## 
##   Number of observations                           204
##   Number of missing patterns                         1
## 
## Model Test User Model:
##                                                       
##   Test statistic                                31.453
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               214.189
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.859
##   Tucker-Lewis Index (TLI)                       0.576
##                                                       
##   Robust Comparative Fit Index (CFI)             0.859
##   Robust Tucker-Lewis Index (TLI)                0.576
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -2859.574
##   Loglikelihood unrestricted model (H1)      -2843.848
##                                                       
##   Akaike (AIC)                                5743.148
##   Bayesian (BIC)                              5782.966
##   Sample-size adjusted Bayesian (SABIC)       5744.946
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.269
##   90 Percent confidence interval - lower         0.191
##   90 Percent confidence interval - upper         0.355
##   P-value H_0: RMSEA <= 0.050                    0.000
##   P-value H_0: RMSEA >= 0.080                    1.000
##                                                       
##   Robust RMSEA                                   0.269
##   90 Percent confidence interval - lower         0.191
##   90 Percent confidence interval - upper         0.355
##   P-value H_0: Robust RMSEA <= 0.050             0.000
##   P-value H_0: Robust RMSEA >= 0.080             1.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.064
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   G =~                                                                  
##     comp_06           6.409    0.721    8.891    0.000    6.409    0.660
##     voca_06           4.888    0.482   10.151    0.000    4.888    0.779
##     bloc_06           3.788    0.544    6.961    0.000    3.788    0.575
##     obje_06          10.053    1.322    7.606    0.000   10.053    0.615
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .comp_06          21.797    0.680   32.036    0.000   21.797    2.243
##    .voca_06          20.396    0.439   46.416    0.000   20.396    3.250
##    .bloc_06           6.942    0.461   15.063    0.000    6.942    1.055
##    .obje_06          25.258    1.145   22.059    0.000   25.258    1.544
##     G                 0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     G                 1.000                               1.000    1.000
##    .comp_06          53.358    7.333    7.276    0.000   53.358    0.565
##    .voca_06          15.493    3.415    4.536    0.000   15.493    0.393
##    .bloc_06          28.985    3.880    7.470    0.000   28.985    0.669
##    .obje_06         166.401   23.404    7.110    0.000  166.401    0.622

Model Evaluation

Some basic questions to ask after fitting a model - are Did it work?
Does it make sense? Are the parameters viable?

Lets look at a diagram of the model to check if it matches what we intended (this should also be done with the summary information.)

semPaths(fit1, what="std", 
         sizeLat = 7, sizeMan = 7, edge.label.cex = .75)

Then, we can look more closely at how well the model fit, and interpret the model more generally.

Note that there are 2 degrees of freedom in this model.
Note how much better the fit of the model is.
We can formalize the improvement in fit

We do model comparison…
Looking at fit statistics

#Compare model fit statistics
round(cbind(m1=inspect(fit0, 'fit.measures'), 
            m2=inspect(fit1, 'fit.measures')), 3)

##                                     m1        m2
## npar                             8.000    12.000
## fmin                             0.525     0.077
## chisq                          214.189    31.453
## df                               6.000     2.000
## pvalue                           0.000     0.000
## baseline.chisq                 214.189   214.189
## baseline.df                      6.000     6.000
## baseline.pvalue                  0.000     0.000
## cfi                              0.000     0.859
## tli                              0.000     0.576
## cfi.robust                       0.000     0.859
## tli.robust                       0.000     0.576
## nnfi                             0.000     0.576
## rfi                              0.000     0.559
## nfi                              0.000     0.853
## pnfi                             0.000     0.284
## ifi                              0.000     0.861
## rni                              0.000     0.859
## nnfi.robust                      0.000     0.576
## rni.robust                       0.000     0.859
## logl                         -2950.942 -2859.574
## unrestricted.logl            -2843.848 -2843.848
## aic                           5917.885  5743.148
## bic                           5944.430  5782.966
## ntotal                         204.000   204.000
## bic2                          5919.084  5744.946
## rmsea                            0.412     0.269
## rmsea.ci.lower                   0.366     0.191
## rmsea.ci.upper                   0.461     0.355
## rmsea.ci.level                   0.900     0.900
## rmsea.pvalue                     0.000     0.000
## rmsea.close.h0                   0.050     0.050
## rmsea.notclose.pvalue            1.000     1.000
## rmsea.notclose.h0                0.080     0.080
## rmsea.robust                     0.412     0.269
## rmsea.ci.lower.robust            0.366     0.191
## rmsea.ci.upper.robust            0.461     0.355
## rmsea.pvalue.robust              0.000     0.000
## rmsea.notclose.pvalue.robust     1.000     1.000
## rmr                             26.731     7.015
## rmr_nomean                      31.629     8.300
## srmr                             0.292     0.064
## srmr_bentler                     0.292     0.064
## srmr_bentler_nomean              0.345     0.076
## crmr                             0.345     0.076
## crmr_nomean                      0.446     0.098
## srmr_mplus                       0.292     0.064
## srmr_mplus_nomean                0.345     0.076
## cn_05                           12.993    39.860
## cn_01                           17.012    60.738
## gfi                              0.946     0.988
## agfi                             0.875     0.916
## pgfi                             0.406     0.141
## mfi                              0.600     0.930
## ecvi                             1.128     0.272

Formal model test

#Chi-square difference test for nested models 
anova(fit0, fit1)

## 
## Chi-Squared Difference Test
## 
##      Df    AIC    BIC   Chisq Chisq diff   RMSEA Df diff Pr(>Chisq)    
## fit1  2 5743.1 5783.0  31.453                                          
## fit0  6 5917.9 5944.4 214.189     182.74 0.46802       4  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We reject the null hypothesis that the models fit the same.

2-Factor Model

We have yet another theory - that the 4 tests are indicators of two correlated latent factor we’ll call Gf and Gc.

Model Specification

We set up a 2-factor model …

#Model with 1 common factor 
General_2factor <- ' #start of model

# latent variable definitions (common factors)
  Gc =~ NA*comp_06 + 
        NA*voca_06 
  Gf =~ NA*bloc_06 +
        NA*obje_06
  
# latent variable variances
  Gc ~~ 1*Gc
  Gf ~~ 1*Gf

# latent variable covariances
  Gc ~~ Gf
# latent variable means

# manifest variable variances (uniquenesses)
  comp_06 ~~ comp_06
  voca_06 ~~ voca_06
  bloc_06 ~~ bloc_06
  obje_06 ~~ obje_06
            
# manifest variable covariances (uniquenesses)

#manifest variable means 
  comp_06 ~ 1
  voca_06 ~ 1
  bloc_06 ~ 1
  obje_06 ~ 1
' #end of model

Model Fitting

Second, we fit the specified model to the data to obrain parameter estimates.

fit2 <- lavaan(General_2factor, data=dat, mimic = "mplus")
summary(fit2, standardized=TRUE, fit.measures=TRUE)

## lavaan 0.6.16 ended normally after 29 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        13
## 
##   Number of observations                           204
##   Number of missing patterns                         1
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 1.167
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.280
## 
## Model Test Baseline Model:
## 
##   Test statistic                               214.189
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.999
##   Tucker-Lewis Index (TLI)                       0.995
##                                                       
##   Robust Comparative Fit Index (CFI)             0.999
##   Robust Tucker-Lewis Index (TLI)                0.995
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -2844.431
##   Loglikelihood unrestricted model (H1)      -2843.848
##                                                       
##   Akaike (AIC)                                5714.862
##   Bayesian (BIC)                              5757.998
##   Sample-size adjusted Bayesian (SABIC)       5716.810
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.029
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.191
##   P-value H_0: RMSEA <= 0.050                    0.394
##   P-value H_0: RMSEA >= 0.080                    0.462
##                                                       
##   Robust RMSEA                                   0.029
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.191
##   P-value H_0: Robust RMSEA <= 0.050             0.394
##   P-value H_0: Robust RMSEA >= 0.080             0.462
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.010
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Gc =~                                                                 
##     comp_06           6.420    0.758    8.466    0.000    6.420    0.661
##     voca_06           5.550    0.526   10.547    0.000    5.550    0.884
##   Gf =~                                                                 
##     bloc_06           4.576    0.517    8.851    0.000    4.576    0.695
##     obje_06          12.829    1.320    9.718    0.000   12.829    0.784
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Gc ~~                                                                 
##     Gf                0.641    0.074    8.642    0.000    0.641    0.641
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .comp_06          21.797    0.680   32.036    0.000   21.797    2.243
##    .voca_06          20.396    0.439   46.416    0.000   20.396    3.250
##    .bloc_06           6.942    0.461   15.063    0.000    6.942    1.055
##    .obje_06          25.258    1.145   22.059    0.000   25.258    1.544
##     Gc                0.000                               0.000    0.000
##     Gf                0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     Gc                1.000                               1.000    1.000
##     Gf                1.000                               1.000    1.000
##    .comp_06          53.220    7.931    6.710    0.000   53.220    0.564
##    .voca_06           8.592    4.511    1.905    0.057    8.592    0.218
##    .bloc_06          22.391    3.717    6.025    0.000   22.391    0.517
##    .obje_06         102.876   25.562    4.025    0.000  102.876    0.385

Model Evaluation

Some basic questions to ask after fitting a model -
Did it work?
Does it make sense? Are the parameters viable?

Lets look at a diagram of the model to check if it matches what we intended (this should also be done with the summary information.)

semPaths(fit2, what="std", 
         sizeLat = 7, sizeMan = 7, edge.label.cex = .75)

Then, we can look more closely at how well the model fit, and interpret the model more generally.

Note that there are 2 degrees of freedom in this model.
Note how much better the fit of the model is.
We can formalize the improvement in fit

We do model comparison…
Looking at fit statistics

#Compare model fit statistics
round(cbind(m1=inspect(fit0, 'fit.measures'), 
            m2=inspect(fit1, 'fit.measures'),
            m3=inspect(fit2, 'fit.measures')), 3)

##                                     m1        m2        m3
## npar                             8.000    12.000    13.000
## fmin                             0.525     0.077     0.003
## chisq                          214.189    31.453     1.167
## df                               6.000     2.000     1.000
## pvalue                           0.000     0.000     0.280
## baseline.chisq                 214.189   214.189   214.189
## baseline.df                      6.000     6.000     6.000
## baseline.pvalue                  0.000     0.000     0.000
## cfi                              0.000     0.859     0.999
## tli                              0.000     0.576     0.995
## cfi.robust                       0.000     0.859     0.999
## tli.robust                       0.000     0.576     0.995
## nnfi                             0.000     0.576     0.995
## rfi                              0.000     0.559     0.967
## nfi                              0.000     0.853     0.995
## pnfi                             0.000     0.284     0.166
## ifi                              0.000     0.861     0.999
## rni                              0.000     0.859     0.999
## nnfi.robust                      0.000     0.576     0.995
## rni.robust                       0.000     0.859     0.999
## logl                         -2950.942 -2859.574 -2844.431
## unrestricted.logl            -2843.848 -2843.848 -2843.848
## aic                           5917.885  5743.148  5714.862
## bic                           5944.430  5782.966  5757.998
## ntotal                         204.000   204.000   204.000
## bic2                          5919.084  5744.946  5716.810
## rmsea                            0.412     0.269     0.029
## rmsea.ci.lower                   0.366     0.191     0.000
## rmsea.ci.upper                   0.461     0.355     0.191
## rmsea.ci.level                   0.900     0.900     0.900
## rmsea.pvalue                     0.000     0.000     0.394
## rmsea.close.h0                   0.050     0.050     0.050
## rmsea.notclose.pvalue            1.000     1.000     0.462
## rmsea.notclose.h0                0.080     0.080     0.080
## rmsea.robust                     0.412     0.269     0.029
## rmsea.ci.lower.robust            0.366     0.191     0.000
## rmsea.ci.upper.robust            0.461     0.355     0.191
## rmsea.pvalue.robust              0.000     0.000     0.394
## rmsea.notclose.pvalue.robust     1.000     1.000     0.462
## rmr                             26.731     7.015     0.980
## rmr_nomean                      31.629     8.300     1.160
## srmr                             0.292     0.064     0.010
## srmr_bentler                     0.292     0.064     0.010
## srmr_bentler_nomean              0.345     0.076     0.012
## crmr                             0.345     0.076     0.012
## crmr_nomean                      0.446     0.098     0.015
## srmr_mplus                       0.292     0.064     0.010
## srmr_mplus_nomean                0.345     0.076     0.012
## cn_05                           12.993    39.860   672.610
## cn_01                           17.012    60.738  1160.992
## gfi                              0.946     0.988     1.000
## agfi                             0.875     0.916     0.994
## pgfi                             0.406     0.141     0.071
## mfi                              0.600     0.930     1.000
## ecvi                             1.128     0.272     0.133

Formal model test

#Chi-square difference test for nested models 
anova(fit1, fit2)

## 
## Chi-Squared Difference Test
## 
##      Df    AIC  BIC   Chisq Chisq diff   RMSEA Df diff Pr(>Chisq)    
## fit2  1 5714.9 5758  1.1668                                          
## fit1  2 5743.1 5783 31.4527     30.286 0.37889       1  3.728e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We reject the null hypothesis that the models fit the same.

We conclude that the 2-factor model is the optimal model, of those evaluated.

Conclusion

We’ve walked through specification, estimation, and evaluation of a variety of simple Confirmatory Factor Models. Our examples highlight how the framework is used to test the relative fit of different theories to the data. Lots of fun to be had - but always be careful to respect the confirmatory nature of this kind of modeling enterprise - and to use the models for good (and not evil).

Thanks for playing!