Confirmatory Factor Analysis - Basic
Overview
In this tutorial we walk through the very basics of conducting confirmatory factor analysis (CFA) in R. This is not a comprehensive coverage, just enough to get one started.
Prelim - Loading libraries used in this script.
## Warning: package 'psych' was built under R version 4.3.3
Introduction to the Factor Analyis Model
Recall that the basic factor analysis model is written as series of equations of the form …
\[y_{pi} = \lambda_{pq} f_{qi} + u_{pi}\] where \(y_{pi}\) is individual i’s score on the pth observed variable, \(f_{qi}\) is individual i’s score on the qth latent common factor, \(u_{pi}\) is individual i’s score on the pth latent unique factor, and \(\lambda_{pq}\) is the factor loading that indicates the relation between the pth observed variable and the qth latent common factor.
Typically, we have multiple observed variables and one or more common factors. For instance in the 4 variable, 2 factor case we would have …
\[y_{1i} = \lambda_{11} f_{1i} + \lambda_{12} f_{2i} + u_{1i}\] \[y_{2i} = \lambda_{21} f_{1i} + \lambda_{22} f_{2i} + u_{2i}\] \[y_{3i} = \lambda_{31} f_{1i} + \lambda_{32} f_{2i} + u_{3i}\] \[y_{4i} = \lambda_{41} f_{1i} + \lambda_{42} f_{2i} + u_{4i}\]
which can be written in a compact matrix form as
\[ \boldsymbol{Y_{i}} = \boldsymbol{\Lambda}\boldsymbol{F_{i}} + \boldsymbol{U_{i}} \] where \(\boldsymbol{Y_{i}}\) is a \(p\) x 1 vector of observed variable scores, \(\boldsymbol{\Lambda}\) is a p x q matrix of factor loadings, \(\boldsymbol{F_{i}}\) is a \(q\) x 1 vector of common factor scores, and \(\boldsymbol{U_{i}}\) is a p x 1 vector of unique factor scores.
Extension to multiple persons provided for mapping to the observed correlation matrix, \(\boldsymbol{\Sigma} = \boldsymbol{Y}'\boldsymbol{Y}\) and the common factor model becomes
\[ \boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}' + \boldsymbol{\Theta} \] where \(\boldsymbol{\Sigma}\) is a p x p covariance (or correlation) matrix of the observed variables, \(\boldsymbol{\Lambda}\) is a p x q matrix of factor loadings, \(\boldsymbol{\Psi}\) is a q x q covariance matrix of the latent factor variables, and \(\boldsymbol{\Theta}\) is a diagonal matrix of unique factor variances.
Confirmatory Factor Analysis is usually done within a Structural
Equation Modeling (SEM) framework. There are a number of SEM packages in
R. We currently tend to be using the lavaan package.
Lots of good information and instruction can be found on the package
website … http://lavaan.ugent.be , including one of the examples
we use here.
Prelim - Reading in Multiobservation Multivariate Data
For this example, we use classic data from Osborne & Sudick
(1972) that has been the basis for many examples in Prof. Jack McArdle
and colleagues’ papers, courses, and workshops.
The data were obtained from 204 participants (rows) who completed a
variety of cognitive ability tests that are part of the WISC
(columns).
#set filepath for data file
filepath <- "https://raw.githubusercontent.com/The-Change-Lab/collaborations/refs/heads/main/GrowthModeling/WISC_CFAexample.csv"
#read in the .csv file using the url() function
dat <- read.csv(file=url(filepath), header=TRUE)
#subsetting to the variables of interest
dat <- dat[,c("id","info_06","comp_06","simi_06","voca_06","picc_06","pica_06","bloc_06","obje_06")]Lets have a quick look at the data file and the descriptives.
## id info_06 comp_06 simi_06 voca_06 picc_06 pica_06 bloc_06
## 1 1 31.28655 25.626566 22.932331 22.21491 25.877193 8.310249 9.4098884
## 2 2 13.80117 14.786967 7.581454 15.37281 10.526316 7.417667 3.8915470
## 3 3 34.96970 34.675325 28.051948 26.84091 42.818182 6.443381 4.9586777
## 4 4 24.79532 31.390977 8.208020 20.19737 50.000000 2.708526 8.8676236
## 5 5 25.26316 30.263158 15.977444 35.41667 44.298246 9.295168 8.7400319
## 6 6 15.40230 23.399015 11.453202 20.56034 18.103448 8.832426 1.5047022
## 7 7 15.38012 -1.253133 2.318296 13.00439 8.596491 1.662050 -0.3827751
## 8 8 19.88304 6.704261 14.160401 14.86842 39.912281 6.463527 7.0813397
## 9 9 12.63158 13.847118 10.275689 23.22368 19.736842 6.401970 -0.3189793
## 10 10 23.69048 25.446429 11.415816 21.78571 35.178571 7.675439 7.8571429
## obje_06
## 1 39.164087
## 2 4.179567
## 3 41.764706
## 4 63.054696
## 5 43.085655
## 6 13.793103
## 7 4.437564
## 8 19.762642
## 9 25.902993
## 10 12.710084
## vars n mean sd median trimmed mad min max range skew
## id 1 204 102.50 59.03 102.50 102.50 75.61 1.00 204.00 203.00 0.00
## info_06 2 204 19.78 6.12 19.13 19.77 5.53 1.04 34.97 33.93 -0.06
## comp_06 3 204 21.80 9.74 21.90 21.98 10.32 -1.25 46.49 47.74 -0.13
## simi_06 4 204 14.90 7.56 14.54 14.57 7.83 -4.52 37.76 42.28 0.39
## voca_06 5 204 20.40 6.29 19.84 20.11 6.52 2.77 39.66 36.89 0.37
## picc_06 6 204 28.25 12.15 29.20 28.60 12.68 -3.42 55.18 58.60 -0.24
## pica_06 7 204 9.31 8.41 7.40 7.97 5.77 -1.85 54.30 56.14 1.97
## bloc_06 8 204 6.94 6.60 5.81 6.12 4.62 -5.58 42.19 47.78 2.05
## obje_06 9 204 25.26 16.39 22.34 23.89 15.60 -4.54 71.10 75.64 0.66
## kurtosis se
## id -1.22 4.13
## info_06 0.17 0.43
## comp_06 -0.49 0.68
## simi_06 -0.14 0.53
## voca_06 0.02 0.44
## picc_06 -0.40 0.85
## pica_06 5.36 0.59
## bloc_06 6.77 0.46
## obje_06 -0.24 1.15
Example 1: Basic CFA Orientation & Interpretation
Our goal is to code a model that matches an a priori hypothesis about the structure of the data, and evaluate the match between that model, specifically the mean and variance-covariance expectations, and the observed data (i.e., the observed means and variance-covariance matrix).
We go through a series of models.
Of particular interest for what we are doing is the correlation matrix of the first 4 substantive variables in the data set. We can look at both the numeric and visual versions of the correlation matrix.
## info_06 voca_06 comp_06 simi_06
## info_06 1.00 0.56 0.51 0.54
## voca_06 0.56 1.00 0.58 0.44
## comp_06 0.51 0.58 1.00 0.45
## simi_06 0.54 0.44 0.45 1.00
And a compact visual version from the corrplot
package.
#visual correlation matrix
corrplot(cor(dat[,c("info_06","voca_06","comp_06","simi_06")]),
order = "original", tl.col='black', tl.cex=.75) We will consider a set of 3 models …
* 0-Factor model * 1-Factor model * 2-Factor model
Each time we engage with … Model Specification, Model Estimation, and Model Evaluation.
0-Factor Model
We start with a 0-factor model - basically a null model. But it is still a set of assumptions. Namely, that scores on one test are NOT related to scores on any of the other tests - i.e., that there are no covariances among the 4 tests.
Model Specification
First, code the model into a lavaan model object. Note that there are a variety of short-hand ways to program these models (i.e, defaults). We take a very explicit and comprehensive approach - where we basically ‘draw’ every single arrow into the model.
Basically, there are four formula types that correspond to different kinds of arrows.
latent variable definition =~ “is measured by” regression ~ “is regressed on” (residual) (co)variance ~~ “is correlated with” intercept ~ 1 “intercept” A complete lavaan model syntax is simply a combination of these formula types, enclosed between single quotes.
We set up a simple zero factor model …
#Model with 0 common factors
verbal_0factor <- ' #start of model
# latent variable definitions (common factors)
# latent variable variances
# latent variable covariances
# latent variable means
# manifest variable variances (uniquenesses)
info_06 ~~ info_06
voca_06 ~~ voca_06
comp_06 ~~ comp_06
simi_06 ~~ simi_06
# manifest variable covariances (uniquenesses)
#manifest variable means
info_06 ~ 1
voca_06 ~ 1
comp_06 ~ 1
simi_06 ~ 1
' #end of modelModel Fitting
Second, we fit the specified model to the data to obtain parameter estimates.
fit0 <- lavaan(verbal_0factor, data=dat, mimic = "mplus")
summary(fit0, standardized=TRUE, fit.measures=TRUE)## lavaan 0.6.16 ended normally after 8 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 8
##
## Number of observations 204
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 261.106
## Degrees of freedom 6
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 261.106
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.000
## Tucker-Lewis Index (TLI) -0.000
##
## Robust Comparative Fit Index (CFI) 0.000
## Robust Tucker-Lewis Index (TLI) 0.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -2777.649
## Loglikelihood unrestricted model (H1) -2647.095
##
## Akaike (AIC) 5571.297
## Bayesian (BIC) 5597.842
## Sample-size adjusted Bayesian (SABIC) 5572.496
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.457
## 90 Percent confidence interval - lower 0.410
## 90 Percent confidence interval - upper 0.505
## P-value H_0: RMSEA <= 0.050 0.000
## P-value H_0: RMSEA >= 0.080 1.000
##
## Robust RMSEA 0.457
## 90 Percent confidence interval - lower 0.410
## 90 Percent confidence interval - upper 0.505
## P-value H_0: Robust RMSEA <= 0.050 0.000
## P-value H_0: Robust RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.337
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## info_06 19.776 0.427 46.273 0.000 19.776 3.240
## voca_06 20.396 0.439 46.416 0.000 20.396 3.250
## comp_06 21.797 0.680 32.036 0.000 21.797 2.243
## simi_06 14.903 0.528 28.224 0.000 14.903 1.976
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## info_06 37.261 3.689 10.100 0.000 37.261 1.000
## voca_06 39.390 3.900 10.100 0.000 39.390 1.000
## comp_06 94.434 9.350 10.100 0.000 94.434 1.000
## simi_06 56.881 5.632 10.100 0.000 56.881 1.000
Model Evaluation
Some basic questions to ask after fitting a model -
Did it work?
Does it make sense? Are the parameters viable?
Lets look at a diagram of the model to check if it matches what we intended (this should also be done with the summary information.)
Note that we asked for standardized estimates in the diagram.
We can also look more closely at how well the model fit, and interpret the model more generally. * Note that there are 6 degrees of freedom in this model. * Note how bad the fit of the model is. * There is room for improvement. Do we have another theory?
1-Factor Model
We have another theory - that the 4 tests are indicators of a single latent factor we’ll call Verbal ability.
Model Specification
We specify a 1-factor model by defining a latent variable and adding in some additional arrows. The factor must be indicated by manifest variables, and must have a variance (maybe covariance and mean).
Basically, there are four formula types that correspond to different kinds of arrows.
latent variable definition =~ “is measured by” regression ~ “is regressed on” (residual) (co)variance ~~ “is correlated with” intercept ~ 1 “intercept” A complete lavaan model syntax is simply a combination of these formula types, enclosed between single quotes.
We set up a 1-factor model …
#Model with 1 common factor
verbal_1factor <- ' #start of model
# latent variable definitions (common factors)
verbal =~ NA*info_06 +
NA*voca_06 +
NA*comp_06 +
NA*simi_06
# latent variable variances
verbal ~~ 1*verbal
# latent variable covariances
# latent variable means
# manifest variable variances (uniquenesses)
info_06 ~~ info_06
voca_06 ~~ voca_06
comp_06 ~~ comp_06
simi_06 ~~ simi_06
# manifest variable covariances (uniquenesses)
#manifest variable means
info_06 ~ 1
voca_06 ~ 1
comp_06 ~ 1
simi_06 ~ 1
' #end of modelModel Fitting
Second, we fit the specified model to the data to obrain parameter estimates.
fit1 <- lavaan(verbal_1factor, data=dat, mimic = "mplus")
summary(fit1, standardized=TRUE, fit.measures=TRUE)## lavaan 0.6.16 ended normally after 17 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 12
##
## Number of observations 204
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 7.147
## Degrees of freedom 2
## P-value (Chi-square) 0.028
##
## Model Test Baseline Model:
##
## Test statistic 261.106
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.980
## Tucker-Lewis Index (TLI) 0.939
##
## Robust Comparative Fit Index (CFI) 0.980
## Robust Tucker-Lewis Index (TLI) 0.939
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -2650.669
## Loglikelihood unrestricted model (H1) -2647.095
##
## Akaike (AIC) 5325.338
## Bayesian (BIC) 5365.155
## Sample-size adjusted Bayesian (SABIC) 5327.136
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.112
## 90 Percent confidence interval - lower 0.032
## 90 Percent confidence interval - upper 0.206
## P-value H_0: RMSEA <= 0.050 0.088
## P-value H_0: RMSEA >= 0.080 0.791
##
## Robust RMSEA 0.112
## 90 Percent confidence interval - lower 0.032
## 90 Percent confidence interval - upper 0.206
## P-value H_0: Robust RMSEA <= 0.050 0.088
## P-value H_0: Robust RMSEA >= 0.080 0.791
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.025
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## verbal =~
## info_06 4.582 0.412 11.112 0.000 4.582 0.751
## voca_06 4.714 0.421 11.192 0.000 4.714 0.751
## comp_06 7.013 0.659 10.642 0.000 7.013 0.722
## simi_06 4.843 0.527 9.190 0.000 4.843 0.642
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .info_06 19.776 0.427 46.273 0.000 19.776 3.240
## .voca_06 20.396 0.439 46.416 0.000 20.396 3.250
## .comp_06 21.797 0.680 32.036 0.000 21.797 2.243
## .simi_06 14.903 0.528 28.224 0.000 14.903 1.976
## verbal 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## verbal 1.000 1.000 1.000
## .info_06 16.271 2.419 6.725 0.000 16.271 0.437
## .voca_06 17.167 2.517 6.820 0.000 17.167 0.436
## .comp_06 45.258 6.176 7.328 0.000 45.258 0.479
## .simi_06 33.426 4.031 8.293 0.000 33.426 0.588
Model Evaluation
Some basic questions to ask after fitting a model - are Did it
work?
Does it make sense? Are the parameters viable?
Lets look at a diagram of the model to check if it matches what we intended (this should also be done with the summary information.)
Then, we can look more closely at how well the model fit, and interpret the model more generally.
- Note that there are 2 degrees of freedom in this model.
- Note how much better the fit of the model is.
- We can formalize the improvement in fit
We do model comparison…
Looking at fit statistics
#Compare model fit statistics
round(cbind(m1=inspect(fit0, 'fit.measures'),
m2=inspect(fit1, 'fit.measures')), 3)## m1 m2
## npar 8.000 12.000
## fmin 0.640 0.018
## chisq 261.106 7.147
## df 6.000 2.000
## pvalue 0.000 0.028
## baseline.chisq 261.106 261.106
## baseline.df 6.000 6.000
## baseline.pvalue 0.000 0.000
## cfi 0.000 0.980
## tli 0.000 0.939
## cfi.robust 0.000 0.980
## tli.robust 0.000 0.939
## nnfi 0.000 0.939
## rfi 1.000 0.918
## nfi 0.000 0.973
## pnfi 0.000 0.324
## ifi 0.000 0.980
## rni 0.000 0.980
## nnfi.robust 0.000 0.939
## rni.robust 0.000 0.980
## logl -2777.649 -2650.669
## unrestricted.logl -2647.095 -2647.095
## aic 5571.297 5325.338
## bic 5597.842 5365.155
## ntotal 204.000 204.000
## bic2 5572.496 5327.136
## rmsea 0.457 0.112
## rmsea.ci.lower 0.410 0.032
## rmsea.ci.upper 0.505 0.206
## rmsea.ci.level 0.900 0.900
## rmsea.pvalue 0.000 0.088
## rmsea.close.h0 0.050 0.050
## rmsea.notclose.pvalue 1.000 0.791
## rmsea.notclose.h0 0.080 0.080
## rmsea.robust 0.457 0.112
## rmsea.ci.lower.robust 0.410 0.032
## rmsea.ci.upper.robust 0.505 0.206
## rmsea.pvalue.robust 0.000 0.088
## rmsea.notclose.pvalue.robust 1.000 0.791
## rmr 18.441 1.294
## rmr_nomean 21.820 1.531
## srmr 0.337 0.025
## srmr_bentler 0.337 0.025
## srmr_bentler_nomean 0.399 0.029
## crmr 0.399 0.029
## crmr_nomean 0.515 0.038
## srmr_mplus 0.337 0.025
## srmr_mplus_nomean 0.399 0.029
## cn_05 10.838 172.016
## cn_01 14.135 263.893
## gfi 0.953 0.998
## agfi 0.889 0.984
## pgfi 0.408 0.143
## mfi 0.535 0.987
## ecvi 1.358 0.153
Formal model test
##
## Chi-Squared Difference Test
##
## Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
## fit1 2 5325.3 5365.2 7.147
## fit0 6 5571.3 5597.8 261.106 253.96 0.55346 4 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We reject the null hypothesis that the models fit the same.
2-Factor Model
We have yet another theory - that the 4 tests are indicators of two correlated latent factors we will call f1 and f2.
Model Specification
We specify a 2-factor model by defining the latent variables and
adding in some additional arrows. Each factor must be indicated by
manifest variables, and must have a variance (maybe covariance and
mean).
We won’t go into it here, but identification constraints are
needed and important.
We set up a 2-factor model …
#Model with 1 common factor
verbal_2factor <- ' #start of model
# latent variable definitions (common factors)
f1 =~ NA*info_06 +
NA*voca_06
f2 =~ NA*comp_06 +
NA*simi_06
# latent variable variances
f1 ~~ 1*f1
f2 ~~ 1*f2
# latent variable covariances
f1 ~~ f2
# latent variable means
# manifest variable variances (uniquenesses)
info_06 ~~ info_06
voca_06 ~~ voca_06
comp_06 ~~ comp_06
simi_06 ~~ simi_06
# manifest variable covariances (uniquenesses)
#manifest variable means
info_06 ~ 1
voca_06 ~ 1
comp_06 ~ 1
simi_06 ~ 1
' #end of modelModel Fitting
Second, we fit the specified model to the data to obtain parameter estimates.
## Warning in lav_object_post_check(object): lavaan WARNING: covariance matrix of latent variables
## is not positive definite;
## use lavInspect(fit, "cov.lv") to investigate.
## lavaan 0.6.16 ended normally after 24 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 13
##
## Number of observations 204
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 6.805
## Degrees of freedom 1
## P-value (Chi-square) 0.009
##
## Model Test Baseline Model:
##
## Test statistic 261.106
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.977
## Tucker-Lewis Index (TLI) 0.863
##
## Robust Comparative Fit Index (CFI) 0.977
## Robust Tucker-Lewis Index (TLI) 0.863
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -2650.498
## Loglikelihood unrestricted model (H1) -2647.095
##
## Akaike (AIC) 5326.996
## Bayesian (BIC) 5370.132
## Sample-size adjusted Bayesian (SABIC) 5328.944
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.169
## 90 Percent confidence interval - lower 0.067
## 90 Percent confidence interval - upper 0.298
## P-value H_0: RMSEA <= 0.050 0.030
## P-value H_0: RMSEA >= 0.080 0.929
##
## Robust RMSEA 0.169
## 90 Percent confidence interval - lower 0.067
## 90 Percent confidence interval - upper 0.298
## P-value H_0: Robust RMSEA <= 0.050 0.030
## P-value H_0: Robust RMSEA >= 0.080 0.929
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.024
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f1 =~
## info_06 4.553 0.413 11.032 0.000 4.553 0.746
## voca_06 4.675 0.424 11.015 0.000 4.675 0.745
## f2 =~
## comp_06 6.882 0.696 9.889 0.000 6.882 0.708
## simi_06 4.778 0.538 8.886 0.000 4.778 0.634
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f1 ~~
## f2 1.035 0.061 17.067 0.000 1.035 1.035
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .info_06 19.776 0.427 46.273 0.000 19.776 3.240
## .voca_06 20.396 0.439 46.416 0.000 20.396 3.250
## .comp_06 21.797 0.680 32.036 0.000 21.797 2.243
## .simi_06 14.903 0.528 28.224 0.000 14.903 1.976
## f1 0.000 0.000 0.000
## f2 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f1 1.000 1.000 1.000
## f2 1.000 1.000 1.000
## .info_06 16.529 2.423 6.821 0.000 16.529 0.444
## .voca_06 17.537 2.562 6.846 0.000 17.537 0.445
## .comp_06 47.070 6.912 6.810 0.000 47.070 0.498
## .simi_06 34.047 4.174 8.158 0.000 34.047 0.599
Model Evaluation
Some basic questions to ask after fitting a model -
Did it work?
Does it make sense? Are the parameters viable?
NOTICE THERE WAS A WARNING Something is wrong with this model. Look at the correlation between the factors, r = 1.03. That is not viable.
Lets look at a diagram of the model to check if it matches what we intended (this should also be done with the summary information.)
Then, we can look more closely at how well the model fit, and interpret the model more generally.
- Note that there are 2 degrees of freedom in this model.
- Note how much better the fit of the model is.
- We can formalize the improvement in fit
We do model comparison…
Looking at fit statistics
#Compare model fit statistics
round(cbind(m1=inspect(fit0, 'fit.measures'), m2=inspect(fit1, 'fit.measures'),
m3=inspect(fit2, 'fit.measures')), 3)## m1 m2 m3
## npar 8.000 12.000 13.000
## fmin 0.640 0.018 0.017
## chisq 261.106 7.147 6.805
## df 6.000 2.000 1.000
## pvalue 0.000 0.028 0.009
## baseline.chisq 261.106 261.106 261.106
## baseline.df 6.000 6.000 6.000
## baseline.pvalue 0.000 0.000 0.000
## cfi 0.000 0.980 0.977
## tli 0.000 0.939 0.863
## cfi.robust 0.000 0.980 0.977
## tli.robust 0.000 0.939 0.863
## nnfi 0.000 0.939 0.863
## rfi 1.000 0.918 0.844
## nfi 0.000 0.973 0.974
## pnfi 0.000 0.324 0.162
## ifi 0.000 0.980 0.978
## rni 0.000 0.980 0.977
## nnfi.robust 0.000 0.939 0.863
## rni.robust 0.000 0.980 0.977
## logl -2777.649 -2650.669 -2650.498
## unrestricted.logl -2647.095 -2647.095 -2647.095
## aic 5571.297 5325.338 5326.996
## bic 5597.842 5365.155 5370.132
## ntotal 204.000 204.000 204.000
## bic2 5572.496 5327.136 5328.944
## rmsea 0.457 0.112 0.169
## rmsea.ci.lower 0.410 0.032 0.067
## rmsea.ci.upper 0.505 0.206 0.298
## rmsea.ci.level 0.900 0.900 0.900
## rmsea.pvalue 0.000 0.088 0.030
## rmsea.close.h0 0.050 0.050 0.050
## rmsea.notclose.pvalue 1.000 0.791 0.929
## rmsea.notclose.h0 0.080 0.080 0.080
## rmsea.robust 0.457 0.112 0.169
## rmsea.ci.lower.robust 0.410 0.032 0.067
## rmsea.ci.upper.robust 0.505 0.206 0.298
## rmsea.pvalue.robust 0.000 0.088 0.030
## rmsea.notclose.pvalue.robust 1.000 0.791 0.929
## rmr 18.441 1.294 1.252
## rmr_nomean 21.820 1.531 1.481
## srmr 0.337 0.025 0.024
## srmr_bentler 0.337 0.025 0.024
## srmr_bentler_nomean 0.399 0.029 0.029
## crmr 0.399 0.029 0.029
## crmr_nomean 0.515 0.038 0.037
## srmr_mplus 0.337 0.025 0.024
## srmr_mplus_nomean 0.399 0.029 0.029
## cn_05 10.838 172.016 116.154
## cn_01 14.135 263.893 199.891
## gfi 0.953 0.998 0.998
## agfi 0.889 0.984 0.971
## pgfi 0.408 0.143 0.071
## mfi 0.535 0.987 0.986
## ecvi 1.358 0.153 0.161
Formal model test
##
## Chi-Squared Difference Test
##
## Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
## fit2 1 5327.0 5370.1 6.8053
## fit1 2 5325.3 5365.2 7.1470 0.34172 0 1 0.5588
We cannot reject the null hypothesis that the models fit the same. And - the 2-factor model is bad anyway.
We conclude that the 1-factor model is the optimal model, of those evaluated.
Example 2: A Different Set of Variables
Again, our goal is to code a model that matches an a priori hypothesis about the structure of the data, and evaluate the match between that model, specifically the mean and variance-covariance expectations, and the observed data (i.e., the observed means and variance-covaraince matrix).
We go through a series of models.
Of particular interest for what we are doing is the correlation matrix of 4 substantive variables in the data set (CO, VO, BL, OB). We can look at both the numeric and visual versions of the correlation matrix.
## comp_06 voca_06 bloc_06 obje_06
## comp_06 1.00 0.58 0.32 0.31
## voca_06 0.58 1.00 0.39 0.45
## bloc_06 0.32 0.39 1.00 0.55
## obje_06 0.31 0.45 0.55 1.00
And a compact visual version from the corrplot
package.
#visusal correlation matrix
corrplot(cor(dat[,c("comp_06","voca_06","bloc_06","obje_06")]), order = "original", tl.col='black', tl.cex=.75) We will consider a set of 3 models …
* 0-Factor model * 1-Factor model * 2-Factor model
Each time we engage with … Model Specification, Model Estimation, and Model Evaluation.
0-Factor Model
We start with a 0-factor model - basically a null model. But it is still a set of assumptions. Namely, that scores on one test are NOT related to scores on any of the other tests - i.e., that there are no covariances among the 4 tests.
Model Specification
First, code the model into a lavaan model object. Note that there are a variety of short-hand ways to program these models (i.e, defualts). We take a very explicit and comprehensive approach - where we basically ‘draw’ every single arrow into the model.
Basically, there are four formula types that correspond to different kinds of arrows.
latent variable definition =~ “is measured by” regression ~ “is regressed on” (residual) (co)variance ~~ “is correlated with” intercept ~ 1 “intercept” A complete lavaan model syntax is simply a combination of these formula types, enclosed between single quotes.
We set up a simple zero factor model …
#Model with 0 common factors
General_0factor <- ' #start of model
# latent variable definitions (common factors)
# latent variable variances
# latent variable covariances
# latent variable means
# manifest variable variances (uniquenesses)
comp_06 ~~ comp_06
voca_06 ~~ voca_06
bloc_06 ~~ bloc_06
obje_06 ~~ obje_06
# manifest variable covariances (uniquenesses)
#manifest variable means
comp_06 ~ 1
voca_06 ~ 1
bloc_06 ~ 1
obje_06 ~ 1
' #end of modelModel Fitting
Second, we fit the specified model to the data to obrain parameter estimates.
fit0 <- lavaan(General_0factor, data=dat, mimic = "mplus")
summary(fit0, standardized=TRUE, fit.measures=TRUE)## lavaan 0.6.16 ended normally after 8 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 8
##
## Number of observations 204
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 214.189
## Degrees of freedom 6
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 214.189
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.000
## Tucker-Lewis Index (TLI) 0.000
##
## Robust Comparative Fit Index (CFI) 0.000
## Robust Tucker-Lewis Index (TLI) 0.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -2950.942
## Loglikelihood unrestricted model (H1) -2843.848
##
## Akaike (AIC) 5917.885
## Bayesian (BIC) 5944.430
## Sample-size adjusted Bayesian (SABIC) 5919.084
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.412
## 90 Percent confidence interval - lower 0.366
## 90 Percent confidence interval - upper 0.461
## P-value H_0: RMSEA <= 0.050 0.000
## P-value H_0: RMSEA >= 0.080 1.000
##
## Robust RMSEA 0.412
## 90 Percent confidence interval - lower 0.366
## 90 Percent confidence interval - upper 0.461
## P-value H_0: Robust RMSEA <= 0.050 0.000
## P-value H_0: Robust RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.292
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## comp_06 21.797 0.680 32.036 0.000 21.797 2.243
## voca_06 20.396 0.439 46.416 0.000 20.396 3.250
## bloc_06 6.942 0.461 15.063 0.000 6.942 1.055
## obje_06 25.258 1.145 22.059 0.000 25.258 1.544
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## comp_06 94.434 9.350 10.100 0.000 94.434 1.000
## voca_06 39.390 3.900 10.100 0.000 39.390 1.000
## bloc_06 43.333 4.291 10.100 0.000 43.333 1.000
## obje_06 267.462 26.483 10.100 0.000 267.462 1.000
Model Evaluation
Some basic questions to ask after fitting a model -
Did it work?
Does it make sense? Are the parameters viable?
Lets look at a diagram of the model to check if it matches what we intended (this should also be done with the summary information.)
Note that we asked for standardized estimates in the diagram.
We can also look more closely at how well the model fit, and interpret the model more generally. * Note that there are 6 degrees of freedom in this model. * Note how bad the fit of the model is. * There is room for improvement. Do we have another theory?
1-Factor Model
We have another theory - that the 4 tests are indicators of a single latent factor we’ll call G (general ability).
Model Specification
We specify a 1-factor model by defining a latent variable and adding in some additional arrows. The factor must be indicated by manifest variables, and must have a variance (maybe covariance and mean).
Basically, there are four formula types that correspond to different kinds of arrows.
latent variable definition =~ “is measured by” regression ~ “is regressed on” (residual) (co)variance ~~ “is correlated with” intercept ~ 1 “intercept” A complete lavaan model syntax is simply a combination of these formula types, enclosed between single quotes.
We set up a 1-factor model …
#Model with 1 common factor
General_1factor <- ' #start of model
# latent variable definitions (common factors)
G =~ NA*comp_06 +
NA*voca_06 +
NA*bloc_06 +
NA*obje_06
# latent variable variances
G ~~ 1*G
# latent variable covariances
# latent variable means
# manifest variable variances (uniquenesses)
comp_06 ~~ comp_06
voca_06 ~~ voca_06
bloc_06 ~~ bloc_06
obje_06 ~~ obje_06
# manifest variable covariances (uniquenesses)
#manifest variable means
comp_06 ~ 1
voca_06 ~ 1
bloc_06 ~ 1
obje_06 ~ 1
' #end of modelModel Fitting
Second, we fit the specified model to the data to obtain parameter estimates.
fit1 <- lavaan(General_1factor, data=dat, mimic = "mplus")
summary(fit1, standardized=TRUE, fit.measures=TRUE)## lavaan 0.6.16 ended normally after 21 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 12
##
## Number of observations 204
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 31.453
## Degrees of freedom 2
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 214.189
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.859
## Tucker-Lewis Index (TLI) 0.576
##
## Robust Comparative Fit Index (CFI) 0.859
## Robust Tucker-Lewis Index (TLI) 0.576
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -2859.574
## Loglikelihood unrestricted model (H1) -2843.848
##
## Akaike (AIC) 5743.148
## Bayesian (BIC) 5782.966
## Sample-size adjusted Bayesian (SABIC) 5744.946
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.269
## 90 Percent confidence interval - lower 0.191
## 90 Percent confidence interval - upper 0.355
## P-value H_0: RMSEA <= 0.050 0.000
## P-value H_0: RMSEA >= 0.080 1.000
##
## Robust RMSEA 0.269
## 90 Percent confidence interval - lower 0.191
## 90 Percent confidence interval - upper 0.355
## P-value H_0: Robust RMSEA <= 0.050 0.000
## P-value H_0: Robust RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.064
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## G =~
## comp_06 6.409 0.721 8.891 0.000 6.409 0.660
## voca_06 4.888 0.482 10.151 0.000 4.888 0.779
## bloc_06 3.788 0.544 6.961 0.000 3.788 0.575
## obje_06 10.053 1.322 7.606 0.000 10.053 0.615
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .comp_06 21.797 0.680 32.036 0.000 21.797 2.243
## .voca_06 20.396 0.439 46.416 0.000 20.396 3.250
## .bloc_06 6.942 0.461 15.063 0.000 6.942 1.055
## .obje_06 25.258 1.145 22.059 0.000 25.258 1.544
## G 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## G 1.000 1.000 1.000
## .comp_06 53.358 7.333 7.276 0.000 53.358 0.565
## .voca_06 15.493 3.415 4.536 0.000 15.493 0.393
## .bloc_06 28.985 3.880 7.470 0.000 28.985 0.669
## .obje_06 166.401 23.404 7.110 0.000 166.401 0.622
Model Evaluation
Some basic questions to ask after fitting a model - are Did it
work?
Does it make sense? Are the parameters viable?
Lets look at a diagram of the model to check if it matches what we intended (this should also be done with the summary information.)
Then, we can look more closely at how well the model fit, and interpret the model more generally.
- Note that there are 2 degrees of freedom in this model.
- Note how much better the fit of the model is.
- We can formalize the improvement in fit
We do model comparison…
Looking at fit statistics
#Compare model fit statistics
round(cbind(m1=inspect(fit0, 'fit.measures'),
m2=inspect(fit1, 'fit.measures')), 3)## m1 m2
## npar 8.000 12.000
## fmin 0.525 0.077
## chisq 214.189 31.453
## df 6.000 2.000
## pvalue 0.000 0.000
## baseline.chisq 214.189 214.189
## baseline.df 6.000 6.000
## baseline.pvalue 0.000 0.000
## cfi 0.000 0.859
## tli 0.000 0.576
## cfi.robust 0.000 0.859
## tli.robust 0.000 0.576
## nnfi 0.000 0.576
## rfi 0.000 0.559
## nfi 0.000 0.853
## pnfi 0.000 0.284
## ifi 0.000 0.861
## rni 0.000 0.859
## nnfi.robust 0.000 0.576
## rni.robust 0.000 0.859
## logl -2950.942 -2859.574
## unrestricted.logl -2843.848 -2843.848
## aic 5917.885 5743.148
## bic 5944.430 5782.966
## ntotal 204.000 204.000
## bic2 5919.084 5744.946
## rmsea 0.412 0.269
## rmsea.ci.lower 0.366 0.191
## rmsea.ci.upper 0.461 0.355
## rmsea.ci.level 0.900 0.900
## rmsea.pvalue 0.000 0.000
## rmsea.close.h0 0.050 0.050
## rmsea.notclose.pvalue 1.000 1.000
## rmsea.notclose.h0 0.080 0.080
## rmsea.robust 0.412 0.269
## rmsea.ci.lower.robust 0.366 0.191
## rmsea.ci.upper.robust 0.461 0.355
## rmsea.pvalue.robust 0.000 0.000
## rmsea.notclose.pvalue.robust 1.000 1.000
## rmr 26.731 7.015
## rmr_nomean 31.629 8.300
## srmr 0.292 0.064
## srmr_bentler 0.292 0.064
## srmr_bentler_nomean 0.345 0.076
## crmr 0.345 0.076
## crmr_nomean 0.446 0.098
## srmr_mplus 0.292 0.064
## srmr_mplus_nomean 0.345 0.076
## cn_05 12.993 39.860
## cn_01 17.012 60.738
## gfi 0.946 0.988
## agfi 0.875 0.916
## pgfi 0.406 0.141
## mfi 0.600 0.930
## ecvi 1.128 0.272
Formal model test
##
## Chi-Squared Difference Test
##
## Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
## fit1 2 5743.1 5783.0 31.453
## fit0 6 5917.9 5944.4 214.189 182.74 0.46802 4 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We reject the null hypothesis that the models fit the same.
2-Factor Model
We have yet another theory - that the 4 tests are indicators of two correlated latent factor we’ll call Gf and Gc.
Model Specification
We specify a 2-factor model by defining the latent variables and
adding in some additional arrows. Each factor must be indicated by
manifest variables, and must have a variance (maybe covariance and
mean).
We won’t go into it here, but identification constraints are
needed and important.
We set up a 2-factor model …
#Model with 1 common factor
General_2factor <- ' #start of model
# latent variable definitions (common factors)
Gc =~ NA*comp_06 +
NA*voca_06
Gf =~ NA*bloc_06 +
NA*obje_06
# latent variable variances
Gc ~~ 1*Gc
Gf ~~ 1*Gf
# latent variable covariances
Gc ~~ Gf
# latent variable means
# manifest variable variances (uniquenesses)
comp_06 ~~ comp_06
voca_06 ~~ voca_06
bloc_06 ~~ bloc_06
obje_06 ~~ obje_06
# manifest variable covariances (uniquenesses)
#manifest variable means
comp_06 ~ 1
voca_06 ~ 1
bloc_06 ~ 1
obje_06 ~ 1
' #end of modelModel Fitting
Second, we fit the specified model to the data to obrain parameter estimates.
fit2 <- lavaan(General_2factor, data=dat, mimic = "mplus")
summary(fit2, standardized=TRUE, fit.measures=TRUE)## lavaan 0.6.16 ended normally after 29 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 13
##
## Number of observations 204
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 1.167
## Degrees of freedom 1
## P-value (Chi-square) 0.280
##
## Model Test Baseline Model:
##
## Test statistic 214.189
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.999
## Tucker-Lewis Index (TLI) 0.995
##
## Robust Comparative Fit Index (CFI) 0.999
## Robust Tucker-Lewis Index (TLI) 0.995
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -2844.431
## Loglikelihood unrestricted model (H1) -2843.848
##
## Akaike (AIC) 5714.862
## Bayesian (BIC) 5757.998
## Sample-size adjusted Bayesian (SABIC) 5716.810
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.029
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.191
## P-value H_0: RMSEA <= 0.050 0.394
## P-value H_0: RMSEA >= 0.080 0.462
##
## Robust RMSEA 0.029
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.191
## P-value H_0: Robust RMSEA <= 0.050 0.394
## P-value H_0: Robust RMSEA >= 0.080 0.462
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.010
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Gc =~
## comp_06 6.420 0.758 8.466 0.000 6.420 0.661
## voca_06 5.550 0.526 10.547 0.000 5.550 0.884
## Gf =~
## bloc_06 4.576 0.517 8.851 0.000 4.576 0.695
## obje_06 12.829 1.320 9.718 0.000 12.829 0.784
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Gc ~~
## Gf 0.641 0.074 8.642 0.000 0.641 0.641
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .comp_06 21.797 0.680 32.036 0.000 21.797 2.243
## .voca_06 20.396 0.439 46.416 0.000 20.396 3.250
## .bloc_06 6.942 0.461 15.063 0.000 6.942 1.055
## .obje_06 25.258 1.145 22.059 0.000 25.258 1.544
## Gc 0.000 0.000 0.000
## Gf 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Gc 1.000 1.000 1.000
## Gf 1.000 1.000 1.000
## .comp_06 53.220 7.931 6.710 0.000 53.220 0.564
## .voca_06 8.592 4.511 1.905 0.057 8.592 0.218
## .bloc_06 22.391 3.717 6.025 0.000 22.391 0.517
## .obje_06 102.876 25.562 4.025 0.000 102.876 0.385
Model Evaluation
Some basic questions to ask after fitting a model -
Did it work?
Does it make sense? Are the parameters viable?
Lets look at a diagram of the model to check if it matches what we intended (this should also be done with the summary information.)
Then, we can look more closely at how well the model fit, and interpret the model more generally.
- Note that there are 2 degrees of freedom in this model.
- Note how much better the fit of the model is.
- We can formalize the improvement in fit
We do model comparison…
Looking at fit statistics
#Compare model fit statistics
round(cbind(m1=inspect(fit0, 'fit.measures'),
m2=inspect(fit1, 'fit.measures'),
m3=inspect(fit2, 'fit.measures')), 3)## m1 m2 m3
## npar 8.000 12.000 13.000
## fmin 0.525 0.077 0.003
## chisq 214.189 31.453 1.167
## df 6.000 2.000 1.000
## pvalue 0.000 0.000 0.280
## baseline.chisq 214.189 214.189 214.189
## baseline.df 6.000 6.000 6.000
## baseline.pvalue 0.000 0.000 0.000
## cfi 0.000 0.859 0.999
## tli 0.000 0.576 0.995
## cfi.robust 0.000 0.859 0.999
## tli.robust 0.000 0.576 0.995
## nnfi 0.000 0.576 0.995
## rfi 0.000 0.559 0.967
## nfi 0.000 0.853 0.995
## pnfi 0.000 0.284 0.166
## ifi 0.000 0.861 0.999
## rni 0.000 0.859 0.999
## nnfi.robust 0.000 0.576 0.995
## rni.robust 0.000 0.859 0.999
## logl -2950.942 -2859.574 -2844.431
## unrestricted.logl -2843.848 -2843.848 -2843.848
## aic 5917.885 5743.148 5714.862
## bic 5944.430 5782.966 5757.998
## ntotal 204.000 204.000 204.000
## bic2 5919.084 5744.946 5716.810
## rmsea 0.412 0.269 0.029
## rmsea.ci.lower 0.366 0.191 0.000
## rmsea.ci.upper 0.461 0.355 0.191
## rmsea.ci.level 0.900 0.900 0.900
## rmsea.pvalue 0.000 0.000 0.394
## rmsea.close.h0 0.050 0.050 0.050
## rmsea.notclose.pvalue 1.000 1.000 0.462
## rmsea.notclose.h0 0.080 0.080 0.080
## rmsea.robust 0.412 0.269 0.029
## rmsea.ci.lower.robust 0.366 0.191 0.000
## rmsea.ci.upper.robust 0.461 0.355 0.191
## rmsea.pvalue.robust 0.000 0.000 0.394
## rmsea.notclose.pvalue.robust 1.000 1.000 0.462
## rmr 26.731 7.015 0.980
## rmr_nomean 31.629 8.300 1.160
## srmr 0.292 0.064 0.010
## srmr_bentler 0.292 0.064 0.010
## srmr_bentler_nomean 0.345 0.076 0.012
## crmr 0.345 0.076 0.012
## crmr_nomean 0.446 0.098 0.015
## srmr_mplus 0.292 0.064 0.010
## srmr_mplus_nomean 0.345 0.076 0.012
## cn_05 12.993 39.860 672.610
## cn_01 17.012 60.738 1160.992
## gfi 0.946 0.988 1.000
## agfi 0.875 0.916 0.994
## pgfi 0.406 0.141 0.071
## mfi 0.600 0.930 1.000
## ecvi 1.128 0.272 0.133
Formal model test
##
## Chi-Squared Difference Test
##
## Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
## fit2 1 5714.9 5758 1.1668
## fit1 2 5743.1 5783 31.4527 30.286 0.37889 1 3.728e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We reject the null hypothesis that the models fit the same.
We conclude that the 2-factor model is the optimal model, of those evaluated.
Conclusion
We’ve walked through specification, estimation, and evaluation of a variety of simple Confirmatory Factor Models. Our examples highlight how the framework is used to test the relative fit of different theories to the data. Lots of fun to be had - but always be careful to respect the confirmatory nature of this kind of modeling enterprise - and to use the models for good (and not evil).
Thanks for playing!