Ensemble Methods - Bagging, Random Forests, Boosting
Overview
Ensemble methods combine the predictions of many models. For example, in Bagging (short for bootstrap aggregation), m parallel models (e.g., m = 50) are fit to m bootstrapped samples of the data, and the predictions from the m models are then averaged to obtain the prediction of the ensemble.
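As a minimal sketch of the bagging idea (simulated data; rpart used as the base learner, purely for illustration):

```r
#minimal bagging sketch on simulated data (illustration only)
library(rpart)
set.seed(1)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
dat$y <- dat$x1 - dat$x2 + rnorm(100)
m <- 50                                             #number of bootstrap samples
preds <- sapply(1:m, function(i) {
  boot <- dat[sample(nrow(dat), replace = TRUE), ]  #bootstrap resample
  fit  <- rpart(y ~ x1 + x2, data = boot)           #tree fit to the resample
  predict(fit, newdata = dat)                       #predictions from tree i
})
bagged <- rowMeans(preds)  #ensemble prediction = average over the m trees
```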
In this tutorial we walk through basics of three Ensemble Methods: Bagging, Random Forests, and Boosting.
Outline
In this session we cover …
- Introduction to Data
- Splitting Data into Training and Test sets
- Model 0: A Single Classification Tree
- Model 1: Bagging of ctrees
- Model 2: Random Forest for Classification Trees
- Model 2a: CForest for Conditional Inference Tree
- Model 3: Boosting
- Model Stacking (Not included yet)
- Model Comparison
- Conclusion
Loading Libraries Used In This Script
library(ISLR) #the Carseat Data
library(psych) #data descriptives
library(caret) #training and cross validation, other model libraries
library(rpart) #trees
library(rattle) #fancy tree plot
library(rpart.plot) #enhanced tree plots
library(RColorBrewer) #color palettes
library(party) #alternative decision tree algorithm
library(partykit) #convert rpart object to BinaryTree
library(randomForest) #random forest
library(pROC) #ROC curves
library(gbm) #gradient boosting
library(ggplot2) #data visualization
library(dplyr) #data manipulation
1. Introduction to the Data
Reading in the Carseats Data Set
This is a simulated data set containing sales of child car seats at 400 different stores. Sales can be predicted by 10 other variables.
Data Descriptives
Let's have a quick look at the data file and the descriptives.
## Sales CompPrice Income Advertising Population Price ShelveLoc Age Education
## 1 9.50 138 73 11 276 120 Bad 42 17
## 2 11.22 111 48 16 260 83 Good 65 10
## 3 10.06 113 35 10 269 80 Medium 59 12
## 4 7.40 117 100 4 466 97 Medium 55 14
## 5 4.15 141 64 3 340 128 Bad 38 13
## 6 10.81 124 113 13 501 72 Bad 78 16
## 7 6.63 115 105 0 45 108 Medium 71 15
## 8 11.85 136 81 15 425 120 Good 67 10
## 9 6.54 132 110 0 108 124 Medium 76 10
## 10 4.69 132 113 0 131 124 Medium 76 17
## Urban US
## 1 Yes Yes
## 2 Yes Yes
## 3 Yes Yes
## 4 Yes Yes
## 5 Yes No
## 6 No Yes
## 7 Yes No
## 8 Yes Yes
## 9 No No
## 10 No Yes
Our outcome of interest will be a binary version of Sales: unit sales (in thousands) at each location. (Note that there is no id variable; this is convenient for some tasks.)
## vars n mean sd median trimmed mad min max range
## Sales 1 400 7.50 2.82 7.49 7.43 2.87 0 16.27 16.27
## CompPrice 2 400 124.97 15.33 125.00 125.04 14.83 77 175.00 98.00
## Income 3 400 68.66 27.99 69.00 68.26 35.58 21 120.00 99.00
## Advertising 4 400 6.64 6.65 5.00 5.89 7.41 0 29.00 29.00
## Population 5 400 264.84 147.38 272.00 265.56 191.26 10 509.00 499.00
## Price 6 400 115.79 23.68 117.00 115.92 22.24 24 191.00 167.00
## ShelveLoc* 7 400 2.31 0.83 3.00 2.38 0.00 1 3.00 2.00
## Age 8 400 53.32 16.20 54.50 53.48 20.02 25 80.00 55.00
## Education 9 400 13.90 2.62 14.00 13.88 2.97 10 18.00 8.00
## Urban* 10 400 1.71 0.46 2.00 1.76 0.00 1 2.00 1.00
## US* 11 400 1.65 0.48 2.00 1.68 0.00 1 2.00 1.00
## skew kurtosis se
## Sales 0.18 -0.11 0.14
## CompPrice -0.04 0.01 0.77
## Income 0.05 -1.10 1.40
## Advertising 0.63 -0.57 0.33
## Population -0.05 -1.21 7.37
## Price -0.12 0.41 1.18
## ShelveLoc* -0.62 -1.28 0.04
## Age -0.08 -1.14 0.81
## Education 0.04 -1.31 0.13
## Urban* -0.90 -1.20 0.02
## US* -0.60 -1.64 0.02
#histogram of outcome
Carseats %>%
ggplot(aes(x=Sales)) +
geom_histogram(binwidth=1, boundary=.5, fill="white", color="black") +
geom_vline(xintercept = 8, color="red", linewidth=2) +
labs(x = "Sales")
For convenience of didactic illustration we create a new variable HighSales that is binary: "No" if Sales <= 8, and "Yes" otherwise.
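The code creating HighSales is not shown in this script; consistent with the description above, it would be:

```r
#create binary outcome: "No" if Sales <= 8, "Yes" otherwise
Carseats$HighSales <- factor(ifelse(Carseats$Sales <= 8, "No", "Yes"))
```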
2. Splitting the Data Into Training and Test Sets
We split the data: half for Training, half for Testing.
#random sample half the rows
halfsample = sample(dim(Carseats)[1], dim(Carseats)[1]/2) # half of sample
#create training and test data sets
Carseats.train = Carseats[halfsample, ]
Carseats.test = Carseats[-halfsample, ]
We will use these to evaluate a variety of different classification algorithms: Random Forests, CForests, etc.
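Note that the cvcontrol object passed to train() in the models below is never defined in this script. A definition consistent with the printed resampling summaries ("Cross-Validated (10 fold, repeated 1 times)") would be something like:

```r
#10-fold cross-validation, repeated once (an assumption; matches the output below)
cvcontrol <- trainControl(method = "repeatedcv",
                          number = 10,
                          repeats = 1,
                          allowParallel = TRUE)
```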
3. Model 0: A Single Classification Tree
We first optimize the fit of a single classification tree. Our objective with the cross-validation is to tune the size of the tree; for a conditional inference tree the tuning parameter is mincriterion, which governs how easily a split is made.
train.tree <- train(as.factor(HighSales) ~ .,
data=Carseats.train,
method="ctree",
trControl=cvcontrol,
tuneLength = 10)
train.tree
## Conditional Inference Tree
##
## 200 samples
## 10 predictor
## 2 classes: 'No', 'Yes'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 1 times)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## mincriterion Accuracy Kappa
## 0.0100000 0.690 0.3509902
## 0.1188889 0.690 0.3509902
## 0.2277778 0.690 0.3509902
## 0.3366667 0.685 0.3430716
## 0.4455556 0.685 0.3430716
## 0.5544444 0.665 0.3062027
## 0.6633333 0.690 0.3544755
## 0.7722222 0.675 0.3169755
## 0.8811111 0.680 0.3406330
## 0.9900000 0.690 0.3768388
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mincriterion = 0.99.
We see that accuracy is maximized at mincriterion = 0.99, a relatively less complex tree.
Look at the final tree:
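The plotting code is not shown; one option is to plot the final model directly (for method = "ctree" the finalModel is a party tree object):

```r
#plot the selected conditional inference tree
plot(train.tree$finalModel)
```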
To evaluate the accuracy of the tree we can look at the confusion matrix for the Training data.
#obtaining class predictions
tree.classTrain <- predict(train.tree, type="raw")
head(tree.classTrain)
## [1] No No No No No No
## Levels: No Yes
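The call producing the output below is not shown; with caret it would be:

```r
#confusion matrix: predicted classes vs. observed Training labels
confusionMatrix(tree.classTrain, as.factor(Carseats.train$HighSales))
```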
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 103 17
## Yes 28 52
##
## Accuracy : 0.775
## 95% CI : (0.7108, 0.8309)
## No Information Rate : 0.655
## P-Value [Acc > NIR] : 0.0001539
##
## Kappa : 0.5203
##
## Mcnemar's Test P-Value : 0.1360371
##
## Sensitivity : 0.7863
## Specificity : 0.7536
## Pos Pred Value : 0.8583
## Neg Pred Value : 0.6500
## Prevalence : 0.6550
## Detection Rate : 0.5150
## Detection Prevalence : 0.6000
## Balanced Accuracy : 0.7699
##
## 'Positive' Class : No
##
Some errors, but the model has learned the Training data reasonably well.
More interesting is the confusion matrix when applied to the Test data.
#obtaining class predictions
tree.classTest <- predict(train.tree,
newdata = Carseats.test,
type="raw")
head(tree.classTest)
## [1] Yes No No No Yes No
## Levels: No Yes
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 87 29
## Yes 35 49
##
## Accuracy : 0.68
## 95% CI : (0.6105, 0.744)
## No Information Rate : 0.61
## P-Value [Acc > NIR] : 0.02412
##
## Kappa : 0.3367
##
## Mcnemar's Test P-Value : 0.53197
##
## Sensitivity : 0.7131
## Specificity : 0.6282
## Pos Pred Value : 0.7500
## Neg Pred Value : 0.5833
## Prevalence : 0.6100
## Detection Rate : 0.4350
## Detection Prevalence : 0.5800
## Balanced Accuracy : 0.6707
##
## 'Positive' Class : No
##
Accuracy of 0.68 on the Test data - noticeably lower than the 0.775 obtained on the Training data.
When evaluating classification models, a few other functions are useful. For example, caret's confusionMatrix() provides the confusion matrix and the associated measures of sensitivity and specificity, and the pROC package provides convenience functions for obtaining and plotting ROC curves.
We can also look at the ROC curve by extracting probabilities of
“Yes”.
#Obtaining predicted probabilities for Test data
tree.probs=predict(train.tree,
newdata=Carseats.test,
type="prob")
head(tree.probs)
## No Yes
## 1 0.2156863 0.7843137
## 2 0.8269231 0.1730769
## 3 0.8269231 0.1730769
## 4 0.8269231 0.1730769
## 5 0.2156863 0.7843137
## 6 0.8269231 0.1730769
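The pROC call that produces the messages and AUC below is not shown; it would be along these lines (the object name rocCurve.tree matches the comparison plot in Section 9):

```r
#ROC curve and AUC for the single tree on the Test data
rocCurve.tree <- roc(response = Carseats.test$HighSales,
                     predictor = tree.probs[, "Yes"])
auc(rocCurve.tree)
```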
## Setting levels: control = No, case = Yes
## Setting direction: controls < cases
## Area under the curve: 0.6823
4. Model 1: Bagging of ctrees
Training the model using treebag.
With bagging, rather than tuning the size of a single tree, we fit many trees to bootstrap resamples of the Training data and aggregate their predictions.
#Using treebag
train.bagg <- train(as.factor(HighSales) ~ .,
data=Carseats.train,
method="treebag",
trControl=cvcontrol,
importance=TRUE)
train.bagg
## Bagged CART
##
## 200 samples
## 10 predictor
## 2 classes: 'No', 'Yes'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 1 times)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results:
##
## Accuracy Kappa
## 0.78 0.5303142
Not yet sure how to parse more details from the output in order to look at the collection of trees.
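One possible way in, assuming caret's treebag wraps ipred::bagging (so that finalModel is a classbagg object): the individual trees are stored in the mtrees component.

```r
#number of bagged trees (assumes finalModel is an ipred 'classbagg' object)
length(train.bagg$finalModel$mtrees)
#the first tree in the ensemble, as an rpart object
train.bagg$finalModel$mtrees[[1]]$btree
```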
To evaluate the accuracy of the Bagged Trees we can look at the confusion matrix for the Training data.
#obtaining class predictions
bagg.classTrain <- predict(train.bagg, type="raw")
head(bagg.classTrain)
## [1] No No Yes No No No
## Levels: No Yes
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 120 0
## Yes 0 80
##
## Accuracy : 1
## 95% CI : (0.9817, 1)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 1
##
## Mcnemar's Test P-Value : NA
##
## Sensitivity : 1.0
## Specificity : 1.0
## Pos Pred Value : 1.0
## Neg Pred Value : 1.0
## Prevalence : 0.6
## Detection Rate : 0.6
## Detection Prevalence : 0.6
## Balanced Accuracy : 1.0
##
## 'Positive' Class : No
##
The accuracy is perfect! (Perfect Training accuracy is expected here - the deep bagged trees essentially memorize the training sample, so the Test data gives the honest assessment.)
More interesting is the confusion matrix when applied to the Test data.
#obtaining class predictions
bagg.classTest <- predict(train.bagg,
newdata = Carseats.test,
type="raw")
head(bagg.classTest)
## [1] Yes Yes No No Yes No
## Levels: No Yes
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 89 27
## Yes 31 53
##
## Accuracy : 0.71
## 95% CI : (0.6418, 0.7718)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.0007913
##
## Kappa : 0.4008
##
## Mcnemar's Test P-Value : 0.6936406
##
## Sensitivity : 0.7417
## Specificity : 0.6625
## Pos Pred Value : 0.7672
## Neg Pred Value : 0.6310
## Prevalence : 0.6000
## Detection Rate : 0.4450
## Detection Prevalence : 0.5800
## Balanced Accuracy : 0.7021
##
## 'Positive' Class : No
##
Accuracy of 0.71.
We can also look at the ROC curve by extracting probabilities of “Yes”.
#Obtaining predicted probabilities for Test data
bagg.probs=predict(train.bagg,
newdata=Carseats.test,
type="prob")
head(bagg.probs)
## No Yes
## 1 0.20 0.80
## 2 0.48 0.52
## 3 0.76 0.24
## 4 0.88 0.12
## 5 0.00 1.00
## 6 0.92 0.08
## Setting levels: control = No, case = Yes
## Setting direction: controls < cases
## Area under the curve: 0.7972
5. Model 2: Random Forest for Classification Trees
Training the model using random forest.
train.rf <- train(as.factor(HighSales) ~ .,
data=Carseats.train,
method="rf",
trControl=cvcontrol,
#tuneLength = 3,
importance=TRUE)
train.rf
## Random Forest
##
## 200 samples
## 10 predictor
## 2 classes: 'No', 'Yes'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 1 times)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.810 0.5817081
## 6 0.770 0.5061877
## 10 0.785 0.5381540
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.
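Since importance=TRUE was set in train(), we can also examine which predictors the forest relies on:

```r
#variable importance from the fitted random forest
plot(varImp(train.rf))
```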
We can look at the confusion matrix for the Training data.
## [1] No No Yes No No No
## Levels: No Yes
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 120 0
## Yes 0 80
##
## Accuracy : 1
## 95% CI : (0.9817, 1)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 1
##
## Mcnemar's Test P-Value : NA
##
## Sensitivity : 1.0
## Specificity : 1.0
## Pos Pred Value : 1.0
## Neg Pred Value : 1.0
## Prevalence : 0.6
## Detection Rate : 0.6
## Detection Prevalence : 0.6
## Balanced Accuracy : 1.0
##
## 'Positive' Class : No
##
No errors - but as with bagging, perfect Training accuracy mainly reflects how closely the ensemble fits the data it was trained on; the Test data is the real check.
More interesting is the confusion matrix when applied to the Test data.
#obtaining class predictions
rf.classTest <- predict(train.rf,
newdata = Carseats.test,
type="raw")
head(rf.classTest)
## [1] Yes No No No Yes No
## Levels: No Yes
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 106 10
## Yes 37 47
##
## Accuracy : 0.765
## 95% CI : (0.7, 0.8219)
## No Information Rate : 0.715
## P-Value [Acc > NIR] : 0.0663581
##
## Kappa : 0.4953
##
## Mcnemar's Test P-Value : 0.0001491
##
## Sensitivity : 0.7413
## Specificity : 0.8246
## Pos Pred Value : 0.9138
## Neg Pred Value : 0.5595
## Prevalence : 0.7150
## Detection Rate : 0.5300
## Detection Prevalence : 0.5800
## Balanced Accuracy : 0.7829
##
## 'Positive' Class : No
##
Accuracy of 0.765 - an improvement over bagging alone.
We can also look at the ROC curve by extracting probabilities of “Yes”.
#Obtaining predicted probabilities for Test data
rf.probs=predict(train.rf,
newdata=Carseats.test,
type="prob")
head(rf.probs)
## No Yes
## 3 0.334 0.666
## 4 0.672 0.328
## 5 0.730 0.270
## 7 0.918 0.082
## 8 0.250 0.750
## 9 0.814 0.186
## Setting levels: control = No, case = Yes
## Setting direction: controls < cases
## Area under the curve: 0.8449
6. Model 2a: CForest for Conditional Inference Tree
An implementation of the random forest and bagging ensemble algorithms utilizing conditional inference trees as base learners (from the party package).
train.cf <- train(HighSales ~ ., #cforest knows the outcome is binary (unlike rf)
data=Carseats.train,
method="cforest",
trControl=cvcontrol) #Note that importance not available here
train.cf
## Conditional Inference Random Forest
##
## 200 samples
## 10 predictor
## 2 classes: 'No', 'Yes'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 1 times)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.795 0.5468233
## 6 0.775 0.5205785
## 10 0.780 0.5333754
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.
We can look at the confusion matrix for the Training data.
## [1] No No No No No No
## Levels: No Yes
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 116 4
## Yes 29 51
##
## Accuracy : 0.835
## 95% CI : (0.7762, 0.8836)
## No Information Rate : 0.725
## P-Value [Acc > NIR] : 0.0001807
##
## Kappa : 0.6374
##
## Mcnemar's Test P-Value : 2.943e-05
##
## Sensitivity : 0.8000
## Specificity : 0.9273
## Pos Pred Value : 0.9667
## Neg Pred Value : 0.6375
## Prevalence : 0.7250
## Detection Rate : 0.5800
## Detection Prevalence : 0.6000
## Balanced Accuracy : 0.8636
##
## 'Positive' Class : No
##
A few errors - the model learned the Training data fairly well.
More interesting is the confusion matrix when applied to the Test data.
#obtaining class predictions
cf.classTest <- predict(train.cf,
newdata = Carseats.test,
type="raw")
head(cf.classTest)
## [1] Yes No No No Yes No
## Levels: No Yes
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 106 10
## Yes 52 32
##
## Accuracy : 0.69
## 95% CI : (0.6209, 0.7533)
## No Information Rate : 0.79
## P-Value [Acc > NIR] : 0.9997
##
## Kappa : 0.3166
##
## Mcnemar's Test P-Value : 1.919e-07
##
## Sensitivity : 0.6709
## Specificity : 0.7619
## Pos Pred Value : 0.9138
## Neg Pred Value : 0.3810
## Prevalence : 0.7900
## Detection Rate : 0.5300
## Detection Prevalence : 0.5800
## Balanced Accuracy : 0.7164
##
## 'Positive' Class : No
##
Accuracy of 0.69 - worse than the plain random forest on these Test data.
We can also look at the ROC curve by extracting probabilities of “Yes”.
#Obtaining predicted probabilities for Test data
cf.probs=predict(train.cf,
newdata=Carseats.test,
type="prob")
head(cf.probs)
## No Yes
## 1 0.4666175 0.5333825
## 2 0.6406987 0.3593013
## 3 0.7342379 0.2657621
## 4 0.7882546 0.2117454
## 5 0.4409320 0.5590680
## 6 0.7540845 0.2459155
## Setting levels: control = No, case = Yes
## Setting direction: controls < cases
## Area under the curve: 0.7787
7. Model 3: Boosting
It is possible to use a variety of boosting packages (all accessible through caret), e.g., gbm, ada, and xgbLinear. We can look up the various tuning parameters:
## model parameter label forReg forClass probModel
## 1 ada iter #Trees FALSE TRUE TRUE
## 2 ada maxdepth Max Tree Depth FALSE TRUE TRUE
## 3 ada nu Learning Rate FALSE TRUE TRUE
## model parameter label forReg forClass probModel
## 1 gbm n.trees # Boosting Iterations TRUE TRUE TRUE
## 2 gbm interaction.depth Max Tree Depth TRUE TRUE TRUE
## 3 gbm shrinkage Shrinkage TRUE TRUE TRUE
## 4 gbm n.minobsinnode Min. Terminal Node Size TRUE TRUE TRUE
Here, we use gradient boosting via gbm. Example tuning parameters for gbm are described at http://topepo.github.io/caret/training.html
Training with gradient boosting
train.gbm <- train(as.factor(HighSales) ~ .,
data=Carseats.train,
method="gbm",
verbose=F,
trControl=cvcontrol)
train.gbm
## Stochastic Gradient Boosting
##
## 200 samples
## 10 predictor
## 2 classes: 'No', 'Yes'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 1 times)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## interaction.depth n.trees Accuracy Kappa
## 1 50 0.810 0.5864970
## 1 100 0.850 0.6781974
## 1 150 0.840 0.6559583
## 2 50 0.835 0.6435729
## 2 100 0.855 0.6870897
## 2 150 0.860 0.6973974
## 3 50 0.845 0.6600169
## 3 100 0.845 0.6640979
## 3 150 0.865 0.7048597
##
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
##
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were n.trees = 150, interaction.depth =
## 3, shrinkage = 0.1 and n.minobsinnode = 10.
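Variable importance is also available for the boosted model through caret's generic (for gbm this is based on relative influence):

```r
#relative influence of the predictors in the boosted ensemble
varImp(train.gbm)
```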
We can look at the confusion matrix for the Training data.
## [1] No No Yes No No No
## Levels: No Yes
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 120 0
## Yes 0 80
##
## Accuracy : 1
## 95% CI : (0.9817, 1)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 1
##
## Mcnemar's Test P-Value : NA
##
## Sensitivity : 1.0
## Specificity : 1.0
## Pos Pred Value : 1.0
## Neg Pred Value : 1.0
## Prevalence : 0.6
## Detection Rate : 0.6
## Detection Prevalence : 0.6
## Balanced Accuracy : 1.0
##
## 'Positive' Class : No
##
No errors on the Training data - again, perfect Training accuracy; the Test data is the real check.
More interesting is the confusion matrix when applied to the Test data.
#obtaining class predictions
gbm.classTest <- predict(train.gbm,
newdata = Carseats.test,
type="raw")
head(gbm.classTest)
## [1] Yes No No No Yes No
## Levels: No Yes
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 103 13
## Yes 28 56
##
## Accuracy : 0.795
## 95% CI : (0.7323, 0.8487)
## No Information Rate : 0.655
## P-Value [Acc > NIR] : 1.035e-05
##
## Kappa : 0.5686
##
## Mcnemar's Test P-Value : 0.02878
##
## Sensitivity : 0.7863
## Specificity : 0.8116
## Pos Pred Value : 0.8879
## Neg Pred Value : 0.6667
## Prevalence : 0.6550
## Detection Rate : 0.5150
## Detection Prevalence : 0.5800
## Balanced Accuracy : 0.7989
##
## 'Positive' Class : No
##
Accuracy of 0.795 - the best Test performance among the models so far.
We can also look at the ROC curve by extracting probabilities of “Yes”.
#Obtaining predicted probabilities for Test data
gbm.probs=predict(train.gbm,
newdata=Carseats.test,
type="prob")
head(gbm.probs)
## No Yes
## 1 0.140471649 0.859528351
## 2 0.794429421 0.205570579
## 3 0.887294223 0.112705777
## 4 0.996868156 0.003131844
## 5 0.002862385 0.997137615
## 6 0.967026957 0.032973043
## Setting levels: control = No, case = Yes
## Setting direction: controls < cases
## Area under the curve: 0.8837
8. Model Stacking
See:
- https://machinelearningmastery.com/machine-learning-ensembles-with-r/
- https://www.analyticsvidhya.com/blog/2017/02/introduction-to-ensembling-along-with-implementation-in-r/
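As a sketch of what stacking could look like here, using the caretEnsemble package (an assumption - it is not loaded above; for classification stacking the resampling control also needs classProbs = TRUE and savePredictions = "final"):

```r
#sketch only: stack three of the models above with a logistic meta-learner
library(caretEnsemble)
stackcontrol <- trainControl(method = "cv", number = 10,
                             classProbs = TRUE, savePredictions = "final")
model.list <- caretList(as.factor(HighSales) ~ .,
                        data = Carseats.train,
                        trControl = stackcontrol,
                        methodList = c("treebag", "rf", "gbm"))
stack.glm <- caretStack(model.list, method = "glm")  #meta-model on base predictions
stack.pred <- predict(stack.glm, newdata = Carseats.test)
```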
9. Model Comparisons
We can examine how the models do by looking at the ROC curves.
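The rocCurve.* objects plotted below are built from each model's Test-set probabilities of "Yes" with pROC::roc(); if they were not created along the way, all five can be constructed at once:

```r
#ROC objects for the comparison plot (Test-set "Yes" probabilities)
rocCurve.tree <- roc(Carseats.test$HighSales, tree.probs[, "Yes"])
rocCurve.bagg <- roc(Carseats.test$HighSales, bagg.probs[, "Yes"])
rocCurve.rf   <- roc(Carseats.test$HighSales, rf.probs[, "Yes"])
rocCurve.cf   <- roc(Carseats.test$HighSales, cf.probs[, "Yes"])
rocCurve.gbm  <- roc(Carseats.test$HighSales, gbm.probs[, "Yes"])
```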
plot(rocCurve.tree, col=c(4))
plot(rocCurve.bagg, add=TRUE, col=c(6)) #color magenta is bagg
plot(rocCurve.rf, add=TRUE, col=c(1)) #color black is rf
plot(rocCurve.cf, add=TRUE, col=c(2)) #color red is cforest
plot(rocCurve.gbm, add=TRUE, col=c(3)) #color green is gbm
Tree = blue, Bagg = magenta, RF = black, CForest = red, gradient boosting = green
10. Conclusion
For this example, random forests and boosting are more stable than the other methods, and boosting achieves the best Test accuracy. Comparing the variable importance metrics from these ensembles with the splits chosen by the single decision tree is one way to gauge how well that tree is likely to generalize.
Thank you for playing!
Citations
Brownlee, J. (2016, February 7). How to Build an Ensemble Of Machine Learning Algorithms in R. MachineLearningMastery.Com. https://www.machinelearningmastery.com/machine-learning-ensembles-with-r/
Hothorn, T., Bühlmann, P., Dudoit, S., Molinaro, A., & Van Der Laan, M. J. (2006). Survival ensembles. Biostatistics, 7(3), 355–373. https://doi.org/10.1093/biostatistics/kxj011
Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15(3), 651–674. https://doi.org/10.1198/106186006X133933
Hothorn, T., & Zeileis, A. (2015). partykit: A Modular Toolkit for Recursive Partytioning in R. Journal of Machine Learning Research, 16(118), 3905–3909.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). ISLR: Data for an Introduction to Statistical Learning with Applications in R (Version 1.4). https://CRAN.R-project.org/package=ISLR
Kaushik, S. (2019, June 25). Ensemble Models in machine learning? (With code in R). Analytics Vidhya. https://www.analyticsvidhya.com/blog/2017/02/introduction-to-ensembling-along-with-implementation-in-r/
Kuhn, M. (2008). Building Predictive Models in R Using the caret Package. Journal of Statistical Software, 28, 1–26. https://doi.org/10.18637/jss.v028.i05
Liaw, A., & Wiener, M. (2002). Classification and Regression by randomForest. R News, 2(3), 18–22.
Milborrow, S. (2024). Rpart.plot: Plot “rpart” Models: An Enhanced Version of “plot.rpart” (Version 3.1.2). https://CRAN.R-project.org/package=rpart.plot
Neuwirth, E. (2022). RColorBrewer: ColorBrewer Palettes (Version 1.1-3). https://CRAN.R-project.org/package=RColorBrewer
R Core Team. (2024). R: A Language and Environment for Statistical Computing. Foundation for Statistical Computing. https://www.R-project.org/
Revelle, W. (2024). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University. https://CRAN.R-project.org/package=psych
Ridgeway, G., & GBM Developers. (2024). gbm: Generalized Boosted Regression Models (Version 2.2.2). https://CRAN.R-project.org/package=gbm
Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J.-C., & Müller, M. (2011). pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12(1), 77. https://doi.org/10.1186/1471-2105-12-77
Strobl, C., Boulesteix, A.-L., Zeileis, A., & Hothorn, T. (2007). Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics, 8(1), 25. https://doi.org/10.1186/1471-2105-8-25
Therneau, T., & Atkinson, B. (2025). rpart: Recursive Partitioning and Regression Trees (Version 4.1.24) . https://CRAN.R-project.org/package=rpart
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag. https://ggplot2.tidyverse.org/
Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). dplyr: A Grammar of Data Manipulation (Version 1.1.4). https://CRAN.R-project.org/package=dplyr
Williams, G. (2011). Data Mining with Rattle and R: The art of excavating data for knowledge discovery. Springer. https://rd.springer.com/book/10.1007/978-1-4419-9890-3
Zeileis, A., Hothorn, T., & Hornik, K. (2008). Model-Based Recursive Partitioning. Journal of Computational and Graphical Statistics, 17(2), 492–514. https://doi.org/10.1198/106186008X319331