Bagging Model
Details
Decision trees suffer from high variance: if we split the training data set randomly into two parts and fit a decision tree to each half, the two trees can differ substantially. Bagging is an ensemble procedure that reduces this variance, and thereby improves prediction accuracy, by combining models fit on many training sets. Ideally we would fit \(B\) separate models \(\hat{f}^{1}(x),\hat{f}^{2}(x),\ldots,\hat{f}^{B}(x)\) on \(B\) independent training sets drawn from the population. Since we usually have only a single training set, we instead generate \(B\) bootstrapped training data sets from it, fit a tree to each to obtain \(\hat{f}^{*1}(x),\hat{f}^{*2}(x),\ldots,\hat{f}^{*B}(x)\), and take a majority vote across the \(B\) trees. The bagged classifier is therefore $$\hat{f}_{bag}(x)=\arg\max_{k}\sum_{b=1}^{B}I\left(\hat{f}^{*b}(x)=k\right),$$ where \(I(\cdot)\) is the indicator function and \(k\) ranges over the classes.
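To make the procedure concrete, here is a minimal sketch of bagging classification trees in R, assuming the rpart package; the helper names (bag_trees, predict_bagged) are illustrative and are not the implementation used by BAG_Model.

library(rpart)

# Fit B classification trees, each on a bootstrap sample of the data
bag_trees <- function(formula, data, B = 100) {
  lapply(seq_len(B), function(b) {
    boot <- data[sample(nrow(data), replace = TRUE), ]
    rpart(formula, data = boot, method = "class")
  })
}

# Predict by majority vote across the B trees
predict_bagged <- function(trees, newdata) {
  votes <- sapply(trees, function(tr)
    as.character(predict(tr, newdata, type = "class")))
  apply(votes, 1, function(v) names(which.max(table(v))))
}

A call such as predict_bagged(bag_trees(Loan.Type ~ ., sample_data), sample_data) would then return the majority-vote class labels.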
Examples
# \donttest{
# Response variable and the subset of the data used for the example
yvar <- "Loan.Type"
sample_data <- sample_data[1:750, ]

# Predictor variables
xvar <- c("sex", "married", "age", "havejob", "educ", "political.afl",
          "rural", "region", "fin.intermdiaries", "fin.knowldge", "income")

BchMk.BAG <- BAG_Model(sample_data, c(xvar, "networth"), yvar)
#> Loading required package: ggplot2
#> Loading required package: lattice
#> + Fold01: parameter=none
#> - Fold01: parameter=none
#> + Fold02: parameter=none
#> - Fold02: parameter=none
#> + Fold03: parameter=none
#> - Fold03: parameter=none
#> + Fold04: parameter=none
#> - Fold04: parameter=none
#> + Fold05: parameter=none
#> - Fold05: parameter=none
#> + Fold06: parameter=none
#> - Fold06: parameter=none
#> + Fold07: parameter=none
#> - Fold07: parameter=none
#> + Fold08: parameter=none
#> - Fold08: parameter=none
#> + Fold09: parameter=none
#> - Fold09: parameter=none
#> + Fold10: parameter=none
#> - Fold10: parameter=none
#> Aggregating results
#> Fitting final model on full training set
# Multi-class area under the ROC curve of the fitted model
BchMk.BAG$Roc$auc
#> Multi-class area under the curve: 0.6906
# }