Improve SVM Tuning through Parallelism
This article is originally published at https://statcompute.wordpress.com
As pointed out in the chapter 10 of “The Elements of Statistical Learning”, ANN and SVM (support vector machines) share similar pros and cons, e.g. lack of interpretability and good predictive power. However, in contrast to ANN usually suffering from local minima solutions, SVM is always able to converge globally. In addition, SVM is less prone to over-fitting given a good choice of free parameters, which usually can be identified through cross-validations.
In the R package “e1071”, tune() function can be used to search for SVM parameters but is extremely inefficient due to the sequential instead of parallel executions. In the code snippet below, a parallelism-based algorithm performs the grid search for SVM parameters through the K-fold cross validation.
pkgs <- c('foreach', 'doParallel') lapply(pkgs, require, character.only = T) registerDoParallel(cores = 4) ### PREPARE FOR THE DATA ### df1 <- read.csv("credit_count.txt") df2 <- df1[df1$CARDHLDR == 1, ] x <- paste("AGE + ACADMOS + ADEPCNT + MAJORDRG + MINORDRG + OWNRENT + INCOME + SELFEMPL + INCPER + EXP_INC") fml <- as.formula(paste("as.factor(DEFAULT) ~ ", x)) ### SPLIT DATA INTO K FOLDS ### set.seed(2016) df2$fold <- caret::createFolds(1:nrow(df2), k = 4, list = FALSE) ### PARAMETER LIST ### cost <- c(10, 100) gamma <- c(1, 2) parms <- expand.grid(cost = cost, gamma = gamma) ### LOOP THROUGH PARAMETER VALUES ### result <- foreach(i = 1:nrow(parms), .combine = rbind) %do% { c <- parms[i, ]$cost g <- parms[i, ]$gamma ### K-FOLD VALIDATION ### out <- foreach(j = 1:max(df2$fold), .combine = rbind, .inorder = FALSE) %dopar% { deve <- df2[df2$fold != j, ] test <- df2[df2$fold == j, ] mdl <- e1071::svm(fml, data = deve, type = "C-classification", kernel = "radial", cost = c, gamma = g, probability = TRUE) pred <- predict(mdl, test, decision.values = TRUE, probability = TRUE) data.frame(y = test$DEFAULT, prob = attributes(pred)$probabilities[, 2]) } ### CALCULATE SVM PERFORMANCE ### roc <- pROC::roc(as.factor(out$y), out$prob) data.frame(parms[i, ], roc = roc$auc[1]) }
Thanks for visiting r-craft.org
This article is originally published at https://statcompute.wordpress.com
Please visit source website for post related comments.