RcppMLPACK2 and the MLPACK Machine Learning Library
This article is originally published at http://gallery.rcpp.org
mlpack
mlpack is, to quote, a scalable machine learning library, written in C++, that aims to provide fast, extensible implementations of cutting-edge machine learning algorithms. It has been written by Ryan Curtin and others, and is described in two papers in BigLearning (2011) and JMLR (2013). mlpack uses Armadillo as the underlying linear algebra library, which, thanks to RcppArmadillo, is already a rather well-known library in the R ecosystem.
RcppMLPACK1
Qiang Kou has created the RcppMLPACK package on CRAN for easy-to-use integration of mlpack with R. It integrates the mlpack sources, and is, as a CRAN package, widely available on all platforms.
However, this RcppMLPACK package is based on a by-now dated version of mlpack. Quoting again: mlpack provides these algorithms as simple command-line programs and C++ classes which can then be integrated into larger-scale machine learning solutions. Version 2 of the mlpack sources switched to a slightly more encompassing build that also requires the Boost libraries ‘program_options’, ‘unit_test_framework’ and ‘serialization’. Within the context of an R package, we could condition out the first two, as R provides both the direct interface (hence no need to parse command-line options) and the testing framework. However, it would be both difficult and potentially undesirable to condition out serialization, which allows mlpack to store and resume machine learning tasks.
We refer to this version now as RcppMLPACK1.
RcppMLPACK2
As of February 2017, the current version of mlpack is 2.1.1. As it requires external linking with (some) Boost libraries as well as with Armadillo, we have created a new package RcppMLPACK2 inside a new GitHub organization RcppMLPACK.
Linux
This package works fine on Linux provided mlpack, Armadillo and Boost are installed.
OS X / macOS
For macOS / OS X, James Balamuta has tried to set up a homebrew recipe, but there are some tricky interactions with the compiler suites used by both brew and R on macOS.
Windows
For Windows, one could do what Jeroen Ooms has done and build (external) libraries. Volunteers are encouraged to get in touch via the issue tickets at GitHub.
Installation from source
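Releases are available from a drat repository hosted in the GitHub organization RcppMLPACK. So adding that repository (for example via drat::addRepo("RcppMLPACK")) and then calling install.packages("RcppMLPACK2") will use this. If you would rather pick an arbitrary commit state, installing directly from the package's GitHub repository will work as well.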
Example: Logistic Regression
To illustrate mlpack, we show a first simple example that is also included in the package. As with the rest of the Rcpp Gallery, these are “live” code examples.
We can then call this function with the same (trivial) data set as used in the first unit test for it:
$parameters
[1]  67.9550 -13.6328 -13.6328
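To make the three fitted parameters a little more concrete, here is a minimal standalone sketch of how a logistic regression model turns such a parameter vector into a class probability. It deliberately does not use mlpack or Rcpp. The function names are hypothetical, and treating the first parameter as the intercept followed by one weight per feature is an assumption that should be checked against the mlpack documentation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic (sigmoid) function mapping a real-valued score into (0, 1).
double sigmoid(double z) {
    return 1.0 / (1.0 + std::exp(-z));
}

// Probability of class 1 for a single observation x.
// ASSUMPTION: params[0] is the intercept and params[1..] hold one weight
// per feature; this layout is not confirmed by the article itself.
double predictProbability(const std::vector<double>& params,
                          const std::vector<double>& x) {
    double z = params[0];
    for (std::size_t j = 0; j < x.size(); ++j) {
        z += params[j + 1] * x[j];
    }
    return sigmoid(z);
}

// Hard 0/1 label using a decision boundary of 0.5.
std::size_t predictLabel(const std::vector<double>& params,
                         const std::vector<double>& x) {
    return predictProbability(params, x) > 0.5 ? 1 : 0;
}
```

Under that assumption, the parameters printed above give a hypothetical point (1, 1) a score of 67.9550 - 2 * 13.6328 = 40.6894, hence a probability very close to one, whereas a point (3, 3) scores -13.8418 and lands very close to zero.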
Example: Naive Bayes Classifier
A second example shows the NaiveBayesClassifier class.
We can use the sample data included in a recent-enough version of the RcppMLPACK package:
$means
[1] 2.75000 4.00000 3.68750 2.37500 8.33333 4.66667 3.66667 2.40000

$variances
[1] 0.333333 0.800000 0.629167 0.383333 0.809524 3.380952 0.666667 0.400000

$probabilities
[1] 0.516129 0.483871

$means
[1] 2.75000 4.00000 3.68750 2.37500 8.33333 4.66667 3.66667 2.40000

$variances
[1] 0.333333 0.800000 0.629167 0.383333 0.809524 3.380952 0.666667 0.400000

$probabilities
[1] 0.516129 0.483871

$classification
[1] 0 0 0 1 1 1 1
[1] TRUE
As we can see, the computed classification on the test set corresponds to the expected classification in testlabels.
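The means, variances and class probabilities reported above are the ingredients of a Gaussian naive Bayes model. As a rough illustration of how such quantities can be estimated and then used to classify a new point, here is a standalone sketch in plain C++. It is not the mlpack implementation: the struct, the function names and the use of the sample variance (dividing by n - 1) are assumptions made for this example, and it expects every class to have at least two observations with non-zero variance in each feature.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Per-class summary statistics of a Gaussian naive Bayes model.
struct GaussianNB {
    std::vector<std::vector<double>> means;      // means[c][j]
    std::vector<std::vector<double>> variances;  // variances[c][j]
    std::vector<double> priors;                  // priors[c]
};

// Estimate means, sample variances and class priors.
// X holds one observation per row; y holds labels in {0, ..., numClasses-1}.
GaussianNB fitGaussianNB(const std::vector<std::vector<double>>& X,
                         const std::vector<std::size_t>& y,
                         std::size_t numClasses) {
    const std::size_t d = X.front().size();
    GaussianNB m;
    m.means.assign(numClasses, std::vector<double>(d, 0.0));
    m.variances.assign(numClasses, std::vector<double>(d, 0.0));
    m.priors.assign(numClasses, 0.0);
    std::vector<double> counts(numClasses, 0.0);

    // First pass: per-class sums, turned into means below.
    for (std::size_t i = 0; i < X.size(); ++i) {
        counts[y[i]] += 1.0;
        for (std::size_t j = 0; j < d; ++j) m.means[y[i]][j] += X[i][j];
    }
    for (std::size_t c = 0; c < numClasses; ++c)
        for (std::size_t j = 0; j < d; ++j) m.means[c][j] /= counts[c];

    // Second pass: squared deviations from the class mean.
    for (std::size_t i = 0; i < X.size(); ++i)
        for (std::size_t j = 0; j < d; ++j) {
            const double diff = X[i][j] - m.means[y[i]][j];
            m.variances[y[i]][j] += diff * diff;
        }
    for (std::size_t c = 0; c < numClasses; ++c) {
        for (std::size_t j = 0; j < d; ++j)
            m.variances[c][j] /= (counts[c] - 1.0);
        m.priors[c] = counts[c] / static_cast<double>(X.size());
    }
    return m;
}

// Return the class with the largest log prior plus Gaussian log-likelihood.
std::size_t predictGaussianNB(const GaussianNB& m,
                              const std::vector<double>& x) {
    const double twoPi = 6.283185307179586;
    std::size_t best = 0;
    double bestScore = -std::numeric_limits<double>::infinity();
    for (std::size_t c = 0; c < m.priors.size(); ++c) {
        double score = std::log(m.priors[c]);
        for (std::size_t j = 0; j < x.size(); ++j) {
            const double v = m.variances[c][j];
            const double diff = x[j] - m.means[c][j];
            score -= 0.5 * (std::log(twoPi * v) + diff * diff / v);
        }
        if (score > bestScore) { bestScore = score; best = c; }
    }
    return best;
}
```

Working in log space avoids numerical underflow when many small per-feature densities are multiplied together, which is why the sketch sums log terms rather than multiplying probabilities.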