The Power of Decision Stumps
A decision stump is a weak classification model with a simple tree structure consisting of a single split, which can also be considered a one-level decision tree. Due to its simplicity,...continue reading.
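The idea can be sketched in a few lines of plain Python: a stump just scans candidate thresholds on one feature and keeps the split with the fewest misclassifications. The toy data and the single-feature restriction are illustrative assumptions, not from the post.

```python
def fit_stump(x, y):
    """Fit a one-split decision stump on a single feature.

    Scans every observed value as a candidate threshold and tries both
    label orientations, keeping the split with the fewest errors.
    Returns (error_count, threshold, left_label, right_label)."""
    best = None
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        for l_lab in (0, 1):
            r_lab = 1 - l_lab
            err = sum(yi != l_lab for yi in left) + sum(yi != r_lab for yi in right)
            if best is None or err < best[0]:
                best = (err, t, l_lab, r_lab)
    return best

# hypothetical toy data: two well-separated classes
x = [1, 2, 3, 10, 11, 12]
y = [0, 0, 0, 1, 1, 1]
err, t, l_lab, r_lab = fit_stump(x, y)  # splits cleanly at x <= 3
```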
map() is a convenient routine in Python that applies a function to all items from one or more lists, as shown below. This specific nature also makes map() a perfect...continue reading.
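For readers new to the function, here is the basic usage pattern: one iterable for a unary function, several iterables for a function of several arguments.

```python
# map() with a single list: apply the function element-wise
squares = list(map(lambda v: v * v, [1, 2, 3, 4]))        # [1, 4, 9, 16]

# map() with two lists: the function consumes one item from each
sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```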
When modeling the frequency measure in operational risk with regressions, most modelers prefer Poisson or Negative Binomial regressions as best practices in the industry. However, as an alternative...continue reading.
In the Loss Distributional Approach (LDA) for Operational Risk models, multiple distributions, including Log Normal, Gamma, Burr, Pareto, and so on, can be considered candidates for the distribution of severity...continue reading.
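Among those severity candidates, the Log Normal is the simplest to fit: its maximum-likelihood parameters are just the mean and standard deviation of the log losses. A minimal sketch with hypothetical severity data (the loss values are illustrative only):

```python
import math

# hypothetical operational loss severities
losses = [100.0, 250.0, 400.0, 1200.0, 5000.0]

# Log Normal MLE: mu and sigma are the mean and (1/n) std of log losses
logs = [math.log(x) for x in losses]
mu = sum(logs) / len(logs)
sigma = (sum((l - mu) ** 2 for l in logs) / len(logs)) ** 0.5
```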
# READ QUARTERLY DATA FROM CSV library(zoo) ts1 <- read.zoo('Documents/data/macros.csv', header = T, sep = ",", FUN = as.yearqtr) # CONVERT THE DATA TO STATIONARY TIME SERIES ts1$hpi_rate <- log(ts1$hpi...continue reading.
In R, there are two ways to read a block of a spreadsheet, e.g. an xlsx file, such as the one shown below. The xlsx package provides the most intuitive interface with...continue reading.
The example below shows how to estimate a simple univariate Poisson time series model with the tscount package. While the model estimation is straightforward and yields very similar parameter estimates...continue reading.
Similar to rPython, the rPithon package (http://rpithon.r-forge.r-project.org) allows users to execute Python code from R and exchange the data between Python and R. However, the underlying mechanisms between these two...continue reading.
Modeling time series of count outcomes is of interest in operational risk when forecasting the frequency of losses. Below is an example showing how to estimate a simple...continue reading.
The tree-based Cubist model can be easily used to develop an ensemble classifier with a scheme called “committees”. The concept of “committees” is similar to the one of “boosting” by...continue reading.
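The committee idea can be illustrated with a toy sketch: successive members fit a target shifted by the previous member's error, and the final prediction averages all members. This is only in the spirit of Cubist's committees, not its exact adjustment rule, and it uses a simple one-dimensional least-squares line instead of Cubist's rule-based models.

```python
def fit_line(x, y):
    """Closed-form 1-D least-squares fit: returns intercept and slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def committee_predict(x, y, n_members=3):
    """Toy committee: each member fits a target shifted by the previous
    member's error; predictions are averaged over all members."""
    members, target = [], list(y)
    for _ in range(n_members):
        a, b = fit_line(x, target)
        pred = [a + b * xi for xi in x]
        members.append((a, b))
        # shift the next member's target by the current member's error
        target = [ti + (yi - pi) for ti, yi, pi in zip(target, y, pred)]
    return [sum(a + b * xi for a, b in members) / len(members) for xi in x]

# on exactly linear data every member agrees, so the average recovers y
preds = committee_predict([0, 1, 2, 3], [1, 3, 5, 7])
```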
Cubist is a tree-based model with an OLS regression attached to each terminal node and is somewhat similar to the mob() function in the party package (https://statcompute.wordpress.com/2014/10/26/model-segmentation-with-recursive-partitioning). Below is a demonstration...continue reading.
library(betareg) library(sas7bdat) df1 <- read.sas7bdat('lgd.sas7bdat') df2 <- df1[df1$y < 1, ] fml <- as.formula('y ~ x2 + x3 + x4 + x5 + x6 | x3 + x4 | x1...continue reading.
library(party) df1 <- read.csv("credit_count.csv") df2 <- df1[df1$CARDHLDR == 1, ] mdl <- mob(DEFAULT ~ MAJORDRG + MINORDRG + INCOME + OWNRENT | AGE + SELFEMPL, data = df2, family =...continue reading.
pkgs <- c('sas7bdat', 'betareg', 'lmtest') lapply(pkgs, require, character.only = T) df1 <- read.sas7bdat("lgd.sas7bdat") df2 <- df1[which(df1$y < 1), ] xvar <- paste("x", 1:7, sep = "", collapse = " +...continue reading.
Similar to the row search, by-group aggregation is another perfect use case to demonstrate the power of split-and-conquer with parallelism. In the example below, it is shown that the homebrew...continue reading.
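The split-and-conquer pattern for by-group aggregation can be sketched in plain Python: split the rows into chunks, aggregate each chunk independently (the step that could run in parallel, e.g. with multiprocessing), then merge the partial results. The data and chunk size are illustrative assumptions.

```python
from collections import Counter

# hypothetical (group, value) rows
rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]

def aggregate(chunk):
    """Sum values by group within one chunk."""
    out = Counter()
    for key, val in chunk:
        out[key] += val
    return out

# split: chunks of 2 rows; each could be handled by a separate worker
chunks = [rows[i:i + 2] for i in range(0, len(rows), 2)]

# conquer: aggregate each chunk, then merge the partial sums
partials = [aggregate(c) for c in chunks]
total = sum(partials, Counter())
```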
# REFERENCE: # user2014.stat.ucla.edu/files/tutorial_Matt.pdf pkgs <- c('data.table', 'rbenchmark') lapply(pkgs, require, character.only = T) load('2008.Rdata') dt <- data.table(data) benchmark(replications = 10, order = "elap…continue reading.
I’ve always wondered whether the efficiency of row search can be improved if the whole data.frame is split into chunks and the row search is then conducted within each...continue reading.
First of all, I used a SQL statement with the SQLDF package in R. It took ~51 seconds user time to select 12 rows out of 7 million. Next, I used Apache...continue reading.
The feed-forward neural network is a very powerful classification model in the machine learning context. Since the goodness-of-fit of a neural network is largely determined by the model complexity, it...continue reading.