The Power of Decision Stumps
A decision stump is a weak classification model with a simple tree structure consisting of a single split, which can also be considered a one-level decision tree. Due to its simplicity,...continue reading.
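The idea can be sketched in a few lines of plain Python: a stump just scans candidate thresholds on one feature and keeps the split with the fewest misclassifications. The toy data and the single-feature restriction are illustrative assumptions, not from the post.

```python
def fit_stump(x, y):
    """Fit a one-split decision stump on a single feature.

    Scans every observed value as a candidate threshold and tries both
    label orientations, keeping the split with the fewest errors.
    Returns (error_count, threshold, left_label, right_label)."""
    best = None
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        for l_lab in (0, 1):
            r_lab = 1 - l_lab
            err = sum(yi != l_lab for yi in left) + sum(yi != r_lab for yi in right)
            if best is None or err < best[0]:
                best = (err, t, l_lab, r_lab)
    return best

# hypothetical toy data: two well-separated classes
x = [1, 2, 3, 10, 11, 12]
y = [0, 0, 0, 1, 1, 1]
err, t, l_lab, r_lab = fit_stump(x, y)  # splits cleanly at x <= 3
```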
map() is a convenient routine in Python that applies a function to all items from one or more lists, as shown below. This specific nature also makes map() a perfect...continue reading.
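For readers new to the function, here is the basic usage pattern: one iterable for a unary function, several iterables for a function of several arguments.

```python
# map() with a single list: apply the function element-wise
squares = list(map(lambda v: v * v, [1, 2, 3, 4]))        # [1, 4, 9, 16]

# map() with two lists: the function consumes one item from each
sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```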
When modeling the frequency measure in operational risk with regressions, most modelers prefer Poisson or Negative Binomial regressions as best practices in the industry. However, as an alternative...continue reading.
In the Loss Distributional Approach (LDA) for Operational Risk models, multiple distributions, including Log Normal, Gamma, Burr, Pareto, and so on, can be considered candidates for the distribution of severity...continue reading.
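Among those severity candidates, the Log Normal is the simplest to fit: its maximum-likelihood parameters are just the mean and standard deviation of the log losses. A minimal sketch with hypothetical severity data (the loss values are illustrative only):

```python
import math

# hypothetical operational loss severities
losses = [100.0, 250.0, 400.0, 1200.0, 5000.0]

# Log Normal MLE: mu and sigma are the mean and (1/n) std of log losses
logs = [math.log(x) for x in losses]
mu = sum(logs) / len(logs)
sigma = (sum((l - mu) ** 2 for l in logs) / len(logs)) ** 0.5
```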
# READ QUARTERLY DATA FROM CSV library(zoo) ts1 <- read.zoo('Documents/data/macros.csv', header = T, sep = ",", FUN = as.yearqtr) # CONVERT THE DATA TO STATIONARY TIME SERIES ts1$hpi_rate <- log(ts1$hpi...continue reading.
In R, there are two ways to read a block of a spreadsheet, e.g. an xlsx file, such as the one shown below. The xlsx package provides the most intuitive interface with...continue reading.
The example below shows how to estimate a simple univariate Poisson time series model with the tscount package. While the model estimation is straightforward and yields very similar parameter estimates...continue reading.
Similar to rPython, the rPithon package (http://rpithon.r-forge.r-project.org) allows users to execute Python code from R and exchange the data between Python and R. However, the underlying mechanisms between these two...continue reading.
Modeling time series of count outcomes is of interest in operational risk when forecasting the frequency of losses. Below is an example showing how to estimate a simple...continue reading.
The tree-based Cubist model can be easily used to develop an ensemble classifier with a scheme called “committees”. The concept of “committees” is similar to the one of “boosting” by...continue reading.
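The committee idea can be illustrated with a toy sketch: successive members fit a target shifted by the previous member's error, and the final prediction averages all members. This is only in the spirit of Cubist's committees, not its exact adjustment rule, and it uses a simple one-dimensional least-squares line instead of Cubist's rule-based models.

```python
def fit_line(x, y):
    """Closed-form 1-D least-squares fit: returns intercept and slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def committee_predict(x, y, n_members=3):
    """Toy committee: each member fits a target shifted by the previous
    member's error; predictions are averaged over all members."""
    members, target = [], list(y)
    for _ in range(n_members):
        a, b = fit_line(x, target)
        pred = [a + b * xi for xi in x]
        members.append((a, b))
        # shift the next member's target by the current member's error
        target = [ti + (yi - pi) for ti, yi, pi in zip(target, y, pred)]
    return [sum(a + b * xi for a, b in members) / len(members) for xi in x]

# on exactly linear data every member agrees, so the average recovers y
preds = committee_predict([0, 1, 2, 3], [1, 3, 5, 7])
```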
Cubist is a tree-based model with an OLS regression attached to each terminal node and is somewhat similar to the mob() function in the party package (https://statcompute.wordpress.com/2014/10/26/model-segmentation-with-recursive-partitioning). Below is a demonstration...continue reading.
library(betareg) library(sas7bdat) df1 <- read.sas7bdat('lgd.sas7bdat') df2 <- df1[df1$y < 1, ] fml <- as.formula('y ~ x2 + x3 + x4 + x5 + x6 | x3 + x4 | x1...continue reading.
library(party) df1 <- read.csv("credit_count.csv") df2 <- df1[df1$CARDHLDR == 1, ] mdl <- mob(DEFAULT ~ MAJORDRG + MINORDRG + INCOME + OWNRENT | AGE + SELFEMPL, data = df2, family =...continue reading.
pkgs <- c('sas7bdat', 'betareg', 'lmtest') lapply(pkgs, require, character.only = T) df1 <- read.sas7bdat("lgd.sas7bdat") df2 <- df1[which(df1$y < 1), ] xvar <- paste("x", 1:7, sep = "", collapse = " +...continue reading.
Similar to the row search, by-group aggregation is another perfect use case to demonstrate the power of split-and-conquer with parallelism. In the example below, it is shown that the homebrew...continue reading.
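The split-and-conquer pattern for by-group aggregation can be sketched in plain Python: split the rows into chunks, aggregate each chunk independently (the step that could run in parallel, e.g. with multiprocessing), then merge the partial results. The data and chunk size are illustrative assumptions.

```python
from collections import Counter

# hypothetical (group, value) rows
rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]

def aggregate(chunk):
    """Sum values by group within one chunk."""
    out = Counter()
    for key, val in chunk:
        out[key] += val
    return out

# split: chunks of 2 rows; each could be handled by a separate worker
chunks = [rows[i:i + 2] for i in range(0, len(rows), 2)]

# conquer: aggregate each chunk, then merge the partial sums
partials = [aggregate(c) for c in chunks]
total = sum(partials, Counter())
```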
# REFERENCE: # user2014.stat.ucla.edu/files/tutorial_Matt.pdf pkgs <- c('data.table', 'rbenchmark') lapply(pkgs, require, character.only = T) load('2008.Rdata') dt <- data.table(data) benchmark(replications = 10, order = "elap…continue reading.
I’ve always wondered whether the efficiency of row search can be improved if the whole data.frame is split into chunks and the row search is then conducted within each...continue reading.
First of all, I used a SQL statement with the SQLDF package in R. It took ~51 seconds user time to select 12 rows out of 7 million. Next, I used Apache...continue reading.
The feed-forward neural network is a very powerful classification model in the machine learning context. Since the goodness-of-fit of a neural network is largely determined by the model complexity, it...continue reading.