Author: statcompute

Parallelize Map()

In Python, map() is a convenient routine that applies a function to every item of one or more lists, as shown below. This specific nature also makes map() a perfect...continue reading.
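As a minimal sketch of the idea in this excerpt, the snippet below first calls the built-in map() on two lists and then runs the same call through a pool of workers. The add() function, the sample lists, and the choice of ThreadPoolExecutor are assumptions made for illustration; the full post may use a different mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def add(x, y):
    """Return the sum of two numbers (hypothetical example function)."""
    return x + y


xs = [1, 2, 3, 4]
ys = [10, 20, 30, 40]

# Built-in map() walks both lists in step and calls add(x, y) on each pair.
serial = list(map(add, xs, ys))

# Executor.map() accepts the same arguments but spreads the calls across
# worker threads; results still come back in input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = list(pool.map(add, xs, ys))

print(serial)
print(parallel)
```

Threads mainly help with I/O-bound work in CPython. For CPU-bound functions, ProcessPoolExecutor from the same module offers the same map() interface.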

Granger Causality Test

# READ QUARTERLY DATA FROM CSV library(zoo) ts1 <- read.zoo('Documents/data/macros.csv', header = T, sep = ",", FUN = as.yearqtr) # CONVERT THE DATA TO STATIONARY TIME SERIES ts1$hpi_rate <- log(ts1$hpi...continue reading.

rPithon vs. rPython

Similar to rPython, the rPithon package (http://rpithon.r-forge.r-project.org) allows users to execute Python code from R and exchange data between Python and R. However, the underlying mechanisms between these two...continue reading.

Model Segmentation with Cubist

Cubist is a tree-based model with an OLS regression attached to each terminal node and is somewhat similar to the mob() function in the party package (https://statcompute.wordpress.com/2014/10/26/model-segmentation-with-recursive-partitioning). Below is a demonstrate...continue reading.

Vector Search vs. Binary Search

# REFERENCE: # user2014.stat.ucla.edu/files/tutorial_Matt.pdf pkgs <- c('data.table', 'rbenchmark') lapply(pkgs, require, character.only = T) load('2008.Rdata') dt <- data.table(data) benchmark(replications = 10, order = "elap…continue reading.

Row Search in Parallel

I’ve always wondered whether the efficiency of row search can be improved if the whole data.frame is split into chunks and the row search is then conducted within each...continue reading.
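The chunking idea in this excerpt can be sketched in Python as follows. This is only an illustration under stated assumptions: a list of dicts stands in for the data.frame, and the names rows, split_rows(), and search_chunk() are hypothetical, not taken from the post.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table: each dict stands in for one data.frame row.
rows = [{"id": i, "group": i % 5} for i in range(20)]


def split_rows(data, n_chunks):
    """Split a list of rows into at most n_chunks contiguous pieces."""
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def search_chunk(chunk, key="group", value=3):
    """Return the rows in one chunk whose key equals value."""
    return [row for row in chunk if row[key] == value]


chunks = split_rows(rows, 4)

# Search every chunk in its own worker, then stitch the partial results
# back together in the original row order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parts = list(pool.map(search_chunk, chunks))
matches = [row for part in parts for row in part]

print([row["id"] for row in matches])
```

Whether chunking actually pays off depends on the table size and on the overhead of splitting and recombining, which is the question the post sets out to examine.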