Text bashing in R for SQL
Fairly often, a coworker who is strong in Excel but weak at writing code will come to me for help in special details about customers...continue reading.
A few days ago, I collected 30 minutes of tweets from all around the world, using the twitteR and streamR packages. The nice thing about those tweets is...continue reading.
Bayesian networks (BNs) are a type of graphical model that encode the conditional probability between different learning variables in a directed acyclic graph. There are benefits to using BNs compared...continue reading.
As a data scientist, you occasionally receive a dataset and would like to know what the generative distribution for that dataset is. In this post, I aim to show...continue reading.
Consider the following example: there is a three-stage truck maintenance pipeline. Initially, when a truck comes to the maintenance service, it is added to the first stage and its status...continue reading.
The previous article introduced the sensitivity and elasticity of the seasonal matrix model of an imaginary annual plant. Both sensitivity and elasticity are partial derivatives. This means the values can only predict...continue reading.
library(betareg) library(sas7bdat) df1 <- read.sas7bdat('lgd.sas7bdat') df2 <- df1[df1$y < 1, ] fml <- as.formula('y ~ x2 + x3 + x4 + x5 + x6 | x3 + x4 | x1...continue reading.
library(party) df1 <- read.csv("credit_count.csv") df2 <- df1[df1$CARDHLDR == 1, ] mdl <- mob(DEFAULT ~ MAJORDRG + MINORDRG + INCOME + OWNRENT | AGE + SELFEMPL, data = df2, family =...continue reading.
pkgs <- c('sas7bdat', 'betareg', 'lmtest') lapply(pkgs, require, character.only = T) df1 <- read.sas7bdat("lgd.sas7bdat") df2 <- df1[which(df1$y < 1), ] xvar <- paste("x", 1:7, sep = '', collapse = " +...continue reading.
The previous article introduced the seasonal matrices and the population growth rate λ of an imaginary annual plant. In this article, let’s try the sensitivity analysis of these matrices and the...continue reading.
Similar to the row search, by-group aggregation is another perfect use case to demonstrate the power of split-and-conquer with parallelism. In the example below, it is shown that the homebrew...continue reading.
# REFERENCE: # user2014.stat.ucla.edu/files/tutorial_Matt.pdf pkgs <- c('data.table', 'rbenchmark') lapply(pkgs, require, character.only = T) load('2008.Rdata') dt <- data.table(data) benchmark(replications = 10, order = "elap…continue reading.
I’ve always wondered whether the efficiency of row search can be improved if the whole data.frame is split into chunks and the row search is then conducted within each...continue reading.
The previous article introduced the seasonal matrices and the population growth rate λ of an imaginary annual plant. This article focuses first on the meaning of the eigenvector, and then...continue reading.
First of all, I used a SQL statement with the sqldf package in R. It took ~51 seconds of user time to select 12 rows out of 7 million. Next, I used Apache...continue reading.
Let’s try building a matrix population model of annual organisms and then calculating the population growth rate λ using R. Consider a simple life cycle of imaginary annual plants;...continue reading.
This is something I did a while ago using the Berlin Affective Word List (BAWL). The BAWL contains ratings for 2902 German words (2107 nouns, 504 verbs, 291 adjectives). Ratings were...continue reading.
If history can tell us anything about the World Cup, it’s that the host nation has an advantage over all other teams. Evidence of this was presented last night as...continue reading.