Efficiency in Joining Two Data Frames
In R, there are multiple ways to merge two data frames, and their efficiency can differ dramatically. Therefore, it is worthwhile to test the performance...continue reading.
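Much of the gap between join methods comes down to the join algorithm. The sketch below is not the R code from the post. It is a minimal standard-library Python illustration, using hypothetical data and hypothetical helper names (`nested_loop_join`, `hash_join`), of why an indexed (hashed) join scales far better than comparing every pair of rows.

```python
import random
import time


def nested_loop_join(left, right):
    """Inner join on the first field by comparing every pair of rows: O(n * m)."""
    out = []
    for lk, lv in left:
        for rk, rv in right:
            if lk == rk:
                out.append((lk, lv, rv))
    return out


def hash_join(left, right):
    """Inner join on the first field: index `right` by key once, then probe it."""
    index = {}
    for rk, rv in right:
        index.setdefault(rk, []).append(rv)
    out = []
    for lk, lv in left:
        for rv in index.get(lk, []):
            out.append((lk, lv, rv))
    return out


if __name__ == "__main__":
    random.seed(2013)
    # Hypothetical (key, value) rows; keys drawn from 0..2999.
    left = [(random.randrange(3000), i) for i in range(2000)]
    right = [(random.randrange(3000), -i) for i in range(2000)]
    for join in (nested_loop_join, hash_join):
        t0 = time.perf_counter()
        rows = join(left, right)
        elapsed = time.perf_counter() - t0
        print(f"{join.__name__}: {len(rows)} rows in {elapsed:.4f}s")
```

Both functions return the same rows in the same order; only the cost differs, roughly O(n·m) for the nested loop versus about O(n + m) for the hash join when matches are sparse.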
> require('RWeka')
> require('pROC')
>
> # SEPARATE DATA INTO TRAINING AND TESTING SETS
> df1 <- read.csv('credit_count.csv')
> df2 <- df1[df1$CARDHLDR == 1, 2:12]
> set.seed(2013)
> rows <-...continue reading.
Sometimes a minor change to your R code can make a big difference in processing time. Here is an example showing that if you don’t care about the names attribute...continue reading.
In the example below, 552 rows are extracted from a data frame with 10 million rows using six different methods. Results show a significant disparity between the least and the...continue reading.
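One common reason extraction methods differ so much is the cost of the membership test. As a rough, hypothetical sketch in plain Python (not one of the six R methods in the post), filtering by a list of keys scans that list for every row, while a set uses hash lookups. The table size and keys below are made up; the 552 wanted keys only echo the post's figure.

```python
import time


def extract_with_list(rows, wanted):
    """Keep rows whose key is in `wanted`; each `in` check scans a list: O(k) per row."""
    wanted_list = list(wanted)
    return [r for r in rows if r[0] in wanted_list]


def extract_with_set(rows, wanted):
    """Keep rows whose key is in `wanted`; each `in` check is an average O(1) hash lookup."""
    wanted_set = set(wanted)
    return [r for r in rows if r[0] in wanted_set]


if __name__ == "__main__":
    # Hypothetical table of 100,000 (key, value) rows; extract 552 of them.
    rows = [(i, i * 0.5) for i in range(100_000)]
    wanted = range(0, 55_200, 100)
    for f in (extract_with_list, extract_with_set):
        t0 = time.perf_counter()
        out = f(rows, wanted)
        print(f"{f.__name__}: {len(out)} rows in {time.perf_counter() - t0:.3f}s")
```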
Similar to the NLMIXED procedure in SAS, optim() in R lets you estimate a model by specifying its log likelihood function explicitly. Below is a demo showing how to...continue reading.
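To show the idea of maximizing an explicitly written log likelihood, here is a small Python sketch. It is a stand-in, not optim(): it swaps in golden-section search, which handles a single parameter only, and applies it to the exponential-distribution log likelihood n·log(λ) − λ·Σx, whose closed-form maximizer n/Σx makes the numeric answer easy to check. The data and function names are hypothetical.

```python
import math


def exp_loglik(rate, data):
    """Log likelihood of i.i.d. exponential data: n*log(rate) - rate*sum(x)."""
    if rate <= 0:
        return -math.inf
    return len(data) * math.log(rate) - rate * sum(data)


def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    invphi = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618
    a, b = lo, hi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]; old c becomes the new d
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:  # maximum lies in [c, b]; old d becomes the new c
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2


if __name__ == "__main__":
    data = [0.5, 1.2, 0.3, 2.4, 0.9, 1.7]  # hypothetical observations
    rate_hat = golden_section_max(lambda r: exp_loglik(r, data), 1e-6, 50.0)
    print(rate_hat, len(data) / sum(data))  # numeric vs closed-form MLE
```

For models with several parameters, a multivariate optimizer is needed; optim() offers several methods for that case.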
data.table (http://datatable.r-forge.r-project.org/) inherits from data.frame and provides fast subsetting, fast grouping, and fast joins. Previous posts showed that the shortest CPU time to aggregate a...continue reading.
Motivated by my young friend, HongMing Song, I managed to find more handy ways to calculate aggregated statistics by group in R. They require loading additional packages: plyr, doBy, Hmisc,...continue reading.
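Aggregating by group boils down to split, apply, combine. As a package-free Python sketch with hypothetical data and function names (not a translation of plyr, doBy, or Hmisc), the two functions below compute the mean of a value per key, one with running sums in a dictionary and one by sorting and grouping.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def group_mean_dict(rows):
    """Mean of value by key in a single pass, using running sums and counts."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in rows:
        sums[key] += value
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}


def group_mean_sorted(rows):
    """Same result via sort-then-group (groupby only merges adjacent equal keys)."""
    out = {}
    for key, grp in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        values = [v for _, v in grp]
        out[key] = sum(values) / len(values)
    return out


if __name__ == "__main__":
    # Hypothetical (group, value) pairs.
    rows = [("a", 1.0), ("b", 4.0), ("a", 3.0), ("b", 6.0), ("c", 5.0)]
    print(group_mean_dict(rows))
    print(group_mean_sorted(rows))
```

The dictionary version avoids the O(n log n) sort, while the sorted version mirrors the split-then-apply style of the grouping packages.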
Below is an R snippet comparing data import efficiency among CSV, SQLite, and HDF5. Similar to the Python case posted yesterday, HDF5 shows the highest efficiency. continue reading.
After posting “Removing Records by Duplicate Values” yesterday, I had an interesting exchange with my friend Jeffrey Allard tonight about how to code this in R, a combination of...continue reading.
Removing records from a data table based on duplicate values in one or more columns is a common and important data cleaning technique. Below is an example showing how...continue reading.
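A minimal Python sketch of the idea, assuming records are dictionaries and that you keep either the first or the last record for each distinct combination of key columns. `drop_duplicates` here is a hypothetical helper that mimics the `subset`/`keep` semantics of pandas' `DataFrame.drop_duplicates`; it is not the method used in the post.

```python
def drop_duplicates(records, key_cols, keep="first"):
    """Keep one record per distinct combination of key_cols.

    keep='first' keeps the earliest occurrence, keep='last' the latest;
    the surviving records stay in their original relative order.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    seq = records if keep == "first" else reversed(records)
    seen, out = set(), []
    for rec in seq:
        k = tuple(rec[c] for c in key_cols)  # composite key from the chosen columns
        if k not in seen:
            seen.add(k)
            out.append(rec)
    return out if keep == "first" else out[::-1]


if __name__ == "__main__":
    # Hypothetical records; (id, dt) defines a duplicate.
    recs = [
        {"id": 1, "dt": "a", "x": 10},
        {"id": 1, "dt": "a", "x": 11},
        {"id": 2, "dt": "b", "x": 12},
        {"id": 1, "dt": "c", "x": 13},
    ]
    print(drop_duplicates(recs, ["id", "dt"]))
    print(drop_duplicates(recs, ["id", "dt"], keep="last"))
```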
In the practice of risk modeling, it is sometimes mandatory to maintain a monotonic relationship between the response and each predictor. Below is a demonstration showing how to develop a...continue reading.
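The post's own technique is cut off in this excerpt. One common way to enforce a monotonic relationship, shown here as an assumption rather than the author's method, is isotonic regression via the Pool Adjacent Violators algorithm, for example forcing bad rates across ordered bins of a predictor to be non-decreasing. The sketch assumes the response values are already sorted by the predictor; the function name and data are hypothetical.

```python
def isotonic_increasing(y, w=None):
    """Pool Adjacent Violators: weighted least-squares fit that is non-decreasing.

    y must already be ordered by the predictor; w holds optional positive weights.
    """
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    fitted = []
    for m, _, n in blocks:
        fitted.extend([m] * n)  # every point in a pooled block gets the block mean
    return fitted


if __name__ == "__main__":
    # Hypothetical bad rates for six ordered bins of a predictor, weighted by bin size.
    bad_rates = [0.02, 0.05, 0.04, 0.08, 0.07, 0.12]
    bin_counts = [500, 400, 450, 300, 350, 200]
    print(isotonic_increasing(bad_rates, bin_counts))
```

Adjacent bins that violate the ordering end up sharing one pooled value, which is equivalent to merging those bins.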
In [1]: import pandas as pd
In [2]: import statsmodels.api as sm
In [3]: data = pd.read_table('/home/liuwensui/Documents/data/csdata.txt')
In [4]: Y = data.LEV_LT3
In [5]: X = sm.add_constant(data[['COLLAT1', 'SIZE1', 'PROF2', 'LIQ',...continue reading.
This article was originally published at https://lcolladotor.github.io/. Thanks for visiting r-craft.org. Please visit the source website for post-related comments. continue reading.
I got a question today on how to add a video to a beamer pdf presentation. Well, I had never done it, but I got curious enough to google around...continue reading.
SQLite is a lightweight, zero-configuration database. Being fast, reliable, and simple, SQLite is a good choice to store and query large data, e.g. terabytes, and is well supported by...continue reading.
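Python's built-in `sqlite3` module illustrates the zero-configuration point: an in-memory database needs no server or setup. The table, columns, function name, and data below are hypothetical.

```python
import sqlite3


def summarize_by_customer(rows):
    """Load (cust_id, amount) rows into SQLite and return per-customer count and total."""
    con = sqlite3.connect(":memory:")  # in-memory database: no server, file, or configuration
    try:
        con.execute("CREATE TABLE txn (cust_id INTEGER, amount REAL)")
        con.executemany("INSERT INTO txn VALUES (?, ?)", rows)
        con.execute("CREATE INDEX idx_txn_cust ON txn (cust_id)")  # index on the grouping key
        return con.execute(
            "SELECT cust_id, COUNT(*), SUM(amount) FROM txn "
            "GROUP BY cust_id ORDER BY cust_id"
        ).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(summarize_by_customer([(1, 20.0), (2, 5.5), (1, 7.5), (3, 12.0), (2, 4.5)]))
```

Passing a file path instead of `":memory:"` to `sqlite3.connect` persists the database to disk, still without any server setup.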
In finance and investing, the term portfolio refers to the collection of assets one owns. Compared to holding a single asset at a time, a portfolio has a number...continue reading.
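One such advantage is diversification. A minimal sketch with hypothetical numbers: the portfolio's expected return is w'mu and its volatility is sqrt(w' Sigma w), so combining two assets that are not perfectly correlated can give a volatility below that of either asset alone. The function name and asset figures are assumptions for illustration.

```python
import math


def portfolio_stats(weights, means, cov):
    """Return (expected return, volatility) for weights w, mean returns mu, covariance Sigma."""
    n = len(weights)
    exp_ret = sum(weights[i] * means[i] for i in range(n))  # w' mu
    var = sum(weights[i] * weights[j] * cov[i][j]
              for i in range(n) for j in range(n))  # w' Sigma w
    return exp_ret, math.sqrt(var)


if __name__ == "__main__":
    # Hypothetical assets: 8% and 10% expected return, 20% volatility each, correlation 0.3.
    vol, rho = 0.20, 0.3
    cov = [[vol ** 2, rho * vol * vol],
           [rho * vol * vol, vol ** 2]]
    ret, risk = portfolio_stats([0.5, 0.5], [0.08, 0.10], cov)
    print(f"expected return {ret:.3f}, volatility {risk:.4f}")
```

With these numbers the equal-weight portfolio has an expected return of 9% and a volatility of about 16.1%, below the 20% of each asset on its own.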
As a follow-up to my Introducing R and Biostatistics to first year LCG students (2012 version) post, you can now find the presentation online on my site, either in presentation format, in a single...continue reading.
When Sandy was in town, at some point I started doing some of my research work, but I shouldn’t have. I made a silly mistake and erased files that...continue reading.