R-Craft

R News from another blog for R community

January 29, 2013

Efficiency in Joining Two Data Frames

In R, there are multiple ways to merge two data frames. However, their efficiency can differ greatly. Therefore, it is worthwhile to test the performance...continue reading.

statcompute

January 12, 2013

PART – A Rule-Learning Algorithm

> require('RWeka')
> require('pROC')
>
> # SEPARATE DATA INTO TRAINING AND TESTING SETS
> df1 <- read.csv('credit_count.csv')
> df2 <- df1[df1$CARDHLDR == 1, 2:12]
> set.seed(2013)
> rows <-...continue reading.

JottR

January 7, 2013

Speed Trick: unlist(…, use.names=FALSE) is Heaps Faster!

Sometimes a minor change to your R code can make a big difference in processing time. Here is an example showing that if you don’t care about the names attribute...continue reading.

statcompute

January 2, 2013

Efficiency of Extracting Rows from A Data Frame in R

In the example below, 552 rows are extracted from a data frame with 10 million rows using six different methods. Results show a significant disparity between the least and the...continue reading.

statcompute

December 31, 2012

Modeling in R with Log Likelihood Function

Similar to the NLMIXED procedure in SAS, optim() in R provides the functionality to estimate a model by specifying the log likelihood function explicitly. Below is a demo showing how to...continue reading.

statcompute

December 29, 2012

Surprising Performance of data.table in Data Aggregation

data.table (http://datatable.r-forge.r-project.org/) inherits from data.frame and provides fast subsetting, fast grouping, and fast joins. In previous posts, it was shown that the shortest CPU time to aggregate a...continue reading.

statcompute

December 25, 2012

More about Aggregation by Group in R

Motivated by my young friend, HongMing Song, I managed to find more handy ways to calculate aggregated statistics by group in R. They require loading additional packages, plyr, doBy, Hmisc,...continue reading.

statcompute

December 24, 2012

Aggregation by Group in R

Efficiency Comparison among 4 Methods above. Continue reading.

statcompute

December 24, 2012

Data Import Efficiency – A Case in R

Below is an R snippet comparing data import efficiency among CSV, SQLite, and HDF5. Similar to the case in Python posted yesterday, HDF5 shows the highest efficiency. Continue reading.

statcompute

December 21, 2012

Removing Records by Duplicate Values in R – An Efficiency Comparison

After posting “Removing Records by Duplicate Values” yesterday, I had an interesting communication thread with my friend Jeffrey Allard tonight regarding how to code this in R, a combination of...continue reading.

statcompute

December 20, 2012

Removing Records by Duplicate Values

Removing records from a data table based on duplicate values in one or more columns is a common and important data cleaning technique. Below is an example of how...continue reading.

statcompute

December 19, 2012

Generalized Boosted Regression with A Monotonic Marginal Effect for Each Predictor

In the practice of risk modeling, it is sometimes mandatory to maintain a monotonic relationship between the response and each predictor. Below is a demonstration showing how to develop a...continue reading.

statcompute

December 17, 2012

Fractional Logit Model with Python

In [1]: import pandas as pd
In [2]: import statsmodels.api as sm
In [3]: data = pd.read_table('/home/liuwensui/Documents/data/csdata.txt')
In [4]: Y = data.LEV_LT3
In [5]: X = sm.add_constant(data[['COLLAT1', 'SIZE1', 'PROF2', 'LIQ',...continue reading.

L. Collado-Torres

December 11, 2012

DEXSeq paper discussion

This article was originally published at https://lcolladotor.github.io/. Please visit the source website for post-related comments. Continue reading.

L. Collado-Torres

December 5, 2012

Adding youtube videos in pdfs, html reports and html presentations

I got a question today on how to add a video to a beamer pdf presentation. Well, I had never done it, but I got curious enough to google around...continue reading.

statcompute

December 3, 2012

Exchange Data between Python and R with SQLite

SQLite is a lightweight, zero-configuration database. Being fast, reliable, and simple, SQLite is a good choice to store / query large data, e.g. terabytes, and is well supported by...continue reading.

L. Collado-Torres

November 13, 2012

Introduction to R and Biostatistics (2012 version)

Blog post describing this talk. Continue reading.

quantsignals

November 12, 2012

Portfolio Trading

In finance and investing, the term portfolio refers to the collection of assets one owns. Compared to just holding a single asset at a time, a portfolio has a number...continue reading.

L. Collado-Torres

November 12, 2012

Introduction to R and Biostatistics (2012 version): presentation

To follow my Introducing R and Biostatistics to first year LCG students (2012 version) post, you can now find the presentation online from my site either in presentation format, in a single...continue reading.

L. Collado-Torres

November 7, 2012

me: Bad rm, don’t delete stuff I didn’t want to delete! (rm: well, I do what you tell me to do!)

When Sandy was in town, at some point I started doing some of my research work, but I shouldn’t have. I basically made a silly mistake and erased files that...continue reading.


