Data Import Efficiency – A Case in R
This article is originally published at https://statcompute.wordpress.com
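Below is an R snippet comparing data import efficiency among CSV, SQLite, and HDF5. Similar to the case in Python posted yesterday, HDF5 shows the lowest user CPU time, although in this run its system time is high enough that its elapsed time (1.946 s for 10 reads) exceeds that of both CSV (1.576 s) and SQLite (0.649 s).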
> library(RSQLite)
Loading required package: DBI
> library(rhdf5)
> df <- read.csv('credit_count.csv')
> do.call(cat, list(nrow(df), ncol(df), '\n'))
13444 14
>
> # WRITE DF INTO SQLITE
> if(file.exists('data.db')) file.remove('data.db')
[1] TRUE
> con <- dbConnect("SQLite", dbname = "data.db")
> dbWriteTable(con, "tbl", df)
[1] TRUE
>
> # WRITE DF INTO HDF5
> if(file.exists('data.h5')) file.remove('data.h5')
[1] TRUE
> h5createFile("data.h5")
[1] TRUE
> h5write(df, 'data.h5', 'tbl')
>
> # CALCULATE CPU TIMES
> system.time(for(i in 1:10) read.csv('credit_count.csv'))
   user  system elapsed
  1.148   0.056   1.576
> system.time(for(i in 1:10) dbReadTable(con, 'tbl'))
   user  system elapsed
  0.492   0.024   0.649
> system.time(for(i in 1:10) h5read('data.h5','tbl'))
   user  system elapsed
  0.164   1.184   1.946
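Because the three `system.time()` results above are easier to compare side by side, here is a minimal sketch that collects them into one table. It rests on several assumptions: `time_reads` is a hypothetical helper that is not part of the original post, the data frame is a synthetic stand-in with the same shape as `credit_count.csv` (13444 rows, 14 columns), and the SQLite reader is only added when RSQLite happens to be installed. It connects with the driver object `SQLite()`, which current DBI versions expect in place of the string `"SQLite"` used in the session above.

```r
# Hypothetical helper (not from the original post): time `n` repeated calls
# of each reader function and return user, system, and elapsed seconds.
time_reads <- function(readers, n = 10) {
  rows <- lapply(readers, function(f) {
    t <- unclass(system.time(for (i in seq_len(n)) f()))
    t[c("user.self", "sys.self", "elapsed")]
  })
  out <- as.data.frame(do.call(rbind, rows))
  names(out) <- c("user", "system", "elapsed")
  out
}

# Synthetic stand-in for credit_count.csv (assumption: same shape only).
set.seed(1)
df <- as.data.frame(matrix(rnorm(13444 * 14), ncol = 14))
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)

readers <- list(csv = function() read.csv(csv_path))

# Add the SQLite reader only when RSQLite is available.
con <- NULL
if (requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), dbname = tempfile(fileext = ".db"))
  DBI::dbWriteTable(con, "tbl", df)
  readers$sqlite <- function() DBI::dbReadTable(con, "tbl")
}

# One row per format, one column per timing component.
timings <- time_reads(readers)
print(timings)

if (!is.null(con)) DBI::dbDisconnect(con)
```

An HDF5 reader could presumably be added to `readers` in the same way, wrapping the `h5read('data.h5', 'tbl')` call from the session above. Keeping the user, system, and elapsed columns separate matters here, since the formats rank differently depending on which column is read.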