Catching errors in R and trying something else
This article is originally published at https://rcrastinate.blogspot.com/
I recently came across some functionality in R that many of you probably already know. I still want to share it here, because it may come in handy for those who do not.
Suppose you want to read a large number of very large text tables into R. The data.table package has the great function fread(), which reads such tables really fast. However, it is still under development and sometimes fails (e.g., if an entry contains unbalanced quotes).
I guess this will be fixed in the future. In the meantime, I wrote a little function which catches the error and tries something else.
The following function reads in a file with fread() (I stored a test file on some private webspace in case you want to try this out). fread() fails on this file, so the function falls back to good old read.table() with the appropriate parameters set. read.table() is much slower, but it also copes with unbalanced quotes.
The function try() does the trick...
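read.file <- function (file.name) {
  library(data.table)
  # First attempt: the fast reader. try() returns a "try-error" object on failure.
  file <- try(fread(file.name))
  # inherits() stays valid when the result has several classes, as a data.table does.
  if (inherits(file, "try-error")) {
    cat("Caught an error during fread, trying read.table.\n")
    # Fallback: slower, but quote = "" treats quote characters as ordinary text.
    file <- as.data.table(read.table(file.name, sep = " ", quote = ""))
  }
  file
}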
# Let's try this (console messages translated from German)
> read.file("http://www.wolferonline.de/test/test.txt")
trying URL 'http://www.wolferonline.de/test/test.txt'
Content type 'text/plain' length 72 bytes
opened URL
==================================================
downloaded 72 bytes
Error in fread(file.name) :
  Unbalanced " observed on this line: "Unbalanced.quotes some.entry some.other.entry
Caught an error during fread, trying read.table.
                  V1         V2               V3
1          No.quotes     entry1           entry2
2 "Unbalanced.quotes some.entry some.other.entry
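The error message in the output above is printed by try() itself. As a minimal sketch in base R (the "boom" error and the variable names are made up for illustration), this is roughly what try() hands back, and how the message can be kept for later instead of being printed right away:

```r
# A call that fails: with silent = TRUE, try() does not print the error.
failed <- try(stop("boom"), silent = TRUE)

# The result is an object of class "try-error".
inherits(failed, "try-error")   # TRUE

# The original condition is stored as an attribute, so the message is not lost.
msg <- conditionMessage(attr(failed, "condition"))
msg                             # "boom"

# A call that succeeds: try() simply returns the value.
ok <- try(sqrt(16), silent = TRUE)
ok                              # 4
```

Whether to silence the message is a matter of taste. In a long batch job it can be handy to collect the messages and report them at the end.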
The cool thing about this: whether you are reading in a file or doing anything else that has a fast and a slow way of doing it, you can try the fast way first. If it fails, you can still fall back on the other, more stable (but slower) way. Also, you can use try() as often as you like. So if the slower way fails too, you can return something that your script can still work with further on.
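Chaining several attempts could look like the sketch below. It is only an illustration: read.with.fallbacks() is a hypothetical helper, not part of any package, and the first "reader" is a stand-in that always fails, so the example runs without data.table installed.

```r
# Hypothetical helper: try each reader in turn and return the first result
# that is not an error. If every reader fails, return NULL so that the
# calling script has something well-defined to check for.
read.with.fallbacks <- function(file.name, readers) {
  for (reader in readers) {
    result <- try(reader(file.name), silent = TRUE)
    if (!inherits(result, "try-error")) {
      return(result)
    }
  }
  NULL
}

# A small test file with an unbalanced quote, written to a temporary location.
tmp <- tempfile(fileext = ".txt")
writeLines(c("No.quotes entry1 entry2",
             "\"Unbalanced.quotes some.entry some.other.entry"), tmp)

readers <- list(
  function(f) stop("fast reader failed"),            # stand-in for a fast reader that fails
  function(f) read.table(f, sep = " ", quote = "")   # slower, but tolerates the quote
)

tab <- read.with.fallbacks(tmp, readers)             # comes from read.table()
nothing <- read.with.fallbacks(tmp, list(function(f) stop("always fails")))
is.null(nothing)                                     # TRUE: every reader failed
```

Returning NULL is just one possible convention. An empty table or an NA would work as well, as long as the rest of the script checks for it.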
Good luck!
EDIT [17/07/2015]: Please note that a test of the form if (class(file) == "try-error") produces a warning when the object in the successful case has a class vector longer than 1. This is the case for data.tables: class(data.table()) returns "data.table" "data.frame". In R 4.2.0 and later, a condition of length greater than 1 is an error rather than a warning. To avoid the problem, write if (class(file)[1] == "try-error") or, more robustly, if (inherits(file, "try-error")).
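The class-length issue can be reproduced in base R. In this small sketch, Sys.time() serves as a stand-in for a data.table, simply because its result also has a two-element class vector. That choice and the variable names are assumptions made for illustration.

```r
ok <- try(Sys.time(), silent = TRUE)                  # succeeds
failed <- try(stop("reading failed"), silent = TRUE)  # fails

class(ok)                          # "POSIXct" "POSIXt"
length(class(ok) == "try-error")   # 2: not a valid condition for if()

# Both of these give a single TRUE or FALSE, whatever the class length is.
class(ok)[1] == "try-error"        # FALSE
inherits(ok, "try-error")          # FALSE
inherits(failed, "try-error")      # TRUE
```

Of the two, inherits() does not depend on where "try-error" sits in the class vector, which makes it the safer default.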