The {chronicler} package, an implementation of the logger monad in R
This article is originally published at https://www.brodrigues.co/
Back in February I discussed a package I was working on
which allowed users to add logging to function calls. I named the package {loudly}
but decided
to rename it to {chronicler}
.
I have been working on it for the past few weeks, and I think that a CRAN release could happen soon.
Introduction
So what does {chronicler}
do? {chronicler}
allows you do decorate functions, so that they
provide enhanced output:
library(chronicler)
## Loading required package: rlang
r_sqrt <- record(sqrt)
a <- r_sqrt(1:5)
Object a
is now an object of class chronicle
. Let’s print a
to the terminal:
a
## ✔ Value computed successfully:
## ---------------
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068
##
## ---------------
## This is an object of type `chronicle`.
## Retrieve the value of this object with pick(.c, "value").
## To read the log of this object, call read_log().
as the output says, a
is an object of type chronicle
. Let’s use read_log()
as suggested:
read_log(a)
## [1] "Complete log:"
## [2] "✔ sqrt(1:5) ran successfully at 2022-04-01 21:14:28"
## [3] "Total running time: 0.000240325927734375 secs"
The log tells us how a
was built, and it’s especially useful for objects that are the result
of many function calls:
r_sqrt <- record(sqrt)
r_exp <- record(exp)
r_mean <- record(mean)
b <- 1:10 |>
r_sqrt() |>
bind_record(r_exp) |>
bind_record(r_mean)
The log gives all the details:
read_log(b)
## [1] "Complete log:"
## [2] "✔ sqrt(1:10) ran successfully at 2022-04-01 21:14:28"
## [3] "✔ exp(.c$value) ran successfully at 2022-04-01 21:14:28"
## [4] "✔ mean(.c$value) ran successfully at 2022-04-01 21:14:28"
## [5] "Total running time: 0.00820255279541016 secs"
The end result, or what is called value
can be obtained using pick()
(you could also use a$value
):
pick(a, "value")
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068
pick(b, "value")
## [1] 11.55345
Composing decorated functions
bind_record()
is used to pass the output from one decorated function to the next:
suppressPackageStartupMessages(
library(dplyr)
)
r_group_by <- record(group_by)
r_select <- record(select)
r_summarise <- record(summarise)
r_filter <- record(filter)
output <- starwars %>%
r_select(height, mass, species, sex) %>%
bind_record(r_group_by, species, sex) %>%
bind_record(r_filter, sex != "male") %>%
bind_record(r_summarise,
mass = mean(mass, na.rm = TRUE)
)
read_log(output)
## [1] "Complete log:"
## [2] "✔ select(.,height,mass,species,sex) ran successfully at 2022-04-01 21:14:28"
## [3] "✔ group_by(.c$value,species,sex) ran successfully at 2022-04-01 21:14:28"
## [4] "✔ filter(.c$value,sex != \"male\") ran successfully at 2022-04-01 21:14:28"
## [5] "✔ summarise(.c$value,mean(mass, na.rm = TRUE)) ran successfully at 2022-04-01 21:14:28"
## [6] "Total running time: 0.11384654045105 secs"
The value can then be saved in a new variable:
(my_df <- pick(output, "value"))
## # A tibble: 9 × 3
## # Groups: species [9]
## species sex mass
## <chr> <chr> <dbl>
## 1 Clawdite female 55
## 2 Droid none 69.8
## 3 Human female 56.3
## 4 Hutt hermaphroditic 1358
## 5 Kaminoan female NaN
## 6 Mirialan female 53.1
## 7 Tholothian female 50
## 8 Togruta female 57
## 9 Twi'lek female 55
You can save the output
object with saveRDS()
and share it; your colleague can then read the log
to learn how the object was created.
This package also ships with a dedicated pipe, %>=%
which you can use instead of bind_record()
:
output_pipe <- starwars %>%
r_select(height, mass, species, sex) %>=%
r_group_by(species, sex) %>=%
r_filter(sex != "male") %>=%
r_summarise(mass = mean(mass, na.rm = TRUE))
pick(output_pipe, "value")
## # A tibble: 9 × 3
## # Groups: species [9]
## species sex mass
## <chr> <chr> <dbl>
## 1 Clawdite female 55
## 2 Droid none 69.8
## 3 Human female 56.3
## 4 Hutt hermaphroditic 1358
## 5 Kaminoan female NaN
## 6 Mirialan female 53.1
## 7 Tholothian female 50
## 8 Togruta female 57
## 9 Twi'lek female 55
Condition handling
By default, errors and warnings get caught and composed in the log:
errord_output <- starwars %>%
r_select(height, mass, species, sex) %>=%
r_group_by(species, sx) %>=% # typo, "sx" instead of "sex"
r_filter(sex != "male") %>=%
r_summarise(mass = mean(mass, na.rm = TRUE))
errord_output
## ✖ Value computed unsuccessfully:
## ---------------
## [1] NA
##
## ---------------
## This is an object of type `chronicle`.
## Retrieve the value of this object with pick(.c, "value").
## To read the log of this object, call read_log().
Reading the log tells you which function failed, and with which error message:
read_log(errord_output)
## [1] "Complete log:"
## [2] "✔ select(.,height,mass,species,sex) ran successfully at 2022-04-01 21:14:28"
## [3] "✖ group_by(.c$value,species,sx) ran unsuccessfully with following exception: Must group by variables found in `.data`.\n✖ Column `sx` is not found. at 2022-04-01 21:14:28"
## [4] "✖ filter(.c$value,sex != \"male\") ran unsuccessfully with following exception: no applicable method for 'filter' applied to an object of class \"logical\" at 2022-04-01 21:14:28"
## [5] "✖ summarise(.c$value,mean(mass, na.rm = TRUE)) ran unsuccessfully with following exception: no applicable method for 'summarise' applied to an object of class \"logical\" at 2022-04-01 21:14:28"
## [6] "Total running time: 0.163575887680054 secs"
It is also possible to only capture errors, or catpure errors, warnings and messages using
the strict
parameter of record()
# Only errors:
r_sqrt <- record(sqrt, strict = 1)
# Nothing will be captured here, since sqrt(-10) returns a NA and a warning
r_sqrt(-10) |>
read_log()
## Warning in .f(...): NaNs produced
## [1] "Complete log:"
## [2] "✖ sqrt(-10) ran unsuccessfully with following exception: NA at 2022-04-01 21:14:28"
## [3] "Total running time: 0.000255584716796875 secs"
# Errors and warnings:
r_sqrt <- record(sqrt, strict = 2)
# The warning gets captured
r_sqrt(-10) |>
read_log()
## [1] "Complete log:"
## [2] "✖ sqrt(-10) ran unsuccessfully with following exception: NaNs produced at 2022-04-01 21:14:28"
## [3] "Total running time: 0.00019383430480957 secs"
# Errors, warnings and messages
my_f <- function(x){
message("this is a message")
10
}
record(my_f, strict = 3)(10) |>
read_log()
## [1] "Complete log:"
## [2] "✖ my_f(10) ran unsuccessfully with following exception: this is a message\n at 2022-04-01 21:14:28"
## [3] "Total running time: 0.000336408615112305 secs"
Advanced logging
You can provide a function to record()
, which will be evaluated on the output. This makes it possible
to, for example, monitor the size of a data frame throughout the pipeline. In the example below I
provide dim()
, which will return the dimensions of the data frame, as an argument to record()
:
r_group_by <- record(group_by)
r_select <- record(select, .g = dim)
r_summarise <- record(summarise, .g = dim)
r_filter <- record(filter, .g = dim)
output_pipe <- starwars %>%
r_select(height, mass, species, sex) %>=%
r_group_by(species, sex) %>=%
r_filter(sex != "male") %>=%
r_summarise(mass = mean(mass, na.rm = TRUE))
The $log_df
element of a chronicle
object contains detailed information. In most cases you
don’t need to worry about it:
pick(output_pipe, "log_df")
## # A tibble: 4 × 8
## outcome `function` arguments message start_time end_time
## <chr> <chr> <chr> <chr> <dttm> <dttm>
## 1 ✔ Success select ".,heigh… NA 2022-04-01 21:14:28 2022-04-01 21:14:28
## 2 ✔ Success group_by ".c$valu… NA 2022-04-01 21:14:28 2022-04-01 21:14:28
## 3 ✔ Success filter ".c$valu… NA 2022-04-01 21:14:28 2022-04-01 21:14:29
## 4 ✔ Success summarise ".c$valu… NA 2022-04-01 21:14:28 2022-04-01 21:14:29
## # … with 2 more variables: run_time <drtn>, g <list>
but if you want to look at the output of .g
, then you have to grab it and see:
# I coerce it to a data.frame just for the output here on my blog, to make the column `g` readable
as.data.frame(output_pipe$log_df[, c("function", "g")])
## function g
## 1 select 87, 4
## 2 group_by NA
## 3 filter 23, 4
## 4 summarise 9, 3
We can see that the dimension of the dataframe was (87, 4) after the call to select()
, (23, 4)
after the call to filter()
and finally (9, 3) after the call to summarise()
.
Monads
This package implements a logger monad. I might talk about monads in the future, but probably in a video; if you don’t know what monads are, don’t worry, no one really knows. Legend has it that to truly understand what monads are you have to have a PhD in computer science and have been born in the former Soviet Union. But to make things simple, you can think of a monad as a way to:
- embelish functions to provide additional output without having to touch the function’s core behaviour
- a way to compose these functions and work with the embelished outputs (also called monadic values)
- monadic values are basically containers that contain the actual value of the function evaluated on its inputs and something else (here, a log)
Monads are quite useful in some programming languanges, like Haskell. Not so much in R, but I think that the logger monad I propose here can be quite useful. So let me know if you find it useful or if you have suggestions!
You can install {chronicler}
from its github repo.
Hope you enjoyed! If you found this blog post useful, you might want to follow me on twitter for blog post updates and buy me an espresso or paypal.me, or buy my ebook on Leanpub. You can also watch my videos on youtube. So much content for you to consoom!
Thanks for visiting r-craft.org
This article is originally published at https://www.brodrigues.co/
Please visit source website for post related comments.