tibble 1.1
This article is originally published at https://www.rstudio.com/blog/
We’re proud to announce version 1.1 of the tibble
package. Tibbles are a modern reimagining of the data frame, keeping what time has shown to be effective, and throwing out what is not. Grab the latest version with:
install.packages("tibble")
There are three major new features:
A more consistent naming scheme
Changes to how columns are extracted
Tweaks to the output
There are many other small improvements and bug fixes: please see the release notes for a complete list.
A better naming scheme
It’s caused some confusion that you use data_frame()
and as_data_frame()
to create and coerce tibbles. It’s also more important to make the distinction between tibbles and data frames more clear as we evolve a little further away from the semantics of data frames.
Now, we’re consistently using “tibble” as the key word in creation, coercion, and testing functions:
tibble(x = 1:5, y = letters[1:5])
#> # A tibble: 5 x 2
#> x y
#> <int> <chr>
#> 1 1 a
#> 2 2 b
#> 3 3 c
#> 4 4 d
#> 5 5 e
as_tibble(data.frame(x = runif(5)))
#> # A tibble: 5 x 1
#> x
#> <dbl>
#> 1 0.4603887
#> 2 0.4824339
#> 3 0.4546795
#> 4 0.5042028
#> 5 0.4558387
is_tibble(data.frame())
#> [1] FALSE
Previously tibble()
was an alias for frame_data()
. If you were using tibble()
to create tibbles by rows, you’ll need to switch to frame_data()
. This is a breaking change, but we believe that the new naming scheme will be less confusing in the long run.
Extracting columns
The previous version of tibble was a little too strict when you attempted to retrieve a column that did not exist: we had forgotten that many people check for the presence of column with is.null(df$x)
. This is bad idea because of partial matching, but it is common:
df1 <- data.frame(xyz = 1)
df1$x
#> [1] 1
Now, instead of throwing an error, tibble will return NULL
. If you use $
, common in interactive scripts, tibble will generate a warning:
df2 <- tibble(xyz = 1)
df2$x
#> Warning: Unknown column 'x'
#> NULL
df2[["x"]]
#> NULL
We also provide a convenient helper for detecting the presence/absence of a column:
has_name(df1, "x")
#> [1] FALSE
has_name(df2, "x")
#> [1] FALSE
Output tweaks
We’ve tweaked the output to have a shorter header, more information in the footer. We’re using #
consistently to denote metadata, and we print missing character values as <NA>
(instead of NA
).
The example below shows the new rendering of the flights
table.
nycflights13::flights
#> # A tibble: 336,776 x 19
#> year month day dep_time sched_dep_time dep_delay arr_time
#> <int> <int> <int> <int> <int> <dbl> <int>
#> 1 2013 1 1 517 515 2 830
#> 2 2013 1 1 533 529 4 850
#> 3 2013 1 1 542 540 2 923
#> 4 2013 1 1 544 545 -1 1004
#> 5 2013 1 1 554 600 -6 812
#> 6 2013 1 1 554 558 -4 740
#> 7 2013 1 1 555 600 -5 913
#> 8 2013 1 1 557 600 -3 709
#> 9 2013 1 1 557 600 -3 838
#> 10 2013 1 1 558 600 -2 753
#> # ... with 336,766 more rows, and 12 more variables: sched_arr_time <int>,
#> # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#> # minute <dbl>, time_hour <time>
Thanks to Lionel Henry for contributing an option for determining the number of printed extra columns: getOption("tibble.max_extra_cols")
. This is particularly important for the ultra-wide tables often released by statistical offices and other institutions.
Expect the printed output to continue to evolve. In the next version, we hope to do better with very wide columns (e.g. from long strings), and to make better use of now unused horizontal space (e.g. from long column names).
Thanks for visiting r-craft.org
This article is originally published at https://www.rstudio.com/blog/
Please visit source website for post related comments.