pool package on CRAN
The pool package makes it easier for Shiny developers to connect to databases. Up until now, there wasn’t a clearly good way to do this. As a Shiny app author, if you connect to a database globally (outside of the server function), your connection won’t be robust, because all sessions will share that connection (which could leave most users hanging when one of them is using it, or even all of them if the connection breaks). But if you try to connect each time you need to make a query (e.g. for every reactive you have), your app becomes a lot slower, as it can take on the order of seconds to establish a new connection. The pool package solves this problem by taking care of when to connect and disconnect, allowing you to write performant code that automatically reconnects to the database only when needed.
So, if you are a Shiny app author who needs to connect to and interact with databases inside your apps, keep reading, because this package was created to make your life easier.
What the pool package does
The pool package adds a new level of abstraction when connecting to a database: instead of directly fetching a connection from the database, you will create an object (called a “pool”) with a reference to that database. The pool holds a number of connections to the database. Some of these may be currently in-use and some of these may be idle, waiting for a new query or statement to request them. Each time you make a query, you are querying the pool, rather than the database. Under the hood, the pool will either give you an idle connection that it previously fetched from the database or, if it has no free connections, fetch one and give it to you. You never have to create or close connections directly: the pool knows when it should grow, shrink or keep steady. You only need to close the pool when you’re done.
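For instance, here is a minimal sketch of creating and closing a pool by hand, using a local SQLite database as a stand-in for a remote one; the minSize and maxSize arguments bound how far the pool can shrink and grow (see ?dbPool for the full set of tuning parameters):

library(DBI)
library(pool)

# A pool backed by an in-memory SQLite database (stand-in for a remote database),
# holding at least one and at most five connections
pool <- dbPool(
  RSQLite::SQLite(), dbname = ":memory:",
  minSize = 1,
  maxSize = 5
)

# Each query transparently checks a connection out of the pool and returns it
dbGetQuery(pool, "SELECT 1 AS ok")

# Closing the pool closes all of its connections
poolClose(pool)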
Since pool integrates with both DBI and dplyr, there are very few things that will be new to you, if you’re already using either of those packages. Essentially, you shouldn’t feel the difference, with the exception of creating and closing a “Pool” object (as opposed to connecting and disconnecting a “DBIConnection” object). See this copy-pasteable app that uses pool and dplyr to query a MariaDB database (hosted on AWS) inside a Shiny app.
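Since that app needs credentials for the remote database, here is a minimal, self-contained sketch of the same general pattern, substituting an in-memory SQLite database and the built-in quakes dataset for the remote MariaDB one (the UI and the specific query are made up for illustration):

library(shiny)
library(DBI)
library(dplyr)
library(pool)

# Create the pool once, outside the server function, so all sessions share it
pool <- dbPool(RSQLite::SQLite(), dbname = ":memory:")
dbWriteTable(pool, "quakes", quakes)

# Close the pool (and all of its connections) when the app shuts down
onStop(function() {
  poolClose(pool)
})

ui <- fluidPage(
  numericInput("mag", "Minimum magnitude", value = 6, min = 4, max = 6.5, step = 0.1),
  tableOutput("results")
)

server <- function(input, output, session) {
  output$results <- renderTable({
    # Each reactive queries the pool; the pool hands out and reclaims connections
    tbl(pool, "quakes") %>%
      filter(mag >= !!input$mag) %>%
      select(-stations) %>%
      collect()
  })
}

shinyApp(ui, server)

The important bits are that the pool is created once per R process (not per session or per query) and that it is explicitly closed with poolClose() when the app stops.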
Very briefly, here’s how you’d connect to a database, write a table into it using DBI, query it using dplyr, and finally disconnect (you must have DBI, dplyr and pool installed and loaded to run this code):
conn <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbWriteTable(conn, "quakes", quakes)
tbl(conn, "quakes") %>% select(-stations) %>% filter(mag >= 6)
## # Source:   lazy query [?? x 4]
## # Database: sqlite 3.19.3 [:memory:]
##      lat   long depth   mag
##    <dbl>  <dbl> <int> <dbl>
## 1 -20.70 169.92   139   6.1
## 2 -13.64 165.96    50   6.0
## 3 -15.56 167.62   127   6.4
## 4 -12.23 167.02   242   6.0
## 5 -21.59 170.56   165   6.0
dbDisconnect(conn)
And here’s how you’d do the same using pool:
pool <- dbPool(RSQLite::SQLite(), dbname = ":memory:")
dbWriteTable(pool, "quakes", quakes)
tbl(pool, "quakes") %>% select(-stations) %>% filter(mag >= 6)
## # Source:   lazy query [?? x 4]
## # Database: sqlite 3.19.3 [:memory:]
##      lat   long depth   mag
##    <dbl>  <dbl> <int> <dbl>
## 1 -20.70 169.92   139   6.1
## 2 -13.64 165.96    50   6.0
## 3 -15.56 167.62   127   6.4
## 4 -12.23 167.02   242   6.0
## 5 -21.59 170.56   165   6.0
poolClose(pool)
What problem pool was created to solve
As mentioned before, the goal of the pool package is to abstract away the logic of connection management and the performance cost of fetching a new connection from a remote database. These concerns are especially prominent in interactive contexts, like Shiny apps. (So, while this package is of most practical value to Shiny developers, there is no harm if it is used in other contexts.)
The rest of this post elaborates some more on the specific problems of connection management inside of Shiny, and how pool addresses them.
The connection management spectrum
When you’re connecting to a database, it’s important to manage your connections: when to open them (taking into account that this is a potentially long process for remote databases), how to keep track of them, and when to close them. This is always true, but it becomes especially relevant for Shiny apps, where not following best practices can lead to many slowdowns (from inadvertently opening too many connections) and/or many leaked connections (i.e. forgetting to close connections once you no longer need them). Over time, leaked connections could accumulate and substantially slow down your app, as well as overwhelm the database itself.
Oversimplifying a bit, we can think of connection management in Shiny as a spectrum ranging from the extreme of just having one connection per app (potentially serving several sessions of the app) to the extreme of opening (and closing) one connection for each query you make. Neither of these approaches is great: the former is fast, but not robust, and the reverse is true for the latter.
In particular, opening only one connection per app makes it fast (because, in the whole app, you only fetch one connection) and your code is kept as simple as possible. However:
- it cannot handle simultaneous requests (e.g. two sessions open, both querying the database at the same time);
- if the connection breaks at some point (maybe the database server crashed), you won’t get a new connection (you have to exit the app and re-run it);
- finally, if you are not quite at this extreme, and you use more than one connection per app (but fewer than one connection per query), it can be difficult to keep track of all your connections, since you’ll be opening and closing them in potentially very different places.
While the other extreme of opening (and closing) one connection for each query you make resolves all of these points, it is terribly slow (each time we need to access the database, we first have to fetch a connection), and you need a lot more (boilerplate) code to connect and disconnect the connection within each reactive/function.
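As a rough sketch of what these two extremes might look like in practice (the temporary SQLite file, the quakes table and the query below are placeholders for illustration, not the README’s code):

library(shiny)
library(DBI)

# Placeholder database: a temporary SQLite file holding the built-in quakes data
db_file <- tempfile(fileext = ".sqlite")
setup <- dbConnect(RSQLite::SQLite(), dbname = db_file)
dbWriteTable(setup, "quakes", quakes)
dbDisconnect(setup)

# Extreme 1: a single connection for the whole app, shared by every session
conn <- dbConnect(RSQLite::SQLite(), dbname = db_file)
server_one_connection <- function(input, output, session) {
  output$tbl <- renderTable({
    dbGetQuery(conn, "SELECT * FROM quakes WHERE mag >= 6")
  })
}

# Extreme 2: a fresh connection for every query, with connect/disconnect
# boilerplate repeated inside every reactive
server_connection_per_query <- function(input, output, session) {
  output$tbl <- renderTable({
    conn <- dbConnect(RSQLite::SQLite(), dbname = db_file)
    result <- dbGetQuery(conn, "SELECT * FROM quakes WHERE mag >= 6")
    dbDisconnect(conn)
    result
  })
}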
If you’d like to see actual code that illustrates these two approaches, check this section of the pool README.
The best of both worlds
The pool package was created so that you don’t have to worry about this at all. Since pool abstracts away the logic of connection management, for the vast majority of cases you never have to deal with connections directly. Since the pool “knows” when it should have more connections and how to manage them, you get all the advantages of the second approach (one connection per query) without the disadvantages. You are still using one connection per query, but that connection is always fetched from and returned to the pool, rather than the database directly. This is a whole lot faster and more efficient. Finally, the code is kept just as simple as in the first approach (only one connection for the entire app), since you don’t have to continuously call dbConnect and dbDisconnect.
Feedback
This package has quietly been around for a year and it’s now finally on CRAN, following many of the changes in the database world (both in DBI and dplyr). All pool-related feedback is welcome. Issues (bugs and feature requests) can be posted to the GitHub tracker. Requests for help with code or other questions can be posted to community.rstudio.com/c/shiny, which I check regularly (they can, of course, also be posted to Stack Overflow, but I’m very likely to miss them there).