Automated Data Collection with R and mlbgameday
This article is originally published at https://www.datascienceriot.com/
Opening day is on the way. Time to set up a persistent database to collect every pitch thrown in this year’s baseball season.
The mlbgameday package is designed to facilitate extract, transform, and load for MLBAM “Gameday” data. The package is optimized for parallel processing of data that may be larger than memory. Learn more about the project here.
Install from CRAN
install.packages("mlbgameday")
Creating a Database
Extract Transform Load of MLB Advanced Media Data
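A minimal sketch of the initial load, assuming a local SQLite file is an acceptable database. The `get_payload()` call with `start`, `end`, and `db_con` follows the package documentation; the helper name `build_gameday_db`, the file name `gameday.sqlite3`, and the dates in the example call are placeholders chosen for illustration.

```r
# Hypothetical helper: create (or open) a SQLite database and load a
# date range of Gameday data into it. Requires the DBI, RSQLite and
# mlbgameday packages to be installed.
build_gameday_db <- function(start, end, db_path = "gameday.sqlite3") {
  con <- DBI::dbConnect(RSQLite::SQLite(), dbname = db_path)
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  # With db_con supplied, the package documentation describes the data
  # being written to the database instead of being held in memory.
  mlbgameday::get_payload(start = start, end = end, db_con = con)

  invisible(db_path)
}

# Example call (placeholder dates). It downloads data, so it is left
# commented out here:
# build_gameday_db(start = "2018-03-29", end = "2018-03-31")
```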
Once you have a database in place, you can get started quickly. The mlbgameday package will also work if your current database was gathered using the pitchRx package.
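For a daily update, the only moving part is which date to request. The sketch below picks the previous calendar day in a chosen time zone, so a machine running on UTC does not ask for the wrong day. The helper names, the default time zone, and the database file name are assumptions for illustration, and `get_payload()` is assumed to add the new rows to the existing tables when given `db_con`.

```r
# Hypothetical helper: the date of "yesterday's" games, evaluated in a
# given time zone. The default (US Pacific) is an assumption, chosen
# because West Coast games tend to finish last.
game_date_to_pull <- function(now = Sys.time(), tz = "America/Los_Angeles") {
  format(as.Date(now, tz = tz) - 1)
}

# Hypothetical daily job: pull one day of data into an existing database.
# Requires the DBI, RSQLite and mlbgameday packages to be installed.
update_gameday_db <- function(db_path = "gameday.sqlite3", now = Sys.time()) {
  day <- game_date_to_pull(now)
  con <- DBI::dbConnect(RSQLite::SQLite(), dbname = db_path)
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  mlbgameday::get_payload(start = day, end = day, db_con = con)
  invisible(day)
}

# 06:00 UTC on March 30 is still the evening of March 29 in Los Angeles,
# so the day to pull is March 28.
game_date_to_pull(as.POSIXct("2018-03-30 06:00:00", tz = "UTC"))
```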
Task Scheduling
I prefer to pull each day’s data early the following morning. Whatever time you choose, consider time zones and allow enough extra time to cover rain delays in late games, so as not to miss any information. Various task scheduling tools are available, depending on your operating system.
Linux or OS X: Cron is pretty much the universal standard. Cron is command-line driven, but graphical front ends exist for both operating systems.
Windows: There are several options, but the built-in Task Scheduler is probably the best.
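As an illustration of the cron option, the sketch below builds a crontab entry that runs an R script once a day. The helper name `cron_line`, the script path, and the 7:00 a.m. run time are placeholders, and the sketch assumes `Rscript` is on the PATH that cron uses. The resulting line would be added with `crontab -e`.

```r
# Hypothetical helper: build a crontab line that runs an R script daily.
# Crontab fields are: minute hour day-of-month month day-of-week command.
cron_line <- function(script, hour = 7, minute = 0) {
  stopifnot(hour >= 0, hour <= 23, minute >= 0, minute <= 59)
  sprintf("%d %d * * * Rscript %s",
          as.integer(minute), as.integer(hour),
          shQuote(script, type = "sh"))
}

# Print the entry for a placeholder script path.
cat(cron_line("/home/user/update_gameday.R"), "\n")
```

On Windows, the same `Rscript` command could serve as the action of a Task Scheduler task.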