Author: Econometrics and Free Software

April 7, 2019

Historical newspaper scraping with {tesseract} and R

I have been playing around with historical newspapers data for some months now. The “obvious” type of analysis to do is NLP, but there is also a lot of numerical...

Econometrics and Free Software

March 31, 2019

Get text from pdfs or images using OCR: a tutorial with {tesseract} and {magick}

In this blog post I’m going to show you how you can extract text from scanned pdf files, or pdf files where no text recognition was performed. (For pdfs where...

Econometrics and Free Software

March 20, 2019

Pivoting data frames just got easier thanks to `pivot_wide()` and `pivot_long()`

There’s a lot going on in the development version of {tidyr}. New functions for pivoting data frames, `pivot_wide()` and `pivot_long()`, are coming and will replace the current functions, `spread()` and...

Econometrics and Free Software

March 5, 2019

Classification of historical newspapers content: a tutorial combining R, bash and Vowpal Wabbit, part 2

In part 1 of this series I set up Vowpal Wabbit to classify newspapers content. Now, let’s use the model to make predictions and see how and if we can...

Econometrics and Free Software

March 3, 2019

Classification of historical newspapers content: a tutorial combining R, bash and Vowpal Wabbit, part 1

Can I get enough of historical newspapers data? Seems like I don’t. I already wrote four (1, 2, 3 and 4) blog posts, but there’s still a lot to explore....

Econometrics and Free Software

February 10, 2019

Manipulating strings with the {stringr} package

This blog post is an excerpt of my ebook Modern R with the tidyverse that you can read for free here. This is taken from Chapter 4, in which I...

Econometrics and Free Software

February 4, 2019

Building a shiny app to explore historical newspapers: a step-by-step guide

I started off this year by exploring a world that was unknown to me, the world of historical newspapers. I did not know that historical newspapers data was a thing,...

Econometrics and Free Software

January 31, 2019

Using Data Science to read 10 years of Luxembourguish newspapers from the 19th century

I have been playing around with historical newspaper data (see here and here). I have extracted the data from the largest archive available, as described in the previous blog post,...

Econometrics and Free Software

January 13, 2019

Making sense of the METS and ALTO XML standards

Last week I wrote a blog post where I analyzed one year of newspapers ads from 19th century newspapers. The data is made available by the national library of Luxembourg....

Econometrics and Free Software

January 4, 2019

Looking into 19th century ads from a Luxembourguish newspaper with R

The national library of Luxembourg published some very interesting data sets; scans of historical newspapers! There are several data sets that you can download, from 250mb up to 257gb. I...
