Finding Economic Articles with Data (2nd Update)

by Economics and R - R posts · June 26, 2020

This article is originally published at http://skranz.github.io/

Almost a year has passed since I posted my last update about my Shiny-powered search app. It lets you search among currently more than 5000 economic articles that have an accessible data and code supplement:

https://ejd.econ.mathematik.uni-ulm.de

The main data for my app can be downloaded as a zipped SQLite database from my server. Let us do some analysis.

library(RSQLite)
library(dbmisc)
library(dplyr)
db = dbConnect(RSQLite::SQLite(),"articles.sqlite") %>%
  set.db.schemas(schema.file=system.file("schema/articles.yaml", package="EconJournalData"))

articles = dbGet(db,"article")
fs = dbGet(db,"files_summary")
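The calls above rely on the helper packages dbmisc and EconJournalData. If they are not installed, whole tables can presumably also be read with plain DBI. The following is only a minimal sketch on an in-memory toy database: the table and column names mirror the ones used in this post, but the rows are invented.

```r
library(DBI)

# In-memory toy database, so the sketch runs without articles.sqlite.
# The rows are invented for illustration only.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "article", data.frame(
  id    = c("a1", "a2"),
  journ = c("aer", "ecta"),
  year  = c(2018L, 2019L),
  stringsAsFactors = FALSE
))

# List the tables, then read one completely into a data frame
toy_tables   <- dbListTables(con)
toy_articles <- dbReadTable(con, "article")

dbDisconnect(con)
```

With the real file, the same two calls on a connection to "articles.sqlite" should show which tables are available before loading them.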

Let us look, grouped by journal, at the share of articles whose code supplement contains R files:

fs %>% 
  left_join(select(articles, id, journ), by="id") %>%
  group_by(journ) %>%
  mutate(num_art = n_distinct(id)) %>%
  filter(file_type=="r") %>%
  summarize(
    num_art = first(num_art),
    num_with_r = n(),
    share_with_r=round((num_with_r / first(num_art))*100,2)
  ) %>%
  arrange(desc(share_with_r))

journ	num_art	num_with_r	share_with_r
ecta	144	19	13.19
aeri	28	3	10.71
jep	127	12	9.45
restud	312	22	7.05
jpe	155	9	5.81
aejmic	129	5	3.88
aejpol	426	15	3.52
aer	1540	53	3.44
jeea	154	5	3.25
aejapp	430	13	3.02
aejmac	314	8	2.55
restat	813	6	0.74

We see that the share of articles with R code varies considerably, from 13.2% in Econometrica (ecta) to only 0.74% in the Review of Economics and Statistics (restat). (The statistics exclude all articles that don't have a code supplement or that have a supplement whose file types I did not analyse, e.g. because it is too large or its ZIP files are nested too deeply.)
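One side effect of filtering on file_type == "r" before summarizing is that a journal without any R files would drop out of the table entirely. A hedged variant that keeps such journals with a share of 0 is sketched below on invented toy data; the names toy_fs, toy_articles and toy_shares are hypothetical, and the sketch assumes one row per article and file type.

```r
library(dplyr)

# Invented toy data: one row per (article id, file type)
toy_fs <- tibble(
  id        = c(1, 1, 2, 3, 4, 4),
  file_type = c("do", "r", "do", "m", "do", "m")
)
toy_articles <- tibble(
  id    = c(1, 2, 3, 4),
  journ = c("aer", "aer", "aer", "restat")
)

toy_shares <- toy_fs %>%
  left_join(toy_articles, by = "id") %>%
  group_by(journ) %>%
  summarize(
    num_art      = n_distinct(id),                    # all analysed articles
    num_with_r   = n_distinct(id[file_type == "r"]),  # 0 if no R files
    share_with_r = round(num_with_r / num_art * 100, 2)
  ) %>%
  arrange(desc(share_with_r))
```

In this toy example the journal "restat" stays in the result with num_with_r equal to 0 instead of disappearing.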

Overall, we still have a clear dominance of Stata in economics:

# Number of articles with an analysed data & code supplement
n_art = n_distinct(fs$id)

# Count articles by file types and compute shares
fs %>% group_by(file_type) %>%
  summarize(count = n(), share=round((count / n_art)*100,2)) %>%
  # note that all file extensions are stored in lower case
  filter(file_type %in% c("do","r","py","jl","m")) %>%
  arrange(desc(share))

file_type	count	share
do	3338	70.44
m	1195	25.22
r	170	3.59
py	68	1.43
jl	8	0.17

Roughly 70% of the articles have Stata do files, about a quarter have Matlab m files, and only 3.6% have R files.
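The table shows counts and shares but not the denominator n_art itself. As a quick consistency check, the sketch below searches for the article total that reproduces all five rounded shares. The search range 4000:6000 is an assumption chosen for illustration; the counts and shares are copied from the table above.

```r
# Counts and shares as reported in the table above
counts <- c(do = 3338, m = 1195, r = 170, py = 68, jl = 8)
shares <- c(do = 70.44, m = 25.22, r = 3.59, py = 1.43, jl = 0.17)

# Assumed search range for the number of analysed articles
candidates <- 4000:6000

# Keep every candidate total that reproduces all rounded shares
reproduces <- sapply(candidates, function(n) {
  all(abs(round(counts / n * 100, 2) - shares) < 1e-8)
})
n_art_implied <- candidates[reproduces]

n_art_implied   # 4739
sum(shares)     # 100.85
```

The shares add up to slightly more than 100%, presumably because one supplement can contain files of several types.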

While R, Python and Julia have increased their shares over recent years, the trend does not seem very strong yet.

sum_dat = fs %>% 
  left_join(select(articles, year, id), by="id") %>%
  group_by(year) %>%
  # count distinct articles per year, not rows:
  # an article has one row per file type in fs
  mutate(n_art_year = n_distinct(id)) %>%
  group_by(year, file_type) %>%
  summarize(
    count = n(),
    share=round((count / first(n_art_year))*100,2)
  ) %>%
  filter(file_type %in% c("do","r","py","jl","m")) %>%
  arrange(year,desc(share))

library(ggplot2)
ggplot(sum_dat, aes(x=year, y=share, color=file_type)) +
  geom_line(size=1.5) + scale_y_log10() + theme_bw()
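A caveat on the denominator of such yearly shares: if files_summary holds one row per article and file type, as the earlier counts suggest, then the number of rows in a year is larger than the number of articles. The toy sketch below, with invented data and hypothetical column names n_rows and n_articles, shows how much the two denominators can differ.

```r
library(dplyr)

# Invented toy data: three articles from one year, five article/file-type rows
toy <- tibble(
  id        = c(1, 1, 2, 3, 3),
  year      = c(2019, 2019, 2019, 2019, 2019),
  file_type = c("do", "r", "do", "do", "m")
)

toy_year <- toy %>%
  group_by(year) %>%
  summarize(
    n_rows         = n(),                    # article/file-type pairs
    n_articles     = n_distinct(id),         # distinct articles
    n_with_do      = sum(file_type == "do"),
    share_rows     = round(n_with_do / n_rows * 100, 2),
    share_articles = round(n_with_do / n_articles * 100, 2)
  )
```

Here every article has a do file, so the article-based share is 100%, while the row-based share is only 60%.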

I also have a log file that anonymously records which articles have been clicked on. The code below shows the 20 most-clicked articles so far:

dat = read.csv("article_click.csv")

dat %>%
  group_by(article) %>%
  summarize(count=n()) %>%
  na.omit %>%
  arrange(desc(count)) %>%
  print(n=20)

## # A tibble: 2,707 x 2
##    article                                                                 count
##    <fct>                                                                   <int>
##  1 Consumer Spending during Unemployment: Positive and Normative Implicat~    50
##  2 Do Expert Reviews Affect the Demand for Wine?                              44
##  3 Tax Evasion and Inequality                                                 38
##  4 A Macroeconomic Model of Price Swings in the Housing Market                35
##  5 Is Your Lawyer a Lemon? Incentives and Selection in the Public Provisi~    33
##  6 The Welfare Effects of Social Media                                        31
##  7 The Rise of Market Power and the Macroeconomic Implications                29
##  8 Carbon Taxes and CO2 Emissions: Sweden as a Case Study                     27
##  9 Public Debt and Low Interest Rates                                         27
## 10 The Sad Truth about Happiness Scales                                       25
## 11 Job Polarization and Jobless Recoveries                                    24
## 12 The New Tools of Monetary Policy                                           24
## 13 Alcohol and Self-Control: A Field Experiment in India                      23
## 14 Disease and Gender Gaps in Human Capital Investment: Evidence from Nig~    23
## 15 Some Causal Effects of an Industrial Policy                                23
## 16 Food Deserts and the Causes of Nutritional Inequality                      22
## 17 Minimum Wage and Real Wage Inequality: Evidence from Pass-Through to R~    22
## 18 The Cost of Reducing Greenhouse Gas Emissions                              22
## 19 Adaptation to Climate Change: Evidence from US Agriculture                 21
## 20 Do Parents Value School Effectiveness?                                     21
## # ... with 2,687 more rows
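The same kind of ranking can also be written more compactly with dplyr's count(). This is only a sketch on an invented toy click log; the data frame toy_clicks and its titles are hypothetical stand-ins for article_click.csv.

```r
library(dplyr)

# Invented toy click log with one row per click; NA marks a missing title
toy_clicks <- data.frame(
  article = c("Paper A", "Paper B", "Paper A", NA,
              "Paper C", "Paper A", "Paper B"),
  stringsAsFactors = FALSE
)

top_clicked <- toy_clicks %>%
  filter(!is.na(article)) %>%                  # drop clicks without a title
  count(article, sort = TRUE, name = "count")  # clicks per article, descending
```

Piping the result into head(20) would then give the 20 most-clicked articles.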

So far there have been over 11,000 clicks in total. Well, that is almost twice as many as the average number of Google searches in 100 milliseconds ;)
