
Slides for “Achieving Practical Reproducibility with Transparency and Accessibility” (DSSV 2020)

by Brian Lee Yung Rowe · July 31, 2020

This article was originally published at https://cartesianfaith.com

I was invited to speak at the ASA’s Symposium on Data Science and Statistics as well as the SAMSI/IASC conference on Data Science, Statistics, and Visualization thanks to Jim Harner at WVU. Both talks were on my approach to reproducible science based on my forthcoming book Introduction to Reproducible Science in R.

The talks make two key points:

The scientific method can be deconstructed into methodology and environment. Code is your methodology and your workstation is your environment.
Reproducibility is a function of transparency and accessibility.

I’ve seen a lot of emphasis on tools to improve reproducibility but less on process. Even if tools let us easily reproduce someone else’s work exactly (same data, same code), they cannot interpret the methodology for us.

Here’s a simple example I learned in college. Suppose I have a black box that can compute 16/64. When you run the function, you get the correct answer: 1/4. However, the method inside the box simply cancels the sixes to yield 1 over 4, which is not a valid operation and only happens to work for this input.
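
The black box can be sketched in a few lines of Python. This is only an illustration, not code from the talk or the book: the function name cancel_digits and the extra test inputs (12/24 and 13/39) are assumptions made here to show how a flawed method can pass one check and fail others.

```python
from fractions import Fraction


def cancel_digits(numerator: int, denominator: int) -> Fraction:
    """Hypothetical flawed 'black box': strike out a digit shared by both numbers."""
    n, d = str(numerator), str(denominator)
    for digit in n:
        if digit != "0" and digit in d:
            # Remove the first occurrence of the shared digit from each number.
            n_rest = n.replace(digit, "", 1)
            d_rest = d.replace(digit, "", 1)
            if n_rest and d_rest and int(d_rest) != 0:
                return Fraction(int(n_rest), int(d_rest))
    # No shared digit to strike: fall back to ordinary reduction.
    return Fraction(numerator, denominator)


# Compare the black box against correct arithmetic on a few inputs.
for num, den in [(16, 64), (12, 24), (13, 39)]:
    box = cancel_digits(num, den)
    truth = Fraction(num, den)
    print(f"{num}/{den}: box={box}, correct={truth}, match={box == truth}")
# 16/64: box=1/4, correct=1/4, match=True
# 12/24: box=1/4, correct=1/2, match=False
# 13/39: box=1/9, correct=1/3, match=False
```

Checking only the 16/64 output would never reveal the problem. Reading the method, or running it on other inputs, does.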

You may get the desired results but for the wrong reasons, and bad actors will produce desired results by cheating or lying. Transparency ensures others can verify that results are credible. The hydroxychloroquine sham dataset provided by Surgisphere is a poignant example showing how data provenance is one component of transparency. An earlier “study” by the CEO of Surgisphere included doctored images as “results”. Without a transparent method, it’s hard to root out bad actors.

Related to transparency is accessibility: are people of differing ability levels able to reproduce results? In the deep learning realm, many models are not accessible due to their size. GPT-3 is estimated to have cost $4.5 million to train, which means few people can reproduce its results. (That said, the deep learning solution to this problem is to use transfer learning to demonstrate generalizability of the model, thus skirting the issue of strict reproducibility.)

The complete presentation from DSSV goes into a bit more detail. Enjoy!

