
Exploring P-values with Simulations in R

by AO · May 22, 2016

This article was originally published at https://stablemarkets.wordpress.com

The recent flare-up in discussions on p-values inspired me to conduct a brief simulation study.

In particular, I wanted to illustrate just how p-values vary with different effect and sample sizes.
Here are the details of the simulation. I simulated $n$ draws of my independent variable $X$:

$X_n \sim N(100, 400)$

where

$n \in \{5, 6, \ldots, 25\}$

For each $X_n$, I define $Y_n$ as

$Y_n := 10+\beta X_n +\epsilon$

where

$\epsilon \sim N(0,1)$
$\beta \in \{0.05, 0.06, \ldots, 0.25\}$
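A minimal R sketch of the data-generating process above, assuming $N(100, 400)$ is parameterized by mean and variance (so the standard deviation of $X$ is 20). The function name `simulate_data` and the seed are hypothetical illustrations, not taken from the author's GitHub code.

```r
# Hypothetical helper (not the author's code): one simulated data set.
# Assumes N(100, 400) means mean 100 and variance 400, i.e. sd = 20.
simulate_data <- function(n, beta, sd_eps = 1) {
  x <- rnorm(n, mean = 100, sd = 20)      # X_n ~ N(100, 400)
  eps <- rnorm(n, mean = 0, sd = sd_eps)  # epsilon ~ N(0, 1) by default
  y <- 10 + beta * x + eps                # Y_n := 10 + beta * X_n + epsilon
  data.frame(x = x, y = y)
}

# The two grids crossed in the study
sample_sizes <- 5:25                        # n in {5, 6, ..., 25}
effect_sizes <- seq(0.05, 0.25, by = 0.01)  # beta in {0.05, 0.06, ..., 0.25}

set.seed(2016)
d <- simulate_data(n = 10, beta = 0.15)
str(d)
```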

In other words, for each combination of effect size $\beta$ and sample size $n$, the simulation draws $X$ and $Y$ with random error $\epsilon$. The following regression model is then estimated, and the p-value of $\beta$ is recorded.

$Y_n = \beta_0 + \beta X_n + \epsilon$
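Estimating the model and reading off the p-value of $\beta$ might look like the following in R. This sketch assumes the model is fitted with `lm()` and that the recorded p-value is the usual two-sided t-test of $\beta = 0$; `one_p_value` is a hypothetical helper name, and the standard deviation of 20 for $X$ is the same mean-variance reading of $N(100, 400)$.

```r
# Hypothetical helper: draw one data set, fit Y = beta_0 + beta * X + epsilon
# with lm(), and return the two-sided p-value for H0: beta = 0.
one_p_value <- function(n, beta, sd_eps = 1) {
  x <- rnorm(n, mean = 100, sd = 20)          # assumes sd = 20 (variance 400)
  y <- 10 + beta * x + rnorm(n, sd = sd_eps)  # epsilon ~ N(0, sd_eps^2)
  fit <- lm(y ~ x)
  # Coefficient table has rows "(Intercept)" and "x"; column "Pr(>|t|)"
  coef(summary(fit))["x", "Pr(>|t|)"]
}

set.seed(1)
p <- one_p_value(n = 10, beta = 0.15)
p
```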

The drawing and the regression are repeated 1,000 times, so for each combination of effect size and sample size the simulation yields 1,000 p-values. The average of these 1,000 p-values for each combination is plotted below.
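The replicate-and-average loop can be sketched as below. It is an illustration under assumptions, not the author's code: `one_p_value` and `average_p_values` are hypothetical helpers, $X$ is drawn with standard deviation 20, and the example call uses a small corner of the grid with 200 replications instead of 1,000 so that it finishes quickly.

```r
# Hypothetical helper: one simulated p-value for the slope beta
one_p_value <- function(n, beta, sd_eps = 1) {
  x <- rnorm(n, mean = 100, sd = 20)
  y <- 10 + beta * x + rnorm(n, sd = sd_eps)
  coef(summary(lm(y ~ x)))["x", "Pr(>|t|)"]
}

# Average p-value over `reps` replications for every (n, beta) combination
average_p_values <- function(sample_sizes, effect_sizes, reps = 1000, sd_eps = 1) {
  grid <- expand.grid(n = sample_sizes, beta = effect_sizes)
  grid$mean_p <- mapply(function(n, beta) {
    mean(replicate(reps, one_p_value(n, beta, sd_eps)))
  }, grid$n, grid$beta)
  grid
}

set.seed(2016)
# Small corner of the grid with fewer replications so it runs in seconds;
# the full study would be average_p_values(5:25, seq(0.05, 0.25, by = 0.01)).
res <- average_p_values(c(5, 15, 25), c(0.05, 0.25), reps = 200)
res
```

Plotting `mean_p` against `n` with one line per value of `beta`, plus a horizontal reference line at 0.05, would give a chart of the kind described in the text.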

Note that these results are for a fixed $\operatorname{Var}(\epsilon)=1$. A larger error variance would typically shift these curves upward: for each effect size, a sample of the same size would carry a weaker signal and yield higher p-values.
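To see the effect of a noisier error term, the sketch below compares average p-values in a single cell of the grid ($n = 10$, $\beta = 0.1$) for an error standard deviation of 1 and a hypothetical standard deviation of 5. The helper name, the chosen cell, and the 500 replications are assumptions picked for illustration.

```r
# Hypothetical helper: one simulated p-value for the slope beta
one_p_value <- function(n, beta, sd_eps = 1) {
  x <- rnorm(n, mean = 100, sd = 20)          # assumes sd = 20 (variance 400)
  y <- 10 + beta * x + rnorm(n, sd = sd_eps)
  coef(summary(lm(y ~ x)))["x", "Pr(>|t|)"]
}

# Same cell (n = 10, beta = 0.1), two error standard deviations:
# sd_eps = 1 matches the post; sd_eps = 5 is a hypothetical noisier setting.
set.seed(42)
reps <- 500
mean_p_sd1 <- mean(replicate(reps, one_p_value(n = 10, beta = 0.1, sd_eps = 1)))
mean_p_sd5 <- mean(replicate(reps, one_p_value(n = 10, beta = 0.1, sd_eps = 5)))
c(sd_eps_1 = mean_p_sd1, sd_eps_5 = mean_p_sd5)
```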

There are several takeaways from this plot.

First, for a given sample size, larger effect sizes are “detected” more easily. By detected, I mean that the average p-value falls below the .05 significance threshold. Larger effects (e.g., .25) can be detected with relatively small samples (in this case, fewer than 10 observations). By contrast, a small effect (e.g., .05) needs a larger sample (more than 10 observations) to be detected.
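The detection criterion can be made concrete with a small search: for a given $\beta$, find the smallest $n$ in the grid whose average p-value falls below .05. This is a sketch with a hypothetical helper (`smallest_detecting_n`), assuming a standard deviation of 20 for $X$, unit error variance, and 300 replications per cell; because it is a simulation, the exact $n$ returned can shift slightly with the seed.

```r
# Hypothetical helper: one simulated p-value for the slope beta
one_p_value <- function(n, beta, sd_eps = 1) {
  x <- rnorm(n, mean = 100, sd = 20)
  y <- 10 + beta * x + rnorm(n, sd = sd_eps)
  coef(summary(lm(y ~ x)))["x", "Pr(>|t|)"]
}

# Smallest n in `sample_sizes` whose average p-value over `reps` draws is
# below `alpha`; NA if no sample size in the grid reaches it.
smallest_detecting_n <- function(beta, sample_sizes = 5:25, reps = 300, alpha = 0.05) {
  for (n in sample_sizes) {
    if (mean(replicate(reps, one_p_value(n, beta))) < alpha) return(n)
  }
  NA_integer_
}

set.seed(7)
n_large_effect <- smallest_detecting_n(0.25)
n_small_effect <- smallest_detecting_n(0.05)
c(beta_0.25 = n_large_effect, beta_0.05 = n_small_effect)
```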

Second, this figure illustrates an oft-heard warning about p-values: always interpret them in the context of sample size. Lack of statistical significance does not imply lack of an effect. An effect may exist, but the sample may be too small to detect it (or the variability in the data may be too high). Conversely, a statistically significant p-value does not mean that the effect is practically meaningful. Consider an effect size of .00000001 (effectively 0). Extrapolating from the chart, the p-value for even this effect size tends to 0 as the sample size increases, eventually crossing the significance threshold.
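A back-of-the-envelope calculation shows why even a tiny effect eventually becomes “significant.” As a large-sample approximation (an assumption, not part of the original study), the standard error of $\hat\beta$ is about $\sigma_\epsilon / (\sigma_X \sqrt{n})$, so the t-statistic grows like $\beta \sigma_X \sqrt{n} / \sigma_\epsilon$ and typically reaches the two-sided 5% cutoff once $n \approx \left(1.96\,\sigma_\epsilon / (\beta \sigma_X)\right)^2$. The sketch evaluates this for $\beta = 10^{-8}$ and then, since that sample size cannot be simulated, fits a single regression for a hypothetical, larger $\beta = 0.002$ at two sample sizes. The helper names `n_to_cross` and `slope_p` are hypothetical.

```r
# Large-sample approximation (assumption): t is about beta * sd_x * sqrt(n) / sd_eps,
# which typically reaches the cutoff qnorm(1 - alpha / 2) once
# n is about (cutoff * sd_eps / (beta * sd_x))^2. Ignores small-sample t corrections.
n_to_cross <- function(beta, sd_x = 20, sd_eps = 1, alpha = 0.05) {
  cutoff <- qnorm(1 - alpha / 2)
  (cutoff * sd_eps / (beta * sd_x))^2
}
n_tiny <- n_to_cross(1e-8)   # the "effectively 0" effect from the text
n_tiny

# Hypothetical small but simulable effect (beta = 0.002), one fit at a modest
# and one at a large sample size: the p-value collapses as n grows.
slope_p <- function(n, beta) {
  x <- rnorm(n, mean = 100, sd = 20)
  y <- 10 + beta * x + rnorm(n)
  coef(summary(lm(y ~ x)))["x", "Pr(>|t|)"]
}
set.seed(3)
p_small_n <- slope_p(100, 0.002)
p_large_n <- slope_p(1e5, 0.002)
c(n_100 = p_small_n, n_100000 = p_large_n)
```

For $\beta = 10^{-8}$, $\sigma_X = 20$ and $\sigma_\epsilon = 1$, the approximation gives a sample size on the order of $10^{14}$, which is why the chart can only suggest, not show, this behavior.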

Code is available on GitHub.
