Correctly Reporting P-Values in Summary Tables Reported with xtable
This article is originally published at http://www.beardedanalytics.com
Often when writing a manuscript using knitr and xtable, I am flustered by my p-values. In simple summary tables, R conveniently rounds my smallest p-values to 0: a mathematically inappropriate result. A colleague recently commented on the poor reporting in my table (shown below using print.xtable with the type="html" argument), inspiring a much-needed change.
| | Estimate | Std.err | Wald | Pr(>\|W\|) |
|---|---|---|---|---|
| (Intercept) | 0.001704 | 0.000005 | 100409.770956 | 0.000000 |
| sizemedium | 0.000046 | 0.000005 | 90.534705 | 0.000000 |
| sizesmall | 0.000003 | 0.000005 | 0.294331 | 0.587458 |
| time | -0.000004 | 0.000001 | 11.614917 | 0.000654 |
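To see where the zeros come from, here is a minimal base R sketch. The p-values in it are made-up stand-ins (not the model output above); the point is only that formatting to a fixed number of decimals turns any sufficiently small p-value into a string of zeros.

```r
# Illustrative p-values only (assumed for this sketch, not taken from the model above)
p <- c(2e-16, 0.587458, 0.000654)

# Fixed-point formatting with 6 decimals, similar in spirit to what a table printer does
shown <- formatC(p, format = "f", digits = 6)
shown

# The first value is tiny but not zero, even though it is now displayed as all zeros
p[1] > 0
shown[1] == "0.000000"
```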
The fix is actually fairly straightforward and can be summarized in a simple function, "fixp", with the code shown below:
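    fixp <- function(x, dig = 3) {
      # Coerce to a data frame (coefficient tables from lm, for example, are numeric matrices)
      x <- as.data.frame(x)

      # The last column is assumed to hold the p-values; warn if its name suggests otherwise
      if (substr(names(x)[ncol(x)], 1, 2) != "Pr")
        warning("The name of the last column didn't start with Pr. This may indicate that p-values weren't in the last column, and thus, that this function is inappropriate.")

      # Round the p-values to the requested number of decimal places
      x[, ncol(x)] <- round(x[, ncol(x)], dig)

      # Replace any p-value that rounded to 0 with "< .0...1"
      for (i in seq_len(nrow(x))) {
        if (x[i, ncol(x)] == 0)
          x[i, ncol(x)] <- paste0("< .", paste0(rep(0, dig - 1), collapse = ""), "1")
      }

      x
    }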
Here's all that's going on. The function takes in the summary table (usually through a $coef) and tries to turn it into a data frame (some already are, while others, such as those from lm, are numeric matrices). It throws a warning if the last column heading doesn't begin with "Pr" (as that column may not contain p-values), rounds the p-values to the user-specified number of digits, and replaces any value that rounded to 0 with "<" followed by the smallest nonzero value that can be shown at that precision (e.g. < .01). Then we output the edited table, all ready for reporting! To mimic the table above, we set dig = 6 (so the p-values go out 6 decimal places) and re-run:
| | Estimate | Std.err | Wald | Pr(>\|W\|) |
|---|---|---|---|---|
| (Intercept) | 0.001704 | 0.000005 | 100409.770956 | < .000001 |
| sizemedium | 0.000046 | 0.000005 | 90.534705 | < .000001 |
| sizesmall | 0.000003 | 0.000005 | 0.294331 | 0.587458 |
| time | -0.000004 | 0.000001 | 11.614917 | 0.000654 |
Much better! Also, to report a two-digit p-value (as some writing styles require), we simply set dig = 2:
| | Estimate | Std.err | Wald | Pr(>\|W\|) |
|---|---|---|---|---|
| (Intercept) | 0.001704 | 0.000005 | 100409.770956 | < .01 |
| sizemedium | 0.000046 | 0.000005 | 90.534705 | < .01 |
| sizesmall | 0.000003 | 0.000005 | 0.294331 | 0.59 |
| time | -0.000004 | 0.000001 | 11.614917 | < .01 |
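The replacement label in these tables depends only on the number of digits requested. As a small sketch of how it is assembled, the helper below (`zero_label`, a hypothetical name that is not part of the original function) reuses the same paste0 expression and evaluates it for a few digit settings.

```r
# Hypothetical helper (not part of the original post): builds the label that
# replaces a p-value rounded to zero at `dig` decimal places.
zero_label <- function(dig) {
  paste0("< .", paste0(rep(0, dig - 1), collapse = ""), "1")
}

# One label per digit setting; dig = 2 and dig = 6 match the tables above
labels <- vapply(c(1, 2, 3, 6), zero_label, character(1))
labels
```

With dig = 1 there are no padding zeros at all, so the label collapses to "< .1".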
By design, the p-values can be manipulated independently of the estimates. This allows reporting of the estimated coefficients in meaningful units (in the above example, very small units), while reporting the p-values on a scale that many writing styles request.
Want to try this yourself? Here's an example that needs nothing more than a dataset built into R:
    # This gives a summary table with a small p-value. Trying to report it with xtable would result in the rounding issue!
    (mod <- coef(summary(lm(uptake ~ conc + Treatment + Type + Plant, data = CO2))))

    # This rounds the p-values to 2 digits, correctly reporting those that would have been rounded to 0
    fixp(mod, dig = 2)
Here's the final output via print.xtable (dig=2 for fixp and xtable):
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 37.42 | 4.67 | 8.00 | < .01 |
| conc | 0.02 | 0.00 | 7.96 | < .01 |
| Treatmentchilled | -12.50 | 5.10 | -2.45 | 0.02 |
| TypeMississippi | -23.33 | 6.01 | -3.88 | < .01 |
| Plant.L | 21.58 | 11.14 | 1.94 | 0.06 |
| Plant.Q | -4.62 | 2.27 | -2.03 | 0.05 |
| Plant.C | 1.46 | 5.10 | 0.29 | 0.78 |
| Plant^4 | 2.34 | 2.27 | 1.03 | 0.31 |
| Plant^5 | -0.48 | 5.77 | -0.08 | 0.93 |
| Plant^6 | -0.04 | 2.27 | -0.02 | 0.99 |
| Plant^7 | -1.91 | 3.64 | -0.53 | 0.6 |
| Plant^8 | -3.28 | 2.27 | -1.44 | 0.15 |
| Plant^10 | 0.55 | 2.27 | 0.24 | 0.81 |
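The printing call itself isn't shown in this post, so here is one way it might look. Treat it as a sketch under a few assumptions: it needs the xtable package installed, it rebuilds the two-digit p-value column inline as a stand-in for `fixp(mod, dig = 2)` so the block runs by itself, and the names `tab` and `html` are purely illustrative.

```r
library(xtable)

mod <- coef(summary(lm(uptake ~ conc + Treatment + Type + Plant, data = CO2)))
tab <- as.data.frame(mod)

# Inline stand-in for fixp(mod, dig = 2): round, then relabel values that became 0
p <- round(tab[[ncol(tab)]], 2)
tab[[ncol(tab)]] <- ifelse(p == 0, "< .01", as.character(p))

# digits = 2 formats the numeric columns; the p-value column is already text.
# print.results = FALSE returns the HTML as a string instead of printing it.
html <- print(xtable(tab, digits = 2), type = "html", print.results = FALSE)
cat(html)
```

Because the p-value column is character by the time xtable sees it, the digits setting should only affect the estimate, standard error and test statistic columns.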
Limitations (ish):
- Again, this assumes that the last column is the one to be transformed. This is by design, though may be inconvenient in some situations. If needed, the change is easily made through the definition of the function.
- When any entry in the last column is replaced, that column becomes a character column in the data frame. If the column is rounded but no entry rounds to 0, it stays numeric.
- This assumes a data-frame-style format for your table. Thus, this method will NOT be effective at correcting reported p-values for an individual test: say a t-test, where only the statistic is reported (and not a table). Personally this is not a concern, as I deal with these situations in other ways, but for some users seeking an overall "p-value fixing" method this may not be the answer.
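For the single-test case in the last bullet, the same rounding rule can be applied to a bare p-value. The helper below, `format_p`, is a hypothetical name and not part of the original function. Unlike fixp, it pads with trailing zeros via formatC, which is a design choice of this sketch rather than anything the post prescribes. Base R's `format.pval()`, with its `eps` argument, is another option worth a look.

```r
# Hypothetical helper (not from the original post): formats a numeric vector of
# p-values to `dig` decimals, replacing values that round to 0 with "< .0...1".
format_p <- function(p, dig = 3) {
  stopifnot(is.numeric(p), dig >= 1)
  rounded <- round(p, dig)
  zero_label <- paste0("< .", strrep("0", dig - 1), "1")
  ifelse(rounded == 0, zero_label, formatC(rounded, format = "f", digits = dig))
}

# A few stand-alone values
formatted <- format_p(c(0.000654, 0.587458, 0.6), dig = 2)
formatted

# Works on a single test as well, e.g. a t-test on the built-in sleep data
format_p(t.test(extra ~ group, data = sleep)$p.value, dig = 2)
```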
As with other functions I write posts on, this function is available in my package (creatively named "myStuff") via GitHub. If you'd like to play with the most current version of the function, I'd encourage you to check it out here. Alternatively, to have access to other fun functions, install the package directly from GitHub with the code below (requires devtools):
devtools::install_github("flor3652/myStuff")