
Correctly Reporting P-Values in Summary Tables Reported with xtable

by Michael Floren · March 11, 2016

This article was originally published at http://www.beardedanalytics.com

Often, when writing a manuscript using knitr and xtable, I am flustered by my p-values. In simple summary tables, very small p-values are conveniently rounded to 0, which is mathematically inappropriate: a p-value can be tiny, but a rounded 0 misreports it. A colleague recently commented on the poor reporting in one of my tables (shown below using print.xtable with the type="html" argument), inspiring a much-needed change.

	Estimate	Std.err	Wald	Pr(>|W|)
(Intercept)	0.001704	0.000005	100409.770956	0.000000
sizemedium	0.000046	0.000005	90.534705	0.000000
sizesmall	0.000003	0.000005	0.294331	0.587458
time	-0.000004	0.000001	11.614917	0.000654
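To see where the zeros come from, here is a minimal base R sketch. The value of p is made up for illustration and is not taken from the model above. Fixed-point formatting to six decimals cannot distinguish a very small positive number from zero:

```r
# Illustrative p-value far below the display precision (not from the model above)
p <- 2.3e-11

# Fixed-point formatting with 6 decimals shows only zeros
formatC(p, format = "f", digits = 6)

# After rounding to 6 decimals the value compares equal to 0 ...
round(p, 6) == 0

# ... even though the underlying p-value is strictly positive
p > 0
```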

The fix is actually fairly straightforward, and can be summarized in a simple function, "fixp", with the code shown below:

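fixp <- function(x, dig=3){
  # coerce matrices (e.g. from lm) to a data frame; data frames pass through
  x <- as.data.frame(x)
  
  # p-value columns are usually named "Pr(>|t|)", "Pr(>|W|)", etc.
  if(substr(names(x)[ncol(x)],1,2) != "Pr")
    warning("The name of the last column didn't start with Pr. This may indicate that p-values weren't in the last column, and thus, that this function is inappropriate.")
  x[,ncol(x)] <- round(x[,ncol(x)], dig)
  # replace entries that rounded to 0 with a threshold such as "< .01"
  # (the first replacement turns the whole column into character)
  for(i in seq_len(nrow(x))){
    if(x[i,ncol(x)] == 0)
      x[i,ncol(x)] <- paste0("< .", paste0(rep(0,dig-1), collapse=""), "1")
  }
  
  x
}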

Here is all that's going on. The function takes the summary table (usually extracted through a $coef) and converts it to a data frame (some tables already are, though others, e.g. from lm, are numeric matrices). It throws a warning if the last column name doesn't begin with "Pr", since that column may then not contain p-values. It rounds the last column to the user-specified number of digits and replaces any value that rounded to 0 with "<" followed by the smallest value that can be shown at that precision (e.g. < .01 for dig = 2). Then it returns the edited table, all ready for reporting! To mimic the table above, we set dig = 6 (six decimal places for the p-value) and re-run:

	Estimate	Std.err	Wald	Pr(>|W|)
(Intercept)	0.001704	0.000005	100409.770956	< .000001
sizemedium	0.000046	0.000005	90.534705	< .000001
sizesmall	0.000003	0.000005	0.294331	0.587458
time	-0.000004	0.000001	11.614917	0.000654

Much better! Also, to report a two-digit p-value (as some writing styles require), we simply set dig = 2:

	Estimate	Std.err	Wald	Pr(>|W|)
(Intercept)	0.001704	0.000005	100409.770956	< .01
sizemedium	0.000046	0.000005	90.534705	< .01
sizesmall	0.000003	0.000005	0.294331	0.59
time	-0.000004	0.000001	11.614917	< .01

By design, the p-values can be manipulated independently of the estimates. This allows reporting the estimated coefficients in meaningful units (in the above example, very small units) while reporting the p-values at the precision that many writing styles request.

Want to try this yourself? Here's an example that uses only a built-in dataset in R:

#this gives a summary table with a small p-value. Trying to report this with xtable would result in a rounding issue!
(mod <- coef(summary(lm(uptake ~ conc + Treatment + Type + Plant, data=CO2))))

#this rounds the p-values to 2 digits, correctly reporting p-values that would have been rounded to 0
fixp(mod,dig=2)

Here's the final output via print.xtable (dig=2 for fixp and xtable):

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	37.42	4.67	8.00	< .01
conc	0.02	0.00	7.96	< .01
Treatmentchilled	-12.50	5.10	-2.45	0.02
TypeMississippi	-23.33	6.01	-3.88	< .01
Plant.L	21.58	11.14	1.94	0.06
Plant.Q	-4.62	2.27	-2.03	0.05
Plant.C	1.46	5.10	0.29	0.78
Plant^4	2.34	2.27	1.03	0.31
Plant^5	-0.48	5.77	-0.08	0.93
Plant^6	-0.04	2.27	-0.02	0.99
Plant^7	-1.91	3.64	-0.53	0.6
Plant^8	-3.28	2.27	-1.44	0.15
Plant^10	0.55	2.27	0.24	0.81
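The post does not show the call that produces the table above. The following is a minimal sketch of one way to do it, assuming the xtable package is installed. fixp_vec() and its col argument are hypothetical names introduced here for illustration, not part of the author's package. Unlike fixp(), this variant is vectorized, always returns a character column, and keeps trailing zeros (so 0.6 prints as 0.60):

```r
# Hypothetical vectorized variant of fixp(); not the author's implementation.
fixp_vec <- function(x, dig = 3, col = ncol(x)) {
  x <- as.data.frame(x)
  p <- round(x[[col]], dig)
  # Label used for p-values that round to zero, e.g. "< .01" for dig = 2
  threshold <- paste0("< .", strrep("0", dig - 1), "1")
  # formatC() keeps trailing zeros; the resulting column is always character
  x[[col]] <- ifelse(p == 0, threshold, formatC(p, format = "f", digits = dig))
  x
}

# Same model as in the example above, fitted on the built-in CO2 dataset
mod <- coef(summary(lm(uptake ~ conc + Treatment + Type + Plant, data = CO2)))
tab <- fixp_vec(mod, dig = 2)

# Print as HTML if xtable is available; digits = 2 applies to the numeric columns
if (requireNamespace("xtable", quietly = TRUE)) {
  print(xtable::xtable(tab, digits = 2), type = "html")
}
```

Because the p-value column is already character by the time it reaches xtable, the digits argument only affects the estimates, standard errors, and test statistics.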

Limitations (ish):

Again, this assumes that the last column is the one to be transformed. This is by design, though it may be inconvenient in some situations. If needed, the change is easily made in the definition of the function.
When any entry of the last column is replaced, the column becomes a character column in the data frame. When the column is rounded but no entry rounds to 0, it stays numeric.
This assumes a data-frame-style format for your table. Thus, this method will NOT be effective at correcting reported p-values for an individual test: say a t-test, where only the statistic is reported (and not a table). Personally, this is not a concern, as I deal with these situations in other ways, but for users seeking an overall "p-value fixing" method this may not be the answer.
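For the single-test case in the last point, base R's format.pval() is one option: its eps argument replaces any p-value below a tolerance with a "less than" label. Here is a minimal sketch using the built-in sleep dataset. The choice of test and the eps value are illustrative assumptions, and the label comes out in the form <0.001 rather than the < .001 style used by fixp:

```r
# Welch two-sample t-test on a built-in dataset (illustrative choice)
tt <- t.test(extra ~ group, data = sleep)

# p-values at or above eps are formatted to the requested significant digits
format.pval(tt$p.value, digits = 2, eps = 0.001)

# p-values below eps are reported as "<eps" instead of being rounded to 0
format.pval(1e-10, digits = 2, eps = 0.001)
```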

As with other functions I write posts on, this function is available in my package (creatively named "myStuff") via GitHub. If you'd like to play with the most current version of the function, I'd encourage you to check it out here. Alternatively, to have access to other fun functions, install the package directly from GitHub with the code below (requires devtools):

devtools::install_github("flor3652/myStuff")
