Mediators, confounders, colliders – a crash course in causal inference

by Florian Hartig · April 14, 2019

This article is originally published at https://theoreticalecology.wordpress.com

Although one would think that the basic concepts of statistics should be the same across all sciences, there is an amazing heterogeneity between fields in how statistics is taught and practiced.

One example is the set of validity concepts taught in the social sciences and economics (see Wikipedia). In short, these categorize “failure modes” of inference (e.g. construct validity, internal validity, external validity). Ecologists are certainly aware of these problems as well, but in ecology they are not typically taught as a concise list or framework in the standard curriculum, even though I have found such a framework immensely helpful for students.

Another example is causal inference, and specifically the concept of mediators, confounders and colliders. This goes back at least to Pearl 2000 (see also Pearl 2009a,b), and with the popularity of SEMs in ecology, I’m sure that people have at least heard about causal inference in general. However, when reading the really excellent and highly recommended paper by Lederer et al. (2019), “Control of confounding and reporting of results in causal inference studies. Guidance for authors from editors of respiratory, sleep, and critical care journals.”, in our group seminar, I got the distinct feeling that the practical interpretation of these ideas differs quite strongly between medical and ecological fields.

Lederer et al. first nicely establish an operational concept of causality that I would broadly agree with also for ecology: assume we look at the effect of a target variable (something that could be manipulated = predictor) on another variable (the outcome = response) in the presence of other (non-target) variables. The goal of a causal analysis is to control for these other variables in such a way that we estimate the same effect size that we would obtain if only the target predictor were manipulated (as in an RCT).
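To make this operational definition concrete, here is a minimal simulation sketch in Python (standard library only). The variable names, the effect sizes and the linear Gaussian setup are my own illustrative assumptions, not taken from Lederer et al. A variable z influences both the predictor x and the response y in the observational data, whereas in the simulated experiment x is assigned at random.

```python
import random

def cov(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def slope(y, x):
    """OLS slope of y ~ x."""
    return cov(x, y) / cov(x, x)

def adjusted_slope(y, x, z):
    """OLS coefficient of x in y ~ x + z (closed form for two predictors)."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    return (szz * cov(x, y) - sxz * cov(z, y)) / (sxx * szz - sxz ** 2)

rng = random.Random(1)
n = 20000
true_effect = 1.0  # assumed causal effect of x on y

# Observational data: z drives both x and y, so z is a confounder
z = [rng.gauss(0, 1) for _ in range(n)]
x_obs = [zi + rng.gauss(0, 1) for zi in z]
y_obs = [true_effect * xi + 2.0 * zi + rng.gauss(0, 1)
         for xi, zi in zip(x_obs, z)]

# Simulated experiment: x is assigned at random, independent of z
x_rct = [rng.gauss(0, 1) for _ in range(n)]
y_rct = [true_effect * xi + 2.0 * zi + rng.gauss(0, 1)
         for xi, zi in zip(x_rct, z)]

naive = slope(y_obs, x_obs)                 # ignores z
adjusted = adjusted_slope(y_obs, x_obs, z)  # controls for z
experimental = slope(y_rct, x_rct)          # randomized x

print(f"naive: {naive:.2f}  adjusted: {adjusted:.2f}  experiment: {experimental:.2f}")
```

Under these assumed coefficients, the naive slope should come out close to 2, while the adjusted observational estimate and the experimental estimate should both be close to the true value of 1. This is the sense in which a successful causal analysis reproduces what an experiment would have shown.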

I’m sure everyone knows that, to do so, we have to control for confounders. I am less sure, however, whether everyone is clear about what a confounder is. In particular, confounding is more specific than having a variable that correlates with predictor and response. The direction of the causal links is crucial for identifying true confounders: a confounder influences both predictor and response. For example, Fig. 1C from the Lederer paper shows a collider, i.e. a variable that is influenced by both predictor and response. Although it correlates with predictor and response, correcting for it (i.e. including it) in a multiple regression will bias the estimate of the causal link we are interested in. The bottom line of this discussion (and the essence of Pearl 2000, 2009) is that to establish causality for a specific link, we have to close the so-called back-door paths for this link, by

Controlling for confounders (back-doors, blue paths in the figure)
Not controlling for colliders, M-Bias, and other similar relationships (red paths)
Deciding, depending on the question, whether to control for mediators (yellow paths)
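The rules about colliders and mediators can be illustrated with a small simulation. This is only a sketch under assumptions of my own (linear effects, Gaussian noise, made-up coefficients), written in plain Python. In the first scenario, c is a collider influenced by both x and y. In the second, m is a mediator that carries part of the effect of x on y.

```python
import random

def cov(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def slope(y, x):
    """OLS slope of y ~ x."""
    return cov(x, y) / cov(x, x)

def adjusted_slope(y, x, z):
    """OLS coefficient of x in y ~ x + z (closed form for two predictors)."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    return (szz * cov(x, y) - sxz * cov(z, y)) / (sxx * szz - sxz ** 2)

rng = random.Random(2)
n = 20000

# Scenario 1, collider: x -> y, x -> c <- y (assumed true effect of x on y: 1.0)
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1.0 * xi + rng.gauss(0, 1) for xi in x]
c = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]

unadjusted = slope(y, x)                    # correct: nothing to control for
collider_adjusted = adjusted_slope(y, x, c)  # biased by conditioning on c

# Scenario 2, mediator: x -> m -> y plus a direct path x -> y
# (assumed direct effect 0.5, indirect effect 1.0 * 1.0, total effect 1.5)
x2 = [rng.gauss(0, 1) for _ in range(n)]
m = [1.0 * xi + rng.gauss(0, 1) for xi in x2]
y2 = [0.5 * xi + 1.0 * mi + rng.gauss(0, 1) for xi, mi in zip(x2, m)]

total_effect = slope(y2, x2)                # mediator left out
direct_effect = adjusted_slope(y2, x2, m)   # mediator controlled for

print(f"collider:  unadjusted {unadjusted:.2f}, adjusted {collider_adjusted:.2f}")
print(f"mediator:  total {total_effect:.2f}, direct {direct_effect:.2f}")
```

With these particular assumed coefficients, including the collider should push the estimated effect of x from about 1 to about 0, although nothing about the true effect has changed. For the mediator, both numbers are legitimate answers, but to different questions: roughly 1.5 for the total effect and roughly 0.5 for the direct effect.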

My impression is that these types of arguments are well-established in the medical and economic literature (in the sense that people regularly use them to defend inclusion / exclusion of variables in a regression), but that they are rarely invoked in the ecological literature.

Fig. 1: DAGs visualising the most important concepts. Red lines should not be accounted for. From Lederer et al., 2019.

Moreover, what I really liked about the Lederer paper is their discussion of the Table 2 fallacy. The paper recommends that variables included as confounders should NOT be presented in the regression table at all (this is typically Table 2 in a paper, thus the name), because they are themselves usually not corrected for confounding (and they shouldn’t or at least don’t have to be corrected for, see Pearl 2000 / discussion above). Sensible advice, but I think contrary to common practice in standard and SEM regression reporting in ecology.

A cynical (but probably accurate) explanation for the fact that the Table 2 fallacy is the norm in ecology is that we rarely have a clear target variable / hypothesis, and thus we feel all variables that were used have to be discussed. A side effect is that this makes for the most boring result / discussion sections, where the effect of one variable after the other has to be discussed and interpreted. More importantly, however, each single variable that is interpreted as a causal effect should be controlled for confounding, or else we should make a clear distinction between the variables that are controlled and those that aren’t. As I said, Lederer et al. recommend not mentioning uncontrolled variables at all. I’m not sure if that is practical for ecology (as analyses are often semi-explorative), but I have recently been wondering about the option to separate reasonably controlled from possibly confounded variables by a bar in the regression table.
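One way to see why the coefficients of adjustment variables are hard to interpret is the following toy simulation. It is a sketch under my own assumptions (a single confounder z, linear effects, made-up coefficients), not an example from Lederer et al. The same regression y ~ x + z is fitted once, and the coefficients of the target predictor x and of the confounder z are compared with the respective total effects.

```python
import random

def cov(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def slope(y, x):
    """OLS slope of y ~ x."""
    return cov(x, y) / cov(x, x)

def adjusted_slope(y, x, z):
    """OLS coefficient of x in y ~ x + z (closed form for two predictors)."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    return (szz * cov(x, y) - sxz * cov(z, y)) / (sxx * szz - sxz ** 2)

rng = random.Random(3)
n = 20000

# Assumed structure: z -> x -> y and z -> y
# Total effect of x on y: 1.0
# Total effect of z on y: 0.5 (direct) + 1.0 * 1.0 (via x) = 1.5
z = [rng.gauss(0, 1) for _ in range(n)]
x = [1.0 * zi + rng.gauss(0, 1) for zi in z]
y = [1.0 * xi + 0.5 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

# Both coefficients come from the same model y ~ x + z
coef_x = adjusted_slope(y, x, z)  # target predictor, adjusted for z
coef_z = adjusted_slope(y, z, x)  # confounder, with x held fixed

# Total effect of z; valid here only because z has no causes in this toy setup
total_z = slope(y, z)

print(f"coef_x: {coef_x:.2f}  coef_z: {coef_z:.2f}  total effect of z: {total_z:.2f}")
```

In this setup the coefficient of x should be close to its total effect of 1, while the coefficient of z should be close to 0.5, which is only the part of its effect that does not pass through x. In real data, z would usually also have unmeasured confounders of its own, so even this direct-effect reading would be optimistic. Presenting both numbers side by side in one table invites readers to interpret them in the same way.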

My only small quibble with the otherwise excellent Lederer paper relates to their comments about significance. First, I strongly support their call for concentrating on parameters and CIs instead of p-values. However, I find their recommendation to avoid the term “not significant” in favor of a vague term such as “the estimate is imprecise” a bad one (this is, btw., similar to some other recent papers, e.g. Dushoff et al., 2019, Amrhein et al., 2019, which would make a nice topic for another post). The idea behind this recommendation is that researchers tend to misinterpret n.s. as “no effect”, but it seems to me the response should be to better educate researchers about what n.s. means, not to muddy the waters by hiding the fact that a test was done.
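What reporting an estimate together with its CI looks like in practice can be sketched as follows. The helper slope_with_ci is hypothetical, the data are simulated with an assumed true effect of 0.3, and the interval uses a normal approximation (a t quantile would be more exact for small samples).

```python
import random
from statistics import NormalDist

def slope_with_ci(x, y, level=0.95):
    """OLS slope of y ~ x with a normal-approximation confidence interval."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope estimate
    a = my - b * mx                    # intercept estimate
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5  # standard error of the slope
    q = NormalDist().inv_cdf(0.5 + level / 2)
    return b, b - q * se, b + q * se

rng = random.Random(42)
n = 25  # deliberately small sample
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.3 * xi + rng.gauss(0, 1) for xi in x]  # assumed true effect: 0.3

estimate, lower, upper = slope_with_ci(x, y)
significant = lower > 0 or upper < 0  # equivalent to the two-sided test at 5%

print(f"slope = {estimate:.2f}, 95% CI [{lower:.2f}, {upper:.2f}], "
      f"significant: {significant}")
```

Whatever the verdict for a particular sample, the interval shows which effect sizes remain compatible with the data. An n.s. result means that zero lies inside this interval, not that the effect is zero, and stating both the interval and the test outcome keeps that distinction visible.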

References

Lederer, D. J., Bell, S. C., Branson, R. D., Chalmers, J. D., Marshall, R., Maslove, D. M., … & Stewart, P. W. (2019) Control of confounding and reporting of results in causal inference studies. Guidance for authors from editors of respiratory, sleep, and critical care journals. Annals of the American Thoracic Society, 16(1), 22-28.

Pearl, J. (2009) Causal inference in statistics: An overview. Statistics surveys 3, 96-146.

Pearl, J. (2000 / 2009) Causality. Cambridge University Press, 1st / 2nd ed.

Dushoff, J., Kain, M.P. and Bolker, B.M., 2019. I can see clearly now: reinterpreting statistical significance. Methods in Ecology and Evolution.

Amrhein, V., Greenland, S. and McShane, B., 2019. Scientists rise up against statistical significance.
