Interventions and "no causation without manipulation"
A rough Internet guide to interventions in causal models.
Interventions are the foundational abstraction in causal modeling. I recently found myself repeatedly explaining the definition of an intervention to various audiences, so I figured I'd write up my usual spiel to save time.
Roughly speaking, an intervention is anything you or another agent does to change the data-generating process. The noun "intervention" has the associated verb "intervene" or, more colloquially, "do." Interventions do things that cause other things to happen. A causal model, by definition, can predict the outcome of an intervention.
We can do interventions in real life. Sometimes. In reinforcement learning, game theory, and related agent modeling problems, interventions are called "actions." For example, a robot agent's action is an intervention that potentially changes its environment.
In randomized experimentation, interventions are "treatments" or, if you are in a wet lab, "perturbations."
I say "sometimes" because, when it comes to causal inference, whether or not we can do a thing or even define clearly what is to be done affects what causal inferences we can make and how we make them.
People who train robots or run experiments know that real-life interventions can be costly and even infeasible. Thus, the primary benefit of a causal model is to predict the outcomes of interventions before we do them in real life, especially if we can't do them in real life.
The Ladder of Causal Inference
The figure above illustrates a "ladder of causal inference" (Pearl and Mackenzie 2018). Note that the ladder of causality is a strict hierarchy, meaning that each rung virtually always underdetermines information at higher rungs (Bareinboim et al. 2020).
Interventions sit squarely at the second rung. A machine learning model is a causal model if it sits at the second rung or above. In other words, the ability to model interventions and predict their effects is a requirement for a modeling framework's entry into the club of causal models.
For example, if you were a data scientist at an e-commerce company, you might train a model that forecasts future sales given current conditions. However, a causal model would allow you to forecast what would happen to sales if you were to do an intervention, such as a promotional campaign. Depending on how nuanced the model is, it could predict how different promotions would affect sales, and you could select the ideal type of promotion given current conditions. That's more useful than the bottom-rung predictive model.
A model needs an explicit representation of causal relationships and a way to calculate what happens to an effect after intervening on a cause. A directed acyclic graph model gives us both, along with a well-developed, broadly understood formal theory. However, other methods of modeling causality, such as differential equations, Petri nets, and computer simulations, can also meet these conditions.
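To make that concrete, here is a minimal sketch of the e-commerce example as a hand-coded structural model. The variable names (season, promotion, sales) and the linear mechanisms are illustrative assumptions, not a real sales model; the point is that one model answers both the observational question and the interventional one.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, do_promotion=None):
    """Draw n samples from the toy model; optionally intervene on promotion."""
    season = rng.uniform(0, 1, n)                      # exogenous driver
    if do_promotion is None:
        # Observationally, promotions are run more often in high season.
        promotion = (rng.uniform(0, 1, n) < season).astype(float)
    else:
        # do(promotion): sever the season -> promotion edge, fix the value.
        promotion = np.full(n, float(do_promotion))
    noise = rng.normal(0, 1, n)
    sales = 10 + 5 * season + 2 * promotion + noise    # assumed causal mechanism
    return season, promotion, sales

# A rung-two question: what would average sales be under each promotion policy?
_, _, sales_on = simulate(100_000, do_promotion=1)
_, _, sales_off = simulate(100_000, do_promotion=0)
print(f"predicted effect of the campaign: {sales_on.mean() - sales_off.mean():.2f}")
```

The difference between the two simulated means is a rung-two answer: what the campaign would do to sales, not merely what sales look like when campaigns happen to be running.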
The "ideal" intervention
An ideal intervention is one formal definition of intervention (from Spirtes et al. 2000). Synonyms for "ideal intervention" include "structural intervention" (following Eberhardt), "surgical intervention" (following Pearl), and "independent intervention" (following Korb).
An ideal intervention is (1) an operation that targets a single variable in the model. That operation (2) sets that random variable to a fixed value. That value-fixing (3) blocks the influence from the target's causes, such that the target is now statistically independent of its causes.
There are nuances, of course. For example, if instead of setting the target to a fixed value you applied a change that gave the target a new probability distribution not conditional on its causes, we would still call this an ideal intervention. But thinking about fixed values is easier, and it is more likely to be what we want in practical settings.
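Here is a toy sketch of properties (1) through (3) on an assumed chain X -> Y -> Z. The mechanisms are made up; the point is only that do(Y = 1) severs Y from its cause X, while Z still responds to Y.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(do_y=None):
    x = rng.normal(0, 1)
    if do_y is None:
        y = 2 * x + rng.normal(0, 0.1)  # Y listens to its cause X
    else:
        y = do_y                        # (1) target one variable, (2) fix its
                                        # value, (3) ignore its causes entirely
    z = y + rng.normal(0, 0.1)          # Z still listens to Y
    return x, y, z

obs  = np.array([sample() for _ in range(10_000)])
intv = np.array([sample(do_y=1.0) for _ in range(10_000)])

# Observationally X and Y are strongly correlated; under do(Y=1), Y is a
# constant, independent of X, yet Z still tracks the value Y was set to.
print("corr(X, Y), observational:", np.corrcoef(obs[:, 0], obs[:, 1])[0, 1])
print("mean of Z under do(Y=1):  ", intv[:, 2].mean())
```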
Randomization is an ideal intervention
A randomized experiment, such as an A/B test, is one where the treatment (treatment A or treatment B) is assigned at random to each subject of the experiment. This random assignment is a particular type of ideal intervention: it fixes the cause to a value that is selected at random.
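Continuing the same sketching style, here is random assignment as an ideal intervention. The confounder, outcome model, and effect size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 10_000
health = rng.normal(0, 1, n)            # a confounder in the wild
# Ideal intervention: treatment is set by a coin flip, independent of health.
treatment = rng.integers(0, 2, n)
outcome = 1.5 * treatment + 2.0 * health + rng.normal(0, 1, n)

# Because assignment is independent of health, a plain difference in means
# recovers the causal effect (~1.5) without any adjustment.
effect = outcome[treatment == 1].mean() - outcome[treatment == 0].mean()
print(f"estimated treatment effect: {effect:.2f}")
```

Randomization earns its keep precisely because the coin flip severs every incoming edge to the treatment at once, known confounders and unknown ones alike.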
Manipulation Theory of Causality
It is essential to recognize that by creating any definition of intervention, we are delving into philosophy. Interventions are central to all philosophical theories of causation.
Some of these theories, called manipulability theories, explicitly define causality in terms of intervention. In these theories, a cause is a cause if intervening on it would change the effects. In other words, a cause is a cause if it can act like a knob for manipulating effects.
The following articulation of manipulability theory appeared in the 1979 edition of an influential textbook on experimental design in the social sciences.
The paradigmatic assertion in causal relationships is that manipulation of a cause will result in the manipulation of an effect… Causation implies that by varying one factor I can make another vary. (Cook & Campbell 1979: 36, emphasis in original)
Caveat of the manipulability theory and a contrasting definition
A caveat of manipulability theory is that it defines causality in terms of human agents. For example, consider the philosophical question, "If a tree falls in the forest and no one is around to hear it, does it make a sound?" The thought experiment suggests that a sound cannot exist unless an agent is around to perceive it. Similarly, manipulability theory suggests that a cause does not exist unless an agent is around to manipulate it. Yet most of the universe is "agent"-free, and presumably things there still cause other things to happen.
The standard contrasting definition is the counterfactual definition of causality, a third-rung definition. The basic idea of counterfactual theories of causation is that the meaning of causal claims can be explained in terms of counterfactual conditionals of the form "If A had not occurred, C would not have occurred." A formal definition is given by David Lewis (1973):
We think of a cause as something that makes a difference, and the difference it makes must be a difference from what would have happened without it. Had it been absent, its effects – some of them, at least, and usually all – would have been absent as well.
The earliest version of the counterfactual definition is typically ascribed to philosopher David Hume:
We may define a cause to be an object followed by another, and where all the objects, similar to the first, are followed by objects similar to the second. Or, in other words, where, if the first object had not been, the second never had existed. - An Enquiry Concerning Human Understanding, 1748, Section VII
However, philosophers of metaphysics have been circling the causal definition for quite a while. For example, the Buddha's idea of specific conditionality (in Pali, Idappaccayatā) resembles counterfactual language and preceded Hume by over two millennia.
If this is, that comes to be; from the arising of this, that arises; if this is not, that does not come to be; from the stopping of this, that is stopped. - Saṃyutta Nikāya, 12.22.22
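On the third rung, these conditionals become computable. In the structural account popularized by Pearl, a counterfactual like "if A had not occurred, C would not have occurred" is evaluated in three steps: abduction (infer the background noise from what was observed), action (intervene to set A to its counterfactual value), and prediction (recompute the effect with the inferred noise held fixed). A minimal sketch, with an assumed mechanism C = A + U and made-up observed values:

```python
# Observed world: A = 1 occurred and we measured C = 1.3.
a_obs, c_obs = 1.0, 1.3

# Step 1, abduction: infer the exogenous noise consistent with the evidence.
u = c_obs - a_obs            # since C = A + U, U must have been 0.3

# Step 2, action: intervene on the model, do(A = 0).
a_cf = 0.0

# Step 3, prediction: recompute C with the inferred noise held fixed.
c_cf = a_cf + u
print(f"counterfactual C had A not occurred: {c_cf:.1f}")  # 0.3, not 1.3
```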
"No causation without manipulation"
I said that interventions are actions we can sometimes do in the real world. The "sometimes" relates to a famous aphorism amongst causal inference researchers who lean towards the manipulation theory of causality: "no causation without manipulation."
The idea is that something should only be defined as a cause if it could be a treatment in an experiment (Holland 1986, 954). If the experiment is ethically or practically infeasible, that is OK, so long as we can precisely conceptualize it as a hypothetical experiment. This idea is not mere philosophical trivia; it profoundly impacts how we perform causal analysis.
Under the "no causation without manipulation," you might say wealth is a cause of happiness because one could imagine an experiment where we intervened on people's wealth by giving them (or taking away) money. However, under this mode of thinking, you would not judge race as a valid candidate cause of wealth because what would that experiment look like? How would you manipulate race? Would you change an individual's external phenotype (e.g., skin color)? Or would you change the DNA? Would you pluck their soul from a black body and stick it into a white body, like some sort of racial Freaky Friday?
Or, perhaps, would you change how other people perceive an individual's race, such as changing a "black-sounding" name on a resume? If it's the latter, then isn't others' perception of an individual's race the actual cause in question, rather than the race itself?
On the other hand, a problem with the "no causation without manipulation" view is that it doesn't quite align with how people reason about causality. For example, people naturally understand causal statements like "race causes wealth" or causal explanations like "he's poor because of his race." People reasoned this way long before anyone invented randomized experiments (and indeed before anyone started imagining hypothetical experiments).
As a pragmatic point, my experience is that the data science organizations of tech giants and tech-savvy companies favor the "no causation without manipulation" approach. These are experimentation-driven organizations, and it makes sense for them to define causes precisely in terms of hypothetical experiments on business metrics.
I like the ontological precision of "no causation without manipulation." I favor defining my problem space with hypothetically manipulable causes, so long as I am not stretching the domain abstractions to absurd extremes. Replacing "race causes wealth" with "employers' perception of race based on cues on a resume causes wealth" borders on absurdity. Where that point of balance lies depends on the problem domain. "No causation without manipulation" makes sense if you're running digital experiments at Facebook.
However, in domains like psychology or quantitative marketing, it would be valuable to have models that could quantitatively assess statements like "she would have had a better experience if she were more charming," even though it's harder to imagine manipulating one's ability to charm.
If you liked this post, help me grow my audience by sharing.
Go Deeper
Bareinboim, E., J. D. Correa, D. Ibeling, and T. Icard. 2020. "On Pearl's Hierarchy and the Foundations of Causal Inference." ACM Special Volume in Honor of Judea Pearl.
Holland, Paul W. 1986. "Statistics and Causal Inference." Journal of the American Statistical Association 81 (396): 945–60.
Holland, Paul W. 2003. "Causation and Race." ETS Research Report Series 2003 (1): i–21.
Lewis, David. 1973. "Causation." Journal of Philosophy 70 (17): 556–67.
Pearl, Judea, and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. Basic Books.
Rubin, D. B. 1975. "Bayesian Inference for Causality: The Importance of Randomization." Proceedings of the Social Statistics Section of the American Statistical Association.
Spirtes, Peter, Clark Glymour, and Richard Scheines. 2000. Causation, Prediction, and Search. 2nd ed. MIT Press.