The following is an excerpt from my forthcoming book on Causal Machine Learning. An early-access version of the book is available online.
How causal ML adds value to the organization
Causal machine learning benefits an organization that engineers software reliant on machine learning in several ways. In particular, I argue that causality adds value by making machine learning more robust, decomposable, and explainable.
Robust ML: The model breaks less often and generalizes better.
Decomposable ML: Causal ML models can be separated into components, making them easier to maintain, test, deploy, and update.
Explainable ML: Components of the Causal ML model map to real-world concepts. This mapping helps capture employee domain knowledge, helps decision-makers understand the model, and helps engineers debug the model.
More robust machine learning
Machine learning models lack robustness when differences between the environment where the model was trained and the environment where it is deployed cause the model to break down. Examples of a lack of robustness include:
Overfitting: The learning algorithm places too much weight on spurious statistical patterns in the training data (and the “held-out” data used for parameter tuning). “Spurious” means these patterns appeared by random chance, such as in the figure below, which illustrates an apparent correlation between two obviously unrelated phenomena. Spurious patterns in training data are unlikely to appear in the environment where you hope to deploy the model. Techniques such as cross-validation and measuring algorithm capacity/complexity can help detect and mitigate overfitting (see the sketch after this list).
Underspecification [1]: Many distinct configurations of a model perform equivalently on test data but perform differently in the deployment environment. One sign of underspecification is sensitivity to arbitrary elements of the model’s configuration, such as a random seed.
Data drift: As time passes, the characteristics of the data in the environment where you deploy the model differ or “drift” from the characteristics of the training data.
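To make the overfitting failure mode concrete, here is a minimal sketch (my own toy example, using scikit-learn and synthetic data that contains no real signal) showing how cross-validation exposes a model that has merely memorized noise:

```python
# Sketch: cross-validation exposes overfitting on a synthetic dataset.
# The data, feature count, and model choice are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 50, 40                       # few samples, many irrelevant features
X = rng.normal(size=(n, p))
y = rng.normal(size=n)              # outcome is pure noise: no real signal

model = LinearRegression().fit(X, y)
print("Training R^2:", model.score(X, y))            # looks deceptively good
print("Cross-validated R^2:",
      cross_val_score(LinearRegression(), X, y, cv=5).mean())  # collapses
```

The training score looks strong only because the model has latched onto random fluctuations; the cross-validated score reveals that nothing generalizes.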
A common refrain is that these issues can be addressed with more data: given enough data, deep learning architectures can learn anything. But even if you have the luxury of unlimited data and the compute budget to train large models on it, “correlation does not imply causation” holds true, even when the correlation comes from big data. Still others argue “junk in, junk out,” meaning data quality trumps data quantity. Causal modeling can help us better understand what “quality” data looks like.
But these robustness problems do not condemn modern machine learning. Rather, they show we have work to do in discovering how to use causal methods to enhance the state-of-the-art. That is why Microsoft, Google, Meta, and other tech companies deploy causal machine learning techniques to make their machine learning services more robust. It is also why notable deep learning researchers are pursuing research combining deep learning with causal reasoning.
Causal modeling enhances robustness by representing invariant causal relationships between predictors and the predicted outcome. As a simple example, if I collect data on temperature and air pressure in the tropical coastal city of Honolulu, it will look much different from the temperature and air pressure data collected in the high-elevation city of Katmandu. But the physics-based causal mechanisms connecting temperature and air pressure are the same (invariant) no matter where I am on Earth.
Capturing that causal invariance helps avoid overfitting. From a causal perspective, a “spurious statistical pattern” is a pattern that isn’t driven by some underlying causal relationship. For example, we might call a correlation between Nicolas Cage movie releases and drownings “spurious” because it’s a random alignment of truly unrelated events. However, a causal modeler might consider a correlation between ice cream sales and robberies as originating from a shared cause, summer heat, and thus not spurious at all. Causal models help avoid overfitting by connecting statistical patterns in the data to causal structure in the world.
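As a toy illustration of the shared-cause point, here is a minimal simulation; the variables, coefficients, and functional forms are my own assumptions, chosen only to make the pattern visible. Heat drives both ice cream sales and robberies, so the two are correlated even though neither causes the other, and the correlation largely vanishes once we hold heat fixed:

```python
# Sketch: a shared cause (summer heat) induces a correlation between
# ice cream sales and robberies. Coefficients are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
heat = rng.normal(size=n)                        # common cause
ice_cream = 2.0 * heat + rng.normal(size=n)      # caused by heat
robberies = 1.5 * heat + rng.normal(size=n)      # also caused by heat

print("Marginal correlation:",
      np.corrcoef(ice_cream, robberies)[0, 1])   # clearly nonzero

# Condition on heat by looking within a narrow slice of its values.
hot = np.abs(heat - 1.0) < 0.1
print("Correlation within a fixed-heat slice:",
      np.corrcoef(ice_cream[hot], robberies[hot])[0, 1])  # near zero
```

The marginal correlation is real and will reappear in any large dataset drawn from this world; it just is not a causal relationship between the two measured variables.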
Modeling causal invariance also helps avoid underspecification. One core contribution of formal causal inference is algorithms that tell you when a causal prediction is “identified,” i.e., not “underspecified,” meaning a unique answer exists given the assumptions and the data.
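To see what an identification check looks like in practice, here is a minimal sketch assuming a recent version of the open-source DoWhy library; the three-variable graph, variable names, and data-generating process are invented for illustration:

```python
# Sketch: checking identifiability with DoWhy on invented data.
# The graph, variables, and data-generating process are assumptions
# made purely for illustration.
import numpy as np
import pandas as pd
from dowhy import CausalModel

rng = np.random.default_rng(0)
n = 1_000
z = rng.normal(size=n)                  # observed confounder
x = z + rng.normal(size=n)              # treatment
y = 2.0 * x + z + rng.normal(size=n)    # outcome

df = pd.DataFrame({"X": x, "Y": y, "Z": z})
model = CausalModel(
    data=df,
    treatment="X",
    outcome="Y",
    graph="digraph { Z -> X; Z -> Y; X -> Y; }",  # assumed causal structure
)
# If an estimand exists, the effect of X on Y is "identified" given the graph.
print(model.identify_effect())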
Finally, causal invariance helps with data drift. In the pressure/temperature example, if I train a model on the Honolulu data, I’ll achieve better performance on the Katmandu data if I can manage to encode the causal invariance of that physical mechanism in the model’s architecture. Of course, causal invariances rooted in known physical mechanisms are the ideal case. In non-physics domains such as econometrics and social science, identifying causal invariances that help minimize data drift is more challenging. But causal modeling provides a path to doing so.
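A toy version of the pressure/temperature story can be simulated; the mechanism, constants, and the extra “humidity” variable below are deliberately simplified assumptions of mine, not real physics. The causal mechanism linking temperature to pressure is the same in both cities, while a merely correlated variable shifts between them, so a model built on the invariant mechanism transfers and a model leaning on the correlate does not:

```python
# Sketch: an invariant causal mechanism transfers across environments,
# a spurious correlate does not. Constants and the "humidity" variable
# are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def make_city(n, temp_mean, humidity_shift):
    temp = rng.normal(temp_mean, 5.0, size=n)
    pressure = 100.0 + 0.5 * temp + rng.normal(0, 1.0, size=n)          # invariant mechanism
    humidity = humidity_shift + 0.5 * temp + rng.normal(0, 1.0, size=n)  # shifts by city
    return temp, humidity, pressure

t_hon, h_hon, p_hon = make_city(5_000, temp_mean=28.0, humidity_shift=40.0)  # "Honolulu"
t_kat, h_kat, p_kat = make_city(5_000, temp_mean=12.0, humidity_shift=10.0)  # "Katmandu"

causal = LinearRegression().fit(t_hon.reshape(-1, 1), p_hon)
spurious = LinearRegression().fit(h_hon.reshape(-1, 1), p_hon)

print("Causal feature, R^2 in Katmandu:",
      causal.score(t_kat.reshape(-1, 1), p_kat))     # transfers well
print("Spurious feature, R^2 in Katmandu:",
      spurious.score(h_kat.reshape(-1, 1), p_kat))   # degrades badly
```

The temperature-based model’s score barely moves across cities, while the humidity-based model’s transferred score collapses.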
More decomposable machine learning
Causal models decompose into components: tuples of effects and their direct causes. The benefits of this decomposability (illustrated in the sketch after this list) include:
Components of the model can be tested and validated separately.
Components of the model can be executed separately, enabling more efficient use of modern cloud computing infrastructure.
When additional training data is available, only the components relevant to the data need retraining.
Components of old models can be reused in new models targeting new problems.
There is less sensitivity to suboptimal model configuration and hyperparameter settings because components can be optimized separately.
The components of the causal model map to concepts in the modeling domain. This mapping helps members of your team turn their domain knowledge into model artifacts that stay with the organization after their tenure ends, and it helps new team members learn quickly.
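As a minimal sketch of what this decomposition can look like in code (the graph, variable names, and the use of plain scikit-learn regressions for each component are my own illustrative assumptions), each variable gets its own component fit from its direct causes, and when new data concerns only one mechanism, only that component is refit:

```python
# Sketch: a causal model decomposed into per-variable components, each fit
# from its direct causes. Graph, data, and model choices are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 5_000
ad_spend = rng.exponential(1.0, size=n)
traffic = 3.0 * ad_spend + rng.normal(size=n)   # ad_spend -> traffic
sales = 2.0 * traffic + rng.normal(size=n)      # traffic -> sales

# One component per effect, keyed by (effect, direct causes).
components = {
    ("traffic", ("ad_spend",)): LinearRegression().fit(ad_spend.reshape(-1, 1), traffic),
    ("sales", ("traffic",)): LinearRegression().fit(traffic.reshape(-1, 1), sales),
}

# New data arrives that only concerns the traffic -> sales mechanism:
# retrain just that component and leave the rest untouched.
new_traffic = rng.normal(10.0, 2.0, size=1_000)
new_sales = 2.2 * new_traffic + rng.normal(size=1_000)
components[("sales", ("traffic",))] = LinearRegression().fit(
    new_traffic.reshape(-1, 1), new_sales)
```

Each entry in the dictionary can be tested, versioned, deployed, and retrained on its own, which is the maintenance story the bullets above describe.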
More explainable machine learning
The behavior of modern machine learning algorithms can be hard to explain. Explainability is particularly important in the context of business and engineering. If your team deploys a predictive algorithm and it behaves in a way that hurts your business, you don’t want to be stuck spouting machine learning technobabble and handwaving when your boss asks you what happened. You want a concise explanation that hopefully suggests ways to avoid the problem in the future. As an engineer, you want that explanation distilled down to a concise bug report that shows in simple terms the nature of the error, what the correct output should have been, what inputs will reproduce the error, and where the code logic starts to go awry given those inputs. Armed with that explanation of the issue, you can efficiently fix the problem.
Explainability also matters to third-party users of a machine learning-based service. For example, suppose a product feature presents a user with a recommendation. That user may want to know why the feature made that particular recommendation. An explanation is an essential element in providing recourse so the user can get better results in the future. For example, video streaming services often explain recommended content with “Because you watched X,” where X is viewed content similar to the recommended content. Instead, imagine richer explanations based on favored genres, actors, and themes. Rather than promoting rabbit holes of similar content, such explanations could suggest how you might explore unfamiliar content that could expand your tastes and generate more valuable recommendations in the future.
There are multiple approaches to explanation, such as analyzing node activation in neural networks. But causal models are eminently explainable because they directly encode easy-to-understand causal relationships in the modeling domain. Indeed, causality is the core of explanation; to explain an event means to provide the cause of the event [2]. Causal models provide explanations in the language of the domain you are modeling (semantic explanations) rather than in terms of the model’s architecture (“nodes” and “activations” - syntactic explanations).
[1] D'Amour, A., Heller, K., Moldovan, D., Adlam, B., Alipanahi, B., Beutel, A., Chen, C., Deaton, J., Eisenstein, J., Hoffman, M.D. and Hormozdiari, F., 2020. Underspecification presents challenges for credibility in modern machine learning. arXiv preprint arXiv:2011.03395.
[2] “Explain.” Merriam-Webster.com Dictionary, Merriam-Webster, https://www.merriam-webster.com/dictionary/explain. Accessed 7 Mar. 2022.