How a riddle proves there's no "objective" predictive model.
Black swans, blue sapphires, and the problem of induction.
In logic, deduction is the process of reasoning from a set of logical premises to a conclusion. The process is certain, in that given the premises, there is no question about the conclusion. The classic example of deductive reasoning is as follows.
Premise: All men are mortal.
Premise: Socrates is a man.
Conclusion: Therefore, Socrates is mortal.
In contrast, induction is the case where the premises are not sufficient to reach a conclusion with certainty. The classic example is that of the black swan.
Premise: The first swan I saw was white.
Premise: The second swan I saw was white.
... lots of white swans
Premise: The nth swan I saw was white.
Conclusion: Therefore, all swans are white.
"The black swan" example is famous because Europeans long held on to the inductive conclusion that all swans were white until European expeditions in Australia discovered the black subspecies native to the region.
The black swan example illustrates the problem of induction, one of the core problems of philosophy. The problem of induction is an age-old philosophical inquiry into whether and how inductive reasoning can lead to true knowledge.
Does machine learning have a problem of induction?
I argue that philosophical attempts to parse the problem of induction apply directly to statistical machine learning.
Philosopher Karl Popper took a rather extreme position on the problem. Popper believed that data could never provide positive evidence for a hypothesis or a prediction; at best, observations could falsify one.
To a statistician or machine learning researcher, the notion that data does not favor one hypothesis or prediction over another borders on absurdity. In machine learning, the past decade has seen learning algorithms blow past benchmark after benchmark and deliver real-world impact. Many of these victories involved defeating expert humans at games like Go and poker, where brute-force computation doesn't work. Given this evidence, it is tempting to use Popper's extreme views as a reason to dismiss philosophical discourse about "the problem of induction" out of hand.
I beg to differ. Popper's position is too extreme, but I propose that understanding the problem of induction helps us parse how learning algorithms work. It helps us interpret statistical learners in terms of their inductive biases and see what trade-offs we make when we prefer one inductive bias to another.
One of the best lenses to view the problem of induction is Nelson Goodman’s New Riddle of Induction.
Goodman's New Riddle of Induction gives us solid intuition about prediction problems.
One aspect of the problem of induction is the difficulty of finding a logical justification for statistical prediction: the assumption that patterns observed regularly in the training data will continue to hold for unseen data.
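In statistical learning, that assumption is usually made explicit as the i.i.d. assumption. Here is a minimal sketch in standard notation (my formalization, not Goodman's or Popper's): past and future examples are treated as draws from one fixed distribution, and low error on the sample is taken as grounds for expecting low error on future draws.

```latex
% The standard inductive assumption of statistical learning:
% past and future examples are i.i.d. draws from one fixed distribution P.
(x_1, y_1), \dots, (x_n, y_n) \sim P \quad \text{i.i.d.}, \qquad (x_{\text{new}}, y_{\text{new}}) \sim P

% The learner minimizes the empirical risk on the sample ...
\hat{R}(f) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big)

% ... and the inductive leap is trusting that the true risk is small as well.
R(f) = \mathbb{E}_{(x, y) \sim P}\big[\ell\big(f(x), y\big)\big]
```

Nothing in the data itself certifies that P stays fixed after the sample was drawn; that gap is what the problem of induction points at.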
Now we get to the meat of this post. Help me close out this awful year with some good subscription numbers. If you like this post, please share.
Goodman's riddle illustrates how the way we define a predictive modeling problem influences a model's predictions in surprising ways. Below, I refactor the traditional description of the riddle to use machine learning terminology.
Green, blue, or grue?
Suppose you are a geologist working for a commercial mining company. Your job is to predict where the company should dig to find precious gemstones.
For simplicity, let's focus on sapphires. Blue sapphires are valuable, and green sapphires are less so. Every time a dig produced a sapphire, you logged the geological features of the excavation site along with the sapphire's color.
Today, you will use this data to train a model that predicts the color of a sapphire given geological conditions. Your model makes a typical inductive assumption; in your specific case, it assumes that blue sapphires appear under one set of geological conditions and green sapphires under another. When you train your model, it learns that one set of geological conditions (call it geology-A) produces green sapphires, while a different set (call it geology-B) produces blue sapphires. The company can use your model to prioritize its digs. Great job! You'll probably get a bonus.
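To make the setup concrete, here is a minimal sketch of what that training step might look like. Everything in it is hypothetical: the feature names (rock_density, iron_ppm, depth_m), the dig log, and the choice of a scikit-learn random forest are stand-ins for whatever your company actually uses.

```python
# A minimal, hypothetical sketch of the sapphire-color model.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Made-up dig log: geological features of each excavation plus the observed color.
digs = pd.DataFrame({
    "rock_density": [2.7, 2.7, 3.1, 3.2, 2.6, 3.0],
    "iron_ppm":     [120, 110, 480, 500, 130, 470],
    "depth_m":      [40, 35, 220, 240, 38, 210],
    "color":        ["green", "green", "blue", "blue", "green", "blue"],
})

features = ["rock_density", "iron_ppm", "depth_m"]
model = RandomForestClassifier(random_state=0)
model.fit(digs[features], digs["color"])

# Score a candidate site: geology-B-like conditions should come out "blue".
candidate = pd.DataFrame({"rock_density": [3.1], "iron_ppm": [490], "depth_m": [230]})
print(model.predict(candidate))  # e.g. ['blue']
```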
Suppose, however, that we were to make up an entirely new color. Call it "grue." The color "grue" is going to have a definition that uses a hinky bit of disjunctive temporal logic; "grue" is defined as <{blue AND seen before today} OR {green AND seen after today}>.
Given this definition, all those valuable blue sapphires in the training data were also grue. If the training data shows geology-B produces blue and grue sapphires, then your predictive model will predict that future geology-B sapphires are both blue and grue. But this is a contradiction because, according to the definition of grue, after today, grue simplifies to green. Sapphires can't be both blue and green.
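Here is the same contradiction as code, continuing the hypothetical sketch above (it reuses digs, features, model, and candidate). Because every logged dig happened before today, "grue" and "blue" pick out exactly the same training examples, so a grue model fits the history just as well as the blue/green model, yet the two disagree about tomorrow's geology-B dig.

```python
# Relabel the hypothetical dig log with "grue" and train a second model on it.
from datetime import date

def is_grue(color: str, observed: date, today: date = date.today()) -> bool:
    """grue := (blue AND seen before today) OR (green AND seen after today)."""
    return (color == "blue" and observed < today) or (color == "green" and observed > today)

# Hypothetical dig dates, all in the past, so grue == blue on the training log.
digs["observed"] = pd.to_datetime(
    ["2019-03-01", "2019-07-15", "2019-11-02", "2020-02-10", "2020-05-20", "2020-09-01"]
).date
digs["grue"] = [is_grue(c, d) for c, d in zip(digs["color"], digs["observed"])]

grue_model = RandomForestClassifier(random_state=0)
grue_model.fit(digs[features], digs["grue"])

# Both models agree on the history, but for tomorrow's geology-B dig one says
# "blue" and the other says "grue" -- which, after today, unpacks to "green".
print(model.predict(candidate), grue_model.predict(candidate))  # e.g. ['blue'] [ True]
```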
It can't be both green and blue. The problem is you ಠ_ಠ.
What happened here? You might think that grue is a deliberately absurd concept, and such concepts don't emerge in real problems. Yes (it is absurd) and no (it happens all the time). Color is a concept defined in terms of what humans see. All I did was relax the assumption that how humans perceive color is invariant in time.
Might there be other things that people try to predict that vary across time or other dimensions? Are there cases where we wish to model concepts that are disjoint in some situations but leak into one another in others? I can think of many.
Indeed, Goodman tried to resolve the "grue-blue" contradiction by requiring concepts to be consistent across time. He argued that the concepts we use in predictive models have to be "law-like" or "projectable." We could extend this invariance requirement from time to other dimensions, such as culture.
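If we take that requirement seriously, one crude way to probe it in practice is to check whether the feature-to-label relationship stays stable across time. The sketch below (again hypothetical, reusing the dig log from the earlier sketches) trains on older digs and measures agreement on newer ones; it is a sanity check, not a guarantee.

```python
# A rough "projectability" probe: train on earlier digs, test on later ones.
from datetime import date

cutoff = date(2020, 1, 1)
early = digs[digs["observed"] < cutoff]
late = digs[digs["observed"] >= cutoff]

check_model = RandomForestClassifier(random_state=0)
check_model.fit(early[features], early["color"])
agreement = (check_model.predict(late[features]) == late["color"].to_numpy()).mean()
print(f"out-of-time agreement: {agreement:.0%}")

# Low agreement suggests the label behaves more like "grue" than like a
# law-like, projectable concept. High agreement is reassuring but proves
# nothing about tomorrow -- which is exactly Goodman's point.
```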
The idea of grue as a data label is absurd. But your predictive algorithm doesn't know that; it's just an algorithm. Goodman's riddle shows how the concepts we use to define the prediction problem shape the predictions independently of the algorithm's math.
It shows us that no predictive model is "objective," since the definition of the prediction problem inherently depends on the modeler's subjective view.
Let's get meta. This was a fun post to write. I took the original form of Goodman's New Riddle, stripped out the logic lingo, and recast it in the context of a real-world modeling problem without compromising on simplicity. Let me know what you think in the comments below, and don't forget to share!
Further reading:
The problem of induction — Stanford Encyclopedia of Philosophy