My approach to statistical modeling is to take my knowledge and beliefs about the process that generates the data and explicitly encode them into a model. The approach is called model-based machine learning, an idea described quite well by Chris Bishop in the introduction to a book he wrote on the topic.
Model-based machine learning contrasts with a broader class of methods, statistical machine learning, which seeks to find nuanced statistical patterns in the data and uses those patterns to make predictions. Model-based machine learning also does this, but it finds patterns consistent with my knowledge and beliefs.
I take model-based machine learning and stretch it a bit further. I try to build in as many assumptions about the actual mechanisms driving the data as I can. A model with "mechanisms" could be something like a differential equation, a law of physics, a set of logical premises, or theory from economics, cognitive science, or linguistics. The approach depends on the domain.
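To make the idea concrete, here is a minimal sketch of a mechanism-based model. The mechanism, data, and function names are my own illustration, not from the post: assume the data come from exponential decay, y(t) = y0 · exp(-k·t), which is the solution of the differential equation dy/dt = -k·y, and estimate the rate k from observations.

```python
# A minimal sketch of a mechanism-based model: assume the data are generated
# by exponential decay, y(t) = y0 * exp(-k * t), i.e. the differential
# equation dy/dt = -k * y. The decay rate k is estimated from data.
# Everything here is illustrative, not from the original post.
import math

def fit_decay_rate(times, values):
    """Estimate k by least squares on the log-transformed values."""
    # log y = log y0 - k * t, so fit a straight line to (t, log y).
    n = len(times)
    logs = [math.log(v) for v in values]
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    cov = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    var = sum((t - t_mean) ** 2 for t in times)
    return -cov / var

# Synthetic data generated with k = 0.5: the fit should recover it.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [math.exp(-0.5 * t) for t in times]
print(round(fit_decay_rate(times, values), 3))  # 0.5
```

Because the parameter k is tied to a stated mechanism, the model makes commitments the data can contradict, which is exactly what the next section is about.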
I like to explicitly build assumptions about mechanisms into my models because doing so makes the models falsifiable.
Falsifiability means a model can be refuted by evidence. If your model suggests something should be true, and that thing is not true in the data you observe, then that data has falsified your model.
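A toy example of what refutation-by-data can look like in code, using a hypothetical model of my own invention that claims a coin is fair. If heads should occur about half the time, observing a far more extreme rate falsifies the claim (here via a crude three-standard-deviation rule, not a formal hypothesis test):

```python
# A toy falsification check for a hypothetical "the coin is fair" model.
# The decision rule is a rough 3-sigma cutoff on the binomial count,
# chosen for illustration only.

def is_falsified(n_flips, n_heads, p_claimed=0.5, n_sigma=3.0):
    """Return True when the observed count is implausible under the claim."""
    expected = n_flips * p_claimed
    std = (n_flips * p_claimed * (1 - p_claimed)) ** 0.5
    return abs(n_heads - expected) > n_sigma * std

print(is_falsified(1000, 510))  # 510 heads in 1000 flips: consistent -> False
print(is_falsified(1000, 900))  # 900 heads in 1000 flips: refuted -> True
```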
One of the practical advantages of building falsifiable models is it lets you know when the sands are shifting under your feet. Your model might be working fine and predict well, but once reality starts contradicting the assumptions underpinning your model, you know you should update the model before those predictions go off the rails.
Billionaire investor George Soros (that guy at the heart of basically every conspiracy theory on the right) attributes his wealth to this manner of thinking. He develops models of how the markets are moving and trades on them, all while seeking evidence that those models might be wrong. If he's wrong, he updates his models. Boom! Billionaire.
I'm only rich because I know when I'm wrong … I basically have survived by recognizing my mistakes.
~ George Soros
Interesting? Help me find other like-minded modelers.
Falsifiability helps me focus.
Let's define the term "model" a bit more loosely. Code that takes in data and spits out predictions in a computer is an algorithmic statistical model. But more broadly, any artifact that incorporates evidence about the world and makes predictions for how the world works is also a model. This definition includes the mental models that we have in our head, or that exist as memes on social media.
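The narrow, algorithmic sense of "model" above can be stated in a few lines. This sketch is my own minimal illustration of "code that takes in data and spits out predictions," using ordinary least-squares line fitting; the names and data are made up:

```python
# A minimal "algorithmic statistical model": code that takes in data
# and spits out predictions. This one fits a line by least squares.
# Function names and data are illustrative only.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return slope, y_mean - slope * x_mean

def predict(model, x):
    slope, intercept = model
    return slope * x + intercept

model = fit_line([1, 2, 3, 4], [2, 4, 6, 8])  # data on the line y = 2x
print(predict(model, 5))  # 10.0
```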
The popular culture is awash with non-falsifiable "models." These are the modern-day religions, the tribal shibboleths, the media narratives, the hashtags.
I find folks try to pull me into debates about such models, challenging me to refute them. A cousin will bait me with comments about the power of positive thinking. My brother asserts that gender studies is a plot by a crypto-Marxist elite and challenges me to prove otherwise. An intellectual weekly tempts me with a provocative headline about the culture wars. An in-law extols the virtues of an Eastern medicinal tradition. One cannot falsify such models, and that is often by design.
Falsifiability is my noise filter. When I smell a model that can't be falsified, I do not engage.
I disengage not because I believe these models are wrong. Indeed, many highly useful models are non-falsifiable, such as the theory of evolution and Freud's psychoanalytic theory.
I disengage because there is nothing to be done. It is a question of actionability. Falsifiable models are actionable. If you have a good model that fits most of the previous evidence and has predictive power, but is utterly refuted by some new piece of evidence, you have something to do! You get to go back to the drawing board, inspect the mechanisms that comprise the model, figure out which of them is wrong, and fix it. You get to make a better model.
Iterating on a broken model is where the fun is. One only has so much time to have fun. Why spend it spinning one's wheels?
Nice post! By putting the burden of proof for refuting a model on the proposer of the model, you neatly sidestep a tortuous conversation. It reminds me of what my mentor Gerry Sussman used to say when confronted with these kinds of arguments: "That theory is useless, because it's not even wrong!"