My bet on causal reinforcement learning

Plus composable networks in deep learning

Sep 03, 2019

AltDeep is a newsletter focused on microtrend-spotting in data and decision science, machine learning, and AI. It is authored by Robert Osazuwa Ness, a Ph.D. machine learning engineer at an AI startup and adjunct professor at Northeastern University.

Last week…

I noticed interesting posts about the following (links below):

Weight-agnostic neural networks
An intro to transformer networks
Fitness apps using computer vision

Also, I started preparations to teach a few data science grad students on a special topic — causal modeling in reinforcement learning.

Upon reflection on this topic, I’m making a bet: causal reinforcement learning will be the killer AI app for marketing within the next ten years.

Background on RL and causal modeling

Reinforcement learning is concerned with how software agents ought to take actions in an environment in a way that maximizes some notion of cumulative reward. Causal modeling is the building of models that can explicitly represent and reason about cause and effect (the recent popular science book The Book of Why provides an accessible introduction to the topic).
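
To make "cumulative reward" concrete, here is a minimal Python sketch. The discount factor `gamma` and the function name `discounted_return` are assumptions made for illustration; discounting future rewards is one common way to define the cumulative quantity, not the only one.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum a reward sequence, weighting the reward at step t by gamma**t.

    `gamma` is an assumed discount factor; gamma=1.0 gives a plain sum.
    """
    total = 0.0
    # Work backwards so each step is: reward now + discounted future return.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # 1.5
```

An agent that "maximizes cumulative reward" is choosing actions so that a quantity like this one is as large as possible in expectation.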

In recent years, the machine learning research community has expressed growing interest in both fields. This interest in reinforcement learning has been fueled by significant achievements in combining deep learning and reinforcement learning to create agents capable of defeating human experts. Prominent examples include the ancient strategy game Go and team-based competitions in the fantasy computer game Dota 2. Some think deep reinforcement learning is the path to generalized AI.

How causal modeling fits in

Speaking of playing games, human agents "play" the "game" of living life by forming causal models of their environment. These are conceptual models (“this is a baseball, that is a window”) with causal relationships between objects (“if I throw the baseball at the window, it will shatter”). Causal models allow us to transfer knowledge to new, unfamiliar situations (“I bet if I throw this weird new hard heavy thing at that strange new brittle glassy thing, it will also break”).

Humans reason with these models when deciding what actions to perform and not to perform. Have you ever looked back on your actions in a given situation and thought, "had I done things differently, things would have turned out better"? That is called counterfactual regret, and it is a form of causal reasoning. You use a model of cause-and-effect in your head to mentally simulate how things would have played out if you had made different decisions. Counterfactual regret is the difference between this simulated would-be outcome and the outcome that actually played out. You are making use of powerful cognitive machinery when you make decisions you believe will avoid regretful outcomes based on causal reasoning about past decision-making.

Some game playing agents, such as the one that recently defeated human poker experts in no-limit Texas hold'em, do a brute-force version of minimizing counterfactual regret by simulating millions of games. This relies on far more experience and computing resources than a human player.
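
As a toy illustration of regret minimization (far simpler than what a poker agent does), the sketch below runs regret matching in rock-paper-scissors against a fixed, rock-heavy opponent. The opponent's mix, the function names, and the use of exact expected payoffs instead of simulated games are all assumptions chosen to keep the example short.

```python
ACTIONS = ["rock", "paper", "scissors"]
WINS = {("rock", "scissors"), ("paper", "rock"), ("scissors", "paper")}


def payoff(mine, theirs):
    """+1 for a win, 0 for a tie, -1 for a loss."""
    if mine == theirs:
        return 0
    return 1 if (mine, theirs) in WINS else -1


def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)  # no regrets yet: play uniformly
    return [p / total for p in positive]


def average_strategy(opponent, iterations=1000):
    """Accumulate regrets against a fixed opponent mix; return the average strategy."""
    regrets = [0.0] * len(ACTIONS)
    strategy_sum = [0.0] * len(ACTIONS)
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        # What each action *would have* earned against the opponent's mix.
        action_values = [
            sum(q * payoff(a, b) for b, q in zip(ACTIONS, opponent))
            for a in ACTIONS
        ]
        # What the current strategy actually earns in expectation.
        realized = sum(p * v for p, v in zip(strategy, action_values))
        for i in range(len(ACTIONS)):
            regrets[i] += action_values[i] - realized  # would-be minus actual
            strategy_sum[i] += strategy[i]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]


# Hypothetical opponent that plays rock 60% of the time.
print(dict(zip(ACTIONS, average_strategy([0.6, 0.2, 0.2]))))
```

The regret update is exactly the "would-be outcome minus actual outcome" difference described above. Against this opponent, the average strategy ends up putting almost all of its weight on paper.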

A lack of practical tools

I’m focused on implementation. There are endless deep reinforcement learning programming tutorials online. However, there is a lack of practical case studies on incorporating causal modeling into reinforcement learning code. There is a bit of research (e.g. schema networks by Vicarious AI), but I haven't actually seen any code. For example, what is the best way to combine programming abstractions in causal modeling such as Pearl’s do-operations with dynamic programming in reinforcement learning? My hope is that bringing the theory and methods of causal modeling to bear on programming RL may lead to useful programming abstractions for building better RL agents, particularly those that can deal with unfamiliar situations with only a few real or simulated experiences, just like we humble humans.
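
For readers new to do-operations, here is one way the idea could look in code. This is a toy sketch, not an answer to the question above. The three-variable model (a confounder `Z` that influences both an action `X` and an outcome `Y`), all of its probabilities, and the function names are made up for illustration. The point is only that conditioning on `X = 1` and intervening with `do(X = 1)` give different answers.

```python
from itertools import product

# Hypothetical model: Z -> X, Z -> Y, X -> Y. All numbers are invented.
P_Z = {0: 0.5, 1: 0.5}


def p_x_given_z(x, z):
    p1 = 0.9 if z == 1 else 0.1  # Z strongly influences the "action" X
    return p1 if x == 1 else 1.0 - p1


def p_y_given_xz(y, x, z):
    p1 = 0.2 + 0.3 * x + 0.4 * z  # both X and Z raise the chance of Y = 1
    return p1 if y == 1 else 1.0 - p1


def joint(do_x=None):
    """Joint distribution over (z, x, y).

    With do_x set, X's own mechanism is replaced by a constant,
    which cuts the Z -> X arrow. That replacement is the do-operation.
    """
    table = {}
    for z, x, y in product((0, 1), repeat=3):
        if do_x is None:
            px = p_x_given_z(x, z)
        else:
            px = 1.0 if x == do_x else 0.0
        table[(z, x, y)] = P_Z[z] * px * p_y_given_xz(y, x, z)
    return table


def prob_y1_given_x(table, x):
    numerator = sum(p for (_, xi, y), p in table.items() if xi == x and y == 1)
    denominator = sum(p for (_, xi, _), p in table.items() if xi == x)
    return numerator / denominator


observed = prob_y1_given_x(joint(), 1)           # P(Y=1 | X=1), about 0.86
intervened = prob_y1_given_x(joint(do_x=1), 1)   # P(Y=1 | do(X=1)), about 0.70
print(round(observed, 2), round(intervened, 2))
```

In an RL setting, one natural reading is that the agent's action plays the role of `X`: the value of an action is an interventional quantity, and estimating it from logged data alone risks picking up the confounded number instead.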

A bet on automating decision science

I say I’m “betting” on this instead of making a prediction because bets are predictions with skin in the game. I’m working on this problem when I could be working on deepfakes or transformer networks (both highly worthwhile, lucrative, and somewhat horrifying pursuits) because I think this will pay off.

I see something of an arbitrage opportunity. Commercial applications of deep reinforcement learning are virtually non-existent, except for the massive amounts of money dumped into OpenAI and DeepMind so they can play video games (nothing wrong with that).

However, the core theory in RL is the same theory that powers various elements of decision science: sequential experimentation, optimization, decision theory, game theory, auction design, etc. Applying these branches of theory to decision problems is precisely where data scientists have had the most impact, especially in tech.

Causal modeling is the thread that connects all of these fields. It even brings in computational cognitive psychology, where one models the causal models humans have in their heads. This allows one to model human irrationality, like cognitive biases and fallacies, from data on their behavior. In my eyes, advertisers spend a great deal of time getting people to spend their money irrationally.

I think causal reinforcement learning will be the holy grail of marketing. Imagine a Black Mirror episode where they could spin up a cognitive clone of you in a digital pocket world and run a million simulations testing what products you'll buy. It'd be like that.

Ear to the Ground

Curated posts aimed at spotting microtrends in data science, machine learning, and AI.

Google AI blog post on exploring deep net structure

The Google AI blog published an interesting post on “weight agnostic” neural networks. The approach starts with composable network structures known to have certain “inductive biases,” meaning biases that prefer one interpretation of the data over another. Google AI expanded this approach by automating the composition of structures into a larger structure particularly suited to a prediction task. The composed structure is “weight agnostic” because the network weights can be random and still achieve good performance.

Background: Deep learning works as a graphically organized sequence of mathematical operations. Weights determine how numerical values are combined as they propagate through the graphical structure, such as the one pictured below. Here, the “add” nodes compute a weighted combination of incoming mathematical values from the parents.

“Training” a deep net means optimizing the weights such that, for a given input, the difference between the calculated output and the actual output is minimized. Large datasets and computing resources are spent training deep nets with millions of weights.
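
As a rough sketch of both ideas, the code below defines a single “add” node (a weighted combination of its inputs) and trains its two weights by gradient descent on squared error. The data, learning rate, and function names are assumptions for illustration; a real deep net chains many such nodes with nonlinearities and relies on a framework to compute gradients.

```python
def add_node(inputs, weights):
    """Weighted combination of the incoming values from the parent nodes."""
    return sum(w * x for w, x in zip(weights, inputs))


def train(data, weights, learning_rate=0.1, epochs=200):
    """Nudge the weights to shrink the gap between calculated and actual output."""
    weights = list(weights)
    for _ in range(epochs):
        for inputs, target in data:
            error = add_node(inputs, weights) - target
            # Gradient of 0.5 * error**2 with respect to each weight is error * x.
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
    return weights


# Hypothetical data generated by the "true" weights [2.0, -1.0].
DATA = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]
print(train(DATA, [0.0, 0.0]))  # approaches [2.0, -1.0]
```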

A recent strand of research has instead focused on optimizing the structure while letting weights be mostly random. These results highlight how some structures are more suited to some problems than others.
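
Here is a toy sketch of what judging structure independently of weights could look like. Everything in it is an assumption made for illustration: the task (flag inputs close to zero), the two one-node “structures,” and the evaluation scheme, in which every connection shares a single weight value and accuracy is averaged over several such values. It is not the search procedure from the Google AI post.

```python
import math

# Assumed set of shared weight values to average over.
SHARED_WEIGHTS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
INPUTS = [i * 0.5 for i in range(-6, 7)]             # -3.0, -2.5, ..., 3.0
LABELS = [1 if abs(x) < 1.0 else 0 for x in INPUTS]  # toy task: "is x near zero?"


def gaussian_structure(x, w):
    """One connection feeding a Gaussian activation: a built-in bias toward 'near zero'."""
    return math.exp(-((w * x) ** 2))


def tanh_structure(x, w):
    """One connection feeding a tanh activation: biased toward 'large and positive'."""
    return math.tanh(w * x)


def accuracy(structure, w):
    hits = sum(
        int(structure(x, w) > 0.5) == label for x, label in zip(INPUTS, LABELS)
    )
    return hits / len(INPUTS)


def mean_accuracy(structure):
    """Score a structure by its average accuracy across all shared weight values."""
    return sum(accuracy(structure, w) for w in SHARED_WEIGHTS) / len(SHARED_WEIGHTS)


print("gaussian:", round(mean_accuracy(gaussian_structure), 3))
print("tanh:    ", round(mean_accuracy(tanh_structure), 3))
```

No weight is ever trained here, yet the Gaussian structure does much better on this task for every weight value tried. That is the sense in which a structure can carry the solution.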

Why this is cool:

This feels a lot more like how natural neural networks work. Animals can perform complex sensory tasks at birth without training. There may be some as yet undiscovered genetic mechanism that encodes pre-trained weights. However, it is more plausible that genes encode useful structures that can perform these tasks at birth, and then the weights are fine-tuned (and connections rewired) with learning and experience.
An old argument in the machine learning community is whether models should be represented with explicit logic or with black-boxy circuits as with deep learning. This work casts this as a false dichotomy. We can use basic structures with known inductive biases as primitives, and assemble them into complex circuits.

References:

Exploring Weight Agnostic Neural Networks — Google AI Blog

Nice blog post on transformer networks in NLP

Apropos of the above post, a composable structure that has a strong impact on natural language processing problems is the attention mechanism. Attention mechanisms allow focusing on parts of the input data that are more important. Attention reminds me of “chunking” in psychology; as an inductive bias it says some chunks are more meaningful than others in terms of the prediction task.
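
For a feel of how attention “focuses,” here is a minimal sketch of scaled dot-product attention, the variant used in transformer networks, written in plain Python for a single query. The toy vectors and function names are assumptions for illustration; real implementations work on batches of matrices and add learned projections.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Weight each value by how well its key matches the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Hypothetical input with three "chunks"; the first key matches the query best.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0]]
weights, output = attention([1.0, 0.0], keys, values)
print(weights, output)
```

The weights are the “importance” assigned to each part of the input, and the output is the correspondingly weighted blend of values.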

The Scaleway blog shared a quite readable introduction to transformer networks:

Natural Language Processing: the age of Transformers — Scaleway Blog

AI Long Tail

Pre-hype AI microtrends.

Bookkeeping as a potential case study for AI-powered SaaS

The vast majority of US businesses are small businesses, and the vast majority of them outsource their bookkeeping. 2016 saw a trend of bookkeepers and accountants setting up WordPress websites and moving into fixed-fee billing, essentially creating an entire generation of basic SaaS apps. These services used Internet bookkeeping software vendors QuickBooks Online (Intuit) and Xero as their underlying platform, with human employees still doing manual work. Now Intuit and Xero are aggressively deploying machine learning to automate the more repetitive and human-error-prone bookkeeping use cases. These are the tedious workflows that made it attractive for small businesses to outsource in the first place. Having used Intuit's products, I don’t imagine it is nimble enough to create genuinely game-changing AI-powered SaaS (I have no opinion on Xero). Instead, I suspect new companies such as Botkeeper, which has Xero and QuickBooks integration, exemplify a potential new AI-powered bookkeeping SaaS model.

Onyx app uses image rec to count your reps

With last week’s news about Peloton’s lead-up to its IPO, people are thinking again about clever ways to link fitness and tech. Onyx and Pivot provide a straightforward application of computer vision to the simple problem of tracking reps.

I had a job once where employees were incentivized to use Fitbit. I recall being frustrated that coworkers’ time spent meandering around shopping malls got more points than my time spent at the gym, where my reps weren’t counted as steps. Beyond my personal experience, Xbox Fitness and Kinect have already validated this use case in the marketplace. Onyx and Pivot are just tweaking it.

The emergent model for building these apps is to build the computer vision algorithms into hardware such as iPhone’s TrueDepth and Intel’s RealSense. Then, provide developers with SDKs on top of that stack. The SDK makes it easy for developers to build apps without having to be experts in machine learning.

Thanks again for reading. I’m trying to double the subscriptions this month. With enough subscribers, I can write more in-depth and well-researched posts and reports. So please subscribe or forward to folks who would find this content interesting.

Altdeep.ai Newsletter
