Data science SaaS as a lifestyle business - Is this a thing? (Really, I’m asking.)
Plus GAN erotica, Hilary Mason and self-funding, and Gödel
AltDeep is a newsletter focused on microtrend-spotting in data and decision science, machine learning, and AI. It is authored by Robert Osazuwa Ness, a Ph.D. machine learning engineer at an AI startup and adjunct professor at Northeastern University.
Dose of Philosophy: Or, A Primer in Sounding Not Stupid at Dinner Parties: Gödel’s theorems and implications for AI
Ear to the Ground: A new model of intelligence and a new blockchain-based way of crowdsourcing predictions
AI For the Rest of Us: Hilary Mason’s Fast Forward Labs
Data-Sciencing for Fun & Profit: GAN generated album covers and erotica
The Essential Read: Data science SaaS as a lifestyle business - Is this a thing? (Really, I’m asking.)
Trending items worth knowing about but not really worth reading about.
A total of 774 papers got accepted for ICML 2019 out of 3424 initial submissions (22.6% acceptance rate).
Some guy named Olli Niemitalo seems to have blogged a proposal for a generative adversarial network architecture back in 2010, four years prior to Ian Goodfellow’s seminal paper. Here is a sketch he included in the post. I’m linking Olli’s blog because it’s so baller, but I’m not linking the debate about whether he or Goodfellow should get the credit.
Dose of Philosophy: Or, A Primer in Sounding Not Stupid at Dinner Parties
Gödel’s incompleteness theorems and their implications for building strong AI
Gödel’s incompleteness theorems provide the most profound results in mathematical logic.
In modern logic, it is possible to express arithmetical statements, for example, “Given any numbers x and y, x + y = y + x”.
An axiom is a statement that is taken as true, such as “A probability is a real number greater than or equal to 0”. Axioms serve as premises for further reasoning.
Several axioms taken together form an axiomatic system (e.g., the axioms of probability), from which one can prove many further truths (e.g., Bayes’ rule).
The question Gödel addresses is whether one can use an axiomatic system to prove all arithmetic truths without being inconsistent, where inconsistent means the system produces contradictions.
Gödel proved that for any consistent axiomatic system expressive enough to describe basic arithmetic, there are true statements that cannot be proven within that system.
This suggests that there is a boundary on what mathematicians can know.
Implications for AI. One of the goals of AI research is to achieve “strong artificial intelligence”, meaning human-level general AI. Currently, we build AI as algorithms that run on Turing machines, which can be viewed as consistent axiomatic systems and are therefore subject to the theorems. Roger Penrose and J.R. Lucas argue that human consciousness transcends Turing machines because human minds, through introspection, can see the truth of statements that a consistent formal system cannot prove about itself, such as its own consistency. They argue that this makes it impossible for Turing machines to reproduce traits of human minds, such as mathematical insight.
On the radar
Causal entropy maximization — a new way of thinking about intelligence.
This one came to me from my friend Matt B. It’s some truly novel research that proposes intelligence evolved as a result of the impulse to try to control as much of the environment as possible. This causal entropy maximization could be used to train RL agents without explicit goals.
Hedge fund announces a new blockchain-based way of crowdsourcing predictions
Numerai is a hedge fund that uses a blockchain-based protocol that allows data scientists to create, submit and stake prediction theses in a completely decentralized way. They are extending their platform to the public with a new protocol called Erasure. Erasure allows one to verify the long-term performance of a predictor, and provides cryptocurrency rewards for good predictions and penalties for bad predictions.
AI for the Rest of Us
In this issue (below) I address data science-based SaaS companies. I have it on good authority (though I don’t know for sure) that Fast Forward Labs, founded by Hilary Mason and later acquired by Cloudera, was entirely self-funded. Her self-funding is especially interesting given she was data scientist-in-residence at the VC firm Accel Partners. The company’s business model was selling subscriptions to its research reports, and from what I heard this was enough revenue to support a staff of around seven employees.
Data Sciencing for Fun and Profit
The Essential Read
Data science SaaS as a lifestyle business - Is this a thing? (Really, I’m asking.)
First, let's define terms. The term "lifestyle business" means different things to different people. Here I use the standard (pejorative) definition used in the tech industry, which is "a tech startup not chasing VC money."
Perhaps a better term is "location independent business," which I define as a lifestyle business that relies entirely on some collection of Internet services to meet its marketing needs, eschewing business models that depend on physical proximity to the customer.
These terms suck because they are defined in terms of a negative, like naming a domesticated feline a "not dog" instead of "cat." Despite this, choosing this self-funded model over the VC-funded model has important implications, namely:
The self-funded model optimizes revenue, not KPIs important to the next round of funding such as market share and PR.
The self-funded model has the ability to target niches that are too small to be of interest to VCs but could still make one a millionaire.
Let's define a "lifestyle-SaaS business" as a location independent SaaS business. For examples, look at the SaaS listings on the Empire Flippers website — Empire Flippers is a company that acts as a broker for entrepreneurs selling location independent businesses.
Finally, let's define "DS-based lifestyle-SaaS business" as a lifestyle-SaaS business where some fundamental element of its service pipeline is DS or ML-based. This category includes but is not limited to AI as a service companies (AIaaS).
So, more specifically, is the DS-based lifestyle-SaaS business a thing?
Are data scientists joining the SaaS startup movement?
There is something of a SaaS movement. Perhaps it was first articulated by 37signals in their 2006 book Getting Real. It gathers around podcasts like Startups for the Rest of Us and conferences like MicroConf. I imagine it is composed mostly of people who tired of the culture of the Bay Area and moved to Austin or Boulder or Chiang Mai.
I am dying to know whether a significant number of data scientists and machine learning engineers have joined this movement and are applying their craft to building SaaS businesses.
Why the answer is not clear — sampling bias
Sampling bias. You only ever hear of AI startups that get seed or VC funding, because getting such funding is considered newsworthy.
With the exception of AIaaS, AI is not the final product. If the focus is on revenue and not attracting more investment, then your marketing communicates business value and makes minimal or no mention of any AI secret sauce.
Why the answer might be yes — automation on steroids
A much-vaunted model for turning a consulting business into a SaaS business is to:
Identify patterns in consulting projects that are repeatable across customers and generate revenue
Build out that workflow using human labor and validate it with customers
Automate as much of that workflow as you can with software
Andrew Ng has a quote that "AI is automation on steroids." The logical extension of these ideas is to take some data science consulting workflow, automate it with ETL and an algorithm, and plug the outputs into some downstream pipeline that delivers the service to the customer.
Why the answer might be no — economics
Three economic issues suggest the answer might be no.
Big tech has all the data. Conventional wisdom has it that the algorithm doesn't matter, it's the data that matters. Big tech companies have all the data, creating a massive barrier to entry in this space. This is true so long as the problem is prediction with complex, high-dimensional features, although the model of creating tooling for other people's data has been validated by companies such as Kafka, Pachyderm, Alpine Data Labs, DataRobot, and Confluent. Companies like Figure Eight and Tamr validate the model of cleaning other people's data. However, these are all venture-funded companies; I'm not sure if this is a viable model for a DS-based lifestyle SaaS business.
Big tech and VC money pay too much. The big tech companies, as well as VC money trying to build the next big tech company, pay a lot of money to data scientists and machine learning engineers. High salaries lead to a high opportunity cost of starting one's own business, as well as the high costs of partnering with or employing other data scientists.
Hard to get a low enough price point. My impression is that to build a lifestyle SaaS business one needs a service offering priced under a few hundred dollars a quarter. At that price point, you can focus on digital marketing to drive online conversions. Anything above that threshold requires an enterprise sale, which takes you out of lifestyle SaaS land. Even in the case of a fully automated data science SaaS workflow, you would still need data scientist/data engineer person-hours to maintain that pipeline. Given the cost of employing this talent, I suspect it is hard to get a subscription price tag under that threshold in a sustainable way.
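To make the price-point argument concrete, here is a back-of-the-envelope sketch in Python. Every number in it (the quarterly price, the cost of talent, the other overhead) is a made-up assumption for illustration, not data from any real business, and the function name breakeven_subscribers is likewise hypothetical.

```python
import math


def breakeven_subscribers(price_per_quarter, annual_talent_cost, annual_other_costs=0.0):
    """Smallest number of paying subscribers whose revenue covers annual costs."""
    if price_per_quarter <= 0:
        raise ValueError("price_per_quarter must be positive")
    # Each subscriber pays four quarterly bills per year.
    annual_revenue_per_subscriber = 4 * price_per_quarter
    total_annual_cost = annual_talent_cost + annual_other_costs
    # Round up: a fraction of a subscriber does not pay the bills.
    return math.ceil(total_annual_cost / annual_revenue_per_subscriber)


# Assumed, purely illustrative figures:
#   $300 per subscriber per quarter (inside the "few hundred dollars" threshold)
#   $150,000 per year for one data scientist/data engineer
#   $30,000 per year for hosting, marketing, and other overhead
print(breakeven_subscribers(300, 150_000, 30_000))  # prints 150
```

Under these assumed numbers the business needs 150 paying subscribers just to cover costs, and halving the price doubles that count. Whether a niche can supply that many customers through digital marketing alone is the open question.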
In search of an answer
I am in search of an answer to this riddle. In the above AI for the Rest of Us section, I mention Hilary Mason's Fast Forward Labs. From my personal experience, I suspect there might be things happening at the intersection of natural language processing and digital marketing, such as the creation of bespoke chatbots. But I have more to learn.
If you have insight, please ping me on Twitter @osazuwa.