Sorry MIT and DARPA, Minority Report for data science will not work
Plus a new Signals from China section
AltDeep is a newsletter focused on microtrend-spotting in data and decision science, machine learning, and AI. It is authored by Robert Osazuwa Ness, a Ph.D. machine learning engineer at an AI startup and adjunct professor at Northeastern University.
Announcements
Signals from China
The importance of developments in China’s AI research and AI industry, as well as China’s AI-related investments and political events, is evident. So I’m devoting a whole section to it.
~ Robert
Ear to the Ground
Curated posts aimed at spotting microtrends in data science, machine learning, and AI.
Machine Learning teams have a people problem
This article surprised me because I can’t remember any essay that articulates such an obvious problem so well…
It articulates a spectrum. On the left, you have the AI startups whose acquihires you have probably heard about: basically teams of academics doing VC-funded research. On the right, there are a bunch of “.ai” startups run by product managers and tech leads experienced with iterating, shipping quickly, and getting feedback from the marketplace. That process is biased against building the risky AI features that create a lot of value.
Machine Learning teams have a people problem — Launchpad
On how recommendation algorithms stifle musical creativity
Initially, the value provided by subscription content companies like Spotify and Netflix was to provide access to a library of content and use recommendation algorithms to offer a type of personalized curation of that content.
But then content creators, seeing that these algorithms determined their commercial success, started creating content tailored specifically to being favored by these algorithms. Even Spotify and Netflix themselves are using data science insights from their own recommendation-driven usage data to drive decisions to create new content for their platforms.
This article convincingly argues that this is a vicious cycle leading to the creation of low-risk, sterile content that reinforces our tastes, rather than driving us to expand and evolve them. “There are pitches for specific playlists more often now. This didn’t happen even a year ago.”
The Automation of Creativity — Towards Data Science
Sorry MIT and DARPA, Minority Report for data science will not work
MIT made a PR splash last week with Northstar, a GUI for data science and machine learning that evokes Minority Report and Tony Stark’s workstations. It is supported by a DARPA initiative to build tools that let subject matter experts do data science and machine learning without data scientists.
Perhaps there are some useful contributions here. But my not-so-humble opinion is that this is folly, for a few reasons:
This tool has a learning curve — why spend time learning this when you could learn the same exploratory data analysis workflows in R, Python, or Julia?
Edge cases that broke the UI would block the task, and you’d have to start from scratch in R, Python, or Julia. Why not just work with one of R, Python, or Julia, so you can rely on StackExchange for the inevitable roadblocks, and a large open source community to create more user-friendly tools over time?
The most time-consuming part of data science is “data wrangling”, the process of transforming and reshaping data to suit some downstream analysis. Merely having a library of preprocessed data sets that can be dragged and dropped onto a dashboard is woefully insufficient.
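To make “transforming and reshaping” concrete, here is a minimal sketch in plain Python. The table, column names, and values are made up for illustration and have nothing to do with Northstar or D3M. It reshapes a wide table (one column per month) into the long format that most modeling and plotting code expects, cleaning up a messy missing-value marker along the way.

```python
# Hypothetical raw export: one row per store, one column per month,
# with numbers stored as strings and missing values marked "n/a".
wide_rows = [
    {"store": "A", "jan": "100", "feb": "120", "mar": "n/a"},
    {"store": "B", "jan": "80", "feb": "n/a", "mar": "95"},
]


def parse_sales(raw):
    """Convert a raw cell to a float, or None when the value is missing."""
    raw = raw.strip().lower()
    return None if raw in {"", "n/a", "na"} else float(raw)


def melt(rows, id_col, value_cols):
    """Reshape wide rows into long records, one per (id, month) pair."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            long_rows.append(
                {id_col: row[id_col], "month": col, "sales": parse_sales(row[col])}
            )
    return long_rows


long_rows = melt(wide_rows, "store", ["jan", "feb", "mar"])

# Drop missing observations before any downstream analysis.
clean_rows = [r for r in long_rows if r["sales"] is not None]

for r in clean_rows:
    print(r)
```

Even this toy case needs decisions that depend on the data at hand (which strings count as missing, whether to drop or impute them), which is why a fixed library of preprocessed data sets only gets you so far.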
Watch the video and let me know your opinion. I’m an expert, and this video was designed to show how intuitive the system is. And yet I had to replay parts of the video several times to understand elements of the workflow.
Data-Driven Discovery of Models (D3M) — DARPA
Signals from China
Reports on the developments in China’s AI landscape.
As mentioned here and elsewhere, China is deep into the implementation of a machine learning-powered Orwellian dystopia targeting the Uighur minority in Xinjiang province.
A first-hand account of a Uighur graduate student’s experience with this system in a recent episode of Planet Money offers insight into the technical depth of this system. According to the interview, it includes creating a detailed profile of each Uighur resident, including facial images with various expressions and angles, and voice recordings, presumably for use in future identification with facial and voice recognition software.
Both public spaces and private residences have video and audio surveillance. A tracking app installed on people’s phones has gait recognition software and keeps records of an individual’s daily habits. It has heuristics that red-flag behaviors deemed suspicious, such as leaving one’s home through the back door or putting gas in someone else’s car.
Episode 924: Stuck In China's Panopticon — Planet Money
AI Long Tail
Pre-hype AI microtrends.
Technical debt in machine learning systems
This week saw discussion of a 2015 paper that provides an in-depth analysis of the unique kinds of technical debt that plague the development of software with machine learning components.
I add it to AI Long Tail because of its relevance to building AI startups, and because it is a practical antidote to hype. Here are some high-level points from the paper:
ML technical debt can be difficult to detect because it exists at the system level rather than the code level. Typical methods for paying down code-level technical debt are not sufficient to address system-level debt.
Software engineering practices favor modularity and abstraction, which maintain logical consistency in a module’s inputs and behavior. In contrast, the use of machine learning is motivated precisely by cases where it is not feasible to write code that structures inputs and behavior logically.
“Undeclared consumers” introduce unknown dependencies between prediction outputs and other parts of the system, which lead to problems when the predictive model is changed or updated.
Feedback loops abound, such as when the predictions of a supervised learning algorithm influence its own future training data.
Sculley, David, et al. "Hidden technical debt in machine learning systems." Advances in Neural Information Processing Systems. 2015.
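The feedback-loop point is easy to see in a toy simulation. The sketch below is my own illustration, not code from the paper: a hypothetical recommender shows whichever item has the most logged clicks, and only the item that is shown can earn new clicks. All names and numbers are assumptions chosen to keep the example small and deterministic.

```python
from collections import Counter

# Hypothetical setup: three items that users truly like equally.
true_appeal = {"a": 0.5, "b": 0.5, "c": 0.5}

# Initial click log with a tiny, accidental lead for item "a".
clicks = Counter({"a": 11, "b": 10, "c": 10})


def recommend(click_log):
    """'Model': recommend whichever item has the most logged clicks."""
    return max(click_log, key=click_log.get)


def simulate(click_log, rounds, impressions_per_round=100):
    """Each round, only the recommended item is shown, so only it earns clicks."""
    log = Counter(click_log)
    for _ in range(rounds):
        shown = recommend(log)
        # Use expected clicks rather than random draws so the run is reproducible.
        log[shown] += int(impressions_per_round * true_appeal[shown])
    return log


final = simulate(clicks, rounds=10)
share_a = final["a"] / sum(final.values())
print(final, round(share_a, 2))
```

Although the three items are equally appealing by construction, the one-click head start for "a" snowballs until it accounts for nearly all of the logged clicks. A model retrained on that log would "learn" a preference that its own earlier predictions manufactured.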
Data Sciencing for Fun and Profit
Data-Sciencing for Fun & Profit examines emerging trends on the web that could be exploited for profit using quantitative techniques.
Style-transfer in a weekend
Deep style-transfer takes the content from one image and applies an artistic style from another image (typically a work of art). The article linked below provides the best practical tutorial for building a style-transfer architecture that I’ve seen to date. Importantly, it uses weights imported from a highly performant model.
Intuitive Guide to Neural Style Transfer — Towards Data Science
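For a taste of the core idea before diving into the tutorial: in the standard formulation of neural style transfer (Gatys et al.), “style” is summarized by the Gram matrix of a layer’s feature maps, that is, the inner products between channels. The sketch below is not the tutorial’s code. It uses plain Python and tiny made-up feature maps, where a real implementation would use activations from a pretrained network and a tensor library, and normalization constants vary by implementation.

```python
def gram_matrix(features):
    """features: list of channels, each a flat list of activations.
    Returns the channel-by-channel matrix of inner products."""
    return [
        [sum(x * y for x, y in zip(ch_i, ch_j)) for ch_j in features]
        for ch_i in features
    ]


def style_loss(generated, style):
    """Mean squared difference between the two Gram matrices."""
    g, s = gram_matrix(generated), gram_matrix(style)
    n = len(g)
    return sum((g[i][j] - s[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


# Hypothetical 2-channel feature maps, flattened over 3 spatial positions.
style_features = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
# Same activations at shuffled positions: layout differs, channel statistics match.
shuffled = [[1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
# Different channel statistics altogether.
different = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

print(gram_matrix(style_features))
print(style_loss(shuffled, style_features))
print(style_loss(different, style_features))
```

The shuffled example scores a loss of zero because the Gram matrix discards where activations occur and keeps only how channels co-activate. That is what lets the style term capture texture while a separate content term preserves layout.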
AltDeep is a weekly curated newsletter focused on microtrend-spotting in data science, machine learning, and AI.