Has anyone noticed a lot of ML research into facial recognition of Uyghur people lately?

plus Google Clound explains secret to successful deployment of AI, and how to be as good a data scientist as George Soros

AltDeep is a newsletter focused on microtrend-spotting in data and decision science, machine learning, and AI. It is authored by Robert Osazuwa Ness, a Ph.D. machine learning engineer at an AI startup and adjunct professor at Northeastern University.

Overview:

  • Ear to the Ground: Uyghur facial rec, an ML Chinese wall, and the need for ML experimentation skills

  • AI Long Tail: Head of Google AI Cloud identifies key characteristic of deployment of ML in industry

  • Data-Sciencing for Fun & Profit: AI-powered Cards Against Humanity

  • The Tao of Data Science: How Karl Popper can make you as good a data scientist as George Soros

Ear to the Ground

Curated posts aimed at spotting microtrends in data science, machine learning, and AI.

“Has anyone noticed a lot of ML research into facial recognition of Uyghur people lately?”

This question, asked on the machine learning subreddit and bounced around on Twitter, had a huge impact this week.

The comments in Reddit and on Twitter are worth the read.

Context. An impactful Pro-Publica article, as well as controversy surrounding Amazon selling a facial recognition software to police departments (later shown to erroneously match a number of black members of Congress as people with mugshots), has sparked a debate in the broader machine learning community about how dangerous a threat algorithmic bias presents, particularly when it comes to computer vision and people of color. Around the same time these stories came out, stories of ethnically-targeted policing and internment camps in China’s Xinjiang province, the historic home of the Uyghur ethnic group, started gaining traction in the West.

The debate about racial injustice in ML-powered policing becomes much less hypothetical when China’s ML community is publishing papers about automated racial profiling targeted an ethnic group the country is accused of systematically oppressing.

Are we headed towards an ML Chinese wall?

The above story suggests how differences in Chinese civic sensibilities from the West, as well as Chinese eagerness to deploy ML tech in surveilling and policing their people. To be clear, many members of Western governments and police forces are also advocating for such Orwellian policy developments; they just tent to suffer more legal restrictions and resistance from civic institutions than their Chinese counterparts.

Another movement in this direction occurred last week when journal publisher IEEE banned employees from Huawei from participating in the editorial and peer review process, then lifting that ban a short time later.

New essential skillset: managing ML experiments

When it works, deep learning works well. To get it to work well, you must run experiments designed to identify optimal hyper-parameter settings.

Generalizing from DL, modeling approaches that work with large, multidimensional data sets (e.g., probabilistic programming, Bayesian nonparametrics) also require experimentation to find optimal settings for some aspect of the modeling process.

There is a growing crop of tools and businesses aimed at managing ML experimentation, such as MLflow, ModelChimp, and Pachyderm. However, just as making use of Github for version control requires one to learn Git, making use of these tool requires learning a new skill set.

AI Long Tail

Andrew Moore, head of Google AI Cloud, argues that successful deployment of AI in industry depends on solving a problem in a way that simply would not be possible without AI.

In my view this argument is runs counter to a strategy undertaken by many AI startups, where one attempts to build a business with human-in-the-loop data science solutions, and then tries scale the business by automating humans out of that loop.

He provides four Google Cloud AI case studies in support of his argument:

  1. Energy company combines drones and Clouds AutoML computer vision technology for inspection of wind turbines.

  2. Real Estate firm enables home buyers to automatically search listing photos for specific features like “granite countertops.”

  3. The New York Times is using AI to scan and analyze images and words in thousands of its archived photos from its long history.

  4. Financial Services firm HSBC is using AI to detect suspicious activity indicative of fraud by screening customer data against publicly available data.

Data Sciencing for Fun and Profit

Experimentation with GPT-2: AI-generated Cards Against Humanity

There is a rising crop of experiments with GPT-2, including alternative endings to Game of Thrones, alternative religious scripture, fan fiction, and fake twitter accounts.

This week saw a GPT-2 powered Cards Against Humanity web app, that lets you play against an AI.

Open AI didn’t release full GPT-2 ostensibly because of potential malicious applications. Malicious potential suggests there is potential to monetize the technology as well, such as in marketing and content-generation.

The problem with monetizing GPT-2 of course is the amount of Internet NSFW content that went into the training data. Play this Cards Against Humanity game, and you will see AI-generated cards that are much more offensive than that of the original game.

This of course could be remedied by filtering the generated language, or by paying cheap labor to curate the training data.

Tao of Data Science

How Karl Popper can make you as good a data scientist as George Soros

Popper’s views on “falsifiability” and how to build better machine learning models

The Tao of Data Science is a column I write that explores how centuries of philosophers have been tackling the key problems of machine learning and data science. I include snippets and links to this column in this newsletter.

Karl Popper is best known for the view that science proceeds by “falsifiability” — the idea that one cannot prove a hypothesis is true, or even have evidence of truth by induction (yikes!), but one can refute a hypothesis if it is false.

If Popper were a data scientist…

Suppose Popper was a modern data scientist and needed to implement a machine learning solution to predict some phenomenon of interest. Given his philosophy of science, how would he have proceeded to implement his model?