Don't model the poem, model the poet
Huawei's AI poet can fool experts, but it can't tell you why, when, or where
AltDeep is a newsletter focused on microtrend-spotting in data and decision science, machine learning, and AI. It is authored by Robert Osazuwa Ness, a Ph.D. machine learning engineer at an AI startup and adjunct professor at Northeastern University.
Last week…
I read a lot but didn’t see much worth posting. So I devoted some extra effort to critiquing Huawei’s AI lab’s use of a cutting-edge language model to generate classical Chinese poetry. My view is in line with Gary Marcus and Ernest Davis’s argument that we need more focus on building AI that understands time, space, and causality. Their NYT opinion piece on this subject is summarized below.
Ear to the Ground
How to build trustable AI
Gary Marcus and Ernest Davis are writing a good deal these days in an effort to promote their new book Rebooting AI: Building Artificial Intelligence We Can Trust (which I’ve not yet read).

Marcus and Davis point to recent AI failures of social import, such as racially biased facial recognition, and boil the problem down to “we can’t trust these algorithms.” I think that framing lacks a bit of nuance.
However, they go on to argue that the solution is to build AI that can grasp the concepts of time, space, and causality. That is a fine way to frame the problem.
For the concept of time, they give the example of asking a voice assistant, “What type of computer did George Washington have?” The AI should be able to answer by reasoning that George Washington predated computers. A human child today might not grasp this reasoning, since both Washington and the invention of the computer predate their existence. However, they would pick up the temporal logic well enough if we swapped “computer” for a technological artifact introduced within their lifetime.
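To make the temporal-reasoning point concrete, here is a toy sketch of my own (nothing from Marcus and Davis’s book) of the kind of check such a system would need to perform. The lookup tables and dates are placeholder assumptions for illustration.

```python
# A toy sketch of temporal reasoning: the fact tables below are illustrative
# placeholders, not a real knowledge base.
PERSON_LIFESPANS = {"George Washington": (1732, 1799)}
ARTIFACT_INTRODUCED = {"computer": 1940, "smartphone": 2007}

def what_did_they_have(person: str, artifact: str) -> str:
    birth, death = PERSON_LIFESPANS[person]
    introduced = ARTIFACT_INTRODUCED[artifact]
    # Temporal logic: an artifact introduced after a person's death
    # cannot have been owned by that person.
    if introduced > death:
        return f"None: the {artifact} was introduced after {person} died."
    return f"Possibly one; the {artifact} existed during {person}'s lifetime."

print(what_did_they_have("George Washington", "computer"))
# -> None: the computer was introduced after George Washington died.
```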
For space and causality, they give the example of a child observing the sharpened raised edge of a cheese grater and intuiting how it produces grated cheese. Given many examples, deep learning might link the two items, but it wouldn’t be able to extrapolate to a question like “what if the holes were triangles instead of circles?”
How to Build Artificial Intelligence We Can Trust — New York Times
How to de-hype an AI press release
At the risk of coming off as their cheerleader, here is another quick post by Marcus and Davis that you should read before moving on to the next section.
Signals from China
Huawei’s Noah’s Ark AI lab makes deep fakes of classical Chinese poetry
Jeffrey Ding of the ChinAI newsletter recently wrote about how Huawei’s Noah’s Ark AI lab used a Generative Pretrained Transformer (GPT) architecture to generate classical Chinese poetry. See this post on Floydhub for an excellent explainer of the GPT deep net architecture.
Let me first get the politics bit out of the way. From a political perspective, this result demonstrates that a Chinese company can organize the internal processes required to get one of these difficult and expensive-to-train models performing well on a new task. That’s all I’ll say about that.
I speak, read, and write Mandarin Chinese as a second language. Incidentally, I also dabble in poetry. However, though I can read the poems generated by this model, I’m not qualified to critique them (I can recite precisely one Li Bai poem from memory) — this is a study to which some people devote their lives.

In his post, Ding showed four poems and challenged readers to guess which one was written by a human. I guessed wrong; so did my wife, who is a native speaker.
I suspect the most impressive element of this work is the generator’s ability to reproduce the poems’ formal requirements.

We surprisingly observe that although we did not explicitly feed the model with any rules or features about classic Chinese poetry, such as the number of characters, rhyming, tune patterns and coupling, the model is able to generate poems that automatically meet these rules very well for the tens of forms.
This is impressive. Now let’s move on.
At this point, we now know that when we throw these kinds of deep nets at any given language problem, their millions of parameters will essentially memorize the complex empirical joint probability distribution of word frequencies in the corpus. If you can tune your model well, this guarantees impressive results in natural language processing tasks. Great, now that we know we can brute-force the problem, let’s move on to models that approach something like understanding.
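To make the “memorizing the empirical distribution” point concrete, here is a deliberately crude sketch of my own (nothing like Huawei’s actual model): a character-level bigram model that tabulates how often each character follows another in a tiny corpus, then samples from those counts. GPT is vastly more sophisticated, but its output is still a draw from a distribution fit to its training data.

```python
# A toy character-level bigram "language model": it memorizes the empirical
# conditional frequencies of its training corpus and samples from them.
import random
from collections import defaultdict, Counter

# Tiny training corpus: Li Bai's "Quiet Night Thoughts" (spaces separate lines).
corpus = "床前明月光 疑是地上霜 举头望明月 低头思故乡"

# Count how often each character follows each preceding character.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def sample_next(prev: str) -> str:
    options = counts.get(prev)
    if not options:
        return random.choice(corpus)  # dead end: restart anywhere in the corpus
    chars, freqs = zip(*options.items())
    return random.choices(chars, weights=freqs, k=1)[0]

# "Generate a poem" by repeatedly sampling from the memorized distribution.
char = random.choice(corpus)
generated = [char]
for _ in range(11):
    char = sample_next(char)
    generated.append(char)
print("".join(generated))
```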
It is fitting that Huawei focuses on classical Chinese because this is a real-life example of the philosophical problem of the Chinese Room. It is easy to beat a Tang poetry Turing test even with Tang poetry scholars if you throw a bunch of compute at memorizing the bulk of Tang poems on record.
Researchers Niven and Kao dive deep into this issue in a recent paper investigating another cutting-edge language model from Google called BERT, for deep bidirectional transformers (The Gradient has a fine write-up on this work). BERT can also be used for text generation and relies on the same core ideas as GPT (Floydhub’s explainer has a section contrasting the two). They use BERT for a classification task and initially get amazing results. Then they show that merely removing statistical cues that have no meaningful connection to the target class causes BERT to perform no better than random. The complex joint probability distributions these deep nets learn do not represent anything meaningful in semantic terms.
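To see what a spurious statistical cue looks like in miniature, here is a contrived toy of my own (not Niven and Kao’s actual setup): a bag-of-words classifier whose training labels happen to correlate perfectly with the word “not.” The classifier looks competent until you strip that one token from the test inputs, at which point nothing meaningful is left for it to use.

```python
# A contrived illustration of a spurious cue: the word "not" perfectly separates
# the classes in training, so the classifier learns the cue rather than the meaning.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = [
    "the movie was not good",          # label 1
    "the acting was not convincing",   # label 1
    "the movie was good",              # label 0
    "the acting was convincing",       # label 0
]
train_labels = [1, 1, 0, 0]

vec = CountVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(train_texts), train_labels)

test_texts = ["the plot was not coherent", "the plot was coherent"]
print(clf.predict(vec.transform(test_texts)))   # rides the "not" cue

# Strip the cue word; the remaining words carry no class signal at all.
stripped = [t.replace("not ", "") for t in test_texts]
print(clf.predict(vec.transform(stripped)))
```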
Learning surface statistics is fine in many contexts, but the whole point of poetry is to create models of meaning with words.
I don’t want to come off as a deep learning naysayer. BERT and GPT are major advancements in my view because they show what can be done with brute force. Now that we know we can do it, we want to know what can be done with models that have more sparsity and less Kolmogorov complexity, explicit conceptual hierarchies, domain transferability, etc.
Another way of saying this: the poem synthesizer is highly sophisticated at mixing and matching elements of the poems in its training data. To borrow an idea from pop music, it is producing remixes and mash-ups.
Don’t get me wrong, remixes and mash-ups are useful! Remixing the familiar is why pop music is, well, popular. Further, there are some promising business models for using other generative machine learning techniques to remix elements of a big training dataset. For example, sophisticated remixing of actual faces to create new photo-realistic faces could disrupt the stock photo industry.

However, in the creative arts, we make a distinction between remixing and producing truly novel creations. GPT cannot produce novelty relative to its training corpus — this is not me gate-keeping creativity for humanity; it is a mathematically provable statement.
Apropos of the aforementioned article by Gary Marcus and Ernest Davis, GPT cannot model time, space, or causality. Huawei’s AI poet can tell you what, but not why, when, or where — things one would think are essential in poetic composition. Nor could it, as a friend of mine put it, compose a poem in the Tang style and form about the rolling red dunes of Mars — something a Tang poet could do if they learned what Mars was and had some photos of the Martian surface.
Don't model the poem, model the poet
Here is an AI challenge for companies in China, or anyone else who wants to shoot for ground-breaking innovation in generative machine learning.

Portuguese poet and writer Fernando Pessoa developed the concept of a heteronym. A heteronym is a fictional character with her own set of life experiences, political views, daily habits, acquaintances, etc. The author creates a character and then writes pseudonymously as the character, using the character’s fictional background as a kind of literary Instagram filter. In other words, Pessoa didn’t create poetry so much as he created models of poets.
When Pessoa wrote poems through his poet constructs, it was a manual generation task: he mentally imagined “what would this character write?” and then wrote it. Imagine being able to create a computational model of a Tang poet, train that model on a curated corpus of Tang poems, and then generate poems automatically. What if poets themselves could create these computational heteronyms? In my view, this would be far more exciting than yet another demonstration of corporate money spent on a language modeling exercise.
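Here is a minimal sketch of what I have in mind, using the Hugging Face transformers library. Everything in it is an assumption for illustration: the base model (plain English GPT-2 standing in for whatever pretrained model you would actually start from), the <POET:…> persona tag, and the two-line “corpus.” The idea is simply to fine-tune on a persona-tagged corpus so that generation can be conditioned on the heteronym; none of this is Huawei’s method.

```python
# A minimal sketch of a "computational heteronym": fine-tune a pretrained language
# model on a persona-tagged corpus, then prompt with the persona tag to generate.
# Model choice, tag format, and corpus are illustrative assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Hypothetical curated corpus: each poem prefixed with its heteronym's tag.
corpus = [
    "<POET:tang_recluse> Moonlight falls on the empty mountain path.",
    "<POET:tang_recluse> The river carries autumn past my door.",
]

model.train()
for epoch in range(3):
    for text in corpus:
        inputs = tokenizer(text, return_tensors="pt")
        # Standard language-modeling loss: predict each token from those before it.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Generate "as" the heteronym by prompting with its tag.
model.eval()
prompt = tokenizer("<POET:tang_recluse>", return_tensors="pt")
sample = model.generate(**prompt, max_length=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(sample[0], skip_special_tokens=True))
```

Whether anything like this captures a poet rather than a pile of poems is exactly the open question; the persona tag is a stand-in for the far richer model of a poet’s experiences and views that Pessoa carried in his head.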
Liao, Yi, et al. "GPT-based Generation for Classical Chinese Poetry." arXiv preprint arXiv:1907.00151 (2019).
ChinAI #66: Autumn Chrysanthemums on a Bridge — ChinAI Newsletter
Niven, Timothy, and Hung-Yu Kao. "Probing neural network comprehension of natural language arguments." arXiv preprint arXiv:1907.07355 (2019).
NLP’s Clever Hans moment has arrived — The Gradient
100,000 free AI-generated headshots put stock photo companies on notice — The Verge
Thanks again for reading. I’m trying to double the subscriptions this month. With enough subscribers, I can write more in-depth and well-researched posts and reports. So please subscribe or forward to folks who would find this content interesting.