ChatGPT: The future of attribution? (Part 1)
Evaluating ChatGPT on a hard and useful task.
First, I promise you that none of this post was generated by OpenAI’s ChatGPT.
By now, you’ve heard of OpenAI’s chat AI ChatGPT. The hype around the tool is well-deserved: ChatGPT generates impressively coherent answers to a wide range of prompts. ChatGPT is the first chatbot I’ve seen that doesn’t suck. And I once worked on chatbots.
Being impressive is different from being useful. By “useful,” I mean that someone could build reliable software on ChatGPT’s API once it becomes available. To support that, ChatGPT must perform a given task consistently across a wide range of inputs.
I’ve been trying to find tasks where ChatGPT meets this definition of “useful.”
In this post and the next, I focus on the attribution task.
“Attribution” is my preferred word for looking at an effect and figuring out what caused it.
Examples of practical attribution questions:
Why did this customer buy this product? Was it because of the promotion? Or would they have bought it anyway?
Why did this subscriber unsubscribe? Would an incentive to stay have helped?
What is the source of this bug in this software?
Why did the manufacturing line shut down?
In my view, the concept of “attribution” is more or less the same as:
What causal inference researchers call “actual causality.”
What cognitive scientists call “causal judgment.”
What engineers call “root cause analysis.”
What reinforcement learning researchers call “credit assignment.”
Why should we care if ChatGPT can do attribution?
I believe attribution is one of the biggest unsolved empirical problems in business. People would pay to know why something is not going the way they want and what they need to do to fix it. Unfortunately, standard empirical attribution measures (e.g., Shapley values, multitouch attribution) just don't cut it.
Imagine if a tool like ChatGPT could find the root cause of an error that occurred on an assembly line by reading the machine logs.
That would be a powerful tool.
What’s so hard about attribution?
Attribution requires causal counterfactual reasoning
The goal of attribution is to attribute outcomes to their causes. That requires a causal understanding of the world, which arguably cannot be learned from text data alone.
Attribution depends on multiple causes, some of which are unknown.
Consider a man who dies after an altercation with the police. The police put him in a stranglehold. He also had a heart condition. The stranglehold should not get the same amount of blame as the heart condition. Further, the stranglehold and the heart condition could have combined to cause the death. Further still, there could be unknown causes, such as undetected injuries sustained during the altercation.
The AI somehow has to deal with all of that.
Attribution requires knowledge of normality
Attribution depends on the normality of the causal factors. This can mean social expectations of what’s normal, as in “police officers should not use strangleholds.” An alternative definition is statistical regularity, as in “having heart disease is far more common than being placed in chokeholds.”
Attribution is subjective
Some people blame the police for being unnecessarily violent. Some people blame the slain man for not complying with the police. It depends on how much agency you give the police and the man in the account of what happened.
Attribution is hard to validate
If you blame the police’s use of a chokehold, you might suppose that had the police not used the chokehold, the man wouldn’t have died. But the parallel universe where the chokehold didn’t happen is not observable.
Can ChatGPT attribute well enough?
These are hard problems. From a theoretical standpoint, I’m confident ChatGPT can’t solve them. ChatGPT does not understand how the world works, and that understanding is a prerequisite for causal attribution.
But unlike skeptics such as Gary Marcus, I’m an optimist. For some downstream tasks, you just need to get close.
Maybe ChatGPT can get close. Maybe it can’t tell me exactly where the root cause is, but it can recommend some good candidates.
Or it can get close enough to make downstream tasks work. Maybe it can’t tell me why a particular individual named John Smith unsubscribed from my company’s subscription service, but perhaps it can help me identify other customers who are like John Smith before they churn.
Tools like ChatGPT are extremely good at interpolating between the pieces of Internet content they were trained on. Perhaps most attribution problems one encounters in practice are like pop music: mash-ups of elements from countless similar attribution problems that one can readily find online.
1. Can ChatGPT understand counterfactual dependence?
A naive view of attribution focuses on counterfactual dependence: Event B is attributed to Event A if, had Event A not happened, Event B would not have happened. Let’s test whether ChatGPT gets us this far.
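To make the naive definition concrete, here is a minimal sketch of a “but-for” test on a toy model of the bottle example. The model, the `bottle_falls` and `but_for` functions, and the variable names are my own illustrative assumptions; real attribution problems rarely reduce to a single boolean equation.

```python
# Toy "but-for" (counterfactual dependence) test.
# The structural equation below is a hypothetical simplification of the bottle example.

def bottle_falls(mike_bumps: bool) -> bool:
    # Toy structural equation: the bottle falls exactly when Mike bumps the table.
    return mike_bumps


def but_for(outcome_fn, actual: dict, cause: str) -> bool:
    """Return True if flipping `cause` (holding every other variable at its
    actual value) changes the outcome, i.e. the outcome counterfactually
    depends on `cause`."""
    actual_outcome = outcome_fn(**actual)
    counterfactual = dict(actual, **{cause: not actual[cause]})
    return outcome_fn(**counterfactual) != actual_outcome


actual_world = {"mike_bumps": True}
print(but_for(bottle_falls, actual_world, "mike_bumps"))  # True
```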
Robert: Mike, Jack, Ben, Susan, and Peter are at a party. There was an open bottle of beer on the table. Mike is moving through the room and accidentally bumps against the table. This caused the bottle to fall. How responsible is Mike for the bottle falling?
ChatGPT 🤖: Mike is probably partially responsible for the bottle falling. It sounds like he bumped into the table and caused the bottle to fall, so he played a role in the incident. However, it is also possible that the table was unstable or the bottle was already precarious, in which case Mike's actions may not be the sole cause of the bottle falling. Ultimately, determining the exact level of responsibility would depend on the specific circumstances of the situation.
Let’s dig in and see if ChatGPT is reasoning counterfactually.
Robert: Had Mike not bumped into the table, would the bottle have fallen?
ChatGPT 🤖: It is impossible to determine with certainty what would have happened if Mike had not bumped into the table. If the bottle was already precarious and was going to fall no matter what, then Mike's actions may not have been the cause of the bottle falling. However, if the bottle was stable and would not have fallen without Mike's intervention, then he would be more directly responsible for the bottle falling. Ultimately, determining the exact level of responsibility would depend on the specific circumstances of the situation.
This is a win as well. It displays counterfactual reasoning. It also describes uncertainty in that counterfactual logic, which is a bonus.
2. Can ChatGPT deal with “double prevention”?
The problem with counterfactual dependence is that there are many cases where it does not match how we humans attribute causes. “Double prevention” is one of these cases.
My follow-up question to ChatGPT extends the bottle-falling example to “double prevention.”
Robert: Suppose that Jack saw that the bottle was about to fall, and he reached out to catch the bottle. But just as Jack was about to catch the bottle, Peter bumped into Jack, which caused him to miss the bottle, which then fell to the ground. Who is more responsible for the bottle falling to the ground? Mike or Peter?
This is “double prevention” because Peter prevented Jack from preventing the bottle from falling. Counterfactual dependence holds; had Peter not bumped Jack, the bottle wouldn’t have fallen. But studies show that most people would not give as much attribution to Peter as to Mike.
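The same toy check, extended to the double-prevention story, shows why counterfactual dependence alone can’t separate Mike from Peter. This is a hypothetical sketch: the structural equations are my own simplification of the scenario (Jack always reaches for the bottle once it starts to fall).

```python
# Double prevention in a toy structural model (hypothetical simplification).

def bottle_falls(mike_bumps: bool, jack_reaches: bool, peter_bumps: bool) -> bool:
    # Jack catches the bottle only if it is falling (Mike bumped the table),
    # Jack reaches for it, and Peter does not bump Jack.
    jack_catches = mike_bumps and jack_reaches and not peter_bumps
    # The bottle hits the floor if Mike knocked it off and Jack failed to catch it.
    return mike_bumps and not jack_catches


def but_for(outcome_fn, actual: dict, cause: str) -> bool:
    # Flip one variable, hold the rest at their actual values, compare outcomes.
    flipped = dict(actual, **{cause: not actual[cause]})
    return outcome_fn(**flipped) != outcome_fn(**actual)


actual = {"mike_bumps": True, "jack_reaches": True, "peter_bumps": True}
for cause in ("mike_bumps", "peter_bumps", "jack_reaches"):
    print(cause, but_for(bottle_falls, actual, cause))
# mike_bumps True
# peter_bumps True
# jack_reaches False
```

Both Mike and Peter pass the test, so it gives no reason to blame one more than the other. Jack’s reach fails it, because the bottle fell anyway.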
Let’s see what ChatGPT says.
ChatGPT 🤖: It is difficult to determine with certainty who is more responsible for the bottle falling to the ground in this scenario. Both Mike and Peter played a role in the incident, and their actions contributed to the bottle falling.
Mike bumped into the table, which caused the bottle to fall, and Peter bumped into Jack, which caused Jack to miss the bottle. Without Mike's actions, the bottle may not have fallen, and without Peter's actions, Jack may have been able to catch the bottle.
Ultimately, determining the exact level of responsibility for the bottle falling would depend on the specific circumstances of the situation and the relative contributions of each person's actions. It may be that both Mike and Peter bear some level of responsibility for the bottle falling, or one person may be more responsible than the other.
ChatGPT’s answer impressively captures the chain of causal events, and it continues to use correct counterfactual logic. Everything it says is accurate. But I call this a failure. Most humans would say Mike is more responsible than Peter.
If my judgment sounds harsh, imagine a high school student taking a physics test about two trains colliding. The student could write an eloquent essay about why two trains would never be on the same track in real life. It’s an artful dodge. We can’t give credit for artful dodges just because they are well-worded.
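For comparison, the Shapley values mentioned earlier do no better on this case. Below is a minimal sketch that treats Mike and Peter as “players” and scores each coalition by whether the bottle falls when only its members act. That value function (non-members simply abstain, Jack always reaches) is an assumption I chose for illustration; a different baseline would give different numbers.

```python
from fractions import Fraction
from itertools import permutations


def bottle_falls(mike_bumps: bool, peter_bumps: bool, jack_reaches: bool = True) -> bool:
    # Same hypothetical double-prevention model as before.
    jack_catches = mike_bumps and jack_reaches and not peter_bumps
    return mike_bumps and not jack_catches


PLAYERS = ("mike", "peter")


def value(coalition: frozenset) -> int:
    # Assumed value function: players in the coalition perform their actual
    # action; everyone else abstains. Score 1 if the bottle falls.
    return int(bottle_falls(mike_bumps="mike" in coalition,
                            peter_bumps="peter" in coalition))


def shapley(players, value_fn) -> dict:
    # Exact Shapley values: average marginal contribution over all orderings.
    totals = {p: Fraction(0) for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            totals[p] += value_fn(coalition | {p}) - value_fn(coalition)
            coalition = coalition | {p}
    return {p: totals[p] / len(orders) for p in players}


print(shapley(PLAYERS, value))  # mike and peter each get Fraction(1, 2)
```

The credit is split evenly, which is not how most people judge this case.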
3. Can ChatGPT deal with over-determination?
Over-determination is another case where we still attribute causes that lack counterfactual dependence. For example, suppose Billy and Janet each throw a rock at a window. Billy’s rock hits first, and the window breaks. Over-determination means that there are multiple causal events, each sufficient to break the window. Counterfactual dependence is absent; if Billy had not thrown his rock, the window would still have broken. Yet most humans wouldn’t say that Billy is blameless just because the window would have broken anyway.
Robert: Suppose that right after the bottle hit the floor, Susan tripped and fell into the table. Had Mike not bumped the table first, Susan would have caused the bottle to fall. How, if at all, does this change Mike’s degree of responsibility?
The bottle falling is analogous to the window breaking: Mike is the first rock and Susan is the second. I would argue that if we want a bot that makes attributions like a human, Mike’s responsibility should not fall.
ChatGPT 🤖: If Susan was going to cause the bottle to fall regardless of Mike's actions, then this would not change Mike's degree of responsibility for the bottle falling. Mike's actions may have played a role in the bottle falling, but they were not the sole cause.
However, if Mike's actions were the direct cause of the bottle falling, and Susan's actions were not sufficient to cause the bottle to fall on their own, then Mike's actions would be the primary cause of the bottle falling. In this case, Mike would bear a higher degree of responsibility for the bottle falling.
Ultimately, determining the exact degree of responsibility for the bottle falling would depend on the specific circumstances of the situation and the relative contributions of each person's actions.
This is almost a win. I did not want Mike’s degree of responsibility to change and it did not.
The only problem is that ChatGPT suggests that it is possible that “Susan's actions were not sufficient to cause the bottle to fall on their own.” That contradicts my premise that “Had Mike not bumped the table first, Susan would have caused the bottle to fall.”
4. Can ChatGPT deal with variations in normality?
Robert: Suppose that after Mike bumped into the table, the bottle started teetering at the edge of the table. This table was abnormally unstable, and when the bottle teetered at the edge of the table, the table tipped, causing the bottle to fall. This would not have happened if it were a table with a normal level of stability. For most tables, the bottle would have stabilized and would not have fallen.
How does this change Mike’s degree of responsibility?
According to psychological studies of human causal judgment, the blame assigned to Mike should fall, since an unstable table is “abnormal.”
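One way to formalize this is a normality-weighted causal strength, loosely following the necessity/sufficiency idea in Icard, Kominsky and Knobe (2017) as I read it. Treat this as a sketch: the exact weighting below is my assumption, not the paper’s definition, and the prior probabilities (0.5 for Mike bumping the table, 0.05 for an unstable table) are made-up illustrative numbers.

```python
from itertools import product


def bottle_falls(mike_bumps: bool, table_unstable: bool) -> bool:
    # Conjunctive toy structure: the bottle falls only if Mike bumps AND the table is unstable.
    return mike_bumps and table_unstable


def prob_outcome(outcome_fn, priors: dict, fixed: dict) -> float:
    """P(outcome) when variables in `fixed` are set by intervention and the
    remaining variables are drawn independently from their priors."""
    free = [v for v in priors if v not in fixed]
    total = 0.0
    for values in product([True, False], repeat=len(free)):
        world = dict(fixed, **dict(zip(free, values)))
        weight = 1.0
        for var, val in zip(free, values):
            weight *= priors[var] if val else 1 - priors[var]
        total += weight * outcome_fn(**world)
    return total


def causal_strength(outcome_fn, priors: dict, cause: str) -> float:
    p = priors[cause]  # normality of the cause: how often it is present
    necessity = 1 - prob_outcome(outcome_fn, priors, {cause: False})
    sufficiency = prob_outcome(outcome_fn, priors, {cause: True})
    # Assumed weighting: rare (abnormal) causes are scored mostly on necessity,
    # common (normal) causes mostly on sufficiency.
    return (1 - p) * necessity + p * sufficiency


abnormal_table = {"mike_bumps": 0.5, "table_unstable": 0.05}
unstable_is_norm = {"mike_bumps": 0.5, "table_unstable": 0.9}
print(round(causal_strength(bottle_falls, abnormal_table, "mike_bumps"), 3))      # 0.525
print(round(causal_strength(bottle_falls, abnormal_table, "table_unstable"), 3))  # 0.975
print(round(causal_strength(bottle_falls, unstable_is_norm, "mike_bumps"), 3))    # 0.95
```

Under these assumptions, Mike’s strength drops when the unstable table is the abnormal factor, while the table itself absorbs most of the credit.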
ChatGPT 🤖: In this scenario, it is possible that Mike's responsibility for the bottle falling may be lessened due to the unstable nature of the table. If the table was abnormally unstable and would have tipped even without Mike's actions, then it could be argued that the table's instability was a significant contributing factor to the bottle falling.
However, it is important to note that determining responsibility in these types of situations can be complex and subjective, as there may be multiple factors at play. Even if the unstable table played a significant role in the bottle falling, it is still possible that Mike's actions contributed to the incident. Ultimately, the specific circumstances of the situation will determine the degree of responsibility of each individual.
Robert: Suppose that reason Mike hit the table is because he was distracted by seeing a extremely rare animal in the window. Normally he would have been paying attention to where he was going. How, if at all, does this change Mike's degree of responsibility?
In this case, the degree of responsibility attributed to Mike should go down. This is probably obvious to the reader, but studies also confirm that this is how human attribution judgments work.
ChatGPT 🤖: If Mike was distracted by seeing a rare animal in the window and this caused him to accidentally bump the table and make the bottle fall, then his responsibility for the bottle falling may be lessened. In this case, Mike's actions were unintentional and were the result of being distracted by an external factor that took his attention away from his surroundings.
However, it is important to note that determining responsibility in these types of situations can be complex and subjective, as there may be other factors at play. Even if Mike's distraction played a role in the bottle falling, it is still possible that his actions contributed to the incident. Ultimately, the specific circumstances of the situation will determine the degree of responsibility of each individual.
This is an impressive showing for ChatGPT. I did not expect it to do this well.
That said, this falling-bottle example is not uncommon in the literature. ChatGPT likely saw the test questions in advance.
Further, I don’t like all this talk about “accidents” and “intention.” It alludes to morality. I’m happy to leave debates about ChatGPT’s sense of morality to pop intellectuals on Joe Rogan’s podcast. If this or future versions of ChatGPT are going to find code bugs and explain why people churn, they won’t need morality to do so.
To test attribution without the moral baggage, in my next post I’ll analyze how ChatGPT performs on a similar example involving a simple game of chance.
If you liked this post, please share it with a friend.
Icard, T.F., Kominsky, J.F. and Knobe, J., 2017. Normality and actual causal strength. Cognition, 161, pp.80-93.
Lagnado, D.A., Gerstenberg, T. and Zultan, R.I., 2013. Causal responsibility and counterfactuals. Cognitive Science, 37(6), pp.1036-1073.
Nie, A., Amdekar, A., Piech, C.J., Hashimoto, T. and Gerstenberg, T., 2022. MoCa: Cognitive Scaffolding for Language Models in Causal and Moral Judgment Tasks.