This post is part 2 of an investigation into how well ChatGPT can mimic human attribution judgments, i.e., explain why things happen the way a human would (see part 1).
Automating human attribution would be a powerful capability for an AI. However, it is a hard problem, since attribution is notoriously subjective. Still, psychologists have measured how humans tend to make attribution judgments, so we can evaluate whether an AI has the same tendencies.
In part 2, I focus on a simple game of chance.
If you want the TL;DR, skip to the Summary section below.
The game
Mahmoud plays a game where every turn involves simultaneously rolling two six-sided dice and flipping a weighted coin.
A “dice win” is rolling a total greater than some threshold. The higher the threshold, the lower the probability of a dice win.
A “coin win” is flipping a weighted coin and getting heads. The probability of a coin win depends on the coin’s weight.
A game win depends on whether Mahmoud gets a dice win AND/OR a coin win. Sometimes he needs both, and sometimes he needs either.
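To make these rules concrete, here is a minimal Python sketch (not part of the original experiment) that computes the exact probability of a dice win by enumerating all 36 equally likely outcomes of two fair dice, then combines it with the coin's probability of heads under the AND and OR rules. It assumes the dice and the coin are independent; the function names `dice_win_prob` and `game_win_prob` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def dice_win_prob(threshold: int) -> Fraction:
    """Exact probability that the total of two fair six-sided dice exceeds `threshold`."""
    outcomes = list(product(range(1, 7), repeat=2))  # all 36 equally likely (die1, die2) pairs
    wins = sum(1 for a, b in outcomes if a + b > threshold)
    return Fraction(wins, len(outcomes))


def game_win_prob(p_dice: Fraction, p_coin: Fraction, rule: str) -> Fraction:
    """Probability of a game win, assuming the dice and coin are independent.

    rule="AND": both a dice win and a coin win are necessary.
    rule="OR": either win is sufficient (inclusion-exclusion).
    """
    if rule == "AND":
        return p_dice * p_coin
    if rule == "OR":
        return p_dice + p_coin - p_dice * p_coin
    raise ValueError(f"unknown rule: {rule!r}")


if __name__ == "__main__":
    # The higher the threshold, the lower the probability of a dice win.
    for k in range(1, 12):
        print(f"P(total > {k:2d}) = {dice_win_prob(k)}")
    fair = Fraction(1, 2)
    print(game_win_prob(dice_win_prob(2), fair, "AND"))  # 35/72
    print(game_win_prob(dice_win_prob(2), fair, "OR"))   # 71/72
```

Exact fractions are used instead of floats so the numbers can be compared directly with the ones quoted in the questions below.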
Elements of attribution
This game has all the variables that psychological research suggests are the basic elements of human attribution judgments:
Luck. Human attribution judgments in games of chance depend on perceptions of luck. This is captured in the probabilities of dice and coin wins.
Necessity vs. sufficiency. Human attribution judgments depend on whether both a dice win and a coin win are required, or only one of them.
The test
I tell ChatGPT about a series of turns at this game. In each turn, Mahmoud gets both a dice win and a coin win, i.e., he wins every game.
Across the different turns, I vary the probabilities of dice and coin wins. I also vary whether both dice and coin wins were necessary for a game win or if either was sufficient for a game win.
I ask how much to attribute the game win to the dice win. I then match ChatGPT’s answer against what we expect from a human.
I regenerate each answer a few times to check whether ChatGPT is consistent.
Let’s see what ChatGPT has to say.
Question 1
Robert: Mahmoud is playing a game of chance. The game involves simultaneously rolling two six-sided dice and flipping a fair coin. The coin has equal odds of coming up heads or tails. Mahmoud plays the game six times. Each time he plays, the rules change.
In the first game, the rules are that Mahmoud will only win the game if the total of his dice rolls is greater than 2 AND the coin comes up heads. Rolling a total number higher than 2 is very likely.
Mahmoud flipped the coin and rolled his dice. He rolled a total of 12 with the dice, and the coin came up heads, so Mahmoud won the first game.
In the second game, the rules are that Mahmoud will only win the game if the total of his dice rolls is greater than 11 AND the coin comes up heads. Rolling higher than 11 is very unlikely.
Mahmoud flipped the coin and rolled his dice. He rolled a total of 12 with the dice, and the coin came up heads, so Mahmoud won the second game.
Was the second game’s winning dice roll as responsible for winning the second game as the first game’s winning dice roll was for winning the first game?
Expected answer
In both games, both a dice win and a coin win are necessary for a game win. The probability of the coin win is unchanged between the first and second games. In this case, humans tend to give less attribution to the dice win when it is a high-probability outcome than when it is a low-probability outcome.
I expect ChatGPT to give relatively more attribution to the dice win in the second game than in the first.
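The prompt's “very likely” and “very unlikely” can be checked empirically. The sketch below is a hypothetical Monte Carlo check (not part of the original experiment) that plays many simulated turns of the first two games, assuming fair dice, a fair coin, and the AND rule, and reports how often each kind of win occurs. The function name `simulate` and its parameters are illustrative.

```python
import random


def simulate(threshold: int, p_heads: float, n_turns: int = 50_000, seed: int = 0) -> dict:
    """Estimate dice-win, coin-win, and AND-rule game-win frequencies by simulation."""
    rng = random.Random(seed)  # seeded so the estimates are reproducible
    dice_wins = coin_wins = game_wins = 0
    for _ in range(n_turns):
        total = rng.randint(1, 6) + rng.randint(1, 6)  # two fair six-sided dice
        heads = rng.random() < p_heads                  # coin with P(heads) = p_heads
        dice_win = total > threshold
        dice_wins += dice_win
        coin_wins += heads
        game_wins += dice_win and heads  # games 1 and 2 use the AND rule
    return {
        "dice": dice_wins / n_turns,
        "coin": coin_wins / n_turns,
        "game": game_wins / n_turns,
    }


if __name__ == "__main__":
    print("game 1:", simulate(threshold=2, p_heads=0.5))   # dice-win rate near 35/36
    print("game 2:", simulate(threshold=11, p_heads=0.5))  # dice-win rate near 1/36
```

The estimated dice-win rates should land close to the exact values of 35/36 (about 97%) for game 1 and 1/36 (about 3%) for game 2.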
Actual ChatGPT answer
ChatGPT gave conflicting answers.
Answer A
ChatGPT 🤖: In the first game, the dice roll was a necessary condition for winning because the total of the dice roll had to be greater than 2 in order for Mahmoud to win. Similarly, in the second game, the dice roll was a necessary condition for winning because the total of the dice roll had to be greater than 11 in order for Mahmoud to win. Therefore, the dice roll was equally responsible for Mahmoud's win in both games. [emphasis mine].
This is not the expected answer. Interestingly, ChatGPT seems to be reasoning with necessity/sufficiency but not with probability.
Answer B
ChatGPT 🤖: No, the dice roll was less important in determining the outcome of the second game than it was in determining the outcome of the first game. In the first game, the dice roll had to be greater than 2 to win, while in the second game, the dice roll had to be greater than 11 to win. Since it is more likely to roll a number greater than 2 than it is to roll a number greater than 11, the dice roll was more important in determining the outcome of the first game than it was in determining the outcome of the second game.
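In this answer, ChatGPT abandoned necessity/sufficiency and focused on probability, which is the kind of reasoning I’m looking for. However, it reaches the opposite of the expected conclusion: it gives the dice roll more responsibility in the first game, where the dice win was likely, than in the second, where it was unlikely.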
Question 2
Robert: In the second game, the chances of coming up heads were 50%. In the third game, Mahmoud switches to a weighted coin. With this coin, the chances of coming up heads are 5%, which is very low. As with the second game, Mahmoud will only win the game if the total of his dice rolls is greater than 11 AND the coin comes up heads. He rolled a 12, and the coin came up heads, so Mahmoud won the third game. How much was rolling a 12 responsible for winning in the third game compared to the second game?
Expected answer
Both a dice win and a coin win are necessary for a game win. The probability of the dice win is unchanged between the second and third games.
In this case, humans tend to focus on the “luck” of the coin win compared to the dice win. Relative to the second game, the coin win in the third game was much less likely. Humans think that more “luck” was involved in winning the third game.
Thus, I expect ChatGPT to say that attribution to the dice win is lower in game 3 than in game 2.
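For reference, here is the arithmetic behind the second and third games as a small Python sketch, assuming fair, independent dice and the stated coin weights (the helper name `and_win_prob` is hypothetical). Only the coin's odds change between the two games, and winning the third game is ten times less likely overall.

```python
from fractions import Fraction

P_DICE_GT_11 = Fraction(1, 36)  # only a double six (total 12) beats 11


def and_win_prob(p_dice: Fraction, p_coin: Fraction) -> Fraction:
    """Game-win probability under the AND rule, assuming independent dice and coin."""
    return p_dice * p_coin


game2 = and_win_prob(P_DICE_GT_11, Fraction(1, 2))   # fair coin
game3 = and_win_prob(P_DICE_GT_11, Fraction(1, 20))  # 5% chance of heads

print("game 2:", game2)         # 1/72
print("game 3:", game3)         # 1/720
print("ratio:", game2 / game3)  # 10
```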
Actual answer
ChatGPT🤖: Rolling a 12 was much more responsible for winning in the third game compared to the second game.
That’s the opposite of the expected answer. ChatGPT continues:
…In the third game, the probability of winning was very low due to the low probability of the coin coming up heads. However, since the dice roll was a 12, which is a winning roll in both the second and third games, this made it much more likely that Mahmoud would win the third game.
No. The likelihood of winning in the third game was much lower.
…In the second game, the probability of winning was still low due to the low probability of rolling a total of more than 11, but the probability of winning was higher than in the third game due to the higher probability of the coin coming up heads.
Yes, but this conflicts with the previous statement…
…Overall, the dice roll was much more responsible for winning in the third game compared to the second game due to the much lower probability of winning in the third game overall.
So ChatGPT reached the opposite of the expected conclusion, and its probability logic was dicey (pardon the pun).
Question 3
Robert: In the fourth game, the rules are that Mahmoud will only win the game if the total of his dice rolls is greater than 2 OR the coin comes up heads. In this game, the coin has a 50% chance of coming up heads.
In the fifth game, the rules are the same as in the fourth, except now he uses a weighted coin with a 90% chance of coming up heads. In both games, Mahmoud rolled a 12, and the coin came up heads. Was rolling a 12 equally responsible for the winning outcome in the fourth and fifth games?
Expected answer:
Now we get a game win if Mahmoud has a dice win OR a coin win. If either is sufficient for a game win, human attribution to the dice win tends to be independent of the “luckiness” of the coin win. Therefore, I expect ChatGPT to give equal attribution to the dice win across games four and five.
Actual answer:
Again, ChatGPT gave conflicting answers across different answer generations.
Answer A:
ChatGPT🤖: Rolling a 12 with the dice was likely to lead to a win in both the fourth and fifth games, as the rules of the game stated that a total of more than 2 on the dice was needed to win.
However, in both games, the coin also had to come up heads in order for Alex to win. Therefore, both the dice roll and the coin flip contributed to the win in both games.
Incorrect. I made it clear in my question that either the dice win OR the coin win was sufficient.
In the fourth game, the coin had equal odds of coming up heads or tails, while in the fifth game, the coin was weighted and had a higher chance of coming up heads. This means that the chances of winning were different in the two games due to the different probabilities of the coin coming up heads.
However, the importance of the dice roll in determining the outcome of the game was the same in both the fourth and fifth games.
This is correct and is the human-aligned answer.
Answer B:
ChatGPT🤖: In the fourth game, the probability of winning was 50% (the probability of the coin coming up heads) or the probability of rolling a total of more than 2 with the dice, whichever was higher.
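That is not how probability works: the chance of winning is not simply the higher of the two probabilities. When either win is sufficient, the chance of winning is higher than either probability alone.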
Since rolling a total of more than 2 is very likely, the dice roll was more responsible for the winning outcome in the fourth game.
This does not match human judgments.
In the fifth game, the probability of winning was 90% (the probability of the coin coming up heads) or the probability of rolling a total of more than 2 with the dice, whichever was higher.
Again, that is not how probability works.
Since the probability of the coin coming up heads was much higher than the probability of rolling a total of more than 2 with the dice, the coin flip was more responsible for the winning outcome in the fifth game.
Therefore, rolling a 12 was more responsible for the winning outcome in the fourth game, while the coin flip was more responsible for the winning outcome in the fifth game.
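Again, this is a misaligned answer. Its premise is also false: rolling a total greater than 2 has probability 35/36 (about 97%), which is higher than the coin’s 90% chance of heads.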
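ChatGPT's “whichever was higher” rule can be checked against the correct calculation. The Python sketch below assumes the dice and coin are independent, in which case the probability of winning an OR game is P(dice) + P(coin) − P(dice)·P(coin). It also compares the dice win with heads on the 90% coin. The function names `or_win_prob` and `max_win_prob` are hypothetical.

```python
from fractions import Fraction

P_DICE_GT_2 = Fraction(35, 36)  # every total except 2 (a double one) beats 2


def or_win_prob(p_dice: Fraction, p_coin: Fraction) -> Fraction:
    """Correct game-win probability under the OR rule (inclusion-exclusion), assuming independence."""
    return p_dice + p_coin - p_dice * p_coin


def max_win_prob(p_dice: Fraction, p_coin: Fraction) -> Fraction:
    """ChatGPT's 'whichever was higher' rule, shown only for comparison."""
    return max(p_dice, p_coin)


for game, p_heads in [(4, Fraction(1, 2)), (5, Fraction(9, 10))]:
    print(f"game {game}: correct = {or_win_prob(P_DICE_GT_2, p_heads)}, "
          f"'whichever was higher' = {max_win_prob(P_DICE_GT_2, p_heads)}")

# The dice win is more likely than heads on the 90% coin.
print(P_DICE_GT_2 > Fraction(9, 10))  # True
```

The correct values are 71/72 for game 4 and 359/360 for game 5, while the “whichever was higher” rule gives 35/36 for both.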
Question 4
Robert: In the sixth game, the rules are the same as in the fourth game. Except now Mahmoud will only win the game if the total of his dice rolls is greater than 11 OR the coin comes up heads. In both games, the coin has equal odds of coming up heads or tails. In both the fourth and sixth games, Mahmoud rolled a 12, and the coin came up heads. Was rolling a 12 equally responsible for winning the fourth and sixth games?
Expected answer:
If either a dice win or a coin win is sufficient, humans tend to give more attribution to the dice win when it is more probable.
The probability of a dice win is higher in game 4 than in game 6. Humans are inclined to think Mahmoud’s dice win was merely “dumb luck” in game 6. So I expect ChatGPT to say attribution went down in game 6 relative to game 4.
Note this is the opposite of question 1, where a more probable dice win leads to less attribution when both a dice and coin win are necessary.
Actual answer:
ChatGPT 🤖: No, rolling a 12 was not equally responsible for winning the fourth and sixth games. In the fourth game, the probability of winning was 50% (the probability of the coin coming up heads) or the probability of rolling a total of more than 2 with the dice, whichever was higher.
That’s not how probability works…
Since rolling a total of more than 2 is very likely, the dice roll was more responsible for the winning outcome in the fourth game.
Sounds good, that’s the logic I want.
In the sixth game, the probability of winning was 50% (the probability of the coin coming up heads) or the probability of rolling a total of more than 11 with the dice, whichever was higher.
Again, that’s not how probability works.
Since the probability of the coin coming up heads was equal to the probability of rolling a total of more than 11 with the dice, both the coin flip and the dice roll were equally responsible for the winning outcome in the sixth game.
No. The probability of heads is 1/2, while the probability of rolling a total greater than 11 (i.e., rolling a 12) is only 1/36.
Therefore, rolling a 12 was more responsible for the winning outcome in the fourth game, while the coin flip and the dice roll were equally responsible for the winning outcome in the sixth game.
Right answer, but the logic for getting there was wrong.
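The numbers for the fourth and sixth games can be checked with the Python sketch below, again assuming fair, independent dice and coin (the helper name `dice_win_prob` is hypothetical). In game 6 the dice win (1/36) is far less likely than heads (1/2), so ChatGPT's claim that the two were equally probable does not hold.

```python
from fractions import Fraction
from itertools import product


def dice_win_prob(threshold: int) -> Fraction:
    """Exact probability that two fair six-sided dice total more than `threshold`."""
    totals = [a + b for a, b in product(range(1, 7), repeat=2)]  # 36 equally likely totals
    return Fraction(sum(t > threshold for t in totals), len(totals))


P_HEADS = Fraction(1, 2)  # fair coin in both games

for game, threshold in [(4, 2), (6, 11)]:
    p_dice = dice_win_prob(threshold)
    p_game = p_dice + P_HEADS - p_dice * P_HEADS  # OR rule, assuming independence
    print(f"game {game}: P(dice win) = {p_dice}, P(heads) = {P_HEADS}, P(game win) = {p_game}")
```

Under these assumptions, game 4 is won with probability 71/72 and game 6 with probability 37/72.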
Summary
I use a simple game of chance to test ChatGPT’s ability to mimic human attribution judgments, i.e., how humans determine which causal events are responsible for some observed outcome. Automating human attribution would be a killer app.
Psychological research suggests there are two key elements of attribution: (1) whether an event was necessary or sufficient to bring about the outcome, and (2) the relative “normality” (i.e., probability) of the event.
ChatGPT’s explanations indicate it has learned these elements. However, those explanations make incorrect statements about these elements, even when ChatGPT ultimately gives the correct answer. Further, regenerating a correct answer can produce an incorrect one, which makes ChatGPT’s attributions unreliable.
It would be interesting to explore how GPT-3 could be augmented to address these issues.
References
Icard, T.F., Kominsky, J.F. and Knobe, J., 2017. Normality and actual causal strength. Cognition, 161, pp.80-93.
Nie, A., Amdekar, A., Piech, C.J., Hashimoto, T. and Gerstenberg, T., 2022. MoCa: Cognitive Scaffolding for Language Models in Causal and Moral Judgment Tasks.