There is a mathematical reason why machine learning systems like GPT-3 are incapable of understanding. The reason comes down to the fact that machine learning has no memory. It is just probabilistic associations. If there is only a 10% chance of going off topic, then after just seven exchanges there is a greater than 50% chance the machine learning model has gone off topic. The problem is that when prediction is just based on probabilities, the likelihood of making a misprediction increases exponentially. A long-term memory is needed in order to maintain long-term coherence.
GPT-3 is essentially a sophisticated Markov process. What is important about the Markov process is that the next step in the process is only dependent on the immediate previous step, or a fixed number of previous steps. There is no longer-term memory of the past that shapes the future.
On the other hand, the distinctive characteristic of understanding is long-term memory. When two people are talking about a topic, and understand each other, then the present conversation is on the same topic that it was in the past, regardless of how far in the past the conversation started.
This means that the probabilistic models, like GPT-3, are incapable of understanding because they are inherently incapable of long-term memory.






