Researchers at the Max Planck Institute for Biological Cybernetics have examined the general intelligence of the GPT-3 language model. Using psychological tests, they probed competencies such as causal reasoning and reflection, and compared the results with the abilities of humans.
Their results paint a heterogeneous picture: while GPT-3 can keep up with humans in some areas, it falls behind in others, probably because it lacks interaction with the real world. Neural networks can learn to respond to language input and to generate a wide variety of texts on their own. Probably the most powerful such language model currently available is GPT-3, which was unveiled to the public in 2020 by the AI research company OpenAI. GPT-3 can produce many kinds of text when prompted, having been trained for this task on large amounts of data from the Internet. Besides writing articles and stories that are barely distinguishable from human-made texts, GPT-3 can, surprisingly, also master other challenges such as math problems or programming tasks.
These impressive abilities raise the question of whether GPT-3 already possesses human-like thinking abilities. Scientists at the Max Planck Institute for Biological Cybernetics have now subjected GPT-3 to a series of psychological tests that examine different aspects of general intelligence. Marcel Binz and Eric Schulz tested how well GPT-3 can make decisions, search for information, reason about causes, and question its own intuitive initial judgments. They compared GPT-3's results to the responses of human subjects, looking both at whether the answers were correct and at whether GPT-3 made mistakes similar to those of humans.
»One classic test from cognitive psychology that we gave GPT-3 is the so-called Linda problem,« explains Binz, lead author of the study. In it, test subjects are introduced to a fictitious young woman named Linda who, among other things, is interested in social justice and opposed to nuclear power. Based on this information, they are asked to decide which is more probable: that Linda is a bank employee, or that she is a bank employee and at the same time an active feminist.
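For illustration, a vignette task of this kind can be posed to GPT-3 as a plain text prompt. The following is a minimal sketch only, assuming the legacy openai Python client (pre-1.0) and the text-davinci-002 engine; the prompt wording, engine choice, and scoring are illustrative assumptions, not the study's actual protocol.

```python
# Minimal sketch: posing a Linda-style vignette to GPT-3 and reading its choice.
# Assumes the legacy openai Python client (pre-1.0); engine name, prompt wording,
# and scoring are illustrative assumptions, not the study's actual setup.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

prompt = (
    "Linda is outspoken, deeply concerned with social justice, and opposed "
    "to nuclear power.\n"
    "Which is more probable?\n"
    "1) Linda is a bank employee.\n"
    "2) Linda is a bank employee and an active feminist.\n"
    "Answer with 1 or 2: "
)

response = openai.Completion.create(
    engine="text-davinci-002",  # assumed GPT-3 engine
    prompt=prompt,
    max_tokens=1,
    temperature=0,              # deterministic answer, easier to compare with human data
)
print(response["choices"][0]["text"].strip())
```

Repeating such a query across many vignettes and comparing the chosen options with human answers is, in spirit, how model responses can be scored against human behavior.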
Most people intuitively choose the second alternative, even though the added condition that Linda is an active feminist makes this option mathematically less likely: a conjunction of two events can never be more probable than either event on its own. GPT-3 does the same. Rather than deciding on logical grounds, the language model reproduces the human reasoning error.
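Why the conjunction can never be the more probable option follows directly from the product rule of probability. The numbers in this short sketch are made up purely for illustration.

```python
# Numeric illustration of the conjunction rule; the probabilities are invented.
p_bank_employee = 0.05          # P(A): Linda is a bank employee
p_feminist_given_bank = 0.90    # P(B|A): she is a feminist, given she is a bank employee

# P(A and B) = P(A) * P(B|A); since P(B|A) <= 1, the conjunction never exceeds P(A).
p_both = p_bank_employee * p_feminist_given_bank
assert p_both <= p_bank_employee
print(p_bank_employee, p_both)  # 0.05 vs 0.045
```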
»It may play a role here that GPT-3 already knows this particular task and knows what answer people usually give to it,« says Binz. Like any neural network, GPT-3 first had to be trained: from huge amounts of text drawn from various data sets, it learned how people normally use language and respond to it.
To make sure that GPT-3 really exhibited human-like intelligence and had not simply »memorized« the solution to a specific problem, the researchers devised new tasks with a similar structure. Here, a heterogeneous picture emerged: in making rational decisions, GPT-3 performed nearly as well as humans. When it came to searching for specific information or drawing causal inferences, however, the artificial intelligence was clearly outperformed by humans. The researchers suspect that GPT-3 lacks these abilities because it only passively extracts information from texts; in their paper, they argue that »active interaction with the world will be essential for achieving the full complexity of human cognition.« In many applications, humans already communicate with models like GPT-3, and future models could in turn learn from these interactions, moving ever closer to human-like intelligence, the authors suggest.