Study Finds AI Can Beat Average Human Creativity, but Not the Best

A new study comparing more than 100,000 people with top AI systems like GPT-4 finds that generative AI can now outscore the average person on creativity tests, but still cannot match the most creative human minds. Researchers say the future lies in collaboration, not competition.

Can artificial intelligence really be creative, or is it just remixing what humans have already made?

A new study led by researchers at Université de Montréal suggests the answer is more complicated than a simple yes or no. Generative AI systems such as GPT-4 can now outperform the average person on certain creativity tests, the team found, but the most creative humans still leave even the best AI models behind.

The work, published in the journal Scientific Reports, is described as the largest comparison to date of human creativity and the creativity of large language models, the technology behind tools like ChatGPT, Claude and Gemini.

Karim Jerbi, a psychology professor at Université de Montréal and associate professor at Mila, the Quebec AI Institute, led the research with collaborators including deep learning pioneer Yoshua Bengio.

The team’s findings show how far AI has come — and where it still falls short.

“Our study shows that some AI systems based on large language models can now outperform average human creativity on well-defined tasks,” Jerbi said in a news release. “This result may be surprising — even unsettling — but our study also highlights an equally important observation: even the best AI systems still fall short of the levels reached by the most creative humans.”

To make a fair comparison between humans and machines, the team needed a creativity test that both could take under the same conditions. They turned to the Divergent Association Task, or DAT, a tool developed by study co-author Jay Olson, now at the University of Toronto Mississauga.

The DAT is designed to measure what psychologists call divergent creativity — the ability to generate many varied and original ideas from a single prompt. Instead of asking people to draw or invent a product, it asks them to list 10 words that are as different in meaning from one another as possible.

A highly creative response might look like this: “galaxy, fork, freedom, algae, harmonica, quantum, nostalgia, velvet, hurricane, photosynthesis.”

Those words are not just random. They span different categories, senses and concepts, from space and weather to feelings and physics. Prior work has shown that people who do well on this kind of word task also tend to perform well on more traditional creativity tests involving idea generation, writing and problem solving.

That makes the DAT a quick, online-friendly way to tap into general creative thinking rather than just vocabulary. It also makes it easy to give the exact same challenge to an AI model.
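
For readers curious how such word lists can be scored automatically, the sketch below shows one common approach: measuring the average semantic distance between every pair of words using pretrained word vectors, so that lists spanning very different concepts score higher. The choice of the gensim library, the GloVe embedding model and the exact scaling are illustrative assumptions, not a description of the study's own scoring pipeline.

```python
# Hypothetical sketch of a DAT-style score: the average pairwise semantic
# distance between the 10 words, computed with off-the-shelf word vectors.
# The embedding model and scaling are assumptions for illustration only.
import itertools
import gensim.downloader as api

words = ["galaxy", "fork", "freedom", "algae", "harmonica",
         "quantum", "nostalgia", "velvet", "hurricane", "photosynthesis"]

# Pretrained GloVe vectors (downloads a few hundred MB on first use).
vectors = api.load("glove-wiki-gigaword-300")

# Cosine distance = 1 - cosine similarity, averaged over all 45 word pairs,
# then scaled by 100 so scores fall in a 0-200 range.
pairs = list(itertools.combinations(words, 2))
distances = [1 - vectors.similarity(a, b) for a, b in pairs]
score = 100 * sum(distances) / len(distances)

print(f"DAT-style score: {score:.1f}")
```

Lists of near-synonyms ("big, large, huge, ...") sit close together in the vector space and score low, while lists that jump between unrelated domains score high, which is why the task rewards varied, original associations rather than vocabulary size.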

“We developed a rigorous framework that allows us to compare human and AI creativity using the same tools, based on data from more than 100,000 participants, in collaboration with Jay Olson from the University of Toronto,” Jerbi added.

In the new study, the researchers fed the DAT prompts to several leading language models and compared their scores with those of more than 100,000 human participants. On average, some models, including GPT-4, scored higher than the typical human participant on this divergent word task.

That marks a milestone: for the first time, generative AI has crossed the threshold of average human performance on a widely used creativity measure.

But the story changes when you look at the upper end of the human distribution. The researchers report that the average score of the more creative half of human participants still exceeds the scores of all the AI models they tested. The gap widens further when they focus on the top 10% of human performers.

In other words, AI can now beat the middle of the pack, but not the standout human creators.

To see whether the results would hold up in more real-world creative activities, the team went beyond word lists. They asked both humans and AI systems to take on creative writing tasks, including composing haiku, summarizing movie plots and writing short stories.

Again, some AI systems could rival or surpass average human performance. But the best human writers still produced work that outshone the machines, according to the study.

The researchers also explored how much AI creativity can be tuned or guided. They focused on two levers that are familiar to AI developers but less so to the general public: temperature and prompting.

Temperature is a technical setting that controls how predictable or adventurous a language model’s responses are. At low temperature, the model tends to choose safe, common words and phrases. At higher temperature, it takes more risks, sampling from less likely options and producing more surprising combinations.

In this study, raising the temperature generally made AI outputs more varied and original, boosting scores on creativity tasks. But there is a trade-off: push the randomness too far and the results can become incoherent or irrelevant.
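
A minimal sketch of what the temperature setting does, using a toy distribution over four candidate words rather than a real model's vocabulary of tens of thousands of tokens; the logit values here are made up purely to show the effect.

```python
# Toy illustration of temperature sampling: lower temperature sharpens the
# distribution toward the most likely word, higher temperature flattens it
# so less likely, more surprising words get picked more often.
import numpy as np

def sample_probs(logits, temperature):
    """Softmax with temperature scaling."""
    scaled = np.array(logits) / temperature
    exp = np.exp(scaled - scaled.max())   # subtract max for numerical stability
    return exp / exp.sum()

logits = [4.0, 2.5, 1.0, 0.5]             # hypothetical scores for four candidate words

for t in (0.2, 1.0, 1.8):
    print(f"T={t}:", np.round(sample_probs(logits, t), 3))
# At T=0.2 almost all probability mass sits on the top word (safe, predictable);
# at T=1.8 the distribution flattens, so rarer choices are sampled far more often.
```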

The way instructions are phrased — known as prompting — also turned out to be crucial. The team found that certain strategies, such as asking the model to think about the origins and structure of words (their etymology), nudged it toward less obvious associations and higher creativity scores.
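
As an illustration of how a prompting strategy can be varied while the task stays the same, the sketch below sends two versions of a DAT-style instruction to a model through the OpenAI Python SDK. The wording of both prompts and the model settings are assumptions made for this example; they are not the prompts used in the study.

```python
# Illustrative only: a baseline DAT-style prompt versus a variant that nudges
# the model to consider word origins, echoing the etymology strategy described
# above. Assumes an OpenAI API key is available in the environment.
from openai import OpenAI

client = OpenAI()

BASE_TASK = ("List 10 single English words that are as different from each "
             "other as possible, in all meanings and uses of the words.")

prompts = {
    "baseline": BASE_TASK,
    "etymology": BASE_TASK + " Before answering, consider the origin and "
                 "structure of each candidate word, so that the 10 words come "
                 "from very different roots and domains.",
}

for name, prompt in prompts.items():
    reply = client.chat.completions.create(
        model="gpt-4",
        temperature=1.0,   # the sampling temperature can also be varied here
        messages=[{"role": "user", "content": prompt}],
    )
    print(name, "->", reply.choices[0].message.content)
```

Scoring the two outputs with a DAT-style measure, as in the earlier sketch, is one way to see how much the phrasing of the instructions alone shifts the creativity score.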

Those findings underscore a key point: AI does not create in a vacuum. Human choices about how to configure and prompt these systems strongly shape how creative they appear to be. In that sense, human–AI interaction becomes part of the creative process.

The study arrives amid growing anxiety about whether AI will replace human workers in creative fields, from writing and design to music and film. The authors argue that their results offer a more nuanced picture.

On one hand, AI is now capable enough to compete with or exceed average human performance on specific, well-defined creative tasks. On the other, the most original, high-level creativity remains distinctly human.

It is time to rethink the framing of humans versus machines, according to Jerbi.

“Even though AI can now reach human-level creativity on certain tests, we need to move beyond this misleading sense of competition,” he said. “Generative AI has above all become an extremely powerful tool in the service of human creativity: it will not replace creators, but profoundly transform how they imagine, explore, and create — for those who choose to use it.”

Rather than signaling the end of creative professions, the researchers suggest that AI could become a kind of creative assistant, expanding the space of ideas that artists, writers, scientists and others can explore. For students and early-career creators, such tools might help overcome writer’s block, generate prompts or offer alternative perspectives.

At the same time, the study highlights the continuing value of human originality, intuition and lived experience — qualities that current AI systems do not possess.

“By directly confronting human and machine capabilities, studies like ours push us to rethink what we mean by creativity,” Jerbi added.

The study reflects a collaboration among researchers from Université de Montréal, Concordia University, the University of Toronto Mississauga, Mila and Google DeepMind.

As generative AI systems continue to evolve, the team’s framework for measuring and comparing creativity could help track how the technology changes — and how humans choose to use it.

Source: Université de Montréal