New Study Uncovers the Art and Flaws of AI-Generated Imagery

A new study dissects the capabilities of AI tools Midjourney and DALL·E in generating images from text prompts. Researchers found that while these AI programs can create aesthetically pleasing images, they often struggle with basic instructions and reflect cultural biases.

In a world where artificial intelligence is rapidly making strides, a team of researchers has embarked on a mission to understand the capabilities and limitations of popular AI tools Midjourney and DALL·E. These generative AI programs have gained attention for their ability to transform written descriptions into visual art, but can they truly capture the essence of our ideas?

A collaborative study by scientists from the University of Liège in Belgium, the University of Lorraine, and EHESS in France sought to answer this question. Combining expertise in semiotics, computer science, and art history, the researchers systematically analyzed the images these AI systems produced against criteria such as shapes, colors, and the arrangement of elements.

“Our approach is based on a series of rigorous tests,” co-author Maria Giulia Dondero, a semiotician and FNRS research director at the University of Liège, said in a news release. “We submitted very specific requests to these two AI systems and analysed the images produced according to criteria from the humanities, such as the arrangement of shapes, colours, gazes, the specific dynamism of the still image, the rhythm of its deployment, etc.” 

The findings, published in the journal Semiotic Review, reveal that while AI tools like Midjourney and DALL·E can generate visually appealing images, they often stumble when following straightforward instructions.

For instance, prompts involving negation, such as “a dog without a tail,” often result in images of dogs with tails or other inaccuracies. Similarly, depicting complex spatial relationships, such as “two women behind a door,” presents significant challenges.

Both systems also struggle with actions and temporal sequences, sometimes interpreting “fighting” as dancing or failing to represent the progression of actions such as “starting to eat” or “having finished eating.”

“These GAIs allow us to reflect on our own way of seeing and representing the world,” added lead author Enzo D’Armenio, a former researcher at ULiège and now a junior professor at the University of Lorraine. “They reproduce visual stereotypes from their databases, often constructed from Western images, and reveal the limitations of translation between verbal and visual language.”

The research team validated their results through repetition, conducting up to 50 generations per prompt to ensure statistical robustness. They discovered distinct aesthetic signatures in the models: Midjourney tends to produce “aestheticised” images with embellishments, while DALL·E offers greater compositional control but varies in the number and orientation of objects.
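Repeating a prompt many times and tallying how often the output actually satisfies it lends itself to a simple statistical summary. As a rough illustration only (the counts and the `wilson_interval` helper below are invented for this sketch, not taken from the study), here is how a success rate over 50 generations might be reported with a confidence interval:

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score confidence interval for a binomial proportion."""
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (center - half, center + half)

# Hypothetical tally: suppose that out of 50 generations of
# "a dog without a tail", only 8 images actually omitted the tail.
low, high = wilson_interval(8, 50)
print(f"success rate: {8/50:.2f}, 95% CI: [{low:.2f}, {high:.2f}]")
```

The Wilson interval is a common choice here because it behaves sensibly even when the observed success count is small, as it often would be for prompts the models handle poorly.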

Despite their fascinating capabilities, the AI models are inherently statistical, producing the most probable outcomes based on their training datasets and the configurations set by their creators. This often leads to the reinforcement of cultural stereotypes.

For example, the prompt “CEO giving a speech” might yield predominantly male images from some models and mostly female from others, highlighting the biases embedded in their training data.

“GAIs produce the most plausible result based on their training databases and the (sometimes editorial) settings of their designers,” added co-author Adrien Deliège, a mathematician at ULiège. “These choices might standardise the gaze and convey or reorient stereotypes.”

The researchers emphasize the importance of using interdisciplinary tools from the humanities to evaluate these technologies.

“AI tools are not simply automatic tools,” concluded D’Armenio. “They translate our words according to their own logic, influenced by their databases and algorithms. The humanities have an essential role to play in understanding and evaluating them.” 

The study underscores both the potential and the current limitations of AI-generated imagery, suggesting that while these tools can assist in visualizing ideas, they still fall short of faithfully translating verbal descriptions into images. Integrating the humanities into the evaluation process is crucial for a comprehensive understanding of their cultural and symbolic implications.

Source: University of Liège