A new study from Queen Mary University of London has demonstrated that AI-generated voices have achieved a level of realism indistinguishable from human voices, raising important questions about technology’s role in communication, security and ethics.
AI voice technology has crossed a remarkable milestone. A new study from Queen Mary University of London reveals that synthetic voices can now sound as real as those of actual humans, marking a significant leap forward in artificial intelligence capabilities.
Many have long viewed AI-generated speech as unconvincing and easily distinguishable from human voices. However, the latest research demonstrates that this perception is becoming increasingly outdated.
Published in the journal PLOS One, the study compared real human voices with two types of AI-generated voices: those cloned to mimic a specific person and those created from large voice models without a specific human counterpart.
Participants in the study evaluated the realism, dominance and trustworthiness of the voices.
The findings revealed that AI-generated voices could sound as real as human voices, making it challenging for listeners to tell them apart. Interestingly, these voices were often perceived as more dominant and, in some cases, more trustworthy than their human counterparts.
“AI-generated voices are all around us now. We’ve all spoken to Alexa or Siri, or had our calls taken by automated customer service systems,” Nadine Lavan, a senior lecturer in psychology at Queen Mary University of London and the corresponding author who co-led the study, said in a news release. “Those things don’t quite sound like real human voices, but it was only a matter of time until AI technology began to produce naturalistic, human-sounding speech. Our study shows that this time has come, and we urgently need to understand how people perceive these realistic voices.”
Lavan highlighted how quickly and easily the team could create voice clones using commercially available software.
“The process required minimal expertise, only a few minutes of voice recordings, and almost no money,” she added. “It just shows how accessible and sophisticated AI voice technology has become.”
The rapid improvement in AI voice synthesis has profound ethical, copyright and security implications. Concerns about misinformation, fraud and impersonation are paramount, especially as realistic voice generation becomes more accessible and advanced.
On a more positive note, Lavan emphasized that AI voice technology also offers exciting opportunities.
“There might be applications for improved accessibility, education and communication, where bespoke high-quality synthetic voices can enhance user experience,” she said.
Source: Queen Mary University of London

