AI Model Predicts Enjoyment and Fluidity of Zoom Calls

A new study by NYU scientists reveals an AI model that can predict the fluidity and enjoyment of videoconferences by analyzing conversational turn-taking and facial actions. This breakthrough could dramatically improve virtual meetings, making them more efficient and enjoyable.

Since the COVID-19 pandemic began, videoconference platforms like Zoom and Microsoft Teams have become integral parts of our work and social lives. Despite their advantages, these platforms often suffer from moments that feel awkward or unproductive. Now, a team of scientists from New York University is offering a high-tech solution to make virtual meetings more enjoyable and efficient.

The researchers have developed an artificial intelligence model capable of assessing human behavior during videoconferences. This includes monitoring conversational turn-taking and facial expressions to predict if these interactions are smooth and enjoyable.

“Our machine learning model reveals the intricate dynamics of high-level social interaction by decoding subtle patterns within basic audio and video signals from videoconferences,” lead author Andrew Chang, a postdoctoral fellow at NYU’s Department of Psychology, said in a news release. “This breakthrough represents an important step toward dynamically enhancing videoconference experiences by showing how to avoid conversational derailments before they occur.”

To build the model, the researchers analyzed more than 100 hours of Zoom recordings, tracking voice, facial expressions, and body movements to pinpoint the disruptive elements that made conversations less fluid or enjoyable.

Interestingly, the model found that “awkward silences” were more detrimental to meeting quality than overlapping speech, suggesting that lively, even overlapping exchanges are preferable to stretches of silence.
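To make the silence-versus-overlap distinction concrete, here is a minimal sketch of how turn transitions could be labeled from diarized speech segments. The segment format, the one-second silence threshold, and the event labels are illustrative assumptions, not the NYU team’s actual pipeline.

```python
# Hypothetical sketch: labeling turn transitions from diarized speech
# segments. Thresholds and labels are assumptions for illustration only.

def turn_taking_events(segments, silence_threshold=1.0):
    """Given (speaker, start, end) segments sorted by start time,
    label each transition as an overlap, an awkward silence, or a
    smooth turn exchange."""
    events = []
    for (_, _, end_a), (_, start_b, _) in zip(segments, segments[1:]):
        gap = start_b - end_a
        if gap < 0:
            events.append(("overlap", -gap))          # next speaker starts early
        elif gap >= silence_threshold:
            events.append(("awkward_silence", gap))   # long pause between turns
        else:
            events.append(("smooth_turn", gap))       # quick, fluid handover
    return events

segments = [
    ("A", 0.0, 4.2),
    ("B", 4.0, 9.1),    # B starts before A finishes: overlap
    ("A", 10.8, 14.0),  # 1.7-second pause: awkward silence
]
print(turn_taking_events(segments))
```

Under the study’s finding, a quality score built on these events would penalize the `awkward_silence` entries more heavily than the `overlap` ones.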

To validate the model, more than 300 human judges reviewed the same videoconference footage, rating how fluid and enjoyable they found the exchanges. Their assessments closely aligned with the AI predictions, confirming the model’s reliability.

“Videoconferencing is now a prominent feature in our lives, so understanding and addressing its negative moments is vital for not only fostering better interpersonal communication and connection but also for improving meeting efficiency and employee job satisfaction,” added senior author Dustin Freeman, a visiting scholar in NYU’s Department of Psychology. “By predicting moments of conversational breakdown, this work can pave the way for videoconferencing systems to mitigate these breakdowns and smooth the flow of conversations by either implicitly manipulating signal delays to accommodate or explicitly providing cues to users, which we are currently experimenting with.”

The team’s research, published in the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) proceedings, showcases a significant advancement in the field of virtual communication, with potential applications that could extend beyond videoconferences to various forms of remote communication.

The paper was co-authored by Viswadruth Akkaraju and Ray McFadden Cogliano, both graduate students at NYU’s Tandon School of Engineering at the time, as well as David Poeppel, a professor of psychology at NYU and the Max Planck Society in Munich, Germany.

Source: New York University