A new international study finds that combining human expertise with AI significantly improves diagnostic accuracy in medicine, pointing to a promising path forward for patient care and safety.
Hybrid diagnostic collectives consisting of human experts and artificial intelligence systems significantly outperform traditional diagnosis methods, according to an international study led by the Max Planck Institute for Human Development.
Diagnostic errors remain a critical challenge in medical practice.
While AI systems, especially large language models (LLMs) like ChatGPT-4, Gemini and Claude 3, offer innovative diagnostic support, they can also generate false information and reproduce existing biases.
A research team from the Max Planck Institute for Human Development, in collaboration with the Human Diagnosis Project in San Francisco and the Institute of Cognitive Sciences and Technologies of the Italian National Research Council, investigated how humans and AI can best work together.
The study’s results are promising: hybrid diagnostic collectives, combining human and AI inputs, yield significantly higher diagnostic accuracy than either humans or AI alone, particularly in complex, open-ended cases.
“Our results show that cooperation between humans and AI models has great potential to improve patient safety,” lead author Nikolas Zöller, a postdoctoral researcher in the Center for Adaptive Rationality at the Max Planck Institute for Human Development, said in a news release.
Realistic Simulations and Comprehensive Analysis
The research team used data from more than 2,100 clinical vignettes provided by the Human Diagnosis Project.
These case studies, paired with correct diagnoses, enabled a comparison between diagnoses made by medical professionals and those generated by five leading AI models.
The researchers simulated various diagnostic scenarios — individuals, human collectives, AI models and mixed human–AI collectives — resulting in an analysis of more than 40,000 diagnoses.
The study found that collectives of multiple AI models outperformed 85% of human diagnosticians, yet human experts still did better in many individual cases.
Notably, the combination of human and AI inputs led to the highest diagnostic accuracy. This approach leverages the complementary nature of human and AI errors: when one fails, the other often succeeds, creating a powerful safety net.
“It’s not about replacing humans with machines. Rather, we should view artificial intelligence as a complementary tool that unfolds its full potential in collective decision-making,” added co-author Stefan Herzog, a senior research scientist at the Max Planck Institute for Human Development.
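The complementarity of human and AI errors can be illustrated with a small sketch. The article does not describe how the study combined diagnoses, so the reciprocal-rank scoring used here is an assumption chosen for illustration, and the function name, the diagnosis labels and the ranked lists are all invented toy data rather than anything from the study.

```python
from collections import defaultdict


def aggregate_rankings(rankings, top_k=5):
    """Combine ranked diagnosis lists into one collective ranking.

    Assumption for illustration: each diagnosis earns 1/position points
    from every list it appears in (reciprocal-rank scoring).
    """
    scores = defaultdict(float)
    for ranked in rankings:
        for position, diagnosis in enumerate(ranked[:top_k], start=1):
            scores[diagnosis] += 1.0 / position
    # Highest score first; ties are broken alphabetically so the result
    # is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Invented toy case: the correct diagnosis is "pulmonary embolism".
correct = "pulmonary embolism"

# Hypothetical human diagnosticians: one ranks the correct answer first.
humans = [
    ["pulmonary embolism", "pneumonia"],
    ["heart failure", "pulmonary embolism"],
]

# Hypothetical AI models: both rank the correct answer only third.
ais = [
    ["pneumonia", "asthma", "pulmonary embolism"],
    ["asthma", "pneumonia", "pulmonary embolism"],
]

for label, group in [("humans", humans), ("AI", ais), ("hybrid", humans + ais)]:
    top = aggregate_rankings(group)[0]
    print(f"{label}: top diagnosis = {top!r}, correct = {top == correct}")
```

In this toy case the AI-only collective settles on a wrong top diagnosis, while the hybrid collective still ranks the correct one first: the AI models' low-ranked mentions of it add to the human votes, and their errors are spread across different wrong answers. Swapping the roles (humans wrong, AI right) works the same way.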
Challenges and Future Directions
Despite the promising results, the researchers emphasize that the study was limited to text-based clinical vignettes and did not involve live clinical settings. Further studies are needed to determine whether these findings translate to real-world practice.
Additionally, the research focused solely on diagnosis rather than treatment, and the accuracy of a diagnosis does not always ensure optimal treatment outcomes.
Practical implementation and acceptance of AI-based support systems by medical staff and patients, as well as potential biases, remain areas for future research.
Broader Applications
The study is part of the Horizon Europe-funded Hybrid Human Artificial Collective Intelligence in Open-Ended Decision Making (HACID) project, which aims to enhance clinical decision-support systems by integrating human and machine intelligence.
The potential applications extend beyond health care.
“The approach can also be transferred to other critical areas — such as the legal system, disaster response or climate policy — anywhere that complex, high-risk decisions are needed. For example, the HACID project is also developing tools to enhance decision-making in climate adaptation,” added co-author Vito Trianni, a coordinator of the HACID project.
Conclusion
Hybrid human–AI collectives show strong potential to improve diagnostic accuracy and patient safety. If the findings hold up in real-world clinical practice, the approach could contribute to more equitable and effective patient care.