Key insights of today’s newsletter:
Google researchers have built AMIE, a novel conversational AI system that can engage in diagnostic conversations.
The results are remarkable: AMIE showed greater diagnostic accuracy and superior performance on 28 of 32 axes from the perspective of specialist physicians, and on 24 of 26 axes from the perspective of patients.
However, there are many important limitations that need to be addressed, such as health equity and fairness, privacy, robustness, and more.
↓ Go deeper (5 min read)
A man tells his doctor, “I've got a bad back.” The doctor says, “It's old age.” The man goes, “I want a second opinion.” The doctor says: “Alright — you're ugly as well.”
In surprisingly unsensational fashion, an important piece of research was published on the Google Research Blog on Friday, January 12, 2024.
At first glance, the paper’s title — AMIE: A research AI system for diagnostic medical reasoning and conversations — doesn’t spark much joy, but its contents certainly do. In the introduction we can read:
The physician-patient conversation is a cornerstone of medicine, in which skilled and intentional communication drives diagnosis, management, empathy and trust.
(…)
While LLMs can accurately perform tasks such as medical summarization or answering medical questions, there has been little work specifically aimed towards developing these kinds of conversational diagnostic capabilities.
AMIE stands for Articulate Medical Intelligence Explorer. It’s a bespoke conversational agent optimized for diagnostic reasoning and conversations.
It combines two powerful ideas:
the ability of language models to ingest domain-specific data through a process called fine-tuning; and
the ability of language models to engage in conversations in a flexible and contextual manner.
The researchers’ goal is not to emulate traditional in-person evaluations. Instead, the tool allows people to converse with something that approximates a trained specialist, through a digital medium we are all very much familiar with: messaging.
Not only are chatbots “the most common way consumers interact with LLMs today”, the researchers write, they are also “a potentially scalable and familiar mechanism for AI systems to engage in remote diagnostic dialogue.”
Performance and limitations
AMIE’s performance was evaluated from two core perspectives: the patient and the physician. For the patient, things like trust in the service, perceived openness and honesty, and empathy were used as performance indicators. The diagnostic conversation and reasoning qualities — so both the quality of the conversation and the diagnostic accuracy — were evaluated by specialist physicians.
The results were remarkable. AMIE showed greater diagnostic accuracy and superior performance on 28 of 32 axes from the perspective of specialist physicians, and on 24 of 26 axes from the perspective of patients.
Now, before you decide to drop out of medical school after reading this, let’s assess the limitations of the experiment. The researchers carefully and deliberately state:
Any research of this type must be seen as only a first exploratory step on a long journey.
(…)
There are many important limitations to be addressed, including experimental performance under real-world constraints and dedicated exploration of such important topics as health equity and fairness, privacy, robustness, and many more, to ensure the safety and reliability of the technology.
Again, I must compliment the research team’s reluctance to wave the we-don’t-need-doctors-anymore flag.
One major limitation that caught my eye was that the evaluation was not conducted with real patients. Instead, trained patient actors were asked to engage with the system and evaluate it.
Research by Elizabeth Stokoe, a British social scientist and conversation analyst, questions the authenticity of simulated interactions by comparing actual and role-played police investigative interviews. Her paper discusses the implications for the efficacy of role-play methods in training and assessing communication. In other words, we should probably take those patient evaluations with a grain of salt.
Performing a healthcare study like this with real patients and real-world outcomes is a whole new ballgame.
Reimagining healthcare
Nonetheless, AI-powered diagnostics presents us with a real and tangible opportunity.
As a European, and especially a citizen of The Netherlands, my position is one of privilege. Access to health care is less of a concern here than elsewhere in the world. I know, for example, that 1 in 10 people in the United States don’t have health insurance.
People without insurance are less likely to afford the healthcare services and medications they need. But it’s not the only barrier to care. Limited availability of healthcare resources reduces access to health services and increases the risk of poor health outcomes. For example, physician shortages may mean that patients experience longer wait times and delayed care.
Preventive care and national — or even global — access to reliable and responsible medical diagnostics, powered by AI, could be part of the solution. The path is long and full of hurdles, but I do see it. I do see a future where AI assistants play a role in aiding clinicians. I say let’s move towards that future. Responsibly, and with incremental steps.
Join the conversation 🗣
Leave a comment with your thoughts. Or like this post if it resonated with you.
Get in touch 📥
Have a question? Shoot me an email at jurgen@cdisglobal.com
Interesting! Thanks for this summary. I agree, I think we should look at this sort of technology more as a supplement to, than a replacement of, doctors. But I also wonder whether nurses might be left to take on more of a role in diagnosing and prescribing?