Monday, July 30, 2018

Say It Ain't So, HAL 9000

The classic film 2001: A Space Odyssey featured HAL 9000, one of the first -- but certainly not the last -- popular examples of artificial intelligence (AI) gone wrong.  Just as the mythical heartbroken kid asked Shoeless Joe Jackson to deny he was involved in throwing the 1919 World Series, it's starting to seem as if we need to ask whether AI performance in healthcare is more hope, or hype, than reality.

The biggest headlines came from a startling STAT News report about IBM Watson.  STAT obtained internal IBM documents indicating that the collaboration with Memorial Sloan Kettering Cancer Center (MSK) was something of a bust, with "multiple examples of unsafe and incorrect treatment recommendations."  The "often inaccurate" recommendations posed "serious questions about the process for building content and the underlying technology."

Fortunately, the documents indicated, no patient deaths resulted from the faulty recommendations.

IBM had gotten lots of good publicity about the project.  MSK is, of course, one of the premier cancer centers in the world, and the partnership was a coup for IBM.  MSK even touted itself as "Watson's oncology teacher." 

IBM blamed poor training for the flaws: Watson based its recommendations on a small number of hypothetical cancer cases rather than on actual patient data, even though IBM had promised that Watson used the latter.  IBM was still bragging about Watson even after it had become clear that Watson's recommendations often conflicted with national guidelines and that physicians were ignoring them.

"This product is a piece of shit," one physician told IBM executives.  "We bought it for marketing and with hopes that you would achieve the vision. We can't use it for most cases."  His was not the only physician complaint. 

IBM maintains a positive attitude.  A spokesperson told Gizmodo that Watson is used in 230 hospitals worldwide, supports treatment decisions for 13 types of cancer, and has supported care for over 84,000 patients.  IBM also promised STAT:
We have learned and improved Watson Health based on continuous feedback from clients, new scientific evidence and new cancers and treatment alternatives. This includes 11 software releases for even better functionality during the past year, including national guidelines for cancers ranging from colon to liver cancer.
Well, then, it's all right.  That is, unless one remembers the debacle at another leading cancer center, MD Anderson.  The failure there may be attributed at least as much to MD Anderson as to any problems with Watson, but, still...

This is not the only recent example of healthcare AI overreaching.  A few weeks ago, British start-up Babylon Health claimed its AI chatbot could diagnose patients at least as well as human doctors.

Babylon tested its chatbot using questions that the Royal College of General Practitioners (RCGP) uses to certify actual doctors, and said that its score beat the average score of human doctors.  Babylon also gave both the chatbot and seven human doctors 100 sets of symptoms, and said that the chatbot scored better on those as well.

A Chinese AI, BioMind, similarly claimed to outperform China's "top doctors" -- by a margin of two to one. 

The RCGP, for one, is not buying it.  A spokesperson said:
...at the end of the day, computers are computers, and GPs are highly-trained medical professionals: the two can't be compared and the former may support, but will never replace, the latter...No app or algorithm will be able to do what a GP does.
Babylon's medical director fired back:
We make no claims that an ‘app or algorithm will be able to do what a GP does’...that is precisely why, at Babylon, we have created a service that offers a complete continuum of care, where AI decisions are supported by real-life GPs to provide the care and emotional support that only humans are capable of...I am saddened by the response of a college of which I am a member. The research that Babylon will publish later today should be acknowledged for what it is — a significant advancement in medicine
The question of whether AI can do what physicians do was further illustrated by a new study from MIT.  The researchers ran "sentiment analysis" on physicians' notes, apart from the data in the medical records themselves, and found a positive correlation between those sentiment scores and the number of diagnostic tests ordered.  The effect was strongest at the beginning of a hospital stay and when there was less medical information to go on.
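
To make the idea concrete, here is a minimal sketch, in Python, of what that kind of analysis might look like.  Everything in it is an assumption for illustration: the notes and test counts are invented, and NLTK's off-the-shelf VADER sentiment scorer plus a simple Pearson correlation merely stand in for whatever models the MIT team actually used.

# A rough sketch of a sentiment-vs-tests analysis, NOT the MIT team's
# actual method.  All data below is hypothetical.
from nltk.sentiment import SentimentIntensityAnalyzer
from scipy.stats import pearsonr

# Hypothetical physician notes paired with the number of diagnostic
# tests ordered during the stay (illustrative values only).
records = [
    ("Patient comfortable, in no acute distress, exam reassuring.", 2),
    ("Concerning presentation; patient appears unwell and fatigued.", 6),
    ("Stable overnight, tolerating diet, ambulating without issue.", 1),
    ("Worrisome findings on exam; clinical picture remains unclear.", 7),
]

analyzer = SentimentIntensityAnalyzer()  # requires nltk.download("vader_lexicon")
scores = [analyzer.polarity_scores(note)["compound"] for note, _ in records]
counts = [n for _, n in records]

# Correlate note sentiment with test-ordering intensity.
r, p = pearsonr(scores, counts)
print(f"Pearson r = {r:.2f}, p = {p:.2f}")

On real data, a study like MIT's would of course control for patient acuity and use far richer models; the point of the sketch is only the pipeline: score the free-text notes, then correlate those scores with utilization.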

The researchers concluded that physicians "provide a dimension that, as yet, artificial intelligence does not."  They attribute this to physicians' "gut feelings," saying:
There’s something about a doctor’s experience, and their years of training and practice, that allows them to know in a more comprehensive sense, beyond just the list of symptoms, whether you’re doing well or you’re not.  They’re tapping into something that the machine may not be seeing.
The researchers hope to find ways to teach AI to approximate what physicians are doing by capturing other types of data, such as their speech. 

Lastly, Eric Topol, MD, tried to summarize the state of the research in a tweet: we still simply don't know how good AI is in medicine.
All in all, despite the claims that some are making about AI -- whether it is Watson or Babylon or BioMind or any of the hundreds of other AIs being developed -- one would have to admit that they are not the same as human physicians and not yet a threat to replace them. 

But...

The fact that AI doesn't replicate what human physicians do doesn't bother me.  Those "gut feelings" and clinical insights that physicians are so proud of aren't always right, and often aren't based on solid, or any, science.  Let's not mistake the value of human touch and interaction for infallibility.

I would hope we can do better.  I would hope that an AI's "gut feelings" will be based on millions of experiences, not the hundreds or even thousands that a human physician might have had for a specific issue.  I would hope that an AI wouldn't order unnecessary or even harmful tests or procedures.  I would hope that an AI wouldn't commit errors due to fatigue or impaired judgement or simple miscommunication. 

AI in healthcare is only going to get better.  It is going to play a major role in our care.  That is something to look forward to, not to fear (HAL 9000 notwithstanding!). 
