Research results call into question the safety and dependability of using artificial intelligence in medical diagnosis, a development that should be watched by clinical laboratory scientists
ChatGPT, an artificial intelligence (AI) chatbot that returns answers to written prompts, has been tested and found wanting by researchers at the University of Florida College of Medicine (UF Health), who looked into how well it could answer typical patient questions on urology. Their verdict: not good enough.
AI is quickly becoming a powerful new tool in diagnosis and medical research. Some digital pathologists and radiologists use it for data analysis and to speed up diagnostic modality readings. It’s even been said that AI will improve how physicians treat disease. But with all new discoveries there comes controversy, and that’s certainly the case with AI in healthcare.
Many voices in opposition to AI’s use in clinical medicine claim the technology is too new and cannot be trusted with patients’ health. Now, UF Health’s study seems to have confirmed that belief—at least with ChatGPT.
The study revealed that answers ChatGPT provided “fell short of the standard expected of physicians,” according to a UF Health news release, which called ChatGPT’s answers “flawed.”
The questions posed were considered to be common medical questions that patients would ask during a visit to a urologist.
The researchers believe their study is the first of its kind to focus on AI and the urology specialty, and one which “highlights the risk of asking AI engines for medical information even as they grow in accuracy and conversational ability,” UF Health noted in the news release.
The researchers published their findings in the journal Urology in a paper titled, “Caution! AI Bot Has Entered the Patient Chat: ChatGPT Has Limitations in Providing Accurate Urologic Healthcare Advice.”
“I am not discouraging people from using chatbots,” said Russell S. Terry, MD (above), an assistant professor in the UF College of Medicine’s department of urology and the study’s senior author, in a UF Health news release. “But don’t treat what you see as the final answer. Chatbots are not a substitute for a doctor.” Pathologists and clinical laboratory managers will want to monitor how developers improve the performance of chatbots and other applications using artificial intelligence. (Photo copyright: University of Florida.)
UF Health ChatGPT Study Details
UF Health’s study featured 13 of the topics patients most frequently raise with their urologists during office visits. The researchers asked ChatGPT each question three times “since ChatGPT can formulate different answers to identical queries,” they noted in the news release.
The urological conditions the questions covered included:
- Overactive bladder,
- Recurrent urinary tract infections (UTIs),
- Kidney stones.
The researchers then “evaluated the answers based on guidelines produced by the three leading professional groups for urologists in the United States, Canada, and Europe, including the American Urological Association (AUA). Five UF Health urologists independently assessed the appropriateness of the chatbot’s answers using standardized methods,” UF Health noted.
Notably, many of the results were inaccurate. According to UF Health, only 60% of the 39 evaluated responses were deemed appropriate. Beyond those results, the researchers noted in their Urology paper, “[ChatGPT] misinterprets clinical care guidelines, dismisses important contextual information, conceals its sources, and provides inappropriate references.”
For the most part, ChatGPT was unable, when asked, to accurately identify the sources it referenced for its answers. Apparently, the chatbot was not programmed to provide such sources, the UF Health news release stated.
“It provided sources that were either completely made up or completely irrelevant,” Terry noted in the news release. “Transparency is important so patients can assess what they’re being told.”
Further, “Only 7 (54%) of 13 topics and 21 (54%) of 39 responses met the BD [Brief DISCERN] cut-off score of ≥16 to denote good-quality content,” the researchers wrote in their paper. BD is a validated healthcare information assessment questionnaire that “provides users with a valid and reliable way of assessing the quality of written information on treatment choices for a health problem,” according to the DISCERN website.
ChatGPT often “omitted key details or incorrectly processed their meaning, as it did by not recognizing the importance of pain from scar tissue in Peyronie’s disease. As a result … the AI provided an improper treatment recommendation,” the UF Health study paper noted.
Is Using ChatGPT for Medical Advice Dangerous to Patients?
Terry noted that the chatbot performed better in some areas than others, such as infertility, overactive bladder, and hypogonadism. However, recurrent UTIs in women was one topic for which ChatGPT consistently gave incorrect answers.
“One of the more dangerous characteristics of chatbots is that they can answer a patient’s inquiry with all the confidence of a veteran physician, even when completely wrong,” UF Health reported.
“In only one of the evaluated responses did the AI note it ‘cannot give medical advice’ … The chatbot recommended consulting with a doctor or medical adviser in only 62% of its responses,” UF Health noted.
For their part, ChatGPT’s developers “tell users the chatbot can provide bad information and warn users after logging in that ChatGPT ‘is not intended to give advice,’” UF Health added.
Future of Chatbots in Healthcare
In their Urology paper, the researchers state, “Chatbot models hold great promise, but users should be cautious when interpreting healthcare-related advice from existing AI models. Additional training and modifications are needed before these AI models will be ready for reliable use by patients and providers.”
UF Health conducted its study in February 2023. Thus, as the news release points out, results could differ now due to subsequent ChatGPT updates. Nevertheless, Terry urges users to get second opinions from their doctors.
“It’s always a good thing when patients take ownership of their healthcare and do research to get information on their own,” he said in the news release. “But just as when you use Google, don’t accept anything at face value without checking with your healthcare provider.”
That’s always good advice. Still, UF Health notes that “While this and other chatbots warn users that the programs are a work in progress, physicians believe some people will undoubtedly still rely on them.” Time will tell whether trusting AI for medical advice turns out well for those patients.
The study reported above is a useful warning to clinical laboratory managers and pathologists that the current technologies behind ChatGPT and similar AI-powered solutions have not yet achieved the accuracy and reliability of trained medical diagnosticians when answering patients’ common questions about different health conditions.
—Kristin Althea O’Connor