Anatomic pathologists understand that diagnostic testing for prostate cancer, along with testing for breast cancer, accounts for a high volume of clinical laboratory tests. Thus, a recent study indicating that a new artificial intelligence (AI)-based software tool can dramatically improve physicians’ ability to identify the extent of these cancers will be of interest.
“The study found that Unfold AI’s patient-specific encapsulation confidence score (ECS), which is generated based on multiple patient data points, including MRI scans, biopsy results, PSA [prostate-specific antigen] data, and Gleason scores, is critical for predicting treatment success,” an Avenda press release states. “These findings emphasize the importance of Unfold AI’s assessment of tumor margins in predicting treatment outcomes, surpassing the predictive capability of conventional parameters.”
“Unfold AI’s ability to identify tumor margins and provide the ECS will improve treatment recommendations and allow for less-invasive interventions,” said study co-author Wayne Brisbane, MD, a urologic oncologist and UCLA medical professor, in another press release. “This more comprehensive approach enhances our ability to predict treatment outcomes and tailor interventions effectively to individual patient needs.”
“This study is important because it shows the ability of AI to not only replicate expert physicians, but to go beyond human ability,” said study co-author Wayne Brisbane, MD (above), a urologic oncologist and UCLA medical professor, in a press release. “By increasing the accuracy of cancer identification in the prostate, more precise and effective treatment methods can be prescribed for patients.” Clinical laboratories that work with anatomic pathologists to diagnose prostate and other cancers may soon have a new AI testing tool. (Photo copyright: UCLA.)
How Unfold AI Works
To gauge the extent of prostate tumors, surgeons typically evaluate results from multiple diagnostic methods, such as PSA tests and imaging scans such as MRIs, according to a UCLA press release. However, some portions of a tumor may be invisible on an MRI, causing doctors to underestimate its size.
Unfold AI, originally known as iQuest, was designed to analyze data from PSA, MRI, fusion biopsy, and pathology testing, according to a company brochure. From there, it generates a 3D map of the cancer. Avenda’s website says the technology provides a more accurate representation of the tumor’s extent than conventional methods.
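Avenda has not published Unfold AI’s underlying model, so the following is only a toy sketch of the general approach the company describes: fusing an MRI-derived suspicion volume with fusion-biopsy results into a 3D cancer-probability map, then summarizing how well a proposed treatment margin encapsulates the predicted tumor. Every array size, weight, and threshold below is invented for illustration.

```python
# Illustrative sketch only: Unfold AI is proprietary and its actual model is
# not public. This toy fuses an MRI-derived suspicion volume with biopsy
# results into a 3D probability map and a single encapsulation-style score.
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 3D MRI-derived suspicion volume (values in [0, 1]).
shape = (32, 32, 32)
mri_suspicion = rng.random(shape)

# Hypothetical fusion-biopsy results: voxel coordinates and core pathology.
biopsy_sites = [((8, 10, 12), True), ((20, 22, 18), False), ((9, 11, 13), True)]

# Toy "fusion": boost or suppress probability around each biopsy site.
prob = mri_suspicion.copy()
zz, yy, xx = np.indices(shape)
for (z, y, x), positive in biopsy_sites:
    dist = np.sqrt((zz - z) ** 2 + (yy - y) ** 2 + (xx - x) ** 2)
    influence = np.exp(-dist / 4.0)          # nearby voxels are affected most
    prob += (0.5 if positive else -0.5) * influence
prob = np.clip(prob, 0.0, 1.0)

# Predicted tumor extent: voxels above a (hypothetical) probability cutoff.
tumor = prob > 0.85

# A proposed treatment margin, here simply a box around one region.
margin = np.zeros(shape, dtype=bool)
margin[4:16, 6:18, 8:20] = True

# Toy "encapsulation confidence score": fraction of predicted tumor volume
# inside the proposed margin (1.0 = fully encapsulated).
ecs = (tumor & margin).sum() / max(tumor.sum(), 1)
print(f"Predicted tumor voxels: {tumor.sum()}, ECS: {ecs:.2f}")
```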
“Accurately determining the extent of prostate cancer is crucial for treatment planning, as different stages may require different approaches such as active surveillance, surgery, focal therapy, radiation therapy, hormone therapy, chemotherapy, or a combination of these treatments,” Brisbane said in the UCLA press release.
Putting AI to the Test
In the new study, the UCLA researchers enlisted seven urologists and three radiologists to review 50 prostate cancer cases. Each patient had undergone prostatectomy—surgical removal of all or part of the prostate—but might have been eligible for focal therapy, a less-aggressive approach that uses heat, cryotherapy, or electric shocks to attack cancer cells more selectively.
The physicians came from five hospitals and had a wide range of clinical experience from two to 23 years, the researchers noted in The Journal of Urology.
They reviewed clinical data and examined MRI scans of each patient, then “manually drew outlines around the suspected cancerous areas, aiming to encapsulate all significant disease,” the press release states. “Then, after waiting for at least four weeks, they reexamined the same cases, this time using AI software to assist them in identifying the cancerous areas.”
The researchers analyzed the physicians’ work, evaluating the accuracy of the cancer margins and the “negative margin rate,” indicating whether the clinicians had identified all of the cancerous tissue. Using conventional approaches, “doctors only achieved a negative margin 1.6% of the time,” the press release states. “When assisted by AI the number increased to 72.8%.”
The clinicians’ accuracy was 84.7% when assisted by AI versus 67.2% to 75.9% for conventional techniques.
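To make the “negative margin” criterion described above concrete, here is a minimal sketch of the underlying check: a clinician’s contour earns a negative margin only if no cancerous tissue falls outside it. The arrays and values are invented, not the study’s data.

```python
# Toy illustration of the negative-margin check: a drawn contour qualifies
# only if it captures every cancerous region, leaving no tumor outside.
import numpy as np

truth = np.zeros((8, 8), dtype=bool)    # ground-truth cancer map (pathology)
truth[2:5, 2:5] = True

contour = np.zeros((8, 8), dtype=bool)  # clinician's drawn margin
contour[1:6, 1:6] = True

negative_margin = not (truth & ~contour).any()  # no cancer outside the contour
print(f"Negative margin achieved: {negative_margin}")
```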
They also found that clinicians who used the AI software were more likely to recommend focal therapy over more aggressive forms of treatment.
“We saw the use of AI assistance made doctors both more accurate and more consistent, meaning doctors tended to agree more when using AI assistance,” said Avenda Health co-founder and CEO Shyam Natarajan, PhD, who was senior author of the study.
“These results demonstrate a marked change in how physicians will be able to diagnose and recommend treatment for prostate cancer patients,” said Natarajan in a company press release. “By increasing the confidence in which we can predict a tumor’s margins, patients and their doctors will have increased certainty that their entire tumor is treated and with the appropriate intervention in correlation to the severity of their case.”
UCLA’s study found that AI can outperform doctors in both sensitivity (detecting a higher share of true cancers) and specificity (correctly identifying non-cancerous samples as negative). That makes this technology worth watching as development continues.
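For readers who want those two terms pinned down, the short sketch below computes both metrics from a confusion matrix. The counts are invented for illustration; they are not the study’s data.

```python
# Minimal refresher on the two metrics named above, with hypothetical counts.
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: share of actual cancers that are detected."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True-negative rate: share of non-cancerous samples correctly cleared."""
    return tn / (tn + fp)

tp, fn, tn, fp = 45, 5, 38, 12   # invented counts
print(f"Sensitivity: {sensitivity(tp, fn):.1%}")   # 90.0%
print(f"Specificity: {specificity(tn, fp):.1%}")   # 76.0%
```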
Pathologists and clinical laboratory managers should consider this use of AI as one more example of how artificial intelligence can be incorporated into diagnostic tests in ways that allow medical laboratory professionals to diagnose disease earlier and more accurately. This will improve patient care because early intervention for most diseases leads to better outcomes.
Research results call into question the safety and dependability of using artificial intelligence in medical diagnosis, a development that should be watched by clinical laboratory scientists
ChatGPT, an artificial intelligence (AI) chatbot that returns answers to written prompts, has been tested and found wanting by researchers at the University of Florida College of Medicine (UF Health) who looked into how well it could answer typical patient questions on urology. The verdict: not good enough, according to the researchers who conducted the study.
AI is quickly becoming a powerful new tool in diagnosis and medical research. Some digital pathologists and radiologists use it for data analysis and to speed up diagnostic modality readings. It’s even been said that AI will improve how physicians treat disease. But with all new discoveries there comes controversy, and that’s certainly the case with AI in healthcare.
Many voices in opposition to AI’s use in clinical medicine claim the technology is too new and cannot be trusted with patients’ health. Now, UF Health’s study seems to have confirmed that belief—at least with ChatGPT.
The study revealed that answers ChatGPT provided “fell short of the standard expected of physicians,” according to a UF Health news release, which called ChatGPT’s answers “flawed.”
The questions posed were considered to be common medical questions that patients would ask during a visit to a urologist.
The researchers believe their study is the first of its kind to focus on AI and the urology specialty, one that “highlights the risk of asking AI engines for medical information even as they grow in accuracy and conversational ability,” UF Health noted in the news release.
“I am not discouraging people from using chatbots,” said Russell S. Terry, MD (above), an assistant professor in the UF College of Medicine’s department of urology and the study’s senior author, in a UF Health news release. “But don’t treat what you see as the final answer. Chatbots are not a substitute for a doctor.” Pathologists and clinical laboratory managers will want to monitor how developers improve the performance of chatbots and other applications using artificial intelligence. (Photo copyright: University of Florida.)
UF Health ChatGPT Study Details
UF Health’s study featured 13 of the most queried topics from patients to their urologists during office visits. The researchers asked ChatGPT each question three times “since ChatGPT can formulate different answers to identical queries,” they noted in the news release.
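The paper does not describe the querying mechanics beyond that three-repetition protocol, but a minimal sketch of the protocol, assuming programmatic access through the OpenAI Python SDK rather than the ChatGPT web interface the researchers likely used, might look like this:

```python
# Hypothetical sketch of the "ask each question three times" protocol.
# Requires: pip install openai, and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Example question on one topic the study covered; exact wording is invented.
question = "What are the treatment options for an overactive bladder?"

answers = []
for trial in range(3):  # same prompt, three independent completions
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # ChatGPT-era model; the choice is illustrative
        messages=[{"role": "user", "content": question}],
    )
    answers.append(response.choices[0].message.content)

# Reviewers would then grade each answer independently, as the UF Health
# urologists did against specialty-society guidelines.
for i, answer in enumerate(answers, start=1):
    print(f"--- Trial {i} ---\n{answer}\n")
```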
The urological conditions the questions covered included infertility, overactive bladder, hypogonadism, recurrent urinary tract infections (UTIs) in women, and Peyronie’s disease, among other topics.
The researchers then “evaluated the answers based on guidelines produced by the three leading professional groups for urologists in the United States, Canada, and Europe, including the American Urological Association (AUA). Five UF Health urologists independently assessed the appropriateness of the chatbot’s answers using standardized methods,” UF Health noted.
Notably, many of the results were inaccurate. According to UF Health, only 60% of the 39 evaluated responses were deemed appropriate. Beyond those results, the researchers noted in their Urology paper, “[ChatGPT] misinterprets clinical care guidelines, dismisses important contextual information, conceals its sources, and provides inappropriate references.”
When asked, ChatGPT was for the most part unable to accurately provide the sources it referenced for its answers. Apparently, the chatbot was not programmed to provide such sources, the UF Health news release stated.
“It provided sources that were either completely made up or completely irrelevant,” Terry noted in the news release. “Transparency is important so patients can assess what they’re being told.”
Further, “Only 7 (54%) of 13 topics and 21 (54%) of 39 responses met the BD [Brief DISCERN] cut-off score of ≥16 to denote good-quality content,” the researchers wrote in their paper. BD is a validated healthcare information assessment questionnaire that “provides users with a valid and reliable way of assessing the quality of written information on treatment choices for a health problem,” according to the DISCERN website.
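Assuming the instrument’s standard form (six items, each rated 1 to 5, for totals between 6 and 30), the BD cutoff check reduces to a simple sum. The per-item ratings below are invented for illustration.

```python
# Hedged sketch of Brief DISCERN (BD) scoring, assuming the standard
# six-item, 1-to-5 form; a total >= 16 denotes good-quality content.
def brief_discern_total(item_scores: list[int]) -> int:
    assert len(item_scores) == 6 and all(1 <= s <= 5 for s in item_scores)
    return sum(item_scores)

ratings = [3, 2, 4, 2, 3, 1]  # hypothetical ratings for one chatbot response
total = brief_discern_total(ratings)
print(f"BD total: {total} -> {'meets' if total >= 16 else 'below'} the cut-off")
```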
ChatGPT often “omitted key details or incorrectly processed their meaning, as it did by not recognizing the importance of pain from scar tissue in Peyronie’s disease. As a result … the AI provided an improper treatment recommendation,” the UF Health study paper noted.
Is Using ChatGPT for Medical Advice Dangerous to Patients?
Terry noted that the chatbot performed better in some areas than others, such as infertility, overactive bladder, and hypogonadism. However, frequently recurring UTIs in women was one topic for which ChatGPT consistently gave incorrect answers.
“One of the more dangerous characteristics of chatbots is that they can answer a patient’s inquiry with all the confidence of a veteran physician, even when completely wrong,” UF Health reported.
“In only one of the evaluated responses did the AI note it ‘cannot give medical advice’ … The chatbot recommended consulting with a doctor or medical adviser in only 62% of its responses,” UF Health noted.
For their part, ChatGPT’s developers “tell users the chatbot can provide bad information and warn users after logging in that ChatGPT ‘is not intended to give advice,’” UF Health added.
Future of Chatbots in Healthcare
In UF Health’s Urology paper, the researchers state, “Chatbot models hold great promise, but users should be cautious when interpreting healthcare-related advice from existing AI models. Additional training and modifications are needed before these AI models will be ready for reliable use by patients and providers.”
UF Health conducted its study in February 2023. Thus, the news release points out, results could be different now due to ChatGPT updates. Nevertheless, Terry urges users to get second opinions from their doctors.
“It’s always a good thing when patients take ownership of their healthcare and do research to get information on their own,” he said in the news release. “But just as when you use Google, don’t accept anything at face value without checking with your healthcare provider.”
That’s always good advice. Still, UF Health notes that “While this and other chatbots warn users that the programs are a work in progress, physicians believe some people will undoubtedly still rely on them.” Time will tell whether trusting AI for medical advice turns out well for those patients.
The study reported above is a useful warning to clinical laboratory managers and pathologists that the current technologies used in ChatGPT and similar AI-powered solutions have not yet achieved the accuracy and reliability of trained medical diagnosticians when answering common questions patients ask about different health conditions.