News, Analysis, Trends, Management Innovations for
Clinical Laboratories and Pathology Groups

Hosted by Robert Michel


AI Cancer Diagnostics Struggle with Equity, Study Finds

Accuracy gaps in pathology AI affecting nearly 30% of diagnostic tasks highlight risks for clinical decision-making and patient outcomes, according to new research.

A new study is raising important questions for pathologists as artificial intelligence (AI) becomes more embedded in diagnostic workflows. Researchers report that AI systems used to interpret pathology slides for cancer diagnosis do not perform equally across all patient populations, with accuracy varying by race, gender, and age. The findings highlight why pathologists, who rely on objective tissue evaluation to guide treatment decisions, need to understand how bias can enter AI tools designed to support their work.

The study, published in Cell Reports Medicine, shows that pathology AI models can extract demographic information directly from tissue images, even though such details are invisible to human experts. That capability can influence diagnostic performance and potentially reinforce disparities in cancer care if left unaddressed.
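To give a concrete sense of how this kind of leakage is typically measured, the sketch below fits a simple linear probe to embeddings from a frozen diagnostic model; if the probe predicts a demographic attribute well above chance, the embeddings carry demographic signal that a diagnostic head could exploit. The arrays here are synthetic placeholders, not the study's data or code.

```python
# Sketch: probing whether a pathology model's embeddings encode demographics.
# `embeddings` stands in for per-slide feature vectors from a frozen model;
# `group_labels` stands in for demographic annotations. Both are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 128))     # placeholder slide features
group_labels = rng.integers(0, 2, size=500)  # placeholder binary attribute

# A linear probe: AUC well above 0.50 would mean the embeddings
# encode demographic information, even if no human can see it.
probe = LogisticRegression(max_iter=1000)
scores = cross_val_score(probe, embeddings, group_labels,
                         cv=5, scoring="roc_auc")
print(f"demographic probe AUC: {scores.mean():.2f} (chance = 0.50)")
```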

“Reading demographics from a pathology slide is thought of as a ‘mission impossible’ for a human pathologist, so the bias in pathology AI was a surprise to us,” said senior author Kun-Hsing Yu, associate professor of biomedical informatics at Harvard Medical School and assistant professor of pathology at Brigham and Women’s Hospital. Yu emphasized that identifying and correcting bias is critical because AI-driven errors can affect diagnostic accuracy and downstream patient outcomes.

Testing Pathology AI Reveals Widespread Diagnostic Gaps

To assess the scope of the problem, Yu and his colleagues evaluated four commonly used deep-learning models under development for cancer diagnosis. These systems are trained on large collections of labeled pathology slides, learning visual patterns associated with disease that can then be applied to new samples. The team tested the models using a large, multi-institutional dataset spanning 20 cancer types.

Across all four models, the researchers found consistent performance gaps linked to patient demographics. Diagnostic accuracy was lower for certain groups defined by race, gender, and age. For example, the models struggled to distinguish lung cancer subtypes in African American patients and in male patients. They also showed reduced accuracy when classifying breast cancer subtypes in younger patients, and lower detection performance for breast, renal, thyroid, and stomach cancers in specific demographic groups. Overall, these disparities appeared in roughly 29% of the diagnostic tasks analyzed.
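A subgroup disparity audit of this kind can be run with standard tooling. The sketch below computes per-subgroup AUC on a held-out test set and reports the largest between-group gap; the arrays are synthetic stand-ins for real predictions and demographic annotations, and the study's exact metrics may differ.

```python
# Sketch: auditing diagnostic performance by demographic subgroup.
# `y_true`, `y_score`, and `group` are hypothetical arrays; in practice they
# would come from a held-out test set with demographic annotations.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 1000
group = rng.choice(["A", "B"], size=n)  # demographic subgroup label
y_true = rng.integers(0, 2, size=n)     # ground-truth diagnosis
y_score = rng.random(n)                 # model's predicted probability

aucs = {}
for g in np.unique(group):
    mask = group == g
    aucs[g] = roc_auc_score(y_true[mask], y_score[mask])
    print(f"group {g}: AUC = {aucs[g]:.3f}")

# A simple disparity measure: the largest between-group AUC gap.
gap = max(aucs.values()) - min(aucs.values())
print(f"max subgroup AUC gap: {gap:.3f}")
```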

The findings were unexpected, Yu said, because pathology has long been considered one of the most objective areas of medicine. “Because we would expect pathology evaluation to be objective,” he said, “when evaluating images, we don’t necessarily need to know a patient’s demographics to make a diagnosis.” The results raised a fundamental question for the research team: why were AI systems failing to meet the same standard of objectivity expected of human pathologists?

Further analysis revealed three main contributors to bias in pathology AI. One factor is uneven training data. Pathology samples are often easier to obtain from some populations than others, resulting in imbalanced datasets that make accurate diagnosis more difficult for underrepresented groups. But Yu noted that data imbalance alone did not fully explain the observed disparities. “The problem turned out to be much deeper than that,” he said.

From Demographic Shortcuts to Fairer Diagnosis

Differences in disease incidence also play a role. Some cancers occur more frequently in certain populations, allowing AI models to become highly accurate for those groups while struggling in populations where those diseases are less common. In addition, the models appear capable of detecting subtle molecular and biological differences linked to demographics, such as mutations in cancer driver genes.

Kun-Hsing Yu (Photo credit: Harvard Medical School) 

Yu noted, “We found that because AI is so powerful, it can differentiate many obscure biological signals that cannot be detected by standard human evaluation.”

When models rely on these demographic-linked signals as shortcuts, accuracy can suffer across diverse patient groups.

To address these issues, the researchers developed a new framework called FAIR-Path. Built on a machine-learning approach known as contrastive learning, FAIR-Path trains models to focus on clinically meaningful differences—such as distinctions between cancer types—while minimizing attention to less relevant features, including demographic characteristics.
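The article does not reproduce FAIR-Path’s code, but the underlying contrastive idea can be illustrated with a generic supervised contrastive loss in which positives are pairs of the same cancer type, so class signal, rather than demographic signal, dominates the learned embedding. The function below is a minimal PyTorch sketch of that general approach, not the authors’ implementation.

```python
# Sketch of the general contrastive idea FAIR-Path is described as using:
# pull embeddings of the same cancer type together, regardless of the
# patient's demographic group. Illustrative only, not FAIR-Path's code.
import torch
import torch.nn.functional as F

def class_contrastive_loss(embeddings, class_labels, temperature=0.1):
    """Supervised contrastive loss: positives are same-class pairs."""
    z = F.normalize(embeddings, dim=1)            # unit-norm embeddings
    sim = z @ z.T / temperature                   # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-pairs

    # Positives: other samples with the same cancer-type label.
    pos = (class_labels.unsqueeze(0) == class_labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Average log-probability over each anchor's same-class positives.
    pos_log_prob = torch.where(pos, log_prob, torch.zeros_like(log_prob))
    loss = -pos_log_prob.sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()

# Toy batch: 8 embeddings, 2 cancer types.
emb = torch.randn(8, 32, requires_grad=True)
labels = torch.tensor([0, 0, 1, 1, 0, 1, 0, 1])
print(class_contrastive_loss(emb, labels))
```

Because the loss never sees demographic labels as targets, the model is pushed to organize its embedding space by diagnosis rather than by demographic-linked shortcuts, which is consistent with the behavior the researchers describe.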

When applied to the tested models, FAIR-Path reduced diagnostic disparities by about 88%. “We show that by making this small adjustment, the models can learn robust features that make them more generalizable and fairer across different populations,” Yu said. Importantly, the improvement did not require perfectly balanced training datasets.

For pathologists, the findings underscore why careful evaluation of AI tools is essential as these technologies move closer to routine clinical use. The authors are now working with institutions worldwide to study pathology AI bias in different regions and clinical settings, and to adapt FAIR-Path for use in data-limited environments.

Finally, Yu said, the goal is not to replace human expertise, but to support it. “I think there’s hope that if we are more aware of and careful about how we design AI systems, we can build models that perform well in every population,” he said. For pathologists, the study reinforces the importance of remaining actively involved in how AI is developed, validated, and deployed, so that these tools enhance diagnostic confidence and equity, rather than introducing new sources of error into cancer care.

—Janette Wider
