Proof-of-concept study ‘highlights that using AI to integrate different types of clinically informed data to predict disease outcomes is feasible,’ researchers say
Artificial intelligence (AI) and machine learning are—in stepwise fashion—making progress in demonstrating value in the world of pathology diagnostics. But human anatomic pathologists are generally required for a prognosis. Now, in a proof-of-concept study, researchers at Brigham and Women’s Hospital in Boston have developed a method that uses AI models to integrate multiple types of data from disparate sources to accurately predict patient outcomes for 14 different types of cancer.
The process also uncovered “the predictive bases of features used to predict patient risk—a property that could be used to uncover new biomarkers,” according to Genetic Engineering and Biotechnology News (GEN).
Should these research findings become clinically viable, anatomic pathologists may gain powerful new AI tools specifically designed to help them predict what type of outcome a cancer patient can expect.
“Experts analyze many pieces of evidence to predict how well a patient may do. These early examinations become the basis of making decisions about enrolling in a clinical trial or specific treatment regimens,” said Faisal Mahmood, PhD (above) in a Brigham press release. “But that means that this multimodal prediction happens at the level of the expert. We’re trying to address the problem computationally,” he added. Should they be proven clinically viable through additional studies, these findings could lead to useful tools that help anatomic pathologists and clinical laboratory scientists more accurately predict what type of outcomes cancer patients may experience. (Photo copyright: Harvard.)
AI-based Prognostics in Pathology and Clinical Laboratory Medicine
The team at Brigham constructed their AI model using The Cancer Genome Atlas (TCGA), a publicly available resource which contains data on many types of cancer. They then created a deep learning-based algorithm that examines information from different data sources.
Pathologists traditionally depend on several distinct sources of data, such as pathology images, genomic sequencing, and patient history, to diagnose various cancers and help develop prognoses.
For their research, Mahmood and his colleagues trained and validated their AI algorithm on 6,592 H&E (hematoxylin and eosin) whole slide images (WSIs) from 5,720 cancer patients. Molecular profile features, which included mutation status, copy-number variation, and RNA sequencing expression, were also fed into the model to measure and explain relative risk of cancer death.
The scientists “evaluated the model’s efficacy by feeding it data sets from 14 cancer types as well as patient histology and genomic data. Results demonstrated that the models yielded more accurate patient outcome predictions than those incorporating only single sources of information,” states a Brigham press release.
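As a rough illustration of what "integrating" two data types can mean, the sketch below joins a few image-derived numbers with a few molecular numbers into one vector and turns it into a relative risk score. It is not the Brigham team's deep learning model. The function names, the feature values, and the weights are all hypothetical, and the exponential (Cox-style) risk formula is an assumption chosen for simplicity.

```python
import math

def fuse(histology_features, molecular_features):
    """Combine two data sources by simple concatenation (an assumed, simplified fusion)."""
    return list(histology_features) + list(molecular_features)

def relative_risk(features, weights):
    """Cox-style relative risk: the exponential of a weighted sum of the features."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return math.exp(sum(f * w for f, w in zip(features, weights)))

# Hypothetical, made-up numbers for one patient.
histology = [0.2, 0.7]        # e.g. summary scores derived from a whole slide image
molecular = [1.0, 0.0, 0.5]   # e.g. mutation flag, copy-number flag, expression level
weights = [0.5, -0.3, 0.8, 0.1, 0.4]  # a real model would learn these from data

joint = fuse(histology, molecular)
risk = relative_risk(joint, weights)
print(f"{len(joint)} fused features, relative risk {risk:.2f}")
```

A score above 1 would indicate higher risk than the baseline patient and a score below 1 lower risk. Comparing such scores against observed outcomes is how a multimodal model could be judged against single-source models.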
“This work sets the stage for larger healthcare AI studies that combine data from multiple sources,” said Faisal Mahmood, PhD, Associate Professor, Division of Computational Pathology, Brigham and Women’s Hospital; and Associate Member, Cancer Program, Broad Institute of MIT and Harvard, in the press release. “In a broader sense, our findings emphasize a need for building computational pathology prognostic models with much larger datasets and downstream clinical trials to establish utility.”
Future Prognostics Based on Multiple Data Sources
The Brigham researchers also generated a research tool they dubbed the Pathology-omics Research Platform for Integrative Survival Estimation (PORPOISE). This tool serves as an interactive platform that can yield prognostic markers detected by the algorithm for thousands of patients across various cancer types.
The researchers believe their algorithm reveals another role for AI technology in medical care, but they say more research is needed before their model can be implemented clinically. Larger data sets will have to be examined, and the researchers plan to use more types of patient information, such as radiology scans, family histories, and electronic medical records, in future tests of their AI technology.
“Future work will focus on developing more focused prognostic models by curating larger multimodal datasets for individual disease models, adapting models to large independent multimodal test cohorts, and using multimodal deep learning for predicting response and resistance to treatment,” the Cancer Cell paper states.
“As research advances in sequencing technologies, such as single-cell RNA-seq, mass cytometry, and spatial transcriptomics, these technologies continue to mature and gain clinical penetrance, in combination with whole-slide imaging, and our approach to understanding molecular biology will become increasingly spatially resolved and multimodal,” the researchers concluded.
Anatomic pathologists may find the Brigham and Women’s Hospital research team’s findings intriguing. An AI tool that integrates data from disparate sources, analyzes that information, and provides useful insights, could one day help them provide more accurate cancer prognoses and improve the care of their patients.
Researchers say their method can trace ancestry back 100,000 years and could lay groundwork for identifying new genetic markers for diseases that could be used in clinical laboratory tests
Cheaper, faster, and more accurate genomic sequencing technologies are deepening scientific knowledge of the human genome. Now, UK researchers at the University of Oxford have used this genomic data to create the largest-ever human family tree, enabling individuals to trace their ancestry back 100,000 years. And, they say, it could lead to new methods for predicting disease.
This new database also will enable genealogists and medical laboratory scientists to track when, where, and in what populations specific genetic mutations emerged that may be involved in different diseases and health conditions.
New Genetic Markers That Could Be Used for Clinical Laboratory Testing
As this happens, it may be possible to identify new diagnostic biomarkers and genetic indicators associated with specific health conditions that could be incorporated into clinical laboratory tests and precision medicine treatments for chronic diseases.
“We have basically built a huge family tree—a genealogy for all of humanity—that models as exactly as we can the history that generated all the genetic variation we find in humans today,” said Yan Wong, DPhil, an evolutionary geneticist at the Big Data Institute (BDI) at the University of Oxford, in a news release. “This genealogy allows us to see how every person’s genetic sequence relates to every other, along all the points of the genome.”
Researchers from the University of Oxford’s BDI, in collaboration with scientists from the Broad Institute of MIT and Harvard, Harvard University, and the University of Vienna, Austria, developed algorithms for combining different databases and scaling to accommodate millions of gene sequences from both ancient and modern genomes.
“Essentially, we are reconstructing the genomes of our ancestors and using them to form a series of linked evolutionary trees that we call a ‘tree sequence,’” said geneticist Anthony Wilder Wohns, PhD (above), in the Oxford news release. Wohns, a postdoctoral researcher in statistical and population genetics at the Broad Institute, led the study. “We can then estimate when and where these ancestors lived. The power of our approach is that it makes very few assumptions about the underlying data and can also include both modern and ancient DNA samples.” The study may result in new genetic biomarkers that lead to advances in clinical laboratory diagnostics for today’s diseases. (Photo copyright: Harvard School of Engineering and Applied Sciences.)
Tracking Genetic Markers of Disease
The BDI team overcame the major obstacle to tracing the origins of human genetic diversity when they developed algorithms to handle the massive amount of data created when combining genome sequences from many different databases. In total, they compiled the genomic sequences of 3,601 modern and eight high-coverage ancient people from 215 populations in eight datasets.
The ancient genomes included three Neanderthal genomes, a Denisovan genome, and a family of four people who lived in Siberia around 4,600 years ago.
The University of Oxford researchers noted in their news release that their method could be scaled to “accommodate millions of genome sequences.”
“This structure is a lossless and compact representation of 27 million ancestral haplotype fragments and 231 million ancestral lineages linking genomes from these datasets back in time. The tree sequence also benefits from the use of an additional 3,589 ancient samples compiled from more than 100 publications to constrain and date relationships,” the researchers wrote in their published study.
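The idea of a "tree sequence" can be sketched with a toy data structure in which each parent-child link is stored once, together with the stretch of genome over which it holds. The example below is an assumption-laden illustration of that layout only, not the researchers' inference algorithm. The node numbers, the 100-position genome, and the function names are hypothetical.

```python
from typing import Dict, List, NamedTuple

class Edge(NamedTuple):
    left: int    # start of the genomic interval (inclusive)
    right: int   # end of the genomic interval (exclusive)
    parent: int  # ancestral node
    child: int   # descendant node

# Toy data: samples 0, 1, 2 and ancestors 3, 4 on a genome of length 100.
# The link 3 -> 0 holds along the whole genome, so it is stored only once.
EDGES = [
    Edge(0, 100, 3, 0),
    Edge(0, 50, 3, 1),
    Edge(50, 100, 3, 2),
    Edge(0, 100, 4, 3),
    Edge(0, 50, 4, 2),
    Edge(50, 100, 4, 1),
]

def local_tree(edges: List[Edge], position: int) -> Dict[int, int]:
    """Return the child -> parent map of the tree that applies at one genome position."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}

def mrca(edges: List[Edge], a: int, b: int, position: int) -> int:
    """Most recent common ancestor of nodes a and b at the given position."""
    tree = local_tree(edges, position)
    ancestors_of_a = {a}
    node = a
    while node in tree:
        node = tree[node]
        ancestors_of_a.add(node)
    node = b
    while node not in ancestors_of_a:
        node = tree[node]  # raises KeyError if the two nodes share no ancestor
    return node

print(mrca(EDGES, 0, 1, 10))  # samples 0 and 1 share ancestor 3 in the left half
print(mrca(EDGES, 0, 1, 75))  # in the right half they only meet at ancestor 4
```

Because neighboring trees share most of their edges, storing intervals rather than whole trees is what makes this kind of representation compact.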
Wong believes his research team has laid the groundwork for the next generation of DNA sequencing.
“As the quality of genome sequences from modern and ancient DNA samples improves, the tree will become even more accurate and we will eventually be able to generate a single, unified map that explains the descent of all the human genetic variation we see today,” he said in the news release.
Developing New Clinical Laboratory Biomarkers for Modern Diagnostics
In a video illustrating the study’s findings, evolutionary geneticist Yan Wong, DPhil, a member of the BDI team, said, “If you wanted to know why some people have some sort of medical conditions, or are more predisposed to heart attacks or, for example, are more susceptible to coronavirus, then there’s a huge amount of that described by their ancestry because they’ve inherited their DNA from other people.”
Wohns agrees that the significance of their tree-recording methods extends beyond simply a better understanding of human evolution.
“[This study] could be particularly beneficial in medical genetics, in separating out true associations between genetic regions and diseases from spurious connections arising from our shared ancestral history,” he said.
The underlying methods developed by Wohns’ team could have widespread applications in medical research and lay the groundwork for identifying genetic predictors of disease risk, including future pandemics.
Clinical laboratory scientists will also note that those genetic indicators may become new biomarkers for clinical laboratory diagnostics for all sorts of diseases currently plaguing mankind.
With improved genetic sequencing comes larger human genome databases that could lead to new diagnostic and therapeutic biomarkers for clinical laboratories
As the COVID-19 pandemic grabbed headlines, the human genome database at the US Department of Veterans Affairs Million Veterans Program (MVP) quietly grew. Now, this wealth of genomic information—as well as data from other large-scale genomic and genetic collections—is expected to produce new biomarkers for clinical laboratory diagnostics and testing.
In December, cancer genomics company Personalis, Inc. (NASDAQ:PSNL) of Menlo Park, Calif., achieved a milestone and delivered its 100,000th whole human genome sequence to the MVP, according to a news release, which also states that Personalis is the sole sequencing provider to the MVP.
The VA’s MVP program, which started in 2011, has 850,000 enrolled veterans and is expected to eventually involve two million people. The VA’s aim is to explore the role genes, lifestyle, and military experience play in health and human illness, notes the VA’s MVP website.
Health conditions affecting veterans the MVP is researching include:
The VA has contracted with Personalis through September 2021, and has invested $175 million, Clinical OMICS reported. Personalis has earned approximately $14 million from the VA. That’s about 76% of the company’s revenue, according to second-quarter data, Clinical OMICS noted.
“The VA MVP is the largest whole genome sequencing project in the United States, and this is a significant milestone for both the program and for Personalis,” said John West (above with wife Judy), Founder and CEO of Personalis, in the news release. “Population-scale sequencing projects of this nature represent a cornerstone in our effort to accelerate the advancement of precision medicine across a wide range of disease areas,” he added. (Photo copyright: MIT Technology Review.)
Database of Veterans’ Genomes Used in Current Research
What has the VA gained from their investment so far? An MVP fact sheet states researchers are tapping MVP data for these and other veteran health-related studies:
Differentiating between prostate cancer tumors that require treatment and others that are slow-growing and not life-threatening.
How genetics drives obesity, diabetes, and heart disease.
How data in DNA translates into actual physiological changes within the body.
Gene variations and patients’ response to warfarin.
NIH Research Program Studies Effects of Genetics on Health
Another research program, the National Institutes of Health’s All of Us study, recently began returning results to its participants who provided blood, urine, and/or saliva samples. The NIH aims to aid research into health outcomes influenced by genetics, environment, and lifestyle, explained a news release. The program, launched in 2018, has biological samples from more than 270,000 people with a goal of one million participants.
“We’re changing the paradigm for research. Participants are our most important partners in this effort, and we know many of them are eager to get their genetic results and learn about the science they’re making possible,” said Josh Denny, MD, CEO of the NIH’s All of Us research program in the news release. Denny, a physician scientist, was Professor of Biomedical Informatics and Medicine, Director of the Center for Precision Medicine and Vice President for Personalized Medicine at Vanderbilt University Medical Center prior to joining the NIH. (Photo copyright: National Institutes of Health.)
Inclusive Data Could Aid Precision Medicine
The news release notes that more than 80% of biological samples in the All of Us database come from people in communities that have been under-represented in biomedical research.
“We need programs like All of Us to build diverse datasets so that research findings ultimately benefit everyone,” said Brad Ozenberger, PhD, All of Us Genomics Program Director, in the news release.
Precision medicine designed for specific healthcare populations is a goal of the All of Us program.
“[All of Us is] beneficial to all Americans, but actually beneficial to the African American race because a lot of research and a lot of medicines that we are taking advantage of today, [African Americans] were not part of the research,” Chris Crawford, All of Us Research Study Navigator, told the Birmingham Times. “As [the All of Us study] goes forward and we get a big diverse group of people, it will help as far as making medicine and treatment that will be more precise for us,” he added.
Large Databases Could Advance Care
Genome sequencing technology continues to improve. It is faster, less complicated, and cheaper to sequence a whole human genome than ever before. And the resulting sequence is more accurate.
Thus, as human genome sequencing databases grow, researchers are deriving useful scientific insights from the data. This is relevant for clinical laboratories because the new insights from studying bigger databases of genomic information will produce new diagnostic and therapeutic biomarkers that can be the basis for new clinical laboratory tests as well as useful diagnostic assays for anatomic pathologists.
The key to success with pooled testing, says the lab’s director, is having the right personnel and equipment, and an LIS that supports the added steps
Experts believe pooled testing for COVID-19 could reduce the number of standard SARS-CoV-2 tests a lab must run, conserving testing resources and cutting lab spending on tests and testing supplies. However, some clinical laboratories have found that pooled testing creates inefficiencies because of staff shortages, the limits of existing equipment and biosafety hood space, and the lack of a laboratory information system (LIS) that can manage the large volume of specimens and retesting involved in pooled testing.
One such example is the microbiology lab at 562-bed University of Vermont Medical Center (UVMC) in Burlington, Vt. After evaluating the pooled-testing method, Christina M. Wojewoda, MD, pathologist, Director of Clinical Microbiology at UVMC and an Associate Professor at the Larner College of Medicine at University of Vermont, decided last summer not to do pooled testing, due to the manual steps that the process requires.
The manual steps include having clinical laboratory scientists work under protective hoods to limit the virus’ spread, and both hood space and med techs are in short supply at UVMC, she explained during an exclusive interview with The Dark Report, Dark Daily’s sister publication.
“Our evaluation then is the same as it is now,” she commented. “The barriers to pooling still hold true. Instead of pooling, we keep up with the volume of COVID-19 samples by balancing in-house SARS-CoV-2 testing and send-out testing.”
Low Viral Load a Problem in Pooled Testing for SARS-CoV-2
Another problem, Wojewoda added, is when one patient’s sample in a pool of specimens has a low viral load of SARS-CoV-2. Clinical labs in some states have found that when the prevalence of the novel coronavirus in the population is below 5%, then pooled testing could be an effective testing strategy. However, although Vermont has a relatively low presence of the COVID-19 virus in the population, Wojewoda remains concerned about the viral load in a pooled sample.
“For us, it is less of an issue with prevalence in the population than an issue with low viral load in one patient sample, and that can happen with any prevalence level,” she said. “If there is a low level of virus in one sample, and that sample is combined with samples from four other patients to create the pool, you could dilute the virus below the assay’s level of detection. That means you could miss low-level positive patients.
“When we first considered pooling, we worried about missing those patients, but since then we’ve learned more about the SARS-CoV-2 virus,” she continued. “Now we know that patients start producing high levels of virus quickly and that low virus levels often occur toward the end of their infection, after they’ve probably been tested or identified.
“That means we’re less concerned with low levels of virus now than we were initially, at least when pooling five specimens in one tube. But it’s still something to watch for,” she noted.
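The dilution concern Wojewoda describes comes down to simple arithmetic, sketched below. The sketch assumes equal volumes from each patient, that the other samples in the pool contain no virus, and a hard detection cutoff. The concentration units, the limit of detection of 1,000, and the sample value of 3,000 are hypothetical numbers chosen only for illustration.

```python
def pooled_concentration(sample_concentration: float, pool_size: int) -> float:
    """Concentration after mixing one positive sample with (pool_size - 1) negative ones."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    return sample_concentration / pool_size

def is_detected(concentration: float, limit_of_detection: float) -> bool:
    """A simplified assay: anything at or above the limit of detection is called positive."""
    return concentration >= limit_of_detection

LIMIT_OF_DETECTION = 1000.0  # hypothetical assay threshold, arbitrary units
weak_positive = 3000.0       # hypothetical low-level positive sample

tested_alone = is_detected(weak_positive, LIMIT_OF_DETECTION)
tested_in_pool = is_detected(pooled_concentration(weak_positive, 5), LIMIT_OF_DETECTION)
print(tested_alone, tested_in_pool)
```

In this made-up case the sample is positive when tested alone, but a five-sample pool dilutes it to 600 units, which falls under the threshold and would be reported as negative.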
What About Too Much Virus?
The opposite of this problem also is a concern. If the incidence of infection is too high in a population, then pooled testing could produce too many positive results. The required retesting then makes the process inefficient.
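The trade-off between prevalence and retesting can be put into numbers with the standard formula for two-stage pooling (often called Dorfman pooling). The sketch below assumes infections are independent, the assay is perfectly accurate, and every sample in a positive pool is retested individually. The function name is hypothetical, and the 30% prevalence figure is a made-up value used to show the high-prevalence case.

```python
def expected_tests_per_person(prevalence: float, pool_size: int) -> float:
    """Average number of tests per patient under two-stage pooling.

    Every patient shares one pooled test (1 / pool_size tests each). If the pool
    is positive, which happens with probability 1 - (1 - prevalence) ** pool_size,
    each patient also needs one individual retest.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    if pool_size < 2:
        raise ValueError("pool_size must be at least 2")
    prob_pool_negative = (1.0 - prevalence) ** pool_size
    return 1.0 / pool_size + (1.0 - prob_pool_negative)

for p in (0.01, 0.022, 0.05, 0.30):
    tests = expected_tests_per_person(p, 5)
    print(f"prevalence {p:.1%}: {tests:.3f} tests per person")
```

With pools of five, the result is roughly 0.25 tests per person at 1% prevalence, 0.31 at 2.2%, and 0.43 at 5%. At 30% it climbs to about 1.03, meaning pooling would use more tests than testing everyone individually.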
Wojewoda has heard similar concerns from her colleagues at other medical laboratories. They said they were not doing pooled SARS-CoV-2 testing for some of the same reasons.
“When we looked into pooled testing, a number of complications made it impractical,” she said. “Instead, we have been testing each patient individually.”
When patient COVID-19 samples exceed 500 in a day, UVMC sends those specimens to the Broad Institute in Cambridge, Mass., for testing.
During the summer, the rate of COVID-19 infections in Vermont was at about 1%, Wojewoda noted. In the last week of December, the Vermont Department of Health reported the seven-day average percentage of positive tests was 2.2%.
Laboratory Information System Challenges When Doing Pooled Testing
In addition to the level of detection, Wojewoda worried about UVMC’s laboratory information system (LIS). “Clinical laboratories are designed to test one sample and get one result, and that one result goes into one patient’s chart,” she explained. “But when the lab makes a pool of, say, five patients’ samples, those five results need to go into five patients’ charts.”
Wojewoda estimates that manual data entry for each of those results takes a solid minute per sample. “That’s not a lot, but it adds up over time, and it’s not something we do normally.”
Normally, lab test results get filed automatically into the patient’s chart, and then those results are available to patients online, she noted.
“There may be multiple fixes for this problem of accurately and efficiently getting pooled test results into the LIS, then reported to each individual patient, but for us the current state of our computer system requires that we enter each result into each patient’s chart manually. We try not to do that as much as possible because of the potential for errors from manual entry,” she said.
When Automation Falls Short
In addition, Wojewoda said that pooled testing cannot be automated the way most standard clinical laboratory tests are run.
“With routine testing, we put a sample on the instrument and let the test run,” she explained. “When we get the result, it goes into the patient’s chart. But, for pooled testing, we have to collect five samples and then pause to manually put a little bit of each of those five samples into one tube. Then, we put that tube on the instrument.
“After we get the results, we manually report the negative results into each patient’s chart,” she continued. “But if they’re positive, then lab staff must find the five tubes and test each one individually. Therefore, we’re doubling the time it normally takes to produce and report a positive result for SARS-CoV-2.”
Any positive results in a pooled sample, she explained, are held up at the instrument so that the lab staff can pull those five samples from the pool and test each one individually. “Then those individual results go into each patient’s chart, because potentially only one of the five might be positive. We don’t want all five of those patients to be labeled as positive if only one is positive,” she added.
Pooled testing for COVID-19 adds a layer of complexity that the UVMC lab does not normally do, noted the lab’s Director Christina M. Wojewoda, MD (above), a pathologist and Director of Clinical Microbiology at the University of Vermont Medical Center (UVMC) in Burlington, in an interview with The Dark Report. She added that the lab’s staff is already stretched thin and doing as much as possible. “In all these ways, pooled testing is different from how we usually run clinical lab tests. It’s clear that the idea behind pooled testing is to improve efficiency, and yet the need for manual data entry and pulling pooled samples apart create inefficiencies,” she commented. (Photo copyright: University of Vermont.)
Shortage of Lab Techs and Hood Space Compound Inefficiencies of Pooled Testing
Another problem is the requirement to pipette each specimen, she noted. “All infectious samples require hood space and a lab technician to do the work under the hood. But both hood space and lab techs are in short supply.”
Wojewoda explained that some tests at the UVMC lab are not run directly from the primary tube.
“There’s often a step where we take some of the primary sample and put it into a tube or cartridge for the test. Then, we put multiple samples together, and we have to pipette each one into the tube without cross contaminating the other samples,” she explained.
“At the same time, we have to track the five patient samples so that we can find the original specimen for testing if we need to do so later. All those steps take more staff time.
“So, while pooled testing saves reagents, it also takes more staff time for pipetting and data entry and the need to record which samples are in which tubes,” she noted. “That might require a spreadsheet or other electronic means to track which samples come from which patients.
“An automated way to do the pipetting would be helpful and would increase staff safety,” she added. “I worry when we’re working with something as infectious as SARS-CoV-2, because the lab techs must dig swabs out of liquid media before discarding them, while being careful not to contaminate anything around them.”
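The record keeping that pooled testing demands, knowing which samples went into which tube and what to do once the pool result comes back, can be sketched as a small tracking structure. This is a hypothetical illustration and not a description of UVMC's LIS or of any vendor's software. The class name, the sample and pool identifiers, and the result labels are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class PoolTracker:
    """Remembers which patient samples were combined into which pooled tube."""
    pools: Dict[str, List[str]] = field(default_factory=dict)

    def add_pool(self, pool_id: str, sample_ids: List[str]) -> None:
        if pool_id in self.pools:
            raise ValueError(f"pool {pool_id} already exists")
        self.pools[pool_id] = list(sample_ids)

    def resolve(self, pool_id: str, pool_positive: bool) -> Tuple[Dict[str, str], List[str]]:
        """Return (results ready to report, samples that need individual retesting)."""
        samples = self.pools[pool_id]
        if pool_positive:
            # A positive pool says nothing about any single patient: retest all of them.
            return {}, list(samples)
        # A negative pool lets every member be reported as negative.
        return {sample: "negative" for sample in samples}, []

tracker = PoolTracker()
tracker.add_pool("POOL-001", ["S1", "S2", "S3", "S4", "S5"])
tracker.add_pool("POOL-002", ["S6", "S7", "S8", "S9", "S10"])

reportable, retest = tracker.resolve("POOL-001", pool_positive=False)
held, retest_2 = tracker.resolve("POOL-002", pool_positive=True)
print(len(reportable), "results to report;", len(retest_2), "samples to retest")
```

Even in this simplified form, one pooled result fans out into five chart entries or five retests, which is the extra handling the lab would otherwise do by hand.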
Pooled testing for COVID-19 clearly has potential. But, as Wojewoda explained, it brings complications that can cause inefficiencies. Clinical laboratory managers will want to evaluate existing instrumentation, automation, staffing, and laboratory informatics capabilities to determine if and how their labs would experience similar inefficiencies before making a final decision to begin a program of pooled testing for COVID-19.
Gene sequencing is enabling disease tracking in new ways that include retesting laboratory specimens from before the SARS-CoV-2 outbreak to determine when it arrived in the US
On February 26 of this year, nearly 200 executives and employees of neuroscience-biotechnology company Biogen gathered at the Boston Marriott Long Wharf hotel for their annual leadership conference. Unbeknownst to the attendees, by the end of the following day, dozens of them had been exposed to and become infected by SARS-CoV-2, the coronavirus that causes the COVID-19 illness.
Researchers now have hard evidence that attendees at this meeting returned to their communities and spread the infection. The findings of this study will be relevant to pathologists and clinical laboratory managers who are cooperating with health authorities in their communities to identify infected individuals and track the spread of the novel coronavirus.
This “superspreader” event has been closely investigated and has led to intriguing conclusions concerning the use of genetic sequencing to reveal vital information about the COVID-19 pandemic. Recent improvements in gene sequencing technology are giving scientists new ways to trace the spread of COVID-19 and other diseases, as well as a method for monitoring mutations and speeding research into various treatments and vaccines.
Genetic Sequencing Traces an Outbreak
“With genetic data, a record of our poor decisions is being captured in a whole new way,” Bronwyn MacInnis, PhD, Director of Pathogen Genomic Surveillance at the Broad Institute of MIT and Harvard, told The Washington Post (WaPo) during its analysis of the COVID-19 superspreading event. MacInnis is one of many Broad Institute, Harvard, MIT, and state of Massachusetts scientists who co-authored a study that detailed the coronavirus’ spread across Boston, including from the Biogen conference.
What they discovered is both surprising and enlightening. According to WaPo’s report, at least 35 new cases of the virus were linked directly to the Biogen conference, and the same strain was discovered in outbreaks in two homeless shelters in Boston, where 122 people were infected. The variant tracked by the Boston researchers was found in roughly 30% of the cases that have been sequenced in the state, as well as in Alaska, Senegal, and Luxembourg.
“The data reveal over 80 introductions into the Boston area, predominantly from elsewhere in the United States and Europe. We studied two superspreading events covered by the data, events that led to very different outcomes because of the timing and populations involved. One produced rapid spread in a vulnerable population but little onward transmission, while the other was a major contributor to sustained community transmission,” the researchers noted in their study abstract.
“The same two events differed significantly in the number of new mutations seen, raising the possibility that SARS-CoV-2 superspreading might encompass disparate transmission dynamics. Our results highlight the failure of measures to prevent importation into [Massachusetts] early in the outbreak, underscore the role of superspreading in amplifying an outbreak in a major urban area, and lay a foundation for contact tracing informed by genetic data,” they concluded.
The use of genetic sequencing to trace the virus could inform measures to control the spread in new ways, but currently, only about 0.33% of cases in the United States are being sequenced, MacInnis told WaPo, adding that not sequencing samples is “throwing away the crown jewels of what you really want to know.”
Another role that genetic sequencing is playing in this pandemic is in tracking viral mutations. One of the ways that pandemics worsen is when viruses mutate to become deadlier or more easily spread. Scientists are using genetic sequencing to monitor SARS-CoV-2 for such mutations.
A group of scientists at Texas A&M University led by Yue Xing, PhD, published a paper titled, “MicroGMT: A Mutation Tracker for SARS-CoV-2 and Other Microbial Genome Sequences,” which explains that “Although most mutations are expected to be selectively neutral, it is important to monitor if SARS-CoV-2 will eventually evolve to be a stronger or weaker infectious agent as time goes on. Therefore, it is vital to track mutations from newly sequenced SARS-CoV-2 genome.”
Korber’s findings are important because the mutation the scientists identified appears to have a fitness advantage. “Our data show that, over the course of one month, the variant carrying the D614G Spike mutation became the globally dominant form of SARS-CoV-2,” they wrote. Additionally, the study noted, people infected with the mutated variant appear to have a higher viral load in their upper respiratory tracts.
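A label such as D614G follows a common naming convention: the reference amino acid (D, aspartic acid), the position in the protein (614), and the replacement amino acid (G, glycine). The sketch below shows how such labels could be produced by comparing a sample against a reference. It is a simplified illustration and not the method used by MicroGMT or by the researchers. It assumes the two protein sequences are already aligned and of equal length, it ignores insertions and deletions, and the short sequences are invented.

```python
from typing import List

def call_substitutions(reference: str, sample: str) -> List[str]:
    """List amino acid substitutions as labels like 'D614G' (positions are 1-based)."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref_aa}{position}{sample_aa}"
        for position, (ref_aa, sample_aa) in enumerate(zip(reference, sample), start=1)
        if ref_aa != sample_aa
    ]

# Invented five-residue sequences, used only to show the label format.
reference_protein = "MKDLV"
sample_protein = "MKGLV"

print(call_substitutions(reference_protein, sample_protein))
```

Running the same comparison on every newly sequenced genome, and counting how often each label appears over time, is one simple way a rising variant could be spotted.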
Genetic Sequencing, the Race for Treatments, Vaccines, and Managing Future Pandemics
If, as Fauci and Morens predict, future pandemics are likely, improvements in gene sequencing and analysis will become even more important for tracing, monitoring, and suppressing outbreaks. Clinical laboratory managers will want to watch this closely, as medical labs that process genetic sequencing will, no doubt, be part of that operation.