A research team from Johns Hopkins Medicine and Johns Hopkins University has developed a machine-learning (ML) tool capable of predicting who has the highest probability of being naturally resistant to COVID-19 infection despite being exposed to SARS-CoV-2, the virus that causes it.

The study, published this week in PLOS One, aims to better understand the factors that influence COVID-19 resistance.

“If we can identify which people are naturally able to avoid infection by SARS-CoV-2, we may be able to learn — in addition to societal and behavioral factors — which genetic and environmental differences influence their defense against the virus,” said Karen (Kai-Wen) Yang, lead study author and a biomedical engineering graduate student in the Translational Informatics Research and Innovation Lab at Johns Hopkins University, in the press release. “That insight could lead to new preventive measures and more highly targeted treatments.”

To develop their model, the researchers gathered data from the Johns Hopkins COVID-19 Precision Medicine Analytics Platform Registry (JH-CROWN), which contains information for patients with a suspected or confirmed SARS-CoV-2 infection seen within the Johns Hopkins Health System, the press release states.

From this registry, the research team selected patients who had received a COVID-19 test between June 10, 2020, and Dec. 15, 2020, and reported “potential exposure to the virus” as the reason for testing. Dec. 15 was chosen as the end date because it fell just before large-scale COVID-19 vaccination efforts began in the US, so that avoided infections could not be attributed to vaccines rather than to natural resistance.
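The selection rule described above can be illustrated with a short Python sketch. It is not the researchers' code: the `Encounter` record, its field names, and the exact wording of the reason string are hypothetical, and treating both boundary dates as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import date

# Study window as reported in the article (inclusive bounds are an assumption).
WINDOW_START = date(2020, 6, 10)
WINDOW_END = date(2020, 12, 15)


@dataclass
class Encounter:
    """Hypothetical record of one COVID-19 test; not the JH-CROWN schema."""
    patient_id: str
    test_date: date
    reason: str  # hypothetical field holding the stated reason for testing


def in_cohort(enc: Encounter) -> bool:
    """True if the test falls inside the window and cites potential exposure."""
    in_window = WINDOW_START <= enc.test_date <= WINDOW_END
    return in_window and enc.reason == "potential exposure"


# Made-up example records, for illustration only.
encounters = [
    Encounter("a", date(2020, 7, 1), "potential exposure"),
    Encounter("b", date(2021, 1, 5), "potential exposure"),  # after the window
    Encounter("c", date(2020, 9, 3), "symptoms"),            # different reason
]
selected = [e.patient_id for e in encounters if in_cohort(e)]
print(selected)
```

Only the first made-up record passes both checks. The second falls after the vaccination cutoff, and the third cites a different reason for testing.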

The final cohort comprised 8,536 study participants who were divided into two groups: those who either did not share a household with any COVID-19 patients or whose household had 10 or more patients, and those who shared a residence with 10 or fewer people, with at least one being a COVID-19 patient.

The first group, consisting of 8,476 participants, served as the training and initial testing set, while the remaining 60 participants were grouped into a Household Index (HHI) Set, which served as a separate testing set.
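As a rough illustration of the split, the Python sketch below assigns participants to the HHI Set when they share a residence with 10 or fewer people and at least one COVID-19 patient, and places everyone else in the training and initial testing set. The `Participant` fields and helper names are hypothetical, and the reading of the household criteria is an assumption based on the description above rather than the study's published code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Participant:
    """Hypothetical participant record; field names are illustrative only."""
    patient_id: str
    household_size: int         # assumed: number of people sharing the residence
    household_covid_cases: int  # assumed: COVID-19 patients in that residence


def is_hhi(p: Participant) -> bool:
    """HHI criterion as described: 10 or fewer residents, at least one patient."""
    return p.household_size <= 10 and p.household_covid_cases >= 1


def split_cohort(
    participants: List[Participant],
) -> Tuple[List[Participant], List[Participant]]:
    """Return (training and initial testing set, HHI Set)."""
    hhi = [p for p in participants if is_hhi(p)]
    train = [p for p in participants if not is_hhi(p)]
    return train, hhi


# Made-up participants, for illustration only.
cohort = [
    Participant("p1", household_size=4, household_covid_cases=1),   # HHI Set
    Participant("p2", household_size=3, household_covid_cases=0),   # training
    Participant("p3", household_size=25, household_covid_cases=12), # training
]
train_set, hhi_set = split_cohort(cohort)
print(len(train_set), len(hhi_set))
```

In the study, the same kind of partition produced 8,476 and 60 participants, which together account for the full cohort of 8,536.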

Electronic health record (EHR) data from the cohort was analyzed using the Maximal-frequent All-confident pattern Selection and Pattern-based Clustering (MASPC) algorithm, which combines patient demographic information, the relevant International Statistical Classification of Diseases and Related Health Problems (ICD) medical diagnostic codes, outpatient medication orders, and the number of comorbidities present for each patient.

“We hypothesized that MASPC would enable us to cluster patients with similar patterns in their data to define them as resistant and non-resistant to SARS-CoV-2, and with the hope that the algorithm would learn with each analysis how to improve the accuracy and reliability of future assignments,” explained co-senior study author Stuart Ray, MD, vice chair of medicine for data integrity and analytics, and professor of medicine at the Johns Hopkins University School of Medicine, in the press release. “This initial study using JH-CROWN data was conducted to give life to that hypothesis, a proof-of-concept trial of our statistical model to show that resistance to COVID-19 might be predictable based [on] a patient’s clinical and demographic profile.”

The researchers identified 56 such patterns, five of which captured the individuals most likely to have been exposed to the virus.

“Looking for these patterns in HHI Set — the individuals most likely to have been exposed to SARS-CoV-2 in close quarters — and then statistically analyzing the results, our model’s best performance was 0.61,” said Ray. “Since a score of 0.5 shows only chance association between the prediction and reality, and 1 is 100% association, this shows the model has promise as a tool for identifying people with COVID-19 resistance who can be further studied.”
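The article does not name the metric behind the 0.61 figure. A scale on which 0.5 means chance and 1 means perfect agreement is consistent with the area under the receiver operating characteristic curve (AUROC), and the Python sketch below assumes that metric purely for illustration. The labels and scores are made up, and the `auroc` helper is a hypothetical pairwise implementation, not the study's evaluation code.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUROC: the share of (positive, negative) pairs in which the
    positive case receives the higher score, counting ties as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = resistant, 0 = non-resistant) and model scores.
labels = [1, 1, 0, 0]
perfect = auroc(labels, [0.9, 0.8, 0.2, 0.1])        # every positive outranks every negative
uninformative = auroc(labels, [0.5, 0.5, 0.5, 0.5])  # scores carry no information
print(perfect, uninformative)
```

On this reading, a score of 0.61 would mean the model ranks a resistant individual above a non-resistant one somewhat more often than a coin flip would.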

The researchers noted that the study has multiple limitations, such as potential bias from the self-reporting of COVID-19 exposure by participants, the small number of participants in the HHI group, the short timeframe of the study, and the possibility that participants may have taken tests for SARS-CoV-2 using home kits or at facilities outside the Johns Hopkins system, which would not have been recorded in the JH-CROWN database.