By harnessing advanced AI, MethylGPT decodes DNA methylation with unprecedented accuracy, offering new paths for age prediction, disease diagnosis, and personalized health interventions.
In a recent study posted to the bioRxiv preprint server, researchers developed a transformer-based foundation model, MethylGPT, for the DNA methylome.
DNA methylation is a type of epigenetic modification that regulates gene expression via methyl-binding proteins and changes in chromatin accessibility. It also helps maintain genomic stability through transposable element repression. DNA methylation has features of an ideal biomarker, and studies have revealed distinct methylation signatures across pathological states, allowing for molecular diagnostics.
Nevertheless, several analytic challenges impede the implementation of diagnostics based on DNA methylation. Current approaches rely on simple statistical and linear models, which are limited in capturing complex, non-linear data. They also fail to account for context-specific effects such as higher-order interactions and regulatory networks. Therefore, a unified analytical framework that can model complex, non-linear patterns in various tissue and cell types is urgently needed.
Recent advances in foundation models and transformer architectures have revolutionized analyses of complex biological sequences. Foundation models have also been introduced for various omics layers, such as AlphaFold3 and ESM-3 for proteomics and Evo and Enformer for genomics. The achievements of the foundation models suggest that DNA methylation analyses could be transformed with a similar approach.
The study and findings
In the present study, researchers developed MethylGPT, a transformer-based foundation model for the DNA methylome. First, they acquired data on 226,555 human DNA methylation profiles spanning multiple tissue types from the EWAS Data Hub and Clockbase. Following deduplication and quality control, 154,063 samples were retained for pretraining. The model focused on 49,156 CpG sites selected for their known associations with various traits, a choice intended to maximize biological relevance.
The model was pre-trained using two complementary loss functions: masked language modeling (MLM) loss and profile reconstruction loss, enabling it to accurately predict methylation at masked CpG sites. The model achieved a mean squared error (MSE) of 0.014 and a Pearson correlation of 0.929 between predicted and actual methylation levels, indicating high predictive accuracy. Researchers also evaluated whether the model could capture biologically relevant features of DNA methylation. To do so, they analyzed the learned representations of CpG sites in the embedding space.
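The two evaluation metrics reported above can be illustrated with a minimal sketch. It assumes toy data rather than real methylation profiles, and the "model" is a stand-in that adds noise to the true values; the 30% masking rate and all function names are illustrative assumptions, not details from the study.

```python
import math
import random

def masked_mse(pred, target, masked_idx):
    """Mean squared error restricted to the masked CpG positions."""
    return sum((pred[i] - target[i]) ** 2 for i in masked_idx) / len(masked_idx)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)
n_sites = 200
beta = [rng.random() for _ in range(n_sites)]   # toy methylation levels in [0, 1]
masked_idx = rng.sample(range(n_sites), k=60)   # hide 30% of sites (assumed rate)

# Stand-in "model" (hypothetical): true value plus noise, clipped to [0, 1].
pred = [min(1.0, max(0.0, b + rng.gauss(0, 0.1))) for b in beta]

mse = masked_mse(pred, beta, masked_idx)
r = pearson_r([pred[i] for i in masked_idx], [beta[i] for i in masked_idx])
print(f"masked MSE = {mse:.3f}, Pearson r = {r:.3f}")
```

Scoring only the masked positions matters: a model could otherwise look accurate simply by copying the values it was shown.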
They found that CpG sites clustered based on their genomic contexts, suggesting that the model learned the regulatory features of the methylome. In addition, there was a clear separation between autosomes and sex chromosomes, indicating that MethylGPT also captured higher-order chromosomal features. Next, the team analyzed zero-shot embedding spaces. This showed a clear biological organization, clustering by sex, tissue type, and genomic context.
Major tissue types formed well-defined clusters, indicating that the model learned methylation patterns specific to tissues without explicit supervision. Notably, the learned embeddings also avoided batch effects, which often confound results in complex datasets. In addition, female and male samples separated consistently, reflecting sex-specific differences. Next, the researchers assessed the ability of MethylGPT to predict chronological age from methylation patterns. To this end, they used a dataset of over 11,400 samples from diverse tissue types.
Fine-tuning for age prediction led to robust age-dependent clustering. Notably, intrinsic age-related organization was evident even before fine-tuning. Moreover, MethylGPT outperformed existing age prediction methods (e.g., Horvath’s clock and ElasticNet), achieving superior accuracy. Its median absolute error for age prediction was 4.45 years, further demonstrating its robustness. MethylGPT was also remarkably resilient to missing data. It exhibited stable performance with up to 70% missing data, outperforming multi-layer perceptron and ElasticNet approaches.
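As a rough illustration of the two quantities in this paragraph, the sketch below computes a median absolute error on made-up ages and blanks out a fraction of a toy profile to mimic missing data. The ages, the helper names, and the use of `None` to mark missing values are assumptions for illustration only; the study's actual handling of missing sites is not described here.

```python
import random
import statistics

def median_absolute_error(true_ages, predicted_ages):
    """Median of the absolute differences between true and predicted ages."""
    return statistics.median(abs(t - p) for t, p in zip(true_ages, predicted_ages))

def drop_sites(profile, missing_fraction, rng):
    """Return a copy of the profile with a random fraction of values set to None."""
    n_missing = round(len(profile) * missing_fraction)
    missing = set(rng.sample(range(len(profile)), k=n_missing))
    return [None if i in missing else v for i, v in enumerate(profile)]

# Made-up ages in years, not data from the study.
true_ages = [30, 45, 60, 72, 18]
predicted_ages = [33, 41, 60, 80, 20]
print(median_absolute_error(true_ages, predicted_ages))  # prints 3

# Simulate a profile with 70% of its CpG values missing.
rng = random.Random(0)
profile = [rng.random() for _ in range(10)]
sparse = drop_sites(profile, 0.7, rng)
print(sum(v is None for v in sparse))  # prints 7
```

The median is less sensitive than the mean to a few badly mispredicted samples, which is one reason age clocks are commonly summarized this way.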
Analysis of methylation profiles during induced pluripotent stem cell (iPSC) reprogramming showed a clear rejuvenation trajectory; samples progressively transitioned to a younger methylation state over the course of reprogramming. The model was also able to identify the point during reprogramming (day 20) when cells began showing clear signs of epigenetic age reversal. Finally, the model’s ability to predict disease risk was assessed. The pre-trained model was fine-tuned to predict the risk of 60 diseases and mortality. The model achieved an area under the curve of 0.74 and 0.72 on validation and test sets, respectively.
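For readers unfamiliar with the area under the curve (AUC), the sketch below computes it directly from its pairwise definition: the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case, with ties counted as half. The labels and scores are invented, and this is a generic illustration rather than the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Invented example: 1 = developed the disease, 0 = did not.
labels = [1, 0, 1, 0]
scores = [0.9, 0.1, 0.4, 0.6]
print(roc_auc(labels, scores))  # prints 0.75
```

On this scale, 0.5 corresponds to random ranking and 1.0 to perfect separation, which places the reported values of 0.74 and 0.72 in context.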
In addition, they used this disease risk prediction framework to evaluate the impact of eight interventions on predicted disease incidence. Interventions included smoking cessation, high-intensity training, and the Mediterranean diet, among others. Their effects differed by intervention and varied across disease categories, highlighting the potential of MethylGPT in predicting intervention-specific outcomes and optimizing tailored intervention strategies.
Conclusions
The findings illustrate that transformer architectures could effectively model DNA methylation patterns while preserving biological relevance. The organization of CpG sites based on regulatory features and genomic context suggests that the model captured fundamental aspects without explicit supervision. MethylGPT also demonstrated superior performance in age prediction across different tissues. Moreover, its robust performance in handling missing data (≤ 70%) underscores its potential utility in clinical and research applications.
