Advanced Computing in the Age of AI | Friday, March 29, 2024

Baylor’s HGSC, DNAnexus Collaborate for Large-Scale Genome Analysis 

The Human Genome Sequencing Center (HGSC) at Baylor College of Medicine and DNAnexus are coming together to analyze genomic data on a large scale.

In doing so, Baylor has made 430 terabytes of data available to over 300 researchers at the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium.

As part of the collaboration, HGSC has adopted DNAnexus’ enterprise cloud platform to power Baylor’s semi-automated sequencing analysis pipeline, called Mercury. Also working with both DNAnexus and HGSC is Amazon Web Services, which is in turn using the Mercury pipeline to process CHARGE data. The financial terms of the collaboration were not disclosed.

“Many large-scale population studies to date have been limited in scope by a lack of the necessary compute power; this is a real hindrance in realizing the full promise of genomic medicine,” says DNAnexus CEO Richard Daly. “Through this collaboration with the HGSC and Amazon Web Services, 300 scientists can now perform downstream analyses on these invaluable health and aging data at a scale not previously possible.”

The project currently touches five institutions around the world and analyzes sequencing data from more than 14,000 individuals, covering 3,751 complete genomes and 10,771 exomes.

As HGSC explains, tackling a genomics project of this scale would have required either quadrupling its existing compute core capacity or “jamming the cluster” for three to four weeks to get the job done. Instead, deploying a cloud-based infrastructure allowed HGSC to scale up without making unnecessary investments.

To handle this load, CHARGE needs 2.4 million core-hours and 860 terabytes of storage. At peak load, HGSC says, the DNAnexus platform was used “to spin up more than 20,000 cores on demand” to push CHARGE data through the Mercury analysis pipeline.
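
For a rough sense of what those figures imply, the arithmetic can be sketched in a few lines of Python. This is a back-of-envelope illustration only: it assumes the work parallelizes perfectly and that a fixed number of cores stays busy throughout, neither of which the article states. The function names are hypothetical, and the article does not give the size of HGSC’s own cluster, so the three-to-four-week figures below show only the sustained core count that would deliver the same total.

```python
# Back-of-envelope sketch using only figures quoted in the article.
# Assumption: perfectly parallel work with a constant number of busy cores.

CORE_HOURS = 2_400_000   # total compute quoted for CHARGE
PEAK_CORES = 20_000      # "more than 20,000 cores" at peak
HOURS_PER_WEEK = 7 * 24


def wall_clock_hours(core_hours: float, cores: float) -> float:
    """Elapsed hours if `cores` cores run continuously (hypothetical helper)."""
    return core_hours / cores


def cores_needed(core_hours: float, weeks: float) -> float:
    """Sustained cores needed to finish in `weeks` weeks (hypothetical helper)."""
    return core_hours / (weeks * HOURS_PER_WEEK)


if __name__ == "__main__":
    hours = wall_clock_hours(CORE_HOURS, PEAK_CORES)
    print(f"At {PEAK_CORES:,} cores: {hours:.0f} hours ({hours / 24:.0f} days)")
    for weeks in (3, 4):
        print(f"Finishing in {weeks} weeks: about {cores_needed(CORE_HOURS, weeks):,.0f} cores")
```

Under those assumptions, 2.4 million core-hours at a sustained 20,000 cores works out to roughly five days of wall-clock time, which is the kind of burst capacity the on-demand approach is meant to provide.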

“The management and analysis of genomes at the scale needed to appropriately power clinical studies requires computational infrastructure that exceeds the capacity of most institutional resources,” says Jeffrey Reid, assistant professor in Baylor’s Department of Molecular and Human Genetics. “Working with DNAnexus and Amazon Web Services, we were able to rapidly deploy a cloud-based solution that allows us to scale up our support to researchers at the HGSC, and make our Mercury pipeline analysis data accessible to the CHARGE Consortium, enabling what will be the largest genomic analysis project that has ever taken place in the cloud.”

EnterpriseAI