Genomic Sequencing at Children’s Mercy: Saving Time to Save Lives
Rapid genomic sequencing is instrumental to diagnosing and treating critically ill patients, and managing the high data volumes involved in genomics is essential to the process. Children’s Mercy Hospital in Kansas City, MO, (354 beds, not-for-profit, treating children from birth through the age of 21) operates what it says is the world’s first whole genome sequencing center in a pediatric setting, where physicians, clinical laboratory scientists, molecular geneticists, bioinformaticians and software engineers work to sequence and analyze rare inherited diseases.
Children’s Mercy previously set the standard for decoding a genome in about two days — a process that had taken six weeks or longer. Now the hospital is deploying advanced scale technologies with the goal of completing the entire process, from enrollment to delivery of a final report to the physician, in 26 hours.
“Genomic sequencing, coupled with high-performance computing, gives us a whole new look at medicine by quickly shedding light on DNA variations that can explain a child’s condition,” said Shane Corder, HPC systems engineer at Children’s Mercy’s Center for Pediatric Genomic Medicine. The goal of genomic testing is to avoid unnecessary and painful tests, such as muscle biopsies, and to start treatment sooner.
“It’s not uncommon for children to endure dozens of clinic visits and scores of painful tests in search of a cause to their rare illness,” Corder said. “With genomic testing, we can get answers without repeated visits, painful tests or undue financial burden on the parents.”
The need for enhanced sequencing is intense. There are roughly 8,000 known genetic diseases in the U.S., and one in every 30 children is known to have one of them. One in six children is admitted to a Kansas City hospital for a genetic disease, and unfortunately, these genetic diseases cause one in five deaths, according to Corder. The causes of fewer than 5,000 diseases remain unknown, he said.
Sequencing is notoriously compute- and data-intensive, requiring advanced processing and data storage to support testing based on whole genome sequencing and whole exome sequencing. Each person’s DNA has more than 6.4 billion bases (DNA’s basic building blocks) encompassing 22,000 genes that code for 100,000 proteins.
“Clearly, informatics is the bottleneck in genomics,” said Corder. “Our goal is to keep pace with the data deluge in both our clinical and research environments so we can quickly analyze data to produce meaningful insights.”
On the compute side, Children’s Mercy has invested in an advanced scale cluster with 40 Linux nodes totaling 1,300 cores. For storage, Children’s Mercy has deployed DDN’s GS7K parallel file system appliance, with 1PB of storage. The center had previously deployed traditional scale-out NAS, which initially met its storage requirements for handling both clinical and research workflows. But over time, that approach lacked the scalability in performance and capacity to address demanding data creation and access needs, according to the hospital.
“We needed a highly scalable platform that could scale up or out to handle massive data ingest, processing, storage and collaboration,” said Corder. “Ultimately, we needed a more flexible, powerful approach.”
Additionally, because Children’s Mercy faced serious space constraints in its data center, it wanted a high-density solution.
The center selected DDN storage technology after determining DDN could keep pace with the diverse demands of the center’s Illumina HiSeq sequencers. DDN’s GRIDScaler platform also helps enable the center’s collaboration efforts by integrating with IBM’s Spectrum Scale-based parallel file system to support clinical and research workflows. In addition, DDN is able to couple with the center’s planned deployment of the Edico DRAGEN Bio-IT processor, an FPGA-based genomic analysis acceleration technology.
The center’s IT infrastructure also includes the Nordic suite of software tools, designed to improve the alignment and analysis of critical data involved in the center’s STAT-Seq test for decoding an entire genome, which Time magazine named a medical breakthrough of the year in 2012. According to Corder, the DDN GS7K scales to the processing needs of these tools, which are used with different types of sequencing, testing and specimen options. They include SSAGA, which maps genes to a symptom; RUNES, for variant characterization; and VIKING, which integrates outputs from the other two tools for filtering and trio familial analysis. The time required to perform STAT-Seq has been reduced by about 30 percent, from 2.5 to 1.75 hours.
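As a quick check on the arithmetic, the relative time saving implied by the two durations quoted above (2.5 hours before, 1.75 hours after) can be worked out directly. The snippet below is only an illustrative sketch; the function name `percent_reduction` is hypothetical and is not part of any tool named in the article.

```python
def percent_reduction(before_hours: float, after_hours: float) -> float:
    """Return the relative reduction from before_hours to after_hours, in percent."""
    if before_hours <= 0:
        raise ValueError("before_hours must be positive")
    return (before_hours - after_hours) / before_hours * 100.0


# Durations quoted in the article: 2.5 hours before, 1.75 hours after.
saving = percent_reduction(2.5, 1.75)
print(f"Time saved: {saving:.0f} percent")
```

In other words, 0.75 of the original 2.5 hours is saved.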
“The deluge of data is an industry-wide issue in healthcare,” Laura Shepard, DDN’s director of HPC markets, told EnterpriseTech. “With the increase in the amount of data being generated in sequencing processes, and also in microscopy and other areas of instrumentation, it’s across life sciences and all the research disciplines within it.”
She said high performance storage is increasingly needed because the amount of raw data output generated by genomic sequencing has increased “something like 1000 times over the last four or five years,” whereas traditional scale-out NAS has increased in performance by only about four times. “What you’re seeing is that the informatics portion of sequencing is becoming a huge bottleneck to achieving the outcome for the individual patient.
“So you have this broad industry-wide issue – how do I manage more and more data and infrastructure that I’ve built out that is really not scaling the performance to allow me to move quickly? What DDN is able to do is much faster, from a brute force level, remove that bottleneck from ingest, alignment and analysis standpoint.”
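To put Shepard’s comparison in rough numbers: if raw sequencing output grew by about 1000 times while traditional scale-out NAS performance grew by only about four times, the gap between data growth and storage performance widened by a factor of roughly 250 over the same period. The sketch below simply divides the two approximate growth factors quoted in the article; the function name `growth_gap` is hypothetical, and both inputs are her estimates rather than measured values.

```python
def growth_gap(data_growth: float, storage_growth: float) -> float:
    """Return how many times faster data volume grew than storage performance."""
    if storage_growth <= 0:
        raise ValueError("storage_growth must be positive")
    return data_growth / storage_growth


# Approximate growth factors quoted in the article ("something like 1000 times"
# for raw data output, "about four times" for scale-out NAS performance).
gap = growth_gap(1000, 4)
print(f"Data outgrew storage performance by roughly {gap:.0f}x")
```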