Nvidia and Harvard’s new tool speeds up genome analysis

Researchers affiliated with Nvidia and Harvard have developed AtacWorks, a machine learning toolkit designed to reduce the cost and time needed for rare and single-cell experiments. They report that AtacWorks can run inference on a whole genome in about half an hour, whereas traditional methods take multiple hours.

Most cells in the body carry a complete copy of a person’s DNA, with billions of base pairs crammed into the nucleus. However, an individual cell uses only the subset of genetic components that it needs to function, and cell types such as liver, blood, or skin cells use different genes.

AtacWorks works with ATAC-seq, a method pioneered by Harvard professor Jason Buenrostro for finding open (accessible) areas of the genome in cells. ATAC-seq measures the intensity of a signal at every position on the genome, and peaks in the signal correspond to regions of accessible DNA. The fewer cells available, the noisier the data, which makes it hard to tell which areas of DNA are accessible.
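To make the idea of "peaks in the signal" concrete, here is a minimal sketch that marks contiguous runs of positions where a per-base signal exceeds a fixed cutoff. The function name `call_peaks` and the fixed threshold are assumptions made for illustration; this is a naive stand-in, not the peak-calling method used by AtacWorks or by standard ATAC-seq pipelines.

```python
import numpy as np

def call_peaks(signal, threshold):
    """Return half-open (start, end) intervals where signal > threshold.

    Hypothetical helper for illustration only: a plain cutoff, not a
    statistical peak caller.
    """
    above = np.asarray(signal) > threshold
    # Pad with False on both sides so every run has a start and an end.
    padded = np.concatenate(([False], above, [False]))
    # Positions where the mask flips mark run boundaries.
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = changes[::2], changes[1::2]
    return list(zip(starts.tolist(), ends.tolist()))

# Toy signal track: two regions rise above the cutoff of 3.
toy_signal = [0, 1, 5, 7, 6, 1, 0, 4, 4, 0]
peaks = call_peaks(toy_signal, threshold=3)
print(peaks)
```

With noisy data from few cells, a fixed cutoff like this becomes unreliable, which is the problem the article describes.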

AtacWorks took under 30 minutes for inference on a genome

ATAC-seq typically needs tens of thousands of cells to get a clean signal. With AtacWorks, results of equal quality can be obtained from just tens of cells.

So how does it work? AtacWorks was trained on labeled pairs of ATAC-seq datasets, one high-quality and one noisy. Given a downsampled copy of the data, the model learned to predict an accurate, high-quality version and to identify peaks in the signal.
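The pairing of a noisy input with a high-quality target can be sketched as follows. This is only an illustration under stated assumptions: the names `downsample_coverage` and `make_training_pair`, the binomial thinning of per-base read counts, and the toy numbers are all hypothetical and are not taken from the AtacWorks code.

```python
import numpy as np

def downsample_coverage(clean_counts, keep_fraction, seed=0):
    """Simulate a low-coverage track by keeping each read with
    probability keep_fraction (binomial thinning; an assumption)."""
    rng = np.random.default_rng(seed)
    return rng.binomial(clean_counts, keep_fraction)

def make_training_pair(clean_counts, keep_fraction, seed=0):
    """Return (noisy_input, clean_target) for a denoising model."""
    noisy = downsample_coverage(clean_counts, keep_fraction, seed)
    return noisy, clean_counts

# Toy per-base read counts from a hypothetical high-quality experiment.
clean = np.array([0, 2, 50, 120, 60, 3, 0, 1], dtype=np.int64)

# Keep roughly 1 read in 50, mirroring the 1m vs 50m read comparison.
noisy, target = make_training_pair(clean, keep_fraction=0.02)
print(noisy, target)
```

A model trained on many such pairs sees the noisy track as input and is asked to reproduce the clean track, which is the general shape of the training setup the article describes.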

With AtacWorks, the researchers found that they could spot accessible chromatin in a noisy sequence of 1 million reads nearly as well as traditional methods did with a clean dataset of 50 million reads. Chromatin is a complex of DNA and protein that packages long DNA molecules into more compact structures.

Running on Nvidia Tensor Core GPUs, AtacWorks took under 30 minutes for inference on a whole genome, a process that would take 15 hours on a system with 32 CPU cores.
