Scientists have sequenced a human genome using MinION, a pocket sequencer about the size of a smartphone. The paper, published in Nature Biotechnology, also reports a record set with the same device for the longest read of a single DNA molecule: one continuous read was 882 thousand base pairs long.
The genomes of dozens of animal species have been sequenced to date, and the cost of sequencing has fallen by orders of magnitude since the first such experiments. Even so, determining the complete DNA sequence of a eukaryotic organism is still far from trivial. This is mainly because a substantial part of the genome consists of repetitive sequences: microsatellite DNA, tandem repeats, retroelements and so on.
Today's most common high-throughput sequencing technology breaks the DNA molecule into short fragments of several hundred base pairs, amplifies (copies) them and reads them. The complete genome sequence is then reconstructed from these short pieces by computational algorithms; this process is called assembly. Many regions of DNA, especially repetitive ones, either drop out entirely or cannot be determined with confidence, because a short read from one copy of a repeat looks the same as a read from another copy. Even the reference human genome, first published in 2001, still contains gaps.
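To make the idea of assembly more concrete, here is a minimal sketch of one classic textbook approach, greedy overlap merging. It is an illustrative toy, not the method used in the study: the function names, the minimum overlap of 3 bases and the example reads are all hypothetical, and real assemblers rely on far more elaborate graph-based algorithms. The sketch repeatedly merges the pair of reads with the longest suffix-prefix overlap until no overlaps remain.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that matches a prefix of `b`.

    Overlaps shorter than `min_len` are ignored (0 is returned).
    """
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept the hit only if everything in a from here on is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the best-overlapping pair of reads until none overlap.

    Returns the resulting contigs; a single contig means full reconstruction.
    """
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical error-free reads tiled across the toy sequence ATTAGACCTGCCGGAATAC.
reads = ["ATTAGACC", "GACCTGCC", "TGCCGGAA", "GGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

With error-free reads of a sequence that has no repeats, the greedy merge recovers the original string. When a repeat is longer than the reads, however, overlaps between different copies look identical, and a merge like this one can collapse the copies or join them in the wrong order, which is why repetitive regions are so hard to assemble from short fragments.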
To get around this problem, engineers have turned to sequencing technologies that read DNA molecules as long as possible, preferably without amplification. Until now, the most popular solution for sequencing and assembling large genomes has been PacBio's technology, which can read several tens of thousands of base pairs continuously. For smaller genomes, such as bacterial ones, Oxford Nanopore Technologies introduced the “pocket” MinION sequencer in 2014. It can also read long sequences continuously, but its throughput is limited.
MinION is a smartphone-sized device that connects to a computer via a USB cable. It works by measuring the electrical current as a DNA strand is pulled through a pore in a membrane inside the device. The device together with a starter kit of reagents costs one thousand dollars, which is quite cheap compared with other existing technologies. Its developers position it as a field sequencer that can be used “in the jungle, in the Arctic, at the space station.” Bearing this out, MinION was recently used to sequence several DNA samples aboard the ISS, including the mouse mitochondrial genome.
Researchers from several American and Canadian institutions, including the University of California and the National Human Genome Research Institute (USA), have now shown that MinION can successfully sequence a human genome. They also optimized the sequencing protocol for reading ultra-long DNA fragments of hundreds of thousands of base pairs. The human genome is about 3 billion base pairs (three gigabases) long. DNA from the human cell line GM12878 was sequenced by staff at five laboratories using 39 MinION flow cells and an optimized sample preparation protocol. In total, the scientists obtained 91.2 gigabases of sequence, which corresponds to 30-fold coverage of the genome. More than half of the sequenced DNA fragments were 100,000 base pairs or longer. The researchers also showed that the optimized protocol can read DNA molecules up to 882,000 base pairs long; in practice, the maximum read length is limited only by the quality of DNA extraction.
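The 30-fold coverage figure follows from simple arithmetic: average coverage (sequencing depth) is the total number of bases read divided by the genome size. A minimal sketch using the numbers quoted above, assuming a genome size of exactly 3 billion base pairs:

```python
def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: how many times each base is read, on average."""
    return total_bases / genome_size


total_bases = 91.2e9  # 91.2 gigabases of sequence reported in the study
genome_size = 3.0e9   # approximate human genome size (about 3 gigabases)
print(f"{mean_coverage(total_bases, genome_size):.1f}x")  # 30.4x
```

This is only an average: in practice, depth varies along the genome, and some regions may receive little or no coverage.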
To assemble the reads into a single sequence, the authors also had to optimize the assembly algorithms, since most existing programs are designed for short fragments. A comparison with the reference genome of the GM12878 line, which has been sequenced many times with more conventional methods, showed that the resulting assembly covers 85.8 percent of the genome with an accuracy close to 100 percent.
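A figure such as “covers 85.8 percent of the genome” is, roughly, the share of reference positions on which at least one aligned piece of the assembly falls. The sketch below computes such a share by merging overlapping intervals. It assumes the assembled contigs have already been aligned to the reference and reduced to hypothetical half-open (start, end) coordinates; it is a simplified illustration, not the evaluation pipeline the authors used.

```python
def covered_fraction(intervals: list[tuple[int, int]], genome_length: int) -> float:
    """Share of a reference of length `genome_length` covered by half-open
    [start, end) alignment intervals, counting overlapping intervals once."""
    covered = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the current merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


# Hypothetical alignments of contigs to a 100-bp toy reference.
alignments = [(30, 60), (0, 40), (80, 90)]
print(covered_fraction(alignments, 100))  # 0.7
```

Merging first matters: the two overlapping alignments here cover positions 30 to 40 twice, and simply summing interval lengths would overstate coverage.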
Reading such long DNA fragments allowed the authors to close gaps in the human genome sequence and simplified the analysis of many of its regions. For example, the major histocompatibility complex (HLA) locus has a complex structure with many repeats, which makes its sequence very difficult to determine. In this study, all of its genes fell within a single continuous read, sparing the researchers from having to painstakingly assemble the locus from pieces. The scientists also demonstrated that this sequencing technology can detect epigenetic marks, in particular DNA methylation, directly on the native molecule, something that conventional short-read technologies cannot do without additional chemical treatment of the DNA.
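Why long reads help with regions such as HLA can be shown with a toy calculation: a read can be placed unambiguously only if its sequence occurs once in the genome. The sketch below assumes error-free reads starting at every position of a hypothetical 28-bp sequence that contains two copies of the 8-bp repeat ACGTTGCA, and counts the share of reads that occur exactly once. In this example, reads no longer than the repeat can be ambiguous, while reads longer than it always include unique flanking sequence.

```python
from collections import Counter


def uniquely_placed_fraction(genome: str, read_len: int) -> float:
    """Share of all error-free reads of length `read_len` (one starting at every
    position) whose sequence occurs exactly once in `genome`."""
    n = len(genome) - read_len + 1
    if n <= 0:
        return 0.0
    reads = [genome[i:i + read_len] for i in range(n)]
    counts = Counter(reads)
    return sum(1 for r in reads if counts[r] == 1) / n


# Hypothetical genome: unique flanks around two copies of an 8-bp repeat.
repeat = "ACGTTGCA"
genome = "TTAG" + repeat + "CCGA" + repeat + "GGTC"
for read_len in (4, 8, 9):
    print(read_len, round(uniquely_placed_fraction(genome, read_len), 2))
# 4 0.6
# 8 0.9
# 9 1.0
```

Real repeats in the human genome can span many thousands of base pairs, which is why reads of hundreds of thousands of base pairs are so useful there.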
Sequencing a human genome serves as a benchmark for the performance of a sequencing technology. This work promises to make not only reading DNA but also analyzing it simpler and cheaper. More about the history of DNA sequencing technologies can be found in our article on the subject.