Genomic Epidemiology and Recent Update on Nucleic Acid–Based Diagnostics for COVID-19

Purpose of the Review The SARS-CoV-2 genome has been sequenced and the data have been made available in the public domain. Molecular epidemiologists have used this information to investigate the origin and mode of transmission of SARS-CoV-2 and to support contact tracing. The present review highlights recent advances in molecular epidemiological studies and in molecular (nucleic acid-based) diagnostics for COVID-19, the disease caused by SARS-CoV-2.
Recent Findings Epidemiological studies that integrate the principles and tools of molecular genetics are now mainly focused on elucidating the molecular pathology of COVID-19. Molecular epidemiological studies have revealed the mutability of SARS-CoV-2, which is of utmost importance for the development of therapeutics and vaccines for COVID-19. Researchers worldwide are racing to develop better and faster diagnostics and therapeutics for COVID-19, and several molecular diagnostic techniques have been developed for accurate and precise diagnosis.
Summary Novel genomic techniques have helped in understanding the disease pathology, origin, and spread of COVID-19. The whole genome sequence established in the initial days of the outbreak enabled the taxonomic classification of the virus. Several rapid, accurate, and sensitive diagnostic methods have been developed that detect SARS-CoV-2 nucleic acids in clinical samples. Most of these molecular diagnostics are based on the RT-PCR principle.


Introduction
The Coronavirus Disease 2019 (COVID-19) pandemic became a global concern within a few months of the first report of the disease in December 2019 in Wuhan, China. COVID-19, the zoonotic disease caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), soon became a major focus in almost all fields of the biological sciences. Scientists and researchers all over the globe started investigating the molecular pathogenesis of the disease. Major disciplines such as molecular genomics, virology, and especially molecular epidemiology have been widely affected [1][2][3]. Several diagnostic methods have been developed as a result of the elucidation of the SARS-CoV-2 genomic sequence and the knowledge gained about its evolutionary history. The present review summarizes the molecular epidemiological studies related to COVID-19 and the recent advancements in the development of molecular diagnostics for COVID-19.

Molecular Epidemiology of COVID-19
The International Association of Epidemiology defines molecular epidemiology as "the application of epidemiologic principles to study the molecular, biochemical, cellular and genetic mechanisms that underlie the pathophysiology, etiology and prevention of human diseases and related outcomes, as well as their early detection, treatment, or prognosis".
The application of systems biology, genomics, and molecular and cell biology techniques to epidemiology has revolutionized the field and given rise to a new discipline, molecular epidemiology. This integrated multidisciplinary approach has resulted in the discovery of several novel genomic biomarkers for tracing infectious diseases. Molecular epidemiological studies have also explored the etiology of several diseases, providing detailed knowledge about molecular pathology and disease progression. Ultimately, these contributions help in designing preventive strategies and management protocols [4]. Rapid advancements in sequencing technologies have further advanced the field by making traditional molecular techniques high throughput. High-throughput sequencing-based technologies have therefore started replacing traditional molecular diagnostics, which has made epidemiological investigations of infectious diseases more rapid, specific, and sensitive. Sequencing the whole genome of a pathogen and comparing it with existing sequence databases can increase the speed and accuracy of epidemiological studies, thereby helping to investigate disease pathogenesis and to find ways to tackle infectious diseases [5].
Novel genomic techniques have helped in understanding the disease pathology, origin, and spread of COVID-19. The whole genome sequence established in the initial days of the outbreak enabled the taxonomic classification of the virus. These molecular tools have established that SARS-CoV-2 belongs to the genus Betacoronavirus and shows clear divergence from SARS-CoV and MERS-CoV [6][7][8]. A combination of genomic tools and phylogenetic analysis methods revealed that SARS-CoV-2 and the bat SARS-like coronaviruses cluster together into a discrete lineage within the subgenus Sarbecovirus [9, 10••].
The virology field has integrated next-generation sequencing tools and has moved from targeted sequencing of single genomes to genomic epidemiology [11]. These advancements have enabled the parallel sequencing of thousands of pathogen genomes from different species for pathogen detection. This approach has resulted in the availability of more than 2220 full genomes in the SARS-CoV-2 public database of the Global Initiative on Sharing All Influenza Data (GISAID) [12]. Initially, GISAID was dedicated to sharing genomic data on influenza; it has since extended its scope to include a constantly updated SARS-CoV-2 database [13].
The SARS-CoV-2 genomic sequences available to date fall mainly into three clades: the S clade (541 genomes), the G clade (931 genomes), and the V clade (208 genomes). A further 548 SARS-CoV-2 genomes fall into other clades. Up to March 28, 2020, the GISAID receptor binding surveillance had identified four rare variants in close proximity to the binding interface: V483A in 16 samples from USA/WA, L455I and F456V in one sample from Brazil, and G476S in 10 samples from USA/WA [12]. GISAID has also pointed to potential drug targets by analyzing the degree of sequence similarity of highly conserved sequences between hCoV-19 and the SARS virus. The main protease and the polymerase of the SARS virus share 96% and 97% sequence identity, respectively, with the protease and polymerase of hCoV-19. The inhibitors developed against the SARS-CoV protease and polymerase also bind to the protease and polymerase of hCoV-19 in a similar way [12].
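The identity percentages above are taken from [12]. As a minimal sketch of how such a figure can be computed from two already aligned sequences, the following may help; the helper name `percent_identity` and the short sequences are illustrative assumptions, not real protease or polymerase data, and other conventions for handling gaps exist.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns with identical residues.

    Assumes both inputs come from the same pairwise alignment
    (equal length, '-' marks a gap). A gap opposite a residue
    counts as a mismatch; columns that are gaps in both are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # uninformative column
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy example: 10 aligned residues, 1 substitution.
print(percent_identity("MKTAYIAKQR", "MKTAYLAKQR"))  # 90.0
```

In practice the alignment itself would come from a dedicated alignment tool; this sketch only covers the final counting step.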
Advances in genomics have had a prominent impact on the approaches employed in epidemiology and public health [14][15][16]. Genomic epidemiology has helped reconstruct the evolutionary lineage of viruses, including their emergence and global spread. This integrated approach has assisted in exploring the evolutionary history, structure, and spread of SARS-CoV-2. Since the SARS-CoV-2 genome is mutating constantly, it is important to catalogue these emerging genomic variants, which would help in understanding the source of transmission and in assessing community transmission. Comparing viral genomes helps in tracing the most probable ancestor and the geographic location from which it originated.
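Variants such as V483A are named by the reference residue, its position, and the observed residue. A minimal sketch of deriving such labels by comparing a sample with a reference is given below. It assumes the two sequences are already aligned and of equal length; `call_substitutions`, the `offset` parameter, and the six-residue fragment are illustrative assumptions rather than part of any cited pipeline.

```python
def call_substitutions(reference: str, sample: str, offset: int = 0) -> list[str]:
    """List substitutions in 'RefPosAlt' notation (e.g. 'V483A').

    Assumes `reference` and `sample` are pre-aligned and equal in length.
    `offset` is the number of residues preceding the fragment, so that
    reported positions are 1-based positions in the full-length sequence.
    Gap characters ('-') are skipped, since they are not substitutions.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to equal length")
    variants = []
    for i, (ref, alt) in enumerate(zip(reference.upper(), sample.upper()), start=1):
        if ref != alt and "-" not in (ref, alt):
            variants.append(f"{ref}{i + offset}{alt}")
    return variants


# Toy fragment covering positions 481-486 of a hypothetical protein.
print(call_substitutions("NGVEGF", "NGAEGF", offset=480))  # ['V483A']
```

Collecting these labels across many genomes and counting how often each one appears per location is one simple way to see which variants are emerging.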
As growing numbers of new SARS-CoV-2 genomic sequences emerge from different sources and geographical locations, a better understanding of the genomic epidemiology of SARS-CoV-2 should soon be established. Genomic epidemiology has the potential to recognize clusters of transmission, estimate the biological rate of transmission, and predict the extent of spread of the COVID-19 pandemic.
Countries all over the world have started projects to identify new genomic sequences of SARS-CoV-2 in order to keep a close watch on the progress of the pandemic: understanding the genetic diversity of the novel virus and its epidemiological patterns, developing suitable diagnostic protocols and methods, and ultimately designing vaccines and therapeutics for COVID-19. Many countries in developed regions such as Europe and North America have reported genomic sequences of SARS-CoV-2 isolated from their regions. However, several countries in Asia, the Middle East, Africa, and Latin America lack such infrastructure and projects. In this case, a number of publicly available open-access platforms such as GISAID and associated sources (https://nextstrain.org/ncov) have proven to be useful and easily accessible tools for understanding the COVID-19 pandemic using the available SARS-CoV-2 genomic sequences [12].

Advancements in the Molecular Diagnostics for COVID-19
Since the genomic sequence of SARS-CoV-2 became available in GISAID, several companies and research organizations have developed molecular diagnostic methods to detect SARS-CoV-2 in clinical samples. Access to the genome sequence data has enabled the design of primers and probes that target the viral genome and the development of SARS-CoV-2-specific tests [17].
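To illustrate how a primer pair is checked against a genome sequence, the sketch below looks for exact binding sites of a forward and a reverse primer and reports the expected amplicon length. It is a simplified illustration only: the function names, the template, and the primers are made up, and real primer design also considers melting temperature, mismatches, and secondary structure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon_length(template: str, forward_primer: str, reverse_primer: str):
    """Expected amplicon length for exact primer matches, else None.

    The forward primer is searched on the given strand; the reverse
    primer binds the opposite strand, so its reverse complement is
    searched downstream of the forward primer site.
    """
    template = template.upper()
    start = template.find(forward_primer.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return end + len(reverse_site) - start


# Hypothetical 31-nt template and 7-nt primers (far shorter than real ones).
template = "GGATCCACGTTAGCCTAGGCATTTGACCGTA"
print(amplicon_length(template, "ATCCACG", "GGTCAAA"))  # 26
```

A primer pair that returns a length for the target genome but `None` for related genomes would be a first, very rough indication of specificity.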

Reverse Transcription-Polymerase Chain Reaction
RT-PCR is considered to be the gold standard for the detection of SARS-CoV-2 in clinical samples. It relies on amplification of the viral RNA, which is first converted to cDNA; primers specific to the viral genome then amplify the target and reveal the presence of the virus in the sample. Samples collected from the upper respiratory tract are currently used for RT-PCR-based testing. Reports are also available that highlight the use of serum, stool, and ocular secretions as sources of viral RNA [18][19][20]. Most recently, self-collected saliva samples have been used for RT-PCR tests. This approach is fast and reduces both patient discomfort and the risk of infection for health care providers [21,22].
In RT-PCR, the primers target specific sequences on the viral cDNA and amplify the viral genome. The reaction is then monitored in real time using a fluorescent dye or a sequence-specific DNA probe labeled with a fluorescent molecule [23].
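A common way to summarize real-time fluorescence monitoring, not described in this review itself, is the cycle threshold (Ct): the cycle at which the signal first crosses a chosen threshold. The sketch below estimates it by linear interpolation between two cycles. The function name, the threshold, and the readings are illustrative assumptions; instrument software typically uses more elaborate baseline correction and curve fitting.

```python
def cycle_threshold(fluorescence, threshold):
    """Fractional cycle at which fluorescence first reaches `threshold`.

    `fluorescence[0]` is the reading after cycle 1, `fluorescence[1]`
    after cycle 2, and so on. Returns None if the threshold is never
    reached (which would be read as "not detected" in this toy model).
    """
    if not fluorescence:
        return None
    if fluorescence[0] >= threshold:
        return 1.0
    for i in range(1, len(fluorescence)):
        prev, curr = fluorescence[i - 1], fluorescence[i]
        if prev < threshold <= curr:
            # prev belongs to cycle i, curr to cycle i + 1
            return i + (threshold - prev) / (curr - prev)
    return None


# Idealized readings that double every cycle.
readings = [1, 2, 4, 8, 16]
print(cycle_threshold(readings, 6))  # 3.5
```

Under this idealized doubling model, a sample with more starting template crosses the threshold earlier, so a lower Ct corresponds to a higher initial viral load.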
RT-PCR-based methods are rapidly evolving. GenMark Diagnostics has recently developed an RT-PCR-based kit to rapidly detect SARS-CoV-2 in nasopharyngeal samples. The ePlex SARS-CoV-2 test employs test cartridges that contain reagents for magnetic field-based extraction of viral RNA, as well as reagents for cDNA amplification and for detection of the viral genome in the sample using ferrocene-labeled signal probes. The target is detected using voltammetry [33].
Although RT-PCR is considered the gold standard for SARS-CoV-2 detection, it has a few disadvantages. Concerns such as the need for expensive instruments and highly trained personnel, and the time taken to generate results, have prompted companies and researchers to further improve RT-PCR-based tests and to develop new technologies to detect SARS-CoV-2.

Isothermal Nucleic Acid Amplification
Unlike RT-PCR, which requires repeated temperature changes in every cycle to carry out its different steps, isothermal nucleic acid amplification amplifies viral genomic targets at a constant temperature. Several methods have been developed based on this principle.

Reverse Transcription Loop-Mediated Isothermal Amplification
Reverse transcription loop-mediated isothermal amplification (RT-LAMP) employs four different primers specific for the target sequence of the genome and is combined with a reverse transcription step. Detection is based on photometry: the turbidity caused by precipitation of magnesium pyrophosphate, a by-product of amplification, is measured. Results can be monitored in real time either by quantifying the turbidity or by fluorescence using fluorescent dyes. Because RT-LAMP needs only heating and visual monitoring, it is a simple and highly sensitive technique, which makes it a promising method for SARS-CoV-2 detection [34][35][36][37].

Transcription-Mediated Amplification
Transcription-mediated amplification (TMA) is modeled on retroviral replication and is used to amplify viral RNA or DNA targets. Unlike RT-PCR, TMA needs only a single temperature and is more efficient than RT-PCR [38]. TMA employs a retroviral reverse transcriptase and T7 RNA polymerase and has been used for the detection of several pathogens. Hologic's Panther Fusion platform has been developed using the principles of both RT-PCR and TMA [39]. The platform is highly sensitive and has a short detection time. It performs up to 1000 tests in 24 h and can detect other pathogens whose symptoms overlap with those of COVID-19 in the same patient sample.

Recombinase Polymerase Amplification
Recombinase polymerase amplification (RPA) relies on a highly sensitive recombinase enzyme that recognizes specific DNA sequences, displaces the strands, and allows amplification of virus-specific genes. RPA runs at a single temperature and does not need expensive equipment [40]. This method has high potential for detecting SARS-CoV-2 in clinical samples.

CRISPR-Based Assays
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of nucleic acid sequences present in prokaryotic organisms like bacteria. Special CRISPR enzymes such as Cas9, Cas12, and Cas13 can recognize and cut these sequences. The Cas12 and Cas13 family of CRISPR enzymes can be engineered to detect and cleave the viral RNA sequences [41].
Two companies (Mammoth Biosciences and Sherlock Biosciences) are investigating the use of this technique for detecting SARS-CoV-2 in clinical samples. The SHERLOCK method uses Cas13, which cuts a reporter RNA sequence upon activation by a SARS-CoV-2-specific guide RNA [42]. The DETECTR assay of Mammoth Biosciences uses Cas12a, which cleaves the reporter RNA, to detect RNA sequences of the E and N genes of SARS-CoV-2. This is followed by the isothermal amplification of the target sequences, which are monitored visually by the use of a fluorophore [43]. These methods are sensitive and specific, do not need expensive instruments, and can be implemented on a paper strip. They generate results in less than 1 h, considerably less time than RT-PCR-based tests [43].

Microarray: Hybridization-Based Assays
Microarray assays are hybridization-based high-throughput methods and have recently been used for the detection of SARS-CoV-2 nucleic acids in clinical samples. The first step is to generate cDNA from viral RNA. This is followed by labeling of the cDNA with specific probes designed to detect virus-specific cDNAs in the sample. The labeled cDNAs are then hybridized to complementary oligonucleotide sequences on a chip. After the unhybridized cDNAs are washed away, a detector quantifies the signals of the hybridized cDNAs, and hence the presence of viral nucleic acid is detected [44]. Recently, microarray-based methods have also been useful in detecting mutations in the SARS-CoV-2 genome. This approach has detected 24 single-nucleotide polymorphisms in the SARS-CoV-2 spike (S) gene with maximum accuracy [45].
Microarray-based techniques are useful for rapidly detecting different strains of SARS-CoV-2 on a single platform. However, the biggest drawback of this technique is its cost. Recently, a low-cost microarray-based assay with sensitivity equal to that of RT-PCR has been developed to detect SARS-CoV-2 [45].

Amplicon-Based Metagenomic Sequencing
This technique is especially important in genomic epidemiological studies because it simultaneously identifies SARS-CoV-2 and the background microbiome that can give rise to secondary infections. It is very useful in contact tracing and in epidemiological studies related to the evolution of the virus. It is also of utmost importance in identifying the mutations acquired by SARS-CoV-2 and in studying recombination events between the virus and the background microbiome. This amplicon- and metagenomics-based technique has been used to rapidly sequence the SARS-CoV-2 genome in a mere 8 h using nasopharyngeal samples from COVID-19 patients [46]. Illumina has developed an advanced form of this technique that, along with detecting various strains of CoVs, identifies various other pathogens in the clinical sample [47].

Conclusion
Epidemiological studies have now integrated molecular genetics techniques to widen their scope. With advances in molecular genetics and genomics, sequencing has become possible at scales ranging from single cells to multiple genomes, and thousands of genomes can be compared at once using advanced bioinformatics tools. Integrating these molecular aspects with epidemiological studies has given rise to molecular, or genomic, epidemiology. Molecular epidemiological studies have proven effective in understanding the molecular basis of COVID-19 along with several other communicable diseases. Rapid sequencing of the SARS-CoV-2 genome has opened avenues for the development of antivirals and vaccines for COVID-19. Knowledge of the whole genome sequence of SARS-CoV-2 has led to the development of several molecular diagnostics that target the viral genome for detection, such as RT-PCR, isothermal amplification assays, hybridization microarray assays, amplicon-based metagenomic sequencing, and the cutting-edge CRISPR-related technologies. Most of the molecular diagnostic tests for COVID-19 are based on RT-PCR. However, several modifications of this gold standard technique are under development, and a few have already been developed and are being used for the diagnosis of COVID-19. To date, almost 112 molecular diagnostic tests are available to detect SARS-CoV-2: 90% are based on RT-PCR principles, 6% use isothermal amplification principles, 2% are based on hybridization technologies, and the remaining 2% rely on CRISPR-based principles.
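For readers who prefer absolute numbers, the percentages above can be converted into approximate test counts. The sketch below assumes a total of exactly 112 tests, whereas the text says "almost 112", so the results are rough estimates rather than reported figures.

```python
# Total and shares as stated in the conclusion; the total is approximate.
TOTAL_TESTS = 112
SHARES_PERCENT = {
    "RT-PCR": 90,
    "isothermal amplification": 6,
    "hybridization": 2,
    "CRISPR": 2,
}


def approximate_counts(total, shares_percent):
    """Convert percentage shares into rounded counts of tests."""
    if sum(shares_percent.values()) != 100:
        raise ValueError("shares must add up to 100%")
    return {name: round(total * pct / 100) for name, pct in shares_percent.items()}


counts = approximate_counts(TOTAL_TESTS, SHARES_PERCENT)
for name, count in counts.items():
    print(f"{name}: about {count} tests")
```

With these inputs the rounded counts add back up to 112, with roughly 101 RT-PCR-based tests and about 11 tests spread across the other three principles.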

Compliance with Ethical Standards
Conflict of Interest The authors declare that they have no conflict of interest.
Human and Animal Rights and Informed Consent This article does not contain any studies with human or animal subjects performed by any of the authors.