
Analysis of the Genomic Distance Between Bat Coronavirus RaTG13 and SARS-CoV-2 Reveals Multiple Origins of COVID-19


The severe acute respiratory syndrome COVID-19 was discovered in China on December 31, 2019. Many COVID-19 cases were subsequently reported in other countries. However, in some countries, such as France and Italy, positive COVID-19 samples have been reported that predate the earliest cases officially accepted by health authorities. It is therefore of great importance to determine where SARS-CoV-2 was first transmitted to humans. To this end, we analyze SARS-CoV-2 genomes using the k-mer natural vector method and compare the similarities of global SARS-CoV-2 genomes under a new natural metric. Because it is commonly accepted that SARS-CoV-2 originated from bat coronavirus RaTG13, we only need to determine which SARS-CoV-2 genome sequence is closest to bat coronavirus RaTG13 under our natural metric. Our analysis suggests that SARS-CoV-2 most likely already existed in other countries, such as France, India, the Netherlands, England, and the United States, before the outbreak in Wuhan, China.
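To make the approach in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the commonly described form of the k-mer natural vector (for each k-mer: its count, its mean position, and a normalized second central moment of its positions) and uses the plain Euclidean distance as a stand-in, because the abstract does not define the "new natural metric". The function names, the normalization by the number of k-mer positions, and the toy sequences are all hypothetical.

```python
from collections import defaultdict
from itertools import product
import math


def kmer_natural_vector(seq, k):
    """Return [counts..., mean positions..., second moments...] over all 4**k k-mers.

    Assumption: the second central moment is normalized by (count * number of
    k-mer positions); the paper's exact normalization may differ.
    """
    seq = seq.upper()
    total = len(seq) - k + 1  # number of k-mer start positions
    if total < 1:
        raise ValueError("sequence is shorter than k")

    # Record the 1-based start positions of every k-mer made only of A, C, G, T.
    positions = defaultdict(list)
    for i in range(total):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            positions[kmer].append(i + 1)

    counts, means, moments = [], [], []
    for kmer in ("".join(p) for p in product("ACGT", repeat=k)):
        pos = positions.get(kmer, [])
        n = len(pos)
        if n == 0:
            # Absent k-mers contribute zeros to all three components.
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu = sum(pos) / n
        counts.append(float(n))
        means.append(mu)
        moments.append(sum((p - mu) ** 2 for p in pos) / (n * total))
    return counts + means + moments


def euclidean_distance(u, v):
    """Stand-in metric (assumption): Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def closest_to_reference(reference, candidates, k=3):
    """Return the name of the candidate sequence nearest to the reference."""
    ref_vec = kmer_natural_vector(reference, k)
    return min(
        candidates,
        key=lambda name: euclidean_distance(
            ref_vec, kmer_natural_vector(candidates[name], k)
        ),
    )
```

In this sketch, `reference` would play the role of the RaTG13 genome and `candidates` would map sample labels to SARS-CoV-2 genome sequences; the choice of `k` and of the metric both affect which sample is ranked closest.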




We thank the researchers worldwide who sequenced and shared the complete genomes of SARS-CoV-2 and other coronaviruses from GISAID (

Author information

Corresponding author

Correspondence to Stephen S.-T. Yau.

Additional information

This work was supported by the Tsinghua University Spring Breeze Fund (2020Z99CFY044), a Tsinghua University start-up fund, and the Tsinghua University Education Foundation fund (042202008).

About this article

Cite this article

Pei, S., Yau, S.ST. Analysis of the Genomic Distance Between Bat Coronavirus RaTG13 and SARS-CoV-2 Reveals Multiple Origins of COVID-19. Acta Math Sci 41, 1017–1022 (2021).

Key words

  • SARS-CoV-2
  • multiple origins of COVID-19
  • mathematical genomic distance
  • k-mer natural vector

2010 MR Subject Classification

  • 92-08