Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the etiological agent of coronavirus disease 2019 (COVID-19) [1]. It has infected more than 49.1 million individuals and caused more than 1.2 million deaths globally (https://covid19.who.int/). Person-to-person transmission was established as an important mode of spread after the disease was documented in a family cluster of five patients [2].

SARS-CoV-2 is a betacoronavirus with a single-stranded, positive-sense RNA genome of around 29.8 kb [1, 3]. The genome sequence of SARS-CoV-2 shares 96% similarity with that of a bat coronavirus (RaTG13), suggesting a possible bat origin [4]. The SARS-CoV-2 genome has a unique polybasic cleavage motif, PRRA, in the S1-S2 junction region of the spike protein. This cleavage site determines infectivity and host range and may be responsible for increased human-to-human transmission of SARS-CoV-2.

Currently, several vaccines are in different phases of clinical trials or development. Being an RNA virus, SARS-CoV-2 is likely to mutate over time, which may significantly affect its transmission potential, the response to vaccines and antivirals, and the utility of diagnostics developed on the basis of earlier viruses. It is therefore essential to isolate and characterize SARS-CoV-2 at different times and in different places. This study reports the isolation and sequence analysis of SARS-CoV-2 at the beginning of the emergence of COVID-19 in India. Importantly, the cases gave no history of (H/O) travel or of contact with persons with H/O travel.

Results and discussion

Nasopharyngeal swabs (NPS) from 5 COVID-19 patients (2 from the same family) were used to isolate the virus in Vero CCL-81 cells (ATCC, USA). Of these, 2 samples (8003 and 8004, IRSHA isolates) from a single family, with Ct values < 15 in qRT-PCR, showed a distinct cytopathic effect (CPE; more than 90% of cells detached) on day 3 post-inoculation (Fig. 1a). No CPE was observed for the remaining 3 samples, even after one blind passage in a T-25 cm2 flask. Replication of SARS-CoV-2 was confirmed by immunofluorescence assay using a serum sample from a convalescent-phase (27 days post-disease onset) COVID-19 patient. On day 2 post-infection, viral antigen(s) were detected only in infected cells (Fig. 1b). Successful virus isolation from NPS has been reported from Korea [5] and the United States [6] (Vero CCL-81), Italy [7] (Vero E6) and Australia [8] (Vero cell line expressing human signalling lymphocytic activation molecule). Other countries used different sample types and cell lines for virus isolation. In the very first report of SARS-CoV-2 isolation, from China, a bronchoalveolar lavage sample was inoculated into a primary culture of human airway epithelial cells [9]. Caco-2 cells were used for SARS-CoV-2 isolation in Germany [10]. Isolation of the virus from the ocular fluid of a COVID-19 patient suggests that human ocular cell types may support SARS-CoV-2 replication [11].

Fig. 1

Isolation and genomic characterization of SARS-CoV-2. a Phase-contrast microscopic images of uninfected control cells and of cells infected with the nasopharyngeal swab of a SARS-CoV-2-infected patient, exhibiting cytopathic effect 3 days after inoculation. b Vero cells were infected with 1:1000 diluted 8003 and 8004 virus isolates. At 2 days post-infection, SARS-CoV-2 protein was detected with a convalescent patient’s serum and Alexa-fluor 488 conjugated goat anti-human-IgG. Nuclei were stained with DAPI. c Plaque morphology of the two clinical isolates on Vero cells at day 3, day 4 and day 5 post-infection. Distinct plaques could be visualized on day 4 and day 5 post-infection. d Phylogenetic tree of complete genome sequences including the SARS-CoV-2 isolates (n = 2) from Pune, India. Each strain is indicated by country, year of isolation, and GenBank accession number or GISAID number. The sequences obtained in this study are marked with filled coloured triangles. Phylogenetic trees were constructed using the neighbour-joining method and distances were computed using the maximum composite likelihood method in MEGA X software. Genetic distances were calculated using the p-distance model of nucleotide and amino acid substitution. The robustness of the resulting tree was assessed with 1000 bootstrap replicates.

Plaque assay and CPE-based TCID50 assay were used to quantitate infectious virus. Both IRSHA isolates (8003-P0 and 8004-P0) formed clear and distinct plaques on day 4 post-infection (Fig. 1c). Plaque assay-based virus titers of the 8003 and 8004 isolates were 3.0 × 10^6 and 7.5 × 10^6 PFU/mL, respectively. CPE-based virus titers of 8003 and 8004 were 10^4.34 and 10^5.6 TCID50/mL, respectively, on day 3 post-infection and remained constant until day 5. Further passage in Vero CCL-81 cells did not lead to an appreciable change in plaque assay-based titers; titers of 8003-P1 and 8004-P1 were 5 × 10^6 and 4.0 × 10^6 PFU/mL, respectively. However, the CPE-based titers showed a significant increase on passaging: titers of 8003-P1 and 8004-P1 were 10^7.5 and 10^7 TCID50/mL, respectively.
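The contrast between the two assays is easier to see as fold changes between passage 0 and passage 1. The following is a minimal sketch of that arithmetic using only the titers reported above; the function and variable names are illustrative and are not part of the study's methods.

```python
def fold_change_from_log10(log10_before: float, log10_after: float) -> float:
    """Fold change between two titers given as log10 values (e.g. TCID50/mL)."""
    return 10 ** (log10_after - log10_before)


def fold_change(titer_before: float, titer_after: float) -> float:
    """Fold change between two titers given on the linear scale (e.g. PFU/mL)."""
    return titer_after / titer_before


# CPE-based titers (log10 TCID50/mL), P0 -> P1, as reported in the text
tcid50_fold_8003 = fold_change_from_log10(4.34, 7.5)
tcid50_fold_8004 = fold_change_from_log10(5.6, 7.0)

# Plaque assay-based titers (PFU/mL), P0 -> P1, as reported in the text
pfu_fold_8003 = fold_change(3.0e6, 5.0e6)
pfu_fold_8004 = fold_change(7.5e6, 4.0e6)

print(f"TCID50 fold change: 8003 = {tcid50_fold_8003:.0f}x, 8004 = {tcid50_fold_8004:.1f}x")
print(f"PFU fold change:    8003 = {pfu_fold_8003:.2f}x, 8004 = {pfu_fold_8004:.2f}x")
```

On these numbers the CPE-based titers rise by roughly three orders of magnitude for 8003 and more than one for 8004, whereas the plaque-based titers change by less than twofold in either direction.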

For genomic characterization, whole-genome sequencing of virus isolates from the first passage was performed using a next-generation sequencing platform, the Ion Proton system (Life Technologies, USA), as reported earlier [12]. GenBank accession numbers for the generated sequences are MT416725 and MT416726. For phylogenetic analysis, 89 additional complete genome sequences were retrieved from the GenBank and GISAID (https://www.gisaid.org) databases. All 91 sequences were aligned using the MAFFT v7.450 (2019) online program (https://mafft.cbrc.jp/alignment/software/).

The complete genomes of the 8003 and 8004 viruses were each 29,866 nucleotides (nt) long, with a 5′ untranslated region (UTR) of 265 nt and a 3′ UTR of 188 nt. These sequences were compared with the reference sequence NC_045512 from Wuhan, China. At the nucleotide level, the IRSHA isolates were 99.96 ± 0.01% identical to the reference sequence. The 11 mutations recorded were distributed as follows: C241T (5′UTR); C313T, C3037T, C5700A, C14408T and C18928T (ORF1ab); A23403G and G23593C (spike protein); and G28881A, G28882A and G28883C (nucleocapsid phosphoprotein) (Table 1). Two mutations, C313T and C3037T, were synonymous, leaving the encoded amino acids, leucine (L) and phenylalanine (F), unchanged. Eight mutations resulted in amino acid changes. Five were at positions 5700 (nsp3, A1812D), 14408 (RdRp, P314L), 18928 (3′ to 5′ exonuclease, exoN, P1821S), 23403 (spike protein, D614G) and 23593 (spike protein, Q677H). The remaining three, G28881A, G28882A and G28883C, resulted in the amino acid changes R9455K and G9456R in the nucleocapsid (residues 203 and 204).
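As an illustration of how substitutions in the notation used here (reference base, 1-based reference position, isolate base) and a percent identity can be derived from a pairwise alignment, the sketch below uses toy sequences. It is not the authors' pipeline: the function names are hypothetical, and dividing the number of differences by the genome length is a simplifying assumption about how identity is computed.

```python
def call_substitutions(ref_aln: str, query_aln: str) -> list[str]:
    """List substitutions such as 'C241T' from two aligned sequences.

    Positions are 1-based and counted along the reference, so gap columns
    in the reference do not advance the coordinate. Gaps and ambiguous
    bases are ignored rather than reported.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    substitutions = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r != "-":
            ref_pos += 1
        if r in "ACGT" and q in "ACGT" and r != q:
            substitutions.append(f"{r}{ref_pos}{q}")
    return substitutions


def percent_identity(n_differences: int, length: int) -> float:
    """Percent identity given a count of differing sites over a given length."""
    return 100.0 * (1.0 - n_differences / length)


# Toy alignment (not real SARS-CoV-2 sequence)
toy_ref = "ACGTACGTAC"
toy_query = "ACTTACGAAC"
toy_subs = call_substitutions(toy_ref, toy_query)
print(toy_subs)

# Consistency check on the reported figures: 11 substitutions over 29,866 nt
reported_identity = percent_identity(11, 29866)
print(f"{reported_identity:.2f}%")
```

With 11 differences over 29,866 nt this gives about 99.96%, in line with the identity reported for the IRSHA isolates.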

Table 1 Analysis of sequence variations and amino acid changes of Pune isolates in comparison with the reference strain, NC_045512 (Wuhan, China)

Of these, 8 mutations were identical to those in the c31/India, c32/India and Israel sequences. The A1812D, P1821S and Q677H mutations, in nsp3 and exoN of ORF1ab and in the spike protein, respectively, were unique to the IRSHA isolates. The A1812D mutation lies in the protease domain of the nsp3 region of the ORF1ab protein [13]. This mutation might modulate the function of nsp3 by affecting the viral protein maturation process and innate immune signalling pathways. The P1821S mutation in exoN might increase the fidelity of RNA synthesis, since exoN corrects nucleotide incorporation errors made by RdRp [14].

Phylogenetic analysis (Fig. 1d) revealed that the IRSHA isolates were closely related to Indian strains isolated from close contacts of infected patients with a travel history to Italy (EPI_ISL_420555, EPI_ISL_426179). The nucleotide divergence within these sequences was 0.02%, while nucleotide identity with the cluster of sequences linked to travel to Italy was 99.97 ± 0.01%. The SARS-CoV-2 sequences of Italian origin segregated further into two subgroups. In one, the IRSHA isolates (Pune, western India) grouped with two other Indian sequences (India/c31/2020 and India/c32/2020; Delhi, north India) from contacts of patients who had returned from Italy, and with viruses from Israel, Greece, Russia, Brazil, Germany and the Netherlands (Fig. 1d). The other subgroup comprised Indian isolates obtained from Italian tourists who visited India or from their contacts, together with viruses from Italy and Spain [15]. Like the Wuhan isolate, all these sequences belonged to the L strain of SARS-CoV-2, with C at position 8782 and T at position 28144 [16].

IRSHA sequences, which clustered with sequences derived from Italy and other European countries, are classified as clade G owing to the presence of the D614G mutation in the spike protein. The D614G variant, located in the S1 domain of the spike protein, has been associated with accelerated spread of infections in Europe and North America [17]. In our isolates, a unique mutation, Q677H, located near the S1-S2 junction region (681–684), might affect SARS-CoV-2 fusion with the cell membrane. Importantly, an attenuated phenotype of SARS-CoV-2 variants was obtained following a 15-30 bp deletion at the S1-S2 junction region of the spike protein [18]. The detection of deletions at the flanking sites of PRRAR in 3 clinical samples from Guangdong, China, is noteworthy [19]. In addition, circulation of other genetic variants has been reported, such as a 382-nucleotide deletion in the orf8 region in eight hospitalized patients in Singapore and an 81-nucleotide deletion in the orf7a gene in a patient from Arizona, USA [20]; data on the effect of such mutations on the transmissibility of the virus should be available soon.
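The correspondence between the genome coordinates of the reported substitutions and the amino acid residues they affect can be checked with simple arithmetic. The sketch below assumes the coding-sequence start positions from the NC_045512.2 annotation (spike at nt 21563, nucleocapsid at nt 28274); these coordinates are not stated in the text and are supplied here as an assumption, and the function name is hypothetical.

```python
# Assumed CDS start coordinates (1-based) from the NC_045512.2 annotation;
# these values are not given in the text above.
SPIKE_CDS_START = 21563
NUCLEOCAPSID_CDS_START = 28274


def codon_of(genome_pos: int, cds_start: int) -> tuple[int, int]:
    """Return (codon number, position within codon), both 1-based,
    for a 1-based genome coordinate inside a CDS without frameshifts."""
    if genome_pos < cds_start:
        raise ValueError("position lies upstream of the CDS start")
    offset = genome_pos - cds_start
    return offset // 3 + 1, offset % 3 + 1


# Spike substitutions reported for the isolates
d614g = codon_of(23403, SPIKE_CDS_START)  # A23403G
q677h = codon_of(23593, SPIKE_CDS_START)  # G23593C

# Nucleocapsid substitutions reported for the isolates
n_sites = [codon_of(p, NUCLEOCAPSID_CDS_START) for p in (28881, 28882, 28883)]

print("A23403G ->", d614g)
print("G23593C ->", q677h)
print("G28881A, G28882A, G28883C ->", n_sites)
```

Under these assumed coordinates, A23403G falls in spike codon 614 and G23593C in spike codon 677, while the three adjacent nucleocapsid substitutions span codons 203 and 204, matching the residues named in the text.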

In the initial phase of the pandemic, India reported cases in travellers from China (first case), Italy and Iran [15, 21]. IRSHA sequences exhibited 99.94 ± 0.01% nucleotide identity with viral sequences obtained from patients with a travel history to Iran. The Iran-linked sequences formed a separate cluster with four common nucleotide mutations: G1437A and G11083T (Orf1a), T28688C (nucleocapsid) and G29742T (3′UTR).

Continuous evolution of SARS-CoV-2 was observed in viral sequences of Italian origin. The very first imported case of SARS-CoV-2 infection in Italy was in a Chinese traveller [22]. The sequence (GenBank accession no. MT066156) was similar to that from Wuhan, China, except for a unique G251V mutation, forming a separate clade V [22]. Following several introductions and transmissions of SARS-CoV-2 in Europe, the D614G mutation was identified in Italian patients with no travel history. These sequences formed a separate cluster, clade G [23], which spread to other countries including India, as observed in this study.

As Italy is a leading global tourist destination, it is unsurprising that COVID-19 cases were identified among travellers from Italy and their close contacts. Although the patients from whom we isolated the virus gave no H/O travel or of contact with a traveller from abroad, the sequence data suggest that the source of the virus is related to the strain imported from Italy. Thus, sequence analysis can help identify the origin and circulation of a particular virus strain in a population.

In summary, we report the isolation and characterization of two Indian isolates from a single family without a history of travel. Phylogenetic analysis revealed that the source of the virus is related to the strain imported from Italy. The availability of fully characterized isolates, along with well-standardized plaque and TCID50 assays, will allow the development of much-needed neutralization tests for understanding the pathogenesis of the disease and for evaluating vaccines and other immunotherapeutics.