Since the resurgence of chikungunya virus (CHIKV) in India in 2005, waves of epidemic outbreaks have followed in subsequent years in diverse parts of the country [1,2,3,4]. Chikungunya (CHIK) fever is a mosquito-transmitted acute viral infection characterized by high fever, arthralgia, myalgia, headache and rash. The virus is transmitted to humans by several species of mosquitoes, with Aedes aegypti and Ae. albopictus being the two main vectors in urban settings. CHIKV has a single-stranded, positive-sense RNA genome of approximately 11.8 kb with two open reading frames (ORFs). The ORF at the 5’ end encodes four nonstructural proteins (nsP1-nsP4), while the ORF at the 3’ end encodes the structural proteins (capsid C, envelope glycoproteins E1 and E2, and three small cleavage products: E3, 6K and the transframe protein TF).

E1 gene sequences are known to carry adequate information for phylogenetic classification [5], while E2 is known to be immunodominant within the structural polyprotein [6]. For these reasons, E1 and E2 gene sequences are generally used for phylogeny and related analyses [3, 7]. Phylogenetic analysis classifies all CHIKV strains into three distinct genotypes [5], with the resurgent epidemic isolates from 2005 onwards falling into the Indian Ocean lineage (IOL) of the East, Central and South African (ECSA) genotype [8,9,10]. Further genetic characterization based on the envelope gene sequences of CHIKV isolates from Indian epidemics between 2009 and 2011 in the southern states of Tamil Nadu and Andhra Pradesh [4] and the eastern state of Odisha [3] revealed strain variants and evidence of adaptive evolution of the virus to both the vector and the host. During 2015, the southern state of Karnataka accounted for around 75% of the 27,553 suspected cases of CHIKV infection in the country. During 2016, however, Karnataka, Maharashtra and New Delhi together accounted for 55.4% of the 64,057 suspected cases. In 2017, Karnataka and Maharashtra together accounted for 63.1% of the 62,268 suspected cases [11]. The present study was undertaken to understand the genetic nature of the CHIKV currently circulating in the following cities: Pune and Nasik (Maharashtra), Bengaluru (Karnataka) and New Delhi, based on analysis of the E1 and E2 genes. To further understand the adaptive evolution of the ECSA genotype and the IOL, selection pressure studies were undertaken, followed by functional analyses.

Virus isolates stored in the repository of the National Institute of Virology, Pune, India, were used for the study. These isolates were obtained from samples of patients infected with CHIKV during the 2015-2017 outbreaks in Bengaluru, Pune, Nasik and New Delhi, as described previously [12]. Virus isolates (passage 2) obtained using the mosquito cell line C6/36 were used for RNA extraction. Genomic viral RNA was extracted from 11 representative isolates, Bengaluru (n = 5), Pune (n = 3), Nasik (n = 2) and New Delhi (n = 1), using the QIAamp viral RNA mini kit (QIAGEN, Germany) according to the manufacturer’s instructions. The viral RNA was reverse transcribed using random hexamers and AMV reverse transcriptase (Promega Corporation, Madison, WI, USA) at 42 °C for 1 h. The E1/E2 gene region was amplified using 10.0 µl of cDNA, Taq polymerase (Invitrogen) and primers as described previously [1]. PCR products were gel-purified using a QIAquick gel extraction kit (QIAGEN) and sequenced using the BigDye Terminator cycle sequencing ready reaction kit (Applied Biosystems) on an automatic sequencer (ABI PRISM Genetic Analyzer 3100; Applied Biosystems). The E2-E1 gene region of these representative isolates from Pune, Nasik, Bengaluru and New Delhi was thus sequenced (Suppl. Table 1). Overlapping molecular sequences (nucleotide and deduced amino acid) generated by the forward and reverse primers in this study were manually processed using MEGA v.7.0.

An initial neighbour-joining (NJ) tree built with MEGA v6 was used to identify the IOL from a starting dataset of all E2-E1 sequences (n = 863, available in GenBank as of March 2018). Redundancy in the dataset was reduced on the basis of country and year of isolation by selecting one sequence per year for each country. The nucleotide substitution rate per site and the time to the most recent common ancestor (tMRCA) were then estimated from time-stamped sequences of the IOL (n = 233) using the Bayesian Markov chain Monte Carlo (MCMC) method in the BEAST v2.0.2 software package [13]. The best-fit nucleotide substitution model, determined using the Akaike Information Criterion (AIC) as implemented in MODELTEST 3.7 [14], was GTR + G + I (general time-reversible model with gamma-distributed rate variation among sites and a proportion of invariable sites). The dataset was then analyzed assuming a relaxed (uncorrelated lognormal) molecular clock, with the Bayesian skyline demographic model as the coalescent prior. The MCMC chain length was chosen so that effective sample sizes exceeded 200. The results of the MCMC analysis were assessed using Tracer v1.5, and the maximum clade credibility (MCC) tree was inferred using TreeAnnotator v2.0.2. The MCC tree was visualized using FigTree v1.4.0.
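The redundancy-reduction step (one sequence per country per year) can be illustrated with a minimal Python sketch. The record type, the function name and the placeholder accessions are all hypothetical. The rule used here for choosing among several candidates (keep the first one encountered) is an assumption, since the text does not state how the retained sequence was picked.

```python
from typing import List, NamedTuple


class IsolateRecord(NamedTuple):
    """Minimal metadata for one sequence (hypothetical structure)."""
    accession: str
    country: str
    year: int


def reduce_redundancy(records: List[IsolateRecord]) -> List[IsolateRecord]:
    """Keep one record per (country, year) pair.

    Assumption: the first record encountered for each pair is retained.
    """
    chosen = {}
    for rec in records:
        # setdefault stores the record only if the key is not yet present
        chosen.setdefault((rec.country, rec.year), rec)
    return list(chosen.values())


# Placeholder records for illustration only; these are not real GenBank entries.
records = [
    IsolateRecord("SEQ_A", "India", 2006),
    IsolateRecord("SEQ_B", "India", 2006),
    IsolateRecord("SEQ_C", "India", 2007),
    IsolateRecord("SEQ_D", "Reunion", 2006),
]
reduced = reduce_redundancy(records)
print([r.accession for r in reduced])
```

In practice the country and year would be parsed from the GenBank metadata of each entry, and a different tie-breaking rule (for example, preferring the longest sequence) could be substituted.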

Selection pressure analysis of the E2-E1 region of the IOL and the ECSA genotype was performed on the Datamonkey server [15] using the single likelihood ancestor counting (SLAC), fixed effects likelihood (FEL), internal fixed effects likelihood (IFEL) and mixed effects model of evolution (MEME) methods. Amino acid sites were considered to be under positive selection pressure when at least two of the methods supported this at the chosen significance level (P ≤ 0.1 for SLAC, FEL, IFEL and MEME). The positively selected sites and the major unique mutational sites of the new strains were mapped onto the X-ray crystallographic structure of the E1-E2 heterodimer (3N41.PDB) available in the Protein Data Bank. To correlate the mutational sites with known B-cell epitopes, the Immune Epitope Database (IEDB) and Analysis Resource [16] was used to obtain the experimentally known epitopes in the CHIKV structural polyprotein. The mutational sites were also examined for association with computationally predicted CTL epitopes on the CHIKV E2 and E1 proteins for important HLA class I alleles [17].
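The consensus criterion described above (support from at least two methods at P ≤ 0.1) can be sketched as follows. This is an illustrative sketch only: the function name, the input layout and the p-values are hypothetical, and it assumes the per-site p-values for the positive-selection test have already been extracted from each method's output.

```python
ALPHA = 0.1       # significance threshold stated in the text
MIN_METHODS = 2   # minimum number of supporting methods stated in the text


def positively_selected(pvalues_by_site, alpha=ALPHA, min_methods=MIN_METHODS):
    """Return {site: [supporting methods]} for sites meeting the consensus rule.

    pvalues_by_site maps a site label to {method name: p-value}.
    """
    selected = {}
    for site, pvals in pvalues_by_site.items():
        # Methods whose p-value meets the threshold for this site
        supporting = sorted(m for m, p in pvals.items() if p <= alpha)
        if len(supporting) >= min_methods:
            selected[site] = supporting
    return selected


# Hypothetical p-values for two placeholder sites (not results from the study).
example = {
    "site_X": {"SLAC": 0.30, "FEL": 0.04, "IFEL": 0.08, "MEME": 0.02},
    "site_Y": {"SLAC": 0.50, "FEL": 0.20, "IFEL": 0.15, "MEME": 0.09},
}
result = positively_selected(example)
print(result)
```

Here `site_X` is retained because three methods support it, whereas `site_Y` is dropped because only one method does.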

The MCC tree based on the 11 E2-E1 gene sequences of Indian isolates from 2015-17, representing the four affected areas in the states of Karnataka (Bengaluru, 2015), Maharashtra (Nasik, 2015; Pune, 2015, 2016 and 2017) and New Delhi (2016), together with other representative strains, is depicted in Fig. 1. The rate of nucleotide substitution was estimated to be between 7.55 × 10⁻⁴ and 1.02 × 10⁻³ substitutions/site/year, with the mean root age of the IOL being around 2003 (Suppl. Fig. 1). The posterior support for the Indian Ocean sublineage comprising strains from Reunion, Comoros, Seychelles and Mauritius was 0.93 (node a), while all the CHIKV isolates in this study belonged to the currently circulating Indian subcontinent sublineage (node b, posterior support 0.89). The 2015-2017 study isolates from different places, except a few Bengaluru 2015 isolates, were found to have ancestral origins between 2013 and mid-2014. Unique mutations C19R, Y69I, S185Y, N219I, I377V and P409L in the E2 gene and K16E/Q, M27L, V28C, E50Q, D70L, K132Q/T, K181Q, E209K, K319N, S332T, N349K, S355T and S357L in the E1 gene were observed (Suppl. Table 2).

Fig. 1

Phylogenetic tree (Indian Ocean Lineage) based on E2-E1 gene region sequences (n = 233) constructed by the Bayesian Markov Chain Monte Carlo (MCMC) method. Nodes referred to in the text are labeled a and b, with posterior support indicated. Taxa highlighted in pink possess the E1:A226V mutation while those in blue possess E2:V264A and E1:K211E. Branch color codes are: Red (Indian isolates of this study); Blue (other Indian isolates); Black (global isolates). X-axis represents the time in years. The figure on the right shows the expanded view of the group (taxa highlighted in blue and having posterior support 0.75) including the Indian CHIKV isolates in this study. All nodes with posterior support >0.7 are labeled in this group. Symbols indicate exceptions and mean the following: * (E2:V264A & E1:K211); # (E2:V264 & E1:K211E); and $ (E2:V264 & E1:K211) (color figure online)

Selection pressure analysis (Table 1) of the ECSA genotype, based on 214 unique sequences, revealed that codon sites 110S/I/T, 164T/A and 375S/T/I of the E2 gene and 211K/E/N/R of the E1 gene showed evidence of positive selection. The IOL analysis, based on 194 unique sequences, showed that E2:69Y/H/I, E2:252K/Q/H and E1:211E/K/N/R were under positive selection. The unique mutational sites as well as the selection pressure sites identified in E1/E2 were mapped to known functional sites, including the experimentally known B-cell epitopes available from the IEDB (Suppl. Table 3) and the predicted MHC class I-restricted T-cell epitopes.

Table 1 Selection pressure analysis of the E2-E1 gene region of CHIKV isolates using the methods (SLAC, FEL, IFEL and MEME) available in the Datamonkey server

In this study, the E2-E1 region of representative isolates (n = 11) from the four different areas was analyzed. The phylogenetic analysis (Fig. 1, Suppl. Fig. 1) revealed multiple subgroups within the Indian subcontinent sublineage, which is reported to have spread widely to Southeast Asia and Europe. The estimates of the nucleotide substitution rate and the root age obtained in the study are comparable to those of a recent report [18]. Indian strains (Suppl. Table 4, Suppl. Fig. 1) from the states of Kerala (2007-2013) and West Bengal (2007-2012), along with strains from Bangladesh, several Southeast Asian countries (including China, Myanmar, Malaysia, Thailand, Indonesia, Singapore and Cambodia), France and Australia, were found to possess the A226V mutation that has been associated with increased CHIKV transmission in Aedes albopictus mosquitoes [19, 20]. The 2015-17 study isolates were found to be close to strains identified in 2009-2012 from the states of Tamil Nadu, West Bengal, Kerala and Andhra Pradesh. Similar strains were noted from China, Japan, Pakistan, Bangladesh, France, Italy and American Samoa. A strain from an imported CHIK case in Australia (unpublished) showed close identity with the Delhi 2016 strain. Overall, the molecular clock analysis revealed several instances of independent evolution of CHIKV within India. Diverse gene variants were specifically noted in the Bengaluru 2015 viruses (Suppl. Table 2), which may explain the major circulation of CHIKV in Karnataka in comparison with the other states of India during 2015 [11].

Within the E1 gene, none of the 2015-17 isolates possessed the A226V mutation. In addition, none of the reported A226V-correlated secondary mutations (K233E, R198Q and K252Q) [21], of which K252Q was proven to affect dissemination in Ae. albopictus [22], were found in our isolates. Notably, recent studies [21, 23] have demonstrated that two adaptive mutations, E1:K211E and E2:V264A, noted since 2010 [4, 24, 25] in the background of E1:226A, provide remarkably higher fitness in terms of a significant increase in virus infectivity, dissemination and transmission in Ae. aegypti. The majority of the 2015-17 study strains possessed E1:K211E and E2:V264A, implying that these mutations have become established in the CHIKV population in states such as Maharashtra, Tamil Nadu and Andhra Pradesh. Partial sequencing of earlier strains from Pune in 2010 revealed the existence of these mutations at an earlier time point (data not shown). Among the sequences used for the phylogenetic analysis, the strain Yemen 2011 and one Bengaluru 2015 strain (KA_Blore5310) were found to possess E2:V264A but not E1:K211E. Conversely, a single strain, HM159390, possessed E1:K211E and E2:V264. Another Bengaluru 2015 strain (KA_Blore5813) possessed neither the E2:V264A nor the E1:K211E mutation.
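The marker combinations discussed above can be expressed as a small classification rule. The sketch below is illustrative only: the function name and output labels are hypothetical, and the residues assigned to the two Bengaluru strains are inferred from the text (no study isolate carried E1:A226V, and a strain lacking E1:K211E is assumed to retain lysine at position 211, as in the Fig. 1 legend).

```python
def marker_profile(e1_226: str, e1_211: str, e2_264: str) -> str:
    """Classify a strain by its residues at E1:226, E1:211 and E2:264."""
    if e1_226 == "V":
        return "E1:A226V"
    has_k211e = e1_211 == "E"   # E1:K211E present
    has_v264a = e2_264 == "A"   # E2:V264A present
    if has_k211e and has_v264a:
        return "E1:K211E + E2:V264A"
    if has_v264a:
        return "E2:V264A only"
    if has_k211e:
        return "E1:K211E only"
    return "neither marker"


# Residues for the two exceptional Bengaluru 2015 strains, as inferred from the text.
strains = {
    "KA_Blore5310": ("A", "K", "A"),
    "KA_Blore5813": ("A", "K", "V"),
}
profiles = {name: marker_profile(*residues) for name, residues in strains.items()}
print(profiles)
```

A rule of this kind makes the exceptions in the tree explicit: the majority pattern is the double-marker profile, while the two Bengaluru strains fall into the single-marker and no-marker classes.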

Further, sequence analysis of the study strains showed the emergence of novel mutations in E1 (K16E/Q, K132Q/T and S355T) that were observed in multiple strains, while the Bengaluru isolates possessed several additional mutations. Residue position 132 has been reported as the site of a neutralizing antibody escape mutation in Sindbis virus [26] (Fig. 2), while residue 355 falls in the C-terminal domain III, which is known to contain virulence determinants of envelope proteins [27]. However, the significance of mutations at these positions needs experimental validation. None of the mutational sites could be associated with the transitional epitopes defined by residue positions 300, 361 and 381 [28]. Mutational sites 16 and 355 in E1 could be mapped onto potential T-cell epitopes presented by the major HLA alleles HLA-B7 and HLA-B15 [17]. Site 211, which was found to be under selection pressure in earlier reports based on the ECSA genotype [2, 4], was notably also positively selected in the IOL.

Fig. 2

Mapping of positively selected sites and the major novel mutations on the E1-E2 heterodimer (3N41.PDB). E1 domains I, II and III are shown in red, yellow and blue, respectively, and the fusion loop (FL) in orange. E2 domains A, B and C are shown in cyan, green, and magenta respectively. Transitional epitopes identified in E1 (residue positions 300, 361 and 381) within domain III are indicated in pink. Experimentally known epitopes (1-19, 162-180, 186-200 and 242-259) in E2 (domains A and B) are indicated in violet. The positively selected sites are boxed (color figure online)

Within the E2 gene, the unique mutation at position 185 was found to be associated with an experimentally known epitope (res. 186-200) in the IEDB (Suppl. Table 3), which is known to lie on the exposed spike portion of the E2 protein (Fig. 2). The mutational site 19 falls within another known epitope (res. 1-19). Within E2, amino acid site 252 was identified as positively selected in the IOL, as observed in an earlier study [2]. This residue is located within the acid-sensitive region that plays a role in the E2-E1 conformational changes occurring during viral entry [27]. The other positively selected site in the IOL, at position 69, falls in the region E2:52-82 at the top of the spike in domain A [27], which is exposed and reported to be the point of contact for cellular receptors [29]. Further, site 164, which was seen to be under positive selection pressure, falls within a known epitopic region (res. 162-180) (Suppl. Table 3).
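The site-to-epitope mapping described above amounts to an interval lookup, sketched below using the E2 epitope ranges listed in the Fig. 2 legend. The function name and the `flank` parameter are hypothetical. The flank is an assumption introduced here to capture a site such as 185, which immediately precedes the epitope at residues 186-200 rather than lying inside it.

```python
# Experimentally known E2 epitope ranges (inclusive), as listed in the Fig. 2 legend.
E2_EPITOPES = [(1, 19), (162, 180), (186, 200), (242, 259)]


def epitope_hits(site, epitopes=E2_EPITOPES, flank=0):
    """Return the epitope ranges containing `site`.

    `flank` widens each range by that many residues on both sides
    (an assumption used to flag sites adjacent to an epitope).
    """
    return [(start, end) for start, end in epitopes
            if start - flank <= site <= end + flank]


for site in (19, 164, 185):
    print(site, epitope_hits(site), epitope_hits(site, flank=1))
```

With `flank=0`, sites 19 and 164 fall inside the epitopes 1-19 and 162-180 respectively, while site 185 is reported only when the one-residue flank is allowed.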

To summarize, the present study aimed to improve our understanding of CHIKV strains circulating in Karnataka, Maharashtra and New Delhi during 2015-17 in the context of their genetic evolution in different parts of the country. The results revealed the circulation of strains possessing the Aedes aegypti-adaptive marker mutations E1:K211E and E2:V264A, in the absence of E1:A226V. Among the unique mutational sites in the isolates studied, K16E/Q, K132Q/T and S355T in E1 and C19R and S185Y in E2 could be associated with epitopes or virulence-determining domains. The resurgence of CHIKV outbreaks in these regions may be attributed to indigenous evolution rather than importations. CHIKV outbreaks during the period of this study were largely restricted to urban localities. Moreover, of the four areas investigated here, at least three (New Delhi, Pune and Bengaluru) are known for their dynamic populations, mainly due to rapid growth in their industrial sectors. This may play a significant role in influencing the herd immunity of the population, vector biology and virus-host interactions, thus shaping the evolutionary dynamics of the virus. The occurrence of mutations in known B-cell epitopes underlines the need for studying cross-neutralization of virus isolates. The persistence of CHIKV and the regular occurrence of outbreaks in India highlight the importance of continuously monitoring the circulating CHIKV for changes in the virus population.