Background

Human metapneumovirus (HMPV), a close relative of respiratory syncytial virus (RSV), is a recognized major human pathogen that causes epidemics of respiratory tract illnesses in persons of all ages worldwide [1, 2]. Discovered in 2001 [1], HMPV was probably circulating for at least 50 years prior to this date [1]. Infection with HMPV may manifest as upper or lower tract respiratory illness, similar to that observed with RSV disease although HMPV has a considerably lower individual disease risk and population burden than RSV [3, 4].

A member of the Paramyxoviridae family of viruses, HMPV has a negative-sense single-stranded RNA genome, 13.3 kb long, encoding eight proteins [5]. Three surface proteins, F (fusion), G (attachment glycoprotein) and SH (small hydrophobic), are encoded within the HMPV genome [6], and F and G nucleotide sequences have been widely used to study HMPV genetic variation [7]. Whilst the G gene shows higher nucleotide and amino acid diversity [8–11], only the F protein is confirmed to be immunogenic and protective [6, 12].

In the northern hemisphere, peak HMPV disease occurrence is typically in the winter and spring months of January to May [13–15], while in the southern hemisphere peak prevalence is in the spring period of August to September [16]. In Kenya, peak HMPV prevalence has been recorded in June-July in the west and November-December in refugee camps in the northeast and northwest of the country [17, 18].

Worldwide, HMPV prevalence in hospital inpatient or community studies, in children or elderly adults, varies widely from as low as 1.7 % to as high as 17 %, with generally higher prevalence in outpatients than in inpatients and in children younger than 5 years than in older age groups [7, 13, 14, 16, 19–26]. Studies in Kenya report HMPV prevalence of between 3 and 6 % in acute respiratory infection cases in inpatient populations [17, 27–29], and of 7 to 8.6 % in outpatients [17, 30], but none has provided information on virus genetic characteristics and underlying evolutionary changes over successive epidemic seasons.

HMPV has been divided into two serologically distinct groups, A and B [1, 31]. Group A generally dominates over group B [7, 24–26, 32] and has been reported to cause more severe disease than group B [33]. The two groups are further subdivided into subgroups A1, A2, B1 and B2 based on genetic differences in the surface proteins F and G, but these do not show clear antigenic differences, at least in neutralization assays using anti-sera raised in ferrets [8]. The A2 subgroup is the most genetically heterogeneous of the four subgroups and some studies have suggested its further sub-division into A2a and A2b sub-lineages based on sequence data [34, 35]. Based on the F gene, HMPV groups A and B have 84–86 % homology at nucleotide level and 94–97 % at amino acid level, whilst within-subgroup similarity is 94–96 % at nucleotide and 97–99 % at amino acid levels [8]. In comparison, the more diverse G protein shows only 50–57 % and 30–37 % similarity for nucleotide and amino acid sequences, respectively, between the two groups A and B [8]. Variants from both groups, and sometimes from multiple subgroups within the groups, can co-circulate in the same epidemic season [8, 12, 14, 35, 36].

Candidate vaccines targeting the G protein and a subunit vaccine of the F protein have shown promising results although, to date, none is licensed [37–40]. We set out to understand the genetic diversity in the F and G genes in circulating strains in coastal Kenya in relation to seasonal introductions of the virus, to contribute information that may be important for vaccine development and virus infection control. We describe the molecular epidemiology of HMPV in child admissions at a coastal county hospital of Kenya, over a 5-year period, building on previous work in the hospital [27] in order to elucidate prevalence, circulating strains and genetic diversity in the most at-risk paediatric population, contributing information on HMPV persistence and transmission.

Methods

Study population and sample collection

Study participants were identified through continuous surveillance of pneumonia admissions to the paediatric wards of Kilifi County Hospital (KCH) over a 5-year period between January 2007 and December 2011. KCH is located in coastal Kenya, 60 km north of Mombasa and is the main hospital that serves the residents of Kilifi County. The hospital handles 4000 to 4500 paediatric admissions annually and around 30 % of the under 5 year olds have an admission diagnosis of lower respiratory tract infection (LRTI) based on WHO definitions of severe and very severe pneumonia [41]. Hospital admissions can be further stratified according to residency within the Kilifi Health and Demographic Surveillance System, KHDSS [42]. Each year this coastal location experiences two rainy seasons: long rains between April and July and short rains between October and December. Further details of the location and ongoing surveillance have previously been described [27, 28, 43, 44].

Children were eligible for the current study if at admission they were aged 1 day to 59 months with syndromic severe or very severe pneumonia, i.e. cough or difficulty in breathing plus any one or more of the following: lower chest wall indrawing (severe pneumonia), oxygen saturation of less than 90 % (finger tip pulse oximetry), inability to drink or breast feed, prostrate or unconscious (very severe pneumonia) [27]. Following written informed consent from the parent or guardian, a nasopharyngeal flocked swab, nasal wash or combination of nasopharyngeal swab and oropharyngeal swab was collected from each child, placed in 3 ml viral transport medium, and stored at −80 °C prior to laboratory screening. The Kenya National Ethical Review Committee approved the study protocols.

Diagnostic real-time polymerase chain reaction (RT-PCR)

RNA was extracted from 200 μl of nasopharyngeal sample using the MagNA Pure LC32 automated total nucleic acid extractor (Roche Applied Science, Mannheim, Germany) for virus screening, and from 140 μl using the QIAamp Viral RNA minikit (Qiagen, Valencia, CA, USA) for sequencing, according to the manufacturers' instructions. Virus detection was done by real-time PCR using a TaqMan probe based system as described by Hammitt et al. [28] in a multiplex PCR assay run on the ABI 7500 (version 2.5, Applied Biosystems, Foster City, California, USA). Samples with cycle threshold (Ct) values of less than 35.0 were considered positive and taken through to sequencing. Samples from 2007 were processed as described in Berkley et al. [27]. In this case, RNA was extracted from 200 μl of nasal sample using the MagNA Pure LC Total Nucleic Acid Isolation Kit (Roche, Mannheim, Germany) and virus detection was conducted using the LightCycler Fast Start DNA MasterPLUS Hyb-Probe kit (Roche, Mannheim, Germany).

Gene specific PCR and sequencing

The genes encoding the surface fusion protein (F) and attachment glycoprotein (G) were amplified in a one-step RT-PCR assay using the Quantifast one-step RT-PCR system (Qiagen, Valencia, CA, USA). Primers targeting the ectodomain region of the F protein to give a 405 bp product were used to amplify a portion of the F gene as previously described [45]. Amplification of the G gene was performed using semi-nested PCR to yield a 930 bp product that included the G gene and a portion of the L gene [7]. Amplified products of both the F and G genes were checked on a 2 % agarose gel with ethidium bromide staining to ascertain successful amplification. The remainder of the PCR products were purified using the GFX DNA purification kit (GFX-Amersham, Amersham, UK) according to the manufacturer’s instructions and taken forward for DNA sequencing. Because of PCR failures at the amplification stage, only 130 and 98 samples (out of the total of 160) were followed through to sequencing for the F and G gene, respectively. Purified PCR products were sequenced in both forward and reverse directions with the same PCR primers using Big Dye Terminator 3.1 (Applied Biosystems, Foster City, California, USA), and reads were generated on an ABI Prism 3130xl Genetic Analyzer (Applied Biosystems, Foster City, California, USA).

Sequence alignment, phylogenetic and molecular analysis

Raw sequences were assembled using either DNASTAR or Sequencher (version 4.10.1, Gene Codes Corporation, Ann Arbor, USA). Multiple sequence alignments (MSA) were undertaken in MAFFT v7.220 [46]. G gene sequences were trimmed to approximately 606 bp to remove the intergenic region and a portion of the L gene. To obtain comparison data, a GenBank search was conducted on 11-Jun-2015. The search terms were “human Metapneumovirus AND (F OR Fusion) AND 340[SLEN]:14000[SLEN]” for F protein data, and “human metapneumovirus attachment AND 600[SLEN]:14000[SLEN]” for G protein data. To be included in our comparison dataset, sequences had to overlap completely with the sequenced portion of the Kilifi virus data and to have information on the country sampled and a sampling date (at least the year) between 2007 and 2011. With these criteria we identified 290 (for F) and 233 (for G) sequences. Duplicate sequences were dropped. GenBank accession numbers of sequences for these analyses are available in Additional file 1: Table S1 for F gene and Table S2 for G gene.
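The inclusion criteria above can be sketched as a simple filter. This is a minimal illustration only, not the procedure actually used in the study: the record layout (dictionaries with `country`, `year`, `start`, `end` and `sequence` keys), the function name and the toy records are all hypothetical, and exact string identity is assumed as the test for duplicates.

```python
def select_comparison_records(records, region_start, region_end,
                              first_year=2007, last_year=2011):
    """Keep records that fully cover the target region, carry country and
    year metadata within the study period, and are not exact duplicates."""
    seen = set()
    kept = []
    for rec in records:
        covers = rec["start"] <= region_start and rec["end"] >= region_end
        has_meta = bool(rec.get("country")) and rec.get("year") is not None
        if not (covers and has_meta and first_year <= rec["year"] <= last_year):
            continue
        if rec["sequence"] in seen:  # assumption: duplicates are identical strings
            continue
        seen.add(rec["sequence"])
        kept.append(rec)
    return kept


# Hypothetical toy records (not real GenBank entries)
toy_records = [
    {"acc": "X1", "country": "Peru", "year": 2009, "start": 1, "end": 400, "sequence": "ATGC"},
    {"acc": "X2", "country": "Peru", "year": 2009, "start": 1, "end": 400, "sequence": "ATGC"},
    {"acc": "X3", "country": "", "year": 2010, "start": 1, "end": 400, "sequence": "ATGG"},
    {"acc": "X4", "country": "India", "year": 2012, "start": 1, "end": 400, "sequence": "ATGT"},
    {"acc": "X5", "country": "India", "year": 2008, "start": 50, "end": 400, "sequence": "ATGA"},
    {"acc": "X6", "country": "Canada", "year": None, "start": 1, "end": 400, "sequence": "ATCA"},
]
kept = select_comparison_records(toy_records, region_start=10, region_end=350)
```

In this toy set only the first record survives: the others are a duplicate, lack a country or year, fall outside 2007–2011, or do not span the whole target region.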

Phylogenetic trees were generated using both Maximum Likelihood (ML) and Bayesian methods: ML in MEGA v5.2.2 and Bayesian in BEAST (Bayesian Evolutionary Analysis Sampling Trees) v1.8.2. To genotype the Kilifi viruses, the F protein sequence data were analyzed with reference sequences deposited in GenBank (details provided in Additional file 1: Table S1). A genotype was only confirmed if sequences clustered with the reference sequences within a major branch with >70 % bootstrap support on the ML tree. Temporally structured phylogenetic trees were generated using BEAST. Tip dates (dd-MMM-yyyy) were used in all analyses. For comparison dataset sequences that had only the year of collection, the date of collection was set to 1 July of the reported year. All Bayesian analyses used the HKY substitution model with gamma-distributed rate variation and invariant sites, and a constant population size demographic model. Each analysis was run for 50 million steps, sampling every 2500 steps. The output was only analyzed further when the effective sample size (ESS) for all parameters exceeded 200. We settled on these parameters after alternative, more complex models, e.g. general time-reversible (GTR), failed to give a converged result. Maximum clade credibility trees were calculated using TreeAnnotator v1.8.2 and visualized in FigTree v1.4.2.
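The handling of tip dates and the chain settings can be illustrated with a short sketch. It assumes, for illustration only, that dates are converted to fractional years before analysis; the function names are hypothetical and the behaviour for partially dated records (month without day, or the reverse) is not described in the text, so the sketch rejects them.

```python
from datetime import date


def impute_collection_date(year, month=None, day=None):
    """Return the collection date, using 1 July when only the year is known."""
    if month is None and day is None:
        return date(year, 7, 1)
    if month is None or day is None:
        # Assumption: partially dated records are not covered by the stated rule
        raise ValueError("partial dates are not handled in this sketch")
    return date(year, month, day)


def decimal_year(d):
    """Convert a calendar date to a fractional year (leap years respected)."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days


# Chain settings as stated in the text
chain_length = 50_000_000
sample_every = 2_500
n_logged_states = chain_length // sample_every

year_only_tip = decimal_year(impute_collection_date(2009))
```

With these settings each run logs 20,000 states before any burn-in is discarded, and a year-only record from 2009 is placed close to the middle of that year.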

To analyze variation in sequences, unique sequences and variable nucleotide and amino acid positions were identified using in-house python and ruby scripts. Patristic distances were analyzed in MEGA v5.2.2 [47]. Sequences generated in this study are deposited in GenBank under the accession numbers: KT191355-KT191484 for F and KT191299-KT191354 for G protein.
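The in-house scripts are not reproduced here. As a rough sketch of the kind of operation involved, and assuming an alignment held as a mapping from sample name to aligned sequence, unique sequences and variable alignment columns could be found as follows; the function names, the treatment of gaps and ambiguous bases, and the toy alignment are all assumptions.

```python
def unique_sequences(aligned):
    """Group sample names by identical aligned sequence."""
    groups = {}
    for name, seq in aligned.items():
        groups.setdefault(seq.upper(), []).append(name)
    return groups


def variable_positions(aligned, ignore="-N?"):
    """Return 1-based alignment columns showing more than one residue.

    Characters listed in `ignore` (gaps, ambiguity codes) are not counted
    as variation; this is an assumption of the sketch.
    """
    seqs = [s.upper() for s in aligned.values()]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences must be aligned to equal length")
    variable = []
    for i, column in enumerate(zip(*seqs), start=1):
        residues = {c for c in column if c not in ignore}
        if len(residues) > 1:
            variable.append(i)
    return variable


# Hypothetical toy alignment
toy_alignment = {
    "KLF_001": "ATGGCA",
    "KLF_002": "ATGGCA",
    "KLF_003": "ATGACA",
    "KLF_004": "ATGAC-",
}
groups = unique_sequences(toy_alignment)
variable = variable_positions(toy_alignment)
```

The same functions apply to translated (amino acid) alignments, provided the `ignore` set is adjusted to the ambiguity symbols in use.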

Results

Study population

Between January 2007 and December 2011, there were 16,439 admissions to KCH aged between 1 day and 59 months, of which 32.1 % (5284) were eligible for study as cases with syndromic severe or very severe pneumonia (Table 1). Overall, 62.8 % of these children were tested for HMPV, ranging by year between 43 and 83 % due to changes in proportion of non-residents of KHDSS included in the different samples (15.4 % in years 2007–09, versus 47.3 % in years 2010–11).

Table 1 Study population at Kilifi County Hospital, number tested and number of HMPV positive samples recorded

HMPV prevalence in child admissions

HMPV was detected in 160 (4.8 %) of the 3320 samples tested. Prevalence by year ranged from 2.9 % in 2007 to 8.8 % in 2009 (Table 1). Almost half of the HMPV positive samples were identified in the years 2008 and 2009. Children under 6 months of age accounted for 44 % of cases (71/160), 74 % (118/160) of cases were in children under 1 year old, and only 1.3 % (2/160) were in children older than 36 months (Table 2). Of the 160 HMPV positive cases, 83.8 % (134/160) and 16.2 % (26/160) presented with symptoms classified as severe and very severe pneumonia, respectively (Table 2).

Table 2 HMPV positives stratified by age group of patients and pneumonia status in Kilifi County Hospital

Temporal occurrence and circulation patterns of HMPV

HMPV occurrence showed a seasonal pattern with the majority of cases being detected in the period from October of one year through to April of the next (Fig. 1a). The seasonal increase in cases tended to coincide with lower rainfall, higher temperature and lower relative humidity (Fig. 1b). For subsequent analysis we assume August as the end month of one season and September the start of the next (hence the colour scheme in Fig. 1a). However, there was no clear-cut demarcation between the end of one seasonal epidemic and the next, as sporadic HMPV cases were detected in seasonal troughs, except for the inter-epidemic period between the end of the 2009-10 and rise of the 2010-11 seasons, where no cases were observed over a 6-month interval (Fig. 1a).
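The season convention stated above (September as the first month of a season and August as the last) amounts to a simple labelling rule. A minimal sketch follows; the function name and label format are assumptions chosen to match the "2009-10" style used in the text.

```python
from datetime import date


def epidemic_season(sample_date):
    """Label a sample date by epidemic season, running September to August."""
    start_year = sample_date.year if sample_date.month >= 9 else sample_date.year - 1
    return f"{start_year}-{str(start_year + 1)[-2:]}"


labels = [
    epidemic_season(date(2009, 10, 15)),
    epidemic_season(date(2010, 4, 2)),
    epidemic_season(date(2010, 8, 31)),
    epidemic_season(date(2010, 9, 1)),
]
```

Under this rule a sample collected in April 2010 belongs to the 2009-10 season, whereas one collected on 1 September 2010 opens the 2010-11 season.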

Fig. 1

a Temporal distribution of HMPV positive samples in Kilifi over five years, showing number of positive samples each month on the primary axis and number of samples tested monthly on the secondary axis. Different colours indicate the different epidemics the samples were assigned; b Monthly weather patterns in Kilifi, Kenya in the period 2007–2011

Genetic diversity of HMPV samples from KCH

PCR amplification of the F gene was more successful than for the G gene, with 130 and 98 positive PCRs for the F and G gene, respectively. A total of 123 samples from the 160 HMPV positives were successfully sequenced for the F gene, the G gene or both, and genotyped (Table 1). There was no statistically significant difference (P = 0.613) in Ct values between the sequenced samples and the 37 that failed to be sequenced.

Among the 123 samples successfully sequenced for the F protein over a 345-nucleotide region, 49 were unique. Overall mean nucleotide diversity for this subset was 0.106. In the phylogenetic analysis we combined the unique Kilifi F sequences with all others deposited in GenBank that were contemporaneous and overlapping in the sequenced F portion. Both A and B HMPV groups, specifically A2, B1 and B2, were observed in Kilifi (Fig. 2). Subgroup A1 was not observed in Kilifi (Fig. 2). Within the subgroups, virus sequences from the same epidemic did not necessarily group together into marked clusters; instead, they were interspersed on the phylogenetic tree with the international sequences (Fig. 2). The majority of Kilifi sequences in the A2 subgroup occurred within three distinguishable clusters and, when compared to global sequences, clustered closely with sequences from Canada and Nairobi, Kenya, and were highly similar in each of the subgroups in which they fell (Fig. 2). An ML phylogeny of the HMPV F sequences from Kilifi alone, color coded by epidemic, is given in Additional file 2: Figure S1A. Notably, phylogenetic clusters formed within the different subgroups on this tree had sequences from multiple epidemic periods, i.e. there was no clear temporal clustering.
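Mean nucleotide diversity is commonly summarised as the average pairwise distance between sequences. The sketch below uses the uncorrected proportion of differing sites (p-distance) purely for illustration; the distance model applied in the study is not stated here, so the values it returns should not be expected to reproduce the figure reported above. Function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, counting only unambiguous A/C/G/T pairs."""
    sites = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def mean_pairwise_diversity(seqs):
    """Average p-distance over all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment: pairwise distances are 0.25, 0.5 and 0.25
toy_diversity = mean_pairwise_diversity(["AAAA", "AAAT", "AATT"])
```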

Fig. 2

Phylogenetic relatedness and temporal divergence of the combined Kilifi (coastal Kenya, 2007–11) and contemporaneous global F protein sequences over the 345-nucleotide portion analyzed. Taxa of Kilifi viruses are coloured red. Node bars indicate the 95 % HPD height interval of the nodes; node marker sizes are scaled by posterior support

Of the 98 PCR positives for the G protein, 56 samples were sequenced successfully over 606 nucleotides of the HMPV G coding region. This represented coverage of 88.2 % of the entire G coding sequence. All 56 sequences were determined to be of genotype A2 within group A (figure not shown), with 53 providing unique sequences over the sequenced region. This unique subset showed an overall mean genetic diversity of 0.079. An ML phylogeny of the HMPV G sequences from Kilifi alone, color coded by epidemic, is given in Additional file 2: Figure S1B.

The phylogenetic resolution was far greater with G sequences than with F sequences (Additional file 2: Figure S1A), showing higher bootstrap support values and longer branch lengths. Viruses deemed identical in the F portion we sequenced possessed multiple nucleotide differences in the G portion (Additional file 2: Figure S1B). However, similar to what was observed in the F-based phylogeny, Kilifi sequences did not cluster strictly according to epidemic. Rather, sequences from multiple epidemic periods frequently occurred within the same phylogenetic cluster, although these tended to derive from successive epidemics (Additional file 2: Figure S1B).

Comparison of Kilifi G gene sequences with global sequences showed that Kilifi sequences clustered closely with some sequences from Canada, Peru, China and India. However, there were clusters of sequences from Peru, Canada, India, Greece, Uruguay and Rwanda for which close relatives were absent in Kilifi (Fig. 3a). The Kilifi G sequences diverged into three major clusters (clusters 1, 2 and 3 in Fig. 3; Additional file 2: Figure S1B) and one minor cluster (4 in Fig. 3). Each cluster consisted of sequences from viruses from more than one epidemic; cluster 1 of epidemic 2010-11; cluster 2 of epidemics 2008-09, 2009-10 and 2011-12 and cluster 3 of epidemics 2007-08, 2008-09, 2009-10 and 2010-11. Within each cluster, sequences from the same epidemic grouped together. While cluster 1 was distinctly removed from the other clusters (Fig. 3) and from the majority of global sequences, it was closely related to sequences mainly from Asia, specifically China and India. The major cluster of Kilifi sequences (cluster 2), consisting of 22 sequences, was most closely related to one sequence from India. Sequences in cluster 3 were closely related to sequences mainly from Canada and a few sequences from India (Fig. 3). There was a unique branch of sequences, mainly from Peru and one from China, into which none of the Kilifi sequences fitted.

Fig. 3

Phylogenetic and temporal placement of Kilifi group A G protein sequenced viruses, for Kilifi, coastal Kenya 2007–2011. Panel a A total of 209 viruses compared in G sequences (53 from Kilifi and 156 collated from GenBank from 7 countries). Branches leading to Kilifi viruses are coloured red. Three-letter codes of countries comprising branches without Kilifi representative sequences are indicated next to the vertical line. Panel b 121 viruses that fell within the ancestral node leading to Kilifi viruses were reanalyzed in BEAST. Again, branches and leaves of Kilifi viruses are colored red on the temporally calibrated phylogenetic tree. Node bars indicate the 95 % HPD height interval of the nodes; node marker sizes are scaled by posterior support. The numbers 1, 2, 3 and 4 represent the three major and one minor cluster of sequences from Kilifi

A temporal analysis of genotype occurrence and circulation in Kilifi showed that the majority (91/123) of circulating isolates were A2 and this type was dominant and circulating in each of the five epidemics (Additional file 3: Figure S2) while B1 (3.3 %, 4/123) and B2 (22.8 %, 28/123) occurred less frequently (Table 1; Additional file 3: Figure S2). Whereas A2 and B2 were recorded in every epidemic and two subgroups circulated concurrently in each epidemic, B1 was only present in epidemic 2007–2008 (Additional file 3: Figure S2).

Subgroup prevalence patterns in Kilifi versus global

We compared the subgroup prevalence in the 123 F sequences from Kilifi with 290 global sequences we collated from GenBank to show genotype distribution by year. The global dataset was drawn from seven countries: Japan, Peru, Rwanda, Egypt, Thailand, India and Canada. The patterns in Kilifi appeared considerably distinct from the overall global patterns (Fig. 4). Only the year 2010 in Kilifi mirrored the genotype trends that were observed globally, with subgroup A2 dominating (Fig. 4b).

Fig. 4

Pie charts showing the genotype distribution by year derived from F sequence analysis of samples from Kilifi, Coastal Kenya 2007-11. Panel a This is based on the 290 F sequences collated from GenBank. Panel b This is based on 123 F sequences generated from samples that were collected in this study at the KCH between 2007 and 2011. The numbers inside the pies indicate the genotype proportions per the respective year

Evolutionary analysis

We estimated the overall evolutionary rate for the F region analysed from the combined Kilifi-global sequence dataset, including all group A and B strains, as 1.96 × 10⁻³ substitutions/site/year (95 % HPD interval: 1.37 × 10⁻³, 2.57 × 10⁻³). Divergence dates of groups A-B, subgroups A1-A2 and subgroups B1-B2 from these F data were estimated as 1944.16 [95 % HPD interval 1893.4, 1979.0], 1994.0 [95 % HPD interval 1986.3, 1998.9] and 1988.9 [95 % HPD interval 1972.8, 1997.8], respectively (Fig. 2). A similar analysis determined the evolutionary rate in the G region we sequenced for the A2 genotype to be 5.915 × 10⁻³ substitutions/site/year (95 % HPD interval: 4.147 × 10⁻³, 7.887 × 10⁻³).
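To put these rates in more tangible terms, a per-site rate can be multiplied by the length of the region analysed to give the expected number of substitutions per year across that region. The sketch below is a simple arithmetic illustration using the point estimates and region lengths reported in the text; the variable and function names are hypothetical, and the comparison is only approximate because the F rate covers all group A and B strains while the G rate covers subgroup A2 alone.

```python
# Point estimates and region lengths as reported in the text
F_RATE = 1.96e-3    # substitutions/site/year, F region, groups A and B
G_RATE = 5.915e-3   # substitutions/site/year, G region, subgroup A2
F_SITES = 345       # nucleotides analysed for F
G_SITES = 606       # nucleotides analysed for G


def expected_substitutions(rate, sites, years=1.0):
    """Expected substitutions across a region over a given number of years."""
    return rate * sites * years


f_per_year = expected_substitutions(F_RATE, F_SITES)  # roughly 0.7 per year
g_per_year = expected_substitutions(G_RATE, G_SITES)  # roughly 3.6 per year
rate_ratio = G_RATE / F_RATE                          # roughly 3
```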

Analysis of protein changes in the F and G genes

The HMPV G protein is on average 236 amino acids long. The Kilifi genotype A2 G sequences were predicted to encode proteins of three different lengths (213, 217 or 228 amino acids) owing to the use of alternative stop codons. Our sequencing of subgroup A2 covered amino acid 28 to the end of the protein. We observed changes in these sequences leading to gain or loss of N-glycosylation sites. A total of five N-glycosylation sites at positions 30, 52, 145, 152 and 180 were identified on the sequenced G protein. One hundred and three of the 228 codon positions were polymorphic and up to five variants were identified based on sharing a combination of ≥5 signature amino acid residues. There were six sites where amino acid changes led to gain and another eight different sites where changes led to loss of N-glycosylation (Additional file 4: Figure S3). Overall, gain and loss of N-glycosylation were observed 36 and 56 times, respectively. Loss at position 180 was among the most frequent, occurring in 39 sequences. Overall, the pattern of amino acid changes clearly demarcated the sequence set into five clusters (Additional file 4: Figure S3).
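Potential N-glycosylation sites are conventionally predicted from the sequon N-X-S/T, where X is any residue except proline. The tool used for prediction in this study is not stated here, so the following is only a sketch of that conventional rule, assuming ungapped protein sequences; the function names, the `offset` parameter (for numbering a partial sequence, e.g. one starting at residue 28) and the toy peptides are hypothetical.

```python
import re

# Lookahead so that overlapping sequons are all reported
SEQUON = re.compile(r"(?=N[^P][ST])")


def nglyc_sites(protein, offset=1):
    """Return positions of N in N-X-S/T motifs (X is not proline).

    `offset` is the residue number of the first character of `protein`.
    """
    return [m.start() + offset for m in SEQUON.finditer(protein.upper())]


def gained_and_lost(reference, query, offset=1):
    """Compare predicted sites in a query against a reference of equal numbering."""
    ref = set(nglyc_sites(reference, offset))
    qry = set(nglyc_sites(query, offset))
    return sorted(qry - ref), sorted(ref - qry)


# Hypothetical toy peptides: the reference has NGS at position 2 (NPT is excluded),
# the query loses that site and gains NAT at position 7
gained, lost = gained_and_lost("MNGSAANPTQ", "MNGAAANATQ")
```

Comparing each sequence against a chosen reference in this way gives per-site tallies of gains and losses of the kind summarised above.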

The HMPV F protein is on average 539 amino acids long. Our F sequencing encompassed 105 codon positions, representing 19.5 % of the entire F protein sequence. Of those that we sequenced, 15 % (17/115) showed amino acid changes, confirming its high degree of conservation. There was no N-glycosylation site observed in the sequenced region of the F protein (Additional file 5: Figure S4).

Discussion

The epidemiological and evolutionary patterns of circulating strains of HMPV remain poorly documented in most of sub-Saharan Africa. Using an existing framework for childhood pneumonia surveillance at a referral hospital in coastal Kenya, we set out to describe HMPV epidemiology as well as its genetic diversity in this region and compared findings to global contemporaneous strains deposited in GenBank.

We found that 4.8 % of childhood pneumonia hospital admissions for the period 2007 to 2011 (inclusive) in Kilifi County Hospital were HMPV positive. Our results fall in the range 3.8 to 15 % [13–15, 20, 23, 48] reported in pediatric hospital admissions in other parts of the world. A previous study in Kenya (albeit in a refugee population) identified HMPV prevalence of 5.7 % [17].

The HMPV infections in KCH admissions were most common in children <6 months (44 %), and 74 % of all HMPV cases occurred in children under 12 months of age, with 84 % of cases presenting with symptoms of severe pneumonia. Substantial disease burden associated with HMPV in the first year of life has been previously reported [7, 13, 16, 22, 49], highlighting the most affected age group and providing a guide on the populations to prioritize in future HMPV vaccine administration.

A seasonal pattern in HMPV positive samples was identified from October of one year to April of the next, corresponding to higher temperatures and lower rainfall. This is similar to the seasonal pattern of RSV at the same site [49, 50]. In Dadaab, a refugee camp 500 km north of Kilifi, peak HMPV prevalence occurs in December [17], similar to Kilifi. In other parts of the world, seasonality in HMPV prevalence has been previously reported [14, 15, 17]. In the northern hemisphere, peak prevalence has been reported to coincide with the winter season, concurrent with or after the RSV epidemic season [13, 15, 20], to alternate between winter and spring [51], or to occur in the late spring-summer months [52], whilst studies in Australia show peak seasons in spring [16], concurrent with the RSV peak season. In 2010, no HMPV was detected between April and September. Studies in Europe have similarly shown that HMPV prevalence varies from year to year [51, 52].

Three HMPV subgroups, A2, B1 and B2, were found in Kilifi during the study period; A1 was absent, A2 and B2 occurred over the whole surveillance period, and B1 was only recorded in 2007 and 2008 in low numbers. All the samples sequenced for the G gene were A2, reflecting the fact that A2 was the predominant subtype in Kilifi in every season/year of the study. Examination of global HMPV sequences in GenBank for the period covered in our study showed low representation of A1, possibly explaining its absence from Kilifi. Interestingly, studies have shown A1 to be dominant in the USA [13] and South Africa [25] and B1 in an Australian study of inpatient admissions of all ages over a 4-year period [16]. Genotype B1 was only detected in 2007 and 2008, and was undetectable in the remaining 3 years. Our analysis of comparison sequence data in public databases of the same period showed that B1 was indeed circulating elsewhere in the 3 years that we did not detect it in Kilifi. A 20-year study in the USA reported sporadic detection of the B1 genotype [53], which may in part explain the intermittent pattern of occurrence that we observe in Kilifi. Studies covering a longer time period may better resolve the pattern of genotype occurrence. The identification of three subgroups of HMPV was possible using only F sequence data and not the G sequence data, possibly owing to the larger number of samples successfully sequenced for the F gene.

Co-circulation of multiple lineages of HMPV has been previously reported [7, 14, 53, 54]. Furthermore, dominant strains may vary in different seasons and locations [7, 36, 55] and genotypes may dominate in one year then be replaced by another the subsequent year [16, 30, 53]. In Kilifi, our data showed a contrasting scenario, with one subgroup, A2, dominant in all seasons and the other subgroups, especially B2, co-circulating in lower numbers. Long-term surveillance to ascertain whether there is genotype replacement in the years after the study will be important in determining genotype patterns in Kilifi. The dissimilarity between the distribution of genotypes in circulation in Kilifi relative to the global pattern, comparing year by year, supports the hypothesis that HMPV migrates across the world at a slower rate than other respiratory viruses such as influenza A, allowing the existence of localized genotype replacement patterns.

The G gene is the most variable gene in the HMPV genome [10, 56]. It has been suggested that frequent variation in the G gene may be a strategy to evade the selective pressure of the host immune system [57]. We found that the G protein evolutionary rate was three times higher in the region we analyzed than the F protein evolutionary rate. Overall, the diversity observed in the G protein sequences was far higher than in the F protein sequences. Although all the G sequences we obtained were of the A2 subgroup, based on the phylogenetic clustering, bootstrap support and amino acid change patterns, we could classify our A2 subgroup viruses into four further clusters.

Amplification and sequencing of the G protein did not succeed for more than half of our HMPV positive samples. About 60 samples failed at the G PCR amplification stage and a further 30 failed at the nucleotide sequencing stage. In the end we only obtained sequences for subgroup A2. It is possible that this was caused by insufficient match of our primers to the circulating variants/subgroups thus impeding amplification and sequencing. This low recovery rate of the HMPV G protein sequences limited our study power to fully understand genotypes and variants that circulated in Kilifi over the study period. An alternative explanation for the PCR/sequencing failures is possible RNA degradation as the study used archived material.

The HMPV F gene was determined to be less diverse, which concurs with previous findings and with pneumovirus F protein diversity in general [8]. Furthermore, 52 % (64/123) of the F gene sequences were 100 % identical to another sequence in the set, whilst only 5.3 % (3/56) of the G gene sequences were identical to each other. This suggests that, to tease out any differences between sequences that are 100 % identical in the F gene, sequencing another gene, for instance the G gene, or even the whole genome, will be necessary.

Whilst there were no N-glycosylation sites detected in the F protein region sequenced and only a few amino acid changes observed, the G protein had more sites where there was loss of N-glycosylation and also had several amino acid changes, owing to the greater nucleotide sequence diversity observed. The F portion we sequenced does not encompass the three potential N-glycosylation sites at positions 57, 127 and 353 that have been previously described for HMPV [58].

Many of the characteristic epidemiological and evolutionary patterns observed in this study for HMPV mirror the findings previously reported for RSV from the Kilifi population. For both viruses, the highest disease burden is in the paediatric population, occurring during early infancy (though HMPV burden is overall smaller) [49]. Both RSV and HMPV show an annual seasonal pattern with largely overlapping months of peak activity [49, 50], and multiple genotypes occur during epidemics [50, 59]. Further, we show that, as for RSV, analysis of the G protein encoding region distinguishes the variability of strains occurring across epidemics better than the F [60]. Nonetheless, a few differences can still be picked out in their patterns. Firstly, group/subgroup temporal dominance or replacement is clearer with RSV than HMPV [59]. Secondly, the substitution rate observed in HMPV G appeared much higher than that estimated for RSV G [61]. Thirdly, most of the genetic variants in RSV occur over a single epidemic and disappear, but for HMPV, variants seem to persist for more than a single epidemic before disappearing [50]. To provide further insights, future studies should undertake whole genome analysis of these viruses and analyze specimens collected over a longer time period and across multiple sites in Kenya and Africa for a better understanding of the transmission, evolution and persistence mechanisms of these important human pathogens.

There are a number of limitations of this study that should be considered when interpreting the results. Fewer G gene sequences were obtained from the study, relative to F, and this could be the reason that only the A2 variant from the worldwide pool was identified. There was a change through time in the selection of samples for testing based on residency status, which would have resulted in a wider catchment area for 2010-11 than earlier. However, the low level of temporal clustering observed suggests the samples to be drawn from a similar pool of variants. As noted in previous reports from this surveillance [27, 43, 49], collection of nasal specimens from children with life-threatening features is a continual challenge that could bias estimates of prevalence and variant composition.

Conclusions

In conclusion, we report on 5 years of epidemiological surveillance and on circulating HMPV genotypes in the coastal Kenyan location of Kilifi. The study reveals three of the four globally circulating HMPV subgroups, with the same dominance of the A2 subgroup, but with annual variation in subgroup prevalence not mirrored in the wider global dataset, and little temporal clustering of subgroup A2 in this region of the world. The dissimilarity between the distribution of genotypes in circulation in Kilifi relative to the global pattern may suggest that slow global migration of HMPV allows the existence of localized genotype replacement patterns. One major peak season of HMPV was observed, and prevalence was consistently highest in infants, especially those <6 months of age. Furthermore, HMPV cases occurred in roughly annual outbreaks, with a prevalence of around 5 % in severe and very severe pneumonia paediatric admissions to the County hospital.

Abbreviations

HMPV, human metapneumovirus; RT-PCR, real time polymerase chain reaction; KCH, Kilifi County Hospital; HPD, highest posterior density.