Background

Hepatitis C virus (HCV) is a major worldwide health concern, affecting approximately 200 million people, and is the most common blood-borne infection. Cirrhosis and hepatocellular carcinoma (HCC) develop in 60-85% of HCV cases [1]. In Pakistan, approximately 17 million people are infected with HCV and 8-10% of individuals are HCV carriers [2]. HCV is a member of the viral family Flaviviridae, genus Hepacivirus [3]. The HCV genome is a linear, uncapped, single-stranded RNA (ssRNA) of 9.6 kb. The open reading frame is about 9,024 nucleotides long and encodes a polyprotein of 3010 amino acids [4]. It has been estimated that 10¹² virions per day are produced in chronically infected patients. This leads to HCV diversity estimated at 10⁻³–10⁻⁴ base substitutions per site per year [5]. HCV is classified into genotypes, subtypes, isolates and quasispecies [6]. Currently, HCV is classified into six main genotypes (1, 2, 3, 4, 5 and 6), with genetic variation of 30% at the nucleotide/amino acid level [7].

These genotypes vary in their geographical distribution, transmission route and treatment response [8]. Genotypes 3a, 1a and 1b appear to have a worldwide distribution owing to their transmission through injectable drug use, blood transfusion and the use of improperly sterilized surgical and medical equipment [9]. Genotype 3a is common in South Asia and Pakistan [10–13]; 1b and 1a in Japan [14], the USA and Europe [15]; genotype 4 in the Middle East and North and Central Africa [16, 17]; genotype 5 in South Africa; and genotype 6 in Hong Kong [18]. The Balochistan province of Pakistan has the highest percentage of 1a (4.03%) [10]; however, the highest prevalence in the country has been reported from the city of Lahore (23.6%) [19].

The present study describes the phylogenetic characterization of the complete genome of an HCV isolate belonging to genotype 1a from Pakistan. In spite of recent developments, a vaccine against HCV infection is still lacking. The standard treatment for HCV is pegylated interferon alpha, alone or in combination with ribavirin [20]. The combined therapy can eradicate the virus in 50% of genotype 1a cases. Through mutation, the virus has found ways to evade the IFN-dependent immune response [21]. Triple therapy (PegIFN-α + ribavirin + protease inhibitors) achieves 20-39% higher sustained virological response rates compared with Peg-IFN plus RBV. Genetic diversification is a characteristic feature of the HCV RNA genome. The variation results from the error-prone NS5B polymerase [22]. Because of this lack of fidelity, a diverse population called "quasispecies" is generated, with almost one mutation in each cycle of replication [23]. Through this highly dynamic process of replication, HCV can produce 10 trillion virions in a day. This continuing process of mutation allows HCV to escape the host immune response, leading to persistent infection [9, 24–26].

Serum was collected from an individual who was anti-HCV positive (IMX System ELISA kit, Abbott, Germany) and HCV RNA positive. This study was designed to amplify, clone and sequence the cDNA of a genotype 1a Pakistani isolate. The serum sample was genotyped in the Molecular Diagnostics lab, CEMB, University of the Punjab, to determine its HCV genotype and subtype [27]. The protocol involved a multiplex PCR [28]. Genotype 1a samples were selected for further analysis. To amplify the entire genome of HCV genotype 1a (Pakistani isolate) in multiple fragments, specific sense and antisense primers were designed for the different regions of the genome, including the 5′ UTR, Core, E1, E2, P7-NS2, NS3, NS4a, NS4b, NS5a, NS5b and 3′ UTR.

A commercially available GF-1 Viral Nucleic Acid Extraction kit (Vivantis, Cat# GF-RD-300, Vivantis Technologies, Subang Jaya, Malaysia) was used for RNA extraction. HCV RNA was extracted from 200 μl of serum as per the kit protocol. Complementary DNA (cDNA) was synthesized by reverse transcribing the extracted RNA (10 μl) with Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase (Invitrogen, Life Technologies, NY, USA). PCR was carried out in a thermal cycler with Taq DNA polymerase (Invitrogen, Life Technologies, NY, USA). Amplification was performed with 4 μl of cDNA using sense and antisense primers for each gene in the following reaction mixture: 10X PCR buffer 2.0 μl, MgCl2 (25 mM) 2.4 μl, dNTPs (2 mM) 2.0 μl, outer sense primer (10 pmol/μl) 2.0 μl, outer antisense primer (10 pmol/μl) 2.0 μl, dH2O (nuclease free) up to 4.6 μl, Taq DNA polymerase (5 U/μl) 0.4 U, RT-PCR product 4.0 μl; total reaction volume 20 μl. A second-round (nested) PCR was performed for each sample using the internal primers IAS and IS, which lie within the first-round PCR amplicon. PCR products were analyzed on a 1.2% agarose gel. DNA was purified from the agarose gel with the GF-1 Gel DNA Recovery Kit (Vivantis, Cat# GF-GP-100, Vivantis Technologies, Subang Jaya, Malaysia) following the manufacturer’s protocol. Once pure DNA products were obtained, they were quantified using a spectrophotometer (NanoDrop™, NanoDrop products, USA).

Sequencing of the PCR-amplified fragments was performed in triplicate using gene-specific reverse and forward primers in separate reactions. Sequencing was performed according to the manufacturer’s instructions (Big Dye Deoxy Terminators; Applied Biosystems, Weiterstadt, Germany) on an automated sequencer (Applied Biosystems; 3100 DNA Analyzer). The mixture for a single reaction consisted of Big Dye 2 μl, 5X sequencing buffer 1.5 μl, forward or reverse gene-specific primer 1 μl (10 pmol), sterile dH2O 3.5 μl and template DNA 2 μl; total reaction volume 10 μl. Once confirmed by sequencing analysis, the PCR-amplified DNA fragments were cloned using a TA cloning kit (Invitrogen, Cat# K2020-20, Life Technologies, NY, USA). Clones were sequenced in triplicate to obtain a consensus sequence for each entire gene. The sequencing data were then analyzed for the different clones carrying the various fragments of HCV genotype 1a, and the corresponding consensus sequence was generated. The sequence was then submitted to the NCBI GenBank database. Homology of the nucleotide sequences of the amplified and sequenced PCR products with known nucleotide sequences in NCBI was assessed using the standard nucleotide–nucleotide BLAST (Basic Local Alignment Search Tool) software available at http://www.ncbi.nlm.nih.gov/BLAST.
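
The step in which triplicate reads are reduced to one consensus sequence can be illustrated with a minimal sketch. It assumes the reads are already aligned to the same length and that a simple per-column majority vote is acceptable. The function name `majority_consensus`, the tie symbol `N` and the example reads are hypothetical and are not taken from the study, which does not state how its consensus was computed.

```python
from collections import Counter


def majority_consensus(reads, ambiguity="N"):
    """Return the per-column majority base of equal-length, pre-aligned reads.

    Assumption: a plain majority vote; ties are reported as `ambiguity`.
    """
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to the same length")

    consensus = []
    for column in zip(*reads):
        counts = Counter(base.upper() for base in column)
        (top_base, top_count), *rest = counts.most_common()
        # A tie for the most frequent base cannot be resolved by voting.
        if rest and rest[0][1] == top_count:
            consensus.append(ambiguity)
        else:
            consensus.append(top_base)
    return "".join(consensus)


# Hypothetical triplicate reads of one short fragment (not real data).
example_reads = ["ACGTACGT", "ACGTACGA", "ACCTACGT"]
print(majority_consensus(example_reads))
```

With three reads, a single discordant base call at any position is outvoted by the other two, which is the practical reason for sequencing each clone in triplicate.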

A detailed search of GenBank was carried out to find full-length sequences of hepatitis C virus subtype 1a from different countries, especially neighboring countries, for analysis. Full-length HCV subtype 1a sequences were not available from Iran or India. India has reported eight full-length sequences of subtype 3a and one of 3i, and the subtype of one further full-length sequence is not stated in GenBank. For phylogenetic analysis and genetic distance estimation, full-length genome sequences representing eight different HCV subtype 1a isolates were retrieved from the GenBank database.

These sequences were reported from HCV 1a infected patients residing in different countries: Denmark AF271632.1, United Kingdom EU862841, Japan AB520610.1, Switzerland AF271632.1, Germany AF271632.1, Germany (Baden-Wurttemberg) EU862841, USA AF271632.1, USA (Massachusetts, Boston area) EU862841, USA (Tennessee) EU862841, USA (Massachusetts) EU862831.1 and USA (New York) EU862841. Pairwise and multiple alignment of the nucleotide sequences was performed using ClustalW [29].

Evolutionary relationships of taxa by Neighbor-Joining method

Phylogenetic analysis was conducted using the Neighbor-Joining method [30] with 1000 bootstrap replicates in the MEGA 5 software package; substitutions included transitions plus transversions. The optimal tree had a sum of branch lengths of 0.62167471. The evolutionary distances were computed using the p-distance method and are in units of the number of base differences per site. The analysis involved 12 nucleotide sequences [31] (Figure 1a).
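
The p-distance used here is the proportion of aligned sites at which two sequences differ. The sketch below illustrates that calculation under stated assumptions: the input is an existing multiple alignment, and columns containing a gap or ambiguous base in any sequence are dropped first. The helper names and the toy alignment are hypothetical, and the code does not reproduce the MEGA 5 implementation.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def complete_deletion(alignment):
    """Drop every column in which any sequence has a gap or ambiguous base."""
    names = list(alignment)
    seqs = [alignment[name].upper().replace("U", "T") for name in names]
    keep = [
        i for i, column in enumerate(zip(*seqs))
        if all(base in VALID_BASES for base in column)
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in zip(names, seqs)}


def p_distance(seq_a, seq_b):
    """Proportion of sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned and non-empty")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)


def p_distance_matrix(alignment):
    """Pairwise p-distances keyed by (name_1, name_2)."""
    clean = complete_deletion(alignment)
    return {
        (m, n): p_distance(clean[m], clean[n])
        for m, n in combinations(clean, 2)
    }


# Hypothetical toy alignment (not real HCV data).
toy = {
    "iso_A": "ACGTACGTAC",
    "iso_B": "ACGTACGTTC",
    "iso_C": "AC-TACGAAC",
}
for pair, distance in p_distance_matrix(toy).items():
    print(pair, round(distance, 4))
```

A matrix of this kind is the input that a Neighbor-Joining implementation clusters into a tree. Bootstrap support would be obtained by resampling alignment columns with replacement and repeating the procedure.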

Figure 1

(a) Evolutionary relationships of taxa by Neighbor-Joining method and (b) molecular phylogenetic analysis by Maximum Likelihood method.

Molecular phylogenetic analysis by maximum likelihood method

The evolutionary history was inferred using the Maximum Likelihood method based on the General Time Reversible model. The tree with the highest log likelihood (−42333.9343) is shown. Initial trees for the heuristic search were obtained automatically by applying the Neighbor-Joining and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with the superior log likelihood value. The tree was drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 12 nucleotide sequences. Codon positions included were 1st+2nd+3rd+Noncoding. The substitution type was nucleotide, and the analysis used a very strong branch swap filter and a single thread. There were a total of 9716 positions in the final dataset [31] (Figure 1b).

Estimates of evolutionary divergence at the nucleotide and amino acid levels were computed using the equal input model [32]. The sequences were translated using the standard genetic code. All positions containing gaps and missing data were eliminated.
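
One common form of an equal-input correction converts the observed proportion of differing sites, p, into a distance d = −b ln(1 − p/b), where b = 1 − Σ gᵢ² and gᵢ are the character frequencies. The sketch below implements that form as an illustration only. It assumes the frequencies are estimated from the two sequences being compared and that gapped columns have already been removed. The exact estimator applied by the software in this study may differ, and the function name and example sequences are hypothetical.

```python
import math
from collections import Counter


def equal_input_distance(seq_a, seq_b):
    """Equal-input corrected distance between two aligned, gap-free sequences.

    Assumed form: d = -b * ln(1 - p / b), with b = 1 - sum(g_i ** 2),
    where g_i are character frequencies pooled over both sequences.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned and non-empty")

    p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    if p == 0:
        return 0.0

    counts = Counter(seq_a + seq_b)
    total = sum(counts.values())
    b = 1.0 - sum((n / total) ** 2 for n in counts.values())
    if p >= b:
        # The logarithm is undefined once p reaches the saturation level b.
        raise ValueError("distance undefined: sequences are saturated")
    return -b * math.log(1.0 - p / b)


# Hypothetical aligned sequences (not real HCV data).
print(equal_input_distance("ACGTACGTAC", "ACGTACGTTC"))
```

The corrected distance is always at least as large as p, because the correction accounts for multiple substitutions at the same site that the raw proportion cannot see.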

In this study we report the first full-length sequence of HCV Pak-cemb-1 from Pakistan (KC283194). The estimated evolutionary divergence of the Pakistani isolate was 0.06557 from the German isolate (AF271632.1) and 0.29400 from the Danish isolate (AF271632.1) (Table 1). The difference between these two estimates, 0.29400 − 0.06557, corresponds to the evolutionary nucleotide distance of approximately 0.22843 travelled by the Pakistani isolate. The sequence is phylogenetically closer to the German strain than to those from the USA, United Kingdom, Switzerland, Japan and Denmark.

Table 1 Estimates of evolutionary divergence between isolates from various countries and the Pakistani isolate (KC283194)

HCV infection is a matter of serious concern in Pakistan. Approximately 17 million people are infected, and over the past ten years the predominant genotype has been shown to be 3a (60–55.10%), followed by genotype 1a at 10.25% [33, 34]. This shows that the virus is spreading rapidly in Pakistan. However, with the exception of 3a, its genotypes have not been characterized genetically [35]. The phylogenetic analysis of the full-length HCV genotype 1a sequence confirms the designation of the Pakistani isolate as 1a. It fell in the same cluster as full-length HCV genotype 1a sequences reported from different continents. The Pakistani isolate has diverged more rapidly than the similar German strain. This indicates that this genotype 1a Pakistani isolate is diverging at a faster evolutionary rate than other 1a lineages reported from different regions of the world.

The current therapy of choice is pegylated interferon alpha, alone or in combination with ribavirin, which can eradicate the virus in 50% of genotype 1a cases [20]. Adding protease inhibitors to the current therapy gives a 20-39% higher rate of sustained virological response [21].

The direct-acting antiviral agents telaprevir and boceprevir have been approved by the FDA and are recommended for genotype 1a. HCV genotype 1a shows greater resistance to treatment compared with other genotypes. Given its nature and prevalence in Pakistan, this evolutionary analysis will help in the evaluation and development of new antiviral therapies and possible vaccines. Moreover, the association of full-length HCV genotype 1a nucleotide sequences with mutational studies, epidemiology, disease severity and response to interferon therapy needs to be evaluated.