Background

Papillomaviruses are a large group of pathogens that cause epithelial proliferations in a wide spectrum of vertebrate species. More than 100 different human papillomaviruses (HPV) have been isolated [1, 2]. Such an extensive genotype variety has not yet been detected in nonhuman species, although papillomavirus genomes have been isolated from many species where careful investigational efforts were made [3, 4]. Most papillomaviruses appear to be species-specific or at least restricted to infection of closely related animals within the same genus. Papillomavirus genomes have been cloned from 20 mammalian host species. Thus far, only two avian papillomavirus genomes have been cloned, one from a chaffinch and the second from a parrot papilloma. To date, no complete avian papillomavirus genome has been sequenced.

In a large survey of 25,000 captured chaffinches (Fringilla coelebs) in the Netherlands, papillomas were found on the foot or tarsometatarsus (the bare part of the leg) of 1.3% of the birds [5]. The DNA of a Fringilla PV (FPV) was isolated from such skin papillomas, and two partial sequences of FPV, totaling about 900 basepairs, were determined [6, 7]. Papillomavirus particles have also been observed using electron microscopy in greenfinches (Carduelis chloris) [8], and in canaries (Serinus canarius) [9].

Cutaneous papillomas were observed on the palpebrae, commissure of the beak, and the head of a captive African grey parrot (Psittacus erithacus timneh) [10]. The Psittacus erithacus papillomavirus genome was cloned [11], and abbreviated PePV in accordance with the nomenclature guidelines for nonhuman papillomaviruses [3]. PePV DNA was also detected in one other oral papilloma of an African grey parrot, but was not present in an oral papilloma of an Amazon parrot (Amazona ochrocephala), nor in 24 cloacal papillomas of Amazon parrots, macaws (Ara sp.), conures (Aratinga sp.) and cockatoos (Cacatua moluccensis) [12, 13].

This report describes the first complete nucleotide sequence of an avian papillomavirus from a cutaneous lesion of an African grey parrot: PePV.

Results and discussion

Complete sequence of the PePV genome

The complete nucleotide sequence of PePV contains 7304 basepairs (bp) (Fig. 1), and has a GC-content of 49.3%. The size of the PePV genome is the second-smallest of the animal papillomaviruses, after bovine papillomavirus type 4 (BPV-4) (7265 bp) [14]. The part of the PePV E1 ORF containing the Sal I cloning site is homologous and colinear with the corresponding region in other PVs. This indicates that no sequences have been lost during the establishment of the PePV clone. The position of the first nucleotide of the PePV genome corresponds to the start codon of the first major open reading frame in the early protein region.
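As an illustration of how summary figures such as genome length and GC-content can be derived from a sequence, the following minimal Python sketch counts G and C residues. The function name `gc_content` and the toy sequence are hypothetical and are not part of the original analysis; the actual PePV sequence would have to be retrieved from GenBank (AF502599).

```python
def gc_content(seq: str) -> float:
    """Return the GC-content of a nucleotide sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


# Hypothetical toy sequence for demonstration only (not PePV).
toy = "ATGGCGCTTAACGGCATTAG"
print(len(toy), "bp, GC-content:", round(gc_content(toy), 1), "%")
```

Applied to the full genome sequence, the same function would be expected to return a length of 7304 bp and a GC-content close to the reported 49.3%.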

Figure 1

Nucleotide sequence of the Psittacus erithacus papillomavirus type 1 (PePV), GenBank Accession number AF502599.

Open reading frame organization

All papillomaviruses have their open reading frames (ORFs) on the same coding (sense) strand of their circular double-stranded DNA genome. Usually, a papillomavirus genome contains seven major ORFs coding for five early (E) proteins E6, E7, E1, E2 and E4, followed by two late (L) capsid proteins L2 and L1, and a non-coding upstream regulatory region (URR). The layout of the PePV genome differs from the organization of other characterized PV genomes (Fig. 2). PePV does not contain classic E6 or E7 ORFs. Instead, it contains an E8 ORF in front of the E1 ORF, followed by an E9 ORF that overlaps with the aminoterminal part of E1. The E8 ORF has the capacity to code for a 177 amino acid (aa) protein with a predicted molecular weight of 19.6 kilodaltons (kDa), and the E9 ORF encodes a 195 aa protein of 22.7 kDa. Neither E8 nor E9 shows recognizable homology with any other known papillomavirus or non-papillomavirus protein in GenBank. Manual alignment of the PePV E8 with a series of E7 proteins revealed two Cys-X-X-Cys motifs separated by 23 amino acids instead of the usual 28 or 29 residues. In the aminoterminal part of E8, a stretch of amino acids (DNLLCHESSMDD) is similar to the putative cellular division motif involved in retinoblastoma tumor suppressor protein (pRb) binding. In PePV, this pRb-binding motif is more closely related to the pRb-binding domain of the large T (LT) antigen of polyomaviruses than to that of the E7 of other papillomaviruses. However, alignment of PePV E8 with SV40 LT did not reveal further similarities. E8 may be a remnant of a very ancient common evolutionary origin of the early region proteins of avian papillomaviruses and the LT proteins of polyomaviruses. Computational motif searches (using SMART, InterPro and Prosite algorithms) failed to detect biologically significant sites, patterns or conserved protein motifs in PePV E8 and E9.
Using the PSORT algorithm for the prediction of the subcellular location of proteins [15, 16], E8 was predicted to be cytoplasmic, and E9 to be either nuclear or cytoplasmic. The cellular existence and function of E8 and E9 remain unknown.
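The spacing between paired Cys-X-X-Cys motifs described above was determined by manual alignment. A comparable check can be sketched in Python with a regular expression. The function names and the toy protein below are hypothetical, and the convention used for "separated by" (residues between the end of one motif and the start of the next) is an assumption rather than the authors' stated definition.

```python
import re


def cxxc_positions(protein: str) -> list[int]:
    """Return 0-based start positions of every Cys-X-X-Cys motif.

    A lookahead is used so that overlapping motifs are also reported.
    """
    return [m.start() for m in re.finditer(r"(?=C..C)", protein.upper())]


def cxxc_spacings(protein: str) -> list[int]:
    """Residues between the end of one CxxC motif and the start of the next."""
    starts = cxxc_positions(protein)
    return [b - (a + 4) for a, b in zip(starts, starts[1:])]


# Hypothetical toy protein: two CxxC motifs with 23 residues in between.
toy = "MK" + "CAAC" + "G" * 23 + "CTTC" + "LL"
print(cxxc_positions(toy), cxxc_spacings(toy))
```

Running the same functions on a set of E7 proteins would show the spacing of 28 or 29 residues that the text describes as usual, provided the same counting convention applies.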

Figure 2

Linear representation of the ORFs of the Psittacus erithacus papillomavirus genome. NCR: non-coding region.

Unusual ORFs have also been described in the genomes of the European elk papillomavirus, deer papillomavirus, and reindeer papillomavirus, which contain a transforming gene (E9) between the E2 and L2 ORFs [17]. The subgroup B bovine papillomaviruses (BPV-3, -4, and -6) are another group of papillomaviruses that lack an E6 (but do have an E7) [14, 18]. Instead of an E6, there is a BPV-4 E8 ORF that encodes a 42-residue polypeptide that can transform NIH3T3 cells [19]. No discernible homology between BPV-4 E8 and PePV E8 sequences was detected. It seems that E6 functions are either not required by some papillomaviruses, or that they are performed by another viral (or host) protein. It remains to be established whether this unique organization of the early region is specific to PePV or common to other (or all) avian papillomaviruses.

Upstream regulatory region (URR)

In PePV, the noncoding region or upstream regulatory region (URR) between the stop codon of L1 and the first ATG of E8 is only 460 bp long (nucleotides (nt) 6845–7304). Only one typical palindromic E2-binding site (E2BS) with the consensus sequence ACC-N6-GGT is found at nt 7214–7225. Two additional atypical putative E2BS (ACC-N4-GGT) are found at nt 7020–7029 and 7174–7183. The URR also contains a polyadenylation site (AATAAA; nt 7279–7284) located 16 nucleotides 5' of a CA dinucleotide, necessary for the processing of the L1 and L2 capsid mRNA transcripts.
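The E2-binding site consensus and the polyadenylation signal described above are simple sequence patterns, so a scan for them can be sketched with regular expressions. The names `E2BS_PATTERNS` and `find_sites` and the toy sequence are hypothetical; this is not the procedure used in the original analysis. Because ACC-N-GGT is its own reverse complement at the fixed positions, scanning one strand is sufficient for these sites.

```python
import re

# Lookahead patterns so that overlapping candidate sites are also reported.
E2BS_PATTERNS = {
    "ACC-N6-GGT": r"(?=(ACC[ACGT]{6}GGT))",  # typical E2BS consensus
    "ACC-N4-GGT": r"(?=(ACC[ACGT]{4}GGT))",  # atypical putative E2BS
}


def find_sites(seq: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, end) coordinates, 1-based and inclusive, of each match."""
    hits = []
    for m in re.finditer(pattern, seq.upper()):
        start = m.start() + 1
        end = m.start() + len(m.group(1))
        hits.append((start, end))
    return hits


# Hypothetical toy region for demonstration only (not the PePV URR).
toy_urr = "TT" + "ACCGATCGAGGT" + "AA" + "ACCTTAAGGT" + "CC" + "AATAAA" + "GG"
for name, pattern in E2BS_PATTERNS.items():
    print(name, find_sites(toy_urr, pattern))
print("AATAAA at 1-based position", toy_urr.find("AATAAA") + 1)
```

Coordinates reported by such a scan are relative to the start of the string that is searched, so the URR would need to be scanned in the context of the full genome to reproduce the nucleotide positions given in the text.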

PePV sequence similarity to other papillomaviruses

The sequence similarity between PePV-1 and HPV-1 (a benign cutaneous PV), HPV-5 (an epidermodysplasia verruciformis-associated PV), HPV-16 (a prototypic mucosal high-risk PV), and bovine papillomavirus type 1 (BPV-1, a fibropapillomavirus) was investigated by pairwise alignments of the corresponding ORFs and their proteins (Table 2). PePV showed only low similarity to other papillomaviruses. Maizel-Lenk (dot) matrix plots illustrate that similarity can only be observed in the conserved parts of E1 and L1 (Fig. 3).

Figure 3

Dot plot matrix (Maizel-Lenk plot) aligning PePV with HPV-5 (A), HPV-1 (B), HPV-16 (C), and BPV-1 (D).

Table 1 Position of the open reading frames of PePV.
Table 2 Percentage nucleotide (amino acid) similarity of the different PePV ORFs with corresponding ORFs of HPV-1, HPV-5, HPV-16, and BPV-1.

Phylogenetic analysis

In order to compare the PePV sequence with that of the chaffinch (FPV), we retrieved two partially overlapping partial sequences in the FPV L1 ORF (GenBank accession numbers K02020 and U29669) and one piece of FPV E1 sequence (K02019) from GenBank [7, 20]. A total of 312 amino acids (132 E1 and 180 L1 residues) could be compared for both viruses. The similarity at the amino acid level between PePV and FPV was 68% in the E1 region, and 47% in the L1 region.
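To make the comparison concrete, the following Python sketch computes the percentage of identical residues over the aligned (non-gap) columns of a pairwise alignment. This is a simplification and an assumption: the similarity values reported here were obtained with the GAP program, whose scoring may treat conservative substitutions differently from strict identity. The function name and toy alignment are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no aligned columns without gaps")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical toy alignment (not PePV or FPV sequence).
print(percent_identity("MKT-LV", "MRTALV"))
```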

To define the relationships of PePV with FPV and with other papillomaviruses, we constructed a phylogenetic tree based on a compound E1/L1 312 amino acid sequence alignment of 50 human and animal papillomaviruses. The resulting neighbour-joining phylogenetic tree (Fig. 4) clusters different papillomavirus groups, largely according to their tissue tropism and oncogenic potential [1, 21, 22]. The two avian papillomaviruses form a monophyletic cluster with a common branch that originates close to the unresolved center of the unrooted papillomavirus evolutionary tree, near the origin of the branch that groups the cutaneous papillomaviruses associated with epidermodysplasia verruciformis (EV). The avian papillomaviruses occupy a unique position among the other known papillomaviruses, to which they are only distantly related.

Figure 4

Phylogenetic analysis of a 312 amino acid alignment (132 E1 and 180 L1 residues, corresponding to nt. 2015–2410 in HPV-1 E1, and nt. 6292–6831 in HPV-1 L1) of 50 human and animal papillomaviruses. Papillomaviruses included (with their GenBank accession numbers) were FPV (K02019, K02020 and U29669), bovine BPV 1 (NC_001522), BPV 2 (NC_001521), BPV 4 (X05817), canine oral COPV (NC_001619), cottontail rabbit CRPV (NC_001541), deer DPV (NC_001523), Equus caballus EcPV (AF498323), European elk EEPV (NC_001524), Felis domesticus FdPV 1 (AF480454), HPV 1 (NC_001356), HPV 3 (NC_001588), HPV 4 (NC_001457), HPV 5 (NC_001531), HPV 6 (NC_000904), HPV 9 (NC_001596), HPV 12 (NC_001577), HPV 13 (NC_001349), HPV 15 (NC_001579), HPV 16 (NC_001526), HPV 17 (NC_001580), HPV 18 (NC_001357), HPV 19 (NC_001581), HPV 22 (NC_001681), HPV 23 (NC_001682), HPV 25 (NC_001582), HPV 29 (NC_001685), HPV 35 (X74477), HPV 37 (NC_001687), HPV 44 (NC_001689), HPV 48 (NC_001690), HPV 49 (NC_001591), HPV 50 (NC_001691), HPV 51 (NC_001533), HPV 52 (NC_001592), HPV 53 (NC_001593), HPV 56 (NC_001594), HPV 58 (NC_001443), HPV 60 (NC_001693), HPV 63 (NC_001458), HPV 65 (NC_001459), HPV 68 (X67161), HPV 75 (Y15173), Mastomys natalensis MnPV (NC_001605), Ovine OvPV 1 (NC_001789), OvPV 2 (NC_001790), Psittacus erithacus PePV (AF502599), Canine papillomavirus type 2 CPV2 (unpublished), Phocoena spinipinnis PsPV 1 (NC_003348), Rhesus RhPV 1 (NC_001678), and rabbit oral ROPV (NC_002232).

Divergence between PePV and FPV coincides with divergence of their Psittaciformes and Passeriformes host species

Mammals and birds diverged around 310 million years (Myr) ago during the late Paleozoic Era [23]. The start of the first avian differentiation into Paleognathae (larger, flightless birds such as ostrich, rheas, cassowary, emu, and kiwi), Galliformes (turkey, chicken), Anseriformes (duck, goose) and other Neognathae (all other extant modern birds) is currently thought to have coincided with the Mesozoic breakup of the world-continent Pangaea at the Jurassic-Cretaceous boundary 146 Myr ago [24]. Most of the extant avian orders were establishing themselves at the Cretaceous-Tertiary boundary 65 Myr ago [25]. Psittacus erithacus belongs to the family of the Psittacidae in the order of the Psittaciformes (parrots), whereas the chaffinch (Fringilla coelebs) belongs to the family of the Fringillidae in the order of the Passeriformes (perching birds).

The rate of nucleotide substitution between two homologous sequences can be used as a measure of the time elapsed since the two sequences diverged (i.e. the molecular clock concept). Given the papillomavirus mutation rate, we can approximate the divergence time between PePV and FPV. We earlier calculated a papillomavirus mutation rate of 0.73 to 0.96 × 10⁻⁸ nucleotide substitutions per site per year, based on the divergence of the Felis domesticus PV (FdPV-1) and the canine oral papillomavirus (COPV) since the divergence of their Felidae and Canidae host species 38–50 Myr ago [26]. We constructed a 921 bp pairwise alignment between PePV and FPV for 441 bp in E1 and 480 bp in L1 where the nucleotide sequence of FPV was available in GenBank (corresponding to nt 1908–2348 and 6107–6586 in PePV). In this 921 bp alignment, 385 differences between PePV and FPV were observed. Using the mutation rate estimates derived from the feline/canine papillomavirus divergence, this corresponds to PePV and FPV diverging from each other 44 to 57 Myr ago. Since these calculations were based on alignments of the most conserved regions in the papillomaviral genome, this is likely an underestimation of the true divergence time. This would place the PePV/FPV divergence at about the same time that their Psittaciformes and Passeriformes host species were diverging from each other in the Late Cretaceous or Early Tertiary period. The high level of congruence between divergences in the papillomavirus phylogenetic tree and the divergence of their host species lineages supports co-speciation. Co-speciation was also hypothesized to be a prominent feature in mammalian and avian herpesvirus evolution [27].
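The arithmetic behind this estimate can be sketched in a few lines of Python. The sketch assumes that the uncorrected proportion of differing sites (385/921) is divided directly by the mutation rate, without a correction for multiple substitutions at the same site; this assumption reproduces the reported range but is not stated explicitly in the text. The function name is hypothetical.

```python
def divergence_time_myr(differences: int, sites: int, rate_per_site_per_year: float) -> float:
    """Divergence time in millions of years from an uncorrected p-distance.

    Assumption: time = (differences / sites) / rate, with no multiple-hit correction.
    """
    p_distance = differences / sites
    return p_distance / rate_per_site_per_year / 1e6


# Rate bounds taken from the feline/canine papillomavirus comparison in the text.
for rate in (0.96e-8, 0.73e-8):
    t = divergence_time_myr(385, 921, rate)
    print(f"rate {rate:.2e} per site per year: about {t:.1f} Myr")
```

With the two rate bounds, the sketch yields values of roughly 44 and 57 Myr, in line with the range given above.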

Papillomaviruses in inbred species: emerging infectious pathogens?

Although African grey parrots are not yet officially listed as an endangered species, their survival is threatened by the same factors that most other exotic parrot species face, and they are monitored by the World Wildlife Fund for conservation concerns. In nature, the range of the African grey parrot extends from Guinea-Bissau and Sierra Leone to southern Cameroon, Congo, Uganda, northwestern Tanzania, and southwestern Kenya. Human-caused destruction and fragmentation of their habitats in the tropical rain forests cause an increase in inbreeding and reduced heterozygosity. The illegal parrot smuggling trade also causes the natural population size to decrease, and the ensuing import restrictions lead animal breeders to a higher degree of inbreeding in the captive population. Reduced diversity in the major histocompatibility complex (MHC) genes due to inbreeding and genetic bottlenecks may contribute to an increased sensitivity to emerging infectious pathogens, as has been observed in exotic felids [28]. Whenever a population goes through a demographic and genetic reduction, papillomaviruses seem to become more prevalent. This has been documented in endangered exotic felid species, such as the snow leopard, where papillomaviruses are causing an increasing number of cutaneous squamous cell carcinomas [29]. The Florida manatee, one of the most endangered marine mammals in American coastal waters, also currently suffers from an epidemic of viral papillomatosis [30]. We have previously described this phenomenon in pygmy chimpanzees (Pan paniscus) and in Greenlandic Inuits and Navajo Indians, where species-specific papillomaviruses (PpPV and HPV-13, respectively) cause oral focal epithelial hyperplasia, a disease rarely encountered in non-inbred populations [31, 32].

Papillomaviruses are ancient viruses that infect amniotes

The amniotes (Amniota) are a clade that includes the dinosaurs and most of the extant land-dwelling vertebrates, namely mammals, birds and reptiles. They evolved 360 to 286 Myr ago during the Carboniferous Period in the late Paleozoic Era. Papillomaviruses have currently been characterized in more than 20 mammalian species, and in 2 avian species. Papillomavirus particles have also been described in a reptile species, the Bolivian side-neck turtle [33]. Since papillomaviruses have been described in mammals, birds and reptiles, and were never found in amphibians or fish, it is tempting to speculate that the host range of papillomaviruses encompasses the amniotes. This means that species-specific papillomaviruses could potentially infect more than 20,000 living species, living in virtually every habitat of the planet. We know that papillomaviruses have been detected throughout the world, even in non-gregarious hosts. This wide geographic distribution cannot be attributed to transmission as an airborne infection (with the possible exception of pulmonary fibromatosis in European elks) [34], since transmission of papillomaviruses requires close direct cutaneous or mucosal contact.

Together with the viral species-specificity and the genomic stability of their double-stranded DNA, this requirement for close physical contact makes it unlikely that interspecies transmission in recent history can account for the global presence of a spectrum of papillomaviruses in many amniotes. Assuming that, like humans, all of the more than 20,000 species in the amniotes clade have their own set of species-specific genotypes (humans have more than 100 HPV genotypes), papillomaviruses could be the oldest, largest, and most diverse viral family.

Materials and methods

DNA sequencing

The PePV genome was cloned in the Sal I restriction enzyme site of pBR322 [11]. Subclones were prepared by partial digestion of the PePV-insert with Sau 3AI. The Sau 3AI restriction fragments were ligated with dephosphorylated Bam HI-cut pUC19. After transformation of MAX Efficiency DH5α E. coli (Life Technologies/Invitrogen, Carlsbad, CA), the bacteria were incubated for blue-white colony screening on agar plates containing X-gal and IPTG. Ten Sau 3AI-subclones with a PePV-insert ranging in size from 250 to 1900 basepairs were investigated. Plasmid DNA was extracted using the QIAGEN Midi Plasmid Purification Kit (QIAGEN, Hilden, Germany). Nucleotide sequencing was started using pBR322-specific primers or the universal primers in the multiple cloning site of pUC19. Primer walking sequencing was performed, using 59 sequencing primers to cover the complete genome on both strands. Sequencing was performed on an ABI Prism 310 Genetic Analyzer (Perkin-Elmer Applied Biosystems, Foster City, CA, USA) at the Leuven and Prague core DNA sequencing facilities. Chromatogram sequencing files were inspected with Chromas 2.2 (Technelysium, Helensvale, Australia), and contigs were prepared using SeqMan II (DNASTAR, Madison, WI).

DNA sequence submission

The nucleotide sequence data reported in this paper were deposited in GenBank using the National Center for Biotechnology Information (NCBI, Bethesda, MD) BankIt v3.0 submission tool http://www3.ncbi.nlm.nih.gov/BankIt/ under accession number AF502599.

DNA and protein sequence analysis

DNA and protein similarity searches were performed using the NCBI WWW-BLAST (Basic local alignment search tool) server on GenBank DNA database release 118.0 [35]. Molecular weight of the putative proteins was calculated using the Molecular Biology Shortcuts (MBS) ProtCALC program http://www.justbio.com/protcalc/. The subcellular location of proteins was predicted using the PSORT II server at the National Institute for Basic Biology in Okazaki, Japan http://psort.nibb.ac.jp [15, 16]. Protein motif searches were performed using SMART (Simple Modular Architecture Research Tool) at the European Molecular Biology Laboratory (EMBL) in Heidelberg http://smart.embl-heidelberg.de [36], InterPro (release 4.0) at the European Bioinformatics Institute (EBI) http://www.ebi.ac.uk/interpro [37], and Prosite (release 17.7) at the proteomics server of ExPASy (Expert Protein Analysis System) of the Swiss Institute of Bioinformatics (SIB) http://www.expasy.ch/prosite [38]. Pairwise sequence alignments were calculated using the GAP-program on the Sequence Analysis Server at Michigan Technological University http://genome.cs.mtu.edu/align/align.html. Maizel-Lenk dot matrix plots were calculated via the DotMatrix module on the 'Molecular Toolkit' server of Colorado State University http://arbl.cvmbs.colostate.edu/molkit/dnadot/index.html using a window size of 115 nucleotides and a mismatch allowance of 60/115, settings that are optimal for full-genome PV alignments. Multiple sequence alignments were prepared using CLUSTALW [39], and corrected in the GENEDOC alignment editor [40]. Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 2.1 [41].
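The windowed dot matrix comparison can be illustrated with a short Python sketch. It is not the DotMatrix server implementation: the function name is hypothetical, and the reading of "mismatch allowance of 60/115" as "at most 60 mismatches within a 115-nucleotide window" is an assumption. The naive triple loop is only practical for short sequences; two genomes of about 7 kb would call for a faster implementation.

```python
def dot_matrix(a: str, b: str, window: int = 115, max_mismatch: int = 60) -> list[tuple[int, int]]:
    """Return 0-based (i, j) window starts where a[i:i+window] and b[j:j+window]
    differ at no more than max_mismatch positions."""
    a, b = a.upper(), b.upper()
    dots = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            mismatches = sum(x != y for x, y in zip(a[i:i + window], b[j:j + window]))
            if mismatches <= max_mismatch:
                dots.append((i, j))
    return dots


# Hypothetical toy example with a small window and no mismatches allowed.
print(dot_matrix("ACGTACGT", "ACGTACGT", window=4, max_mismatch=0))
```

Each returned coordinate pair corresponds to one dot in the plot; runs of dots along a diagonal indicate regions of similarity such as those seen in the conserved parts of E1 and L1.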