Contagious ecthyma, otherwise known as contagious pustular dermatitis, is a neglected zoonotic disease caused by orf virus (ORFV), a member of the genus Parapoxvirus. It primarily affects sheep, goats, and wild animals and can be transmitted to humans through direct or indirect contact [1, 2]. The virus is extremely contagious and can persist in the animal’s wool and faeces for several years [2]. The infection presents as severe skin lesions with pronounced blistering throughout the buccal cavity, leading to anorexia and weight loss. The morbidity rate often reaches 100%, resulting in emaciation in both adult and young animals, which negatively affects the herd economy [2]. Animal handlers are prone to zoonotic infection, which often manifests as pustules on the hands that can cause severe pain and can spread to other regions, such as the genitals and face [3]. The ORFV genome consists of double-stranded DNA (dsDNA) and contains almost 130 distinct genes. Genes in the central region are relatively conserved and are involved in mature virion formation and virus replication. In contrast, genes in the terminal regions are more variable and are often associated with virulence and immune modulation [4]. Despite the global distribution of the virus, only 14 complete genome sequences are available thus far. The absence of a complete genome sequence of an Indian isolate makes it difficult to perform genetic analysis and thus hinders further functional studies. Therefore, we performed molecular detection of ORFV isolates prevalent within the black Bengal goat breed of central India, followed by complete genome sequence analysis using a next-generation sequencing (NGS) platform, and we used comparative genomics approaches to determine phylogenetic relationships and perform evolutionary and recombination analysis.

The study area included the Dhar district of Madhya Pradesh, a state in central India (75.30° E, 22.59° N). During 2017, 10 clinical samples were collected from infected goats aged between one and 11 months that showed typical skin lesions on their lips. The scab samples were stored at −80 °C until further analysis. Total genomic DNA was isolated from the skin tissue according to the protocol described by Sarker et al., using a DNeasy Blood and Tissue Purification Kit (QIAGEN, Germany) [5]. The presence of the virus was confirmed by PCR, using four sets of primers targeting ORFV011, ORFV020, ORFV059, and ORFV108, commonly known as B2L, E3L, F1L, and A32L, respectively (Supplementary Table S1) [6]. The PCR-amplified products were purified using a MinElute Gel Extraction Kit (QIAGEN, Germany) and sent to a sequencing service for Sanger sequencing. Subsequently, gene-specific phylogenetic trees were constructed to infer the genetic relatedness and transboundary potential of the present isolate compared to other isolates reported in India and other regions of the world. Phylogenetic trees were constructed by the maximum-likelihood (ML) method, using the general time-reversible (GTR) substitution model and 1,000 bootstrap replicates in MEGA 6.0 (Fig. 1, Supplementary Figs. S1-S3). Nucleotide BLAST and phylogenetic analysis based on these four genes confirmed that the isolate from this study was 99%-100% identical to other isolates with sequences in the GenBank database. At the global level, the highest similarity was observed with the isolates China/NP/2011, China/FJ-YT2015, China/HuB13/2013, and USA/ORFD/2003. At the country level, the present isolate shared the highest nucleotide sequence similarity with an ORFV isolate from Odisha, a state in eastern India. The global comparison was consistent with a previous study showing its closeness to Chinese ORFV isolates [6], indicating the transboundary potential of this virus to spread to neighbouring states or countries.
We speculate that trade and animal movement are the likely cause of the observed epidemiological linkage between isolates from the two distinct geographical areas.
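The percent identity values reported above come from BLAST. As a rough illustration of what such a figure measures, the following minimal sketch computes identity between two pre-aligned sequences. The function name `percent_identity` and the toy fragments are hypothetical and are not ORFV data, and the sketch skips gapped columns, whereas BLAST counts them in the alignment length.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments, made up for illustration only
query = "ATGGCGTACC-GTTA"
subject = "ATGGCGTACCAGTCA"
print(round(percent_identity(query, subject), 2))  # 13 of 14 ungapped columns match -> 92.86
```

In practice, the alignment itself would come from BLAST or a multiple-alignment tool; this sketch only covers the counting step.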

Fig. 1

Phylogenetic analysis based on ORFV011 sequences of ORFV. The phylogenetic tree was constructed by the maximum-likelihood method with the GTR model, using MEGA 6.0 software. The numbers at the branching points indicate the bootstrap support calculated for 1,000 replicates. The black triangle indicates the isolate from the present study.

A phylogenetic tree based on concatenated sequences indicated that the present isolate, Ind/MP/17, is closely related to the isolate China/GO/2012 (KP010354) (Supplementary Fig. S4). Thus, China/GO/2012 was used as a reference genome for mapping and mutation analysis. Viral DNA was isolated from a clinical sample and sequenced using a NextSeq 500 NGS platform [7, 8]. The resulting genome sequence, with a total length of 139,807 bp, was submitted to the GenBank database under accession number MT332357. Like those of other parapoxvirus (PPV) isolates, the genome of Ind/MP/17 has a high G+C content (63.7%). The inverted terminal repeats (ITRs) are 3,910 bp in length and cover the region of ORFV001 and ORFV134. Each ITR was found to contain a terminal BamHI site. Telomere resolution motifs were composed of TAAAT, followed by a spacer sequence, ACCCGACC, and (T)6 residues, which formed the terminal hairpin loop (Fig. 2). Using the NCBI ORF Finder tool and NCBI BLAST, we identified 132 ORFs corresponding to a distinct set of genes. By comparing our sequence with that of the reference genome using the BioEdit and ExPASy tools, we found 488 unique mutations in the current isolate (Supplementary Table S2). The largest numbers of synonymous substitutions were observed in RNA helicase NPH-II, RNA polymerase subunit RPO147, and the virion core protein P4a precursor. We observed the largest number of nonsynonymous substitutions within immune regulatory genes, including the NF-κB pathway inhibitor and ankyrin/F-box protein. These mutations might be responsible for maintaining the heterogeneity and virulence of this pathogen. Comparing all 14 available complete genome sequences of ORFV using DnaSP, we estimated the nucleotide diversity (π) and haplotype diversity (Hd) to be 0.02815 and 1.000, respectively. Selection pressure analysis (θ = dN/dS; 0.02911) revealed that ORFV had undergone purifying selection.
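To make the two diversity statistics concrete, the sketch below computes π as the average number of pairwise differences per site and Hd from haplotype frequencies. It is an illustrative approximation and not the DnaSP implementation: the function names and toy alignment are hypothetical, and it assumes that columns containing gaps or ambiguous bases are dropped for π, while Hd is computed on the sequences as given.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(alignment: list[str]) -> float:
    """Average pairwise differences per site (pi), using complete deletion of non-ACGT columns."""
    seqs = [s.upper() for s in alignment]
    if len(seqs) < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two sequences of equal aligned length")
    # Assumption: any column with a gap or ambiguous base is excluded entirely
    columns = [col for col in zip(*seqs) if set(col) <= set("ACGT")]
    if not columns:
        raise ValueError("no usable columns")
    pairs = list(combinations(range(len(seqs)), 2))
    total_diffs = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return total_diffs / (len(pairs) * len(columns))


def haplotype_diversity(alignment: list[str]) -> float:
    """Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(alignment)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(s.upper() for s in alignment)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


# Toy alignment, made up for illustration only
toy = ["ATGCATGCAT", "ATGCATGCAA", "ATGGATGCAT", "ATGCTTGCAT"]
print(nucleotide_diversity(toy))  # 9 differences over 6 pairs and 10 sites
print(haplotype_diversity(toy))   # every sequence is a distinct haplotype
```

When every sequence in the sample is a distinct haplotype, as in the toy alignment, Hd evaluates to 1, which is the situation an Hd of 1.000 describes.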
Tajima's D test of neutrality resulted in a significant negative value (−0.14928), suggesting that this virus might be undergoing a period of evolutionary expansion. A similar pattern of θ values was obtained using partial gene sequences of recently studied avipoxviruses and ORFV, with values ranging from 0.065 to 0.200 [9, 10]. These results were consistent with the observed dN and dS values and suggested that selection pressure had altered the rate of evolution. Perfect mono-, di-, tri-, tetra-, penta-, and hexanucleotide repeat microsatellites, as well as compound microsatellites, were identified using IMEx software [11] with the following parameters: type of repeat, perfect; repeat size, all; minimum repeat number, 6, 3, 3, 3, 3, and 3 for mono-, di-, tri-, tetra-, penta- and hexanucleotide repeats, respectively. For “compound simple sequence repeat” (cSSR) identification, the maximum distance between two “simple sequence repeats” (SSRs) (dMAX) was set to 10 nucleotides. Our study revealed 1,108 SSRs and 94 cSSRs scattered throughout the ORFV genome. The ORFV genome is rich in dinucleotide repeats (76.5%), followed by trinucleotide (18.14%) and mononucleotide repeats (5.14%). Hexanucleotide microsatellites occurred infrequently and constituted only 0.18% of the total microsatellites (Supplementary Fig. S5). The distribution of classified repeats suggested that dinucleotide GC/CG repeats are prevalent within the ORFV genome, which is consistent with what has been reported for other DNA viruses, such as human papillomavirus [12]. Dinucleotide repeats can adopt a Z conformation or other alternative secondary DNA structures to facilitate recombination activity [13]. By monitoring a single mononucleotide repeat, Houng et al. revealed the transmission dynamics of a human adenovirus during an epidemic [14]. Therefore, these microsatellites could be used as a tool for epidemiological and evolutionary studies of ORFV.
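The SSR search parameters described above can be illustrated with a small scanner for perfect repeats. This is a simplified sketch and not the IMEx algorithm: the names `SSR`, `find_perfect_ssrs`, and `find_compound_ssrs` are hypothetical, the toy sequence is made up, and the handling of overlapping repeats is an assumption. Only the minimum repeat numbers (6, 3, 3, 3, 3, 3) and the dMAX value of 10 are taken from the text.

```python
from typing import NamedTuple

# Minimum repeat numbers for motif lengths 1 to 6, as stated in the text
MIN_REPEATS = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}


class SSR(NamedTuple):
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    motif: str
    repeats: int


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    return motif not in (motif + motif)[1:-1]


def find_perfect_ssrs(seq: str, min_repeats: dict[int, int] = MIN_REPEATS) -> list[SSR]:
    """Scan left to right for perfect tandem repeats of each motif length."""
    seq = seq.upper()
    hits = []
    for k, min_n in min_repeats.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            if set(motif) - set("ACGT") or not is_primitive(motif):
                i += 1
                continue
            n = 1
            while seq[i + n * k:i + (n + 1) * k] == motif:
                n += 1
            if n >= min_n:
                hits.append(SSR(i, i + n * k, motif, n))
                i += n * k  # resume after the repeat tract
            else:
                i += 1
    return sorted(hits)


def find_compound_ssrs(ssrs: list[SSR], dmax: int = 10) -> list[list[SSR]]:
    """Group SSRs whose separation is at most dmax nucleotides (assumed cSSR rule)."""
    groups, current = [], []
    for ssr in sorted(ssrs):
        if current and ssr.start - max(s.end for s in current) <= dmax:
            current.append(ssr)
        else:
            if len(current) > 1:
                groups.append(current)
            current = [ssr]
    if len(current) > 1:
        groups.append(current)
    return groups


# Toy sequence, made up for illustration: (GC)3, (CAG)3 and (A)6 tracts
toy = "TT" + "GCGCGC" + "A" + "CAGCAGCAG" + "TCTGATCGTACGATGC" + "AAAAAA" + "C"
ssrs = find_perfect_ssrs(toy)
for s in ssrs:
    print(s.motif, s.repeats, s.start, s.end)
print(len(find_compound_ssrs(ssrs, dmax=10)), "compound SSR(s)")
```

The primitive-motif check keeps a tract such as (AT)6 from also being reported as (ATAT)3, so each repeat is assigned to its shortest motif class before the class percentages are tallied.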

Fig. 2

Analysis of the ITRs of ORFV. (A) Left end (5’) sequence alignment of 14 complete ORFV genome sequences, showing a terminal BamHI site (green box) and telomere resolution motifs (red box) (ATTTTTT-N(8)-TAAAT). (B) Right end (3’) sequence alignment of 14 complete ORFV genome sequences, showing a terminal BamHI site (green box) and telomere resolution motifs (red box) (ATTTTTT-N(8)-TAAAT).
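The two terminal features marked in Fig. 2 can be located in a sequence with a simple pattern search. The following sketch assumes the motif pattern ATTTTTT-N(8)-TAAAT given in the caption and the standard BamHI recognition site GGATCC. The function names and the toy terminus (including its spacer) are hypothetical and are not taken from any ORFV genome.

```python
import re

BAMHI_SITE = "GGATCC"  # standard BamHI recognition sequence
# Pattern from the figure caption: ATTTTTT, any 8 nucleotides, then TAAAT
TELOMERE_MOTIF = re.compile(r"ATTTTTT[ACGT]{8}TAAAT")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan_terminus(seq: str) -> dict:
    """Report 0-based positions of BamHI sites and telomere resolution motifs."""
    seq = seq.upper()
    return {
        "bamhi_sites": [m.start() for m in re.finditer(BAMHI_SITE, seq)],
        "telomere_motifs": [(m.start(), m.group()) for m in TELOMERE_MOTIF.finditer(seq)],
    }


# Toy terminus, made up for illustration; the 8-nt spacer is arbitrary
toy_end = "GGATCC" + "CGCG" + "ATTTTTT" + "CAGTCAGT" + "TAAAT" + "CCGG"
print(scan_terminus(toy_end))
# The opposite genome end can be scanned in the same orientation after reverse complementing
print(scan_terminus(reverse_complement(reverse_complement(toy_end))))
```

Because GGATCC is palindromic, the BamHI site is found on either strand, whereas the telomere resolution motif is orientation-dependent, which is why the helper for reverse complementing is included.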

We retrieved the 14 available complete ORFV genome sequences from the GenBank database, along with those of two pseudocowpox virus (PCPV) isolates and one bovine papular stomatitis virus (BPSV) isolate, belonging to the genus Parapoxvirus (PPV), and two monkeypox virus (MPV) isolates, belonging to the genus Orthopoxvirus, to construct a phylogenetic tree. Our analysis revealed that all of the ORFV isolates were more closely related to the PPVs (PCPV and BPSV) than to the orthopoxvirus (MPV) (Fig. 3). Based on an analysis of the complete genome sequences of Chinese ORFV isolates, Chi et al. reported that viruses derived from goats and sheep formed two host-specific clades with 100% bootstrap support [15]. Later, Velazquez-Salinas et al. observed another distinct clade of ORFVs isolated from both sheep and goats in Mexico from 2007 to 2011 [10]. Interestingly, this clade included an isolate from sheep (Ger/D1701) for which a complete genome sequence was available [10], allowing us to verify the status of our isolate within the clade by comparing all available complete genome sequences of ORFV. The present isolate shows a close phylogenetic relationship to members of the goat-specific clade and to the isolates Chi/GO/2012 and USA/ORFD/1982, with 63-100% bootstrap support. However, the isolate Ger/D1701 did not appear in the host-specific clade and was associated with other clades of goat isolates, supporting the results of a study carried out by Velazquez-Salinas et al. [10] (Fig. 3). To investigate the source of genetic variation among all of the complete ORFV genomes from diverse geographical areas, we looked for evidence of recombination using the RDP, GENECONV, Bootscan, MaxChi, Chimaera, Siscan, PhylPro, LARD, and 3Seq methods in the RDP4 program [16]. Forty potential recombination events with significant P-values were detected across the complete ORFV genome (Supplementary Table S3). 
Viruses undergo genetic recombination to form new variants in the population by deleting many of their nonessential genes or acquiring new host genes [17]. Genome sequencing studies have shown that recombination plays a vital role in the evolution of viruses, including vaccinia virus and variola virus [18, 19]. Taken together, the 40 potential recombination events could have contributed to more than 50% of the Ind/MP/17 genome. Thus, the Ind/MP/17 isolate has the potential to evolve via recombination and to act as a major or minor parent in the formation of new variants.

Fig. 3

Phylogenetic analysis of PPVs. Nineteen complete genome sequences, including the terminal repeats, were aligned to construct a phylogenetic tree with 1,000 bootstrap replicates using the maximum-likelihood method with the GTR model. The black triangle indicates the ORFV isolate from the present investigation.

In conclusion, we report here the complete genome sequence of an ORFV isolate circulating in central India. Through in-depth comparative genomic analysis, we identified recombination events that were possibly involved in ORFV evolution and the generation of novel variants. This information will be useful for further understanding ORFV biology and epidemiology and for vaccine development.