Introduction

Contagious ecthyma or Orf is a zoonotic viral disease of sheep, goats, and other small ruminants characterized by proliferative skin lesions in and around the oral cavity in the form of erythematous macule, papule, vesicle, pustule, and scabs. The causative agent is the Orf virus (ORFV), a member of the genus Parapoxvirus of the Poxviridae family. The virus is highly contagious, quite stable in the environment, and remains in the infectious form in wools or animal excreta for months to years1. The disease is manifested by proliferative lesions on the mouth and muzzle that usually get resolved in 1–2 months2. These facial and oral lesions in lambs may interfere with suckling, while lesions on the udder may interfere in feeding neonates. Similarly, foot lesions often cause transient lameness in infected animals, and together all these results in poor health and loss of body weight. Lesions progress through all clinical stages but are generally non-proliferative and usually resolve within 2–3 weeks. ORFV specific antibodies do not seem to confer protective immunity, although the IgG2 isotype is believed to provide some defense against ORFV infection3. As IgG2 is not secreted in the colostrum of ruminants, lamb and kids don’t get required protection4. Although Orf is normally non-fatal in adults, often comes with high morbidity (up to 100%). While in neonates, Orf can be life-threating as it interferes with suckling of milk from the infected udder or predisposing the animals to the secondary bacterial or fungal infections5. For these reasons, the mortality rate may reach up to 15%6. There is increasing evidence of ORFV to cross‐infect other species of animals such as camels, gazelles, reindeers, musk ox, and Japanese serows3.

The virus can infect humans, particularly those who are closely associated with animal handling. Zoonosis occurs most frequently during lambing, shearing, docking, drenching, or slaughtering of affected animals1,3. Orf infections in humans appear in hand7 but occasionally seen in the face8, nose9, axilla10, scalp11, genitals12,13, urethral8, and pericanthal eyelid skin and the wound heals spontaneously. However, in immunosuppressive individuals, large-sized poorly healing lesions could remain for an extended period up to a couple of months14. This possesses a significant health risk to animal-handlers and veterinarians who often get infected by direct contact and develop painful pustular lesions in the skins. Complications of Orf with secondary bacterial infections are potentially life-threatening and need urgent medical attention.

The ORFV is a classic epitheliotropic virus, having a double-stranded DNA genome with a higher (64%) GC content15. The genome consists of central conserved and terminal variable domains with size varying from 134 to 139 kbp having ~ 130 putative genes, 88 of which are conserved to Chordopoxviruses16,17. Having such a devastating character, this virus has got less attention in terms of genomic information, which is evident from the availability of only eleven complete genome sequences worldwide. Several conserved genomic regions such as envelope protein B2L (ORFV011), F1L (ORFV059), and A32L (ORFV108) were used for ORFV identification and phylogenetic tree construction18. Still, there is a lack of clarity regarding the real diversity of ORFV due to the absence of a reliable system for virus identification, which consists of hypermutable regions such as microsatellites rather than conventional conserved genes.

Simple sequence repeats (SSRs), also known as microsatellites, refer to mono-, di-, tri-, tetra-, penta- and hexanucleotide sequence units that are repeated in tandem in a genome19. Those short motifs of DNA are distributed ubiquitously in the genome of eukaryotes20, and prokaryotes21, and is regarded as the most variable type of DNA sequence within the viral genome22,23. The microsatellites may be classified as either simple or compound, depending on the constituent of nucleotide sequences. The interruptions present in the microsatellite will give rise to interrupted pure, compound, interrupted compound, complex and interrupted complex types. Two or more microsatellites resides directly adjacent to each other to form compound microsatellites by interruption of repeats24. Due to its unique characteristics, these SSRs play a major role in meiotic recombination25,26,27, the evolution of species28, genome mapping29, differentiation of viral strains30, studying population genetics31, and secondary structure formation32. Many studies have highlighted the presence of microsatellite repeats in viruses, such as menovirus33, vesicular stomatitis virus34, hepatitis C virus35, and human respiratory syncytial virus (RSV)36. Here, we report for the first time a comparative analysis of microsatellites with respect to the abundance, distribution, composition, and polymorphism of SSRs within ORFV through in-silico approach, followed by the development and characterization of thirteen microsatellites markers. Using these tools, we further tested its usefulness by screening the viral genome from an ORFV outbreak and constructing a concatenated phylogenetic tree, which elucidated that the investigated virus closely related to the Chinese isolate. These markers could be used as a tool for making multiplex PCR assays for virus identification, strain demarcation, and evolutionary analysis.

Materials and methods

Genome sequences

The publicly available eleven complete genome sequences of ORFV isolates obtained from the NCBI database (www.ncbi.nlm.nih.gov) were used for genome-wide in-silico microsatellites analysis. To compare genomic sequences of different lengths, we calculated the Relative Density (RD) and Relative Abundance (RA) values. RD is defined as the total length (bp) contributed by each microsatellite per kilobase (kb) of sequence analyzed whereas; RA is the number of microsatellites present per kb of the genome (kb). Among all the strains, we have chosen OV-SA00 (Acc. number: AY386264) as the reference to evaluate the polymorphism of microsatellites through in-silico approach as well as the development of SSRs for Indian origin ORFV (Table 1).

Table 1 Overview of microsatellites in ORFV complete genome sequences.

Microsatellites identification, investigation, and statistical analysis

For identification of perfect mono, di, tri, tetra, penta, hexa as well as compound microsatellites, IMEx software37 was utilized. Microsatellites from genomes were extracted using the ‘Advance-Mode’ of IMEx using the parameters previously used for RNA viruses38,39 and DNA viruses40. The parameters used were as follows: type of repeat: perfect; repeat size: all; minimum repeat number: 6, 3, 3, 3, 3, 3 for mono, di, tri, tetra, penta and hexanucleotide repeats, respectively. The maximum distance allowed between any two SSRs (dMAX) is 10 nucleotides. Other parameters were used as default. Compound microsatellites (cSSRs) were not standardized in order to determine real composition.

Multiple sequence alignment and identification of polymorphic SSRs

The microsatellites of OV-SA00 were considered for the identification of polymorphic microsatellites as well as consensus motifs. Sequences were first transferred to BioEdit version 7.2.5 software41 and aligned by CLUSTAL W42 module and checked manually for the presence of polymorphism. The Circos plot was generated using the Circos software to map the genome size, CDS, SSRs distribution, cSSRs distribution, and GC content in ORFV (OV-SA00) genome.

Disease outbreak and sample data collection

The study did not involve experiments on live vertebrates. Rather, samples were collected from the diseased goats (showing the symptoms of Orf) those reported for veterinary care where scab samples were collected by veterinary professionals as a routine practice. In October and November 2017, an outbreak of ORFV was noticed in Black Bengal goats in the Eastern-Indian state of Odisha with the geographical location (20.4625° N, 85.8830° E). Tissue samples in the form of scabs from four suspected goats were collected at both infective and recovery/convalescent phase and simultaneously treated for wounds with 2% boro glycerine and parenteral application of Enrofloxacin @ 5 mg/kg IM (Fig. 1).

Figure 1
figure 1

ORFV infection in goat. Representative figure depicting clinical cases of ORFV infection in Black Bengal goat having proliferative lesions around the lip recorded in the study area.

About 5 g of tissue samples were collected from each animal and subsequently dissolved in phosphate-buffered saline (PBS, pH 7.2) added with antibiotics and antifungal supplements in a labeled sterile tube. The homogenized samples were then treated with tissue lysis buffer containing proteinase K, and the mixture was incubated at 56 °C overnight. Finally, the mixture was passed through a column, and DNA was purified from the column by using the standard phenol–chloroform method as described by Sambrook et al.43 and stored at − 20 °C until further use. The suspected samples collected during this outbreak produced the expected PCR-amplified fragment size of 140 bp using ORFV specific primers orf1 and orf244 having nucleotide sequences Orf1: 5′-CGCAGACGTGGCTGAGTACGT-3′ and Orf2: 5′-TGAGCTGGTTGGCGCTGTCCT-3′, which confirmed the presence of the virus.

Development of polymorphic SSRs

The polymorphic microsatellites identified through in-silico approach were further validated through in-vitro approach using ORFV positive clinical sample. Motifs located within defined flanking regions were PCR amplified using specially designed SSR-PCR primer pairs by Primer3Plus web tool (https://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi/). The primer length was kept between 18 and 22 bp with product size in the range of 130–200 bp. For proper annealing to the template DNA, the annealing temperature was adjusted between 54 and 61 °C. The thermal cycling conditions for all genes were as follows: initial denaturation step at 95 °C for 5 min, with 35 cycles of denaturation at 95 °C for 50 s, with varying annealing temperature for each set of primers (55–61 °C) and extension step at 72 °C for 90 s with a final extension at 72 °C for 7 min. PCR amplification was performed in a Thermal Cycler system 2,720 (Applied Biosystems, USA) (Table 2).

Table 2 Characteristics of the 13 microsatellite markers developed for the ORFV.

The amplified products were resolved by electrophoresis in a 3% agarose gel. The PCR amplified products, stained with ethidium bromide, were visualized and photographed using a Gel Doc™ XR + System with Image Lab™ Software (Bio-Rad®). Subsequently, the amplified products were purified using QIAquick® purification kit (QIAGEN, USA) and the purified fragments were sent for sequencing using 3100 ABI sequencer (Applied Biosystems, USA) as described by Sanger et al.45. All sequences obtained were analyzed and verified twice in each direction.

Sequencing data analysis and phylogenetic tree construction

The sequencing results of the developed SSR markers were aligned by using discontiguous-MegaBLAST to identify specific regions among the reads (microsatellites) within the ORFV genome46. Next, the sequencing results were subjected to the BLASTx analysis, which compares translational products of the nucleotide query sequence to protein databases (https://www.ncbi. nlm.nih.gov). A concatenated phylogenetic tree was constructed using the bootstrap consensus tree building method of neighbor-joining with bootstrap value 500 through MEGA 5 to elucidate the genetic relationship of the outbreak sample with the global strains of ORFV.

Results

Distribution of SSRs and cSSRs in ORFV genome

Our study revealed a large number of SSRs scattered throughout the ORFV genomes varying from 1,036 to 1,181 in number with an average of 1,092 per genome. The RA and RD ranged from 7.6–8.4 and 53.0–59.5, respectively, in the analyzed ORFV genomes. However, previous reports in other DNA viruses such as human papillomaviruses (HPVs), the RA and RD ranged from 3.6–8.3 and 23.9–59.147. In the case of Herpesviruses, RA and RD occurred to be 4.1–13.3 and 26.9–102.948. On examining the SSR unit size classes, dinucleotide repeats were found to be most abundant (76.9%), followed by trinucleotide (17.7%) and mononucleotide repeats (4.9%) in all the genomes. Tetranucleotide and hexanucleotide repeats were least in number and represented 0.4% and 0.2% within the ORFV genome, respectively. There were no SSRs with pentanucleotide repeats observed in the ORFV genome. Approximately 90% and 10% of microsatellite motifs were distributed within coding and noncoding regions. Among the non-coding region, 4.8% are present in the UTR, while 5.4% in the intergenic regions, where functional protein and hypothetical protein occupied 68.8% and 21%, respectively. The genome-wide scan revealed the presence of 83–107 cSSRs, with an average of 93 occurrences per genome. In the case of compound microsatellites, the calculated RA and RD ranged from 0.6–0.8 and 12.1–17.0. However, in other DNA viruses such as HPVs, RA, and RD exhibited 0–1.2 and 0–27.3, whereas, in Herpesviruses, the RA and RD occurred 0.1–1.8 and 2.2–35.147,48. Approximately 89.5% and 10.5% of microsatellite motifs were distributed within coding or non-coding regions, respectively. Among the non-coding region, 5.0% were represented in the UTR while 5.5% in the intergenic region, where functional protein and hypothetical protein occupied 60.7% and 28.8%, respectively (Figure S2).

The percentage of individual microsatellites being part of compound microsatellite (cSSR%) ranged from 7.9 to 9.0 (Table 1). Based on dMAX value, the maximum distance between any two adjacent microsatellites and if the distance separating two microsatellites is less than or equivalent to dMAX, than microsatellites are classified as cSSRs49. To determine the impact of dMAX, all the studied genome sequences were chosen to determine the variability of cSSRs with increasing dMAX. The value of dMAX was set between 10 and 100 by Microsatellite Identification Search Analysis (MISA)50. Our analysis revealed an overall increase in the number of cSSRs with higher dMAX value and attained a plateau (Fig. 2).

Figure 2
figure 2

Frequency of cSSRs in relation to varying dMAX (10–100) across eleven ORFV complete genomes represented on the right side of the graph. A higher cSSR incidence was observed with increasing dMAX in the genomes.

Genomic parameters influencing SSR and cSSR distribution

We tested for the correlation between genome size and GC content with the incidence, RA, RD of SSRs and cSSRs. Except incidence (R2 = 0.6162, p > 0.05), all other parameters such as RA and RD of SSRs had no correlation (R2 = 0.002374, p > 0.05; R2 = 0.18, p < 0.05) with the genome size and GC content (R2 = 0.09377, p < 0.05, R2 = 0.00126, p > 0.05; R2 = 0.08129, p < 0.05). The regression analysis of cSSRs showed significant correlation with the incidence (R2 = 0.6483, p > 0.05) and RA (R2 = 0.4823, p > 0.05) while displayed non-significant correlation with RD (R2 = 0.3759, p < 0.05). On the contrary, the GC content was weakly correlated with the number (R2 = 0.02903, p > 0.05), RD (R2 = 0.004839, p < 0.05) and RA (R2 = 0.03917, p < 0.05) of cSSRs.

The frequency of classified repeat types

The overall frequency of mononucleotide repeats A/T (64.1%), dinucleotide repeat motif CG/GC (81.6%) were the most prevalent than poly G/C (35.9%), GA/TC (5.0%), AC/GT (4.5%), AG/CT (3.9%), CA/TG (3.6%) and AT/TA (1.4%), respectively. Analysis of the classified tri-repeat types revealed that the ORFV genome had 30 types of trinucleotide from which CGC/GCG, GCC/GGC, CAG/CTG, AGC/GCT, CCG/CGG were abundantly present exhibiting 18.2%, 14.5%, 6.3%, 6.2%, and 6.3%, respectively. The most common tetra and hexanucleotide repeats were CGAG/CTCG (34.9%), ACTC/GAGT (18.6%), GTGA/TCAC (9.3%) and AGTTAC/GTAACT (15.0%), ACACTC/GAGTGT (15.0%), respectively. However, the accession specific analysis illustrated that the frequency of mono, di, tri repeats varied from each other (Fig. 3a–c).

Figure 3
figure 3

Types of SSRs distribution. (A) Distribution of different motifs of mononucleotide SSRs within ORFV genomes, (B) distribution of different motifs of dinucleotide SSRs within ORFV genomes, and (C) distribution of different motifs of trinucleotide SSRs within ORFV genomes.

Motif complexity of compound microsatellites

Compound microsatellites (cSSRs) are composed of two or more adjacent individual microsatellites. Generally, cSSR having the pattern like, m1-xn-m2, m1-xn-m2-xn-m3 are considered as ‘2-microsatellite’ and ‘3-microsatellite’, respectively49. Majority of cSSRs were composed of two motifs, followed by tri, tetra, and penta-motifs (Supplementary file 1). Interestingly, two long stretches of cSSR were composed of identical motifs repeated 12 times, which was exclusively found in the genome of AY386264. The CTG–CAG compound microsatellite composed of self-complementary motifs has been proposed to be created by recombination51. However, our study showed no such compound microsatellites which contained self-complementary motifs, suggesting that these compound microsatellites were not likely to be derived from recombination. Motifs exhibiting the form [m1]n-xn-[m2]n can be termed as SSR-couples and are represented the maximum time in the genome. In this study, SSR couples, such as (CG)-x-(GC), (GC)-x-(GC), (GC)-x-(CGC), (GT)-x-(GC), (GC)-x-(CG), (CT)-x-(C) were presented in all analyzed genome. A number of self-complementary motifs such as (CG)3-x1-(GC)3, (CG)4-x1-(GC)3, (CG)3-x7-(GC)3, (CG)3-x0-(GC)3, (GC)3-x8-(CG)3, (CG)3-x7-(CG)3, (GC)3-x0-(CG)3, (CG)3-x4-(GC)3 have been observed in ORFV, which played a pivotal role in secondary structure formation. Motif duplication is one of the phenomena in which a similar motif is located on both ends of the spacer sequence, for example (CA)n-(X)y-(CA)z. About 22.1% of the total cSSR were made up of duplicated sequences having the motif pattern (GC)-x-(GC), (CG)-x-(CG), (GA)-x-(GA), (CA)-x-(CA), (CT)-x-(CT), (TC)-x-(TC), (CA)-x-(CA)-x-(CA), (A)-x-(A), (AG)-x-(AG)-x-(AG)-x-(AG)-x-(AG)-x-(AG)3-x-(AG)-x1-(AG)-x-(AG)-x-(AG)-x-(AG)-x-(AG)-x-(AG), (AG)-x-(AG), (C)-x-(C), (CA)-x-(CA), and (CT)-x-(CT)-x-(CT)-x-(CT)-x-(CT)-x-(CT)-x-(CT)-x-(CT)-x-(CT)-x-(CT)-x-(CT)-x-(CT)-x-(CT) (Supplementary file 1).

Identification of polymorphic microsatellites through in silico approach

For a polymorphic microsatellite, the length of the repeat block should be non-identical with that of the other sequences in the database, and this length difference must be a multiple of the repeat unit19,30,52. For the identification of a polymorphic microsatellite, eleven strains of ORFV were used, where (AY386264) acted as the reference. A total thirteen number of polymorphic microsatellites were observed; among these, two were observed within the hypothetical protein, three in the intergenic regions, and rest eight in the protein-coding/genic regions. The polymorphic genic region containing the microsatellites encodes several important proteins such as Ankyrin repeat protein (ANK protein), DNA-binding phosphoprotein, virion core protein, Granulocyte–macrophage colony-stimulating factor (GM-CSF), Interleukin 10 protein (IL-10), Putative serine/threonine-protein kinase protein (Table 2, Figure S3). The Circos map provides a clear vision regarding the SSRs and cSSRs distribution and other related details in ORFV (OV-SA00) genome (Fig. 4).

Figure 4
figure 4

Circos plot showing the Genome size, CDS, Distribution of SSRs, selected SSR markers, cSSRs and GC content in ORFV (OV-SA00) genome. From outer track to inner track: Genome size, CDS, SSRs, selected SSR markers (Black lines within the SSR), cSSRs and GC content.

Development and characterization of SSR markers

All clinical samples collected during the outbreak were found to be positive for ORFV tested by producing the desired PCR amplicon size of 140 bp (Fig. 5).

Figure 5
figure 5

Clinical samples evaluation by universal OFRV primers. Electrophoresis gel showing the PCR amplicon of four suspected ORFV clinical samples collected from Black Bengal goats. M: 100 bp DNA ladder; -C: Negative control (PCR using nuclease-free water as DNA template); 1–4: Clinical samples.

We chose all thirteen polymorphic markers to validate in-vitro. Hence, PCR was set with each primer sets to amplify the DNA isolated from a positive clinical sample. The SSR name, primer sequences, expected size, targeted motif, functional region, protein motif position, gene, ORF number, and annealing temperature, were summarized in Table 2. All the SSR markers produced reliable and reproducible PCR products with the expected molecular size (Fig. 6).

Figure 6
figure 6

Clinical sample validation using SSR markers. Electrophoresis gel showing the PCR amplicon of the developed SSR markers in ORFV. SSR markers from SSR1 to SSR13; M:50 bp DNA ladder; -C: Negative control (PCR using nuclease-free water as DNA template).

The amplified SSRs were further characterized by sequencing, mapping with the GenBank database through BLASTn and BLASTx. The results of BLASTn alignment revealed a 100% of query coverage and a high identity percentage (91–100%) between the respective sequencing product and their equivalent genes from the published OV-SA00 isolate genome sequence. The results of BLASTx alignment revealed various degrees of query coverage (38–96%) and a high identity percentage (91–100%) with their equivalent amino acid sequences (Table 3).

Table 3 Alignment of the 13 sequenced microsatellite markers (partial) against the complete genome present in the NCBI database.

The concatenated phylogenetic tree showed the ORFV of our study closely related to Chinese isolate (MG712417) (Fig. 7). We observed the presence of 2–3 alleles within ORFV genomes.

Figure 7
figure 7

The concatenated phylogenetic tree was constructed using the bootstrap consensus tree building method of neighbor-joining with bootstrap value 500 using MEGA 5. Black triangle represents the ORFV isolates of present investigation showing its relationship with eleven global strains.

Discussion

Microsatellites, otherwise known as short tandem repeats (STRs), or a variable number of tandem repeats (VNTRs) are being used to discriminate various viruses, such as human cytomegalovirus (hCMV)22,23, white spot syndrome virus (WSSV)53,54,55,56,57,58, Herpes Simplex virus type 130,59, Herpes Simplex virus type 260, Herpesvirus 361, Herpesvirus 662, Adenovirus63, Ostreid herpesvirus 164,65, Marek’s disease virus 166, and Spodoptera littoralis multiple nucleopolyhedrovirus (SpliMNPV)67 due to its polymorphic in nature. To get the insight into the microsatellite in ORFV, we have employed a comparative genomics approach for development and characterization through in-silico and in-vitro analysis and validated our findings using samples collected from the recent Orf outbreak for the first time.

The specific parameters, such as its incidence, RA and RD of SSR and cSSR in ORFV genomes, show abundance variation as compared to their genome size and GC content due to the heterogeneity of ORFVs. Until now, limited full-length ORF genomes exist in the database. Based on our analysis, we observed little variation in RA and RD in ORFV. However, in other viruses such as HPVs47 and Herpesviruses48, higher variation in RA and RD were reported. The large variation with the parameters was not observed in ORFV, probably due to the lack of enough size difference in the genome. However, a limited number of complete genome sequences are available for this virus, in comparison to HPV and herpesviruses, which act as a constraint to get the optimal range. Correlation analysis confirmed that incidence of both SSRs and cSSRs, RA of cSSRs were dependent on genome size, but independent of GC content, which was similar to that of HPV47, but opposite to HIV68, potexvirus, carlavirus, and tobamovirus69,70,71. The distribution of microsatellite in the viral genome is pathogen-specific rather than host-specific. The increase of cSSR is predominant when dMAX approaches 10–90 bp and further decreases with the increase of dmax (Fig. 2). This may be due to the occurrence of SSR in the overlapping regions of increasing dMAX. The ORFV genomes have more SSR within coding regions than non-coding regions in comparison with other DNA virus, such as herpes simplex virus. This might be due to higher relaxed selection pressure on coding regions in comparison to the non-coding region in the respective virus.

The cSSRs percentages of ORFV ranges from 7.9 to 9.0%, which is lower in comparison to HIV-1, 0–24.2%68, Geminivirus, 0–27.2%72, Herpesvirus, 8.1–33.3%48. Generally, the number of compound microsatellites decreases with an increase in complexity73. Moreover, the lack of sufficient genomic resources from diverse geographical locations may contribute to a stagnant range of cSSRs%. In ORFV, 22.1% of cSSRs were composed of similar motifs, probably contributed by genome duplication. Some study suggests that genome duplication may be helpful for the repeat tendency mechanism74, which promotes the expansion of genome size such as yeast75,76.

In ORFV genomes, the poly A/T repeats were significantly more prevalent than poly G/C repeats, similar to eukaryotic and prokaryotic genomes1,2. The presence of mononucleotide repeats in Mengovirus and Encephalomyocarditis virus affect virus growth in murine cell culture77. In the case of ORFV, its significance needs further validation. In this study, we also observed the microsatellites having polymorphism in poly A/T (ORF117), poly C/G (ORF121), within the important immune-regulatory genes, such as in GM-CSF and ANK protein, respectively (Supplementary file 2). GM-CSF secreted by a variety of cell types triggers neutrophil, monocyte, and eosinophil myelopoiesis and stimulate early events in immune responses, controlling the differentiation and function of antigen-presenting dendritic cells. IL-2 is a T-cell-derived lymphokine that stimulates T-cell and NK cell activation and proliferation and activated-B-cell proliferation78,79. ANK protein leads to the down-regulation of hypoxia-induced factor (HIF) activity and regulates energy metabolism, angiogenesis, the apoptotic cascade, the NF-kB signaling pathway, and cell cycle regulation80. The functional effects of this polymorphism in these regions require further investigations.

Dinucleotide CG/GC is more prevalent in most of the ORFV genomes, similar to that of DNA viruses such as HPVs47, Caulimoviruses, Geminiviruses52,81. CG/GC repeat could form Z‐conformation or other alternative secondary DNA to facilitate the recombination activity82. In our study, the polymorphism within dinucleotide (AC/CA)3 and (CG/CG)3 observed within the hypothetical protein. Dinucleotide repeats have the highest slippage rate as compared to any other type of repeats81. Among 257 viral genomes examined in a published study, the highest number of dinucleotide SSRs were found when compared to the other types83. Dinucleotide repeats are also speculated to be recombination hot spots84,85. In this study, the presence of higher di-nucleotide repeats over tri-nucleotide repeats suggests a possible role of hosts in the evolution of di-nucleotide repeats within poxvirus genomes. Inconsistency frequency of SSRs in different accession of the same virus may be attributed to instability because of a higher slippage rate86.

Trinucleotide motif ATA/TAA/AAT or ATT/TTA/TAT were most prevalent in most genomes of poxvirus whereas in other DNA virus GAG/AGA was most prevalent in HPVs and AAG/GAA in caulimoviruses. The higher density of trinucleotide repeats was observed compared to any other repeat type within coding regions of eukaryotic and prokaryotic genomes32. Interestingly, dynamic mutations within trinucleotide repeats responsible for the development of some diseases in humans87, as well as viral enzymes that interfere pathogenicity of Influenza virus88. Our study revealed the presence of trinucleotide CGC/GCG and GCC/GGC repeats to be most prevalent than others. The trinucleotide polymorphism was observed in some immunoregulatory genes such as ANK protein (GGC/GCC)3 (ORF008), IL-10 protein (AGT/ACT)3 (ORF127) and structural genes virion core protein (GAG/CTC)3, Putative serine/threonine-protein kinase (CGC/GCG)3, which needs further functional evaluation.

Three polymorphic SSRs such as (A/T)7, (T/A)6, (AGTTAC/ GTAACT)3 were observed within non-coding regions. The microsatellite present within the non-coding reasons was evolutionarily neutral and can be utilized as an excellent molecular marker30. Finally, we have characterized, those polymorphic markers present at non-genic as well as coding (genic) regions. These genic microsatellites, however, may provide adaptive variation important to viral evolution and genetic variability, perhaps similar to the functionally important mononucleotide runs found in VSV34 and RSV36 and virulence of avian influenza virus encephalo-myocarditis virus89,90. It is noteworthy to mention that, recently, the microsatellite present in HSV-1 glycoprotein coding region US4 was useful for strain differentiation30. The concatenated tree, which was constructed utilizing sequence information of characterized markers, confirmed that the ORFV of the present study closely related to Chinese isolate (MG712417). Our previous report, as well as several other studies, observed a similar pattern of relationship18,91. We speculate that trans-boundary and cross-species transfer of ORFV isolates could have resulted in this, as India is geographically adjacent to China. It is interesting to observe the presence of a number of the alleles (2–3) within ORFV genomes indicates the existence of polymorphism within microsatellites, which could act as a useful tool to estimate the diversity61. Using a single repeated mononucleotide was able to follow the dynamics of transmission of a human adenovirus during an epidemic63. Therefore, microsatellites constitute a potentially powerful tool for epidemiological studies of the transmission routes and evolution of ORFV and other related poxviruses. This study provides an important new type of molecular markers useful to investigate questions not only related to epidemiology but also for deciphering the diversity of the virus. However, the characterized microsatellites of the present study are not biased to the particular strain, which indicates the presence of recombinant strains circulating within the Indian subcontinent. This information is not concrete, which requires validation by several whole-genome sequence analysis of ORFV isolates from Indian origin. So far, our understanding of the functional and evolutionary role of microsatellites in ORFV biology is limited, which needs further in-depth evaluation and possible implementation.

In conclusion, the study of microsatellites in ORFV genome is a key step towards better understanding the nature, function, and evolutionary biology of the species. Our preliminary results can be considered as a useful tool for ORFV strain demarcation, diversity estimation, and evolutionary analysis. Our next plan is to characterize several ORFV strain complete genome from Indian origin through next-generation sequencing to get a better insight into genome organization, development of a suitable multiplex panel, which can be utilized as an effective tool for virus identification, genotyping and evolutionary analysis of the respective virus.