Torque teno virus (TTV) is a small, non-enveloped virus with a circular, single-stranded DNA genome that primarily infects humans. It belongs to the genus Alphatorquevirus within the family Anelloviridae [1, 2]. Its genome comprises an untranslated region (UTR, divided into UTR A and UTR B) [3] and a coding region with three to four open reading frames (ORFs). The UTR is the more conserved of the two regions and has been hypothesized to play an important role in viral replication [4]. Owing to its high genetic variability, TTV was previously classified into seven genogroups [5,6,7], which were subdivided into multiple genotypes [6, 8]. The current taxonomy of the family Anelloviridae is based on analysis of the entire ORF1 sequence and applies the following cutoff values for sequence divergence: genera >56%, species >35%, subspecies >20%, and isolates <20% [9, 10]. According to the latest ICTV Master Species List, the genus Alphatorquevirus includes 26 species (Torque teno virus 1-31 and four species whose members infect non-human primates) [10].
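The divergence cutoffs above can be read as a simple decision rule on pairwise ORF1 divergence. The sketch below is illustrative only. It assumes that divergence is the uncorrected p-distance between two already-aligned ORF1 nucleotide sequences, that gap and ambiguous positions are skipped, and that a value exactly at a cutoff falls into the lower rank. The helper names (`p_distance`, `relationship`) are hypothetical, and the ICTV's own alignment and distance procedure may differ.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Divergence cutoffs quoted in the text (genera >56%, species >35%, subspecies >20%).
CUTOFFS = [(0.56, "different genera"), (0.35, "different species"), (0.20, "different subspecies")]


def relationship(divergence: float) -> str:
    """Map a pairwise ORF1 divergence (0-1) to the rank at which two isolates differ."""
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must be a proportion between 0 and 1")
    for cutoff, label in CUTOFFS:
        if divergence > cutoff:
            return label
    return "isolates of the same subspecies"  # divergence <= 20%
```

For two aligned ORF1 sequences `orf1_a` and `orf1_b` (hypothetical variables), `relationship(p_distance(orf1_a, orf1_b))` returns the rank implied by the cutoffs.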

Despite the extremely high prevalence of TTV, attempts to associate it with disease have not yielded significant results. Genotype-specific pathogenicity has also been considered; for instance, isolates similar to TTV1 isolate TA278 (former genotype 1) were associated with hepatitis of unknown etiology and increased ALT levels [1], as well as with head and neck carcinoma [11]. Isolate-specific pathogenicity remains debatable, as there is no strong evidence either to support or to reject this hypothesis.

Although TTV does not appear to be pathogenic per se, its capacity to modulate and evade the immune response [12, 13] may predispose infected individuals to various autoimmune diseases [14] or aggravate pre-existing conditions. In recent years, attention has shifted from the pathogenic potential of TTV to its potential use as a biomarker. There is strong evidence supporting the use of TTV DNA-aemia as a biomarker for monitoring the kinetics of functional immune competence before and after solid organ transplantation [15,16,17], for monitoring the efficacy of antiviral treatment in HIV-infected patients [18], and even for predicting clinical outcome in SARS-CoV-2-infected patients [19]. Nevertheless, the need to discriminate between TTV isolates that may be associated with pathology and innocuous isolates should not be overlooked; genotyping TTV isolates and tracking their distribution therefore remain important tasks.

Despite several studies investigating the prevalence of TTV in Romania and its association with pathology [20,21,22], information on the molecular characteristics and phylogeny of these viruses is still limited in this country.

The aim of this study was to determine the prevalence and geographical distribution of TTV in Romania and to describe the phylogenetic relationships between isolates found in healthy Romanian subjects, as well as in hemodialyzed patients.

Two hundred thirty-six clinically healthy volunteers undergoing a routine medical check-up (110 males and 126 females) were selected between April and May 2019 from Romania’s major healthcare centers (Table 1). Three of the individuals were immigrants from Europe, the Middle East, and the Far East, respectively. Blood samples were collected after informed consent was signed. The study was approved by the National Institute of Research and Development for Food Bioresources Ethics Committee with the registration number 342/16.05.2014. Consent forms for underage subjects were signed by legal guardians. Samples were given codes, and analysis was performed in a blind manner. Only information on age, gender, and residence was available.

Table 1 Infection rate in the tested population sample

Briefly, genomic DNA was purified from whole blood using a commercial kit (PureLink® Genomic DNA Mini Kit, Invitrogen) and was used for PCR amplification of a fragment of UTR B using primers described previously [3]. Gel-purified amplicons (PureLink® PCR Purification Kit, Invitrogen) were quantified using a Qubit fluorometer (dsDNA HS Assay Kit, Invitrogen) and subjected to direct sequencing using a BigDye™ Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems) on a 3130 Genetic Analyzer (Applied Biosystems).

Chi-squared and Fisher's exact tests were used to examine possible differences in TTV DNA prevalence between males and females, while the independent-samples Mann-Whitney U test, a rank-based test that compares distributions rather than means, was used to assess the association between age and TTV DNA positivity. Statistical analysis was conducted using IBM SPSS Statistics v20.0.0.
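For readers without SPSS, the tests named above can be sketched with the Python standard library. This is an illustrative reimplementation, not the authors' analysis. The function names are hypothetical, the chi-squared test omits the Yates continuity correction, and the Mann-Whitney U test uses midranks with a tie-corrected normal approximation and no continuity correction. SPSS may therefore report slightly different p-values for the same data.

```python
import math
from itertools import chain


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test without continuity correction for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def fisher_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test: sum the hypergeometric probabilities of all tables
    with the same margins that are no more likely than the observed table."""
    r1, r2, c1 = a + b, c + d, a + c
    total = math.comb(r1 + r2, c1)

    def prob(x: int) -> float:
        return math.comb(r1, x) * math.comb(r2, c1 - x) / total

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    # A small relative tolerance keeps tables tied with the observed one despite rounding.
    return min(1.0, sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7)))


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Two-sided Mann-Whitney U test (U for sample x) using midranks and a
    tie-corrected normal approximation without continuity correction."""
    pooled = sorted(chain(((v, 0) for v in x), ((v, 1) for v in y)))
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    n = n1 + n2
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

As a check, the textbook 2×2 table [[1, 9], [11, 3]] (not study data) gives a two-sided Fisher's exact p-value of about 0.0028.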

The subjects selected for the study were residents of all eight development regions of Romania (NUTS 2 regions): the Bucharest metropolitan area (Bucharest and Ilfov County), South, South-East, North-East, Center, North-West, West, and South-West development regions. The place of residence of the study participants is presented in Fig. 1.

Fig. 1

Geographical distribution of analyzed samples. (a) All analyzed samples (n = 236). Each region is represented by a different color and, except for the Bucharest metropolitan area (Bucharest and Ilfov County), is composed of 4-7 counties. The number of samples analyzed from each county is circled in black. The total number of samples analyzed per region is circled in red. The number of samples from non-Romanian subjects is shown in red outside Romania's borders. (b) Geographical distribution of samples sequenced and submitted to ENA.

Of the 236 subjects analyzed, 156 (66%) were positive for TTV DNA. TTV DNA prevalence did not differ significantly between males and females (chi-squared test, p > 0.05). TTV DNA positivity differed significantly with age (independent-samples Mann-Whitney U test, p = 0.002), with prevalence increasing with age. Prevalence did not differ significantly among development regions, ranging from 54.5% (12/22) in the South-West region to 85.7% (6/7) in the West region.
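The prevalence figures above are simple proportions. Attaching a confidence interval makes the uncertainty of the small regional samples explicit. These intervals are not reported in the study and are added here only as an illustration. The sketch uses the Wilson score interval, chosen because the simpler Wald interval performs poorly for small samples such as 6/7. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(positives: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion; z = 1.96 gives approximately 95% coverage."""
    if total <= 0 or not 0 <= positives <= total:
        raise ValueError("need 0 <= positives <= total and total > 0")
    p = positives / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Counts reported in the text above.
counts = {"overall": (156, 236), "South-West": (12, 22), "West": (6, 7)}
for label, (pos, n) in counts.items():
    low, high = wilson_interval(pos, n)
    print(f"{label}: {100 * pos / n:.1f}% (95% CI {100 * low:.1f}-{100 * high:.1f}%)")
```

With only seven subjects, the West-region interval is very wide. This illustrates why regional differences of this size need not reach statistical significance.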

The TTV prevalence recorded in our study group was lower than that reported for the general population of other European countries (e.g., 84-88% in Italy, Finland, Poland, and Greece [23]) and Asian countries (e.g., 72% in India [24], 83.4% in Qatar [25], and 93.3% in China [26]), but higher than that reported for Iran (49.3%) in 2018 [27]. These differences may reflect the genomic target used for TTV detection. In addition, because TTV DNA prevalence and viral load have been shown to increase with age [20, 23], differences in the age of the subjects tested in different studies may also contribute. The genomic region amplified in our study is generally highly conserved and enables the detection of many different TTV variants. However, isolates of TTV9, TTV10, TTV25, TTV26, and TTV29 (all members of the former genogroups 4 and 5) have a low detection rate with this primer set.

Of the TTV-positive samples, 80 randomly selected samples (20 from Bucharest, five from the Center region, seven from North-East, two from North-West, 18 from South, 15 from South-East, eight from South-West, four from West, and one non-Romanian sample) were subjected to direct amplicon sequencing. More than half of the sequenced samples (43/80, 54%) generated electropherograms with overlapping peaks but correct base spacing, suggesting the presence of multiple TTV variants within the same individual; these sequences were excluded from the phylogenetic analysis. Mixed infections with viruses belonging to the same or different genera of the family Anelloviridae are frequently reported, in both healthy individuals and patients with various medical conditions, and are a characteristic feature of this family [5, 20, 25, 28,29,30].
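The overlapping-peak criterion can be made explicit as a simple screen over per-position peak heights. This is a hypothetical heuristic, not necessarily the procedure used in the study. It assumes that peak heights for A, C, G, and T have already been extracted for each called base (e.g., from the sequencer's trace file) and aligned to the base calls, so correct base spacing is taken for granted. The secondary-to-primary ratio of 0.3 and the 5% ambiguous-position limit are arbitrary illustrative thresholds.

```python
from typing import Mapping, Sequence

Peaks = Sequence[Mapping[str, float]]  # one {'A': h, 'C': h, 'G': h, 'T': h} per called base


def secondary_peak_fraction(peaks: Peaks, ratio_threshold: float = 0.3) -> float:
    """Fraction of called positions whose second-highest peak reaches
    ratio_threshold times the height of the highest peak."""
    if not peaks:
        return 0.0
    ambiguous = 0
    for heights in peaks:
        top, second = sorted(heights.values(), reverse=True)[:2]
        if top > 0 and second / top >= ratio_threshold:
            ambiguous += 1
    return ambiguous / len(peaks)


def looks_mixed(peaks: Peaks, ratio_threshold: float = 0.3, max_fraction: float = 0.05) -> bool:
    """Flag a trace as a probable mixed infection when ambiguous positions exceed max_fraction."""
    return secondary_peak_fraction(peaks, ratio_threshold) > max_fraction
```

A real screen would also need to trim the low-quality ends of each trace, where overlapping peaks occur even in monotypic samples.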

The sequences obtained from subjects with monotypic infections (n = 37) were submitted to the European Nucleotide Archive (ENA) of EMBL under accession numbers LR742476-85, LR742487-512, and OU989706. To maintain the resolution of the phylogenetic analysis, four of these sequences (shorter than 200 bp) were excluded from further analysis. In addition, 12 sequences obtained previously from obese Romanian patients with diabetic nephropathy undergoing hemodialysis were included. Two positive control samples from a subject known to have TTV infection, collected in 2015 (LN809941) and 2019 (LR742486), were also included in the analysis (Fig. 2).
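The abbreviated accession ranges above follow a common convention in which the digits after the hyphen replace the trailing digits of the first accession, so LR742476-85 denotes LR742476 to LR742485. The sketch below assumes that convention. It uses a hypothetical helper, `expand_accession_range`, to expand the ranges and checks them against the counts given in the text. It also shows that LR742486, the 2019 positive control, falls in the gap between the two ranges.

```python
import re


def expand_accession_range(spec: str) -> list[str]:
    """Expand an abbreviated accession range such as 'LR742476-85'.

    Assumes the digits after '-' replace the trailing digits of the first accession;
    a spec without '-' is returned as a single accession.
    """
    match = re.fullmatch(r"([A-Z]+)(\d+)(?:-(\d+))?", spec)
    if match is None:
        raise ValueError(f"unrecognised accession spec: {spec!r}")
    prefix, start, end_suffix = match.groups()
    if end_suffix is None:
        return [spec]
    if len(end_suffix) > len(start):
        raise ValueError(f"range end longer than start: {spec!r}")
    end = start[: len(start) - len(end_suffix)] + end_suffix
    width = len(start)
    return [f"{prefix}{n:0{width}d}" for n in range(int(start), int(end) + 1)]


monotypic = [acc for spec in ("LR742476-85", "LR742487-512", "OU989706")
             for acc in expand_accession_range(spec)]

# 37 monotypic sequences, minus 4 shorter than 200 bp,
# plus 12 earlier hemodialysis sequences and 2 positive-control sequences.
n_analysed = len(monotypic) - 4 + 12 + 2
print(len(monotypic), n_analysed)
```

The resulting total of 47 sequences matches the denominator used for the clade counts (e.g., 41/47).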

Fig. 2

Phylogenetic tree of human TTV based on a fragment of approximately 300 bp of the UTR B region. The evolutionary history was inferred using the maximum-likelihood method and the Kimura 2-parameter model with 1000 bootstrap replicates. A discrete gamma distribution was used to model evolutionary rate differences among sites (5 categories; +G, parameter = 0.7031). Bootstrap values > 80% are indicated on the respective branches. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. Evolutionary analysis was conducted in MEGA X [31], and annotations were added using EvolView [52]. Black circles, sequences obtained from healthy Romanians for this study; white circles, sequences obtained in 2015 from obese Romanian dialyzed patients with diabetic nephropathy; black triangles, sequences from positive controls. Taxonomically assigned isolates are shown in burgundy, and unassigned isolates are shown in grey.

Phylogenetic analysis was conducted in MEGA X [31] using the maximum-likelihood method with the Kimura 2-parameter model and 1000 bootstrap replicates. The substitution model was selected with MEGA X. Information on the isolates used to construct the phylogenetic tree is provided in the Supplementary Material.

The sequences obtained in this study were relatively distant from most of the reference sequences used for species assignment. To obtain a high-resolution phylogenetic tree, only reference sequences with a high percentage of identity to the sequences obtained in this study were included in the analysis.
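The reference-selection step can be sketched as a filter on pairwise identity. The study does not state the identity threshold or the search tool used. The 80% cutoff, the gap handling (positions with a gap in either sequence are ignored), and the helper names below are therefore hypothetical. In practice, identities would usually come from a BLAST search or a multiple alignment rather than from pre-aligned strings.

```python
from typing import Mapping


def percent_identity(seq1: str, seq2: str) -> float:
    """Percent identity over aligned positions where neither sequence has a gap."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper()) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def select_references(queries: Mapping[str, str], references: Mapping[str, str],
                      min_identity: float = 80.0) -> list[str]:
    """Keep references whose best identity to any query reaches min_identity (hypothetical cutoff)."""
    selected = []
    for name, ref in references.items():
        best = max(percent_identity(query, ref) for query in queries.values())
        if best >= min_identity:
            selected.append(name)
    return selected
```

Using the best match to any study sequence, rather than the mean, keeps references that are close to at least one Romanian clade even if they are distant from the others.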

The phylogenetic tree (Fig. 2) depicts the relationships among the sequences obtained from Romanian subjects, as well as between these sequences and those of TTV isolates from other countries, both taxonomically assigned and unassigned. The analyzed sequences grouped into six clades (designated A to F), each supported by bootstrap values greater than 80%.

Clade A contained sequences highly similar to those of isolates belonging to the species Torque teno virus 3, such as the Polish isolate P/1C1, as well as unassigned sequences from Malaysian patients infected with hepatitis viruses. Clades E and F clustered with sequences similar to those of isolates belonging to the species Torque teno virus 1, such as the American isolate US32 and the French isolate T3PB. Romanian sequences similar to that of the Finnish isolate TTV3-HEL32 formed clade D. In the former genogroup classification of TTV, isolates US32, T3PB, and HEL32 belonged to genotypes 2 [32], 3 [33], and 6 [34], respectively, of genogroup 1. The majority of the Romanian sequences (41/47, 87%) clustered in these four clades (A, D, E, and F).

Clade B comprised three sequences: one Romanian sequence from a dialyzed patient (RO-od21); isolate TTV31-Hebei-1, from a Chinese patient with fatal fever [7]; and the unassigned isolate TTVMY HB34, from a Malaysian hepatitis B patient (unpublished results). TTV31-Hebei-1 is an established member of the genus Alphatorquevirus [10] and, according to the former classification, belonged to genogroup 7 [5]; the Romanian isolate RO-od21 may belong to the same species.

Clade C consisted of Romanian sequences and an unassigned Malaysian isolate. The sequences closest to clade C, but not included in it, were TTV19-SANBAN and a member of the species Torque teno virus 24 (svi-1).

The genomic region analyzed in this study, as in another recent study [35], appears to offer high phylogenetic resolution, as it can discriminate between TTV subspecies, i.e., groupings of isolates with an ORF1 nucleotide sequence divergence of 20-35% [9].

The first protocols used for TTV phylogenetic analysis were based on N22 PCR, which mostly amplifies isolates belonging to the species Torque teno virus 1-5 (former genogroup 1) [36]; consequently, data on the phylogenetic distribution of TTV in different populations are scarce and limited to studies from Italy and South America (Brazil and Uruguay). A study describing the phylogenetic relationships between TTV isolates from Iranian hepatitis patients [27] highlights the importance of choosing genomic regions with good phylogenetic resolution: the tree obtained in that study lacks support (bootstrap values below 10 for major branches), artificially grouping known highly dissimilar isolates and splitting known highly similar ones into separate clusters.

Studies performed in Italy [37], Brazil [38], Uruguay [39], and Japan [40] showed TTV genogroup 3 to be the most prevalent, followed closely by genogroup 1, with genogroup 2 being the least prevalent. Other studies from Brazil, however, found genogroup 5 isolates to be the most frequent [29, 41]. Most Romanian sequences obtained in this study (41/47, 87%) exhibited high similarity to TTV1 and TTV3 isolates, which formerly belonged to genogroup 1. Some of the Romanian sequences (5/47, 11%) were closer to TTV19 and TTV24, which formerly belonged to genogroup 3. None of the sequences obtained resembled isolates of TTV6-7 (genogroup 2), TTV25-26 and TTV29 (genogroup 4), or TTV9-10 (genogroup 5) (data not shown). This absence should be interpreted with caution, however, because the UTR B primers have a low detection rate for TTV9, TTV10, TTV25, TTV26, and TTV29 isolates (from former genogroups 4 and 5).

Analyses performed in different countries (Hungary [11, 30], Czech Republic [42], Egypt [43], Saudi Arabia [44], India [45], and Japan, Korea, Shanghai, Mongolia, Colombia, Cameroon, Germany, and the UK [46]) revealed TTV1 isolates similar to TA278 and US32 (former genotypes 1 and 2) to be the most prevalent and widespread worldwide. There is also evidence that TTV3 isolate HEL32 (former genotype 6) is found mainly among Asians, whereas TTV1 isolate T3PB (former genotype 3) is found mainly among Europeans [46]. Sixteen sequences (16/47, 34%) obtained in our study clustered with isolate T3PB, consistent with these reports.

Migration has been shown to alter the epidemiology of viral infections (e.g., HCV genotype circulation in Turkey [47] and the dispersal of HIV from Uganda [48]). In the current context of global migration, driven mainly by the refugee crisis, Romania has not been a preferred destination for Asian immigrants. Moreover, data from the Migration Policy Institute (https://www.migrationpolicy.org/programs/data-hub/charts/immigrant-and-emigrant-populations-country-origin-and-destination) and Eurostat (https://ec.europa.eu/eurostat/databrowser/view/tps00177/default/map?lang=en) show that most immigrants in Romania come from neighboring countries and that emigration from Romania exceeds immigration. Considering these aspects, the particular distribution of TTV in Romania may be explained by low immigration rates.

Sequences obtained from samples collected in Bucharest were found in all clades except B and D. Bucharest, the capital of Romania, has almost 2 million permanent inhabitants and is the largest university center and employer in the country. Many of its inhabitants come from all over the country, which may explain the diversity observed in the current study for the Bucharest metropolitan area.

All sequences from the North-East region (yellow squares in Fig. 2) were found in clade A, together with TTV3 isolates P/1C1 and tth16 and two unassigned isolates from Malaysia (TTVMY HC9, from the serum of a control subject, and TTVMY HB25, from the serum of an HBV-infected patient). Sequences obtained from samples collected in the West and Center regions (blue and pink squares in Fig. 2) clustered in clade F, together with an unassigned isolate from Malaysia (TTVMYC124) and TTV1 isolate T3PB. The separation of sequences from subjects living in the North-East and West regions of Romania into distinct clades may be explained by a geographical barrier, the Carpathian Mountains, limiting population exchange between these areas. To test this hypothesis, however, more sequences from subjects residing in the Center and West regions should be added to the analysis.

Of the samples collected from immigrants, two out of three were TTV positive. The sequences obtained from the positive samples were, however, of insufficient quality for further analysis.

The sequences obtained from the positive-control subject in 2015 (LN809941) and 2019 (LR742486) grouped separately. The 2015 sequence clustered with TTV1 isolate US32, whereas the sequence obtained four years later was most similar to an unassigned isolate from an HBV-infected Malaysian patient (TTVMY HB25) and grouped closer to isolates currently assigned to TTV3. This result suggests that TTV may be cleared periodically and a different strain subsequently acquired. Alternatively, because mixed TTV infections occur [5, 20, 25, 28,29,30] and viral dynamics can change over time [49], a given strain may have the highest viral load at a particular time point and thus be the only one detected by the selected amplification strategy and Sanger sequencing.

Sequences derived from obese dialyzed patients with diabetic nephropathy were scattered across almost all clades (Fig. 2). However, almost half of these sequences (5/12) grouped with TTV3 (isolate HEL32, former genotype 6) and TTV19 (isolate SANBAN). TTV3 isolate HEL32 has an overall low prevalence [50] and is more likely to be found in hepatitis patients [30, 43]. TTV19 isolate SANBAN was shown to produce a protein that suppresses the NF-κB pathway, which may contribute to TTV pathogenicity and link it to autoimmune and/or inflammation-prone conditions [51].

The most common TTV isolates among healthy Romanian individuals were similar to TTV1 and TTV3 isolates (former genogroup 1). TTV isolates appear to be geographically structured from east to west across the country, whereas in the capital's metropolitan area and neighboring counties, TTV circulation shows no apparent boundaries. TTV isolates from the most prevalent species (e.g., Torque teno virus 1-6, former genogroup 1) may be less likely to be involved in the onset or modulation of disease, whereas rarer isolates should be studied further in association with pathological conditions. Analysis of genomic regions with phylogenetic resolution below the species level is therefore important.