Introduction

Arthropod-borne viruses (arboviruses) represent a threat to global public health, especially in tropical and subtropical regions of the world. Arboviruses have emerged or re-emerged in increased numbers in the last 20 years. Most of them are maintained in nature through a cycle of infection between vertebrates and arthropods, with humans as accidental, dead-end hosts. Several factors contribute to the emergence of arboviruses. Some of them are globalized transportation, uncontrolled urbanization, and the lack of effective vector controls. The Brazilian Amazon has the highest number of arboviruses that have been detected in the world [59]. Over 150 arboviruses can infect humans in Brazil. The most common include dengue virus (DENV), chikungunya virus (CHIKV), Zika virus (ZIKV), Oropouche virus (OROV), and yellow fever virus (YFV) [21]. In addition, Mayaro virus (MAYV), Rocio virus (ROCV), Saint Louis encephalitis virus (SLEV), and West Nile virus (WNV), among others, have also been reported [18, 38, 39, 41, 47, 48, 51, 60]. Malaria is also endemic in the Amazon region, with a significant predominance of Plasmodium vivax (∼85%) [46]. Manaus, the capital of Amazonas state, is the most populous city of the North Region of Brazil, with about 2.5 million inhabitants. The Manaus metropolitan area is a malaria-endemic area [3, 52, 53]. Manaus reported 8370 malaria cases in 2018 [54]. Malaria and arboviral diseases are difficult to differentiate, as patients share similar clinical features and laboratory findings. A systematic review of the literature searching for the known records of pathogens causing febrile illness in Latin America between 1980 and 2015 found that the etiologies of non-malaria febrile illness most frequently reported were viral infections, of which most were arboviruses [42]. We hypothesized that febrile patients with a negative test for malaria might have arboviral infections. 
To test this hypothesis, we used an arbovirus DNA microarray platform and next-generation sequencing (NGS) to screen for virus infections. Surprisingly, NGS analysis showed that only one out of ten analyzed pools of serum samples was positive for an arbovirus (ZIKV) in the studied population.

Material and methods

Study site and participants

We performed a cross-sectional study of a convenience sample comprising 200 acute febrile patients with suspected malaria, recruited from August 2016 to February 2018 at the Fundação de Medicina Tropical Dr Heitor Vieira Dourado (FMT–HVD), a reference institution for infectious diseases in the city of Manaus, the capital of Amazonas state, in the Western Brazilian Amazon. Inclusion criteria: participants with a diagnosis of suspected malaria, aged ≥18 years, residing in the Manaus metropolitan area, with a febrile syndrome characterized by fever detected at the time of attendance or self-reported, with at least two other symptoms such as headache, myalgia, retro-orbital pain, rash, photophobia, asthenia, anorexia, malaise, with up to 10 days after symptoms onset, and with a negative thick blood smear test. All participants signed a written informed consent prior to entering the study. This study was reviewed and approved by the Ethics Committee of the Schools of Pharmaceutical Sciences of Ribeirao Preto (SPSRP), University of Sao Paulo (USP), Brazil (CEP/FCFRP nº 400).

Diagnosis of malaria

Microscopic examination of blood smears was used for malaria diagnosis. Negative tests were those in which the parasite was absent in a thick blood smear after the counting of 200 leukocytes in high-magnification fields. As quality control, experienced microscopists routinely conducted a second review of 100% of the slides to confirm the results of the tests.

Peripheral blood samples

Ten mL of venous blood was collected in Vacutainer tubes (BD Vacutainer, Franklin Lakes, NJ) without anticoagulant. Samples were immediately centrifuged to obtain serum, which was stored at -80°C until use. An aliquot of 1-2 mL of serum was sent to the Laboratory of Virology, SPSRP-USP, for virological analysis.

Screening for arboviruses and viruses transmitted by small mammals

Arboviruses and viruses transmitted by small mammals were screened using a DNA microarray platform as described previously [33]. Briefly, total RNA was obtained from 200 µL of each serum sample using a PureLink Viral RNA/DNA Mini Kit (Invitrogen, USA). Complementary DNA (cDNA) was synthesized using random primers and SuperScript III Reverse Transcriptase (Invitrogen, USA). Each cDNA or pool of equimolar amounts of cDNAs was subjected to random PCR amplification for fluorescent labeling with 1 nmol Cy3-dUTP (Sigma-Aldrich, USA). Labeled PCR products were purified using a Wizard SV Gel and PCR Clean-Up System (Promega, USA). PCR products were hybridized against the DNA microarray using an Oligo aCGH/ChIP-on-Chip Hybridization Kit (Agilent, USA). Slide images were obtained using an Axon GenePix 4000B scanner (Molecular Devices, USA), with a 532-nm laser and 10-μm resolution. The median fluorescence intensity of each spot was calculated from the scanned images using GenePix Pro 7 software (Molecular Devices, USA). The raw fluorescence data of each spot were normalized against the negative control probes. The normalized data were log2-transformed to reduce variability. The mean signal intensity of each viral species probe group was compared to the mean signal intensity of the negative control probes using Welch’s t-test. The criteria for virus detection in a sample were as follows: a mean probe signal intensity significantly (p < 0.05) higher than the mean signal intensity of the negative control probes, and a normalized mean intensity of at least 2 on the log2 scale, i.e., at least fourfold higher than the mean signal intensity of the negative control probes, thus reducing the chance of virus misidentification due to cross-hybridization.
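The two-part detection criterion described above can be illustrated with a minimal Python sketch. It is not the analysis code used in this study. It assumes that normalization means dividing each raw intensity by the mean raw intensity of the negative control probes, and the function names (welch_t, t_two_sided_p, virus_detected) are hypothetical. The p-value is obtained by numerically integrating the Student t density so that only the standard library is needed.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_two_sided_p(t, df, steps=4000):
    """Two-sided p-value via Simpson integration of the Student t density."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    c /= math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += pdf(i * h) * (4 if i % 2 else 2)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1 - 2 * area)


def virus_detected(viral_raw, negative_raw, alpha=0.05, min_log2=2.0):
    """Apply both criteria: significance and a mean log2 ratio of at least 2."""
    # Assumption: normalize by the mean raw signal of the negative controls
    ref = mean(negative_raw)
    viral = [math.log2(x / ref) for x in viral_raw]
    neg = [math.log2(x / ref) for x in negative_raw]
    t, df = welch_t(viral, neg)
    p = t_two_sided_p(t, df)
    return t > 0 and p < alpha and mean(viral) >= min_log2
```

Under these assumptions, a probe group whose signal is significantly above the controls but less than fourfold higher is still reported as negative, which is the safeguard against cross-hybridization mentioned in the text.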

Next-generation sequencing with low sequencing depth (~1 million reads)

Total RNA was purified from 200 µL of serum samples using a PureLink Viral RNA/DNA Mini Kit (Invitrogen, USA). The complementary DNA (cDNA) synthesized from the RNA was amplified by random PCR as described previously [33]. The PCR products were purified using a QIAquick PCR Purification Kit (QIAGEN, USA), and the DNA concentration was determined using a NanoDrop 2000 spectrophotometer (Thermo Scientific, USA). The PCR product of a serum sample or a pool of two to four PCR products in equimolar concentration (~20 ng of total DNA per µL) was sequenced at Biotecnologia Pesquisa e Inovação (BPI, Botucatu-SP, Brazil). Sequencing libraries were prepared using a Nextera XT DNA Library Preparation Kit (Illumina, San Diego, USA). The library samples were labeled with adapter sequences using a Nextera XT Index Kit. Paired-end sequencing was performed using an Illumina MiSeq platform (Illumina) and a MiSeq Reagent Nano Kit v2 (300 cycles) (Illumina), which provides an average of 1 million reads of 2 × 150 bp fragments.

Next-generation sequencing with high sequencing depth (~25 million reads)

All sequencing procedures were performed at the Technological Innovation Center (TIC) of Evandro Chagas Institute, Belem, Brazil. Briefly, total RNA was purified from 200 µL of serum, using a PureLink Viral RNA/DNA Mini Kit (Invitrogen, USA). The cDNA was synthesized using random primers and SuperScript IV Reverse Transcriptase (Thermo, USA). The cDNAs were mechanically fragmented, quantified using a Qubit 2.0 Fluorometer (Life Technologies, Carlsbad, USA), and analyzed using a 2100 Bioanalyzer system (Agilent, Santa Clara, USA). Sequencing libraries were prepared using a Nextera XT DNA Library Preparation Kit (Illumina). The samples were labeled with adapter sequences using a Nextera XT Index Kit (Illumina). Paired-end sequencing was performed using a HiSeq 2000 Sequencing System (Illumina, USA) and a MiSeq Reagent Kit v2 (600 cycles) (Illumina), which provides an average of 25 million reads of 2 × 250-bp fragments.

Viral metagenomic analysis

Viral metagenomic analysis was performed using Galaxy (https://usegalaxy.org/), an open-source platform for genomic analysis. Initially, the paired sequences were analyzed using Kraken v1.3 [61], a taxonomic classification system for short DNA sequences. Then, the raw sequence data were aligned with reference sequences for each virus that showed the highest number of reads classified by Kraken. A sequence alignment was performed using the Bowtie2 program, which is highly efficient for aligning short sequencing reads to reference sequences [34, 35].
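The step of choosing which viruses to align against can be sketched as follows. This is only an illustration, not the Galaxy workflow used here. It assumes the standard six-column, tab-separated Kraken report layout (percentage, clade reads, directly assigned reads, rank code, taxonomy ID, name) and assumes that the viruses of interest carry the species rank code "S". The function name top_species is hypothetical, and in the example report only the two read counts are taken from this study; the percentages and taxonomy IDs are placeholders.

```python
def top_species(report_text, n=3):
    """Return the n species with the most clade-level reads in a Kraken report."""
    hits = []
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        _pct, clade_reads, _direct, rank, _taxid, name = fields[:6]
        if rank.strip() == "S":  # keep species-level rows only
            hits.append((name.strip(), int(clade_reads)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:n]


# Placeholder report: percentages and taxonomy IDs are made up for illustration.
example_report = "\n".join([
    " 94.64\t946420\t946420\tU\t0\tunclassified",
    "  0.01\t124\t124\tS\t2\t        Human endogenous retrovirus K113",
    "  5.35\t53456\t53456\tS\t1\t        Human mastadenovirus C",
])

for species, reads in top_species(example_report, n=2):
    print(f"{species}: {reads} reads")
```

Each species returned this way would then be paired with a reference genome for the Bowtie2 alignment step.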

Similarity analysis of the primate erythroparvovirus 1 genome sequences

Consensus genomic sequences obtained using the Bowtie2 program for primate erythroparvovirus 1, for all pools of samples, were aligned using the CLC Sequence Viewer 8 program (QIAGEN, Germany) using default parameters.

Phylogenetic analysis of ZIKV

The ZIKV consensus genomic sequence obtained using the Bowtie2 program and genomic sequences of representative ZIKV isolates from the Americas and Asia were aligned using CLC Sequence Viewer 8 software (QIAGEN, USA). This alignment was used to construct a phylogenetic tree using the Galaxy platform, an open-source software package [2]. The IQ-TREE program was used for phylogenetic reconstruction [40]. The sequences were analyzed using the ModelFinder program to select the evolutionary model [32]. The ultrafast bootstrap (UFBoot) approximation approach was used to compute the support of phylogenetic groups in the maximum-likelihood-based trees [29].

Results

Characteristics of the study population

Characteristics of the 200 participants with suspected malaria, enrolled between August 2016 and February 2018, are shown in Table 1. The average age of the participants was 46.2 years. The majority were male (66%). The main symptoms were fever (100%), chills (80%), headache (79%), and retro-orbital pain (110) (Supplementary Table S1 and Supplementary Fig. S1).

Table 1 Characteristics of participants recruited at the Fundação de Medicina Tropical Dr Heitor Vieira Dourado (FMT–HVD)

Screening for arboviruses and viruses transmitted by small mammals

Initially, 124 out of 200 participants (participants 1 to 124) were screened for the presence of viral infection (Supplementary Table S2). The cDNA synthesized from the total RNA purified from each serum sample was analyzed individually or in pools with a DNA microarray platform for the detection of arboviruses and viruses transmitted by small mammals. No virus was detected in any serum sample.

Viral metagenomics: next-generation sequencing with low sequencing depth (1 million reads)

Metagenomic analysis of individual or pooled PCR products obtained from 18 serum samples (Table 2) allowed the identification of several viruses in these samples. Supplementary Figure S2 shows the sequencing reads classification for pool 1.1 as an example. The species Human mastadenovirus C showed the highest number of classified reads (n = 53,456), while several other virus species showed a lower number of classified reads. Table 3 shows the summary of virus species with the highest number of classified reads for each pool of samples. The highest number of classified reads for pool 2.1 (n = 40,268) and pool 3.1-7 (n = 103,768) also corresponded to members of the species Human mastadenovirus C. For pool 4.2-5, pool 5.2-8, and pool 6.2-9, three virus species showed more than 100 classified reads in each case. For pool 7.2-10, the species Choristoneura occidentalis granulovirus had the highest number of classified reads (n = 799). Then, the viral genomic coverages were determined using the Bowtie2 program. Table 4 shows the genome coverage for each virus species that showed the highest number of classified reads with the Kraken program. Supplementary Figure S3 shows the alignment of the raw sequencing data of pool 1.1 to a reference sequence representing the species Human mastadenovirus C (35,932 bp), which was the virus species with the highest number of classified reads (n = 53,456) with the Kraken program for this pool. The viral reads of pool 1.1 aligned to a single region of approximately 476 bp located near the 5′ end of the human mastadenovirus C genome, representing a genomic coverage of only 1.3%. The high number of reads covering a short genomic region could be an artifact introduced by the genomic enrichment performed by random RT-PCR. The alignment of the raw sequencing reads from pool 2.1 and pool 3.1-7 to the human mastadenovirus C reference sequence showed a pattern similar to that found in pool 1.1 (Table 4).
Interestingly, the alignment of the raw sequencing data for pool 1.1 to human endogenous retrovirus K113, a virus with a low number of reads assigned by the Kraken program (n = 124), showed 18.4% genomic coverage when analyzed using the Bowtie2 program, higher than the coverage observed for human mastadenovirus C (Table 4, Supplementary Fig. S4). These results suggest that the human endogenous retrovirus K113 is more likely to be present in the samples of pool 1.1 than human mastadenovirus C. However, when NGS detects viruses, high genomic coverage (~100%) is expected in the alignment analysis. Reads found for human endogenous retrovirus K113 in pool 1.1 covering a small percentage of the genome (18.4%) could be due to the low sequencing depth (approximately 1 million reads) provided by the Nano 300 cartridge. In addition, the low sequencing depth might have impeded the detection of other viruses that were present in the serum samples with a low viral load. Therefore, we decided to perform another NGS run using a cartridge that provides greater sequencing depth.
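The genome coverage percentages quoted above are breadth-of-coverage values, i.e., the fraction of reference positions covered by at least one aligned read. A minimal sketch of that calculation is given below. It is not the code used in this study, the function name breadth_of_coverage is hypothetical, and the read intervals in the example are invented so that together they span 476 bp of a 35,932-bp reference, matching the figures reported for human mastadenovirus C in pool 1.1.

```python
def breadth_of_coverage(intervals, genome_length):
    """Percentage of reference positions covered by at least one read.

    intervals: iterable of (start, end) alignment positions,
               0-based and half-open, on a single reference sequence.
    """
    covered = 0
    current_end = 0  # rightmost position already counted
    for start, end in sorted(intervals):
        start = max(start, current_end)  # ignore the part already counted
        end = min(end, genome_length)    # clip to the reference length
        if end > start:
            covered += end - start
            current_end = end
    return 100.0 * covered / genome_length


# Two overlapping hypothetical alignments that together span 476 bp
reads = [(100, 400), (300, 576)]
print(round(breadth_of_coverage(reads, 35932), 1))  # 1.3
```

Because overlapping reads are merged before counting, tens of thousands of reads piled onto one short region still yield a low percentage, which is why read count alone was not taken as evidence of infection.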

Table 2 Patients subjected to viral metagenomic analysis
Table 3 Virus species with the highest numbers of reads classified by the Kraken program
Table 4 Virus genome coverage of reads

Viral metagenomics: next-generation sequencing with high sequencing depth (25 million reads)

Sequencing libraries were prepared for pools of RNAs isolated from 76 individual serum samples (participants 125 to 200) (Table 5). The sequencing reads were first classified using the Kraken program. Supplementary Figure S5 shows the result of this analysis. Primate erythroparvovirus 1 (formerly Human parvovirus B19) was the virus species with the highest number of classified reads (n = 2,470,215) for pool 5. The raw sequencing data of each pool were aligned, using the Bowtie2 program, to a viral reference sequence representing each of the virus species that showed the highest number of classified reads with Kraken. As an example, Supplementary Figure S6 shows the alignment of the raw sequencing data of pool 5 to a reference sequence of primate erythroparvovirus 1. A large number of reads (n = 2,564,414) were assigned to this virus, representing 100% genome coverage and high coverage per site (median, 89,660 pairs of reads), strongly suggesting the presence of this virus in the samples from pool 5. Table 6 shows the virus species with the highest genome coverage for each pool of serum samples. In addition to pool 5, primate erythroparvovirus 1 was also detected in all the other sample pools. However, in those pools, the alignment data showed a lower number of reads (ranging from 15 to 1669 pairs of reads), a lower percentage of genome coverage (ranging from 33 to 99%), and lower genome coverage per site (ranging from 1 to 112 pairs of reads) for this virus. We aligned the consensus sequences of the primate erythroparvovirus 1 genome obtained from all sample pools for similarity analysis (data not shown). This alignment showed a high degree of similarity among the genome sequences, with a few point mutations in all sequences when compared to the genomic sequence obtained for pool 5. However, the low genome coverage per site (between 1 and 2 pairs of reads per site) was not enough to support the existence of those point mutations.
These results suggest that the detection of primate erythroparvovirus 1 in pools other than pool 5 was due to cross-contamination. The second most frequent virus found in the sample pools was Escherichia virus Lambda, detected in eight pools (Table 6). However, this is a bacteriophage that does not infect humans. We found only one arbovirus among the analyzed samples, ZIKV in pool 3. The Bowtie2 program analysis assigned 152 reads to this virus, with a genome coverage of 82% and a genome coverage per site between 1 and 13 pairs of reads. In a phylogenetic analysis, a 1664-nucleotide-long consensus sequence from this virus, spanning the NS3, NS4A, and NS4B genes of ZIKV, grouped with ZIKV detected in Manaus in 2016, with high bootstrap support (90%) (Fig. 1). Furthermore, we also found evidence of patients infected with pegivirus C in pool 5 and pool 6. Finally, we detected human endogenous retrovirus K113, with genome coverage ranging from 93 to 100%, in all sample pools (Table 6).

Table 5 Identification of pools of cDNAs subjected to NGS
Table 6 Virus species showing the highest genome coverage in the analysis with the Bowtie2 program
Fig. 1

ZIKV phylogenetic tree. Phylogenetic relationships among the 1664 nucleotides spanning the NS3, NS4A, and NS4B genes of the ZIKV genome were resolved using a maximum-likelihood tree inferred with IQ-TREE under the GTR + R4 model of nucleotide substitution. The robustness of the tree topology was assessed with 1000 bootstrap replicates. Nodes with bootstrap values ≥ 70% are shown. ZIKV isolates from Asia and America are named based on their GenBank accession number, the country, and the isolation year. The ZIKV detected in this study is indicated by a red asterisk. A ZIKV isolate from Uganda in 1947 (GenBank accession number MW143019) was used to root the tree. We have hidden the Uganda_1947 (MW143019) node to improve the visual presentation of the phylogenetic tree. The tree was visualized using the CLC Sequence Viewer 8 program (QIAGEN, USA).

Discussion

The incidence of malaria is continuing to decline in Brazil; however, febrile illnesses of unknown origin are common, especially in malaria-endemic regions [7]. We hypothesized that arboviruses might be responsible for these febrile cases in the Amazon region of Brazil. To test this hypothesis, we first screened the serum samples of 124 of the 200 participants in this study, using a DNA microarray platform designed for the detection of arboviruses and other viruses transmitted by small mammals [33], but no viral sequences were detected. As an alternative approach, we used NGS to obtain metagenomic information from 76 of the 200 participants, which showed the presence of only one arbovirus (ZIKV) in one out of ten pools of samples. These results suggest that the hypothesis was not correct and that arboviruses are not frequent in suspected malaria cases with negative thick blood smear test results in the metropolitan region of Manaus. However, this study has some limitations that need to be considered when interpreting these results. This was a cross-sectional study with convenience sampling that enrolled participants over two years in order to capture the seasonal circulation of arboviruses. Multiple arboviruses are endemic to Manaus, including DENV, CHIKV, and ZIKV [28]. Therefore, we believe the screening of 200 participants recruited in this period would have allowed the detection of arboviruses if they were prevalent in the studied population. However, a study including a random selection of participants would be necessary to confirm our findings. The use of a thick blood smear test for malaria diagnosis was another limitation of our study. This low-sensitivity test might not identify individuals with asymptomatic or oligosymptomatic infections in endemic areas. In this study, we included only symptomatic participants with a diagnosis of suspected malaria to reduce the chance of false-negative results with the thick blood smear test.
An additional limitation of the study was the enrollment of participants up to 10 days after the onset of symptoms, which might have reduced the chance of arbovirus detection. We adopted this criterion because it is used for malaria testing at the FMT–HVD. Most people infected with arboviruses are viremic for about 2-7 days from onset of symptoms; however, viremia can last for more than 10 days in some cases [14, 15, 26]. Eighty-four of the 200 participants (42%) were sampled up to seven days after the onset of symptoms in 2016, 2017, and 2018 (Supplementary Table S1). This number of participants and the enrollment over two years would have ensured the detection of arboviruses if they were circulating among this population during the study period. Although DENV is the most prevalent arbovirus in Brazil, our study did not detect this virus. This finding could be related to the decline in dengue cases in 2017 and 2018 after the Zika epidemic [9].

Only one arbovirus was detected in this study, ZIKV. ZIKV is transmitted to humans mainly by the bite of infected Aedes aegypti mosquitoes. There have also been reports of sexual transmission, vertical transmission, and transmission through blood transfusion [49]. ZIKV was detected first in Brazil in the North Region in May 2015 and then spread to almost the entire country [11, 62]; however, with a lower incidence, a fact that could explain the low rate of ZIKV detection in this study. Phylogenetic studies have suggested that the introduction of ZIKV in Manaus occurred in November 2015 [25]. In 2017, the state of Amazonas reported 394 cases of ZIKV infections, with the highest numbers (n = 339) of virus infections occurring in Manaus [37]. A previous study showed that ZIKV circulating in Manaus grouped within a single monophyletic clade within the American clade [25]. An ML phylogenetic tree constructed using the consensus sequence of the ZIKV detected in our study showed that this virus grouped within the Manaus clade (Fig. 1), supporting the accuracy of our sequencing data.

To our knowledge, this is the first virus metagenomic study of suspected malaria cases in the Amazon region of Brazil. A previous study of suspected malaria cases (n = 341) in the same region in 2013 surveyed for the presence of DENV, but it was not detected in any of the participants [23]. In the same study, the authors also surveyed for the presence of primate erythroparvovirus 1 in children between 1 and 15 years old (n = 44), finding that the prevalence was 18% for this virus. Previous studies have shown a prevalence ranging from 17 to 24% for primate erythroparvovirus 1 DNA in the Amazon region in Brazil, although serological evidence indicates that the prevalence of this virus can be up to 74.5% [22, 24]. Recently, a literature review found the frequency of B19V (antigen or DNA) in the blood varies from 0.02 to 1.9%, according to the sensitivity of the method used for the diagnosis as well as the activity of the virus in the community [16]. In the present study, we detected primate erythroparvovirus 1 in one out of ten pools of samples, which falls within the frequency (0.02 to 1.9%) of detection in blood for this virus. Members of the species Primate erythroparvovirus 1 are non-enveloped viruses, with a single-stranded DNA genome of 5596 nt, belonging to the genus Erythroparvovirus, family Parvoviridae. The virus is transmitted mainly by the respiratory route. The main targets of primate erythroparvovirus 1 are erythrocytes and erythroid precursors [10], although the virus also shows tropism for a variety of other cell types, such as megakaryocytes, endothelial cells, fetal myocytes, and placental trophoblasts [4]. After primary infection, the virus remains detectable in symptomatic and asymptomatic individuals [12, 19, 58], in some cases, for 70 years or more [44]. 
Primate erythroparvovirus 1 infection is commonly asymptomatic, benign, or self-limiting. Clinical manifestations include erythema infectiosum, transient aplastic crisis, flu-like symptoms, rash, arthralgia, and arthritis. This virus has also recently been associated with a number of rheumatic diseases, including systemic lupus erythematosus (SLE), and parvovirus infection can present with multisystemic symptoms resembling SLE both clinically and serologically [4]. Our findings and data from the literature confirm the importance of primate erythroparvovirus 1 in the differential diagnosis of malaria in the Amazon. Escherichia virus Lambda was the second most frequently detected viral sequence in this study, suggesting either a bacterial infection in some of the participants or external cross-contamination at some step of the sequencing protocol, which is a common finding in viral metagenomic studies [30]. We will conduct a bacterial metagenomic analysis to test these hypotheses in the future. We also detected pegivirus C in two pools of samples. Members of the species Pegivirus C (genus Pegivirus, family Flaviviridae) are transmitted among humans mainly through exposure to contaminated blood. This virus has been found at higher frequency in injection drug users [8, 31], dialysis patients [1, 27], and transplant recipients [17, 20]. So far, there is no evidence that pegivirus C is involved in any pathological events [13]. Previous studies have shown the widespread distribution of this virus in the country, with prevalence ranging from 5 to 21.7%, including the Amazon region [5, 36, 45, 50, 55, 56, 57].

In addition to the exogenous viruses, we also detected a human endogenous retrovirus in all pools of samples. Endogenous retroviruses comprise up to 5-8% of the human genome [6, 43]. Therefore, the detection of human endogenous retrovirus K113 in all pools of samples confirms the high quality of the sequencing method and the viral metagenomic pipeline used in this study. The number of reads mapping to this virus in all pools of samples was highly similar, suggesting the natural replication of this virus, not related to the disease presented by the participants. The great variability in genome coverage and depth of coverage among the detected viruses might be related to different viral loads in the serum samples rather than to poor quality of the sequencing method or viral metagenomic pipeline.

In summary, we have detected several viruses in suspected malaria cases with negative blood tests for Plasmodium infection. These results underscore the importance of metagenomic analysis for screening for virus infections in malaria-endemic regions. High-throughput diagnostic methods, such as the DNA microarray platform, could be improved, including probes for the newly identified viruses. Then, the DNA microarray platform, which is both less labor-intensive and less expensive than NGS, could be used for virus surveillance in malaria-endemic regions.