Introduction

Antibiotics are used to treat bacterial infectious diseases in human and veterinary medical practices. Farmers also administer antibiotics to food-production animals, plants, aquaculture, etc., mainly for promoting growth and preventing infections [1,2,3]. However, the misuse of antibiotics in clinical situations and in agriculture leads to antibiotic residues in animal-derived products and environments [4, 5]. Additionally, overusing antibiotics provides a selective pressure which facilitates the acquisition of antibiotic resistance genes (ARGs) through mutations or horizontal gene transfer—requiring mobile genetic elements (MGEs) as important carriers [6,7,8,9,10]. Antibiotic resistance poses a significant concern for public health because the antibiotic-resistant bacteria associated with the animals are often pathogenic to humans, cause complicated, untreatable, and prolonged infections in humans, and subsequently may lead to higher healthcare costs and even death [5, 11]. Water environments like aquaculture farms and rivers in urban areas are potential reservoirs for ARG pollution and hot spots of horizontal gene transfer [12,13,14,15].

Vietnam is one of the top producers and exporters of aquaculture products [16], and its farmers use antibiotics for disease treatment and prevention and to stimulate shrimp and fish growth [17,18,19]. Although the Vietnamese government has issued strict regulations on the use of antibiotics in aquaculture [20], some banned antibiotics are still being reported [21,22,23]. As reviewed by Lulijwa et al. [24], Vietnam is one of the leading users of antibiotics. The emergence of drug-resistant pathogens has been attributed to the inappropriate use of antibiotics in agriculture [23, 25]. These factors contribute to the emergence of multi-drug resistant (MDR) bacteria in aquaculture and the surrounding environments. Therefore, research on bacterial community structure and diversity and on ARG abundance in water environments is urgently needed. Although several studies have addressed antibiotics and ARGs in aquaculture environments in Southern Vietnam [19, 23, 26,27,28], the status of ARGs in aquaculture and the surrounding environment in Northern Vietnam has not yet been assessed.

Metagenomic research using next-generation sequencing techniques is a powerful tool for investigating the microbial community and functional genomes in various environments [29,30,31,32,33,34]. To our knowledge, the application of metagenomics to study antibiotic resistance in Vietnamese aquaculture environments is in its infancy. Determining the structure and composition of bacterial communities with regard to antibiotic resistance in different water systems may aid in controlling the spread of antibiotic-resistant elements. The Day River basin, a tributary of the Red River system that branches off 35 km upstream of Hanoi, is an essential component of the flood control system of the Red River, Hanoi, Vietnam [35]. Water pollution and degradation are concentrated in the middle and downstream areas of these river basins [36]. This study used a high-throughput sequencing-based metagenomic approach to investigate the seasonal distribution of microbial communities and antibiotic resistance genes in the downstream Day River, Ninh Binh, Vietnam. The present study provides information on both the structure of bacterial communities, ARGs, and MGEs in water bodies and the correlations among ARGs, MGEs, microbial communities, and environmental factors. This study can offer insight for further controlling the prevalence of ARGs in the aquatic environments of Northern Vietnam. Given the lack of studies surveying microbial communities and antibiotic resistance genes in Northern Vietnam, our results may provide unique data on the correlation between the abundance of ARGs, MGEs, and water environmental parameters across seasonal variations.

Material and methods

Sample collection

Sampling sites: Water samples were collected at two types of locations in the downstream area of the Day River in Ninh Binh, Vietnam: the Day River itself (four sites, denoted R1, R2, R3, and R4) and three shrimp ponds outside the Day River dike (denoted P1, P2, and P3) (Fig. 1, Additional file 1: Table S1). Sites were sampled in the rainy season (June, denoted RS) and the dry season (December, denoted DS) of 2018. Most shrimp ponds outside the Day River dike operate a flow-through system that pumps in cleaned river water for shrimp farming and discharges wastewater back to the Day River through a discharge canal, denoted DCOD, which is directly connected to the river. The wastewater from aquaculture ponds inside the Day River dike is collected into a discharge canal (denoted DCID) that is connected to the river via a trench drain with a concrete sluice gate to control water flow. DCOD and DCID samples served as reference samples.

Fig. 1

Sampling locations. Schematic diagram of sampling locations in the downstream Day River, Ninh Binh, Vietnam. P1, P2, P3: three different shrimp pond sites; R1, R2, R3, R4: four different river sites. DCOD, DCID: discharge canals. P1, P2, P3, R1, R2, R3, R4 and DCOD are located outside the Day River dike. DCID is located inside the Day River dike

Using sterile bottles, water samples were collected from each site in triplicate (3 × 500 mL) at a depth of 0.5 m below the water surface. All samples were stored in a dark portable icebox and transferred to the laboratory within four hours. Physicochemical parameters of the water samples were measured by the Institute of Chemistry, Vietnam Academy of Science and Technology, and with the Horiba Multiparameter water quality checker U-50 (Additional file 1: Table S1).

DNA extraction and sequencing

DNA extraction: Water samples were prefiltered through a 15 µm pore size filter and then filtered through a mixed cellulose ester gridded membrane filter (Membrane Solutions, USA; pore size 0.22 µm). Total DNA was extracted from the filter membranes using the PowerSoil DNA Isolation Kit (Qiagen GmbH, Germany).

DNA sequencing: The total DNA extracts of the triplicate samples collected from each sampling site were pooled to minimize potential variation introduced during DNA extraction and sent to Macrogen Inc. (Seoul, Republic of Korea) for Illumina NovaSeq 6000 sequencing (150 bp × 2). The DNA sequence data were submitted to GenBank under project accession number PRJNA770010.

Sequence analysis

Raw sequences were screened and trimmed using the Pearf software (https://microbiology.se/software/petkit/) with the following parameters: pearf -q 28 -f 0.25 -t 0.05 -l 30, to obtain the filtered sequences. The filtered raw sequences were mapped against small subunit ribosomal RNA (SSU rRNA) sequences using the Metaxa2 software [37] (https://microbiology.se/software/metaxa2/) to assess the microbial community structure.

The filtered raw sequences were mapped against the Comprehensive Antibiotic Resistance Database (CARD) version 3.0.7 [38] using the EDGE script [39] (https://edge.readthedocs.io/en/latest/index.html) with the BWA tool [40] (http://bio-bwa.sourceforge.net/) to determine ARG diversity and abundance.

The filtered raw sequences were also mapped against the Mobile Genetic Elements database (MGE) [41] for determining MGE composition.

The filtered raw sequences were used as input for de novo assembly with the Megahit software [39] (megahit --min-contig-len 300 --presets meta-sensitive -m 0.9) to produce contigs. From all assembled contigs, ORFs were predicted using the Prodigal software with the -p meta option for metagenomic samples and the -c option for closed ends of ORFs [40] (prodigal -c -f gbk -g 11 -m -n -p meta). The predicted ORFs were aligned to the CARD (amino acid sequences) database using “diamond blastp” [42] (https://www.wsi.uni-tuebingen.de/lehrstuehle/algorithms-in-bioinformatics/software/diamond) (https://github.com/bbuchfink/diamond) with an E value threshold of 1e−10, a bit score of 50, and a sequence similarity cut-off > 70% to assess ARG diversity and abundance in detail. In addition, the PlasFlow tool (version 1.1) was applied to classify the genomic location (chromosome- and plasmid-like contigs) of all assembled long contigs (> 1000 bp) [43]. The ORFs were normalized by the number of copies of the 16S rRNA gene, estimated using the barrnap 0.9 tool (https://hpc.ilri.cgiar.org/barrnap-software) (https://vicbioinformatics.com/software.barrnap.shtml). Our workflow is summarized in Fig. 2.
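The hit filtering and normalization described above can be illustrated with a minimal Python sketch. It is not the authors' script. It assumes the DIAMOND hits were written in the default 12-column tabular format, interprets the stated cut-offs as E value ≤ 1e−10, bit score ≥ 50, and identity > 70%, keeps only the best-scoring hit per ORF, and divides ARG ORF counts by a 16S rRNA gene copy number supplied by the user. The function names and the example rows are hypothetical.

```python
import csv
import io
from collections import Counter

# Column order of DIAMOND's default tabular output (BLAST-style, 12 columns).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# Cut-offs taken from the text; the exact inequality directions are assumptions.
MAX_EVALUE = 1e-10
MIN_BITSCORE = 50.0
MIN_IDENTITY = 70.0  # hits must exceed this percent identity


def best_arg_hits(tsv_handle):
    """Return {orf_id: arg_id}, keeping the highest bit-score hit that passes all cut-offs."""
    best = {}
    reader = csv.DictReader(tsv_handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        evalue = float(row["evalue"])
        bitscore = float(row["bitscore"])
        identity = float(row["pident"])
        if evalue > MAX_EVALUE or bitscore < MIN_BITSCORE or identity <= MIN_IDENTITY:
            continue  # hit fails at least one threshold
        previous = best.get(row["qseqid"])
        if previous is None or bitscore > previous[1]:
            best[row["qseqid"]] = (row["sseqid"], bitscore)
    return {orf: arg for orf, (arg, _) in best.items()}


def arg_copies_per_16s(orf_to_arg, n_16s_copies):
    """Normalize the number of ORFs assigned to each ARG by the 16S rRNA gene copy count."""
    if n_16s_copies <= 0:
        raise ValueError("16S rRNA gene copy count must be positive")
    counts = Counter(orf_to_arg.values())
    return {arg: n / n_16s_copies for arg, n in counts.items()}


# Illustrative input only; these rows are made up and are not study data.
example = io.StringIO(
    "orf1\tmexK\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-30\t200\n"
    "orf1\tMuxB\t75.0\t300\t75\t0\t1\t300\t1\t300\t1e-20\t150\n"
    "orf2\tugd\t65.0\t300\t105\t0\t1\t300\t1\t300\t1e-40\t300\n"
    "orf3\tmexK\t90.0\t40\t4\t0\t1\t40\t1\t40\t1e-5\t60\n"
    "orf4\tugd\t72.0\t120\t33\t0\t1\t120\t1\t120\t1e-15\t55\n"
)
hits = best_arg_hits(example)
normalized = arg_copies_per_16s(hits, n_16s_copies=50)
```

In this sketch, orf2 is discarded for low identity and orf3 for a weak E value, so only orf1 and orf4 contribute to the normalized abundances. Whether the original analysis also corrected for gene length is not stated in the text, so no such correction is applied here.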

Fig. 2

Workflow diagram

Statistical analysis

We used the Microsoft Excel functions F.TEST and T.TEST for statistical analyses. Pearson correlation was used to assess correlations among the abundance of ARGs, MGEs, microbial communities, and environmental factors.
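For readers who want to reproduce the correlation step outside Excel, the following is a minimal Python sketch of the Pearson coefficient and its associated t statistic (n − 2 degrees of freedom). It uses only the standard library, so it stops at the t statistic; a two-sided p-value would then come from the t distribution, for example via a statistics package. The function names and the example values are hypothetical and are not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length with at least 3 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)  # perfect correlation
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Illustrative values only (e.g. one environmental factor vs. one abundance measure).
factor = [1.0, 2.0, 3.0, 4.0, 5.0]
abundance = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(factor, abundance)
t = t_statistic(r, len(factor))
```

With many ARG types, MGE groups, and environmental factors tested pairwise, the number of comparisons grows quickly, which is worth keeping in mind when interpreting individual p-values.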

Results

Microbial community composition and abundance

At the phylum level, we detected a total of 37 bacterial phyla in all samples; Proteobacteria, Bacteroidetes, and Actinobacteria were the most dominant phyla (Fig. 3). The bacterial community structure shows a similar pattern across all samples at the phylum level. The river water samples collected during the rainy season (R1-RS, R2-RS, R3-RS, and R4-RS, collectively R-RS) show a significantly higher relative abundance of Proteobacteria and Bacteroidetes than the other samples (P1-RS, P2-RS, and P3-RS, collectively P-RS; R1-DS, R2-DS, R3-DS, and R4-DS, collectively R-DS; P1-DS, P2-DS, and P3-DS, collectively P-DS). DCOD-RS displays a pattern similar to that of R-RS, while DCID-RS is similar to P-RS.

Fig. 3

Seasonal distribution of microbial communities in river and shrimp pond water samples at the phylum level. “Other classified phyla” indicates the summed abundance of phyla whose maximum relative abundance across samples is lower than 1%. P1, P2, P3: three different shrimp pond sites; R1, R2, R3, R4: four different river sites. DCOD, DCID: discharge canals. P1, P2, P3, R1, R2, R3, R4 and DCOD are located outside the Day River dike. DCID is located inside the Day River dike. RS: rainy season; DS: dry season

We found a total of 335 bacterial families in all samples; of these, 38 families have a relative abundance greater than 1% in at least one sample (Additional file 1: Fig. S1A). In R-RS, the relative abundance of the dominant families, mainly Comamonadaceae, Moraxellaceae, Pseudomonadaceae, Cytophagaceae, and Sphingomonadaceae, is higher than in R-DS, P-RS, and P-DS (p < 0.05). The relative abundance of Microbacteriaceae is higher in P-RS than in the other sample groups. We observe typical families associated with opportunistic pathogens, including Burkholderiaceae, Neisseriaceae, and Xanthomonadaceae (Additional file 1: Fig. S1A), in the river and shrimp pond water samples.
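The selection rule used here (keep a taxon if its relative abundance exceeds 1% in at least one sample, and pool the remainder) can be sketched in Python as follows. This is an illustrative reconstruction of the rule as stated in the text and figure captions, not the authors' code; the function names, the "Other" label, and the counts are hypothetical.

```python
def relative_abundance(counts):
    """Convert raw counts for one sample into percentages of that sample's total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


def collapse_rare_taxa(samples, threshold_pct=1.0, other_label="Other"):
    """Keep taxa above the threshold in at least one sample; sum the rest per sample."""
    profiles = {name: relative_abundance(counts) for name, counts in samples.items()}
    keep = {taxon
            for profile in profiles.values()
            for taxon, pct in profile.items()
            if pct > threshold_pct}
    collapsed = {}
    for name, profile in profiles.items():
        row = {taxon: profile.get(taxon, 0.0) for taxon in sorted(keep)}
        row[other_label] = sum(pct for taxon, pct in profile.items() if taxon not in keep)
        collapsed[name] = row
    return collapsed


# Made-up counts for two samples; "C" never exceeds 1% and is pooled into "Other".
example_counts = {
    "R1-RS": {"A": 900, "B": 95, "C": 5},
    "P1-RS": {"A": 500, "B": 495, "C": 5},
}
table = collapse_rare_taxa(example_counts)
```

Because the threshold is applied to the maximum across samples, a taxon that is rare in most samples but abundant in one is still reported individually in every sample.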

We found 34 bacterial genera that have a relative abundance greater than 1% in at least one sample (Additional file 1: Fig. S1B). Acinetobacter, Cellvibrio, Arcicella, Novosphingobium, and Rhodobacter are more abundant in R-RS (Additional file 1: Fig. S1B), while Candidatus Pelagibacter is more abundant in P-RS (Additional file 1: Fig. S1B). At the species level, the bacteria with a relative abundance greater than 0.1% in at least one sample differed between R-RS and the other groups (R-DS, P-RS, and P-DS) (Additional file 1: Fig. S1C). Opportunistic pathogens, including Acinetobacter baumannii, Acinetobacter junii, Acinetobacter sp., and Pseudomonas aeruginosa, were mainly observed in R-RS. In contrast, Candidatus Pelagibacter ubique, Candidatus Pelagibacter sp. IMCC9063, and Candidatus Aquiluna rubra were the most prevalent species in P-RS, R-DS, and P-DS.

ARGs composition, abundance, and resistance mechanisms

Our Illumina high-throughput sequencing data show that the ARG composition of the samples was quite similar (Fig. 4A). We found a total of 27 ARG types (92 subtypes) categorized by antibiotic drug class. The predominant ARG type was MDR, comprising ARGs that confer resistance to two or more drug classes, followed by rifamycin, aminoglycoside, and sulfonamide. Although ARG composition did not differ among samples, ARG abundance was higher in R-RS samples (Fig. 4A). Among the different MDRs, the most dominant subtype is MDR14a, a multi-drug resistance gene subtype that confers resistance to 14 drug classes: macrolide, fluoroquinolone, monobactam, carbapenem, cephalosporin, cephamycin, penam, tetracycline, peptide, aminocoumarin, diaminopyrimidine, sulfonamide, phenicol, and penem (Additional file 1: Fig. S2A).

Fig. 4

Abundance of the main ARG types. ARG types categorized by drug class (A) and by resistance mechanism (B). P1, P2, P3: three different shrimp pond sites; R1, R2, R3, R4: four different river sites. DCOD, DCID: discharge canals. P1, P2, P3, R1, R2, R3, R4 and DCOD are located outside the Day River dike. DCID is located inside the Day River dike. RS: rainy season; DS: dry season

Using total metagenome sequencing contigs, we found 206 ARGs in our samples. Additional file 1: Fig. S2B shows the abundance of the top 15 ARGs from each sample. The most abundant ARGs were mexK, MuxB, and ugd, with 0.046, 0.045, and 0.038 ARG copies per 16S rRNA gene, respectively. The abundance of individual ARGs was lowest in R-DS and P-DS, where each ARG had fewer than 0.01 copies per 16S rRNA gene, except ugd in P-DS.

As to ARG resistance mechanisms, our results in Fig. 4B indicate that "antibiotic efflux" is the predominant resistance mechanism in all samples, followed by "antibiotic target alteration combining antibiotic target replacement", "antibiotic inactivation", "antibiotic target alteration", "antibiotic target protection", "antibiotic target replacement", "reduced permeability to antibiotic", and "antibiotic efflux combining reduced permeability to antibiotic".

Composition and abundance of MGEs

We detected a total of 238 MGEs in all samples, representing four groups: plasmids, transposons, integrons, and insertion sequences (IS). Figure 5A demonstrates that transposon-like MGEs were the most abundant overall, followed by IS-like MGEs, plasmid-like MGEs, and integron-like MGEs. Figure 5B shows the top 15 MGEs from each sample, among which tnpA is the most abundant MGE, followed by is9, iscrsp1, istB, and istA.

Fig. 5

Distribution and abundance of MGE types. A The abundance of each MGE group. B The abundance of the top 15 MGE types in each sample. P1, P2, P3: three different shrimp pond sites; R1, R2, R3, R4: four different river sites. DCOD, DCID: discharge canals. P1, P2, P3, R1, R2, R3, R4 and DCOD are located outside the Day River dike. DCID is located inside the Day River dike. RS: rainy season; DS: dry season

Correlation among ARGs, MGEs, microbial communities, and environmental factors

Correlation between bacterial communities and environmental factors

Table 1 indicates that the relative abundance of the Proteobacteria and Bacteroidetes phyla correlates strongly and positively with pH and NO3 concentration (R > 0.75, p-value < 0.05), whereas the relative abundance of the Actinobacteria phylum correlates significantly and positively with temperature and pH. Conductivity, TDS, and salinity correlate significantly and negatively with the relative abundance of Proteobacteria (R < − 0.5, p-value < 0.05).

Table 1 Pearson correlation between the relative abundance of bacterial communities and the indicated environmental factors

Correlation between ARGs/MGEs and environmental factors

Our results revealed that the abundance of ARGs positively correlates with temperature, pH, and NO3 concentration (R > 0.5, p-value < 0.05), whereas it negatively correlates with conductivity, TDS, and salt concentration (R < − 0.5, p-value < 0.05) (Table 2). The abundance of MGEs also correlates significantly and positively with NO3 concentration and negatively with conductivity, TDS, and salt concentration. Interestingly, the abundance of plasmids significantly correlates with temperature but not with pH. In contrast, the abundance of integrons and insertion sequences significantly but weakly correlates with pH (p-value < 0.05) but does not correlate with temperature. Neither temperature nor pH significantly correlates with the abundance of transposons (Table 2).

Table 2 Pearson correlation among the abundance of ARGs, MGEs, and environmental factors

Correlation between ARGs or MGEs with bacterial communities

Our data show that the abundance of MGEs and ARGs (the top ARGs except for glycopeptide and mupirocin resistance genes) correlates strongly and positively with the Proteobacteria and Bacteroidetes phyla (p-value < 0.001) (Table 3). The abundance of the Actinobacteria phylum significantly correlates with rifamycin, aminocoumarin, and mupirocin resistance genes (Table 3). Additionally, the relative abundance of typical species associated with opportunistic pathogens, including Acinetobacter baumannii, Acinetobacter junii, Acinetobacter sp., and Pseudomonas aeruginosa, has a strong positive correlation with the abundance of ARGs and MGEs (R > 0.79, p-value < 0.001) (Table 4, Additional file 1).

Table 3 Pearson correlation among the abundance of ARGs, MGEs, and bacterial communities at the phylum level
Table 4 Pearson correlation among the abundance of ARGs, MGEs, and bacterial communities at the species level

Correlation between ARGs and MGEs

Table 5 shows that the abundance of total ARGs positively and strongly correlates with the abundance of total MGEs (R = 0.868, p-value < 0.0001). Among the 14 most abundant ARG types, ARGs conferring resistance to mupirocin have a weak and insignificant correlation with all types of MGEs. The abundance of penam resistance genes significantly correlates with integrases and insertion sequences. The abundance of aminocoumarin resistance genes significantly correlates with transposases, integrases, and insertion sequences. The abundance of each of the 11 remaining ARG types has a significant correlation with both total MGEs and individual MGE types (R > 0.5) (Table 5). The correlation between ARGs conferring resistance to sulfonamide and integrases is the strongest (R = 0.969, p-value < 0.0001), followed by MDRs, macrolide, and fluoroquinolone (R > 0.85, p-value < 0.001).

Table 5 Pearson correlation between the abundance of ARGs and the abundance of MGEs

Discussion

Bacterial community composition and abundance

In the present study, we found that bacterial community structures at the phylum level differed only slightly by season (rainy vs. dry) or location (river vs. shrimp pond). Proteobacteria, Bacteroidetes, and Actinobacteria were the most dominant phyla in all water samples collected (Fig. 3), consistent with previous studies done in Vietnam [26, 44], China [29, 45], and the US [32]. The relative abundances of Proteobacteria and Bacteroidetes differ significantly among sample groups (data not shown), suggesting that environmental factors may affect the abundance of bacterial communities, especially Proteobacteria and Bacteroidetes. The abundant bacterial taxa found in the river water during the rainy season include the families Comamonadaceae, Moraxellaceae, Pseudomonadaceae, Cytophagaceae, and Microbacteriaceae (Additional file 1: Fig. S1A) and the genera Acinetobacter, Cellvibrio, Arcicella, Novosphingobium, Rhodobacter, and Candidatus Pelagibacter (Additional file 1: Fig. S1B). Comamonadaceae is often associated with high-nutrient conditions such as urban streams, soil, activated sludge, and wastewater [46,47,48]. Typical families associated with opportunistic pathogens were also observed, such as Burkholderiaceae, Neisseriaceae, and Xanthomonadaceae. The following species were found to be most abundant in the river water during the rainy season: Acinetobacter sp., Pseudomonas aeruginosa, Acinetobacter baumannii, Acinetobacter junii, Pseudomonas stutzeri, and Pseudomonas pseudoalcaligenes. These bacteria are potentially pathogenic, could cause many diseases such as urinary tract infection, pneumonia, meningitis, and dermatitis, and are the primary cause of nosocomial infections [49]. The noticeable detection of these bacteria in both the river water and shrimp ponds suggests that the downstream Day River can be a reservoir of pathogenic bacteria and thereby pose potential risks to shrimp and human health, especially during the rainy season.

ARGs and MGEs composition and abundance

Our data revealed that the pattern of major ARGs and MGEs varies only slightly by season and location. Overall, the abundance of ARGs and MGEs is higher in the river water during the rainy season (Figs. 4 and 5). These results suggest that the quantities of ARGs and MGEs might be related. Presumably, rainfall might carry pollutants from the soil into natural water bodies. This may lead to intensive pollution of the natural water bodies, increase the amount of ARGs [50], and even strengthen the river flow that carries contaminants and ARGs into the downstream environment [51], changing the physicochemical properties of the water environment. Once the water environment becomes contaminated with ARGs, the ARGs persist as pollutants and are challenging to eliminate [11, 52].

Overall, MDR (especially the MDR14a subtype), rifamycin, and aminoglycoside resistance genes are predominant in both the shrimp ponds and the Day River (Fig. 4A; Additional file 1: Fig. S2A), suggesting that the corresponding antibiotics are the most prevalent antibiotics consumed in the study area. Previous studies have indicated that phenicol, tetracycline, and sulfonamide were commonly used in Vietnam [19, 27, 53, 54]. Our data are consistent with the report by Lulijwa et al. [24] that Vietnam is one of the leading users of antibiotic compounds.

On the other hand, transposase MGEs were predominant in all samples collected (Fig. 5A). Transposases and integrases are the main drivers of ARG acquisition by MGEs [55]. tnpA and intI1 are important MGE markers that are frequently found in various environments [56, 57]. The class 1 integron (intI1) might be a good indicator of antibiotic resistance associated with anthropogenic pollution because of its positive correlations with ARGs and anthropogenic pollution [58]. Our data revealed that tnpA is the most abundant type of transposase, and intI1 was among the top 15 MGEs of each sample (Fig. 5B). We also found a significant correlation between ARGs and MGEs (Table 5). Thus, pollution from human activity might contribute to antibiotic resistance.

Correlation between bacterial communities and environmental factors

Different physicochemical properties (e.g. pH, temperature, NO3, TDS, salinity) and antibiotic usage patterns lead to variation in the bacterial community in aquatic environments [59,60,61,62]. Our data show that pH is associated with the relative abundance of all three of the most abundant bacterial phyla (Proteobacteria, Bacteroidetes, Actinobacteria) (Table 1). Previous studies have indicated that pH is one of the most significant environmental factors influencing the microbial communities of freshwater [63,64,65,66,67,68,69]. In addition, NO3 concentration strongly and positively correlates with the two most abundant phyla, Proteobacteria and Bacteroidetes. Conductivity, TDS, and salinity significantly and negatively correlate with the relative abundance of Proteobacteria. As mentioned above, Proteobacteria and Bacteroidetes are substantially more abundant in the river water during the rainy season. We also observe that during the rainy season, the pH and NO3 concentration of the river water increase, while the salinity, TDS, and conductivity decrease (Additional file 1: Table S1), which might facilitate the growth of Proteobacteria and Bacteroidetes and contribute to the increased abundance of these bacterial communities. Proteobacteria play essential roles in denitrification, phosphorus removal, and organic degradation [70]. Bacteroidetes decompose complex organic compounds and polymers into simpler molecules that other microorganisms can utilize [71]. Actinobacteria decompose recalcitrant compounds and certain toxic compounds [72]. These bacteria might contribute significantly to removing river water pollutants during the rainy season, which may explain their high abundance in our samples.

ARGs mechanisms, and correlation between ARGs and MGEs

The primary antibiotic resistance mechanisms in the shrimp pond water are quite similar to those in the river water: "antibiotic efflux", followed by "antibiotic target alteration combining antibiotic target replacement" (Fig. 4B). Moreover, the ARGs belonging to "resistance nodulation cell division (RND) antibiotic efflux pump" (including MexK, MuxB, acrB, MexF, adeF, mtrA, MexB, ceoB, adeJ, adeI, CRP, OprM, OprN, mdtB, and OpmH), "pmr phosphoethanolamine transferase" (ugd), and "multidrug and toxic compound extrusion (MATE) transporter" (pmpM) are the most abundant ARGs found in all samples. These results suggest that "antibiotic efflux" is the most abundant mechanism and thus that intrinsic resistance mechanisms play essential roles in the water environments examined in this study.

MGEs are crucial in transferring ARGs to new hosts through horizontal gene transfer, generating new resistant strains. Metagenomic studies have observed the horizontal co-transfer of ARGs and MGEs [73,74,75]. Our data reveal that the abundances of ARGs and MGEs are strongly and significantly correlated (Table 5). This result suggests that horizontal gene transfer might also contribute to the co-transfer of ARGs and MGEs, resulting in the development of antibiotic resistance in the downstream Day River.

Correlation of ARGs/MGEs with bacterial communities and environmental factors

In our study, the abundance of ARGs and MGEs is strongly correlated with Proteobacteria and Bacteroidetes (Table 3). Jian-Hua Wang et al. indicated that Proteobacteria carried more resistance genes than other phyla [76]. The dispersion of ARGs among pathogenic bacteria has already had a significant effect on shrimp cultures worldwide and is linked to massive losses in production [77]. Thi Thu Hang Pham et al. indicated that multi-resistant bacteria in intensive shrimp cultures might disseminate into the natural environment [78]. Thus, current water environments containing bacteria that carry ARGs and MGEs pose a risk of ARG dispersion among pathogenic bacteria through horizontal gene transfer.

Our data also revealed that the abundance of ARGs and MGEs positively correlates with temperature, pH, and NO3 concentration, whereas it negatively correlates with conductivity, TDS, and salt concentration (Table 2). However, the correlation of MGEs with temperature and pH was found to be insignificant. This might be because temperature specifically affects the abundance of plasmids, while pH affects integrons and insertion sequences. The NO3 concentration affected all types of MGEs (Table 2). Thus, our data suggest that environmental factors are associated with the abundance of ARGs and MGEs in the downstream Day River.