Introduction

The study of traditional Chinese medicine (TCM) has led to important advances, such as the discovery of artemisinin for which the Nobel Prize was awarded (Christensen 2021). In 2018, the World Health Organisation included TCM in its globally influential medical compendium (Cyranoski 2018), and the awareness of and attention to TCM has gradually increased worldwide. Particularly, during various epidemics, such as malaria, severe acute respiratory syndrome, and coronavirus disease-19, TCMs have shown good clinical performance and potential for further development (Liu et al. 2012; Chen et al. 2021; Christensen 2021; Li et al. 2021a; Wang and Yang 2021). Chinese herbal slices (CHSs) are prescription medicines produced by processing Chinese herb raw materials, and can be used directly as medicines; they also constitute the raw materials of Chinese patent medicines and medicine extracts (Deng et al. 2019; Mo et al. 2019; Song et al. 2021), which are other important commodities of Chinese medicine (Song et al. 2021). Therefore, the quality of CHSs directly affects TCM clinical efficacy and drug safety, with important effects on human health (Deng et al. 2019; Mo et al. 2019; Li et al. 2020; Zhang et al. 2020; Song et al. 2021).

CHSs pass through successive stages of planting, processing, and storage. As a result, they carry a wide variety of microorganisms, whether endophytic, epiphytic, or introduced during processing (Zhang 2018; Xia 2019; Gan et al. 2020; Zhang et al. 2020; Wu et al. 2021). Although some of these microorganisms are innocuous or even beneficial, such as endophytes that promote the accumulation of active pharmaceutical compounds, numerous microbial contaminants affect the quality and safety of CHSs and their downstream proprietary Chinese medicine products (Deng et al. 2019; Li et al. 2021b). Certain pathogenic microorganisms that enter the human body can affect human health and may be fatal (Deng et al. 2019; Gan et al. 2020). Thus, CHS microbial contamination is a vital issue for human medication. A comprehensive understanding of the microbial load status of CHSs will provide data support for microbial contamination assessments and for standard CHS formulations, thereby promoting the sustainable and healthy development of the CHS industry (Li et al. 2021b). Because of the uniqueness of CHS samples, the microbiological testing methods differ from those of general chemical medicines and proprietary Chinese medicines in terms of the sampling amount, testing procedures, and evaluation standards (Deng et al. 2019; Li et al. 2020, 2021b). The 43rd edition of the United States Pharmacopoeia, 10th edition of the European Pharmacopoeia, 17th edition of the Japanese Pharmacopoeia, and 2020 edition of the Chinese Pharmacopoeia include inspection methods and limit standards for CHS microbial loads (Fan et al. 2020). Additionally, the pharmacopoeia methods for microbial counting and controlled bacteria inspection in CHSs are constantly being improved (Zhu et al. 2020). The former includes sampling, the selection of bacteria and culture media, and the counting of microorganisms and heat-resistant bacteria; the latter includes sample pre-treatment, enrichment, separation, and identification (Liu et al. 2018; Zhu et al. 2020; Fan et al. 2020). Earlier versions, namely the 8th European Pharmacopoeia and 16th Japanese Pharmacopoeia, already included inspection methods and limit standards for microbes in Chinese herbal medicines (natural medicines), and the 2015 edition of the Chinese Pharmacopoeia described limit standards for some CHS-related microbes, which played a positive role in guiding Chinese medicine manufacturers to consider microbial contamination and standardise the processing of Chinese herbal medicines (Liu et al. 2018; Li et al. 2020).

Although new standards have been issued and continue to be improved for detecting microbial contamination in CHSs, these methods mainly rely on traditional culture approaches, which are time- and labour-intensive and show low accuracy and efficiency (Deng et al. 2019; Li et al. 2019; Shi et al. 2019; Gan et al. 2020). In addition, most microorganisms cannot be cultured under laboratory conditions (Montgomery 2020). The microbial background in CHSs is high (Liu et al. 2018; Deng et al. 2019; Li et al. 2020) and greatly interferes with the analysis of controlled bacteria. In cultures performed to obtain single colonies, there is a low probability of recovering low-abundance microorganisms at a given throughput, leading to a risk of missed detection. The choice of inspection method also strongly influences the enrichment and separation of different bacteria (Deng et al. 2019; Shi et al. 2019; Li et al. 2021b). These limitations make it difficult to comprehensively understand the microorganisms in CHSs. Continuous development of our understanding of the microbiome has enabled interpretation of the structures and functions of microbial communities at the species, gene, and metabolite levels (Sharon et al. 2019). Microbiome sequencing technology and analysis methods are useful for the multidimensional evaluation of microbial community structure succession and functional changes (Bahram et al. 2018). For example, amplicon sequencing based on genus/species markers, such as 16S rDNA, can be used to analyse microbial diversity and community structure succession (Chen et al. 2020). On this basis, structural equation model fitting, network analysis, meta-analysis, and function annotation can achieve sufficient data mining (Banerjee et al. 2018). The application of microbiome technology and analysis methods to obtain microbial population information is important for the risk assessment of CHS microbial contamination and provides a reference for formulating quality control standards for microorganisms in CHSs (Deng et al. 2019; Li et al. 2019; Shi et al. 2019; Gan et al. 2020).

In this study, pure culture-based and high-throughput genetic detection technologies were used to identify microorganisms in large samples of various CHSs. This approach greatly shortened the time required to comprehensively detect the composition and diversity of the CHS microbial populations. A database of CHS-related microbial species information was established, and a preliminary functional analysis was carried out. This research aimed to facilitate the assessment of microbial contamination and microbial transformation in CHSs. Our results provide theoretical and technical support for higher standards of CHS quality control and promote the modernisation of TCM.

Materials and methods

CHS sample collection

We selected 30 batches of each of five varieties of CHSs, including Angelica (Ang), Astragalus (Ast), Glycyrrhiza (Gly), Codonopsis (Cod), and Rheum (Rhe). A total of 150 batches of CHS samples was collected. The collected CHSs were individually packaged in the market. The raw medicinal materials for preparing CHSs are mainly produced in several authentic regions in China. Different raw medicinal materials and their slices undergo different planting, processing, and concocting processes. For the same kind of slices, we collected samples from different manufacturers and different production areas, but with the same plant tissue composition. Therefore, although the different slice samples for each raw medicinal material had undergone similar planting, processing, and concocting methods, they had experienced different geographical, climatic, and social factors, differentiating them in time and space. These differences also contribute to the representativeness of the collected samples for each CHS.

Quantitative determination of total aerobic microbial count, total yeast and mold counts, bile salt-tolerant gram-negative bacteria, and Salmonella

Processing from plants to raw medicinal materials to slices is a non-aseptic operation, and the products are packed by quality-inspection agencies for inspection, supervision, and sale. After sample collection, the individually packaged CHSs were transported to the laboratory. The unsealing, grinding, and preparation of the bacterial suspensions were performed under strict aseptic conditions to avoid introducing uncertain contaminants.

The total aerobic microbial count (TAMC), total yeast and mold count (TYMC), and bile salt-tolerant gram-negative bacteria (BGB) were determined as described in the “Microbial Limit Inspection of Non-sterile Products–Microbial Counting Method” (General Principle 1105) and the “Non-sterile Products–Microbial Limit Inspection–Controlled Bacteria Inspection Law” (General Rule 1106) in the 2015 edition of the Chinese Pharmacopoeia (Liu et al. 2018; Li et al. 2020). Before microbial isolation or enrichment culture, the CHSs were ground directly, without any surface sterilisation treatment. Thus, the obtained information on the bacterial community of the CHSs contains both surface and endophytic bacteria. Specifically, 10 g of each sample was added to 100 mL of pH 7.0 sodium chloride-peptone buffer and shaken to prepare a 1:10 test solution, which was serially diluted by 10-fold with pH 7.0 sterile sodium chloride-peptone buffer. For TAMC, tryptic soy peptone agar medium was poured, 1 mL sample dilution was spread, and the samples were cultivated at 33 °C for 5 days. For TYMC, the test solution was serially diluted by 10-fold with pH 7.0 sterile sodium chloride-peptone buffer solution, and Sabouraud’s dextrose agar medium was used. For BGB determination, the test dilutions (1 mL of 10⁻¹, 10⁻², and 10⁻³) were inoculated into 10 mL of intestinal enrichment liquid medium and incubated at 35 °C for 48 h. The above-mentioned pre-culture, equivalent to a 1 g sample, was inoculated into 100 mL of intestinal bacteria enrichment liquid medium. After cultivation at 33 °C for 24 h, the sample was inoculated onto a violet red bile salt glucose agar medium plate and cultivated at 33 °C for 24 h, after which the BGB was determined.
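The plate-count arithmetic implied by the dilution scheme above can be sketched as follows. This is a minimal illustration in Python, not the pharmacopoeial calculation itself: the function names are hypothetical, and it assumes a single countable plate and that 1 mL of the 1:10 test solution corresponds to 0.1 g of sample.

```python
import math


def cfu_per_gram(colonies, dilution_factor, plated_volume_ml=1.0):
    """Estimate colony-forming units per gram from one countable plate.

    dilution_factor is the total dilution relative to the sample:
    10 for the 1:10 test solution, 100 for the next 10-fold step, etc.
    (Assumption: one plate per dilution; replicate averaging is omitted.)
    """
    if colonies < 0 or dilution_factor <= 0 or plated_volume_ml <= 0:
        raise ValueError("counts, dilution factor and volume must be positive")
    return colonies * dilution_factor / plated_volume_ml


def lg_cfu_per_gram(colonies, dilution_factor, plated_volume_ml=1.0):
    """Base-10 logarithm (lg) of the CFU/g estimate."""
    cfu = cfu_per_gram(colonies, dilution_factor, plated_volume_ml)
    if cfu == 0:
        raise ValueError("lg is undefined for a zero count")
    return math.log10(cfu)


# Hypothetical plate: 40 colonies from 1 mL of the 10^-2 dilution.
print(cfu_per_gram(40, 100))
print(round(lg_cfu_per_gram(40, 100), 2))
```

Working on the lg scale keeps counts that span several orders of magnitude comparable across batches.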

For Salmonella detection, 10 g of CHS was inoculated into 100 mL tryptic soy peptone liquid medium. After incubation at 33 °C for 48 h, 0.1 mL of the above culture was inoculated into 10 mL RV Salmonella enrichment liquid medium, and cultured at 33 °C for 48 h. A small amount of RV Salmonella-enriched liquid culture was then streaked and inoculated onto a xylose-lysine deoxycholate agar medium plate, and cultured at 33 °C for 48 h.

Species identification of typical pure-cultured colonies

Different microbial colonies with typical morphologies were selected from the tryptic soy peptone agar plates and violet red bile salt glucose agar plates, the strains were separated and purified, and the 16S rDNA gene fragment was amplified using 27F and 1492R as primers. The amplified fragments were sequenced using the Sanger method, and the measured sequences were aligned with the BLAST software (http://blast.ncbi.nlm.nih.gov/Blast.cgi) to determine the species of the isolated and purified bacteria.

Metagenomic extraction and 16S rDNA high-throughput sequencing

Roughly 100 g of sample powder from each batch was mixed evenly. Sample powder (25 g) was mixed with 225 mL of 0.9% sterile sodium chloride solution, and the mixture was homogenised for 1–2 min and allowed to stand for 1 min, after which the supernatant was used as the 1:10 test solution. The TAKARA bacterial genome kit (Shiga, Japan) was used to extract contaminated microbial DNA as a sequencing template. For the V4-V5 region of 16S rDNA, primers 515F 5′-GTGCCAGCMGCCGCGGTAA-3′ and 926R 5′-CCGTCAATTCMTTTGAGTTT-3′ were used for PCR amplification. After agarose gel electrophoresis detection, the PCR-positive amplification products were recovered using the AxyPrepDNA gel recovery kit (Axygen, Union City, CA, USA). Using the recovered PCR product as a template, a fusion primer composed of 5′-MiSeq adapter-barcode-sequencing primer-specific primer-3′ was used for secondary PCR amplification. During secondary PCR amplification, sequencing tags were added, and the PCR products of each sample were purified with AMPure XP beads (Beckman Coulter, Brea, CA, USA). The purified PCR product was quantitatively analysed using a Qubit 3.0 fluorometer (Thermo Fisher Scientific, Waltham, MA, USA). After homogenisation, the products were sequenced on the Illumina MiSeq high-throughput nucleic acid sequencing platform with a MiSeq Reagent Kit v3 (2 × 300 cycle) chip (San Diego, CA, USA).

Sequence quality control and analysis

After sequencing, the data for each sample were split according to the barcode information, and the sequence quality was controlled and filtered. A sequence was recognised as valid only when the original sequence contained a complete barcode tag sequence. Splicing was performed based on the overlapping relationship between paired-end reads, and the spliced sequences were subjected to quality control filtering to remove non-specific amplified sequences containing ambiguous bases, single-base high repetitive regions, and some chimaeras generated during PCR. Sequence quality control and sequence stitching were performed using the Trimmomatic and FLASH software, respectively. USEARCH software was used to perform operational taxonomic unit (OTU) cluster analysis on the optimised spliced sequence. The OTUs were divided according to a 97% similarity level, and the optimal sequences were selected as representative. Then, we aligned the OTU representative sequences with the NCBI database using the Mothur software, annotated the species information, obtained taxonomic information, and counted the community composition and species abundance of each sample in each taxon (Bahram et al. 2018; Banerjee et al. 2018; Sharon et al. 2019; Chen et al. 2020).
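To illustrate what dividing OTUs at a 97% similarity level means, the following Python sketch performs a simplified greedy centroid clustering. It is not the algorithm implemented in USEARCH: the identity measure is a toy position-by-position comparison that assumes equal-length sequences, and all names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions (toy measure, equal-length sequences only)."""
    if len(a) != len(b):
        raise ValueError("toy identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otu_clusters(seq_counts, threshold=0.97):
    """Cluster sequences around centroids, most abundant sequence first.

    seq_counts maps each unique sequence to its read count.
    Returns a dict: centroid sequence -> list of member sequences.
    """
    # Abundant sequences are processed first so they become centroids.
    ordered = sorted(seq_counts, key=lambda s: (-seq_counts[s], s))
    clusters = {}
    for seq in ordered:
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            # No existing centroid is similar enough: start a new OTU.
            clusters[seq] = [seq]
    return clusters


# Hypothetical 100-nt reads: one abundant sequence and two variants.
base = "ACGT" * 25
variant_98 = "TT" + base[2:]      # 2 mismatches, 98% identity to base
variant_96 = "TTTTT" + base[5:]   # 4 mismatches, 96% identity to base
otus = greedy_otu_clusters({base: 50, variant_98: 5, variant_96: 3})
print(len(otus))
```

In this toy input the 98% variant joins the abundant centroid, whereas the 96% variant falls below the threshold and founds its own OTU.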

Statistical analysis

QIIME software and Python were used to select the most abundant OTU sequence at the genus and species taxonomic levels as the representative sequences, perform multiple sequence alignments, and construct a phylogenetic tree (Bahram et al. 2018; Banerjee et al. 2018; Sharon et al. 2019; Chen et al. 2020). R and Python were used for alpha and beta diversity analyses based on the species and their abundance data, including diversity index and non-metric multidimensional scaling (NMDS) analyses. Excel 2017 (Microsoft, Seattle, WA, USA) and SPSS 20.0 (SPSS, Inc., Chicago, IL, USA) were used for statistical analysis of the data, and the significance of the differences was evaluated by one-way analysis of variance. Based on the abundance of each species in each sample, the SparCC algorithm was used for correlation analysis (covering both positive and negative correlations). Statistical tests were performed using an R software package to calculate the indicator values of OTUs in each group, after which the indicator values between groups were statistically analysed (p < 0.05 was selected by default) to reveal the indicator species of each group.
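For orientation, two of the alpha diversity indices used in this study can be computed from a vector of taxon counts as in the Python sketch below. The exact variants are assumptions, since the text does not specify them: the Shannon index is taken with the natural logarithm, and the Simpson index is taken in its Gini-Simpson form (1 minus the sum of squared proportions). Function names are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Gini-Simpson index 1 - sum(p_i^2); higher values mean higher diversity."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical genus-level read counts for two samples.
even_sample = [25, 25, 25, 25]   # four equally abundant genera
skewed_sample = [97, 1, 1, 1]    # one dominant genus
print(shannon_index(even_sample), simpson_index(even_sample))
print(shannon_index(skewed_sample), simpson_index(skewed_sample))
```

Both indices are higher for the even sample than for the skewed one, which is why they complement richness measures such as the observed species index.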

Results

Microbial load status of five types of CHSs revealed by pure-culture counting

Figure 1a shows the four types of solid media used to detect TAMC, TYMC, BGB, and Salmonella in this study and the growth status of the microorganisms on these media. TAMC and TYMC were both counted at dilution gradients adjusted to ensure counting accuracy. The detailed data distribution pattern is presented in Fig. 1b, c. The analysis of TAMC, TYMC, BGB, and Salmonella in the 150 batches of CHS samples showed that the average lg (base-10 logarithm) values for TAMC in the five CHSs (Ang, Ast, Gly, Cod, and Rhe) were 3.6, 3.9, 3.43, 3.01, and 2.24, respectively, with ranges of 2.600–6.300, 2.020–5.040, 2.110–5.110, 2.049–5.045, and 0.699–4.968 (Fig. 1b). The average lg values for TYMC in Ang, Ast, Gly, Cod, and Rhe were 1.845, 2.1, 1.79, 1.52, and 1.42, respectively, with ranges of 0.70–3.23, 1–3.16, 0.70–3.11, 0.70–2.49, and 0.70–4.81 (Fig. 1c). Among the 150 CHS batches, BGB were detected in 84 batches (56.0%): 24 batches of Ang, 17 of Ast, 18 of Rhe, 20 of Gly, and 5 of Cod. The average lg values of BGB in Ang, Ast, Gly, Cod, and Rhe were 1.83, 1.7, 2.13, 0.77, and 2, respectively, with ranges of 0.5–3.5, 0.5–3.5, 0.5–3.5, 0.5–3.5, and 0.5–5.5 (Fig. 1d). Based on the pure-culture counts, the microbial contamination loads ranked from highest to lowest were Ast, Gly, Ang, Cod, and Rhe. This trend was similar for TAMC, TYMC, and BGB; however, Cod showed the lowest BGB load. The TAMCs of all batches of all five CHS types were lower than the European and Japanese Pharmacopoeia limits (10⁷), and those of Ast, Gly, Cod, and Rhe were lower than the US Pharmacopoeia limit (10⁶). All batches of all five CHS types showed lower TYMCs than the European and Japanese Pharmacopoeia limits (10⁵), and Ang, Ast, Gly, and Cod showed lower TYMCs than the US Pharmacopoeia limit (10⁴). For BGB, the loads of Ang, Ast, Gly, and Cod were all lower than the limit standard of the Chinese Pharmacopoeia (10⁴), whereas some samples of each of the five CHS types did not meet the limit standard of the Japanese Pharmacopoeia (10²). Salmonella, which is considered one of the most important food-borne pathogens in the world, was detected in three of the 150 CHS batches: two batches of Gly and one of Rhe.
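Because the counts are reported on the lg scale, comparing a batch with a pharmacopoeial limit reduces to comparing its lg value with the exponent of that limit. The Python sketch below encodes only the limits quoted in this section; the dictionary layout, the function name, and the strict "lower than" rule at the boundary are illustrative assumptions.

```python
# Exponents of the limits quoted in the text (e.g. 7 stands for 10^7).
LIMIT_EXPONENTS = {
    ("TAMC", "EP/JP"): 7,
    ("TAMC", "USP"): 6,
    ("TYMC", "EP/JP"): 5,
    ("TYMC", "USP"): 4,
    ("BGB", "ChP"): 4,
    ("BGB", "JP"): 2,
}


def within_limit(lg_value, test, pharmacopoeia):
    """True if the lg count is strictly below the quoted limit (assumed rule)."""
    return lg_value < LIMIT_EXPONENTS[(test, pharmacopoeia)]


# Upper ends of the ranges reported above.
print(within_limit(6.300, "TAMC", "EP/JP"))  # Ang maximum TAMC
print(within_limit(6.300, "TAMC", "USP"))
print(within_limit(4.81, "TYMC", "USP"))     # Rhe maximum TYMC
```

Checking the upper end of each reported range in this way reproduces the pattern described above: the Ang maximum for TAMC and the Rhe maximum for TYMC pass the European and Japanese limits but not the US limits.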

Fig. 1

The four types of solid media used to detect TAMC, TYMC, BGB, and Salmonella. a Growth status of the bacteria, b amounts of aerobic bacteria, c yeasts and molds, and d bile salt-tolerant gram-negative bacteria in five types of CHSs

Microbial load of five CHSs determined by Sanger sequencing of 16S rDNA from typical pure-cultured colonies

For each CHS, one typical colony was selected from the tryptic soy peptone agar plates, while two were selected from the violet red bile salt glucose agar plates. The results of bacterial species identification based on 16S rDNA Sanger sequencing after purification are shown in Table 1. The identified bacterial species were distributed in four families and nine genera. Among them, the highest detection rate was for Bacillaceae on the tryptic soy peptone agar plates and Enterobacteriaceae on the violet red bile salt glucose agar plates, followed by Cellobacteriaceae and Pseudomonadaceae. Some important conditional pathogens, such as Enterobacter cloacae and Klebsiella pneumoniae, as well as heat-resistant microorganisms, such as Bacillus and Paenibacillus, were detected in some CHSs. These bacteria are commonly found in the environment. Bacillus bacteria are difficult to kill, and may be introduced during planting, production, or processing. Therefore, there is a certain degree of medication risk for some CHSs that are brewed or taken directly. Notably, no Salmonella was detected on the violet red bile salt glucose agar plates. A likely explanation is that Salmonella was detected on the xylose-lysine deoxycholate agar plates only after enrichment culture in tryptic soy peptone liquid medium and RV Salmonella enrichment liquid medium. When Salmonella abundance is very low, the probability of detecting it on violet red bile salt glucose agar plates or tryptic soy peptone agar plates is therefore small unless the sample is first subjected to enrichment culture in RV Salmonella enrichment liquid medium.

Table 1 The results of bacterial species identification from the typical colonies on the violet red bile salt glucose agar plates and tryptic soy peptone agar plates

Microbial load of five CHSs determined by amplicon sequencing at the microbiome level

Preliminary analysis of the raw sequencing data showed that the median number of clean tags per sample was 7–8 × 10⁴, the number of valid tags was approximately 7 × 10⁴, the tag length was around 400 bp, and the number of OTUs was approximately 3 × 10³ (Supplementary Fig. S1a). Overall, the number of tags was sufficient, and each OTU was supported by more than 20 tags. The rarefaction curve showed that the Shannon diversity index had levelled off at the end of the curve, and no additional OTUs were found by adding sequencing data, indicating that the sequencing depth was sufficient and the data reliable (Supplementary Fig. S1b). According to OTU-based microbial classification, 41 phyla, 97 classes, 231 orders, 450 families, 1200 genera, and 601 species were identified, and 200 microbial species were recognised as uncultured microorganisms. Phylogenetic analysis was performed on the 50 most abundant OTUs, and their relative distributions in each batch of samples are shown in Fig. 2. These OTUs were mainly annotated as Proteobacteria, Firmicutes, Actinobacteria, and Bacteroidetes. The relative abundance patterns of these OTUs differed between samples from different CHSs, and the pattern varied less among samples from the same CHS than among samples from different CHSs. The evolutionary relationships between the identified top 100 bacteria at the genus and species levels are shown in Supplementary Fig. S2 and S3, respectively. The evolutionary tree contained abundant branches and nodes, indicating that the microorganisms in the investigated CHSs were phylogenetically diverse. To conservatively distinguish the abundance of microorganisms in each CHS, the presence of specific OTUs in half or more of the sample batches was considered to indicate the corresponding microbial load in the CHS. Accordingly, the average numbers of phyla, classes, orders, families, genera, and species among the five CHSs were 19.2, 42.4, 101.6, 173.2, 397, and 158.8, respectively, and their respective coefficients of variation were 5.71%, 10.63%, 5.51%, 6.82%, 4.08%, and 6.22% (Table 2). Therefore, there were small variations in the bacterial richness among the various CHSs; from highest to lowest bacterial richness at both the species and genus levels, the CHSs were in the following order: Ast, Rhe, Gly, Ang, and Cod. Supplementary Fig. S4 shows the abundance distribution of the microbial genera identified in each CHS, suggesting that the abundances of the different genera in each CHS displayed a certain degree of evenness, and hence diversity.
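The presence rule (a taxon counts as present in a CHS when it is detected in at least half of its batches) and the coefficient of variation used to compare richness across CHSs can be sketched in Python as follows. The data are hypothetical, the function names are illustrative, and the use of the sample standard deviation for the coefficient of variation is an assumption.

```python
import statistics


def taxa_present(batch_counts, min_fraction=0.5):
    """Return taxa detected (count > 0) in at least min_fraction of batches.

    batch_counts is a list with one dict per batch: taxon -> read count.
    """
    n_batches = len(batch_counts)
    detections = {}
    for batch in batch_counts:
        for taxon, count in batch.items():
            if count > 0:
                detections[taxon] = detections.get(taxon, 0) + 1
    return {t for t, k in detections.items() if k >= min_fraction * n_batches}


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


# Hypothetical counts for four batches of one CHS.
batches = [
    {"Bacillus": 3, "Pantoea": 1},
    {"Bacillus": 2, "Pantoea": 0},
    {"Bacillus": 5, "Pantoea": 4, "Erwinia": 1},
    {"Bacillus": 1},
]
print(sorted(taxa_present(batches)))
# Hypothetical richness values for three CHSs.
print(coefficient_of_variation([90, 100, 110]))
```

Here Erwinia, seen in only one of four batches, is excluded, which is the conservative behaviour the rule is meant to provide.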

Fig. 2

Phylogenetic analysis of the top 50 OTUs with higher abundance values and their relative distribution in each batch of samples

Table 2 The average number of phyla, classes, orders, families, genera and species annotated by OTUs corresponding to various CHSs

All 150 sample batches contained Escherichia species, with an average relative abundance of 1.35%. The average relative abundances of E. coli in Gly, Ang, Ast, Cod, and Rhe were 0.95%, 1.57%, 1.32%, 1.26%, and 1.66%, respectively. Staphylococcus microbes were detected in most sample batches, with an average relative abundance of 0.72%; the average relative abundances in Gly, Ang, Ast, Cod, and Rhe were 0.44%, 0.50%, 1.64%, 0.24%, and 0.23%, respectively. No Staphylococcus aureus was detected; the main species detected was Staphylococcus xylosus, which is less pathogenic, along with Staphylococcus sciuri, which is more pathogenic. Pseudomonas microbes were detected in all 150 sample batches, with an overall average relative abundance of 1.57% and specific values in Gly, Ang, Ast, Cod, and Rhe of 2.82%, 1.61%, 1.11%, 0.72%, and 1.60%, respectively. For the common pathogenic bacterium Pseudomonas aeruginosa, the average relative abundances in Gly, Ang, Ast, Cod, and Rhe were 0.0082%, 0.0136%, 0.0118%, 0.7167%, and 1.5986%, respectively. Clostridium microorganisms were detected in all 150 sample batches, with relative abundances in Gly, Ang, Ast, Cod, and Rhe of 10.30%, 10.00%, 2.49%, 15.60%, and 1.11%, respectively. High-throughput sequencing did not detect Salmonella in any sample.

Considering the potential positive effects of many endophytic bacteria in Chinese herbal medicines (Fu et al. 2021; Wu et al. 2021), the potential of each of the most abundant bacterial genera to act as beneficial medicinal plant endophytes was determined based on the literature (Wu et al. 2021). Moreover, the most abundant bacterial genera were screened against the controlled bacteria stipulated in the pharmacopoeias of various countries, including bile salt-tolerant gram-negative bacteria, heat-resistant bacteria, and Salmonella. Supplementary Fig. S5 shows the relative proportions of potential beneficial endophytes and compendial controlled bacteria in the top 15, 30, 45, 60, and 75 microbial genera. As the number of analysed genera increases, the numbers of known pharmacopoeia-controlled bacteria and medicinal plant probiotic bacteria level off. Taking the top 75 genera as an example, nine genera can be judged as controlled bacteria according to the pharmacopoeia, while 34 genera are potentially beneficial endophytes of medicinal plants according to the literature. For 39 genera, it is difficult to judge a priori whether they are controlled bacteria or probiotics. Nevertheless, these results indicate that the proportion of endogenous beneficial bacteria was substantial and that there was some intersection between the two groups.

Cluster and correlation analyses of the top 30 microbial genera based on relative abundance

The top 30 microbial genera based on relative abundance included Bacteroides, Clostridium sensu stricto 1, Lactobacillus, Acinetobacter, Streptococcus, Lachnospiraceae NK4A136 group, Weissella, Pseudomonas, Faecalibacterium, Rodentibacter, Alistipes, Prevotella 9, Escherichia-Shigella, Pantoea, Roseburia, Aquabacterium, Brevundimonas, Bacillus, Allobaculum, Lachnoclostridium, Citrobacter, [Eubacterium] coprostanoligenes group, Terrisporobacter, Bifidobacterium, Corynebacterium 1, Erwinia, Klebsiella, Ruminococcus 2, Parabacteroides, and Staphylococcus, as shown in Fig. 3a. The Clostridium, Escherichia, Pseudomonas, and Staphylococcus genera include species that are regarded as controlled bacteria in the pharmacopoeia. In addition, Klebsiella, Citrobacter, Pantoea, and other genera have been reported as pathogenic bacteria or conditional pathogens (Liu et al. 2018; Deng et al. 2019; Li et al. 2019, 2020, 2021b; Shi et al. 2019; Zhang et al. 2020; Gan et al. 2020; Song et al. 2021). Some of the top 30 microbial genera pose no health risks, including Lactobacillus, Allobaculum, Weissella, Alistipes, Bifidobacterium, Romboutsia, and Blautia (Liu et al. 2018; Deng et al. 2019; Li et al. 2019, 2020, 2021a, b; Shi et al. 2019; Zhang et al. 2020; Gan et al. 2020; Song et al. 2021). In addition, according to the genus-level community structure composition of each CHS based on the top 30 genera, different kinds of CHSs showed different abundance patterns. Among them, the evenness of the abundance distribution across genera was relatively high in Rhe and Ast. The five CHSs showed substantial differences in the abundances of many genera. For example, the abundances of Staphylococcus, Bacillus, and Weissella in Ast were relatively high compared with those in the other CHSs; Klebsiella, Citrobacter, and Pseudomonas were more abundant in Gly; and Aquabacterium, Prevotella 9, Rodentibacter, and Lactobacillus were more abundant in Rhe. Bacteroides was relatively abundant in Ang, Cod, and Ast, while Clostridium sensu stricto 1 was relatively abundant in Cod, Gly, and Ang. Cluster analysis was performed based on the relative abundance patterns of the top 30 microbial genera. Overall, the samples from each CHS were relatively clustered, with Rhe showing the highest intra-variety clustering, followed by Cod, Ast, and Gly (Fig. 3b). This indicates that, for specific CHSs, the microbial compositions of the top 30 genera have inherent characteristics. The high aggregation in Rhe also indicated that the uniform characteristics of the aforementioned genera in Rhe were applicable to each batch of samples. The Ast samples were relatively dispersed and did not exhibit aggregation, indicating that the characteristics of the aforementioned Ast genera resulted from pooling multiple batches of samples and that the differences between the batches were substantial. The abundance distribution pattern of the top 30 genera in all 150 batches of samples was most similar to that of Ang, followed by those of Cod, Gly, and then Rhe. Additionally, there was some overlap between the samples of different types of CHSs. Based on the top 30 abundant genera and the differences in their abundances between samples, R was used to calculate correlations between the microbial genera, and the obtained numerical matrix was visually displayed in a heatmap (Supplementary Fig. S6). Most genera showed significant correlations (positive or negative) with each other. Several controlled bacteria were negatively correlated with other microbial genera.

Fig. 3

Histogram of the relative abundance of top 30 microbial genera in each CHS and hierarchical clustering of each sample based on the relative abundance of the top 30 microbial genera

Similarities and differences in the microbial loads of different CHSs characterised by alpha diversity indices

Detection of a genus or species in at least half of the sample batches was used as the threshold for judging its presence in the corresponding CHS. Gly, Ang, Ast, Cod, and Rhe contained 403, 391, 413, 375, and 413 genera of microorganisms, respectively, among which 319 co-occurred in all five CHSs, 51 in four, 24 in three, and 29 in two. Sixty-six genera appeared in only one type of CHS (Fig. 4). Alpha diversity analysis of all the identified microbial genera and their abundances suggested that, based on the observed species index (Fig. 5a), Cod and Rhe differed significantly at the 0.1 level, and Gly and Rhe at the 0.05 level. Based on the phylogenetic whole-tree diversity index (Fig. 5b), there was a significant difference between Cod and Rhe at the 0.1 level, and between Gly and Rhe at the 0.05 level. Based on the Shannon index (Fig. 5c), there were significant differences between Gly and Ang at the 0.1 level, and between Gly and Rhe, and between Cod and Rhe, at the 0.01 level. Based on the Simpson index (Fig. 5d), there were significant differences between Gly and Ang at the 0.05 level, Gly and Rhe at the 0.01 level, Ang and Cod at the 0.1 level, and Cod and Rhe at the 0.01 level. Overall, according to their microbial abundance and diversity indices (from high to low), the CHSs were ranked as follows: Rhe, Ang, Ast, Cod, and Gly. Therefore, from a comprehensive perspective of species richness and diversity, there were both intersections and differences among the five CHSs.
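The genus counts above can be cross-checked with simple arithmetic: a genus shared by k CHSs contributes k times to the per-CHS totals, so the weighted sum of the sharing classes must equal the sum of the five per-CHS counts. The Python sketch below performs this check on the reported numbers and shows how such a sharing histogram could be derived from genus sets; the function name and the toy sets are hypothetical.

```python
from collections import Counter


def sharing_histogram(genera_by_chs):
    """Map k -> number of genera present in exactly k of the given CHSs."""
    occurrences = Counter()
    for genera in genera_by_chs.values():
        occurrences.update(set(genera))
    return Counter(occurrences.values())


# Numbers reported in the text.
per_chs = {"Gly": 403, "Ang": 391, "Ast": 413, "Cod": 375, "Rhe": 413}
shared = {5: 319, 4: 51, 3: 24, 2: 29, 1: 66}

distinct_genera = sum(shared.values())                  # 489
memberships = sum(k * n for k, n in shared.items())     # 1995
print(distinct_genera, memberships, sum(per_chs.values()))

# Toy example with hypothetical genus sets for three CHSs.
toy = {"X": {"a", "b", "c"}, "Y": {"a", "b"}, "Z": {"a", "d"}}
print(dict(sharing_histogram(toy)))
```

Both totals come to 1995, so the reported sharing classes are internally consistent and correspond to 489 distinct genera across the five CHSs.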

Fig. 4

Venn diagram of the intersection of microbial genera carried by the five CHSs

Fig. 5

Genus-level microorganism intra-group distribution and inter-group difference of the alpha diversity index in the CHSs. The four diversity indices are: observed species, phylogenetic diversity whole-tree, Shannon index, and Simpson index. The abscissa is the grouping, and different groups are distinguished by different colours. The ordinate is the index value. “ns” represents non-significant difference, *, **, and *** represent significant differences at the 0.1, 0.05, and 0.01 levels, respectively

Similarities and differences in the microbial loads of different CHSs characterised by beta diversity indices and indicator species

Based on the identified microbial genera and their abundance data, NMDS three-dimensional clustering was performed on all samples using different algorithms (Fig. 6 and Supplementary Fig. S7). The Euclidean algorithm showed a better fit than the other algorithms, such as the Bray-Curtis, binary Jaccard, unweighted UniFrac, and weighted UniFrac. Overall, samples from the same CHS type were relatively clustered; Rhe showed the greatest aggregation. In contrast, the various CHSs were not clearly distinguished in the NMDS three-dimensional space, and there was some overlap among them. The top 100 indicator species were identified based on the species and their abundance (Fig. 7). These species were found in all five CHSs, but each CHS showed some dominant species, such as OTU_10229 (Pseudomonas koreensis) in Ang, OTU_138 (ambiguous taxa in the Bacteroides genus) in Ast, OTU_12506 (unidentified species in the Pseudomonas genus) in Cod, OTU_349 (uncultured bacterium) in Gly, and OTU_16 (ambiguous taxa in the Escherichia-Shigella genus) in Rhe. These results suggest that there are both similarities and differences in the characteristics of the microbial loads of these five traditional CHSs.
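NMDS operates on a matrix of pairwise dissimilarities between samples, so the choice of algorithm above amounts to a choice of distance measure. The Python sketch below computes two of the named measures, Euclidean and Bray-Curtis, from relative abundance vectors. It stops at the distance matrix and does not perform the ordination itself; the function names and example vectors are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two abundance vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|a - b| / sum(a + b), between 0 and 1."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        raise ValueError("both vectors are empty of counts")
    return sum(abs(a - b) for a, b in zip(u, v)) / denominator


def distance_matrix(samples, metric):
    """Symmetric matrix of pairwise distances between samples."""
    return [[metric(s, t) for t in samples] for s in samples]


# Hypothetical genus-level relative abundances for three samples.
samples = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.1, 0.1, 0.8]]
for row in distance_matrix(samples, bray_curtis):
    print([round(d, 3) for d in row])
```

Either matrix could then be passed to an NMDS routine, and the stress of the resulting ordinations compared to judge which measure fits better.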

Fig. 6

NMDS analysis performed with the Euclidean algorithm based on the data on microbial genera in each sample and their relative abundance. The Euclidean algorithm showed a better fit than the other algorithms, namely Bray-Curtis, binary Jaccard, unweighted UniFrac, and weighted UniFrac. The coordinate axes NMDS1, NMDS2, and NMDS3 are the three ordination axes. Each point in the figure represents a sample, and the same colour represents the same CHS. Samples showing greater differences are farther apart in the figure
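To make the comparison of algorithms concrete, the sketch below computes three of the non-phylogenetic dissimilarities named in the caption for a pair of samples, using their standard definitions. It is an illustration only: the abundance vectors are hypothetical, the NMDS ordination step itself is not reproduced, and the UniFrac distances are left out because they require a phylogenetic tree.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two abundance vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|a - b| / sum(a + b)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den


def binary_jaccard(x, y):
    """Jaccard distance on presence/absence: 1 - |shared| / |union|."""
    px = {i for i, a in enumerate(x) if a > 0}
    py = {i for i, b in enumerate(y) if b > 0}
    union = px | py
    return (1.0 - len(px & py) / len(union)) if union else 0.0


# Hypothetical genus counts for two samples (same genus order in both).
sample_a = [5, 3, 2, 0]
sample_b = [1, 3, 0, 6]
print(euclidean(sample_a, sample_b),
      bray_curtis(sample_a, sample_b),
      binary_jaccard(sample_a, sample_b))
```

Euclidean distance is driven by absolute abundance differences, Bray-Curtis normalises by total abundance, and binary Jaccard ignores abundance altogether, so the same pair of samples can look close under one measure and distant under another. That is one plausible reason the ordination fit differs between algorithms.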

Fig. 7

Analysis of indicative species in various CHSs. The abscissa of the bubble graph represents the sample grouping. The bubble size represents the indicator value of each OTU-represented species in the sample group, that is, the indicative size of the species in the group. The sequence of the OTU and the annotations of its various taxa (kingdom, phylum, class, order, family, genus and species) are detailed in Supplementary Table S1
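The source does not state how the indicator value was computed. As a hedged illustration, the sketch below assumes the widely used IndVal definition of Dufrêne and Legendre, in which the indicator value of a species for a group is the product of its specificity (share of the mean abundance found in that group) and its fidelity (fraction of the group's samples in which it occurs), scaled to 0-100. The function name and the abundance data are hypothetical.

```python
def indicator_value(abundance_by_group, target):
    """IndVal (assumed definition) of ONE OTU for the group `target`.

    abundance_by_group maps a group name to the list of that OTU's
    abundances in each sample of the group.
    """
    means = {g: sum(v) / len(v) for g, v in abundance_by_group.items()}
    total = sum(means.values())
    if total == 0:
        return 0.0  # OTU absent everywhere: no indicator value
    specificity = means[target] / total
    samples = abundance_by_group[target]
    fidelity = sum(1 for a in samples if a > 0) / len(samples)
    return 100.0 * specificity * fidelity


# Hypothetical abundances of a single OTU in three CHS groups.
otu = {"Ang": [8, 12, 0, 20], "Ast": [1, 0, 0, 3], "Rhe": [0, 0, 0, 0]}
for group in otu:
    print(group, round(indicator_value(otu, group), 2))
```

Under this definition an OTU can occur in every group, as the indicator species here do, and still score highly for one group if it is both concentrated in and consistently present across that group's samples.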

Prediction of the function of microorganisms in CHSs

Clusters of Orthologous Groups (COG) function annotation based on the 16S rDNA data predicted 25 COG items, including energy production and conversion, amino acid transport and metabolism, carbohydrate transport and metabolism, lipid transport and metabolism, and secondary metabolite biosynthesis, transport, and catabolism (Supplementary Fig. S8). These items were distributed across all five CHSs. Kyoto Encyclopedia of Genes and Genomes (KEGG) function annotation based on the 16S rDNA data predicted 41 KEGG pathways, including degradation and metabolism of foreign substances, lipid metabolism, polysaccharide biosynthesis and metabolism, terpenoid and polyketide metabolism, secondary metabolite biosynthesis, and amino acid metabolism. Rhe showed the highest annotation intensity in the KEGG pathway analysis, whereas Cod showed the lowest. The annotation patterns of the KEGG pathways of Ang, Ast, Gly, Cod, and Rhe were slightly divergent (Fig. 8).

Fig. 8

KEGG function prediction based on 16S rDNA. The relative proportions of predicted KEGG secondary entries characterised by microorganisms in each CHS are shown in the heatmap

Discussion

Previous studies have verified the feasibility of genetic detection technology for microbial evaluation in TCM, including high-throughput sequencing of bacterial 16S rDNA amplicons, which is suitable for the rapid detection of non-predetermined bacterial species and diversity (Deng et al. 2019; Li et al. 2019; Shi et al. 2019; Gan et al. 2020). Compared with pure culture and other molecular detection techniques, such as high-throughput gene microarrays, bacterial 16S rDNA high-throughput sequencing detects greater bacterial diversity. The main bacterial genera detected in CHSs with the high-throughput gene microarray method can also be detected with 16S rDNA amplicon high-throughput sequencing, which additionally detects bacterial genera that microarrays cannot (Liu et al. 2018; Deng et al. 2019; Li et al. 2019, 2020; Shi et al. 2019; Gan et al. 2020). However, bacterial 16S rDNA amplicon high-throughput sequencing uses primers designed against the variable regions of the bacterial 16S rDNA; these primers are relatively conserved and not sufficiently specific. Additionally, because of the preferences of the PCR amplification process, the resolution for some low-concentration DNA templates may not be ideal (Hugerth and Andersson 2017; Deng et al. 2019; Golebiewski and Tretyn 2020). For example, in this study, Salmonella was detected in two batches of Gly and one of Rhe after enrichment and culture in accordance with the method stipulated in the pharmacopoeia; however, high-throughput sequencing did not detect Salmonella in any of the samples. In this study, the use of large samples yielded sufficient sequencing data. The number of valid tags was approximately 7 × 10⁴, the tag support for each OTU was above 20×, and the rarefaction curve flattened out. Based on these large samples and big data, we identified more than 1000 bacterial genera in 5 types of CHSs. 
Few similar studies have reported such detailed information on microbial genus/species identification in CHSs. Increasing the number of samples and performing deep sequencing can partially compensate for the poor resolution of low-abundance bacteria and of templates disfavoured by amplification bias in 16S rDNA amplicon high-throughput sequencing. In addition, Enterobacter, Pseudomonas, and Bacillus, which had the highest relative abundances in previous related studies, were among the top 30 microbial genera determined by amplicon sequencing at the microbiome level (Liu et al. 2018; Deng et al. 2019; Li et al. 2019, 2020; Gan et al. 2020), and were also the main genera determined by Sanger sequencing of 16S rDNA from typical pure-cultured colonies. These results support the applicability of this research method, the rationality of the quality control measures, and the reliability of the results. They also show that high-throughput sequencing technologies and traditional pure-culture technologies each have their own advantages for microbial contamination detection in traditional CHSs. Combining the two technologies allows a fuller assessment of the microorganisms present in CHSs and, at the same time, reduces the rate of missed detections of extremely low-abundance control bacteria (owing to the enrichment steps of traditional pure-culture technology).

Based on large samples and big data from microbiome sequencing, we established a database of the microbial genera/species and their abundance information for various CHSs. Multidimensional analysis showed that the bacterial loading patterns of the five types of CHSs combined shared and individual features. On the one hand, most of the microbial genera were found in all the CHSs examined, the differences in the richness and diversity indices between some CHSs were not significant, and there was some overlap among samples from different CHSs in the hierarchical clustering and NMDS analysis. On the other hand, there were significant differences in the richness and diversity indices among multiple CHSs, samples from the same type of CHS were relatively clustered in the hierarchical clustering and NMDS analysis, and each type of CHS had a dominant indicator species. This pattern can be traced back to the shared and individual sources of microbial contamination. For different CHSs (between groups), the host plants and the methods of processing and storage are important factors that cause differences in the structure of the loaded microbial community. First, the relationships between host plants and specific endophytic bacterial isolates may exhibit parasitic or symbiotic characteristics, depending on the host’s genotype, tissue, and health status, among other factors (Cope-Selby et al. 2017). Differences in the physiology, metabolism, and growth habits of different plant species affect their ability to recruit various endophytic bacteria (Campisano et al. 2014). Second, differences in the processing methods of different Chinese herbal medicines and CHSs determine the direction and extent of changes in the microbial community carried by the CHSs as fresh harvested plants are turned into Chinese herbal medicine products. 
In our study, many of the detected bacteria are widely present in nature and in the intestines of humans and animals, suggesting that one of the main sources of bacteria in CHSs is the soil background together with the fertilisation and planting management processes. CHS processing is also an important source of microbial contamination. For example, the detection of desulphurisation microorganisms in some sample batches not only indicates that an illegal sulphur fumigation treatment may have been used, but also shows that different processing methods specifically introduce or enrich certain types of microorganisms. For the same CHSs (within groups), our samples had different origins and manufacturers. Although the main technical steps of planting and processing were similar, the samples experienced spatiotemporal differences. First, the community structure of endophytic bacteria is influenced by multiple spatiotemporal factors, including season, altitude, latitude, longitude, and soil conditions (Chiellini et al. 2014; Yang et al. 2017). The bacterial endophytic community structure changes with environmental conditions and constitutes a key mechanism in the physiological strategy of plants to adapt to their environment (Wu et al. 2021). Second, the variability of the microorganisms in different spatiotemporal environments suggests that the microorganisms introduced during processing also differ. In addition, the health of the host plant can affect endophyte colonisation. Pathogen-mediated surface damage to the host promotes the establishment of certain endophytes and disrupts an initially stable microbial ecosystem (Bogas et al. 2015; Wu et al. 2021). The differences in the microbial communities of the CHSs caused by this factor contribute to both between-group and within-group differences. 
Overall, the differences between groups were much larger than those within groups, which suggests that the main reasons for the differentiation of the bacterial loads of CHSs are the differences in the host plants and their processing methods. The spatiotemporal differences characterised by different origins, manufacturers, and the health status of the plant batches should be considered as important uncertain factors in the quality control of different batches of CHSs.

The huge microbial community of CHSs revealed in this study suggests that even after heating or alcohol disinfection treatments, potential microbial pathogenic risks cannot be completely removed, particularly those due to heat-resistant bacteria and the secondary metabolites of various microorganisms (Deng et al. 2019). It is worth noting that the effects of CHS microbial loads on medicinal efficacy and human health are prominent and complex. In general, the effects of loaded bacteria on the efficacy of CHSs can be summarised into different models based on the plant, processing, and storage stages. At the plant stage, the growth, stress resistance, and accumulation of medicinally active substances in medicinal plants determine the final efficacy of Chinese herbal medicines and CHSs. Studies have shown that many medicinal plants have a mutually beneficial relationship with endophytic bacteria (Wu et al. 2021). For example, endophytic bacteria can promote the growth of medicinal plants in the cultivation stage (Fouda et al. 2021), improve the stress resistance of host plants (Hong et al. 2018; Płociniczak et al. 2019), promote the accumulation of biologically active compounds in medicinal plants (Ek-Ramos et al. 2019), or convert medicinal plant metabolites into biologically active compounds (Song et al. 2017a, b). The most common endophytic bacteria, including Bacillus, Enterobacter, Pseudomonas, Pantoea, and Streptomyces, can affect medicinal plant growth, stress resistance, and metabolism. In addition, the metabolites of the endophytic bacteria themselves may serve as important pharmaceutical active ingredients (Rustamova et al. 2020). Recently, the concept of the “microbiome–microbial metabolites–plants” ecosystem has been proposed to explain the mechanisms of action of effective active substances in TCM against COVID-19 (Fu et al. 2021). 
Leupeptin-producing Streptomyces species transfer leupeptin, an active pharmaceutical ingredient, into certain anti-COVID-19 Chinese medicines through a symbiotic relationship with host plants. During the processing and storage stages, part of the endophyte transformation system continues to function; however, newly introduced microorganisms also intervene in the “microbiome–microbial metabolites–plants” ecosystem, transforming the active ingredients of the drugs or promoting their accumulation or decomposition, which can reduce drug efficacy. Therefore, CHSs can act as substrates, while the entire microbial community acts as a transformant to form a complex microbial transformation system and generate complex secondary metabolites (Xin et al. 2006; Xu et al. 2008; Qi et al. 2009; Song et al. 2017a, b; Shi et al. 2019; Wu et al. 2021). Microbial transformation can lead to two results. First, it can help generate structural analogues of the active ingredients in CHSs, enhancing efficacy (Xin et al. 2006; Xu et al. 2008; Qi et al. 2009; Song et al. 2017a, b; Wu et al. 2021). Second, it can cause the loss of active ingredients, the production of toxic substances, and side effects (Shi et al. 2019). However, current research has mainly focused on the effects of rhizosphere, surface, and endophytic microorganisms on the efficacy of drugs at the plant stage. For post-harvest processing and storage, the dynamic changes of microbial communities and their effects on drug efficacy are less well understood. In this study, based on 16S rDNA, the bacterial diversity in CHSs was revealed and the potential biological functions were annotated. 
The annotated COG entries included the synthesis of secondary metabolites, and the KEGG entries included the degradation and metabolism of exogenous substances, terpenoid and polyketide metabolism, secondary metabolite biosynthesis, and biosynthesis of grade metabolites, suggesting that the microbial transformation of active ingredients by the microorganisms in CHSs is an important issue. These results provide basic data for the control of microbial contamination in CHSs. Moreover, they have implications for research on the “microbiome–microbial metabolites–plants” ecosystem in Chinese herbal medicines during processing and storage.

In this study, by combining the pure-culture technology prescribed by the pharmacopoeia and amplicon sequencing-based microbiome technology, the microbial profiles of five typical CHSs were mapped, revealing their ubiquitous species-rich microbial communities. The high-throughput sequencing method can quickly and comprehensively analyse the diversity of contaminating microorganisms in CHS samples; when combined with traditional culture methods, the rate of missed detections of low-abundance control bacteria can be reduced, allowing a more objective assessment of the risk of microbial contamination in traditional CHSs. In terms of the community structure reflected by microbial species and their abundances, the five CHSs studied here share some common features and also display their own individual properties. Predicted functional terms based on bacterial communities suggest potentially active microbial transformation systems with CHSs as the substrate and microbial communities as the transformants. Previous studies have revealed the close relationship between microbial community structure and function, and highlighted the important impact of microbial transformation on TCM (Song et al. 2017a, b; Bahram et al. 2018; Sharon et al. 2019; Wu et al. 2021). Our work illustrates the huge microbial community in CHSs and the non-negligible variability of the microbial community structure across different CHSs and different batches of the same CHS. Therefore, it is crucial to further assess the relationship between microbial community structure and function in CHSs to promote CHS quality assessment and research on the microbial transformation of TCM. In the future, researchers can more conveniently use microbiome strategies, such as diversity, metagenomic, metatranscriptomic, and metabolomic analyses, to analyse the composition and metabolic networks of entire microbial communities. 
Combining the advantages of traditional pure cultures and high-throughput sequencing-based microbial community structure and function analyses in microbial community safety assessments, including the microorganisms themselves and the products formed by transformants, will expand the dimension of the microbial safety evaluation of CHSs and improve their overall quality.