Introduction

Microorganisms are decomposers that play an indispensable role in the material cycling and energy flow of ecosystems [1]. Petroleum is widely used in industry because of its high energy efficiency, advanced extraction technology, and widely distributed oil wells, and the resulting petroleum industry activities have caused serious oil contamination of the environment [2]. Petroleum pollution affects the growth and reproduction of many kinds of microorganisms, resulting in changes in soil microbial diversity [3].

In soil, petroleum binds soil particles together, which lowers water and oxygen content [4]. Organic petroleum easily combines with inorganic nitrogen and phosphorus, limiting the nitrification and dephosphorization of soil and slowing the circulation of nutrients. In addition, organic matter in petroleum can greatly increase soil organic carbon content. These changes in the physicochemical properties of the soil alter bacterial communities [5].

Previous studies on microorganisms were based on traditional separation culture [6,7,8,9], but most of the microorganisms in environmental samples cannot be cultured; therefore, it is difficult to objectively evaluate microbial diversity. Modern molecular biotechnology overcomes the shortcomings of pure culture methods and can estimate species diversity and richness from the genetic level, analyze species variation, and accurately understand the composition of microbial communities in the environment [10]. Based on 16S rRNA molecular marker technology, analysis of microbial diversity has become one of the important methods of microbial ecology research [11].

The soil alkane-degrading bacterial communities in the Karamay and Daqing Oilfields mainly consist of Pedobacter, Mycobacterium, and unknown alkB-harboring microorganisms [12]. In the Yellow River Delta region, the total petroleum hydrocarbon (TPH) concentration was negatively correlated with soil microbial biomass carbon and positively correlated with microbial biomass nitrogen, but showed no correlation with the evenness index [13]. TPH was the most important factor that shifted the diversity of soil bacteria in Oman, USA [14]. Nevertheless, the relationship between the petroleum contamination gradient and bacterial diversity is still unclear [15].

Bioremediation uses hydrocarbon-degrading microorganisms from the environment to eliminate contaminants and is considered an efficient, environmentally friendly, and cost-effective technology [16]. In-depth investigation of the microbial communities of contaminated ecosystems is therefore essential and valuable. Cenozoic deposits and underground freshwater resources are abundant in the Dagang Oilfield of Tianjin, and they provide versatile environmental conditions and resources for soil microbes. We focused on the diversity of the soil bacterial community and its correlation with environmental factors in the oil-contaminated soils of the Dagang Oilfield.

The purposes of this study are as follows: (1) to study the bacterial community structure in oil-contaminated surface and deep soils and compare it with that in nonpetroleum-contaminated soils; (2) to clarify the relationship between bacterial community diversity and several environmental factors of petroleum-contaminated soil; and (3) to identify microbial resources for further exploitation.

Materials and Methods

Geography of the Dagang Oilfield and Soil Collection

The Dagang Oilfield is located in southeastern Tianjin, between the Bohai Sea and Hebei Province. It covers more than 20 districts, cities, and counties in Tianjin, Hebei, and Shandong Provinces. Since 1964, the oil production field area has covered approximately 18,716 km², including land, marine tidal flats, and shallow sea areas. The Dagang Oilfield contains abundant oil resources and has very broad exploration prospects.

Soil samples were collected from the Dagang Oilfield in January 2019. A total of 26 soil samples were collected: 3 pairs of nonpetroleum-contaminated surface and deep soil samples and 10 pairs of oil-contaminated surface and deep soil samples. The sampling strategy was as follows: (1) the distance between sampling points exceeded 100 m, and (2) locations with different TPH concentrations were sampled. Each site was marked as a square with a side length of 20 cm. Surface soils (5–10 cm) and deep soils (35–40 cm) were collected by a five-point sampling method. A total of 30 g of soil was collected from each site and then passed through a 2-mm sieve. Each soil sample was divided into two parts: one for the determination of environmental factors and the other for DNA sequencing. The collected soil samples were air-dried and stored at −80 °C.

Physical and Chemical Analyses

The pH value was measured in a 1:5 (w/w) aqueous solution with a pH meter (PHS-3C, INESA, Shanghai, China). The electrical conductivity (EC) was measured in a 1:5 aqueous solution with a conductivity meter (FE30, Mettler-Toledo, Switzerland). The water content (WC) was determined by oven-drying fresh soil to a constant weight at 105 °C. The total organic carbon (TOC) content was measured by the dilution-heating method [17]. Total nitrogen (TN) content was determined using the Kjeldahl method [18]. Total phosphorus (TP) and total potassium (K) were determined by the NaOH melting method [19].

DNA Extraction and Sequencing

The cetyltrimethyl ammonium bromide (CTAB) method was used to extract bacterial DNA from the soil (TIANamp Soil DNA Kit). The extracted DNA was then diluted to 1 ng/μL with sterile water. The quality of the DNA was checked by agarose gel electrophoresis. The polymerase chain reaction (PCR) products were sequenced on an Illumina HiSeq 2500 platform (250-bp paired-end reads).

Analysis of Illumina Sequencing Data

The raw sequence data were processed. Ambiguous data were removed, and the effective tags applied to the operational taxonomic unit (OTU) analysis were retained by QIIME (version 1.2.3.3). OTUs were chosen using the preferred open-reference OTU picking strategy in QIIME, and USEARCH was used to cluster sequences with similarity higher than 97% together to form an OTU. A representative sequence was chosen from each OTU, and the taxonomy was assigned to each of the representative sequences with a Ribosomal Database Project Classifier against Greengenes at a confidence threshold of 0.8 in QIIME. Representative sequences were aligned against the Greengenes core set using PyNAST. Then, using FastTree, a phylogenetic tree was generated, which was used to calculate the weighted UniFrac distance.
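The 97% clustering step can be illustrated with a toy greedy centroid procedure. This is only a minimal sketch under stated assumptions, not USEARCH or the QIIME pipeline: the function names are hypothetical, identity is computed position by position on equal-length sequences (a stand-in for real pairwise alignment), and the threshold is applied as "at least 97%".

```python
def identity(seq_a, seq_b):
    """Fraction of matching positions; assumes equal-length, pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this toy identity needs equal-length sequences")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches / len(seq_a)


def greedy_otu_clustering(sequences, threshold=0.97):
    """Assign each sequence to the first centroid it matches, else open a new OTU.

    Returns a list of OTUs. Each OTU is a list of indices into `sequences`,
    and its first member is the centroid (the representative sequence).
    """
    otus = []
    for index, seq in enumerate(sequences):
        for otu in otus:
            centroid = sequences[otu[0]]
            if identity(seq, centroid) >= threshold:
                otu.append(index)
                break
        else:
            # No existing centroid was similar enough: start a new OTU.
            otus.append([index])
    return otus


# Hypothetical 100-bp reads, made up for illustration only.
reads = [
    "A" * 100,            # first read becomes the centroid of the first OTU
    "A" * 98 + "CC",      # 98% identical to that centroid
    "A" * 90 + "C" * 10,  # 90% identical to that centroid
]
print(greedy_otu_clustering(reads))
```

Real OTU pickers usually process reads in order of decreasing abundance and use alignment-based identity, so the order dependence visible in this sketch is smaller in practice.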

Alpha diversity indices, including observed species, the Shannon diversity index, and the Simpson diversity index, were calculated using QIIME. Principal coordinates analysis (PCoA) based on a weighted UniFrac distance matrix was performed with the stats package in R (version 3.3.1). A heat map was drawn using the pheatmap package in R. Canonical correspondence analysis (CCA), analysis of similarities (ANOSIM), and a Mantel test were conducted using the vegan package in R.
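For readers unfamiliar with the alpha diversity indices named above, the following minimal sketch computes them from a vector of per-OTU read counts. It is not the QIIME implementation: the function names are hypothetical, the logarithm base of 2 is an assumption, and the Simpson index is written in its 1 − Σp² form, which is also an assumption about the variant used.

```python
import math


def observed_species(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts, base=2):
    """Shannon index H = -sum(p_i * log(p_i)); the log base is an assumption."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def simpson_index(counts):
    """Simpson index in its 1 - sum(p_i^2) form (assumed variant)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Two made-up communities with the same richness but different evenness.
even = [25, 25, 25, 25]
skewed = [97, 1, 1, 1]
for name, counts in (("even", even), ("skewed", skewed)):
    print(name, observed_species(counts),
          round(shannon_index(counts), 3), round(simpson_index(counts), 3))
```

Both communities have four observed species, but the evenly distributed one scores higher on the Shannon and Simpson indices, which is why richness and diversity are reported separately.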

Data Availability

The original sequences of the research were deposited at the National Center of Biotechnology Information (NCBI) Sequence Read Archive under the accession number: PRJNA545042.

Results

Physical and Chemical Properties of Soils

The environmental parameters of the 26 soil samples are shown in Table 1. TPH ranged from 0 to 0.4043 g/g, WC from 0.1128 to 0.4070 g/g, pH from 7.6 to 8.9, EC from 735 to 8150 μS/cm, and TOC from 6.880 to 172.000 mg/g. These environmental factors span wide ranges, especially TPH. Generally, the TPH of the surface soil in each pair was greater than that of the deep soil. Samples were divided into four groups based on the TPH levels from low to high: normal (NS11, ND11, NS12, ND12, NS13, ND13), petroleum level 1 (OD5, OD6, OD7, OD8, OD9, OD10, OS8; TPH 0.006–0.018 g/g), petroleum level 2 (OD3, OD4, OS3, OS4, OS7, OS9, OS10; TPH 0.025–0.050 g/g), and petroleum level 3 (OD1, OD2, OS1, OS2, OS5, OS6; TPH 0.108–0.4043 g/g).
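The grouping by TPH level can be expressed as a simple lookup. The sketch below is illustrative only: the function name is hypothetical, and using the lower bound of each reported range as a cut-off is an assumption, since the text gives the observed ranges of each group rather than explicit thresholds.

```python
def tph_group(tph_g_per_g):
    """Map a TPH value (g/g) to a group label.

    Cut-offs are the lower bounds of the ranges reported for each group;
    treating them as thresholds is an assumption made for illustration.
    """
    if tph_g_per_g < 0:
        raise ValueError("TPH cannot be negative")
    if tph_g_per_g >= 0.108:
        return "petroleum level 3"
    if tph_g_per_g >= 0.025:
        return "petroleum level 2"
    if tph_g_per_g >= 0.006:
        return "petroleum level 1"
    return "normal"


# Made-up TPH values, not measurements from Table 1.
for value in (0.0, 0.010, 0.030, 0.200):
    print(value, tph_group(value))
```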

Table 1 Geographical, physical, and chemical properties of soils

TPH was significantly negatively correlated with water content and significantly positively correlated with potassium (Supplementary Table S1). WC was significantly negatively correlated with potassium. The pH value was significantly negatively correlated with EC and potassium. EC was significantly positively correlated with TN, TP, and potassium. TN was significantly negatively correlated with potassium.

Statistics for 16S rRNA Sequencing Data

After sequencing the V4 region of the 16S rRNA of 26 soil samples, the number of total tags was 1,737,904, and the average number of tags per sample was 66,842. A total of 81,548 OTUs were obtained through OTU cluster analysis (Supplementary Fig. S1), and the average number of OTUs per sample was 3136. OS7 (5060) had the highest number of OTUs, and OD10 (804) had the lowest.
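The per-sample averages follow directly from the reported totals, as the short check below shows. Variable names are hypothetical. Because the average of 3136 OTUs per sample equals 81,548 divided by 26, the 81,548 figure appears to be the sum of per-sample OTU counts rather than a count of distinct OTUs; that reading is an inference from the arithmetic, not a statement from the source.

```python
total_tags = 1_737_904   # reported total number of tags
summed_otus = 81_548     # reported total from OTU cluster analysis
n_samples = 26

mean_tags = total_tags / n_samples
mean_otus = summed_otus / n_samples

print(f"mean tags per sample: {mean_tags:.2f}")
print(f"mean OTUs per sample: {mean_otus:.2f}")
```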

After OTU picking, we obtained 9724 OTUs from the 26 soil samples. The number of OTUs in oil-contaminated soils (OS: 7389; OD: 7534) was higher than that in nonpetroleum-contaminated soils (NS: 5209; ND: 5007) (Fig. 1). Shared OTUs accounted for more than 67% between NS and ND (3497/5209) and between OS and OD (6035/7389), and for about 56–58% between NS and OS (4268/7389) and between ND and OD (4256/7534). The number of OTUs shared by all four groups was 3146. In contrast, 775 and 948 OTUs were unique to the oil-contaminated surface and deep soils, respectively. These values suggest that the oil-polluted environment harbored a large number of OTUs not found in the nonpetroleum-contaminated soils, and these bacteria are likely to use petroleum hydrocarbons as a carbon source.

Fig. 1 Venn diagrams of OTUs included in four groups of soils
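The shared-OTU fractions quoted for the Venn diagram are ratios of a shared count to the total of one group. The sketch below recomputes them from the reported counts and shows how shared and unique OTUs would be obtained from sets of OTU identifiers. The function names and the toy OTU identifiers are hypothetical.

```python
def shared_percent(shared_count, reference_total):
    """Shared OTUs as a percentage of the reference group's OTU total."""
    return 100.0 * shared_count / reference_total


# (shared OTUs, reference total) exactly as reported in the text.
reported = {
    "NS vs ND": (3497, 5209),
    "OS vs OD": (6035, 7389),
    "NS vs OS": (4268, 7389),
    "ND vs OD": (4256, 7534),
}
for pair, (shared, total) in reported.items():
    print(f"{pair}: {shared_percent(shared, total):.1f}%")


def shared_and_unique(otus_a, otus_b):
    """Return (shared, only in a, only in b) for two collections of OTU ids."""
    set_a, set_b = set(otus_a), set(otus_b)
    return set_a & set_b, set_a - set_b, set_b - set_a


# Toy identifiers, made up for illustration.
shared, only_a, only_b = shared_and_unique(
    ["otu1", "otu2", "otu3"], ["otu2", "otu3", "otu4"]
)
print(sorted(shared), sorted(only_a), sorted(only_b))
```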

Bacterial Community Composition and Alpha Diversity Analysis

At the phylum level (Fig. 2), Proteobacteria (30.3–54.2%), Firmicutes (6.3–16.9%), and Bacteroidetes (9.2–14.8%) were the three most abundant phyla in NS. Proteobacteria (8.0–86.6%), Cyanobacteria (24.5–25.7%), and Firmicutes (6.7–61.6%) were the three most abundant phyla in ND. Proteobacteria (34.8–61.9%), Firmicutes (4.6–29.7%), and Bacteroidetes (4.0–13.8%) were the three most abundant phyla in OS. Proteobacteria (8.0–86.3%), Firmicutes (6.6–61.2%), and Actinobacteria (0.1–15.7%) were the three most abundant phyla in OD.

Fig. 2 Stacked bar chart of relative abundance at the phylum level. a The 26 samples, ordered from left to right by increasing TPH, and b the four sample groups

ANOSIM results showed that there were significant differences in bacterial communities between the NS and ND, NS and OS, and ND and OD. It also showed that there were no significant differences in bacterial communities between OS and OD (Supplementary Table S2).

Observed species were more numerous, and bacterial diversity was richer in the surface soil samples than in the deep soil samples for both nonpetroleum- and oil-contaminated soils. In addition, normal surface soils had the most diverse evolutionary lineages (Supplementary Table S3).

Rarefaction curves of the number of observed species gradually plateaued for most soils. However, in OS7, OD9, OS2, and OD4, the rarefaction curves were still steep when the number of sequences reached 40,000, indicating that new OTUs would be detected if sequencing continued (Supplementary Fig. S2). Nevertheless, as shown in Supplementary Fig. S3, the end of each sample curve tended to flatten, indicating that the diversity of the microbial communities would not increase further. These results indicate that the sequencing depth was sufficient to estimate microbial diversity.
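A rarefaction curve is built by repeatedly subsampling reads without replacement and counting the OTUs seen at each depth. The sketch below illustrates the idea on made-up counts. The function name, the number of repeats, and the depths are hypothetical, and this is not the QIIME procedure used in the study.

```python
import random


def rarefaction_curve(otu_counts, depths, repeats=10, seed=0):
    """Mean number of observed OTUs at each subsampling depth.

    otu_counts: reads per OTU in one sample.
    depths: numbers of reads to draw without replacement.
    """
    rng = random.Random(seed)
    # Expand the counts into one entry per read, labeled by OTU index.
    reads = [otu for otu, n in enumerate(otu_counts) for _ in range(n)]
    curve = []
    for depth in depths:
        if depth > len(reads):
            raise ValueError("depth exceeds the number of reads in the sample")
        richness = [len(set(rng.sample(reads, depth))) for _ in range(repeats)]
        curve.append(sum(richness) / repeats)
    return curve


# Made-up sample: a few dominant OTUs and some rare ones.
toy_counts = [50, 30, 15, 4, 1]
toy_depths = [10, 50, 100]
toy_curve = rarefaction_curve(toy_counts, toy_depths)
print(list(zip(toy_depths, toy_curve)))
```

A curve that is still rising at the largest depth suggests that rare OTUs remain undetected, which is the situation described for OS7, OD9, OS2, and OD4.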

Correlations Between Environmental Factors and the Bacterial Community

TPH, TOC, and TN had the greatest impacts on the soil bacterial community in the Dagang Oilfield; pH had a significant impact on surface soils but no significant impact on deep soils. TP had a significant impact on deep soils but no significant impact on surface soils. However, TN had a significant impact on all soils. WC, K, and EC had no significant impacts on the soil bacterial community in the Dagang Oilfield (Table 2).

Table 2 Effects of environmental factors on bacterial abundance in soil

TPH was significantly negatively correlated with the bacterial diversity and the number of bacterial species; pH was significantly negatively correlated with bacterial diversity. TOC was significantly negatively correlated with the number of bacteria and the number of observed bacterial species. TN was significantly negatively correlated with the number of observed bacterial species. TP was significantly negatively correlated with sample coverage (Supplementary Fig. S4).

The two main axes explained 44.29% of the variation in soil bacterial diversity. The soil bacterial communities clustered by color, indicating that TPH shapes the soil community (Fig. 3). In surface soils, the most important environmental factor affecting the distribution of bacterial communities was TOC, followed by TPH and TN. In deep soils, the most important environmental factor affecting the distribution of bacterial communities was TPH, followed by TOC and TN. In summary, both CCA and the Mantel test indicated that TPH, TOC, and TN had the greatest impact on the soil bacterial community in the Dagang Oilfield.

Fig. 3 Redundancy analysis (RDA) of a surface soils and b deep soils (The influence of environmental factors on the bacterial community is indicated by arrows. The length of each arrow represents the impact of the corresponding environmental factor on the community distribution. The longer the arrow, the greater the effect.)

Effects of Petroleum Pollution Concentration on the Bacterial Community

The taxa with the largest abundance differences among soil groups are listed in the LEfSe analysis plot. Cyanobacteria, Rickettsiales, Sphingomonadales, Thermoleophilia, Rhizobiaceae, Sphingomonadaceae, Bradyrhizobiaceae, Solirubrobacterales, Bradyrhizobium elkanii, Bradyrhizobium, Galellales, and Methylobacteriaceae were enriched in nonpetroleum-contaminated soils. Acidobacteria, Bacteria, Acidobacteriales, Faecalibaculum, Bryobacter, Rhizomicrobium, Acidibacter, Candidatus Solibacter, Sphingobacteriaceae, Granulicella, Halanaerobiales, and Mucilaginibacter were enriched in oil-contaminated soils (TPH 0.0250–0.0500 g/g). Acidobacteria, Bacteria, Acidobacteriales, Faecalibaculum, Bryobacter, Rhizomicrobium, Acidibacter, Candidatus Solibacter, Sphingobacteriaceae, Granulicella, Halanaerobiales, and Mucilaginibacter were enriched in oil-contaminated soils (TPH 0.108–0.4043 g/g) (Supplementary Fig. S5).

Most samples could be clustered together according to the TPH content. Phyllobacterium and Ralstonia were widely distributed in soils (except for OD1, OS3, and OS5), indicating that they were the predominant bacteria in the Dagang Oilfield and suggesting a strong ability to adapt to the environment. Streptococcus and Pontibacter were the two most abundant genera in nonpetroleum-contaminated surface soils. Unidentified-Chloroplast and unidentified-Mitochondria were the two most abundant genera in nonpetroleum-contaminated deep soils (Supplementary Fig. S6). Marinobacter, Alcanivorax, and Mycobacterium were the three most abundant genera in oil-contaminated soils. In addition, Alcanivorax, Mycobacterium, Pseudomonas, and Micrococcus are petroleum hydrocarbon-degrading bacteria. These results show that the soil of the Dagang Oilfield has abundant oil-degrading microorganisms.

The two main axes explained 37.26% of the soil bacterial diversity, and soil bacterial community was clustered by both color and size, indicating that TPH, TN, and TOC determine the soil community (Fig. 4).

Fig. 4 PCoA of a TPH and TOC and b TPH and TN

Soil Bacterial Metabolic Pathway Prediction (Third Level) by KEGG

Oil-contaminated soils had more methane metabolic pathways and more polycyclic aromatic hydrocarbon degradation pathways than normal soils (Fig. 5). Methane metabolic and polycyclic aromatic hydrocarbon degradation pathways use petroleum hydrocarbons as substrates. OS8, OD6, OD5, OS2, OS9, and OD10 had abundant methane metabolic pathways (Z > 3). OS5, OD6, OS6, OS10, and OS4 had abundant polycyclic aromatic hydrocarbon degradation pathways (Z > 3).

Fig. 5 Bacterial metabolic pathway heat map (third level). (The ordinate shows the 35 most abundant metabolic pathways; the abscissa is a sample clustering tree of the evolutionary relationships of metabolic pathways. Z value: standardized relative abundance.)
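The Z values in such a heat map are usually obtained by standardizing each pathway's relative abundance across samples. The sketch below shows one common way to do this and assumes row-wise scaling with the sample standard deviation. The source does not state the exact scaling, and the function names and toy values are hypothetical.

```python
import math
import statistics


def row_z_scores(abundances):
    """Standardize one pathway's abundances across samples.

    Uses the sample standard deviation (n - 1 in the denominator),
    which is an assumption about how the heat map was scaled.
    """
    mean = statistics.mean(abundances)
    sd = statistics.stdev(abundances)
    if sd == 0:
        return [0.0 for _ in abundances]
    return [(x - mean) / sd for x in abundances]


def max_possible_z(n_samples):
    """Largest Z any single sample can reach under sample-SD scaling."""
    return (n_samples - 1) / math.sqrt(n_samples)


print(row_z_scores([1, 2, 3]))

# One sample with a high abundance among 26, the rest at zero (made-up values).
outlier_row = [0.0] * 25 + [1.0]
print(round(row_z_scores(outlier_row)[-1], 3), round(max_possible_z(26), 3))
```

With 26 samples the upper bound on a single Z value is about 4.9, so the Z > 3 threshold used in the text flags samples in which a pathway is strongly over-represented relative to the others.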

Discussion

In oil-contaminated soils, Euryarchaeota was the most abundant phylum of Archaea, which appears to be consistent with the study of Sutton et al. [20]. When TPH was 0.04–0.4043 g/g (except in OS3 and OD1), the relative abundance of Gammaproteobacteria was positively correlated with the concentration of oil pollution (Fig. 3), which agrees with a previous report [21]. In oil-contaminated soils, Proteobacteria and Firmicutes were the dominant bacteria, suggesting that they may have petroleum hydrocarbon metabolic pathways (Fig. 2). Yakimov et al. [22] demonstrated that Proteobacteria and Firmicutes can use petroleum hydrocarbons as a carbon source. The relative abundance of Cyanobacteria in nonpetroleum-contaminated soils was higher than that in oil-contaminated soils for both surface and deep soils, indicating that petroleum pollution inhibited the growth of Cyanobacteria (Fig. 2). However, Cyanobacteria can use petroleum hydrocarbons as a carbon source in oil-contaminated water [23]. We speculate that this difference arises because petroleum disperses and accumulates differently in water and in soil. Petroleum spilled in water diffuses horizontally and is diluted, whereas petroleum spilled on land penetrates vertically and is absorbed by humus [24]. This leads to the accumulation of volatile toxic substances in the soil, causing significant damage to the Cyanobacteria.

Proteobacteria, Chloroflexi, Euryarchaeota, and Verrucomicrobia belong to the top 10 most abundant phyla in oil-contaminated soils (Fig. 2), and their metabolic pathways are interconnected. Chloroflexi can degrade petroleum hydrocarbons by anaerobic respiration. Proteobacteria can consume petroleum hydrocarbons and produce organic acids [25], while Euryarchaeota can use these organic acids as intermediates [26]. Verrucomicrobia can also convert methane into formaldehyde and carbon dioxide [27].

ANOSIM showed that there were significant bacterial community differences between nonpetroleum-contaminated surface and deep soils but no significant bacterial community differences between oil-contaminated surface and deep soils (Supplementary Table S2) [28]. We assume that oxygen content and nutrients in nonpetroleum-contaminated soils differ with depth. However, in oil-contaminated soils, petroleum pollution is the main factor affecting community diversity. Oil adheres to soil particles and binds them together, which reduces the difference in the soil oxygen content between the surface and deep layers. As a result, the effects of depth on the distribution of bacterial communities in oil-contaminated soils are reduced. We found that most oil-contaminated soils could be clustered based on TPH (heat map, Supplementary Fig. S7).

LEfSe [29] revealed the TPH preferences of bacteria, which points to a way to rehabilitate oil-contaminated soils: the addition of oil-degrading bacteria to oil-contaminated soils [8]. This targeted bioaugmentation method not only improves the efficiency of the repair of oil-contaminated soils but also reduces the cost.

Redundancy analysis (RDA) and the Mantel test showed that TPH, TOC, and TN had the greatest impact on the soil bacterial community in the Dagang Oilfield (Table 2 and Fig. 3). The nutrient element that had the greatest impact on bacterial diversity was TN, which is consistent with the findings of Ward et al. [30]. The TPH was significantly negatively correlated with bacterial diversity (Supplementary Fig. S4). However, there have been varying conclusions about the impact of oil pollution on the diversity of soil bacteria. Li et al. [9] found that the oil content of soils was positively correlated with bacterial diversity in the Daqing and Karamay Oilfields. Gao et al. [13] found that the oil pollution concentration in the oil-contaminated Yellow River Delta area was positively correlated with the soil microbial diversity and evenness index. However, Nakatsu [31] and Green et al. [32] found that the concentration of PAHs was negatively correlated with the diversity of microorganisms. Wilson and Jones [33] found that an increase in oil content reduced the soil microbial diversity in the Loess Plateau.

Ward et al. [30] discovered that the Amoco oil degradation rate was much higher under aerobic conditions than under anaerobic conditions. When petroleum hydrocarbons were used as a carbon source, sufficient trace element content had a decisive influence on the microbial degradation of hydrocarbons. Margesin et al. [34] found that after adding N, P, and other nutrients to oil-contaminated areas, the total number of bacteria increased several fold over a short time, accompanied by a rapid decrease in the content of n-alkanes and isoprenoids. This observation indicates the importance of environmental factors to petroleum-degrading bacteria in the repair of petroleum-contaminated soils. The exploration of bacterial community diversity and its correlation with environmental factors in the oil-contaminated Dagang Oilfield soils can therefore inform the use of biostimulation to repair petroleum-contaminated soils.

Conclusions

In this study, we investigated 26 samples from the Dagang Oilfield to clarify the bacterial community distribution along several environmental gradients in both surface (5–10 cm) and deep (35–40 cm) petroleum-contaminated soils. There were significant differences in the structure of bacterial communities between nonpetroleum-contaminated surface and deep soils, but no significant difference was observed between petroleum-contaminated surface and deep soils. We also found that total petroleum hydrocarbon, total organic carbon, and total nitrogen content were the three main drivers of the bacterial community distribution in the Dagang Oilfield. Our findings can provide references for sampling strategy and bacterial community composition in petroleum-contaminated soils. In addition, the distribution patterns of the bacterial community and taxa, especially some petroleum-degrading bacteria (mainly Alcanivorax and Mycobacterium), along the environmental gradients in petroleum-contaminated soils provide references for the further exploitation of new genes, such as petroleum hydrocarbon degradation genes. Finally, further in-depth investigations of microbial communities in petroleum-contaminated soils at larger scales or greater depths are needed.