Introduction

Dissolved organic matter (DOM) is a complex mixture of diverse organic compounds. DOM is considered to have significant impacts on the characteristics of contaminants in the environment, including trace metal speciation, bioavailability, and toxicity [1]. Preliminary research on the heavy metal distribution in landfill leachate revealed a significant association between a large fraction of such metals and DOM derived from municipal solid waste (MSW), suggesting that DOM plays a critical role in controlling the behavior and transport of heavy metals [2]. Nevertheless, the complex nature of DOM poses a challenge to fully elucidating its characteristics within landfill leachate.

Various spectroscopic and mass spectrometric techniques have been employed to examine the characteristics of DOM. Optical characterization techniques, such as ultraviolet-absorbance spectroscopy and three-dimensional excitation and emission spectroscopy (3D-EEM) followed by chemometric analysis, have yielded valuable information about the overall properties of DOM, including its aromatic content, molecular weight, and the relative proportions of humic, fulvic, and protein-like constituents [3]. However, these methods are limited to the detection of chromophores or fluorophores and do not provide detailed molecular information. More comprehensive assessment of DOM at the molecular level has been made possible through the application of nuclear magnetic resonance (NMR) and high-resolution/ultra-high resolution mass spectrometry (HRMS/UHRMS) [4,5,6]. While Fourier-transform ion cyclotron resonance (FTICR) mass spectrometry (MS) is considered an advanced technology for DOM analysis, its widespread use is hindered by the high costs associated with instrument acquisition, maintenance, and operation. In recent years, the Orbitrap analyzer has emerged as a promising alternative due to its accessibility and suitability for analyzing complex mixtures such as DOM [7].

Orbitrap MS has been widely employed to clarify the characteristics of DOM in various environments [7,8,9], including in landfill leachate [10, 11]. Previous studies have mainly explored DOM characteristics in various landfill leachate sources in China: DOM fingerprints were investigated in MSW and pits of an incineration plant [10], DOM molecular transformation was evaluated in leachate from MSW stored for incineration [12], DOM characteristics of young and old leachate from the landfill samples derived from an anaerobic landfill and stockpile sources of MSW were compared [6], leachate DOM transformation during membrane bioreactor treatment was assessed [13], and leachate DOM derived from MSW leachate plants was compared to molecular information from a different primary membrane system [14]. However, differences in the composition of the wastes cause differences in DOM characteristics. In European countries and Japan, incineration has been the prevailing intermediate treatment for MSW since the 1970s. Incineration residues include bottom ash and fly ash, which are ultimately deposited in sanitary landfills. In general, organic matter concentrations are lower in incinerated residue landfills than in direct landfills, and the impact of organic matter is less significant. Nonetheless, certain organic matter types, such as fractions of unburned and organic byproducts, are more prevalent in incineration residues than other landfilled materials [15, 16]. Investigations of the characteristics and differentiation of DOM in landfill leachate derived from incineration residues and incombustible waste have been limited. Although the DOM concentrations in landfill leachate recorded in this study were not very high, incineration residues (fly ash and bottom ash) are considered to have high concentrations of heavy metals and other toxic compounds, which have the potential to significantly impact the surrounding environment. 
The interactions between heavy metals and DOM in leachate play a crucial role in pollutant behavior and bioavailability, emphasizing the need for further exploration of DOM composition [17].

To fill these research gaps, this study was conducted with the following objectives: to examine the molecular composition of DOM in landfill leachate; to investigate temporal and spatial variations in the molecular characteristics of DOM through data visualization and statistical analysis, with a specific focus on differences among six sampling points within the landfill; and to compare the characteristics of DOM in Japanese landfill leachate derived from incineration residue and incombustible waste with those from landfill sites in other countries.

Materials and methods

Sample collection

Leachate samples were collected bi-monthly at six monitoring wells (G1.1, G1.2, G2, G3, G4, G5) located in five sections covering nearly the whole landfill site from October 2019 to June 2020. A map of the landfill site with sampling points is shown in the Supplementary Material (Fig. S1). The landfill operated from 1976 to 1995, subsequently underwent closure, and has been devoid of new waste inputs ever since. Leachate samples were collected with the aid of a smart groundwater purging pump (Daiki Rika Kogyo, Saitama, Japan). All samples were collected in 5-L polyethylene plastic bottles and transported in coolers to the laboratory as quickly as possible.

Orbitrap MS analysis

Solid-phase extraction (SPE) was applied to all leachate samples to concentrate them and remove inorganic salts that could cause ionization suppression before unknown (non-target) screening analysis. Four liters of each leachate sample were filtered through pre-combusted glass-fiber filters (GF/B, 47 mm, Whatman), acidified with hydrochloric acid (35%, Nacalai Tesque, Inc., Kyoto, Japan) to pH 2.0 ± 0.1, and then subjected to SPE on two parallel Bond Elut PPL cartridges (1 g, 6 mL, Agilent Technologies, USA; [18]). Quality control of the SPE procedure was performed using Suwannee River natural organic matter reference material (SRNOM, 2R101N) purchased from the International Humic Substances Society (IHSS). The SPE eluates, containing the extracted DOM, were stored at −30 °C in the dark for subsequent analysis. The detailed sample pretreatment procedures are described in Text S1 of the Supplementary Material.

All DOM extracted through the SPE procedure was analyzed with an Orbitrap mass spectrometer (Exactive Plus; Thermo Fisher Scientific, Waltham, MA, USA). The Orbitrap mass spectrometer operates based on the principles of electrostatic trapping and detection of ions. The Orbitrap mass analyzer consists of a coaxial inner spindle-like electrode surrounded by an outer barrel-like electrode. The trapped ions orbit in stable trajectories around the central electrode with harmonic oscillations along it, and their motion is characterized by their frequency in the trapping field. The frequency is subsequently used to calculate the mass-to-charge ratio (m/z) of the ions. The oscillation frequencies are detected using time-domain image current detection and are transformed into mass spectra through the Fourier transform [19,20,21]. Orbitrap mass spectrometry differs from conventional mass spectrometry primarily in its method of ion trapping and detection, as well as in the resulting high-resolution capabilities. The advantages of the Orbitrap mass analyzer include high resolution for clear separation of closely spaced peaks, which is valuable for the analysis of complex samples with multiple components; high sensitivity, allowing the detection of low-abundance ions; high accuracy in mass assignment, ensuring precise identification of molecular ions; and versatility in ionization sources, providing flexibility for different sample types [20, 21]. The ionization method was electrospray ionization (ESI), which is a soft ionization technique with a low incidence of fragmentation, thereby increasing the likelihood that analytes remain intact [22]. External mass calibration was conducted daily over the m/z range of 105–1500, resulting in mass accuracy within ± 1 ppm for all ions in the standard solution collected that day. The MS spectra were obtained in negative ion mode. 
The extracted DOM was introduced to the ESI source through flow injection and the liquid chromatography (LC) system was used without a column. The mobile phases included mobile phase C of 100% methanol (MeOH; Wako, Osaka, Japan) and mobile phase B of 10 mM ammonium formate (NH4HCO2; Sigma-Aldrich, St. Louis, MO, USA). All SPE samples were mixed well, diluted to a dissolved organic carbon (DOC) concentration of approximately 282.00 mg C/L, and filtered through 0.45-μm polytetrafluoroethylene syringe filters. Filtered SPE leachate solutions were added (50% of the total volume), along with 50% 10 mM NH4HCO2, to 2-mL LC glass vials. Suwannee River fulvic acid (SRFA; 2S101F) purchased from the IHSS was used as the reference material for control of the HRMS settings. The final DOC concentrations of SPE samples and SRFA were around 141.00 mg C/L. SPE blanks and the solvent of MeOH/10 mM NH4HCO2 (50/50, v/v) were used as experimental blanks. HRMS settings were selected to optimize sensitivity, obtain stable signals, optimize the match with common molecular formulae assigned for SRFA in the literature (> 95%; [23]), and obtain the fewest unique peaks in the molecular formula assignment results. Detailed LC–MS experimental conditions are listed in Tables S1 and S2 (Supplementary Material).

Statistical analysis and visualization of data

Molecular formula assignment

In total, 50 scans were acquired for each mass spectrum and exported from Xcalibur software (Thermo Fisher Scientific) in comma-delimited (.csv) format, including m/z, intensity, resolution, and noise. Molecular formulae were assigned based on m/z peaks using the open-access package MFAssignR [24] in R (v. 4.0.2), which has several major functions. The noise level was assessed using the Kendrick mass defect (KMD) slice method with the function KMDNoise, which was modified from Riedel and Dittmar [25]. Only analyte peaks with a signal-to-noise ratio > 3 were retained. The IsoFiltR function was used to identify potential polyisotopic ion masses containing 1–2 13C or 34S prior to molecular formula assignment, so that monoisotopic and polyisotopic peaks were distinguished. Preliminary CHO assignment was conducted with the MFAssignCHO function, and CH2 homologous series were selected as internal recalibration ions. The Recal function was used to generate a list of potential recalibrant series. After recalibration of the entire m/z dataset, final molecular formula assignment was conducted using the MFAssign function and filtered using Microsoft Excel 2023 (Microsoft Corp., Redmond, WA, USA) with the following constraints: compositional formula, C, H, O, N, S; number of carbon atoms, 1–80; number of hydrogen atoms, 1–160; number of oxygen atoms, 1–40; number of nitrogen atoms, 0–2; number of sulfur atoms, 0–1; mass tolerance, ± 1.0 ppm; peak intensity, > 50,000; and elemental ratios, 0.2 ≤ H/C ≤ 3.1, 0 ≤ O/C ≤ 1.2, and 0 ≤ N/C ≤ 1.3 [26]. The peak selection conditions (signal-to-noise ratio > 3 and peak intensity > 50,000) were set to optimize matching with commonly assigned ions and reduce the number of unique ions assigned to SRFA. The seven golden rules proposed by Kind and Fiehn [26] were employed as a reference to validate the assignment of molecular formulae. 
A detailed process of molecular formula assignment using MFAssignR package for this data set is provided in the Supplementary Material (Text S2). The solvent blank, SPE blanks, SRFA, and SPE leachate samples were processed in the same manner for molecular formula assignment. The signals present in the blanks were subtracted from the sample spectra.
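The filtering constraints listed above can be illustrated with a minimal Python sketch. It is not the MFAssignR or Excel workflow used in this study; the `Formula` class and the function names are hypothetical, and the theoretical m/z of each candidate is assumed to be supplied by the assignment step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Formula:
    """Hypothetical container for an assigned CHONS formula."""
    C: int
    H: int
    O: int
    N: int = 0
    S: int = 0


def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def passes_constraints(f, measured_mz, theoretical_mz, intensity,
                       tol_ppm=1.0, min_intensity=50_000):
    """Apply the atom-count, mass-tolerance, intensity, and ratio limits."""
    # Atom-count ranges stated in the text
    if not (1 <= f.C <= 80 and 1 <= f.H <= 160 and 1 <= f.O <= 40):
        return False
    if not (0 <= f.N <= 2 and 0 <= f.S <= 1):
        return False
    # Mass tolerance of +/- 1.0 ppm and peak intensity > 50,000
    if abs(ppm_error(measured_mz, theoretical_mz)) > tol_ppm:
        return False
    if intensity <= min_intensity:
        return False
    # Elemental ratio limits: H/C, O/C, N/C
    hc, oc, nc = f.H / f.C, f.O / f.C, f.N / f.C
    return 0.2 <= hc <= 3.1 and 0.0 <= oc <= 1.2 and 0.0 <= nc <= 1.3


candidate = Formula(C=10, H=12, O=5)
print(passes_constraints(candidate, 211.06125, 211.06120, 1.0e5))
```

The signal-to-noise criterion is omitted here because it depends on the noise estimate produced by KMDNoise.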

Classification and characterization of DOM molecules based on elemental composition and a van Krevelen diagram

DOM molecules were classified by elemental composition (i.e., CHO, CHON, CHOS, or CHONS). After constructing a van Krevelen diagram using the molar ratio of hydrogen to carbon (H/C) as the ordinate and the molar ratio of oxygen to carbon (O/C) as the abscissa, five major classes of DOM could be identified based on the modified aromaticity index (AImod), O/C, and H/C values [6, 24, 27, 28]. The five classes identified included aliphatic compounds (H/C ≥ 1.5), polyphenols (0.66 ≥ AImod > 0.50, H/C < 1.5), and polycyclic aromatics (AImod > 0.66, H/C < 1.5). Moreover, for compounds with H/C < 1.5 and AImod ≤ 0.50, formulae with O/C < 0.5 were categorized as low-O unsaturated phenolic compounds, while those with O/C ≥ 0.5 were classified as high-O unsaturated phenolic compounds (refer to Table S5 of the Supplementary Material for details). Furthermore, the van Krevelen diagrams were divided into eight regions based on O/C and H/C ratios. These regions corresponded to lipids, proteins, amino sugars, carbohydrates, unsaturated hydrocarbons, lignins/carboxylic-rich alicyclic molecules (CRAM)-like organics, tannins, and aromatic organics, detailed descriptions of which are provided in Table S6 of the Supplementary Material [4, 11, 29]. To obtain deeper insights into the distribution and abundance of molecular formulae within the given O/C and H/C ratio ranges, kernel density estimation (KDE) was conducted using MATLAB software (version R2023a) to estimate the probability density of the assigned molecular formulae at six sampling points. Subsequently, the characteristics of DOM were further evaluated through comparison of the results of Orbitrap MS analysis with those obtained from EEM-parallel factor (PARAFAC) analysis.
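The five-class scheme can be sketched in Python as follows. The thresholds are those stated above; the AImod expression is not given in this paper, so the sketch assumes the commonly used corrected form of Koch and Dittmar, AImod = (1 + C − 0.5O − S − 0.5(N + H)) / (C − 0.5O − S − N), with values set to zero when the numerator or denominator is not positive. Function names are illustrative.

```python
def ai_mod(C, H, O, N=0, S=0):
    """Modified aromaticity index (assumed Koch and Dittmar form, no P)."""
    denominator = C - 0.5 * O - S - N
    if denominator <= 0:
        return 0.0
    numerator = 1 + C - 0.5 * O - S - 0.5 * (N + H)
    return max(numerator / denominator, 0.0)


def classify(C, H, O, N=0, S=0):
    """Assign one of the five DOM classes from H/C, O/C, and AImod."""
    hc = H / C
    oc = O / C
    ai = ai_mod(C, H, O, N, S)
    if hc >= 1.5:
        return "aliphatic"
    if ai > 0.66:
        return "polycyclic aromatic"
    if ai > 0.50:
        return "polyphenol"
    # Remaining formulae have H/C < 1.5 and AImod <= 0.50
    if oc < 0.5:
        return "low-O unsaturated phenolic"
    return "high-O unsaturated phenolic"


for formula in [(10, 20, 2), (20, 12, 2), (15, 12, 4), (20, 24, 6), (16, 18, 9)]:
    print(formula, classify(*formula))
```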

Carbon oxidation–reduction state and degree of unsaturation in DOM

The nominal oxidation state of carbon (NOSC) and double bond equivalent minus oxygen per carbon ([DBE − O]/C) were used to represent the oxidation–reduction state of carbon and the degree of unsaturation of DOM, respectively [6, 30–32]. The NOSC vs. (DBE − O)/C plot can be divided into four distinct zones, and the classification and description of these zones are presented in the Supplementary Material (Text S3, Table S7).
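Both indices can be computed directly from the elemental composition. The sketch below assumes the widely used expressions for neutral CHONS formulae, NOSC = 4 − (4C + H − 3N − 2O − 2S)/C and DBE = 1 + C − H/2 + N/2; these are not written out in this paper, and the zone boundaries (Text S3, Table S7) are not reproduced here.

```python
def nosc(C, H, O, N=0, S=0):
    """Nominal oxidation state of carbon for a neutral CHONS formula."""
    return 4 - (4 * C + H - 3 * N - 2 * O - 2 * S) / C


def dbe(C, H, N=0):
    """Double bond equivalent (rings plus double bonds)."""
    return 1 + C - H / 2 + N / 2


def dbe_minus_o_per_c(C, H, O, N=0):
    """(DBE - O)/C, used here as the degree-of-unsaturation axis."""
    return (dbe(C, H, N) - O) / C


# Example: C10H12O5
print(nosc(10, 12, 5), dbe_minus_o_per_c(10, 12, 5))
```

A negative NOSC indicates reduced carbon and a positive NOSC indicates oxidized carbon.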

KMD analysis for peak sorting and structural characterization

KMD analysis was employed to sort peaks of complex spectra based on their homologous relatives (CH2, H2, COO, CO, OH, C6H5), providing insights into the structural characteristics of organic compounds and their functional groups [6, 27]. At the same KMD value, a higher Kendrick nominal mass corresponds to more repeated functional groups within the DOM.
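As an illustration of the KMD calculation for the six base units, the following Python sketch rescales a mass so that the chosen base unit has an integer mass, then takes the difference between the nominal and exact Kendrick mass. Rounding the Kendrick mass to the nearest integer is an assumption (conventions differ between software packages), and the function names are illustrative.

```python
# Monoisotopic masses of the elements that make up the base units
ATOMIC_MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}

# Base units (homologous relatives) named in the text
BASES = {
    "CH2": {"C": 1, "H": 2},
    "H2": {"H": 2},
    "COO": {"C": 1, "O": 2},
    "CO": {"C": 1, "O": 1},
    "OH": {"O": 1, "H": 1},
    "C6H5": {"C": 6, "H": 5},
}


def base_exact_mass(base):
    """Exact mass of a base unit from its elemental composition."""
    return sum(ATOMIC_MASS[el] * n for el, n in BASES[base].items())


def kendrick(mass, base="CH2"):
    """Return (Kendrick nominal mass, Kendrick mass defect) for a mass."""
    exact = base_exact_mass(base)
    kendrick_mass = mass * round(exact) / exact
    nominal = round(kendrick_mass)  # assumed rounding convention
    return nominal, nominal - kendrick_mass


# Two members of one CH2 homologous series share the same KMD
m1 = 211.061197
m2 = m1 + base_exact_mass("CH2")
print(kendrick(m1), kendrick(m2))
```

Peaks that differ only by multiples of the base unit fall on a horizontal line in the KMD plot, with the Kendrick nominal mass increasing by the nominal mass of that unit at each step.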

Hierarchical cluster analysis of Orbitrap MS data

Mass lists of duplicate Orbitrap MS data for each sampling month were merged into a single dataset by averaging the intensity values of the duplicate measurements. In total, 30 datasets (six sampling points × five sampling months) were combined using the outerjoin function of MATLAB software. The first column of the merged dataset represents m/z and was used for data merging. The subsequent 30 columns contained the intensity values of the 30 samples. The resulting data matrix was transformed into a binary format, where a peak was classified as present (peak intensity = 1) or absent (peak intensity = 0). Furthermore, the data were normalized by dividing the peak counts by the total number of molecular formulae assigned to each sample. This approach allows peak intensity differences between m/z values and among spectra to be ignored, as described by Kujawinski et al. [33]. Hierarchical cluster analysis was conducted using the pheatmap function in R with the ward.D linkage clustering method and the Bray–Curtis distance measure.
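The preprocessing steps described above (outer join, presence/absence transformation, normalization, and Bray–Curtis distance) can be sketched in plain Python. This is not the MATLAB/R code used in the study: samples are represented as hypothetical sets of formula identifiers rather than m/z columns, and the clustering step itself is left to a dedicated package.

```python
def presence_profile(formulae, all_formulae):
    """Binary presence vector, normalized by the sample's formula count."""
    n = len(formulae)
    return [1.0 / n if f in formulae else 0.0 for f in all_formulae]


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two non-negative vectors."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator


def distance_matrix(samples):
    """Pairwise Bray-Curtis distances after an outer join of all samples."""
    names = sorted(samples)
    # Outer join: the union of every formula seen in any sample
    all_formulae = sorted(set().union(*samples.values()))
    profiles = {k: presence_profile(samples[k], all_formulae) for k in names}
    return {(p, q): bray_curtis(profiles[p], profiles[q])
            for p in names for q in names}


# Hypothetical toy data; the identifiers are placeholders, not results
toy = {
    "A": {"f1", "f2", "f3", "f4"},
    "B": {"f3", "f4", "f5", "f6"},
    "C": {"f7", "f8"},
}
dist = distance_matrix(toy)
print(dist[("A", "B")], dist[("A", "C")])
```

Samples that share all formulae have a distance of 0, and samples with no formulae in common have a distance of 1, regardless of peak intensities.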

Results and discussion

SPE results and molecular formula assignment of reference material

The recovery of DOC from Suwannee River natural organic matter reference material during the SPE procedure was 88.6% when the initial DOC concentration was 20.59 mg C/L and a feed volume of 63 mL was used. This result is consistent with the study conducted by Li et al. [34], which reported DOC recovery of up to 89% with a feed volume of 62.5 mL and an initial DOC concentration of 20 mg C/L. For all leachate samples (G1.1, G1.2, G2, G3, G4, and G5), the recovery rates of DOM using SPE were 62.5 ± 4.3%, 60.9 ± 3.8%, 68.7 ± 1.3%, 55.0 ± 2.8%, 66.7 ± 3.2%, and 59.3 ± 8.3% (n = 10), respectively, across 5 sampling months. The experiment yielded reproducible results for duplicate samples (Table S8, Supplementary Material). These DOC recoveries are acceptable relative to previous studies using PPL cartridges [18, 35].

On each Orbitrap MS measurement day, SRFA was analyzed alongside all samples. Hawkes et al. [23] reported SRFA composition obtained with 17 different HRMS instruments from an international laboratory comparison. The assigned ions for a given sample (i.e., SRFA) that were common across all or all but one instrument were designated “common,” and ions that were clearly detected and assigned by at least three instruments were categorized separately as “detected ≥ 3” ions. Without accounting for the isotopic pattern associated with each molecular formula, Hawkes et al. [23] suggested that a detection rate of > 95% of common ions, together with O/C = 0.52 ± 0.032, H/C = 1.09 ± 0.044, m/z = 423 ± 46, and AImod = 0.34 ± 0.02 (< ± 2 standard deviations from the mean), indicates good performance and reproducible measurement of the reference material SRFA at a mass error of ± 1.0 ppm. The results of this study indicate a match percentage of common peaks of 97.1 ± 0.2% (n = 6), which meets the threshold of Hawkes et al. [23]. The average O/C and H/C ratios, m/z, and AImod of all molecular formulae in this study (O/C = 0.51 ± 0.004, H/C = 1.10 ± 0.007, m/z = 422 ± 1.2, AImod = 0.33 ± 0.003; n = 6) also met the thresholds proposed by Hawkes et al. [23]. However, to assign molecular formulae more confidently, an isotopic pattern check is necessary, such as the one included in the IsoFiltR function of the MFAssignR package.

There were 2252 ± 108 assigned molecular formulae in SRFA peaks, of which 1357 ± 63 included matching isotopic peaks containing 1 13C, and 203 ± 17 included matching isotopic peaks containing 2 13C. Finally, only monoisotopic peaks with at least one corresponding isotopic peak were selected with the assigned molecular formula. After an isotopic pattern check, the results showed a match rate of common peaks in SRFA of 82.5 ± 1.5%. Hawkes et al. [23] compared the SRFA composition data of D’Andrilli et al. [36] with the database, stating that 75% of the common SRFA peaks were found in D’Andrilli et al. [36]. Some low-abundance peaks were not assigned due to a more conservative approach in molecular formula assignment, considering elemental and isotopic constraints, or MS 13C peak confirmation, as reported by D’Andrilli et al. [36] [23]. Additionally, Song et al. (2021) reported that 83% of the detected common ions in the SRNOM sample, using Orbitrap ESI-HR-MS and the MFAssignR package, were found in the standard results from Hawkes et al. [23]. This indicates that our results were reasonable when considering isotopic pattern checks in molecular formula assignment. Additionally, following the isotopic pattern check, the metric values obtained from the molecular formula were as follows: O/C = 0.51 ± 0.004, H/C = 1.11 ± 0.008, m/z = 421 ± 1.6, and AImod = 0.33 ± 0.004 (n = 6). All metric values fell within 2 standard deviations of the average results reported by Hawkes et al. [23], suggesting they were not statistically different from the sample of instruments studied by Hawkes et al. [23] at a 95% confidence level. Our results for molecular formula assignment were in good accordance with those acquired from international instruments. This data analysis procedure, including isotopic pattern check, was applied in the same manner to all landfill leachate samples.
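The idea behind the isotopic pattern check can be illustrated with a simplified Python sketch that searches for a 13C partner peak. It assumes singly charged ions, uses the 13C − 12C mass difference of 1.0033548 Da, and reuses the ± 1.0 ppm tolerance from the formula assignment. It is not the IsoFiltR algorithm, which applies additional criteria, and the peak list is hypothetical.

```python
from bisect import bisect_left

C13_SHIFT = 1.0033548  # mass difference between 13C and 12C (Da)


def has_13c_partner(mz, sorted_mzs, n_13c=1, tol_ppm=1.0):
    """True if a peak lies at mz + n_13c * C13_SHIFT within tol_ppm."""
    target = mz + n_13c * C13_SHIFT
    window = target * tol_ppm * 1e-6
    # Locate the first peak at or above the lower edge of the window
    i = bisect_left(sorted_mzs, target - window)
    return i < len(sorted_mzs) and sorted_mzs[i] <= target + window


# Hypothetical peak list (ascending m/z); not measured data
peaks = [211.06120, 212.06455, 225.07685]
confirmed = [mz for mz in peaks if has_13c_partner(mz, peaks)]
print(confirmed)
```

Only monoisotopic peaks with a matching partner would be retained under this simplified rule, which mirrors the selection of peaks with at least one corresponding isotopic peak described above.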

Visualization of Orbitrap MS data for leachate DOM

Elemental composition and DOM classification based on the Van Krevelen diagram

In total, 878–1462 molecular formulae were assigned for each sampling time and 1265–1845 molecular formulae were assigned to each sampling point (based on the combined assignment results of 10 measurements) across all six sampling points during the five sampling months. The detailed results of the assigned molecular formulae for each sample are presented in Table S9 of the Supplementary Material. Among sampling points, G3 samples had the most CHON-containing molecular formulae (32.5–35.5%). G2 and G5 had the most CHOS-containing molecular formulae (12.1–17.6% and 11.2–15.7%, respectively), while G4 was distinguished by a large number of CHO-containing molecular formulae (87.6–88.9%) (Fig. 1).

Fig. 1 Elemental composition of DOM in landfill leachate

Figure 2a shows that the distribution of DOM molecular formulae in landfill leachate, as observed using the van Krevelen diagram constructed in this study (O/C: 0.1–0.7, H/C: 0.7–2.0), differed compared to previous research. Young leachate, obtained from a temporary stockpile of MSW [6] and the pit of an incineration plant during the initial 5–7 days of waste fermentation [12], exhibited O/C values of 0.1–0.5 and H/C values of 0.6–2.2. Meanwhile, the DOM of mature leachate collected from a 27-year-old anaerobic landfill showed O/C of 0.1–0.9 and H/C of 0.5–2.1 [6, 12, 37]. Liu et al. [6] noted that DOM in young leachate is primarily derived from the hydrolysis of fresh organic constituents within MSW, including proteins, polysaccharides, and (hemi)cellulose. This process results in higher H/C ratios and lower O/C ratios on the van Krevelen diagram compared to mature leachate. Conversely, in the case of mature leachate generated through continuous landfilling, a diverse array of organic materials is present and microorganisms progressively consume the bioavailable components.

Fig. 2 (a) Van Krevelen diagram, (b) percentage of the number of assigned molecular formulae, and (c) percentage of relative abundance of DOM classes in G1.1, G1.2, G2, G3, G4, and G5 samples

In contrast to the direct landfilling of MSW with no incineration process, MSW in Japan undergoes incineration prior to landfilling. The landfill sections sampled in this study were filled 25–44 years ago and no longer receive new waste input. The carbon content was reported as 2.00–5.55% (mean: 3.59 ± 1.29%) in the waste layers and 0.22–3.28% (mean: 1.33 ± 1.13%) in the soil layers of an old incineration residue landfill [38, 39]. These findings suggest the potential for DOM release from landfill waste, with leachate DOM attributed to incineration residues, incombustible waste, microbial landfill activity, and soil cover. The O/C ratios obtained in this study are higher than those for young leachate but lower than those of mature leachate from conventional landfill sites, while the H/C ratios are lower in both of these types of leachate than in the present study. Thus, differing waste management practices result in variations in waste composition, consequently leading to differences in landfill leachate composition.

The leachate mainly consists of low-oxygen unsaturated phenolic compounds (40.5–54.0%) and aliphatic compounds (19.4–47.3%) (Fig. 2a–b). High abundances of polyphenolic compounds in G4 (14.0%) and high-O unsaturated phenolic compounds in G4, G2, and G5 (13.2–20.3%), especially G4, were observed, while a large amount of aliphatic compounds was found in G3 (47.3%). Among samples, G5 had the largest percentage of low-O unsaturated phenolic compounds. The distribution of DOM regions showed similar patterns between samples G1.1 and G1.2, as these two sampling points were located in the same section. The most notable difference in DOM characteristics among leachate sampling points was observed between the G3 and G4 samples. The abundance and relative intensity of compounds followed a similar trend to the distribution of assigned molecular formulae within each region (Fig. 2b–c). This suggests a consistent relationship between the abundance of compounds and the assignment of molecular formulae within regions. Another approach, classifying the van Krevelen diagram into eight distinct regions, yielded similar results, as illustrated in Fig. S3 of the Supplementary Material. The majority of molecular formulae in leachate DOM occurred in the lignin/CRAM-like region (51.5–74.6%) and protein region (14.7–37.5%), followed by the lipid region (3.2–5.8%). Few components were located in the amino sugar, tannin, unsaturated hydrocarbon, or condensed aromatic region. The carbohydrate region contained almost no components. Within G3, the CHON-containing molecular formulae were predominantly associated with the protein region (30.8%) and lignin/CRAM region (62.8%), while in G4, most CHON-containing molecular formulae were found in the lignin/CRAM region (91.8%).

The distribution of molecular formulae assigned to various regions and their contributions to the characteristics of the DOM identified in this study differed from previous research. Liu et al. [6] reported a higher proportion of aliphatic compounds (75%) and a lower proportion of low-oxygen unsaturated phenolic compounds (24%) in young leachate, while mature landfill leachate primarily contained high-oxygen unsaturated phenolic compounds (37%), followed by low-oxygen unsaturated phenolic compounds (28%) and aliphatic compounds (27%).

Based on the categorization of DOM using the van Krevelen diagram, G3 samples exhibited a distinctive profile characterized by a high abundance of CHON-containing molecular formulae. Stedmon and Cory [40] reported that proteins and their degradation products constitute a biodegradable fraction of DOM preferred by bacteria, which was considered an indicator of the presence and proportion of labile organic carbon and nitrogen. This observation aligns with findings obtained via EEM-PARAFAC, wherein G3 samples demonstrated the highest maximum fluorescence intensity of protein-like components among all sampling points during a 13-month investigation [41]. Furthermore, G3 samples exhibited the highest biological index value, indicating a potential association with biological activity or the presence of biologically labile compounds [41]. In G4 samples, the prominence of CHO-containing molecular formulae can be attributed primarily to the degradation products of lignin/CRAM-like compounds, proteins, and lipids. The van Krevelen diagram also indicates a higher abundance of polycyclic aromatic compounds in G4 (3%) compared to other sampling points (Fig. 2a–b). Seo et al. [42] noted that incineration residues in solid waste may contain persistent organic pollutants such as dioxins and polycyclic aromatic hydrocarbons. Therefore, polycyclic aromatic compounds in the leachate were likely derived from incineration residues. The proportion of polyphenols was highest in G4 (15%) and ranged from 3.5% to 5.8% in all other samples. Polyphenols were present as precursors released from the cell walls of microorganisms and were liberated during the degradation of substances such as lignins. Through polyphenol oxidases, these polyphenols are oxidized to quinones, which condense with themselves or with various amino groups to form humic substances characterized by large, high-molecular-weight molecules [43].

Figure 3 shows kernel density plots depicting the distribution of DOM on the van Krevelen diagram for all six sampling points. On the density plot, regions with higher density values are displayed as intense blue colors, while regions with lower density values appear lighter. In KDE, both the intensity of each data point and the number of points in a particular region contribute to the density calculation. The G3 sample exhibits the highest density at high H/C values of around 1.5 (range: 1.3–1.7) and low O/C values of around 0.3 (range: 0.2–0.4). Samples G1.1, G1.2, and G2 had H/C and O/C ratios of 1.2–1.5 and 0.2–0.4, respectively, while for sample G5 they were 1.2–1.5 and 0.3–0.5, respectively. Notably, sample G4 was characterized by an apparent pattern along a diagonal line, with H/C and O/C ratios of 0.7–1.5 and 0.2–0.6, respectively, in the region of highest density. The density values were highest in samples G5 and G1.1, at 5.0, followed by samples G1.2, G2, and G3 (4.5), and finally by sample G4 (3.0). Interestingly, sample G4 exhibited the widest concentration range among the six sampling points. According to Kim et al. [27], the lines in the van Krevelen plot correspond to the following reactions: (A) demethylation, methylation, or alkyl chain elongation; (B) dehydrogenation or hydrogenation; (C) condensation or hydration; and (D) oxidation or reduction. At all six sampling points, some DOM is located in the region of high H/C and low O/C. However, along the diagonal line representing demethylation (line A), high-density compounds concentrated at lower H/C and higher O/C values were most apparent in G4, followed by G5. This indicates that the demethylation process is likely strongest in G4, decreasing in the order G4 > G5 > G2 > G1.2, G1.1 > G3. Furthermore, for oxidation, dehydrogenation, and condensation reactions, sample G4 had more compounds on the van Krevelen plot compared to the other samples.
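To illustrate how both peak intensity and point count enter the density, the following Python sketch evaluates an intensity-weighted two-dimensional Gaussian kernel density estimate on van Krevelen coordinates. The fixed bandwidths and the function name are assumptions for illustration; the MATLAB settings used in this study are not specified here, so the absolute density values will differ from those in Fig. 3.

```python
import math


def weighted_kde(points, weights, x, y, hx=0.05, hy=0.1):
    """Intensity-weighted 2D Gaussian KDE evaluated at (x, y).

    points  : list of (O/C, H/C) pairs for the assigned formulae
    weights : peak intensities used as kernel weights
    hx, hy  : assumed bandwidths along the O/C and H/C axes
    """
    total_weight = sum(weights)
    normalization = 2 * math.pi * hx * hy * total_weight
    density = 0.0
    for (px, py), w in zip(points, weights):
        z = ((x - px) / hx) ** 2 + ((y - py) / hy) ** 2
        density += w * math.exp(-0.5 * z)
    return density / normalization


# Hypothetical formulae clustered near O/C = 0.3, H/C = 1.5
pts = [(0.30, 1.50), (0.32, 1.45), (0.28, 1.55), (0.60, 0.80)]
wts = [3.0e5, 2.0e5, 2.5e5, 0.5e5]
print(weighted_kde(pts, wts, 0.30, 1.50) > weighted_kde(pts, wts, 0.60, 0.80))
```

A region scores a high density either because many formulae plot close together or because a few formulae there carry high intensity.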

Fig. 3 Kernel density plots of DOM distributions

Carbon oxidation–reduction state and degree of unsaturation of DOM

The carbon oxidation–reduction state and degree of unsaturation of DOM from six sampling points are presented in Fig. 4. The majority of DOM present in landfill leachate was in a reduced state (76.2–95.4%), primarily falling within Zone 2 (reduced and unsaturated states) and Zone 3 (reduced and saturated states). The findings of this study were compared with those of Liu et al. [6], who observed that the majority of DOM in young leachate fell within Zone 3 (reduced and saturated state). Almost no DOM was in the oxidized state in young leachate, while mature leachate exhibited a more even distribution across all four zones [6].

Fig. 4 NOSC vs. (DBE − O)/C plots of G1.1, G1.2, G2, G3, G4, and G5

Samples G4 and G5 in this study had larger proportions of compounds in the oxidized zones compared to the other sampling points, at 22.1% and 11.4%, respectively. Around 76.3% of compounds in the G4 samples were in a reduced state, while 23.1% were in an oxidized state. On the other hand, the DOM distributions at other sampling points had a low proportion of DOM compounds belonging to the oxidized zone (4.1–8.8%). Among samples, G4 exhibited the highest average modified aromaticity index value, of 0.319, followed by G5 (0.244), G2 (0.233), G1.1 (0.228), G1.2 (0.217), and finally G3 (0.178) (Fig. 4). This finding is consistent with the highest value of specific ultraviolet absorbance at 254 nm (SUVA254) being observed in G4 (4.5 ± 0.3 L/mg C·m), followed by G5, with a SUVA254 of 3.8 ± 1.3 L/mg C·m. The remaining samples displayed lower SUVA254 values ranging from 2.5 to 3.6 L/mg C·m [41]. This finding highlights the large fraction of aromatic compounds in DOM at sampling points G4 and G5.

Molecular structure

At the same KMD(CH2) and KMD(H2) values, the DOM in G3 had a higher Kendrick nominal mass compared to G4, G5, and G2, indicating more organics containing CH2 structures in G3 and more H abstraction reactions occurring in G4 [6] (Fig. 5a–b). At the same KMD(COO), KMD(CO), KMD(OH), and KMD(C6H5) values, the DOM in G4, G5, and G2 exhibited a higher Kendrick nominal mass than the DOM in G3, G1.1, and G1.2, suggesting more carboxyl, carbonyl, and hydroxyl groups, unsaturated bonds, and organics with benzene-ring structures (Fig. 5c–f). This result indicates a greater variety of functional groups and a more structurally complex composition of organics in G4 and G5 compared to G3. Various reactions occurring in the landfill, including demethylation, hydrogen abstraction, carboxylation, and aromatization, could lead to increased oxidation of the refractory fractions of DOM due to the involvement of oxygen [13]. Additionally, demethylation and hydrogen abstraction play significant roles in enhancing the unsaturation and aromaticity of DOM [13]. These reactions may explain the higher abundances of compounds located in the oxidized Zones 1 and 4, as well as the higher aromaticity of G4 samples described in Sect. "Carbon oxidation–reduction state and degree of unsaturation of DOM" and the opposite trend of sample G3.

Fig. 5

Plots of Kendrick mass defect (KMD) versus Kendrick nominal mass for leachate DOM at the six sampling sites over 5 months, with base units a CH2, b H2, c COO, d CO, e OH, and f C6H5

By using Orbitrap MS, a recently emerging technology for unknown screening analysis of DOM, together with data analysis and visualization techniques, we were able to elucidate the differences in DOM characteristics. This was achieved by examining elemental composition, categorizing DOM compounds using van Krevelen diagrams, analyzing reactions via kernel density plots, assessing carbon oxidation–reduction states and degrees of unsaturation, and exploring chemical structure based on Kendrick mass defect plots. These differences were most prominent between samples G3 and G4, even though both originated from the same old Japanese landfill site. This result contrasts with previous research findings [6], which indicated distinct variations in DOM characteristics depending on landfill age, presumably due to a series of reactions occurring during the transformation from young to old landfill stages.

Hierarchical cluster analysis of Orbitrap MS data

Figure 6 illustrates the dissimilarity among sampling points over 5 months of sampling. In general, differences among sampling times were minimal, whereas the constituents of leachate DOM varied more clearly among sampling sites. Samples G1.1 and G1.2 were most similar in terms of molecular formula composition because they were collected in the same section of the landfill site, G1, while the G3 sample was most different from the other samples. The G4 samples were distinct from the clustered G1.1, G1.2, G2, and G5 samples. As shown in Fig. S4a–b (Supplementary Material), DOM compounds mainly cover the m/z range of 200–600, with G4 having more compounds of m/z 400–600 and G3 having more compounds of m/z 200–400. The G5 samples collected in October 2019 and February 2020 had the largest numbers of assigned molecular formulae and were similar to G2 samples, while G5 samples collected in December 2019 and in April and June 2020 had fewer assigned molecular formulae and clustered with samples from G1.1, G1.2, and G2. The amount of DOC in sample G5 was lowest in April and December 2019, and in June 2020, among all sampling times, with values of 4.8, 2.6, and 3.7 mg C/L, respectively (Table S8, Supplementary Material). A positive correlation (r = 0.41, p < 0.001, n = 72) was found between DOC and the groundwater level, suggesting that hydrological factors influenced the characteristics of DOM and led to changes in the molecular formulae detected in the G5 samples [41]. Considering the landfilling time, the variation in molecular composition among samples is unlikely to depend on the operating period. Samples were collected from various points within the old landfill site that were filled at different times: G1.1 and G1.2 (1976–1985), G2 (1977–1986), G3 (1985–1989), G4 (1987–1993), and G5 (1994–1995). Although G3 and G4 operated over almost the same period, they showed the greatest discrepancy in DOM characteristics. Furthermore, although G5 samples were collected from an extended disposal area and the G2, G1.1, G1.2, and G4 samples were collected from an old disposal area, their molecular compositions were similar to some extent, as observed in the heatmap of assigned compounds.
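As an illustration of how samples can be grouped by the molecular formulae they share, the sketch below runs a small agglomerative (hierarchical) clustering on presence/absence data. The Jaccard distance, the average-linkage rule, the sample names, and the toy formula lists are all assumptions chosen for the example; the distance measure and linkage method used for Fig. 6 are not specified in this section.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - shared formulae / all formulae found in either sample."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(samples):
    """Merge the two closest clusters until one remains.

    `samples` maps a sample name to its set of assigned formulae.
    Returns the merge history as (cluster_1, cluster_2, distance).
    """
    clusters = {name: [name] for name in samples}
    merges = []
    while len(clusters) > 1:
        best = None
        for x, y in combinations(sorted(clusters), 2):
            pairs = [(i, j) for i in clusters[x] for j in clusters[y]]
            dist = sum(
                jaccard_distance(samples[i], samples[j]) for i, j in pairs
            ) / len(pairs)
            if best is None or dist < best[0]:
                best = (dist, x, y)
        dist, x, y = best
        clusters[x + "+" + y] = clusters.pop(x) + clusters.pop(y)
        merges.append((x, y, round(dist, 3)))
    return merges


# Hypothetical toy data, not taken from the study.
toy = {
    "S1": {"C10H20O2", "C11H22O2", "C12H24O2", "C9H8O3"},
    "S2": {"C10H20O2", "C11H22O2", "C12H24O2", "C9H8O4"},
    "S3": {"C15H12O6", "C16H14O6", "C9H8O3"},
}

merges = average_linkage(toy)
for step in merges:
    print(step)
# ('S1', 'S2', 0.4)
# ('S1+S2', 'S3', 0.917)
```

In this toy case the two samples that share three of five formulae merge first, and the third sample joins last at a much larger distance, which mirrors how a strongly dissimilar sampling point separates from a cluster of similar ones.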

Fig. 6

Heatmap of assigned compounds across all landfill leachate samples. Color intensity for each compound is 1 divided by the number of assigned compounds in that sample when the compound is present, and 0 when it is absent.
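The intensity rule in the caption can be written out as a short calculation. This is a minimal sketch with a hypothetical function name and made-up formulae; it only restates the caption's rule and is not the plotting code used for the figure.

```python
def presence_intensity(assigned, all_formulae):
    """Heatmap row for one sample: 1/n if a formula is present, else 0,
    where n is the number of formulae assigned in that sample."""
    n = len(assigned)
    if n == 0:
        return [0.0 for _ in all_formulae]
    return [1.0 / n if f in assigned else 0.0 for f in all_formulae]


# Hypothetical column order and sample content, for illustration only.
all_formulae = ["C10H20O2", "C11H22O2", "C9H8O3", "C15H12O6"]
row = presence_intensity({"C10H20O2", "C9H8O3"}, all_formulae)

print(row)  # [0.5, 0.0, 0.5, 0.0]
```

Because each row sums to 1, a sample with many assigned formulae shows fainter cells than a sample with few, so color intensity reflects both presence and the total number of assignments per sample.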

This study demonstrates that the characteristics and differentiation of leachate DOM derived from incineration residues and incombustible waste beneath soil cover differ from those of young and mature leachate DOM originating from anaerobic landfill sites, underscoring that waste composition, rather than landfill age, plays the primary role in shaping DOM characteristics.

Conclusion

This study investigated the characteristics and variations of DOM in leachate from an old Japanese landfill site containing incineration residues and incombustible waste using ESI Orbitrap MS and spectral analysis. The DOM in landfill leachate was dominated by CHO compounds (58.5–88.9%), in contrast to conventional mature landfill leachates, which typically have a larger proportion of CHON compounds. The majority of DOM fell within the lignin-CRAM region (51.5–74.6%), followed by the protein region (14.7–37.5%). Additionally, the DOM mainly contained unsaturated phenolic compounds (40.5–54.0%) and aliphatic compounds (19.4–47.3%). A large proportion of compounds fell into reduced zones (76.2–95.4%) rather than being evenly distributed between reduced and oxidized zones, as is typically observed in mature landfill leachates. Waste composition was found to have a greater influence on DOM characteristics than landfill age. The variations in molecular composition among sampling points were more pronounced than the temporal variations. Among the six sampling points, samples G3 and G4 exhibited marked differences in DOM characteristics. Sample G3 had high proportions of CHON compounds (32.5–35%) and proteins, and aliphatic compounds were its most abundant class (64.2%). This suggests a larger contribution of biodegradable DOM to sample G3 than to the other samples. Conversely, sample G4 exhibited a significant contribution from CHO compounds and the highest relative abundance of low-oxygen, unsaturated phenolic compounds (49.5%). Compared to the other samples, G4 had a larger percentage of polyphenol and polycyclic aromatic compounds, which were least abundant in G3. Furthermore, sample G4 exhibited the highest degree of aromaticity, as well as the most carbonyl, carboxyl, and hydroxyl functional groups and benzene rings among all samples. These characteristics indicate that G4 has a high content of refractory DOM, which is resistant to degradation. Moreover, the demethylation process was most apparent in the G4 samples. Ranked by the refractory nature of their DOM, the samples followed the order G4 > G5 > G2 > G1.2 and G1.1 > G3. These findings will facilitate subsequent exploration of the differing binding capabilities of DOM and heavy metals present in landfill leachate.