1 Introduction

Natural organic matter (NOM) is a complex mixture of degraded natural compounds and plays a crucial role in global carbon cycling. Dissolved organic matter (DOM) refers to the fraction of bulk NOM that passes through a 0.7 μm glass/quartz fiber filter. DOM is actively involved in many biogeochemical processes, e.g., cycling of carbon, binding of metals and microbial growth. It offers building blocks and energy sources for aquatic biota, limits light penetration in surface water and interacts with anthropogenic compounds (Opsahl and Benner 1997; Wagner et al. 2015; Wu and Tanoue 2001). DOM also causes issues such as water odor, disinfectant demand, membrane fouling, and the production of carcinogenic disinfection byproducts (Zhou et al. 2022). The elemental composition and molecular reactivity of DOM can reveal its role in biogeochemical processes, including its origin, storage, transport, decomposition and release as greenhouse gases (Nebbioso and Piccolo 2013; Nelson and Siegel 2013).

Understanding the biogeochemistry of DOM is a prerequisite for managing the ecosystem services provided by soil, glacier, lake, riverine, ocean, permafrost, and atmosphere systems. First, soil organic matter (SOM) contains at least three times as much carbon as is estimated in either the atmosphere or terrestrial vegetation (Schmidt et al. 2011), and the leaching of SOM leads to the formation of DOM (Roth et al. 2019; Ye et al. 2020). Such an organic carbon pool is sensitive to changes in the local environment, and human activities additionally affect the biogeochemical cycling of SOM and DOM (Wang et al. 2019a; Wang et al. 2021c; Zhang et al. 2020a). DOM also carries a large variety of functional groups and surface areas, which make it an effective carrier for metals, salts, nutrients, and organic pollutants. The fluxes of continental DOM together with these adducts have important influences on inland waters, lakes, and marine ecosystems. For instance, lake DOM is mostly derived from terrestrial sources (Fu et al. 2006; Wu and Tanoue 2001), and along with freshwater discharge, it can determine productivity in coastal areas. Meanwhile, the ocean is regarded as the ultimate destination of terrestrial DOM, whose transformation through freshwater systems serves as an important regulator of the quantity of DOM delivered to marine systems (Coble 2007; Tranvik et al. 2009). Finally, it has been estimated that the permafrost regions in the Arctic contain 1700 Pg of organic carbon, and that climate warming in the Arctic will lead to permafrost thaw and cause the release of 41–288 Pg of carbon by the year 2100 (Schuur et al. 2013). The majority of this large amount of ancient DOM will ultimately be transferred to the atmosphere and exert a profound positive feedback on climate change (Vonk et al. 2013).

Many early studies aimed to determine the biogeochemical cycles of DOM (e.g., Fu et al. 2006); however, achieving a molecular-level understanding of its individual compounds remains the foremost challenge. For this reason, detailed characterization is a crucial step in deciphering DOM’s chemical role and ultimate fate on a global scale.

Despite recent achievements, a detailed structural characterization of DOM and of its variations under different environmental conditions remains largely elusive (Minor et al. 2014). A major issue is that DOM is often present at low concentrations (e.g., 0–10 mg L–1 in water) compared to the much higher concentration of inorganic salts (approximately 35 g L–1 in seawater), which can greatly interfere with the analyses (Sandron et al. 2015). The second problem is that a typical DOM sample consists of tens of thousands of compounds. The heterogeneity of their compositions and structures makes it impossible to separate the mixture into individual compounds by liquid chromatography (LC), electrophoresis, or isoelectric focusing (Qi et al. 2022; Sandron et al. 2015).

Bulk analyses using isotopic values, ultraviolet-visible absorption spectrophotometry (Chen et al. 2020) and excitation-emission matrix (EEM) fluorescence spectroscopy with Parallel Factor Analysis (PARAFAC) (Yamashita and Jaffé 2008) can extract information on a large portion of the DOM pool, but do not offer structural information on the individual components. This limitation also applies to advanced physical methods such as soft X-ray spectroscopy (Abe et al. 2005), nuclear magnetic resonance (NMR) spectroscopy and fluorescence polarization (Cory and McKnight 2005), which mostly work well only for pure samples. These approaches require isolation and concentration steps prior to analysis and consequently fractionate the DOM pool.

DOM in the atmosphere, water, soil, and sediment is characterized by extraordinary molecular diversity (Chen et al. 2022). Comprehensive chemical and molecular compositional profiles of DOM can be obtained using mass spectrometry (MS) with ultra-high resolution and mass accuracy (MA). MS has been revolutionized by the pursuit of instruments with increasing mass resolving power (R), making it possible to calculate elemental formulas (CcHhOoNnSsPp…, where c, h, o, n, s and p are the numbers of the respective elements) of tens of thousands of compounds simultaneously from accurate mass measurements. Among the different MS instrument types, time-of-flight (TOF), orbitrap and Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) are the three main high-resolution mass analyzers. TOF MS routinely achieves only a broadband R above 10,000 and an MA within 5 to 10 ppm, which is a much lower performance than that of the two FT instruments (orbitrap and FT-ICR MS) (Boesl 2017). Nevertheless, TOF mass analyzers offer a fast scan rate and no theoretical upper limit for the mass-to-charge ratio (m/z). These advantages make them particularly useful for probing large biomolecules such as intact proteins. The orbitrap, a newer addition to the family of Fourier transform mass analyzers whose prototype was introduced in 1999, has been widely distributed since its commercial launch in 2004 (Hu et al. 2005). Experiments on a modified Orbitrap Elite instrument showed that R over 1,000,000 could be achieved with appropriate tuning, and such performance allowed the isotopic fine structure of molecules to be resolved (Denisov et al. 2012). The FT-ICR MS, first introduced in 1974 (Comisarow and Marshall 1974), currently exhibits the highest R and best MA among the various mass analyzers. Lozano et al. (2019) demonstrated a new strategy to acquire a constant ultrahigh R over 3,000,000 across a broad m/z range from 260 to 1500 for complex mixture analysis.

Of the wide varieties of MS instruments available, the FT-ICR MS holds the greatest potential for state-of-the-art NOM research, given its inherently ultra-high R and MA for unequivocal molecule assignments (LeClair et al. 2012; Zhang et al. 2012; Zhang et al. 2020b). For this reason, FT-ICR MS is currently the instrument of choice in geochemistry, environmental chemistry, and atmospheric chemistry for the analysis of organic matter in different environmental matrices at the molecular level, including its properties and reactivity, with sample consumption of only microgram amounts (Qi et al. 2022; Sleighter and Hatcher 2007; Zhang et al. 2020b). However, regardless of the recent achievements, many challenges remain in applying FT-ICR MS to DOM-related research. This review article summarizes the history, principles, recent advances, analytical strategies, novel applications, and future challenges encountered in the analysis of complex DOM systems.

2 Principles of FT-ICR MS

In this section, the most critical principles on FT-ICR MS are briefly introduced; more detailed reviews can be found elsewhere (Qi and O'Connor 2014). Generally, a sample must first be ionized, producing gas-phase ions, which are then analyzed and detected according to their mass-to-charge ratio (m/z), generating the mass spectrum by plotting signal intensity against m/z.

In FT-ICR MS, ions are produced externally in the ion source, accumulated in the ion optics, and transferred into an ion cyclotron resonance cell that is embedded in a spatially uniform magnetic field (B) and an electric field (E) (Fig. 1). In the cell, ions with the same m/z have the same fundamental cyclotron frequency (ωc), which is proportional to the magnitude of B and inversely proportional to m/z (Eq. 1a).

Fig. 1

(a) Schematic representation (left) and actual picture (right) of a closed cylindrical cell. The cell is aligned with the magnetic field (z-axis); the excitation and detection plates are located along the xy-plane. The trapping plates are located at each end of the cell, and the orbiting ions are shown in yellow. (b) Schematic representation of how the ions are excited, detected, recorded, and Fourier transformed to generate the mass spectrum

Note that the charge (q) is calculated by multiplying the number of charges (z) by the elementary charge (e); hence, m/q is related to the measured m/z. The presence of the electrostatic trapping field, characterized by the axial (trapping) oscillation frequency ωz, offsets the center of the ions’ cyclotron motion and gives rise to the reduced cyclotron frequency (Eq. 1b) and the magnetron frequency (Eq. 1c) in the xy-plane.

Cyclotron frequency:

$$\omega_c=\frac{qB}{m}$$
(1a)

Reduced cyclotron frequency:

$$\omega_+=\frac{\omega_c}2+\sqrt{\left(\frac{\omega_c}2\right)^2-\frac{\omega_z^2}2}$$
(1b)

Magnetron frequency:

$$\omega_-=\frac{\omega_c}2-\sqrt{\left(\frac{\omega_c}2\right)^2-\frac{\omega_z^2}2}$$
(1c)
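As a rough numerical illustration of Eqs. 1a to 1c, the following Python sketch computes the three frequencies for a singly charged ion. The 7 T field, the m/z value and the 5 kHz axial frequency are hypothetical values chosen only for illustration, and the function names are not taken from any FT-ICR software.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # C
DALTON = 1.66053906660e-27           # kg per Da


def cyclotron_frequencies(mz, b_tesla, omega_z=0.0):
    """Return (omega_c, omega_plus, omega_minus) in rad/s for a given m/z.

    omega_z is the axial (trapping) angular frequency; with omega_z = 0
    the reduced cyclotron frequency equals omega_c and the magnetron
    frequency vanishes.
    """
    # Eq. 1a, with q/m written as e / (m/z expressed in kg)
    omega_c = ELEMENTARY_CHARGE * b_tesla / (mz * DALTON)
    root = math.sqrt((omega_c / 2) ** 2 - omega_z ** 2 / 2)
    omega_plus = omega_c / 2 + root    # Eq. 1b
    omega_minus = omega_c / 2 - root   # Eq. 1c
    return omega_c, omega_plus, omega_minus


def to_hertz(omega):
    """Convert an angular frequency (rad/s) to Hz."""
    return omega / (2 * math.pi)


# Hypothetical example: m/z 400 in a 7 T magnet, 5 kHz axial frequency
w_c, w_plus, w_minus = cyclotron_frequencies(400.0, 7.0, 2 * math.pi * 5000.0)
print(f"f_c = {to_hertz(w_c):.1f} Hz")   # roughly 268.7 kHz
print(f"f_+ = {to_hertz(w_plus):.1f} Hz")
print(f"f_- = {to_hertz(w_minus):.1f} Hz")
```

In this sketch the sum of the reduced cyclotron and magnetron frequencies equals ωc, which follows directly from Eqs. 1b and 1c.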

The induced current from the ions can be detected as a function of time and recorded as a composite sum of sinusoidal waves called “free induction decay (FID)”. The FID is Fourier transformed to a frequency domain signal, followed by a mass calibration with respect to the m/z domain (Qi and O'Connor 2014).
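The path from FID to mass spectrum can be sketched with a toy simulation. The code below builds a synthetic FID from two hypothetical ions, applies a naive discrete Fourier transform, and converts the two strongest peaks back to m/z. It is only a sketch under simplifying assumptions: the field strength, sampling rate and decay constant are invented, and the unperturbed Eq. 1a stands in for a real calibration equation.

```python
import cmath
import math

E_CHARGE = 1.602176634e-19   # C
DALTON = 1.66053906660e-27   # kg per Da


def cyclotron_hz(mz, b_tesla):
    """Unperturbed cyclotron frequency (Eq. 1a) in Hz."""
    return E_CHARGE * b_tesla / (2 * math.pi * mz * DALTON)


def simulate_fid(mz_values, b_tesla, sample_rate, n_points, decay_s=5e-4):
    """Sum of exponentially damped cosines, one per m/z value."""
    freqs = [cyclotron_hz(mz, b_tesla) for mz in mz_values]
    fid = []
    for k in range(n_points):
        t = k / sample_rate
        signal = sum(math.cos(2 * math.pi * f * t) for f in freqs)
        fid.append(signal * math.exp(-t / decay_s))
    return fid


def magnitude_spectrum(fid):
    """Naive DFT magnitudes for the positive-frequency bins (O(N^2))."""
    n = len(fid)
    mags = []
    for j in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * j * k / n)
                  for k, x in enumerate(fid))
        mags.append(abs(acc))
    return mags


def peak_mz(fid, b_tesla, sample_rate, n_peaks):
    """Pick the n_peaks strongest local maxima and convert them to m/z."""
    mags = magnitude_spectrum(fid)
    n = len(fid)
    local_max = [j for j in range(1, len(mags) - 1)
                 if mags[j - 1] < mags[j] >= mags[j + 1]]
    top = sorted(local_max, key=lambda j: mags[j], reverse=True)[:n_peaks]
    result = []
    for j in top:
        freq = j * sample_rate / n
        # Invert Eq. 1a: m/z = e * B / (2 * pi * f), expressed in Da
        result.append(E_CHARGE * b_tesla / (2 * math.pi * freq * DALTON))
    return sorted(result)


fid = simulate_fid([300.0, 400.0], b_tesla=7.0, sample_rate=1.0e6, n_points=1024)
recovered = peak_mz(fid, 7.0, 1.0e6, 2)
print(recovered)
```

With only 1024 points the frequency bins are about 1 kHz wide, so the recovered m/z values are accurate only to a fraction of a unit. Longer transients narrow the bins, which is the origin of the high resolving power discussed below.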

A DOM sample may contain hundreds of thousands of organic species separated by very small mass differences, both between and within the clusters at each nominal mass. For this reason, MA is crucial for the confident assignment of individual elemental formulas within a mass spectrum. The mass error in parts-per-million (ppm) is defined by Eq. 2. Normally, FT-ICR MS can achieve MA at the sub-ppm or even parts-per-billion (ppb) level (Lozano et al. 2019).

$$\mathrm{Mass}\ \mathrm{accuracy}=\frac{m_{\mathrm{experimental}}-m_{\mathrm{theoretical}}}{m_{\mathrm{theoretical}}}\times {10}^6\kern0.5em \mathrm{in}\;\mathrm{ppm}$$
(2)
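Eq. 2 translates directly into code. The short Python sketch below evaluates the mass error for a hypothetical pair of measured and theoretical m/z values and converts a ppm tolerance into an absolute window. The numbers and function names are illustrative assumptions, not values from the cited studies.

```python
def mass_error_ppm(m_experimental, m_theoretical):
    """Signed mass error in ppm, following Eq. 2."""
    return (m_experimental - m_theoretical) / m_theoretical * 1e6


def tolerance_mda(mz, tol_ppm):
    """Absolute width (in mDa) of a +/- tol_ppm window at a given m/z."""
    return mz * tol_ppm * 1e-6 * 1e3


# Hypothetical peak: theoretical m/z 301.07176, measured m/z 301.07191
print(f"{mass_error_ppm(301.07191, 301.07176):.3f} ppm")
print(f"{tolerance_mda(500.0, 1.0):.2f} mDa at m/z 500 for 1 ppm")
```

A 1 ppm tolerance at m/z 500 corresponds to only 0.5 mDa, which shows why sub-ppm accuracy matters when candidate formulas lie a few millidaltons apart.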

MA depends directly on the mass resolving power (R): a signal can be assigned confidently only if it is resolved and distinguished from its neighbors. For example, a mass split of 0.1 mDa must be resolved in order to assign unique molecular compositions up to 500 Da for DOM species containing C, H, O, N and S. FT-ICR MS is best known for its mass resolving power (Eq. 3), where m and ω are the m/z and the cyclotron frequency of the peak of interest, respectively, and Δm and Δω are the corresponding full peak widths at half maximum (FWHM). DOM studies require R to be over 300,000 to distinguish mass differences of a few millidaltons (Tfaily et al. 2013), for example, C3 versus SH4 (3.4 mDa) and 13C versus CH (4.5 mDa). Moreover, an R of roughly 1,000,000 is required for distinguishing 12C4 and 13CSH3 (1.1 mDa) (Hsu et al. 2011).

$$R=\frac{m}{\Delta m}=\frac{\omega }{\Delta \omega }$$
(3)
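The mass splits quoted above can be checked from monoisotopic masses, and Eq. 3 then gives a nominal resolving power for each split. The Python sketch below does this at a hypothetical m/z of 500. Setting Δm equal to the peak spacing is an assumption that yields only a rough lower bound, since peaks separated by exactly one FWHM still overlap strongly; practical requirements are higher.

```python
# Monoisotopic masses in Da
MONO = {
    "12C": 12.0,
    "13C": 13.003354835,
    "1H": 1.007825032,
    "32S": 31.972071174,
}


def mass(counts):
    """Sum of monoisotopic masses for a dict such as {"12C": 3}."""
    return sum(MONO[isotope] * n for isotope, n in counts.items())


def min_resolving_power(mz, delta_m):
    """Eq. 3 with delta_m taken as the spacing between two peaks."""
    return mz / delta_m


splits = {
    "C3 vs SH4": abs(mass({"12C": 3}) - mass({"32S": 1, "1H": 4})),
    "13C vs CH": abs(mass({"13C": 1}) - mass({"12C": 1, "1H": 1})),
    "12C4 vs 13C S H3": abs(mass({"12C": 4})
                            - mass({"13C": 1, "32S": 1, "1H": 3})),
}
splits_mda = {name: value * 1e3 for name, value in splits.items()}

for name, delta in splits.items():
    print(f"{name}: {delta * 1e3:.2f} mDa, "
          f"R >= {min_resolving_power(500.0, delta):,.0f} at m/z 500")
```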

3 History and recent developments

Ultra-high resolution mass spectrometers are indispensable tools for the analysis of complex chemical mixtures. Currently, FT-ICR MS possesses the highest performance capabilities, has been widely utilized in petroleomics research, and has continued to be a driving force for developing experimental methodologies, data processing, and new instrumentation (Lozano et al. 2020). Advances in petroleomics have been adapted to the study of different complex chemical systems, which then directly push the frontiers of many other fields (Lozano et al. 2020; Qi et al. 2022).

Natural DOM compounds mainly consist of C (49.5±3.3%), H (5.0±1.0%), O (43.0±4.1%), N (1.7±1.0%), S (2.0±1.3%) and other elements. Additionally, 13C-NMR data reveal that environmental DOM contains approximately 30% aromatic groups, 23% alkyl groups, 22% carboxyl groups, 20% alkoxy groups, and 5% carbonyl groups (Novak et al. 1992). Various methods (e.g., UV, EEM) are now available for the bulk analysis of DOM samples to estimate their elemental composition and average structure. It is important to note that each method has its own bias toward different analytes. For example, UV and EEM offer a simple and convenient way to characterize the bulk sample, but they cannot analyze samples at the molecular level. NMR and MS can provide detailed information on individual components, and FT-ICR MS is the technique with sufficient mass resolving power to separate individual molecules and accurately assign their elemental compositions, thus supporting more robust conclusions about DOM compositions, microbial decomposition pathways, and regional/global cycles. For these reasons, many studies now combine different methods to obtain a complementary picture of DOM (Chen et al. 2022; Tfaily et al. 2013; Zhou et al. 2018).

Fievre et al. (1997) utilized FT-ICR MS at the National High Magnetic Field Laboratory (MagLab) to analyze dissolved NOM samples collected from the Suwannee River for the first time, thus creating a new direction and fundamentally changing the science in geochemistry. During the next two decades, more research groups devoted their efforts to this area. In 2018, the 21 Tesla (T) FT-ICR MS at the MagLab was applied to the analysis of DOM samples, demonstrating a resolving power up to 2,700,000 at m/z 400 (Smith et al. 2018).

A total of 571 publications from the years 2011 to 2020 can be retrieved from the Web of Science using the keywords "FT-ICR MS" and "organic matter" (Fig. 2a). Today, many groups worldwide are working on complex NOM systems, including crude oil, biofuel, bitumen, lignin, sediments, soil, atmospheric aerosols, drinking water, etc. (Fig. 2b). The field was pioneered by William Cooper (Florida State University) and Alan Marshall (MagLab), and later by Patrick Hatcher (Old Dominion University), Thorsten Dittmar (University of Oldenburg, Germany) and Philippe Schmitt-Kopplin (Helmholtz Zentrum München, Germany). These researchers explored analytical methods and standard protocols, including sample preparation for comprehensive molecular characterization (Merder et al. 2020; Tfaily et al. 2015), data handling for mass spectra interpretation (Merder et al. 2019), hyphenated techniques coupled with tandem mass spectrometry (MSn) and trapped ion mobility spectrometry (TIMS) for isomeric structural elucidation (Hawkes et al. 2018; Kurek et al. 2020), and even mass spectrometry imaging techniques for mapping the three-dimensional distributions of DOM in plants (Zhang et al. 2021).

Fig. 2

(a) Keywords among the publications from the year 2011 to 2020: the size of the dot represents its weight in all the papers, as each paper consists of several keywords, the lines therefore indicate the connections between the publications. (b) Examples of research groups working in complex NOM systems and related fields. Categories of different sample types and method developments are labelled (Adapted from Palacio Lozano et al. (2020))

4 Extraction of DOM

Atmospheric aerosol and atmospheric water (e.g., fog water, cloud water, rain and snow) are composed of inorganic constituents, such as sulfate, nitrate, ammonium and sea salt, and a very complex mixture of organic compounds (Andreae and Rosenfeld 2008; Bao et al. 2018; Bruggemann et al. 2020; Chen et al. 2022; Kundu et al. 2012; Su et al. 2021; Wu et al. 2021; Xie et al. 2020; Zhao et al. 2013). Water-soluble organic matter (WSOM) constitutes a substantial fraction (20–80%) of particulate organic matter (Sullivan et al. 2004), which is light-absorbing and contributes to global warming (Yue et al. 2022). Moreover, inorganic ions usually interfere with WSOM characterization. Therefore, separation of the WSOM from inorganic ions is crucial for studying the organic compounds in atmospheric aerosol and atmospheric water (Lin et al. 2012). The concentration of dissolved organic carbon in the deep ocean is typically lower than 50 μmol L–1. This very low DOM concentration, combined with the large amounts of inorganic salts present in water samples, poses significant analytical challenges. Desalting, isolation and concentration of the DOM components are therefore prerequisites for DOM characterization.

Extraction of DOM can typically be achieved in one of four ways: (1) ultrafiltration, (2) direct drying or freeze-drying, (3) reverse osmosis coupled with electrodialysis (RO/ED), or (4) solid-phase extraction (SPE). SPE has been described as the “easier and quicker technique” compared to the others (Simjouw et al. 2005) and is therefore the most widely adopted extraction method. In SPE, molecules are retained on an appropriate solid phase while the DOM sample flows through a prepacked column; compounds of interest are then eluted using suitable solvents. Studies show that C-18 and styrene divinylbenzene-based sorbents are the most effective for DOM extraction across a wide range of water types; the recovery of the DOM components depends mainly on the polarity of the molecules (Minor et al. 2014). Dittmar et al. (2008) examined polar to highly polar prepacked polymer and silica-based sorbents to compare the DOM molecules extracted from seawater by different SPE cartridges. The authors found that the styrene divinylbenzene polymer was the most efficient, as it recovered 62% of dissolved organic carbon (DOC) as salt-free extracts. Subsequent 1H NMR, C/N and δ13C analyses also confirmed that styrene divinylbenzene extracted a more representative proportion of DOM than the other materials; hence, this extraction protocol has become a widely recognized method for simple and reproducible solid-phase extraction of DOM from water samples. It should be noted that acidification during SPE can affect the molecular composition of DOM in different environmental matrices (Han et al. 2022).

Soil organic matter (SOM) is a category of complex NOM and a crucial reservoir for carbon and nutrient biogeochemical cycling in soil-based ecosystems. For SOM characterization, organic compounds must be extracted from the soil with little or no chemical alteration (Bahureksa et al. 2021; Wang et al. 2021b; Zhang et al. 2020a). Tfaily et al. (2015) investigated a broad range of chemically diverse soil types using a combination of solvents with varying polarity. The FT-ICR MS analysis of the extracted SOM indicated that hexane was suitable for lipid-like compounds with very low O/C ratios, while water was selective for biologically significant molecules such as carbohydrates and amino sugars with high O/C ratios. In addition, acetonitrile extracted more compounds in the tannin- and lignin-like regions with O/C > 0.5, whereas methanol was more suitable for compounds with O/C < 0.5. However, given that no extraction approach can fully recover the DOM components from an intricate environmental sample, the extraction method should be selected carefully according to the sample conditions before the FT-ICR MS analysis.

5 Ionization techniques on DOM coverage

The ionization selectivity for DOM is mainly determined by two fundamental types of groups in a molecule: polar and non-polar groups. MS was not applied to DOM analysis until after the invention of the electrospray ionization (ESI) source in the late 1980s. This technique ionizes compounds from solution and produces protonated or deprotonated molecular ions ([M+H]+ or [M−H]−) for the subsequent MS analysis (Fenn et al. 1990). Negative-mode electrospray ionization (ESI-) is predominantly used for the MS study of DOM because of its high selectivity for polar acidic functional groups and because it requires only a very small amount of sample. Nevertheless, ESI spectra are often cluttered by salt adducts, while compounds containing sulfur (S) or nitrogen (N) are often lost. In addition, size exclusion chromatography shows that higher molecular weight and UV-active compounds are often excluded from ESI spectra (Hawkes et al. 2019).

During ESI, a high voltage disperses the sample solution into charged droplets. As the solvent evaporates, the charge density of each droplet increases, and once the charge exceeds the Rayleigh limit, the droplet dissociates and generates ions (Fenn et al. 1990). According to this mechanism, more polar components are favored by ESI, with basic polar species ionized in positive mode and acidic polar species in negative mode.

The carboxyl group is probably the most important moiety in DOM molecules, as it controls DOM’s acidity, buffering activity, water retention and metal-binding strength. The DOM pool contains a large proportion of carboxyl-rich alicyclic molecules (CRAM), including both aromatic and aliphatic carboxylic functions, which makes ESI- the routine method for MS in geoscience studies. However, ion suppression is observed when measuring complex DOM samples, and the ion intensity is often reduced due to space charge effects. Moreover, DOM compounds vary in heteroatom content, functional groups, molecular size and stereochemical conformation, which causes competition for charge between ions during the ESI process. For all these reasons, there is no universal ionization technique or MS parameter setting that will detect all DOM components with equal efficiency, so the mass spectra of a DOM sample measured with different ion sources often differ dramatically in appearance (Qi et al. 2020). As a result, a careful selection of the ionization conditions, or even a combination of ionization methods, should be considered for comprehensive characterization of an environmental sample.

For research purposes, various atmospheric pressure ionization (API) sources, such as atmospheric pressure photoionization (APPI), atmospheric pressure chemical ionization (APCI), laser desorption/ionization (LDI) and matrix-assisted laser desorption/ionization (MALDI), have been investigated for the study of DOM and related substances. For example, in APPI, the sample solution is vaporized and absorbs photons emitted by a discharge lamp to form ions; therefore, APPI can ionize less polar compounds than ESI. D'Andrilli et al. (2010b) compared ESI and APPI for the FT-ICR MS analysis of marine DOM and found that APPI produced ions with higher carbon unsaturation, whereas ESI generated more oxygenated ions. Moreover, APPI could also ionize polar species and was more tolerant to matrix effects from solvents and salts, thus allowing molecular characterization over different degrees of hydrogen saturation and oxygenation. Podgorski et al. (2012) also showed that an N-containing ion species with the same molecular formula exhibited a 13 times higher S/N when generated by APPI than by ESI. Similar results were demonstrated by Qi et al. (2020), who systematically evaluated the performance of ESI, APPI and APCI using a commercial lignin standard. The authors showed that the number of heteroatoms (e.g., N, S, P) in the lignin molecule significantly increased its chemical diversity. For example, S is more likely to be observed as thiol groups by ESI, while the S-based thiophenic species linked to the aromatic core are uniquely detected by APPI. In 2018, paper-based spray ionization was also adopted for the analysis of DOM and showed more tolerance to salt contamination than ESI (Kim et al. 2018).

Compared to API, matrix-assisted laser desorption ionization (MALDI) is less prevalent in DOM studies, primarily because of its lower ionization efficiency with conventional matrices (Qi and Volmer 2019), which inhibits the reproducible acquisition of high-quality mass spectra. Nevertheless, MALDI measures solid-phase samples within milliseconds, and hence it can be applied directly as a shotgun method to determine the most abundant compounds in various environmental samples. For instance, Solihat et al. (2018) demonstrated a proof-of-concept experiment using this method to analyze soil samples. Only 500 μg of unprocessed soil powder was fixed on a metal plate using double-sided adhesive tape and analyzed directly by FT-ICR MS. Such an approach avoided the extensive extraction and desalting steps that other API techniques require. Cao et al. (2015) also compared DOM analysis by MALDI and ESI and demonstrated the complementary nature of the two techniques: molecular information invisible to ESI was often observed by MALDI. Interestingly, although most peaks at even m/z were assigned the same molecular formulas as the corresponding peaks at odd m/z, an average difference of ~1.007825 Da (corresponding to the atomic mass of hydrogen) was observed between even and odd m/z values (Fig. 3). Therefore, the authors speculated that each peak at even m/z with the same molecular formula as a peak at odd m/z was actually a molecular ion [M+e]- (the mass of an electron (e) is 0.000549 Da), whereas the related peak at odd m/z was a deprotonated molecule [M-H]- (the mass of a proton is 1.007276 Da). These radical anions were successfully distinguished by the ultra-high resolution of FT-ICR MS (Cao et al. 2015).
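The arithmetic behind this interpretation is easy to verify. The Python sketch below computes the m/z of the radical anion and of the deprotonated molecule for a hypothetical neutral monoisotopic mass; the value 440.1319 Da and the function names are illustrative assumptions and are not taken from Cao et al. (2015).

```python
ELECTRON = 0.000548580        # Da
H_ATOM = 1.007825032          # Da, monoisotopic hydrogen atom
PROTON = H_ATOM - ELECTRON    # Da, about 1.007276


def mz_radical_anion(neutral_mass):
    """m/z of the singly charged molecular ion [M+e]-."""
    return neutral_mass + ELECTRON


def mz_deprotonated(neutral_mass):
    """m/z of the singly charged deprotonated molecule [M-H]-."""
    return neutral_mass - PROTON


m = 440.1319  # hypothetical neutral monoisotopic mass in Da
even_peak = mz_radical_anion(m)
odd_peak = mz_deprotonated(m)
print(f"[M+e]-: {even_peak:.6f}")
print(f"[M-H]-: {odd_peak:.6f}")
print(f"difference: {even_peak - odd_peak:.6f} Da")
```

The difference equals the electron mass plus the proton mass, i.e., the mass of a hydrogen atom, which matches the ~1.007825 Da spacing described above.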

Fig. 3

FT-ICR mass spectra of NOM in MALDI (A) and ESI (B) for the m/z from 200-800; the scale-expanded m/z segment at nominal m/z 447 in MALDI (C) and ESI (D); FT-ICR mass spectra, at nominal m/z 439 (E, MALDI odd m/z), and nominal m/z 440 (F, MALDI even m/z) in MALDI-FT-ICR mass spectra. Reprinted from Cao, et al. (2015), with permission from Elsevier

Overall, different ionization methods exhibit different selectivities for DOM molecules. For a comprehensive study of DOM, the choice of method should be tailored to meet the particular measurement purposes, cover the desired compound classes, and prevent erroneous interpretations from a single ionization source.

6 Data visualization and processing

Considering the immense complexity of DOM mass spectra, simplification and classification are necessary to process the large datasets and extract the relevant information efficiently. Numerous graphical and statistical methods have been developed to visualize DOM data, analyze samples, enable fingerprinting, and facilitate comparisons between datasets. The van Krevelen plot, which displays the H/C ratio of each assigned formula against its O/C ratio, is the most widely used; it sorts compounds of the same type into specific regions and makes it straightforward to visualize possible links between molecules (Kim et al. 2003). Bar charts of oxygen number, carbon number and double bond equivalent (DBE) distributions (Tose et al. 2018) have also been plotted to examine the chemical features of DOM. The Kendrick mass defect (KMD) rescales the m/z values according to a homologous structural unit (e.g., 14.01565 for CH2) and allows the alignment of thousands of compounds belonging to the same homologous series (Qi et al. 2016). These methods are being continuously improved to extract hidden information from FT-ICR MS data. For example, the modified aromaticity index (AImod) is calculated to reflect the density of carbon unsaturated bonds in a given elemental formula. This parameter is typically used in combination with a van Krevelen plot for data visualization; compounds with AImod ≥0.5 are usually identified as aromatic compounds, values ≥0.67 are unambiguous indicators of polycyclic condensed aromatics, and values <0.5 correspond to non-aromatic compounds (Koch and Dittmar 2006). Rivas-Ubach et al. (2018) proposed the multidimensional stoichiometric compound classification (MSCC) to classify the assigned DOM molecules into lipid-, peptide-, aminosugar-, carbohydrate- and nucleotide-like compounds according to their C/H/O/N/P stoichiometric ratios. D’Andrilli et al. (2015) also calculated a molecular lability boundary (MLB) based on FT-ICR MS data to complement the van Krevelen diagrams.
The MLB divides the data into more and less labile constituents: DOM compounds above the MLB (H/C ≥1.5) correspond to more labile material, while those below it (H/C <1.5) exhibit a less labile, more recalcitrant character. The MLB also offers a lability threshold for comparing data from other FT-ICR MS instruments analyzing DOM samples from around the world; e.g., DOM in glacial systems was found to be the most labile, followed by that in marine and freshwater systems. Mann et al. (2015) proposed a CHO index based on C, H and O in the molecule to codify SOM components according to their observed degradation potentials: compounds with an index between -1 and 0 showed high degradation potential, while similar stoichiometries in the base-soluble component did not. In addition, the CHO index allows for direct identification of the distribution of heteroatoms in the soil sample when compared with the van Krevelen diagram.
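Several of the descriptors discussed in this section can be computed directly from an assigned elemental formula. The Python sketch below is a minimal illustration under stated assumptions: the DBE expression is the common one for CHONSP formulas, AImod is written in the form commonly used after the published correction to Koch and Dittmar (2006) (with N and P included in the numerator), the KMD sign convention is nominal Kendrick mass minus Kendrick mass, and all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Formula:
    c: int
    h: int
    o: int = 0
    n: int = 0
    s: int = 0
    p: int = 0


def h_to_c(f):
    return f.h / f.c


def o_to_c(f):
    return f.o / f.c


def dbe(f):
    """Double bond equivalent for a neutral CHONSP formula."""
    return 1 + f.c - f.h / 2 + (f.n + f.p) / 2


def ai_mod(f):
    """Modified aromaticity index; set to 0 if numerator or denominator <= 0."""
    num = 1 + f.c - 0.5 * f.o - f.s - 0.5 * (f.n + f.p + f.h)
    den = f.c - 0.5 * f.o - f.s - f.n - f.p
    if num <= 0 or den <= 0:
        return 0.0
    return num / den


def aromaticity_class(f):
    """Thresholds as quoted in the text: >= 0.67 and >= 0.5."""
    value = ai_mod(f)
    if value >= 0.67:
        return "condensed aromatic"
    if value >= 0.5:
        return "aromatic"
    return "non-aromatic"


def lability_class(f):
    """Molecular lability boundary at H/C = 1.5."""
    return "more labile" if h_to_c(f) >= 1.5 else "less labile"


def kendrick_mass_defect(mz, base_exact=14.01565, base_nominal=14):
    """KMD for a CH2 base unit; members of one CH2 series share a KMD."""
    kendrick_mass = mz * base_nominal / base_exact
    return round(kendrick_mass) - kendrick_mass


for name, f in [("glucose", Formula(6, 12, 6)),
                ("benzoic acid", Formula(7, 6, 2)),
                ("pyrene", Formula(16, 10))]:
    print(name, round(h_to_c(f), 2), round(o_to_c(f), 2), dbe(f),
          round(ai_mod(f), 2), aromaticity_class(f), lability_class(f))
```

Plotting `h_to_c` against `o_to_c` for all assigned formulas gives the van Krevelen diagram, and grouping peaks by `kendrick_mass_defect` aligns CH2 homologues.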

Owing to the extreme complexity of FT-ICR MS data, statistical methods are becoming commonplace; multivariate analyses can translate raw DOM data with different sizes, structures, and sources into a model. For example, principal component analysis (PCA), redundancy analysis (RDA) and Spearman’s correlation analysis have been widely applied to petroleomics and environmental samples to reveal the variability among samples and to correlate their molecular characteristics (Li et al. 2018). More recently, Hur et al. (2018) introduced a scatterplot based on univariate analysis, the so-called “volcano plot”, to handle DOM data from petroleomics samples. Such plots are generated by plotting the p-value against the fold change. The differences between two large datasets can therefore be visualized in a statistically meaningful way, and the filtered peaks are then plotted in traditional DBE versus carbon number and van Krevelen plots for further analysis.

7 Software developments

Mass spectral processing and characterization of complex DOM compounds, including signal-to-noise ratio (S/N) filtering, mass recalibration, chemical formula assignment, isotopic validation, and graphical visualization, are nowadays frequently performed with freely available or commercial computer programs.

The freely available Formularity software, developed by Andrey Liyu and Nikola Tolić at Pacific Northwest National Laboratory, USA, assigns chemical formulas to FT-ICR MS spectra of DOM by comparing measured m/z values with the theoretical m/z values of a pre-constructed formula library and selecting the optimum formula among multiple ambiguous solutions based on the minimum number of non-oxygen heteroatoms and homologous series inspection (Tolić et al. 2017). Kew et al. (2017) published an interactive, web browser-based MS visualization program. This tool provides centroid mass spectra, van Krevelen diagrams, DBE versus carbon number plots and AImod versus carbon number plots, offering a powerful visualization of data from DOM mass spectra. However, the entire spectral processing must be performed on the user’s computer from the command line, which requires sufficient computing knowledge on the part of the user. Leefmann et al. (2019) proposed a fully web browser-based tool called UltraMassExplorer (UME). The website offers a complete data evaluation process, from the calculation of elemental compositions to data visualization. Mass spectra, consisting of m/z signals and intensities, can be uploaded in ASCII or CSV format. The processed data can be reviewed in interactive visualizations, including 2D and 3D van Krevelen plots. However, these tools also have limitations. For example, Formularity requires manual input of mass spectral data and performs relatively poorly for peaks with m/z over 500. Similarly, UME can assign multiple formulas to a single peak, so further filtering is required to select the best-fitting formula.
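The general idea of library-based formula assignment can be illustrated with a toy example. The Python sketch below is not the Formularity or UME algorithm: the element limits, the plausibility rules, the 1 ppm tolerance and all function names are assumptions made for illustration. It enumerates neutral CHONS formulas, stores the m/z of their [M−H]− ions, and resolves ambiguous matches by preferring the fewest N and S atoms and then the smallest mass error.

```python
import bisect

MONO = {"C": 12.0, "H": 1.007825032, "O": 15.994914620,
        "N": 14.003074004, "S": 31.972071174}
PROTON = 1.007276452  # Da, hydrogen atom minus one electron


def neutral_mass(c, h, o, n, s):
    return (c * MONO["C"] + h * MONO["H"] + o * MONO["O"]
            + n * MONO["N"] + s * MONO["S"])


def build_library(max_c=30, max_o=15, max_n=2, max_s=1):
    """Sorted list of (m/z of [M-H]-, (c, h, o, n, s)) entries."""
    entries = []
    for c in range(1, max_c + 1):
        for n in range(max_n + 1):
            for s in range(max_s + 1):
                for o in range(min(max_o, c) + 1):      # keep O/C <= 1
                    for h in range(1, 2 * c + 2 + n + 1):
                        if (h - n) % 2:                 # non-integer DBE
                            continue
                        mz = neutral_mass(c, h, o, n, s) - PROTON
                        entries.append((mz, (c, h, o, n, s)))
    entries.sort()
    return entries


def assign(mz_measured, library, tol_ppm=1.0):
    """Return (formula, error_ppm, n_candidates) or None if nothing fits."""
    tol = mz_measured * tol_ppm * 1e-6
    lo = bisect.bisect_left(library, (mz_measured - tol,))
    hi = bisect.bisect_left(library, (mz_measured + tol,))
    candidates = library[lo:hi]
    if not candidates:
        return None
    # Fewest non-oxygen heteroatoms (N + S) first, then smallest mass error
    best = min(candidates,
               key=lambda e: (e[1][3] + e[1][4], abs(e[0] - mz_measured)))
    error_ppm = (mz_measured - best[0]) / best[0] * 1e6
    return best[1], error_ppm, len(candidates)


library = build_library()
# Hypothetical peak: [M-H]- of C15H12O7 measured with a +0.2 ppm error
target = neutral_mass(15, 12, 7, 0, 0) - PROTON
print(assign(target * (1 + 0.2e-6), library))
```

Real tools add further checks, such as isotopologue validation and homologous series inspection, which this sketch omits.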

Merder and Dittmar recently launched a server-based tool called “ICBM-OCEAN (Institute for Chemistry and Biology of the Marine Environment, Oldenburg–complex molecular mixtures, evaluation & analysis)”. This program integrates published novel approaches, including removal of noise, sidebands and contaminants, alignment of spectra from different samples, an isotope ratio filter, a linear quantile regression filter, and fundamental statistical analysis (Fig. 4), in order to achieve standardized analysis of DOM data acquired by FT-ICR MS (Merder et al. 2020; Merder et al. 2019; Riedel and Dittmar 2014).

Fig. 4

Workflow of ICBM-OCEAN. At each critical step, diagnostic tools enable optimization of parametrization to achieve robust results. Reprinted from Merder et al. (2020), with permission from ACS Publications

Similarly, the MFAssignR Code, written in R, can estimate noise, filter 13C and 34S polyisotopic masses, recalibrate masses, and assign molecular formulas, as demonstrated for Suwannee River and aerosol DOM samples (Schum et al. 2020). The free TRFu Code is compatible with negative-mode FT-ICR MS spectra of different NOM types (e.g., freshwater, marine, soil, atmospheric aerosol, biochar), with a formula assignment accuracy of up to 94% (Fu et al. 2020a). The TRFu Code was further updated to the NOMDBP Code and FTMSDeu Code to simultaneously assign chemical formulas to NOM and organohalogen molecules (Fu et al. 2020b; Fu et al. 2022). In addition to 13C, 37Cl, and 81Br isotopic patterns, three filter rules and an empirical mass distribution rule for iodinated molecules have been proposed to improve the formula assignment accuracy for organohalogens. The FTMSDeu Code is also compatible with deuterium-labeled FT-ICR MS spectra, providing a powerful tool for elucidating labile hydrogen atoms (mainly hydrogen in carboxyl and hydroxyl functional groups) and non-labile hydrogen atoms (such as hydrogen in carbonyl or ether groups) in DOM molecules.

Regarding commercial software, PetroOrg (by the MagLab) and Composer (by Sierra Analytics, Inc.) are notable; both were developed by experts in the analysis of crude oil and related materials. However, both programs were initially developed to meet the demands of the petroleomics community, and their parameters should therefore be carefully optimized before they are applied in organic geochemistry.

While FT-ICR MS is a powerful technique for resolving the complexity of DOM mixtures at the molecular level, even its ultra-high resolution may result in ambiguous or false formula assignments to peaks in a DOM mass spectrum, especially in the high m/z region. Phase correction is a data processing method that increases the mass resolving power of any Fourier transform instrument by a factor of up to two and the S/N by a factor of √2 compared to the conventional magnitude mode (Qi et al. 2012). Although software for absorption mode processing is available for both FT-ICR and Orbitrap MS, it has not yet been widely applied in DOM studies. To address this, Da Silva et al. (2020) tested the absorption mode on a reference sample, Suwannee River natural organic matter (SRNOM), using analytical figures of merit including precision, accuracy, and reproducibility. The results showed that the additional effort of processing absorption mode mass spectra yielded a three times higher mass resolving power and a pronounced improvement for molecules containing nitrogen and sulfur (Fig. 5).

Fig. 5

(a) Van Krevelen diagram showing the compound class of new molecular formulas assigned to shared peaks in triplicate measurements of the absorption mode (peaks shared between modes are shown in gray); (b) m/z density distribution of the peaks shared between the modes and additionally assigned in the absorption mode. Reprinted from Da Silva et al. (2020), with permission from the ACS Publications
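
The resolving-power gain of absorption mode over magnitude mode discussed above can be checked numerically for one idealized case. The sketch below assumes an exponentially damped transient, which gives a Lorentzian line; this assumption and the function names are illustrative and are not taken from Qi et al. (2012) or Da Silva et al. (2020). In this limit the magnitude-mode peak is √3 times wider at half height than the absorption-mode peak, and the factor of two quoted above is generally understood as the upper bound approached for essentially undamped transients.

```python
import math


def absorption(x):
    """Lorentzian absorption-mode line; x = (omega - omega0) * tau."""
    return 1.0 / (1.0 + x * x)


def dispersion(x):
    """Matching dispersion-mode component of the same complex line."""
    return x / (1.0 + x * x)


def magnitude(x):
    """Magnitude mode combines both components: sqrt(A^2 + D^2)."""
    return math.hypot(absorption(x), dispersion(x))


def half_width(shape, lo=0.0, hi=100.0, iters=200):
    """Bisection for the offset x at which shape(x) drops to half its maximum."""
    target = 0.5 * shape(0.0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if shape(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


ratio = half_width(magnitude) / half_width(absorption)
print(f"magnitude / absorption peak width: {ratio:.4f}")
```

Because resolving power is inversely proportional to peak width at a given m/z, the printed ratio is the expected resolving-power gain under the stated assumption.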

Overall, untargeted molecular analyses of complex mixtures are relevant for many research fields, including proteomics, petroleomics and geochemistry. The demand for more efficient data processing, visualization, fingerprinting and comparisons between samples is growing rapidly. Open scripts, online tools and commercial software for DOM data processing will undoubtedly flourish further in the community and receive more attention in the future.

8 Paradigm in geochemistry and environmental science

Determining the detailed composition of DOM in different environments is a prerequisite for understanding its impact on water quality and pollutant bioavailability, as well as its long-term effect on microbial and anthropogenic activities. During the last two decades, FT-ICR MS has become a well-established analytical tool in the community for detailed DOM analyses. In the limited space of this review, we do not aim to be comprehensive but rather highlight some of the important milestones in FT-ICR MS applications from around the world to review the paradigm of FT-ICR MS in geochemistry studies.

In the early 2000s, the community adopted KMD plots to visualize mass spectra. Such a plot allows compounds with the same functional groups or structural units to be aligned in a compact visual space, without calculating elemental formulas. For example, the molecular mass of each molecule can be normalized to the CH2 group to generate a horizontal row of points in the plot that represents a homologous series of molecules separated by 14.0000 Da. By contrast, in the vertical direction, higher KMD values indicate that these molecules are more saturated (Stenson et al. 2003). Later, Kim et al. (2003) introduced the van Krevelen plot to sort the compounds according to their O/C and H/C ratios. The diagram makes it straightforward to visualize the components within the same category as well as their possible links. For example, Mopper et al. (2007) utilized the diagram to map the humic components (consisting of polar, carbonyl groups) collected from reversed-phase LC, and found that fractions with higher O/C ratios eluted faster than those with lower ratios.
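
The CH2 normalization behind the KMD plot can be sketched as follows. The rescaling factor 14/14.01565 is the standard Kendrick conversion; the sign and rounding convention used for the defect (nominal Kendrick mass minus Kendrick mass) is one common choice and is an assumption here, as are the function names and the example m/z values.

```python
CH2_EXACT = 14.01565006414  # 12C + 2 x 1H on the IUPAC mass scale (Da)


def kendrick_mass(mz):
    """Rescale a mass so that each CH2 unit weighs exactly 14."""
    return mz * 14.0 / CH2_EXACT


def kendrick_mass_defect(mz):
    """Nominal Kendrick mass minus Kendrick mass (assumed convention)."""
    km = kendrick_mass(mz)
    return round(km) - km


# Hypothetical homologous series: one ion plus one and two CH2 units.
series = [393.08272 + k * CH2_EXACT for k in range(3)]
for mz in series:
    print(f"m/z {mz:.5f}  KM {kendrick_mass(mz):.5f}  "
          f"KMD {kendrick_mass_defect(mz):.5f}")
```

All members of the series share one KMD value and are spaced by exactly 14 on the Kendrick scale, which is why they line up as a horizontal row in the plot.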

Although the van Krevelen plot has become an attractive and routine step for gaining insight into DOM data, its drawbacks should be kept in mind. Firstly, the assignment of elemental formulas is required for the calculation of O/C and H/C ratios. Secondly, the information on molecular mass is lost in such a two-dimensional diagram. Thirdly, the boundaries between compound classes are not clearly defined in the plot, and the fractions tend to cluster in the lignin and tannin regions. Lastly, the calculated elemental formula represents a molecule’s composition only, with no connection to the hundreds or thousands of possible structural isomers. For example, a recent study using TIMS-FT-ICR MS demonstrated that approximately 6 to 10 isomers could be assigned for each elemental formula (Leyva et al. 2019). Therefore, the ratio of elements is merely a reference for identifying compound classes. To address these limitations, additional criteria such as the CHO index, AImod, and MLB are applied to complement the van Krevelen plot.
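
As a small worked example of the quantities mentioned above, the sketch below computes the van Krevelen coordinates, the double bond equivalent (DBE), and the modified aromaticity index for a single formula. The AImod expression used is the commonly cited Koch and Dittmar definition in its corrected form, quoted from memory rather than from this review, so it should be checked against the original publication before use; the function names are illustrative.

```python
def van_krevelen(c, h, o):
    """Return the (H/C, O/C) coordinates of a formula."""
    return h / c, o / c


def dbe(c, h, n=0, p=0):
    """Double bond equivalent for a CHNOSP formula (O and S do not contribute)."""
    return 1 + c - h / 2 + (n + p) / 2


def ai_mod(c, h, o, n=0, s=0, p=0):
    """Modified aromaticity index (assumed Koch and Dittmar form)."""
    numerator = 1 + c - 0.5 * o - s - 0.5 * (n + p + h)
    denominator = c - 0.5 * o - n - s - p
    if numerator <= 0 or denominator <= 0:
        return 0.0  # index is set to zero when it is not meaningful
    return numerator / denominator


# Example: C18H18O10, the neutral form of an ion discussed later in this review.
hc, oc = van_krevelen(18, 18, 10)
print(f"H/C = {hc:.3f}, O/C = {oc:.3f}")
print(f"DBE = {dbe(18, 18):.1f}, AImod = {ai_mod(18, 18, 10):.3f}")
```

For C18H18O10 this gives H/C = 1.000, O/C = 0.556, DBE = 10.0 and AImod = 0.385, which shows how one point in the van Krevelen plot can be annotated with additional indices.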

While FT-ICR MS is a powerful instrument for the analysis of complex DOM samples, combining it with another analytical technique further increases separation power. As early as 1997, Fievre et al. (1997) coupled LC with FT-ICR MS and demonstrated that individual structural information on a fulvic acid mixture could be obtained. The reason for the improved performance is the increased ionization efficiency of the compounds of interest, as they are no longer in the presence of various other types of constituents. Recently, Kim et al. (2019) demonstrated that the separation of lignin- and tannin-type compounds from NOM can be achieved via LC coupled with FT-ICR MS. In this work, they were able to perform both quantitative comparison and molecular-level investigation of NOM with a sample of only 2 μg. Additionally, Qi et al. (2021) utilized LC-FT-ICR MS for online analysis of DOM in river water and rainwater. With sophisticated instrumental optimization, different portions of metal salts, carboxyl-rich alicyclic molecules, organosulfates (OSs), and lignin-like compounds were successfully fractionated within one LC cycle of 20 min (Fig. 6).

Fig. 6

(a) Total ion current chromatogram for surface water (SW, blue) and rainwater (RW, red) samples by LC-FT-ICR MS with four different fractions labeled. (b) Averaged mass spectra of SW for the four fractions (left) and mass scale expansion of the region of m/z 364.96–365.25 of fractions SW II and SW III. The nitrogen-containing compounds are labeled in yellow, and the sulfur-containing compounds (mainly OSs) are labeled in blue. (c) Averaged mass spectra of RW for the four fractions and mass scale expansion of the region of m/z 364.96–365.25 of fractions RW II and RW III. Reprinted from Qi et al. (2021), with permission from the ACS Publications

The light-absorbing fraction of DOM is called chromophoric DOM (CDOM), which directly affects the optical properties of aquatic ecosystems and protects aquatic organisms from UV radiation. It should be noted that the light-absorbing fraction of organic aerosol in the atmosphere is generally called brown carbon (BrC) (Hu et al. 2021; Laskin et al. 2015; Li et al. 2020; Yue et al. 2022). When excited by photons of specific energy, a sub-fraction of CDOM emits fluorescence; this portion of DOM is called fluorescent DOM (FDOM). Due to these properties, UV and fluorescence spectroscopies have been widely utilized to investigate the aromaticity, sources, reactivities and diagenetic states of DOM (Coble 2007). In addition, such spectroscopic tools can provide quantitative information, the lack of which is considered a major disadvantage of FT-ICR MS for DOM analysis (Ge et al. 2022; Li and Hur 2017). Tremblay et al. (2007) coupled UV and EEM with FT-ICR MS to trace the transport and changes of DOM from mangrove porewaters in a Brazilian estuary. Subsequently, the peak-picking approach for defining fluorescent fractions in EEM was improved by a statistical fitting approach, PARAFAC, which can break down the fluorescence signal into underlying individual fluorescent components with specific excitation and emission spectra (Stedmon and Bro 2008).

Stubbins et al. (2014) analyzed river water samples from boreal Québec, Canada, using the EEM/PARAFAC approach and FT-ICR MS to define the molecular signatures associated with EEM (Fig. 7). It was found that 39% of the molecular formulas identified by FT-ICR MS were related to the PARAFAC components, showing that the coupling of FT-ICR MS and EEM/PARAFAC offered information on the biogeochemical cycling of a much larger proportion of the DOM pool.

Fig. 7

The van Krevelen plot shows the molecular families associated with PARAFAC components and data points are coloured by molecular weight. Reprinted from Stubbins et al. (2014), with permission from the ACS Publications

Negative-mode ESI is the most widely applied ionization method for DOM analysis, which biases FT-ICR MS toward hydro-soluble compounds. By contrast, GC-MS is suitable for volatile compounds, which are generally organo-soluble (Fu et al. 2012; Tang et al. 2020; Zhang et al. 2020b). In addition, pyrolysis (Py) can be coupled with GC-MS to crack the non-volatile DOM components into small volatile molecules at a high temperature (e.g., 800 °C), and therefore, more precise information can be quantitatively obtained by GC-MS, with very small amounts of sample. For instance, Py-GC-MS is a classical method to determine the source, degradation state, and specific functional units of terrestrial lignin (Goñi and Hedges 1992; Hedges et al. 1982; Louchouarn et al. 2010). Nevertheless, such a method derivatizes and liberates only a fraction of the monomers for analysis, and hence provides only partial data due to the specificity of the various treatments. By integrating GC-MS or two-dimensional GC-MS with FT-ICR MS, both organo- and hydro-soluble DOM can be characterized in detail (Drosos et al. 2017; Xu et al. 2021).

Fingerprinting DOM and determining its fate in various environments are crucial to understanding the global carbon cycle. Oceans are a large and important natural environment. Marine DOM represents the largest organic carbon reservoir on Earth, playing an important role in many biogeochemical processes (Hedges et al. 2000). Usually, studies in the field of marine science span several sampling sites, and FT-ICR MS analysis is utilized to reveal the sources, similarities, and differences of DOM from site to site, its formation and degradation processes, and its environmental impacts and potential reactivity. Several representative publications in this field deserve special mention, including Hertkorn’s and Dittmar’s work on the Pacific Ocean (Dittmar and Koch 2006; Hertkorn et al. 2006), Koch’s work on Antarctic bottom waters (Koch et al. 2008) and D'Andrilli’s work on the Weddell Sea (D'Andrilli et al. 2010b). The above paradigm has been further expanded to investigate the DOM flux from fluvial systems into oceans and the anthropogenic influences from watersheds (He et al. 2019; He et al. 2020; Pang et al. 2021; Roebuck Jr et al. 2019; Sleighter and Hatcher 2008; Wang et al. 2021a; Wang et al. 2019b). An underlying problem in these works is how to distinguish the composition of marine DOM from that of terrestrial DOM. Koch et al. (2005) established the groundwork by comparing the molecular composition of marine DOM from the Weddell Sea with terrestrial DOM from the coast of Northern Brazil and found, surprisingly, that one-third of the molecular formulas were observed in both samples. The authors assumed that the similarities were due to biotic and photochemical processes, independent of the environmental conditions where the DOM samples came from. Later on, this hypothesis was also confirmed by another study (Zark and Dittmar 2018).

Peatland is a major reservoir of organic carbon, which is climate-sensitive and can be greatly affected by ground temperature, warming air, soil respiration and microorganisms. Peatland DOM mainly consists of cellulose and lignin-derived phenols from plants (Benner et al. 1984). It is known that warm climate enhances microbial productivity and makes the peatland DOM more vulnerable to degradation. Such a phenomenon causes a positive feedback to climate change by turning the carbon reservoirs into emitters of greenhouse gases (CO2 and CH4). However, due to the complex geochemical conditions, it is difficult to characterize the decomposition pathways of DOM. For example, the DOM composition in peatland changes with depth and varies between alkaline fens and more acidic bogs. FT-ICR MS was utilized to reveal compositional differences of peatland DOM as well as their reactivities. D'Andrilli et al. (2010a) reported the DOM in fen and bog porewaters from the Glacial Lake Agassiz Peatlands of northern Minnesota, USA and concluded that hydrogenotrophic methanogenesis is related to the less reactive NOM in bogs, while moister conditions may benefit sedge-dominated fens and increase the pool of more labile unsaturated and aromatic compounds. A subsequent study indicated that oxygen shedding and DOM reduction are important processes during peat decomposition, which may explain the increased CO2 concentrations relative to CH4 in peatlands and can serve as an indicator for climate feedbacks due to the higher global warming potential of CH4 compared to CO2 (Tfaily et al. 2013).

The above applications of FT-ICR MS coupled with other analytical instruments have offered an appreciation of the extent to which specific DOM compounds in soils, surface waters, glaciers, sediments, and the atmosphere can play important roles in a variety of biogeochemical processes. With the development of modern analytical techniques, the fate of DOM in these environments has also become clearer and has begun to attract many new practitioners.

When viewing these paradigmatic studies, it is obvious that the van Krevelen plot has been adopted as a routine strategy to visualize complex FT-ICR MS datasets. Nevertheless, it is also important to point out that this approach fits the compounds into a two-dimensional diagram according to their O/C and H/C ratios, so that information on the molecular weight is lost. More importantly, each data point in the plot may represent a large number of distinct structural isomers. To overcome these issues, complementary data handling and analytical tools are increasingly applied in the field to better explore the geochemical information hidden in the DOM mixtures.

Last but not least, statistical analyses such as PCA and RDA rely on the abundances of the ion signals; however, the signal abundances in FT-ICR MS measurements are greatly affected by the experimental parameters of the instrument. Therefore, the parameter settings of FT-ICR MS should be kept constant across the measurements used for statistical analysis, to ensure statistical confidence and to accurately correlate features between DOM and the environment (Hawkes et al. 2020).

9 Novel applications

Advances in FT-ICR MS capabilities and methodologies have significantly benefitted the research communities of geochemistry and atmospheric chemistry. For example, the three-dimensional distribution of DOM molecules can even be mapped and quantified by MS. Araújo et al. (2014) applied a mass spectrometry imaging (MSI) technique to view the spatial and lateral distributions of soluble lignins in hand-cut sections of Eucalyptus stems. As a result, the proportions of syringyl and guaiacyl monolignols were readily detected and relatively quantified (Fig. 8).

Fig. 8

Phloroglucinol staining and mapping of sinapyl, coniferyl, and p-coumaroyl alcohols in freshly hand-cut sections of Eucalyptus stems. For relative quantification, the figures were converted to grey images. Reprinted from Araújo et al. (2014), with permission from the ACS Publications

Molecular characterization of DOM is the first step to develop a complete understanding of environmental geochemistry processes and dynamics. Once the elemental composition is determined, the next step is to obtain its chemical and structural information. A thorough knowledge of DOM composition as well as its structure is crucial for the understanding of its role in the environment. Tandem mass spectrometry (MSn) and hydrogen/deuterium (H/D) exchange are two of the most important methods for structural analysis in the field of MS to offer further insights into the structural diversity.

For linear biomolecules such as proteins, their primary structures can be uniquely defined by the one-dimensional sequences, while for most DOM, structural elucidation requires multi-dimensional determinations, including functional groups, linkages, structural isomers and even stereochemical configurations. Tandem MS is a technique to break down selected ions (precursor ions) into small fragments (product ions) and in-depth structural information of the precursor ions can be proposed from the product ions.

Most commonly, collision induced dissociation (CID) is utilized as the ion activation method. In CID, precursor ions are isolated and subjected to collisions to induce fragmentation; the resulting product ions offer information on the ion’s functional groups and structural units. The fragments can be further isolated and subjected to more collisions (MS3, MS4…) to acquire additional information (Sleno and Volmer 2004). CID with FT-ICR MS has gradually been applied to explore the dominant features of complex DOM mixtures, e.g., to probe functional groups such as the CO2 and H2O units originating from highly ionizable carboxylic groups (Witt et al. 2009; Zark et al. 2017; Zark and Dittmar 2018). Banoub et al. (2015) authored an informative tutorial article, which summarized the application of MSn for sequencing lignin and its degradation compounds. More recently, infrared multi-photon dissociation (IRMPD) has been introduced as a technique to cleave C-O and C-C bonds of a molecule’s backbone chain, providing information complementary to CID. Furthermore, IRMPD has been applied to aquatic DOM to reveal structural information and reactivity pathways that are unavailable from CID (Brown et al. 2016; Kurek et al. 2020).

Although MSn plays an important role in protein sequencing, its application to DOM is currently still restricted to identifying certain functional groups. This is caused by the complex features of DOM: for example, ion species are too closely spaced to be individually isolated, and isobaric interferences and precursor ions with the same nominal mass can be fragmented together in MSn, leading to ambiguous structural interpretation. Overall, efforts are required in the future to develop advanced approaches that are suitable for DOM structure analysis.

Normally, functional groups such as carboxyl, hydroxyl or amide contain labile hydrogens, which can be replaced by deuterium in solution or in the gas phase. Therefore, the number of hydrogens exchanged for deuterium in a DOM compound under such conditions can serve as an indicator of the number of corresponding functional groups containing labile hydrogens. Kostyukevich et al. (2013) reported a simple in-source H/D exchange method based on FT-ICR MS for the determination of labile hydrogens in all constituents composing DOM. The experiment indicated that Suwannee River fulvic acid (SRFA) and lignosulfonate molecules contain 2–5 labile hydrogens, while the compounds of Siberian crude oil ionizing in negative ESI mode had one labile hydrogen. The three substances represent different degrees of transformation of DOM in nature, and H/D exchange FT-ICR MS offers new insights into the carbon cycle on the Earth.
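
The counting step behind H/D exchange can be illustrated with a short sketch. Each exchanged hydrogen shifts the mass by the difference between the monoisotopic masses of deuterium and protium, about 1.00628 Da, so the number of exchanges follows from the observed m/z shift. The function names, the 0.002 Da tolerance, and the example m/z are assumptions for illustration and do not reproduce the procedure of Kostyukevich et al. (2013) or the FTMSDeu Code.

```python
M_H = 1.00782503207   # monoisotopic mass of 1H (Da)
M_D = 2.01410177812   # monoisotopic mass of 2H (Da)
DELTA_HD = M_D - M_H  # mass gained per exchanged hydrogen


def exchanged_mz(mz, n_exchanged, charge=1):
    """Expected m/z after n_exchanged hydrogens are replaced by deuterium."""
    return mz + n_exchanged * DELTA_HD / abs(charge)


def count_exchanges(mz_unlabeled, mz_labeled, charge=1, tol=0.002):
    """Infer the number of H/D exchanges from an m/z shift, or None if no fit."""
    shift = (mz_labeled - mz_unlabeled) * abs(charge)
    n = round(shift / DELTA_HD)
    if n < 0 or abs(shift - n * DELTA_HD) > tol:
        return None
    return n


# Hypothetical ion at m/z 393.08272 and its expected deuterated series.
for n in range(6):
    print(n, f"{exchanged_mz(393.08272, n):.5f}")
```

In practice the deuterated peaks must also be distinguished from 13C isotopologues, which lie about 1.00335 Da apart and are therefore resolvable only at high resolving power; the tolerance here is chosen tightly for that reason.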

FT-ICR MS describes DOM at the molecular level using chemical formulas. Unfortunately, the structural diversity of many compounds remains unknown. Moreover, the isomeric nature of DOM makes its structural analysis a challenging problem. In addition to MSn and H/D exchange, trapped ion mobility spectrometry (TIMS) offers an alternative way to reveal the isomeric diversity and structural features of complex mixtures. For example, Gao et al. (2019) fractionated SRNOM by multi-step SPE with different elution solvents, characterized the fractions by both FT-ICR MS and IMS-MS, and showed that the combination of the two instruments characterized the complex mixture in greater depth. Fernandez-Lima and co-workers pioneered the integration of TIMS with FT-ICR MS and reported its unique advantages for the analysis of complex mixtures (Fig. 9) (Benigni et al. 2017; Leyva et al. 2020).

Fig. 9

The ion species isolation and CID are shown as a function of the ion mobility scans (bottom row). Five IMS bands were assigned to the heterogeneous ion mobility profile of [C18H18O10-H]-, and candidate structures were proposed based on their ion mobility and the MSn matching score. Reprinted from Leyva et al. (2020), with permission from the ACS Publications

It is also important to point out that TIMS-FT-ICR MS was able to report the lower abundance compositional trends compared to TIMS-TOF MS (Tose et al. 2018). Very recently, the Fernandez-Lima group further expanded the boundaries of TIMS-FT-ICR MS by performing elemental formula-based ion mobility and MSn analysis for DOM structural characterization (Leyva et al. 2020). In the reported work, ion species with a nominal mass of 393 were isolated including isobars and isomers, and five IMS bands were assigned to the heterogeneous profile of the precursor ion [C18H18O10-H]- and correlated with the data obtained by MSn (Fig. 9).

10 Current issues and future work

Mass spectrometry has become a well-developed and highly valuable tool in environmental and earth sciences. In early studies, FT-ICR MS was introduced to the geochemistry and atmospheric chemistry community to answer questions such as: What is the complex DOM mixture composed of? How to identify its sources and chemical transformations? How will it evolve and impact on the environment? During the last two decades, advances in the sample treatment, methodology development, data handling and instrument modification of FT-ICR MS have greatly contributed to a better understanding in the fields of environmental chemistry, organic geochemistry, and atmospheric chemistry.

Despite the fundamental principles, recent developments, and significant achievements of FT-ICR MS summarized here, pertinent and immediate challenges remain. While ESI is the most viable method for DOM, various ionization methods are available to provide complementary information by preferentially accessing specific molecule types. Unfortunately, there are still fractions of DOM that cannot be fully detected. When the raw samples are solid or absorbed in situ, it is challenging to extract representative DOM for MS analysis. In addition, fragments and adducts resulting from the ionization process can further complicate the mass spectrum and lead to erroneous information. It is important to highlight that the signal intensity of compounds observed in MS is influenced by a number of different factors, including sample preparation protocols, matrix effects, solvent pH values, molecules’ ionization efficiencies, and instrument parameters. The contribution of these factors needs to be better considered, especially when comparing measurements from different batches and laboratories (Hawkes et al. 2020). More importantly, the observed intensity of a compound in a mass spectrum cannot be linked straightforwardly to its concentration. Although the use of spiked standards at known concentrations makes quantification of certain compounds of interest possible, quantification of a broader range of compounds remains a difficult task. Each analytical technique has its strengths and weaknesses, and the use of hyphenated techniques would address the remaining gaps and allow the most meaningful instrumental performance. Complementary analyses are increasingly used to better understand abiotic and biotic processes of DOM and to predict its geochemical impact.

When viewing mass spectral data, it is essential to keep in mind that identical elemental formulas do not necessarily correspond to identical molecules. Although FT-ICR MS is capable of compositional analysis of complex mixtures, elucidation of molecular structure remains a challenge. MSn approaches such as CID offer structural information; however, due to the complex nature of DOM samples, nominal mass resolution on the front-end quadrupole is not sufficient to isolate a specific molecular ion from a narrow m/z window. Therefore, multiple ion species and their isomers are fragmented simultaneously, leading to complicated MSn spectra that represent several different precursor ions. Coupling with chromatography often allows separation of isomers and isobaric species, as well as efficient mass isolation of single molecular species. On the other hand, these hyphenated methodologies generate much larger and even more complex datasets. Developing software for handling, visualization and comparison of multiple complex datasets is becoming a new challenge.