Introduction

Natural organic matter (NOM) is ubiquitous in both terrestrial and marine environments and encompasses important reservoirs in the global carbon budget (Amon et al. 2004; Cory et al. 2013; Moran et al. 2016; Bruhwiler et al. 2018; Kurek et al. 2021, 2022 and references therein). Furthermore, components of NOM participate in a number of important biogeochemical and environmental reactions: supporting heterotrophic microbial activity; mediating redox processes; controlling the photochemical fate, bioavailability, and transport of trace metals and organic pollutants; and influencing critical aspects of water treatment such as membrane fouling, disinfectant stability, and byproduct formation (Haitzer et al. 2003; Wei-Haas et al. 2014; Peng et al. 2014; McNeill and Canonica 2016; Oldham et al. 2017a,b; Wenk et al. 2021; Kurek et al. 2021 and references therein). NOM composition is a function of its provenance and the biogeochemical environment in which it evolves (Fig. 1). Samples from specific locations possess a suite of generic properties, such as spectroscopic character, molecular weight distribution, and chemical composition, that are both characteristic of their precursor environment and stable.

Fig. 1

Distributions of UV–visible and fluorescence spectroscopic properties: a fluorescence index (FI), b humification index (HIX), and c specific UV absorbance at λ = 254 nm (SUVA254) of the dissolved organic matter (DOM) across atmospheric, marine, and terrestrial biospheres. IHSS end-member natural organic matter (NOM) isolates, Suwannee River I and Pony Lake Fulvic Acids (SRFA and PLFA), overlay each distribution plot to show that they are no longer representative of the known gradients in NOM across biospheres. FI is the ratio of fluorescence emission at 470 nm to that at 520 nm, from excitation at 370 nm. HIX is the ratio of integrated fluorescence emission from 435 to 480 nm to that from 300 to 345 nm, from excitation at 254 nm. Both are unitless. Specific UV absorbance (SUVA) is the ratio of absorbance at 254 nm to dissolved organic carbon concentration (units: m² g⁻¹ C). The conceptual diagram in the upper right-hand corner depicts the environmental biogeochemical gradient and subsequent changes to FI, HIX, and SUVA254 as materials move toward the ocean. (Figure source: D’Andrilli et al. (2022); https://pubs.acs.org/doi/10.1021/acs.est.2c04240; further permissions related to the material excerpted should be directed to the ACS)

Protocols for isolating NOM and its humic and fulvic acid (HA and FA, respectively) fractions have been developed based on preparative scale column chromatography and reverse osmosis methods and have been widely applied since the 1980s (Aiken et al. 1979, 1992; Serkiz and Perdue 1990; Sun et al. 1995; Maurice et al. 2002; Koprivnjak et al. 2006; Dittmar et al. 2008; Green et al. 2014). Using these protocols, relatively large quantities (100 g to several kg) of NOM have been isolated and have served the scientific community as standardized/reference materials for use in comparative studies.

For nearly 4 decades, the International Humic Substances Society (IHSS), a non-profit scientific society, has distributed standard materials that are collected under the direct supervision of the IHSS from four designated sites (1 aquatic and 3 soil) following strict protocols. Additionally, the IHSS has facilitated the distribution of reference materials that have been isolated by other parties from a broader range of sites using community consensus and similar, but not necessarily identical, protocols. The standard materials comprise only humic and fulvic acids, while reference samples can include HA and FA extracts as well as materials (simply referred to as “NOM”) isolated using a reverse osmosis (RO) protocol with a significantly higher recovery that retains a broader range of organic compounds (Koprivnjak et al. 2006; Green et al. 2014). For isolating NOM from waters with higher inorganic ion concentrations, the RO process has been coupled with electrodialysis and/or ion exchange for the simultaneous removal of inorganic ions to minimize precipitate formation and membrane fouling issues (Koprivnjak et al. 2009).

It is important to note that different lots of the standard material collected from the same sites over a period of decades appear to be compositionally similar (Averett et al. 1994; Green et al. 2015). Furthermore, freeze-dried HA, FA, and NOM materials have been found to be compositionally stable. Unfortunately, the currently available standard and reference materials represent a quite narrow range from a biogeochemical and global perspective. Aquatic NOM collection sites now comprise only two locations (both in North America) since the supply of the Pony Lake (Antarctica) FA reference material was exhausted in 2016. All three soil collection sites are also located in North America.

These samples are used by a diverse group of scientists to study the role that NOM plays in mediating biogeochemical and environmental reactions in soil and water. Over the past decades, the availability of these samples has made it possible for researchers to study aquatic and terrestrial natural organic matter chemistry and compare their findings with those of other scientists worldwide (Vindedahl et al. 2016; Zherebker et al. 2020; Zhou et al. 2000). Furthermore, these materials are often used to corroborate analytical measurements within and across research laboratories (Hawkes et al. 2020; Zherebker et al. 2020; Zhou et al. 2000), shaping ‘best practice’ for developing or assessing new methodologies and serving as a comparative guide in experiments involving new NOM isolates from other study sites.

The intent of the IHSS in establishing the standard and reference collection was that a sufficient quantity would be available to support research for several decades. Since its initiation, the reference collection of the IHSS has been augmented by the addition of new isolates that have been made available to the broad community, but not necessarily at quantities that are sustainable over many decades. Overall, the availability of both standard and reference isolates has provided an invaluable service to a broad research community spanning all fields of environmental engineering and science.

The locations for the collection of IHSS reference and standard humic samples were primarily selected based upon (1) the long-term biogeochemical stability of the source environment, (2) representative characteristics with respect to NOM precursor chemical composition (aquatic vs terrestrial provenance), and (3) site accessibility. Most of the legacy IHSS standards and references were obtained from soils, leonardite (a naturally oxidized brown coal used as an industrial source of humic acid), and peat (for terrestrial samples) and from a few mid-latitude lakes and rivers at sites located in North America and Scandinavia. Of the mid-latitude aquatic samples, the FA, HA, and NOM samples from the Suwannee River (SRFA, SRHA, and SRNOM, respectively), which drains the Okefenokee Swamp in Georgia, USA (Yin and Brook 1992; SI: Supplement 1), have been most important in serving as a representative terrestrially derived aquatic NOM biogeochemical end-member. This site qualifies as an IHSS standard for the HA and FA fractions and as a reference for the reverse osmosis NOM. In contrast to these materials from mid-latitude sites, there has been a reference fulvic acid from a hypereutrophic lake (Pony Lake or PLFA; SI: Supplement 2) located on Ross Island, Antarctica, which has served as an endmember representing aquatic fulvic acid derived solely from microorganisms (algae and bacteria) (McKnight et al. 1994; Brown et al. 2004). However, the collection of NOM and humic materials currently available to researchers from the IHSS has a distinctly terrestrial bias.

Since the establishment of the IHSS standard and reference program, research focus has expanded to examine the extent to which human activity has modified terrestrial and aquatic environments in an era now termed the Anthropocene. These alterations of the environment include direct introduction of anthropogenically derived materials into soils and freshwater and marine environments. These substances range from local inputs of wastewater effluent and microscopic fragments of plastics to trace levels of refractory organic chemicals (Rice and Westerhoff 2015, 2017; Park et al. 2018; Barber et al. 2019; Lee et al. 2020, 2021; and references therein). These inputs contribute to the overall pool of organic material in these environments. Human activity has also greatly impacted the biogeochemical cycles of nitrogen, phosphorus, and sulfur, which are also important constituents of NOM. Moreover, superimposed on these direct anthropogenic and biogeochemical influences, the hydrological cycle is also changing and can alter the accumulation and transport of NOM, and potentially its composition as well. In this light, it is timely to consider the potential expansion of the IHSS collection of standard and reference materials. In addition, despite significant advances in analytical instrumentation, many challenges remain in characterizing NOM, HA, and FA. In particular, the comprehensive identification of specific chemical structures in NOM remains elusive, although state-of-the-art analytical methods such as high-resolution mass spectrometry and multi-dimensional nuclear magnetic resonance spectroscopy are providing new opportunities. Thus, advances in NOM molecular characterization stand to benefit from a new generation of readily available standard and reference NOM from both anthropogenically disturbed and pristine sites.

This paper presents the consensus findings of the attendees of a 2-day workshop to consider potential “next-generation” reference and standard NOM, humic and fulvic acid samples. The workshop was partially motivated by the depletion of the Pony Lake fulvic acid reference, and attendees covered all user group disciplines as well as all career stages. The following questions guided the identification of potential additional reference materials to the current IHSS portfolio.

  • Why is there a need for new reference NOM in addition to our current set of references?

  • What do potential future sites offer with respect to addressing important knowledge gaps defined by the NOM research community?

  • Which other sites should be considered (including those where large-scale isolation of NOM is logistically difficult) to meet the future needs of the community?

Detailed descriptions of current IHSS sites and potential new sites for samples are included in the supplemental section (SI). Recommendations on approaches for establishing data archiving and access for the international user community were also developed.

Background and rationale on the isolation of NOM

A key aspect of launching the standard and reference FA, HA, and NOM collections was to support the development and establishment of methods for isolating humic substances from soil and water that would provide isolates with consistent and comparable chemical properties. The IHSS originally adopted a preparative scale, operationally defined chromatographic approach employing XAD (poly-divinylbenzene-co-ethylvinylbenzene) resins. Specifically, HA and FA as defined by the IHSS have an enhanced sorption to XAD resins at low pH, i.e., when the phenolic and carboxylic acid functional groups are protonated but have a limited affinity when these groups are ionized at high pH (Aiken 1985). This operationally defined chromatographic approach can be scaled to both natural water samples and extracts of soil and sediment organic matter to facilitate comparisons along a continuum of terrestrial and aquatic samples (Thurman and Malcolm 1981). Because some XAD resins are no longer commercially available, for recent sampling campaigns (e.g., Suwannee River in 2016) the IHSS has replaced XAD-8 resin with Supelite™ DAX-8, a polymethylmethacrylate resin from Sigma-Aldrich, which has a similar performance to XAD-8 (Chow 2006).

Since the development and application of the chromatographic approach, the IHSS has expanded its isolation methodologies to include reverse osmosis (RO) to isolate NOM from aquatic sites because of RO’s ability to capture nearly the entire dissolved organic matter pool (> 90%) as opposed to only the HA and FA fractions. The reverse osmosis method, however, cannot be readily applied to soil extracts to obtain an isolate that meets the same operational criteria as for an aquatic sample. Finally, styrene divinylbenzene (SDVB) resins, marketed by Agilent Technologies as solid phase extraction (SPE) PPL Bond Elut cartridges, have recently become commonly used to isolate NOM because of their ability to remove salts and capture a larger component of the NOM pool beyond the HA and FA fractions isolated by the XAD method (> 60%: Dittmar et al. 2008; Green et al. 2014). Notably, PPL cartridges have been extensively applied to isolate marine NOM. However, multiple cartridges would be required to sample large quantities of ocean water (Green et al. 2014), making it challenging to scale up to the quantities needed for standard or reference samples. Currently, the IHSS does not provide any reference or standard NOM obtained using PPL resin.

For either a resin-based or RO approach, the isolation of an aquatic reference sample requires an extensive field effort. For example, the collection of the major aquatic humic reference materials from the Suwannee River in southeastern Georgia, USA, in 1983 resulted in several kg of fulvic acid and took about 4 months of effort based at Stephen Foster State Park. This effort was supported by the Water Resources Division of the U.S. Geological Survey as part of an initiative to better understand the importance of dissolved organic material (DOM) in determining water quality. The availability of standard and reference isolates, especially from sites that are difficult to access and represent unique NOM, such as PLFA, can accrue secondary benefits beyond strictly research endeavors. For example, accessibility of NOM stocks promotes inclusivity by removing barriers that can exclude financially and logistically challenged scientists working globally from conducting their research.

Impact and continuation of current IHSS NOM sample offerings

The broader impact of the IHSS standard and reference material is clearly apparent in the publications and citations accumulated involving Suwannee River materials (Figs. 2 and 3). The rate of publication has grown steadily since the first papers appeared in 1987, and there has been exponential growth of citations since 1988, now reaching a rate of over 6000 citations per year. The standard and reference NOM isolates have had unanticipated popularity as the basis for laboratory experiments in addition to being used as material to support the development and validation of isolation methods.

Fig. 2

Number of publications using fulvic acids, humic acids, and natural organic matter from the Suwannee River (SI: Supplement 1). The plot was generated using Web of Science on June 15, 2022, spanning 1980–2022 and including the search terms: “Suwannee” AND (“fulvic” OR “humic” OR “natural organic matter” OR “NOM”). Updated from Hozalski et al. (2018)

Fig. 3

Number of citations of publications using fulvic acids, humic acids, and natural organic matter from the Suwannee River (SI: Supplement 1). The plot was generated using Web of Science on June 15, 2022, spanning 1980–2022 and including the search terms: “Suwannee” AND (“fulvic” OR “humic” OR “natural organic matter” OR “NOM”). Updated from Hozalski et al. (2018)

Pony Lake fulvic acid was distinct in that it lacked any molecular components derived from terrestrial precursors, due to its location. As such, its precursor organic materials are exclusively derived from single-celled algae and bacteria in an environment that is comparable to other eutrophic aquatic environments (McKnight et al. 1994; Brown et al. 2004). In the context of the site selection criteria for the IHSS samples described above, PLFA satisfies criteria 1 and 2, but not criterion 3, because it was procured through a collaborative effort between the U.S. National Science Foundation (NSF, as part of a much larger biogeochemical study) and the IHSS. At the time, it was thought that enough material had been collected to last about 25 years. Distribution by the IHSS began in 2006, and a major driver was that it represented a biogeochemical endmember of NOM derived from strictly microbial precursors, in contrast to existing IHSS offerings. It also satisfied an emerging focus on the composition and reactivity of algal- and bacterial-derived DOM in both freshwater and marine environments. The research conducted using PLFA provided key evidence regarding both the compositional diversity and reactivity of NOM and has demonstrated that there is no “one size fits all” model for NOM reactivity (e.g., D’Andrilli et al. 2013; Quentel and Filella 2008). As with SRNOM, the broader impacts of this reference NOM are significant (Fig. 4), with the number of publications that utilize PLFA increasing rapidly until 2016, when stocks were depleted.

In addition, the relatively new Mississippi River NOM (MRNOM) reference material has played an important role in a number of studies, including the testing of analytical methods (Kim et al. 2019), photochemical processes (Partanen et al. 2020), and adsorption phenomena (Wu et al. 2021). This new addition to the IHSS portfolio represents an aquatic NOM derived from both terrestrial (from runoff) and microbial (through in-stream primary productivity) precursor materials within its watershed. Further, the Mississippi River at this location is semi-developed and represents a watershed that is impacted by anthropogenic activity.

A conclusion from the workshop was that the NOM research community would be best served by maintaining current stocks of Suwannee River fulvic acid, humic acid, and natural organic matter; Mississippi River NOM; and Pony Lake fulvic acid (or NOM). However, the attendees recognized the need to expand the IHSS’s current offerings to reflect the diverse nature of NOM (both its composition and reactivity) to best serve the research community that studies its role in important biogeochemical, environmental, and engineering processes.

Expansion of reference material across the spectrum of pristine to anthropogenically impacted environments

In addition to maintaining the current IHSS standard and reference materials, the workshop attendees reached a strong consensus that expansion of the current IHSS offerings was needed to better understand the role and reactivity of NOM in the Anthropocene. The identification of potential new sites was based upon criteria that reflect knowledge gaps in our current understanding of NOM’s roles in carbon and other important biogeochemical cycles as well as environmental processes in the Anthropocene. A list of potential sites for collection of high-priority types of NOM and other future sites for collection of soil NOM is provided in Table 1. A summary of the discussion at the workshop on types of sites is given below.

Table 1 Summary of potential and other future IHSS reference and/or standard DOM/NOM sites
Fig. 4

Timeline of publications citing the application of Pony Lake fulvic acid (SI: Supplement 2) in NOM research from 2007 to 2021, obtained by analyzing references in SciFinder using the search terms “1R109F”, “Pony Lake NOM”, or “Pony Lake fulvic acid”. *Pony Lake fulvic acid stocks were depleted by 2016

Need for unique end-member sites

The previous paradigm of a one-dimensional continuum from one endmember to another (i.e., from terrestrial to aquatic) does not fully capture the range of diversity in NOM found in the biosphere. This is apparent from simple spectroscopy measurements (Fig. 1) (D’Andrilli et al. 2022) but is also evident from the suite of new analytical techniques such as high-resolution mass spectrometry, electrochemical redox characterizations, and improved functional group analysis (McAdams et al. 2018; McKnight et al. 2001; D’Andrilli et al. 2013, 2022; Cawley et al. 2013). A dendritic representation of NOM is more appropriate (Fig. 5). In this new paradigm, endmembers exist as “branches” from a central category of NOM pools, and the specific biogeochemical and/or anthropogenic environments responsible for their creation and chemical diversity are incorporated. This more complex description of NOM recognizes the diversity of precursors and reactions (both biotic and abiotic) involved in NOM production and is based upon knowledge gained from molecular level structural characterization and studies that targeted specific modes of NOM reactivity, e.g., photochemical, redox, bioavailability, and metal complexation.

Fig. 5

A dendritic representation of three main natural organic matter (NOM) pools on Earth (atmospheric, marine, and terrestrial) showing their diverse nature in subdivision branches. Dashed boxes show where Suwannee River and Pony Lake Fulvic Acids (SRFA and PLFA) and Mississippi NOM fall within the terrestrial category and highlight the lack of diversity currently available to the dissolved organic matter community. Dashed arrows represent eventual contributions of organic matter to marine ecosystems from the atmospheric and terrestrial biospheres. The atmospheric, marine, and terrestrial biosphere colors coincide with Fig. 1; all others coincide with the branching nature. Figure modified from D’Andrilli et al. (2022)

The challenges presented by this new paradigm lie in defining “branched” end-members, examples of which may include the extent of NOM in situ processing (photobleaching, secondary biotic and abiotic reactions); unique precursor composition and heteroatom content; degree of anthropogenic influence (percentage of wastewater influence, wildfire modifications, eutrophication); and bioreactivity (autotrophy vs. heterotrophy). Such newly defined endmembers would both expand the diversity of the biogeochemical space of IHSS reference and standard samples (Fig. 5) and enable researchers to study NOM from locations that they could otherwise not access, as exemplified by PLFA.

Need for additional wetland or pristine site to complement the Suwannee River site

The Suwannee River in southern Georgia, USA, has been the stalwart site for reference and standard fulvic and humic acid as well as NOM for more than 40 years (SI: Supplement 1). This site was originally chosen by the IHSS for its high DOC concentration and biogeochemical and hydrological stability associated with the huge extent of its source, the Okefenokee Swamp, and its protection as the Okefenokee National Wildlife Refuge. The refuge is a designated National Wilderness Area and has been recommended as a UNESCO World Heritage Site. However, the US Army Corps of Engineers is expected to review a proposed mining project that would be located in the Trail Ridge area that forms the eastern boundary of the Swamp. If this mining project were to go forward, there would potentially be alterations in hydrology that, coupled with a changing climate, could influence the biogeochemical stability of this site. As such, there is a need to expand the current set of NOM references to include a comparable pristine wetland environment that would be protected from anthropogenic disturbances. Such samples are critical for comparing these less impacted systems to those that are more directly affected by human activity. The challenge for site selection, however, is to identify environments that will remain relatively pristine and stable on the scale of decades. Given the vulnerability of the implementation of the Clean Water Act in the US to political considerations and similar pressures associated with water scarcity worldwide, the stability of a pristine natural water site may be inherently uncertain.

Need for mixed-source sites

Currently, the Mississippi River NOM is the only IHSS reference material derived from an aquatic system found in a suburban built environment, where NOM precursors are predominantly derived from upstream natural and agriculturally influenced sources but still include some contributions from local anthropogenic activity. At this stage, however, this one reference site is not sufficient to reflect the diversity of anthropogenic impact on NOM composition. Additional sites are needed where NOM precursors are impacted in a different manner by anthropogenic activity. An example of such a site could be a public water supply formed by alterations to the watershed, e.g., the construction of dams. An older IHSS drinking water supply NOM (Nordic Reservoir, Norway) has been discontinued, and a new IHSS reference material could fill this niche. An example of a potential site meeting this need would be the Neversink Reservoir, which supplies drinking water to New York, USA (SI: Supplement 4).

Need for predominantly anthropogenically derived sources of NOM

One characteristic of the Anthropocene is that many lakes and rivers are impacted by treated wastewater effluent discharged to surface waters (or groundwaters in some cases). Wastewater effluent organic matter (EfOM) is the focus of many studies and publications because of its importance during advanced wastewater reuse, its ability to react with metals or oxidants, the alteration of its composition by surfactants or other synthetic organics, and its unique photo-reactivity (Rice and Westerhoff 2015, 2017; Barber et al. 2019; Maizel and Remucal 2017). In areas where there is rapid urbanization (e.g., across southern and eastern Asian regions), municipal water withdrawal has dramatically increased, generating 120 km³ of wastewater annually in 24 countries, which comprises 39% of the global municipal wastewater production (Park et al. 2018). Although municipal wastewater constitutes only 1% of the renewable surface water, it can disproportionately affect the receiving river water, particularly downstream of rapidly expanding metropolitan areas, resulting in eutrophication, increases in the amount and lability of organic carbon, and pulse emissions of CO₂ and other greenhouse gases (Fig. 6).

Fig. 6

Schematic of the urban water cycle and typical changes in natural organic matter. DOC and SUVA values represent typical values. Source waters include rivers and lakes. Municipal drinking water is typically treated to remove humic and fulvic acids. Societal inputs include anthropogenic chemicals and human metabolites. Municipal wastewater treatment is designed to remove hydrophobic organics and degrade low-molecular weight organics, and it biologically produces extracellular polymeric substances (EPS). Effluent organic matter (EfOM) is then discharged into downstream or other source waters

Any of the 15,000 wastewater treatment plants in the USA (Rice and Westerhoff 2017) could be selected as a representative source of EfOM. However, based upon discussions in the workshop on potential sites, the attendees recommended a site within a major city that employs secondary wastewater treatment (i.e., nitrification plus partial denitrification) in a location of the USA with relatively low total dissolved solids in the water to facilitate NOM isolation, should a method such as reverse osmosis be applied. Further background on possible sites is presented in SI: Supplement 5.

Need for sites representative of dynamic systems

Several of the most important issues impacting environmental quality in the Anthropocene are reflective of change and disturbance. Examples include thawing permafrost, increased wildfire events, intensifying drought conditions, and changes in coastal water quality due to runoff and saltwater intrusion. Further, paired isolates contrasting the extent of disturbance either temporally or spatially may aid in systematically characterizing the impacts of climate and other anthropogenically derived changes on NOM composition. For example, a paired set of NOM isolates, one from a site not previously impacted by wildfire and one from a nearby site exposed to wildfire, would serve to compare and contrast the impacts of the introduction of dissolved black carbon into an ecosystem. Isolates from dynamic locations may require collecting a larger sample via coordinated efforts of the NOM research community, with an agreement to preserve a subset of collected material for future study as a “time capsule” reference. In this way, changes can be tracked over time and/or revisited as analytical methods evolve that may shed new light on NOM composition and reactivity. Challenges remain, however, in capturing the heterogeneity of several of these sites. For example, permafrost soils have been shown to have water extractable organic matter that varies widely in composition as a function of radiocarbon age (Gagné et al. 2020), e.g., ice-rich Pleistocene-aged deposits (> 12,000 years bp) that are potentially most vulnerable to thermokarst development. This “old” carbon may differ considerably in composition relative to modern permafrost NOM (Rogers et al. 2021).

A particularly dynamic sample would be from an atmospheric deposition site, such as a standard rainwater or aerosol sample. The OM composition of an atmospheric sample is extremely sensitive to location, air mass, and source influence (Wozniak et al. 2014; Willoughby et al. 2016) as well as time. A site for collection of such a sample could be co-located at any of multiple monitoring stations, e.g., those of the National Atmospheric Deposition Program in the U.S. or of international networks. Such a sample would be valuable because molecular analyses reveal atmospheric DOM to have a distinct highly oxygenated, high S, high N, low aromaticity composition, so it may serve as a unique endmember (Wozniak et al. 2008; Cottrell et al. 2013). An atmospherically derived NOM would also be valuable for studying atmospheric processes related to deposition that can impact water bodies, e.g., Fe binding, nutrient delivery, and deposition to remote locations like the Arctic, Antarctica, and the open ocean (Wozniak et al. 2013; Stubbins et al. 2012).

Need for sites that leverage coincident data and traditional knowledge

To maximize the relevance of future reference NOM isolates, it would be beneficial for such sites to be integrated with other relevant data that have been collected historically and/or are being measured on a regular and consistent basis, e.g., hydrology, temperature, pH, primary production, and dissolved oxygen. This approach would ideally provide historical, climate, ecological, and biogeochemical context to the potential site(s). Long-term monitoring data can come from collaborations with federally supported networks, e.g., Long-Term Ecological Research (LTER), Long-Term Research in Environmental Biology (LTREB), National Ecological Observatory Network (NEON), Critical Zone Network (CZN), Permafrost Carbon Network, and USGS monitoring programs. Further, sites within the jurisdiction of indigenous communities and their stewards complement Western scientific methods with traditional knowledge. Integration with existing databases and traditional indigenous knowledge (when applicable) about sites can help guide NOM research. In addition, coincident data will facilitate the comparison of critical physical and biogeochemical aspects of these sites over time.

Need for NOM samples from outside North America

With the exception of three discontinued reference isolates from Norway (Hellrudmyra and the Nordic reservoir) and one Antarctic isolate (PLFA), there have been no IHSS reference or standard sites with NOM isolated from locations outside of North America. NOM is ecologically, geographically, and temporally variable, and over 95% of the global population resides outside North America. Thus, the expansion of isolates should reflect greater geographic diversity in addressing the needs for new types of NOM highlighted in Fig. 5, as well as promote NOM research globally. One example of such a site would be an oligosaline lake in a coastal area of Greenland (SI: Supplement 6). Proximity to an intercontinental airport makes it a logistically feasible location to collect a unique end-member NOM from a well-studied pristine aquatic environment with a very long residence time and extensive microbial and photochemical processing (Anderson and Stedmon 2007; Osburn et al. 2017). For NOM study on other continents containing other biomes (e.g., South America, Africa, Australia, and Asia), sites that have been previously studied by others may be suitable (Amon and Benner 1996; Mladenov et al. 2005, 2007; Cawley et al. 2012a,b; Remington et al. 2011; Yan et al. 2014; Simon et al. 2021; and references therein). Because workshop participants were predominantly from North America and Europe, identification of specific potential sites outside of these two continents was not addressed.

Potential new sites identified by workshop attendees

Each potential new site presented at the workshop was discussed in breakout groups with respect to the criteria summarized above. The potential new sites and their justification are summarized in Table 1, along with other future sites for collection of soil NOM.

Chemical characterization

Well-characterized standard and reference materials of the IHSS have served as reference materials for the development of a variety of analytical techniques that are broadly used to study processes occurring in soils and natural waters. Historically, characterization of NOM composition focused on broad bulk properties or on detailed examination of particular fractions of the NOM pool, such as the fulvic and humic acid fractions (Mopper et al. 2007; Minor et al. 2014). Elemental analyses, bulk stable and radioisotopic measurements, and nuclear magnetic resonance (NMR) spectroscopy (both 13C and proton) provide a degree of bulk-level information (Cao et al. 2016; Guillemette et al. 2017). NMR, for example, provides valuable information about functional groups, such as carboxylic acids, and the relative abundance of aromatic and aliphatic carbon, which are useful in understanding the connection between chemical properties and environmental processes such as pH buffering, sorption by hydrous oxides, and metal complexation (Kaiser et al. 2003; Mitchell et al. 2013; Wallace et al. 2020). Optical measurements such as UV–Vis and fluorescence spectroscopy provide high-throughput, non-destructive approaches to assess NOM composition for the fractions of NOM that absorb photons and fluoresce, respectively, such as fulvic and humic acids (Coble et al. 1990; Fellman et al. 2010; McKnight et al. 2001; Wünsch and Murphy 2021). These spectra can be used to calculate commonly reported indices, such as specific UV absorbance, that are related to direct measurements of aromaticity (e.g., Weishaar et al. 2003). At a finer level, analysis of biochemicals such as lignin phenols, benzene polycarboxylic acids, amino acids, amino sugars, and carbohydrates provides insights into source and degradation history (Amon et al. 2004; Kaiser and Benner 2009; Spencer et al. 2008; Wagner et al. 2019).
Over the last five decades, these and similar analyses at both bulk and molecular levels have greatly improved insights into NOM dynamics in a range of ecosystems. These measurements remain critical for a fundamental assessment of NOM composition, and all require regular intercalibration and referencing against community reference materials representative of contrasting NOM types.
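As an illustration of how one such index is derived, the following minimal Python sketch computes SUVA254 from a decadic absorbance reading at 254 nm and a dissolved organic carbon (DOC) concentration, following the definition given in the Fig. 1 caption. The function name, argument names, and example values are hypothetical and chosen only for illustration; they are not taken from any IHSS protocol.

```python
def suva254(absorbance_254, doc_mg_per_l, path_length_cm=1.0):
    """Specific UV absorbance at 254 nm.

    absorbance_254 : decadic absorbance measured at 254 nm (unitless)
    doc_mg_per_l   : dissolved organic carbon concentration (mg C per L)
    path_length_cm : cuvette path length in cm (assumed 1 cm by default)

    Returns SUVA254 in L mg C^-1 m^-1, which is numerically identical
    to m^2 g^-1 C.
    """
    if doc_mg_per_l <= 0:
        raise ValueError("DOC concentration must be positive")
    if path_length_cm <= 0:
        raise ValueError("path length must be positive")
    # Convert the reading to absorbance per metre (100 cm per m).
    absorbance_per_m = absorbance_254 / path_length_cm * 100.0
    return absorbance_per_m / doc_mg_per_l


if __name__ == "__main__":
    # Illustrative values only: A254 = 0.20 in a 1 cm cell, DOC = 5 mg C per L.
    print(round(suva254(0.20, 5.0), 2))
```

Because the result depends directly on the DOC measurement and on the path length, values are only comparable between laboratories when both are reported alongside the index, which is one reason shared reference materials are useful for intercalibration.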

In recent decades, advanced instrumental approaches have become increasingly common and have matured to provide unparalleled insights into NOM. For example, electrochemical methods can now provide precise measurements of redox potentials and electron transfer capacities (i.e., electron accepting and donating capacity, EAC and EDC; Aeschbacher et al. 2010; Walpen et al. 2020). The molecular weight distribution of NOM is a fundamental parameter that can be measured using size exclusion chromatography (SEC) (Chin et al. 1994; McAdams et al. 2018). SEC measurements can only be compared when measured under the same conditions; therefore, collecting these data on the catalog of IHSS materials in a single laboratory, or in multiple laboratories with clearly defined methods, would provide critical information on relative differences in molecular weight. One-dimensional (1-D) NMR has been widely used for studying the bulk structural components of NOM molecules, as discussed previously. However, more detailed molecular information or structural subtleties require multidimensional (e.g., 2-D) NMR, and because of the complexity of these techniques, few laboratories undertake them (Minor et al. 2014; Simpson et al. 2012). Similarly, both high-resolution (e.g., Orbitrap) and ultrahigh-resolution MS techniques (e.g., Fourier Transform Ion Cyclotron Resonance or FT-ICR MS) using negative and positive mode electrospray ionization (ESI) are highly specialized and provide useful information on the molecular composition of NOM (e.g., C, H, N, O, and S-containing molecular formulae, heteroatom classes, and compound classes) (D’Andrilli et al. 2013; Hawkes et al. 2020; Kurek et al. 2021, 2022).
The ultrahigh precision, resolution, and accuracy of FT-ICR MS make it arguably the best tool for examining DOM composition, but the rapid growth of Orbitrap technology is making the molecular analysis of complex matrices such as NOM accessible to a larger population of researchers (Remucal et al. 2012, 2020; Hawkes et al. 2020). While these advanced methods provide important fundamental information about NOM composition and reactivity, each requires specialized equipment and expertise both to operate the instrumentation and to interpret the data generated. Therefore, providing validated advanced NOM characterization data to the entire NOM community is of great importance.

The NOM community is continually developing and refining new approaches to further our understanding of NOM composition and reactivity. One such method is ramped pyrolysis oxidation (RPO) coupled with isotopic analyses, which helps to tease apart NOM source and age (Rosenheim et al. 2013; Rosenheim and Galy 2012). This technique has already been implemented alongside FT-ICR MS to link changes in thermal stability to molecular composition (Rogers et al. 2021).

Because the composition of NOM determines its reactivity (Guerard et al. 2009; Berg et al. 2019; Milstead and Remucal 2021), standardized methods for assessing the reactivity of this material are needed. In particular, future development of a standardized methodology for bio-lability (e.g., assimilable organic carbon) could provide a valuable benchmark. Similarly, standardized approaches for assessing other metrics of reactivity, such as photolysis or disinfection by-product formation potential, would be useful in both water resource management and regulatory contexts. However, reaching a consensus on which approaches are applicable as “standard method(s)” for a range of NOM samples depends on researchers having access to the same set of standard and reference NOM samples so that meaningful comparisons can be made.

NOM data curation

Increasing research on the distribution, biogeochemical role, function, and fate of natural organic matter has led to a proliferation of studies examining NOM chemical composition, making the FAIR principles (findable, accessible, interoperable, and reusable, according to https://www.go-fair.org/fair-principles/) increasingly relevant. Clearly, one community that would benefit from FAIR datasets is the users of the IHSS standard and reference NOM. In addition to providing fundamental chemical characterization information, well-curated data that include both raw data and calculated values (e.g., spectral indices or molecular formulae) can provide a training opportunity for new researchers.

A great deal of data is available on IHSS standards and reference materials from characterizations performed by the IHSS and the diverse research groups using their materials. Some of these data have been compiled by IHSS and are provided via their website (humic-substances.org), but most of these data are scattered throughout research publications that are not directly accessible through IHSS. Currently, many of these documents are not open access, and the data are in graphics or tables that are not in machine-readable formats.

In addition to the variety of data sources described above, data about the IHSS isolates encompass a variety of data types and formats. For instance, the IHSS website alone hosts data on elemental compositions, acidic functional groups, amino acid composition, carbohydrate composition, 13C-NMR estimates of carbon distribution, ESR data, FTIR spectra, fluorescence spectra, and both 1H and 13C solution and solid-state NMR data for both standards and reference samples. All these data would require extensive metadata, which should be structured in a standardized format (e.g., by adapting the Ecological Metadata Language, EML; https://eml.ecoinformatics.org/), in addition to formatting and archiving with a persistent identifier, to conform to the FAIR data standards. Without these metadata, data provenance and quality are opaque, which may lead to incorrect or inconsistent analyses.

To overcome these limitations on the accessibility of data on IHSS samples, the organization (and community) needs to encourage and facilitate the documentation of data on IHSS samples according to FAIR principles. This may include additional data curation and management to address issues with accessibility, transcription errors, and proper interpretation and use. Potential actions that could address this need would be for IHSS to support an information manager or to partner with a larger environmental data repository to directly support users and manage datasets. In addition, the IHSS could institute a “user agreement” that requires users who purchase IHSS materials to abide by best practices related to data availability when publishing results. The user agreement would ideally require users of IHSS products to: (i) acknowledge the materials used with a standardized text string in published papers, abstracts, and datasets; (ii) provide raw data in machine-readable format in datasets and give notice (using the DOI) to IHSS; and (iii) report publications (via DOI) that use IHSS materials.
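The user-agreement items above could be captured in a simple machine-readable record. The Python sketch below is only one possible illustration: the field names, the acknowledgement string, the sample identifier, and the DOI placeholders are all hypothetical, and the record does not follow the EML schema or any format adopted by IHSS.

```python
import json

# Hypothetical minimum set of fields for a dataset that uses an IHSS material.
REQUIRED_FIELDS = (
    "sample_id",          # identifier of the isolate used (placeholder below)
    "acknowledgement",    # standardized text string, item (i)
    "measurement_type",   # e.g., "UV-Vis absorbance"
    "units",
    "raw_data_file",      # machine-readable raw data, item (ii)
    "dataset_doi",        # persistent identifier reported to IHSS, item (ii)
    "publication_doi",    # publication using the material, item (iii)
)


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]


def to_json(record):
    """Serialize a complete record to JSON; refuse incomplete records."""
    missing = missing_fields(record)
    if missing:
        raise ValueError("incomplete record, missing: " + ", ".join(missing))
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    # All values are placeholders, not real identifiers.
    example = {
        "sample_id": "EXAMPLE-ISOLATE-001",
        "acknowledgement": "Material obtained from the IHSS (placeholder text)",
        "measurement_type": "UV-Vis absorbance",
        "units": "unitless (decadic absorbance)",
        "raw_data_file": "example_scan.csv",
        "dataset_doi": "doi:placeholder-dataset",
        "publication_doi": "doi:placeholder-publication",
    }
    print(to_json(example))
```

Rejecting incomplete records at the point of serialization is a deliberate choice in this sketch: it mirrors the idea that provenance should be documented when the data are deposited rather than reconstructed later.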

Conclusions

The reference and standard materials offered by the IHSS have supported considerable advancements and developments in NOM research to date. The time is now appropriate to produce a new generation of reference materials, extending some of the original collection and adding new material from selected contrasting sites. The sites suggested here serve as inspiration: they represent locations where the diverse properties of NOM span new gradients in quality, functioning, and reactivity, and, if adopted, they will hopefully support a new generation of discovery in NOM biogeochemistry and analytical chemistry. Such a selection of materials will facilitate research into current topics by incorporating an expanded selection of pristine end-members, which are less affected by human activity, for investigating carbon cycling, as well as sites with variable levels of anthropogenic disturbance.