Introduction

Soil is recognised as a vast terrestrial reservoir of carbon; however, only the carbon in the upper soil horizons is used to estimate the global soil organic carbon (SOC) stock (Batjes 1996). Many soils worldwide are deep, and organic carbon has been reported in deep regolith profiles to depths of tens of metres (Harper and Tibbett 2013; Jackson et al. 2002; Moreland et al. 2021). These stocks can be substantial: in Harper and Tibbett's (2013) study, carbon storage to bedrock (mean depth 21 m, range 5–38 m) was 2–5 times that in the surface 0.5 m.

The composition of the carbon compounds that occur in these deep profiles is unknown. Studying low molecular weight organic carbon compounds (LMWC) can provide an understanding of the sources of soil organic matter, microbial activity, and the pathways of degradation and stabilisation of soil organic matter (Bull et al. 2000; Feng and Simpson 2007). This is particularly important for deep soils, because little is currently known about the distribution or dynamics of their carbon, and particularly about its response to global change.

For instance, aliphatic lipids, steroids, terpenoids, glycerols and carbohydrates (mono- and disaccharides), extracted sequentially with a series of dichloromethane and methanol solvent mixtures, have been used as soil organic matter biomarkers for plant waxes, bacteria and fungi (Otto and Simpson 2007). In strongly podzolised laterites, an abundance of free lipids has been interpreted as indicating local conditions such as a combination of acidity and waterlogging, a limited supply of complexing elements (Fe, Al and Si) and reduced microbial activity (Bardy et al. 2008).

Macromolecular organic carbon (MOC), by contrast, is usually composed of large, non-volatile organic substances that are difficult to analyse directly by conventional GC/MS. Pyrolysis and thermochemolysis have therefore been developed to characterise such substances, because they cause extensive fragmentation, enabling the fragments to be readily separated and identified by GC/MS (Kaal et al. 2009; Klingberg et al. 2005; Shadkami and Helleur 2010). Pyrolysis degrades MOC at elevated temperatures in the absence of oxygen, producing signature fragments that identify the form(s) of MOC present. Several carbon materials have been characterised by this technique, including polysaccharides, humic acid, degraded wood and charred materials (Buurman et al. 2007, 2009; del Río et al. 2001; Kaal et al. 2009; Kaal and Rumpel 2009). Thermochemolysis, in contrast, involves fragmentation assisted by an alkylating reagent at elevated temperatures; tetramethylammonium hydroxide (TMAH) is commonly used. This procedure requires lower temperatures than pyrolysis because TMAH cleaves MOC selectively at ester and ether bonds (Clifford et al. 1995; Shadkami and Helleur 2010), and the resulting fragments are readily characterised by GC/MS. The fragments identified by this method can likewise be used to infer the presence of compounds such as lignin, cellulose and chitin, as well as proteinaceous materials.

Previous work has focused on samples with relatively large amounts of organic carbon, mostly near-surface soils, where this carbon has been well characterised (e.g. Kleber and Johnson 2010; Kögel-Knabner and Rumpel 2018). Although there is increasing recognition of deep soil carbon stores extending to depths of tens of metres (Harper and Tibbett 2013; Jackson et al. 1996), the LMWC and MOC in this organic matter have not been characterised or quantified. The main objectives of this paper are thus to (i) identify and quantify the LMWC in deep soil profiles from three field locations in south-western Australia where soil organic carbon stores have previously been reported by Harper and Tibbett (2013), and (ii) examine the utility of thermochemolysis and pyrolysis for characterising the MOC of deep soil carbon.

Materials and methods

Identification and quantification of non-volatile low molecular weight compounds (experiment 1)

Samples

Samples from three field sites in south-western Australia previously reported by Harper and Tibbett (2013) were analysed, namely GL03 (−34° 21′ 56″, 117° 26′ 31″), PT06 (−31° 57′ 19″, 117° 5′ 25″) and ST04 (−33° 36′ 23″, 116° 39′ 30″). Carbon concentrations at depths of 0.5–1, 10–11 and 19–20 m were 0.55, 0.05 and 0.04% total organic carbon (TOC) (GL03); 0.17, 0.04 and 0.03% TOC (PT06); and 0.16, 0.04 and 0.04% TOC (ST04). Soil pH values (0.01 M CaCl2) for these samples were 5.6, 6.7 and 6.5 (GL03); 4.8, 4.5 and 5.8 (PT06); and 5.9, 7.4 and 5.9 (ST04). These profiles have sandy surface horizons, with lower horizons to bedrock having sandy clay to medium clay textures.

The samples were taken from farmland in south-western Australia, where deep weathering profiles have formed on the Archean granites and gneisses of the Yilgarn Craton (Gilkes et al. 1973). One profile from each location was selected at random. Prior to agriculture, the vegetation consisted of deep-rooted native plants; these had been cleared 50–80 years before sampling and replaced by shallow-rooted annual crops such as cereals, or by annual pastures. In the year prior to sampling, Pinus pinaster plantations had been established (Harper and Tibbett 2013), and pasture plants were still present in the inter-row areas.

Extraction of low molecular weight compounds

Samples were weighed into Erlenmeyer flasks, ethyl acetate was added and the flasks were closed with glass stoppers. Extraction was carried out for 2 h on a shaker. The extracts were then filtered through a glass fibre filter (Whatman GF/A) and transferred to 100 ml round-bottom flasks (50 ml for surface soil). Each solution was concentrated to approximately 1 ml on a rotary evaporator with a water bath temperature of 60–65 °C, transferred to a 5 ml round-bottom flask and evaporated to dryness. Finally, the residue was redissolved in 1 ml of internal standard solution (n-decane in ethyl acetate) before GC/MS analysis and compound identification. All samples were prepared in duplicate.

Quality control and reproducibility of procedures

A soil reference sample was prepared by mixing 100 g of sample CU03 from the 12–13 m (0.02% TOC) and 27–28 m (0.01% TOC) depth increments. The efficiency of the extraction procedure was tested by measuring the recovery of a commercial standard, (Z)-docos-13-enamide, spiked into this sample; the average recovery was 95% (SD = 7, n = 3). This reference sample was also re-analysed alongside the field samples to monitor the quality and reproducibility of the procedure.
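Spike recovery is the amount of standard measured after extraction expressed as a percentage of the amount added, summarised as a mean and sample standard deviation over the replicates. The minimal Python sketch below illustrates that arithmetic only; the spike mass and replicate amounts are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def percent_recovery(measured_ug: float, spiked_ug: float) -> float:
    """Amount of spiked standard recovered, as a percentage of the amount added."""
    return 100.0 * measured_ug / spiked_ug


# Hypothetical example: 10.0 ug of (Z)-docos-13-enamide added to the reference
# soil and three replicate extractions measured (illustrative values only).
SPIKED_UG = 10.0
MEASURED_UG = [9.0, 10.2, 9.3]

recoveries = [percent_recovery(m, SPIKED_UG) for m in MEASURED_UG]
mean_recovery = mean(recoveries)
sd_recovery = stdev(recoveries)  # sample SD (n - 1 denominator)

print(f"Recovery: {mean_recovery:.0f}% (SD = {sd_recovery:.0f}, n = {len(recoveries)})")
```

The sample (n − 1) standard deviation is used here on the assumption that the reported SD describes replicate scatter; the original text does not state which form was used.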

Analytical gas chromatography/mass spectrometry

Samples were analysed by capillary GC on a Shimadzu GC 2010 fitted with an SGE BPX5 column (30 m length × 0.25 mm i.d. × 0.25 μm film thickness), with helium as the carrier gas. The injector was operated at 310 °C in splitless mode. The oven was held at 60 °C for 1 min, ramped at 30 °C min−1 to 100 °C and held for 1 min, then heated at 4 °C min−1 to a final temperature of 300 °C, which was held for 5 min. Compounds were identified by comparison with the National Institute of Standards and Technology (NIST) database, using a match threshold of > 80%. Each compound was quantified by comparing its peak area with that of the internal standard of known concentration, assuming a response factor of 1 for all compounds. Compound concentrations were then normalised to sample weight before the carbon concentration of each compound was determined.
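The quantification chain described above (NIST match filter, internal-standard ratio with a response factor of 1, normalisation to sample weight, conversion to carbon) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' calculation: the peak areas, n-decane mass, soil mass and the low-match peak are hypothetical, the carbon conversion assumes the compound mass is multiplied by its carbon mass fraction from the molecular formula, and standard atomic masses are used.

```python
# Standard atomic masses (g/mol) used to convert compound mass to carbon mass.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}
NIST_MATCH_THRESHOLD = 80.0  # identifications must exceed this % match


def carbon_fraction(formula: dict[str, int]) -> float:
    """Mass fraction of carbon in a compound, given its element counts."""
    molar_mass = sum(ATOMIC_MASS[element] * count for element, count in formula.items())
    return ATOMIC_MASS["C"] * formula.get("C", 0) / molar_mass


def ug_carbon_per_g_soil(peak_area: float, is_area: float, is_ug: float,
                         soil_g: float, formula: dict[str, int],
                         response_factor: float = 1.0) -> float:
    """Internal-standard quantification, normalised to soil mass, expressed as carbon."""
    compound_ug = (peak_area / is_area) * is_ug / response_factor  # mass in the extract
    compound_ug_per_g = compound_ug / soil_g                        # normalise to sample weight
    return compound_ug_per_g * carbon_fraction(formula)             # ug C per g soil


# Hypothetical peak table: (name, peak area, NIST match %, element counts).
peaks = [
    ("(Z)-docos-13-enamide", 52_000.0, 91.0, {"C": 22, "H": 43, "N": 1, "O": 1}),
    ("hexadecan-1-ol", 8_000.0, 86.0, {"C": 16, "H": 34, "O": 1}),
    ("hypothetical low-match peak", 3_000.0, 70.0, {"C": 18, "H": 38}),
]
IS_AREA = 40_000.0  # n-decane peak area (hypothetical)
IS_UG = 10.0        # ug of n-decane in the 1 ml redissolved extract (hypothetical)
SOIL_G = 50.0       # g of soil extracted (hypothetical)

results = {
    name: ug_carbon_per_g_soil(area, IS_AREA, IS_UG, SOIL_G, formula)
    for name, area, match, formula in peaks
    if match > NIST_MATCH_THRESHOLD
}
for name, value in results.items():
    print(f"{name}: {value:.3f} ug C g soil^-1")
```

With a response factor of 1, every compound is assumed to give the same detector response per unit mass as n-decane, so the values are best read as estimates rather than calibrated concentrations.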

Characterisation of deep soil macromolecular organic carbon (experiment 2)

Sample preparation

A full profile (CU03, Harper and Tibbett 2013) with depths of 0–0.1 (4.68% TOC), 1–2 (0.23% TOC), 10–11 (0.02% TOC), 19–20 (0.02% TOC) and 28–29 m (0.01% TOC) was selected for this experiment. The site is located at −32° 35′ 3″, 117° 6′ 15″. Soil pH values (0.01 M CaCl2) for these samples were 5.1, 5.5, 6.2, 6.3 and 6.4, respectively.

Samples were pre-concentrated by alkaline extraction. Five grams of sample was placed in a 25 ml plastic tube with 15 ml of 0.1 M NaOH, and the tube was flushed with nitrogen gas before extraction. The suspension was shaken for 24 h and then centrifuged for 15 min. The supernatant was decanted, and two further extractions were carried out by adding 15 ml of 0.1 M NaOH and shaking the tube by hand for 10 min, each followed by centrifugation as above. The combined extracts were then freeze-dried for 4 days (Hetosicc CD4 freeze dryer).

Thermochemolysis with TMAH (TC(TMAH)-GC/MS)

Approximately 5 mg of solid extract was placed in a glass tube (1 cm diameter, 18 cm long), treated with 200 µl of TMAH (25% in methanol) and flushed with nitrogen gas. Samples were left to stand for 30 min, after which the methanol was evaporated completely. The tubes were sealed under vacuum and placed in an oven at 250 °C for 30 min. Condensation was observed in the tubes after cooling to room temperature. The tubes were then opened and the products extracted with ethyl acetate (3 × 1 ml). The extracts were filtered through a 0.45 μm polytetrafluoroethylene (PTFE) membrane and concentrated under a stream of nitrogen before analysis by GC/MS.

Samples were analysed by capillary GC on a Shimadzu GC 2010 fitted with an SGE BPX5 column (30 m length × 0.25 mm i.d. × 0.25 μm film thickness), with helium as the carrier gas. The oven was held at 40 °C for 1 min, then ramped at 8 °C min−1 to a final temperature of 300 °C, which was held for 30 min. Aliquots of 1 µl were injected at 310 °C in splitless mode.

Mass spectra were obtained with a Shimadzu QP2012S mass spectrometer, with ion source and interface temperatures of 200 °C. Spectra were acquired at a scan rate of 2000 amu sec−1 over the range m/z 45–1000. Peaks were identified by comparison with the NIST database.

Pyrolysis-GC/MS for laboratory standard samples and deep soil carbon

Analyses were conducted on 0.5 mg of each soil sample and of three standards (lignin, cellulose and chitin, each mixed with kaolinite to 3.85% TOC), without pre-treatment. Pyrolysis was performed over two temperature ranges, (i) 250–340 °C and (ii) 250–600 °C, for 10 s with a heating rate of 10 °C min−1, using a pyrolysis-GC/MS system consisting of an Agilent 6890 GC interfaced to an Agilent 5973b mass selective detector. The GC was fitted with a fused silica DB5 phase capillary column (60 m length × 0.25 mm i.d. × 0.25 μm film thickness), and helium was used as the carrier gas. Split ratios of 15:1 and 10:1 were applied for the standards and field samples, respectively. The GC oven was held at an initial temperature of 40 °C for 2 min, then heated at 4 °C min−1 to a final temperature of 280 °C, which was held for 20 min. Full-scan (m/z 50–550) and selected ion (m/z 57, 91, 123, 156, 170, 184, 191, 192, 217) data were recorded simultaneously with 70 eV ionisation at a scan speed of ~2 scans sec−1. Individual compounds were identified by comparing mass spectra and retention times with laboratory standards, the NIST mass spectral library and other published data.
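As a consistency check on the three GC oven programs described in these methods, the total run time is the initial hold plus, for each segment, the ramp time (temperature change ÷ rate) and the hold at the target. The Python sketch below assumes ideal linear ramps and ignores cool-down and re-equilibration; it covers only the GC oven, not the pyrolysis step. The function name is hypothetical.

```python
def oven_run_time(initial_temp_c: float, initial_hold_min: float,
                  steps: list[tuple[float, float, float]]) -> float:
    """Total run time (min) of a linear-ramp GC oven program.

    Each step is (ramp rate in degC/min, target temperature in degC,
    hold at the target in min).
    """
    total = initial_hold_min
    temp = initial_temp_c
    for rate, target, hold in steps:
        total += abs(target - temp) / rate  # time spent ramping
        total += hold                       # isothermal hold at the target
        temp = target
    return total


# Oven programs as described in the methods.
programs = {
    "LMWC (experiment 1)": (60, 1, [(30, 100, 1), (4, 300, 5)]),
    "TC(TMAH)-GC/MS": (40, 1, [(8, 300, 30)]),
    "Py-GC/MS": (40, 2, [(4, 280, 20)]),
}
for label, (t0, hold0, steps) in programs.items():
    print(f"{label}: {oven_run_time(t0, hold0, steps):.1f} min")
```

This gives about 58.3 min for the experiment 1 program, 63.5 min for TC(TMAH)-GC/MS and 82 min for Py-GC/MS. The first value is consistent with the latest LMWC retention times reported in the Results (58.12 min).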

The relative abundance of each compound was calculated as a percentage of the total peak area (normalised to 100%). Absolute quantification was not carried out, because it would require separate verification of each compound's identity using pure standards and determination of a response factor for each compound. Nevertheless, relative abundance is suitable for comparing samples semi-quantitatively (Buurman et al. 2007; González-Pérez et al. 2012; Vancampenhout et al. 2012).
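Normalising to 100% and then summing compounds by probable source (as done for the aromatic, polysaccharide and unknown-origin groups in the Results) can be sketched in Python as below. The peak areas and class assignments are hypothetical illustrations, not values from Table 3, and no response-factor correction is applied, consistent with the semi-quantitative approach described above.

```python
def relative_abundance(areas: dict[str, float]) -> dict[str, float]:
    """Express each peak area as a percentage of the total, so values sum to 100%."""
    total = sum(areas.values())
    return {name: 100.0 * area / total for name, area in areas.items()}


def sum_by_class(rel: dict[str, float], compound_class: dict[str, str]) -> dict[str, float]:
    """Add up relative abundances of compounds assigned to the same source class."""
    sums: dict[str, float] = {}
    for name, pct in rel.items():
        cls = compound_class.get(name, "unknown origin")  # unassigned peaks
        sums[cls] = sums.get(cls, 0.0) + pct
    return sums


# Hypothetical integrated peak areas for one depth (not data from this study).
areas = {"toluene": 4200.0, "benzene": 2600.0, "styrene": 1200.0,
         "furfural": 800.0, "unassigned peak": 1200.0}
classes = {"toluene": "aromatic", "benzene": "aromatic", "styrene": "aromatic",
           "furfural": "polysaccharide"}

rel = relative_abundance(areas)
class_sums = sum_by_class(rel, classes)
print({k: round(v, 1) for k, v in rel.items()})
print({k: round(v, 1) for k, v in class_sums.items()})
```

Because each percentage depends on which peaks are integrated, class totals are only comparable between samples processed with the same peak list and integration settings.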

Results

Identification of low molecular weight compounds in deep soil profiles (experiment 1)

Compound identification

Overall, compound diversity declined with depth. Three main compound classes were identified in the chromatograms: (1) terpenes (RT = 6.70–21.74 min), (2) functionalised long chain alkanes, including alkanoic acids, alkyl amides and alkanols (RT = 22.42–44.22 min), and (3) bioactive compounds and plant sterols (RT = 54.49–58.12 min) (Table 1).

Table 1 Concentration of LMWC (µg C g soil-1) for locations GL03, PT06 and ST04 at 3 depths (n = 2)

Chromatograms obtained from profiles GL03, PT06 and ST04 revealed small amounts of terpenes only in the surface soil (0.5–1 m) (Table 1). Several terpenes were common between the different locations such as camphor (9) in location GL03 and ST04; aromadendrene (46) and epiglobulol (68) in PT06 and ST04; and α-selinene (82) and cubenol (83) in GL03 and PT06. However, other terpenes were only detected at a single location. For example, borneol (14) from GL03; patchoulane (38), globulol (76) and aromadendrene oxide (86) from PT06; and isothujol (11), isoborneol (15), germacrene (52) and β-eudesmol (85) from ST04.

Fatty acids and related compounds, observed in all chromatograms, were divided into two groups according to their occurrence. The first group comprised fatty acids such as hexadecanoic acid (115), (Z)-octadec-9-enoic acid (124) and octadecanoic acid (126), which were commonly observed at 0–1 m. The second group consisted of alkyl amides and alkanols, which were observed throughout the profiles, for example hexadecan-1-ol (118), (9E)-9-hexadecen-1-ol (119), (9Z)-9-octadecenamide (138) and (Z)-docos-13-enamide (146). In addition, bis(6-methylheptyl) phthalate (137) was observed at all depths. In contrast, nitrogen-containing compounds such as (9Z)-9-octadecenenitrile (oleonitrile) (135) were only observed in deeper layers (11–12 and 18–19 m) (Table 1).

Plant steroids, such as pollinastanol (148) and cholesterols (129, 149, 152), were observed at 0.5–1 m. Another compound, 4,4′-{[4-hydroxy-5-(2-methyl-2-propanyl)-1,3-phenylene]bis(methylene)}bis[2,6-bis(2-methyl-2-propanyl)phenol] (150), was found in most profiles and locations, but with only a 70% match to the database (Table 1), so this identification is only tentative.

Compound quantification

(Z)-Docos-13-enamide (146) made the largest contribution over the whole depth of the profiles at locations PT06 and ST04 (Table 1). Its relative contribution also tended to increase with depth, representing 42% of LMWC at 0.5–1 m, 74% at 11–12 m and 81% at 18–19 m at the PT06 site, and 32% of LMWC at 0–1 m, 27% at 11–12 m and 53% at 18–19 m at the ST04 site. By contrast, bis(6-methylheptyl) phthalate (137) was the dominant compound in profile GL03 (Table 1), representing 16% of LMWC at 0.5–1 m, 31% at 11–12 m and 57% at 18–19 m.

Quantitative analysis revealed that LMWC concentrations ranged from 3.15 to 14.27 µg C g soil−1. Although fatty acids, amides, esters and alcohols were typically observed at all locations and depths, the dominant compounds differed among locations. For example, in the 0.5–1 m sample from location GL03, fatty acids (115 + 126), long chain alkyl amides (138 + 146) and bis(6-methylheptyl) phthalate (137) had similar concentrations (1.42, 1.44 and 1.46 µg C g soil−1, respectively). Bis(6-methylheptyl) phthalate (137) had a similar concentration (1.44 µg C g soil−1) at 11–12 m but a higher concentration (3.49 µg C g soil−1) at 18–19 m. At the PT06 and ST04 locations, (Z)-docos-13-enamide (146) was the most abundant compound throughout the profiles (Table 1). By contrast, terpenes, bioactive compounds and plant sterols were minor species, with concentrations of approximately the same magnitude in the uppermost depth band at all locations (Table 1).

Characterisation of deep soil macromolecular organic carbon (experiment 2)

TC(TMAH)-GC/MS: application to standards and the surface layer

For TMAH thermochemolysis of pure lignin, the dominant peaks identified by GC/MS arose from dimethoxybenzenes (Table S1) such as 3,4-dimethoxybenzaldehyde, 1-(2,4-dimethoxyphenyl)ethanone, methyl 3,4-dimethoxybenzoate, 1,2-dimethoxy-4-[(1E)-3-methoxy-1-propen-1-yl]benzene, 1,2-dimethoxy-4-(2-methoxyvinyl)benzene, 1,2-dimethoxybenzene, 1,2,4-trimethoxybenzene, vanillin (4-hydroxy-3-methoxybenzaldehyde) and 1,2-dimethoxy-4-(2-propenyl)benzene, which are typical products of lignin methylation. Various alkylbenzenes were also identified in the chromatogram (Fig S1).

The chromatograms obtained following TMAH thermochemolysis of pure cellulose and chitin (Fig S2a) indicated the presence of many benzene derivatives (Table S2). Analysis of the extracted cellulose/kaolinite samples revealed furans, while pyridine compounds were obtained from extracts of the chitin/kaolinite samples. However, fewer peaks were obtained from these extracts than from methylation of pure cellulose and chitin.

A wide variety of MOC-derived products was obtained from the surface 0–0.1 m layer (Fig. 1), indicating the presence of lignin, terpenes, polysaccharides and proteins. Aromatic compounds of non-specific origin, including naphthalene, toluene and phenol, were also identified (Table 2). Unfortunately, the compounds detected in the deeper samples using this technique could not be reliably identified because of their very low similarity indices with compounds in the NIST database.

Fig. 1

Chromatogram of soil MOC in the 0–0.1 m layer assigned by GC/MS. Table 2 contains the full list of peak identifications

Table 2 TMAH products of soil at 0–0.1 m depth

Pyrolysis-GC/MS: standards and characterisation of deep soil carbon

The main pyrolysis products of lignin combined with kaolinite (Tables S3 and S4) were (a) simple aromatics such as benzene, toluene, styrene and phenol; (b) methoxy-substituted phenols and benzenes; (c) methyl-substituted naphthalenes and phenanthrenes; and (d) unsaturated hydrocarbons, mainly alkenes (Table S4). Pyrolysis of cellulose generated signature products such as 1,4:3,6-dianhydro-α-D-glucopyranose, furans, ketones, alcohols and acids, whereas pyrolysis of chitin produced nitrogen-containing compounds such as pyridine, pyrrole, nitriles and amides (Table S4).

Hexadecanoic acid was the only compound observed from pyrolysis of the 0–0.1 m band at 250–340 °C; it is likely to be a natural component of the soil carbon rather than a pyrolysis product of MOC degradation. No detectable compounds were obtained from pyrolysis of the deeper samples at 250–340 °C. In contrast, 72 compounds were identified throughout the profile after pyrolysis at 250–600 °C (Table 3). Representative chromatograms of MOC at 0–0.1 m and 28–29 m are given in Fig. 2.

Table 3 Pyrolytic products (250–600 °C) and percentage of peak area of soil samples at 5 depths
Fig. 2

Chromatogram after pyrolysis of MOC in the 0–0.1 m (a) and 28–29 m (b) layers assigned by GC/MS. Table 3 contains the full list of peak identifications

Generally, the identified compounds were most abundant in the 0–0.1 m layer and declined in abundance with depth. Simple aromatic compounds such as benzene, toluene, ethylbenzene and styrene were observed throughout the profile, whereas pyridine and benzonitrile species occurred mostly in the 0–0.1 m horizon (Table 3).

Relative abundance of deep soil pyrolysis products

Relative abundances of compounds at each profile depth showed that pyrolysis of the carbonaceous component of the soil yielded predominantly aromatic compounds (50–80%), including toluene, benzene and styrene, at all depths (Table 3). These compounds can arise from the pyrolysis of lignin, but the absence of the corresponding methoxy and phenol derivatives suggests that they more likely reflect carbon black from past fire events. There was also a large proportion of compounds of unknown origin (13–42%). Polysaccharides were present in small proportions near the surface (17% at 0–0.1 m and 6% at 1–2 m) and declined further in deeper layers.

Discussion

Identification of non-volatile LMWCs

Three distinct compound classes, namely (1) terpenes, (2) functionalised long chain alkanes, including alkanoic acids, alkyl amides and alkanols, and (3) plant steroids, were obtained from all profiles. Terpenes are a large and diverse class of compounds produced by all plants. The previous vegetation, dominated by trees and shrubs, and the annual crops that replaced it over the past 50–80 years are likely to be the main sources of the observed terpenes. For example, many of the terpenols are derived from eucalypts (de Blas et al. 2013), which have dominated the landscape in south-western Australia for thousands of years (Brooker and Kleinig 1990). As pines were only established in the year preceding sampling, they are unlikely to have contributed to the organic compounds in the deeper profiles.

In addition to the indigenous eucalypts, many other genera of native plants in south-western Australia contain terpenes in resins or essential oils (Dell and McComb 1979). Plants also produce terpenes as photosynthetic pigments and hormones. The half-life in soil of terpenes originating from the original endemic vegetation is unknown.

The present study suggests that LMWC derived from the current above-ground vegetation occurs mainly in the surface horizon and does not percolate to deeper layers. This could be due to the hydrophobicity of these compounds (de Blas et al. 2013; Franco et al. 2000) and to the shallow root systems of both the annual pasture plants and the young Pinus pinaster trees. In contrast, the root systems of the original native vegetation are likely to have extended to bedrock (Dell et al. 1983), while leaching is likely to be limited by the relatively low rainfall (450 mm year−1) in the area (Harper and Tibbett 2013).

Similarly, plant steroids were mostly observed at 0.5–1 m depth and, like the terpenes, did not extend into deeper layers. Plant sterols are highly resistant to biodegradation (Bull et al. 2000) and are constituents of agricultural crops as well as of the native vegetation.

Fatty acids were dominant in the topsoil layers and may originate from plants and microorganisms (Andreetta et al. 2013; Franco et al. 2000; Spielvogel et al. 2014). Fatty acids can occur as cutin monomers in plant waxes and have been detected in the leaves, roots and wood of forest trees (Andreetta et al. 2013; Martínez-Iñigo et al. 2001; Spielvogel et al. 2014). They can also indicate biological influence in various environments, such as temperate grassland, lateritic formation processes and even the deep sea (Bardy et al. 2008; Feng and Simpson 2007; Huang et al. 2013). A novel finding of this study was that fatty acid amides (138 + 146) were commonly observed throughout the deep profiles (1.11–10.81 µg C g soil−1), whereas fatty acids were not detectable in the deeper layers.

In particular, (Z)-docos-13-enamide (146) occurred throughout these profiles. This compound has been identified in wood and bark and is the main component of the stem oil of Eucalyptus urophylla × grandis hybrids (Xu et al. 2013). Its presence in these profiles is likely to be a fingerprint of the previous vegetation. The persistence of (Z)-docos-13-enamide in fire-affected soils has also been reported by Atanassova et al. (2012), who still detected it after the soil had been heated to 300 °C in air.

Although (Z)-docos-13-enamide is mostly considered a compound extracted from the woody parts of eucalyptus trees, it can also be derived from a broad range of microorganisms, including Trichoderma harzianum, a ubiquitous soil fungus (Siddiquee et al. 2012), Paecilomyces sp., an endophytic fungus of Panax ginseng (Xu et al. 2009), and Bacillus sp., a halophilic bacterium collected from solar salt works (Donio et al. 2013).

Fatty alcohols with carbon numbers ranging from 7 to 35 have been found in various environments such as soil, peatland and marine sediment (Huang et al. 2013; Treignier et al. 2006; Yang et al. 2014). However, fatty alcohols including hexadecan-1-ol (118) and (9E)-9-hexadecen-1-ol (119) have not previously been reported in deep soil layers. It is likely that these fatty alcohols were derived from hexadecanoic acid (115), one of the fatty acids commonly found in many organisms (Kaneda 1991). Significant amounts of hexadecanoic acid (115) and 9-hexadecen-1-ol (119) have also been isolated from halophilic bacteria from extreme environments such as deep-sea carbonate rock at a methane cold seep in Japan (Hua et al. 2007).

Other long chain compounds, including (9Z)-9-octadecenenitrile (135) and (9Z)-9-octadecenamide (138), have also been detected in Paecilomyces sp. and Bacillus sp., respectively (Donio et al. 2013; Xu et al. 2009). Interestingly, the latter compound is produced by a wide range of microorganisms and has antimicrobial properties (Donio et al. 2013), which could explain its persistence in most of the samples studied (2.25 × 10−4 to 5.75 × 10−4 g C g TOC−1). It can therefore be concluded that the fatty acid amides and alcohols, including the nitrogen-containing compounds, present in deep soils at all locations are derived from microorganisms and plants.

Phthalate compounds have been reported to be components of wood that resist decomposition by organisms such as fungi and termites (Xu et al. 2015). This could explain why bis(6-methylheptyl) phthalate (137) was abundant at all locations and depths in this study. Although bis(6-methylheptyl) phthalate (137) has not been reported in eucalypts or pines, several related compounds, including phthalic acid, have been identified (Liu and Xu 2012; Xu et al. 2015).

Contribution of LMWC in deep soil

The presence of LMWC is likely to be largely influenced by vegetation and in situ decomposition of plant roots. For example, terpenes and plant steroids made only minor contributions in surface soils. In general, the amount of soil carbon arising from LMWC in the three profiles was in the microgram per gram range, much less than the TOC content determined by dry combustion or wet chemistry (Harper and Tibbett 2013). Even allowing for compounds below the detection limit and residual compounds that may not have met the 80% match criterion of this study, LMWC seems likely to represent only a small proportion (0.92–3.56%) of the TOC in deep soil profiles.
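Comparing LMWC with TOC requires a unit conversion, because TOC is reported in % (w/w) and LMWC in µg C g soil−1: 1% TOC corresponds to 10,000 µg C g soil−1, so a deep sample with 0.04% TOC contains 400 µg C g soil−1. The same conversion expresses a single compound per gram of TOC, as in the g C g TOC−1 values given for (9Z)-9-octadecenamide. The Python sketch below shows the arithmetic with hypothetical inputs; the function names are illustrative and the study's per-sample calculation is not reproduced here.

```python
UG_PER_G_PER_PERCENT = 10_000.0  # 1% (w/w) = 10,000 ug per g of soil


def lmwc_share_of_toc(lmwc_ug_c_per_g: float, toc_percent: float) -> float:
    """Percentage of total organic carbon accounted for by LMWC carbon."""
    toc_ug_c_per_g = toc_percent * UG_PER_G_PER_PERCENT
    return 100.0 * lmwc_ug_c_per_g / toc_ug_c_per_g


def g_c_per_g_toc(compound_ug_c_per_g: float, toc_percent: float) -> float:
    """Carbon in one compound per gram of TOC (g C g TOC^-1)."""
    return compound_ug_c_per_g / (toc_percent * UG_PER_G_PER_PERCENT)


# Hypothetical sample: 5.0 ug C g^-1 of LMWC, of which 0.2 ug C g^-1 is one
# compound, in a soil with 0.05% TOC (illustrative values only).
print(f"LMWC share of TOC: {lmwc_share_of_toc(5.0, 0.05):.2f}%")
print(f"Single compound: {g_c_per_g_toc(0.2, 0.05):.1e} g C g TOC^-1")
```

Because deep samples contain so little TOC, even a few µg C g soil−1 of LMWC amounts to roughly 1% of TOC, which is why the fractions quoted above are small but not negligible.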

Thermochemolysis (TMAH-GC/MS) of surface soil profile

An abundance of MOC-derived products was identified in the surface soil layer. Various polysaccharide-derived compounds indicated the decomposition of organic material produced by recent vegetation; for example, furan and pyran are released from carbohydrates by wildfire or biodegradation (Atanassova et al. 2012; Schellekens et al. 2009). A contribution from vegetation is also suggested by the presence of 1,2,3-trimethoxy-5-methylbenzene, which could derive from the syringyl units of lignin; this lignin subunit could represent grass residues (Buurman et al. 2009). It is likely that some of the syringyl units measured result from recent degradation under current agricultural practices (annual pasture and crop rotation).

Terpenes such as globulol and ylangene are constituents of the essential oils of native eucalypts and exotic pines, respectively (Dob et al. 2005; Tan et al. 2008; Xu et al. 2013), and provide evidence of the link to vegetation. Furthermore, condensed aromatic rings such as naphthalene observed in the surface soil are signature compounds of fire events (del Río et al. 2001, 2002). Fire is widely used for prescribed burning and land clearing, and wildfires are common where fuel loads are sufficiently high (Burrows 2008).

Compound characterisation and distribution in deep soil profiles

This is the first study to characterise soil organic carbon compounds in profiles to a depth of 29 m, and it demonstrates the persistence and distribution of different carbon compounds over the entire regolith depth. It is likely that many of the compounds identified are root-derived, as suggested by early studies of root distribution in this region (Carbon et al. 1980; Dell et al. 1983). The main pyrolysis products of lignin identified in this study are methyl- or methoxy-substituted phenols, which agrees well with the study of Jiang et al. (2010). Naphthalenes and phenanthrenes were also obtained, as previously reported by Asmadi et al. (2011).

The aromatic compounds identified could result from different lignin pyrolysis pathways. Asmadi et al. (2011) showed that, once the ether bonds of lignin are broken, the released aromatic compounds react via two main pathways. In the first, radical-induced rearrangement is followed by decomposition to cresols and xylenols, and subsequently phenol and benzofuran are generated. In the second, homolysis of the O-CH3 bonds produces catechols and pyrogallols, which then decompose to naphthalenes and phenanthrenes. For cellulose, a key pyrolysis product was 1,4:3,6-dianhydro-α-D-glucopyranose, in agreement with Lu et al. (2011) and Stefanidis et al. (2014). Although furans and cyclopentanones were the main signature compounds, levoglucosan was not detected, which may reflect the pyrolysis temperature.

The detection of aromatic compounds over the entire depth implies that lignin is more recalcitrant than cellulose and chitin. This is consistent with the observation that lignin comprises a major portion of root tissue and has a low decay rate (Kögel-Knabner 2002; Rasse et al. 2005). It is also consistent with the study of Moreland et al. (2021), which found that the age of carbon in deep saprolite in the Sierra Nevada increased with depth. Recent studies have highlighted the importance of interactions between soil organic matter (SOM) and the soil matrix (e.g. Lehmann and Kleber 2015; Schmidt et al. 2011). This warrants investigation in future studies, particularly as it relates to the rates of decomposition of deep carbon stores following deforestation.

Cellulose and N-containing compounds appear to be mostly confined to the surface soil (0–0.1 m) and are clearly linked to above-ground inputs. The results also suggest that these materials are decomposed rather than leached to greater depths. The abundance of compounds of unidentified origin suggests that soil carbon also includes many residual compounds at different stages of degradation.

Performance of thermochemolysis and pyrolysis methods

Thermochemolysis with TMAH and pyrolysis were both robust for characterising pure carbon substrates. However, their performance differed for samples with low carbon concentrations. When both methods were applied to the same surface soil sample, TC(TMAH) fragmented the MOC selectively, yielding products with complex structures, whereas pyrolysis gave a large number of simple compounds.

In addition, terpene compounds indicative of plant species were observed with TC(TMAH) but were lost during heating in the pyrolysis technique. This difference demonstrates the advantage of using both techniques to characterise soil organic components.

For Py-GC/MS, the pyrolysis temperature had a large effect on the number of compounds observed: one (hexadecanoic acid) at 250–340 °C and 72 at 250–600 °C. To achieve a greater understanding of MOC in deep soil profiles, at least three complementary improvements are recommended: first, pre-concentration of samples with 0.1 M NaOH before Py-GC/MS analysis; second, multi-shot Py-GC/MS in the temperature range 250–600 °C; and third, selective fragmentation with TMAH combined with Py-GC/MS at temperatures below those causing thermal degradation, which may reduce the loss of compound information.

Conclusions

The vertical distribution of LMWC and MOC was investigated for the first time in three deep soil profiles in south-western Australia. Three common compound classes of LMWC were obtained from surface soils: (1) terpenes, (2) fatty acids and amides of fatty acids and (3) plant steroids, but only alkyl amides and alkanols were dominant in deep soils. LMWC concentrations ranged from 3.15 (0–1 m) to 14.27 (11–12 m) µg C g soil−1. The terpenes and plant steroids in surface soils may derive from vegetation dominated by eucalypts and pine. Compounds typically isolated from deep soils included (9Z)-9-octadecenenitrile, (9Z)-9-octadecenamide, hexadecan-1-ol and (9E)-9-hexadecen-1-ol. Overall, (Z)-docos-13-enamide and bis(6-methylheptyl) phthalate may derive from microorganisms and the previous native vegetation.

Analysis of both LMWC and MOC indicated that the present vegetation influences only the surface soil, as revealed by a range of biomarkers including terpenes, methoxybenzenes, pyrans (cellulose), pyridines and the occurrence of ylangene and patchouli alcohol. The presence of soil biota, such as bacteria and fungi, was indicated by the distribution of N-containing compounds not derived from chitin. Furthermore, naphthalene compounds indicated the imprint of past fire in this area. This study suggests that carbon in the form of aromatic compounds was stabilised in deep soil, whereas other carbon sources such as cellulose, chitin and N-containing compounds were confined to the surface soil.