Introduction

Providing food security for a rapidly growing global population, a large fraction of which is malnourished, is one of the greatest challenges of the modern era (IFPRI 2017). Conventional solutions for producing sufficient food, such as increased land clearing and increased use of pesticides, are unfavourable due to their environmental impacts and long-term unsustainability. Thus, novel alternatives are needed to efficiently produce the approximately 70% more food needed by 2050 (Beuchelt and Virchow 2012). Crop loss to pests and pathogens throughout food production and supply represents a major threat to this aim. Pests and pathogens may reduce crop yield by up to 80%, presenting a significant challenge to crop productivity (Oerke 2006). Detection of pests and pathogens in pre- and post-harvest settings is therefore essential to minimize their impact on crop production. The post-harvest consumer stage can be viewed as one of the most critical points in food production and supply, because maximum resource allocation has occurred by this point, making plant–pathogen interactions especially relevant at this stage. Current methods for diagnosing crop health include remote sensing, molecular methods, and complex analytical techniques, all of which have drawbacks. Remote sensing techniques, including hyperspectral imaging and thermography, are highly sensitive to environmental conditions and to the distance from the measured object, making it difficult to determine disease specificity (Mahlein 2016). Molecular techniques [for example, polymerase chain reaction (PCR), fluorescent in situ hybridization (FISH), and enzyme-linked immunosorbent assay (ELISA)] are time consuming and prone to contamination (Chitarra and Van Den Bulk 2003; Schaad and Frederick 2002; Wallner et al. 1993). Complex analytical methods, for example gas or liquid chromatography coupled to mass spectrometry (GC/LC–MS), require extensive sample preparation and are difficult to use in the field (Martinelli et al. 2015).

The development of flexible, non-destructive sensors capable of providing adequate detection sensitivity and pathogen specificity is a key goal for the detection of crop pathogens (Mahlein 2016; Skolik et al. 2018). Such sensors should be able to detect early effects of plant stress or disease, to differentiate between the effects of abiotic and biotic stresses and between different diseases, and to quantify the severity of the stress or disease (Mahlein 2016). While various molecular and imaging techniques to detect crop pathogens are under development, the limitations of many analytical methods, combined with the above criteria for sensor technologies, have led to the development of label-free, non-destructive spectroscopic techniques that provide information about the chemical structure of analysed samples. The spectroscopic approach, originating in analytical chemistry, has been translated to the biological sciences mainly through advances in computational analysis and the ability to measure live samples (Chan and Kazarian 2016). Applied to model plant and crop systems, these techniques have the potential both to provide novel insight into plant–pathogen interactions and to generate a large number of variables for the autonomous classification of disease states, enabling the detection of pests and pathogens.

Mid-infrared (MIR) spectroscopy has made substantial headway in the biological sciences as a non-destructive and rapid bioanalytical sensor technology (Martin et al. 2010). This is because MIR spectra provide molecular insights into biological systems while offering a large number of variables on which to discriminate samples. More recently, spectrochemical techniques have made substantial progress in the plant and crop sciences, specifically with regard to the analysis of dynamic processes and plant biology related to crop production (Butler et al. 2015, 2017; Ord et al. 2016; Skolik et al. 2018). This progress has been driven mainly by the development of new data analysis methods, including multivariate analysis. For MIR biospectroscopy, data analysis can be split into two main types: exploratory and diagnostic. Exploratory analysis is aimed primarily at data visualization and pattern recognition (Trevisan et al. 2012). Diagnostic analysis employs classifier algorithms to evaluate the potential for diagnosing sample condition (for example, healthy versus diseased) (Trevisan et al. 2012). While the two frameworks typically have different objectives, they are closely linked, and in general the exploratory framework precedes the diagnostic one. Among the many available data analysis options, unsupervised principal component analysis (PCA) and supervised linear discriminant analysis (LDA) have been used alone or in combination to successfully investigate a large number of biological phenomena based on MIR data (Li et al. 2015; Strong et al. 2017). Both PCA and LDA have formed core components of biospectroscopy data analysis. Related classifiers, including support vector machines (SVM) and the linear discriminant classifier (LDC), have also found ample application in the diagnostic framework.
Such advancements have highlighted the potential of MIR biospectroscopy as an effective sensor technology for the plant and crop sciences (Skolik et al. 2018). Despite this progress, the number of investigations on intact samples, arguably an important prerequisite for the development of fully non-destructive horticultural sensors, has been limited, and more research on intact samples is therefore required. Recently, both attenuated total reflection Fourier-transform infrared (ATR-FTIR) and Raman spectroscopy have been favoured for studying intact samples of important crops (Butler et al. 2015; Ord et al. 2016; Fu et al. 2016; Trebolazabala et al. 2013). Raman spectroscopy is complementary to FTIR spectroscopy, and the two are often combined for a more robust analysis, as each has specific drawbacks due to the distinct light–matter interactions they measure (Baker et al. 2014; Butler et al. 2016). Compared to macro-FTIR, Raman scattering, being a low-probability event, can be highly variable and prone to interference from fluorescence; Raman measurements also feature a small measurement area, typically between 20 and 30 µm, and use higher laser powers, potentially leading to photobleaching (tissue decomposition) of delicate organic samples (Butler et al. 2016; Yeturu et al. 2016). Nevertheless, previous studies have made a strong case for the effectiveness of Raman spectroscopy in the direct detection of microbial pathogens in intact crops (Egging et al. 2018; Farber and Kurouski 2018; Yeturu et al. 2016). While direct detection of a plant–pathogen interaction generates spectral changes suitable for disease detection, indirect detection of plant infection through spectral changes in tissues resulting from pathogen attack remains difficult, but offers a novel approach, especially for early or pre-symptomatic stages of disease (Skolik et al. 2018).
ATR-FTIR spectroscopy has the advantage of macro-measurements, which increase the measurement area, while also affording a well-defined depth of light penetration into the sample (Kazarian and Chan 2013; Chan and Kazarian 2016). This may make it more suitable for analysis at the whole-plant level, as many experiments still rely on prior removal (cutting) of leaves and fruit, which is not truly non-destructive (Trebolazabala et al. 2017; Yeturu et al. 2016). It is therefore essential to evaluate methods complementary to Raman, such as FTIR, as a combined approach can overcome the limitations of a single technique and cope with the variable nature of the crops and plant–pathogen systems covered by modern agriculture.
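The well-defined penetration of ATR can be estimated with the standard Harrick expression for the evanescent-wave penetration depth, d_p = λ / (2π n1 √(sin²θ − (n2/n1)²)), where λ is the wavelength, n1 and n2 are the refractive indices of the crystal and sample, and θ is the angle of incidence. The sketch below is illustrative only: the diamond index (about 2.4) is a known value, but the sample index of 1.5 for tissue and the 45° angle of incidence are assumptions, not parameters reported in this study.

```python
import math


def penetration_depth_um(wavenumber_cm1, n_crystal=2.4, n_sample=1.5, angle_deg=45.0):
    """Evanescent-wave penetration depth d_p (in µm) at an ATR interface.

    Defaults are assumptions: diamond crystal (n ~ 2.4), tissue-like sample
    (n ~ 1.5), and a 45 degree angle of incidence.
    """
    wavelength_um = 1e4 / wavenumber_cm1  # wavelength in µm = 10^4 / wavenumber in cm-1
    theta = math.radians(angle_deg)
    root = math.sin(theta) ** 2 - (n_sample / n_crystal) ** 2
    if root <= 0:
        raise ValueError("angle of incidence is below the critical angle")
    return wavelength_um / (2 * math.pi * n_crystal * math.sqrt(root))


# Penetration depth across the 1800-900 cm-1 fingerprint region
for wn in (1800, 1400, 900):
    print(f"{wn} cm-1: d_p = {penetration_depth_um(wn):.2f} um")
```

Under these assumptions, d_p grows from roughly 1.1 µm at 1800 cm−1 to about 2.2 µm at 900 cm−1, so the beam samples only the outermost micrometres of the fruit surface.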

Herein we use ATR-FTIR spectroscopy to study the effects of damage and of ambient infection by the sour rot causal agent Geotrichum candidum, both directly and indirectly, in commercially obtained, consumer-stage (red-ripe) intact cherry tomatoes. Conventional spectral analysis to characterize the main absorbance peaks of tomato and of the fungus G. candidum is followed by exploratory and diagnostic multivariate analysis to probe subtle biochemical changes induced indirectly by damage and infection. Changes to the surface of tomato fruit in response to damage and sour rot infection are characterized using the tandem technique PCA–LDA to maximize inter-class differences between damaged, infected, and control fruit. The diagnostic potential of this approach is evaluated using the tandem classifier PCA–LDC to distinguish damaged and infected tomato fruit from healthy controls indirectly and autonomously.

Materials and methods

Sample preparation and storage

Vine cherry tomatoes cv. Piccolo were obtained from a local supermarket (Sainsbury’s Lancaster Main Store, UK). All analyses were performed before the advertised expiration date, which at the time of purchase allowed a window of 8 days. Tomatoes were removed from their commercial packaging, taken off the vine, and acclimatized to room temperature (23 ± 1°C) and 35–40% relative humidity for 2 h prior to initial analysis. Loose tomatoes were split into two sets: a control series accounting for changes occurring in naturally ripening tomatoes over the analysis timeframe (Fig. 1a–c), and a set of tomatoes punctured through the stem scar at 0 h, to a depth of approximately 1 cm, with a 21-gauge sterile syringe needle, leaving the remainder of the skin intact (Fig. 1d). Damaged tomatoes were thus susceptible to ambient infection at the puncture site; infection was visible from 48 h post-puncture, and full colonization of the punctured stem scar was observed at 96 h post-puncture (Fig. 1f). Control and damaged tomatoes were stored in the dark in cardboard boxes under identical conditions to compare pathogen development and to allow stem scar infection by ambient fungal spores. Damaged tomato fruit were analysed alongside their shelf-life-matched controls, which were allowed to age naturally, at 0 h and at 48 and 96 h post-puncture, to assess pathogen infection caused by the initial damage. Before spectral acquisition, tomato fruit were washed thoroughly with de-ionized water to remove dust and debris, as well as the fungal growth and fluid exudate present on fruit at the early infection (48 h) and late infection (96 h) stages, in order to characterize changes in the fruit skin without contributions from the fungus itself (analysed separately) or from exudates at the site of infection. The fungal–fruit complex on fully colonized tomatoes at 96 h post-puncture (Fig. 1f) was analysed to obtain spectra of the fungus in its native state on tomato fruit.

Fig. 1

Symptoms associated with tomato fruit damage (d), early (e), and late (f) infection of tomato fruit by G. candidum compared to their shelf-life-matched controls (a–c)

ATR-FTIR spectroscopy

MIR spectra were acquired from intact tomato fruit using a Bruker Tensor 27 IR spectrometer with a Diamond ATR Helios attachment (Bruker Optics, Coventry, UK). Spectra were acquired over the range 4000–400 cm−1 with a spectral resolution of 8 cm−1, 3.84 cm−1 data spacing, 32 co-additions, and a mirror velocity of 2.2 kHz for optimum signal-to-noise ratio (Martin et al. 2010; Baker et al. 2014). A background spectrum was taken prior to each sample to account for ambient atmospheric conditions. The diamond ATR crystal defined a spatial resolution (sampling area) of approximately 250 µm × 250 µm. Each whole fruit was placed on the sample stage with no more than 0.1 kg of applied pressure. Between measurements, the ATR diamond crystal was cleaned with isopropyl-alcohol ATR cleaning wipes (Bruker Optics, Coventry, UK). Five points around the fruit circumference were measured, with two spectra acquired at each point, for a total of 10 spectra per fruit. Ten spectra per fruit generally supply enough replicates for PCA–LDA to extract inter-class differences (i.e., variance specific to 0 h control vs. 0 h damage, 48 h control vs. 48 h early infection, and 96 h control vs. 96 h late infection) while minimizing the effect of natural tissue heterogeneity, which could otherwise mask the subtle effects underpinning the plant response to the pathogen. Six fruits were measured for each treatment group. Measurements of G. candidum were taken in vivo, without modification, from the tomato–pathogen complex at the late infection stage (96 h post-puncture). The fungal mass completely covered the ATR crystal during measurements; six separate samples of G. candidum (10 spectra from each) were measured to obtain a representative in vivo spectrum.
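The sampling design above implies the following dataset sizes, assuming no spectra were discarded. The last lines illustrate one common reason for running PCA before LDA: with 120 spectra per pairwise comparison and roughly 235 variables in the 1800–900 cm−1 region at 3.84 cm−1 spacing (the exact count depends on where the instrument's wavenumber grid falls), the pooled within-class covariance matrix that LDA inverts has rank at most n − 2 = 118 and is therefore singular on the raw wavenumbers.

```python
import math

# Sampling design from the Methods (assumes every acquired spectrum was retained)
FRUITS_PER_GROUP = 6
POINTS_PER_FRUIT = 5
SPECTRA_PER_POINT = 2

spectra_per_fruit = POINTS_PER_FRUIT * SPECTRA_PER_POINT  # 10
spectra_per_group = FRUITS_PER_GROUP * spectra_per_fruit  # 60
spectra_per_comparison = 2 * spectra_per_group            # afflicted class + matched control

# Approximate number of variables in the 1800-900 cm-1 fingerprint region
DATA_SPACING_CM1 = 3.84
n_variables = math.floor((1800 - 900) / DATA_SPACING_CM1) + 1

# A two-class pooled within-class covariance has rank <= n_spectra - 2,
# so it cannot be inverted when there are more variables than that.
max_covariance_rank = spectra_per_comparison - 2
needs_dimension_reduction = max_covariance_rank < n_variables

print(spectra_per_fruit, spectra_per_group, spectra_per_comparison)  # 10 60 120
print(n_variables, max_covariance_rank, needs_dimension_reduction)   # 235 118 True
```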

Pre-processing and computational analysis

All computational analysis was conducted using the open-source IRootlab toolbox (https://github.com/trevisanj/irootlab), which is specialized for the analysis of IR spectra (Trevisan et al. 2013), in conjunction with MATLAB 2016 (The MathWorks, MA, USA), unless otherwise stated. Raw spectra were truncated to the fingerprint region between 1800 and 900 cm−1, the primary region in which biomolecules absorb IR radiation. Fingerprint spectra were pre-processed using the rubberband-like baseline correction algorithm and maximum normalization to account for differences in sample thickness and ATR diamond contact pressure. Class mean spectra were used for direct analysis. Exploratory PCA reduces the dataset to a small number of factors (principal components, PCs) that account for the spectral variance; PCA was optimized using the IRootlab pareto function, and the first 10 PCs accounted for more than 99% of the variance in the dataset [see Supporting Information (SI) Figure S1]. These PCs served as input variables for LDA, forming the composite technique PCA–LDA (Trevisan et al. 2012). While PCA reduces the complexity of the spectral data, it is unsupervised: it does not use class labels, treats all spectra as a single group, and therefore cannot by itself distinguish between control, damaged, or infected tomatoes for the purpose of extracting class-specific differences (Trevisan et al. 2012). Applying supervised LDA after PCA (PCA–LDA) maximizes the spectral differences between classes (control vs. damaged/infected) and thus allows the extraction of class-specific biomarkers associated with damage and subsequent sour rot (Martin et al. 2010; Kelly et al. 2011; Trevisan et al. 2012). Each pairwise comparison between two classes generates one linear discriminant (LD), which summarizes the main class-specific differences between control and afflicted tomato fruit.
This linear discriminant can be visualized as a cluster plot in which each spectrum is a point; overlap and separation of points indicate similar and dissimilar features, respectively (Trevisan et al. 2012). PCA–LDA loadings provide a ‘spectrum-like’ graph indicating the wavenumbers at which variance between classes is most pronounced, as indicated by peak magnitude. Peak maxima are used as ‘spectral biomarkers’ indicative of the biological process under investigation (Kelly et al. 2011).
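The pre-processing and PCA–LDA steps above were run in IRootlab; the following Python sketch re-expresses them with NumPy and scikit-learn purely as an illustration. Two details are assumptions about how IRootlab implements these steps: the rubberband baseline is taken as the lower convex hull of each spectrum, and the LD1 loadings are obtained by projecting the LDA coefficients back through the PCA components. The input spectra are synthetic, not tomato data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def rubberband_baseline(wn, y):
    """Lower convex hull of (wn, y) interpolated onto wn (wn must be ascending).

    Assumed stand-in for IRootlab's rubberband-like baseline correction.
    """
    hull = []
    for i in range(len(wn)):
        # Drop the last hull vertex while it does not make a counter-clockwise turn
        while len(hull) >= 2:
            o, a = hull[-2], hull[-1]
            cross = (wn[a] - wn[o]) * (y[i] - y[o]) - (y[a] - y[o]) * (wn[i] - wn[o])
            if cross > 0:
                break
            hull.pop()
        hull.append(i)
    return np.interp(wn, wn[hull], y[hull])


def preprocess(wn, spectra, lo=900.0, hi=1800.0):
    """Truncate to lo..hi cm-1, subtract the rubberband baseline, scale each spectrum to max 1."""
    keep = (wn >= lo) & (wn <= hi)
    wn, spectra = wn[keep], spectra[:, keep]
    corrected = np.array([s - rubberband_baseline(wn, s) for s in spectra])
    return wn, corrected / corrected.max(axis=1, keepdims=True)


def pca_lda(X, labels, n_pcs=10):
    """Two-class PCA-LDA: LD1 score per spectrum and LD1 loading per wavenumber."""
    pca = PCA(n_components=n_pcs).fit(X)
    pc_scores = pca.transform(X)
    lda = LinearDiscriminantAnalysis().fit(pc_scores, labels)
    ld1_scores = lda.transform(pc_scores)[:, 0]
    # Map the LD1 direction from PC space back onto the wavenumber axis
    ld1_loadings = pca.components_.T @ lda.scalings_[:, 0]
    return ld1_scores, ld1_loadings, pca.explained_variance_ratio_.sum()


# Synthetic demo data (NOT tomato spectra): two bands on a sloping baseline,
# with a slightly stronger band near 1630 cm-1 in the "afflicted" class.
rng = np.random.default_rng(0)
wn_full = np.arange(400.0, 4000.0, 3.84)


def synthetic_spectrum(extra):
    bands = np.exp(-((wn_full - 1730) / 15) ** 2) + (0.5 + extra) * np.exp(-((wn_full - 1630) / 20) ** 2)
    return bands + 1e-4 * wn_full + rng.normal(0.0, 0.01, wn_full.size)


raw = np.array([synthetic_spectrum(0.0) for _ in range(60)] + [synthetic_spectrum(0.1) for _ in range(60)])
labels = np.array([0] * 60 + [1] * 60)

wn, X = preprocess(wn_full, raw)
scores, loadings, explained = pca_lda(X, labels)
print(f"variance explained by 10 PCs: {explained:.3f}")
print("mean LD1 score per class:", scores[labels == 0].mean(), scores[labels == 1].mean())
```

Fitting PCA–LDA on all spectra, as here, suits score and loading plots; classification accuracy should instead be estimated with cross-validation.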

As exploratory analysis, cluster separation along LD1 was examined to determine whether significant alterations between control and damaged/infected groups were evident. In each case, a pairwise comparison was conducted between damaged, early-infected, or late-infected tomatoes (at 0, 48, and 96 h, respectively) and their shelf-life-matched controls. To characterize the main spectral alterations, PCA–LDA loadings were combined with a peak-picking algorithm (20 cm−1 minimum separation) that identifies peak maxima, and the six most prominent wavenumber alterations (spectral biomarkers) were tabulated. Identified spectral biomarkers were given chemical assignments by matching them to previously characterized spectral biomarkers, considering parameters including species, tissue type, instrumentation (method, measurement area, interrogation depth, data analysis), and biological interaction (plant–pathogen).
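The peak-picking step can be sketched with SciPy's `find_peaks`, using its `distance` argument (in data points) to enforce the 20 cm−1 minimum separation. This is a hypothetical reimplementation, not the IRootlab routine: the function name `top_biomarkers` is illustrative, and ranking peaks by the absolute loading, so that positive and negative extremes both count, is an assumption.

```python
import math

import numpy as np
from scipy.signal import find_peaks


def top_biomarkers(wn, loadings, n_top=6, min_sep_cm1=20.0):
    """Wavenumbers of the n_top largest |loading| peaks, at least min_sep_cm1 apart.

    Assumes an evenly spaced wavenumber axis; the strongest peak comes first.
    """
    spacing = abs(wn[1] - wn[0])
    min_sep_points = max(1, math.ceil(min_sep_cm1 / spacing))
    magnitude = np.abs(loadings)  # positive and negative loadings both count (assumption)
    peaks, _ = find_peaks(magnitude, distance=min_sep_points)
    strongest = peaks[np.argsort(magnitude[peaks])[::-1][:n_top]]
    return wn[strongest]


# Hypothetical loadings built from seven Gaussian features (illustration only, not Table 2)
wn_demo = np.arange(900.0, 1800.0, 3.84)
features = [(1745, 1.0), (1630, -0.8), (1520, 0.7), (1250, -0.6), (1215, 0.5), (1050, 0.4), (1000, 0.3)]
demo_loadings = sum(a * np.exp(-0.5 * ((wn_demo - c) / 6.0) ** 2) for c, a in features)
print(top_biomarkers(wn_demo, demo_loadings))
```

Because the picked positions lie on the 3.84 cm−1 grid, each reported biomarker can differ from the true band centre by up to about half the data spacing.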

Group classification was evaluated using PCA in combination with a linear discriminant classifier (PCA–LDC), which tests autonomous classification accuracy based on spectral differences (Butler et al. 2017; Gajjar et al. 2013; Friedman et al. 2001). PCA–LDA and PCA–LDC were validated using 10-fold cross-validation. Further information regarding the analysis of biospectroscopy data can be found at https://github.com/trevisanj/irootlab and in the literature (Trevisan et al. 2012; Kelly et al. 2011). To test for statistically significant differences in PCA–LDA scores along the primary LD between control, damaged, and infected tomatoes, LD1 scores were averaged for each biological sample and tested for significance using unpaired two-tailed t tests (GraphPad Prism).
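A minimal Python sketch of both steps follows, assuming that each biological sample is one fruit contributing 10 spectra. The analysis itself used IRootlab and GraphPad Prism; here scikit-learn's `cross_val_score` with `StratifiedKFold` stands in for the 10-fold cross-validation, and SciPy's `ttest_ind` (unpaired, two-sided, equal variances by default) stands in for the t test. The function names and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def pca_ldc_cv_accuracy(X, labels, n_pcs=10, n_folds=10, seed=0):
    """Per-fold accuracy of PCA followed by a linear discriminant classifier."""
    model = make_pipeline(PCA(n_components=n_pcs), LinearDiscriminantAnalysis())
    folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    return cross_val_score(model, X, labels, cv=folds, scoring="accuracy")


def per_fruit_t_test(ld1_scores, fruit_ids, labels):
    """Average LD1 scores per fruit, then run an unpaired two-tailed t test between classes."""
    fruits = np.unique(fruit_ids)
    fruit_means = np.array([ld1_scores[fruit_ids == f].mean() for f in fruits])
    fruit_labels = np.array([labels[fruit_ids == f][0] for f in fruits])
    return ttest_ind(fruit_means[fruit_labels == 0], fruit_means[fruit_labels == 1])


# Synthetic demo: 6 control + 6 afflicted fruits x 10 spectra x 50 variables (NOT real data)
rng = np.random.default_rng(1)
fruit_ids = np.repeat(np.arange(12), 10)
labels = (fruit_ids >= 6).astype(int)
X = rng.normal(0.0, 1.0, (120, 50)) + rng.normal(0.0, 0.3, (12, 50))[fruit_ids]
X[labels == 1, :5] += 2.0  # class difference confined to five variables

accuracy = pca_ldc_cv_accuracy(X, labels)
ld1 = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis()).fit(X, labels).transform(X)[:, 0]
result = per_fruit_t_test(ld1, fruit_ids, labels)
print(f"mean 10-fold accuracy: {accuracy.mean():.2f}, per-fruit t test p = {result.pvalue:.2g}")
```

Splitting folds by spectrum allows spectra from the same fruit to appear in both training and test folds. If that leakage is a concern, scikit-learn's `GroupKFold`, with fruit IDs as groups, is one alternative.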

Results and discussion

Spectral characterization of surface structures of intact tomato fruit S. lycopersicum

Spectra from whole tomato fruit surface structures reflect prominent biochemical components present in the cuticle and cell wall (Serrano et al. 2014). There were no visible differences in the appearance of control, undamaged tomato fruit during the 96-h analysis window (Fig. 1a–c). In contrast, damaged tomatoes (0 h) had a small puncture wound from the syringe, at which fungal infection developed (Fig. 1d–f). Figure 2 shows the primary absorbance intensities of intact tomato fruit corresponding to Fig. 1, for the control set at 0, 48, and 96 h (Fig. 2a) and for damaged (0 h), early-infected (48 h), and late-infected (96 h) fruit (Fig. 2b), over the baseline-corrected and normalized ATR-FTIR fingerprint region (1800–900 cm−1). Comparison of spectra from the control and damaged/infected classes shows that the top six vibrational bands and their chemical assignments were identical (Fig. 2 and Table 1). Absorbance intensities shown in Fig. 2 and assigned in Table 1 reflect prominent biochemical components of plant surface structures, including cutin, phenolic compounds, waxes, and potentially volatile organic compounds (VOCs) (Baldassarre et al. 2015). Several of these compounds have previously been identified on the inner or outer face of isolated tomato cuticles (España et al. 2014; Heredia-Guerrero et al. 2014), despite differences in spectral resolution and in the equipment used to characterize isolated tomato cuticles (España et al. 2014). This is consistent with the thickness of the cuticle during the late red-ripe stage of tomato fruit (España et al. 2014; Heredia-Guerrero et al. 2014) and with the shallow interrogation depth of the ATR-FTIR beam (~ 1–3 μm). Readily identified cuticle components include vibrational modes associated with the main polymer, cutin, at 1728, 1462, 1165, and 1103 cm−1 (España et al. 2014; Heredia-Guerrero et al. 2014).
Phenolic compounds, which are among the other cuticle constituents that strongly absorb IR radiation, were identified by absorption at 1605 cm−1 (España et al. 2014; Heredia-Guerrero et al. 2014). We also identified an absorption peak at 1223 cm−1 that is not present in isolated cuticle (Heredia-Guerrero et al. 2014) but has previously been associated with monoterpenes, more specifically geranyl acetate, a structural component of many VOCs (Ord et al. 2016; Rodríguez et al. 2013; Schulz and Baranska 2007). Tomato fruit produce a characteristic profile of secondary metabolites, including VOCs, during ripening (Petro-Turza 1986; Buttery et al. 1988, 1990), and it is likely that monoterpenes characteristic of the VOCs present at the red-ripe stage represent a unique contribution to the fingerprint spectrum of intact tomato fruit compared to isolated cuticle (Rodríguez et al. 2013). Alternatively, the absorption at 1223 cm−1 may simply be a broad band related to the previously identified δ(OH) mode between 1246 and 1243 cm−1, associated with both cutin and polysaccharides (Heredia-Guerrero et al. 2014). Both the cuticle and underlying plant layers, including the cell wall, have been well studied using MIR-based biospectroscopy (Heredia-Guerrero et al. 2014; Largo-Gosens et al. 2014). However, due to several caveats, the number of studies on intact, and hence physiologically competent, samples has been limited, which in turn has limited the development of vibrational spectroscopy for applied horticulture (Skolik et al. 2018). Characterizing spectral features of tomato fruit in vivo, such as those of the cuticle, provides a first but important step in this endeavour and will contribute significantly to the sustainability of crop protection measures.
Yet the role of the cuticle and other epidermal structures of the tomato fruit skin in post-harvest quality, shelf life, and pathogen susceptibility remains debated, especially at the molecular level, in part due to the difficult nature of this recalcitrant layer and its intimate relationship with the underlying cell wall (Domínguez et al. 2015; Lara et al. 2014). Analytical surface techniques such as ATR-FTIR spectroscopy are well suited to shed light on this, as demonstrated by the ability to measure delicate intact fruit truly non-destructively in vivo (Fig. 2). But before molecular details can be uncovered in vivo, surface characterization of intact fruit is necessary to aid the interpretation of the more subtle changes hidden in the spectral data, which can only be extracted through multivariate analysis; in the same way, previous characterization of cuticle components aids the interpretation of the tomato fruit skin spectra in vivo shown in Fig. 2 (Heredia-Guerrero et al. 2014). Comparisons between isolated constituents and their native arrangements, in fruit or otherwise, will remain necessary to identify candidate target compounds to serve as spectral biomarkers for varying conditions, especially dynamic, physiologically driven ones such as plant–pathogen interactions. It is therefore important to characterize the candidate plant compounds measured by MIR biospectroscopy techniques, for appropriate interpretation of spectral data from physiologically competent samples in vivo. Further, indirect detection of damage to the fruit surface (cuticle, cell wall, epidermis) and of pathogens affecting crops such as tomato, which is easily compromised by damage leading to infection, would be of considerable interest for commercial development.
Once characterized, changes in the MIR signature caused by abnormalities such as damage, pathogen infection, or stress could prove useful for monitoring fruit condition, as it pertains to shelf life, throughout the post-harvest food system, thereby improving crop utilization.

Fig. 2

ATR-FTIR spectrum of intact tomato fruit S. lycopersicum cv. Piccolo over the fingerprint region (1800–900 cm−1): (a) control series at 0 (light grey), 48 (grey), and 96 (black) h; (b) 0 h damaged (light grey), 48 h early infection (grey), and 96 h late infection (black)

Table 1 Primary absorbance peaks of intact tomato fruit S. lycopersicum cv. Piccolo

Spectral alterations associated with tomato fruit damage and sour rot infection by Geotrichum candidum

The MIR spectrum of fruit surface structures is altered in response to damage through the stem scar and subsequent infection by G. candidum. Artificially damaged tomatoes (Fig. 1d) exposed to ambient conditions showed no initial signs of fungal infection after 24 h (data not shown), whereas at 48 h post-puncture (early infection) clear signs of infection were evident around the puncture site (black arrows; Fig. 1e), and at 96 h post-puncture (late infection) substantial pathogen growth had covered the puncture site (Fig. 1f). Based on the visible symptoms from 48 h post-puncture onwards (Fig. 1e, f), the pathogen was determined to be G. candidum, a non-specific fungus known as a ubiquitous contaminant of tomato processing equipment (Thornton et al. 2010). Because the mean spectra of control and damaged/infected tomatoes (Fig. 2a, b) were nearly identical with respect to the main vibrational bands (Fig. 2 and Table 1), PCA–LDA was employed to investigate whether subtle class-specific effects were detectable between control and compromised tomatoes. This approach was intended to determine whether changes caused by damage and subsequent pathogen infection were observable indirectly, without contributions from the fungus itself.

Class-specific differences for damage, early infection, and late infection were observed in tomato fruit compared to their healthy counterparts, as determined by multivariate analysis using PCA and LDA in tandem (PCA–LDA). Each pairwise comparison (class versus control) generated a single LD, yielding three PCA–LDA score plots (Fig. 3a–c). Figure 3 shows a clear separation between the clusters of each paired class, indicating differences between spectra acquired from controls and from damaged, early-infected, and late-infected fruit (Fig. 3a, b, and c, respectively). Separation along LD1 indicates differences within the fingerprint spectra that are specific to damage and infection. PCA–LDA score plots reveal significant cluster separation for damaged (p = 0.003), early-infected (p = 0.0001), and late-infected (p = 0.0003) fruit compared to their shelf-life-matched controls. This suggests that spectral changes are most pronounced at the early infection stage, which shows the largest degree of separation along LD1 (Fig. 3b), followed by the late infection stage (Fig. 3c) and damaged fruit (Fig. 3a). Loading plots (Fig. 3d–f) indicate the wavenumber regions responsible for the observed cluster separation in the PCA–LDA score plots (Fig. 3a–c) (Martin et al. 2007; Trevisan et al. 2012). Peaks in the loading plots mark the wavenumbers with the highest degree of variance between classes. Table 2 summarizes the top six discriminating wavenumbers identified from the PCA–LDA loading plots with the peak-picking algorithm described above; these are assigned as tentative spectral biomarkers responsible for the class-specific differences. Spectral biomarkers identified by PCA–LDA were considered a match if they were within ± 10 cm−1 of those identified in the other classes.
It is noteworthy that, because PCA–LDA can extract very subtle differences within complex tissue architectures, biomarkers identified this way may not originate from the prominent cuticle components evident in the fingerprint spectra shown in Fig. 2, but may instead represent small fractions of molecules embedded in the epidermal matrix. For this reason, and without extensive validation of their origin, spectral biomarkers are assigned tentatively.
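The ± 10 cm−1 matching rule can be written as a short comparison between two lists of biomarker wavenumbers. The sketch below assumes the tolerance is inclusive (|Δ| ≤ 10 cm−1) and allows one wavenumber to match several in the other class; the function names and input values are hypothetical and are not the wavenumbers in Table 2.

```python
def match_biomarkers(class_a, class_b, tol_cm1=10.0):
    """Pairs of wavenumbers (a, b) from two classes that lie within tol_cm1 of each other."""
    return [(a, b) for a in class_a for b in class_b if abs(a - b) <= tol_cm1]


def unique_biomarkers(class_a, class_b, tol_cm1=10.0):
    """Wavenumbers of class_a with no counterpart in class_b within tol_cm1."""
    return [a for a in class_a if all(abs(a - b) > tol_cm1 for b in class_b)]


# Hypothetical wavenumber lists (cm-1), for illustration only; not the values in Table 2
class_a = [1000, 1200, 1400]
class_b = [1005, 1215, 1398]
print(match_biomarkers(class_a, class_b))   # [(1000, 1005), (1400, 1398)]
print(unique_biomarkers(class_a, class_b))  # [1200]
```

An exact match, as reported for 1701, 1520, and 1215 cm−1, is simply the special case |Δ| = 0.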

Fig. 3

PCA–LDA 1-dimensional score plots showing class-specific cluster separation indicative of spectral differences between damaged, early, and late infection opposite their shelf-life-matched controls (a–c); corresponding loadings show specific wavenumbers responsible for clustering along LD1 (d–f)

Table 2 Top six discriminating class-specific wavenumbers and tentative chemical assignments, from LD1 loading plots associated with tomato fruit damage, early infection or late infection versus control classes

Several wavenumbers identified for the various fruit conditions overlapped: biomarkers discriminating initial damage were also identified as discriminators for early and late infection. Vibrational modes at 1701, 1632–1628, and 1254–1246 cm−1 were consistent between initial fruit damage and early G. candidum infection, with absorption at 1701 cm−1 the only exact match between these two classes. These three wavenumber regions are assigned to carbonyl groups in fatty acid esters of cutin (1701 cm−1) (España et al. 2014); carbon–carbon bonds in phenolic cuticular compounds (1632–1628 cm−1) (Heredia-Guerrero et al. 2014); and hydroxyl group deformation in cutin or polysaccharides (1254–1246 cm−1) (Heredia-Guerrero et al. 2014), all of which are part of the epidermal surface. Alternatively, the region from 1254 to 1246 cm−1 has been associated with the amide III band of proteins or methylene functional groups of phospholipids (Movasaghi et al. 2008), which are also potential targets of ATR-FTIR as part of the epidermis. Consistency was also observed between spectral biomarkers indicative of damage and those identified for late-stage G. candidum infection, specifically absorption bands at 1582–1574, 1520, and 1215 cm−1. Interestingly, the absorptions at both 1520 and 1215 cm−1 were exact matches to wavenumbers related to initial fruit damage, so the underlying compounds may play a role in both the damage response and the response to pathogens (Table 2). Absorption bands between 1582 and 1574 cm−1 are strongly associated with the amide II band of proteins (Movasaghi et al. 2008). The absorption band at 1520 cm−1 is potentially a shoulder of the amide II peak but is more likely associated with carbon–carbon bonds in phenolic compounds (Heredia-Guerrero et al. 2014), although this region has also been associated with alkene groups in aromatic compounds or the imine group in nucleic acids (Movasaghi et al. 2008).
Class-unique wavenumbers occur only at the early and late infection stages (i.e., upon the appearance of visual symptoms). All absorption bands identified in damaged tomato also occur in either early or late infection, so damage generates no unique absorbance peaks within the top six tentative biomarkers. Wavenumbers unique to early infection include absorbance at 1747 and 1366 cm−1 (Table 2). Vibrational modes at 1747 cm−1 are associated with double bonds in carbonyl and alkene functional groups of cutin, wax, and suberin-like compounds, as well as lipids in general (España et al. 2014; Heredia-Guerrero et al. 2014). In addition to cutin and waxes, cellulose, pectin, other polysaccharides, and sesquiterpenes have vibrational modes that absorb at 1366 cm−1 (Largo-Gosens et al. 2014; Heredia-Guerrero et al. 2014; Movasaghi et al. 2008). In contrast, late infection of tomato fruit shows specific absorbance at 1724 and 1466 cm−1, which are associated with carbonyl vibrations of cutin, lipids, polysaccharides, or phenolic esters, and with methylene vibrations of cutin or other waxes, respectively (Heredia-Guerrero et al. 2014; Movasaghi et al. 2008). Taken together, these results indicate changes occurring simultaneously across several compound classes, including lipids, proteins, and carbohydrates, many of which are prominent components of the epidermal structure, including the cuticle and cell wall.

Spectral alterations associated with tomato fruit damage are partially retained during subsequent early and late pathogen infection. Initially, tomato damage induces a wounding response, as colonization by G. candidum has not yet occurred (Figs. 1d, 3a, d), suggesting that the observed spectral alterations are specific to wounding. Both metabolic activity and VOC composition change in response to plant wounding; at the red-ripe stage, damage elicits changes in the VOC profile (Baldassarre et al. 2015). Wavenumbers identified as discriminators for fruit damage may therefore reflect prominent changes to the VOC profile, potentially combined with up-regulation of genes involved in defence reactions and the resulting changes in metabolism (Baldassarre et al. 2015). As VOCs diffuse through plant surface layers, their interaction with the cuticle, cell wall, or epidermis in general may produce alterations in these layers leading to the observed spectral changes (Peñuelas and Llusià 2001). Also, because damage has a direct effect on post-harvest deterioration and shelf life through various biochemical and physiological events (Lara et al. 2014; Watada and Qi 1999), rapid damage detection of tomato fruit using spectrochemical analysis would help prevent subsequent infection and spoilage induced by spreading microorganisms such as G. candidum. Spectral biomarkers from the initial response to wounding are retained in part during subsequent early and late infection (Table 2). Although several spectral biomarkers are consistent between damaged and early as well as late-infected tomatoes, both early and late infection show unique spectral characteristics as well. 
It therefore seems plausible that damage-induced changes in tomato fruit surfaces share common biochemical alterations with early- and late-stage infection, for example through a general stress response transitioning into a pathogen-specific response, which would explain the overlap in biomarkers described above (Table 2). As wounding and subsequent infection induce an increasing number of genetic and metabolic changes, the change in spectral profile likely reflects the shift from a damage response to a plant–pathogen interaction, explaining the development of unique biomarkers at the early and late infection stages (Table 2).

Spectral alterations in the surface structures of tomato related to plant–pathogen interactions have been identified previously and may reflect conserved changes in epidermal surface structures in response to stress, mediated through the reactive oxygen species (ROS) network. Plant–pathogen interactions induce complex signalling networks leading to the hypersensitive response (HR) and/or systemic acquired resistance (SAR), both of which involve significant metabolic alterations, including lignification, suberization, callose deposition, changes in ion fluxes, and lipid peroxidation (Camejo et al. 2016). The HR also involves the activation of programmed cell death (PCD). This is often accompanied by an oxidative burst in which ROS accumulate in the form of superoxide radical (O2•−), hydrogen peroxide (H2O2), and hydroxyl radical (·OH) (Apel and Hirt 2004; Hakmaoui et al. 2012; Suzuki et al. 2011). More generally, ROS signatures are altered in response to abiotic and biotic stresses, alone and in combination (Camejo et al. 2016; Choudhury et al. 2017). Further, the oxidative burst initiated during plant–pathogen interactions with fungi generates essential ROS, which influence structural features of both the cuticle and the cell wall (AbuQamar et al. 2017). Part of the response to damage and pathogen attack, specifically at the late ripening stage, is accelerated fruit softening caused by cutin depolymerization, which occurs naturally during the ripening program (Saladié et al. 2007; Brummell and Harpster 2001). The observed changes are therefore likely associated with a stress response initiated by fungal infection at a distance (in this case, infection at the stem scar) rather than with cutin hydrolysis and depolymerization by fungus-released cutinases. Notably, the region 1750–1700 cm−1 has been used to measure cutin in tomato cuticles, with the potential to determine the degree of cutin esterification (España et al. 2014).
This region not only corresponds to a major cuticle component of intact tomato fruit (Table 1) but was also extracted by PCA–LDA for all classes (Table 2), making it a potentially robust biomarker of spectral alterations associated with cuticle-dependent shelf life and pathogen susceptibility. ROS signatures, or more specifically downstream targets of ROS in epidermal surface structures such as the cuticle and cell wall, may therefore offer suitable targets for detecting, and potentially quantifying, abiotic and biotic stresses in various combinations (AbuQamar et al. 2017; Choudhury et al. 2017). Wavenumbers associated with the damage, early infection, and late infection categories (Table 2) have also been identified as biomarkers of abiotic and biotic stress in the epidermal surface structures of intact leaves of Acer pseudoplatanus (sycamore) (Ord et al. 2016). Importantly, both this study and that of Ord et al. (2016) employed ATR-FTIR spectroscopy coupled with the composite technique PCA–LDA, emphasizing the effectiveness of this approach for extracting biochemical information from dynamic biological processes. Spectral biomarkers identified in A. pseudoplatanus were associated with abiotic stresses caused by ozone and vehicle air pollution, as well as with biotic stress caused by the tar spot leaf fungus Rhytisma acerinum. Changes in the cuticle and cell wall, as well as ROS signalling, are early events in the response of plants to environmental stress, making it plausible that certain biochemical and biophysical changes in plant surface structures in response to stress are conserved between species. Consequently, the alterations observed in the spectral signature of A. pseudoplatanus leaves (Ord et al. 2016) may be linked to ROS generation in response to stress, providing a connection between the biomarkers observed here and those measured in the A. pseudoplatanus stress response (Ord et al. 2016).
This would explain the appearance of spectral biomarkers in tomato fruit related to damage and biotic stress that have previously been associated with both abiotic and biotic stresses in surface structures of the distantly related A. pseudoplatanus. Although the spectral biomarkers identified here in tomato match previously reported stress biomarkers, they occur in different combinations, which may be due to several factors, including inter-species differences, differences in tissue type, or differences conferred by disease (stress) specificity. Nevertheless, the identification of such a large number of shared spectral biomarkers points to strong commonalities between these two different species and suggests that the spectral alterations relate to dynamic physiological changes in biotic and abiotic stress responses. While this is difficult to confirm through spectrochemical analysis alone, additional data should clarify the link between changes in the MIR signature, changes in epidermal structures, plant stress, and specific signalling pathways such as ROS.
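The 1750–1700 cm−1 carbonyl region discussed above could be tracked as an integrated band area. The sketch below is a hypothetical illustration, not the procedure of España et al. (2014) or of this study: it integrates absorbance with the trapezoidal rule after subtracting an assumed straight-line local baseline drawn between the first and last points of the window. The function name and defaults are illustrative.

```python
from typing import Sequence


def band_area(wavenumbers: Sequence[float], absorbance: Sequence[float],
              low: float = 1700.0, high: float = 1750.0,
              subtract_baseline: bool = True) -> float:
    """Trapezoidal area of absorbance between low and high (cm^-1).

    If subtract_baseline is True, a straight line joining the first and last
    points inside the window is subtracted first (an assumed local baseline).
    """
    # Keep only points inside the window, ordered by wavenumber.
    pts = sorted((w, a) for w, a in zip(wavenumbers, absorbance) if low <= w <= high)
    if len(pts) < 2 or pts[0][0] == pts[-1][0]:
        raise ValueError("need at least two distinct wavenumbers inside the window")
    (w0, a0), (w1, a1) = pts[0], pts[-1]

    def baseline(w: float) -> float:
        # Linear interpolation between the window endpoints (or zero if disabled).
        if not subtract_baseline:
            return 0.0
        return a0 + (a1 - a0) * (w - w0) / (w1 - w0)

    area = 0.0
    for (wa, aa), (wb, ab) in zip(pts, pts[1:]):
        area += 0.5 * ((aa - baseline(wa)) + (ab - baseline(wb))) * (wb - wa)
    return area
```

Comparing such areas between classes, or normalizing them against a reference band, is one plausible way to summarize cuticle-related carbonyl changes; whether the result tracks cutin esterification would have to be validated against reference measurements.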

In vivo spectral characterization of sour rot pathogen Geotrichum candidum

Interaction of G. candidum with tomato fruit in vivo appears to alter the fungal MIR spectrum relative to that characteristic of typical fungi. G. candidum was measured on the tomato fruit at 96 h post-puncture, as depicted in Fig. 1f. To date, MIR has been used primarily to study fungal pathogens in isolated and prepared samples (Salman et al. 2012, 2010). G. candidum is an economically important pathogen because it induces sour rot in many fruit and vegetable crops, including tomato (Cantu et al. 2008). Its ubiquitous occurrence, in the human microbiome, in soil, and on horticultural processing equipment, makes pathogenic strains of G. candidum a threat to crops (Thornton et al. 2010). Further, G. candidum can improve the conditions for infection by other pathogens, thereby contributing to further infection or synergistic pathogen interactions (Suzuki et al. 2014; Wade et al. 2003).

Figure 4 shows the ATR-FTIR fingerprint spectrum of G. candidum in vivo on tomato fruit, and its six main vibrational bands are listed in Table 3. These bands are distinct from those of tomato fruit (Fig. 2 and Table 1) and include several absorbance peaks consistent with those of other fungal pathogens (Salman et al. 2012, 2010). Absorbance peaks at 1639, 1547, 1404, and 1034 cm−1 could be assigned to a typical fungal MIR spectrum (Salman et al. 2012, 2010). By comparison, the vibrational bands at 1342 and 1238 cm−1, which are prominent peaks of G. candidum in vivo, appear much less pronounced or even absent, depending on the fungal species under study (Salman et al. 2012). The main absorbance peaks of G. candidum in vivo show vibrational modes associated with proteins between 1639 and 1342 cm−1 (Movasaghi et al. 2008; Salman et al. 2012). Specific absorbance peaks in this region include 1639, 1547, and 1404 cm−1, corresponding to the fundamental protein vibrations amide I, amide II, and (C–N) vibration, respectively (Movasaghi et al. 2008; Salman et al. 2010, 2012). The vibration at 1034 cm−1 is also readily identified as belonging to the chitin (C–O) bond (Salman et al. 2012, 2010). Absorbance at 1639, 1547, 1404, and 1034 cm−1 is thus consistent with that previously characterized in fungal isolates of Colletotrichum, Fusarium, Rhizoctonia, and Verticillium species (Salman et al. 2012, 2010). However, the vibrational bands identified here at 1238 and 1342 cm−1 do not appear to be common constituents of other fungal pathogen isolates (Fig. 4). The phosphate (PO42−) vibrational band at 1238 cm−1 is strongly associated with nucleic acids, for example as part of the DNA or RNA phosphate backbone (Movasaghi et al. 2008). A polysaccharide (CH2) vibration, atypical of fungi, was also identified as a strong peak of G. candidum within the tomato fruit–pathogen complex (Fig. 4). The discrepancy between the spectrum of G. candidum and those of other species is likely a result of in vivo analysis. The unique interaction between fungi and their host plants could influence the measured composition of the fungus, compared with MIR spectra of fungal isolates, which are homogenized and removed from their biological context. Although we cannot rule out that the lack of sample preparation (dehydration and homogenization) before spectral acquisition led to a higher water content and a more heterogeneous arrangement that influenced the MIR spectrum (Fig. 4), the fundamentally different biochemical composition of fungal pathogens relative to tomato fruit is reflected in their respective MIR fingerprint spectra (Figs. 2 and 4). This fundamental difference in composition has enabled the direct detection of fungal pathogens within plant tissues from MIR spectral data. However, here we demonstrate that the typical fungal spectrum may show distinct features when measured intact. It remains to be seen whether the differences in the MIR fungal spectrum arise from simple fungal heterogeneity or whether the plant–pathogen interaction drives the changes in its MIR fingerprint. Regardless, characterizing pathogens in their native state, and as part of in vivo host–pathogen systems, is necessary to fully evaluate MIR for the non-destructive and rapid analysis of plant–pathogen interactions outside the laboratory, under the many variable conditions in which they occur.
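Extracting the main vibrational bands of a spectrum and comparing them with the bands the text attributes to typical fungi can be sketched as follows. This is an illustrative assumption rather than the authors' procedure: simple local-maximum peak picking ranked by absorbance, with a hypothetical 4 cm−1 matching tolerance. Real spectra would usually be smoothed and baseline-corrected first, and the reference dictionary contains only the four typical fungal bands named above.

```python
from typing import List, Sequence, Tuple

# Bands attributed in the text to a typical fungal MIR spectrum (cm^-1).
TYPICAL_FUNGAL_BANDS = {1639: "amide I", 1547: "amide II", 1404: "C–N", 1034: "chitin C–O"}


def top_peaks(wavenumbers: Sequence[float], absorbance: Sequence[float],
              n: int = 6) -> List[Tuple[float, float]]:
    """Return the n strongest local maxima as (wavenumber, absorbance), strongest first."""
    maxima = [
        (wavenumbers[i], absorbance[i])
        for i in range(1, len(absorbance) - 1)
        if absorbance[i] > absorbance[i - 1] and absorbance[i] >= absorbance[i + 1]
    ]
    maxima.sort(key=lambda peak: peak[1], reverse=True)
    return maxima[:n]


def split_typical(peaks: List[Tuple[float, float]],
                  tol: float = 4.0) -> Tuple[List[float], List[float]]:
    """Separate peak positions near a typical fungal band from the remaining ones."""
    typical: List[float] = []
    atypical: List[float] = []
    for position, _ in peaks:
        if any(abs(position - ref) <= tol for ref in TYPICAL_FUNGAL_BANDS):
            typical.append(position)
        else:
            atypical.append(position)
    return typical, atypical
```

Applied to a spectrum whose six strongest bands sit at the positions listed above, this split would place 1342 and 1238 cm−1 in the atypical group, mirroring the comparison made in the text.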

Fig. 4

ATR-FTIR fingerprint spectrum of in vivo sour rot pathogen G. candidum present on tomato fruit at the 96 h late infection stage

Table 3 Primary absorbance peaks of fungal pathogen G. candidum in vivo on tomato fruit

Autonomous indirect detection of damage and infection based on alterations to tomato fruit surfaces

Diagnostic classifiers based on PCA–LDC are effective at detecting tomato fruit damage and infection indirectly and autonomously. PCA–LDC is one of many classifier algorithms that can be trained and validated on MIR spectral datasets to evaluate the potential for autonomous classification (Butler et al. 2017; Strong et al. 2017). To evaluate the potential of MIR for autonomously detecting damage, early infection, and late infection relative to healthy shelf-life-matched controls, spectra of intact tomato fruit were used as training/validation datasets for the PCA–LDC classifier (Fig. 5). Discrimination of classes using PCA–LDC has recently been applied to plant tissues with high accuracy (Butler et al. 2017). Classification of healthy controls against their freshly damaged but non-infected counterparts showed the lowest observed accuracy: 78% for healthy controls, while freshly damaged tomatoes were identified correctly 83% of the time (Fig. 5a). In comparison, tomato fruit showing early signs of sour rot were classified with 97% accuracy, and healthy controls at 48 h post-puncture with 92% accuracy (Fig. 5b). Late-stage G. candidum-infected tomatoes were correctly classified 83% of the time, similar to freshly damaged tomatoes, whereas the control group at 96 h was classified with 96% accuracy (Fig. 5c). This was consistent with the separation observed along the primary LD for these classes (Fig. 3), as well as with the classification rates achieved by Butler et al. (2017) investigating calcium nutrient deficiency in tissues of Commelina communis. Notably, classification accuracy was higher for early (97%) than for late-stage infection (83%) (Fig. 5b, c). Late-stage infection, which leads to tissue breakdown and fruit softening, likely resembles the natural ripening of the 96 h control fruit more closely than freshly colonized early-infected fruit resemble the younger 48 h controls.
In addition, the switch from a damage response to a pathogen response may be more pronounced than the shelf-life changes in younger tomatoes at 48 h. Such effects may explain the higher classification accuracy for early-infected than for late-infected tomatoes. This is advantageous, because detecting infection early is preferable to detecting it at later disease stages.
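A two-class PCA–LDC of the kind described above can be sketched in NumPy as follows. This is a minimal illustration, not the implementation used in this study: it assumes spectra are the rows of a matrix, labels are 0/1 (for example control versus infected), equal class priors, and a pooled within-class covariance (the standard LDA assumption). The function names and the default of 10 PCs are hypothetical.

```python
import numpy as np


def fit_pca_lda(X, y, n_components=10):
    """Fit PCA on spectra X (samples x wavenumbers), then two-class LDA on the PC scores.

    y is a NumPy array of 0/1 labels. Returns the parameters needed for prediction.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA via SVD of the mean-centred spectra; rows of Vt are the PC loadings.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T
    scores = Xc @ loadings
    # Class means and pooled within-class covariance in PC space.
    m0 = scores[y == 0].mean(axis=0)
    m1 = scores[y == 1].mean(axis=0)
    d0 = scores[y == 0] - m0
    d1 = scores[y == 1] - m1
    pooled_cov = (d0.T @ d0 + d1.T @ d1) / (len(y) - 2)
    # Fisher discriminant direction; with equal priors the boundary is the midpoint.
    w = np.linalg.solve(pooled_cov, m1 - m0)
    threshold = w @ (m0 + m1) / 2.0
    return mean, loadings, w, threshold


def predict_pca_lda(model, X):
    """Project new spectra onto the stored PCs and apply the linear discriminant."""
    mean, loadings, w, threshold = model
    scores = (X - mean) @ loadings
    return (scores @ w > threshold).astype(int)


def per_class_rates(y_true, y_pred):
    """Percentage of samples in each true class that were assigned to that class."""
    return {int(c): 100.0 * float(np.mean(y_pred[y_true == c] == c))
            for c in np.unique(y_true)}
```

Per-class rates comparable to those in Fig. 5 should be computed on spectra held out from fitting (for example, fitting on one subset of fruit and predicting another); rates on the training spectra themselves would be optimistic.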

Fig. 5

Classification rates (%) of damage, early, and late infection, compared with shelf-life-matched controls, extracted from PCA input to linear discriminant classifier (PCA–LDC)

These collective data suggest that correct classification of infected tomato fruit improves with disease progression. Demonstrating that such detection is possible indirectly, based only on changes in fruit epidermis not yet afflicted by pathogens, will be important for detecting damaged tomato fruit before symptoms of fungal infection develop, so that affected crops can be repurposed and food waste reduced. In addition, early detection of fungal infection would help prevent the effects and spread of post-harvest disease. It is well documented that damage to delicate fruits and vegetables leads to rapid spoilage (Tournas 2005); early symptoms of damage may therefore also serve as a pre-symptomatic indicator of imminent infection by ambient microorganisms. Classifier performance may be further optimized by increasing the number of factors (PCs) fed into the LDC. For commercial development, appropriate training and test datasets would likely improve classification accuracy further. Nevertheless, preliminary classification accuracies of around 80% and above are promising and provide a precedent for further development of spectrochemical analyses as a tool for crop protection.
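One simple, commonly used heuristic for choosing how many PCs to feed into the LDC is the cumulative explained variance of the PCA. The sketch below assumes this criterion and a hypothetical 95% default target; it is not the selection rule used in this study. Because variance explained does not guarantee class separation, validation accuracy across a range of PC counts remains the more direct check.

```python
import numpy as np


def n_components_for_variance(X, target=0.95):
    """Smallest number of PCs whose cumulative explained variance reaches `target`."""
    Xc = X - X.mean(axis=0)
    # Singular values of the centred data; their squares are proportional to PC variances.
    s = np.linalg.svd(Xc, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    # First index where the cumulative fraction reaches the target, converted to a count.
    k = int(np.searchsorted(cumulative, target)) + 1
    # Guard against rounding leaving the final cumulative value just below the target.
    return min(k, len(s))
```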

Conclusions and future perspectives

Spectrochemical analysis combined with multivariate analysis offers a non-destructive sensor technology for analysing intact crops, active pathogens, and plant–pathogen interactions. Spectral characterization of intact tomato fruit revealed prominent components of the cuticular layer of the plant epidermis, including cutin, phenolic compounds, waxes, and VOCs. During healthy growth and plant–environment interactions, these compounds are notably modified, with consequences for fruit quality, and thus provide distinct groups of compounds that can serve as targets for monitoring dynamic processes in crop biology. At the environmental interface, the cuticle is especially important because of its role, as part of the cell wall, in determining fruit qualities such as susceptibility to cracking and pathogen infection (Isaacson et al. 2009; Lara et al. 2014).

Multivariate analysis (PCA–LDA) can effectively discriminate healthy from compromised tomato fruit based on damage and sour rot infection by G. candidum, thereby detecting pathogens indirectly. Damage and sour rot induced changes in cuticle structure, and the resulting spectral alterations in the tomato fruit epidermis were assigned as tentative biomarkers. Damaged, early-infected, and late-infected fruit showed unique spectral profiles despite partial overlap of spectral markers between damage and early infection and between damage and late infection, suggesting a potential for disease specificity at these distinct stages. Disease specificity based on unique spectral markers is tentatively linked to complex and evolving stress responses. While the exact connection between spectral biomarkers of compromised tomato fruit and specific stress responses remains unclear, these biomarkers are linked directly or indirectly to plant responses such as ROS signalling, SAR, and the HR. Clear alterations between healthy and damaged tomatoes further suggest the potential to identify damaged fruit before pathogen colonization, which may help prevent disease spread or allow unmarketable specimens to be repurposed. Spectra of fungal pathogens and tomato fruit are fundamentally different, enabling direct detection of colonized pathogens within the intact fruit–pathogen complex.

Automatic detection of damage, early, and late infection through changes in fruit epidermal surface layers was evaluated based on the related classification model PCA–LDC. Indirect detection of damage and infection was shown to be effective with detection accuracy improving with disease development. Classification of tomato fruit damage and infection ranged between 83% and 97%, which may be improved through knowledge transfer, the use of more sophisticated classification models, and trials with larger sample cohorts available to commercial growers.

Adapting spectrochemical analysis for fundamental plant science has been successful, yet more work is required to exploit the sensor potential of MIR spectrochemical analysis in complex crop systems. Herein, we demonstrate the ability to analyse individual parts of plant–pathogen complexes in vivo and show that damage and infection generate unique spectral signatures reflecting common stress responses in fruits. These signatures enable the non-destructive, autonomous detection of compromised fruit crops, both directly and indirectly. This opens the door for future work, which may focus increasingly on intact or native plant systems. Portable spectrochemical analysis equipment, including MIR and Raman probes, is becoming increasingly available and is only beginning to be explored for crop analysis (Egging et al. 2018; Farber and Kurouski 2018; Fu et al. 2016; Trebolazabala et al. 2013; Yeturu et al. 2016). Rapid developments in MIR spectrochemical analysis for plant and crop science will likely lead to concrete large-scale applications for crop protection and production in the near future.

Author contribution statement

FLM and MRM jointly initiated and led the research. FLM, MRM and PS all contributed to the research design. PS conducted experiments, analysed the data and wrote the manuscript. FLM and MRM provided feedback on data analysis and on the manuscript. All authors read and approved the manuscript.