Introduction

Planting hybrids of larch species in commercial forestry has gained increasing importance around the world, owing to their heterotic vigor (Pâques 2004; Baltunis et al. 1998). In Sweden, for example, interest in growing Larix eurolepis Henry, a hybrid of European larch (Larix decidua Mill.) and Japanese larch [Larix kaempferi (Lamb.) Carr.], has increased over the past decades. This is largely due to a greater focus on climate adaptation in forestry. Future climate change will probably alter the growing conditions in Scandinavia in a way that makes forestry with highly productive exotic species more attractive. Using several species, instead of only the few traditional ones, Picea abies (L.) Karst and Pinus sylvestris L., is a way to spread the risk of an unknown future. Hybrid larch offers good growth and a relatively short rotation age, and shows high wind stability at older ages (Ekö et al. 2004).

Commercial seeds of hybrid larch are produced in seed orchards, which normally include a number of grafts of one L. kaempferi clone (maternal parent) and of several L. decidua clones (paternal parents). There are also examples where the species of the maternal and paternal clones are reversed. To obtain as high a proportion of hybrid seed as possible, seeds are normally collected only from the maternal clones, while the paternal clones are used only for pollination (Hannerz et al. 1993). Still, the seeds from seed orchards of this design show varying hybrid seed proportions among years, as shown in a French study (Acheré et al. 2002) where the proportion varied between 2 and 67 %. This may be caused by poor synchronization in flowering between the two species in combination with external pollen from nearby larch plantations, including hybrids or the same species as the maternal clone (Pâques 2000). Another reason may be that the maternal clone is not self-sterile, i.e. it produces self-pollinated seeds (Pâques et al. 2013). There are also examples of hybrid seed orchard designs that include several maternal clones from which seeds are collected. However, the hybrid seed proportion from such designs is likely to be lower since the maternal trees are able to pollinate each other, producing viable pure parental seed. For instance, the English NT23 has such a design and has shown a hybrid rate of 15 % (Helmersson et al. 2015). The varying proportion of hybrid seeds causes problems both for nursery managers and for larch growers (individual forest farmers or companies).

As seeds can easily be transferred between countries or between places within countries, the genetic purity of seed lots is at considerable risk because trade has hitherto relied on “trust-on-labels”. EU regulations, for instance, require hybrid seed producers to provide information about the hybrid proportion (Pâques 2009). In the past, several approaches have been used to distinguish hybrid larch from its pure parental species, including morphological traits (Gauchat and Pâques 2011), cytogenetic analysis (Zhang et al. 2010), leaf fatty acid compositions (Sato et al. 2008) and molecular markers (Acheré et al. 2004; Pâques 2009). Rogueing of seedlings in the nursery is also used in practical operations, as the seedlings of pure parental species grow slowly compared to the seedlings of the hybrid. Although these methods allow identification of larch species and their hybrids, most have limited application in the certification of hybrid larch seed lots. Some of the approaches are destructive, and all are considered expensive.

The aim of this study was to evaluate the feasibility of near infrared spectroscopy (NIRS) as a cost-effective, technically less complex and non-destructive method for distinguishing hybrid larch seeds from those of the pure parental species. NIRS is a widely recognized analytical technique that measures moisture and chemical composition of biological materials based on absorption of near infrared radiation by bonds between light atoms, such as C–H, O–H and N–H, which results in overtones and combination bands detectable in the 780–2500 nm wavelength region (Workman and Weyer 2012). NIRS has proven successful for authentication of melon genotypes (Seregély et al. 2004), separation of Eucalyptus globulus genotypes (Castillo et al. 2008), hybrids of Eucalyptus (Humphreys et al. 2008) and pine (Espinoza et al. 2012) species as well as detection of introgression in Coffea arabica cultivars (Bertrand et al. 2005). To our knowledge, no study has been carried out to identify hybrid larch seeds using NIRS, although such identification is of paramount importance for monitoring natural hybridization in commercial open-pollinated mixed-species seed orchards.

Materials and methods

Seed samples

Seeds of L. decidua, L. kaempferi and their hybrid (L. × eurolepis) were obtained from the clonal archive at the research station of the Swedish Forest Research Institute at Ekebo, Sweden (55°57′N, 13°07′E, 80 m). Seeds of L. decidua of known maternal (D02V983) and paternal (S21K9780044) clones, and similarly of L. × eurolepis (S21K9580102 × S21K9580032), were produced by controlled pollination. Seeds of L. kaempferi were produced by open pollination of a known maternal clone (S08N1001) but an unknown paternal clone. Seeds of both L. decidua and L. × eurolepis were produced in 2010 and those of L. kaempferi in 1995. The seeds were stored in a freezer (−4 °C) from the time of harvest. A total of 336 seed samples, 112 samples per species, were randomly drawn from the total seed lots of each family to serve as working sub-samples for NIR analysis.

Spectral acquisition

NIR reflectance spectra of single seeds, expressed in the form of log (1/Reflectance), were collected with an XDS Rapid Content Analyzer (FOSS NIRSystems, Inc.) from 400 to 2498 nm at an interval of 0.5 nm. Individual seeds were placed directly at the center of the scanning glass window of the instrument (9 mm aperture, stationary module) and then covered with the instrument’s lid, which had a black background. Prior to collecting the NIR spectrum of a single seed, a reference reflectance measurement was taken using the standard built-in reference of the instrument. In addition, reference measurements were taken after every 20 scans to reduce the effects of possible instrumental “drift”. For every seed, 32 monochromatic scans were made and the average value was recorded.
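As a minimal sketch of the bookkeeping implied by this acquisition scheme, the following Python fragment builds the wavelength axis for a 400–2498 nm scan at 0.5 nm steps and converts averaged reflectance readings to log (1/R). The function names are hypothetical, and averaging in the reflectance domain before taking the logarithm is an assumption made here for illustration; the instrument software may average differently.

```python
import math


def wavelength_grid(start_nm=400.0, stop_nm=2498.0, step_nm=0.5):
    """Return the wavelength axis implied by the scan range and interval."""
    n_points = int(round((stop_nm - start_nm) / step_nm)) + 1
    return [start_nm + i * step_nm for i in range(n_points)]


def mean_absorbance(reflectance_scans):
    """Average repeated reflectance scans of one seed, then convert to log10(1/R).

    reflectance_scans: list of scans, each a list of reflectance values (0 < R <= 1).
    Assumption: scans are averaged as reflectance before the log transform.
    """
    n_scans = len(reflectance_scans)
    n_points = len(reflectance_scans[0])
    mean_r = [sum(scan[j] for scan in reflectance_scans) / n_scans
              for j in range(n_points)]
    return [math.log10(1.0 / r) for r in mean_r]
```

With these settings the grid holds 4197 wavelength variables per seed, which illustrates why the spectra are later compressed into a small number of latent components.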

Spectral data pre-treatment

NIR data are not usually amenable to direct analysis because of unwanted systematic variation that is not correlated with the chemical composition of the sample and is caused by instrumental drift, path length differences, baseline shift and light scattering. This systematic noise may be removed from the original NIR spectra to enhance the signal-to-noise ratio using spectral data pre-treatment techniques, the most common ones in NIR spectroscopy being multiplicative signal correction (MSC), standard normal variate transformation (SNV), and first and second derivatives (Rinnan et al. 2009). The MSC and SNV approaches reduce the scatter effect originating from sample physical variability, while derivatives remove multiplicative effects in the spectra, thereby reducing peak overlap and particle size effects between the samples. In this study, the original spectral data were pre-treated using these techniques, and the performance of the resulting multivariate classification models was evaluated.
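The pre-treatments named above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the implementation used in the study: the function names are hypothetical, MSC is referenced to the mean spectrum of the set passed in, and the derivative is a simple central finite difference standing in for the smoothed derivative filters that chemometric software typically applies.

```python
import math


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    return [(x - mean) / sd for x in spectrum]


def msc(spectra):
    """Multiplicative signal correction against the mean spectrum of the set.

    Each spectrum is regressed on the reference (x = intercept + slope * ref)
    and corrected as (x - intercept) / slope.
    """
    n_points = len(spectra[0])
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_points)]
    ref_mean = sum(ref) / n_points
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_points
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


def first_derivative(spectrum, step_nm=0.5):
    """Central finite-difference first derivative (no smoothing; end points dropped)."""
    return [(spectrum[j + 1] - spectrum[j - 1]) / (2.0 * step_nm)
            for j in range(1, len(spectrum) - 1)]
```

Two spectra that differ only by an offset and a scale factor become identical after either SNV or MSC, which is the sense in which these transforms remove scatter-related variation.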

Multivariate classification modelling

Multivariate classification models were derived by partial least squares-discriminant analysis (PLS-DA) and bi-orthogonal projection to latent structures-discriminant analysis (O2PLS-DA). PLS-DA is a regularized version of linear discriminant analysis, in which a matrix of dummy variables that indicate class membership is used as regressand and the digitized NIR spectra as a descriptor matrix (Næs et al. 2002). Unlike PLS-DA modelling, O2PLS-DA modelling is a two-step approach that first removes spectral variation that has no correlation with the classes and then fits the discriminant models based on the predictive spectral variation. The spectral filtering step in O2PLS-DA differs from the spectral pre-treatments mentioned above in that it takes the response variable into account and removes more general types of interference by removing components orthogonal to the response variable calibrated against. To this end, the O2PLS-DA modelling approach uses the information in the categorical response matrix Y (a matrix of dummy variables in our case) to decompose the X matrix (the spectral data) into three distinct parts: (1) the predictive score matrix and its loading matrix for X, (2) the corresponding Y-orthogonal score matrix and loading matrix of Y-orthogonal components, and (3) the residual matrix of X (Trygg and Wold 2003). Components orthogonal to the response variable, which contain unwanted systematic variation, are then subtracted from the original spectral data to produce a filtered descriptor matrix. The final discriminant model is then computed using the filtered predictive spectral variation only.
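In matrix notation, the three-part decomposition and the filtering step described above can be summarized as follows. This is a sketch of the general form of the model; the symbols \(T_{p}\), \(P_{p}\) (predictive scores and loadings), \(T_{o}\), \(P_{o}\) (Y-orthogonal scores and loadings) and \(E\) (residual matrix) are labels chosen here for illustration.

$$X = T_{p} P_{p}^{\mathsf{T}} + T_{o} P_{o}^{\mathsf{T}} + E, \qquad X_{\text{filtered}} = X - T_{o} P_{o}^{\mathsf{T}}$$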

Prior to modelling, the data set was divided into calibration and prediction sets. The calibration set was composed of 225 individual seeds (75 seeds per seed lot) and the prediction set of 111 individual seeds (37 seeds per seed lot). Classification models were developed using absorbance values as regressor and a Y-matrix of dummy variables (1.0 for members of a given class, 0.0 otherwise) as regressand. The models were first fitted on both visible and NIR regions (400–2500 nm) and then on the NIR region (780–2500 nm) only, and the performance of these models was compared. All calibrations were developed on mean-centered data sets and the number of significant model components was determined by seven-segment cross-validation (the default setting). A component was considered significant if the ratio of the prediction error sum of squares (PRESS) to the residual sum of squares of the previous dimension (SS) was statistically smaller than 1.0 (Eriksson et al. 2006). Finally, the computed models were used to classify samples in the prediction set; seeds were considered members of a given class if the predicted value reached the acceptance threshold (≥0.5) and all others were considered non-members. Classification accuracy, expressed as a percentage, was computed as the proportion of seeds correctly predicted as members of a given class relative to the total number of seeds in the prediction set for that class.
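The dummy coding, the acceptance rule and the accuracy measure described above can be sketched as follows. The function names and class labels are hypothetical helpers introduced for illustration, and the predicted values in the usage note are made-up numbers, not results from the study.

```python
CLASSES = ["L. decidua", "L. kaempferi", "L. x eurolepis"]


def dummy_row(label, classes=CLASSES):
    """Encode one seed's class as a row of the Y-matrix (1.0 for its class, else 0.0)."""
    return [1.0 if label == c else 0.0 for c in classes]


def is_member(predicted_value, threshold=0.5):
    """Acceptance rule: a seed is a class member if its predicted value is >= threshold."""
    return predicted_value >= threshold


def class_accuracy(predicted_values, threshold=0.5):
    """Percentage of a class's prediction-set seeds that are accepted as members.

    predicted_values: model outputs, for their own class, of the seeds that
    truly belong to that class.
    """
    accepted = sum(1 for v in predicted_values if is_member(v, threshold))
    return 100.0 * accepted / len(predicted_values)
```

For example, with 37 seeds per seed lot in the prediction set, `class_accuracy([0.9] * 36 + [0.4])` gives about 97.3 %, so a single rejected seed corresponds to an accuracy of roughly 97 %.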

To analyze absorption bands that accounted for the discrimination of pure and hybrid larch seeds, a parameter called variable influence on projection (VIP) was computed as follows.

$$VIP_{AK} = \sqrt{\frac{K}{SSY_{0} - SSY_{A}} \sum_{a = 1}^{A} w_{ak}^{2} \left( SSY_{a - 1} - SSY_{a} \right)}$$

VIP for A components and K variables is a weighted sum of squares of the PLS weights (w) for a given component a and variable k, taking into account the amount of Y-variance (SSY) explained by each component; SSY0 and SSYA are the sums of squares of the response variable Y before and after extracting A components, respectively. Its major advantage is that there is only one VIP vector, summarizing all components and Y-variables, which enables the absorption bands that influence the discriminant models to be identified. As a rule, predictors with a VIP value greater than 1.0 have a strong influence on the model, but a cut-off around 0.7–0.8 has been suggested to discriminate between relevant and irrelevant predictors (Eriksson et al. 2006). All calculations were performed using SIMCA-P+ software (Version 13.0.0.0, Umetrics AB, Sweden).
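To make the VIP expression concrete, the following sketch evaluates it for a small made-up example. The function name, the input layout and the numbers are assumptions for illustration only; the weight vector of each component is assumed to be normalized to unit length, in which case the squared VIP values average to 1.0 over the K variables.

```python
import math


def vip_scores(weights, ssy):
    """Compute one VIP value per variable.

    weights: weights[a][k] is the PLS weight of variable k in component a + 1
             (each component's weight vector is assumed to have unit length).
    ssy:     [SSY_0, SSY_1, ..., SSY_A], the Y sum of squares remaining
             after 0, 1, ..., A components.
    """
    n_components = len(weights)
    n_variables = len(weights[0])
    total_explained = ssy[0] - ssy[n_components]
    scores = []
    for k in range(n_variables):
        # Squared weights of variable k, weighted by the Y-variance each component explains
        weighted = sum(weights[a][k] ** 2 * (ssy[a] - ssy[a + 1])
                       for a in range(n_components))
        scores.append(math.sqrt(weighted * n_variables / total_explained))
    return scores


# Made-up example: two components, three variables
example = vip_scores([[0.6, 0.8, 0.0], [0.0, 0.6, 0.8]], [10.0, 4.0, 1.0])
```

In this toy example only the second variable exceeds the VIP > 1.0 rule of thumb, because it carries weight in both components.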

Results

Spectral profile

The average reflectance spectra, in the form of log (1/R), of Japanese and European larch seeds and their hybrid showed a similar profile with one major absorption peak in the visible region at 460 nm and three major peaks in the NIR region around 1470, 1924 and 2100 nm (Fig. 1). Absorbance values in the visible range were slightly higher for seeds of L. × eurolepis than for seeds of the other two species. In the NIR region, seeds of L. decidua had higher absorbance values than seeds of L. × eurolepis, which in turn had higher absorbance values than seeds of L. kaempferi. As a whole, the Vis + NIR spectra provided sufficient information to develop classification models.

Fig. 1 Average raw spectra of L. decidua, L. kaempferi and their interspecific hybrid (L. × eurolepis) in reflectance spectroscopy mode

Classification of pure and hybrid larch seeds by PLS-DA modelling

PLS-DA models developed using the Vis + NIR spectral range (400–2500 nm) required 9–15 significant components (A) to describe 91–94 % of the class variation (R2Y) in the calibration set, depending on the data set (Table 1A). The prediction power of the models according to cross-validation (\(Q^{2}_{\text{cv}}\)) ranged from 85 to 87 %. For samples in the prediction set, the accuracy of predicted class membership for L. × eurolepis was 100 % across all data sets, except for the 2nd derivative data set, where one seed sample was rejected as a non-member. Similarly, the accuracy of predicted class membership was 97–100 % for L. decidua seeds and 95–97 % for L. kaempferi seeds.

Table 1 Summary of PLS-DA models developed using Vis + NIR (400–2500 nm) and NIR (780–2500 nm) regions together with predicted class membership of L. decidua, L. kaempferi and L. × eurolepis in the prediction set (n = 37 for each seed lot)

When the PLS-DA models were developed using the NIR region alone (780–2500 nm), the number of significant components needed to build the model was slightly lower than for the previous models built using the Vis + NIR region. However, the computed models still explained 86–94 % of the class variation for the calibration set with 80–87 % prediction ability according to cross-validation (Table 1B). For samples in the prediction set, the classification accuracy of pure and hybrid larch seeds did not change much compared to the models built using the Vis + NIR region, except for the 1st derivative data set, which resulted in a classification accuracy for L. kaempferi that was 13 percentage points lower (cf. 84 % in NIR and 97 % in Vis + NIR, Table 1).

Classification of pure and hybrid larch seeds by O2PLS-DA modelling

The O2PLS-DA models developed using the Vis + NIR region had two predictive and 7–14 Y-orthogonal components, depending on the data set (e.g. A = 2 + 10 for the untreated data set, Table 2). The predictive spectral variation (R2XP) accounted for 9–46 % of the total spectral variation of the pure and hybrid seed classes while the Y-orthogonal spectral variation (R2Xo) constituted 47–82 %, depending on the data set (Table 2A). The predictive spectral variation, in turn, modelled more than 90 % of the variation between pure and hybrid seed classes (R2Y) in the calibration set for all but the untreated data set, with 83–90 % prediction ability (\(Q^{2}_{\text{cv}}\)) according to cross-validation (Table 2A). For models fitted using the NIR region alone, two predictive components were also required to build the models, but the number of Y-orthogonal components was slightly lower than in the full-spectrum models (Table 2B). The predictive spectral variation ranged from 5 to 28 %, which in turn modelled 77–90 % of the class variation with 74–88 % prediction ability according to cross-validation (Table 2B). The modelled class variation (R2Y) and the predictive ability of the model (\(Q^{2}_{\text{cv}}\)) were larger for pre-treated than for untreated data sets, particularly for the SNV-treated data set, irrespective of the wavelength region. As a whole, the model statistics showed that the NIR region alone contained substantial information that allowed hybrid larch seeds to be discriminated from pure parental larch seeds.

Table 2 Summary of O2PLS-DA models developed to discriminate pure and hybrid larch seeds using Vis + NIR and NIR regions

The O2PLS-DA models computed using SNV-treated data sets consistently assigned L. decidua and L. kaempferi seeds in the prediction set to their respective classes with 100 % accuracy in both Vis + NIR and NIR regions, while the classification accuracy for L. × eurolepis seeds was 97 % in the NIR region and 100 % in the Vis + NIR region (Fig. 2). The model fitted on the MSC-treated data set also resulted in complete recognition of pure and hybrid seeds in the Vis + NIR region, as did the model fitted on the 1st derivative data set, except for the L. kaempferi seeds, which were classified with 97 % accuracy (Table 3). In the NIR region alone, the classification accuracy was slightly lower, particularly for the MSC-treated data set, where the accuracy was 11, 8 and 3 percentage points lower than that in the Vis + NIR region for L. decidua, L. kaempferi and L. × eurolepis, respectively.

Fig. 2 Class membership of L. decidua (a), L. × eurolepis (b) and L. kaempferi (c) seeds in the prediction set (n = 37 for each seed lot) predicted by the O2PLS-DA model developed using the SNV-transformed data set, according to their class. Note that the red dashed line is the threshold for classification. (Color figure online)

Table 3 Predicted class membership of L. decidua, L. kaempferi and L. × eurolepis in the prediction set (n = 37 for each seed lot) by O2PLS-DA models developed using Vis + NIR (400–2500 nm) and NIR (780–2500 nm) regions of different data sets

To gain better insight into the modelling process, score and loading plots for the O2PLS-DA model fitted on the SNV-treated data set were further examined (note that this model had consistently higher prediction accuracy in both Vis + NIR and NIR regions than models fitted on other data sets). The score plot (t[1] vs. t[2]) showed a clear separation of the L. decidua seed lot from the other two seed lots along the first predictive component, while the L. × eurolepis seed lot was clearly separated from the pure larch seed lots along the second component (Fig. 3a). The corresponding predictive loading plot for the first component revealed one sharp peak at 410 nm and four broad absorption bands in 1409–1630, 1886–1996, 2019–2190 and 2230–2410 nm that were useful for discriminating the L. decidua seed lot from the other seed lots (Fig. 3b). The loading plot for the second predictive component showed one sharp peak at 460 nm and two broad absorption bands in 840–1190 and 1217–1620 nm that mainly accounted for discriminating the L. × eurolepis seed lot from the pure parental seed lots, while an absorption peak at 638 nm mainly accounted for discriminating L. kaempferi from the L. × eurolepis seed lot (Fig. 3c).

Fig. 3 Score plot for the first two predictive components (t1 vs. t2) of the O2PLS-DA model built using SNV-transformed spectra, depicting clear-cut separation of seeds of European (green circle), Japanese (red triangle) and hybrid (blue inverted triangle) larch (a), together with loading plots for the first (b) and second (c) predictive components showing absorption bands that accounted for class discrimination. Note that the ellipse in a is the 95 % confidence interval according to Hotelling’s T² test (a multivariate generalization of Student’s t test). (Color figure online)

To determine which phenomena were irrelevant for discriminating seeds of pure and hybrid larch and in which spectral region they occurred, the score and loading plots of the first two Y-orthogonal components, accounting for 52 % of the spectral variation, were examined. The score plot for the first two Y-orthogonal components showed that a few samples fell outside the 95 % confidence ellipse (Fig. 4a) according to Hotelling’s T² test (a multivariate generalization of Student’s t test), but excluding these outlying samples did not improve the model. The corresponding loading plots showed that small peaks centered at 415, 687, 1456, 1929 and 2111 nm in the first component (Fig. 4b) and peaks centered at 484, 771, 1453, 1929 and 2110 nm in the second component (Fig. 4c) accounted for spectral variation uncorrelated with class discrimination. Thus, the sources of irrelevant spectral variation explained by the model were related to divergence in color, moisture content and storage reserves among individual seeds within each class, which in turn was related to individual seed size variation.

Fig. 4 Orthogonal score plot of the O2PLS-DA model built using SNV-treated spectra, showing a few outliers of European (green circle), Japanese (red triangle) and hybrid (blue inverted triangle) larch seeds (a), together with the corresponding loading plots for the first (b) and second (c) orthogonal components showing absorption bands. Note that the ellipse is the 95 % confidence interval according to Hotelling’s T² test. (Color figure online)

Absorption bands accounting for classification of pure and hybrid larch seeds

The VIP plot shows that absorption bands in 400–750 nm, with two major peaks centered around 460 and 638 nm and two shoulder peaks in the vicinity of 415 and 687 nm, had a strong influence on the discrimination of pure and hybrid larch seeds (VIP > 1; Fig. 5). In the NIR region, absorption bands in 1890–2201 and 2245–2500 nm, with peaks centered at 1929, 2098, 2332 and 2490 nm, also accounted for class discrimination. Other NIR regions of interest that helped improve class discrimination appeared in the 860–1380, 1410–1505 and 2240–2388 nm regions (VIP = 0.81–1.0).

Fig. 5 VIP plot for the O2PLS-DA model built on the SNV-treated data set in the 400–2500 nm wavelength region. The threshold of significant contribution to model building is shown by the red dashed line. (Color figure online)

Discussion

The results demonstrated that pure and hybrid larch seeds can be successfully distinguished by Vis + NIR spectroscopy and multivariate modelling. PLS-DA models fitted on both raw and pre-treated spectral data sets resulted in comparable classification accuracy for samples in the prediction set. However, spectral pre-treatments slightly reduced the number of components needed to build the models, which might be attributed to the partial removal of the scatter effect (Rinnan et al. 2009). In contrast, the O2PLS-DA modelling approach needed only two predictive components to successfully classify pure and hybrid larch seeds. In particular, the classification model fitted on the SNV-treated data set had excellent prediction ability (Q2) compared to other data sets (note that a model with Q2 ≥ 0.9 is considered excellent sensu Eriksson et al. 2006). This indicates that not only the scatter effect but also other sources of spectral noise that are not correlated with class discrimination exist.

Analyses of score and loading plots of Y-orthogonal components revealed some outlying samples from each seed lot, and absorption peaks at 416, 680, 1456, 1930 and 2110 nm that accounted for spectral variation uncorrelated with class discrimination (Fig. 4b, c). Absorption bands in 1400–1500 and 1900–2000 nm with peaks at 1450 and 1940 nm were likely attributable to the presence of water, owing to the O–H stretch first overtone and combination bands involving O–H stretch and O–H deformation (Osborne et al. 1993; Tigabu and Oden 2003). The absorption band in 2050–2200 nm is characteristic of CH2 stretch-bending combinations and correlates positively with fatty acid composition (Hourant et al. 2000). Apparently, differences in color, moisture content and fatty acids among individual seeds within each seed lot can cause spectral variation that has no correlation with class discrimination. By extracting this spectral variation irrelevant to class discrimination, the O2PLS-DA modelling approach resulted in dimensionally less complex and parsimonious models (A = 2 for O2PLS-DA vs. A = 15 for PLS-DA). Dimensional complexity is an important factor in the interpretation of multivariate analysis, and parsimonious models with few components are often highly preferred (Trygg and Wold 2003).

In the visible region, absorption maxima that accounted for discriminating L. decidua, L. × eurolepis and L. kaempferi seeds appeared at 410, 460 and 638 nm, respectively (Figs. 3b, c, 5). Seeds of L. kaempferi appear to be more red-brownish than L. decidua and L. × eurolepis seeds, which in turn vary slightly in color. As the seed coat and the megagametophyte (storage organ) account for more than half of the total seed mass and are of maternal origin, the chemical composition of the seed coat would presumably be influenced more by the genotype of the maternal than of the paternal parent. It should be noted that the maternal parent for the hybrid larch in the present study was L. decidua while the paternal parent was L. kaempferi. Many conifers also exhibit genotypic variation in seed physical traits, such as surface structure of seeds (Tillman-Sutela et al. 1998), seed size and germinability (Mamo et al. 2006) as well as qualitative color characteristics of the seed coat (Tillman-Sutela and Kauppi 1995); thus color variation might be expected among the seed lots investigated in the present study. Our finding for Larix agrees with previous studies that have demonstrated the efficacy of the visible region for classifying wheat kernels according to their color (Wang et al. 1999) and for identifying seed origin and parents of P. sylvestris L. trees (Tigabu et al. 2005).

In the NIR region, absorption bands in 1409–1630, 1886–1996, 2019–2190 and 2230–2410 nm were highly relevant for discriminating L. decidua seeds from L. × eurolepis and L. kaempferi seeds (Figs. 3b, 5). The 1409–1630 nm region of the NIR reflectance spectra presents two broad peaks at 1480 and 1550 nm, which correspond to the first overtone of O–H and N–H and a combination band of C–H vibration of various functional groups, notably ROH, starch, H2O and protein moieties reported in other studies (Workman and Weyer 2012). The absorption band in 1900–2000 nm with an absorption peak centered at 1929 nm may arise from the C=O stretch second overtone, a combination of O–H stretch and HOH deformation, and the O–H bend second overtone. The results of previous studies have shown that molecular moieties of protein, starch and water show overlapping absorption peaks in this region (Shenk et al. 2001; Workman and Weyer 2012). The absorption bands in 2019–2190 and 2230–2410 nm are characteristic of CH2 stretch-bend combinations as well as other vibrational modes of molecular bonds (Workman and Weyer 2012). Several fatty acids, notably polyunsaturated fatty acids, in several oil crops have shown positive correlation with absorption bands in these regions (Hourant et al. 2000; Osborne et al. 1993). Tigabu and Oden (2003) also found correlations between absorbance values in these spectral regions and major fatty acids as a basis for discrimination of viable and empty seeds of Pinus patula Schiede and Deppe.

Thus, NIR spectroscopy appears to have detected differences in chemical compounds between the two species and the hybrid, probably reserve compounds like lipids and proteins as well as moisture content differences, thereby allowing L. decidua seeds to be distinguished from seeds of L. × eurolepis and L. kaempferi. Fatty acids such as linoleic, Δ5-olefinic, pinolenic and oleic acids were the major components in seeds of larch species that contributed to discriminating filled-viable, empty and insect-attacked seeds of three larch species in another study (Tigabu and Oden 2004). It should be noted that lipids are the dominant storage reserve compounds in the seeds of many conifers, including those of larch. The major fatty acids include linoleic, Δ5-olefinic, pinolenic and oleic acids, which account for 43.1, 30.6, 27.4 and 18.8 % of the total fatty acids, respectively, in L. decidua seeds, while linoleic acid accounts for 45.5 %, Δ5-olefinic acid for 28.9 %, pinolenic acid for 25.8 % and oleic acid for 18.4 % of the total fatty acids in seed lipids of L. kaempferi (Wolff et al. 1997, 2001).

Absorption bands in the 840–1190 and 1217–1620 nm regions allowed discrimination between L. × eurolepis seed lots and the pure parental seed lots. The absorption bands in these regions are characteristic of the third overtone of C–H stretching vibration, and of a combination of N–H second overtone stretching vibration and C–H stretch and deformation (Osborne et al. 1993; Shenk et al. 2001; Workman and Weyer 2012). Functional groups responsible for absorption in these regions are mainly CH3, CH2, ArNH2 (aromatic amino acids) and NH2, which are common molecular moieties of fatty acids and proteins. Thus, it appears that NIR spectroscopy has utilized differences in fatty acids and proteins as a basis for discriminating seeds of L. × eurolepis from L. kaempferi and L. decidua. This divergence in seed storage reserves between hybrid and pure parental (particularly L. kaempferi) seeds is expected because the contribution of the paternal parent (which is L. kaempferi in this study) to the total seed mass is much lower than that of the maternal parent. The embryo (a smaller fraction of the seed mass) is derived from both parents while more than half of the seed mass is of maternal origin. Maternal variation in seed storage reserves is also evident as reproductive allocation in plants is generally governed by the genetic constitution (see review, Bazzaz et al. 2000). Tigabu et al. (2005) found maternal variation in storage reserves to be the basis for discriminating among maternal parents of Scots pine using NIR spectra.

Conclusions

Classification of pure and hybrid larch seeds using NIR spectroscopy was successful. Thus, the results highlight the feasibility of NIR spectroscopy as a non-destructive method for verification of hybrid larch seeds. The technique is rapid and efficient: it takes ca. 2 min to scan a single seed, is non-destructive and requires no sample preparation. Moreover, the O2PLS-DA modelling approach yields parsimonious models that provide additional information allowing within-class variation to be explained. From a practical viewpoint, the technique can offer a unique opportunity for seed orchard managers to rapidly estimate the hybrid seed yield from open-pollinated mixed-species seed orchards. Breeders can also benefit from use of the NIR technique to assess the efficiency of artificial pollination in seed orchard management research. As a whole, NIR spectroscopy is a very promising method, but it should be verified using seeds from many different families before it is recommended for operational practice.