1 Introduction

Pinus halepensis Mill. is a circum-Mediterranean, drought-adapted species that covers broad areas of Spain, France and Italy. As the second most extensive species by area in Spain (exceeded only by Quercus ilex L. with 2.8 million hectares), it covers two million hectares, of which natural stands account for around 806,000 ha. This species has been widely used for reforestation of degraded soils in arid and semiarid regions of Spain such as Castilla y León, where P. halepensis plantations currently cover almost 20,000 ha (DGCN 2002). Broadleaf species such as Q. ilex L. and Quercus faginea Lam. had grown naturally in these areas before they were transformed into croplands during the last century. Soil erosion became a huge problem when the croplands were abandoned, especially in areas with steep slopes. Pinus halepensis was extensively planted to protect and improve soils, and Q. ilex and Q. faginea are now beginning to grow in these stands. Thus, these plantations have a primarily protective, but also a productive, function. Determining forest productivity can inform decision making in forest management in order to achieve both protective and productive goals for these stands. Useful guidelines for silvicultural practices, such as thinning in plantations on poor and limy soils (Montero et al. 2001), can avoid slowing down forest growth. Similarly, when information on potential forest productivity is available before planting, the most suitable species can be selected for afforestation projects (Bravo-Oviedo and Montero 2005). In short, reliable productivity models are necessary for sustainable forest management.

Stand height is the most suitable forest productivity indicator for even-aged stands because of the correlation between height growth and volume growth (Heyer 1841; Skovsgaard and Vanclay 2008). Site index (dominant height at a reference age) has been widely used as a forest productivity indicator because dominant height is less influenced by standard silviculture (Ortega and Montero 1988). Sometimes, site index cannot be determined because dominant height is not available (for example, in stands subjected to thinning from above or high grading or in new afforestations). In these cases, the site index can be estimated by means of its relationship to environmental factors.

Several authors have studied the relationship between forest productivity and environmental parameters, which often include climatic, physiographic and physical/chemical soil parameters (Aertsen et al. 2010; Bravo et al. 2011; Sanchez-Rodriguez et al. 2002). However, parameters related to the soil organic horizon are less frequently considered (Laamrani et al. 2014; Romanya and Vallejo 2004) and soil biochemical parameters—which constitute key soil quality indicators (Bueis et al. 2016b; Muscolo et al. 2014; Schloter et al. 2003; Staddon et al. 1999)—are seldom included (Mahía et al. 2006; Pietrzykowski et al. 2015).

Several studies, some of them focusing on P. halepensis to predict climate change impact on forest growth, tend to agree on climate as the main factor driving species growth (Condes and Garcia-Robredo 2012; Gandullo et al. 1972; Girard et al. 2011; Klein et al. 2014; Rathgeber et al. 2005; Río et al. 2014; Sabate et al. 2002). Río et al. (2014) also related the lower P. halepensis site index in south-eastern Spain to lower nutrient availability, lower cation exchange capacity, higher C to N ratio and lower soil clay content. Most efforts at predicting site productivity from site parameters have concentrated on more productive species such as Pinus radiata D. Don, Pinus pinaster Ait. or Pinus sylvestris L. (Afif-Khouri et al. 2010; Alvarez-Alvarez et al. 2011; Bueis et al. 2016b).

The main aim of this study was to develop a model for predicting the site index of P. halepensis plantations in northern Spain using soil (physical, chemical, biochemical and related to the organic horizon), climatic and physiographic parameters. We hypothesise that, according to the available literature, site quality for P. halepensis plantations in Castilla y León may be predicted by means of soil and climatic parameters related to water and nutrient availability.

2 Material and methods

2.1 Study area and sampling plots

This study was developed in 32 plots (Online Resource 1) in man-made P. halepensis stands belonging to the Spanish National Forest Inventory (DGCN 2002) and located in the centre of the region of Castilla y León. Predominant understory species included Q. ilex, Q. faginea, Genista scorpius (L.) D.C., Dorycnium pentaphyllum Scop., Staehelina dubia L., Lithodora fruticosa L. and Salvia lavandulifolia Vahl. Stand age ranged from 45 to 61 years, and the slope in the plots ranged from 0 to 55%. The calcareous soils originated from carbonate-rich parent materials and can be classified as Calcixerepts within the Inceptisol order (Llorente and Turrion 2010). Soil reaction was alkaline with pH ranging from 8 to 8.9. Mean annual temperature of the area is 11.7 °C, mean annual precipitation is 456 mm, mean temperature of the warmest month is 20.9 °C and mean temperature of the coldest month is 3.8 °C. The Lang, Martonne and Annual Hydric Indexes all classify this area as arid/sub-humid. Table 1 shows the main silvicultural characteristics of the plots studied.

Table 1 Stand characteristics of Pinus halepensis Mill. plots (n = 32) used to develop a discriminant model for predicting the site index in calcareous plantations in northern Spain

Stand age for every plot was established as the difference between the year of plantation and the year when the Third National Forest Inventory was carried out. Dominant height (H0) was determined for each plot based on data from the Third National Forest Inventory (DGCN 2002), following the Assmann (1970) criterion. Site index (SI) was estimated for each plot using the equation developed by Montero et al. (2001):

$$ {H}_0= a\cdot {\left(1-{e}^{-0.203954\cdot t}\right)}^{1/1.046295} $$

where H0 is the dominant height in meters and t is the age in years. The value of a was determined for every plot from the previous equation and the current H0 and t values. To determine the site index for each plot (dominant height at the reference age of 80 years), the value of a for each plot and the reference age (t = 80 years) were introduced into the equation and the resulting value of H0 was the site index. Three quality classes were established based on the site index value for each plot: low (site index between 6.5 and 9.5 m), medium (site index between 9.5 and 12.5 m) and high (site index between 12.5 and 15.5 m). Accordingly, 8 plots were classified into the lowest quality class, 15 into the medium quality class and 9 into the highest quality class.
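The two-step site index calculation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the coefficients are copied as printed in the equation, the function names are hypothetical, and the treatment of values falling exactly on a class boundary (half-open intervals) is an assumption, since the text does not say how ties were resolved.

```python
import math

# Coefficients as printed in the dominant height equation (Montero et al. 2001).
RATE = 0.203954
SHAPE = 1.046295
REFERENCE_AGE = 80.0  # years


def growth_factor(age_years: float) -> float:
    """Age-dependent part of the dominant height curve."""
    return (1.0 - math.exp(-RATE * age_years)) ** (1.0 / SHAPE)


def site_index(dominant_height_m: float, age_years: float) -> float:
    """Solve for a from (H0, t), then evaluate the curve at the reference age."""
    a = dominant_height_m / growth_factor(age_years)
    return a * growth_factor(REFERENCE_AGE)


def quality_class(si_m: float) -> str:
    """Map a site index (m) to a quality class.

    Assumption: lower bounds are inclusive and upper bounds exclusive,
    except for the top of the highest class.
    """
    if 6.5 <= si_m < 9.5:
        return "low"
    if 9.5 <= si_m < 12.5:
        return "medium"
    if 12.5 <= si_m <= 15.5:
        return "high"
    raise ValueError(f"site index {si_m} m is outside the defined classes")


# Hypothetical plot: dominant height of 10 m at 50 years.
si = site_index(10.0, 50.0)
print(round(si, 2), quality_class(si))
```

By construction, a plot measured exactly at the reference age returns its own dominant height as site index.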

2.2 Sampling

Soil sampling included the forest floor or organic horizon and 10 cm of mineral topsoil, based on the method of Jokela et al. (1988). Bravo et al. (2011) and Bueis et al. (2016b) also adopted this method, as environmental changes are more strongly reflected in this layer. Each plot had four sampling points located 5 m from the centre of the plot in N, S, E and W directions. At each point, one disturbed and one undisturbed soil mineral sample were taken. Undisturbed samples were collected in a steel cylinder (5 cm diameter and 5 cm height) to maintain their original structure. The four disturbed samples from each plot were grouped together to obtain a composite mineral soil sample per plot.

The organic horizon was divided into two fractions: almost undecomposed litter fraction (L) and fragmented fraction plus humified fraction (FH). This horizon was sampled in 20 × 20-cm quadrants and mixed to get a composite sample of each fraction per plot. The total organic horizon thickness (OHT) was measured in the field.

2.3 Physical, chemical and biochemical soil analysis

Disturbed mineral soil samples were air-dried and sieved with a 2-mm screen before analysis. All analyses were done in duplicate. Bulk density and field capacity were determined with the samples of undisturbed soil.

Physical parameters included percentage of coarse particles (>2 mm); particle distribution by pipette method (MAPA 1993) and subsequent determination of clay content, sand and silt following the International criteria (ISSS) and silt and sand following the USDA criteria; porosity by determining bulk and real density; and available water as the difference between water content at field capacity and the permanent wilting point, which were determined using Eijkelkamp pF-Equipment.
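The two derived physical parameters can be illustrated with a short Python sketch. It assumes the usual relation between total porosity and the two densities (porosity = 1 − bulk density / real density), which the text does not spell out; the function names and the example values are hypothetical.

```python
def porosity_percent(bulk_density: float, real_density: float) -> float:
    """Total porosity (%) from bulk and real (particle) density, same units.

    Assumption: the standard relation porosity = 1 - bulk / real.
    """
    if real_density <= 0 or bulk_density < 0 or bulk_density > real_density:
        raise ValueError("densities must satisfy 0 <= bulk <= real, real > 0")
    return 100.0 * (1.0 - bulk_density / real_density)


def available_water(field_capacity: float, wilting_point: float) -> float:
    """Available water as water content at field capacity minus wilting point."""
    return field_capacity - wilting_point


# Hypothetical sample: bulk density 1.30 g/cm3, real density 2.65 g/cm3.
print(round(porosity_percent(1.30, 2.65), 1))  # about 50.9 %
print(available_water(30.0, 12.0))             # water contents in %
```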

Chemical parameters included pH using a 1:2.5 (soil/water) suspension (MAPA 1993); easily oxidizable carbon (EOC) through the Walkley and Black (1934) method; cation exchange capacity (CEC); exchangeable cations (Ca, Mg, K and Na) by means of extraction with 1 N ammonium acetate (pH = 7) (Schollenberger and Simon 1945) and subsequent determination of cations in the extract using an atomic absorption/emission spectrometer; Fe, Al, Zn and Mn contents following the DTPA-TEA method (Lindsay and Norvell 1978); available P (AP), extracted using anion exchange membranes (Turrión et al. 1997) and determined in the extracts by colorimetry (Murphy and Riley 1962); gypsum content (Richards 1954); total calcium carbonates following the methodology of Bundy and Bremner (1972); total N (TN) through dry combustion using a LECO CHN 2000 elemental analyser; total organic C (TOC) by subtraction of C content in carbonates from the total C content in soil, measured with a LECO CHN 2000 analyser; and reactive calcium carbonates following the Bashour and Sayegh (2007) modification of the Drouineau (1942) method.

Biochemical parameters included mineralizable C (Isermeyer 1952); microbial biomass C, N and P following the fumigation-extraction method (Vance et al. 1987) and using a Skalar TOC autoanalyser to determine C and N content in extracts and colorimetry to determine P content (Murphy and Riley 1962). The relationships microbial biomass C/microbial biomass N, microbial biomass C/TOC, mineralizable C/TOC and the microbial metabolic quotient (qCO2 = mineralizable C / microbial biomass C) were calculated.

Organic horizon samples were dried at 60 °C and weighed to determine the amount of biomass per hectare for L (OL) and FH (OFH) fractions. A representative portion was ground up and analysed with a LECO-CHN 2000 element analyser to determine total C and total N concentrations of L and FH fractions, as well as the (TC/TN)L and (TC/TN)FH ratios.

2.4 Climatic and physiographic data

Precipitation and temperature data were obtained from the Digital Climatic Atlas for the Iberian Peninsula (Ninyerola et al. 2005), and the climatic parameters calculated were mean seasonal precipitation (PW: winter precipitation; PSP: spring precipitation; PSU: summer precipitation; PA: autumn precipitation), annual total precipitation (TP), mean annual temperature (MAT), mean temperature of the coldest and warmest month (MTCM and MTWM, respectively), mean minimum temperature in the coldest month (MMCM), mean maximum temperature in the warmest month (MMWM) and potential evapotranspiration (PET) following the Thornthwaite (1949) method; actual evapotranspiration, deficit and surplus were calculated based on climatic data (temperature and precipitation) by computing the Monthly Water Balance as described by Thornthwaite and Mather (1955); and Martonne Index (De-Martonne 1926), Lang Index (Lang 1915) and, finally, Annual Hydric Index were calculated through the following expressions as described by Thornthwaite (1949):

$$ \mathrm{Annual}\ \mathrm{Hydric}\ \mathrm{Index}= H-0.6 A\kern0.5em H=\frac{\mathrm{Surplus}}{\mathrm{PET}}\kern0.5em A=\frac{\mathrm{Deficit}}{\mathrm{PET}} $$
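As a worked illustration, the Annual Hydric Index expression above translates directly into Python. The sketch also includes the commonly used annual forms of the Martonne index, P / (T + 10), and the Lang index, P / T; these two formulas are not given in the text and are stated here as assumptions. Function names and the surplus, deficit and PET values are hypothetical.

```python
def annual_hydric_index(surplus_mm: float, deficit_mm: float, pet_mm: float) -> float:
    """Annual Hydric Index = H - 0.6 * A, with H = surplus/PET and A = deficit/PET."""
    h = surplus_mm / pet_mm
    a = deficit_mm / pet_mm
    return h - 0.6 * a


def martonne_index(precip_mm: float, mean_temp_c: float) -> float:
    """De Martonne aridity index, assumed annual form P / (T + 10)."""
    return precip_mm / (mean_temp_c + 10.0)


def lang_index(precip_mm: float, mean_temp_c: float) -> float:
    """Lang rain factor, assumed annual form P / T."""
    return precip_mm / mean_temp_c


# Hypothetical water balance: no surplus, 300 mm deficit, 700 mm PET.
ahi = annual_hydric_index(0.0, 300.0, 700.0)
print(round(ahi, 3))       # negative: deficit dominates
print(round(ahi ** 2, 3))  # squaring yields a positive value

# Mean annual precipitation and temperature reported for the study area.
print(round(martonne_index(456.0, 11.7), 1))  # about 21.0
print(round(lang_index(456.0, 11.7), 1))      # about 39.0
```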

Physiographic parameters included elevation and latitude, as recorded in the National Forest Inventory. Slope was measured in the field with a clinometer.

2.5 Statistical analysis

First, a principal component analysis (PCA) was applied to each group of variables (physical, chemical and biochemical soil, organic horizon, climatic, physiographic) to reduce dimensionality and select the non-correlated variables that accounted for most of the data variability. The studied variables were standardized before the PCA. This analysis was applied to each group of variables first because it is widely thought that correlations appear mainly between variables belonging to the same group. We selected the principal components that presented an eigenvalue higher than 0.7 and accounted for at least 70% of the overall data variability. The variable with the highest absolute value coefficient (factor loadings) was chosen to represent each principal component, as proposed by Jolliffe (1973).
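The selection procedure described above can be sketched with NumPy. This is an illustrative reading of the method, not the Statgraphics workflow used in the study: components with an eigenvalue above 0.7 are retained, the share of variability they explain is returned so the 70% condition can be checked by the caller, and each retained component is represented by the variable with the largest absolute coefficient. Skipping a variable that already represents an earlier component is an assumption, and the function name is hypothetical.

```python
import numpy as np


def select_variables(data, names, min_eigenvalue=0.7):
    """Pick one representative variable per retained principal component.

    data: (n_plots, n_variables) array; names: list of variable names.
    Returns (selected_names, fraction_of_variability_explained).
    """
    data = np.asarray(data, dtype=float)
    # PCA on standardized variables is an eigen-decomposition of the
    # correlation matrix.
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]  # largest eigenvalue first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]

    keep = eigenvalues > min_eigenvalue
    explained = eigenvalues[keep].sum() / eigenvalues.sum()

    selected = []
    for k in np.flatnonzero(keep):
        # Rank variables by absolute coefficient on component k.
        ranking = np.argsort(-np.abs(eigenvectors[:, k]))
        for idx in ranking:
            if names[idx] not in selected:  # assumption: no variable used twice
                selected.append(names[idx])
                break
    return selected, float(explained)


# Toy data: "b" duplicates the information in "a"; "c" is uncorrelated.
toy = [[1, 2, 1], [2, 4, -1], [3, 6, -1], [4, 8, 1]]
print(select_variables(toy, ["a", "b", "c"]))
```

Within a single component, ranking by eigenvector coefficient and ranking by factor loading give the same order, because loadings are the coefficients scaled by the square root of that component's eigenvalue.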

The Shapiro-Wilk test was applied to test normality in the selected variables. Those showing lack of normality were transformed using the log(X), exp(X), 1/X, X² and √X functions. Transformed variables showing lack of normality were replaced by the variable with the next highest absolute value coefficient in the PCA. The variables selected through PCA were tested for correlation using Pearson’s correlation coefficient, to avoid including strongly correlated variables in the discriminant analysis. Finally, a discriminant analysis was carried out to develop a discriminant model for predicting the site index from the variables selected through PCA. Discriminant analysis classifies new cases into established groups according to their properties; discriminant functions have the following general structure:

$$ Y={\beta}_0+\sum_{j=1}^p{\beta}_j\cdot {X}_j $$

where Y is the score obtained for each group, β0, β1, …, βp are the coefficients obtained and Xj is the value of the j-th of the p variables selected as predictors to represent soil, physiographic and climatic factors (Hair et al. 1999). New observations are assigned to the group with the highest score. This classification technique has been used in similar studies (Bravo-Oviedo and Montero 2005; Bravo et al. 2011; Bueis et al. 2016b; Harding et al. 1985). In the discriminant analysis, equal prior probabilities of belonging to a group were considered for the three groups. Resubstitution was used to evaluate the discriminant models. Because the whole dataset was used both to define and to evaluate the model, the estimated rates of correct classification are optimistically biased. Models including combinations of three, four and five variables coherent with biological processes and dynamics were tested. No model included two variables belonging to the same group; thus, the three, four or five variables included in each model belonged to different groups of variables (soil physical, chemical and biochemical, related to the organic horizon, climatic and physiographic variables). The model presenting the highest correct resubstitution rate (percentage of observations correctly classified into their actual class) with the fewest predictor variables was selected. Statistical analyses were performed with Statgraphics Centurion XVI software for Windows (Statgraphics 2014).
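How the classification functions are applied, and how a resubstitution rate is obtained, can be sketched as follows. The coefficients in the example are hypothetical placeholders chosen for illustration only; they are not the fitted values of the study, and the function names are likewise assumptions.

```python
import numpy as np


def classify(x, intercepts, coefficients):
    """Score one observation for every group and return (best_group, scores).

    intercepts: (n_groups,) array of beta_0 values.
    coefficients: (n_groups, n_predictors) array of beta_j values.
    """
    scores = intercepts + coefficients @ np.asarray(x, dtype=float)
    return int(np.argmax(scores)), scores


def resubstitution_rate(X, y, intercepts, coefficients):
    """Fraction of the fitting data assigned to their actual group."""
    scores = intercepts + np.asarray(X, dtype=float) @ coefficients.T
    predicted = np.argmax(scores, axis=1)
    return float(np.mean(predicted == np.asarray(y)))


# Hypothetical one-predictor model with three groups (0, 1, 2).
intercepts = np.array([0.0, -1.0, -4.0])
coefficients = np.array([[0.0], [1.0], [2.0]])

print(classify([2.0], intercepts, coefficients)[0])
X = [[0.0], [2.0], [5.0], [5.0]]
y = [0, 1, 2, 0]
print(resubstitution_rate(X, y, intercepts, coefficients))
```

Because the same observations are used to fit and to score the model, the rate returned by `resubstitution_rate` is an upper-leaning estimate; evaluating on held-out plots would give a less optimistic figure.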

3 Results

Edaphic, climatic and physiographic characteristics of the plots studied are summarized in Online Resources 2, 3 and 4, respectively (Bueis et al. 2016a).

The four principal components selected from the PCA of soil physical properties accounted for 93.8% of the data variability (Table 2). The variables selected for those four principal components were silt (ISSS), coarse particles, porosity and clay. From the PCA of soil chemical properties, seven principal components (accounting for 87.5% of data variability) were selected and EOC, carbonates, Zn, reactive carbonates, TOC/TN, Fe and gypsum were chosen to represent them. The PCA of soil biochemical properties informed the selection of four principal components that represented 97.8% of data variability; the biochemical variables selected to represent them were microbial biomass N, qCO2, mineralizable C and microbial biomass P. The two principal components selected from the organic horizon PCA accounted for 74.9% of data variability, and the variables chosen were OFH and (TC/TN)L. Four principal components from the PCA of climatic parameters accounted for 93.6% variability and correspond to the Martonne index, MTWM, MMWM and PSU. Latitude and slope were the variables selected for the two principal components from the PCA of physiographic parameters, which accounted for 86% of data variability. In summary, 23 variables were selected from the PCAs of groups of variables, including 17 edaphic variables (silt, coarse particles, porosity, clay, EOC, carbonates, Zn, reactive carbonates, TOC/TN, Fe, gypsum, microbial biomass N, qCO2, mineralizable C, microbial biomass P, OFH, (TC/TN)L), 4 climatic variables (Martonne index, MTWM, MMWM, PSU) and 2 physiographic variables (latitude and slope). Eight principal components (accounting for 86.9% of the variability) were selected from the PCA of the 17 edaphic variables (physical, chemical, biochemical and related to the organic horizon), and the variables chosen to represent those principal components were microbial biomass N, silt, qCO2, gypsum, Zn, porosity, microbial biomass P and clay (Table 3).

Table 2 Principal component analysis summary for each group of variables and variable chosen from each principal component
Table 3 Principal component analysis summary for the 17 edaphic variables selected

The normality hypothesis was rejected (5% significance level) for the Zn, Martonne index and MTWM variables. No transformation of these variables presented a normal distribution, so they were replaced by the variable with the next highest absolute value coefficient in the PCA: TOC/TN, Annual Hydric Index and PET, respectively. Both TOC/TN and PET followed a normal distribution, but Annual Hydric Index had to be transformed into Annual Hydric Index². The final 14 variables selected for the discriminant analysis were silt, clay, porosity, gypsum, TOC/TN, microbial biomass N, qCO2, microbial biomass P, Annual Hydric Index², PET, MMWM, PSU, latitude and slope. Correlations between selected variables were studied (Table 4), and strong correlations were found between MMWM and both PET and latitude, so these pairs of variables were not included together in the discriminant analysis.
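The correlation screening step can be sketched with NumPy as below. The threshold that defines a "strong" correlation is not stated in the text, so the value of 0.7 used here is purely an assumption, as is the function name.

```python
import numpy as np


def strongly_correlated_pairs(data, names, threshold=0.7):
    """List variable pairs whose absolute Pearson correlation reaches the threshold.

    data: (n_plots, n_variables) array; names: list of variable names.
    threshold: assumed cut-off for a strong correlation.
    """
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs


# Toy data: "a" and "b" are collinear, "c" is uncorrelated with both.
toy = [[1, 2, 1], [2, 4, -1], [3, 6, -1], [4, 8, 1]]
for first, second, r in strongly_correlated_pairs(toy, ["a", "b", "c"]):
    print(first, second, round(r, 2))
```

Any pair returned by such a screen would be kept out of the same candidate model, which is how the restriction on MMWM was applied here.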

Table 4 Pearson’s correlation coefficients among the variables selected for use in the discriminant analysis

Discriminant models including three, four and five variables as predictors were studied. More than a hundred biologically consistent combinations of three, four and five variables belonging to different groups (soil physical, chemical and biochemical variables, along with climatic and physiographic variables) were tested. The resubstitution error rates of the discriminant models studied ranged from 37.50 to 59.37%. As a general rule, models with three predictor variables had a higher resubstitution error than those using four or five variables as predictors. Models that included a soil physical variable (especially clay and porosity), a soil nutrient-related variable (especially chemical or biochemical), a climatic variable (especially Annual Hydric Index²) and a physiographic variable (especially slope) presented the best correct classification rates. Moreover, all models with a correct classification rate higher than 60% included a biochemical parameter.

The model selected to predict the site index for P. halepensis plantations in northern Spain had four variables (porosity, microbial biomass N, Annual Hydric Index² and slope) and presented a correct classification rate of 62.50%. Extreme quality classes (highest and lowest) had a very high correct classification rate (77.78 and 75.00%, respectively). No plot belonging to the highest quality class was misassigned to the lowest quality class, and only 12.5% of plots belonging to the lowest quality class were misassigned to the highest quality class (Table 5). Table 6 presents the parameters of the discriminant model selected to predict site quality in P. halepensis plantations in northern Spain.

Table 5 Correct classification resubstitution rates of the model selected to predict site quality in Pinus halepensis plantations in northern Spain
Table 6 Discriminant function coefficients for predicting site quality in Pinus halepensis plantations in northern Spain

4 Discussion

The model selected to predict the site index for P. halepensis plantations in northern Spain presented a correct classification rate of 62.50% and included porosity, microbial biomass N, Annual Hydric Index and slope as the four predictor variables.

Climate is the main driver of Mediterranean forest growth in general and P. halepensis growth in particular (del Castillo et al. 2015; Gandullo et al. 1972; Olarieta et al. 2000; Rathgeber et al. 2005; Río et al. 2014). The correlation between precipitation and P. halepensis growth is usually significant and positive while the correlation between growth and temperature is significant and negative, because higher temperature provokes higher evapotranspiration and reduces water reserves (Condes and Garcia-Robredo 2012). The discriminant model developed includes Annual Hydric Index as a predictor, combining information about temperature and precipitation. The area studied presents an arid climate, so the Annual Hydric Index always had a negative value. Because this variable was not normally distributed, it was transformed into Annual Hydric Index² (a positive value) for inclusion in the discriminant analysis. Thus, the higher the Annual Hydric Index², the drier the climate and the lower the predicted site quality (Table 6).

Water availability does not entirely depend on supply through precipitation and loss through evapotranspiration. Soil physical parameters such as particle size distribution or porosity determine the amount of water that percolates down into the soil profile during precipitation events as well as the water retention capacity of the soil. Río et al. (2014) found that stands with higher P. halepensis site indexes grew on soils with clay or loamy clay textures, while stands with lower site indexes had lower amounts of clay and were more sensitive to climate. Rathgeber et al. (2005) developed a model that included information about soil water capacity, in addition to precipitation and temperature, to simulate radial P. halepensis growth in France. The model selected to predict the P. halepensis site index in northern Spain includes porosity, an easily determined parameter that integrates information about the water and aeration regimes of the soil. In calcareous soils, physical limitations are likely to be compounded by the fine texture and cementing action of calcareous materials. Kishchuk (2000) indicated that calcareous soils may physically affect root penetration, water infiltration and gas exchange in ways similar to compacted soils, with fewer physical limitations as soil porosity increases. Higher porosity in the area studied would thus predict higher P. halepensis site quality, as porosity makes it possible for water to penetrate the soil and reach the rhizosphere and for the roots to be properly aerated.

Physiographic parameters such as slope are often related to forest growth in P. halepensis stands all over the Mediterranean region (Al Omary 2011; Condes and Garcia-Robredo 2012). Steeper slopes increase water runoff, which diminishes water percolation into the soil profile, and are also related to higher nutrient loss and soil erosion. Eroded materials from steep slopes accumulate in areas with less slope, creating greater depth to carbonates in those areas. Kishchuk (2000) stated that deeper carbonates in soil correspond to greater forest growth. Accordingly, as the slope of the plot increased, predicted site quality decreased for P. halepensis plantations in northern Spain, which is in accordance with the findings of Al Omary (2011) in P. halepensis plantations in Jordan.

Several studies found that P. halepensis growth (which is associated with the site index defined as dominant height at a reference age) is mainly driven by water availability in stands within the natural distribution area of the species in Spain (Condes and Garcia-Robredo 2012; del Castillo et al. 2015; Gandullo et al. 1972; Río et al. 2014). The same trend was found by Olarieta et al. (2000) in P. halepensis plantations in northeast Spain. Therefore, water availability seems to be the most limiting factor for the species productivity in Spain, regardless of being within or outside the natural distribution of the species. Moreover, similar results were achieved by Rathgeber et al. (2005) in P. halepensis stands in France, Klein et al. (2014) and Maseyk et al. (2011) in Israel and Toromani et al. (2015) in Albania, so this conclusion may be generalizable to the entire Mediterranean area.

Usually, water availability is included in related studies by means of climatic parameters (Condes and Garcia-Robredo 2012; del Castillo et al. 2015; Pasho et al. 2011). However, the amount of water that is actually available for plant roots is determined not only by the contribution of water from precipitation but also by the site factors allowing the water to percolate down the soil profile (Gandullo et al. 1972) and the water retention capacity of soils (Olarieta et al. 2000; Rathgeber et al. 2005), as reflected in the present study.

The model developed to predict P. halepensis productivity also included microbial biomass N as a predictor. Higher microbial biomass N resulted in lower site productivity in the area studied. Higher values for biochemical properties in soils are generally related to higher forest productivity (Bueis et al. 2016b; Foote et al. 2015; Mahía et al. 2006), as the microorganisms responsible for soil nutrient turnover and availability participate in mineralization processes. However, in some ecosystems, an opposite trend has been observed: microorganisms may actually immobilize N in soils with very low N availability or litter input with a very high C/N ratio (Recous et al. 1995; Song et al. 2007). Microorganisms can take up N very quickly because of their high surface-area-to-volume ratio, which prevents nutrient loss through leaching but creates an N deficit for the plants (Kuzyakov and Xu 2013). This in turn diminishes stand productivity, as N limits productivity in forest ecosystems.

Models with a correct classification rate higher than 60% included a biochemical parameter, indicating that biochemical parameters are determining factors in the P. halepensis site index. However, laboratory soil analyses do not usually include biochemical analyses, and not all laboratories are equipped for them. When the microbial biomass N biochemical parameter was removed from the model developed here, the error rate increased by 6.25 percentage points. Nevertheless, this reduced model could be an interesting alternative when biochemical soil analyses are not available.

5 Conclusions

A discriminant model using environmental parameters related to water and nutrient availability was developed to determine the P. halepensis site index for plantations in northern Spain. The model included four parameters: Annual Hydric Index, soil porosity, slope and microbial biomass N. Three of those parameters reflect the water regime of the soil: Annual Hydric Index (which combines temperature and precipitation information), soil porosity (responsible for water penetration in soil and retention in the rhizosphere) and slope (responsible for water runoff on the soil surface). Soil porosity adds information about the aeration regime of the soil, and slope is related to nutrient and soil losses through erosion. The fourth parameter, microbial biomass N, reflects the N deficit due to microbial immobilization, which is common in N-limited ecosystems. The model developed here presented a correct classification rate of 62.5% overall, with a high rate of plots correctly classified into the extreme high (77.8%) or low (75.0%) quality classes. This result can inform the focus of forestry practices in P. halepensis plantations in Castilla y León and the choice of optimal species for new afforestation programmes for successful and sustainable stand management in the studied area. Further studies would be necessary to test the validity of this model on independent samples and also to determine whether the developed model could be applied outside the study area and extended to other Mediterranean areas.