Learning distance effect on lignite quality variables at global and local scales

Determining scale and variable effects have critical importance in developing an energy resource policy. This study aims to explore the relationships in heterogeneous lignite sites using different scale models, spatial weighting as well as error-based pair-wise identification. From a statistical learning framework, the relationships among the quality variables such as geochemical variables and the contributions of the coordinates to quality measures have been exhibited by generalized additive models. In this way, the critical roles of spatial weights provided by the coordinates have been specified at a global scale. The experimental studies reveal that incorporating the geological weighting in the models as the additional information improves both accuracy and transparency. Because relationships among lignite quality variables and sampling locations are spatially non-stationary, the local structure and interdependencies among the variables were analyzed by geographically weighting regression. The local analyses including spatial patterns of bandwidths, search domains as well as residual-based areal dependencies provided not only the critical zones but also availability of pair-wise model alternatives by calibrating a model at each point for location-specific parameter learning. The results completely show that the weighting models applied at different scales can take spatial heterogeneity into consideration and these abilities provide some meta-data and specific information using in sustainable energy planning.


Introduction
The sustainability of an economic development directly related with resource planning. As one of the primary energy resources, lignite still has paramount importance due to its public and industrial usage along with the environmental sustainability. Therefore, performing lignite inventory analysis and providing reliable information directly from the sites maintain its prominence (Schweinfurth 2009). From an energy resources policy, lignite has global heterogeneous bedding structure in earth. For this reason, the studies on lignite reserves and quality parameters have generally been performed at global scale (Siddiqui et al. 2018). This perspective conceals the understanding of small scale variability and providing useful information at local scale.
The quantification of spatial relationships is the main preliminary operations for mine investment, technological planning as well as environmental impact assessment (Galetakis and Vamvuka 2009). The effects of lignite quality variables on calorific value, which are of primary interest in lignite inventory estimation, are arguably space invariant. On the other hand, an important spatial heterogeneity in lignite sites may be recorded in practice and this characteristic requires applying spatial search methodologies (Chuai and Feng 2019).
Lignite data comprise of both attributions and locational information (Gabrosek and Cressie 2010). Based on the geographical information (coordinates) and observable characteristics such as thickness and calorific value, many reserve estimation methodologies have been suggested in coal inventory evaluation. One of these approaches, the researchers practiced an uncertainty-oriented kriging model for lignite reserves (Pavlides et al. 2015). In a different study, a software based algorithm suggested for handling well-log data to estimate the proximate parameters of coal beds. As a result of this, a GIS-based information system was suggested (Mert and Dag 2015). In general, investigations and decisions incorporating energy sources are generally characterized via high complexities. To appraise the uncertainties and control of quality, recently an efficient scale-based resource model has been developed (Yüksel et al. 2017).
Due to the limitations of conventional methods in identifying scale effect problem, there have been an increasing number of methods proposed in the past decade. From a general review, the following studies were performed on the ground of the optimal use of energy resources and the appraisals of the scale-based uncertainties. One of the decision-oriented studies on energy sources, a vision of information systems that are state of the art in the energy field was presented (Dominguez and Amador 2008). In an outstanding study, the signed distance functions were used in global coal reserve estimation (Deutsch and Wilde 2013). Scale-dependent and spatially varying considerations in soil contamination problem were discussed from a geographical perspective (Lia et al. 2017). The coal mapping problem was considered as a compositional data analysis problem and transformation-based applications have been suggested as a solution (Karacan and Olea 2018). Based on the spatial relationships, various models were proposed to explore the reasons in back of the extensive height of the zones in a coal seam (Li et al. 2018). The spatial ecological uncertainties encountered in coal sites have been evaluated by different pattern indices (Wu et al. 2019). Most recently, coal qualite evaluation has been performed using multielemental compositions and geographical information (Kanduc et al. 2019).
Even though the explorative energy source models in literature provide some information for making general evaluations, there are still some limitations. Principally a meta-data covers the available information concerning inter dependence and this small scale variability have been generally omitted. Due to the importance of the meta-data parameters such as search radius and weights, the quantitative lignite site investigations should pay regard to distance-based transparency along with the accuracy.
Filling a gap in coal reserve evaluation from a sustainability perspective is the main motivation of this study. The current literature follows a single univariate geostatistical method. However, this paper suggests a critical methodology and concentrates on the multi-parameter analysis by considering the place of the sample points and their relative contributions. By this way, the critical parameters and some meta-data such as local bandwidths and relationships are revealed based on multivariate spatial evaluation. Two real lignite sites were considered and the relationships among the quality variables were explored from a spatially weighting perspective.
For a global exploration, Generalized Additive Models (GAMs) (Wood 2017) have been used and importance of geographical weighting has been presented by a multivariate analysis framework. As discussed by Ravindra et al. (2019), the GAM ensures a specification of target by determining the model from the point of smooth function as an interchange of the elaborative parametric dependencies on the covariates. To represent the relationships and effective patterns at global scale, an additive model-based evaluation has been conducted in extensive analysis due to the potential flexibility, smoothing capacity, and fitness of the GAM.
In a similar way, the areal relationships in connection with bandwidth and weighting have been revealed at local scale by Geographically Weighted Regression (GWR) (Brunsdon et al. 2002). The method considered spatial heterogeneity, generated differentiated estimations of parameters and non-stationarity across spatial locations (Li et al. 2019a, b).
Both the methods discovered the effect of distance (bandwidth) and spatial positions (weight effect) on the variable values using performance measures. Moreover, a comparative error-based evaluation and performance measures indicated the best combination and interdependencies.

Geographical consideration of lignite deposit
Coal mining associates with the depositional process of coals in a basin. Therefore, in connection with geological environments and structures, coal quality is controlled by some geological factors. Therefore, spatial properties encountered in a depositional process should be considered in an analysis.
As the explanatory term, heterogeneity addresses regional similarity and global difference in a deposit (Yacim and Boshoff 2019). Identifying and modeling spatial heterogeneity in a lignite site require specifying critical field parameters like influential distances (bandwidth) and the main determinants like weights. The Learning distance effect on lignite quality variables at global and local scales 857 spatially varying connections are pertinent to different observations recorded at a series of formlessly distributed geographic locations in a site (Yaylagul 2019). Therefore, to provide some meta-data and information on inter-dependencies, the analyses should be performed both at global and local scales. The lignite quality determinants: ash (%), sulfur content (%), volatile matter (%), moisture (%), and lower calorific value (LCV) (kJ/kg) are considered including spatial variability.
Determining the specification of the most favorable approach and the scale to accelerate the preliminary coal assessment has critical importance (Andrews-Speed et al. 2005). A global analysis based on spatial relationships assumes to apply statistical properties equally across a coal reserve. In some cases, the global models may mask the spatial relationships and they can not accurately reflect the spatial heterogeneity. In such circumstances, the local analysis subrogates the global perspective (Tutmez et al. 2012). By an areal approach, small scale variability and localization (clusters) in a lignite site can be identified using similarity indicators and local weightings.

Analysis at global scale
The multivariate analysis can usually be expressed by general linear structures and explained underneath. Multiple least-square adjustments are employed to predict a target Y variable for various independent X indicators. The dependence between response variable and indicator variables within a linear model can be structured as: The non-linear fits can potentially perform more accurate estimations for the target Y. In a GAM model, each linear component b j x ij is replaced with a (smooth) nonlinear function f j x ij À Á . Thus, the new configuration permits non-linear relationships between each indicator and the response parameter. Put it differently, in place of a single coefficient for every variable a non-parametric function can be calculated for each predictor, to provide the best estimation of the response variable values.
Globally, the GAM structure can be expressed as follows (James et al. 2013): In Eq.
(2), f j for each X j are calculated separately. Thereafter, all of the components are added together. The main superiority of a GAM model is reducing the residual in estimation of a target Y from distributions via appraising undefined functions linked via a function. The link function converts the target variable to relate it to set of covariates; thus, the structure can be designate by using the probability distribution of the target (Yoon 2019). Adapting an additive model with a smoothing spline is one of the trustworthy methods. Thus, a model is fitted including multiple indicators (input variables) by recurrently amending the fit for each indicator in turn, holding the others fit. As in more to splines, polynomial regression can also be employed in establishing a component for the additive models (Wood et al. 2015).

Analysis at local scale
In an ore deposit, the areal coefficients varied in space address an indication of non-stationary. Geographically Weighted Regression (GWR) identifies with the spatial non-stationary of observational dependencies, provides a weighting of non-stationary measurements that is regionally certain, and also permits model parameters to vary in space. While a global approach assumes the processes generating the measurement data are the same everywhere so that an individual parameter is predicted for each covariate, the GWR considers this presumption by adjusting a model at each point to provide location-specific parameter estimates for each process (Li et al. 2019a, b).
The linear structure given in Eq.
(1) is solved by a least squares method as follows: The geographically weighted analysis provides a local solution for the structure in Eq. (3) using kernel functions. The parameters are obtained by solving where, b denotes an estimate of b, and Wðu i ; v i Þ address an n by n matrix whose diagonal components designate the spatial weighting of each of the n measurements in regard to point i (Yu et al. 2020). The rest of the matrix elements arise from zeros. To represent the relative significance among regions, a spatial weight matrix Wðu i ; v i Þ can be established as follows: From a continuous distribution function, the weighting (w ij ) term corresponds the scaled distances between i and j, The commonly used function, Gaussian structure can be expressed as follows where, b and d ij represent the bandwidth of the search domain and the distance between the sample i and the middle of the kernel, respectively.

Case studies
To perform the applications, two lignite sites in Turkey were considered. The lignite reserves in Turkey are placed in commonly continental sedimentary basins of Tertiary age (Ediger et al. 2014). The lignite-bearing areas also show the properties of various geological mechanisms. The analyses have been performed based on experimental and algorithmic framework. To provide technical information by local and global analyses, a series application was performed using R packages such as GWmodel (Brunsdon et al. 2002) and gam (Hastie 2015), respectively.

Global scale case study
In the first application, the overall structure commonly recorded in a lignite deposit has been investigated. The lignite reserve in Canakkale, Turkey was considered in this evaluation. The lignite area covers Mesozoic Limestone and various Schistous blocks. The thickness of lignite seam has variability between 5 and 65 m and it is placed in conglomerates (MTA 2010). Figures 1 and 2 illustrate location map and sampling structure (drill holes), respectively. The descriptive statistics including mean, standard deviation and relative uncertainty of 93 measurement points (TKI 2018) are outlined in Table 1.
In Table 1, the explanatory indicator, relative uncertainty (U R ) addressed a ratio of standard deviation (S td ) and mean (M). Comparing with the other variables, ash and sulphur content includes the big variability (uncertainties). The small uncertainty in locations addressed a consistent sampling design. Figures 3 and 4 describe the global interrelationships between coordinates and quality parameters, respectively.
As recorded in Fig. 3, Easting direction has two modes in comparison with Northing. A positive correlation which addresses a directional trend is also recorded. In Fig. 4, all the quality variables show about normal distributions. The extreme variability was recorded for volatile matter and sulphur content, respectively.
An unconventional experimental design has been created to understand the relationships between the indicator variables and LCV. The GAM-based global analyses have been performed using non-linear continuous spline  functions and this provided some consistency and transparency using different exponential functions. The GAM models have been assessed by the corrected determination coefficient (CDO) below: where, n and k denote the number of observations and variables (without constant), respectively. Table 2 summarizes the design and the corresponding CDO performances. Although the model performances have been provided as very high, the models building without ash cannot equal the rest of the model performances.

Local scale case study
In the second application, the lignite deposit located in Tekirdag, Turkey was considered. The origin of the site sediments is in Oligocene and quaternary epoch. The lignite seams are placed in marl and sandstones. In general, the lignite seams in the region exhibit discontinuous property (Atalay and Tercan 2017). Figures 5 and 6 show location map and sampling structure, respectively. Excavated lignite is employed in the thermal power plant as the fundamental fuel (MTA 2010). The statistical properties of the data set including 51 sampling locations (TKI 2018) are outlined in Table 3. The variations of the variables corresponding with LCV are illustrated in Fig. 7. Table 3 and Fig. 7 show that ash and sulphur content have the big variability. There is a clear trend between ash content and LCV monitored.
To specify and simplify the local analysis, the relationships between the indicator variables and LCV have been considered in pairs. The contributions of the pair-wise indicator variables on LCV have been assessed by different levels of bandwidths (m) such as 500, 1000,2000,5000,10,000,15,000,20,000,25,000. The patterns between zone of influence and model performance are illustrated in Fig. 8.

Discussion
As a fossil fuel, lignite includes both organic and inorganic compounds. The correlations and inter-dependencies recorded in the descriptive statistics are partly related with this content as well as the geological structure. For example, the relatively strong spatial correlation between ash and LCV recorded in both data sets indicate the importance of amount of the inorganic matter contents. The global scale analyses were performed based on generalized additive framework. In the first part, the single effects of the coordinates and quality variables have been investigated. Since the model is additive, the effect of each quality variable on LCV have been analyzed individually while holding the rest of the quality variables fixed. As can be seen in Table 2, the most of the combinations revealed very high GAM performances on 10 (5 ? 5) models. In addition, the role of ash parameter, which has strong negative correlation with LCV, has been recorded (Fig. 4). In the second part of the global analysis, the importance of the coordinates (weighting) were investigated by an experimental setup. By this way the contribution of distance effect have been incorporated into the models using three quality variables directly. Figure 9 illustrates the outcomes of this investigation. The increasing in the performances using spatial weighting revealed that geological structure and positions create new additional information in model development. All the model combinations in Fig. 9 demonstrate the improvements provided by spatial weighting. Therefore, it is required to consider geographical sampling information together with the indicator variables measurements in any energy sources planning investigation.
The local scale analyses have provided some information both on the effective distances and the resulting model residuals. Determination of the zone of influences revealed the effective zones and localisation all together. Hence, dependence between the pairs and LCV exhibited a categorization as in Fig. 10. For example, at fixed moisture content, the correlated distance (search radius) between ash and LCV reaches up to 3 km. However, sulphur content and volatile matter signify relatively small scale variability in terms of LCV. The determination of the limits of the search domains has also practical benefits for the optimal planning of lignite energy resources in production decision making steps. The local scale appraisals were approached on the model residuals. The local analyses enabled in specification for the residual design and provided to be some appropriate fits for the observations having various distributions. Figure 11 illustrates the absolute errors for the pair-wise variable models. The established model using moisture and ash provided the best accuracy. This result supports the zone of influence expressed in Fig. 10. In other words, the increasing effective distance ensures the increasing accuracy. The pair-wise model built by sulphur content and volatile matter also promotes this determination.
The learning structures ensure step-by-step procedures that can be employed in real applications each time the data become available. Almost all of lignite resources have heterogeneous properties and the evaluations can be made both global and local scales. By the global tool, a model can be fitted involving multiple quality variables via recursively revising the fit for each variable in turn, continuing the others fit. In a local perspective, the spatially non-stationary relationships among the lignite quality variables and measurement locations can be appraised. Thus, by conducting an integral resource appraisal at different scales will be provided some indispensable

Conclusions
Performing convenient and efficient computational analysis and providing reliable results on energy resources have paramount importance for the optimal use of energy sources, industrial development as well sustainable environment. From this point of view, to explore the complex relationships encountered in heterogeneous lignite sites, two real case studies have been designed and different scale models have been performed. The methodology used in this study is different from any single univariate geostatistical analysis as it focuses on the multivariate analysis by considering the location of sample points.
The global analyses indicated the contributions of spatial weights. Comparing with the non-weighted (without coordinate information), weighted additive models provided better accuracies and flexible-transparent models. This study has used geographically weighting approach and it conducted local scale analyses. This implementation enabled to specification for the error structure. The analyses on the zone of influence and pair-wise residuals showed that the determination of accurate calorific value variation not only depends on effective search radius but also associates with the spatial-dependence based weights. The methods considered spatial heterogeneity, and generated estimations of parameters and non-stationarity across spatial locations. By this way, uncertainty and autocorrelation among the lignite quality parameters have been appraised.  There are some limitations of the models. Firstly, a global model may be additive so as a result of this may be restricted. Similarly, a local analysis could be influenced from collinearity and outliers. From a coal science view, a consolidation of compositional data property and suggested methodology would be required in a future study.
Acknowledgements The authors would like to extend their appreciation to the General Directorate of Turkish Coal Enterprises (TKİ) for the data sets.

Compliance with ethical standards
Conflict of interest The authors declare that there is no conflict of interest.
Ethical approval All the authors have agreed for authorship, read and approved the manuscript. The manuscript in part or in full has not been submitted or published anywhere.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.