1 Introduction

Due to rapid changes in the digitalization of everyday life, which result in a high velocity of data generation, it is becoming increasingly important that graduates, regardless of their field of study, have competence in data and business analysis. Data analysis is the process of examining data sets in order to draw conclusions about the information they contain, increasingly with the aid of specialized systems and software. Data analysis technologies and techniques are widely used in commercial industries to enable organizations to make better-informed business decisions, and by scientists and researchers to verify or disprove scientific models, theories and hypotheses.

We are in the era of “big data”, which is characterized by high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making (Gartner IT Glossary 2018). Information hidden in big data can help companies unlock its strategic value: to understand where, when and why their customers buy; to optimize workforce planning and operations; to reduce inefficiencies in the supply chain; to predict market trends and future needs; and to become more innovative and competitive, to name just a few opportunities. Data visualization is therefore increasingly becoming a top need for data-driven organizations. Companies need to develop data visualization abilities as unique digital assets of the business, so that strategic business practices and goals are implemented with maximum impact and consistent execution (Big Data Quarterly 2016; Šebjan and Tominc 2015). These organizational abilities are undoubtedly built on the competences of employees.

Quantitative research competence and so-called “statistical thinking” have become indispensable for able citizenship and for successful placement in the labour market.

Recent decades have seen a great increase in the use of quantitative research methods, not only in the natural sciences but particularly in the social sciences. Although the research process itself is in general the same, quantitative and qualitative research differ in terms of the methods for data collection, the procedures and methodologies used for data analysis, and the style of communicating the findings (Kumar 2005).

Quantitative methods have some important characteristics and advantages. The variables included in the analysis can be both numerical and nominal; data analysis is, in principle, quick, especially with the help of software; the results can be generalized to a statistical population if the data come from a random sample; and findings based on quantitative analysis are very often accepted as more credible by policy and decision makers.

In this chapter we first introduce a selected set of quantitative statistical methodologies that address the basic principle of inferential statistics, namely generalizing sample statistics to the parameters of the statistical population, and thus show how to use sample data to estimate population parameters. In the next section of the chapter we present the use of binary logistic regression. The goal of logistic regression is to explain a categorical variable with two groups on the basis of explanatory variables (interval scaled, ratio scaled, categorical) (Janssens et al. 2008).

In the third part of this chapter, multi-criteria decision-making is presented; it has proven useful in solving spatial decision problems. The main goal of the third section is to develop and apply a multi-criteria model for the protection of agricultural land for food self-sufficiency. The issue of food self-sufficiency (i.e. covering domestic food needs with domestic production) and the related food security is becoming increasingly important in the world, in the European Union and in individual European Union Member States, including Slovenia (Court of Audit 2013).

Based on the literature review, the theoretical background of multi-criteria decision-making, together with its use in land-use evaluation and management, is introduced. Later in this chapter, the multi-criteria model for the protection of agricultural land for food self-sufficiency is developed, taking into account the characteristics of the protection of agricultural land and the information in public databases in Slovenia. For this purpose, we followed the frame procedure for multi-criteria decision making, using the group of methods based on assigning weights to criteria (Čančer and Mulej 2013); in this research, we used the Simplified Multi-attribute Rating Technique and the Analytic Hierarchy Process. Selected geographical and economic factors were structured in the criteria hierarchy. In synthesis, the additive model was used to select the most favorable solution. The aggregate values obtained with the additive model were complemented by considering synergies and redundancies among criteria through the discrete Choquet integral, which is based on a fuzzy measure. The results make it possible to suggest measures for the protection of agricultural land for food self-sufficiency.

2 Multivariate and Univariate Statistics

Multivariate statistical methods are extremely popular in contemporary research, in business, economics and management studies as well as in geoinformatics and spatial studies in general. Multivariate statistics provides analytical tools for complex research models with many independent and dependent variables, all or most of them associated with each other to varying degrees. The domains of multivariate and univariate statistics can be characterized by the number of dependent and independent variables involved in the research, which are defined as follows (Tabachnick and Fidell 2013):

  • Independent variables are the conditions that differ, and to which the subjects or objects of analysis are exposed; or are the characteristics that the subjects or objects of the analysis bring into the research.

  • Dependent variables are predicted by independent variables. That is why independent variables are sometimes called predictor variables, while the dependent variables are sometimes called response or outcome variables.

The dependent and independent variables are defined within the context of an individual study; the dependent variable in one study may therefore play the role of an independent variable in another.

Univariate, bivariate and multivariate statistical methods are usually used together, especially in interdisciplinary research (Hu and Zhang 2017); in several situations a cross-disciplinary scientific approach that combines management sciences and natural sciences is important. Examples can be found in natural resources management (Robinson et al. 2012), in the context of real-estate markets (Benson et al. 2000), or in decision-making on the basis of geo-analysis models (Yue et al. 2015).

Univariate statistics refers to research situations in which a single dependent variable is involved, although there may be several independent variables (for example, testing the difference between the sample mean and the population mean, testing the difference between population means in k different groups, which may be independent or paired, one-way analysis of variance, and several others). Bivariate statistics usually refers to the analysis of the relationship between two variables, typically by the Pearson correlation coefficient, the Spearman rank correlation coefficient or Chi-square analysis. Multivariate statistical methods are an extension of univariate and bivariate statistics. Within the multivariate approach, several dependent and independent variables are analysed simultaneously, and complex interrelationships among variables are revealed and assessed by statistical inference (Tabachnick and Fidell 2013).

3 Descriptive and Inferential Statistics

Inferential statistical methods use sample statistics to make predictions about the population parameters (Agresti and Finlay 2009), with the quality of the inference being dependent on how well the sample represents the population characteristics. In general, sampling is a process of selecting a few units (a sample) from a bigger group (a sampling population), to become a basis for estimating or predicting the prevalence of an unknown piece of information, situation or outcome regarding the sampling population (Kumar 2005). Ideally samples are selected by some random process, so that they represent the population of interest; two main factors that affect the inferences drawn from a sample are the size of a sample and the extent of variation in the sampling population.

While descriptive statistics describe samples of subjects or objects of the research in terms of variables or combinations of variables, inferential statistical techniques test hypotheses about differences in populations on the basis of measurements made on samples. If the differences are reliable, descriptive statistics (sample statistics) can be used to estimate the statistical parameters of the population. Descriptive and inferential statistics are rarely an either-or proposition, since we are usually interested in both describing the data and making inferences from them, but the restrictions and constraints to be taken into account differ. For example, if a simple description of the sample is the major goal, many assumptions that are necessary for inference within multivariate statistical methods may be relaxed (Tabachnick and Fidell 2013).

4 Statistical Inference: Estimation

A single number calculated from a sample is called a point estimate (a sample statistic):

  • \( \overline{\mathrm{Y}} \) – a sample mean value

  • p – a sample proportion

A sample mean value (statistic) is used to estimate the population mean value (parameter) of the variable analysed, μ, while the sample proportion (statistic) is used to estimate the population proportion (parameter), π (the proportion of statistical units with a certain characteristic).

The sampling distribution of a sample statistic is the probability distribution that specifies probabilities for the possible values the statistic can take (Agresti and Finlay 2009). Each sample statistic (sample mean, sample proportion, sample median and so forth) has a sampling distribution. A sampling distribution specifies probabilities not for individual observations but for possible values of a statistic computed from the observations; it is important in inferential statistics for predicting how close a statistic falls to the parameter it estimates, since the sampling distribution describes how the statistic varies (ibid).

According to the central limit theorem (McClave et al. 2011), the sampling distribution of the mean of a large random sample is approximately a normal distribution. The approximate normality of the sampling distribution applies no matter what the shape of the population distribution of the observed variable: for large random samples the sampling distribution of \( \overline{\mathrm{Y}} \) is approximately normal even if the population distribution is highly skewed, U shaped or highly discrete (such as a binary distribution) (ibid).

How large the sample must be for the central limit theorem to apply largely depends on the skewness of the population distribution of the observed variable; if the population distribution is bell shaped, the sampling distribution is bell shaped as well for all sample sizes. More skewed distributions require larger sample sizes, but in most cases a sample size of 30 is sufficient, although it may not be large enough to make precise inference (ibid). The result also applies to proportions, since a sample proportion is a special case of the sample mean for observations coded as 0 and 1 (ibid).

The mean of the sampling distribution equals the mean of the sampled population (\( \overline{\mathrm{Y}} \) is an unbiased estimate of μ), and the standard deviation of the sampling distribution equals the standard error of the mean, that is, the standard deviation of the sampled population, σ, divided by the square root of the sample size, n:

$$ {\mathrm{se}}_{\overline{\mathrm{Y}}}=\frac{\upsigma}{\sqrt{\mathrm{n}}} $$
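The behaviour of the sampling distribution can be checked with a small simulation. The sketch below is only an illustration under assumed settings (an exponential population, 5000 repeated samples of size 30, a fixed random seed); none of these choices comes from the chapter's data. It compares the mean and the standard deviation of the simulated sample means with the population mean and with the standard error.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so that the sketch is reproducible

N_SAMPLES = 5000   # number of repeated samples (assumed for illustration)
SAMPLE_SIZE = 30   # size of each sample

# Exponential population with rate 1: strongly right-skewed,
# with population mean 1 and population standard deviation 1.
POP_MEAN = 1.0
POP_SD = 1.0

# Draw many samples and keep the mean of each one.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(SAMPLE_SIZE))
    for _ in range(N_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = POP_SD / math.sqrt(SAMPLE_SIZE)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {POP_MEAN})")
print(f"sd of sample means:   {sd_of_means:.3f} (standard error {theoretical_se:.3f})")
```

Even though the assumed population is far from bell shaped, the two simulated quantities should land close to their theoretical counterparts.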

If the sample size, n, is large, the sample standard deviation, s, is a good estimator of the standard deviation of the sampled population, σ:

$$ \mathrm{s}=\sqrt{\frac{1}{\mathrm{n}-1}\sum \limits_{\mathrm{i}=1}^{\mathrm{n}}{\left({\mathrm{y}}_{\mathrm{i}}-\overline{\mathrm{Y}}\right)}^2} $$

A large-sample confidence interval for μ, with the confidence level 100(1 − α)%, equals

$$ \overline{\mathrm{Y}}\pm {\mathrm{z}}_{\upalpha /2}{\mathrm{se}}_{\overline{\mathrm{Y}}}, $$
(2.1)

where \( {\mathrm{z}}_{\upalpha /2} \) is the corresponding z-value of the standardized normal distribution.

Commonly used confidence levels are (Tol 2009; Bathelt et al. 2004):

$$ {\displaystyle \begin{array}{l}90\%\Rightarrow {\mathrm{z}}_{\upalpha /2}=1.645\\ {}95\%\Rightarrow {\mathrm{z}}_{\upalpha /2}=1.96\\ {}99\%\Rightarrow {\mathrm{z}}_{\upalpha /2}=2.576\end{array}} $$
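The z-values for any confidence level can be obtained from the inverse cumulative distribution function of the standard normal distribution. A minimal sketch, using `NormalDist` from Python's standard `statistics` module; the helper name `z_critical` is an assumption introduced here for illustration.

```python
from statistics import NormalDist


def z_critical(confidence_level: float) -> float:
    """Return z_{alpha/2} for a two-sided interval with the given confidence level."""
    alpha = 1.0 - confidence_level
    # The upper alpha/2 tail is cut off at the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```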

The same holds for the population proportion. If the sample size is large (the condition is satisfied if np ≥ 15 and n(1 − p) ≥ 15) and a random sample is selected from the target population, the sampling distribution of the sample proportion is approximately normal (McClave et al. 2011). The mean of the sampling distribution equals the population proportion (p is an unbiased estimate of π) and the standard deviation of the sampling distribution equals the standard error of the proportion:

$$ {\mathrm{se}}_{\uppi}=\sqrt{\frac{\mathrm{p}\left(1-\mathrm{p}\right)}{\mathrm{n}}} $$

A large-sample confidence interval for π, with the confidence level 100(1 − α)%, equals

$$ \mathrm{p}\pm {\mathrm{z}}_{\upalpha /2}{\mathrm{se}}_{\uppi} $$

where \( {\mathrm{z}}_{\upalpha /2} \) is the corresponding z-value of the standardized normal distribution.
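The interval for a proportion can be sketched in a few lines. The function name `proportion_ci` and the counts used below (80 units with the characteristic out of 200) are hypothetical and serve only to illustrate the formula and the large-sample condition.

```python
import math
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence_level: float = 0.95):
    """Large-sample confidence interval for a population proportion."""
    # Large-sample condition: np >= 15 and n(1 - p) >= 15, checked on the counts.
    if successes < 15 or n - successes < 15:
        raise ValueError("large-sample condition np >= 15 and n(1 - p) >= 15 is not met")
    p = successes / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    se = math.sqrt(p * (1.0 - p) / n)  # standard error of the proportion
    return p - z * se, p + z * se


# Hypothetical sample: 80 of 200 units have the characteristic of interest.
lower, upper = proportion_ci(successes=80, n=200)
print(f"95% confidence interval for the proportion: {lower:.3f} to {upper:.3f}")
```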

Example 2.1: Some Characteristics of the Real Estate Market

In the city of Maribor, a real estate agency wanted to estimate the market value of the apartments available on the market, based on past transactions and the characteristics of apartments sold in the past. Among the factors with an impact on the market value (price in EUR per sq. m), the distance from the city centre was also considered. A sample of 1117 past transactions from 2013 and 2014 was analysed (Živkovič and Tominc 2016). The descriptive statistics of the sample data for the variables “price per sq. m” and “the distance from the city centre” are as follows (Table 2.1).

Table 2.1 Sample descriptive statistic for real estate data

The frequency histogram in Fig. 2.1 shows the frequency distribution of the variable describing the distance of the real estate from the city centre. The figure also shows the normal distribution fitted to the empirical frequency distribution, which reveals the differences between the empirical distribution (sample data) and the theoretical distribution (fitted normal distribution). Fig. 2.2 presents the frequency histogram of the variable describing the price of the real estate (in EUR per sq. m); here the frequency distribution and the fitted normal distribution exhibit a higher degree of similarity. The statistical tests of the hypothesis that a variable is normally distributed in the population (the Kolmogorov-Smirnov and Shapiro-Wilk tests) exceed the scope of this chapter; the reader can get acquainted with them in the literature (Tabachnick and Fidell 2013).

Fig. 2.1 Frequency distribution for the “Distance from the city centre” (in m)

Fig. 2.2 Frequency distribution for the “Price” (in EUR/sq.m)

Using the above-mentioned procedure and Eq. (2.1), the 95% confidence interval for the population mean of the variable “distance from the city centre” is calculated:

$$ \overline{\mathrm{Y}}\pm {\mathrm{z}}_{\upalpha /2}{\mathrm{se}}_{\overline{\mathrm{Y}}}=1.80\pm 1.96\ast \frac{5.36}{\sqrt{1117}}=1.80\pm 0.314 $$

Therefore, we can be 95% confident that the population mean distance from the city centre, for the apartments sold in the market in 2013 and 2014 in Maribor, lies between the lower bound of 1.486 km and the upper bound of 2.114 km.
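The arithmetic of this interval can be reproduced with a short sketch. The function name `large_sample_mean_ci` is an assumption introduced here; the inputs (mean 1.80, standard deviation 5.36, n = 1117, z = 1.96) are the values used in the calculation above.

```python
import math


def large_sample_mean_ci(mean: float, sd: float, n: int, z: float = 1.96):
    """Large-sample confidence interval for a population mean, as in Eq. (2.1)."""
    se = sd / math.sqrt(n)  # estimated standard error of the mean
    margin = z * se         # half-width of the interval
    return mean - margin, mean + margin, margin


lower, upper, margin = large_sample_mean_ci(mean=1.80, sd=5.36, n=1117)
print(f"margin of error: {margin:.3f}")
print(f"95% confidence interval: {lower:.3f} to {upper:.3f}")
```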

Similarly, we can calculate the 95% confidence interval for the population mean of the variable “price in EUR per sq. m”. We can be 95% confident that the population mean price, for the apartments sold in the market in 2013 and 2014 in Maribor, lies between the lower bound of 846.385 EUR per sq. m and the upper bound of 868.935 EUR per sq. m.

4.1 The t-distribution

For obtaining a confidence interval for any sample size (also less than 30), the assumption of normal population distribution must be fulfilled – in this case the sampling distribution of \( \overline{\mathrm{Y}} \) is normal even for small sample sizes (Agresti and Finlay 2009). If the population standard deviation, σ, is known, the exact standard error of the sample mean is known as well:

$$ {\mathrm{se}}_{\overline{\mathrm{Y}}}=\frac{\upsigma}{\sqrt{\mathrm{n}}} $$

and the following equation can be used also when n < 30:

$$ \overline{\mathrm{Y}}\pm {\mathrm{z}}_{\upalpha /2}{\mathrm{se}}_{\overline{\mathrm{Y}}} $$

In practice, however, the population standard deviation, and therefore the exact standard error of the mean, is not known. Substituting the sample standard deviation, s, for σ to get the estimated standard error introduces extra error. To account for this increased error, the t-distribution is used to obtain confidence intervals. The resulting interval is robust against violation of the assumption of a normal population distribution (a statistical method is called robust with respect to a particular assumption if it performs adequately even when the assumption is violated (ibid)). Even if the population distribution is not normal, the confidence interval with the confidence level 100(1 − α)% still works quite well (especially if n > 15):

$$ \overline{\mathrm{Y}}\pm {\mathrm{t}}_{\mathrm{n}-1;\upalpha /2}{\mathrm{se}}_{\overline{\mathrm{Y}}}, $$
(2.2)

where \( {\mathrm{t}}_{\mathrm{n}-1;\upalpha /2} \) represents the t-score of the t-distribution with (n − 1) degrees of freedom, df.

As the sample size gets larger, the assumption of a normal population distribution becomes less important because of the central limit theorem (ibid).

Example 2.2: Mean Price per sq. m for Apartments with a View to the Lake

In August 2017, it was reported that Zurich remains the most expensive location for Swiss property at CHF 12,250 per square metre. However, houses in Lucerne have gained the most in value over the past decade, with one square metre costing CHF 8500, up 82% on 2007. Homes in lake regions are in particular demand, according to a report published on August 17, 2017 by the federal institute of technology ETH Zurich. The area with the second-highest increase in property prices over the past 10 years was Horgen, overlooking Lake Zurich, where the price of a square metre has climbed by 80% to CHF 11,000 (Swissinfo 2017).

Suppose that a random sample of 20 apartments with a view to the lake was obtained. The sample mean price per sq. m equals 11,110 EUR, and the sample standard deviation equals 3250.25 EUR.

Using the above described procedure and Eq. (2.2), the 95% confidence interval for the population mean value for variable “price per sq. m” for apartments with the view to the lake, is calculated:

$$ \overline{\mathrm{Y}}\pm {\mathrm{t}}_{\mathrm{n}-1;\upalpha /2}{\mathrm{se}}_{\overline{\mathrm{Y}}}=11{,}110\pm {\mathrm{t}}_{19;0.025}\ast \frac{3{,}250.25}{\sqrt{20}}=11{,}110\pm 2.093\ast 726.78=11{,}110\pm 1{,}521.15 $$

Therefore, we can be 95% confident that the population mean price per sq. m, for apartments with a view to the lake, lies between the lower bound of 9588.85 EUR and the upper bound of 12,631.15 EUR.
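The same interval can be recomputed with a short sketch. The t-score 2.093 for 19 degrees of freedom is taken from the calculation above rather than computed, so that only the Python standard library is needed; the function name `t_interval` is an assumption introduced here.

```python
import math

# t-score for 19 degrees of freedom and alpha/2 = 0.025, as used in Example 2.2.
T_19_0025 = 2.093


def t_interval(mean: float, sd: float, n: int, t_score: float):
    """Confidence interval for a population mean based on the t-distribution, Eq. (2.2)."""
    se = sd / math.sqrt(n)  # estimated standard error of the mean
    margin = t_score * se
    return mean - margin, mean + margin


lower, upper = t_interval(mean=11110.0, sd=3250.25, n=20, t_score=T_19_0025)
print(f"95% confidence interval: {lower:.2f} to {upper:.2f} EUR per sq. m")
```

For other sample sizes or confidence levels the t-score would have to be looked up in a t-table or obtained from statistical software.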

5 Multivariate Statistical Methods: Binary Logistic Regression

Logistic regression allows one to predict a discrete outcome, such as group membership, from a set of variables that may be continuous, discrete, dichotomous or a mix. Although logistic regression answers the same questions as other classification methods (for example, discriminant analysis), it is considerably more flexible: unlike discriminant analysis, logistic regression makes no assumptions about the distribution of the predictor variables, which do not have to be normally distributed, linearly related to the dependent (grouping) variable, or of equal variance within each group (Tabachnick and Fidell 2013). Unlike multiple regression, logistic regression does not require a linear relationship between the dependent and independent variables. The error terms (residuals) do not need to be normally distributed either, and homoscedasticity is not required. As already mentioned, the dependent variable in logistic regression is not measured on an interval or ratio scale.

Some other assumptions still apply, however. The dependent variable may have two or more categories, so that binary or multinomial logistic regression analysis may be employed. Binary logistic regression, which is presented in this chapter, requires the dependent variable to be binary. Logistic regression also requires the observations to be independent of each other (the observations should not come from repeated measurements or matched data). It further requires little or no multicollinearity among the independent variables, which means that the independent variables should not be too highly correlated with each other. Although this analysis does not require the dependent and independent variables to be related linearly, it requires that the independent variables are linearly related to the logit or log odds (defined later by Eq. 2.3). Logistic regression typically requires a large sample size as well. A general guideline is that one needs a minimum of 10 cases with the least frequent outcome for each independent variable in the model. For example, with 5 independent variables and an expected probability of the least frequent outcome of 0.10, a minimum sample size of 500 (10∗5/0.10) is needed (ibid).
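The sample size guideline can be written as a small helper. This is only a sketch of the rule of thumb stated above; the function name `min_sample_size` and its parameter names are assumptions introduced for illustration.

```python
import math


def min_sample_size(n_predictors: int, p_least_frequent: float,
                    cases_per_predictor: int = 10) -> int:
    """Minimum sample size under the '10 cases per predictor' guideline."""
    if not 0.0 < p_least_frequent <= 0.5:
        raise ValueError("probability of the least frequent outcome must lie in (0, 0.5]")
    raw = cases_per_predictor * n_predictors / p_least_frequent
    # Round first to guard against floating-point noise, then round up to a whole case.
    return math.ceil(round(raw, 6))


# The example from the text: 5 independent variables, least frequent outcome 0.10.
print(min_sample_size(n_predictors=5, p_least_frequent=0.10))
```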

Logistic regression is often used in analyses that use data of geographic location in different fields of research (for example Rodrigues et al. 2014; Bhat et al. 2009).

In the binary logistic regression model, the outcome variable, Y, is the probability of having one of the two outcomes, based on a nonlinear function of the best linear combination of predictors (ibid):

$$ {Y}_i=\frac{e^u}{1+{e}^u} $$

where \( {Y}_i \) is the estimated probability that the i-th case (i = 1, 2, …, n) is in one of the categories and u is the linear regression equation:

$$ u=A+{B}_1{X}_1+{B}_2{X}_2+\cdots +{B}_k{X}_k $$

with constant A, coefficients \( {B}_j \) and predictors \( {X}_j \) for k predictors (j = 1, 2, …, k).

Taking the natural logarithm of the odds, Y/(1 − Y), leads to the logit or log odds in the form

$$ \ln \left(\frac{Y}{1-Y}\right)=A+\sum {B}_j{X}_j, $$
(2.3)
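The relationship between the probability and the logit can be made concrete with two small functions. The names `logistic` and `logit` are chosen here for illustration; the sketch simply shows that the two transformations are inverses of each other.

```python
import math


def logistic(u: float) -> float:
    """Probability Y for a given value of the linear predictor u."""
    # e^u / (1 + e^u) rewritten in the algebraically equivalent form 1 / (1 + e^(-u)).
    return 1.0 / (1.0 + math.exp(-u))


def logit(p: float) -> float:
    """Log odds ln(p / (1 - p)) for a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


u = 0.7                 # an arbitrary value of the linear predictor
p = logistic(u)
print(f"u = {u}, probability = {p:.4f}, logit of that probability = {logit(p):.4f}")
```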

The coefficients of the logit function are estimated by the maximum likelihood procedure. This is an iterative procedure that starts with arbitrary values of the coefficients and determines the direction and size of the change in the coefficients that will maximize the likelihood of obtaining the observed frequencies. Then the residuals of the predictive model based on those coefficients are tested and another determination of the direction and size of the change in the coefficients is made. The procedure continues until the coefficients change very little. The maximum likelihood estimates are those parameter estimates that maximize the probability of obtaining the observed outcome frequencies (Hox 2002).
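The iterative idea can be sketched for a model with a single predictor. The sketch below uses Newton-Raphson updates, which is one common choice and an assumption made here, since the text does not name a specific algorithm; the eight data points are made up for illustration, and statistical packages use more general and more careful implementations.

```python
import math


def fit_logistic(x, y, tol=1e-10, max_iter=50):
    """Maximum likelihood estimates (A, B) for logit(Y) = A + B*x via Newton-Raphson."""
    a, b = 0.0, 0.0  # arbitrary starting values
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to A and B.
        g_a = sum(yi - pi for yi, pi in zip(y, p))
        g_b = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian) entries.
        w = [pi * (1.0 - pi) for pi in p]
        h_aa = sum(w)
        h_ab = sum(wi * xi for wi, xi in zip(w, x))
        h_bb = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if max(abs(step_a), abs(step_b)) < tol:  # coefficients change very little
            break
    return a, b


# Made-up data: one predictor and a binary outcome with some overlap between groups.
x_data = [1, 2, 3, 4, 5, 6, 7, 8]
y_data = [0, 0, 1, 0, 1, 0, 1, 1]
intercept, slope = fit_logistic(x_data, y_data)
print(f"A = {intercept:.3f}, B = {slope:.3f}")
```

At convergence the gradient of the log-likelihood is numerically zero, which is the condition the maximum likelihood estimates have to satisfy.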

The procedure of applying binary logistic regression is presented in the following example, where the suitability of the model is assessed using the model Chi-square, Cox and Snell R square, Nagelkerke R square and the classification table (Janssens et al. 2008); the meaning and the significance of the logistic regression coefficients are assessed as well.

Example 2.3: Characteristics of NUTS-2 Regions in Slovenia

We would like to find out whether several indicators associated with spatial, economic and social viewpoints are characteristic of the NUTS-2 regions in Slovenia. In the NUTS (Nomenclature of Territorial Units for Statistics) codes of Slovenia (SI), the three levels are:

  • NUTS 1, no subdivisions

  • NUTS 2, Macro regions (two)

  • NUTS 3, Statistical regions (12)

The data for 212 municipalities of Slovenia was obtained from the Statistical Office of Slovenia (Statistical office RS 2017) for the following indicators:

  • Population density (population per sq. km, 2nd half of 2016)

  • Natural change (natural increase per 1000 population, 2016)

  • Net migration (total net migration per 1000 population, 2016)

  • Registered unemployment rate, 2016

  • Tertiary education (number of students per 1000 population, 2015/2016)

  • Convicted adults and juveniles per 1000 population (2015)

  • Waste (in kg per capita, 2016)

  • Persons employed per enterprise in the municipality, 2015

  • Number of dwellings per 1000 population

  • Average monthly net earnings (in EUR, 2016)

Municipalities were grouped according to the NUTS 2 macro regions in Eastern region (139 municipalities, value 1) and Western region (73 municipalities, value 0).

The descriptive statistics for the variables included in the analysis are presented in Table 2.2.

Table 2.2 Mean values and standard deviations for selected indicators of spatial, social and economic characteristics of municipalities in NUTS-2 regions of Slovenia

The descriptive statistics in Table 2.2 show that, compared with the Western region of Slovenia, the municipalities of the Eastern region have on average a lower population density, a lower (negative) natural increase of population, a higher registered unemployment rate, lower net migration, a lower number of students, a higher number of convicted adults and juveniles per 1000 inhabitants, less generated waste, an almost equal number of persons employed per enterprise in the municipality and number of dwellings per 1000 inhabitants, and lower average monthly net earnings.

The logistic regression results are presented in Tables 2.3, 2.4, 2.5, 2.6, 2.7 and 2.8 below (the discriminant classification technique, which could be useful in such research, cannot be used, since the data differ significantly from the multivariate normal distribution).

Table 2.3 Case processing summary
Table 2.4 Dependent variable encoding
Table 2.5 Omnibus tests of model coefficients
Table 2.6 Model summary
Table 2.7 Classification table
Table 2.8 Variables in the logistic regression equation

The model has been estimated on the basis of 200 cases, due to 12 missing cases, as presented by Table 2.3.

The Dependent variable encoding table (Table 2.4) shows the coding of the “region” variable, with 0 for the Western region and 1 for the Eastern region. It is important that the dichotomous dependent variable is encoded by 0 and 1, in order to be able to perform the logistic regression and to answer the relevant research question, namely, what a selected independent variable contributes to the differences between the two regions.

The output comprises two blocks: “Block 0” corresponds to the zero-model (the initial model, created on the basis of the constant only) and “Block 1” to the full-model (which includes all independent variables). Here the tables of the “Block 1” results are presented.

To determine the suitability of the model, the results for the model Chi-square, Cox and Snell R square, Nagelkerke R square and the classification table are presented in Tables 2.5, 2.6 and 2.7.

The model Chi-square, which may be found in Table 2.5, suggests that the full-model is more suitable than the zero-model, that is, at least one of the regression coefficients of the independent variables is significantly different from zero. The number 10 for the degrees of freedom corresponds to the number of additional estimates in the full-model as compared to the zero-model (regression coefficients for 10 independent variables). The Step, Block and Model Chi-squares are all equal in the current application (blocks of variables are included within the stepwise method).

Cox & Snell R square in Table 2.6 is comparable to R square in linear regression, in the sense that a higher value corresponds to a better fit of the model, but the Cox & Snell R square cannot attain the maximum value of 1. Nagelkerke R square in Table 2.6 does meet this requirement, since it falls within the range from 0 to 1. Its value is 0.611, which is on the high side and points to a quite high suitability of the full-model.
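Both measures can be computed from the log-likelihoods of the zero-model and the full-model. The sketch below uses the standard definitions of the two measures, which are not spelled out in this chapter; the log-likelihood values in the usage example are hypothetical and are not the values behind Table 2.6.

```python
import math


def pseudo_r_squared(ll_null: float, ll_full: float, n: int):
    """Cox & Snell and Nagelkerke R square from the two log-likelihoods."""
    # Cox & Snell: 1 - (L_null / L_full)^(2/n), written with log-likelihoods.
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)
    # Largest value Cox & Snell can reach for this data set (always below 1).
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    # Nagelkerke rescales Cox & Snell so that the maximum becomes 1.
    nagelkerke = cox_snell / max_cox_snell
    return cox_snell, nagelkerke


# Hypothetical log-likelihoods for a sample of 200 cases.
cs, nk = pseudo_r_squared(ll_null=-130.0, ll_full=-75.0, n=200)
print(f"Cox & Snell R square: {cs:.3f}, Nagelkerke R square: {nk:.3f}")
```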

In the classification table, presented in Table 2.7, the full-model classifies 115 + 51 = 166 units correctly (83%), whereas 14 municipalities were incorrectly assigned to the Western region on the basis of the values of the independent variables and, similarly, 20 were incorrectly assigned to the Eastern region. The score of 83% of correct classifications also points to a quite high suitability of the full-model.
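The share of correct classifications follows directly from the four counts. In the sketch below, the assignment of 115 to correctly classified Eastern municipalities and 51 to correctly classified Western municipalities is an assumption, since Table 2.7 is not reproduced here; the overall accuracy does not depend on it, but the per-region rates do.

```python
# Counts keyed by (observed region, predicted region).
# Assumption: 115 = Eastern classified as Eastern, 51 = Western classified as Western.
classification = {
    ("Eastern", "Eastern"): 115,
    ("Eastern", "Western"): 14,   # incorrectly assigned to the Western region
    ("Western", "Western"): 51,
    ("Western", "Eastern"): 20,   # incorrectly assigned to the Eastern region
}

total = sum(classification.values())
correct = sum(count for (observed, predicted), count in classification.items()
              if observed == predicted)
accuracy = correct / total

# Share of correctly classified municipalities within each observed region.
per_region = {}
for region in ("Eastern", "Western"):
    observed_total = sum(count for (observed, _), count in classification.items()
                         if observed == region)
    per_region[region] = classification[(region, region)] / observed_total

print(f"overall: {correct} of {total} correct ({accuracy:.0%})")
for region, share in per_region.items():
    print(f"{region}: {share:.1%} correct")
```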

The next step consists of the assessment of the logistic regression coefficients. On the basis of Table 2.8, the logit (log odds) of the response variable (Eastern region), in the form of Eq. (2.3), is:

$$ {\displaystyle \begin{array}{l}\mathrm{u}=2.096-0.007{\mathrm{X}}_1-0.012{\mathrm{X}}_2+0.019{\mathrm{X}}_3+\\ {}+0.677{\mathrm{X}}_4-0.028{\mathrm{X}}_5+0.831{\mathrm{X}}_6-0.001{\mathrm{X}}_7+\\ {}+0.063{\mathrm{X}}_8-0.005{\mathrm{X}}_9-0.007{\mathrm{X}}_{10}\ \end{array}} $$
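To illustrate how the estimated equation turns into a probability, the sketch below evaluates it for one municipality. The indicator values are entirely hypothetical, and the mapping of the predictors to the indicators assumes that they follow the order of the indicator list given earlier in the example.

```python
import math

INTERCEPT = 2.096

# Coefficients from the equation above; the keys assume the order of the indicator list.
COEFFICIENTS = {
    "population_density": -0.007,
    "natural_change": -0.012,
    "net_migration": 0.019,
    "unemployment_rate": 0.677,
    "tertiary_students": -0.028,
    "convicted": 0.831,
    "waste": -0.001,
    "persons_employed": 0.063,
    "dwellings": -0.005,
    "net_earnings": -0.007,
}

# Hypothetical municipality (made-up values, for illustration only).
municipality = {
    "population_density": 100.0,
    "natural_change": 0.0,
    "net_migration": 0.0,
    "unemployment_rate": 10.0,
    "tertiary_students": 40.0,
    "convicted": 3.0,
    "waste": 300.0,
    "persons_employed": 4.0,
    "dwellings": 400.0,
    "net_earnings": 1000.0,
}

# Linear predictor (log odds) and the corresponding probability of the Eastern region.
u = INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in municipality.items())
probability_eastern = 1.0 / (1.0 + math.exp(-u))

print(f"log odds u = {u:.3f}, estimated probability of Eastern region = {probability_eastern:.3f}")
```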

The Wald statistic, which is Chi-square distributed, and the p-value linked to it are used to determine whether a regression coefficient differs significantly from zero. The results in Table 2.8 reveal that the regression coefficients linked to Population density, Registered unemployment rate, Convicted adults and juveniles per 1000 population and Average monthly net earnings differ significantly from zero. The regression coefficients are interpreted in terms of “odds” and “log odds”.

The “log odds” is defined as follows:

$$ u=A+{B}_1{X}_1+{B}_2{X}_2+\cdots +{B}_k{X}_k=\mathrm{Logit}=\mathrm{Log}\ \mathrm{odds}=\ln \frac{\mathrm{Probability}\ \left(\mathrm{Event}\right)}{\mathrm{Probability}\ \left(\mathrm{No}\ \mathrm{event}\right)} $$

(In our case the probability that the municipality belongs to the Eastern region.)

The regression coefficient B shows the change in the “log odds” for a change in the independent variable X by one unit. The above equation may be transformed into the function of “odds”:

$$ \mathrm{Odds}=\frac{\mathrm{Probability}\ \left(\mathrm{Event}\right)}{\mathrm{Probability}\ \left(\mathrm{No}\ \mathrm{event}\right)}={e}^{A+{B}_1{X}_1+{B}_2{X}_2+\cdots +{B}_k{X}_k} $$

Exp(B), sometimes called the “odds ratio”, is the factor by which the “odds” are multiplied when an independent variable changes by one unit: the “odds” decrease if B is negative (Exp(B) below 1) and increase if B is positive (Exp(B) above 1).

Therefore, the explanation of the significant regression coefficients is as follows.

  • The negative and significant regression coefficient B linked to Population density, which equals −0.007, and the “odds ratio” of 0.993 indicate that a unit increase in Population density is associated with a 0.7% decrease in the “odds” of being in the Eastern region. Municipalities with higher Population density are less likely to be part of the Eastern region.

  • The positive and significant regression coefficient B linked to the Registered unemployment rate, which equals 0.677, and the “odds ratio” of 1.968 indicate that a unit increase in the Registered unemployment rate is associated with a 96.8% increase in the “odds” of being in the Eastern region. Municipalities with a higher Registered unemployment rate are more likely to be part of the Eastern region.

  • The positive and significant regression coefficient B linked to Convicted adults and juveniles per 1000 population, which equals 0.831, and the “odds ratio” of 2.295 indicate that a unit increase in the rate of Convicted adults and juveniles per 1000 population is associated with a 129.5% increase in the “odds” of being in the Eastern region. Municipalities with a higher rate of Convicted adults and juveniles per 1000 population are more likely to be part of the Eastern region.

  • The negative and significant regression coefficient B linked to Average monthly net earnings, which equals −0.007, and the “odds ratio” of 0.993 indicate that a unit increase in Average monthly net earnings is associated with a 0.7% decrease in the “odds” of being in the Eastern region. Municipalities with higher Average monthly net earnings are less likely to be part of the Eastern region.
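The figures in the list above can be checked numerically. The following is a minimal sketch that recomputes Exp(B) and the percentage change in the “odds” from the four coefficients quoted in the text; the function names are illustrative and not part of the original analysis. Because the published coefficients are rounded to three decimals, the recomputed values may differ from the reported ones in the last digit.

```python
import math

# Coefficients B as quoted in the text for the four significant predictors.
coefficients = {
    "Population density": -0.007,
    "Registered unemployment rate": 0.677,
    "Convicted adults and juveniles per 1000 population": 0.831,
    "Average monthly net earnings": -0.007,
}


def odds_ratio(b):
    """Exp(B): factor by which the odds change for a one-unit increase in X."""
    return math.exp(b)


def percent_change_in_odds(b):
    """Percentage change in the odds for a one-unit increase in X."""
    return 100.0 * (math.exp(b) - 1.0)


for name, b in coefficients.items():
    print(f"{name}: Exp(B) = {odds_ratio(b):.3f}, "
          f"change in odds = {percent_change_in_odds(b):+.1f}%")
```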

The analysis leads to the conclusion that the Eastern region scores lower on several spatial, economic and social indicators: it has significantly lower population density and average monthly net earnings, and significantly higher rates of registered unemployment and of convicted adults and juveniles per 1000 population.

6 Multi-criteria Decision Making

Multi-criteria decision making (MCDM) based on assigning weights describes a set of approaches for making decisions with respect to several, more or less conflicting criteria. These approaches should be used when intuitive decision making is not enough, e.g. because of conflicting criteria or disagreement between decision makers about which criteria are relevant or more important and which alternatives and preferences are acceptable.

The approach used in this chapter is prescriptive (Raiffa 1994): instead of regarding people as perfectly rational individuals, we develop systematic procedures that support decision-making. Based on Belton and Stewart’s (2012) process of MCDM:

  • From problem identification and structuring,

  • Through model building and its application for informing and thinking,

  • To the action plan for problem solving,

the approach follows the frame procedure for MCDM based on assigning weights, which has already been introduced and well verified in practice (Čančer and Mulej 2013):

  • Problem definition and structuring. We identify the problem, describe it thoroughly and determine criteria and alternatives. An MCDM problem can be structured in the form of the hierarchy of criteria and alternatives.

  • Criteria weighting and measuring local alternatives’ values. Weights express the relative importance of criteria; their values range from 0 to 1. As decision makers may have difficulties determining the weights of criteria directly, several methods can support them in expressing the importance of criteria. With computer support, the importance of criteria can be expressed by methods based on:

    • ordinal (e.g. SMARTER),

    • interval (e.g. SWING and SMART) and

    • ratio scale (e.g. AHP), or

    • by direct weighting.

When using the SMARTER method (Edwards and Barron 1994), we rank the attributes in the order of importance for the attribute changes from their worst level to the best level. We start from the most important attribute. The weights can be obtained by the centroid method. In SMART (Edwards 1977), we assign 10 points to the least important attribute change from the worst criterion level to its best level, and then give points (more than or equal to 10, but less than or equal to 100) to reflect the importance of the attribute change from the worst criterion level to the best level relative to the least important attribute change. When using the SWING method (Von Winterfeldt and Edwards 1986) for expressing the criteria’s importance, we assign 100 points to the most important attribute change from the worst criterion level to the best level, and then assign points (less than or equal to 100, but more than or equal to 10) to reflect the importance of the attribute change from the worst criterion level to the best level relative to the most important attribute change. In SMART and SWING, the weight of the j th criterion, w j, is obtained by:

$$ {w}_j=\frac{t_j}{\sum_{j=1}^m{t}_j}, $$
(2.4)

where t j – the points given to the j th criterion, and m – the number of criteria. When the criteria are structured in two levels, the weight of the s th attribute of the j th criterion, w js, is obtained in SMART and SWING by:

$$ {w}_{js}=\frac{t_{js}}{\sum_{s=1}^{p_j}{t}_{js}}, $$
(2.5)

where t js is the number of points given to the s th attribute of the j th criterion, and p j is the number of sub-criteria of the j th criterion.
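The weighting rules above can be sketched in a few lines of code. The first function normalises SMART or SWING points as in (2.4) and (2.5). The second computes centroid weights for SMARTER using the rank order centroid formula w j = (1/m) Σ k=j..m (1/k), which is the usual reading of “the centroid method” and is stated here as an assumption because the chapter does not give the formula. Function names and the example points are illustrative.

```python
def weights_from_points(points):
    """Normalise SMART/SWING points t_j into weights w_j = t_j / sum(t_j)."""
    total = sum(points)
    if total <= 0:
        raise ValueError("points must sum to a positive number")
    return [t / total for t in points]


def rank_order_centroid(m):
    """Centroid weights for m criteria ranked from most (j = 1) to least important.

    Assumed formula: w_j = (1 / m) * sum_{k=j}^{m} 1 / k.
    """
    return [sum(1.0 / k for k in range(j, m + 1)) / m for j in range(1, m + 1)]


# Hypothetical SWING points: 100 for the most important change, fewer for the rest.
print(weights_from_points([100, 50, 50]))
# Centroid weights for three ranked criteria.
print(rank_order_centroid(3))
```

Both functions return weights that sum to one, so they can be used directly in the additive model.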

The AHP method is based on pairwise comparisons of the criteria’s importance. Judgments on the criteria’s importance are expressed on the 9-point scale: 1 – a criterion is equally important as the compared criterion, 3 – a criterion is moderately more important than the compared criterion, 5 – a criterion is strongly more important than the compared criterion, 7 – a criterion is very strongly more important than the compared criterion, and 9 – a criterion is extremely more important than the compared criterion. Even numbers express intermediate judgments between the adjacent odd values: 2 means from equally to moderately more, 4 from moderately more to strongly more, etc. The local weights of criteria are calculated from the eigenvectors and eigenvalues of the pairwise comparison matrix (for details see Saaty 1999).
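As an illustration of the eigenvector calculation, the sketch below approximates the principal eigenvector of a pairwise comparison matrix by power iteration and normalises it to obtain the weights. This is one common way to compute AHP priorities and is not necessarily the routine used by the software behind this chapter; the comparison matrix is a hypothetical two-criterion example in which the first criterion is judged 2 on the 9-point scale.

```python
def ahp_priorities(matrix, iterations=100):
    """Approximate the principal eigenvector by power iteration.

    The result is normalised so that the weights sum to one.
    """
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        weights = [p / total for p in product]
    return weights


def principal_eigenvalue(matrix, weights):
    """Estimate lambda_max as the mean of (A w)_i / w_i."""
    n = len(matrix)
    product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    return sum(product[i] / weights[i] for i in range(n)) / n


# Hypothetical reciprocal matrix: criterion 1 is "2" relative to criterion 2.
comparison = [[1.0, 2.0],
              [0.5, 1.0]]
weights = ahp_priorities(comparison)
lambda_max = principal_eigenvalue(comparison, weights)
print(weights, lambda_max)
```

For this matrix the weights are approximately 0.667 and 0.333, and lambda_max equals the matrix size, which indicates perfectly consistent judgments.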

The values of alternatives with respect to criteria on the lowest hierarchy level (attributes) can be measured by:

  • Value functions,

  • Pairwise comparisons, and

  • Directly.

When using value functions, the lower and upper bounds should be defined, and then an appropriate formula (for example, for a linear, piecewise linear or exponential value function) should be applied to obtain the local values of alternatives.
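A linear value function can be sketched as follows. The code assumes the common convention that local values are scaled to the interval from 0 to 1 and that data outside the bounds are clipped; the function names and the sample data are hypothetical. Taking the lowest and the highest datum as bounds is one possible choice, used here for the helper `local_values`.

```python
def linear_value(x, lower, upper, increasing=True):
    """Linear value function scaled to [0, 1] between the given bounds."""
    if upper == lower:
        raise ValueError("upper and lower bounds must differ")
    x = min(max(x, lower), upper)          # clip to the bounds (assumption)
    v = (x - lower) / (upper - lower)
    return v if increasing else 1.0 - v    # decreasing: lower data are better


def local_values(data, increasing=True):
    """Apply the linear value function with bounds set to the min and max datum."""
    lower, upper = min(data.values()), max(data.values())
    return {name: linear_value(x, lower, upper, increasing) for name, x in data.items()}


# Hypothetical data for three alternatives with respect to one attribute.
sample = {"A": 2.0, "B": 4.0, "C": 3.0}
print(local_values(sample))                    # more is better
print(local_values(sample, increasing=False))  # less is better
```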

Pairwise comparison of alternatives’ values with respect to the criterion on the lowest hierarchy level is one of the major advantages of the AHP method, especially when nominal data are available. Preferences for alternatives are expressed on the 9-point scale; the strengths of preferences can be described as follows: 1 – an alternative is equally preferred as the compared alternative, 3 – an alternative is moderately more preferred than the compared alternative, 5 – an alternative is strongly more preferred than the compared alternative, 7 – an alternative is very strongly more preferred than the compared alternative, and 9 – an alternative is extremely more preferred than the compared alternative. The procedure for calculating the local values of alternatives is the same as that for calculating the criteria weights.

Synthesis, ranking and sensitivity analysis. In Multi-Attribute Value (or Utility) Theory (MAVT or MAUT) and the methodologies developed on its basis (e.g., SMART, AHP), the additive model is used in synthesis: the aggregate (final) value of each alternative is the sum of the weighted values of the alternative with respect to the criteria:

$$ v\left({X}_i\right)=\sum \limits_{j=1}^m{w}_j{v}_j\left({X}_i\right), $$
(2.6)

for each i = 1, 2, …, n, where v(X i) is the value of the i th alternative, w j is the weight of the j th criterion and v j(X i) is the local value of the i th alternative with respect to the j th criterion.

When the criteria are structured in two levels, the aggregate alternatives’ values are obtained by

$$ v\left({X}_i\right)=\sum \limits_{j=1}^m{w}_j\left(\sum \limits_{s=1}^{p_j}{w}_{js}{v}_{js}\left({X}_i\right)\right), $$
(2.7)

for each i = 1, 2, …, n, where p j is the number of sub-criteria of the j th criterion, w js is the weight of the s th attribute of the j th criterion and v js(X i) is the local value of the i th alternative with respect to the s th attribute of the j th criterion.
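The two-level additive model (2.7) translates directly into code. The sketch below is illustrative only: the criterion names, weights and local values are hypothetical and are not taken from the tables of this chapter.

```python
def aggregate_value(first_level_weights, second_level_weights, local_values):
    """Two-level additive model: v(X) = sum_j w_j * sum_s w_js * v_js(X)."""
    total = 0.0
    for j, w_j in first_level_weights.items():
        inner = sum(w_js * local_values[j][s]
                    for s, w_js in second_level_weights[j].items())
        total += w_j * inner
    return total


# Hypothetical hierarchy: two first-level criteria with two and one attributes.
w_first = {"c1": 0.6, "c2": 0.4}
w_second = {"c1": {"a": 0.5, "b": 0.5}, "c2": {"c": 1.0}}
values_x = {"c1": {"a": 1.0, "b": 0.0}, "c2": {"c": 0.5}}

v_x = aggregate_value(w_first, w_second, values_x)
print(v_x)
```

With one attribute per criterion and second-level weights equal to one, the function reduces to the single-level model (2.6).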

The use of the additive model (2.6) or (2.7) is not appropriate when there is an interaction among the criteria. If the criteria can interact with each other, not only the weights of each criterion but also weights on subsets of criteria should be considered (Moaven et al. 2008; Sridhar et al. 2008). Marichal (2000) defines and describes three kinds of interaction among criteria that can exist in a decision-making problem: correlation, complementarity, and preferential dependence.

The aggregate values obtained with an additive model can be completed by considering synergies and redundancies among criteria by a fuzzy measure – discrete Choquet integral. Let us consider a finite set of alternatives X and a finite set of criteria K in an MCDM problem. To obtain a flexible representation of complex interaction phenomena between criteria, it is useful to replace the weight vector w with a non-additive set function on K, which allows a weight to be defined not only for each criterion but also for each subset of criteria. For this purpose, the concept of fuzzy measure has been introduced. A suitable aggregation operator, which generalizes the weighted arithmetic mean, is the discrete Choquet integral.

Proposed in capacity theory (Choquet 1953), the concept of the Choquet integral has been used in various contexts, among them in non-additive utility (value) theory (Grabisch 1995; Marichal 2000). Let us adapt its definition to the MAVT. Following Grabisch (1995) and Marichal (2000), this integral is viewed here as an m-variable aggregation function; let us adopt a function-like notation instead of the usual integral form, where the integrand is a set of m real values, denoted v = (v 1, …, v m) ∈ ℝ^m. The (discrete) Choquet integral of v ∈ ℝ^m with respect to w is defined by

$$ {C}_w(v)=\sum \limits_{j=1}^m{v}_{(j)}\left[w\left({K}_{(j)}\right)-w\left({K}_{\left(j+1\right)}\right)\right], $$
(2.8)

where (.) is a permutation on K – the set of criteria, such that v (1) ≤ … ≤ v (m), with K (j) = {(j), …, (m)} and K (m+1) = ∅. Similarly, to consider the interactions among the second-level criteria, the alternative’s value with respect to the j th first-level criterion can be substituted with the Choquet integral

$$ {C}_w\left({v}_j\right)=\sum \limits_{s=1}^{p_j}{v}_{(js)}\left[w\left({K}_{(js)}\right)-w\left({K}_{\left( js+1\right)}\right)\right]. $$
(2.9)
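The definition in (2.8) can be sketched in code as follows. The fuzzy measure is stored as a dictionary from subsets of criteria to weights, with the empty set mapped to zero. All criterion names, individual weights and local values below are hypothetical; only the idea of giving a pair of correlated criteria a joint weight smaller than the sum of their individual weights reflects the approach described in the text.

```python
from itertools import combinations


def additive_capacity(weights):
    """Fuzzy measure in which every subset's weight is the sum of its members' weights."""
    names = list(weights)
    capacity = {frozenset(): 0.0}
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            capacity[frozenset(subset)] = sum(weights[c] for c in subset)
    return capacity


def choquet_integral(values, capacity):
    """Discrete Choquet integral: sum_j v_(j) * [w(K_(j)) - w(K_(j+1))]."""
    order = sorted(values, key=values.get)        # criteria by ascending value
    total = 0.0
    for pos, criterion in enumerate(order):
        k_j = frozenset(order[pos:])              # criteria from position j upwards
        k_next = frozenset(order[pos + 1:])       # empty set after the last one
        total += values[criterion] * (capacity[k_j] - capacity[k_next])
    return total


# Hypothetical weights and local values for three criteria.
weights = {"c1": 0.60, "c2": 0.36, "c3": 0.04}
values = {"c1": 0.5, "c2": 0.8, "c3": 0.2}

additive = additive_capacity(weights)
additive_result = choquet_integral(values, additive)

# Redundancy between c1 and c2: joint weight below the sum of individual weights.
redundant = dict(additive)
redundant[frozenset({"c1", "c2"})] = 0.807
redundant_result = choquet_integral(values, redundant)

print(additive_result, redundant_result)
```

With an additive measure the Choquet integral coincides with the weighted arithmetic mean; lowering the joint weight of the two correlated criteria reduces the aggregate value in this example.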

With ranking, we can select the most appropriate alternative(s), eliminate the alternative(s) with the lowest aggregate value, or compare the alternatives with respect to their aggregate values.

Several types of sensitivity analysis enable decision makers to investigate the sensitivity of the goal fulfilment to changes in the criteria’s weights (e.g. gradient and dynamic sensitivity), and to detect the key success or failure factors for the goal fulfilment (e.g. performance sensitivity). Jankowski (1995) pointed out another advantage of sensitivity analysis: in MCDM techniques, the problems of imprecision, uncertainty and inaccurate determination of decision makers’ preferences are addressed by sensitivity analysis of decision recommendations.

7 Use of Multi-criteria Decision Making in Land-Use Evaluation and Management

Many spatial decision making problems, such as location or site selection, the selection of optimal utility routes in urban or rural areas, and land use decision making, are complex problems and thus require considering multiple, more or less conflicting criteria when choosing the best alternative(s). From the 1980s onwards, multi-criteria evaluation methods were introduced to spatial decision making and Geographical Information Systems (GIS) (Malczewski 1999; Massam 1980; Voogd 1983), together with file exchange modules and computer programs (see, e.g., Jankowski 1995).

As Burian et al. (2012) pointed out, new methods and approaches have been developed and applied in spatial planning in the last 15 years (for example, GIS, Global Positioning Systems (GPS), and remote sensing techniques). In addition, Burian et al. (2015) presented a new approach to automatic optimal land use scenario modelling using the developed ArcGIS Urban Planner extension, which supports the assessment of land suitability and the detection of optimal areas suitable for urban development. Land suitability was assessed with respect to physico-geographical and socioeconomic factors, with the weights set when modelling the final scenarios.

GIS-based multi-criteria analysis has been used to support decisions about several problems. Rikalovic et al. (2014) applied it in industrial site selection. Ogryzek and Rzasa (2017) used it in the revitalization of space, applied to a local revitalization program. To assess revitalization projects of various types of urban public space, Palicki (2015) created valuation models of different types of public space based on multi-criteria analysis methodology, taking into account the multidimensionality of assessment criteria (spatial, economic, social, urban and cultural indicators); the PROMETHEE method with the software D-sight was used. In addition, Torrieri and Batà (2017) used the Integrated Spatial Multi-criteria Decision Support System (ISMDSS), i.e., the integration of GIS and multi-criteria analysis, in strategic environmental assessment, to support the preparation of environmental assessment reports and the construction of scenarios for the adoption of urban plans. The table of effects included the agriculture, forestry, tourism, industry, noise, waste, and atmosphere and water indicators.

Moreover, Rinner (2007) proposed to use principles of geographic visualization in conjunction with multi-criteria evaluation methods to support expert-level spatial decision making and presented a case study where the AHP was used to calculate composite measures of urban quality of life for neighborhoods. The study of land analysis for urban extension presented by Chen (2014) introduces the multi-criteria decision analysis based on the additive model. Due to data limitations, this study was based on simplified factors, such as population, employment and average income. The lowest weight was assigned to average income, and the highest to population (Chen 2014). Yang et al. (2008) combined suitability modeling with remote sensing, landscape ecological analysis and GIS to develop a spatial analyzing system for urban expansion land management; to address the uncertainties during the evaluation process, grey relational analysis (GRA) was combined with the AHP. Land evaluation was done with respect to the indicators of biological and water resources, soil resources, and society and economy. Among the first level criteria, the lowest weight was assigned to society and economy, and the highest to soil resources.

Javadian et al. (2011) focused on environmental suitability analysis of educational land use by using AHP and GIS. The criteria were access range, compatibility and slope. A GIS-based MCDM using pairwise comparisons of the AHP was also applied in investigating suitable locations for a new recreational park (Lawal et al. 2011). Along with the criteria academic area buildings, student residential zone and pedestrian paths, the model included also the criteria road networks, slope and land use (Lawal et al. 2011).

The research of Nyeko (2012) explores the application of MCDM and GIS to find the best spatial allocation of land to future agriculture and forest development. Factors concerning allocation to agriculture were rainfall, road, settlement, population, water, normalized difference vegetation index, and land use. The AHP was used to determine the criteria’s weights. Nyeko (2012) stated that the objectives of land allocation to agriculture are to increase the productivity of land and the scale of farming and to protect the environment, including by controlling soil erosion and soil degradation. The biophysical parameters used in suitability analyses of land for agricultural land use were the Normalized Difference Vegetation Index (NDVI), used as a measure of biomass density and soil fertility; settlement and road network maps, used in determining accessibility; a rainfall map, used in assessing the adequacy of rainfall received; and a reference land use map, used in setting the constraints. The most important criterion was rainfall, and the least important criteria were NDVI and land use.

Pažek et al. (2018) presented a multi-criteria DEXi model for the assessment of individual less favored areas and farming systems with respect to criteria of sustainability and farming potential. The criteria were as follows: farm description, social structure, amount of natural handicap payment, amount of agri-environment payments, amount of direct payments (Pažek et al. 2018). This multi-attribute DEXi model can be applied to a smaller number of farms, emphasizing the qualitative aspects of decision making based on judgments when measuring values of alternatives, which is proven suitable when there is a lack of numeric data.

Besides the multi-criteria applications in spatial decision making based on the approaches of multi-criteria decision making, the literature also offers applications based on multi-criteria optimization. Baja et al. (2007) developed spatial modelling procedures for agricultural land suitability analysis using compromise programming and a fuzzy set approach within a GIS environment. However, GIS is not needed for every spatial decision support task. For example, the Multi-criteria Analysis Shell for Spatial Decision Support (MCAS-S) (Australian Government, Department of Agriculture and Water Resources ABARES 2017) is a software tool that supports multi-criteria analysis in spatial decision making without GIS programming, removing the technical obstacles for non-GIS users.

8 The Development of the Multi-criteria Model for the Protection of Agricultural Land for Food Self-Sufficiency

8.1 Problem Definition and Structuring

Protection of agricultural land and food self-sufficiency are indispensable in reducing the risk associated with climate change, increasing demand for ecological goods and services, the threats of natural and social catastrophes, and economic crisis.

Since 2000, the European Landscape Convention (Council of Europe 2000) has been developing a new perspective for the perception of a landscape socio-ecological system where people become a crucial part of this system, providing services and benefits to human well-being due to sustainable transformations and ecological resource preservation (Mele and Poli 2017).

As stated in the Resolution on the strategic orientations for the development of Slovenian agriculture and food industry by 2020 (UL RS 2011), the basic task of agriculture is to ensure an adequate supply of safe food. According to the United Nations, the countries in the climate zone which includes Slovenia need about 3000 square meters of farmland (fields, meadows and orchards) per inhabitant for the required amount of food, which amounts to approximately 600,000 hectares (UL RS 2011). The Court of Audit of the Republic of Slovenia found that in 2010 the competent authorities in Slovenia were not successful in protecting agricultural land (Court of Audit 2013), which is reflected in food self-sufficiency.

The multi-criteria model for the protection of agricultural land for food self-sufficiency should thus include selected geographical and economic factors, structured in the criteria hierarchy. Because of the geographical and economic diversity of statistical regions in Slovenia, the global goal is to measure the capacity of alternatives – statistical regions for food self-sufficiency with respect to geographical and economic factors.

As the heterogeneity of the public, private, and voluntary datasets entails different standards, formats and scattered sources (Mele and Poli 2017), we limited ourselves to the latest available official data sources of the Statistical Office of the Republic of Slovenia (SURS). To include the data about the agricultural land in overgrowing (Glavan et al. 2017), we also used the available data of the ministry responsible for agriculture and food (MKGP). The multi-criteria model for the protection of agricultural land for food self-sufficiency includes the first-level criteria, which are geographical and economic factors, second-level criteria or attributes, and alternatives. The alternatives in the model are statistical regions, according to the Nomenclature of Territorial Units for Statistics, the so-called NUTS 3 level, as follows: Pomurska (X 1), Podravska (X 2), Koroška (X 3), Savinjska (X 4), Zasavska (X 5), Posavska (X 6), Jugovzhodna Slovenija (X 7), Osrednjeslovenska (X 8), Gorenjska (X 9), Primorsko-notranjska (X 10), Goriška (X 11), Obalno-kraška (X 12). The hierarchy of the selected relevant criteria for which data can be obtained in the data sources of SURS and MKGP is presented in Table 2.9.

Table 2.9 Relevant criteria

8.2 Criteria Weighting and Measuring Local Alternatives’ Values

The weights of the first-level and then of the second-level criteria were determined hierarchically, so that the sum of the weights on the lower level with respect to the criterion on the higher level was one. Following the importance attributed to the geographical and socioeconomic factors in several studies (e.g., Burian et al. 2015; Pažek et al. 2018; Yang et al. 2008), we made the following pairwise comparison of the importance of the first-level criteria with respect to the global goal: the geographical factor is twice (Footnote 3) as important as the economic one. The weights of the second-level criteria of the geographical factor were determined directly, i.e. by considering the available data on utilized agricultural area, on agricultural land with limited opportunities for farming (taking into account the evaluation that only 30% of agricultural land with limited opportunities for farming is arable land (Pažek et al. 2018)), and on agricultural land in overgrowing in Slovenia. The importance of the second-level criteria of the economic factor was expressed by using the SWING method (see Fig. 2.3), following the results of several studies about the importance of socioeconomic factors to food self-sufficiency and the characteristics of Slovenia. According to Chen (2014) and Rikalovic et al. (2014), employment is the most important attribute; therefore, 100 points were given to the change from the lowest to the highest level. Similarly, 100 points were given to the change of the annual working units. Furthermore, 30 points fewer, i.e. 70 points, were given to the average monthly net earnings per employee in agriculture, forestry and fisheries (Footnote 4). Because the insufficient size of agricultural holdings presents a serious problem in Slovenia (UL RS 2011), 100 points were given to the change from the lowest to the highest level of the economic size of an agricultural holding (Footnote 5). From this point of view, the change from the lowest to the highest level of the number of agricultural holdings is 20 points less important, and the change from the lowest to the highest level of the number of agricultural holdings with economic size >100,000 EUR is 50 points less important (Footnote 6). The weights in Fig. 2.3 were calculated by following (2.5). The obtained weights of criteria are presented in Table 2.10.
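As a worked illustration of (2.5), the SWING weights of the second-level economic criteria can be recomputed from the points given in the paragraph above. The calculation assumes the points are read as 100 (employment), 100 (annual working units), 70 (net earnings), 100 (economic size of an agricultural holding), 80 (number of agricultural holdings) and 50 (holdings with economic size >100,000 EUR); the subscripts are descriptive labels introduced here rather than the chapter's own indices.

$$ \sum {t}_{2s}=100+100+70+100+80+50=500 $$

$$ {\displaystyle \begin{array}{l}{w}_{\mathrm{EMPLOYMENT}}={w}_{\mathrm{AWU}}={w}_{\mathrm{ES\ AH}}=\frac{100}{500}=0.20,\kern1em {w}_{\mathrm{NAH}}=\frac{80}{500}=0.16,\\ {}{w}_{\mathrm{NET\ EARNINGS}}=\frac{70}{500}=0.14,\kern1em {w}_{\mathrm{ES\ AH\ 100,000}}=\frac{50}{500}=0.10\end{array}} $$

The six weights sum to one, as required for the second-level criteria of one first-level criterion.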

Fig. 2.3 Expressing the importance of the second-level criteria of the economic factor by using the SWING method. (Note: NAH – number of agricultural holdings, AWU – annual working units, ES AH – economic size of an agricultural holding, ES AH 100,000 – number of agricultural holdings with economic size >100,000 EUR, EMPLOYMENT – employment in agriculture, forestry and fisheries, NET EARNINGS – average monthly net earnings per employee in agriculture, forestry and fisheries)

Table 2.10 Weights of criteria

The latest publicly available data with respect to the second-level criteria (attributes) in the official data sources of SURS in 2018, together with the latest available data of MKGP for the agricultural land in overgrowing, were considered when measuring the local alternatives’ values; for this purpose, value functions were used. Decreasing linear value functions were used to measure alternatives’ values with respect to agricultural land with limited opportunities for farming, in percentage of utilized agricultural area, and agricultural land in overgrowing, and increasing linear value functions were used to measure alternatives’ values with respect to the other second-level criteria: utilized agricultural area, number of agricultural holdings, annual working units, economic size of an agricultural holding, number of agricultural holdings with economic size >100,000 EUR, employment in agriculture, forestry and fisheries, and average monthly net earnings per employee in agriculture, forestry and fisheries. In order to ensure the greatest distinction between the alternatives, the lower bound is equal to the lowest datum and the upper bound is equal to the highest datum for each second-level criterion. The obtained local alternatives’ values are presented in Table 2.11.

Table 2.11 Local alternatives’ values

8.3 Synthesis, Ranking and Sensitivity Analysis

To obtain the alternatives’ values with respect to geographical factor, to economic factor and the aggregate alternatives’ values with respect to all criteria included in the criteria hierarchy (Table 2.9), the additive model was used. For instance, the aggregate value of Pomurska (X 1) is calculated by (2.7), considering the weights in Table 2.10 and the local alternative’s values in Table 2.11:

$$ {\displaystyle \begin{array}{l}v\left({X}_1\right)={w}_1\left({w}_{11}{v}_{11}\left({X}_1\right)+{w}_{12}{v}_{12}\left({X}_1\right)\right.+\\ {}+\left.{w}_{13}{v}_{13}\left({X}_1\right)\right)+{w}_2\left({w}_{21}{v}_{21}\left({X}_1\right)+{w}_{22}{v}_{22}\left({X}_1\right)\right.+\\ {}+{w}_{23}{v}_{23}\left({X}_1\right)+{w}_{24}{v}_{24}\left({X}_1\right)+{w}_{25}{v}_{25}\left({X}_1\right)+\\ {}+\left.{w}_{26}{v}_{26}\left({X}_1\right)\right)\end{array}} $$

The results in Table 2.12 show that the highest aggregate value with respect to all criteria and with respect to the geographical factor was obtained by Podravska (X 2), followed by Pomurska (X 1), and the lowest aggregate value with respect to all criteria and with respect to the geographical factor was obtained by Zasavska (X 5). With respect to the economic factor, the highest value was obtained by Podravska (X 2), followed by Savinjska (X 4), and the lowest value was obtained by Zasavska (X 5). Sensitivity analysis (see Fig. 2.4) showed that changes in the weights of the first-level criteria do not influence the ranking of the alternative with the highest aggregate value, Podravska (X 2); if the weight of the geographical factor decreased by more than 0.14 (i.e. if it were less than 0.53), the Savinjska (X 4) region would rank second instead of Pomurska (X 1). The results of the gradient sensitivity analysis showed that the ranking results are not sensitive to changes in the second-level criteria weights.
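The kind of threshold reported above can be illustrated with a small sketch. With two first-level criteria whose weights sum to one, the aggregate value of each alternative is a linear function of the weight of the geographical factor, so two alternatives swap ranks where their lines cross. The function name and all numbers below are hypothetical and are not taken from Table 2.12.

```python
def crossover_weight(geo_a, eco_a, geo_b, eco_b):
    """Weight of the first factor at which alternatives A and B have equal value.

    Assumes v(w) = w * geo + (1 - w) * eco for each alternative.
    Returns None if the lines are parallel or cross outside [0, 1].
    """
    denominator = (geo_a - eco_a) - (geo_b - eco_b)
    if denominator == 0:
        return None
    w = (eco_b - eco_a) / denominator
    return w if 0.0 <= w <= 1.0 else None


# Hypothetical values: A is better on the first factor, B on the second.
w_star = crossover_weight(geo_a=0.8, eco_a=0.4, geo_b=0.6, eco_b=0.7)
print(w_star)
```

In this hypothetical case A ranks above B only while the weight of the first factor stays above the crossover weight.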

Table 2.12 The alternatives’ values, obtained with the additive model

Fig. 2.4 The results of the sensitivity analysis with respect to the geographical factor – additive model

The aggregate values obtained with the additive model were completed by considering an interaction among criteria by a fuzzy measure – discrete Choquet integral. Correlation analysis shows a positive correlation between utilized agricultural area and agricultural land with limited opportunities for farming (correlation coefficient 0.807, p < 0.01). Such positive correlation can be accounted for by assigning the pair of criteria a joint weight that is smaller than the sum of their individual weights, w 11,12 < w 11 + w 12; here w 11 + w 12 = 0.960 (Table 2.10) and w 11,12 = 0.807. For instance, for Pomurska (X 1), where v 13 < v 11 < v 12 (Table 2.11), we have

$$ {\displaystyle \begin{array}{c}{C}_w\left({v}_{11},{v}_{12},{v}_{13}\right)={v}_{13}\left[{w}_{13,11,12}-{w}_{11,12}\right]+\\ {}+{v}_{11}\left[{w}_{11,12}-{w}_{12}\right]+{v}_{12}{w}_{12},\end{array}} $$
(2.10)

where w 13,11,12 = 1. Following (2.9), the Choquet integral for other alternatives can be expressed. However, as the ranking of the alternatives’ values with respect to the geographical factor in Table 2.11 differs for each of the considered alternatives, we cannot use (2.10) as the common formula for calculating the Choquet integral in the example case; it should be expressed for each alternative by considering (2.9).

The values of the Choquet integral with respect to geographical factor are presented in Table 2.13.

Table 2.13 The alternatives’ values, obtained by considering an interaction between criteria with the Choquet integral

The redundancy factor between utilized agricultural area and agricultural land with limited opportunities for farming decreased the aggregate values of Pomurska (X 1), Podravska (X 2), Osrednjeslovenska (X 8) and Goriška (X 11) region and changed the ranking of Pomurska (X 1) and Savinjska (X 4) region.

9 Conclusions

Quantitative methods for analyzing and processing data to obtain the information that organizations need for effective business decision-making have a number of advantages, which are illustrated by cases of the use of these methods in the field of spationomy. On the other hand, they also have a range of restrictions.

An important practical limitation in the use of inferential statistics is the fact that it is based on a random sample; obtaining a random sample (i.e. a sample where each statistical unit from a population has an equal and independent chance (probability) of selection into the sample and, moreover, this probability can be calculated) is in most cases expensive in both time and money. On the other hand, only a random sample of statistical units provides a reliable basis for generalizing sample statistics to the parameters of the statistical population.

Logistic regression, which we present in the second part of the chapter, is a very useful method for analyzing relations between variables, since the assumptions on which it is based are in principle easy to satisfy. However, a limitation of logistic regression is that it does not provide information about the influence of the selected variables on the response variable, although its results are often misinterpreted in this way; the odds ratio is a measure of the nature and the strength of an association (Agresti and Finlay 2009).

The results of the application of MCDM methods presented in the third part of this chapter enable suggesting measures for the protection of agricultural land for food self-sufficiency.

Based on the literature overview it can be concluded that the majority of multi-criteria applications in spatial decision making are based on the additive model assuming no interactions between criteria. Thus, aggregate alternatives’ values are obtained as weighted arithmetic means (the sum of the weighted values of the alternative with respect to criteria). Pairwise comparisons within the AHP method are commonly used for criteria weighting and measuring local alternatives’ values.

The crucial limitation is the availability of the up-to-date data in public data sources for the attributes of the geographical and economic factors delineated by statistical regions. This limitation influenced the selection of the second-level criteria, structured in the criteria hierarchy.

Further research possibilities are to extend the criteria hierarchy to sustainable development, which will result in interdisciplinary cooperation of several stakeholders when making decisions for the protection of agricultural land for food self-sufficiency. Another improvement of the presented model is to include criteria for which the data can be obtained with on-site analysis and questionnaires. A promising possibility is to connect GIS and MCDM. Further research possibilities also include the use of MCDM for solving other complex decision making problems of spatial economy, such as location or site selection, the selection of optimal utility routes in urban or rural areas, the assessment of LFAs, the evaluation of urban or rural quality of life, landscape services, public space from different perspectives, and environmental assessment.