Building Well-Being Composite Indicator for Micro-Territorial Areas Through PLS-SEM and K-Means Approach

In analysing differences in the distribution and profiles of equitable and sustainable well-being, the territorial dimension is a fundamental key for local policies, since it allows areas of relative advantage or deprivation to emerge more accurately. Specifically, in Italy the provincial level coincides with the administrative area of metropolitan cities, which are the subject of growing attention from European and national policies. The BES 2018 report by the Italian National Institute of Statistics (ISTAT) confirmed that many areas of well-being have improved since 2015, even though territorial differences remain stable in both levels and dynamics. In some cases these differences appear as real structural differences between the North and South of Italy. Measures of equitable and sustainable well-being at territorial level therefore make it possible, to varying degrees, to examine and specify this situation through synthetic measures of well-being. In this work, we propose a statistical methodology based on simultaneous partial least squares structural equation modeling and K-means clustering to obtain a composite indicator of Italian well-being and, at the same time, a classification of Italian territorial micro-areas using the recently updated BES 2018 provincial data. In this way, territorial differences in well-being can be defined more reliably and precisely on the basis of the relationships among all the elementary indicators and domains proposed by ISTAT in its analysis of well-being.


Introduction
The territorial dimension is a key element for local policies in the analysis of the distribution and profiles of equitable and sustainable well-being, since it allows areas of relative advantage or deprivation to emerge more accurately. This is particularly relevant in Italy, where the provincial level coincides with the administrative area of metropolitan cities, which are the object of growing attention from European and national policies.
The BES 2018 report by the Italian National Institute of Statistics (ISTAT) confirmed that many areas of well-being have improved since 2015, even though territorial differences remain stable in both levels and dynamics. In some cases these differences appear as real structural differences between the North and South of Italy. Measures of equitable and sustainable well-being at territorial level therefore make it possible, to varying degrees, to examine and specify this situation through synthetic measures of well-being.
In the present study, the Partial Least Squares Structural Equation Modeling and K-means clustering method (PLS-SEM-KM) proposed by Fordellone and Vichi (2020) is employed both to build a well-being composite indicator and, simultaneously, to cluster the territorial micro-areas on the basis of the different levels of that indicator.
The PLS-SEM-KM methodology (Fordellone and Vichi 2018), unlike previously proposed PLS-SEM methods, does not focus mainly on heterogeneous structural or measurement model relations but on the isolation and homogeneity (between and within clusters) derived from a unique structural and/or measurement relationship (Fordellone et al. 2019).
For a review of the use of PLS-SEM to build composite indicators, see also Esposito Vinzi et al. (2010), Boccuzzo and Fordellone (2015), Cataldo et al. (2017), Lauro et al. (2018), and Davino et al. (2018). Russolillo (2012) extends PLS-SEM to the non-metric approach (NM-PLS-SEM) in order to extend the applicability of the PLS method to data measured on different measurement scales, as well as to variables linked by non-linear relationships. NM-PLS is based on the concept of optimal scaling (OS). This methodology is useful for building composite indicators when the observed variables are both qualitative and quantitative.
In general terms, the researcher is focused on clustering the units and identifying a composite indicator of well-being and its structural and measurement relations, based on a set of observed variables that characterise both the well-being levels and the clusters of territorial micro-units or regions aggregated through well-being levels.
The paper is structured as follows: in Sect. 1, the well-being concept is defined; in Sect. 2, the measurement of well-being is discussed; in Sect. 3 the PLS-SEM-KM approach is presented; in Sects. 4 and 5 the application on BES data of the PLS-SEM-KM simultaneous approach and the results obtained by the composite indicator construction are shown, respectively; in Sect. 6 some concluding remarks on the proposed methodology and suggestions for future research are given.

Defining Well-Being: An Open Challenge
On 15 April 2014, Alex Michalos, leader of the "Movement for Social Indicators" in the 1960s and editor of the Encyclopedia of Quality of Life and Well-Being Research (2014), in an interview with Dan Weijers (editor of the International Journal of Wellbeing) declared:
• by evaluative sense as a global, contemplative, long-term assessment, reflective of quality of life over the life course;
• by emotional sense measured by the positive or negative affect-based mood experienced immediately and potentially more transient (Kahneman and Deaton 2010; Graham 2010);
• by subjective experience like psychological experiences, attitudes, life choices, preferences, etc., measured through population surveys by Cantril's self-anchoring ladder;
• by objective dimensions measured by external factors and numerical indicators of income, health, environmental quality, security, and other tangible goods as determinants of life satisfaction and quality or happiness (Sen 1999).
Dodge et al. (2012) propose a new definition of well-being as the balance point between an individual's pool of psychological, social and physical resources and the psychological, social and physical challenges faced.
Besides academia and popular literature, agencies in the governmental sphere are concerned with well-being, since happiness and well-being have long featured in politics and in public policy (Allin and Hand 2014). Hence, research on well-being is employed to inform policy with the aim of increasing overall societal well-being, and many measurement programmes, such as Measuring National Well-being by the Office for National Statistics in the United Kingdom, explore the role of well-being in formulating and analysing public policy (Allin and Hand 2017).
Measuring well-being presents challenges at various levels. First, reliable measures must be constructed that both capture the concept of interest and are sensitive to differences among its components. Furthermore, the statistical issues to contend with include selection bias in the collected data, identifying the sources of uncertainty and measuring their effect, and producing either a single overall measure or different indicators for different aspects of national well-being measured on different scales.
Since the introduction of the System of National Accounts (SNA) and until the end of the twentieth century, policy-makers focused their actions on maximising the growth of economic measures such as Gross Domestic Product (GDP) per capita, which was mostly used as a proxy for a population's well-being, neglecting that such measures do not take into account a wide range of dimensions or domains affecting living conditions, environmental quality, personal health, security, and family and community relationships (Afsa et al. 2008; Rojas and García-Vega 2017). Many economists have argued that economic growth does not automatically imply an increase in overall quality of life, because production and well-being do not coincide; as a consequence, purely economic measures may be inadequate to depict complex phenomena such as well-being (Kuznets 1937). Even the creator of GDP, Kuznets, declared to the American Congress in 1934: "the welfare of a nation can scarcely be inferred from a measure of national income" (Adler and Seligman 2016).
Although GDP may be an extremely useful economic indicator, it ignores many factors related to the concept of well-being, such as health care or quality of life. Since GDP is not fully related to social progress, other relevant, measurable, and reliable measures have been defined and operationalized which, rather than a single metric (Forgeard et al. 2011), take into account multiple dimensions aligned with actual well-being levels. Over the last four decades, research has yielded an extensive number of measures for different domains of well-being, based on instruments that best capture social well-being, and individuals, organizations, and governments can choose domains to devise strategic policies. Some alternative measures have thus been explored to analyse well-being and design policies (Giannetti et al. 2015):
• Measure of Economic Welfare (MEW);
• Sustainable MEW (SMEW), based on adjustments made to the Net National Product (NNP) by Nordhaus and Tobin (1972);
• the Japanese SMEW and the Zolotas' Economic Aspect of Welfare Index (Redclift 2005);
• the Human Development Index (HDI) from the United Nations Development Programme (Ul Haq 1995), which underlines the importance of non-monetary measures;
• the Index of Sustainable Welfare (Daly and Cobb 1994), introducing the concept of sustainability;
• the Genuine Progress Indicator (Talberth 2007).
The measures of well-being, economic progress, and social welfare are adopted as drivers for designing public policies by decision makers and governments (Jayawickreme et al. 2012; Layard 2011; Sachs 2012). They more accurately depict changes not only in individual living standards (Helliwell et al. 2012) but also in comprehensive national economic growth (Diener et al. 1985, 2009). Since around 2000, the Organisation for Economic Co-operation and Development (OECD) has been engaged in a global project to measure the well-being and progress of societies in ways that go beyond economic performance, and was involved in setting up and supporting the Commission on the Measurement of Economic Performance and Social Progress (CMEPSP), established by the then President of France, Nicolas Sarkozy, convened in 2008 and led by Joseph E. Stiglitz, Amartya Sen and Jean-Paul Fitoussi. According to the Commission's report, the aim was "to identify the limits of GDP as an indicator of economic performance and social progress, including the problems with its measurement; to consider what additional information might be required for the production of more relevant indicators of social progress; to assess the feasibility of alternative measurement tools, and to discuss how to present the statistical information in an appropriate way" (Stiglitz et al. 2009, p. 8), in order to come up with a new, broader definition of prosperity, although "GDP is not wrong as such but is wrongly used" (ibidem). The limits of GDP as a standard of the well-being of societies are also reviewed, taking into account, for example, that GDP does not address economic inequality, happiness, quality of life, wellness, and other crucial societal parameters, and does not integrate environmental services into economic decisions (Stiglitz et al. 2010).
The Commission's Report (Stiglitz et al. 2009) offers a set of proposals and guidelines for creating alternatives to GDP and for developing measures of wealth and social progress in three basic domains: material conditions, quality of life, and sustainability. It encourages international statistical organizations to modify their sets of statistical indicators in light of its recommendations, in the wake of the worst financial, economic and social crisis in post-war history, which severely affected most economies around the world in 2008. The aim is to avoid a future riddled with financial, economic, social, and environmental failures by changing the way we live, consume and produce; changing the way economic performance is measured is suggested as a necessary precursor to changing behaviour (Stiglitz et al. 2009).
Specifically, the evolution of modern economies has produced many structural changes that make the measurement of outputs and performance more difficult than in the recent past, such as the growing share of medical, educational, research, security and financial services and the production of many goods like information and communication technologies. For the development of a broad statistical measurement system, the Report recommends "to shift emphasis from measuring economic production to measuring people's well-being" (Stiglitz et al. 2009, p. 12), without dismissing GDP and other economic measures, like income, consumption, and wealth, but complementing them with several dimensions of people's quality of life standards or material well-being, like inequality, health, education, personal activities, and environmental conditions. The aim is to assess the well-being and progress of society and to predict life satisfaction and, thus, the sustainability of at least the current level of well-being for future generations. This is to be done by means of a dashboard of reliable, robust, and accurate statistical measurements or indicators of social connections, political voice, and security, in order to support decision making, the design and implementation of policies, and the management of economic markets by governments, institutions, businesses, and individuals.
More recently, other scholars (Ven 2015; Fleurbaey 2015) have called for a new generation of multifaceted and more comprehensive well-being measures, better able to describe actual living standards and useful for a more accurate design of policies that improve efficiency in the allocation of resources. For instance, the Office for National Statistics in the United Kingdom has developed new and more comprehensive well-being indicators, but with limited geographical scope (Dolan and Metcalfe 2012; Everett 2015), whereas a much larger initiative was undertaken with the Better Life Index (Durand 2015), which provides information on several well-being dimensions. Many other measures have been proposed in the literature (for a recent review, see Barrington-Leigh and Escande 2018) and are also used to compare the level of well-being across countries (Peirò-Palomino and Picazo-Tadeo 2018).
Often the increasing gap between socio-economic statistical measures, such as aggregate GDP data, and citizens' perception of the same phenomena is explained by the lack or inadequacy of good and understandable metrics and of their appropriate use. In line with the complexity of modern economies and the widespread availability of information technology, new statistical indicators covering new domains are being produced to supplement the national accounts, and in particular to go beyond the headline GDP metric, which is inadequate to gauge over time the economic, environmental, and social dimensions of well-being, often referred to as sustainability. GDP is not the only right measure of economic growth (Van den Bergh 2009) and most current indicators do not reflect the meaning intended by the government (Tasaki and Kameyana 2015).
A series of measures of well-being, inspired by the Nussbaum-Sen approach to human capabilities and subjective well-being (Nussbaum and Sen 1993), have been proposed in an attempt to go beyond GDP, with the aim of broadening the scope of the effects considered in the assessment of policies. For instance, the Human Development Index by UNDP, the Better Life Initiative launched by the OECD (OECD 2015) and many other approaches are based on measurements of countries' performance in income, health, and education (for a review, see Fleurbaey and Blanchet 2013). With the aim of overcoming the limits of the measures of subjective well-being and social welfare, Decancq and Schokkaert (2016) proposed a new measure of the level of well-being. It is based on the concept of the equivalent income of an individual, that is, the hypothetical income which, combined with the best possible level on all non-income dimensions, would leave the individual as well off as in his or her actual situation. Michalos et al. (2011) regard as the weakest feature of the Commission's Report its treatment of the link between the requirements of an acceptable measure of well-being or quality of life and those of an acceptable measure of sustainability: "The assessment of sustainability is complementary to the question of current well-being or economic performance, and must be examined separately…. confusion may arise when one tries to combine current well-being and sustainability into a single indicator." (Stiglitz et al. 2009, p. 17). In Michalos's opinion, it is true that a good measure of the quality of life is a necessary condition for possessing a good measure of its sustainability, but "… there is a clear asymmetry of order such that the second task cannot be accomplished unless the first task is accomplished and directly linked to it." (Michalos et al. 2011, p. 121). As a consequence, he considers it dangerous to insist on the need to separate the two tasks, even though a single measure alone will not accomplish both.
Sustainability is a very different topic, and it is better measured separately by indicators of the level of capital transmitted to future generations than by the level of GDP of the current generation (Neumayer 1999; Fleurbaey and Blanchet 2013).

Measuring Well-Being by Indicators for Policy Making
According to the wider literature (Scott 2012; McGregor 2015), well-being indicators shed light on the nature and role of evidence in promoting well-being in policy formulation (Bache 2019). Well-being indicators are expected to enhance the rationality of policy making and public debate by providing a supposedly more objective, robust, and reliable information base, for purposes ranging from scientific, professional, and experiential knowledge to political administration in various venues of policy-making. Moreover, in Mulgan's (2005) terms, well-being indicators can be considered an inherently novel policy field, like measures of government performance, mandatory reporting, auditing, and ex ante and ex post evaluation by external agencies, for advocacy of specific worldviews, community empowerment and capacity building, within a framework called 'governance by numbers' (Lehtonen 2015). Thus, indicators are employed in policy formulation.
Subjective indicators are also expected to focus on policy issues (Lehtonen 2015), in assessing performance and comparing policy options or objectives, in order to monitor the quality of service, inform choices and debate in the media, or justify a given policy design in terms of the choice of models, tools and measures, and also for performance benchmarking, public accountability, agenda-setting, adoption of best practices, resource allocation decisions, and monitoring of progress by non-governmental actors and stakeholders (Seaford 2013).
Much of the existing literature on well-being indicators lacks a general consensus on what well-being means and how to measure it. Many statistical approaches have been adopted to build composite measures, such as a composite indicator or a composite index, through conceptual and mathematical combinations of different elementary indicators based on theoretical frameworks (Salzman 2003; Maggino 2017) and taking into account the availability of data over time and across territorial units (Mazziotta and Pareto 2013).
From a technical point of view, many methods are employed to measure the level of well-being through composite indicators, but no method is universally valid for selecting indicators on theory-driven criteria suited to measuring the concept correctly, for normalising and aggregating a set of input variables, and for defining a weighting system (OECD 2008; Dobbie and Dail 2013). The aim is to simplify the analysis of the multidimensional concept in accordance with a formative or reflective measurement model, where elementary indicators are causes or effects of the latent variable, respectively (Michalos 2014; Simonetto 2012).
Elementary indicators suitable to capture different aspects of the equitable and sustainable well-being concept are selected "on the basis of their analytical soundness, measurability, country coverage, relevance to the phenomenon being measured and relationship to each other" (OECD 2008, p. 15). Synthetic approaches aimed at computing robust and valid composite indices have become increasingly widespread, since they allow direct comparison across countries or regions and over time, as well as easy communication of performance to policy-makers and citizens. Principal Component Analysis (PCA) for metric variables and Categorical Principal Components Analysis (CATPCA) for nominal, ordinal and continuous variables have been widely employed with the specific aim of reducing the multidimensionality of composite indices of economic development, quality of life, or welfare. When the aim is to build a composite indicator, however, PCA and CATPCA are inappropriate (Shalizi 2009; Jolliffe and Cadima 2016), mainly because they ignore the polarities and the meaning of the indicators. Recently, Geographically Weighted PCA has been proposed for building a well-being composite indicator: it derives a set of local weights that take into account the spatial variability of the elementary indicators involved, and has been used to assess human and ecosystem well-being in Italian urban areas (Sarra and Nissi 2019).
PCA is nevertheless still used for this purpose, but the choice of the measurement model is crucial because it determines whether the relationships between the composite phenomenon to be measured, as a latent variable, and the elementary indicators take a formative or a reflective form (Mazziotta and Pareto 2019). In the case of well-being and other economic composite measures based on objective and subjective indicators (Maggino and Zumbo 2012), the most widely employed models are formative, because the latent measures are determined by non-interchangeable elementary indicators such as health, income, occupation, services, environmental quality, etc., and not vice versa: the well-being value increases if the value of any indicator (and not necessarily of all) improves. In psychological and management sciences, reflective models are used for scaling models of satisfaction or attitudes, based exclusively on subjective measurements.
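The distinction between the two measurement models can be written compactly. The following is a generic textbook formulation rather than one taken from this study; the symbols $x_j$ (elementary indicator), $\xi$ (latent variable), $\lambda_j$ (loading), $w_j$ (weight), $\varepsilon_j$ and $\delta$ (error terms) are notation assumed here for illustration only.

```latex
% Reflective model: each indicator is an effect of the latent variable
x_j = \lambda_j \, \xi + \varepsilon_j, \qquad j = 1, \dots, J

% Formative model: the latent variable is a weighted combination of its indicators
\xi = \sum_{j=1}^{J} w_j \, x_j + \delta
```

In the formative case an improvement in a single $x_j$ with positive weight raises $\xi$, which matches the remark that well-being increases if any indicator, and not necessarily all of them, improves.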
In any case, it can be important to distinguish between (a) components of the measure, (b) potential causes which influence the value that it takes, (c) consequences which change as it takes different values and (d) other indicators which are merely correlated with it (Van Beuningen et al. 2014). Of course, deciding into which category a variable fits may not be straightforward. Nonetheless, such relationships can be used to enhance the accuracy of a measure through regression estimation or more elaborate linear structural relational models (see, for instance, Fayers and Hand 2002).
Awareness of the crucial role of indicators in policymaking and benchmarking (OECD 2008) and of the relevance of BES indicators in terms of economic and financial planning reached its peak with the declaration by the Italian Government in 2014 that it would monitor the progress of some BES indicators considered relevant within the annual Economic and Financial Document (DEF). In particular, the Government, with the Committee for BES indicators, has enforced the analysis of 12 indicators included in the dimensions of the BES, selecting indicators 1, 2, 7, 11 and also 3 of the following list for monitoring.
1. available average income adjusted per capita
2. index of inequality of disposable income
3. index of absolute poverty
4. life expectancy in good health at birth
5. excess weight
6. early exit from the education and training system
7. rate of non-participation in work, with relative breakdown by gender
8. ratio between the employment rate of women aged 25-49 with pre-schoolers and women without children
9. predatory crime index
10. index of efficiency of civil justice
11. CO2 emissions and other climate-altering gases
12. index of illegal construction.

Algebraic Notations
Before showing the modeling details, the notation and terminology used in this paper are here presented (Table 1) to allow the reader to easily follow the subsequent formalizations and algebraic elaborations.

Model and Algorithm
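Given the $n \times J$ data matrix $\mathbf{X}$, the $n \times K$ membership matrix $\mathbf{U}$, the $K \times J$ centroids matrix $\mathbf{C}$, the $J \times P$ loadings matrix $\boldsymbol{\Lambda} = [\boldsymbol{\Lambda}_H, \boldsymbol{\Lambda}_L]$, the $n \times P$ latent variables matrix $\mathbf{Y} = [\boldsymbol{\Xi}, \mathbf{H}]$, and the errors matrices $\mathbf{Z}$, $\mathbf{E}$ and $\mathbf{D}$, the Partial Least Squares K-Means model can be written as follows:

$$
\begin{aligned}
\mathbf{H} &= \mathbf{H}\mathbf{B}^{T} + \boldsymbol{\Xi}\boldsymbol{\Gamma}^{T} + \mathbf{Z},\\
\mathbf{X} &= \mathbf{Y}\boldsymbol{\Lambda}^{T} + \mathbf{E} = \boldsymbol{\Xi}\boldsymbol{\Lambda}_{H}^{T} + \mathbf{H}\boldsymbol{\Lambda}_{L}^{T} + \mathbf{E},\\
\mathbf{X} &= \mathbf{U}\mathbf{C}\boldsymbol{\Lambda}\boldsymbol{\Lambda}^{T} + \mathbf{D},
\end{aligned}
$$

where $\mathbf{B}$ and $\boldsymbol{\Gamma}$ are the path coefficient matrices of the structural model, subject to the constraints: (1) $\boldsymbol{\Lambda}^{T}\boldsymbol{\Lambda} = \mathbf{I}$; and (2) $\mathbf{U} \in \{0, 1\}$, $\mathbf{U}\mathbf{1}_K = \mathbf{1}_n$. Thus, the PLS-SEM-KM approach includes the PLS-SEM and the clustering method (i.e., $\mathbf{X} = \mathbf{U}\mathbf{C}$ and then, $\mathbf{Y} = \mathbf{X}\boldsymbol{\Lambda}$ becomes $\mathbf{Y} = \mathbf{U}\mathbf{C}\boldsymbol{\Lambda}$). In fact, the third set of equations is the Reduced K-means model (De Soete and Carroll 1994), and the three sets of equations will produce a partitioning of the units and the corresponding SEM simultaneously. Moreover, the gap method discussed in Tibshirani et al. (2001) is embedded in the PLS-SEM-KM algorithm in order to select the optimal number of clusters automatically. Note that in the PLS-SEM-KM algorithm the centroid matrix $\mathbf{C}$ and the loadings matrix $\boldsymbol{\Lambda}$ simultaneously converge to an optimal solution that turns out to be at least a local minimum. It is important to remember that the algorithm, given the clustering constraints on $\mathbf{U}$, can be expected to be rather sensitive to local optima. For these reasons the use of a multi-start procedure is recommended, i.e., PLS-SEM-KM is randomly started several times and the best solution is retained (for details on this methodology the reader can refer to Fordellone and Vichi 2020). In our application we have used 200 random starts, and the results seem to be more stable.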
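The multi-start idea can be illustrated on the Reduced K-means component alone. The sketch below is not the PLS-SEM-KM algorithm of Fordellone and Vichi (2020): it omits the structural and measurement equations and the gap method, and only minimises ||X - U C Λ Λ'||² under an orthonormal Λ by alternating least squares. The function names, the simulated data and the default of 20 starts are assumptions made for illustration.

```python
import numpy as np


def _update(X, labels, n_clusters, n_components):
    """Given a partition, return centroids C, loadings Lam and the loss."""
    U = np.eye(n_clusters)[labels]              # n x K binary membership matrix
    counts = U.sum(axis=0)                      # cluster sizes
    C = (U.T @ X) / counts[:, None]             # K x J centroid matrix
    between = C.T @ (counts[:, None] * C)       # J x J between-cluster scatter
    _, vecs = np.linalg.eigh(between)           # eigenvalues in ascending order
    Lam = vecs[:, -n_components:]               # J x P, orthonormal columns
    loss = np.sum((X - U @ C @ Lam @ Lam.T) ** 2)
    return C, Lam, loss


def reduced_kmeans(X, n_clusters, n_components, n_starts=20, max_iter=100, seed=0):
    """Multi-start alternating least squares; keeps the start with lowest loss."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    best = None
    for _ in range(n_starts):
        # Random balanced start, so that no cluster is empty initially.
        labels = rng.permutation(np.arange(n) % n_clusters)
        C, Lam, loss = _update(X, labels, n_clusters, n_components)
        for _ in range(max_iter):
            # Reassign each unit to the nearest centroid in the reduced space.
            diff = (X @ Lam)[:, None, :] - (C @ Lam)[None, :, :]
            new_labels = np.argmin(np.sum(diff ** 2, axis=2), axis=1)
            empty = np.bincount(new_labels, minlength=n_clusters).min() == 0
            if empty or np.array_equal(new_labels, labels):
                break                           # converged, or a cluster would vanish
            labels = new_labels
            C, Lam, loss = _update(X, labels, n_clusters, n_components)
        if best is None or loss < best[0]:
            best = (loss, labels, C, Lam)
    return best


# Simulated example (hypothetical data): three groups separated in two of six columns.
rng = np.random.default_rng(1)
true = np.repeat([0, 1, 2], 30)
centres = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
X = rng.normal(size=(90, 6))
X[:, :2] = centres[true] + 0.5 * rng.normal(size=(90, 2))
X = X - X.mean(axis=0)                          # column-centre the data
loss, labels, C, Lam = reduced_kmeans(X, n_clusters=3, n_components=2)
```

Keeping only the start with the lowest loss is what protects against the local optima mentioned above; a single random start may merge two groups and split the third.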

Building a Well-Being Composite Indicator Through BES Data at Local Level
In the Commission's Report (Stiglitz et al. 2009), we can read: "At the national level, round-tables should be established, with the involvement of stakeholders, to identify and prioritise those indicators that … The Commission hopes that this report will provide the impetus not only for this broader discussion, but for on-going research into the development of better metrics" (p. 18). So, in other European countries, the National Statistical Institutes are involved in cross-cutting issues regarding the measurement of inequalities and/or inequities in the distribution of economic measures and the assessment of the relationships between material living standards, health, education, etc., in order to construct an aggregate composite scalar measure, such as a quality of life index (Michalos 2005; Maggino and Zumbo 2012; Haq and Zia 2013). The debate on the meaning of well-being and its measurement has produced many studies carried out at country scale and, specifically, at local level too. The well-being composite indicator, and indicators tout court, are indeed purposely aimed at informing and affecting societal, political and institutional processes. In Italy, indicators of well-being are being used more and more in policy-making at national level, but also at regional or local level, involving public institutions.
From a theoretical point of view, the relationship between well-being assessment and the policy-making process in sectors such as healthcare, education and training, or local services is the rationale for analysing well-being measures at local level. Since the policies of local governmental authorities have a direct and huge impact on the social and economic context where people live, the assessment of living standards at provincial level allows policy-makers at any level of government to evaluate the economic, environmental and social needs of citizens, in order to design and implement decentralised policies that address the real issues.
Several studies (Eger and Maridal 2015) compare well-being at national level among industrialized countries by using the official UNDP Human Development Index (HDI) (Conte et al. 2007), or measure the HDI at provincial level (Casmiri and Di Berardino 2013; Monni 2002), or measure socio-economic development and living conditions among the Italian provinces (Nuvolati 2003).
In Italy, the report on equitable and sustainable well-being (Benessere Equo e Sostenibile, BES) is published by ISTAT every year together with the updated set of indicators developed by ISTAT since 2013, which have now become a reference point at national and territorial level ("Provincial BES" and "UR-BES" initiatives). Some studies have focused on the construction of well-being composite indexes to evaluate and compare well-being specifically across the Italian provinces. Mazziotta and Pareto (2019) obtain a global well-being index by aggregating 11 composite indices with AMPI (Mazziotta and Pareto 2016) and rank the Italian provinces both on each dimension of well-being and overall.
Based on a data dashboard containing 41 elementary indicators, selected from the original 88 of the 2014 edition of BES and aggregated into 11 domains for the Italian provinces, Chelli et al. (2017) define a method for normalizing elementary indicators and compare different aggregative approaches, proposing a class of composite indices for each BES domain based on the Adjusted Mazziotta-Pareto Index (AMPI) and the Gini-based weighted averages (GW and RGW). AMPI takes into account the unbalanced distribution among the indicators belonging to the same well-being domain, whereas GW and RGW depend on the distribution of each indicator across the local units. Combinations of them are also considered (GAMPI and RGAMPI). The indices make it possible to illustrate differences in the rankings and patterns among Italian provinces, even within the same region, accounting both for the different distribution of indicator values between the provinces, expressed in terms of the Gini coefficient, and for the variability within each domain (Mazziotta and Pareto 2016).
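How AMPI penalises unbalance among the indicators of a domain can be sketched as follows. This is a minimal sketch based on the commonly cited form of the index (rescaling to the range 70-130, then mean plus or minus the product of standard deviation and coefficient of variation); using the observed minimum and maximum as goalposts and the function name `ampi` are assumptions for illustration, not the settings used by Chelli et al. (2017) or Mazziotta and Pareto (2016).

```python
import numpy as np


def ampi(X, polarity, penalty=-1):
    """Sketch of an AMPI-type index for units (rows) and indicators (columns).

    polarity: +1 if the indicator is concordant with well-being, -1 otherwise.
    penalty:  -1 penalises unbalance downwards (positive phenomena), +1 upwards.
    """
    X = np.asarray(X, dtype=float)
    polarity = np.asarray(polarity)
    lo, hi = X.min(axis=0), X.max(axis=0)       # goalposts: observed min/max (assumption)
    r = (X - lo) / (hi - lo)                    # rescale each indicator to [0, 1]
    r = np.where(polarity > 0, r, 1.0 - r)      # invert indicators with negative polarity
    r = r * 60.0 + 70.0                         # map to the range [70, 130]
    mean = r.mean(axis=1)                       # mean of normalised values per unit
    sd = r.std(axis=1)                          # population standard deviation per unit
    cv = sd / mean                              # coefficient of variation per unit
    return mean + penalty * sd * cv


# Hypothetical data: three units, two indicators, the second with negative polarity.
scores = ampi([[1, 10], [2, 20], [3, 30]], polarity=[1, -1])
```

Here the first and third units have the same mean as the second but unbalanced values, so the penalty lowers their score relative to the balanced unit.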
Recently, Calcagnini and Perugini (2019) have proposed a composite indicator of well-being for the Italian provinces (NUTS-3) based on the methodology of the Index of Regional Quality of Development (QUARS), to analyse the extent to which socioeconomic heterogeneity in individual and contextual features within regions affects well-being among adjacent provinces.
The conceptual structure (Giovannini et al. 2012) of the BES considers 9 domains related to aspects that directly influence well-being (health, education and training, work and reconciliation of life time, economic welfare, social relations, safety, subjective wellbeing, environment and landscape, and cultural heritage), plus 3 instrumental or context domains (politics and institutions, research and innovation, and service quality). The work is not just an editorial product, but a line of research, a process that takes the multidimensionality of well-being as a starting point and, through the analysis of a wide set of indicators, describes in a comprehensive way the quality of life in Italy. A series of 130 elementary indicators and a synthesis through composite indicators related to all of 12 domains are organised in 12 chapters, each corresponding to each well-being domain (ISTAT 2018).
In the present study, we employ the methodology based on simultaneous Partial Least Squares Structural Equation Modeling (PLS-SEM) and K-Means clustering to obtain a composite indicator of Italian well-being and, simultaneously, a classification of the Italian provinces based on the available BES 2018 provincial data. The dataset consists of 109 units (Italian provinces) and 16 available indicators organised in 9 different domains and employed as manifest variables (MVs) in the analysis (Table 2 and Fig. 1). The conceptual structure of the 9 domains corresponds to the one given by Giovannini et al. (2012). Owing to the reduced availability of data, 5 domains, "Health" (LV1), "Economic Well-Being Income and Inequality" (LV4), "Politics and Institutions" (LV5), "Cultural Heritage" (LV7) and "Innovations Research and Creativity" (LV9), are each represented by a single MV: MV1, MV8, MV9, MV13 and MV16, respectively. Thus, in these cases the MV should be a strong proxy of the corresponding dimension. The polarity of the MVs with respect to the general concept of well-being is reported in Table 2, where a positive sign shows concordance between the MV and the general concept of well-being.
The path diagram in Fig. 1 shows the structure of the relations considered for the construction of the composite indicator.
The diagram represents the multiple indicators multiple causes (MIMIC) model, where some formative latent blocks and an overall reflective block are included in the model (for details see Ringle et al. 2012).
Table 2 List of manifest variables for each domain at provincial level. Source: ISTAT, BES 2018
a The ± sign next to each indicator refers to its relationship with the well-being concept: a positive sign means concordance between well-being and the manifest variable; a negative sign means discordance
b The migration rate of Italians (25-39 years old) with a tertiary degree is computed as the ratio between the migration balance (difference between those registered and those cancelled owing to transfer of residence) and residents. For provincial values, intra-provincial movements are not considered, whereas movements among provinces of the same region, movements between regions, and movements to and from foreign countries are
Fig. 1 Path diagram of the specified structural equation model
With the application of the PLS-SEM-KM model, we have identified three homogeneous well-being groups of Italian provinces. The optimal number of clusters (Fig. 2) corresponds to the maximum value of the pseudo-F function (around 1.1). Table 3 shows the loading matrix obtained from the measurement models estimated through PLS-SEM-KM, with the statistical significance level of the t test. It can be observed that all MVs have significant reflective effects (correlation different from zero) of the corresponding LV, except for "Overcrowding of prisons" (MV9), which shows a non-significant loading (0.050); in fact, the sign is also not correct. Moreover, the direct effect of MV9 on well-being is not statistically relevant (0.202). Thus, the evidence shows that the indicator "Overcrowding of prisons" is insufficient to define the (LV5) dimension "Politics and Institutions", and additional MVs should be collected and included.
We do not move this MV to a different dimension because we wish to be consistent with the structure defined by Giovannini et al. (2012). A similar comment also applies to (MV13) "Availability of urban green" and the corresponding (LV7) "Cultural Heritage". In this case the MV has only a slightly significant reflective effect of the LV (0.105) and a direct effect on well-being (0.226); however, here too there is a clear need to include more MVs to better describe this dimension because, as it stands, the level of the relations is limited. The last column of Table 3 shows the "direct" correlations of each MV with the well-being dimension (i.e., the composite indicator). From this analysis, we can see that the theoretical polarity associated with each observed variable (see Table 2) is well described by the measurement-PLS approach. It is worth underlining that "Work and Conciliation of Life Times" is a dimension (LV3) well represented by (MV4) "Unemployed rate", (MV5) "Unemployed rate 20-64 years" (Extra proxy), (MV6) "Young employment rate 20-29 years" and (MV7) "Young unemployment rate 20-29 years" (Extra proxy), since the correlation is, in absolute terms, around 0.9. Similar considerations apply to "Health" (LV1) with (MV1) "Healthy life expectancy at birth"; "Education and Training" (LV2) with (MV2) "Graduates and other tertiary degrees 30-34 years" and (MV3) "Early exit from the education and training system (NEET)"; and "Innovation, Research and Creativity" (LV9) with (MV16) "Mobility of Italian graduates 25-39 years" (Extra proxy). Table 4 shows the proportion of the total variance of the nine second-order constructs explained by each MV and the variance explained by the single constructs. This proportion of explained variance is equal to about 80% of the total variance, which is a very good value.
It is interesting to observe that the variance is explained mainly by "Work and Conciliation of Life Times" (LV3, 24%), "Education and Training" (LV2, 10%) and "Safety" (LV6, 9%), while each of the remaining dimensions explains about 6%.
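The pseudo-F criterion mentioned earlier for choosing the number of clusters can be illustrated with a small sketch. The code below computes the standard Calinski-Harabasz form, i.e. the between-cluster deviance divided by (K - 1) over the within-cluster deviance divided by (n - K). It is only an assumption that this textbook form conveys the idea; the variant computed inside PLS-SEM-KM may be scaled differently, so the values below are not comparable with the value reported in this study. The function name `pseudo_f` and the toy data are hypothetical.

```python
def pseudo_f(points, labels):
    """Calinski-Harabasz pseudo-F for a given partition.

    points : list of equal-length tuples (one per unit)
    labels : list of cluster labels, same length as points
    """
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= K < n clusters")
    dim = len(points[0])
    grand = [sum(p[j] for p in points) / n for j in range(dim)]

    between = 0.0  # deviance between cluster centroids and grand mean
    within = 0.0   # deviance of units around their own centroid
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        between += len(members) * sum(
            (centroid[j] - grand[j]) ** 2 for j in range(dim)
        )
        within += sum(
            sum((p[j] - centroid[j]) ** 2 for j in range(dim)) for p in members
        )
    if within == 0:
        return float("inf")  # perfectly compact clusters
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical one-dimensional scores with three well-separated groups
toy_points = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
f_three = pseudo_f(toy_points, [0, 0, 1, 1, 2, 2])
f_two = pseudo_f(toy_points, [0, 0, 0, 0, 1, 1])
```

On these toy data the three-cluster partition yields a larger pseudo-F than the two-cluster one, which is the sense in which the maximum of the criterion points to the preferred number of clusters.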
Table 5 shows the path coefficient matrix of the estimated structural-PLS model. From the structural model we can see the formative part of Italian provincial well-being, which is strongly affected by the Work-life balance construct (0.666), followed by the Health (0.324) and Economic well-being (0.247) constructs. By contrast, the values of the Cultural heritage (0.067) and Education (0.053) constructs are very low, and the contributions of these dimensions are therefore negligible. Politics and Institutions, represented by "Overcrowding of prisons", is confirmed to be not statistically significant, and we therefore underline the need for additional indicators to better characterise this dimension.
In summary, we can say that the overall fit of the structural model is good: R² = 0.74, i.e., the mean value of R² over all the endogenous constructs is high. The same holds for the measurement models, where almost all the communalities are greater than 0.5. In terms of clustering results, the 3 clusters identified by the PLS-SEM-KM algorithm appear to describe 3 levels of well-being, i.e., high, medium, and low. Figure 3 shows the boxplots of the normalized latent score distributions for each cluster.
In the first cluster (high well-being), the latent dimensions whose values are high and close to those of well-being are safety, economy, and education, whereas work-life and health are lower. In the second cluster (medium well-being), the health, environment, and work-life dimensions take the values closest to those of well-being, and all the other dimensions have higher values. In the third cluster, where well-being is very low, all the dimensions show values higher than that of well-being.
The Italian provinces are classified into the clusters according to the different values of the well-being composite indicator, as represented in Fig. 4.
The cluster structure shows that the group with high well-being levels is mostly composed of the northern Italian provinces and some provinces of central regions of Italy for
The results of the present study are consistent with rankings of the Italian provinces based on measures such as quality of life. In addition, the classification of provinces into the clusters has many points of contact with the distribution of provinces by BES level obtained with the ISTAT methods.
Finally, since only some provinces are allocated on the borderline between different clusters, e.g., Imperia, Ascoli Piceno, Taranto, and Barletta-Andria-Trani, the overlapping could be solved by extending the PLS-SEM-KM model to a fuzzy clustering approach.
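To give a concrete idea of what a fuzzy extension would offer for borderline units, the following minimal sketch computes fuzzy c-means membership degrees of a single unit with respect to fixed cluster centroids, using the usual formula u_k = 1 / Σ_j (d_k / d_j)^(2/(m-1)). It is an illustration under stated assumptions and not a fuzzy version of PLS-SEM-KM: the centroids are taken as given, the fuzzifier m = 2 is a conventional default, and the function name `fuzzy_memberships` and all numeric values are hypothetical.

```python
def fuzzy_memberships(point, centroids, m=2.0):
    """Fuzzy c-means membership degrees of one unit to each centroid.

    point     : tuple of coordinates (e.g. a composite score)
    centroids : list of tuples, one per cluster
    m         : fuzzifier, must be > 1 (m = 2 is a common default)
    """
    if m <= 1:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = [
        sum((a - b) ** 2 for a, b in zip(point, c)) ** 0.5 for c in centroids
    ]
    # A unit lying exactly on a centroid belongs fully to that cluster
    if any(d == 0 for d in dists):
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in dists]
    total = sum(inverse)
    # Equivalent to 1 / sum_j (d_k / d_j) ** power for each cluster k
    return [v / total for v in inverse]


# Hypothetical centroids for high, medium and low well-being (1-D scores)
toy_centroids = [(0.8,), (0.5,), (0.2,)]
borderline = fuzzy_memberships((0.65,), toy_centroids)
clear_cut = fuzzy_memberships((0.79,), toy_centroids)
```

A unit halfway between the "high" and "medium" centroids receives nearly equal membership in both, instead of being forced into one class, whereas a unit close to a centroid is assigned almost entirely to it.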

Conclusions
In a wide range of applications in empirical data analysis, the assumption that data are collected from a single homogeneous population is often unrealistic. In particular, the identification of different groups of observations and their appropriate treatment in PLS Path Modeling constitutes a critical issue.
The traditional approach to segmentation in Structural Equation Modeling consists in estimating separate models for segments of objects, which have been obtained by assigning observations to a priori segments. Each class then has different component scores, structural coefficients, outer weights and loadings. PLS-SEM-KM estimates the best partition of the units corresponding to the best PLS-SEM model and vice versa; thus it may be seen as defining a consensus of the traditional class-conditional PLS-SEM models. This methodology is particularly useful in composite indicator construction because it provides, as an additional tool, the classification of the units. Units in the same class have similar values of the composite indicator, and therefore the methodology makes it possible to identify units in different classes that differ significantly from each other. In fact, in the case of well-being we have estimated the relevant relationships between the (latent) Italian provincial well-being indicator and its domains. The PLS-SEM-KM approach has provided a single PLS-SEM estimation of the relations between MVs and LVs, the latter modelled according to the conceptual structure proposed by Giovannini et al. (2012). The good fit of the model allows us to say that the conceptual structure measured with the 16 MVs is confirmed, except for Politics and Institutions, which needs to be better defined by additional MVs. The methodology has guaranteed the identification of the best partition of provinces according to this conceptual structure. Three classes have been identified, corresponding to three levels of provincial well-being. The gap statistic has guided the identification of the number of clusters with the largest isolation among segments, measured by the deviance between classes, and the smallest heterogeneity within segments, measured by the deviance within each class.

Fig. 4 Italian provincial well-being represented by each cluster. Top plot: cluster 1, high well-being; middle plot: cluster 2, medium well-being; bottom plot: cluster 3, low well-being
Reaching agreement on the identification of dependent variables and their relationships in a defined context is important for driving scientific research where a comprehensive and empirically well-supported theory is lacking. A robust theory is needed in order to find relationships and to develop lines of study that help break new ground.
From a substantive point of view, the promise of enhancing our understanding of the role of well-being in the world encourages future research, as well-being and its underlying constructs continue to provide crucial knowledge to inform policy-makers.
A final note concerns the availability and usefulness of data at local level because "…sometimes indicators are available at a very detailed territorial level, but they are not robust or complete enough…" (Taralli et al. 2015). As is known, ISTAT is engaged in a project aimed at constructing a provincial BES in order to provide additional information and support well-being studies at local level.
Furthermore, a good index of well-being should provide a complete description of how the economic system works because no single measure can cover the full range of environmental, social and economic issues and the use/combination of different approaches should be the subject of future research (Giannetti et al. 2015).