A Linguistic Group Best–Worst Method for Measuring Good Governance in the Third Sector: A Spanish Case Study

Non-profit Organizations (NPOs) need to generate trust and credibility among their stakeholders through an efficient management of their resources, which leads them to openly show that they develop adequate good governance practices. However, this is not a simple task, and little research has been done on methods for measuring good governance in this field, with no agreement on the best procedure. This paper aims at facilitating the measurement of good governance practices in NPOs by a fuzzy linguistic consensus-based multi-criteria group decision-making (MCGDM) model that provides agreed and easy-to-understand weights for a list of indicators proposed by the stakeholders and entities involved in such good governance practices. To do so, a linguistic 2-tuple Best-Worst Method (BWM) with a consensus reaching process (CRP) is developed and then applied to a real-world case in Spain, in which a group of experts from significant Spanish NPOs assesses the list of indicators proposed by the most representative entities (the alliance between the non-governmental organizations (NGO) Platform for Social Action and the NGO Coordinator for Development (CONGDE)) to obtain a prioritization of such indicators for measuring the good governance practices in Spanish NPOs.


Introduction
Recently, the Third Sector has grown rapidly in significance and size in many countries [3], and non-profit organizations (NPOs) are usually considered one of the most important agents in their societies [4]. Classically, NPOs have been in charge of improving the quality of life of disadvantaged people [5]. Their growing global importance entails responsibility towards the community. Such a responsibility was established in the Code of Ethics and Conduct of the World Association of Non-Governmental Organizations, which states that NPOs have a responsibility to be transparent, honest, responsible and ethical, and to disclose accurate information [6]. Hence, NPOs worldwide face increasing demands for accountability and improved transparency [7,8].
Within the 2030 Agenda for Sustainable Development issued in 2015, the United Nations (UN) promulgated 17 Sustainable Development Goals (SDGs) that play a key role in its evaluation and implementation. Our study is focused on the goals ''16.6 Develop effective, accountable and transparent institutions at all levels'' and ''17.19 Build on existing initiatives to develop measurements of progress on sustainable development that complement gross domestic product, and support statistical capacity building in developing countries''.
Accountability helps stakeholders form a clear idea of how NPOs administer their funds. It is precisely at this point that the terms ''non-profit responsibility'' and ''good governance practices'' arise. However, an analysis of good governance in the Third Sector reveals a series of concerns:
• The literature on accountability issues in the Third Sector is limited [9]. Furthermore, much remains to be understood in terms of accountability mechanisms [10,11], given the scarce research on the analysis of governance mechanisms [12][13][14].
• Few studies empirically measure the levels of non-profit good governance. Different institutions have prepared their own proposals based on lists of good governance indicators [15], but most of these lists do not weight the indicators and, when they do, there is no theoretical or empirical validation for such weights.
The previous issues lead us to conclude that there is no clear consensus on a procedure for measuring the degree of good governance in the field of NPOs.
To fill these gaps, we propose a novel linguistic consensus-based multi-criteria group decision-making (MCGDM) approach that models the problem of weighting a list of indicators for good governance in NPOs and aims at achieving agreed and easy-to-understand results. Among the different proposals that can be found in the literature to deal with criteria weighting in MCGDM [16][17][18], our proposal takes as its basis the Best-Worst Method (BWM) presented by Rezaei in [19] and introduces a new consensus-based linguistic group BWM in order to:
• Smooth out the conflicts among the experts involved in the weighting of indicators for good governance in NPOs by applying a consensus reaching process (CRP) [20][21][22].
• Weight the indicators for good governance in NPOs by a BWM that gathers information from multiple decision-makers. To facilitate the elicitation of such information, the linguistic 2-tuple BWM [23] is used, modified and extended to manage the inherent uncertainty related to the vagueness of meaning of the knowledge expressions provided by decision-makers. Additionally, it provides easy-to-understand results for all stakeholders in spite of their different knowledge and backgrounds.
Note that the BWM derives criteria weights from pairwise comparisons provided by an expert. Concretely, the expert chooses the best and worst criteria and then compares them with the remaining ones. However, in MCGDM problems such as the one studied in this proposal, experts may disagree on the selection of the best and/or worst criterion or on the pairwise comparisons, which results in different weights for the same criterion for each expert. To face this issue, we extend the BWM by including a consensus approach based on a comprehensive minimum cost consensus (CMCC) model [24]. A CMCC model is a nonlinear programming model that guarantees a collective consensual solution while preserving the initial preferences of each expert as much as possible. In this way, we are able to obtain global consensual weights for all the criteria. Additionally, we make use of an extension of the BWM, the so-called 2-tuple BWM [23], which allows modeling the experts' preferences by 2-tuple linguistic values [25]. However, this method does not properly represent the criteria weights from a linguistic point of view, which makes it difficult for the experts to interpret the results. Keeping in mind the previous limitations, the main novelties of our proposal are listed below:
• A linguistic consensus-based approach that aims at removing disagreements in the final weighting of the indicators. Notably, no previous BWM proposal for MCGDM problems has considered consensus approaches.
• The recent extension of the BWM, the so-called 2-tuple BWM [23], which models input and output information by linguistic 2-tuples [25] and facilitates the interpretability of the results while keeping their accuracy, is modified to model the linguistic output by building an adequate syntax and semantics for the linguistic term set used to express the weights, using a fuzzy unbalanced linguistic approach [26].
Finally, once the model has been developed, an empirical analysis of good governance in NPOs is undertaken to show the performance of the consensus-based 2-tuple BWM, empirically validating the indicators with the decision-makers' opinions on the accountability of these entities. Specifically, the analysis focuses on the Spanish social economy context, in which Spain plays a key role within the European Union, and the list of indicators to be weighted is the one proposed by the alliance between the Social Action NGO Platform and the NGO Coordinator for Development (CONGDE, hereinafter) in 2019 [27]. This case study focuses on the Spanish context because, in the field of the social economy, Spain has become a pioneer country, being the first member of the European Union to develop a law on social economy, Law 5/2011 of March 29 [28]. This law gives a relevant value to this model of provision of services to citizens and grants a unique recognition to this sector. Besides, Spain has been the first member state of the European Union to implement a Social Economy Strategy 2017-2020, based on 63 measures that are supported by 11 strategic axes. For the development of the 2030 Agenda, Spain places the Spanish Social Economy Strategy 2017-2020 as an essential element to achieve the Sustainable Development Goals (SDGs) promulgated by the United Nations. Hence the relevance of studying the Spanish case.
The paper is set up as follows: In Sect. 2, different concepts of the 2-tuple linguistic model, the 2-tuple BWM and the CRPs that are used in our proposal are reviewed, and then related works on good governance in the Third Sector are reviewed to understand the importance of studying the challenges facing the sector. Sect. 3 introduces the novel consensus-based 2-tuple BWM that is used in Sect. 4 to provide a proper and consensual weighting for the good governance indicators in the Spanish case.
Afterwards, the results obtained are analyzed in Sect. 5. Then, the proposal is compared with other similar studies in Sect. 6. Finally, Sect. 7 concludes this paper.

Background
This section reviews basic concepts about the 2-tuple linguistic model, the 2-tuple BWM and CRPs that are necessary to understand our proposal. Finally, a section on related works about good governance in the Third Sector is presented to understand the importance and application of our proposal.

2-Tuple Linguistic Model
The fuzzy linguistic approach [29] has been broadly and successfully used for dealing with uncertainty in decision-making. Among the fuzzy linguistic models used in decision-making, the 2-tuple linguistic model [25] stands out because it facilitates linguistic computations with high readability and precision. High readability is related to the use of a computing with words (CW) approach [30], in which both decision-makers' preferences and results are represented linguistically. High precision is related to the use of the symbolic translation concept. The 2-tuple linguistic model represents the information by a 2-tuple $(s_i, \alpha)$, where $s_i$ is a linguistic term belonging to a predefined linguistic term set $S = \{s_0, \ldots, s_g\}$ and $\alpha \in [-0.5, 0.5)$ is a numerical value that represents the symbolic translation of the membership function of the linguistic term $s_i$. To accomplish CW processes with linguistic 2-tuple values, different functions were established. The function $\Delta_S$, from a value $\beta \in [0, g]$, returns an equivalent 2-tuple linguistic value $(s_i, \alpha)$: let $S = \{s_0, \ldots, s_g\}$ be a set of linguistic terms and $\bar{S}$ the 2-tuple set associated with $S$, defined as $\bar{S} = S \times [-0.5, 0.5)$. Then $\Delta_S : [0, g] \to \bar{S}$ is given by
$$\Delta_S(\beta) = (s_i, \alpha), \quad i = \mathrm{round}(\beta), \quad \alpha = \beta - i,$$
where $\mathrm{round}(\cdot)$ assigns the closest integer number $i \in \{0, \ldots, g\}$ to $\beta$.
The function $\Delta_S^{-1}$, from a 2-tuple linguistic value $(s_i, \alpha)$, returns its equivalent numerical value $\beta$ in the interval of granularity of $S$, $[0, g]$: let $S = \{s_0, \ldots, s_g\}$ be a linguistic term set and $(s_i, \alpha) \in \bar{S}$ be a 2-tuple linguistic value. Then
$$\Delta_S^{-1}(s_i, \alpha) = i + \alpha = \beta.$$
Remark 1 A linguistic term $s_i \in S$ can be transformed into a 2-tuple linguistic value in $\bar{S}$ by including a symbolic translation equal to zero: $s_i \in S \Rightarrow (s_i, 0) \in \bar{S}$.
The 2-tuple linguistic model has been extended to facilitate computations. Specifically, Tai et al. [31] proposed an approach in which any value $\beta \in [0, 1]$ can be mapped to the linguistic 2-tuple term set $\bar{S}'$ via the mapping
$$\Delta'_S(\beta) = (s_i, \alpha), \quad i = \mathrm{round}(\beta \cdot g), \quad \alpha = \beta - \frac{i}{g}, \quad \alpha \in \left[-\frac{1}{2g}, \frac{1}{2g}\right).$$
Again, the 2-tuple linguistic value $(s_i, \alpha)$ can be translated into an equivalent numerical value $\beta \in [0, 1]$:
Definition 3 [31] Let $S = \{s_0, \ldots, s_g\}$ be a linguistic term set and $(s_i, \alpha) \in \bar{S}'$ be a 2-tuple linguistic value. There is a function $\Delta'^{-1}_S$:
$$\Delta'^{-1}_S(s_i, \alpha) = \frac{i}{g} + \alpha = \beta,$$
with $\beta \in [0, 1] \subset \mathbb{R}$.
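The translation functions of the 2-tuple model on $[0, g]$ can be illustrated with a minimal Python sketch. It assumes the standard definition of the model in [25] ($i = \mathrm{round}(\beta)$, $\alpha = \beta - i$); the function names `delta` and `delta_inv` are illustrative, and a linguistic term $s_i$ is represented only by its index $i$.

```python
import math


def delta(beta: float, g: int) -> tuple[int, float]:
    """Map beta in [0, g] to a 2-tuple (i, alpha), where i is the index of s_i."""
    if not 0 <= beta <= g:
        raise ValueError("beta must lie in [0, g]")
    # Round half up so that alpha stays in [-0.5, 0.5); Python's built-in
    # round() sends halves to the nearest even integer instead.
    i = math.floor(beta + 0.5)
    return i, beta - i


def delta_inv(i: int, alpha: float) -> float:
    """Map a 2-tuple (s_i, alpha) back to its numerical value beta = i + alpha."""
    return i + alpha


print(delta(3.25, g=6))   # term s_3 with symbolic translation 0.25
print(delta(2.5, g=6))    # a half is assigned to the upper term: (3, -0.5)
```

Rounding halves upwards is a design choice that keeps the symbolic translation inside the half-open interval $[-0.5, 0.5)$.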

2-Tuple Best-Worst Method
BWM was introduced by Rezaei in [19] as an MCDM method for computing criteria weights that reduces the number of pairwise comparisons among criteria and the inconsistency of decision-makers' preferences. In the BWM, such pairwise comparisons are called reference comparisons. In the original proposal, decision-makers use a numerical scale to provide their preferences. However, uncertain contexts are very common in real-world MCGDM problems and cannot be easily handled using precise numerical assessments. For this reason, Labella et al. [23] proposed the 2-tuple BWM, which is able to model both decision-makers' preferences and results by linguistic information. This model consists of the steps below, based on the classical BWM:
• Step 1: To determine a set of decision criteria, $C = \{C_1, \ldots, C_n\}$.
• Step 2: To select the best criterion $C_B$ and the worst criterion $C_W$. If there are several best and worst criteria, they can be selected arbitrarily.
• Step 3: To make linguistic pairwise comparisons between $C_B$ and the rest of the criteria using the linguistic scale $S^{BWM}$ represented in Table 1, obtaining the Best to Others (BO) vector, $BO = \{a_{B1}, a_{B2}, \ldots, a_{Bn}\}$, where $a_{Bj}$ denotes the preference degree of $C_B$ over the criterion $C_j$ and $a_{Bj} \geq 1$, $j = 1, 2, \ldots, n$, $j \neq B$.
• Step 4: To make analogous linguistic pairwise comparisons between $C_W$ and the rest of the criteria, obtaining the Others to Worst (OW) vector, $OW = \{a_{1W}, a_{2W}, \ldots, a_{nW}\}$, where $a_{jW}$ denotes the preference degree of the criterion $C_j$ over $C_W$ and $a_{jW} \geq 1$, $j = 1, 2, \ldots, n$, $j \neq B$ or $W$.
• Step 5: To build the LBWCM from the 2-tuple linguistic comparisons, where $(d_{Bj}, \alpha_{Bj})$ is the decision-maker's linguistic preference over the comparison of $C_B$ with $C_j$ and $(d_{jW}, \alpha_{jW})$ denotes the linguistic preference for the comparison of $C_W$ with $C_j$.
• Step 6: To compute the criteria weights by an optimization model from the LBWCM in which the maximum absolute differences $|w_B / w_j - \Delta^{-1}_{S^{BWM}}((d_{Bj}, \alpha_{Bj}))|$ and $|w_j / w_W - \Delta^{-1}_{S^{BWM}}((d_{jW}, \alpha_{jW}))|$ should be minimized:
$$\text{(M-1)} \quad \min \xi \quad \text{s.t.} \quad \left| \frac{w_B}{w_j} - \Delta^{-1}_{S^{BWM}}((d_{Bj}, \alpha_{Bj})) \right| \leq \xi, \quad \left| \frac{w_j}{w_W} - \Delta^{-1}_{S^{BWM}}((d_{jW}, \alpha_{jW})) \right| \leq \xi, \quad \sum_{j=1}^{n} w_j = 1, \quad w_j \geq 0, \quad \text{for all } j = 1, 2, \ldots, n.$$
At this point, the numerical priority weights are transformed into linguistic weights. These linguistic weights are represented through a linguistic term set (see Table 2), since any value $\beta \in [0, 1]$ can be linguistically interpreted by Definition 2. This approach presents an important limitation from the syntax point of view in the fuzzy linguistic scale for representing criteria weights. Because the current fuzzy linguistic scale, $S^{BWM}_w$, is symmetric and equally distributed, and because the numerical weights must sum to 1, most of the weights obtained by the method are linguistically represented by either VU or U, which can be confusing. This problem is detailed in the example below:
Example 1 Let us suppose an MCDM problem composed of six criteria whose numerical weights obtained from (M-1) are $w_1 = 0.26$, $w_2 = 0.282$, $w_3 = 0.141$, $w_4 = 0.141$, $w_5 = 0.035$ and $w_6 = 0.141$. Their linguistic representation according to the fuzzy linguistic scale represented in Table 2 is $w_1 = (U, 0.01)$, $w_2 = (U, 0.032)$, $w_3 = (U, -0.109)$, $w_4 = (U, -0.109)$, $w_5 = (VU, 0.035)$, $w_6 = (U, -0.109)$. Given that there are six criteria, equal importance for all of them would be $1/6 = 0.167$, and it does not seem logical to label as ''unimportant'' weights that are above the average, for instance $w_2$.
The previous limitation will be fixed in the proposal introduced in Sect. 3 using a fuzzy unbalanced linguistic scale for representing the linguistic weights [26].
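The linguistic representations in Example 1 can be reproduced with a short Python sketch of the mapping from $[0, 1]$ to 2-tuples ($i = \mathrm{round}(\beta \cdot g)$, $\alpha = \beta - i/g$). Two points are assumptions rather than statements from the text: the scale of Table 2 is taken to have five equally spaced terms ($g = 4$, cores at 0, 0.25, 0.5, 0.75 and 1), which is consistent with the values of the example, and only the labels VU and U are filled in because they are the only ones that appear there. The name `delta_unit` is illustrative.

```python
import math


def delta_unit(beta: float, g: int) -> tuple[int, float]:
    """Map beta in [0, 1] to (i, alpha) with i = round(beta * g), alpha = beta - i / g."""
    i = math.floor(beta * g + 0.5)  # round half up
    return i, beta - i / g


G = 4                       # assumption: five equally spaced terms
LABELS = {0: "VU", 1: "U"}  # only the labels that appear in Example 1

weights = [0.26, 0.282, 0.141, 0.141, 0.035, 0.141]
results = [delta_unit(w, G) for w in weights]

for j, (i, alpha) in enumerate(results, start=1):
    label = LABELS.get(i, "s" + str(i))
    print(f"w{j} = ({label}, {alpha:.3f})")

# Equal importance for six criteria would be 1/6, yet w2 = 0.282 lies above
# that average and is still labelled U.
print(round(1 / len(weights), 3))
```

Under these assumptions every weight falls on VU or U, which is the readability problem that the example describes.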
Another key issue, when an MCDM method deals with pairwise comparisons, is the evaluation of their consistency, since inconsistent preferences imply unreliable results. A consistency ratio based on a random average consistency index (RACI) was introduced in [23]. The RACI represents the average value of the consistency $\xi$ generated by selecting $N$ random configurations of LBWCMs for a given $n$ and $a_{BW}$; thus, it depends on both the best to worst ratio $a_{BW}$ and the number of objects under consideration. Table 3 shows the RACI values depending on the number of objects to compare (rows) and the best to worst ratios (columns).
Based on the notion of $RACI(n, a_{BW})$, a consistency ratio $CR(n, a_{BW})$ is defined for a given LBWCM [23]. According to [23], $CR(n, a_{BW}) \leq 0.35$ is an acceptable consistency threshold because it is not very restrictive and guarantees sufficient consistency to generate reasonable results.
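The consistency value $\xi$ that (M-1) minimizes can be evaluated for any candidate weight vector, which helps to see what the model measures. The Python sketch below is only an evaluation of the objective, not a solver; the function name `bwm_deviation` and the numbers in the usage lines are hypothetical, and the comparisons are assumed to be already converted to their numerical equivalents.

```python
def bwm_deviation(weights, best, worst, best_to_others, others_to_worst):
    """Return xi, the largest absolute deviation in the constraints of (M-1).

    weights          -- candidate weights (all strictly positive)
    best, worst      -- indices of the best and worst criteria
    best_to_others   -- numerical values of the comparisons of C_B with each C_j
    others_to_worst  -- numerical values of the comparisons of each C_j with C_W
    """
    w_best, w_worst = weights[best], weights[worst]
    xi = 0.0
    for w_j, a_bj, a_jw in zip(weights, best_to_others, others_to_worst):
        xi = max(xi, abs(w_best / w_j - a_bj), abs(w_j / w_worst - a_jw))
    return xi


# Hypothetical three-criteria example: criterion 0 is the best, criterion 2 the worst.
w = [0.6, 0.3, 0.1]
consistent = bwm_deviation(w, 0, 2, [1, 2, 6], [6, 3, 1])
inconsistent = bwm_deviation(w, 0, 2, [1, 2, 8], [8, 3, 1])
print(consistent, inconsistent)
```

When the comparisons are fully consistent with the weights, $\xi$ is zero up to floating-point error; the larger the value, the less the weights agree with the stated preferences.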

Consensus Reaching Process
Decision-making often involves groups to obtain better decisions. Group decisions are usually better accepted when the solution is agreed by all decision-makers involved in the group decision problem [20]. The achievement of this agreement usually implies a CRP before selecting the best alternative for the group decision-making problem.
Specifically, we propose the use of a minimum cost consensus (MCC) model to optimize the CRP and obtain agreed solutions quickly. MCC was introduced by Ben-Arieh and Easton [40] with the aim of minimizing the overall cost of moving all decision-makers' opinions to achieve agreement. Taking into account the MCC concept, Zhang et al. studied how the level of agreement in the group can differ according to the aggregation operator selected for computing the collective opinion, and proposed a new MCC model [38]. The properties of the latter model were investigated for the cases in which the weighted average operator or the ordered weighted average operator is used to compute the collective opinion [38].
Recently, some researchers have paid much attention to the model proposed by Zhang et al. [38] and have introduced new MCC approaches [24,41,42,43]. In particular, Labella et al. [24] introduced an MCC approach that takes into account the computation of the consensus within the group of decision-makers. According to their research, the classical MCC cannot guarantee a minimum level of agreement for the group of decision-makers, but just a maximum distance between each decision-maker's opinion and the collective opinion. Therefore, a CMCC model that modifies the one introduced by Zhang et al. in [38] was developed by including the computation of the consensus level:
Definition 4 [24] Let $(o_1, \ldots, o_m)$ be the original assessments provided by a set of decision-makers $E = \{e_1, e_2, \ldots, e_m\}$ over an alternative. Suppose that after the CRP, the decision-makers' assessments are modified into $(\bar{o}_1, \ldots, \bar{o}_m)$, a collective opinion $\bar{o}$ is obtained based on the modified assessments, and $(c_1, \ldots, c_m)$ are the costs of moving each decision-maker's opinion 1 unit, respectively. The parameter $\varepsilon$ is the maximum acceptable distance of each decision-maker to the collective opinion. In the CMCC model based on a linear cost function, $F$ is an aggregation function, $consensus(\cdot)$ represents the consensus level achieved and $\gamma$ is equal to $1 - \mu$, with $\mu \in [0, 1]$ a consensus threshold defined a priori.

Labella et al. proposed several CMCC models based on different measures to compute the group agreement level.
Here we consider the one used in this proposal, in which $\varphi_k \in [0, 1]$ is the weight of decision-maker $e_k$ and $\sum_{k=1}^{m} \varphi_k = 1$.
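The quantities that a CMCC model balances (cost of the changes, distance to the collective opinion and consensus level) can be made concrete with a small Python sketch that checks whether a candidate set of adjusted opinions is feasible. This is not an optimizer and not the model of [24]: it assumes that $F$ is the weighted average and that the consensus level is one minus the weighted mean distance to the collective opinion, and both choices, as well as the name `cmcc_check` and the sample values, are hypothetical.

```python
def cmcc_check(original, adjusted, costs, expert_weights, eps, mu):
    """Evaluate a candidate adjustment of the experts' opinions.

    Returns (cost, collective, consensus, feasible).
    Assumptions: F is the weighted average, and consensus is measured as
    1 - sum_k phi_k * |adjusted_k - collective|.
    """
    collective = sum(p * a for p, a in zip(expert_weights, adjusted))
    cost = sum(c * abs(a - o) for c, a, o in zip(costs, adjusted, original))
    max_dist = max(abs(a - collective) for a in adjusted)
    consensus = 1 - sum(p * abs(a - collective) for p, a in zip(expert_weights, adjusted))
    feasible = max_dist <= eps and consensus >= mu
    return cost, collective, consensus, feasible


original = [0.1, 0.2, 0.4]      # hypothetical weights given by three experts
phi = [1 / 3, 1 / 3, 1 / 3]     # equally important experts
costs = [1, 1, 1]               # equal unit costs

unchanged = cmcc_check(original, original, costs, phi, eps=0.1, mu=0.9)
moved = cmcc_check(original, [0.15, 0.2, 0.3], costs, phi, eps=0.1, mu=0.9)
print(unchanged)
print(moved)
```

In this hypothetical case the original opinions violate both thresholds, while moving two experts by a total of 0.15 units satisfies them; an optimization model would search for the cheapest adjustment of this kind.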

Related Works on Good Governance in the Third Sector
A brief review of the main research on good governance in the Third Sector, pointing out its importance and challenges, is provided here. Accountability in the Third Sector is crucial [44] to achieve the social mission of NPOs [45], and as a way to strengthen trust, improve relationships and demonstrate transparency in their activities within the community [11]. In this sense, users regard accountability practices as the overall commitment of NPOs to transparency, based on legitimacy [46], which reaffirms their relevant contribution to society.
To carry out activities for the community, NPOs are financially dependent on several internal actors, such as their own members and beneficiaries, as well as on external funders [47]. Due to their social nature, NPOs obtain funding [48]; however, that money must not be used for personal benefit [6]. Besides, taking into account the competitive environment, where donors have multiple options, maintaining positive perceptions of trustworthiness has proved to be decisive for the existence of the sector as a whole [49]. Consequently, trust in the Third Sector is essential. Good governance has become the most valuable element for NPOs to achieve the social credibility that ensures their future.
Unfortunately, fraud cases have triggered a crisis of confidence in the sector [50,51]. Funders are concerned about allegations of corruption among NPOs [52]. Furthermore, the controversial behavior of some NPOs has resulted in increased efforts to analyze and improve the reputation of the sector [53]. In summary, there is an increasing need for accountability due to the inappropriate behavior of some NPOs, which has damaged the credibility of the organizations that make up the Third Sector [54].
To solve this, it is necessary to carry out ethical practices that are visible to stakeholders. Under these circumstances, the concept of ''good governance'' arises. From this view, non-profit governance should relate to all the stakeholders involved in an NPO [44,55,56]. Non-profit governance and accountability are social and dynamic processes [57] that have become a central concern for NPOs [3]. This is reinforced by the view in prior literature that NPOs are perceived as more effective when they manage to align the diversity of stakeholders' expectations with good governance [56].

A Consensus Model Based on Linguistic BWM for MCGDM
With the aim of obtaining agreed weights for an MCGDM problem, such as the measurement of good governance in the Third Sector, it is necessary to apply a consensus process to smooth disagreements in the final weighting of the criteria. Therefore, our proposal consists of a CMCC model based on the 2-tuple BWM that obtains the importance of the criteria in a consensual way. The steps of this new approach are described below and graphically depicted in Fig. 1:
• Step 1: decision-makers select the best and worst criteria according to their view and provide linguistic pairwise comparisons between them and the rest of the criteria using the linguistic scale represented in Table 1. Then, the pairwise comparisons are transformed into linguistic 2-tuple values.
• Step 2: for each decision-maker $e_k$, the criteria weights $(w^k_1, w^k_2, \ldots, w^k_n)$ are obtained by applying the 2-tuple BWM (see Sect. 2.2). Since the 2-tuple BWM allows the weights to be represented linguistically, the decision-makers may choose whether they prefer weights represented numerically or linguistically. However, as mentioned above, the fuzzy linguistic scale used for representing the weights in the 2-tuple BWM, which is represented by triangular fuzzy numbers, is not suitable either syntactically or semantically to represent the real importance of the weights. To overcome this weakness, it is reasonable to develop an unbalanced linguistic scale [26] that takes into account that the average weight for all elements cannot be considered ''unimportant'' but rather ''average''. Despite the multiple proposals for building unbalanced linguistic term sets [26,58], such proposals fix the semantics before knowing the number of elements to be assessed. However, in the BWM the importance of the elements is related to the number of elements, i.e., the average importance for 5 elements is 0.2, but for 9 elements it is 0.11.
For this reason, in order to properly express the importance of the elements in a linguistic way, we propose the use of a fuzzy unbalanced linguistic scale whose semantics is built according to the number of elements that are assessed in the 2-tuple BWM. Such a fuzzy unbalanced linguistic scale, $U^{BWM}_w(n)$, aims at building a term set that is a fuzzy partition in the sense of Ruspini [59], taking into account that the ''Average Important (AI)'' linguistic term represents the average importance value; thus, its core is the value of the universe of discourse ([0, 1]) equal to $1/n$, where $n$ is the number of elements to compare. Hence, if all the elements in the problem solved by the BWM were of equal importance, their linguistic weights would be (AI, 0). Then, the ''Unimportant'' and ''Average Important'' linguistic terms are distributed from the average importance value $1/n$. The remaining terms are adjusted to build the fuzzy partition such that their central values are symmetrically distributed in the interval $[1/n, 1]$. In other words, the distance between the cores of the 3 remaining triangular fuzzy terms is $\frac{1 - (1/n)}{3}$. Note that, although the semantics of the fuzzy unbalanced scale changes depending on $n$, its syntax remains unchanged. This syntax is composed of triangular fuzzy linguistic terms whose meaning facilitates the readability of the linguistic weights (see Table 4).
where $x_h$ represents the value of the $x$ coordinate of the centroid [60] of the linguistic term $s_h$, $g + 1$ is the cardinality of $U^{BWM}_w(n)$ and $j$ is the index of the linguistic term whose centroid is the closest one to $w_i$. At this point, from each decision-maker $e_k$'s preferences we obtain the corresponding criteria weights $(w^k_1, w^k_2, \ldots, w^k_n)$, represented linguistically to improve their readability. However, the decision-makers may have different points of view and disagree on the importance of the criteria. To smooth such disagreements and achieve an agreed solution, a CRP [20][21][22] is included in the resolution process. In particular, we apply a CMCC model that guarantees a desired level of agreement between the decision-makers while changing their preferences regarding the criteria weights as little as possible. In this way, we obtain a collective consensual weight for each criterion on which the decision-makers agree, noted as $(w^*_1, w^*_2, \ldots, w^*_n)$.
• Step 3: to obtain the consensual weights, the linguistic criteria weights computed for each decision-maker $e_k$, $(w^k_1, w^k_2, \ldots, w^k_n)$, are transformed into their numerical representation in [0, 1] using Def. 3 and then used as inputs in the consensus model to achieve the agreed solution. Among the different automatic consensus models pointed out in Sect. 2.3, we have chosen the CMCC model because it models not only the distance but also the level of agreement. Therefore, the CMCC model (M-2) is used to obtain the consensual collective weights for the criteria $(w^*_1, w^*_2, \ldots, w^*_n)$. According to the decision-makers' choice, these weights can be represented numerically or transformed again into 2-tuple linguistic values using the fuzzy unbalanced scale shown in Table 4 and Proposition 2, which improves the interpretability of the results.
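The construction of the unbalanced scale described in Step 2 can be sketched in Python as follows. The sketch assumes five triangular terms whose cores are 0 for ''Unimportant'', $1/n$ for ''Average Important'' and three further cores equally spaced up to 1; placing the first core at 0 and the last at 1 is an assumption made to obtain a Ruspini partition, and the labels of the three upper terms are not given here, so terms are referred to by index. The function names are illustrative.

```python
def unbalanced_cores(n: int) -> list[float]:
    """Cores of the five terms for n elements: U, AI and three upper terms."""
    avg = 1 / n                 # core of "Average Important"
    step = (1 - avg) / 3        # distance between the cores above the average
    return [0.0, avg, avg + step, avg + 2 * step, 1.0]


def memberships(x: float, cores: list[float]) -> list[float]:
    """Triangular membership degrees of x in a Ruspini partition with these cores."""
    mu = [0.0] * len(cores)
    for h in range(len(cores) - 1):
        left, right = cores[h], cores[h + 1]
        if left <= x <= right:
            mu[h + 1] = (x - left) / (right - left)
            mu[h] = 1 - mu[h + 1]   # adjacent terms always sum to one
            break
    return mu


cores6 = unbalanced_cores(6)
print([round(c, 3) for c in cores6])
print([round(m, 3) for m in memberships(1 / 6, cores6)])   # full membership in AI
print([round(m, 3) for m in memberships(0.296, cores6)])
```

Because the cores depend on $n$, the same five labels can be reused for any number of elements, while a weight equal to $1/n$ always has full membership in the ''Average Important'' term.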
Note that this proposal uses the CMCC model (M-2) for deriving the consensual weights for the following reasons:
1. Achieving a consensual solution implies modifying the experts' initial preferences. The CMCC aims at minimizing the cost of modifying the experts' preferences; thus, it guarantees a consensual solution while changing the experts' initial opinions as little as possible, which is, in fact, the best solution that can be obtained for the problem.
2. Classical MCC models guarantee only a maximal distance between each expert and the collective opinion: the smaller the distance, the higher the consensus within the group. However, this does not imply reaching a desired level of consensus. The CMCC model [24] adds a constraint related to the consensus computation that guarantees a solution with a required level of consensus, represented by the parameter $\gamma$.
3. The CMCC is an automatic optimization model that does not require the participation of the experts in the consensus process. In this way, the consensus process is faster and possible deadlocks are avoided. In addition, a real consensus process would not obtain a better solution, since the CMCC model preserves the experts' initial preferences as much as possible.
Therefore, the model (M-2) modifies the initial weights of each decision-maker obtained from the 2-tuple BWM and guarantees global weights for the criteria that preserve the decision-makers' views on their importance as much as possible, in a quick and easy way. All the previous steps have been summarized in the following algorithm:

Weighting Good Governance Indicators: A Spanish Case Study
The CONGDE (2019) tool defines a set of indicators necessary to regulate and guarantee an adequate level of good governance of the NPOs, in a responsible exercise of self-regulation [27]. For each indicator, CONGDE proposes, among other information, the weight that the indicator has within the block of which it is part. Therefore, the aim of this case study is to derive the priority of the list of indicators proposed by CONGDE using the linguistic BWM approach introduced in Sect. 3. Figure 5 represents a general system diagram with the different steps carried out to solve the case study. Table 5 shows the values of the parameters that are necessary for such an approach in the resolution of this case study, and the different steps that compose the process are described below.
Remark 2 Note that in the resolution of this case study, the opinions of all experts have been considered equally important, and so has the cost of modifying them.

Step 1
First, the decision-makers select the best and worst indicators according to their expertise and then compare each one with the rest of the indicators. In this case study, we asked 5 decision-makers, who represent some of the most important Spanish social entities, about these issues by means of a questionnaire, which is available online. The decision-makers' preferences are shown in Tables 6, 7, 8, 9, 10, 11 and 12. These tables show, for each decision-maker, the best indicator ($C_B$), the worst indicator ($C_W$), the pairwise comparisons between the best indicator and the rest of them (BO), the pairwise comparisons of the rest of the indicators with the worst one (OW) and the consistency of the opinions obtained from Eq. 3.

Remark 3
Note that all the decision-makers' preferences are consistent according to the consistency threshold defined in [23].

Step 2
Next, the importance of the indicators is derived from the decision-makers' opinions using the non-linear programming model (M-1). The indicator weights for each decision-maker are shown in Tables 13, 14

Step 3
From the resulting individual weights, it is clear that the decision-makers disagree on the importance of some indicators. In order to achieve an agreed solution that satisfies all the decision-makers, we apply a CMCC model (M-2) to automatically smooth such disagreements and obtain consensual collective weights for each indicator. The consensual weights are shown in the column ''Consensus'', both numerically and linguistically, together with the weights assigned to each indicator by the CONGDE, in Tables 13, 14, 15

Discussion of the Results
This section discusses the resulting weights of each block of indicators obtained in the previous section. Table 13 shows general weights for the six blocks (BG1, BG2, BG3, BG4, BG5 and BG6) of general aspects to assess good governance in the Third Sector, and the tables from Table 14 onwards show the weights of the indicators within each block. We start with Table 13, as it shows the consensual weights (represented numerically (N) and linguistically (L)) that decision-makers consider most representative of good governance in NPOs for each of the 6 blocks. Besides, each of the institutions represented by a decision-maker is indicated by ''e'' (for example, expert 1 = $e_1$). According to Table 13, the six blocks of general aspects (BG1, BG2, ...) receive the same linguistic term regarding their importance (L) from the decision-makers (AI, average important). Numerically, however, one block is considered more representative of good governance than the other five: BG2 (numerical weight (N) = 0.296). This block contains eight indicators that measure the adequate management of the mission, vision and values of the organization. Moreover, with the exception of one respondent who gives it a lower weight ($e_4$, weight = 0.174), the remaining institutions give this block the highest weight, thus confirming the superiority of this block over the others in numerical terms. Next come the blocks BG4 ''Economic management'' and BG5 ''Human resources'', which show the same weighting both linguistically (L) and numerically (N), (AI, -0.162), as shown in the last column, ''Consensus''. Finally, we must note the lower importance given to the BG6 ''Stakeholders'' block in terms of good governance, where only one respondent ($e_4$; 0.348) gives a very high weight to this block. Table 14 presents the results for each indicator of the block BG1. Following the CONGDE order, we start with the issues related to the governing body.
As the CONGDE offers a possible weighting for each indicator, we will compare these indicative weights (CONGDE column) with the weights that have been validated with the BWM considering the decision-makers' opinions (Consensus column).

Results from BG1: Governing Bodies
A comparison between the BWM and the CONGDE values shows an overvaluation of several indicators (BG1.1, BG1.2, BG1.6 and BG1.9) by the CONGDE. This is most notable for two indicators: BG1.2, referring to the proportion of women in the governing body, and BG1.6, regarding the governing body members that receive remuneration for other positions. The CONGDE considers both to be the most important aspects of this block (weight: 0.15), while the decision-makers offer a considerably lower consensus value for these indicators (consensus weight value: 0.071 and 0.104, respectively). Moreover, in the CONGDE set of indicators, BG1.2 is given ''relevant'' as its degree of importance, while BG1.6 is categorized as ''inexcusable compliance''. The fact that the CONGDE distinguishes these indicators by degree of importance but not by weighting value leads us to think that a revision of these indicators would be advisable. Regarding linguistic values, both the decision-makers' opinions (consensus weight) and the CONGDE present the same linguistic term (AI) for all the indicators that compose the block BG1. Finally, we draw attention to the indicator BG1.5, which represents the fact that at least 80% of the members attend at least 50% of the meetings. We observe here a great difference between the weight proposed by the CONGDE (0.1) and the one given by the decision-makers (consensus value: 0.176). Moreover, the degree of compliance suggested by the CONGDE is also lower: it is only considered ''relevant'', rather than a criterion of ''inexcusable compliance''.

Results from BG2: Mission, Vision and Values
The second aspect that the CONGDE proposes to represent good governance is the adequate management of the mission, vision and values. Moreover, as we indicated above, this is the most highly weighted block of indicators according to the decision-makers. Table 15 shows the results from the BWM for the weighting of the indicators defined by this block, as well as those proposed by the CONGDE.
Regarding BG2, we find similarities between the weighting proposed by the CONGDE and the weighting drawn from the experts' BWM analysis. This is reflected in a similar weight for the review of the mission every time the strategic plan is updated (BG2.4), the review of the values every 10 years (BG2.6), and the definition and review of the mission, vision and values with reference to the Coordinator's Code of Conduct and the Third Strategic Plan Social Action Sector (BG2.7). Regarding the degree of importance, the three indicators are proposed as [...] conduct, which is considered one of the most important indicators by the CONGDE, is underestimated by the decision-makers (consensus weight: 0.079), being the indicator that presents the lowest consensus value within the BG2 block.

Results from BG3: Planning and Evaluation
This block of indicators is considered, as we pointed out above, the second most important block according to the decision-makers. Once again, it is a block that presents many differences between the weightings of the decision-makers and those proposed by the CONGDE. A detailed comparison is shown in Table 16.
According to the decision-makers, differences are observed in practically all the indicators of this block BG3. Moreover, this block has the greatest number of indicators for which the values proposed by the CONGDE differ most in magnitude from the decision-makers' opinions. Firstly, one of the indicators to which the CONGDE gives a higher weight (0.15) and degree of importance (inexcusable compliance) is the one referring to the fact that the strategic planning is specified in periodic operational schedules approved by the governing body (BG3.5). However, decision-makers do not consider this fact so important (consensus value: 0.093) and give this indicator one of the lowest weights. Another indicator to which the CONGDE gives a higher weight (0.1) and a higher degree of importance (inexcusable compliance) is BG3.7: Governing body monitors and evaluates operational schedules. Nevertheless, decision-makers do not consider this fact so important (consensus value: 0.047); indeed, they give this indicator the lowest weight of the whole block BG3. In contrast, the indicator that the decision-makers consider to be most important (consensus value: 0.156) is the public availability of a document that reflects the policy, system or procedure for monitoring and evaluating the NPO's own activity projects and programs, directly linked to the mission's fulfillment (BG3.8). This indicator is given a lower weight (0.1) and degree of importance (relevant) by the CONGDE. Lastly, the indicators most valued by the decision-makers in contrast to the CONGDE are the following: BG3. [...] decision-makers. Regarding linguistic values, we highlight two differences: first, BG3.3 (consensus value: AI (average important); CONGDE: U (unimportant)) and second, BG3.7 (consensus value: U (unimportant); CONGDE: AI (average important)). The consensus value of the decision-makers and the CONGDE present the same linguistic value (AI) for the remaining indicators.

Results from BG4: Economic Management
The fourth aspect that the CONGDE proposes to represent good governance is economic management, the block BG4. Table 17 shows the results from the BWM for the weighting of the indicators defined by this block, as well as those proposed by the CONGDE. Most of the indicators reflecting a higher extent of good governance in terms of the economic management of the NPO show similar weights between the decision-makers' opinions and those proposed by the CONGDE. Both the decision-makers and the CONGDE agree that the most important aspects are BG4.1, about the public availability of the annual income and expense budget approved by the governing body (consensus value: 0.168; CONGDE: 0.15); and BG4.3, referring to the public availability of the annual budget settlement executed, reviewed and approved by the governing body (consensus value: 0.264; CONGDE: 0.15). This is also consistent with the higher degree of importance (inexcusable compliance) of these indicators. However, although the CONGDE gives BG4.3 its highest weight (CONGDE: 0.15), it would be more logical to revise this magnitude and increase the weight of this indicator, bringing it closer to the decision-makers' opinions (consensus value: 0.264). In short, this indicator should still be considered an ''inexcusable compliance'' one, with its CONGDE weighting increased in line with the opinion of the decision-makers. Two of the indicators that are considered ''relevant'' in terms of representing good governance in the field of NPO economic management present similar importance according to the decision-makers and the CONGDE and, besides, both indicators have the lowest weighting in both proposals.
Finally, two aspects are given considerably similar importance by the decision-makers and the CONGDE: BG4.8, regarding the fact that no financier contributes more than 50% of total income for the year (consensus value: 0.036; CONGDE: 0.05), and BG4.9, about the fact that the NPO does not accumulate liquid assets or financial assets in the previous audited year greater than the expense of the current year (consensus value: 0.052; CONGDE: 0.05). In linguistic terms, all the indicators from BG4 present the same values according to the decision-makers and the CONGDE. Table 18 presents the results for each indicator of the block BG5, about human resources management as a representation of good governance in the Third Sector.

Results from BG5: Human Resources
Firstly, two of the indicators to which the decision-makers give the highest weights are BG5.4, based on a list of organization profiles and the continuous development of its operational team (consensus value: 0.161), and BG5.5, regarding an NPO that promotes training and continuous development of the operational team (consensus value: 0.179). On the contrary, the CONGDE gives a weight of 0.1 and a degree of importance of ''relevant''. In contrast, other indicators to which the CONGDE gives higher weights (0.15) but not a higher degree of importance (relevant) are BG5.7, about the volunteer plan that includes minimum objectives and activities (consensus value: 0.08), and BG5.8, about the percentage of women who are part of the responsible executive structure (consensus value: 0.045). The decision-makers do not consider these facts so important and thus give these indicators two of the lowest weights of the whole block BG5. Our results suggest the need to decrease the weights given to these indicators. In addition, we highlight the indicator BG5.6, related to the agreement model for volunteering. On the one hand, the CONGDE gives it the lowest value of the entire block (0.05). On the other hand, the CONGDE considers it ''inexcusable compliance'' in terms of the degree of importance. We also highlight that the decision-makers give it a value slightly more than double (consensus value: 0.104) that proposed by the CONGDE. Finally, regarding linguistic values, we notice two differences: first, BG5.6 (consensus value: AI (average important); CONGDE: U (unimportant)) and second, BG5.8 (consensus value: U (unimportant); CONGDE: AI (average important)). For the remaining indicators, the consensus value of the decision-makers and the CONGDE present the same linguistic value (AI; average important).

Results from BG6: Stakeholders
Lastly, Table 19 presents the results from each indicator from the block BG6 regarding the relationship and communication with stakeholders to measure the levels of good governance in the Third Sector.
The block BG6 presents several differences between the values proposed by the CONGDE and the decision-makers' opinions. The first one concerns the indicator BG6.1, to which the decision-makers give the highest weight (consensus value: 0.255). It refers to the fact that the partnership policy approved by the governing body defines relationships with the entities with which it carries out its projects. However, the CONGDE does not consider this fact so important (0.1) and categorizes its degree of importance only as ''relevant''. Therefore, the decision-makers give BG6.1 more than double the CONGDE weight. The opposite occurs with the indicator BG6.2, about the collaboration agreement model to be signed with local partners and/or local executing entities that contains purpose, rights and obligations, and duration. This indicator receives a low weight from the decision-makers (consensus value: 0.052), whereas the CONGDE provides a weighting of almost double (0.1). In addition, its degree of importance is ''inexcusable compliance'', which differs from the opinion of the decision-makers. Finally, this block shows linguistically the same value (AI, average important) for all the indicators except for BG6.2 (consensus value: U; CONGDE: AI).
As a final comment on the results of the 6 blocks of indicators, the decision-makers and the CONGDE differ markedly in the importance they assign to most of the indicators proposed by the CONGDE to measure desirable levels of good governance in an NPO. As a general conclusion, this analysis contributes to the research field by proposing an optimal weight for the importance of those aspects as indicators of good governance in the Third Sector. Considering all the above and the discrepancies found between the CONGDE and the decision-makers in many of the indicators, we consider it necessary to review the importance that the CONGDE proposes for each indicator against the consensus value granted by the decision-makers in the nonprofit sector.
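The comparisons made in this section can be reproduced with a few lines of arithmetic. The following Python sketch is only illustrative: it uses the consensus and CONGDE weights quoted in the text above, while the function name `compare` and the 30% tolerance used to label a pair of weights as "similar" are assumptions introduced here, not part of the proposed model.

```python
# Weights quoted in the discussion above: indicator -> (consensus, CONGDE).
WEIGHTS = {
    "BG1.2": (0.071, 0.15), "BG1.5": (0.176, 0.10), "BG1.6": (0.104, 0.15),
    "BG3.5": (0.093, 0.15), "BG3.7": (0.047, 0.10), "BG3.8": (0.156, 0.10),
    "BG4.1": (0.168, 0.15), "BG4.3": (0.264, 0.15), "BG4.8": (0.036, 0.05),
    "BG4.9": (0.052, 0.05), "BG5.6": (0.104, 0.05), "BG5.7": (0.080, 0.15),
    "BG5.8": (0.045, 0.15), "BG6.1": (0.255, 0.10), "BG6.2": (0.052, 0.10),
}


def compare(weights, tolerance=0.3):
    """Label each indicator by the ratio consensus / CONGDE.

    `tolerance` is a hypothetical cut-off chosen for illustration only:
    ratios within 1 +/- tolerance are labelled "similar".
    """
    result = {}
    for name, (consensus, congde) in weights.items():
        ratio = consensus / congde
        if ratio > 1 + tolerance:
            label = "undervalued by CONGDE"
        elif ratio < 1 - tolerance:
            label = "overvalued by CONGDE"
        else:
            label = "similar"
        result[name] = (round(ratio, 2), label)
    return result


report = compare(WEIGHTS)
for name, (ratio, label) in report.items():
    print(f"{name}: consensus/CONGDE = {ratio:.2f} ({label})")
```

With this tolerance, BG6.1 and BG5.6 come out with ratios above 2, in line with the "more than double" remarks above, while BG4.1, BG4.8 and BG4.9 are labelled as similar.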

Comparison with Existing Studies
Several studies have proposed attribute-weighting approaches [19,23,61] with different features. These approaches are analyzed and compared with our proposal regarding 3 main aspects related to the resolution of MCGDM problems: the management of groups of experts, the processing of consensual solutions and the use of linguistic information to model opinions and results. Table 20 shows a summary of these characteristics for the studied proposals.
1. Rezaei introduced the BWM for the first time in [19]. This new approach was proposed to solve some behavioral errors in similar MCDM methods and to reduce the number of pairwise comparisons and the inconsistency in the elicitation of experts' preferences. Although our proposal is mostly based on this one, it does not consider the resolution of MCGDM problems where several experts are involved in the decision process and thus, the disagreements among them are omitted. Additionally, the classical approach does not consider the use of linguistic information to model experts' opinions and results.
2. Labella et al. [23] introduced the 2-tuple BWM, which is the BWM approach on which our proposal is based. Nevertheless, this proposal uses a symmetrically distributed linguistic term set to model the linguistic results, which is not convenient for properly representing the importance of the criteria (see Example 1). In addition, although this approach faces the resolution of MCGDM problems, it ignores the consensus among experts.
3. Safarzadeh et al. [61] proposed an MCGDM approach based on the BWM. Again, this proposal is able to deal with groups of experts but does not apply any consensus approach to smooth the possible disagreements among them. Furthermore, this proposal models experts' preferences using the classical numerical scale used in BWM and does not provide a linguistic representation of the information.
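To make the reduction in pairwise comparisons mentioned for the classical BWM more concrete, the following Python sketch counts the comparisons an expert must provide. It assumes the usual BWM elicitation (a best-to-others and an others-to-worst vector, which gives 2n - 3 comparisons for n criteria) and contrasts it with a full pairwise comparison matrix, taken here as a representative of the "similar MCDM methods"; the function names are hypothetical.

```python
def bwm_comparisons(n):
    """Comparisons required by BWM for n criteria (n >= 2).

    Best-to-others gives n - 1 comparisons, others-to-worst gives n - 1,
    and the best-to-worst comparison is shared by both vectors.
    """
    if n < 2:
        raise ValueError("at least two criteria are needed")
    return 2 * n - 3


def full_matrix_comparisons(n):
    """Comparisons in a full reciprocal pairwise matrix: one per unordered pair."""
    if n < 2:
        raise ValueError("at least two criteria are needed")
    return n * (n - 1) // 2


# Example: the block BG2 contains eight indicators.
n_bg2 = 8
print(bwm_comparisons(n_bg2), full_matrix_comparisons(n_bg2))
```

For the eight indicators of BG2, each expert would provide 13 comparisons with the BWM instead of 28 with a full matrix, and the gap widens as the number of indicators grows.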

Conclusions
Non-profit good governance has become a necessary mechanism for the supervision of NPOs as it allows NPOs to better achieve their goals through the most effective, transparent and objective use of their resources, providing an adequate ethical content for their stakeholders. Therefore, the main objective of a good governance mechanism should be the increase and improvement of the accountability practices of NPOs. Several institutions have provided different lists of indicators to measure the levels of non-profit good governance that are either not weighted or are weighted without a theoretical or empirical basis. In this paper, we have proposed the use of an MCGDM approach for modeling the problem of weighting the list of indicators in a consensual way. Our proposal takes as its basis the 2-tuple BWM, which allows deriving the indicators' weights and providing a linguistic representation of them to facilitate their understanding. Moreover, we have extended the 2-tuple BWM to deal with fuzzy unbalanced linguistic scales that better represent the importance of the indicators. Afterwards, we have applied a consensus process based on CMCC models to remove possible disagreements in the indicators' weights provided by the decision-makers. To show the validity of our proposal and contribute to improving the measurement of good governance in NPOs, we have applied our MCGDM approach to the weighting of several indicators proposed by the CONGDE. Through this research and the analysis of the results obtained, we have found significant differences between the weights of the indicators proposed by the CONGDE and those obtained from the decision-makers. Bearing in mind the above, the main findings of this proposal can be summarized as follows:
• The 2-tuple BWM has been significantly improved. The classical proposal uses a symmetrically distributed fuzzy linguistic term set to represent the experts' weights linguistically. However, neither the syntax nor the semantics of such a linguistic term set is suitable for representing the importance of the criteria appropriately. We have defined a fuzzy unbalanced linguistic term set, based on the number of elements to compare, whose syntax and semantics yield more comprehensible linguistic results.
• The BWM-based approaches do not consider disagreement among the experts. For this reason, we have incorporated a consensus approach based on a CMCC model to derive the collective consensual criteria weights obtained from the 2-tuple BWM.
• We have applied our proposal to a real case study focused on measuring good governance in NPOs. The analysis of the results has allowed us to build a battery of indicators that is appropriately validated by the opinion of NPO accountability decision-makers, giving an optimal weight to each indicator.
Therefore, our study contributes not only to previous literature on good governance and accountability for NPOs, by proposing a rigorous mechanism based on a set of indicators, but also to regulators and professionals. This mechanism is supported by the need to adhere to the standards of ethics and honesty that have been consolidated in the non-profit sector as relevant tools to cultivate the image of the NPOs.

Appendix A: Description of the Indicators
The indicators to be weighted, proposed by the CONGDE for measuring good practices in NGOs, are described below:
• BG5.9: There is a gender policy approved by the governing body in the organization.