An extended best–worst multiple reference point method: application in the assessment of non-life insurance companies

In this paper, a multi-criteria decision-making (MCDM) method is developed to rank a set of insurance companies. The proposed method combines two MCDM methods: the Extended Best–Worst (EBW) method and the Multiple Reference Point (MRP) method. We formulate the problem of finding a priority vector from a set of interval pairwise comparisons by applying an EBW method, which allows the decision maker (DM) to use interval values to describe the relative importance of one criterion over another. The EBW method, using fuzzy set theory, can handle the vagueness and ambiguity present in the judgements. Finally, the MRP method is employed to obtain an overall score for each company using the weights established in the first stage. Since the evaluation of insurance companies involves a large number of indicators, it is a complex MCDM problem. A case study is presented to rank Spanish non-life insurance companies based on the constructed model. The results show the effectiveness of the proposed method and offer an insightful reference for the evaluation of the insurance industry.


Introduction
The insurance market plays a key role in the financial markets and the economy of any country. Insurers provide security to companies and individuals who transfer their risks to them in exchange for the payment of a premium (Rejikumar et al. 2019). At the same time, insurance companies collect the premiums and invest them in the financial markets, thereby contributing to economic development (Akyüz et al. 2020). Thus, the functions of insurance can be summarised as assurance and intermediation.
Assurance is the main service provided by insurers. It comprises risk-pooling and risk-bearing services, as well as the "real" financial service (Cummins and Nini 2002; Cummins and Rubio-Misas 2006; Leverty and Grace 2010). Insurers use the premiums received to pay claims. For this purpose, they incur actuarial and underwriting expenses that, together with the capital reserve for extraordinary claims, allow value added to be created. The real financial service is related to financial advisory services, risk management and loss prevention. Within the intermediation function, insurers obtain funds from their policyholders through premiums and invest them in financial markets in order to guarantee the payment of claims when they occur. This service is especially relevant in life insurance, where the collection of the premiums and the payment of the claim are normally separated in time. Furthermore, this type of insurance guarantees the policyholder an additional interest.
The competitiveness of insurance companies relies on their ability to maximise profits and improve the quality of their services. That is why the efficiency of insurance companies has long been studied in the literature (see, e.g., Eling and Luhnen 2010; Weiss 2013 for reviews). Efficiency measures the ability of companies to obtain the maximum output from a given amount of input or, conversely, to use the least amount of input to achieve a given level of output. The main models used to study efficiency are the non-parametric Data Envelopment Analysis (DEA) and the parametric Stochastic Frontier Analysis (SFA). Both models, in their different variants, have been applied to the insurance market and allow companies to be ranked according to their level of efficiency, so that each company knows its position relative to its competitors. However, although efficiency and profitability are usually related, considering efficiency alone has the weakness that it guarantees neither profitability (Ventakateswarlu et al. 2016) nor financial strength (Eckles and Pottier 2011).
Meanwhile, the profitability of a company reflects its ability to make its income exceed its expenses, and it can be measured through financial ratios, which are widely used in the literature. Profitability is related to financial strength, given that better financial health is interpreted as a greater likelihood that the company will meet all its obligations, and therefore as a lower risk of insolvency and a greater likelihood of profit (Eckles and Pottier 2011). Analysing different financial ratios to rank companies by performance allows different criteria, such as profitability, liquidity and solvency, to be introduced. Therefore, in order to find the best non-life insurance companies from a pool, it is necessary to consider multiple, usually conflicting, criteria. Thus, a company […] Labib 2011 and Zhang et al. 2021). In contrast to the aforementioned approaches, most prioritisation methods are based on minimising the distance between the empirical pairwise priority ratio and the corresponding theoretical pairwise priority ratio. Zhang et al. (2021) review the best-known methods and propose a novel methodology involving both additive and multiplicative deviation relations. Their approach introduces two norms, giving rise to four conic programming models.
In our paper, the first phase of the proposed hybrid MCDM methodology is carried out by an extension of the prioritisation method used in AHP. We work with the linear scale proposed by Saaty (1977), and interval values are used to represent the pairwise comparisons. We extend the Best-Worst (BW) method (Rezaei 2015, 2016; Lahri et al. 2021) to construct the incomplete interval comparison matrix. This new methodology, introduced as Extended Best-Worst (EBW), is based on a modification of Mikhailov's approach (2004) for obtaining the priority vector. It consists of solving a max-min problem that yields both the optimal weights and a measure of the consistency between the proposed ratios and the achieved ratios. The second phase, related to the aggregation process, is addressed by applying the Multiple Reference Point (MRP) method, which performs two tasks. First, the scores of the companies are normalised with respect to several reference points, which the DMs can intervene to set. Second, several aggregation procedures based on fully compensatory, non-compensatory and partially compensatory operators yield a single global score for the joint performance of each company. These scores share the same interpretation as the scale defined by the reference points.
The methodological contribution of this study, the Extended Best-Worst Multiple Reference Point (EBW-MRP) method, provides new tools for analysing the global performance of insurance companies. We see the EBW method as a foundation for prioritising criteria in a way that is easy for DMs to understand. In addition, the MRP method helps to rank the alternatives and identify the best ones according to expert knowledge.
We apply our methodology to the Spanish non-life insurance market. Since the 1980s, the Spanish insurance market has undergone a process of restructuring and consolidation, whose consequences have been studied by, for example, Cummins and Rubio-Misas (2006) and Cummins et al. (2017). We have chosen the Spanish insurance market because, despite the strong impact of the financial crisis in Spain, the non-life insurance sector managed to grow by more than 6.5% over the period 2007-2017 (Mapfre 2018). This makes the financial assessment of companies operating in the Spanish non-life insurance market of particular interest. Between 80 and 83 insurers are considered in each year of the period from 2009 to 2017. The criteria were chosen based on the academic literature and on the data published annually by the Dirección General de Seguros y Fondos de Pensiones (DGSFP).
The rest of the paper is organised as follows. Section 2 presents the proposed methodology, an Extended Best-Worst Multiple Reference Point method. Section 3 is devoted to the empirical application; our database consists of Spanish non-life insurance companies evaluated with respect to eight criteria. The main results of the application are discussed in Sect. 4, where nine rankings are obtained that allow an assessment of the period analysed. Finally, conclusions are presented in Sect. 5.

Extended best-worst multiple reference point method
Throughout this section, it is assumed that we are managing a set of $M$ units, $U_1, U_2, \dots, U_M$, and $N$ criteria, $C_1, C_2, \dots, C_N$, which play an important role in evaluating the units in order to arrive at a decision. To do this, we formulate an Extended Best-Worst Multiple Reference Point (EBW-MRP) method that considers the most important criterion (best) and the least important criterion (worst) for the DM. The procedure is divided into two phases, described in detail below. In the first phase, a novel BW method is proposed to calculate the criterion weights, considering that the expert cannot express the relative preference of the i-th criterion over the j-th criterion precisely, since their judgement is imprecise. In the second phase, these weights are incorporated into an MRP model to obtain the global scores of the units, from which a ranking can be obtained (see Fig. 1).

Phase 1. Extended Best-Worst (EBW) method
In the classic BW model (see Table 1), it is assumed that the preference value of the i-th criterion over the j-th criterion determined by experts is accurate. However, in some uncertain or imprecise situations, the DM may not be able to provide exact point judgements and instead expresses her/his preferences through linguistic labels, such as 'approximately a' or 'about a', rather than exact numerical values a. Fuzzy logic tools (Zadeh 1965; Bellman and Zadeh 1970) may be useful for mechanising or formalising this kind of human reasoning. Several studies have addressed vague and uncertain information in the BW method. For example, Mou et al. (2016, 2017) propose an intuitionistic fuzzy multiplicative BW model for multi-criteria group decision making and use graph theory to find the best and worst criteria; Guo and Zhao (2017) extend the BW method to a fuzzy environment and describe the reference comparisons for the best criterion and for the worst criterion by triangular fuzzy numbers; Aboutorab et al. (2018) address the uncertainty of real-world decisions with Z-numbers, which they integrate into a BW method to obtain triangular fuzzy weights; Pamučar et al. (2018) present a new BW approach in which the treatment of uncertainty is based on interval-valued fuzzy-rough numbers; Ren (2018) develops an interval BW method for determining the interval weights of the evaluation criteria and the relative performances of the alternatives with respect to the soft criteria; Ali and Rashid (2019) investigate a BW method in which the uncertain evaluation values are represented by hesitant linguistic term sets; Mi and Liao (2019) implement the weight-determining process through a hesitant fuzzy BW model; and Fei et al. (2020) propose an evidential BW model based on the theory of belief functions, which is employed to evaluate hospital service quality.

Fig. 1 Extended best-worst multiple reference point method
In this paper, the uncertain/imprecise judgements are represented by interval ratios (Mikhailov 2003, 2004; Arbel and Vargas 2007; Wang and Elhag 2007; Yue 2012; Ahn 2017; Ren 2018; Acuña-Soto et al. 2019). A novel weighting problem is then formulated and solved as a max-min mathematical programming problem to obtain an optimal crisp weighting vector that maximises the overall degree of satisfaction of the DM.
To derive priorities from uncertain judgements, Saaty and Vargas (1987) construct an interval reciprocal comparison matrix of the type

$$A = \begin{pmatrix} 1 & [l_{12}, u_{12}] & \cdots & [l_{1N}, u_{1N}] \\ [l_{21}, u_{21}] & 1 & \cdots & [l_{2N}, u_{2N}] \\ \vdots & \vdots & \ddots & \vdots \\ [l_{N1}, u_{N1}] & [l_{N2}, u_{N2}] & \cdots & 1 \end{pmatrix} \qquad (1)$$

where $a_{ij} = [l_{ij}, u_{ij}]$ is an interval judgement representing the relative preference of criterion i over criterion j, and $l_{ij}$ and $u_{ij}$ are the lower and upper bounds of the interval. The bounds are assumed to lie between 1/9 and 9 inclusive (Saaty 1977). Taking into account that $a_{ij} = a_{ji}^{-1}$ and the operations on closed intervals, we have

$$a_{ij} = [l_{ij}, u_{ij}] = \left[\frac{1}{u_{ji}}, \frac{1}{l_{ji}}\right] \qquad (2)$$

Table 1 Best-worst method (Rezaei 2015)
Step 1. Identify a set of decision criteria, $C_1, C_2, \dots, C_N$.
Step 2. Determine the best and worst decision criteria, $C_B$ and $C_W$, for the DM.
Step 3. Determine the DM's preference degree of the best criterion over all the other criteria: $A_B = (a_{B1}, a_{B2}, \dots, a_{BN})$.
Step 4. Determine the DM's preference degree of all the criteria over the worst criterion: $A_W = (a_{1W}, a_{2W}, \dots, a_{NW})^T$.
Step 5. Find the optimal criteria weights $w^*_1, w^*_2, \dots, w^*_N$.

Definition 1 An interval pairwise comparison $a_{ij}$ is defined as a reference comparison if i is the best criterion and/or j is the worst criterion.

According to Rezaei (2015), not all of the $N(N-1)/2$ interval pairwise comparisons are needed to obtain the complete interval comparison matrix; it is sufficient to determine the $2N - 3$ reference comparisons. This is the basic principle of the interval weighting method formulated below.

The steps of the EBW method used to obtain the weights of the N criteria, $w^*_1, w^*_2, \dots, w^*_N$, are as follows.

Step 1. Determine the best criterion, $C_B$, and the worst criterion, $C_W$.

Step 2. Determine the preference of the best criterion over all the other criteria by means of interval judgements. The resulting best-to-others interval vector is

$$A_B = \left([l_{B1}, u_{B1}], [l_{B2}, u_{B2}], \dots, [l_{BN}, u_{BN}]\right) \qquad (3)$$

where $a_{Bj} = [l_{Bj}, u_{Bj}]$ indicates the interval preference of the best criterion $C_B$ over criterion j.

Step 3. Determine the preference of all the criteria over the worst criterion by means of interval judgements. The resulting others-to-worst interval vector is

$$A_W = \left([l_{1W}, u_{1W}], [l_{2W}, u_{2W}], \dots, [l_{NW}, u_{NW}]\right)^T \qquad (4)$$

where $a_{jW} = [l_{jW}, u_{jW}]$ indicates the interval preference of criterion j over the worst criterion $C_W$.

Step 4. Find the optimal crisp weight vector $w^* = (w^*_1, w^*_2, \dots, w^*_N)$. In this work, a crisp priority vector $w = (w_1, w_2, \dots, w_N)$ is admissible with respect to the best-to-others interval vector, $A_B$, and the others-to-worst interval vector, $A_W$, if it verifies

$$l_{Bj} \;\tilde{\le}\; \frac{w_B}{w_j} \;\tilde{\le}\; u_{Bj}, \qquad j = 1, \dots, N \qquad (5)$$

$$l_{jW} \;\tilde{\le}\; \frac{w_j}{w_W} \;\tilde{\le}\; u_{jW}, \qquad j = 1, \dots, N \qquad (6)$$

where $\tilde{\le}$ denotes the fuzzified version of $\le$ and reads 'approximately less than or equal to'.

The weights $w^*_1, w^*_2, \dots, w^*_N$ are optimal if they satisfy the fuzzy inequalities (5) and (6) with the highest degree of membership. Therefore, in order to find the optimal weights, a fuzzy nonlinear problem must be solved.
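The reciprocity rule (2), the count of reference comparisons behind Definition 1 and a crisp (non-fuzzy) reading of the admissibility conditions (5)-(6) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the interval judgements, the helper names and the choice of criterion 0 as best and criterion 2 as worst are all hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower bound l, upper bound u)


def reciprocal(a: Interval) -> Interval:
    """Eq. (2): if a_ij = [l_ij, u_ij], then a_ji = [1/u_ij, 1/l_ij]."""
    l, u = a
    return (1.0 / u, 1.0 / l)


def n_reference_comparisons(n: int) -> int:
    """Non-trivial reference comparisons: (N-1) best-to-others + (N-1) others-to-worst,
    minus 1 because a_BW belongs to both vectors."""
    return 2 * n - 3


def is_strictly_admissible(w: List[float], best: int, worst: int,
                           a_b: List[Interval], a_w: List[Interval]) -> bool:
    """Crisp version of (5)-(6): every ratio w_B/w_j and w_j/w_W lies inside its interval."""
    for j in range(len(w)):
        l_bj, u_bj = a_b[j]
        l_jw, u_jw = a_w[j]
        if not l_bj <= w[best] / w[j] <= u_bj:
            return False
        if not l_jw <= w[j] / w[worst] <= u_jw:
            return False
    return True


# Hypothetical judgements for N = 3 criteria (best = 0, worst = 2).
A_B = [(1.0, 1.0), (2.0, 3.0), (4.0, 6.0)]   # best-to-others vector
A_W = [(4.0, 6.0), (2.0, 3.0), (1.0, 1.0)]   # others-to-worst vector

print(reciprocal((2.0, 4.0)))                                      # (0.25, 0.5)
print(n_reference_comparisons(8))                                  # 13
print(is_strictly_admissible([0.63, 0.25, 0.12], 0, 2, A_B, A_W))  # True
print(is_strictly_admissible([0.60, 0.25, 0.15], 0, 2, A_B, A_W))  # False
```

The second vector fails because $w_1/w_2 \approx 1.67$ falls below the hypothetical interval $[2, 3]$; the fuzzy formulation exists precisely to grade such near misses instead of rejecting them outright.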
In order to linearise the fuzzy problem, (5) and (6) are transformed into the following fuzzy linear inequalities:

$$l_{Bj}\, w_j - w_B \;\tilde{\le}\; 0, \qquad w_B - u_{Bj}\, w_j \;\tilde{\le}\; 0, \qquad j = 1, \dots, N \qquad (7)$$

$$l_{jW}\, w_W - w_j \;\tilde{\le}\; 0, \qquad w_j - u_{jW}\, w_W \;\tilde{\le}\; 0, \qquad j = 1, \dots, N \qquad (8)$$

The range of approximate satisfaction of (7) and (8) can be defined by the extended intervals $[0, d^l_{Bj}]$, $[0, d^u_{Bj}]$, $[0, d^l_{jW}]$ and $[0, d^u_{jW}]$, where $d^l_{Bj}$, $d^u_{Bj}$, $d^l_{jW}$ and $d^u_{jW}$ are tolerance parameters for the corresponding intervals.¹ As the two constraints in (7) relate to the same interval, we can represent them by a single linear satisfaction function $\mu_{Bj}(w)$ (Eq. (9); see Fig. 2), built from the lower and upper bounds, which expresses the DM's satisfaction with its accomplishment (Mikhailov 2004; Chen and Xu 2015). In (9), $m_{Bj}$ is the extended middle of the interval $[l_{Bj}, u_{Bj}]$, where the greatest satisfaction is achieved: the function reaches its maximum when $w_B/w_j = m_{Bj}$, in which case the DM should be 'most satisfied'. Otherwise, the DM is 'satisfied' when $l_{Bj} < w_B/w_j < u_{Bj}$, and 'partially satisfied' when $w_B/w_j$ takes a value within the admissible extended interval but outside $[l_{Bj}, u_{Bj}]$.

We have a similar function, $\mu_{jW}(w)$ (Eq. (10)), for the two constraints in (8). Function (10) reaches its maximum when $w_j/w_W = m_{jW}$,² in which case the DM should be 'most satisfied'. Otherwise, the DM is 'satisfied' when $l_{jW} < w_j/w_W < u_{jW}$, and 'partially satisfied' when $w_j/w_W$ takes a value within the admissible extended interval but outside $[l_{jW}, u_{jW}]$.

Taking into account the above satisfaction functions, we can solve the weighting problem by applying Zimmermann's fuzzy programming approach (1976). To do this, the feasible set of weighting vectors is defined as

$$Q = \left\{ w = (w_1, \dots, w_N) : \sum_{j=1}^{N} w_j = 1,\; w_j > 0,\; j = 1, \dots, N \right\} \qquad (11)$$

and we consider the fuzzy subset P, whose membership function $\mu_P(w)$ is the intersection of the satisfaction functions defined in (9) and (10):

$$\mu_P(w) = \min_{j = 1, \dots, N} \left\{ \mu_{Bj}(w),\; \mu_{jW}(w) \right\} \qquad (12)$$

This membership function represents the overall satisfaction of the DM with a specific weighting vector w.
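The linearised constraints (7)-(8) and the idea of an overall satisfaction obtained as an intersection (minimum) of memberships can be illustrated with a short sketch. It does not reproduce the satisfaction functions (9) and (10), which peak at the extended middle of each interval; instead it assumes Mikhailov's simpler linear membership $1 - R_k w / d$ for each constraint $R_k w \,\tilde{\le}\, 0$, with one common tolerance $d$. That membership equals 1 on the interval bounds and 0 at the end of the extended interval, matching the satisfaction levels described above. The judgements and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def linearised_lhs(w: Sequence[float], best: int, worst: int,
                   a_b: List[Interval], a_w: List[Interval]) -> List[float]:
    """Left-hand sides R_k w of the fuzzy linear inequalities (7)-(8), each meant to be <~ 0."""
    lhs: List[float] = []
    for j, (l, u) in enumerate(a_b):
        if j != best:                          # a_BB = [1, 1] adds nothing
            lhs.append(l * w[j] - w[best])     # l_Bj w_j - w_B <~ 0
            lhs.append(w[best] - u * w[j])     # w_B - u_Bj w_j <~ 0
    for j, (l, u) in enumerate(a_w):
        if j != worst:                         # a_WW = [1, 1] adds nothing
            lhs.append(l * w[worst] - w[j])    # l_jW w_W - w_j <~ 0
            lhs.append(w[j] - u * w[worst])    # w_j - u_jW w_W <~ 0
    return lhs


def overall_satisfaction(w: Sequence[float], best: int, worst: int,
                         a_b: List[Interval], a_w: List[Interval], d: float = 0.1) -> float:
    """Minimum over the assumed linear memberships 1 - R_k w / d.

    >= 1: every ratio inside its strict interval; in [0, 1): inside the extended
    intervals only; < 0: some ratio outside its extended interval.
    """
    return min(1.0 - r / d for r in linearised_lhs(w, best, worst, a_b, a_w))


A_B = [(1.0, 1.0), (2.0, 3.0), (4.0, 6.0)]   # hypothetical, best = criterion 0
A_W = [(4.0, 6.0), (2.0, 3.0), (1.0, 1.0)]   # hypothetical, worst = criterion 2
print(round(overall_satisfaction([0.63, 0.25, 0.12], 0, 2, A_B, A_W), 3))  # 1.1
print(round(overall_satisfaction([0.60, 0.25, 0.15], 0, 2, A_B, A_W), 3))  # 0.5
```

Because every membership is linear in w, the minimum is concave and piecewise linear, which is what allows the max-min problem to be written as a linear program.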

Definition 2
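A vector $w^* = (w^*_1, w^*_2, \dots, w^*_N)$ is an optimal weighting vector with respect to the best-to-others interval vector, $A_B$, and the others-to-worst interval vector, $A_W$, if it maximises the overall degree of satisfaction $\mu_P(w)$, the membership function of the fuzzy subset P, i.e.,

$$\mu_P(w^*) = \max \left\{ \mu_P(w) : \sum_{j=1}^{N} w_j = 1,\; w_j > 0 \right\} \qquad (13)$$

As is well known, this problem is equivalent to solving the following linear program, where $\mu_{Bj}$ and $\mu_{jW}$ denote the satisfaction functions (9) and (10):

$$\begin{aligned} \max_{w,\, \lambda}\;& \lambda \\ \text{s.t.}\;& \lambda \le \mu_{Bj}(w), \quad \lambda \le \mu_{jW}(w), \quad j = 1, \dots, N, \\ & \sum_{j=1}^{N} w_j = 1, \quad w_j > 0, \quad j = 1, \dots, N. \end{aligned} \qquad (14)$$

The optimal solution of (14) is a vector $(w^*, \lambda^*)$ whose first component is the weighting vector that maximises the degree of membership of the aggregated function $\mu_P(w)$, whereas $\lambda^*$ measures the degree of overall satisfaction of the DM with the optimal solution $w^*$.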
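For a three-criteria toy problem, the max-min problem behind Definition 2 and the linear program (14) can be approximated by brute force, which makes the roles of $w^*$ and $\lambda^*$ easy to see. This sketch again assumes Mikhailov-style linear memberships $1 - R_k w / d$ rather than the satisfaction functions (9)-(10), and it replaces the LP solver by a grid scan over strictly positive weights that sum to 1; the judgements, the tolerance d = 0.1 and the grid size are hypothetical. In practice one would pass the linear constraints $\lambda d + R_k w \le d$ to an LP solver instead.

```python
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[float, float]


def satisfaction(w: Sequence[float], best: int, worst: int,
                 a_b: List[Interval], a_w: List[Interval], d: float) -> float:
    """Overall satisfaction: min over assumed memberships 1 - R_k w / d."""
    lhs: List[float] = []
    for j, (l, u) in enumerate(a_b):
        if j != best:
            lhs += [l * w[j] - w[best], w[best] - u * w[j]]      # constraints (7)
    for j, (l, u) in enumerate(a_w):
        if j != worst:
            lhs += [l * w[worst] - w[j], w[j] - u * w[worst]]    # constraints (8)
    return min(1.0 - r / d for r in lhs)


def max_min_grid(best: int, worst: int, a_b: List[Interval], a_w: List[Interval],
                 d: float = 0.1, steps: int = 200) -> Tuple[Optional[Tuple[float, ...]], float]:
    """Brute-force stand-in for the linear program (14) when N = 3.

    Scans strictly positive weight vectors summing to 1 on a grid of spacing 1/steps
    and keeps the one with the largest overall satisfaction.
    """
    best_w, best_lam = None, float("-inf")
    for i in range(1, steps):
        for k in range(1, steps - i):
            w = (i / steps, k / steps, (steps - i - k) / steps)
            lam = satisfaction(w, best, worst, a_b, a_w, d)
            if lam > best_lam:
                best_w, best_lam = w, lam
    return best_w, best_lam


# Hypothetical judgements: criterion 0 is best, criterion 2 is worst.
A_B = [(1.0, 1.0), (2.0, 3.0), (4.0, 6.0)]
A_W = [(4.0, 6.0), (2.0, 3.0), (1.0, 1.0)]
w_star, lam_star = max_min_grid(0, 2, A_B, A_W)
print(w_star, round(lam_star, 3))
print("strongly inconsistent" if lam_star < 0 else "not strongly inconsistent")  # Definition 3
```

A negative optimum would flag strongly inconsistent judgements in the sense of Definition 3; here the hypothetical intervals are mutually compatible, so the optimum exceeds 1.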
The optimal value $\lambda^*$ is also an indicator for measuring the consistency of the DM judgements and, for a given set of interval judgements, depends on the values of the tolerance parameters.

Definition 3
The DM interval judgements are strongly inconsistent if the optimal value $\lambda^*$ of (14) is negative. In this case, at least one of the solution ratios lies outside its extended interval.
The following proposition shows how to avoid strong inconsistency in the DM interval judgements.
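Proposition 1 (Mikhailov 2004). The solution of (14) satisfies $\lambda^* \ge 0$ if all the deviation parameters $d^l_{Bj}$, $d^u_{Bj}$, $d^l_{jW}$, $d^u_{jW}$ are greater than or equal to $d^*$, where $d^*$ is the solution of the following linear problem:

$$\begin{aligned} d^* = \min_{w,\, d}\;& d \\ \text{s.t.}\;& l_{Bj}\, w_j - w_B \le d, \quad w_B - u_{Bj}\, w_j \le d, \quad j = 1, \dots, N, \\ & l_{jW}\, w_W - w_j \le d, \quad w_j - u_{jW}\, w_W \le d, \quad j = 1, \dots, N, \\ & \sum_{j=1}^{N} w_j = 1, \quad w_j \ge 0, \quad j = 1, \dots, N. \end{aligned} \qquad (15)$$

The weights obtained in Phase 1 are incorporated into a Multiple Reference Point model (…; García-Bernabeu et al. 2021; Cabello et al. 2021; Boggia et al. 2022) for ranking the units and choosing the best one.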

Phase 2. Multiple Reference Point to calculate the global scores of the units
Let us denote by $x_{ij}$ the value of the i-th unit for the j-th criterion.
Step 1. Setting the reference levels. Let us denote by $q^0_j$ and $q^{n+1}_j$, respectively, the minimum and maximum values that criterion j can take. For each criterion j, the DM gives n reference levels, $q^1_j, q^2_j, \dots, q^n_j$, which define the performance levels of criterion j (e.g., very low, low, medium, high, very high, or very poor, poor, fairly good, very good). Wierzbicki et al. (2000) mention several ways of establishing these reference levels. They can be defined in an absolute way by experts, in a relative way by applying a statistical scheme, or by setting all the reference levels equal to certain percentages of their respective criteria ranges.
Therefore, the (n + 2)-dimensional vector

$$q_j = \left(q^0_j, q^1_j, \dots, q^n_j, q^{n+1}_j\right) \qquad (16)$$

contains all the information relating to the reference levels of criterion j. These reference levels naturally define performance levels for each criterion, in absolute or relative terms, and the corresponding distance function measures the position of each unit with respect to these levels.
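One of the options mentioned above, setting the reference levels equal to fixed percentages of each criterion's range, can be sketched as follows. The assumptions are labelled in the code: the end points $q^0_j$ and $q^{n+1}_j$ are taken as the observed minimum and maximum, and the default shares 25%, 50% and 75% (giving n = 3) are a hypothetical choice, not values from the paper.

```python
from typing import List, Sequence


def reference_vector_from_range(values: Sequence[float],
                                shares: Sequence[float] = (0.25, 0.5, 0.75)) -> List[float]:
    """Build q_j = (q^0, q^1, ..., q^n, q^{n+1}) from percentages of the range.

    Assumption: q^0 and q^{n+1} are the observed minimum and maximum of the criterion,
    and each inner level q^k sits at a fixed share of [q^0, q^{n+1}].
    """
    lo, hi = min(values), max(values)
    if not all(0.0 < s < 1.0 for s in shares) or list(shares) != sorted(set(shares)):
        raise ValueError("shares must be strictly increasing and lie in (0, 1)")
    return [lo] + [lo + s * (hi - lo) for s in shares] + [hi]


print(reference_vector_from_range([0.0, 10.0, 4.0, 8.0]))  # [0.0, 2.5, 5.0, 7.5, 10.0]
```

If all observed values coincide, the levels collapse onto one point; the achievement functions that follow need strictly increasing levels, so such a criterion would have to be handled separately.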
We assume a set of n + 2 real values $\alpha_0 < \alpha_1 < \dots < \alpha_n < \alpha_{n+1}$, which define a common measurement scale for all the criteria. A piecewise linear achievement function is used to map each criterion j onto the scale defined by the values $\alpha_k$.
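If criterion j is of the type "the more, the better", we consider

$$s_j(x_{ij}, q_j) = \alpha_{k-1} + \left(\alpha_k - \alpha_{k-1}\right) \frac{x_{ij} - q^{k-1}_j}{q^k_j - q^{k-1}_j}, \quad \text{if } q^{k-1}_j \le x_{ij} \le q^k_j, \; k = 1, \dots, n+1 \qquad (17)$$

In this case, $s_j(x_{ij}, q_j)$ is an increasing piecewise linear function. Therefore, the achievement function $s_j$ of criterion j takes values between $\alpha_{k-1}$ and $\alpha_k$ if the unit achieves values between $q^{k-1}_j$ and $q^k_j$ for criterion j. If criterion j is of the type "the lower, the better", we consider

$$s_j(x_{ij}, q_j) = \alpha_{n+2-k} - \left(\alpha_{n+2-k} - \alpha_{n+1-k}\right) \frac{x_{ij} - q^{k-1}_j}{q^k_j - q^{k-1}_j}, \quad \text{if } q^{k-1}_j \le x_{ij} \le q^k_j, \; k = 1, \dots, n+1 \qquad (18)$$

In this case, $s_j(x_{ij}, q_j)$ is a decreasing piecewise linear function. Therefore, the achievement function $s_j$ of criterion j takes values between $\alpha_{n+1-k}$ and $\alpha_{n+2-k}$ if the unit achieves values between $q^{k-1}_j$ and $q^k_j$ for criterion j.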
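The achievement functions (17) and (18) are simple piecewise linear interpolations between the reference levels and the common scale, as the following sketch shows. The scale values are called `alpha` here, which is an assumed name for the scale symbols; the reference vector and scale in the usage lines are hypothetical. Values outside $[q^0_j, q^{n+1}_j]$ are clamped as a defensive choice, since by construction they should not occur.

```python
from bisect import bisect_right
from typing import Sequence


def achievement(x: float, q: Sequence[float], alpha: Sequence[float],
                more_is_better: bool = True) -> float:
    """Piecewise linear achievement s_j(x, q_j) following (17) and (18).

    q     -- reference vector (q^0, ..., q^{n+1}), strictly increasing
    alpha -- common scale (alpha_0, ..., alpha_{n+1}), strictly increasing
    """
    if len(q) != len(alpha) or len(q) < 2:
        raise ValueError("q and alpha must both have n + 2 >= 2 entries")
    m = len(q) - 1                           # m = n + 1
    x = min(max(x, q[0]), q[m])              # defensive clamp to [q^0, q^{n+1}]
    k = min(max(bisect_right(q, x), 1), m)   # segment k with q^{k-1} <= x <= q^k
    t = (x - q[k - 1]) / (q[k] - q[k - 1])   # relative position inside segment k
    if more_is_better:                       # Eq. (17)
        return alpha[k - 1] + t * (alpha[k] - alpha[k - 1])
    hi, lo = alpha[m + 1 - k], alpha[m - k]  # alpha_{n+2-k}, alpha_{n+1-k}, Eq. (18)
    return hi - t * (hi - lo)


q = [0.0, 1.0, 2.0, 3.0, 4.0]      # hypothetical reference vector, n = 3
alpha = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical scale alpha_0..alpha_4
print(achievement(2.5, q, alpha))         # 2.5
print(achievement(2.5, q, alpha, False))  # 1.5
```

With a "the lower, the better" criterion the same raw value maps to the mirror image on the scale, which is why the two calls above sum to $\alpha_0 + \alpha_4 = 4$.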
Considering the weights obtained in Phase 1, $w^* = (w^*_1, w^*_2, \dots, w^*_N)$, and the achievement functions (17) and (18), we can build the global score of each unit. At this stage, different types of composite measures can be obtained.
The weak composite measure of unit i, $WS_i$, which allows full compensation among the criteria, is obtained using an additive weighted aggregation:

$$WS_i = \sum_{j=1}^{N} w^*_j\, s_j(x_{ij}, q_j) \qquad (19)$$

which can easily be interpreted as the composite performance of unit i with respect to the reference levels $q_j$.
The strong composite measure of unit i, $SS_i$, does not allow any compensation. In this case, following Ruiz et al. (2020), the achievement functions (17) and (18) can be modified so that poor performance on a given criterion is penalised less if the criterion is not very important to the DM. To this end, the weights are normalised so that the highest weight takes the value 1:

$$\bar{w}_j = \frac{w^*_j}{\max_{k = 1, \dots, N} w^*_k}, \qquad j = 1, \dots, N \qquad (20)$$

Then, the modified achievement function $\bar{s}_j(x_{ij}, q_j)$ (21) is built from $s_j$ and the normalised weight $\bar{w}_j$. Making use of (21), the strong composite measure can be built as

$$SS_i = \min_{j = 1, \dots, N} \bar{s}_j(x_{ij}, q_j) \qquad (22)$$

Finally, we propose building new composite measures of unit i, $PS_i$, for different degrees of compensation, by considering the following convex linear combination of the weak and strong composite measures:

$$PS_i = \gamma_i\, WS_i + \left(1 - \gamma_i\right) SS_i, \qquad 0 \le \gamma_i \le 1 \qquad (23)$$

The coefficient $\gamma_i$ associated with unit i is obtained as follows:

Step 1: Set a threshold $0 < th \le 1$.
Step 2: Rank the criteria in decreasing order of their weights: $w^*_{p(1)} \ge w^*_{p(2)} \ge \dots \ge w^*_{p(N)}$.
Step 3: Identify the minimum set of "important" criteria, $C_{p(1)}, C_{p(2)}, \dots, C_{p(H)}$, such that the sum of their weights is greater than or equal to the threshold: $\sum_{h=1}^{H} w^*_{p(h)} \ge th$.
Step 4: Calculate the coefficient $\gamma_i$ from the achievements of unit i on the important criteria $C_{p(1)}, \dots, C_{p(H)}$.

Note that $\gamma_i$ is close to 1 when unit i achieves good results for the most important criteria; the weighted mean $WS_i$ is then a suitable composite measure of performance. On the contrary, $\gamma_i$ is close to 0 when unit i achieves poor results for the most important criteria; the compensation allowed by the weighted mean should then be corrected by giving a higher coefficient to the strong measure $SS_i$.
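The aggregation step can be sketched with a few small functions: the weighted sum (19), the weight normalisation in which the highest weight becomes 1, the minimum (22), the selection of important criteria in Steps 2-3 and the convex combination (23). Two things are deliberately left to the caller because their exact formulas are not reproduced in this section: the modified achievements of (21) are passed in already computed, and the coefficient (called `gamma` here, an assumed name) is supplied directly rather than derived from Step 4. Weights and achievements in the usage lines are hypothetical.

```python
from typing import List, Sequence


def weak_score(weights: Sequence[float], s: Sequence[float]) -> float:
    """Eq. (19): WS_i = sum_j w*_j s_j, full compensation between criteria."""
    return sum(w * v for w, v in zip(weights, s))


def normalised_weights(weights: Sequence[float]) -> List[float]:
    """Divide by the largest weight, which therefore becomes 1."""
    top = max(weights)
    return [w / top for w in weights]


def strong_score(modified_s: Sequence[float]) -> float:
    """Eq. (22): no compensation, the worst modified achievement (21) decides."""
    return min(modified_s)


def important_criteria(weights: Sequence[float], th: float) -> List[int]:
    """Steps 2-3: indices of the fewest top-weighted criteria whose weights sum to >= th."""
    if not 0.0 < th <= 1.0:
        raise ValueError("threshold must satisfy 0 < th <= 1")
    order = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    chosen: List[int] = []
    total = 0.0
    for j in order:
        chosen.append(j)
        total += weights[j]
        if total >= th:
            break
    return chosen


def partial_score(ws: float, ss: float, gamma: float) -> float:
    """Eq. (23): convex combination of WS_i and SS_i; gamma is supplied by the caller."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return gamma * ws + (1.0 - gamma) * ss


w = [0.5, 0.3, 0.2]                 # hypothetical Phase 1 weights
s = [4.0, 2.0, 1.0]                 # hypothetical achievements on a 0..4 scale
print(normalised_weights(w))        # [1.0, 0.6, 0.4]
print(important_criteria(w, 0.75))  # [0, 1]
ws = weak_score(w, s)               # about 2.8
print(round(partial_score(ws, 1.0, 0.5), 6))  # 1.9, with SS_i = 1.0 assumed
```

Since the weights sum to 1, a unit that reaches the top of the scale on every criterion gets the top score under all three measures, consistent with the boundary property stated next.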
We observe that $PS_i = \alpha_{n+1}$ if and only if unit i achieves the score $\alpha_{n+1}$ for all criteria, and $PS_i = \alpha_0$ if and only if unit i achieves the score $\alpha_0$ for all criteria.
The value of the global score obtained with the above composite measures depends on the choice of the reference levels $q_j$.
The set of units can now be ranked in descending order of the global scores obtained by (19), (22) or (23).
In the next section we apply the proposed Extended Best-Worst Multiple Reference Point (EBW-MRP) method for ranking non-life insurance companies operating in Spain.

Case study: Spanish non-life insurance companies
In 2017, the Spanish insurance market, with total direct premiums of €62,451 million ($70,547 million), ranked 15th worldwide, ahead of other European countries such as Switzerland, Sweden, Belgium, Finland and Portugal, among others.³ Despite this, and in line with other international insurance markets, the recent crisis had a significant impact on the Spanish insurance market. Nevertheless, the Spanish sector has proved robust, managing to overcome this economic cycle in a solid way (Manzano 2017). This is reflected in the evolution of its financial ratios in recent years. Specifically, the Non-Life insurance market saw accumulated growth of 6.8% over the period 2007-2017, while Spanish GDP grew by 7.7% over the same period (Mapfre 2018). This economic growth is linked to an increase in the consumption capacity of households and companies, which contributes to the growth of the Non-Life insurance business.

Database
The present research studies the performance of Non-Life Spanish insurers over the period 2009-2017 (see Table A1 in the Appendix). The data have been collected from the Balance Sheets and Accounts of insurance entities that the Spanish regulatory and supervisory authority, the Dirección General de Seguros y Fondos de Pensiones (DGSFP), publishes annually.

Non-life insurance companies
In Spain, insurance companies can operate in the life branch, the non-life branch, or both. In this paper, we focus on the firms that operated exclusively in the non-life business between 2009 and 2017. This criterion has the advantage of making the insurers more comparable, but it has the disadvantage of excluding insurers such as Mapfre, Mutua Madrileña and Allianz, among others, which have important market shares in Spain but operate simultaneously in the life and non-life businesses. In Spain, insurers can take two organisational forms: stock companies and mutuals. The literature has long considered this a determining factor in insurers' performance, with stock companies found to be more profitable and to perform better than mutuals (Cummins and Nini 2002; Gaganis et al. 2015). Given that both types coexist among the companies analysed in our study (23 mutuals appear in our database in every year), we also analyse whether our performance ranking supports this hypothesis.

Criteria
Similar to other research in the field (see Table 3), different financial ratios have been calculated to approximate the performance and profitability of insurers, all of which are typically used by international bodies such as the International Association of Insurance Supervisors or the OECD to evaluate the global insurance market.
Premium growth ratio: it reflects the evolution of the yearly premiums. If an insurer's ratio is above the market mean, the insurer is expanding its market share.
Loss ratio: it indicates the percentage of premiums used to pay claims; the smaller, the better. A high loss ratio indicates poor risk management by the insurer and the need for greater control over future payments and over the underwriting process.
Expenses ratio: it reflects the part of the premium used to pay underwriting expenses, including acquisition costs, commissions, and administrative and general management expenses. This ratio reflects an insurer's ability to manage its daily activity. The lower this ratio, the more efficient the insurer is in its management, which enables it to obtain greater profits.
Combined ratio: it is the sum of the loss and expenses ratios. This ratio gives a first approximation of the technical profitability of the insurer. Lower values imply that the entity selects its policyholders well and manages its expenses better. A ratio greater than 1 means that claims and expenses together exceed the premiums, so the insurer is not making a profit from its underwriting activity, either because it has many claims or because its expenses are too high. Insurers must then offset these losses with income from financial investments, which play a less relevant role in non-life insurance than in life insurance.
Return on equity (ROE): this ratio relates profit to shareholders' equity. It is one of the ratios most commonly used in the financial analysis of any firm. High values indicate that the company generates more profit per unit of capital.
Technical ratio: this measures how much profit the insurer makes in relation to its premiums. It indicates the performance of the insurer derived solely from its underwriting activity. The higher the ratio, the greater the insurer's financial strength, as the company generates more profit per unit of premium income, indicating good management and business efficiency.
Number of Insurance Business Lines: within Non-Life Insurance, up to 19 different lines of business can be distinguished (health, dependency, automobile, home …). This variable counts the business lines in which each observed insurer operates and gives a proxy for the insurer's degree of diversification. Table 4 shows the main statistics of the criteria calculated for the Spanish Non-Life Insurance market for the period 2009-2017.⁴ […] (2018) report. The evolution of the variations in market shares confirms this trend given that, in recent years, many companies have improved their market share, probably due to the increase in premiums written linked to the upturn of the Spanish economy.
The analysis of the Combined Ratio shows an efficient and healthy non-life insurance sector, as its mean values remained below 1 over the whole period. The sharp drop in this ratio in 2016 is explained by a significant decrease in claims in that year, also visible in the loss ratio. At the same time, the expenses ratio has increased since 2004, in parallel with the increase in premiums written in the market.
The profitability of non-life insurers can be approximated through their Technical Ratio and ROE. Both measures indicate that Spanish non-life insurers are good managers, generating an average profitability of 10% throughout the period, both on equity (ROE) and in terms of technical activity (Technical Ratio). In 2010, the effects of the economic crisis were felt in the Spanish insurance market, and both ROE and the technical ratio dropped significantly. Nevertheless, profitability levels recovered from 2011 onwards, once again demonstrating the financial strength of the sector. Figure 3 (graphic 8) shows the number of business lines in which non-life insurers operate. This variable is stable over the period, with almost 60% of companies operating in at most three lines. This is probably due to the large number of companies specialised in specific lines such as health, credit and surety, or legal defence.
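The ratio criteria described in this subsection can be summarised in code. This is a sketch of their usual textbook forms, not the DGSFP's exact computation: whether premiums are written or earned, gross or net of reinsurance, and how the technical result is defined are assumptions here, and all figures in the usage line are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class InsurerYear:
    """Hypothetical yearly figures for one insurer, all in the same currency unit."""
    premiums: float           # premiums of the current year
    premiums_prev: float      # premiums of the previous year
    claims: float             # claims incurred
    expenses: float           # acquisition, commission and administrative expenses
    technical_result: float   # result of the underwriting (technical) account
    net_profit: float
    equity: float             # shareholders' equity


def criteria_ratios(f: InsurerYear) -> Dict[str, float]:
    """Textbook forms of the ratio criteria (assumed definitions)."""
    loss = f.claims / f.premiums
    expenses = f.expenses / f.premiums
    return {
        "premium_growth": f.premiums / f.premiums_prev - 1.0,
        "loss_ratio": loss,
        "expenses_ratio": expenses,
        "combined_ratio": loss + expenses,   # > 1 means an underwriting loss
        "roe": f.net_profit / f.equity,
        "technical_ratio": f.technical_result / f.premiums,
    }


r = criteria_ratios(InsurerYear(100.0, 80.0, 60.0, 25.0, 12.0, 10.0, 50.0))
print({k: round(v, 4) for k, v in r.items()})
```

In this toy example, the combined ratio of 0.85 signals an underwriting profit, while a value above 1 would have to be offset by investment income, as discussed for the combined ratio above.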

Extended Best-Worst (EBW) Method to calculate the criteria weights
In order to determine the weight of each criterion, model (14) is solved. To do this, the preference parameters must be set. The importance ranking of the decision criteria is established according to expert opinion and is shown in descending order in the first column of Table 5: the best criterion is ROE and the worst is the number of business lines. The reference interval preference ratios used to solve (14) are displayed in Table 5. The weights obtained by (14) using the deviation parameter d = 0.1⁵ are displayed in the last column of Table 6. The optimal value, $\lambda^* = 0.75$, shows that the obtained ratios lie within the extended intervals but not within the strict intervals. The value used for the deviation parameter d ensures a degree of overall satisfaction greater than zero, because the solution of model (15) gives a lower bound of 0.0254 for d. To visualise the results of the EBW method in our application, Table 6 shows the preference intervals established and the achieved ratios, as well as the left and right spreads corresponding to these intervals.

⁵ Without loss of generality, we suppose that all the tolerance parameters are equal, $d^l_{Bj} = d^u_{Bj} = d^l_{jW} = d^u_{jW} = d$.
On inspection of Table 6, we conclude that in the process of weighting the criteria for ranking Spanish non-life insurance companies, the most important criterion for the expert, ROE, has a weight of 0.279 and the worst, number of business lines, 0.032.

Multiple reference point to calculate the global scores of the non-life insurance companies
Phase 2 of the process requires setting the reference levels. We have used three reference levels (n = 3) for each criterion, established according to both statistical values and expert knowledge. Specifically, the reference levels for Premium Growth, ROE, Technical Ratio and Number of Business Lines are the first three quartiles of the data distributions (see Tables 7, 8, 9 and 10). For the Loss Ratio, Expense Ratio and Combined Ratio, the reference levels were set by the DM according to the values displayed in Table 11.
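The quartile-based reference vectors can be sketched with Python's standard library. The text does not state which quantile interpolation rule was used, so `method="inclusive"` is an assumption (the `statistics` default is `"exclusive"` and can give different cut points on small samples), as is taking the observed minimum and maximum as the end points. The data in the usage line are hypothetical.

```python
import statistics
from typing import List, Sequence


def quartile_reference_vector(values: Sequence[float]) -> List[float]:
    """q_j = (min, Q1, Q2, Q3, max), i.e. n = 3 inner reference levels from quartiles.

    Assumptions: inclusive quantile interpolation, observed min/max as end points.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [min(values), q1, q2, q3, max(values)]


print(quartile_reference_vector([1, 2, 3, 4, 5]))  # [1, 2.0, 3.0, 4.0, 5]
```

The resulting five-element vector plays the role of $(q^0_j, q^1_j, q^2_j, q^3_j, q^4_j)$ in the achievement functions (17) and (18).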

Results
The global scores obtained by applying the proposed EBW-MRP(WS) model to our database, according to the composite measure $WS_i$ defined in (19), are presented in Table A2 of the Appendix. We applied the Jarque-Bera test to the nine series, and normality cannot be rejected at the 5% significance level for any of them. Table 12 summarises the distribution of the EBW-MRP(WS) scores across the four tranches. Few companies achieve good results simultaneously for the eight criteria analysed, and the same holds for companies with poor results in all criteria. This shows the conflict that exists between the different criteria that measure the comprehensive performance of the companies. In every year, the upper intermediate tranche includes more companies than the lower intermediate one. Moreover, in most years there are more companies in the upper tranche than in the lower one. Therefore, we can point to a good performance of the Spanish non-life insurance market during the analysed period. According to Table 13, sixteen firms remain above the mean of the EBW-MRP(WS) scores in every period analysed (from 2009 to 2017), whereas fourteen companies remain below the corresponding means. From Table 13, we conclude that mutual insurers rank worse than stock companies: only one mutual insurer always appears above the mean (out of 16 firms), while 7 of the 14 insurers that are always below the mean are mutuals. The null hypothesis of equal proportions is rejected at the 5% significance level in favour of the alternative that the proportion of mutual insurers above the score mean is lower than that of stock companies (p-value = 0.03408). Likewise, it is rejected in favour of the alternative that the proportion of mutual insurers always below the score mean is greater than that of stock companies (p-value = 0.04306).
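The two kinds of tests used in this paragraph can be illustrated with standard-library Python. The Jarque-Bera statistic is computed from sample skewness and kurtosis, and its asymptotic chi-square(2) p-value has the closed form $e^{-JB/2}$. For the proportion comparisons, the text does not say which test or which group sizes produced the reported p-values, so the one-sided pooled two-proportion z-test below is only one plausible choice, shown with hypothetical counts; it is not claimed to reproduce 0.03408 or 0.04306, and with small groups an exact test may be more appropriate.

```python
import math
from typing import Sequence, Tuple


def jarque_bera(x: Sequence[float]) -> Tuple[float, float]:
    """Jarque-Bera statistic and its asymptotic chi-square(2) p-value."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)  # chi-square(2) survival function is exp(-x/2)


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled z-test of H0: p1 = p2 against the one-sided H1: p1 < p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        raise ValueError("test undefined when all or none of the units are successes")
    z = (p1 - p2) / se
    return z, 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal CDF at z


jb, p_jb = jarque_bera([1, 2, 3, 4, 5])   # hypothetical series
z, p_z = two_proportion_z(1, 10, 5, 10)   # hypothetical counts
print(round(jb, 4), round(p_jb, 4))
print(round(z, 3), round(p_z, 4))
```

A large Jarque-Bera p-value means normality cannot be rejected, as reported for the nine score series; a small one-sided p-value from the proportion test supports the alternative that the first group's proportion is lower.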
This means that, according to the EBW-MRP(WS) score, stock firms perform better than mutual insurers, confirming the results of previous research (Cummins and Nini 2002; Gagarnis et al. 2015). Table A3 in the Appendix displays the top 10 insurers for each year according to our scoring. For the years …, 2014 and 2015, the hypothesis that the proportion of mutual insurers in the top ten is lower than the proportion of stock companies can be accepted at the 5% significance level. We focus on the 7 insurers that appear in this top 10 at least 5 times; their main financial ratios are shown in Fig. 4. EXPERTIA is undoubtedly the best firm over the whole observed period, ranking first in 5 years. Surprisingly, it is not one of the largest insurers: it represented only 0.015% of the market share in 2017, because it operates in only two business lines (assistance and death insurance). Nevertheless, being a smaller insurer does not impair efficiency, profitability or overall performance. Looking at the financial health of EXPERTIA, we observe a marked decrease in the Combined Ratio over the whole period, due to an improvement in both the loss and the expense ratios. This reveals an improvement in the insurer's ability to manage risk and its day-to-day activity, which can also be seen in the positive evolution of its ROE.
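The two kinds of tests mentioned above can be sketched with the Python standard library. This is a minimal illustration, not the authors' procedure: the paper reports p-values but not the test statistics, so the one-sided pooled two-proportion z-test below is an assumption about how the proportion hypotheses could be tested (an exact test, such as Fisher's, would be an alternative), and the counts in the usage line are hypothetical. The Jarque-Bera p-value uses the chi-square distribution with 2 degrees of freedom, whose survival function is exp(-x/2).

```python
import math
from statistics import NormalDist


def jarque_bera(xs):
    """Jarque-Bera normality test; returns (statistic, asymptotic p-value)."""
    n = len(xs)
    mean = sum(xs) / n
    # Population (biased) central moments, as in the classical JB statistic.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    if m2 == 0:
        raise ValueError("Jarque-Bera test undefined for constant data")
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # Chi-square with 2 degrees of freedom: P(X > jb) = exp(-jb / 2).
    return jb, math.exp(-jb / 2)


def two_proportion_ztest_less(x1, n1, x2, n2):
    """One-sided pooled z-test of H1: p1 < p2; returns (z, p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("z-test undefined when the pooled proportion is 0 or 1")
    z = (p1 - p2) / se
    # Lower-tail probability under the standard normal distribution.
    return z, NormalDist().cdf(z)


# Hypothetical counts: 2 of 10 mutual vs 5 of 10 stock insurers above the mean.
z, p = two_proportion_ztest_less(2, 10, 5, 10)
```

The "greater" hypothesis (mutual insurers over-represented below the mean) is obtained by swapping the two groups in the call.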

The second-best firm is CAI, which appears among the 10 best firms in 8 of the 9 years according to our scores. It had a market share of 0.048% in 2017 despite operating in up to 7 lines of business. As can be seen in Fig. 4.2, CAI's financial health has improved since 2009: it reduced its Combined Ratio and improved its ROE and Technical Ratio. This indicates a substantial improvement in the insurer's capacity to generate profits efficiently.
ACUNSA operates in only 2 lines of business, both related to health insurance, and has a 0.39% market share. Its financial ratios are stable over the whole period and reveal good performance and efficient management. The high value of the Loss Ratio is noteworthy, probably because the coverage provided is very costly (it is the insurance company of the Navarra Clinic, which specialises in cancer and experimental treatments). Despite this, all the financial ratios of this insurer are close to the values expected for a financially healthy firm.
National-Nederlanden is the Spanish Non-life division of the international company of the same name. In our sample it represents 0.174% of the market share and operates in up to 8 lines of business. Its financial health is good throughout the period. It should be noted that the ROE practically doubles its value, but the Technical Ratio, although it increases, does not do so to the same extent, indicating that a significant part of the company's profit does not come from the underwriting activity. In addition, there is an increase in the Loss Ratio from 2010 onwards which, although it always remains at very acceptable levels, indicates that the company could improve its risk management.
ERGO is the most important insurer in travel insurance, operating in 7 lines with 0.066% of the market share in our sample. Although its financial health is good, its behaviour is unstable over time. All ratios improved until 2013; from then on the trend reversed and the ratios worsened, mainly because of an increase in claims. There is also a drop in the ROE, greater than that in the Technical Ratio, which indicates a decrease in earnings from non-underwriting activity.
Sanitas has the largest market share of the companies appearing at least 5 times in the top 10, with 9.785%. Its insurance activity is focused on 4 health-related business lines. Its financial health is good, and the evolution of its ratios is similar to that of ACUNSA (a company also focused on the health business), with a high value for the Loss Ratio. It should be noted that Sanitas, together with Línea Directa, has the best ROE, at around 6%.
Línea Directa also appears 5 times in the top 10 and has the second largest market share, with 5.924%. It is the most diversified company, operating in up to 10 business lines with differing coverage (auto, home, health, etc.). Although its financial performance has always been good, its ROE has increased significantly since 2014, reaching a value higher than 0.6% in 2017. Furthermore, an improvement in risk management over the period translates into a decrease in the Loss Ratio and, consequently, in the Combined Ratio.
To sum up, the seven insurers that appear 5 or more times in our top 10 perform well and show good financial health, being efficient in terms of business management and profitability. None of them is a mutual insurer, which reinforces the idea that mutual insurers perform worse than stock insurers. Finally, market share and size do not seem to be necessary for good performance, as most of the firms in this top 10 are neither the biggest nor those with the largest market share (see Table 14). Table 15 and Fig. 5 show the most important firms in our sample by market share. These insurers, usually large firms with substantial gross premiums, are not necessarily ranked among the best. They usually operate in up to 13 lines of business and are therefore highly diversified. Nevertheless, diversification does not imply good performance or greater profit. In fact, as Cummins and Xie (2013) conclude: "the benefits of diversification come at a cost". González-Fernández et al. (2020) also argue that, while diversification allows firms to earn more profits, reduce risk, achieve economies of scope and obtain higher revenues due to market size, it is also associated with larger and more complex organizations that incur higher management and underwriting costs, which can have a negative impact on profit and performance.
Kendall's (tau) and Spearman's (rho) rank correlation coefficients are calculated to analyse the relationship between company rankings in different years. For the comparison, we chose those pairs of years in which the same insurers were analysed. As shown in Table 16, there is a significant correlation between the performance rankings obtained through the EBW-MRP(WS) method for the different years, which shows that the resulting rankings are close to each other for each pair of years compared. Table 17 shows the top 10 insurers for 2017 according to the different composite measures, WS_i, SS_i and PS_i, proposed in (19), (22) and (23), respectively. The last three columns of Table 17 correspond to the partial compensatory measure with three different thresholds: 0.65, 0.55 and 0.3. EXPERTIA is the best firm under the weighted mean WS_i but, surprisingly, it is not in the top 10 under the strong composite measure SS_i. This is because EXPERTIA is a non-diversified insurer with a very low value for the Number of Business Lines criterion. Under a partial compensatory measure, the firm is penalized less and again ranks in the top 10. Table 18 shows the Spearman and Kendall correlation coefficients between the insurers' rankings obtained for the different composite measures. There are high and significant correlations between the rankings obtained through the weighted mean WS_i and the partial compensatory measures PS_i for all the thresholds considered. When the relationship between SS_i and WS_i is analysed, the Spearman and Kendall correlation coefficients are lower, and the top 10 insurers differ between the two measures: in addition to EXPERTIA, another three insurers drop out of the SS_i top 10. We also observe that, for the PS_i measure, the correlation with WS_i increases with the threshold considered.
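The rank correlations used above can be sketched in pure Python. This is a minimal illustration assuming rankings without ties (the textbook formulas below are only exact in that case); the helper `paired_ranks`, which re-ranks the companies scored in both years, is a hypothetical convenience, and for tied scores a library routine such as SciPy's `spearmanr`/`kendalltau` would be the safer choice.

```python
from itertools import combinations


def paired_ranks(scores_a, scores_b):
    """Rank the companies scored in both years (1 = highest score).

    Ties are broken alphabetically, which is only an assumption.
    """
    common = sorted(set(scores_a) & set(scores_b))

    def ranks(scores):
        order = sorted(common, key=lambda c: -scores[c])
        return {c: i + 1 for i, c in enumerate(order)}

    ra, rb = ranks(scores_a), ranks(scores_b)
    return [ra[c] for c in common], [rb[c] for c in common]


def spearman_rho(x, y):
    """Spearman's rho for two rankings of n >= 2 items without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def kendall_tau(x, y):
    """Kendall's tau-a for two rankings of n >= 2 items without ties."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 concordant pair, -1 discordant pair
    return s / (n * (n - 1) / 2)
```

Applied to two years of hypothetical scores, `spearman_rho(*paired_ranks(s2016, s2017))` gives the kind of coefficient reported in Table 16.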

Sensitivity analysis
The sensitivity of the EBW-MRP(WS) model is analysed by changing several parameters. First, we modified the interval comparisons between pairs of criteria as shown in Table 19. Here, with ROE kept as the most important criterion, the Loss Ratio becomes the second most important criterion, followed by the Technical and Combined Ratios. The resulting weights (last column of Table 19) reflect these preferences. In addition, the parameter d_W was set to 0.05. The value achieved by * is, in this case, 0.641, so consistency within the extended intervals is achieved. The Spearman correlation coefficient between the scores of the original and perturbed models is 0.977 (Kendall coefficient 0.887). The top ten companies are the same in both rankings, except for La Fe, which moves up five places and enters the top ten in the perturbed model. The second sensitivity exercise consisted of using the reference points derived from the statistical values (the first three quartiles) for all criteria. In this case, the Spearman correlation between the scores of the original and perturbed models is 0.99, indicating that the results are highly correlated.
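The top-ten comparison in this sensitivity exercise can be sketched as follows. It is a hypothetical helper, not part of the paper: it ranks two sets of scores for the same firms, reports which firms enter or leave the top k, and computes each firm's rank shift (positive meaning the firm moved up), which is how a statement such as "La Fe moves up five places" could be checked. Breaking ties alphabetically is an assumption.

```python
def ranks_from_scores(scores):
    """Rank 1 = highest score; ties broken alphabetically (an assumption)."""
    order = sorted(scores, key=lambda c: (-scores[c], c))
    return {c: i + 1 for i, c in enumerate(order)}


def compare_top_k(original, perturbed, k=10):
    """Compare top-k sets and rank shifts of two scorings of the same firms.

    Returns (entered, left, shifts); shifts[c] > 0 means firm c moved up.
    """
    r0, r1 = ranks_from_scores(original), ranks_from_scores(perturbed)
    top0 = {c for c, r in r0.items() if r <= k}
    top1 = {c for c, r in r1.items() if r <= k}
    shifts = {c: r0[c] - r1[c] for c in r0 if c in r1}
    return top1 - top0, top0 - top1, shifts
```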

Conclusions
The paper proposes a model for ranking non-life insurance companies based on combining the most relevant financial ratios and indicators into one overall score, thereby measuring the composite performance of each firm.
We have attempted to resolve two issues associated with such scoring approaches that may hinder decision making. Pairwise comparisons are useful tools for identifying the relative importance of decision elements. The study takes advantage of this property, well documented in the literature, and proposes an approach for overcoming two ongoing problems: imprecision in judgements and the difficulty of making many pairwise comparisons. The first issue is addressed with interval values that are extended by tolerance thresholds. The second is overcome by restricting pairwise comparisons to those in which one element of the pair is the most (least) important decision criterion. This approach extends the Best-Worst method and is formulated using fuzzy set theory. The chosen aggregation procedure uses both expert knowledge and statistical values, achieving a good balance between objective and subjective information. In addition, the MRP methodology gives global scores within the scale set by the DM, which makes them meaningful to the DM. The proposal is convenient for the DM because it relies on easily obtainable expert knowledge and produces easily understandable results. Mathematically, the proposal remains within a linear framework.
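To make the idea of interval judgements extended by tolerance thresholds more concrete, the sketch below evaluates how well a candidate weight vector satisfies a set of interval best-to-others and others-to-worst comparisons. It is a hypothetical illustration, not the authors' EBW model: it uses a simple linear membership function on the weight ratios, a single tolerance `tol` shared by all comparisons (an assumption), and returns the minimum membership as a consistency degree. The paper's own formulation, which the conclusions describe as linear, searches for the priority vector itself rather than checking a given one.

```python
def membership(ratio, lower, upper, tol):
    """1 inside [lower, upper], falling linearly to 0 at lower - tol and upper + tol."""
    if lower <= ratio <= upper:
        return 1.0
    if tol <= 0:
        return 0.0
    if lower - tol < ratio < lower:
        return (ratio - (lower - tol)) / tol
    if upper < ratio < upper + tol:
        return (upper + tol - ratio) / tol
    return 0.0


def consistency_degree(weights, best, worst, best_to_others, others_to_worst, tol):
    """Minimum membership over all interval comparisons (hypothetical check).

    best_to_others[j] = (l, u) bounds the ratio weights[best] / weights[j];
    others_to_worst[j] = (l, u) bounds the ratio weights[j] / weights[worst].
    """
    degrees = [
        membership(weights[best] / weights[j], lo, up, tol)
        for j, (lo, up) in best_to_others.items()
    ]
    degrees += [
        membership(weights[j] / weights[worst], lo, up, tol)
        for j, (lo, up) in others_to_worst.items()
    ]
    return min(degrees)


# Hypothetical judgements over three criteria (not the paper's Table 19).
# Only weight ratios matter here, so the weights are not normalised.
w = {"ROE": 0.4, "LossRatio": 0.2, "BusinessLines": 0.1}
lam = consistency_degree(
    w, best="ROE", worst="BusinessLines",
    best_to_others={"LossRatio": (1.5, 2.5), "BusinessLines": (3.0, 5.0)},
    others_to_worst={"LossRatio": (1.0, 1.5)},
    tol=1.0,
)
```

Here the ratio LossRatio/BusinessLines = 2 lies outside its interval [1, 1.5] but within the tolerance band, so the consistency degree is partial rather than zero.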
Several findings are obtained from the implementation of the model for a Spanish database spanning the years 2009 to 2017. The distribution of the scores obtained allows us to conclude that the performance of the Spanish non-life insurance market is reasonably good, given that most of the companies are located in the upper tranches.

The results reveal, as expected, that stock (private) companies perform better and are more profitable than mutual companies. At the same time, small companies score higher than large ones, revealing that size and diversification do not offer a real advantage. Perhaps management inefficiencies and higher costs make large firms underperform compared with smaller firms. Nevertheless, the financial study of the Spanish non-life insurance market allows us to affirm that it enjoys good financial health, based on the financial ratios analysed. Although premium growth was positive over the whole period, the financial crisis caused it to slow from 2010 onwards. The impact of the financial crisis is also reflected in the decrease of the ROE and Technical Ratio in 2010; these ratios recovered during the period analysed, albeit without reaching their previous values. The positive evolution of these ratios, as well as of the Combined Ratio, reveals a favourable trend for the Spanish non-life insurance market.
Our method is easily comprehensible and includes both hard and soft data. We hope that these features will encourage its widespread use by practitioners. In terms of future research, the proposed method may be extended to group decision-making involving more than one DM. In addition, the EBW method could be combined with other scoring methodologies. We also suggest applying the proposed method to other markets. In the context of the COVID-19 pandemic, it is important to check how the financial performance of insurance companies has changed. Future research could consider the impact of the COVID-19 pandemic on the global scores and examine the correlation between the periods before and after the pandemic.

Funding
Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was supported by Fundación para el Fomento en Asturias de la Investigación Científica Aplicada y la Tecnología (FICYT), Project AYUD/2021/50878.

Conflict of interests
The authors declare that they have no conflict of interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.