Background

Since the development of dialysis in the 1960s, the treatment of end-stage renal disease has been a challenge worldwide. Less than 0.015% of the population is estimated to be on dialysis, yet this group accounts for approximately 2% of healthcare expenditure [1]. In Europe alone, more than 180,000 people are undergoing renal replacement therapy with haemodialysis in more than 4000 centres. The estimated cost of such treatment is between 30,000 and 47,000 euros per patient per year [2, 3].

Importantly, the results of treatment with haemodialysis vary between centres in both the USA and Europe. Across facilities in the USA, the Dialysis Outcomes and Practice Patterns Study (DOPPS) detected variability of almost double (88.7%) in adjusted mortality, more than double (113.9%) in transfusions performed, and more than 50% (56.0%) in the prevalence of autologous fistulas [4]. Significant differences were also detected among seven European countries with regard to compliance with clinical guidelines [5].

Moreover, assessment of dialysis outcomes is generally based on partial methodologies that exclude relevant features (e.g., quality of life, satisfaction, or costs) or present biases (e.g., they do not consider the perspective of the stakeholders, such as patients and managers) [6, 7]. A substantial epistemological limitation of evidence-based medicine has been suggested to be that its indicators reflect the preferences of researchers, ignoring those of other stakeholders [8]. The methods for evaluating health outcomes should consider aspects associated with the individuals involved, such as prioritizing patient-centred care, promoting patient welfare, incorporating stakeholder participation in the evaluation, and ensuring security, transparency, dignity, respect, and compassion [9,10,11,12].

In addition, health organizations are characterized by a large number of dynamic components that, in the real world, interact in complex ways through frequently unpredictable relationships. The evaluation of clinical results from a traditional perspective that ignores these complex relationships is insufficient [13]. In this sense, it is necessary to develop new evaluation methodology that is more realistic and effective, and that considers the complexity of health organizations.

Multicriteria decision analysis (MCDA), also known as multicriteria decision-making, includes a set of approaches capable of improving decision-making in complex systems and has been recommended by the International Society for Pharmacoeconomics and Outcomes Research, Health Science Policy Council. Multicriteria methods allow the values and preferences of the stakeholders to be captured, integrating their different perspectives, aggregating the information into a single value, and doing so transparently, consistently, and legitimately [14,15,16,17,18].

The present study aimed to determine the opinions and preferences of the stakeholders in the treatment with haemodialysis, to determine indicators of their results, and to establish their relative importance following a multicriteria approach. This knowledge would allow the creation of an instrument based on values, effectively enabling assessment of the results of different centres, their comparison, and then using it to improve them.

Methods

For the multicriteria study of stakeholder preferences, five working groups (WGs) were created consecutively, each with specific objectives. All of them were face-to-face, except for WG4, which was via the Internet.

WG1 defined the general objectives, identified the groups of “stakeholders” or relevant actors that provide the preferences, and created a draft of criteria and sub-criteria. WG2 evaluated and agreed on the criteria and sub-criteria. WG3 comprised three subgroups: WG3-A, WG3-B, and WG3-C. Each of these groups independently, in parallel, and face-to-face weighted the criteria and sub-criteria according to their preferences using two different multicriteria methodologies: weighted sum (WS) and analytic hierarchy process (AHP). Two weeks after this weighting, a survey was sent by email to each individual regarding their preference for the results of the WS or AHP method. Via the Internet, WG4 weighted the criteria and sub-criteria only by the method with the highest preference in the survey. In this way, a new weighting of the criteria and sub-criteria was established through similar methodology but with a larger sample, in order to validate the results of the face-to-face WG3 (obtained with a small sample size) and to guarantee significant conclusions. Finally, WG5 consisted of two independent academics, specialists in multicriteria analysis, who analysed the results (Fig. 1).

Fig. 1 Working groups, their composition, flows, methodologies and objectives

Working group 1

WG1 consisted of six researchers: four nephrologists and two multicriteria analysts. The group defined the general objective of the study and the structure and composition of the remaining groups. The general objective was to determine the relevant criteria and sub-criteria of haemodialysis treatment and their weighting according to the preferences of the stakeholders. The preferences of the stakeholders allow a “performance matrix” to be established that summarizes the measured preferences for each relevant criterion, and an “aggregation function” to be determined that combines the weights consistently with stakeholders’ preferences. This function enables analysis of the results of the centres considered in the study and establishes their individual qualification in an orderly and justified manner.
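As a minimal sketch of how such an aggregation function could operate, the following Python example assumes a simple additive form in which each centre's performance on every criterion has already been normalised to a 0 to 1 scale. The centre names, performance values, and weights are hypothetical placeholders and are not results of the study; the function names are likewise illustrative.

```python
# Hypothetical weights (points out of 100); NOT the weights reported in the study.
WEIGHTS = {
    "EBVs": 25.0,
    "morbidity": 15.0,
    "mortality": 20.0,
    "PROMs": 25.0,
    "PREMs": 15.0,
}

# Hypothetical performance matrix: one row per centre, one column per criterion.
# Each value is assumed to be normalised to 0-1, with higher meaning better.
PERFORMANCE = {
    "centre_A": {"EBVs": 0.8, "morbidity": 0.6, "mortality": 0.7, "PROMs": 0.5, "PREMs": 0.9},
    "centre_B": {"EBVs": 0.6, "morbidity": 0.7, "mortality": 0.8, "PROMs": 0.9, "PREMs": 0.6},
}


def aggregate(performance_row, weights):
    """Additive aggregation: weighted sum of normalised performances (0-100)."""
    return sum(weights[c] * performance_row[c] for c in weights)


def rank_centres(performance, weights):
    """Return (centre, score) pairs ordered from highest to lowest score."""
    scores = {name: aggregate(row, weights) for name, row in performance.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_centres(PERFORMANCE, WEIGHTS)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

With these placeholder values, centre_B ranks first with about 73 points against about 69 for centre_A, mainly because of its higher PROMs performance. Other aggregation forms are possible; the additive one is used here only because it matches the 100-point weighting scale.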

WG1 defined the requirements and number of actors that comprised WG2, WG3, and WG4, which included patients, clinicians, and managers. All participants were recruited on a voluntary basis. The patients had to have been on haemodialysis for at least three years and to have carried out coordination tasks in an organization of kidney patients. They were recruited from such organizations, mainly ALCER (Asociación para la Lucha Contra las Enfermedades Renales). The clinicians had to be of recognized prestige and extensive experience, and each group had to include a nephrologist, an internist, and a nurse. For the managers, three profiles were defined that should be present in each group: economic direction, medical direction, and health services researcher. Clinicians and managers were recruited mainly from the centres involved in the study; they were contacted by phone call, e-mail, or personal approach. WG2 and WG3 (WG3-A, WG3-B, and WG3-C) each comprised nine stakeholders: three patients, three clinicians, and three managers (36 individuals in total: 4 groups × 9 stakeholders each). WG3-A was located in Alicante, WG3-B in Segovia, and WG3-C in Zaragoza. WG4 comprised at least 15 stakeholders from each category (patients, clinicians, and managers) who were located in different parts of Spain and participated online.

The criteria and sub-criteria were identified sequentially in two steps. First, WG1 agreed on the draft criteria, and then WG2 agreed on the criteria. The criteria are the relevant factors for the evaluation and ordering of the different options (haemodialysis centres). These must meet certain requirements in relation to the MCDA methodology used (completeness, non-redundancy, non-overlap, and preference independence). The following principles were also considered for the selection of draft criteria: feasibility of its implementation, potential modifiability of the indicator, and impact on the patient.

WG1 defined the search strategy in PubMed/Medline, EMBASE, and Cochrane Library. The terms included were: haemodialysis, outcomes, registry, patient reported outcomes (and equivalents), and clinical guideline. Priority was given to the PRISMA clinical guidelines (Preferred Reporting Items for Systematic Reviews and Meta-analyses). Two WG1 members independently reviewed the literature results and proposed a first draft of the criteria and sub-criteria to the rest of the group. After a discussion in the whole group, the draft criteria and sub-criteria were approved. An “evidence-based” criterion composed of various sub-criteria was established. To determine this, the group decided to consider only the recommendations of level 1 in international clinical guidelines to provide the study with transparency and reproducibility. This decision was made in a manner consistent with the GRADE (Grading of Recommendations Assessment, Development, and Evaluation) considerations. We focused on three clinical guidelines that provide an appropriate framework for the study: Kidney Disease Improving Global Outcomes (KDIGO, https://kdigo.org), European Renal Best Practice (ERBP, http://www.european-renal-best-practice.org) and Kidney Disease Outcomes Quality Initiative (KDOQI, https://www.kidney.org). Finally, the ultimate decision on the inclusion of each criterion was made separately by the majority of the group (at least four members of WG1).

Working group 2

WG2 consisted of stakeholders (3 patients, 3 clinicians, and 3 managers) working face-to-face. The group used WS methodology to reflect on the criteria and sub-criteria by carrying out a qualitative structured analysis of the draft criteria and sub-criteria prepared by WG1. The deliberation was recorded, and two independent analysts from the group applied four pre-established criteria (internal validity, external validity, reliability, and objectivity) to help validate the selected criteria.

Working group 3

The face-to-face WG3 weighted the criteria and sub-criteria agreed upon by WG2. WG3 consisted of three subgroups, independent and parallel in time (WG3-A, WG3-B, WG3-C). Following a multicriteria approach, the following were performed sequentially: baseline weighting using the WS methodology, a structured debate, and a second weighting using two different multicriteria methodologies: WS and AHP. The purpose of the weighting was to elicit the preferences of the stakeholders for each of the criteria and the reasons for their preferences.

The WS is an additive model in which the stakeholder is invited to distribute 100 points among the set of criteria and sub-criteria in proportion to their preferences (total sum 100). For example, if stakeholder Patient 1 has to weight a group of four criteria according to their preferences, distributing 100 points among them, the weights could be 10 points for criterion 1, 60 for criterion 2, 10 for criterion 3, and 20 for criterion 4. The stakeholder establishes the weights for each criterion simultaneously. The AHP is a multicriteria technique in which, for each node of the hierarchy considered, the stakeholder compares in pairs the relative importance of the elements (criteria, sub-criteria, or alternatives) that hang from it according to the fundamental scale of Saaty [19]. The result in each node is a positive reciprocal square matrix from which the local priorities are obtained, together with a measure of the decision-maker’s inconsistency when issuing their judgments. The Super Decisions program was used for this (https://www.superdecisions.com, sponsored by Creative Decisions Foundation). The results obtained are transferred in the same way to a distribution of 100 points among all of the criteria and sub-criteria (total sum 100 points). For example, in the AHP model, if stakeholder Patient 1 weights a group of four criteria using consecutive pairwise comparisons (criterion 1 with 2, criterion 1 with 3, criterion 1 with 4, criterion 2 with 3, criterion 2 with 4, and criterion 3 with 4), each comparison is made on a quantitative scale that reflects the importance of one criterion in relation to the other. Two criteria can also be rated as equally important. The Super Decisions program allows the weight of each criterion to be found, in which the sum of all criteria is 100 (e.g., 10 points to criterion 1, 60 to criterion 2, 10 to criterion 3, and 20 to criterion 4). In this way, the results obtained by the WS and AHP methods are comparable.

The assessment of criteria requires a personal and collective reflective process based on the individual weighting and a collective structured debate of the stakeholders, reasoning their different interests and perspectives to outline the trade-offs between criteria.
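To make the two weighting schemes concrete, the sketch below reproduces the four-criterion example (10, 60, 10, and 20 points) in Python. It is an illustration under stated assumptions and not the Super Decisions implementation: the AHP priorities are approximated by power iteration on the pairwise matrix, the pairwise judgments are hypothetical values chosen to be perfectly consistent with the WS example, the random index values are the ones commonly cited for Saaty's consistency ratio, and all function names are invented for this example.

```python
def ws_weights(points):
    """Weighted sum: rescale a stakeholder's point allocation to total 100."""
    total = sum(points)
    if total <= 0:
        raise ValueError("points must sum to a positive value")
    return [100.0 * p / total for p in points]


def ahp_priorities(matrix, iterations=100):
    """Approximate the principal eigenvector of a positive reciprocal matrix
    by power iteration. Returns (weights summing to 100, lambda_max)."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(v)
        w = [x / s for x in v]
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return [100.0 * x for x in w], lambda_max


# Commonly cited random index values for matrices of order 3 to 5.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def consistency_ratio(lambda_max, n):
    """Consistency ratio: consistency index divided by the random index."""
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


# WS: Patient 1 distributes 100 points directly.
ws = ws_weights([10, 60, 10, 20])

# AHP: hypothetical pairwise judgments (entry [i][j] = importance of
# criterion i+1 relative to criterion j+1), consistent with the WS example.
pairwise = [
    [1.0, 1 / 6, 1.0, 1 / 2],
    [6.0, 1.0, 6.0, 3.0],
    [1.0, 1 / 6, 1.0, 1 / 2],
    [2.0, 1 / 3, 2.0, 1.0],
]
ahp, lambda_max = ahp_priorities(pairwise)
cr = consistency_ratio(lambda_max, len(pairwise))

print([round(x, 2) for x in ws])
print([round(x, 2) for x in ahp], round(abs(cr), 4))
```

Because the hypothetical judgments are perfectly consistent, both methods return the same 10/60/10/20 distribution and the consistency ratio is zero. Real judgments are rarely this consistent, which is why the inconsistency measure is reported alongside the priorities.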

Two weeks after the meeting of each WG3 subgroup, a survey was carried out with each participant. They were asked which results (WS vs. AHP) better reflected their preferences (or neither, or both equally). The survey was conducted blindly via email (the interviewee answered without knowing which methodology, WS or AHP, had produced each set of results). Thus, the researchers determined which method best expressed the preferences of each stakeholder according to their criteria.

Working group 4

WG4 again weighted the criteria and sub-criteria, but only by the method (WS) that best expressed the preferences of WG3 in the survey. This new weighting was performed to check the consistency of WG3’s results. WG4 reproduced the multicriteria appraisal process of WG3 via the Internet. It was composed of a minimum of 15 patients, 15 clinicians, and 15 managers. Thus, the method sequentially included a first baseline weighting (WS), structured deliberation, and a second weighting exclusively using the methodology preferred by WG3 (WS or AHP). An ad hoc website was designed in HTML5 with CSS, JavaScript and AJAX on the client side, and PHP with MySQL on the server side, tools that met the necessary requirements imposed by the methodology. The discussion via Internet was anonymous, but the category of the stakeholder was shown (patient, clinician, or manager).

Working group 5

Finally, WG5 comprised two independent academic experts in MCDA, who analysed the results. The statistical study was carried out using SPSS software and consisted of the following phases: (i) analysis of face-to-face results by stakeholder category and by methodology (WS vs. AHP); and (ii) analysis of significant differences between face-to-face and Internet results by stakeholder category. A t-test of means and ANOVA were used to assess statistical significance. Where necessary, a Bonferroni test was used to prevent results from incorrectly appearing to be statistically significant.
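The logic of these tests can be sketched in a few lines of Python. This is not the SPSS procedure used in the study: it only computes the one-way ANOVA F statistic and applies a Bonferroni adjustment to a list of p-values, using the standard library alone. The weights and p-values below are hypothetical, and obtaining the p-value associated with F would require an F-distribution routine from a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical weights given to one criterion by three stakeholder categories.
patients = [30, 35, 25]
clinicians = [20, 25, 15]
managers = [20, 15, 25]
f_stat, df_b, df_w = one_way_anova_f([patients, clinicians, managers])

# Hypothetical unadjusted p-values for the three pairwise comparisons.
adjusted = bonferroni([0.01, 0.02, 0.04])

print(f_stat, df_b, df_w)
print(adjusted)
```

For this placeholder data the F statistic is 4.0 with 2 and 6 degrees of freedom, and after adjustment only the first pairwise comparison would remain below 0.05.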

Results

The bibliographic search carried out by WG1 resulted in 17 articles that met the requirements imposed by the MCDA methodology. WG1 identified five criteria: evidence-based variables (EBVs), morbidity, mortality, patient reported outcome measures (PROMs), and patient reported experience measures (PREMs). PROMs are health outcomes that capture symptoms, functional status, and quality of life. PREMs measure aspects related to the humanity of care, such as the dignity of care or communication with healthcare personnel [20]. In turn, the EBV criterion included four sub-criteria: dialysis dose, haemoglobin concentration, mineral and bone disease, and type of vascular access.

After the qualitative analysis, WG2 confirmed all of the established criteria and sub-criteria and asked WG1 to include a new sub-criterion within the EBVs, the “ratio of bacteraemia related to the catheter”. This modification was detected independently by the two analysts in the meeting as a need perceived by the three stakeholder groups. This indicator was considered by all of the groups to be an essential element in the safety of patients. WG1 included it because it fulfilled all of the established requirements. Table 1 reflects the final structure of the approved criteria and sub-criteria and their definitions. The criteria are defined positively to allow their aggregation and the construction of a performance matrix.

Table 1 Criteria and sub-criteria established for haemodialysis treatment

Table 2 shows the relative importance of the criteria (from 0 to 100) and sub-criteria (from 0 to 100) expressed by the members of WG3 in the second weighting. The aggregate results of WG3 (A, B, and C) are shown by both methodology (WS and AHP) and category of stakeholder (patients, clinicians, and managers). The first weighting and the results disaggregated by WG3 (A, B, and C) are not shown, to simplify the table. For patients, the criterion most valued by both methodologies was PROMs (WS 28.33 and AHP 36.26) and it was superior to that of the other stakeholders. For clinicians and managers, the most valued criteria were PROMs and EBVs, depending on the methodology used. Among the sub-criteria, the type of vascular access was the most valued sub-criterion with both methodologies by all stakeholders. The analysis of face-to-face results by stakeholder category and methodology (WS vs. AHP) using ANOVA showed only minor significant differences between stakeholders for the criteria of morbidity and mineral and bone disease. Importantly, ANOVA showed that the groups were not comparable due to the small sample.

Table 2 Aggregated results of Working Group 3 (A, B, and C)

The results of the individual survey of WG3 are shown in Table 3. The majority (61.5%) expressed a preference for the WS method, and we decided to continue the investigation in WG4 with the WS method only.

Table 3 Result of the survey of the members of Working Group 3 (A, B, C) in which they were asked about the method that best reflects their preferences (WS vs. AHP)

Table 4 shows the weights of the criteria and sub-criteria given by the members of WG4 disaggregated by category of stakeholder (patients, clinicians, and managers). In the Internet group (WG4), ANOVA only detected small significant differences between stakeholders for PROMs (P = 0.047). The Bonferroni test showed that the differences detected were only between patients and managers (P = 0.043). These data were not included in the table for simplicity.

Table 4 Weighting of the criteria and sub-criteria made by Working Group 3 (A, B, C) face-to-face and Working Group 4 via the Internet using the WS methodology and its comparison

Table 4 also presents the results of WG3 disaggregated in the same way. A comparison is made between both WGs (WG3-A, B, C vs. WG4). The table shows that there are no significant differences between the two groups (face-to-face vs. Internet) for most of the results. The only differences detected are in two parameters in the category of clinicians: EBVs (dialysis dose, P = 0.045) and PREMs (P = 0.025).

Because only minor differences were found between WG3 and WG4 (face-to-face vs. Internet), both groups were considered comparable and their results were combined. After combining WG3 and WG4, ANOVA was performed to detect differences between stakeholders. The results showed significant differences that were confirmed by the Bonferroni test for the following criteria and groups: morbidity, patients vs. managers (P = 0.045); PROMs, patients vs. clinicians (P = 0.042) and patients vs. managers (P = 0.034); and PREMs, patients vs. managers (P = 0.023) and clinicians vs. managers (P = 0.042). For the sub-criteria and groups, the differences were ratio of bacteraemia, patients vs. managers (P = 0.007), and mineral and bone disease, patients vs. managers (P = 0.036; Table 5).

Table 5 Aggregate results of the Working Group 3 and Working Group 4 and comparison of differences between the stakeholder groups

Finally, Table 6 includes a weighting proposal for each criterion aimed at a hypothetical evaluation of the results of haemodialysis centres. It also presents the mean standard deviation of each criterion as a reference value to conduct a sensitivity study of said evaluation. This table has been prepared with the results from WG4. Thus, the EBVs would have a weight of 24.24 points out of the total 100 in the evaluation.
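One possible form of such a sensitivity study is sketched below in Python. It assumes a one-at-a-time approach, in which each criterion weight is shifted by plus or minus its standard deviation and the ranking of centres is recomputed; the source does not specify this procedure, so it should be read as an assumption. Only the EBV weight of 24.24 is taken from the text. The remaining weights, all standard deviations, the centre names, and the performance values are hypothetical placeholders.

```python
# Only the EBVs weight (24.24) comes from the text; the others are placeholders.
WEIGHTS = {"EBVs": 24.24, "morbidity": 15.76, "mortality": 20.0, "PROMs": 25.0, "PREMs": 15.0}

# Hypothetical standard deviations of the weights.
SDS = {"EBVs": 9.0, "morbidity": 6.0, "mortality": 8.0, "PROMs": 12.0, "PREMs": 7.0}

# Hypothetical performance of two centres, normalised to 0-1 (higher is better).
CENTRES = {
    "centre_A": {"EBVs": 0.8, "morbidity": 0.6, "mortality": 0.7, "PROMs": 0.5, "PREMs": 0.9},
    "centre_B": {"EBVs": 0.6, "morbidity": 0.7, "mortality": 0.8, "PROMs": 0.9, "PREMs": 0.6},
}


def ranking(centres, weights):
    """Order centres by weighted score; weights are renormalised to sum to 1."""
    total = sum(weights.values())
    scores = {
        name: sum(weights[c] * perf[c] for c in weights) / total
        for name, perf in centres.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def one_at_a_time(centres, weights, sds):
    """Shift each weight by -SD and +SD and record which shifts change the ranking."""
    base = ranking(centres, weights)
    changed = []
    for criterion in weights:
        for sign in (-1, 1):
            shifted = dict(weights)
            shifted[criterion] = max(0.0, weights[criterion] + sign * sds[criterion])
            if ranking(centres, shifted) != base:
                changed.append((criterion, sign))
    return base, changed


base, changed = one_at_a_time(CENTRES, WEIGHTS, SDS)
print(base)
print(changed)
```

With these placeholder values, centre_B ranks first in the base case and the ranking reverses only when the PROMs weight is lowered by one standard deviation, which would flag PROMs as the weight to which this hypothetical comparison is most sensitive.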

Table 6 Proposal of weights for each criterion in a hypothetical evaluation of dialysis centres and their standard deviation

Discussion

Our study shows that there are different perceptions and valuations among the different criteria for evaluating haemodialysis. Thus, patients give greater importance to PROMs than clinicians and managers, and this happens with all three estimation methods used (face-to-face: WS and AHP, and via Internet). The results corroborate a finding that has already been revealed in previous research using other methodologies [21, 22]. Mortality also has a differentiated weighting: lower for patients and higher for clinicians and managers. Despite these differences in assessment among the stakeholders, only recently has the need to include the patient’s perspective in a routine and explicit way been emphasized [23,24,25,26]. PROMs are a priority for patients and other stakeholders, reflecting their preferences, and should be systematically considered in evaluation systems.

The objective of the evaluation of health services is threefold: (i) to quantify the quality of the service provided; (ii) to allow specific programs and activities aimed at improvement to be established; and (iii) to enable accountability and citizen control of the services provided. Given the importance of these objectives, it is essential to have a comprehensive evaluative methodology that is valid, participatory, acceptable, and feasible.

The multicriteria methodology is a formal deliberative discussion procedure that uses explicit criteria. The method incorporates the perspective of the stakeholders in determining the preferences of the process studied. The preferences and intangible aspects are synthesized in the criteria and their weights. The mathematical expression of the preferences constitutes the performance matrix, with which an aggregation function of the results can be constructed, capable of aggregating these into a single value. The use of a performance matrix of indicators, such as the one proposed in Table 6, provides a measure of proportionality and uncertainty for each criterion that reflects the values of the stakeholders. The matrix can be useful to provide validity, legitimacy, and transparency to an analysis of the results and to the elaboration of clinical guidelines based on the values and preferences of the stakeholders [27].

Health services are made up of a multitude of components that interact with one another in a frequently unpredictable way. They constitute “complex adaptive systems” influenced by biochemical, cellular, physiological, genetic, pathological, pharmacological, organizational, psychological, social, cultural, economic, and political aspects that determine considerable uncertainty in the face of individual and collective decisions [13]. In addition, multiple cognitive limitations in information processing interfere with clinical and organizational decision making [28]. It has been postulated that the conceptualization of the health environment as a complex system can help in its understanding and improvement, by banishing simplistic paradigms of linear thinking [29]. In this context, the use of an evaluation model endowed with a multiple, transdisciplinary, and reflective perspective can constitute a tool to help assess the results and make decisions closer to the complexity of the real world.

The methodology allows a rational hierarchy of complex elements, such as the different EBVs. In multicriteria deliberations, all EBVs were subordinated to the type of vascular access, which is the most valued sub-criterion, and this subordination was widely accepted by the various stakeholders. The reason for this is that adequate vascular access improves the results of the other four EBVs, but this property does not happen the other way around for any of the four variables. The capture of nuances of a multilateral relationship between indicators helps characterize them, and their knowledge facilitates a judicious exercise of clinical practice.

The implications of this study on the evaluation of centres are important. Health processes are complex systems in which no individual actor is knowledgeable about the whole of their operation. Therefore, it is essential to design evaluation procedures that consider multiple perspectives and are based on broader societal involvement to effectively improve our knowledge of the process. With this study, we have determined the relevant outcomes of haemodialysis, quantifying their relative importance and potential degree of uncertainty. An evaluation of dialysis centres using this methodology may be more accurate and legitimate. In addition, this evaluation methodology can be reproduced and used in other homogeneous clinical processes, such as kidney transplant, hip replacement, or many others. To enable meaningful comparisons across centres, the case-mix variables of those centres need to be appropriately adjusted.

This evaluation, which considers the complexity of health organizations, may be an effective tool in helping clinical and management decision-making.

The study has several limitations. First, although there is an epistemological basis for the knowledge generated, the performance matrix could create a different structure in another cultural environment. As has been suggested, the subject is rooted in a social order that is a source of subjectivism [13]. It would be important to validate the weighting of the criteria in a different cultural environment before their practical application in it. Second, although the concept of PREMs is defined, there is no consensus about the use of questionnaires in practice in haemodialysis [25]. For this reason, the reflection of the group is adequate from a conceptual perspective, but imprecise when going down into the detail of the content of the questionnaires due to their heterogeneity. Despite the limitations of the study, we think that an evaluative approach that considers these indicator weights is more consistent than a perspective that does not discriminate between indicators, as it better reflects the values of the stakeholders.

Conclusions

Our results suggest that the different types of stakeholders manifest distinct preferences among indicators, and this happens consistently when captured by different methodologies. Thus, patients have a greater preference for indicators related to PROMs than clinicians and managers, and this consideration must be incorporated into the assessment of health services. The use of a multicriteria methodology endowed with a multifocal, transdisciplinary, and reflective perspective allows us to determine the relative importance and uncertainty of the various evaluation indicators, as a reflection of the values of the stakeholders and society. The inclusion of values in the evaluation, through a performance matrix, could help with clinical and organizational decision-making in a complex system.