1 Introduction

The economic empowerment of women and girls is a process whereby women and girls experience transformation in power and agency, as well as economic advancement (Pereznieto & Taylor, 2014). Social intervention programs in women’s empowerment (WE) have gained momentum as an urgent issue on the social economy agenda, contributing to the UN’s Sustainable Development Goal (SDG) “No. 5 Gender Equality” with an emphasis on the economic empowerment of all women and girls. The UN links WE with a “sense of self-worth”, the “right to have access to opportunities and resources”, the “right to have the power to control their own lives, both within and outside the home”, and the “ability to influence the direction of social change to create a more just social and economic order, nationally and internationally” (IOM, 2017). The vital connection between economic and gender-based inequality often adversely impacts women’s well-being (Hughes, 2015), especially in male-dominant communities where women are mostly limited to household activities. Therefore, the analysis of WE is essential for any social work-related context (Basumatary et al., 2023; Sell & Minot, 2018).

In literature and practice, women’s economic empowerment projects span a wide spectrum, from the role of microfinance institutions (Rehman et al., 2020) to agricultural production (Johnson et al., 2018) and entrepreneurial activities (Karki & Xheneti, 2018), which can be empowering only if they eliminate the limitations imposed by gender roles. The change in women’s marginalization in the labor market cannot be overlooked on the way to WE (Güney-Frahm, 2018). Non-profit organizations (NPOs) play a considerable role in these efforts due to insufficient support from the government and from a society with patriarchal norms (Gupta, 2021). In Turkey, a developing country, women’s labor participation rate in 2022 was 37.3%, strikingly lower than that of OECD countries (64.8%) (OECD, 2020). The employment rate of women of working age was also much lower (35.5%) than the OECD average (62.5%) (OECD, 2020). Many NPOs and their stakeholders work toward women’s economic empowerment to improve these rates in Turkey; however, they face performance-related challenges primarily due to inefficient monitoring and evaluation of their projects’ outputs, outcomes, and social impacts (SIs).

SI refers to a logical chain starting with organizational inputs, leading to outputs, outcomes, and societal impacts (Ebrahim & Rangan, 2010). NPOs and practitioners must assess social interventions to maximize resource utilization and stakeholder satisfaction. SI program execution demands performance reporting and monitoring with contextual indicators for timely corrective and preventive actions. NPOs often struggle with measurement design at project inception due to a flawed theory of change (ToC) and inappropriate indicators (Güner, 2021). Furthermore, existing frameworks like MEAL and SRI lack context-specific indicators, while costly SI consultancy services and templates prove inefficient without real-time program data.

Effective Social Impact Measurement (SIM) demands robust indicators (Alexander et al., 2010). For social entrepreneurs and enterprises, the lack of shared metrics hampers planning, stakeholder communication (Nicholls, 2009), and gaining support (Bengo et al., 2016). Local capacity for community-level SIMs is essential (Bice, 2020), requiring tailored indicators. However, the literature review reveals a scarcity of research on selecting performance indicators for women’s empowerment. Glennerster et al. (2018) listed the “varying meaning of empowerment in different contexts” and “prioritizing outcome measures” among the challenges in measuring women’s and girls’ empowerment. For the former, they suggest using findings from formative research to select or develop context-specific, locally tailored indicators that complement more standard empowerment indicators. The literature review also revealed that multi-criteria decision-making (MCDM) methods are underutilized in SIM indicator selection, and few studies employ them for ranking SI indicators in women’s economic empowerment.

In brief, despite the richness of research streams regarding women’s empowerment policies (Tirka Widanti, 2023) and WE indicators, literature gaps persist in two key areas: (1) lacking quantitative frameworks for selecting WE indicators across different contexts, necessitating the application of MCDM models to enhance consistency and validity through sensitivity analysis, mitigating expert subjectivity (Bengo et al., 2016); (2) underutilization of MCDM techniques for prioritizing SIM indicators, despite their widespread use in related domains such as ranking organizations based on their SI (Amrita et al., 2018; Bengo et al., 2020; Dzunic et al., 2018).

Addressing these gaps, this paper aims to contribute to WE theory and practice by providing context-specific indicator lists and rankings for WE programs. To this end, we propose a quantitative MCDM model for SIM in WE, with an application to Turkish WE programs, utilizing the expert opinions of program implementers and non-governmental organization (NGO) experts gathered through interviews and surveys. The findings have the potential to offer guidance to program designers, implementers, and evaluators, focusing on indicators directly impacting women’s economic empowerment and contributing to the achievement of the UN’s 5th SDG, Gender Equality. Additionally, the proposed MCDM model for indicator selection aids NGOs in aligning indicators with the Theory of Change and context-based outcomes (Bice, 2020).

The study first defines 17 performance indicators based on data gathered through interviews with experts from 11 WE NGOs in Turkey. Surveys with 12 WE program executors (Key Informants) and experts enabled the evaluation of the importance (weights) of each criterion (measurability, attainability, relevance, replicability, and usability in various contexts) by Fuzzy AHP (FAHP). Subsequently, the Fuzzy TOPSIS (FTOPSIS) method ranked and prioritized the indicators by using the weighted criteria obtained from FAHP. The results are validated through a consistency check and a sensitivity analysis, which support the robustness of the methodology. Finally, the discussion of the results provides a thorough understanding of the indicators along with the theoretical and practical implications and policy recommendations.

The following sections present a literature review on SIM, indicators and measurement of women’s empowerment, and MCDM methods. After introducing the methodological flow, the paper proceeds with the findings from the FAHP and FTOPSIS applications for prioritizing the indicators. The final section presents the concluding remarks with a discussion of the results.

2 Literature Review on Social Impact Measurement and Women’s Empowerment Indicators

2.1 Social Impact and Social Impact Measurement

SI is “the consequences to human populations of any public and private actions altering how people work, live, play, relate to one another, organize to meet their needs, and generally act as a member of society” (Burdge & Vanclay, 1996). SIM, as a tool for validating contributions to social goals, has become an indispensable part of policy formation for many institutions, from business corporations to NPOs (Becker, 2001). NPOs are increasingly expected to prove their SI and thereby assure their sponsors of the results, enabling the continuity of funds from present and future donors (Arvidson & Lyon, 2014). However, the need for SIM arises not only from the requirement of satisfying funders but also from NPOs’ own self-evaluation (Arvidson & Lyon, 2014). NPOs’ recognition of their social value and of the benefits that stakeholders perceive is highly critical (Polonsky et al., 2016).

However, challenges in practicing SIM, such as varying assumptions of project members and consultants about social value or the lack of a shared understanding, often undermine the quality of SIM and utilization of the improvement opportunities (Vanclay et al., 2015). Inappropriate or non-contextual indicators hinder the validity of impact measurement while the dilemma of “what to do” versus “how to do” misguides the design of the SIM process, shifting the sole focus from outcomes to the outputs. Also, the SI is a new topic among practitioners in Turkey, with limited examples and resources (Güner & Keskin, 2021). Lack of expertise in designing the SIM process has always been a significant barrier in social projects (Social Impact Task Force, 2000). Köroğlu and Yıldırım (2023) also highlighted this challenge in the context of SIM processes of NGO-led WE projects in Turkey.

2.2 Determinants and Indicators of Women’s Empowerment

The concept of empowerment has a multi-dimensional focus on “resources” (control over physical, financial, human, and intellectual resources (Kabeer, 1999)), “the agency” (having the capacity and freedom to make individual life choices implies agencies (Desai, 2010; Sen, 2009)), and achievement (the functioning constituted by agencies and resources together) (Basumatary et al., 2023; Sell & Minot, 2018). Proper impact assessment methods, indicators, and appropriate project execution strategies are critical for the success of WE projects.

Women’s empowerment is an inspiring but also challenging concept for monitoring and evaluation specialists (Bishop & Bowman, 2014):

  • It is inspiring to consider the potential for evaluation to illustrate and support truly transformational but often hidden changes.

  • It is challenging from the measurement perspective—an abstract and contested concept boasting a range of sometimes-dry definitions causing failure in capturing its transformational elements.

In this context, Landig (2011) analyzed the effectiveness of three EU-funded WE projects in Turkey and pointed out that these projects, which aimed to increase the number of women in the workforce, became more successful when evaluated and monitored. Rowlands (1997) categorized the possible interventions of an agency empowering a social group into four power types: (1) empowerment as a choice (power to), covering domain-specific autonomy and household decision-making; (2) empowerment as control (power over), covering control over personal decisions; (3) empowerment as change (power from within), covering changing aspects in one’s life (communal level) and communal belonging; and (4) empowerment in a community (power with), covering changing aspects in one’s life (individual level). Referring to Rowlands’ (1997) typology, Pereznieto and Taylor (2014) and Carter et al. (2014) categorized WE into four dimensions, referred to as “change outcomes” in WE interventions:

  1. Power within: the knowledge, individual capabilities, sense of entitlement, self-esteem, and self-belief to make changes in their lives, including learning skills to get a job or start an enterprise.

  2. Power to: economic decision-making within their household, community, and local economy (including markets), not just in areas traditionally regarded as women’s realm, but extending to areas traditionally regarded as men’s realm.

  3. Power over: access to and control over financial, physical, and knowledge-based assets, including access to employment and income-generation activities.

  4. Power with: the ability to organize with others to enhance economic activity and rights.

Building on Rowlands' typology, Ibrahim and Alkire (2007) also proposed indicators of empowerment in general. In evaluating WE, Carter et al. (2014) noted that quantitative methods emphasize 'power to', whereas qualitative methods offer broader insights.

The United Nations Foundation’s (UNF’s) guidance on Measuring Women’s Economic Empowerment classifies the outcomes of WE programs as direct, intermediate, and indirect (Buvinic & Furst-Nichols, 2013). Though these topics are widely acknowledged as primary determinants of WE, a ToC should first be mapped to select the most appropriate indicators for a reliable measurement strategy tailored to specific empowerment interventions (Glennerster & Takavarasha, 2013; Glennerster et al., 2018).

Though it is essential to use intervention- and domain-specific measures in the empowerment context, the measures include a core set of concepts (Alkire, 2005; Glennerster et al., 2018). Three determinants (psychological patterns of society, family, and women) affect six indicators ((1) education, (2) educational freedom, (3) economic contribution, (4) economic freedom, (5) household management and decision-making, (6) perceived status within the household and health) that directly influence WE status (Sharma & Bansal, 2017). The main available measurement frameworks with indicators or measures of women's economic empowerment include the International Center for Research on Women’s (ICRW) modules (Golla et al., 2011), the Women’s Empowerment in Agriculture Index (WEAI) (Malapit et al., 2019), and the guide of Glennerster et al. (2018) prepared for NPO usage. However, we have not found academic studies that closely fit this problem’s context. International organizations like UNF, World Bank, GroW, and Oxfam also provided a list of WE topics (Aletheia et al., 2017; Alkire et al., 2013; Buvinic & Furst-Nichols, 2013; Bishop & Bowman, 2014; Golla et al., 2011; Laszlo & Grantham, 2017; Lombardini et al., 2017; Glennerster et al., 2018):

  • Women’s access to and control over resources, including income and assets;

  • Participation in important decisions at the personal, household, and community level;

  • Control over reproductive health and fertility choices;

  • Subjective well-being and happiness; mobility;

  • Time use and sharing domestic work;

  • Freedom from violence;

  • Community and political participation;

  • Well-being outcomes in domains like education, health, and labor;

  • Women claiming and enjoying their rights;

  • Being able to make decisions about the direction of their lives;

  • Beginning to access power denied to them.

Recently, Basumatary et al. (2023) constructed a Women’s Empowerment Index (WEI) for handloom weavers (HW), adopting 25 indicators from the Oxford Poverty and Human Development Initiative (OPHI), the Women’s Empowerment in Agriculture Index (Alkire et al., 2013), and the Women’s Empowerment Index for women in self-help groups by Roy et al. (2018), in seven domains: economic empowerment, household empowerment, participation in the political and social sphere, health, involvement in fertility-related decisions, media, and leisure time/time allocation. However, like many other WE indices, this index gives the domains and indicators equal weight.

2.3 Women’s Empowerment Indicators Selection

The literature offers a wide range of frameworks for measuring WE, such as the study of Roy et al. (2018), which proposed an index for measuring WE in India. Similarly, Narayanan et al. (2019) proposed a methodology for developing an index for WE in the nutrition context in India, where they selected the indicators via factor analysis and normative lenses. A Ghana study measured employment's impact on WE by considering objective and subjective indicators derived from the literature and applying regression for the results (2020). Nevertheless, even though the computation process was systematic, the selection or ranking of the indicators lacked an analytical approach. Amrita et al. (2018) assessed women entrepreneurship projects’ performance indicators with Fuzzy AHP; however, their study was limited to the entrepreneurship context.

As Richardson (2018) suggested, research practices measuring WE lack an analytical approach for minimizing human judgment while considering the criteria and a measurement model for constructed indicator selection. In this regard, this study aims to fill this gap by proposing an MCDM model for WE indicator selection that can be applied to various projects or programs. The following section further reveals our findings regarding the rare application of quantitative methodologies, such as MCDM methods, for selecting and ranking indicators in the WE context.

3 Literature Review on Multi-criteria Decision-Making Approaches in Indicator Selection

Numerous studies employ MCDM approaches in various contexts: for instance, Ozkaya et al. (2021) in science, technology, and innovation indicators of countries; Mavi (2014) in ranking entrepreneurial universities’ indicators using Fuzzy TOPSIS and Fuzzy AHP; Anand et al. (2017) in evaluating sustainability indicators in smart cities with Fuzzy AHP; Singh et al. (2022) in construction safety with Fuzzy TOPSIS; Pansare et al. (2023) in reconfigurable manufacturing systems with a hybrid Fuzzy TOPSIS-Fuzzy AHP approach; Rao (2021) utilizing DEMATEL-ANP-based Method (DANP) for assessing sustainability indicators through corporate social responsibility reports in Taiwan; and Jiang et al. (2020) employing Z-DEMATEL in identifying key performance indicators in hospital management. Additional methods like Fuzzy DEMATEL, Fuzzy ANP, and MOORA are also evident, particularly in the shipbuilding industry (Gavalas et al., 2022).

MCDM methods also find extensive use in SIM. For instance, Dzunic et al. (2018) employed Entropy to rank SIs and TOPSIS to rate social enterprises in the context of people who parted from society. Stankovic et al. (2021) used an integrated approach with Entropy and Preference Ranking Organization Method for the Enrichment of Evaluations (PROMETHEE). Lamata et al. (2018) created an MCDM framework using AHP and TOPSIS to help investors decide the superior company in corporate social responsibility. Rafiaani et al. (2020) applied TOPSIS to rank SI indicators in Carbon capture and utilization (CCU). Bengo et al. (2020) introduced a Naive-scoring-based framework for SIM approaches. Adhikhari et al. (2023) evaluated WE across various domains using AHP and TOPSIS, though limited to measurement rather than indicator selection or ranking. Some studies opt for fuzzy extensions of AHP and TOPSIS due to their ability to tackle the challenges of traditional methods by bringing a realistic solution to complicated decision problems (Dang et al., 2019).

MCDM methodologies have demonstrated their ability to aid stakeholders in pinpointing and reaching a consensus on sustainable solutions across various sectors (Buchholz et al., 2009). However, adopting an MCDM method has pros and cons, with the choice of a method determining the outcomes. Multi-criteria methods encompass various categories, including weighting, ordinal approaches, utility function-based techniques, relationship handling, and methods centered around measuring proximity to an ideal alternative (Gomes & Gomes, 2014). Fuzzy TOPSIS and Fuzzy AHP are chosen for the proposed framework in this study. This choice stems from TOPSIS's distinctive advantages, such as reduced complexity in both data collection and computation processes, ease of utilization, and the comprehensibility of its logical foundation, aligning with human decision-making (Velasquez & Hester, 2013). Besides, TOPSIS identifies alternatives while eliminating units across criteria through normalization (Manivannan & Kumar, 2016). Using predefined weights, TOPSIS ranks alternatives based on proximity to ideal and anti-ideal solutions. An alternative exhibiting greater proximity to the ideal solution and greater distance from the anti-ideal solution is accorded a higher ranking (Jaini & Utyuzhnikov, 2016). Nonetheless, considering fuzzy logic’s ability to handle data vagueness and assign weights for criteria and alternatives (Chen, 2000), Fuzzy TOPSIS is chosen. AHP relies on a reductionist approach reminiscent of Newtonian and Cartesian thought, wherein the problem is dissected into progressively smaller components until a precise and scalable level of analysis is achieved. AHP also necessitates the involvement of experts, who engage in pairwise comparisons of similar criteria to establish priorities for ranking the alternatives (Mathew et al., 2020). However, this also brings a deficiency caused by its reliance on experts (Ali et al., 2017). Therefore, Fuzzy AHP is preferred over conventional AHP for this study because it deploys fuzzy theory concepts in hierarchical structure analysis, using fuzzy numbers instead of real numbers to mitigate reliance on expert judgment (Figueiredo et al., 2021).

Despite the significant potential contribution of such integrated methodology, no study utilized the Fuzzy MCDM methods in exploring the importance and priority of WE indicators. In the Scopus Publication Database, we searched for the following query matching the keywords corresponding to the concepts of our research question “Women Empowerment”, “indicator(s)”, “evaluation/assessment/measurement”, and combined them with the methodological keywords being “multi-criteria/multi-attribute decision making” and “fuzzy” (excluding the exact method name, such as AHP or TOPSIS to prevent researcher bias). The search query returned no publications in Scopus Database, as shown in Fig. 1.

Fig. 1 Screenshot of the search result in the Scopus database

Even when we relaxed the query by excluding the keyword “Fuzzy”, the Scopus database still returned no publications. When the query was revised by replacing the MCDM/MADM keywords with the specific methods of AHP or TOPSIS, the database again found no matches. Therefore, this paper can potentially fill this methodological gap in the WE indicator ranking literature. The following sections introduce the combined methods in detail.

3.1 Fuzzy AHP

The Analytic Hierarchy Process (AHP), as introduced by Saaty (1990), serves as a valuable and pragmatic tool integrating qualitative and quantitative elements into the decision-making process. However, it often faces criticism due to its reliance on a discrete scale ranging from 1 to 9, which proves inadequate in addressing the uncertainty and ambiguity inherent in determining the priorities of various attributes (Choudhary & Shankar, 2012). FAHP was developed to better capture the extent of the decision-makers’ knowledge and to prevent the risks of hidden uncertainty embedded in traditional AHP, which does not resemble how humans think (Kahraman et al., 2003). In addition, fuzzy set theory uses linguistic terms to represent the decision-maker’s preferences (Awasthi et al., 2011). Accordingly, in the application of FAHP, the triangular fuzzy number set is selected first along with its linguistic representations, which can be observed in Sect. 4.2. The methodology described by Dang et al. (2019) was used in this study. After the linguistic terms are selected, the steps start with the pairwise comparison matrix:

$$\tilde{A}^{k} = \left[ {\begin{array}{*{20}c} {\tilde{d}_{11}^{k} } & \cdots & {\tilde{d}_{1j}^{k} } & \cdots & {\tilde{d}_{1n}^{k} } \\ \vdots & {} & \vdots & {} & \vdots \\ {\tilde{d}_{i1}^{k} } & \cdots & {\tilde{d}_{ij}^{k} } & \cdots & {\tilde{d}_{in}^{k} } \\ \vdots & {} & \vdots & {} & \vdots \\ {\tilde{d}_{n1}^{k} } & \cdots & {\tilde{d}_{nj}^{k} } & \cdots & {\tilde{d}_{nn}^{k} } \\ \end{array} } \right] i = 1,2, \ldots ,n; j = 1,2, \ldots ,n$$
(1)

\(\tilde{d}_{ij}^{k}\) represents the kth decision-maker’s preference of the ith criterion over the jth criterion. If several decision-makers provide judgments, the pairwise comparison matrices are averaged over all decision-makers as follows:

$$\tilde{A} = \left[ {\begin{array}{*{20}c} {\tilde{d}_{11} } & \cdots & {\tilde{d}_{1n} } \\ \vdots & \ddots & \vdots \\ {\tilde{d}_{n1} } & \cdots & {\tilde{d}_{nn} } \\ \end{array} } \right]$$
(2)

where, with K denoting the number of decision-makers,

$$\tilde{d}_{ij} = \frac{{\mathop \sum \nolimits_{k = 1}^{K} \tilde{d}_{ij}^{k} }}{K}$$
(3)
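
The averaging in Eqs. 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the type alias `TFN`, the helper names `average_tfn` and `aggregate_matrices`, and the two expert matrices are all hypothetical.

```python
from typing import List, Tuple

TFN = Tuple[float, float, float]  # triangular fuzzy number (l, m, u)
FuzzyMatrix = List[List[TFN]]


def average_tfn(judgments: List[TFN]) -> TFN:
    """Component-wise arithmetic mean of K triangular fuzzy numbers (Eq. 3)."""
    count = len(judgments)
    l = sum(t[0] for t in judgments) / count
    m = sum(t[1] for t in judgments) / count
    u = sum(t[2] for t in judgments) / count
    return (l, m, u)


def aggregate_matrices(matrices: List[FuzzyMatrix]) -> FuzzyMatrix:
    """Average K n-by-n fuzzy pairwise comparison matrices entry by entry (Eq. 2)."""
    n = len(matrices[0])
    return [
        [average_tfn([matrix[i][j] for matrix in matrices]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical judgments of two decision-makers on two criteria.
ONE: TFN = (1.0, 1.0, 1.0)
expert_1: FuzzyMatrix = [[ONE, (2.0, 3.0, 4.0)], [(1 / 4, 1 / 3, 1 / 2), ONE]]
expert_2: FuzzyMatrix = [[ONE, (4.0, 5.0, 6.0)], [(1 / 6, 1 / 5, 1 / 4), ONE]]
averaged = aggregate_matrices([expert_1, expert_2])
```

Here the averaged preference of the first criterion over the second is (3, 4, 5). Note that an arithmetic mean of reciprocal entries is generally not the reciprocal of the mean, which is one motivation for the geometric mean introduced next.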

Chang and Yang (2011) suggest using the geometric mean of these numbers to avoid extreme values and to better deal with the reciprocal numbers:

$$\tilde{r}_{i} = \left( {\mathop \prod \limits_{j = 1}^{n} \tilde{d}_{ij} } \right)^{1/n} , \quad i = 1,2, \ldots ,n$$
(4)

where \(\tilde{r}_{i}\) is the fuzzy geometric mean and \(\tilde{d}_{ij}\) represents the decision-makers’ averaged preference of the ith criterion over the jth criterion. Next, the fuzzy weight of each criterion (\(\tilde{w}_{i}\)) is calculated as below:

$$\tilde{w}_{i} = \tilde{r}_{i} \otimes \left( \tilde{r}_{1} + \tilde{r}_{2} + \ldots + \tilde{r}_{n} \right)^{-1}$$
(5)
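
The fuzzy geometric mean and fuzzy weights of Eqs. 4 and 5 can be illustrated as follows. This is a sketch under stated assumptions: it uses the standard approximate arithmetic for positive triangular fuzzy numbers (component-wise product and sum, and an inverse that reverses the order of the components), and the function names and the 2-by-2 matrix are hypothetical.

```python
from math import prod
from typing import List, Tuple

TFN = Tuple[float, float, float]  # triangular fuzzy number (l, m, u)


def fuzzy_geometric_mean(row: List[TFN]) -> TFN:
    """Eq. 4: n-th root of the component-wise product of one matrix row."""
    n = len(row)
    l = prod(t[0] for t in row) ** (1.0 / n)
    m = prod(t[1] for t in row) ** (1.0 / n)
    u = prod(t[2] for t in row) ** (1.0 / n)
    return (l, m, u)


def fuzzy_weights(matrix: List[List[TFN]]) -> List[TFN]:
    """Eq. 5: multiply each r_i by the inverse of the fuzzy sum of all r_i."""
    means = [fuzzy_geometric_mean(row) for row in matrix]
    total_l = sum(r[0] for r in means)
    total_m = sum(r[1] for r in means)
    total_u = sum(r[2] for r in means)
    # Inverse of (l, m, u) is (1/u, 1/m, 1/l), so lower bounds divide by the upper total.
    return [(r[0] / total_u, r[1] / total_m, r[2] / total_l) for r in means]


# Hypothetical averaged comparison matrix for two criteria.
example = [
    [(1.0, 1.0, 1.0), (2.0, 3.0, 4.0)],
    [(1 / 4, 1 / 3, 1 / 2), (1.0, 1.0, 1.0)],
]
weights = fuzzy_weights(example)
```

With this arithmetic the middle components of the weights sum to one, while the lower and upper components bracket them.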

Finally, the fuzzy weights are defuzzified and normalized. Writing \(\tilde{w}_{i} = (l_{w_{i}}, m_{w_{i}}, u_{w_{i}})\), \(M_{i}\) is the average of the three components of \(\tilde{w}_{i}\), and \(N_{i}\) is the normalized weight of criterion i.

$$M_{i} = \frac{l_{w_{i}} + m_{w_{i}} + u_{w_{i}}}{3}$$
(6)
$$N_{i} = \frac{M_{i}}{M_{1} + M_{2} + \cdots + M_{n}}$$
(7)
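
The last two FAHP steps can be sketched as below. The sketch assumes that the average \(M_{i}\) is the mean of the lower, middle, and upper components of the fuzzy weight of criterion i (a centre-of-area reading of Eq. 6, which is an assumption here), and that Eq. 7 then rescales these crisp values to sum to one. Function names and input values are hypothetical.

```python
from typing import List, Tuple

TFN = Tuple[float, float, float]  # triangular fuzzy number (l, m, u)


def average_weights(fuzzy_w: List[TFN]) -> List[float]:
    """Assumed reading of Eq. 6: mean of the three components of each fuzzy weight."""
    return [(l + m + u) / 3.0 for (l, m, u) in fuzzy_w]


def normalize_weights(averages: List[float]) -> List[float]:
    """Eq. 7: divide each average by the sum of all averages."""
    total = sum(averages)
    return [value / total for value in averages]


# Hypothetical fuzzy weights of two criteria.
fuzzy_w = [(0.2, 0.3, 0.4), (0.5, 0.7, 0.9)]
crisp = average_weights(fuzzy_w)        # approximately [0.3, 0.7]
normalized = normalize_weights(crisp)   # sums to 1
```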

To check the consistency, the fuzzy numbers denoted as M = (l, m, u) are defuzzified to crisp numbers (Kwong & Bai, 2003) using the following equation.

$$M_{CRISP} = \left( {4m + l + u} \right)/6$$
(8)

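Then, the consistency ratio (CR) is calculated as below from the consistency index (CI) and the random index (RI) shown in Table 1 (Golden et al., 1989), where \(\lambda_{max}\) is the largest eigenvalue of the defuzzified comparison matrix and n is the number of criteria.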

$$\begin{gathered} CI = \frac{{\lambda_{max} - n}}{n - 1} \hfill \\ CR = \frac{CI}{{RI}} \hfill \\ \end{gathered}$$
(9)
Table 1 Random Index
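
The consistency check of Eqs. 8 and 9 can be sketched as follows. Two assumptions are made loudly: the random-index values are the commonly cited ones attributed to Saaty and may differ from those in the paper's Table 1, and \(\lambda_{max}\) is estimated by power iteration, which is one of several possible ways to obtain it. All names are hypothetical.

```python
from typing import List, Tuple

TFN = Tuple[float, float, float]  # triangular fuzzy number (l, m, u)

# ASSUMPTION: commonly cited random-index values; the paper's Table 1 may differ.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}


def defuzzify(number: TFN) -> float:
    """Eq. 8: crisp value (4m + l + u) / 6 of a triangular fuzzy number."""
    l, m, u = number
    return (4 * m + l + u) / 6


def principal_eigenvalue(matrix: List[List[float]], iterations: int = 100) -> float:
    """Estimate the largest eigenvalue of a positive matrix by power iteration."""
    n = len(matrix)
    vector = [1.0 / n] * n
    eigenvalue = float(n)
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(product)  # vector sums to 1, so this estimates lambda_max
        vector = [value / eigenvalue for value in product]
    return eigenvalue


def consistency_ratio(fuzzy_matrix: List[List[TFN]]) -> float:
    """Eq. 9: CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    crisp = [[defuzzify(entry) for entry in row] for row in fuzzy_matrix]
    n = len(crisp)
    ci = (principal_eigenvalue(crisp) - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


def crisp_tfn(value: float) -> TFN:
    """Hypothetical helper: represent a crisp judgment as a degenerate fuzzy number."""
    return (value, value, value)


# A perfectly consistent 3-by-3 matrix (weights in ratio 4:2:1) has CR = 0.
consistent = [[crisp_tfn(a / b) for b in (4.0, 2.0, 1.0)] for a in (4.0, 2.0, 1.0)]
cr = consistency_ratio(consistent)
```

A frequently used rule of thumb treats judgments as acceptably consistent when CR is at most 0.1.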

3.2 Fuzzy TOPSIS Method

TOPSIS is a method proposed by Hwang and Yoon (1981) to solve MCDM problems by obtaining the alternative with the shortest distance from the positive ideal solution and the longest distance from the negative ideal solution; the positive ideal solution consists of the best (highest) values for positive criteria and the best (lowest) values for negative criteria. However, as mentioned in the previous sections, a fuzzy extension of TOPSIS was proposed by Chen (2000) to overcome the problem of vagueness and uncertainty in human thinking, which eventually influence decision-making processes (Sadoughi et al., 2012).

Chen's (2000) methodology is used in this paper. FTOPSIS starts with constructing the group of decision-makers and building the criteria set. This method utilizes linguistic variable sets to evaluate the alternatives based on the criteria. Then, it follows the steps below, applying Eqs. 10–17 (where \(\tilde{X}_{ij}^{k}\) is the rating and \(\tilde{W}_{j}^{k}\) is the importance weight assigned by the kth decision-maker). Considering we have m alternatives, n criteria, and k decision-makers, the fuzzy multi-criteria group decision-making problem can be expressed as the following matrix:

$$\tilde{D} = \begin{array}{*{20}c} {A_{1} } \\ \vdots \\ {A_{i} } \\ \vdots \\ {A_{m} } \\ \end{array} \left[ {\begin{array}{*{20}c} {\tilde{X}_{11} } & \cdots & {\tilde{X}_{1j} } & \cdots & {\tilde{X}_{1n} } \\ \vdots & {} & \vdots & {} & \vdots \\ {\tilde{X}_{i1} } & \cdots & {\tilde{X}_{ij} } & \cdots & {\tilde{X}_{in} } \\ \vdots & {} & \vdots & {} & \vdots \\ {\tilde{X}_{m1} } & \cdots & {\tilde{X}_{mj} } & \cdots & {\tilde{X}_{mn} } \\ \end{array} } \right] i = 1,2, \ldots ,m; j = 1,2, \ldots ,n$$
(10)

where \(A_{1}\), \(A_{2}\), …, \(A_{m}\) are the alternatives (indicators in our case) to be selected or prioritized, and \(C_{1}\), \(C_{2}\), …, \(C_{n}\) are the evaluation measures or criteria. Furthermore, \(\tilde{X}_{ij}\) stands for the aggregated rating of alternative \(A_{i}\) with respect to \(C_{j}\). The average value method is used for integrating the fuzzy performance scores of the k evaluators. Moreover, \(\tilde{X}_{ij}^{k}\) indicates the rating of alternative \(A_{i}\) with respect to \(C_{j}\) by evaluator k, and a, b, c are the components of the triangular fuzzy numbers:

$$\begin{gathered} \tilde{X}_{ij} = \frac{1}{k}\left[ {\tilde{X}_{ij}^{1} + \tilde{X}_{ij}^{2} + \cdots + \tilde{X}_{ij}^{k} } \right] \hfill \\ \tilde{X}_{ij}^{k} = \left( {a_{ij}^{k} ,b_{ij}^{k} ,c_{ij}^{k} } \right) \hfill \\ \end{gathered}$$
(11)
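
To illustrate Eq. 11, the sketch below maps linguistic ratings to triangular fuzzy numbers and averages them across evaluators. The seven-term scale is an assumption (a scale of this form is commonly used with fuzzy TOPSIS); the scale actually used in this study is the one given in Sect. 4.2. The names `RATING_SCALE` and `aggregate_ratings` are hypothetical.

```python
from typing import Dict, List, Tuple

TFN = Tuple[float, float, float]  # triangular fuzzy number (a, b, c)

# ASSUMPTION: an illustrative linguistic scale, not necessarily the study's own.
RATING_SCALE: Dict[str, TFN] = {
    "VP": (0.0, 0.0, 1.0),    # very poor
    "P": (0.0, 1.0, 3.0),     # poor
    "MP": (1.0, 3.0, 5.0),    # medium poor
    "F": (3.0, 5.0, 7.0),     # fair
    "MG": (5.0, 7.0, 9.0),    # medium good
    "G": (7.0, 9.0, 10.0),    # good
    "VG": (9.0, 10.0, 10.0),  # very good
}


def aggregate_ratings(labels: List[str]) -> TFN:
    """Eq. 11: average the fuzzy ratings that k evaluators gave to one cell."""
    ratings = [RATING_SCALE[label] for label in labels]
    k = len(ratings)
    a = sum(r[0] for r in ratings) / k
    b = sum(r[1] for r in ratings) / k
    c = sum(r[2] for r in ratings) / k
    return (a, b, c)


# Two hypothetical evaluators rate one indicator on one criterion.
cell = aggregate_ratings(["G", "VG"])
```

For the two ratings above the aggregated score is (8, 9.5, 10).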

The initially collected data must be normalized to address variations in measurement units and scales within MCDM problems. In this study, the linear normalization technique is used, based on the following equation, where \(\tilde{R}\) is the normalized fuzzy decision matrix.

$$\begin{gathered} \tilde{R} = [\tilde{r}_{ij} ]_{m \times n} , \quad i = 1,2, \ldots ,m; \; j = 1,2, \ldots ,n \hfill \\ \tilde{r}_{ij} = \begin{cases} \left( \frac{a_{ij}}{c_{j}^{+}}, \frac{b_{ij}}{c_{j}^{+}}, \frac{c_{ij}}{c_{j}^{+}} \right), & \text{if } j \text{ is a benefit criterion} \\ \left( \frac{a_{j}^{-}}{c_{ij}}, \frac{a_{j}^{-}}{b_{ij}}, \frac{a_{j}^{-}}{a_{ij}} \right), & \text{if } j \text{ is a cost criterion} \end{cases} \hfill \\ \end{gathered}$$
(12)

Here, \(c_{j}^{+} = \max_{i} c_{ij}\) if j is a benefit criterion, while \(a_{j}^{-} = \min_{i} a_{ij}\) if j is a cost criterion.
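
A minimal sketch of the linear normalization in Eq. 12 is given below. Function names and the small decision matrix are hypothetical, and the cost branch assumes strictly positive ratings so that no division by zero occurs.

```python
from typing import List, Tuple

TFN = Tuple[float, float, float]  # triangular fuzzy number (a, b, c)


def normalize_column(column: List[TFN], benefit: bool = True) -> List[TFN]:
    """Eq. 12 for one criterion: divide by max c (benefit) or divide min a by each part (cost)."""
    if benefit:
        c_max = max(c for (_, _, c) in column)
        return [(a / c_max, b / c_max, c / c_max) for (a, b, c) in column]
    a_min = min(a for (a, _, _) in column)
    # Order is reversed so that the result is still (lower, middle, upper).
    return [(a_min / c, a_min / b, a_min / a) for (a, b, c) in column]


def normalize_matrix(matrix: List[List[TFN]], benefit_flags: List[bool]) -> List[List[TFN]]:
    """Normalize an m-by-n fuzzy decision matrix column by column."""
    m, n = len(matrix), len(matrix[0])
    columns = [
        normalize_column([matrix[i][j] for i in range(m)], benefit_flags[j])
        for j in range(n)
    ]
    return [[columns[j][i] for j in range(n)] for i in range(m)]


# Two hypothetical alternatives; the first criterion is a benefit, the second a cost.
decision = [
    [(7.0, 9.0, 10.0), (2.0, 4.0, 6.0)],
    [(3.0, 5.0, 7.0), (4.0, 6.0, 8.0)],
]
normalized = normalize_matrix(decision, [True, False])
```

After normalization every component lies in [0, 1], which is what allows (1, 1, 1) and (0, 0, 0) to serve as reference points later in the method.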

Considering the different weights attached to each criterion, a weighted normalized decision matrix is obtained by multiplying the importance weights of the criteria by the normalized fuzzy decision matrix, which is shown as \(\tilde{V}\) below:

$$\begin{gathered} \tilde{V} = [\tilde{v}_{ij} ]_{m \times n} , \quad i = 1,2, \ldots ,m; \; j = 1,2, \ldots ,n \hfill \\ \tilde{v}_{ij} = \tilde{r}_{ij} \otimes \tilde{w}_{j} ,\;{\text{where}}\;\tilde{w}_{j} \;{\text{represents the weight of criterion}}\; j \hfill \\ \end{gathered}$$
(13)
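
Eq. 13 can be sketched as a component-wise product. The sketch accepts either fuzzy weights or crisp weights (for example, normalized weights coming out of FAHP), treating a crisp weight w as the degenerate fuzzy number (w, w, w); that treatment and all names below are assumptions made for illustration.

```python
from typing import List, Tuple, Union

TFN = Tuple[float, float, float]  # triangular fuzzy number (a, b, c)
Weight = Union[float, TFN]


def as_tfn(weight: Weight) -> TFN:
    """Represent a crisp weight w as (w, w, w); pass fuzzy weights through."""
    if isinstance(weight, (int, float)):
        return (float(weight), float(weight), float(weight))
    return weight


def multiply(x: TFN, w: TFN) -> TFN:
    """Approximate product of two non-negative triangular fuzzy numbers."""
    return (x[0] * w[0], x[1] * w[1], x[2] * w[2])


def weighted_matrix(normalized: List[List[TFN]], weights: List[Weight]) -> List[List[TFN]]:
    """Eq. 13: v_ij = r_ij (x) w_j for every alternative i and criterion j."""
    fuzzy_w = [as_tfn(w) for w in weights]
    return [[multiply(x, fuzzy_w[j]) for j, x in enumerate(row)] for row in normalized]


# One hypothetical alternative, one crisp weight and one fuzzy weight.
v = weighted_matrix([[(0.7, 0.9, 1.0), (0.25, 0.5, 1.0)]], [0.6, (0.2, 0.4, 0.6)])
```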

As the weighted normalized positive triangular fuzzy numbers lie between zero and one, the fuzzy positive ideal solution and the fuzzy negative ideal solution are defined as below:

$$\begin{gathered} A^{ + } = \left( {\tilde{v}_{1}^{ + } ,\tilde{v}_{2}^{ + } , \ldots ,\tilde{v}_{n}^{ + } } \right), \hfill \\ A^{ - } = \left( {\tilde{v}_{1}^{ - } ,\tilde{v}_{2}^{ - } , \ldots ,\tilde{v}_{n}^{ - } } \right) \hfill \\ {\text{where}}\; \tilde{v}_{j}^{ + } = (1,1,1),\; \tilde{v}_{j}^{ - } = (0,0,0),\; j = 1,2, \ldots ,n \hfill \\ \end{gathered}$$
(14)

Then comes the step of computing the distance of each alternative with positive fuzzy ideal solution (\(d_{i}^{ + } )\) and negative fuzzy ideal solution (\(d_{i}^{ - } )\):

$$\begin{gathered} d_{i}^{ + } = \mathop \sum \limits_{j = 1}^{n} d(\tilde{v}_{ij} ,\tilde{v}_{j}^{ + } ), i = 1,2, \ldots ,m;j = 1,2, \ldots ,n \hfill \\ d_{i}^{ - } = \mathop \sum \limits_{j = 1}^{n} d(\tilde{v}_{ij} ,\tilde{v}_{j}^{ - } ), i = 1,2, \ldots ,m;j = 1,2, \ldots ,n \hfill \\ \end{gathered}$$
(15)

Further, \(d(\tilde{v}_{a} ,\tilde{v}_{b} )\) represents the distance between two fuzzy numbers if \(\tilde{v}_{ij}\) = (a,b,c):

$$\begin{gathered} d(\tilde{v}_{ij} ,\tilde{v}_{j}^{ + } ) = \sqrt {\frac{1}{3}\left[ {\left( {a - 1} \right)^{2} + \left( {b - 1} \right)^{2} + \left( {c - 1} \right)^{2} } \right]} \hfill \\ d(\tilde{v}_{ij} ,\tilde{v}_{j}^{ - } ) = \sqrt {\frac{1}{3}\left[ {\left( {a - 0} \right)^{2} + \left( {b - 0} \right)^{2} + \left( {c - 0} \right)^{2} } \right]} \hfill \\ \end{gathered}$$
(16)
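
The distance computations of Eqs. 14 to 16 can be sketched as follows, using the vertex distance between two triangular fuzzy numbers (the square root of the mean squared difference of the three components). The constant and function names and the example row are hypothetical.

```python
from math import sqrt
from typing import List, Tuple

TFN = Tuple[float, float, float]  # triangular fuzzy number (a, b, c)

FPIS: TFN = (1.0, 1.0, 1.0)  # fuzzy positive ideal value for every criterion (Eq. 14)
FNIS: TFN = (0.0, 0.0, 0.0)  # fuzzy negative ideal value for every criterion (Eq. 14)


def vertex_distance(x: TFN, y: TFN) -> float:
    """Eq. 16: sqrt of one third of the summed squared component differences."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(x, y)) / 3.0)


def ideal_distances(row: List[TFN]) -> Tuple[float, float]:
    """Eq. 15: summed distances of one alternative to the FPIS and the FNIS."""
    d_plus = sum(vertex_distance(v, FPIS) for v in row)
    d_minus = sum(vertex_distance(v, FNIS) for v in row)
    return d_plus, d_minus


# One hypothetical alternative evaluated on two criteria.
d_plus, d_minus = ideal_distances([(0.5, 0.5, 0.5), (1.0, 1.0, 1.0)])
```

In this example the alternative is 0.5 away from the positive ideal in total and 1.5 away from the negative ideal.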

Finally, the closeness coefficient enables ranking all alternatives and the selection of the best alternative. The way to calculate the closeness coefficient of each alternative is below:

$$CC_{i} = \frac{{d_{i}^{ - } }}{{d_{i}^{ + } + d_{i}^{ - } }}, \quad i = 1,2, \ldots ,m.$$
(17)

\(CC_{i}\) indicates the extent of the proximity of alternatives to the optimal solution and distance from the negative ideal solution. Hence, higher values of \(CC_{i}\) signify stronger performances of alternatives.
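
The final ranking step can be sketched as below, computing the closeness coefficient as the distance to the negative ideal divided by the sum of both distances for the same alternative. The indicator labels and distance values are hypothetical and serve only to show the ordering.

```python
from typing import List


def closeness_coefficients(d_plus: List[float], d_minus: List[float]) -> List[float]:
    """Eq. 17: CC_i = d_i^- / (d_i^+ + d_i^-) for each alternative i."""
    return [dm / (dp + dm) for dp, dm in zip(d_plus, d_minus)]


def rank_alternatives(names: List[str], cc: List[float]) -> List[str]:
    """Order alternatives from the highest to the lowest closeness coefficient."""
    pairs = sorted(zip(names, cc), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in pairs]


# Hypothetical distances for three indicators.
names = ["I1", "I2", "I3"]
cc = closeness_coefficients([1.0, 3.0, 2.0], [3.0, 1.0, 2.0])
ranking = rank_alternatives(names, cc)
```

Here the coefficients are 0.75, 0.25, and 0.5, so the indicators are ranked I1, I3, I2.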

4 A Combined MCDM Framework Proposal for the Women Empowerment Program Performance

4.1 Building the MCDM Model

In the first phase, potential SI indicators for women's economic empowerment projects were identified through data collected via semi-structured key informant interviews (KIIs) with representatives from 11 women's NPOs in Turkey. These organizations included SistersLab, We Need to Talk, Flying Broom Association, Wtech, IDEMA, Red Pepper, KA.DER, Yaşamda Kadın ve Sanat Derneği, Ortak Yaşamı Geliştirme Vakfı, KAPI, and Yanındayız Derneği. Most interviewees were women with moderate experience in SIM, serving as project managers or co-founders. Informed consent was obtained at the outset of the interviews. Two interviewers conducted the KIIs through Zoom, each lasting approximately 40 min. They independently transcribed the interviews, coded the statements, and subsequently compared and merged the codes. The consistency in coding between the two interviewers obviated the need for an interrater reliability test, such as the Fleiss Kappa Statistic (Fleiss et al., 2003), which would yield an “excellent” score of κ = 1 under such agreement. As a result, these indicators were defined and used as the second-level criteria set within the MCDM framework. In the second phase, we developed an MCDM framework for evaluating these indicators based on a literature review. Subsequently, structured Expert Interviews (EIs) were conducted as online surveys with a new group of 12 decision-makers. This group consisted of 9 employees from Women's Empowerment NPOs from the initial list, 1 independent SI analyst, and 2 researchers involved in SIM. The final phase involved the application of MCDM methods. The methodological flow of the MCDM study is provided in Fig. 2.

Fig. 2 Methodological flow of the MCDM study for indicator evaluation

Following a literature review assessing the appropriateness and practicality of indicators drawn from EI findings, we established a relevant criteria set, forming the first level of the FAHP hierarchy. In their FTOPSIS application to evaluate health, safety, and environment performance indicators, Sadoughi et al. (2012) based their criteria on the SMART framework (specific, measurable, attainable, realistic, and time-sensitive), while Schwemlein et al. (2016) suggested that indicators should be measurable, reliable, available, sensitive, and valid. Accordingly, “measurability” necessitates quantifiability (Dale & Beyeler, 2001), “reliability” ensures consistent results from the same group, day, and method, eliminating evaluator bias, and “validity” underscores relevance to the topic (WHO, 1997). During the EIs, the criteria indicative of an indicator’s importance in various frameworks were defined as follows: measurability (C1), attainability (C2), relevance (C3), reliability (C4), and alignment with multiple SI frameworks (C5). These criteria align with Fuzzy AHP's methodological requirements regarding the number of criteria and levels. Russo and Camanho’s (2015) analysis of the AHP literature suggested using seven or fewer criteria and alternatives to support consistency and limit redundancy in AHP. They also reported that previous AHP models predominantly featured one to three layers, with a two-level structure being the most common. The proposed framework in Fig. 3 adheres to these principles, maintaining a criterion count below seven and employing a two-level hierarchy.

Fig. 3 MCDM Model

4.2 Application of MCDM Methods

This study combined the FAHP and FTOPSIS methods to evaluate the performance indicators of WE projects. In addition, a sensitivity analysis and a consistency check were used to test the validity of the proposed framework. To collect the data, we conducted a survey with 12 experts, asking them to weigh the 5 criteria by importance and to evaluate 17 indicators against the 5 criteria.

The 12 decision-makers comprised 9 employees of WE NGOs, 1 independent SI analyst, and 2 experienced researchers in the market research industry who are also involved in SIM. In the first part of the survey, the criteria were rated using the linguistic variables that Nurani et al. (2017) adapted for Fuzzy AHP, and the fuzzy number set in Table 2 was applied. In the second part of the survey, where the decision-makers rated the indicators against the criteria, the scale in Table 3 with the relevant linguistic terms (Zahari & Abdullah, 2012) was adopted.

Table 2 Scale assessment conversion fuzzy AHP
Table 3 Fuzzy linguistic terms and correspondence for each alternative

The FAHP process ran from building the pairwise comparison matrix to the weighted normalization fuzzy decision matrix (Eqs. 1–7). Table 4 shows the final weights obtained from the geometric means of all decision-makers’ ratings for the criteria. The weights of criteria C1 to C5 were 0.205, 0.233, 0.278, 0.138, and 0.145, respectively.
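
The aggregation and weighting steps can be sketched as follows. This is a minimal illustration assuming Buckley's geometric-mean variant of fuzzy AHP with centroid defuzzification, which may differ in detail from Eqs. 1–7; the function names and the 3 × 3 comparison matrix of triangular fuzzy numbers (TFNs) are hypothetical.

```python
import math


def aggregate_experts(judgments):
    """Component-wise geometric mean of several experts' TFNs for one comparison."""
    k = len(judgments)
    return tuple(math.prod(t[c] for t in judgments) ** (1.0 / k) for c in range(3))


def fahp_weights(matrix):
    """Crisp, normalized criteria weights from a fuzzy pairwise comparison matrix."""
    n = len(matrix)
    # Fuzzy geometric mean of each row.
    r = [tuple(math.prod(t[c] for t in row) ** (1.0 / n) for c in range(3))
         for row in matrix]
    total = tuple(sum(ri[c] for ri in r) for c in range(3))
    # Fuzzy division: (l, m, u) / (L, M, U) = (l / U, m / M, u / L).
    fuzzy_w = [(ri[0] / total[2], ri[1] / total[1], ri[2] / total[0]) for ri in r]
    crisp = [sum(w) / 3.0 for w in fuzzy_w]  # centroid defuzzification
    s = sum(crisp)
    return [c / s for c in crisp]


# Hypothetical reciprocal matrix of TFNs (l, m, u) for three criteria.
one = (1.0, 1.0, 1.0)
matrix = [
    [one, (2, 3, 4), (4, 5, 6)],
    [(1 / 4, 1 / 3, 1 / 2), one, (1, 2, 3)],
    [(1 / 6, 1 / 5, 1 / 4), (1 / 3, 1 / 2, 1), one],
]
weights = fahp_weights(matrix)
print([round(w, 3) for w in weights])
```

In a group setting, `aggregate_experts` would be applied to each cell of the matrix before `fahp_weights` is called, so that one aggregated comparison matrix represents all decision-makers.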

Table 4 Results of fuzzy AHP with the normalized values in bold indicating the weights of the criteria

To test the validity of the method, Eqs. 8–9 were applied. Since the consistency ratio is 2.9%, as seen in Table 5, which is lower than the 10% threshold, we confirmed the consistency of the findings. We accepted the weights calculated in Fuzzy AHP as appropriate for ranking the indicators.
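
A consistency check of this kind can be sketched as below. The sketch assumes Saaty's standard consistency ratio, CR = CI / RI with CI = (λmax − n) / (n − 1), applied to a crisp (defuzzified) comparison matrix; whether Eqs. 8–9 take exactly this form is an assumption, and the two example matrices are hypothetical.

```python
# Saaty's random consistency index (RI) by matrix size n.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def principal_eigenvalue(matrix, iterations=200):
    """Estimate lambda_max of a positive square matrix by power iteration."""
    n = len(matrix)
    v = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        av = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(av)              # v sums to 1, so sum(A v) approximates lambda_max
        v = [x / lam for x in av]  # renormalize so v sums to 1 again
    return lam


def consistency_ratio(matrix):
    """Return CR = CI / RI; matrices of size 1 or 2 are always consistent."""
    n = len(matrix)
    if n <= 2:
        return 0.0
    ci = (principal_eigenvalue(matrix) - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical matrices: the first is perfectly consistent, the second is not.
consistent = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
inconsistent = [[1, 2, 1 / 4], [1 / 2, 1, 2], [4, 1 / 2, 1]]

print(consistency_ratio(consistent), consistency_ratio(inconsistent))
```

A ratio below 0.10, as with the 2.9% reported here, is conventionally taken to mean that the pairwise judgments are acceptably consistent.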

Table 5 Consistency check for fuzzy AHP

With the criteria weights obtained through Fuzzy AHP, the Fuzzy TOPSIS process started. Through Eq. 10, the decision matrix was constructed based on the 12 decision-makers’ evaluations of the indicators. Following the steps described in Eqs. 11–17, the closeness coefficients (CCi) and the rankings of the indicators were obtained, as presented in Table 6. A2, A3, and A4 are the top three indicators.
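
The distance step that precedes the closeness coefficient can be sketched as follows. The sketch assumes the vertex method for the distance between triangular fuzzy numbers and the ideal solutions (1, 1, 1) and (0, 0, 0), as commonly used in Fuzzy TOPSIS; the exact form of Eqs. 11–16 may differ, and the weighted normalized matrix below is hypothetical.

```python
import math

FPIS = (1.0, 1.0, 1.0)  # assumed fuzzy positive ideal solution
FNIS = (0.0, 0.0, 0.0)  # assumed fuzzy negative ideal solution


def vertex_distance(a, b):
    """Vertex-method distance between two triangular fuzzy numbers (l, m, u)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3.0)


def ideal_distances(weighted_matrix):
    """Return (d_pos, d_neg): summed distances of each alternative to FPIS and FNIS."""
    d_pos = [sum(vertex_distance(v, FPIS) for v in row) for row in weighted_matrix]
    d_neg = [sum(vertex_distance(v, FNIS) for v in row) for row in weighted_matrix]
    return d_pos, d_neg


# Hypothetical weighted normalized fuzzy ratings: 2 alternatives x 2 criteria.
weighted_matrix = [
    [(0.1, 0.2, 0.3), (0.2, 0.3, 0.4)],
    [(0.3, 0.4, 0.5), (0.4, 0.5, 0.6)],
]
d_pos, d_neg = ideal_distances(weighted_matrix)
print(d_pos, d_neg)
```

The two distance lists are the inputs to Eq. 17: the second alternative lies closer to the positive ideal and farther from the negative ideal, so it would receive the higher closeness coefficient.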

Table 6 Results

Eventually, “number of employed or business starter beneficiaries”, “type of jobs”, “the revenue increase in the beneficiaries’ existing businesses,” and “number of beneficiaries who were promoted in their jobs after the project” emerged as the most significant indicators. On the other hand, “positive change in beneficiaries’ view of their body”, “increase in mental well-being”, and “prevention of domestic violence during the project” received lower rankings in women's economic empowerment programs in Turkey.

4.3 Sensitivity Analysis

Sensitivity analysis is a critical step in AHP-based indicator ranking studies for testing the soundness of the proposed framework (Amrita et al., 2018). It also shows how sensitive the results are to changes in the criteria weights (Perçin, 2012), which helps validate the MCDM framework and gives clear directions for improving consensus and consistency among the respondents to the research tools.

Perçin (2012) measured sensitivity in an integrated FTOPSIS-FAHP study by setting 10 cases. In the first five cases, the first criterion takes the maximum weight and the other criteria take randomly assigned values from the calculated weights. In the last five cases, the first criterion takes the lowest weight while the others change. We applied the same model by inserting the weights from FAHP, though we used 8 cases due to the lower number of criteria, as given in Appendix 1.
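
A possible way to generate such weight cases is sketched below. It is an assumption-laden illustration, not a reproduction of Appendix 1: the split into four "maximum" and four "minimum" cases, the fixed random seed, and the function name are all hypothetical; only the five base weights come from the FAHP results reported above.

```python
import random

# FAHP weights for C1..C5 as reported in the text.
BASE_WEIGHTS = [0.205, 0.233, 0.278, 0.138, 0.145]


def make_case(weights, criterion, extreme, rng):
    """Give `criterion` the max or min weight; shuffle the rest over the other criteria."""
    pick = max(weights) if extreme == "max" else min(weights)
    rest = list(weights)
    rest.remove(pick)
    rng.shuffle(rest)
    return rest[:criterion] + [pick] + rest[criterion:]


rng = random.Random(0)  # fixed seed so the hypothetical cases are reproducible
cases = [make_case(BASE_WEIGHTS, 0, "max", rng) for _ in range(4)]
cases += [make_case(BASE_WEIGHTS, 0, "min", rng) for _ in range(4)]

for number, case in enumerate(cases, start=1):
    print(number, case)
```

Each case is a rearrangement of the same five weights, so the weights still sum to the same total; re-running the FTOPSIS steps with each case then shows how the closeness coefficients and ranks react.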

Figures 4 and 5 show that A2 rises in the 4th case and reaches the top in the 6th case, where C5 receives the highest weight. In the fifth case, where C5 had its lowest value, A2 still ranked second, which reflects its strong standing as a recommended indicator in many SI frameworks. The rankings of the other indicators did not change significantly; however, A1 showed high sensitivity to the relevance criterion (C3), visible in cases 5 and 8, where C3 took its highest weights and measurability its lowest. On the other hand, A1 consistently obtained one of the best ranks and outperformed more than half of the other indicators. Still, this indicates that the measurability and relevance criteria are critical in changing some of the results. In other words, an indicator may not be the most relevant factor for measuring women’s economic empowerment even though it is highly measurable. This is an important finding for prioritizing indicators so that the success of such projects is measured as realistically as possible.

Fig. 4 Sensitivity analysis of FAHP-FTOPSIS (CCi)

Fig. 5 Sensitivity analysis of FAHP-FTOPSIS (Ranking)

5 Discussion

In SIM and performance evaluation processes of social programs and projects, measurement frameworks with well-designed indicators play a critical role in defining the theory of change and monitoring performance during and after the implementation phase. The utilization of advanced indicator design and selection methods significantly enhances the effectiveness of performance management within social programs. Nevertheless, within various social intervention initiatives, such as WE programs, both implementers and funders encounter difficulties when constructing evaluation frameworks, particularly in designing indicators tailored to context-based SIM. Referring to Bice (2020) and Taylor (2004), who pointed out the importance of context-based evaluation of social intervention, this paper provides a methodology and presents an application for a context-specific ranking of indicators, using the case of women’s economic empowerment programs of NPOs in Turkey. In this way, this study also helps address the challenges of “prioritizing outcome measures” and “the varying meaning of empowerment in different contexts” raised by Glennerster et al. (2018) in WE measurement. As recommended by these authors, we employed the insights derived from formative research to select or develop locally customized indicators and questions. This approach aimed to augment the context-specific empowerment indicators with standardized ones, addressing the challenges at hand. In addition, in line with the concerns and needs regarding ToC alignment, the proposed methodology offers an expert-based view of expected and actual outcomes as inputs to the indicator set.

The field study with experts and key informants revealed that employment status and conditions, knowledge and skills, psychological resources, awareness of and action on gender equality and roles, and child-care conditions are part of women's economic empowerment. These topics also align with the framed indicators directly influencing the status of WE reported by Sharma and Bansal (2017). The child-care conditions can be matched with household management, and employment with economic contribution and financial freedom. The findings do not include any indicator about “educational freedom” or “physical health” (although the mental well-being indicator could arguably be placed within the health category). The findings also showed that economic indicators (such as “the number of employed or business starter beneficiaries”, “the revenue increase in beneficiaries’ existing businesses”, etc.) emerged as the most critical indicators. Indicators related to gained soft skills and mental well-being received lower scores within the context of women’s economic empowerment projects in Turkey. This finding opens a discussion on the priority of economic obstacles in WE in a developing country context, which is expected to serve as valuable input for future studies. Moreover, although the commonly used indicator “the number of beneficiaries who participated in the project” can be useful for output measurement, it did not receive a high rank for SIM.

Referring to Rowland’s (1997) typology and the categories of Pereznieto and Taylor (2014) and Carter et al. (2014), we can also discuss the ranking of WE indicators revealed by this study in terms of their agency practice types. The indicators ranked within the top seven (1st–7th) primarily pertain to economic aspects. Specifically, indicators A2, A3, and A4 focus on employment, while indicators A12, A14, and A15 are centered around income increase. These indicators are associated with the "power over" dimension, which pertains to individuals' ability to access and control financial, physical, and knowledge-based assets, including opportunities for employment and income-generation activities. On the other hand, A1, A10, A6, A8, and A11, which belong to the “power within” dimension of empowerment, are listed in the middle ranks (8th–12th) of the FTOPSIS results. The indicators ranked between 11 and 18th are about “power to” and “power over” practices, which are associated with the well-being of women and the inclusivity achieved by empowerment. Based on this categorization, it is evident that social indicators related to empowerment, particularly those involving power over resources, agencies, and achievements, hold a comparatively higher degree of significance and priority for social programs. The process of empowering women to uncover their “power within” follows the “power over” dimension in these rankings. However, one might notice that the aspects related to “power to” and “power with” are given lower priority, except for entrepreneurship. This classification can be observed in Appendix 2.

From a methodological perspective, indicator rankings or importance levels are in practice determined by naive methods, such as expert scoring or weighted scoring models applied by a few experts, which raises subjectivity and consistency problems. Despite their notable advantages, which include the elimination of qualitative expert evaluations for performance indicators and the provision of relativity and consistency checks on expert opinions, MCDM tools have been infrequently employed in SI and outcome evaluation frameworks of social projects. In one of the few such studies, Rafiaani et al. (2020) used TOPSIS to rank the SI indicators and criteria for better SIM. Moreover, Amrita et al. (2018) applied FAHP to evaluate performance indicators of women’s entrepreneurship projects. Since, to our knowledge, there is no previous study on MCDM usage for evaluating performance indicators in WE projects, especially in the context of Turkey, we applied an exploratory approach by interviewing experts to create an indicator list from scratch. Therefore, the experts were rating these indicators for the first time in this domain, which raised the concern about rating precision that Kabir and Hasin (2012) underlined. To overcome this issue, we applied FTOPSIS, differing from Rafiaani et al. (2020). Furthermore, our approach not only expands upon the methodological framework introduced by Amrita et al. (2018) by incorporating the FTOPSIS-FAHP integrated approach but also broadens the scope to encompass an evaluation of indicators of overall women’s economic empowerment projects. This represents a valuable contribution to the field of MCDM studies within the context of SIM.

6 Conclusion

In conclusion, this paper examined performance indicators within an evaluation framework, aiming to enhance their utilization across the stages of women’s economic empowerment interventions, from the initial design of the ToC to ongoing monitoring and follow-up and ultimately to the post-implementation phase involving measurement and final evaluation. Although SI is a valuable concept that has been in practice since the eighteenth century, with remarkable developments within the last decades (Becker, 2001), the literature on quantitative approaches assisting indicator selection for WE projects is thin. Motivated by the significance that the literature attaches to designing robust indicator sets based on project or program context, we aimed to propose an MCDM framework utilizing the FAHP and FTOPSIS approaches. The literature was reviewed regarding WE indicators and their selection processes, MCDM methods used for indicator selection, and SIM contexts. Accordingly, 11 NGOs were interviewed to obtain a list of indicators commonly used in Turkey's context. The sampling from Turkey is valuable here as it presents a case of a developing and conservative country. Later, considering 12 decision-makers’ evaluations based on the criteria derived from the literature, a ranking of indicators was provided. In this domain, the study findings revealed that the “number of beneficiaries”, often used as an output measure, is insufficient to provide a robust measure of a project’s SI. The improvement of social skills, self-confidence, mental well-being, awareness of gender roles, view of their bodies, and prevention of domestic violence during the projects emerged as secondary indicators of impact on women’s economic empowerment. Indicators such as the number of beneficiaries who were employed, started a business, or got a promotion, the type of jobs provided, and the increase in revenues of their existing businesses received higher scores, enabling the representation of the intervening project’s success.

By applying our proposed indicator selection model to Turkish WE NGOs, our findings not only shed light on enhancing performance evaluation but also made valuable contributions to context-specific processes associated with SIM of WE. One of the main contributions is a quantitative model that can be adapted to any ToC when designing an effective indicator set to facilitate SIM processes. The second main contribution is the application of MCDM methods and their fuzzy extensions in this domain, which have been overlooked in the literature for SIM frameworks and scorecards. Even though MCDM has been utilized in several studies for evaluating SI, the gap in usage for indicator selection was apparent, especially in the context of WE. The proposed framework enabled the elimination of the subjectivity of the practitioners and the bias of the single advisor, consultant, or manager in assigning importance to SI indicators. Sensitivity analysis also provided a validity test for the appropriateness of the rankings. Each indicator is evaluated by five criteria: measurability, attainability, relevance, reliability, and alignment with multiple SI frameworks. In its current form, the indicator set and its importance weights can be applied to social output and impact evaluation processes of WE NGOs in Turkey. It may also serve as an analytical approach to designing the ToC and MEAL frameworks with validated output indicators. In addition, the proposed data collection and MCDM framework are valuable given the context-based variety of WE programs, their outcomes, and the need for indicator-ToC alignment.

Here we must note that the indicators and their corresponding ranking in this study for SIM in WE programs should be regarded as a proposed list. The content and prioritization of these indicators cannot be definitively determined without being integrated into the ToC that is specific to the evaluated social program or project (Glennerster & Takavarasha, 2013; Glennerster et al., 2018). Evaluators and program executors should design the indicators based on the goals of their projects or programs; this tool may guide them through that process. The proposed and validated MCDM methodology for indicator selection and ranking is also adaptable to SI and program performance evaluations in various contexts. We also underline that the focus of our study was women’s economic empowerment; future studies should extend beyond economic empowerment and also address empowerment indicators related to reproduction and maternal health, as previously discussed in reproduction research and feminist literature (such as Hudson et al., 2019; Thompson, 2005). Moreover, women’s empowerment measures from different frameworks (UNF, IRIW, etc.) can be benchmarked against and merged with the indicators introduced in this study to provide a more consolidated approach to WE measures. Another important future research topic is to categorize and discuss the ranked indicators within Rowland’s (1997) typology to elaborate on interventions that extend the agency dimension and resources. Other MCDM methods such as VIKOR and Data Envelopment Analysis (DEA) can also be applied to the framework to cross-validate the results. Comparison of the indicator ranks can enable evaluators to further understand the cross-country or cross-regional differences among WE priorities as an input to social intervention programs and policies towards the UN’s SDG No. 5: Gender Equality.