1 Introduction

There is no doubt that weather-related events are increasing dramatically in both frequency and severity. Flooding is one of the major weather-related disasters and has affected almost every continent in the world (Kron 2005, 2014). One strategy for managing floods is to reduce flood risk (Kobayashi and Porter 2012). Despite the huge variation in the concept of “risk” in the literature, it is generally held that risk encompasses “vulnerability,” and hence vulnerability assessment is an essential component of disaster risk management (Nasiri et al. 2016). Defining vulnerability, however, is itself a difficult task (Birkmann 2006). The scientific use of the word vulnerability has been reported to have its roots in geography and natural hazard research, but it has now become a key concept in other research areas, such as ecology, health, development, livelihoods, and climate change (Fussel 2007). The literature also indicates that vulnerability is a single word with multiple meanings (Ciurean et al. 2013) and that different research areas often use different terminology (Fussel 2007) to address similar issues (Brooks 2003). Therefore, the definition and assessment of vulnerability are considered specific to each context (Ciurean et al. 2013). One appropriate strategy for capturing a snapshot of the complicated human-nature system or society that shapes vulnerability is an indicator-based vulnerability assessment (Fekete 2010) using composite indicators or indices.

Neither composite indicators nor the steps used to construct them are exempt from criticism (Greco et al. 2018). With this uncertainty in mind, some authors have argued that vulnerability assessment is study-specific (Ciurean et al. 2013), while others have contended that empirical limitations and inconsistencies are inevitable in indicator-based vulnerability assessment (Damm 2010). In any case, composite indicators for vulnerability assessment should be open, flexible, and customizable, so that users can adjust the structural design, the compilation (or substitution) of indicators, the weighting scheme, and the aggregation approach, and so that non-technical users can easily understand them and obtain reliable results (Baptista 2014).

Significant research has been conducted in this part (i.e., the study area) of the province on flood risk management, flood damage, the role of gender in flood disasters, and, to a lesser extent, indicator-based flood vulnerability and resilience assessment (Bibi et al. 2018; Farish et al. 2017; Khan and Ali 2012; Khan and Mohmand 2011; Nasir and Tabassum 2014; Saleem 2013; Qasim et al. 2015, 2016, 2017; Shah et al. 2018). Some of the methodological and practical implications of the indicator-based flood vulnerability and resilience assessment studies still require further investigation. Methodological issues include, but are not limited to, the use of redundant or logically associated indicators, the use of a single approach to construct composite indicators, and the lack of robustness tests to determine how stable the results are in comparison to other methods. The practical issue is that these studies do not identify the comparative levels of flood vulnerability of the flood-prone communities or the drivers of vulnerability needed for policy measures, because they upscale or present their results at the district level. For instance, Qasim et al. (2016) conducted a community flood resilience study in this part of the province. Indicators were selected on the basis of a comprehensive literature review. A survey was conducted through random sampling of 280 houses in Charsadda, Peshawar, and Nowshera districts (i.e., the central part of the province). A subjective weighting approach was used to develop the composite indicators. They reported that community-based resilience across all three districts was very low. Similarly, Qasim et al. (2017) performed a flood vulnerability assessment in Charsadda, Peshawar, and Nowshera districts and adopted almost the same approach as Qasim et al. (2016). Not surprisingly, the findings were the same, and it was inferred that all districts are highly vulnerable to flooding.
In this context, Shah et al. (2018) recently made a significant contribution by assessing the flood vulnerability and resilience of community households in the area. They selected flood-affected villages in Charsadda and Nowshera districts and surveyed 600 households through simple random sampling. They also selected indicators based on extensive literature and assigned weights to indicators using a subjective method. They noted that community households in the Nowshera district are highly vulnerable and have low resilience compared to those in the Charsadda district.

A main criticism of assessing (flood) vulnerability at a large geographical scale or by administrative units, such as districts, is that it neglects heterogeneity (Balica et al. 2009; Kablan et al. 2017), since not every part of a district is vulnerable to riverine flooding. In addition, secondary data can be used effectively for district-level (i.e., broad-scale) flood vulnerability assessment, and ample data for this purpose may be available from the authorities concerned. Thus, if the purpose of flood vulnerability assessment is merely to determine the relative flood vulnerability levels of certain large-scale units (e.g., districts, counties, states, etc.), secondary data can be used (see Nazeer and Bork 2019). However, such an assessment cannot address the heterogeneity issue (i.e., identifying which place or community is relatively highly vulnerable compared to others within the same district, state, etc.), and upscaling the results would again lead policymakers and stakeholders to look for the vulnerability hotspots (or communities) within the state, district, etc. Therefore, collecting data for flood vulnerability assessment at the household or community level and then using these data for large-scale flood vulnerability assessment would not fulfill the objective of site-specific vulnerability assessment. The methodological issue of assigning weights to indicators by subjective means and aggregating them by a single approach also requires further investigation. It is argued that not only the weighting of indicators but also their standardization and aggregation can considerably modify the outcomes (Greco et al. 2018).
This implies that indicator-based flood vulnerability assessment is not just the compilation and aggregation of indicators but a "mountain to climb" that poses several challenges at once. In addition to conceptualizing the definition of vulnerability, there is a heated debate on several other issues that arise when one attempts to measure vulnerability in a way that persuades the multiple stakeholders involved in policy making. Consensus among a significant number of stakeholders is attainable when the investigator can illustrate all the upsides and downsides of the vulnerability assessment to the end users. This issue can only be addressed if the vulnerability assessment approach is communicated transparently to all stakeholders in a comprehensive, step-by-step process (see Baptista 2014). In addition, an in-depth analysis is needed to include the most relevant indicators in the final assessment, rather than a set of closely related indicators. Birkmann (2007) added that redundant or strongly correlated indicators are often used when evaluating vulnerability at the local or community level, an issue that needs further investigation. Taking these issues into account, the current study aims to (1) demonstrate flood vulnerability assessment in a comprehensive step-wise process by highlighting the key issues in the empirical assessment of flood vulnerability using composite indicators, and (2) identify the comparative levels of flood vulnerability and its drivers across the selected flood-prone villages of Khyber Pakhtunkhwa, Pakistan.

2 Material and methods

2.1 Study area

The province of Khyber Pakhtunkhwa lies in the northwest of Pakistan. The province covers an area of approximately 74,521 km² and is located at 31°15′–36°57′ North and 69°5′–74°7′ East. The research area of the current study comprises the two central districts of the Peshawar vale (Charsadda and Nowshera), also known as the catchment of the Kabul River and the Swat River. These districts have reportedly been greatly affected by almost every flood due to their proximity to the tributaries of the Kabul, Swat, and Indus Rivers (Qasim et al. 2017). The area is quite flat relative to its surroundings (Fig. 1). Due to its fertile soil, the majority of the population relies on agriculture (Aslam 2012). One of the main difficulties in selecting study sites is the use of two different government classifications. The newer unit is the "Union Council," which is based on electoral boundaries, whereas the older "Patwar Circle" is based on revenue administration and may overlap several union councils (RisePak 2006). Patwar Circles are generally used as the unit of analysis since the census is mainly based on them (Amin and Goldstein 2008). However, reports from the Provincial Disaster Management Authority (PDMA 2012, 2014, 2017) use union council data, which assist not only in the random selection of the most vulnerable flood-affected villages but also in estimating the required sample size. The selected villages (also referred to as communities or settlements in this study) are Hajizai, Sukkar, Agra, Dheri Zardad, Banda Shaikh Ismail (B.S. Ismail), Mohib Banda, and Pashtun Garhi. The first four fall within the administrative boundaries of the district of Charsadda, while the latter three fall within the district of Nowshera.

Fig. 1 Contour map of the research site

2.2 Development of flood vulnerability index

The construction of flood vulnerability indices in the current study follows the methodology of the Organization for Economic Cooperation and Development (OECD 2008), which is generally applicable and practical. Note that the “Handbook on Constructing Composite Indicators” was prepared jointly by the OECD and the Joint Research Center (JRC) of the European Commission; therefore, the JRC's publications and reports follow very similar approaches. This study followed the approach of Hagenlocher et al. (2013) and Hagenlocher et al. (2016) to the development of composite flood vulnerability indicators, which is largely based on the OECD multi-step workflow (Fig. 2).

Fig. 2 Workflow for the development of Flood Vulnerability Indices (based on OECD 2008 and modified from Hagenlocher et al. 2013; Hagenlocher et al. 2016)

2.2.1 Framing vulnerability

Framing the concept of flood vulnerability is the first step in the empirical assessment of flood vulnerability. Rosen (1991 in Saisana and Saltelli 2011) posits that in the computational sense of the term, the issues surrounding multifaceted measures may be put into perspective when one deems these measures as models. Models are inspired by the systems (natural, biological, social, etc.) that one wants to understand. Developing a conceptual model or framework for vulnerability assessment is crucial as it will help to identify the underlying terminologies, goals, procedures, and components that are required for the development of flood vulnerability indicators (Fekete 2010). Birkmann (2006) has documented over 25 methods, ideas, and concepts for systemizing vulnerability and its underlying terms (i.e., exposure, susceptibility, adaptive capacity, resilience, etc.) under different schools of thought (see Birkmann 2006). This simply indicates that there is no agreement among scientists to acknowledge uniformly the definitions or concepts of these terms. The current study adopted the UNESCO-IHE (Institute of Water Education, Netherlands, Balica and Wright 2010) and the MOVE (Methods for the Improvement of Vulnerability Assessment in Europe, Birkmann et al. 2013) frameworks for the description of flood vulnerability concepts and related terminology. UNESCO-IHE describes vulnerability as a result of exposure, susceptibility, and resilience factors, whereas the MOVE framework retains the adverse definition of vulnerability and uses the “lack of resilience” term instead of resilience. The UNESCO-IHE framework is also linked to the assessment method (to develop the flood vulnerability index). Although the MOVE vulnerability framework does not have a particular vulnerability assessment method for index building, it allows researchers to use the framework flexibly depending on the need for analysis, data accessibility, and scale. 
In this study, flood vulnerability means the community situations (in the biophysical and socio-economic components) that facilitate damage (Umweltbundesamt 2016) under certain conditions of exposure, susceptibility (Balica and Wright 2010), and lack of resilience (Birkmann et al. 2013). Exposure is “the extent to which an area that is subject to an assessment falls within the geographical range of a hazard event.” This entails the possibility that flooding will impact people and physical objects (assets, buildings, cultural heritage, and agricultural land) due to their location (Penning-Rowsell et al. 2005 in Balica and Wright 2010). Similarly, susceptibility means “the predisposition of elements at risk (social and ecological) to suffering harm resulting from the levels of fragility of settlements, disadvantageous conditions and relative weaknesses” (Birkmann et al. 2013; Kablan et al. 2017). Finally, the lack of resilience is the lack of ability to anticipate, cope with, and recover from the impact of a natural hazard. It comprises pre-event risk reduction, in-time coping, and post-event response actions (Birkmann et al. 2013).

2.2.2 Indicators explanation

The current study adopted the Villordon (2014) questionnaire as a model. A preliminary test was performed to assess its soundness and applicability. However, because it covers items that do not exist in the area, such as flood insurance, drilling, shelter houses, and building codes, the available version of the questionnaire was found to be almost inapplicable in the context of the study area. It was also found that households in the area have no notion of resilience against floods. Qasim et al. (2015) and Qasim et al. (2016) noted that certain beliefs and poverty play a role in the lack of resilience among community households in the study area, while the United Nations Human Settlements Program (UN-HABITAT 2013) argued that people's lack of awareness is also a key factor in whether resilience measures are taken. The questionnaire was therefore adapted to the local context on the basis of an extensive literature review. Indicators were selected on the basis of earlier studies conducted either on social vulnerability to natural hazards or exclusively on flood vulnerability assessment in different parts of the world. However, relevance to the local context was the top priority.

The list of selected indicators with respect to vulnerability factors is shown in Table 1. Houses near the river channel are reported to be more likely to be highly exposed (Villordon and Gourbesville 2014; Qasim et al. 2015). In this study, houses within approximately 2 km of the river channel (the distance the water reached in the last flood) are considered to be in the flood-prone area, in contrast to the 1 km reported by Qasim et al. (2015). The Australian Council for International Development (ACFID 2011) reported that these two districts were badly hit zones in the last super flood, in which almost all the villages were badly damaged by flood water (Saeed and Attaullah 2014). It is also reported that houses built above ground or street level have less flood exposure (Muller et al. 2011; Villordon and Gourbesville 2014). Household size is used as a proxy for concentrated population in this study (Balica and Wright 2010). The more members a household has, the more likely it is to be highly exposed due to hurdles in safe evacuation (Cutter et al. 2003; Holand et al. 2011; Muller et al. 2011; Qasim et al. 2015). To simplify this indicator, the threshold was set at 8 people per household; Qasim et al. (2015) found an average of 8.66 members per household. The house type also shows the degree of vulnerability to flooding. Houses made of low-quality materials have been shown to be more likely to be highly vulnerable (Cutter et al. 2003; Qasim et al. 2016). Khyber Pakhtunkhwa’s Bureau of Statistics (KPBOS 2017) identifies three main kinds of buildings in the region: Kacha (made of mud or other low-quality materials), semi-Kacha (built with kiln bricks and plastered with mud), and Pacca (kiln bricks with cement or other high-quality materials). The first two are treated as Kacha houses, which are highly vulnerable to flooding. Certain occupations are more vulnerable to floods than others (Cutter et al. 2003).
In the current study, occupation is tied to employment to avoid ambiguous questions. Households that depend solely on agriculture, livestock, and related occupations or on daily wages (general labor) are considered more vulnerable than households with permanent jobs (government or private), their own business, or a profession (not related to agriculture, livestock, or allied sectors). The more income sources a household has, the more likely it is to be resilient to flooding, as income and wealth boost the ability to recover readily (Cutter et al. 2003; Holand et al. 2011). The scientific community attaches different connotations to “multiple income sources.” Some authors use the term for diversified sources, while others use it for the number of family members who are working. The current study uses it in the latter sense. A similar assumption is applied to the household’s average monthly income, for which the threshold in the current study is greater than 20,000 Pakistani rupees. Studies have also shown that open waste disposal increases flood vulnerability by blocking drains and raising environmental pollution that can induce and spread diseases (Pelling 1997; Villordon and Gourbesville 2014). Likewise, living in or close to degraded land increases a household's vulnerability to flooding (Villordon and Gourbesville 2014), as it not only affects the built environment but also indicates disadvantaged conditions for residents who rely on agriculture or natural resources. Studies have also shown that the communication penetration rate can influence flood vulnerability (Balica and Wright 2010). The current study places community households that have access to advanced communication sources (TV/radio, mobile phone, etc.) in a resilient category compared to those that rely on loudspeakers, sirens, or other traditional approaches for flood warning and awareness, since the latter may help to save lives but make it difficult to manage property and livestock.
The presence of asphalt or paved routes (Balica and Wright 2010) can influence a household's ability to evacuate safely. A higher level of literacy increases resilience to flooding in two respects: first, it improves socio-economic status and, second, it improves the capacity to understand and acquire timely awareness (Cutter et al. 2003; Fekete 2010; Muller et al. 2011; Qasim et al. 2015). In a previous study, Qasim et al. (2015) treated ten years of formal education as the threshold for literacy; the present study adopts the same approach. The presence of flood protection measures (retention walls, gabion walls, etc.) in the vicinity (Qasim et al. 2016), inability to access healthcare facilities (Holand et al. 2011; Hagenlocher et al. 2013), and participation in any flood awareness or related program (Villordon and Gourbesville 2014; Qasim et al. 2016) are treated as indicators of the (lack of) resilience factor. It was found that the lifestyle is mainly based on the joint family, where the concept of dependents is not really applicable, as almost every household has children or aged persons. Variability in shifting to a safe place, lack of drinking water, and toilet facilities was found to be negligible. The social-environment part of the questionnaire included cooperation in hard times, social connections with neighbors and relatives, etc. These are cultural tenets in the context of the study area, and no variation was observed in the responses. Such items can be helpful in a scale-based index, which is not applicable to the current approach, so they are not included in the preliminary list of indicators.

Table 1 The preliminary set of indicators

2.2.3 Survey

The next step in flood vulnerability assessment through a composite indicator is the collection of relevant data. The sample size was determined using Slovin’s method (in Villordon 2014). The sampling frame was based on the Provincial Disaster Management Authority's report (PDMA 2014), with a total population of 146,137. With a margin of error of 5%, the required sample size was obtained and divided proportionally among the selected villages: Haji Zai (n = 56), Agra (n = 56), Sukkar (n = 67), Dheri Zardad (n = 62), B.S. Ismail (n = 51), Mohib Banda (n = 51), and Pashtun Garhi (n = 57). Note that population data from the union councils are used because no recent data exist for the selected villages. The populations of the union councils in which Haji Zai, Agra, Sukkar, Dheri Zardad, B.S. Ismail, Mohib Banda, and Pashtun Garhi are located are 20,450; 20,509; 24,366; 22,703; 18,654; 18,654; and 20,801, respectively.
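
The sample size calculation can be illustrated with a minimal Python sketch. It uses the standard form of Slovin's formula, n = N / (1 + N e²), and a simple proportional allocation; the function names are hypothetical, and the rounding conventions (rounding the total up, rounding each village share to the nearest whole household) are assumptions, since the text does not state them.

```python
import math

def slovin_sample_size(population: int, margin_of_error: float) -> int:
    """Slovin's formula n = N / (1 + N * e^2), rounded up (assumed convention)."""
    return math.ceil(population / (1 + population * margin_of_error ** 2))

def proportional_allocation(populations: dict, sample_size: int) -> dict:
    """Split the sample among units in proportion to their population.

    Each share is rounded to the nearest whole household (assumed convention).
    """
    total = sum(populations.values())
    return {name: round(sample_size * pop / total)
            for name, pop in populations.items()}

# Union council populations as reported in the text (PDMA 2014)
union_council_population = {
    "Haji Zai": 20450, "Agra": 20509, "Sukkar": 24366, "Dheri Zardad": 22703,
    "B.S. Ismail": 18654, "Mohib Banda": 18654, "Pashtun Garhi": 20801,
}

total_population = sum(union_council_population.values())
n = slovin_sample_size(total_population, 0.05)
allocation = proportional_allocation(union_council_population, n)
```

Under these assumptions the formula gives 399 households, and the rounded village shares reproduce the reported allocation, which sums to 400 because of rounding.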

2.2.4 Data treatment

In previous flood vulnerability or resilience assessment studies (e.g., Villordon 2014; Villordon and Gourbesville 2014; Qasim et al. 2017; Shah et al. 2018), a relatively easy method (primarily that of Balica and Wright 2010) was adopted using a general vulnerability equation. For instance, if a village has 50% exposure, 100% susceptibility, and 50% resilience, then that village's flood vulnerability is 100% according to the general vulnerability equation, i.e., (exposure × susceptibility)/resilience. This approach is simple and easy to use. There are, however, some issues with it. First, certain indicators are redundant at the local level, as reported by Birkmann (2007). Double counting of indicators is an important issue to consider in the construction of composite indicators. Different views exist on how to select among indicators that are highly correlated. It is generally accepted that if two or more indicators describe the same phenomenon and are highly correlated, then it is appropriate to discard some of them using a "rule of thumb." However, if the correlated indicators represent different phenomena, then the rule can be safely disregarded (see Damm 2010). Damm (2010) used a Pearson’s correlation (r) of 0.65 as the “rule of thumb” cutoff for a strong linear relationship, and the same cutoff is applied in the current study. Second, such an approach hardly facilitates the development of flood vulnerability indices using different data rescaling, weighting, and aggregation approaches, even though these choices can have a significant impact on the flood vulnerability results. Therefore, the current study uses the Mengesha (2014) methodology to address this issue with different approaches for data transformation.
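
The correlation screening described above can be sketched in Python as follows. The indicator names and values are hypothetical, and applying the 0.65 cutoff to the absolute value of r with an inclusive comparison is an assumption. The sketch only flags candidate pairs; whether a flagged indicator is discarded still depends on whether the pair describes the same phenomenon.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def strongly_correlated_pairs(indicators, cutoff=0.65):
    """Return (name_a, name_b, r) for every pair with |r| >= cutoff."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(indicators.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= cutoff:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical village-level values for three indicators (seven villages)
example = {
    "kacha_houses": [0.9, 0.7, 0.8, 0.4, 0.3, 0.5, 0.6],
    "low_income":   [0.8, 0.6, 0.9, 0.5, 0.2, 0.4, 0.7],
    "paved_road":   [0.5, 0.1, 0.3, 0.6, 0.2, 0.4, 0.7],
}

flagged = strongly_correlated_pairs(example)
```

In this made-up example only the pair of house type and income exceeds the cutoff, so only that pair would be examined for possible double counting.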

2.2.5 Indicators rescaling

The general minimum–maximum method is used for rescaling the data (Iyenger and Sudarshan 1982; Hudrliková 2013; Chakraborty and Joshi 2014; Kissi et al. 2015; Kablan et al. 2017). It brings the indicator data into the range from zero to one, where zero represents the minimum and one the maximum value of an indicator. The indicators that are directly related to vulnerability were transformed through Eq. 1;

$$ X_{i} = \frac{{X_{a} - X_{{{\text{Min}}}} }}{{X_{{{\text{Max}}}} - X_{{{\text{Min}}}} }} $$
(1)

The indicators that have an inverse relationship with vulnerability were rescaled through Eq. 2;

$$ X_{i} = \frac{{X_{{{\text{Max}}}} - X_{a} }}{{X_{{{\text{Max}}}} - X_{{{\text{Min}}}} }} $$
(2)

where \(X_{i}\) is the normalized value, \(X_{a}\) is the actual value, \(X_{\text{Max}}\) is the maximum value, and \(X_{\text{Min}}\) is the minimum value of an indicator i (1, 2, 3, …, n) across the selected villages.
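
A minimal Python sketch of Eqs. 1 and 2 is given here for illustration. The function name and the example values are hypothetical; the guard against an indicator with no variation is an added assumption, since the equations are undefined when the maximum equals the minimum.

```python
def min_max_rescale(values, inverse=False):
    """Rescale one indicator across villages to the range [0, 1].

    inverse=False follows Eq. 1 (indicator increases vulnerability);
    inverse=True follows Eq. 2 (indicator decreases vulnerability).
    """
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("indicator has no variation across villages")
    if inverse:
        return [(highest - v) / (highest - lowest) for v in values]
    return [(v - lowest) / (highest - lowest) for v in values]

# Hypothetical percentages for three villages
kacha_share = min_max_rescale([20, 50, 80])                   # Eq. 1
literacy_share = min_max_rescale([30, 60, 90], inverse=True)  # Eq. 2
```

With both equations, a value of one always marks the most vulnerable village for that indicator, which keeps the direction of all indicators consistent before aggregation.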

2.2.6 Weighting scheme

No weights are allocated to the indicators in the base construction of the composite indicators, in order to avoid the subjectivity issue in the first instance. Note that this amounts to equal weighting: indicators have equal weights within their sub-indices, and sub-indices have equal weights within the final flood vulnerability index.

2.2.7 Aggregation schemes

The additive arithmetic function in terms of non-weighted averages is used for aggregation of the indicators into their respective sub-indices (factors) using Eq. 3 (Booysen 2002 and Tate 2012 in Talukder et al. 2017);

$$ {\text{SI}} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} X_{i} }}{n} $$
(3)

where SI stands for the sub-indices of the exposure (SI_E), susceptibility (SI_S), and lack of resilience (SI_LoR) factors, and n is the number of indicators in each factor. The overall flood vulnerability composite indicator (FVI) is calculated through Eq. 4 (Lee and Choi 2018);

$$ {\text{FVI}} = \frac{1}{3}\left( {{\text{SI\_E}} + {\text{SI\_S}} + {\text{SI\_LoR}}} \right) $$
(4)

This is the simplest approach and is frequently used in the scientific community for composite indices: the normalized indicators are simply averaged through an additive function. It is named “MMNA” in this study. Here, “MM” means that the indicators are normalized through the “min–max” method, “N” means that “no” weights are assigned to indicators, and “A” implies that the aggregation is based on the “additive” function. This is the base model; it is assumed that its construction, interpretation, and comprehension are simple enough to convey its message easily to all end users.
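
As an illustration of the base model, the following Python sketch applies the unweighted average to hypothetical min–max normalized indicators for a single village and then averages the three sub-indices (exposure, susceptibility, and lack of resilience). The function names and all numbers are assumptions made for the example.

```python
from statistics import mean

def sub_index(normalized_indicators):
    """Unweighted arithmetic mean of the normalized indicators of one factor."""
    return mean(normalized_indicators)

def fvi_additive(si_e, si_s, si_lor):
    """Unweighted mean of the exposure, susceptibility, and lack-of-resilience sub-indices."""
    return (si_e + si_s + si_lor) / 3

# Hypothetical normalized indicator values for one village
si_e = sub_index([0.2, 0.4])         # exposure
si_s = sub_index([0.5, 0.7, 0.9])    # susceptibility
si_lor = sub_index([1.0, 0.0])       # lack of resilience

fvi = fvi_additive(si_e, si_s, si_lor)
```

Because each factor is averaged first, a factor with many indicators does not automatically dominate the final index.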

2.2.8 Robustness check

The OECD (2008) approach (Eq. 5), as used by Hudrliková (2013) and Nazeer and Bork (2019), is adopted in the current study, where the average absolute shift in ranking (\(\overline{R}_{S}\)) from the median ranking defines the stability and reliability of the findings. A value closer to zero implies a ranking more similar to the median ranking. Median ranking has been reported to be the most accurate ranking compared to other approaches that are largely influenced by data issues, such as highly correlated indicators and the presence of extreme values (Hudrliková 2013).

$$ \overline{R}_{S} = \frac{1}{M}\mathop \sum \limits_{Y = 1}^{M} \left| {Y_{{{\text{MR}}}} - Y_{{{\text{DM}}}} } \right| $$
(5)

where MR stands for the median rank and DM for the rank derived through a given method for village Y across the selected villages (Y = 1, 2, 3, …, M). Spearman's correlation has also been used for this purpose in earlier studies (see Talukder et al. 2017). A higher correlation coefficient between the MR and a given methodological approach indicates a more similar and stable ranking. In the current study, robustness tests are conducted only to understand the impact of different data rescaling, weighting, and aggregation choices (Nardo et al. 2005 in Talukder et al. 2017) on the overall results. The alternative data rescaling, weighting, and aggregation steps are described in the following subsections.
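
The average absolute rank shift of Eq. 5 can be sketched in Python as follows. The method labels and rankings are hypothetical, and using the per-village median of the ranks directly as the median rank (without re-ranking the medians) is an assumption.

```python
from statistics import median

def median_ranks(rankings):
    """Median rank of each village across all methods.

    `rankings` maps a method name to a list of ranks, one per village,
    with villages in the same order for every method.
    """
    return [median(ranks) for ranks in zip(*rankings.values())]

def average_rank_shift(method_ranks, reference_ranks):
    """Eq. 5: mean absolute difference between a method's ranks and the median ranks."""
    m = len(reference_ranks)
    return sum(abs(ref - r) for ref, r in zip(reference_ranks, method_ranks)) / m

# Hypothetical ranks of four villages under three construction methods
rankings = {
    "method_A": [1, 2, 3, 4],
    "method_B": [2, 1, 3, 4],
    "method_C": [1, 2, 4, 3],
}

reference = median_ranks(rankings)
shifts = {name: average_rank_shift(ranks, reference)
          for name, ranks in rankings.items()}
```

In this made-up case, method_A coincides with the median ranking and has a shift of zero, while the other two methods each swap one pair of villages and have a shift of 0.5.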

2.2.8.1 Alternative data transformation

The second technique used in the current study is the z-score. This type of data rescaling is generally used because it converts all indicators to a common scale with a mean of zero and a standard deviation of one, which avoids aggregation distortion. However, attention is needed in the case of exceptional behavior of certain indicators (OECD 2008). Furthermore, the data range of the standardized indicators does not remain the same, because the new set contains both positive and negative values (Damm 2010). In this method, the mean is subtracted from the actual value and the result is divided by the standard deviation of the indicator across the selected villages, as given in Eq. 6;

$$ X_{i} = \frac{{X_{a} - \overline{X}}}{\sigma } $$
(6)

where \({\overline{X}}\) stands for the mean values and σ for the standard deviation.
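
A minimal Python sketch of Eq. 6 follows. The function name and the data are hypothetical, and the use of the population standard deviation is an assumption, because the text does not say whether the population or the sample form is used.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize one indicator: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed)
    if sigma == 0:
        raise ValueError("indicator has no variation across units")
    return [(v - mu) / sigma for v in values]

# Hypothetical raw values of one indicator across eight units
standardized = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
```

The standardized values have a mean of zero and a standard deviation of one, and, unlike min–max values, they include negative numbers.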

2.2.8.2 Alternative weights

The current study uses two data-driven techniques to derive the weights of indicators. The main advantage of a data-driven approach is that it addresses the issues of subjectivity and equal weighting (Damm 2010). The first method is principal component analysis (PCA). Varimax rotation is used, and components with an eigenvalue greater than 1 are retained (Kaiser criterion) (Roder et al. 2017). The weights are calculated using Eq. 7 (Nicoletti et al. 2000 in OECD 2008; Damm 2010);

$$ W_{i} = \frac{{{\text{Square}}\,{\text{of}}\,{\text{factors}}\,{\text{loadings}}}}{{{\text{Total}}\,{\text{variance}}\,{\text{of}}\,{\text{the}}\,{\text{rotated}}\,{\text{square}}\,{\text{loadings}}}} $$
(7)

The second approach used in the current study is known as Iyenger and Sudarshan’s method (IS). This approach has previously been used by different authors for different types of vulnerability assessment. In this approach, the weights are assumed to vary inversely with the variance across the regions in the respective indicators of vulnerability (Bhattacharjee and Wang 2011; Hiremath and Shiyani 2013; Kissi et al. 2015; Kablan et al. 2017). It is also reported that calculating weights through this approach “would ensure that large variation in any one of the indicators would not unduly dominate the contribution of the rest of the indicators and distort inter-regional comparisons” (Bhattacharjee and Wang 2011; Hiremath and Shiyani 2013; Kissi et al. 2015). The weights for each indicator i across the selected villages are calculated through Eq. 8;

$$ W_{i} = \frac{K}{\sqrt{{\text{Var}}\,X_{i}}} $$
(8)

where \(\sum_{i = 1}^{n} W_{i} = 1\) and \(0 \le W_{i} \le 1\), and K is the normalizing constant that is calculated using Eq. 9;

$$ K = \left[ \sum\limits_{i = 1}^{n} \frac{1}{\sqrt{{\text{Var}}\,X_{i}}} \right]^{ - 1} $$
(9)
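
Eqs. 8 and 9 can be sketched in Python as follows. The function name and the indicator values are hypothetical. The population variance is used as an assumption; when every indicator is observed for the same number of units, the choice between population and sample variance does not change the weights, because the common factor cancels in the normalization.

```python
from math import sqrt
from statistics import pvariance

def is_weights(indicators):
    """Weights inversely proportional to each indicator's standard deviation.

    `indicators` maps an indicator name to its normalized values across units.
    Implements W_i = K / sqrt(Var X_i) with K = 1 / sum(1 / sqrt(Var X_i)).
    """
    inverse_sd = {name: 1 / sqrt(pvariance(values))
                  for name, values in indicators.items()}
    k = 1 / sum(inverse_sd.values())  # normalizing constant, Eq. 9
    return {name: k * value for name, value in inverse_sd.items()}

# Hypothetical normalized values of two indicators across three units
weights = is_weights({
    "indicator_a": [0.0, 0.5, 1.0],
    "indicator_b": [0.0, 1.0, 1.0],
})
```

In this example the less variable indicator_a receives the larger weight, and the weights sum to one by construction.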

2.2.8.3 Alternative aggregation

The weighted sub-indices are calculated through Eq. 10 by applying the weights derived for each indicator using Eqs. 7 and 8 (OECD 2008; Chakraborty and Joshi 2014; Kablan et al. 2017; Lee and Choi 2018);

$$ SI = \mathop \sum \limits_{i = 1}^{n} W_{i} X_{i} $$
(10)

The weighted sub-indices are aggregated as an overall flood vulnerability index using Eq. 4.

Additive (linear) aggregation is fully compensatory: a low value in one indicator can be offset by sufficiently high values in other indicators, which is sometimes undesirable. Multiplicative aggregation (geometric mean), by contrast, is only partially compensatory (Hudrliková 2013). However, because the normalization method produces zero values and the geometric function is strictly applicable to positive data, one was added to all indicators (Water and Waste Digest 2001). The factor-wise (sub-index) aggregation was done through Eq. 11 (Nardo et al. 2005 in Talukder et al. 2017);

$$ SI = \mathop \prod \limits_{i = 1}^{n} X_{i}^{\frac{1}{n}} $$
(11)

The overall flood vulnerability indices are calculated through Eq. 12 (Lee and Choi 2018);

$$ FVI = \left( {SI\_E \times SI\_S \times SI\_LoR} \right)^{1/3} $$
(12)
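
A minimal Python sketch of Eqs. 11 and 12 is given below, assuming Min–Max normalized inputs in [0, 1] that are shifted by one before taking the geometric mean, as described above. The function names and the example values are hypothetical.

```python
from math import prod


def geometric_sub_index(values, shift=1.0):
    """Eq. 11: geometric mean of the indicators of one sub-index.

    `shift` is added to every indicator first so that zeros produced by
    Min-Max normalization do not force the whole product to zero.
    """
    shifted = [v + shift for v in values]
    return prod(shifted) ** (1.0 / len(shifted))


def geometric_fvi(si_e, si_s, si_lor):
    """Eq. 12: cube root of the product of the three sub-indices."""
    return (si_e * si_s * si_lor) ** (1.0 / 3.0)


# Two hypothetical villages with the same arithmetic mean (0.5)
unbalanced = geometric_sub_index([0.0, 1.0])   # one very low, one very high
balanced = geometric_sub_index([0.5, 0.5])     # both moderate
```

Although both hypothetical villages have the same arithmetic mean, the balanced profile obtains the higher geometric score, which illustrates the partial compensation discussed above.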

Based on the above techniques, four other procedures are used to construct composite indicators (Table 2) for flood vulnerability assessment:

  • Data normalized through Min–Max approach with weights calculated through Iyenger and Sudarshan’s method and aggregated through additive function (MMISA).

  • Data normalized through the Min–Max approach with PCA extracted weights and aggregated through additive function (MMPCA).

  • Data rescaled through the Z-score approach, with no weights to indicators and aggregated through additive function (ZSNA), and

  • Data normalized through the Min–Max approach, with no weights to indicators and aggregated through geometric function (MMNG).
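
The two rescaling options named in these procedures can be sketched as follows. This is a sketch using the standard textbook definitions of Min–Max and Z-score rescaling, with the population standard deviation assumed for the Z-score; the raw values are hypothetical.

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale values to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    return [(v - lowest) / (highest - lowest) for v in values]


def z_score(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


raw = [2.0, 4.0, 6.0]          # hypothetical raw indicator values
mm_scaled = min_max(raw)       # [0.0, 0.5, 1.0]
z_scaled = z_score(raw)        # symmetric around zero
```

Z-scores contain negative values, which is one reason a geometric aggregation is paired here with Min–Max normalized (and shifted) data rather than with Z-scores.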

Table 2 Construction of flood vulnerability indices through different techniques

2.2.9 Results analysis

Several tools are used in the current study. QGIS version 2.18 (QGIS Development Team 2018) is used for creating maps. The DIVA portal (diva-gis.org) and the Japan Aerospace Exploration Agency’s portal (Japan Aerospace Exploration Agency 1997) are used for spatial data. For statistical analysis, JASP version 9.2 (JASP Team 2018), the Jamovi project (2019), and PAST (Hammer et al. 2001) are used. In the correlation plot, a narrower ellipse with a straighter slope indicates a stronger linear relationship. Column graphs are developed that rank districts and communities from low (left) to high (right) vulnerability. The OECD (2008), Heltberg and Bonch-Osmolovskiy (2010), and Hagenlocher et al. (2016) approaches used stacked charts for index disaggregation; however, these become complicated for a large number of comparative units, because each column in a stacked graph effectively serves as an independent chart. Confusion arises when an indicator has a quite low value across comparative units (villages) but a large contribution within an individual column. The current study therefore uses a colored matrix instead of stacked charts. These colored plots give end users an "instant idea" of the relative proportions of the respective indicators in the respective sub-indices, and of the sub-indices in the final flood vulnerability index. The indicators or sub-indices are plotted on the X-axis, while the villages are plotted on the Y-axis, ordered from high (bottom) to low (top) vulnerability.

3 Results

The data were analyzed using the Pearson correlation matrix to identify highly correlated indicators and the strength and direction of their linear relationships (Fig. 3). HNESL was found to be highly correlated with LASR, which is logical as both indicators belong to the built environment; only LASR is retained, while HNESL is discarded. HHS was found to be highly correlated with OWD, which makes sense: the more household members a community has, the higher the production of waste and the practice of open waste disposal are likely to be. So, only HHS is retained. KH, SOI, MMHI, WCS, LEH, and LPFT were found to be highly correlated, suggesting predominantly socio-economically disadvantaged circumstances; only LEH is retained, as the household head's lack of education may influence all of these related features. Although LAMF was found to be highly correlated with KH and SOI, it showed no strong correlation with MMHI, LEH, and VO, and most of these indicators are already discarded from the final list; to avoid the loss of this important information, LAMF is included in the final list of indicators. Finally, eight indicators are retained for the development of the flood vulnerability index (Table 3).
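
The screening step described above can be sketched in Python as follows. The indicator names HHS, OWD, and VO come from this study, but the data values are hypothetical, and the cut-off of 0.8 for "highly correlated" is an assumption made only for this sketch; the choice of which indicator of a flagged pair to retain remains a judgment based on the reasoning given in the text.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def highly_correlated_pairs(data, threshold=0.8):
    """Return indicator pairs whose |r| reaches the (assumed) threshold."""
    flagged = []
    for first, second in combinations(sorted(data), 2):
        r = pearson(data[first], data[second])
        if abs(r) >= threshold:
            flagged.append((first, second, r))
    return flagged


# Hypothetical indicator values for four communities
data = {
    "HHS": [5.0, 6.0, 7.0, 8.0],
    "OWD": [0.20, 0.30, 0.35, 0.50],
    "VO": [0.9, 0.1, 0.6, 0.3],
}
pairs = highly_correlated_pairs(data)
```

In this toy data only the HHS and OWD pair is flagged, so one of the two would be a candidate for removal.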

Fig. 3 Correlation among the preliminary set of indicators

Table 3 Descriptive statistics of the retained indicators

3.1 Flood vulnerability index

The values of the flood vulnerability index, as a relative measure of the flood vulnerability of the selected villages, are shown in Fig. 4. Agra is the most vulnerable village to floods, followed by Dheri Zardad, B.S. Ismail, Sukkar, Haji Zai, Mohib Banda, and Pashtun Garhi. These results imply that households in the communities of district Charsadda have comparatively higher flood vulnerability, with the exception of B.S. Ismail in district Nowshera. The flood vulnerability index is the aggregate of three sub-indices, namely exposure, susceptibility, and lack of resilience, as shown in Fig. 5. Pashtun Garhi and Haji Zai have comparatively low flood exposure compared to the other selected communities. Agra and Dheri Zardad showed comparatively high flood susceptibility. A relatively high lack of resilience was observed for Sukkar, Haji Zai, and Dheri Zardad.

Fig. 4 Flood vulnerability index for the selected villages of Peshawar Vale

Fig. 5 Contributors in flood vulnerability index

3.2 Sub-index exposure

Figure 6 illustrates the comparative measure of flood exposure through the sub-index exposure values for the selected villages in Peshawar Vale. B.S. Ismail has comparatively high flood exposure, followed by Mohib Banda, Agra, Dheri Zardad, Sukkar, Haji Zai, and Pashtun Garhi. These results imply that flood exposure is considerably higher at the outlet of the River Swat into the River Kabul, where the three communities of B.S. Ismail, Mohib Banda, and Agra are located, whereas the communities located upstream and downstream showed comparatively less flood exposure. Sub-index exposure has two main contributors, HFPA and HHS, as shown in Fig. 7. Pashtun Garhi has a relatively low HFPA and HHS, while B.S. Ismail has a comparatively high HHS and HFPA. HFPA also makes a high contribution in Mohib Banda, Agra, and Sukkar, whereas the contribution of HHS to flood exposure is relatively high for Dheri Zardad.

Fig. 6 Sub-index exposure for the selected villages of Peshawar Vale

Fig. 7 Contribution in the sub-index exposure by individual indicators

3.3 Sub-index susceptibility

The values of sub-index susceptibility, as a relative measure of flood susceptibility across the selected villages of Peshawar Vale, are shown in Fig. 8. The results indicate that Agra is the most susceptible village to floods, followed by Dheri Zardad, Haji Zai, Pashtun Garhi, Mohib Banda, B.S. Ismail, and Sukkar. Sub-index susceptibility has two main indicators, VO and LINDL, as shown in Fig. 9. Both indicators make a markedly high contribution in the case of Agra and Dheri Zardad. In Haji Zai, VO shows a high contribution compared to the very low contribution of LINDL. In the case of B.S. Ismail, VO contributes more to its susceptibility to flooding. LINDL was also found to be a high contributor in the case of Mohib Banda.

Fig. 8 Sub-index susceptibility for the selected villages of Peshawar Vale

Fig. 9 Contribution in the sub-index susceptibility by individual indicators

3.4 Sub-index lack of resilience

The values of sub-index lack of resilience, as a relative measure of the lack of resilience to floods across the selected villages of Peshawar Vale, are shown in Fig. 10. Sukkar ranked first, followed by Haji Zai, Dheri Zardad, Agra, B.S. Ismail, Pashtun Garhi, and Mohib Banda. These results suggest that the lack of resilience to floods is somewhat lower in the households of the communities of Nowshera than in those of Charsadda district. The key contributors to the sub-index lack of resilience in the selected villages are shown in Fig. 11. Almost all indicators make a significant contribution to the high lack of resilience in Sukkar, and nearly similar conditions are observed in Haji Zai and Dheri Zardad. In the case of Agra, LEH and LAMF are comparatively higher contributors, while LASR and LAMF are comparatively high contributors in B.S. Ismail. Flood management measures in terms of infrastructure appear quite satisfactory, which indicates that authorities are primarily focusing on infrastructural measures.

Fig. 10 Sub-index lack of resilience for the selected villages of Peshawar Vale

Fig. 11 Contribution in the sub-index lack of resilience by individual indicators

3.5 Robustness check

The robustness tests were carried out by comparing the ranks derived by the different methods with the median ranks, using the average shift in ranking approach (Table 4). The weights derived from the PCA and IS methods are given in Tables 5 and 6, respectively. A very nominal shift in rank values with respect to the median ranking was observed for all methods (0.29), except for non-weighted geometric aggregation (0.00). These findings are also supported by the correlation coefficients (Table 7). To show the variation in the shift in ranking, all rankings derived through the different approaches (represented by lines) were plotted against the median ranking (represented by triangle marks) in ascending order (Fig. 12). The results indicate that a shift of at most one position in ranking occurs among the first four villages, which implies that these four villages can exchange their overall ranks under different methods. These results are not unexpected, as data at the local scale are quite uniform and these approaches are highly influenced by the data structure and the strength of linear relationships. The base model did not show a higher average shift in ranking than the other methods, although multiplicative aggregation showed a smaller one. The fact that the base model does not show a larger average shift in ranking than the other methods implies that its results are not biased.
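
The average shift in ranking can be sketched as the mean absolute difference between a method's ranks and the reference (median) ranks. The sketch below rests on assumptions: the village labels V1 to V4, the method labels, and the ranks are hypothetical, the median is taken unit by unit across methods, and the Spearman coefficient uses the shortcut formula that is valid only when there are no tied ranks.

```python
from statistics import median


def median_ranks(rankings):
    """Median rank of each unit across all methods (unit-wise median assumed)."""
    units = next(iter(rankings.values())).keys()
    return {u: median(r[u] for r in rankings.values()) for u in units}


def average_shift(ranks, reference):
    """Mean absolute difference between a ranking and the reference ranking."""
    return sum(abs(ranks[u] - reference[u]) for u in reference) / len(reference)


def spearman_no_ties(ranks, reference):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); assumes no tied ranks."""
    n = len(reference)
    d_squared = sum((ranks[u] - reference[u]) ** 2 for u in reference)
    return 1.0 - 6.0 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks of four villages under three construction methods
rankings = {
    "base": {"V1": 1, "V2": 2, "V3": 3, "V4": 4},
    "alt1": {"V1": 2, "V2": 1, "V3": 3, "V4": 4},
    "alt2": {"V1": 1, "V2": 2, "V3": 3, "V4": 4},
}
reference = median_ranks(rankings)
shifts = {name: average_shift(r, reference) for name, r in rankings.items()}
rho_alt1 = spearman_no_ties(rankings["alt1"], reference)
```

Here only the hypothetical method "alt1" swaps two villages, giving an average shift of 0.5 and a Spearman coefficient of 0.8 against the median ranking, while the other two methods show no shift.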

Table 4 Average shift in ranking
Table 5 Indicators weights using PCA approach
Table 6 Indicators weights using IS approach
Table 7 Spearman correlation between median ranking and other approaches
Fig. 12 Shift in ranking with respect to median ranking

3.6 Upscaling the local flood vulnerability assessment

The results of sub-index exposure, sub-index susceptibility, and sub-index lack of resilience, and their aggregate as a flood vulnerability index by additive and multiplicative aggregation, are shown in Figs. 13 and 14, respectively. The two cases do not differ, which suggests that the outcomes are not biased. Indeed, District Nowshera has higher flood exposure than District Charsadda but lower susceptibility, lack of resilience, and overall flood vulnerability.

Fig. 13 Flood vulnerability indices using additive aggregation

Fig. 14 Flood vulnerability indices using geometric aggregation

4 Discussion

Developing a composite indicator for flood vulnerability poses several challenges. The MOVE vulnerability framework gives no indication of an assessment method, while the UNESCO-IHE approach uses a fixed analytical procedure for the quantification of indices. Such an approach is useful in the sense that it provides a readily available structure for the development of composite indicators. However, its inflexible nature makes it difficult to test under other settings, since it is often incompatible with the various forms of data transformation, weighting, and aggregation. This may be why the authors of the UNESCO-IHE flood vulnerability approach listed several limitations of their assessment approach, such as that uncertainty cannot be removed, that the flood vulnerability index is not a “one size fits all scenario,” and that the indices can be distorted by an uneven number of indicators (see Balica et al. 2012). This approach is frequently replicated by various authors in the study area, who integrate indicators into the general vulnerability equation to develop indices without examining the accuracy and reliability of the chosen approach in the local context, even though vulnerability is considered context-specific. These studies provide a good background for the assessment of flood vulnerability in the study area. However, in addition to following a ready-made approach and not identifying the exact hotspots of flood vulnerability (i.e., the comparative level of flood vulnerability of the flood-prone communities), these studies leave unresolved a number of issues relevant to the empirical analysis of flood vulnerability. For instance, earlier studies in the study area have used indicators that are theoretically closely related, such as house structure and household income or employment. If a household has multiple sources of income, it is also likely to have decent housing, access to resources, etc. Birkmann (2007) has rightly pointed out that certain indicators are redundant at the community or local level and do not add anything significant. To address such issues in the development of composite indicators, a methodology should be adopted that is open to end users and scientifically justifiable to critics. Only through such practices can the findings of flood vulnerability composite indicators be made reliable.

Generally, the construction of a composite indicator is a multi-step process. Unfortunately, there is no single universally accepted method for constructing composite indicators (Mazziotta and Pareto 2013), and each step has several options, which complicates the process; Greco et al. (2018) describe this situation with the phrase “between the devil and the deep blue sea.” Taking these aspects into account, the OECD (2008) approach to the development of composite indicators seems to be the only way to address these challenges to some extent. In contrast to developing a composite indicator in a scattered form, where readers (especially non-technical readers or newcomers) are unable to follow the methodological process, it helps to develop a composite indicator through a step-wise and comprehensive approach. It also responds to the criticism that composite indicators are not transparent and reliable when subjectivity and technical artefacts are not open to common readers. It sets out a mechanism for interpreting the findings in an easy-to-use manner. It also facilitates a fairly simple approach to assessing methodological bias in terms of the average shift in rank from the reference ranking for different data rescaling, weighting, and aggregation schemes. We have illustrated these procedures in great detail in order to obtain the flood vulnerability results for the flood-prone villages of the Peshawar Vale.

The findings of the current study are broadly consistent with earlier studies, for example regarding the high vulnerability of Agra, although no earlier study reports that the village is highly vulnerable relative to others. There are, however, a number of studies on the Agra Union Council. For instance, Bibi et al. (2018), Akhter et al. (2017), Malik (2012), Khan and Ali (2012), and gray literature from the Provincial Disaster Management Authority and various development organizations suggest that, due to its geographical position, the Agra Union Council is at high risk of flooding. However, there are some differences from the findings of earlier research. For instance, Shah et al. (2018) reported that households in the communities of District Nowshera are more vulnerable to flooding than those of District Charsadda. This is not consistent with the findings of the current study or with Qasim et al. (2017), where the value of the composite flood vulnerability indicators for Nowshera was lower than that for Charsadda. One possible reason for such variability in results is the use of different datasets, since the values of vulnerability indices are largely influenced by the number of indicators and their values. In addition, Charsadda has been stated to be marginally more economically resilient than Nowshera due to its female labour force. This is not the case in this study or in the Qasim et al. (2015) study, which indicated that women are not even permitted to participate in surveys due to cultural and religious limitations, a point repeated by a number of scholars, including Shah et al. (2018). They may have asked the participants about their female workers, or they may have meant women working in their homes for economic purposes. However, vulnerable occupations, including farmers and labourers, were found to be significantly more common in Charsadda than in Nowshera in the current study. The finding of higher exposure of community households in Nowshera than in Charsadda is consistent with Shah et al. (2018).

Differences and similarities among the results of these studies may not be surprising given the different datasets. However, one thing that distinguishes this study from previous studies is its logical and methodological soundness. Almost all stakeholders agree that the economic situation and the lack of resources, health services, education, etc. are key aspects of high flood vulnerability, not only in the study area but in the (developing) world as a whole. Giving a general discourse on increasing education, providing health care services, reducing unemployment, and so on would add nothing new. And, to be practical, what is the aim of a vulnerability assessment if it ignores the purpose of the assessment (i.e., to identify the exact hotspots or comparative levels and key drivers) and the characteristics of vulnerability (that it differs across locations and communities)? The novel concept is not only to identify the comparative levels of hotspots and main drivers for further analysis or action, but also to ensure the implementation of a methodologically unbiased approach. Most flood vulnerability assessment studies are silent on robustness tests; as stated by Nasiri et al. (2016), the analysis of uncertainty is one of the key weaknesses in assessing flood vulnerability. Evaluating uncertainty and sensitivity, as mentioned earlier, is not optional but crucial to ensuring the transparency of vulnerability assessment indices (Baptista 2014).

Investigators should consider other indicators rather than replicating the same indicators in each vulnerability study. Villordon (2014) integrated very sound and robust indicators to assess flood vulnerability, such as living in or near degraded land and open waste disposal. In the current study, the majority of participants responded that they reside in either saline or water-logged areas, which is consistent with the statement of Amir (2013) that seepage from rivers in the low-lying Peshawar Vale is going to worry the residents of the area. Integrating these elements into the flood vulnerability assessment can expand the scope of the research and draw the attention of the competent authority to these elements in rural areas of the province. Some random snapshots of the aspects mentioned are provided in Fig. 15. The study has some limitations. As in the study of Kablan et al. (2017), artificial boundaries were created for the selected villages, as they have no definite boundaries. Furthermore, assessing heterogeneous vulnerability within a community at the household level was not possible due to time and resource constraints. This means that the sample does not provide a "full snapshot" of the heterogeneous vulnerability within a community, which would require surveying all households in the community. However, the study offers a guide that can be applied to all households in a community by the authorities concerned. Only relative levels of vulnerability are assessed here. This means that a village with a higher flood vulnerability index is not necessarily highly vulnerable to flooding in general, but is more vulnerable than the other selected villages on the basis of the available dataset.

Fig. 15 Site-specific indicators: a Houses in flood-prone area, b Degraded land, c Flood management measures, and d Kacha houses at street level

5 Conclusion

There are always some concerns about the reliability of composite indicators when the construction process is not open to all stakeholders. A comprehensive approach has been demonstrated in this study to construct a flood vulnerability index for the variety of stakeholders engaged in flood risk reduction, as opposed to earlier flood vulnerability assessment studies in which ready-made approaches were replicated. Several issues that need proper attention in the development of flood vulnerability indices are highlighted in order to achieve reliable results. The assessment of flood vulnerability undertaken in this study is mainly interdisciplinary, and no advanced technique is used except for differential weighting schemes, which may be substituted with other methods. Transparency is assured by its inclusive construction, and the findings are portrayed in an easy-to-understand manner. It is generally believed that an overly sophisticated approach can hinder the understanding of non-technical users. The study can help a wide range of stakeholders and decision-makers not only to develop composite indicators for flood vulnerability but also to scientifically justify them as a management tool for flood risk reduction. The approach is applied to selected flood-prone villages of Peshawar Vale, in Pakistan's Khyber Pakhtunkhwa province. The village of Agra was found to be comparatively the most vulnerable to flooding, followed by Dheri Zardad, B.S. Ismail, Sukkar, Haji Zai, Mohib Banda, and Pashtun Garhi. However, Dheri Zardad can take over the first rank under various methodological assumptions, which needs further investigation. Up-scaling the results to the district level indicated that households in the communities of Charsadda district were relatively more vulnerable to flooding than those of Nowshera district on the basis of the current dataset. Here, Districts Charsadda and Nowshera refer only to the selected communities of these districts, as only some parts of these districts are vulnerable to flooding relative to their overall geographical areas. The study encourages newcomers from human geography or related fields to build composite indicators for floods, droughts, or related hazards in the fairly simple and methodologically defensible way demonstrated in great detail in the current study.