1 Introduction

The trans-Himalayan and transboundary Indus River flows approximately 3200 km across its 1.12 million square kilometer basin to ultimately descend into the Arabian Sea (FAO 2011). The upper Indus basin (UIB) marked by the confluence of the Hindukush, Karakoram, and Himalayans mountain ranges (HKH) contains the largest cryosphere volumes outside the Poles (Soncini et al. 2015; Qiu 2008). The HKH topography modulates two contrasting synoptic-scale processes: the South Asian Summer Monsoon and Western Disturbances, to determine highly variable patterns of observed precipitation over the UIB (Filippi et al. 2014; Palazzi et al. 2013). Snow and glacial melt within the UIB mainly govern the hydrological regime of the Indus River (e.g., Tahir et al. 2011; Archer and Fowler 2004), which sustains the livelihood of nearly 215 million downstream inhabitants (Latif et al. 2018; Lutz et al. 2016). In contrast, the lower Indus (LI) primarily has an arid to semi-arid climate and represents large fertile plains, spate irrigation regions, and a diverse coastal ecology.

An accurate knowledge of the spatial and temporal distribution of precipitation is largely unknown in the high altitudes (HA) of the UIB (Immerzeel et al. 2015; Hewitt 2005; Winiger et al. 2005). This lack of certainty in precipitation characteristics stems from an insufficient number of HA meteorological observatories, inaccuracies in available HA measurements (Rasmussen et al. 2012), concerns about the data quality (Hewitt 2011; Winiger et al. 2005), and serious constraints on transboundary data sharing due to regional politics (Dahri et al. 2016). It should be noted that most of the observatories with long-term records are sparsely located in the low altitudes, while short-term and inconsistent observations of the HA limit our understanding about the regional orography.

Given an elevated susceptibility to climate change and the associated high social-ecological vulnerability (MRI 2015; Nepal and Shrestha 2015), a reliable knowledge towards the spatiotemporal organization of regional precipitation is imperative to implement efficient adaptations at the basin level.

However, the observational constraints hamper a reliable modeling of the processes that govern precipitation distribution within the basin. While different gridded products (e.g., reanalysis, remote sensing or interpolated station observations) can potentially help to overcome these observational deficiencies (Immerzeel et al. 2015), the scope of such improvements is somewhat limited over the complex and largely snow-covered HA of the UIB (Immerzeel et al. 2015; Huffman et al. 2007; Turner and Annamalai 2012).

Given the unique observational challenges, different strategies have been adapted to improve the confidence in precipitation simulations over the UIB. However, studies often find contrasting climate change signals over these regions, which range from rapidly retreating glaciers (Kääb et al. 2012, 2015; Wiltshire 2014; Jacob et al. 2012; Cogley 2011) to rather glacial expansions in the Karakoram, also known as the Karakoram anomaly (e.g., Bashir et al. 2017; Kapnick et al. 2014; Tahir et al. 2014; Bhambri et al. 2013; Minora et al. 2013). The lack of robustness in the projected change signal and associated glacial response over this region implicates the development of effective policy measures to reduce projected vulnerability over the basin. Therefore, it is imperative to understand the influence of existing observational and modeling challenges on projected climatic changes over this region and to devise strategies for improving spatial inferences of regional precipitation. However, such improvements potentially have to be derived through methodological considerations, as the availability of additional high-quality data is still an ongoing issue.

In this context, the reliability of spatially distributed precipitation estimates may suffer from (i) altitudinal biases; (ii) issues with observational data (homogeneity, time series length, consistency), and/or (iii) the downscaling approaches. We refer to these challenges as the type 1, type 2, and type 3 uncertainties, respectively. Many previous studies exhibit the type 1 uncertainty because they rely on the low-altitudinal observations to infer seasonal and annual precipitation statistics over the UIB (e.g., Iqbal and Athar 2018, Ali et al. 2015; Khan et al. 2015a; Akhtar et al. 2008; Archer and Fowler 2004, Latif et al. 2018 and Khattak et al. 2011). Estimates in most of these earlier studies neither represent the HA nor provide any quantifiable mechanism for drawing logical inferences between the low and high altitudes, and therefore may lack the reliability. Some of the recent studies (e.g., Immerzeel et al. 2015; Dahri et al. 2016, 2018; Hasson 2016) have addressed this deficiency, either by incorporating the HA perspective using short-term data (e.g., glacial mass balance and or fragmented HA monitoring) or by other assumptions. These studies significantly differ in regional precipitation estimates over the UIB, when compared with historically observed estimates recorded by the valley stations. However, although estimates in these studies are derived by maximizing spatial coverage within the UIB, they may suffer from some other shortcomings. For example, the use of a limited number of HA stations, which not only contain short-term observations (e.g., Hasson 2016) but may also exhibit issues related to homogeneity (e.g., Dahri et al. 2016), measurement errors, and assumption of a linear precipitation gradient along the altitudes (e.g. Immerzeel et al. 2013), and therefore may induce type 2 uncertainties in respective simulations.

The type 3 uncertainty originates from the choice of the adapted statistical downscaling methodology. For instance, bias correction methods, which have been widely used in these previous studies, highly depend on the observations for an appropriate characterization of the region of interest. However, the observation sparsity over the UIB makes the use of bias correction relatively ineffective. Lastly, the current generation of General Circulation Models (GCMs) has persistently failed in the representation of dynamic and thermodynamic processes that dictate precipitation variability over this region at varying time scales (e.g., Ashfaq et al. 2017; Palazzi et al. 2015; Rastogi et al. 2018). Alternatively, regional climate models (RCMs) generally offer improved regional simulations due to better topographic representation and the flexibility of region-specific tuning of the model parameters. However, an evaluation of the seven fine-scale (0.44°) CORDEX-SA experiments has also shown a limited success over this region since they still derive boundary forcing from the biased GCMs (e.g., Hasson et al. 2019; Mishra 2015; Syed et al. 2014). The High Asia Refined analysis (HAR), which is based on dynamical modeling, has shown promising results in reproducing overall regional precipitation climatology and in explaining sub-regional precipitation variability over time (e.g., Maussion et al. 2014; Curio and Scherer 2016). More recently, Pritchard et al. (2019) have evaluated HAR performance specifically over the UIB in terms of precipitation and other important near-surface variables. Although the annual precipitation cycle is reasonably simulated, there exists some biases at seasonal scales, which lead to wetter and colder conditions across the HA areas.

It is clear that further efforts are needed to reduce uncertainties in precipitation estimates over the UIB. For example, incorporating a time series analysis along with careful data quality checks, an avoidance of the direct use of available HA observations, and the adaption of other, potentially better suited statistical downscaling approaches, may help to resolve some of these uncertainties. To this end, perfect prognosis statistical downscaling (Wilks 2006) is a promising statistical alternative, which offers distinct advantages. However, so far only a few studies are reported in the region that make use of this technique (e.g., Kazmi et al. 2016; Mahmood and Babel 2012). This approach employs large-scale atmospheric circulations within a statistical downscaling framework to resolve the observed precipitation distributions. The use of atmospheric circulations is advantageous as these variables are relatively better modeled in the GCMs (Kaspar-Ott et al. 2019), and can explain the physical mechanisms that govern regional precipitation. Given the observational uncertainty in the UIB, such physical explanations of the downscaled precipitation may further add confidence in the statistical modeling results. Moreover, knowledge towards the large-scale dynamics that influence precipitation distribution over the UIB is relatively robust, as many of the previous studies provide a good overview in this regard that can serve as the basis for comparison. For instance, Syed et al. (2006, 2010), Curio and Scherer (2016), Cannon et al. (2015), Ahmad et al. (2015), and Kazmi et al. (2016) explain different aspects of the precipitation governing circulations over the region.

Considering such advantages and to present a different perspective, we have adapted perfect prognosis downscaling to investigate the observed precipitation dynamics within the Indus basin with a focus on the UIB, by accounting for spatiotemporal precipitation variability. Due to the dominant role of synoptic-scale circulations in regional precipitation over the UIB, we assume that despite the differences in magnitudes, distinct linear relationships of temporal precipitation variability between the low and HA should exist, at least on sub-regional scales. Such intra-regional signals, if properly identified and captured, can significantly improve the understanding of precipitation variability over the relatively uncertain HA. The availability of the recent HA observations can help to validate this assumption. Especially, this study aims to:

  1. i)

    Evaluate the effectiveness of low altitude stations in representing HA precipitation dynamics across the UIB without using actual precipitation measurements

  2. ii)

    Identify precipitation governing circulations within a robust statistical downscaling framework to explain observed spatiotemporal precipitation patterns

  3. iii)

    Analyze the physical consistency of final statistical models through composites of governing circulations

Moreover, extending the analysis to the LI further helps to understand the water demand perspective, which is necessary for effective water resources planning at the basin scale.

In Section 2, we provide a brief description of the study area. Data and details of the adapted downscaling framework and the approach for seasonal composites are described in Section 3. This is followed by the description of results and discussion in Section 4 and the details of composite analysis and sources of uncertainty in Section 5. Finally, the conclusions are drawn in Section 6.

2 Study area

The Indus River originates from the southwestern Tibetan Plateau and ultimately descends into the Arabian Sea after traversing through the entirety of Pakistan. The River basin area stretches across four geopolitically complex South Asian countries (i.e., China, India, Afghanistan, and Pakistan) and is ranked 12th in the world in terms of its size (Fig. 1a). Annual precipitation substantially varies across the basin with a magnitude as low as 150 mm in the south to more than 2000 mm in the northern highlands and follows a nonlinear variation along the altitudes (Archer and Fowler 2004; Dahri et al. 2016). It should be noted that this study only focuses on the basin area controlled by Pakistan (the largest basin area among all four countries and contains the highest altitudes), due to severe constraints on availability of data over rest of the region (Fig. 1b). Within this study, the UIB outlines the drainage area above the Mangla dam over River Jehlum in Pakistan, as shown in blue color in Fig. 1a.

Fig. 1
figure 1

(a) The transboundary Indus River basin, shown as colored regions, along with the major tributary river network. The color scheme differentiates between the upper and lower Indus basin. (b) The study area (Indus basin of Pakistan) with the same color scheme to represent the upper Indus basin (UIB) and lower Indus (LI) basin of Pakistan. The Mountains of the HKH region are also shown around the UIB

3 Data and methodology

3.1 Precipitation: sources, homogeneity, and distributional considerations

The Pakistan Meteorological Department (PMD) operates the largest monitoring network across Pakistan that also includes the Indus River basin. However, these observatories mostly represent low-altitude regimes (only two stations are located above 2000 m altitude with a maximum at 2394 m). Since the mid-1990s, the Water and Power Development Authority (WAPDA) of Pakistan has initiated a systematic monitoring program to represent major sub-catchments within the UIB and that has significantly improved the spatial and altitudinal coverage. Additionally, the University of Bonn under Cultural Areas Karakoram (CAK) Program has some operational observatories in the HA of Hunza and Gilgit sub-catchments of the UIB.

We obtain precipitation time series of 58 stations (shown in Fig. 2) from these three sources for further investigations. Out of these 58 stations, 42 stations are located within the UIB to account for its topographic complexity and significance for the Indus waters. The mean data length of the 35 historic and the 23 recent HA stations is 37 (1979–2015) and 17 (1994–2015) years, respectively. The inclusion of the HA stations has extended the observational coverage of the UIB up to 4730 m (20, 11, and 8 stations are located above 2000, 3000, and 3500 meters, respectively), to understand orographic dynamics, which otherwise is least understood, and govern the basin hydrology. Information about the selected stations is given in Table 1.

Fig. 2
figure 2

Location of the (numbered) study stations over the Indus basin of Pakistan. The green circles represent the locations of historic stations (1979–2015), while triangles show the recent HA installations (1994–2015) within the UIB. The color scheme here represents the altitudinal perspective across the River basin. The major river network is also shown and named. For information about the numbers, please see Table 1

Table 1 Overview of the meteorological stations used in this study. The Long and Lat are longitude and latitude measurements, expressed in decimal degrees (dd). Altitudes represent average station elevation above mean sea level in meters. Source: PMD Pakistan Meteorological Department, WAPDA Water and Power Development Authority of Pakistan, CAK University of Bonn under Cultural Areas Karakoram Program (synthesized from Weniger et al. 2005 and in personal communication with the lead author of Dahri et al. 2016)

Based upon station precipitation analysis, we identify three major seasons that cover the winter period spanning from December through March (WS, DJFM), the pre-monsoon spanning from April to June (PMS, AMJ), and the summer monsoon season that spans from July to September (MS, JAS). These seasons contribute about 34%, 22%, and 36%, respectively, of the annual total of the basin precipitation and provide the basis for further analysis in our study.

We group monthly time series in each of the three seasons and check the data for its completeness (Moberg et al. 2006), homogeneity (Wijngaard et al. 2003; Alexandersson 1986), and underlying data structure. We make use of four different statistical procedures for homogeneity testing to group the stations as “useful” (where at least three tests indicate homogeneity), “doubtful” (where at least two tests indicate inhomogeneity), and “suspect” (where none or only one test indicates homogeneity) after Wijngaard et al. (2003). Although most of the stations appeared as “useful,” some seasonal inhomogeneity has also been identified. For example, we find seven stations as “suspect” and seven stations as “doubtful” in the WS, four stations as “suspect” and eight stations as “doubtful” in the PMS, and four stations as “suspect” and nine stations as “doubtful” in the MS. Most of the stations with homogeneity issues are located within the UIB. Therefore, the use of these stations without homogeneity considerations, as it has been done in previous studies, can potentially lead to errors in the seasonal precipitation estimates.

We further use several goodness-of-fit measures like the KS test (Smirnov 1939), Anderson-Darling Statistics (Anderson and Darling 1952), and Akaike and Bayesian Information Criterions (Akaike 1973; Stone 1979), to identify the underlying distribution for the observed precipitation. Our analysis suggests that gamma distribution is the best statistical representative for nearly all of the study stations. This identification helps to facilitate the choice of an appropriate statistical model, which is described in detail in Section 3.5.

3.2 Precipitation regionalization and selection of representative stations

We assume that large-scale atmospheric circulations drive precipitation variability within the UIB. Therefore, it is expected that despite differences in the magnitudes, commonalities in temporal precipitation variability should exist among the low and HA across the UIB. To test this assumption, we employ a K-means cluster analysis on all 58 stations to identify the precipitation regions with similar covariance, using Spearman correlation as a distance measure (Wilks 2006). Such a precipitation analysis, which avoids actual precipitation amounts, is advantageous as it can potentially account for the erroneous HA precipitation measurements (e.g., Curio and Scherer 2016; Tahir et al. 2011). During clustering, the objective function is set to maximize (minimize) correlation within (across) the regions to define sharp regional boundaries. Due to greater uncertainty in precipitation observations within the HA of the UIB, we also perform another regionalization experiment that only covers the HA part of the basin. Subsequently, the cluster centroids are computed and correlated with each of the respective regional members. The regional representative (RR) stations for each region are selected by using multiple considerations, which include a correlation between centroid and station, number of missing values, the level of homogeneity, and the length of observational record. The time series of these RR stations serve as predictands during the subsequent modeling process to draw sub-regional precipitation inferences.

3.3 Predictor data

We use variables of ERA-Interim reanalysis (Dee et al. 2011) at a 2 × 2 grid resolution as predictors. Using the understanding established in earlier studies (e.g., Ahmad et al. 2015; Kazmi et al. 2016; Mahmood and Babel 2012; Syed et al. 2010), a number of dynamic and thermodynamic variables at different vertical levels have been selected. These include geopotential heights (zg) at three atmospheric levels (200, 500, 700 hPa), meridional and zonal winds (ua and va) at four atmospheric levels (200, 500, 700, 850 hPa) and mean sea level pressure (psl) as dynamic variables, and relative and specific humidity (hur and hus) at two atmospheric levels (1000 and 700 hPa) as thermodynamic predictors. A larger domain was considered for the dynamic variables (10° E to 100° E, 10° N to 60° N) compared with the domain used for thermodynamic variables (64° E to 80° E, 22° N to 40° N) to account for both, the large-scale dynamical and more regional-scale thermodynamic influences.

3.4 Principle component analysis

We perform s-mode Varimax-rotated PCA, separately for each predictor to identify important centers of variation and to reduce dimensionality of the predictors (see Preisendorfer 1988). The number of PCs is extracted using a modified dominance criterion (Philipp 2003) with some additional constraints that each retained PC should dominate all other PCs by more than one standard deviation over at least seven grid boxes, and explains at least 3 % of the total variance. The PC scores serve as the predictor time series and the PC loadings define the location of centers of variation.

3.5 Generalized linear models

We adapt GLM framework (Mc Cullagh and Nelder 1989) to model the relationships between the atmospheric variables (PC scores) and observed precipitation of the RR stations. We selected the GLM framework due to gamma-distributed monthly precipitation. Within a GLM framework, the maximum likelihood estimation is used to estimate the model parameters for the expected value E of a random variable (Yt) at any given time t:

$$ E\left({Y}_t\right)={\mu}_t={g}^{-1}\left({\upeta}_t\right)={g}^{-1}{\sum}_1^n{X}_{tj}{\beta}_j\kern2.75em $$
(1)

where μt is the mean for the probability density function (PDF) at time t, ηt is a combination of linear predictors, g is the canonical link function, Xtj is the value of the jth covariate for observation t, n is the total number of covariates, and βj are parameters whose values have to be estimated from the data. We use log-canonical link function in this study.

For those cases having exact zeros in observed time series of the RR, we use Tweedie exponential dispersion models within a GLM framework, due to their ability for simultaneous modeling of discrete and continuous precipitation features (e.g., Dunn 2004; Hertig and Jacobeit 2015; Hertig et al. 2017). The Tweedie family of distributions has three parameters (i.e., μ, mean, dispersion parameter ϕ > 0, and the Tweedie index parameter p). The index p defines the particular distribution and a typical case (1 < p < 2) represents Poisson-gamma models, which are suitable for modeling positive continuous data with exact zeros (Hasan and Dunn 2011). The variance of the distribution is var [Y] = ϕμp. The estimation of the index parameter requires sophisticated numerical computations that have been resolved using the profile maximum likelihood estimate provided in the Tweedie R package (Dunn 2010). In summary, we use Tweedie model only when exact zeros appear in the predictand time series and otherwise gamma models for regression analysis.

3.5.1 Measures of model performance

Mean squared error skill score (MSESS) is a commonly employed metric to judge the accuracy of continuous non-probabilistic forecasts like precipitation, over the long-term climatology (Wilks 2006). MSESS is defined as:

$$ \mathrm{MSESS}=1-\frac{{\mathrm{MSE}}_{\mathrm{modeled}}\ }{\begin{array}{c} MS{E}_{\mathrm{reference}}\\ {}\ \end{array}}\kern0.75em \times 100 $$
(2)

where

$$ \mathrm{MSE}\ \mathrm{modeled}=\frac{1}{n}{\sum}_{n=1}^n{\left({y}^{\ast }-y\right)}^2 $$
(3)
$$ \mathrm{MSE}\ \mathrm{reference}=\frac{1}{n}{\sum}_{n=1}^n{\left(\overline{y}-y\ \right)}^2 $$
(4)

and

MSE:

mean squared errors

y*:

model prediction

y:

observation

\( \overline{y} \):

mean over the observations

n:

number of observations

MSESS ranges from 0% (no skill improvement over the reference model, in this case a long-term climatology) to 100 % (perfect model).

3.6 Downscaling framework: model development and selection

We develop GLM-based regression models to identify the atmospheric variables that exert the strongest influence on precipitation. We use a cross-validation framework by randomly selecting calibration (two-thirds of available time series) and validation periods (remaining one-third) and adapt the following stepwise reduction process, which is randomly repeated 1000 times to identify effective and robust statistical models:

I. The PC time series of each predictor is individually regressed against each of the RR stations. Initially, the regression models are developed using all PCs of an individual predictor as input. Subsequently, the revised models are developed based on an adjusted PC input (i.e., dropping PCs, one by one) and the corresponding MSE is calculated. We use MSE to rank the significance of the PCs and to identify the most influential PCs of each predictor (i.e., PCs whose absence maximized the errors were considered the most important). Next, the models containing only two most influential PCs are developed and the corresponding errors and MSESS are computed. These 2 PC models are further complemented one by one with the remaining ranked PCs to identify the best PC combination per predictor (i.e., which maximizes MSESS during the calibration and validation phases). We repeat this process for all 15 predictors and the model that demonstrates maximum predictability over the calibration and validation periods (sum of the MSESS) is termed as the best signal predictor model (SPM).

II. We further test different predictor combinations to evaluate any performance improvement over the best SPM from step I. For such combinations, the best SPM is complemented one by one with the remaining predictors. If two correlated PCs of different variables (absolute correlation threshold of 0.40) are found, only the more influential PC (showing greater error effect, when removed) was retained for further consideration. Moreover, an addition of a second predictor is only considered, if it improves the prediction skills by at least 7% over the best SPM. This arbitrary number effectively demonstrates reasonable validation improvements without overfitting the models. The same procedure is repeated during the combinations of three or more predictor variables to identify better performing combinations, wherever applicable.

3.7 Composite analysis

Composites can help to identify atmospheric circulations that are associated with different precipitation regimes to evaluate the physical consistency (reliability) of the statistical models. We construct the sample composites where monthly scores of the dominating PC (as identified in the final regression models) exceed or fall below ± 1.5 PC scores to subsample the cases of higher positive and higher negative PC scores. These subsets are evaluated to identify wetter (dryer) precipitation regimes of the predictand and their associated summary statistics such as the number of events, mean precipitation rate per event, and the dominant month. The same thresholds are further used to subsample relevant large-scale seasonal circulations to define climatology over respective subsamples and the standardized anomalies of these predictors are plotted to investigate precipitation supporting (suppressing) circulation patterns. Similarly, the difference between these contrasting regimes is also plotted to identify the circulation anomalies that support above-normal precipitation.

4 Results and discussion

4.1 Precipitation regionalization

In our analyses, the K-means clustering identifies six WS, seven PMS, and seven MS precipitation regions across the Indus basin to explain adequately the observed precipitation variability under the basin-wide experiment. The regionalization scheme successfully clusters 57%, 78%, and 48% of the total recent HA stations (23) around the historical observatories within the identified WS (three), PMS (four), and MS (five) precipitation regions in the UIB. The second regionalization experiment identifies four sub-regions during each of these seasons and further improves the spatial coverage within the UIB by clustering 70%, 83 %, and 83% of the HA stations around different historic stations. Table 2 presents the statistical performance during both of these regionalization experiments. Both experiments show a good inter- and intra-regional performance in terms of selected statistical attributes with an exception of the MS in the second experiment. Thus, both regionalization experiments validate our assumption of similarities in the precipitation variability between the recent HA and historic stations. Therefore, the historic stations can advantageously be used to infer orographic precipitation over more uncertain HA through these regionalization schemes.

Table 2 Summary of precipitation regionalization experiments on seasonal scales. Column 1 represents the precipitation seasons, and column 2 defines the number of precipitation regions. Columns 3 and 4 show sum of squared errors (SSE) within (WI) and between (BW) regions. Total explained variance is shown in column 5. Columns 6 and 7 define the correlation within (WI) and between (BW) the regions, respectively

Figures 3, 4, 5, and 6 describe the outcome of the precipitation regionalization process and identify the RR and those HA stations that could not be clustered around any of the surrounding historic stations during both regionalization experiments. Averagely, four different regions have been identified through both of these experiments to represent precipitation dynamics of the UIB. These regions represent precipitation variability of the southern Himalayans, northern Himalayans, and the northwestern parts of the UIB and provide an opportunity for a fine-scale analysis. The identification of such distinct precipitation regions further justifies the need for a sub-regional approach, as such heterogeneity may not be properly reflected by considering UIB as a single unit. Similarly, some of the important LI features (e.g., spate irrigation, irrigated plains, and the coastal belt) are also identified during the basin-wide regionalization experiment. These are important reflectors of the water demand and should simultaneously be considered to demonstrate integrated water resources planning at the basin scale.

Fig. 3
figure 3

The WS (i.e., DJFM) precipitation regionalization under the basin-wide experiment. The colored circles show different precipitation regions with similar co-variability. The numbered circles are the locations of the RR stations in respective precipitation regions. The triangles define those HA stations that could not be grouped with any of the available historic stations within the UIB

Fig. 4
figure 4

Same as Fig. 3 but for the PMS precipitation season (i.e., AMJ)

Fig. 5
figure 5

Same as Fig. 3 but for the MS precipitation season (i.e., JAS)

Fig. 6
figure 6

Same as Fig. 3 but for the HA-UIB precipitation regionalization. (a) WS, (b) PMS, and (c) MS precipitation regions, respectively

Overall, the regionalization schemes successfully capture the orographic heterogeneity within the UIB, show a good regional coherence, and significantly cover the altitudinal perspective within the observational profile during all three seasons. Moreover, our regionalization provides a more realistic basis for understanding sub-regional precipitation characteristics, when compared with the previously used hydrological units (Dahri et al. 2016) or the administrative units (Iqbal and Athar 2018).

4.2 Seasonal precipitation models

Using the outlined downscaling procedure (Section 3.6), we successfully model observed seasonal precipitation patterns over 29 sub-regions of the study basin under both regionalization experiments. However, we were unable to determine valid models for three smaller regions, representing the coastal belt within the basin-wide experiment (R2 in PMS and R2 in WS) and one UIB region (R3 in MS) during the revised regionalization experiment. This issue partly stems from the relatively large number of dry months (84, 80, and 21, respectively) in the observed time series of RR stations for these regions.

4.2.1 Predictor significance

The final composition of valid seasonal precipitation models (Table 3) indicates a dominant influence of wind predictors in basin-scale precipitation distribution (nearly 80% in total). Particularly, the lower tropospheric winds most effectively resolve the basin-wide regional models (50.9%). The influence of upper tropospheric winds is also significant at the basin scale (25.5 %). Among these wind predictors, the meridional wind velocities in the lower (40%) and upper troposphere (25%) are more prominent in these sub-regional models. However, their effectiveness has a stronger seasonality, where a maximum contribution is realized during the MS (va850 = 52.5%, va200 = 32.5%). After the wind components, the contribution of thermodynamic predictors is at its maximum at the basin scale (about 15%), which also exhibits strong seasonality with the maximum during the PMS (nearly 26%) and WS (about 15 %) and a minimum during the MS (5%). Among thermodynamic predictors, relative humidity (especially at 1000 hPa level) is the most effective predictor during all precipitation seasons. The lower level dynamic variables (psl, zg700) form the third important predictor set among regional models (7.6%) and mainly dominate during the WS.

Table 3 Frequency of the predictors (in percentage) in final seasonal precipitation models for different sub-regions of the Indus basin of Pakistan. Column 1 shows the large-scale predictors used in this study. The number after each predictor reflects the atmospheric level (pressure level in hPa). Columns 2–4 represent the predictor frequencies in the different seasons. The last column shows the average of the seasonal predictor frequencies

These final predictors capture the seasonally varying regional climatology to explain observed precipitation patterns. For instance, the thermally induced local Hadley circulations during the MS are adequately represented by the meridional component of tropospheric winds. Similarly, the atmospheric moisture sources from local evapotranspiration within the basin and advection from different remote oceanic and terrestrial sources (e.g., the Arabian, Caspian, and Mediterranean Seas) during the PMS and WS play a crucial role in regional precipitation (e.g., Cannon et al. 2015; Curio and Scherer 2016; Mei et al. 2015; Syed et al. 2010). Our statistical models represent such moisture influences through the humidity-related predictors particularly during the PMS, when the strength of dynamic forcing reduces over this region. Thus, the identified predictors are representative of the major physical processes that govern the regional precipitation and therefore enhance the confidence in statistical simulations. Similarly, explanations are also valid for the predictors of the HA-UIB regionalization experiment due to their close similarity with the basin-wide experiment. More details about the physical explanations of the statistical models will follow in Section 5.

4.2.2 Winter season models

The summary of the seasonal modeling process (Table 4) shows that GLM-Tweedie models adequately represent the WS precipitation dynamics across all five precipitation regions of the Indus basin. For information about these regions and the RR stations, please see Fig. 3. Within its three UIB regions (R1, R2, and R5), the mean calibration (validation) root mean squared error (RMSE) amount to 27.14 mm (27.93 mm) with a corresponding MSESS of 41.60% (34.69%). The R3 region (stretching along the foothills of the southern Himalayans and covering the lower Hindukush areas further northwest) contains 16 study stations including some of the HA, and receives the highest precipitation amounts (93 mm per month) among all other UIB regions during the WS. Therefore, precipitation (snow and liquid) within this larger region dominates the river flows during the winter and early spring seasons and its accurate modeling is highly desirable. The RR station strongly represents precipitation characteristics of this region (correlation = 0.93 with regional centroid) and therefore allows to make reliable inferences over a broad altitudinal range (308 to 2744 m) in this region. A single predictor (va850) effectively models the observed precipitation of this RR with a high validation skill (49.57%). Another large UIB region (R1), which mainly covers the Karakorum and trans-Himalayan valleys of the Hindukush, also exhibits an acceptable validation skill (26.1%) despite a large number of dry months. This is mainly a cryosphere-dominated region, which is also strongly represented by its RR station (correlation = 0.83 with regional centroid) and provides inference over the relatively higher altitudes (1251–3200 m) of more uncertain trans-Himalayas. Zonal winds at 700 hPa level (ua700) and specific humidity at 1000 hPa level (hus1000) effectively model the observed precipitation patterns in this region. Similarly, another northern Himalayan region (R5) shows a higher predictability skill (28.39%) despite having the largest numbers of dry months and provides important information for the highest altitudinal bands (2156–4030 m) in our analysis. This region has the largest cryosphere extent, receives mainly solid precipitation and, together with R1, helps to regulate the hydrology of the Indus River. Such a skillful statistical representation of the observed WS precipitation that covers the larger spatio-altitudinal scales within the UIB can greatly help in reducing the uncertainty in future projections.

Table 4 Statistical performance of the seasonal downscaling models over the Indus Basin of Pakistan. Column 1 outlines the identified precipitation region, column 2 indicates the name of the regional representative (RR) station, and its correlation with respective regional centroid is given in the parenthesis. Column 3 (Reg. Alt) indicates the mean regional altitude in meters (m) above mean sea level and in parenthesis the range of regional altitude. Columns 4, 5, 6, 7 and 8 show the adapted GLM, number of dry months in time series of RR station, number of final predictors, predictor names, and number of PCs in final regression models, respectively. The mean monthly-observed (modeled) precipitation rates of RR station are shown in column 9 (10) respectively. The calibration (validation) root mean squared errors (RMSE) and mean squared error skill scores (MSESS) are reflected in columns 11 (12) and 13 (14) respectively. The blue (green) color indicates UIB (LI) seasonal regions and red color shows the average outcome of seasonal models over the entire Indus basin

Two LI regions (R4 and R6) also exhibit significant validation skills (25.39 % and 32.89%) for making reliable inferences over diverse conditions of the northeastern rain-fed areas, irrigated plains, and southwestern spate regions. However, precipitation in these regions requires both dynamic and thermodynamic predictors for effective modeling, as shown by the final models (Table 4). The relatively higher error rates in some of the WS cases are discussed in Section 5.3.

4.2.3 Pre-monsoon season models

During the PMS, the GLM-gamma models provide the best modeling performance with an overall skill score of approximately 50% during calibration and 42% in validation (Table 4). Moreover, the average validation performance (43.90%) and the average ability of the RR stations in describing the spatial perspective (correlation coefficient of 0.92) over four different UIB regions (R1, R3, R5, and R7, see also Fig. 4) are at its maximum, compared with other precipitation seasons. Within the UIB, the largest northern Himalayan region (R1) contains 21 study stations including most of the HA observatories and covers a larger altitudinal band (1251–4030 m). R1 exhibits a validation skill of nearly 50%. Similarly, two northwestern regions (R5 and R7) that exhibit higher monthly precipitation (106 mm and 82 mm) have also been skillfully modeled (50.42% and 46.97%) and allow reliable precipitation inference over an elevation band between 353 and 2591 m. The southern Himalayan UIB region (R3) that receives more than 100 mm of monthly precipitation and represents the lower northeastern elevations (508–2168 m) also demonstrates a good validation performance (about 29%). During the PMS simulations over the UIB, relative humidity is an important predictor (see Section 4.2.1) along with lower tropospheric winds, to explain the observed spatial variability. Overall, a highly skillful UIB modeling during this transitional season is important for the assessment of seasonal water availability in the future period.

Additionally, the PMS modeling performance over the two LI regions (R4 and R6) exhibits higher validation skills (40.11% and 37.64%, respectively) to infer precipitation dynamics over the irrigated plains and lower southwestern spate regions. These conditions are simulated by a set of dynamic and thermodynamic predictors.

4.2.4 Monsoon season

Precipitation dynamics are spatially more complex over the UIB during the MS, as it required five different precipitation regions (R1, R3, R4, R5, and R7, see also Fig. 5). Moreover, both the GLM-gamma and Tweedie models are required for statistical simulations of observed precipitation variability over the UIB. Nonetheless, a reasonable average validation performance across all of these UIB regions (32.73%) contributes to confidence in resulting spatial inferences. Within the UIB, maximum validation skill (52.41%) is demonstrated over the wettest southern Himalayan region R3, which receives monthly precipitation sums of 181 mm over a larger vertical horizon (122 to 2591 m). The RR station for this region exhibits a correlation of 0.84 with the regional centroid. The R3 region mainly receives liquid precipitation, which helps to regulate the downstream reservoir operations (e.g., hydropower and irrigation). The modeling process of the two other UIB regions (R1 and R7) also exhibits reasonable validation skills (28.57% and 23.65%, respectively). These regions reflect the highest observed altitudinal range (1251 to 4030 m and 2156 to 4730 m) over the Karakoram and Hindukush areas. Similarly, two northwestern regions (R5 and R4) also demonstrate reasonable validation skills (20.71% and 39%) and represent an altitudinal range from 962 to 3719 m. A strong dynamic forcing during the MS helped to largely resolve the complex MS processes within the UIB, as shown by various predictors of the statistical models (Table 4).

In contrast, the homogenous topography and the strong MS influence across the LI helps to relatively better model precipitation over spate and irrigated regions (R4 and R2), with validation skills of 32.06% and 42.32 %, respectively. Spate regions not only provide an abundance of flows during the MS to support diverse livelihood but also cause regular havoc through damage to downstream infrastructure and human lives during extreme events. Therefore, its consideration is important to minimize the socio-economic suffering as well for enhancing national water availability through effective planning.

4.3 HA-UIB modeling

The HA-UIB modeling summary (Table 5) shows some performance compromises. For instance, the mean validation skills over four WS regions show a decrease of 3.5% compared with the corresponding skills in the basin-wide modeling (34.69%). However, this performance loss is primarily due to a poor simulation of a single-station region (R4), which exhibits a lower validation skill (18.42 %). The performance for this particular region suffers from a relatively lower precipitation rate (7 mm/month), a higher number of dry months (33), and the presence of some outliers. All other WS regions in the HA-UIB modeling exhibit comparable skills by mainly using different dynamical predictors. However, the trans-Himalayan regions (R2, R4) additionally require specific humidity as a predictor.

Table 5 Same as Table 4, but for HA-UIB regions

Four PMS regions show a very similar average skill (42.62%) compared with the basin-wide modeling (43.90%) despite the presence of some dry months in time series of some RR stations (R3 and R4). Thermodynamic contributions still largely dominate the PMS precipitation simulations (Table 5), where the lower level relative humidity appears as the most influential predictor (i.e., best SPM) in three out of four sub-regional models.

Similarly, the average modeling performance in the MS across its three different precipitation regions also exhibits nearly similar prediction skills (30.60%, ~ 2% less). Still, the dynamic predictors largely dominated these HA-MS simulations as shown by the final statistical models (Table 5), although the model for the northern Himalayan region (R1) also includes the contribution of specific humidity.

Overall, the HA-UIB regionalization significantly improves the spatio-altitudinal representation over the UIB in all seasons, without losing much of the predictability power. This increased spatial coverage helps to infer precipitation dynamics over the hydrologically important yet highly uncertain HA regions within the UIB. The resulting inferences can be significantly trusted for the wetter southern Himalayan regions during all seasons and the northwestern Hindukush and the Karakorum regions during the PMS and WS due to the stronger representations by the respective RR stations. However, caution is required during the MS particularly over the northwestern Hindukush region (R2) and the Karakoram-central region (R1) due to relatively poor representation by the respective RR stations. The weaker MS penetration into northwestern and Karakoram region could be one of the reasons. Overall, thermodynamic contributions increase in the trans-Himalayan regions, which reflect the role of atmospheric moisture buildup around the UIB from different moisture sources, as demonstrated by various centers of variation for humidity-related predictors (not shown).

5 Physical consistency of the statistical relationships: composite analysis

Composites can help to identify atmospheric conditions during different precipitation regimes to evaluate and explain the physical mechanisms of statistical models. The choice of model(s) to be presented in the following section is based on the relative importance for the season (i.e., the core precipitation seasons), the significance of the regional scale, prediction performance of the statistical models, and a distinct domination of a single PC in the final regression models. Using these multiple considerations, we select one MS and one WS region to discuss the atmospheric characteristics underlying the statistical models.

5.1 MS synoptic analysis using composite

During the MS, a larger region (R3) that contains 12 stations and receives a higher amount of precipitation (181 mm/month) is selected for the composite analysis. We use upper level meridional winds (va200) to model the precipitation variations at its RR station, which exhibits a high statistical skill during validation (52.41%). We use PC1, which has a correlation (Spearman) of 0.62 with the observations of RR station to construct composites. The positive loading pattern of PC1 (Fig. 7) stretches over a larger region (from Bay of Bengal to the East African coast) with its center over the Arabian Sea. Positive PC scores indicate drier conditions and vice versa. We select a threshold of ± 1.5 PC scores, which helps to clearly define these contrasting patterns (Table 6) with ten positive cases (average 69 mm per month) and six negative cases (average 344 mm per month).

Fig. 7
figure 7

Loading pattern of PC1-va200 (most influential PC in the final regression model due to the highest absolute regression coefficient) during the MS season. The green contours show labeled loadings over different regions

Table 6 A statistical summary of the MS composite analysis of a selected UIB precipitation region (R3). The table provides information about the association between observed and modeled precipitation, used GLM framework to identify governing circulations (predictor) and the regression coefficients of different PCs including the intercept term. The last column gives the significance level of the estimated model coefficients. The information about the most influential PC, its correlation with observations of the RR, threshold of PC scores used to subset contrasting precipitation regimes, and their statistical summaries are also provided in the table

The monsoon circulation is characterized by an anticyclone with its center along the southern boundary of the Tibetan Plateau, a tropical easterly jet in the upper troposphere, a westerly jet over the Arabian Sea, and the southwesterly flow over the Bay of Bengal in the lower troposphere (e.g., Krishnamurti 1973; Ashfaq et al. 2017; Duan et al. 2012). More recently, the analysis of extratropical influences on regional MS dynamics has identified an upper level high over the west central Asia (WCA) as a stronger MS precursor (e.g., Kazmi et al. 2016). In our composite analyses, we analyze the relevant dynamical circulations in the upper and lower tropospheres to explain the associated physical mechanisms.

During the wet MS composite of zg200 (Fig. 8a), a stronger high-pressure anomaly over the WCA and a low-pressure anomaly over the Indian Peninsula reflect amplification of the upper level anticyclone and tropical easterly jet. Strong upper level easterlies (Fig. 8a) point towards a stronger than normal local Hadley circulation. All of these conditions should favor a stronger than normal cross-equatorial moisture flow. Additionally, the circulation anomalies in Fig. 9a exhibit stronger than normal westerlies over the Arabian Sea and south-westerlies over the Bay of Bengal. These upper and lower tropospheric anomalies (Figs. 8c and 9c) in monsoon dynamics point towards a stronger than normal monsoon due to an enhanced moisture supply through oceanic sources and an increased dynamic and orography forcing (Hunt et al. 2018; Ashfaq et al. 2017; Curio and Scherer 2016; Mei et al. 2015; Saeed et al. 2010).

Fig. 8
figure 8

Standardized composite means of geopotential heights and atmospheric winds at 200 hPa (wind vector, m/s) during the MS precipitation season for a sample UIB region (R3). The panel (a) shows the mean state of the large-scale circulations during above-normal precipitation and (b) represents the conditions during below-normal precipitation over R3. Panel (c) reflects the difference between these two circulations (a minus b)

Fig. 9
figure 9

(a–c) Same as Fig. 8 but for lower tropospheric wind composites at 850 hPa atmospheric level (wind vector, m/s) during the MS precipitation season

In contrary, the composite during dryer MS (Fig. 8b) exhibits an eastward shift in the upper level anticyclone and a weakening of the tropical easterly jet. The weakening of the easterly jet (Fig. 8b) points towards a weaker than normal local Hadley circulation, which reduces the cross-equatorial flow. Similarly, an eastward shift of the anticyclone is linked with drier than normal conditions over western South Asia and the UIB, and wetter than normal conditions over eastern South Asia (Ashfaq et al. 2009; Soman and Kumar 1993). In the lower levels, both, westerlies over the Arabian Sea and south-westerlies over the Bay of Bengal, exhibit a weakening. It should be noted that the Arabian Sea and Bay of Bengal are important moisture sources for northern and northwestern South Asia (Mei et al. 2015). Therefore, anomalies in the lower level circulations should (Fig. 8c) lead to a decrease in moisture supply from these two oceanic sources, which, when combined with upper level circulation anomalies, lead to a decrease in the MS precipitation particularly over western South Asia that includes the UIB (Wang et al. 2019; Ashfaq et al. 2017; Ahmad et al. 2012).

The composite differences (Fig. 8c) reflect the relative strength of upper level circulations (WCA high, tropical easterly jet) to restrict the interaction of mid-latitude MS currents around UIB and their lower level dynamical signatures (Fig. 9c) to produce an excessive MS precipitation around UIB.

5.2 WS synoptic analysis using composite

Similar to the MS, we construct composites for the WS to analyze the circulation characteristics during the wet and dry precipitation regimes over a selected UIB region. We select region R3 in the WS for the construction of composites, which contains 16 stations and receives a high regional precipitation (70 mm per month). A single predictor (va850) effectively models the observed precipitation at its RR station with a validation skill of 53%. PC2, which has the highest absolute regression coefficient (0.51) and high co-variability with the RR station (correlation coefficient of 0.65) is used to subsample the composites. The higher positive loadings of PC2 extend over a larger region that stretches from southern Pakistan to Africa and is centered over the Arabian Sea off the East African coast (Fig. 10). Due to a positive regression coefficient and the loading pattern, the higher positive PC scores (Table 7) support above-normal precipitation (12 cases, 187 mm per month), whereas the higher negative PC scores support below-normal precipitation (4 cases, 8 mm per month).

Fig. 10
figure 10

Loading pattern of PC2-va850 (most influential PC in the final regional model due to the highest absolute regression coefficient) during the WS precipitation. The green contours show labeled loadings over different regions

Table 7 Same as Table 6, but provide information about a selected UIB region (R3) during the WS

During the WS, the extratropical storms originate mainly at the surface level around the North Atlantic, Europe, and the Mediterranean region, and propagate eastward by prevailing westerlies at mid and upper tropospheric levels to produce precipitation over the UIB (e.g., Martyn 1992; Barlow et al. 2005). Therefore, in the following discussion, we analyze composites of relevant dynamical predictors to describe the anomalous WS precipitation regimes over the UIB.

During a wetter WS, the sea level pressure composites (Fig. 11a) exhibit a higher pressure anomaly over northern Siberia and Europe and a low-pressure anomaly over northern Africa, which supports a strong eastward propagation of the extratropical cyclones, commonly known as the Western Disturbances (Ridley et al. 2013). Such an anomaly is analogous to the pattern that emerges during the negative phase of the North Atlantic Oscillation (NAO) and facilitates cyclogenesis activities around the Mediterranean, Europe, and the Middle East region. These storm tracks are further intensified over the WCA region due to a low-pressure anomaly (Fig 11a), which favors a positive precipitation anomaly over the UIB and surrounding areas (Syed et al. 2006). Moreover, a low sea level pressure anomaly over the Bay of Bengal, eastern Tibetan Plateau and a low-pressure anomaly over the southern Arabian Sea also support a stronger moisture transport from the Arabian Sea towards the UIB—a condition well known during the positive or warm phase of the El Niño-Southern Oscillation (ENSO), as shown by Syed et al. (2006, 2010) and Wang and Xu (2018). A cyclonic circulation anomaly in Fig. 12a and the eastward circulation anomaly over the Mediterranean Sea provide further evidence of excessive precipitation around the UIB. On the other hand, the dryer WS composite (Fig. 11b) exhibits a southward shift of the Siberian high and produces high sea level pressure anomalies over the UIB, Central Asia, and the Mediterranean, and a low sea level pressure anomaly over northern Europe. These conditions are analogous to the positive phase of the NAO and favor a northward movement of the extratropical cyclones tracks during the WS. Consequently, the lack of cyclogenesis activity reduces the strength and moisture transport towards the UIB. Additionally, the moisture supply from the Arabian Sea also greatly reduces due to the anomalous northerly winds (Fig. 12b) as also pointed out by Hunt et al. (2018). Wind anomalies over the Mediterranean Sea and land (Fig. 12b) also favor a reduced moisture flow, eventually resulting in lower than normal precipitation over the UIB as reflected in our corresponding composite. The difference plots (Figs. 11c and 12c) reflect the influence of relative dynamical forcing supporting a positive precipitation around the UIB.

Fig. 11
figure 11

(a–c) Same as Fig. 9 but for sea level pressure composites during the WS

Fig. 12
figure 12

(a–c) Same as Fig. 9 but during the WS precipitation

We also constructed similar composites for the geopotential heights at 700 hPa, 500 hPa, and 200 hPa atmospheric levels to analyze the corresponding circulations as the low-pressure systems travel at high altitudes towards the UIB (Fig. 13). The corresponding difference plots (Fig. 13a–c) confirm the presence of previously described mechanisms and their barotropic nature across the troposphere. It is important to note that some of the earlier studies relate a positive NAO signal with increased winter precipitation over the UIB (e.g., Syed et al. 2010; Ahmad et al. 2015), which is not consistent with our findings. Instead, we see evidence that perhaps the negative NAO supports positive winter precipitation over the UIB. Some other studies such as Archer and Fowler (2004) also point towards a mixed precipitation response during the positive and negative phases of the NAO. Therefore, further investigation is needed to establish a robust understanding of the NAO influence on winter precipitation variability over the UIB.

Fig. 13
figure 13

Composites of geopotential height differences across the troposphere during WS. (a) geopotential heights at 700 hPa, (b) geopotential heights at 500 hPa, and (c) geopotential heights at 200 hPa

5.3 Sources of uncertainty

In this study, we have attempted to improve seasonal precipitation assessments within the UIB by addressing many of the common uncertainty sources through methodological improvements. Still, our estimates are based on certain assumptions. For instance, despite significantly improving the HA representation in our station ensemble, altitudes beyond 4730 m (see Table 1) still have no representation in our analysis due to the unavailability of reliable time series data. This lack of representation of higher altitudes has the potential to influence the regionalization process and reliability of precipitation inferences over the UIB. However, it is worth mentioning that the bulk of the UIB precipitation falls between 3000 and 4500 m altitudes (Hewitt 2014), and its magnitude starts decreasing rapidly above 5000 m (Hewitt 2014; Winiger et al. 2005). Thus, it is expected that the regions of maximum precipitation are well represented in our station ensemble (see Section 3.1 and Table 1). Moreover, the mean equilibrium line altitude (ELA) is estimated between 4500 and 5500 m (e.g., Khan et al. 2015b) in the UIB and any precipitation beyond the ELA does not contribute to the runoff. We assume that precipitation variation between 4730 m and the ELA may be represented by our identified UIB regions, and thus, inferences can be extended to fill gaps beyond the available observations. Curio and Scherer (2016) classified the region around the UIB into at least four different precipitation classes to define spatiotemporal precipitation variation using HAR data that covers the entire orography of the UIB. We also identify on average four different regions based upon observations, which are in line with their findings and support our assumption regarding the adequacy of our regionalization in explaining effective orography of the entire UIB. Our study is also limited to only those station observations that are located on the Pakistan side of the UIB. While there are some online data sources (e.g., NOAA-NCDC’s data available from https://www.ncdc.noaa.gov) that provide observations of few stations on the Indian side of the UIB, those are mostly located in the low-altitude areas and do not overlap recent time period covered by the HA stations used in this study. Therefore, while it is understandable that additional station observations should improve the spatial coverage of the UIB, substantial effort is needed for the spatiotemporal homogenization of these datasets for drawing meaningful inferences, and particularly to infer orography, which is the focus of our study. We rather assume that precipitation variability explained by our regions may also represent the conditions in surroundings of the UIB of Pakistan, as the same large-scale forcing governs the regional precipitation. Already some studies (e.g., Archer and Fowler 2004) have pointed out that some of the Indian stations show stronger covariation with precipitation of adjacent stations of Pakistan.

Moreover, we have employed ERA-Interim-based predictors in our study, as the more recent ERA5 (Hersbach et al. 2018) was not publicly available at the time when this study was begun. Since statistical downscaling is based on the use of large-scale circulations, we do not expect that the improvements in the horizontal resolution in ERA5 will have a significant impact on the downscaling results. In any case, we evaluate the robustness of ERA-Interim-based final predictors by performing a similar PCA on ERA5 predictors and use Taylor diagrams (Taylor 2001) for the comparison of loading patterns (centers of variation) of the two datasets. We find a high correspondence among dynamical predictors that dominate our final regression models (Section 4.2.1). Thus, the strong correspondence of governing PCs among these different datasets (not shown) provides confidence in the use of ERA-Interim as predictors.

Another shortcoming is related to the RMSEs in Tables 4 and 5, which are sometimes higher than the corresponding precipitation rates. However, the relatively higher RMSEs are associated with a higher number of zeros in the observed time series, which increases the total errors due to the addition of smaller errors in simulating exact zeros in the regression models. In the case of gamma or those Tweedie models that have very few number of months with zero precipitation, the mean errors are within an acceptable range (e.g., in Table 4, PMS-R3 and WS-R3). Therefore, despite relatively higher error rates during some (relatively dry) winter cases, the ability of our models to simulate a large fraction of observed seasonal variance is still significant.

Despite some of these uncertainties, our study presents a unique approach to assess and explain fine-scale precipitation variability over large parts of the effective drainage basin using large-scale circulations and carefully selected actual observations. This will serve to meet communal and scientific interests by providing an alternative to analyze the complex, climate-sensitive, and yet data-scarce region, within the available observational framework.

6 Summary and conclusions

Complex processes govern spatiotemporal characteristics of regional precipitation over the UIB. An inadequacy of observational coverage within the UIB and regional limitations of various de facto precipitation products further compound the challenge of a reliable estimation of precipitation at varying time scales. In this study, we employed atmospheric circulations within a statistical modeling framework to resolve the observed (seasonal) precipitation over the Indus basin of Pakistan with a focus on the UIB. By taking a distinct advantage of data from recent HA observatories, we adapted a K-means cluster analysis to demonstrate the effectiveness of relatively low-altitude stations with historic data to provide precipitation inferences over more uncertain, but hydrologically important, HA of the UIB. This precipitation regionalization scheme also captured the spatial heterogeneity of precipitation over the UIB by identifying more sub-regions within the UIB, when compared with the ones identified over the LI. We argue that the precipitation regionalization is effective in filling out the spatial gaps over the UIB despite the lack of observations at very high altitudes. Furthermore, the RR stations for each of the precipitation regions were carefully identified to serve as predictands. We used different dynamical and thermodynamic variables from the ERA-Interim reanalysis as predictors.

We adapted a GLM framework with gamma and Tweedie distributions to model the typical predictor- predictand relationships across the Indus River basin during the major precipitation seasons. The final selection of precipitation models was based on minimizing the MSE (maximizing the MSESS) by duly considering the multi-collinearity among different predictors within a robust cross-validation framework. Overall, the precipitation models exhibited a high performance over the wetter southern Himalayans and various LI regions by mainly using different dynamical predictors. The typical modeling framework also demonstrated an adequate performance in resolving the seasonal precipitation of cryosphere-dominated and topographically complex trans-Himalayans regions, which largely govern the hydrological regime of the Indus River system. Knowing that highly localized processes are difficult to model and also govern some part of the mountainous precipitation variability, the modeled skills over different parts of the UIB are quite reasonable to explain the precipitation dynamics using large-scale atmospheric circulations.

A skillful modeling of the trans-Himalayan regions, particularly during the westerly dominated circulations (PMS and WS) which mainly nourish the seasonal snowpacks, should help to improve our limited understanding about the spatial characteristics of regional precipitation and to assess the future stability of associated cryosphere. However, relatively complex models containing both dynamic and thermodynamic predictors were required to simulate such cryosphere-dominated trans-Himalayan regions. Moreover, we also separately modeled the HA part of the UIB due to its importance in the basin hydrology and associated greater uncertainty. We also demonstrated the robustness of ERA-Interim circulations against the latest ERA5 reanalysis, and the insensitivity of the final precipitation models against different precipitation measurement losses.

Furthermore, we constructed composites for two major wet seasons (MS and WS) to analyze and explain the large-scale atmospheric structures during the wet and dry precipitation anomalies over the UIB. These composites provided additional confidence regarding the reliability of the statistical models by explaining the physical processes that govern precipitation dynamics over the complex UIB. We showed that the strength and location of an upper level high over the WCA (Siberian high) influence the anomalous behavior of UIB precipitation by triggering different dynamical and thermodynamic instabilities during the MS (WS).

We have also simultaneously modeled the precipitation distribution over different LI regions in a similar way to have a coherent perspective about the water supply (UIB) and its demand (LI). Such investigations will further help in devising appropriate strategies for water resources planning at the basin scale. These statistically skillful and physically consistent models also offer an opportunity to investigate the sub-regional response of the Indus basin under different climate scenarios by downscaling relevant circulations from the GCMs. Given that the latest GCMs tend to exhibit a relatively better skill in representing large-scale circulations when compared with their raw precipitation output (e.g. Mueller and Seneviratne 2014), future precipitation changes, derived through such predictors, may provide more realistic policy feedback to support the regional adaptations.

However, considering melt-dominated hydrological regime of the Indus basin, the downscaling of precipitation alone cannot fulfill the requirements for understanding the basin hydrology. Therefore, similar efforts are needed to downscale regional temperatures. Thus, we plan to apply similar statistical downscaling techniques on the temperature fields in addition to the use of these models for precipitation projections amid different climate scenarios.