1 Introduction

In Italy, the impact of the climate on water supply is a major concern, as it can cause problems in terms of water shortages and energy waste (Colombo and Karney 2003).

Several studies have used weather variables to explain demand variability (Slavíková et al. 2013; Ashoori et al. 2016; Haque et al. 2017; Toth et al. 2018; Manouseli et al. 2019; Xenochristou et al. 2020; Xenochristou et al. 2021). Most of these studies have found that water demand is positively related to the temperature (Ashoori et al. 2016; Toth et al. 2018; Manouseli et al. 2019; Xenochristou et al. 2020). Furthermore, different studies have shown the importance of accounting for additional explanatory factors, such as socio-demographic variables and types of users, in the characterization of water demand patterns (Mamade et al. 2014; Fiorillo et al. 2020; Xenochristou et al. 2020; Xenochristou et al. 2021).

Overall, since daily and weekly fluctuations in weather variables are highly associated with changes in water demand (De Souza et al. 2015; Chang et al. 2014), climate change is also expected to influence water demand (Parandvash and Chang 2016). Previous studies (Neale et al. 2007; Zachariadis 2010; Polebitski et al. 2011; Jampanil et al. 2012; Babel et al. 2014; Kanakoudis et al. 2017; Rasifaghihi et al. 2020; Zubaidi et al. 2020) have shown that the likely increase in water demand due to climate change would vary widely based on geographic location and climatic conditions (Wang et al. 2014).

Changes in water demand could affect the existing water systems in terms of capacity and operation. Specifically, increases in water demand can cause imbalance in water resources and problems in storage capacity, worsening the situation of water shortages that Mediterranean countries, such as Italy, are already experiencing (La Jeunesse et al. 2016). Therefore, knowing the extent of water demand changes due to climate change is needed for long-term climate adaptation planning (Wang et al. 2014; Parandvash and Chang 2016).

Despite the clear benefits of such knowledge, few studies have investigated climate change impacts on water demand in Italy, and those that exist focus on climate change effects on agricultural water demand (Bocchiola et al. 2013; Masia et al. 2018) and water supply (Peres et al. 2019). This paper aims to improve the understanding of climate change effects in Italy by investigating future variations in urban water demand due to climate change for a case study in Naples (Italy). This is achieved by linking water demand to weather based on climate change scenarios, via Coupled General Circulation Models (Grassl 2000). This link is established by developing Random Forest models (RFs) that predict daily water demand from weather variables. Changes in weather variables are estimated using different climate change scenarios obtained from the CCWorldWeatherGen (Jentsch et al. 2017). This tool transforms measured weather data into climate change adapted weather data through the “morphing” methodology developed by Belcher et al. (2005).

This study shows that water demand variations due to climate change could vary depending on the types of users. Previous studies on climate change impacts on water demand (Neale et al. 2007; Zachariadis 2010; Polebitski et al. 2011; Jampanil et al. 2012; Babel et al. 2014; Kanakoudis et al. 2017; Rasifaghihi et al. 2020; Zubaidi et al. 2020) have neglected how the effect of climate change varies with the social characteristics of the users. This paper presents a novel methodology to assess future variations in water demand for different types of users. The methodology makes it possible to determine climate change effects on water demand for different groups of users, which vary according to their social characteristics. Therefore, the proposed methodology represents an innovative tool for water utilities to assess future variations in water demand more accurately.

2 Case Study

This work utilises water consumption, social characteristics and weather data from a case study in Naples (Italy).

The consumption data were provided by the District Metered Area (DMA) located in Soccavo, in the North-Western part of Naples (Italy). In this DMA the municipal water company “Acqua Bene Comune Napoli” (ABC) replaced 4989 traditional water meters with smart meters. Hourly consumption data from residential and non-residential water meters are collected and communicated daily to the utility central server through a fixed wireless network.

In this work, 1067 residential meters were considered, using the data collected at the household level from 20 March 2017 to 19 March 2018. In order to analyse daily water demand for each user, the hourly data were aggregated at the daily scale.
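The aggregation step described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the tuple layout `(meter_id, timestamp, litres)`, the function names and the sample readings are assumptions made for the example and do not come from the utility's data format.

```python
from collections import defaultdict
from datetime import date, datetime


def aggregate_daily(hourly_readings):
    """Sum hourly volumes (litres) into daily totals for each meter.

    hourly_readings: iterable of (meter_id, timestamp, litres) tuples
    (hypothetical layout). Returns {meter_id: {date: total_litres}}.
    """
    daily = defaultdict(lambda: defaultdict(float))
    for meter_id, timestamp, litres in hourly_readings:
        daily[meter_id][timestamp.date()] += litres
    return daily


def aggregate_district(daily):
    """Sum the daily totals of all meters into a single daily series."""
    district = defaultdict(float)
    for per_day in daily.values():
        for day, litres in per_day.items():
            district[day] += litres
    return dict(district)


# Hypothetical readings for two meters
readings = [
    ("m1", datetime(2017, 3, 20, 7), 12.0),
    ("m1", datetime(2017, 3, 20, 8), 30.5),
    ("m1", datetime(2017, 3, 21, 7), 10.0),
    ("m2", datetime(2017, 3, 20, 7), 5.0),
]
daily = aggregate_daily(readings)
district = aggregate_district(daily)
```

The second function corresponds to the demand at aggregated scale, i.e. the sum of the daily demands of all users considered.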

Furthermore, according to the data provided by Istat (Italian National Institute of Statistics) for each census section included in the DMA, the following characteristics were considered:

  • the average level of employment of household members;

  • the average educational level of household members.

The social characteristics of each household of the DMA were determined based on the related census section. According to the available information, each household was classified on the basis of state of employment and educational level as shown in Table 1.

Table 1 Classification of the households of the DMA according to their main social characteristics and description of households groups

In addition, daily maximum air temperature and daily mean solar radiation data over the same period (20 March 2017 to 19 March 2018) were collected. The weather data were recorded at intervals of 30 min by the weather station of the University of Naples Federico II. The weather station was chosen due to its proximity to the DMA. These data were also aggregated at the daily scale. According to the recorded data, the highest daily maximum temperature occurred in summer, with an average of ∼ 30 °C, whereas the highest values of daily mean solar radiation were recorded in both spring and summer, with an average of ∼ 270 W/m2.

3 Methodology

In the following subsections an innovative methodology to assess future variations in water demand due to climate change is proposed. First, 3 configurations of RFs based on weather variables are presented. Then, the innovative methodology based on RFs and climate change scenarios obtained through the CCWorldWeatherGen (Jentsch et al. 2017) is explained.

3.1 Forecasting Water Consumption using Weather Variables

In this work, regression RFs were implemented. RFs can be used for both classification and regression, i.e. for categorical and continuous response variables, respectively (Cutler et al. 2011).

In RF regression, from the training data \(D=\left\{\left({x}_{1},{y}_{1}\right),\dots ,\left({x}_{N},{y}_{N}\right)\right\}\), where \({x}_{i}={\left({x}_{i,1},\dots ,{x}_{i,p}\right) }^{T}\) represents the \(p\) predictors and \({y}_{i}\) denotes the response, for the generic tree \(j\) \(\left(j=1, 2,\dots ,{n}_{t}\right)\) a bootstrap sample \({D}_{j}\) of size N is taken from D (Breiman 2001). Then, the tree is fitted by using \({D}_{j}\) as training data and applying the binary recursive partitioning (Cutler et al. 2011). Specifically, starting with all observations in a single node, for each un-split node, \(m\) predictors among the \(p\) available predictors are randomly selected. The node is then split into two descendant nodes using the best binary split among all binary splits on the \(m\) predictors. In the regression context, the mean squared residual at the node is usually used as a splitting criterion. The algorithm goes on until a stopping criterion is satisfied, i.e. when the tree has reached the maximum allowed depth. All the resulting trees are finally combined by averaging their responses. Therefore, the prediction at the generic point \(x\) is made as follows:

$$\widehat{y}\left( x \right) = \frac{1}{{n_{t} }}\sum\nolimits_{j = 1}^{{n_{t} }} {\widehat{{h_{j} }}\left( x \right)}$$
(1)

where \(\widehat{{h_{j} }}\left( x \right)\) is the prediction of the response variable at \(x\) using the \(j\)-th tree.

RFs merge the predictions of multiple decision trees to obtain a prediction that is more accurate and stable than the one provided by individual decision trees.
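The procedure behind Eq. 1 can be illustrated with a compact, standard-library-only sketch. It is not the implementation used in this work: the function names, the `max_depth` default, the stopping rule and the toy data are assumptions chosen for illustration. Each tree is grown on a bootstrap sample of size \(N\), \(m\) predictors are drawn at random at every node, the split minimising the summed squared residuals is kept, and the forest prediction is the average over the \({n}_{t}\) trees.

```python
import random
from statistics import mean


def _sse(values):
    """Sum of squared residuals around the mean of a node."""
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values)


def build_tree(X, y, m, n_d, rng, depth=0, max_depth=5):
    """Binary recursive partitioning; returns a nested tuple or a leaf value."""
    if depth >= max_depth or len(y) < 2 * n_d:
        return mean(y)  # terminal node: mean response
    best = None
    for f in rng.sample(range(len(X[0])), m):  # m of the p predictors
        # the largest value is skipped: it would leave the right child empty
        for t in sorted({row[f] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < n_d or len(right) < n_d:
                continue  # respect the minimum terminal node size
            sse = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or sse < best[0]:
                best = (sse, f, t, left, right)
    if best is None:
        return mean(y)
    _, f, t, left, right = best
    return (
        f,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left],
                   m, n_d, rng, depth + 1, max_depth),
        build_tree([X[i] for i in right], [y[i] for i in right],
                   m, n_d, rng, depth + 1, max_depth),
    )


def predict_tree(node, x):
    """Follow the splits down to a leaf and return its value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_t=50, m=1, n_d=2, seed=0):
    """Grow n_t trees, each on a bootstrap sample of size N."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_t):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 m, n_d, rng))
    return forest


def predict_forest(forest, x):
    """Eq. 1: average of the individual tree predictions."""
    return mean(predict_tree(tree, x) for tree in forest)


# Toy data (hypothetical): demand rising with temperature and solar radiation
temps = list(range(10, 36))
X = [[t, 100 + 5 * t] for t in temps]
y = [1000 + 20 * t for t in temps]
forest = fit_forest(X, y, n_t=50, m=1, n_d=2, seed=0)
hot = predict_forest(forest, [30, 250])
cool = predict_forest(forest, [15, 175])
```

Because every leaf value is a mean of training responses, the forest output always lies within the range of the observed demands. This is one reason why averaging-based models tend to smooth the extremes.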

In order to improve the accuracy, the following model hyperparameters were tuned:

  • the number of predictors randomly selected at each node (\(m\));

  • the number of trees (\({n}_{t}\));

  • the minimum size of terminal nodes (\({n}_{d}\)).

The available dataset was split into a calibration subset, made up of the odd rows, and a validation subset, made up of the even rows. RFs were trained using the calibration dataset, whereas the validation dataset was used to evaluate the model performance on unseen data. The accuracy was assessed using the Root Mean Square Error (RMSE) and the coefficient of determination (R2):

$${\text{RMSE}} = \sqrt{\frac{{\sum \nolimits_{i = 1}^{N} \left( {\widehat{{y_{i} }} - y_{i} } \right)^{2} }}{N}}$$
(2)
$${\text{R}}^{2} = \frac{{\sum \nolimits_{i = 1}^{N} \left( {\widehat{{y_{i} }} - \overline{y} } \right)^{2} }}{{\sum\nolimits_{i = 1}^{N} \left( {y_{i} - \overline{y} } \right)^{2} }}$$
(3)

where \({y}_{i}\) and \(\widehat{{y_{i} }}\) are the observed and forecasted values respectively, \(\overline{y}\) is the mean value of the observed values \({y}_{i}\) and \(N\) is the number of observations.
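As a minimal sketch, the odd/even split and the two metrics can be written as below. The function names are illustrative assumptions. `r2_eq3` follows Eq. 3 exactly as written (explained over total sum of squares); other definitions of R2, such as one minus the ratio of residual to total sum of squares, can give different values for the same predictions.

```python
from math import sqrt


def split_odd_even(rows):
    """Odd rows (1st, 3rd, ...) for calibration, even rows for validation."""
    return rows[0::2], rows[1::2]


def rmse(y_obs, y_hat):
    """Eq. 2: root mean square error between forecasts and observations."""
    n = len(y_obs)
    return sqrt(sum((f - o) ** 2 for o, f in zip(y_obs, y_hat)) / n)


def r2_eq3(y_obs, y_hat):
    """Eq. 3 as written: explained sum of squares over total sum of squares."""
    y_bar = sum(y_obs) / len(y_obs)
    explained = sum((f - y_bar) ** 2 for f in y_hat)
    total = sum((o - y_bar) ** 2 for o in y_obs)
    return explained / total


calibration, validation = split_odd_even([1, 2, 3, 4, 5])
```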

RFs were then used to predict the daily water demand at aggregated scale, i.e. the demand obtained by summing the daily demand of each user of the DMA.

Since past consumption is not always available to water utilities, it is important to explore an alternative strategy. Furthermore, past consumption can conceal the effect of other predictors by carrying the same information (Xenochristou et al. 2021). For these reasons, 3 configurations of RFs, shown in Table 2, were developed to investigate the performance of weather variables as predictors. The first configuration of the model (Model 1) accounts for the combined effect of temperature and solar radiation, whereas Models 2 and 3 investigate the individual effect of temperature and solar radiation, respectively. Specifically, Models 2 and 3 were developed to take into account the possible interaction between weather variables. When variables interact, the predictors can provide overlapping information to the model, so the influence of each predictor can be concealed, affecting the forecasting model. Temporal characteristics, i.e. type of day (working day or holiday), season, month and weekday, were considered in all configurations since they are always easily accessible to water utilities. Table 2 also shows the results of the tuning for each model.
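The temporal characteristics can be derived from the calendar date alone, as in the following sketch. Two choices here are assumptions made for illustration and are not stated in the text: seasons are assigned by calendar month (March to May as spring, and so on), and a day counts as a holiday if it falls on a weekend or belongs to a user-supplied set of public holidays.

```python
from datetime import date

# Assumption: seasons assigned by calendar month
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def temporal_features(day, holidays=frozenset()):
    """Return the four temporal predictors for a given date."""
    is_holiday = day.weekday() >= 5 or day in holidays  # Saturday/Sunday or listed
    return {
        "type_of_day": "holiday" if is_holiday else "working day",
        "season": SEASON_BY_MONTH[day.month],
        "month": day.month,
        "weekday": day.weekday(),  # 0 = Monday, 6 = Sunday
    }


# Hypothetical holiday set containing 15 August 2017
features = temporal_features(date(2017, 8, 15), holidays={date(2017, 8, 15)})
```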

Table 2 Explanatory variables and hyperparameters of each model

In order to investigate the influence of weather on different user types, the models were applied to forecast the aggregated demand of three groups of households. The description of the groups is reported in Table 1. The groups differ in the employment state and educational level of the residents. Group 1 consists of households where members are on average employed with a high average educational level (high school/university degree). Group 2 has members that are on average unemployed with primary/secondary school degree. Group 3 is made up of households where members are on average employed with primary/secondary school degree. It is worth noting that, according to the available information about the social characteristics of the users (i.e. state of employment and educational level), all the possible groups were identified. Further groups can be obtained by grouping the households based on one classification rather than on both state of employment and educational level (e.g. grouping together all households with employed members). However, these groups would be very heterogeneous, reducing the differences between each group and, thus, the benefit of disaggregating water consumption.

To account for the effect of group size on forecasting accuracy (Xenochristou et al. 2020), groups with the same number of households were required. Given that Group 2 was the smallest group, samples of the same size as Group 2 (i.e. 125 households) were considered for Group 1 and Group 3. To limit the calculation time while obtaining representative results for each group, the number of samples was chosen proportionally to each group size. For Group 1, 5 samples were randomly selected, since this group was almost 5 times larger than the sample size. Then, the models were applied to each sample and the results in terms of RMSE and R2 were averaged among all samples. The average results, and hence the forecasting accuracy, are expected to remain almost unchanged even for a higher number of samples, since the selected samples are sufficient to be representative of all the users of the group. Similarly, the values of RMSE and R2 for Group 3 were determined by averaging the results obtained for 3 random samples.
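The equal-size sampling can be sketched as follows. The helper names are hypothetical, and rounding the ratio of group size to sample size is only one plausible reading of "proportionally"; the exact rule used is not given. The `evaluate` callable stands in for training a model on a sample and returning a score such as RMSE or R2.

```python
import random


def draw_samples(household_ids, sample_size, n_samples, seed=0):
    """Draw n_samples random subsets, each without repeated households."""
    rng = random.Random(seed)
    return [rng.sample(household_ids, sample_size) for _ in range(n_samples)]


def average_metric(samples, evaluate):
    """Average a score (e.g. RMSE or R2) over all samples of a group."""
    scores = [evaluate(sample) for sample in samples]
    return sum(scores) / len(scores)


group_1 = list(range(622))  # household identifiers (hypothetical)
sample_size = 125           # size of the smallest group (Group 2)
n_samples = round(len(group_1) / sample_size)  # assumption: rounded ratio
samples = draw_samples(group_1, sample_size, n_samples)
```

With 622 households and samples of 125, the rounded ratio gives the 5 samples used for Group 1.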

3.2 Forecasting Changes in Water Consumption based on Climate Change

The aim of the methodology is to assess the impact of climate change on water demand. In the first phase, weather time series are generated based on climate change scenarios, then RFs are used to predict the daily aggregated water demand.

In the first phase, the CCWorldWeatherGen tool (Jentsch et al. 2017) is used to generate reliable climate change weather scenarios. The CCWorldWeatherGen uses the data from the HadCM3 (Hadley Centre Coupled Model, version 3) A2 experiment ensemble provided by the IPCC Third Assessment Report (IPCC, 2001a, b, c). The HadCM3 is a coupled atmosphere–ocean general circulation model (AOGCM or CGCM), whose output consists of relative changes with respect to the period ranging from 1961 to 1990 (Collins et al. 2001). The CCWorldWeatherGen accounts only for the A2 emission scenario provided by the special report on the emission scenarios (SRES) published by the IPCC (2000). A2 is at the higher end of the emissions scenarios described by Nakicenovic et al. (2000). A high emission scenario, such as the A2, is more suitable in investigating the impacts of climate change on water consumptions. From a management standpoint, if the water company can cope with significant changes in consumptions due to large climate changes, then the smaller changes can be easily addressed. Once the baseline scenario (i.e. the data gathered from 1961 to 1990) is selected, the relative changes provided by the HadCM3 are superimposed on the meteorological parameters through the CCWorldWeatherGen tool. In this work, the baseline scenario for Naples provided by the World Meteorological Organization Region and Country was used. The measured weather data are transformed into climate change adapted weather data according to the “morphing” methodology developed by Belcher et al. (2005). Finally, the CCWorldWeatherGen tool generates climate change projections relative to the time periods 2041–2070 (Scenario-2050) and 2071–2100 (Scenario-2080), consisting of hourly time series for the whole year. It is worth noting that the HadCM3 A2 experiment ensemble does not provide the data for more frequent climate projections. 
Therefore, being based on the HadCM3 A2 experiment ensemble data, the CCWorldWeatherGen could generate projections only for the time periods 2041–2070 and 2071–2100.
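For orientation, the basic operations of the morphing approach can be sketched as below. This is a simplified illustration based on the general description of morphing as shifting and stretching a baseline series with monthly change factors. Which operation and which factors the CCWorldWeatherGen tool applies to each variable are not reproduced here, and the function names and numbers are assumptions.

```python
def shift(x0, delta_m):
    """Shift: add the projected change in the monthly mean to each value."""
    return [x + delta_m for x in x0]


def stretch(x0, alpha_m):
    """Stretch: scale each value by a monthly factor."""
    return [x * alpha_m for x in x0]


def shift_and_stretch(x0, delta_m, alpha_m):
    """Combine both: shift the mean and scale deviations from the mean."""
    mean0 = sum(x0) / len(x0)
    return [x + delta_m + alpha_m * (x - mean0) for x in x0]


# Hypothetical baseline temperatures (°C) for one month and change factors
baseline = [10.0, 20.0]
morphed = shift_and_stretch(baseline, delta_m=2.0, alpha_m=0.1)
```

In the combined operation the monthly mean increases by exactly `delta_m`, while the spread around the mean widens by the factor `1 + alpha_m`.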

In this work the projections of temperature and solar radiation were aggregated into daily projections.

In the second phase, the RFs based on weather variables are used to predict the daily aggregated water demand time series for each climate change scenario. The measured data related to 2017–2018 (described in previous section) were used as training dataset and represent the Current Scenario. The main changes in water demand compared to the measured consumptions (i.e. Current Scenario) were investigated as follows.

The seasonal percentage change in water demand \(\Delta C\) can be easily calculated according to the following equation:

$$\Delta C = \frac{{\sum\nolimits_{i = 1}^{{N_{s} }} \frac{{\widehat{y}_{i} - y_{i} }}{{y_{i} }} \cdot 100}}{{N_{s} }}$$
(4)

where \({N}_{s}\) is the number of observations for the season \(s\).

Assuming a positive correlation between each weather variable and water demand (Toth et al. 2018), forecasting errors may be taken into account by neglecting the days on which the changes in water demand and in weather, compared to the measured data, are of opposite sign.
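Eq. 4 and the optional sign filter can be sketched as follows. The function name and arguments are illustrative. Dividing by the number of retained days when the filter is active is an assumption, since the text does not specify whether \({N}_{s}\) is adjusted after days are neglected.

```python
def seasonal_change(y_obs, y_hat, w_obs=None, w_hat=None):
    """Eq. 4: mean of the daily percentage changes in demand for a season.

    y_obs, y_hat: measured and projected daily demands for the season.
    w_obs, w_hat: optional measured and projected weather values; when both
    are given, days where demand and weather change in opposite directions
    are skipped.
    """
    changes = []
    for i, (obs, proj) in enumerate(zip(y_obs, y_hat)):
        if w_obs is not None and w_hat is not None:
            if (proj - obs) * (w_hat[i] - w_obs[i]) < 0:
                continue  # opposite signs: treated as a forecasting error
        changes.append((proj - obs) / obs * 100)
    if not changes:
        raise ValueError("no days left after filtering")
    return sum(changes) / len(changes)


# Hypothetical two-day season
plain = seasonal_change([100.0, 200.0], [110.0, 210.0])
filtered = seasonal_change([100.0, 200.0], [110.0, 190.0],
                           w_obs=[25.0, 25.0], w_hat=[27.0, 27.0])
```

In the second call the demand on the second day decreases while the temperature increases, so only the first day contributes to the average.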

Similarly, the percentage change in water demand during the peak periods (when the highest increase or decrease in weather variables is expected) based on climate change projections, can be determined. For example, after identifying the weeks with the highest weekly weather averages (peak weeks) by using the 7-day moving average, Eq. 4 can be applied accounting for the weekly averages of percentage change in water demand. In this work, water demands obtained through Model 2 were used to assess the demand increase related to the temperature peak weeks; similarly, Model 3 was used for solar radiation peak weeks.
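A minimal sketch of the peak-week identification is given below, assuming a daily series ordered in time. The function names and the toy series are hypothetical. The peak week is taken as the 7-day window with the highest moving average of the weather variable, and the weekly change is the average of the daily percentage changes over that window.

```python
def moving_average(values, window=7):
    """Moving average over consecutive windows of the given length."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def peak_week(values, window=7):
    """Return (start_index, weekly_mean) of the window with the highest mean."""
    averages = moving_average(values, window)
    start = max(range(len(averages)), key=averages.__getitem__)
    return start, averages[start]


def weekly_change(y_obs, y_hat, start, window=7):
    """Average daily percentage change in demand over the selected week."""
    days = range(start, start + window)
    return sum((y_hat[i] - y_obs[i]) / y_obs[i] * 100 for i in days) / window


# Hypothetical daily maximum temperatures with one hot week
temps = [20.0] * 5 + [30.0] * 7 + [22.0] * 5
start, weekly_mean = peak_week(temps)
change = weekly_change([100.0] * 17, [110.0] * 17, start)
```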

The social characteristics of the users can be taken into account as well. The whole methodology can be applied to different groups of users, formed according to their social characteristics, by using their aggregated water demands (i.e. the daily total demands of each group). For the sake of simplicity, the methodology was applied only to Group 1 (Table 1).

Overall, the methodology makes it possible to evaluate not only the likely seasonal variations in water demand due to climate change, but also the variations during the peak periods, in order to avoid failures in water supply.

4 Results and Discussion

The following subsections discuss the results obtained applying the presented methodologies to the case study of Naples.

4.1 Prediction Accuracy of RFs based on Weather Variables

First, the results for the daily aggregated water demand of the DMA are presented. Table 3 shows the results of the RFs in terms of RMSE and R2 for the validation dataset. All models performed well in terms of R2. Model 1 (based on temporal characteristics, temperature and solar radiation input) had the highest prediction accuracy, as evidenced by the highest value of R2 (0.67) and the lowest value of RMSE (16448 L). This demonstrated the benefit of including both weather variables (temperature and solar radiation) as predictors. Model 3, which included solar radiation input besides temporal characteristics, performed slightly better than Model 2 (including only temporal and temperature input).

Table 3 Results in terms of root mean square error (RMSE) and coefficient of determination (R2) obtained at aggregated scale for each model, for the DMA and each group of households

Figure 1 reports the comparison between measured and forecasted aggregated daily demands for Models 1 and 3. Each point represents one day. Most of the points follow the bisectors of the graphs, highlighting a good agreement between measured and forecasted aggregated daily demands. However, both models seem to overestimate the lowest demands and underestimate the highest ones. This result can be traced back to the structure of RFs, which is based on averaging among different predictions and could therefore lead to underprediction of the highest demands and overprediction of the lowest ones. Furthermore, many forecasting models struggle to predict outliers (Xenochristou and Kapelan 2020). In this case, bias correction methods can be used to improve the forecasting accuracy of the peak days. Similar results were obtained for Model 2 as well.

Fig. 1 Comparison between measured and forecasted aggregated daily demand for Model 1 (a) and Model 3 (b)

The good performance of the models showed that weather variables can be effectively used to forecast water demands. These models can be used to estimate future demand changes based on climate change scenarios, since they were able to capture the variations due to weather.

Table 3 also reports the results obtained for the aggregated demand of each group of households. For Group 1, all models led to similar results. More specifically, Model 1 showed slightly better performances in terms of RMSE (i.e. the lowest value of RMSE = 2937 L). Model 3 also showed good level of accuracy (R2 = 0.66 and RMSE = 2948 L). Slightly worse performances (R2 = 0.65 and RMSE = 2985 L) were obtained for Model 2, although the level of accuracy is good.

For Group 2, the models showed the lowest performances. The worst performances (R2 = 0.51 and RMSE = 3216 L) were obtained for Model 3.

For Group 3, the best results (RMSE = 2630 L and R2 = 0.62) were obtained for Model 1. Models 2 and 3 led to reasonable levels of accuracy. More specifically, Model 3 resulted in slightly higher prediction accuracy compared to Model 2.

Overall, the best performances were observed for Group 1, although good performances were obtained for Group 3 using both solar radiation and temperature input. The models showed the lowest performances for Group 2.

These results suggest a stronger relationship between weather variables and water demand for Group 1 and Group 3, meaning that employed users appear to be on average more affected by weather than the unemployed ones. Indeed, employed users spend more time outside and have more scheduled habits that can be easily affected by weather. Notably, better results were obtained for Group 1, consisting of users with high school/university degree. Thus, the educational level seems to be a further discriminating factor in investigating not only the water uses (Hurd 2006; Makki et al. 2013) but also the effects of weather variables on water consumption.

The performed analysis showed that there are types of users that are more affected by weather, confirming the importance of including the socio-economic status of the users when investigating the effects of weather on water demand (Domene and Sauri 2006; Chang et al. 2010; Xenochristou et al. 2020; Xenochristou et al. 2021).

Furthermore, the types of users most affected by weather will likely be primarily responsible for potential future variations in water demand related to climate change. At the same time, the types of users less sensitive to weather will probably have less of an impact. Knowing the number of users belonging to each type will enable the water utility to easily assess whether the total district demand is expected to rise and, thus, avoid failures in water system capacity.

4.2 Impact of Climate Change on Water Demand

This section reports the results in terms of seasonal and peak variations in water demand with respect to the Current Scenario (measured data from 20 March 2017 to 19 March 2018). Figure 2 reports the seasonal values of daily maximum temperature and daily mean solar radiation for each scenario. During the study period the temperature variations are higher than the solar radiation ones.

Fig. 2 Barplot of seasonal values of daily maximum temperature (a) and daily mean solar radiation (b) for each scenario

Table 4 presents the results obtained using Model 1 and Model 2 for Scenario-2050 and Scenario-2080, including the seasonal average increase of daily maximum temperature (\(\Delta T\)). This value represents the average, over each season, of the differences between the daily maximum temperatures for the respective future and current scenarios. The same applies to the daily mean solar radiation (\(\Delta SR\)).

Table 4 Seasonal average of daily maximum temperature and daily mean solar radiations (\({T}_{max}\) and \(SR\)), seasonal average increase of daily maximum temperature and daily mean solar radiation (\(\Delta T\) and \(\Delta SR\)) and seasonal percentage change in daily water demand (\(\Delta C\)) compared to the Current Scenario obtained by applying Model 1 and Model 2, for each season and climate change scenario

The seasonal percentage changes in water demand (\(\Delta C\)) for Model 1 were lower than those for Model 2, due to the low or negative \(\Delta SR\). According to the solar radiation projections of both Scenario-2050 and Scenario-2080, the \(\Delta SR\) values were low in all seasons except spring (Table 4). Recent studies using regional climate models at high resolution for climate change scenarios have shown an overall small decrease in solar radiation (Jerez et al. 2015; Bartók et al. 2017), in accordance with the scenarios considered here. However, in summer, despite the low/negative \(\Delta SR\), \(\Delta C\) could rise up to approximately 5% (Scenario-2080).

For Model 2, the highest \(\Delta C\) occurred in the summer, reaching up to 7.6% in Scenario-2080. It should be noted that, even if the \(\Delta T\) was comparable between summer and autumn in both scenarios, the \(\Delta C\) was higher in the summer. This suggests that with the same temperature increase, larger demand increases occur at higher values of temperature (around 30 °C). This observation is in accordance with the study of Xenochristou et al. (2021) which identified a temperature threshold beyond which water consumption of UK households starts increasing.

Figure 3a shows the results of Models 2 and 3 for the peak weeks. During temperature peak weeks, the increases in water demand were higher than the summer ones. The weekly \(\Delta C\) was equal to 9.6% in Scenario-2050 and 10.5% in Scenario-2080, for weekly temperatures equal to 34.3 °C (Scenario-2050) and 36.5 °C (Scenario-2080). For the solar radiation peak weeks, the \(\Delta C\) was equal to almost 6% for both Scenario-2050 (weekly solar radiation of 328 W/m2) and Scenario-2080 (348 W/m2). The lower \(\Delta C\) during solar radiation peak weeks demonstrated that water demand was more affected by temperature.

Fig. 3 Weekly percentage change in total daily water demand of the DMA (a), and summer (b) and weekly (c) percentage change in total daily demand of Group 1, for scenarios 2050 and 2080

Figures 3b and 3c show the results for the aggregated water demand of Group 1 (622 households). The summer \(\Delta C\) (Fig. 3b) was higher than the one obtained for the total district demand for both scenarios. Accounting for the combined effect of temperature and solar radiation (Model 1) resulted in \(\Delta C\) equal to 3.1% and 7.0%, for Scenario-2050 and Scenario-2080, respectively. Higher increases for both Scenario-2050 (7.1%) and Scenario-2080 (11%) were found by considering only the temperature effect (Model 2). The \(\Delta C\) for peak weeks (Fig. 3c) was higher than the one of the total district demand. The highest \(\Delta C\) (13–15%) was observed during temperature peak weeks, although high \(\Delta C\) values were attained during solar radiation peak weeks as well. Overall, the demand of Group 1 proved to be more affected by climate change than the demand of the district.

The performed analysis showed that due to climate change the water demand could increase mostly during the weeks with the highest temperatures. Furthermore, the results demonstrated that the water demand of the type of users most affected by weather variables (Group 1) could increase more than the total district demand. Thus, the results showed the relevance of disaggregating consumption based on the social characteristics of the users to determine the climate change effects on water demand more accurately.

5 Conclusions

This study investigates the effect of weather variables on water demand in both current and future climate change scenarios. A novel methodology to assess the impact of climate change on water demand is presented. This methodology makes it possible to forecast future variations in water demand due to likely changes in weather variables by using RFs and the CCWorldWeatherGen. The case study of Naples (Italy) showed the effectiveness of using weather variables in forecasting aggregated water demand.

According to future weather scenarios for 2041–2100, the daily water demand of the DMA could increase mainly due to increases in air temperature. During the weeks with the highest temperatures, increases in water demand could reach up to 9–10%. The increase in water demand was different for users with different social characteristics, since they were affected by weather to varying degrees. Employed users with high education could increase their consumption by 13–15% during the weeks with the highest temperatures.

Accounting for future variations in water demand due to climate change is needed to avoid risks of supply and operational failures in water systems. This need is particularly pressing for Italy, which, like most Mediterranean countries, is already coping with water shortages (La Jeunesse et al. 2016).

In previous studies for the Mediterranean area (Zachariadis 2010; Collet et al. 2015; Kanakoudis et al. 2017), climate change scenarios were used for assessing the future vulnerability of water resources. However, in these studies demographic statistics and past consumption trends were used to determine future water demand variations (that apply to the entire year). Instead, the methodology presented in this paper makes it possible to directly determine future variations in water demand due to climate change based on seasons and peak weeks. Thus, the methodology provides more accurate projections based on future climate change scenarios, accounting for different periods of the year.

In this work, the effect of climate change on the total district demand was investigated using information about some social characteristics of the users. However, additional socio-economic information about the users could lead to a better disaggregation of the consumption, making it possible to increase the number of household groups. In addition, in this work only one emission scenario was considered. Different results could be obtained with lower or higher emission scenarios. Therefore, future works will investigate climate change impacts on the water demand of different types of users, by increasing the number of household groups and using different emission scenarios. Furthermore, the CCWorldWeatherGen provided projections only for long timeframes. In order to more accurately assess the variations during the entire study period (i.e. 2041–2100), future works will investigate more frequent climate projections (e.g. every 5 years), especially in the near future, by using different CGCMs.