Assessing the Impact of Climate Change on Future Water Demand using Weather Data

Assessing the impact of climate change on water demand is a challenging task. This paper proposes a novel methodology that quantifies this impact by establishing a link between water demand and weather based on climate change scenarios, via Coupled General Circulation Models. These models simulate the response of the global climate system to increasing greenhouse gas concentrations by reproducing atmospheric and ocean processes. In order to establish the link between water demand and weather, Random Forest models based on weather variables were used. This methodology was applied to a district metered area in Naples (Italy). Results demonstrate that the total district water demand may increase by 9–10% during the weeks with the highest temperatures. Furthermore, results show that the increase in water demand changes depending on the social characteristics of the users. The water demand of employed users with high education may increase by 13–15% when the highest temperatures occur. These increases can seriously affect the capacity and operation of existing water systems.

Several studies have used weather variables to explain demand variability (Slavíková et al. 2013; Ashoori et al. 2016; Haque et al. 2017; Toth et al. 2018; Manouseli et al. 2019; Xenochristou et al. 2021). Most of these studies have found that water demand is positively related to temperature (Ashoori et al. 2016; Toth et al. 2018; Manouseli et al. 2019). Furthermore, several studies have shown the importance of accounting for additional explanatory factors, such as sociodemographic variables and types of users, in the characterization of water demand patterns (Mamade et al. 2014; Fiorillo et al. 2020; Xenochristou et al. 2021).
Overall, since daily and weekly fluctuations in weather variables are highly associated with changes in water demand (De Souza et al. 2015; Chang et al. 2014), climate change is also expected to influence water demand (Parandvash and Chang 2016). Previous studies (Neale et al. 2007; Zachariadis 2010; Polebitski et al. 2011; Jampanil et al. 2012; Babel et al. 2014; Kanakoudis et al. 2017; Rasifaghihi et al. 2020; Zubaidi et al. 2020) have shown that the likely increase in water demand due to climate change varies widely with geographic location and climatic conditions (Wang et al. 2014).
Changes in water demand could affect the existing water systems in terms of capacity and operation. Specifically, increases in water demand can cause imbalance in water resources and problems in storage capacity, worsening the situation of water shortages that Mediterranean countries, such as Italy, are already experiencing (La Jeunesse et al. 2016). Therefore, knowing the extent of water demand changes due to climate change is needed for long-term climate adaptation planning (Wang et al. 2014; Parandvash and Chang 2016).
Despite the clear benefits, few studies have investigated climate change impacts on water demand in Italy, and those have focused on agricultural water demand (Bocchiola et al. 2013; Masia et al. 2018) and water supply (Peres et al. 2019). This paper aims to improve the understanding of climate change effects in Italy by investigating future variations in urban water demand due to climate change for a case study in Naples (Italy). This is achieved by linking water demand to weather based on climate change scenarios, via Coupled General Circulation Models (Grassl 2000). This link is established by developing Random Forest models (RFs) that predict daily water demand from weather variables. Changes in weather variables are estimated using different climate change scenarios obtained from the CCWorldWeatherGen (Jentsch et al. 2017). This tool transforms measured weather data into climate change adapted weather data through the "morphing" methodology developed by Belcher et al. (2005).
This study shows that water demand variations due to climate change could vary depending on the types of users. Previous studies focused on climate change impacts on water demand (Neale et al. 2007; Zachariadis 2010; Polebitski et al. 2011; Jampanil et al. 2012; Babel et al. 2014; Kanakoudis et al. 2017; Rasifaghihi et al. 2020; Zubaidi et al. 2020) have neglected how climate change effects vary with the social characteristics of the users. This paper presents a novel methodology to assess future variations in water demand for different types of users. The methodology makes it possible to determine climate change effects on water demand for groups of users that differ in their social characteristics. Therefore, the proposed methodology represents an innovative tool for water utilities to assess future variations in water demand more accurately.

Case Study
This work utilises water consumption, social characteristics and weather data from a case study in Naples (Italy).
The consumption data were provided by the District Metered Area (DMA) located in Soccavo, in the North-Western part of Naples (Italy). In this DMA the municipal water company "Acqua Bene Comune Napoli" (ABC) replaced 4989 traditional water meters with smart meters. Hourly consumption data from residential and nonresidential water meters are collected and communicated daily to the utility central server through a fixed wireless network.
In this work, 1067 residential meters were considered, using the data collected at the household level from 20 March 2017 to 19 March 2018. In order to analyse daily water demand for each user, the hourly data were aggregated at the daily scale.
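The aggregation step can be illustrated with a minimal sketch. The record layout (meter id, timestamp, litres) and the function names are assumptions made for illustration only; the actual smart-meter export format is not described in the text.

```python
from collections import defaultdict
from datetime import date, datetime


def aggregate_daily(hourly_readings):
    """Sum hourly volumes (litres) into daily totals for each meter.

    hourly_readings: iterable of (meter_id, timestamp, litres) tuples
    (a hypothetical layout). Returns {meter_id: {date: litres_per_day}}.
    """
    daily = defaultdict(lambda: defaultdict(float))
    for meter_id, timestamp, litres in hourly_readings:
        daily[meter_id][timestamp.date()] += litres
    return {meter: dict(days) for meter, days in daily.items()}


def district_demand(daily_by_meter):
    """Sum the daily totals of all meters into one aggregated series."""
    total = defaultdict(float)
    for days in daily_by_meter.values():
        for day, litres in days.items():
            total[day] += litres
    return dict(sorted(total.items()))


# Toy readings, not measured data.
readings = [
    ("meter_A", datetime(2017, 3, 20, 7), 12.0),
    ("meter_A", datetime(2017, 3, 20, 8), 30.5),
    ("meter_A", datetime(2017, 3, 21, 7), 9.0),
    ("meter_B", datetime(2017, 3, 20, 19), 20.0),
]
per_meter = aggregate_daily(readings)
dma = district_demand(per_meter)
```

The same per-meter dictionary can be summed over any subset of meters, which is how an aggregated series for a group of households could be obtained.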
Furthermore, according to the data provided by Istat (Italian National Institute of Statistics) for each census section included in the DMA, the following characteristics were considered: the average level of employment of household members; the average educational level of household members.
The social characteristics of each household of the DMA were determined based on the related census section. According to the available information, each household was classified on the basis of state of employment and educational level as shown in Table 1.
In addition, daily maximum air temperature and daily mean solar radiation data over the same period (20 March 2017-19 March 2018) were collected. The weather data were recorded at intervals of 30 min by the weather station of the University of Naples Federico II. The weather station was chosen due to its proximity to the DMA. These data were also aggregated at the daily scale. According to the recorded data, the highest daily maximum temperature occurred in summer, with an average of ∼30°C, whereas the highest values of daily mean solar radiation were recorded in both spring and summer, with an average of ∼270 W/m².

Methodology
In the following subsections an innovative methodology to assess future variations in water demand due to climate change is proposed. First, 3 configurations of RFs based on weather variables are presented. Then, the innovative methodology based on RFs and climate change scenarios obtained through the CCWorldWeatherGen (Jentsch et al. 2017) is explained.

Forecasting Water Consumption using Weather Variables
In this work, regression RFs were implemented. RFs can be used for both classification and regression, i.e. for categorical and continuous response variables, respectively (Cutler et al. 2011).
In RF regression, from the training data $D = \{(x_1, y_1), \ldots, (x_N, y_N)\}$, where $x_i = (x_{i,1}, \ldots, x_{i,p})^T$ represents the p predictors and $y_i$ denotes the response, for the generic tree j ($j = 1, 2, \ldots, n_t$) a bootstrap sample $D_j$ of size N is taken from D (Breiman 2001). Then, the tree is fitted by using $D_j$ as training data and applying the binary recursive partitioning (Cutler et al. 2011). Specifically, starting with all observations in a single node, for each un-split node, m predictors among the p available predictors are randomly selected. The node is then split into two descendant nodes using the best binary split among all binary splits on the m predictors. In the regression context, the mean squared residual at the node is usually used as a splitting criterion. The algorithm goes on until a stopping criterion is satisfied, i.e. when the tree has reached the maximum allowed depth. All the resulting trees are finally combined by averaging their responses. Therefore, the prediction at the generic point x is made as follows:

$$\hat{f}(x) = \frac{1}{n_t} \sum_{j=1}^{n_t} \hat{h}_j(x) \tag{1}$$

where $\hat{h}_j(x)$ is the prediction of the response variable at x using the j-th tree. RFs allow merging together the predictions of multiple decision trees to get a prediction more accurate and stable than the one provided by individual decision trees.
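The procedure described above (bootstrap sampling, random selection of m predictors at each node, squared-residual splitting, averaging over trees) can be sketched in a few lines of plain Python. This is an illustrative toy under stated assumptions, not the implementation used in the study: the function names, the synthetic data and the hyperparameter values are all hypothetical.

```python
import random


def sse(values):
    """Sum of squared residuals around the mean of `values`."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, features, n_d):
    """Best binary split on the candidate features, or None if none is valid."""
    best = None
    for f in features:
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            if len(left) < n_d or len(right) < n_d:
                continue  # respect the minimum terminal-node size n_d
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def grow_tree(rows, m, n_d, max_depth, rng, depth=0):
    """Binary recursive partitioning; a leaf is stored as the mean response."""
    leaf = sum(y for _, y in rows) / len(rows)
    if depth >= max_depth:
        return leaf
    p = len(rows[0][0])
    split = best_split(rows, rng.sample(range(p), m), n_d)  # m of p predictors
    if split is None:
        return leaf
    _, f, t = split
    left = [(x, y) for x, y in rows if x[f] <= t]
    right = [(x, y) for x, y in rows if x[f] > t]
    return (f, t,
            grow_tree(left, m, n_d, max_depth, rng, depth + 1),
            grow_tree(right, m, n_d, max_depth, rng, depth + 1))


def predict_tree(node, x):
    """Follow the splits down to a leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(rows, n_t, m, n_d, max_depth, seed=0):
    """Fit n_t trees, each on a bootstrap sample D_j of size N."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_t):
        sample = [rng.choice(rows) for _ in rows]
        forest.append(grow_tree(sample, m, n_d, max_depth, rng))
    return forest


def predict_forest(forest, x):
    """Average of the individual tree predictions."""
    return sum(predict_tree(tree, x) for tree in forest) / len(forest)


# Synthetic data: (temperature, radiation-like value, weekday) -> demand.
rows = [((t, (7 * t) % 11 * 30, t % 7), 100.0 + 2.0 * t) for t in range(10, 36)]
forest = fit_forest(rows, n_t=50, m=2, n_d=2, max_depth=5, seed=1)
cold = predict_forest(forest, (12, 150, 3))
hot = predict_forest(forest, (34, 150, 3))
```

Because every leaf is a mean of observed responses, the forest cannot predict outside the range of the training responses, which is one reason ensemble averages tend to pull extreme values towards the centre.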
In order to improve the accuracy, the following model hyperparameters were tuned: the number of predictors randomly selected at each node (m); the number of trees ($n_t$); and the minimum size of terminal nodes ($n_d$).
The available dataset was split into a calibration subset, made up of the odd lines, and a validation subset, made up of the even lines. RFs were trained using the calibration dataset, whereas the validation dataset was used to evaluate the model performance on unseen data. The accuracy was assessed using the Root Mean Square Error (RMSE) and the coefficient of determination (R²):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2} \tag{2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2} \tag{3}$$

where $y_i$ and $\hat{y}_i$ are the observed and forecasted values respectively, $\bar{y}$ is the mean value of the observed values $y_i$ and N is the number of observations. RFs were then used to predict the daily water demand at aggregated scale, i.e. the demand obtained by summing the daily demand of each user of the DMA.
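A short sketch of the odd/even split and of the two accuracy measures follows. The function names and the toy numbers are illustrative assumptions, and R² is written here in the residual form (one minus the ratio of residual to total sum of squares), which is consistent with the symbols defined above.

```python
import math


def odd_even_split(rows):
    """Odd lines (1st, 3rd, ...) for calibration, even lines for validation."""
    return rows[0::2], rows[1::2]


def rmse(observed, forecast):
    """Root Mean Square Error between observed and forecasted values."""
    n = len(observed)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(observed, forecast)) / n)


def r_squared(observed, forecast):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, forecast))
    ss_tot = sum((y - mean) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


calibration, validation = odd_even_split(list(range(1, 8)))

# Toy values, not results from the study.
observed = [100.0, 110.0, 120.0, 130.0]
forecast = [102.0, 108.0, 121.0, 129.0]
error = rmse(observed, forecast)
score = r_squared(observed, forecast)
```

With these toy values the squared residuals sum to 10 and the total sum of squares is 500, so RMSE is the square root of 2.5 and R² is 0.98.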
Since past consumption is not always available to water utilities, it is important to explore an alternative strategy. Furthermore, past consumption can conceal the effect of other predictors by carrying the same information (Xenochristou et al. 2021). For these reasons, 3 configurations of RFs, shown in Table 2, were developed to investigate the performance of weather variables as predictors. The first configuration (Model 1) accounts for the combined effect of temperature and solar radiation, whereas Models 2 and 3 investigate the individual effect of temperature and solar radiation, respectively. Specifically, Models 2 and 3 were developed to take into account the possible interaction between weather variables. When variables interact, the predictors can provide overlapping information to the model, so the influence of each predictor can be concealed, affecting the forecasting model. Temporal characteristics, i.e. type of day (working day or holiday), season, month and weekday, were considered in all configurations since they are always easily accessible to water utilities. Table 2 also shows the results of the tuning for each model.
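As an illustration, the predictor sets of the three configurations could be assembled as follows. Several details are assumptions rather than choices documented in the text: weekends are treated as holidays, seasons use fixed astronomical-style boundary dates, and the feature names are hypothetical.

```python
from datetime import date


def season_of(day):
    """Season from fixed boundary dates (an assumed convention)."""
    md = (day.month, day.day)
    if (3, 20) <= md < (6, 21):
        return "spring"
    if (6, 21) <= md < (9, 22):
        return "summer"
    if (9, 22) <= md < (12, 21):
        return "autumn"
    return "winter"


def temporal_features(day, holidays=frozenset()):
    """Type of day, season, month and weekday for one calendar day."""
    is_holiday = day.weekday() >= 5 or day in holidays  # weekends assumed holidays
    return {
        "type_of_day": "holiday" if is_holiday else "working day",
        "season": season_of(day),
        "month": day.month,
        "weekday": day.weekday(),  # 0 = Monday
    }


def model_inputs(day, t_max, solar, model, holidays=frozenset()):
    """Predictors for Model 1 (both), Model 2 (temperature) or Model 3 (radiation)."""
    features = temporal_features(day, holidays)
    if model in (1, 2):
        features["t_max"] = t_max
    if model in (1, 3):
        features["solar"] = solar
    return features


public_holidays = {date(2017, 8, 15)}  # example entry only
inputs_m2 = model_inputs(date(2017, 8, 15), 33.5, 290.0, model=2,
                         holidays=public_holidays)
inputs_m1 = model_inputs(date(2017, 3, 20), 18.0, 210.0, model=1)
```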
In order to investigate the influence of weather on different user types, the models were applied to forecast the aggregated demand of three groups of households. The description of the groups is reported in Table 1. The groups differ in the employment state and educational level of the residents. Group 1 consists of households where members are on average employed with a high average educational level (high school/university degree). Group 2 has members that are on average unemployed with primary/secondary school degree. Group 3 is made up of households where members are on average employed with primary/secondary school degree. It is worth noting that, according to the available information about the social characteristics of the users (i.e. state of employment and educational level), all the possible groups were identified. Further groups can be obtained by grouping the households based on one classification rather than on both state of employment and educational level (e.g. grouping together all households with employed members). However, these groups would be very heterogeneous, reducing the differences between each group and, thus, the benefit of disaggregating water consumption.
In order to take into account the effect of group size on forecasting accuracy, groups with the same number of households were required. Given that Group 2 was the smallest group, samples of the same size as Group 2 (i.e. 125 households) were drawn from Group 1 and Group 3. To both limit the calculation time and obtain representative results for each group, the number of samples was chosen in proportion to each group size. For Group 1, 5 samples were randomly selected, since the group was almost 5 times larger than the sample size. The models were then applied to each sample and the results in terms of RMSE and R² were averaged over all samples. The average results, and hence the forecasting accuracy, are expected to remain almost unchanged for a higher number of samples, since the selected samples are sufficient to be representative of all the users of the group. Similarly, the values of RMSE and R² for Group 3 were determined by averaging the results obtained for 3 random samples.
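The sampling scheme can be sketched as below, assuming that samples are drawn without replacement within each sample and independently between samples (the text does not say whether samples may overlap). The household identifiers and the `evaluate` callable are placeholders; in practice `evaluate` would train and validate a model on the aggregated demand of the sample and return RMSE or R².

```python
import random
from statistics import mean


def equal_size_samples(households, sample_size, n_samples, seed=0):
    """Draw n_samples random samples, each with sample_size distinct households."""
    rng = random.Random(seed)
    return [rng.sample(households, sample_size) for _ in range(n_samples)]


def averaged_score(households, sample_size, n_samples, evaluate, seed=0):
    """Average a per-sample score (e.g. RMSE or R2) over all samples."""
    samples = equal_size_samples(households, sample_size, n_samples, seed)
    return mean(evaluate(sample) for sample in samples)


group_1 = [f"household_{i}" for i in range(622)]  # placeholder identifiers
sample_size = 125                                  # size of the smallest group
n_samples = round(len(group_1) / sample_size)      # 622 / 125 is about 5
samples = equal_size_samples(group_1, sample_size, n_samples)

# Stand-in score so the sketch runs; it is not a model accuracy.
placeholder = averaged_score(group_1, sample_size, n_samples,
                             evaluate=lambda sample: float(len(sample)))
```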

Forecasting Changes in Water Consumption based on Climate Change
The aim of the methodology is to assess the impact of climate change on water demand. In the first phase, weather time series are generated based on climate change scenarios, then RFs are used to predict the daily aggregated water demand.
In the first phase, the CCWorldWeatherGen tool (Jentsch et al. 2017) is used to generate reliable climate change weather scenarios. The CCWorldWeatherGen uses the data from the HadCM3 (Hadley Centre Coupled Model, version 3) A2 experiment ensemble provided by the IPCC Third Assessment Report (IPCC, 2001a, b, c). The HadCM3 is a coupled atmosphere-ocean general circulation model (AOGCM or CGCM), whose output consists of relative changes with respect to the period ranging from 1961 to 1990 (Collins et al. 2001). The CCWorldWeatherGen accounts only for the A2 emission scenario provided by the Special Report on Emissions Scenarios (SRES) published by the IPCC (2000). A2 is at the higher end of the emissions scenarios described by Nakicenovic et al. (2000). A high emission scenario, such as A2, is more suitable for investigating the impacts of climate change on water consumption: from a management standpoint, if the water company can cope with significant changes in consumption due to large climate changes, then smaller changes can be easily addressed. Once the baseline scenario (i.e. the data gathered from 1961 to 1990) is selected, the relative changes provided by the HadCM3 are superimposed on the meteorological parameters through the CCWorldWeatherGen tool. In this work, the baseline scenario for Naples provided by the World Meteorological Organization Region and Country was used. The measured weather data are transformed into climate change adapted weather data according to the "morphing" methodology developed by Belcher et al. (2005). Finally, the CCWorldWeatherGen tool generates climate change projections relative to the time periods 2041-2070 (Scenario-2050) and 2071-2100 (Scenario-2080), consisting of hourly time series for the whole year. It is worth noting that the HadCM3 A2 experiment ensemble does not provide the data for more frequent climate projections. Therefore, being based on these data, the CCWorldWeatherGen could generate projections only for the time periods 2041-2070 and 2071-2100.
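To give an idea of what morphing does, the sketch below implements the three basic operations generally attributed to Belcher et al. (2005): a shift (adding a monthly change), a linear stretch (scaling by a monthly factor) and a combination of the two that shifts the mean and rescales the deviations around it. It is a simplified illustration, not the CCWorldWeatherGen code; the change factors and series are hypothetical, and the mean is taken over the supplied series rather than per calendar month.

```python
def shift(series, delta):
    """Shift: add the projected absolute change to every baseline value."""
    return [x + delta for x in series]


def stretch(series, alpha):
    """Linear stretch: scale every baseline value by a factor."""
    return [alpha * x for x in series]


def shift_and_stretch(series, delta, alpha):
    """Shift the mean by delta and add alpha times each deviation from the mean."""
    baseline_mean = sum(series) / len(series)
    return [x + delta + alpha * (x - baseline_mean) for x in series]


# Hypothetical baseline values and change factors, for illustration only.
temperature = [20.0, 24.0, 28.0]   # degrees Celsius
radiation = [100.0, 200.0]         # W/m2

warmer = shift(temperature, 2.0)
dimmer = stretch(radiation, 0.95)
morphed = shift_and_stretch(temperature, 2.0, 0.1)
```

In the combined operation the mean of the morphed series equals the baseline mean plus the shift, while the spread around the mean grows by the factor (1 + alpha).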
In this work the projections of temperature and solar radiation were aggregated into daily projections.
In the second phase, the RFs based on weather variables are used to predict the daily aggregated water demand time series for each climate change scenario. The measured data related to 2017-2018 (described in the Case Study section) were used as the training dataset and represent the Current Scenario. The main changes in water demand compared to the measured consumption (i.e. Current Scenario) were investigated as follows.
The seasonal percentage change in water demand ΔC is calculated according to Eq. 4, where $N_s$ is the number of observations for the season s. Assuming a positive correlation between each weather variable and water demand (Toth et al. 2018), forecasting errors may be taken into account by neglecting the days on which the change in water demand and the change in weather, both relative to the measured data, have opposite signs.
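One plausible reading of this calculation is sketched below. The form of the average is an assumption (the mean over the season of the daily percentage changes between forecasted and measured demand); the function name, arguments and numbers are hypothetical.

```python
def seasonal_change(current, future, weather_current=None, weather_future=None):
    """Mean daily percentage change in demand over one season.

    If weather series are supplied, days on which demand and weather
    change in opposite directions are neglected, as described above.
    """
    changes = []
    for i, (c_now, c_future) in enumerate(zip(current, future)):
        pct = 100.0 * (c_future - c_now) / c_now
        if weather_current is not None and weather_future is not None:
            d_weather = weather_future[i] - weather_current[i]
            if pct * d_weather < 0:
                continue  # opposite signs: treated as a forecasting error
        changes.append(pct)
    if not changes:
        raise ValueError("no days left after filtering")
    return sum(changes) / len(changes)


# Toy values for three days of one season, not results from the study.
demand_now = [100.0, 200.0, 100.0]
demand_future = [110.0, 210.0, 95.0]
t_now = [30.0, 31.0, 29.0]
t_future = [32.0, 33.0, 31.0]

unfiltered = seasonal_change(demand_now, demand_future)
filtered = seasonal_change(demand_now, demand_future, t_now, t_future)
```

Here the third day shows a demand decrease despite a temperature increase, so it is dropped by the filter and the average rises from about 3.3% to 7.5%.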
Similarly, the percentage change in water demand during the peak periods (when the highest increase or decrease in weather variables is expected) based on climate change projections, can be determined. For example, after identifying the weeks with the highest weekly weather averages (peak weeks) by using the 7-day moving average, Eq. 4 can be applied accounting for the weekly averages of percentage change in water demand. In this work, water demands obtained through Model 2 were used to assess the demand increase related to the temperature peak weeks; similarly, Model 3 was used for solar radiation peak weeks.
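The peak-week step can be sketched as follows, assuming the peak week is the 7-day window with the highest moving average of the projected weather variable and that the weekly change is the mean of the daily percentage changes inside that window. The series are toy values and the function names are hypothetical.

```python
def moving_average(values, window=7):
    """Averages of every run of `window` consecutive days."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def peak_week(weather, window=7):
    """Start index and average of the window with the highest mean."""
    averages = moving_average(weather, window)
    start = max(range(len(averages)), key=lambda i: averages[i])
    return start, averages[start]


def weekly_change(current, future, start, window=7):
    """Mean daily percentage change in demand within the selected window."""
    days = range(start, start + window)
    return sum(100.0 * (future[i] - current[i]) / current[i] for i in days) / window


# Toy series of 17 days, not measured or projected data.
t_projected = [30.0] * 5 + [35.0] * 7 + [31.0] * 5
demand_now = [1000.0] * 17
demand_future = [1020.0] * 5 + [1100.0] * 7 + [1030.0] * 5

start, t_week = peak_week(t_projected)
delta_c_week = weekly_change(demand_now, demand_future, start)
```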
The social characteristics of the users can be taken into account as well. The whole methodology can be applied to different groups of users, formed according to their social characteristics, by using their aggregated water demands (i.e. the daily total demands of each group). For the sake of simplicity, the methodology was applied only to Group 1 (Table 1).
Overall, the methodology makes it possible to evaluate not only the likely seasonal variations in water demand due to climate change, but also the variations during the peak periods, which is needed to avoid failures in water supply.

Results and Discussion
The following subsections discuss the results obtained applying the presented methodologies to the case study of Naples.

Prediction Accuracy of RFs based on Weather Variables
First, the results for the daily aggregated water demand of the DMA are presented. Table 3 shows the results of the RFs in terms of RMSE and R² for the validation dataset. All models performed well in terms of R². Model 1 (based on temporal characteristics, temperature and solar radiation input) had the highest prediction accuracy, as shown by the highest value of R² (0.67) and the lowest value of RMSE (16448 L). This demonstrates the benefit of including both weather variables (temperature and solar radiation) as predictors. Model 3, which included solar radiation input besides temporal characteristics, performed slightly better than Model 2 (including only temporal and temperature input). Figure 1 reports the comparison between measured and forecasted aggregated daily demands for Models 1 and 3. Each point represents one day. Most of the points follow the bisectors of the graphs, highlighting a good agreement between measured and forecasted aggregated daily demands. However, both models seem to overestimate the lowest demands and underestimate the highest ones. This result can be traced back to the structure of RFs, which average different predictions and can therefore underpredict the highest demands and overpredict the lowest ones. Furthermore, many forecasting models struggle to predict outliers. In this case, bias correction methods can be used to improve the forecasting accuracy of the peak days. Similar results were obtained for Model 2 as well.
The good performance of the models showed that weather variables can be effectively used to forecast water demands. These models can be used to estimate future demand changes based on climate change scenarios, since they were able to capture the variations due to weather. Table 3 also reports the results obtained for the aggregated demand of each group of households. For Group 1, all models led to similar results. More specifically, Model 1 performed slightly better in terms of RMSE (i.e. the lowest value, RMSE = 2937 L). Model 3 also showed a good level of accuracy (R² = 0.66 and RMSE = 2948 L). Slightly worse performance (R² = 0.65 and RMSE = 2985 L) was obtained for Model 2, although the level of accuracy is still good.
For Group 2, the models showed the lowest performance. The worst performance (R² = 0.51 and RMSE = 3216 L) was obtained for Model 3.
For Group 3, the best results (RMSE = 2630 L and R² = 0.62) were obtained for Model 1. Models 2 and 3 led to reasonable levels of accuracy. More specifically, Model 3 resulted in slightly higher prediction accuracy compared to Model 2.
Overall, the best performance was observed for Group 1, although good performance was also obtained for Group 3 using both solar radiation and temperature input. The models showed the lowest performance for Group 2. These results indicate a stronger relationship between weather variables and water demand for Group 1 and Group 3, meaning that employed users appear to be on average more affected by weather than unemployed ones. Indeed, employed users spend more time outside and have more scheduled habits that can be easily affected by weather. Notably, better results were obtained for Group 1, consisting of users with a high school/university degree. Thus the educational level seems to be a further discriminating factor in investigating not only water uses (Hurd 2006; Makki et al. 2013) but also the effects of weather variables on water consumption.
The performed analysis showed that there are types of users that are more affected by weather, confirming the importance of including the socio-economic status of the users when investigating the effects of weather on water demand (Domene and Sauri 2006; Chang et al. 2010; Xenochristou et al. 2021).
Furthermore, the types of users most affected by weather will likely be primarily responsible for potential future variations in water demand related to climate change. At the same time, the types of users less sensitive to weather will probably have less of an impact. Knowing the number of users belonging to each type will enable the water utility to easily assess whether the total district demand is expected to rise and, thus, avoid failure in water system capacity.

Impact of Climate Change on Water Demand
This section reports the results in terms of seasonal and peak variations in water demand with respect to the Current Scenario (measured data from 20 March 2017 to 19 March 2018). Figure 2 reports the seasonal values of daily maximum temperature and daily mean solar radiation for each scenario. During the study period the temperature variations are higher than the solar radiation ones. Table 4 presents the results obtained using Model 1 and Model 2 for Scenario-2050 and Scenario-2080, including the seasonal average increase of daily maximum temperature (ΔT). This value represents the average, over each season, of the differences between the daily maximum temperatures for the respective future and current scenarios. The same applies to the daily mean solar radiation (ΔSR).
The seasonal percentage changes in water demand (ΔC) for Model 1 were lower than those for Model 2, due to the low or negative ΔSR. According to the solar radiation projections of both Scenario-2050 and Scenario-2080, except for spring, the ΔSR were low (Table 4). Recent studies using regional climate models at high resolution for climate change scenarios have shown an overall small decrease in solar radiation (Jerez et al. 2015; Bartók et al. 2017), in accordance with the scenarios considered here. However, in summer, despite the low/negative ΔSR, ΔC could rise up to approximately 5% (Scenario-2080).

Fig. 2 Barplot of seasonal values of daily maximum temperature (a) and daily mean solar radiation (b) for each scenario
For Model 2, the highest ΔC occurred in the summer, reaching up to 7.6% in Scenario-2080. It should be noted that, even if the ΔT was comparable between summer and autumn in both scenarios, the ΔC was higher in the summer. This suggests that with the same temperature increase, larger demand increases occur at higher values of temperature (around 30°C). This observation is in accordance with the study of Xenochristou et al. (2021), which identified a temperature threshold beyond which water consumption of UK households starts increasing. Figure 3a shows the results of Models 2 and 3 for the peak weeks. During temperature peak weeks, the increases in water demand were higher than the summer ones. The weekly ΔC was equal to 9.6% in Scenario-2050 and 10.5% in Scenario-2080, for weekly temperatures equal to 34.3°C (Scenario-2050) and 36.5°C (Scenario-2080). For the solar radiation peak weeks, the ΔC was equal to almost 6% for both Scenario-2050 (weekly solar radiation of 328 W/m²) and Scenario-2080 (348 W/m²). The lower ΔC during solar radiation peak weeks demonstrated that water demand was more affected by temperature.
Figures 3b and c show the results for the aggregated water demand of Group 1 (622 households). The summer ΔC (Figure 3b) was higher than the one obtained for the total district demand for both scenarios. Accounting for the combined effect of temperature and solar radiation (Model 1) resulted in ΔC equal to 3.1% and 7.0%, for Scenario-2050 and Scenario-2080, respectively. Higher increases for both Scenario-2050 (7.1%) and Scenario-2080 (11%) were found by considering only the temperature effect (Model 2). The ΔC for peak weeks (Fig. 3c) was higher than the one of the total district demand. The highest ΔC (13-15%) was observed during temperature peak weeks, although high ΔC were attained during solar radiation peak weeks as well. Overall, the demand of Group 1 proved to be more affected by climate change than the demand of the district.
The performed analysis showed that due to climate change the water demand could increase mostly during the weeks with the highest temperatures. Furthermore, the results demonstrated that the water demand of the type of users most affected by weather variables (Group 1) could increase more than the total district demand. Thus, the results showed the relevance of

Conclusions
This study investigates the effect of weather variables on water demand in both current and future climate change scenarios. A novel methodology to assess the impact of climate change on water demand is presented. This methodology makes it possible to forecast future variations in water demand due to likely changes in weather variables by using RFs and the CCWorldWeatherGen. The case study of Naples (Italy) showed the effectiveness of using weather variables in forecasting aggregated water demand. According to future weather scenarios for 2040-2100, the daily water demand of the DMA could increase mainly due to increases in air temperature. During the weeks with the highest temperatures, increases in water demand could reach up to 9-10%. The increase in water demand was different for users with different social characteristics, since they were affected by weather to varying degrees. Employed users with high education could increase their consumption by 13-15% during the weeks with the highest temperatures.
Accounting for future variations in water demand due to climate change is needed to avoid risks of supply and operational failures in water systems. This need is particularly pressing for Italy, which, like most Mediterranean countries, is already coping with water shortages (La Jeunesse et al. 2016).
In previous studies for the Mediterranean area (Zachariadis 2010; Collet et al. 2015; Kanakoudis et al. 2017) climate change scenarios were used for assessing future vulnerability of water resources. However, in these studies demographic statistics and past consumption trends were used to determine future water demand variations (that apply to the entire year). Instead, the methodology presented in this paper makes it possible to directly determine future variations in water demand due to climate change based on seasons and peak weeks. Thus, the methodology provides more accurate projections based on future climate change scenarios, accounting for different periods of the year. In this work, the effect of climate change on the total district demand was investigated using information about some social characteristics of the users. However, additional socioeconomic information about the users could lead to a better disaggregation of the consumption, making it possible to increase the number of household groups. In addition, in this work only one emission scenario was considered. Different results could be obtained with lower or higher emission scenarios. Therefore, future works will investigate climate change impacts on water demand of different types of users, by increasing the number of household groups and using different emission scenarios. Furthermore, the CCWorldWeatherGen provided projections only for long timeframes. In order to more accurately assess the variations during the entire study period (i.e. 2040-2100), future works will investigate more frequent climate projections (e.g. every 5 years), especially in the near future, by using different CGCMs.
Data Availability All data used may only be provided with restrictions. The water consumption data were made available by Azienda Speciale Acqua Bene Comune Napoli (ABC) and the authors have restrictions on sharing them publicly.
Code Availability The codes used in this work are available from the corresponding author by request.

Declarations
Ethical Approval Not applicable.
Consent to participate Not applicable.

Consent to Publish Not applicable.
Consent for Publication Not applicable.

Conflicts of Interest/Competing Interests The authors declare no conflicts of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.