Water Consumption Variability Based on Cumulative Data From Non-simultaneous and Long-term Measurements

Devices for water consumption measurement provide data from periodical readings in a non-simultaneous and cumulative manner. This may result in inaccuracies within the process of inference about the short-term habitual patterns of water supply network users. Maintaining systems at the interface between periodic and continuous processes requires the continuous improvement of research methodology. To obtain reliable results regarding the variability of water consumption, the first step should be to estimate it for each observation day by periodic averaging and a possible water balancing approach, but the analysis of the value of estimators obtained in this way usually does not allow for studying autocorrelation. However, other methods indicate the existence of multiplicative parameters characterizing short- and long-term variations in water demand. The purpose of this study is to create a new and deterministic method for tackling the problem associated with a lack of short-term detailed data with fuzzy time series using a multiplicative model for water consumption. Satisfactory results have been obtained, demonstrating that the dispersed data, received in a cumulative manner for random periods of measurement, can be analyzed by the methodology of proposed statistical inference. The observed variability in water consumption may be used in the planning and modernization of water supply systems, development of water demand patterns, hydraulic models, and in the creation of forecasting models of water consumption.


Introduction
As part of sustainable development, meeting the water needs of consumers constitutes one of the major challenges of modern society in light of temporary or permanent water deficits in a number of regions (Carvalho et al. 2019; Kloosterman et al. 2020; Pallavi et al. 2021). To address this challenge, intelligent water supply systems such as extensive Water Demand Forecasting (WDF) are being used in developed countries. They consist in creating water consumption forecasts based on data derived from monitoring the water supply network (Tuz 2006; Wawrzosek and Ignaciuk 2018; Wawrzosek et al. 2019) and require that variability in water consumption be taken into account both in prediction and in the solutions used for detecting failures (Brentan et al. 2017; Guo et al. 2016; Lipiński et al. 2017). This knowledge can be used in the planning and modernization of the water supply network (Bradley 2004), the development of water demand standards, hydraulic models, and in the creation of prognostic models of water consumption (Huang et al. 2021).
On average, hourly, daily, and seasonal variability is observed due to the deterministic and random nature of water consumption. Regardless of the forecast horizon, it is necessary to identify the factors that affect the system's variability in water consumption. In the case of short-term forecasting, the weather situation is for the most part analyzed (Fiorillo et al. 2021; Gato et al. 2007), while long-term forecasting additionally requires demographic and economic factors to be taken into account (Breyer and Chang 2014; Ghiassi 2017). In many countries, measuring the median consumption when the price is rising or there is a limit to water supply hours has led to the control of consumption and costs by consumers, thereby enforcing a far-reaching change in their behavior (Marzouk 2019). In Poland, the highest level of water consumption was observed in the 1980s (Piasecki and Jurasz 2016). This was mainly explained by a lack of water meters on the part of recipients as well as low public awareness. The introduction of water meters for cold and hot water in multifamily buildings has since led to a reduction in water consumption by half (Pawełek et al. 2015).
The urbanization process has led to the differentiation of areas with a diverse character within cities. Hence, knowledge of the habits of water supply system users, which can be assessed on the basis of increasingly precise measuring devices, should make it possible to more accurately gauge the operating status of water distribution systems (Betta et al. 2002). New opportunities for monitoring have been made possible by the introduction of modern water meters along with remote meter reading. At a specified time and frequency, modern measuring devices are able to dispatch data, e.g. via radio. Remote meter reading is a convenient solution, as it facilitates the transfer of data and billing of water meters, although it raises concerns regarding surveillance and the durability of the batteries powering the device (Srbinovska and Cundeva-Blajer 2019).
The current way of reading data from devices for water consumption measurement does not correspond to the classical statistical methodology. Classically understood, time series analysis requires systematic data collection with a precise and regular time step creating an index for this series (Shumway and Stoffer 2017). Hence, the variation in the number of days in a month, for example, is then classically reduced to an equal number of days. If the time step is not regular, it then becomes a fuzzy time series. In the case of water supply systems, although the measurement data are often of a structured nature, they are still heterogeneous, i.e. they are cumulative, long-term data from non-simultaneous measurements collected over time intervals of differing length. Moreover, due to missing (Quevedo et al. 2010) or falsified observations, the reliability of the measurements is low and does not lead to water balancing in the district metered area (DMA). The problem being identified here applies to many companies (not just water supply companies) when data collection is dispersed. Data management is crucial for any company that seeks to anticipate upcoming events, respond when faced with risks, improve its efficiency, and reduce the costs of obtaining, storing, and processing information or the costs of compliance with legal regulations.
The main objective of the study is to investigate the extent to which unstructured long-term cumulative data, i.e. data from non-simultaneous measurements of water consumption in blocks in the absence of established measurement periods, allow for the recovery of information on periodic (daily/weekly/monthly) fluctuations in water consumption. Few researchers have addressed the problem of water consumption analysis in terms of water meters installed in residences. Most research has focused on the interpretation of water demand using time series registered in whole district metered areas (Fiorillo et al. 2021; Stańczyk et al. 2018). Therefore, the purpose of this study is to create a new deterministic method to tackle the problem associated with a lack of short-term detailed data with fuzzy time series using a multiplicative model for water consumption. Satisfactory results have been obtained, demonstrating that dispersed data received in a cumulative manner for random periods of measurement can be analyzed by the proposed methodology of statistical inference.

Materials and Methods
The authors use measurement data from three selected blocks of flats located in Swidnica, a Polish city with approximately 58,000 inhabitants. Three time series of cold water consumption marked A, B, and C over a five-year period between December 2011 and December 2016 are considered. It is noted that users from blocks A and B consume a similar amount of water, while users from block C only use half that amount. This is due to the fact that block C has a smaller number of residents. Another limitation of the data stems from the fact that the water supplier reads the meters only for the purposes of financial settlements: the meter status is recorded quite irregularly, approximately monthly, and provides only cumulative data on consumption. The number of days in a data collection period is usually not a multiple of the number of days in a week. Hence, there is no fixed day of the week on which readings from measuring devices were taken. This made it possible to search the cumulative data for information on the difference in water consumption throughout the week.
Measurements of water consumption were carried out on different dates, often close to monthly periods but clearly distinct. This diversity is a problem when trying to analyze the annual periodicity of water consumption. Long-term periods of measurements provide cumulative data that does not allow for a direct interpretation of the daily water demand of these residents (Fig. 1).
More specifically, in the first step of this work, water consumption for each day of the entire observation period was estimated for each measurement point only by periodic averaging. Only data modification as such, referring to each day of the year, allows for the use of classical statistical methods. However, it should not be expected that such pre-prepared data will allow for a full observation of short-term fluctuations in water consumption. The results obtained in this way, referring to short periods, may be burdened with an error called flattening in this work. Therefore, in order to isolate the weekly and annual rhythm of water consumption from the data, it was necessary to start by determining the average daily water consumption separately for each block corresponding to different measurement periods. By completing the missing daily water consumption figures with average values for two days, the same common period containing the beginning and end of observation was established from 2011-12-21 to 2016-12-20. Without interfering with the value of the outlier measurement, the average daily levels of water consumption were summed up, thus obtaining Fig. 2. It is only the time series of averaged daily water consumption levels that constitutes the basis for further determination of the fluctuation in water consumption throughout the week. The analysis of the time series, obtained in this way, does not normally allow for studying autocorrelation.
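The periodic averaging step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `daily_estimates`, the input layout (a sorted list of date and cumulative meter value pairs), and the sample readings are all hypothetical.

```python
from datetime import date, timedelta

def daily_estimates(readings):
    """Spread cumulative meter readings evenly over the days between readings.

    readings: list of (date, cumulative_m3) tuples sorted by date.
    Returns {day: estimated consumption in m3} for every day after the first
    reading, up to and including the day of the last reading.
    """
    daily = {}
    for (d0, v0), (d1, v1) in zip(readings, readings[1:]):
        n_days = (d1 - d0).days
        if n_days <= 0:
            raise ValueError("readings must be strictly increasing in time")
        mean_per_day = (v1 - v0) / n_days  # periodic average for this interval
        for k in range(1, n_days + 1):
            daily[d0 + timedelta(days=k)] = mean_per_day
    return daily

# Hypothetical readings for one block (values are illustrative only).
readings_a = [
    (date(2011, 12, 20), 1000.0),
    (date(2012, 1, 19), 1300.0),  # 30 days, 300 m3
    (date(2012, 2, 20), 1556.0),  # 32 days, 256 m3
]
est = daily_estimates(readings_a)
```

Every day inside one measurement interval receives the same value, which is exactly the source of the flattening error mentioned in the text: any within-interval fluctuation is lost, and only the interval totals are preserved.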
In subsection 3.1, for the three blocks, the distribution of the number of days between successive measurements of water consumption was analyzed (Fig. 3). The normality of the distribution of such days was examined using Kolmogorov-Smirnov and χ² tests. Using the additionally calculated averages and standard deviations, an appropriate conclusion was drawn as to the periodicity of water consumption measurements. In subsection 3.2, the correlation between the estimated daily water consumption among the three blocks and the correlation of the number of days between consecutive readings in those blocks was examined (Fig. 4).
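As a rough illustration of the interval analysis, the sketch below computes the number of days between successive readings, their mean and standard deviation, and the Kolmogorov-Smirnov distance to a normal distribution fitted to the sample. The function names and the reading dates are hypothetical. Only the test statistic is computed; because the normal parameters are estimated from the same sample, standard Kolmogorov-Smirnov critical values would not apply directly, so no p-value is reported here.

```python
import math
from datetime import date
from statistics import mean, stdev

def reading_intervals(dates):
    """Number of days between successive meter readings."""
    return [(b - a).days for a, b in zip(dates, dates[1:])]

def ks_statistic_normal(sample):
    """Kolmogorov-Smirnov distance between the empirical CDF of the sample
    and a normal CDF with the sample's own mean and standard deviation."""
    xs = sorted(sample)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    d = 0.0
    for k, x in enumerate(xs, start=1):
        cdf = 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        d = max(d, k / n - cdf, cdf - (k - 1) / n)
    return d

# Hypothetical reading dates for one block (illustrative only).
dates_a = [date(2012, 1, 19), date(2012, 2, 20), date(2012, 3, 20),
           date(2012, 4, 19), date(2012, 5, 21)]
intervals_a = reading_intervals(dates_a)
avg_days = mean(intervals_a)
sd_days = stdev(intervals_a)
d_stat = ks_statistic_normal(intervals_a)
```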
In the analysis of time series for the pre-prepared data of water consumption, the choice of the statistical model was based on the level of detail in the research conducted below. Therefore, in the research, one of the two typical methods for identifying a development tendency (trend) was used: a) a mechanical method, i.e. the method of simple moving averages, which eliminates random fluctuations from the series, while the approximate trend obtained in this way is described by an implicit mathematical function, or b) an analytical method that consists in fitting an explicit mathematical function to the entire time series. More often than not, a linear trend model is built. Hence, depending on the detail of the research, two types of multiplicative models were used: a) Model (1), which corresponds to the trend function in the form of a seven-day simple moving average, was used 4 times; b) Model (2), which corresponds to the linear trend function of the time series for months / weeks, was used 2 times.
Fig. 1 Uneven periodicity and unequal moment of measurements for three blocks
The multiplicative indexes obtained in these 6 models correspond to: a) the j-th day of the week; b) the i-th month / week of the year, which allow for the determination of periodic fluctuations within: a) 7 days, i.e. for j = 1, 2, …, 7; b) 12 months, i.e. for i = 1, 2, …, 12, or 52 weeks, i.e. for i = 1, 2, …, 52.
The analysis of time series from Fig. 2 for Swidnica was carried out using the multiplicative models in the Statistica program (subsection 3.3). These models (1) take the form of:
$$y_{i,j}(t) = \tilde{y}_i(t) \cdot \alpha_{i,j} \cdot \varepsilon_i(t) \qquad (1)$$
where i = 1, …, 4; j = 1, …, 7; t = 1, …, 1827; i = 1, …, 4 - the time series for blocks A, B, C and the time series for the total water consumption in these blocks of flats, respectively; $y_{i,j}(t)$ - water consumption on the t-th day, which is the j-th day of the week, for the i-th time series; $\tilde{y}_i(t)$ - the value of the i-th function on the t-th day, designated as the seven-day simple moving average for the i-th time series; $\alpha_{i,j}$ - the multiplicative indicator corresponding to the j-th day of the week for the i-th time series, where $\sum_{j=1}^{7} \alpha_{i,j} = 700\%$ for i = 1, …, 4; $\varepsilon_i(t)$ - the error value for the i-th time series on the t-th day.
Fig. 4 Lack of interdependence between estimated daily water consumption (a) and the number of days between consecutive readings (b) for pairs of blocks confirmed by correlation coefficients with values nearing zero
In the same Sect. 3.3, the multiplicative parameters $\alpha_{i,j}$ obtained by model (1) (Fig. 5a) were compared with analogous parameters obtained for Fig. 5b (the data here were collected in a different period, i.e. 2019.06.11-2021.03.21, and in a different manner, i.e. daily for the whole district). In subsection 3.4, two further time series of the average daily water consumption in three residential blocks in Swidnica were constructed with regard to the monthly and weekly averages within the 5-year research period. The number of the week in a year on which the day falls, with the assumption that the week starts on Monday, was determined using the ISOWEEKNUM() function of the EXCEL application. The analysis of both of these time series was then performed using the multiplicative models in the Statistica program.
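A minimal sketch of how day-of-week indices of the kind used in model (1) could be estimated is given below. It uses the classical ratio-to-moving-average approach with a centred seven-day window and rescales the seven indices so that they sum to 700%. Whether the Statistica procedure used by the authors centres the window in the same way is an assumption; the function name `weekday_indices` and the synthetic series are hypothetical, and the series is assumed to contain consecutive days without gaps.

```python
from datetime import date, timedelta

def weekday_indices(daily):
    """Estimate multiplicative day-of-week indices (in percent, Monday first).

    daily: dict {date: consumption} covering consecutive days without gaps.
    The trend is a centred seven-day simple moving average; the indices are
    mean ratios of observation to trend, rescaled to sum to 700.
    """
    days = sorted(daily)
    values = [daily[d] for d in days]
    ratios = [[] for _ in range(7)]
    for t in range(3, len(values) - 3):
        trend = sum(values[t - 3:t + 4]) / 7.0  # centred 7-day moving average
        if trend > 0:
            ratios[days[t].weekday()].append(values[t] / trend)
    raw = [sum(r) / len(r) for r in ratios]
    scale = 700.0 / sum(raw)
    return [x * scale for x in raw]

# Synthetic, illustrative data: constant level with a fixed weekly pattern.
pattern = [1.00, 0.98, 0.96, 0.98, 1.04, 1.02, 1.02]
start = date(2012, 1, 2)  # a Monday
series = {start + timedelta(days=k): 10.0 * pattern[k % 7] for k in range(56)}
alpha = weekday_indices(series)
```

On a series obtained by periodic averaging, the ratio of observation to moving average differs from 1 only within three days of a reading date, so the estimated indices stay close to 100%. This is one way to see why the weekly pattern recovered from cumulative data appears flattened.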
Both of these models are partly similar and take the form of model (2):
$$y_i(t) = \tilde{y}_i(t) \cdot \alpha_i \cdot \varepsilon_i(t) \qquad (2)$$
where i = 1, …, 12; t = 1, …, 61 (respectively i = 1, …, 52; t = 1, …, 262); $y_i(t)$ - the average daily water consumption in the t-th month (respectively week) of the observation, which is the i-th month/week of the year; $\tilde{y}_i(t)$ - the value of the linear trend function of the time series in the t-th month/week of the observation, which is the i-th month/week of the year; $\alpha_i$ - the multiplicative index corresponding to the i-th month/week of the year, where $\sum_{i=1}^{12} \alpha_i = 1200\%$ (respectively $\sum_{i=1}^{52} \alpha_i = 5200\%$); $\varepsilon_i(t)$ - the error value for the time series in the t-th month/week of the observation, which is the i-th month/week of the year.
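Model (2) can be illustrated with the sketch below, which fits an ordinary least-squares linear trend and then averages the ratios of observation to trend for each season. It is a simplified stand-in for the Statistica procedure, not a reproduction of it. The function names, the `first_season` parameter, and the synthetic monthly series are hypothetical. The simple modular mapping of t to a season suits months; for ISO weeks, which occasionally number 53 in a year, a real implementation would need to derive the week from the calendar date instead.

```python
def linear_trend(ys):
    """Least-squares line through (1, ys[0]), (2, ys[1]), ...

    Returns (intercept, slope).
    """
    n = len(ys)
    ts = range(1, n + 1)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope

def seasonal_indices(ys, period, first_season=0):
    """Percentage indices around a linear trend, rescaled to sum to period * 100.

    first_season is the zero-based season of the first observation.
    """
    a, b = linear_trend(ys)
    ratios = [[] for _ in range(period)]
    for t, y in enumerate(ys, start=1):
        ratios[(first_season + t - 1) % period].append(y / (a + b * t))
    raw = [sum(r) / len(r) for r in ratios]
    scale = period * 100.0 / sum(raw)
    return [x * scale for x in raw]

# Synthetic, illustrative monthly averages: slow decline times a yearly pattern.
pattern = [1.0, 0.99, 1.01, 1.01, 0.97, 1.0, 1.01, 1.01, 0.99, 1.0, 1.005, 1.005]
monthly = [(50.0 - 0.02 * t) * pattern[(t - 1) % 12] for t in range(1, 25)]
alpha_month = seasonal_indices(monthly, period=12)
```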
If possible, the next step is to use Elliott waves (Poser 2003) of multiplicative coefficients for the above two models (2) in order to obtain information about the variability of water consumption in the city under study.

Results and Discussion
As a result of the aforesaid initial processing of the original data (Fig. 1), modified data was obtained (Fig. 2) which became the basis of further statistical analysis. In the period from 2014-06-21 to 2014-07-29, dramatically low water consumption was measured in block B, at half the usual value. This measurement could not be falsified due to the continuity of readings from one meter. However, the difference in this single observation does not significantly affect estimation of the value of the weekly variability being sought. On the basis of consultations with the water supply and sewage company, a failure of the internal installation or of the measuring devices, as well as refurbishment and maintenance works carried out at that time, were excluded as causes.

Analysis of the Data with Respect to the Frequency of the Readings in the Blocks of Flats
The results regarding the frequency of measurements are shown in Fig. 3. At first glance, the results suggest normality in the distribution of the number of days between 182 readings for all three blocks. However, this is not supported by the p-values of the KS and χ² tests for all data. The lack of normality may result from the existence of a long left tail in the distribution. The values of these tests for the individual blocks are not so unanimous and sometimes they do not reject the normality of distributions. The estimated average values for the number of days between 61 successive readings in blocks A, B, and C over a five-year period from December 2011 to December 2016 are similar and amount respectively to 29.93, 30.43, and 29.95, while the overall average is 30.10 days. The corresponding standard deviations are 3.32, 2.90, 3.76, and 3.33 days. This denotes the existence of an only partially stabilized and approximate monthly periodicity of the measurements performed.

Interdependences
The results on the relationship between a pairing of two blocks in terms of water consumption and the number of days between measurements are collected in Fig. 4, and then interpreted below.
In Fig. 4a, many points overlap one another, and no correlation between the average daily water demand in the three blocks is noticeable. The plots indicate that there is no correlation between the estimated daily water consumption for any pair of blocks A, B and C, while the diagrams in Fig. 4b illustrate that there is no correlation between any pair of blocks A, B and C in the number of days between subsequent readings of the water meters. Both of these conclusions are also confirmed by the correlation coefficients, with values nearing zero, placed below the respective plots in these figures.
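The pairwise comparison can be illustrated with a plain Pearson correlation coefficient. The sketch below assumes that this is the coefficient reported in Fig. 4, which the text does not state explicitly; the function name and the interval values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical numbers of days between readings in two blocks.
intervals_block_a = [30, 32, 29, 31]
intervals_block_b = [31, 29, 30, 32]
r_ab = pearson_r(intervals_block_a, intervals_block_b)
```

The same function can be applied to two estimated daily consumption series aligned on the same dates.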

Percentage Fluctuations in Water Consumption per Week
The results of the daily variability in the water consumption separately for each of the three blocks and the total for the three blocks are presented in Fig. 5a. Analogous results referring to the DMA flow meter are given in Fig. 5b.
The variability in water consumption throughout the week is described by the multiplicative parameters $\alpha_{i,j}$ (Fig. 5a). In Fig. 5a the variability of the multiplicative parameters $\alpha_{i,j}$ across the days of the week is shown for each block and for the total water consumption. These plots reveal that in Swidnica, all three blocks were subject to a cyclical decline in water consumption in the middle of the week, with the observed minimum falling on Wednesdays, whereas on Fridays the largest increase in water consumption was observed cyclically.
When comparing the results obtained for the three blocks separately with the analogous results of daily measurements from the DMA flow meter, it should be noted that the character of variability in the two series is different. The obtained data may indicate different habits of water consumption. In addition, the orders of magnitude of the observed variability diverge significantly. However, the flattened character of variability for the three blocks separately (Fig. 5a) results from observing only the cumulative consumption of water, while the conclusions obtained for the DMA flow meter (Fig. 5b) were based on daily, not monthly, measurements. Moreover, in both cases, there is a general tendency towards higher water consumption over the weekend than in the middle of the week. The extended period of increased water consumption in Fig. 5a (including Friday and Monday) can also be explained by the cumulative nature of the data on which it is based.

Monthly and Weekly Percentage Indices of Fluctuations in Water Consumption per Year
It is observed (Fig. 6) that the estimated slope coefficients of the linear trends in both of these series are fully consistent: on average there are 4.35 weeks per month, and −0.0257 ≈ −0.0059 ⋅ 4.35. More than 60% of the variability in the average daily water consumption shown in these figures was not explained by the simple linear models presented in Fig. 6 and requires more precise modelling, as expressed by, among others, the seasonal multipliers for both models (2).
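The consistency of the two trend slopes can be written out as a short worked calculation. The symbols $b_{\text{month}}$ and $b_{\text{week}}$ are introduced here only for illustration, and the average month length of 365.25/12 days is an assumption about how the figure of 4.35 weeks per month arises.

```latex
\[
\frac{365.25}{12 \cdot 7} \approx 4.35,
\qquad
b_{\text{month}} \approx 4.35 \, b_{\text{week}}
= 4.35 \cdot (-0.0059) \approx -0.0257 .
\]
```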
In Fig. 7, the monthly average multiplicative percentage indices of daily water consumption fluctuate within a year around the linear trend in model (2) for i = 1, …, 12, t = 1, …, 61. These fluctuations can be observed as a form of a triple Elliott wave, with its initial lowest value in May and subsequent minima in September and February, whereas the maxima of this triple wave fall within the July-August period, the October-November-December period, and the March-April period. The two above-average peak periods of the monthly average in daily water consumption are reached in the early spring period and the autumn period, as well as at the beginning of winter. Thus, this consumption in Figs. 7 and 8a exceeds the average annual trend by 1% and 1.5%, respectively. The depiction of the data in terms of average weekly water consumption per day in Fig. 8 reveals a greater exceedance of the average values (i.e. 100% of the linear trend), reaching as high as 3.5%. This means that the average values designated for longer periods of observation flatten the observed variability of daily water consumption in the period considered.
Fig. 6 The linear trend in daily water consumption for three residential blocks in Swidnica in terms of a monthly average in a period of 6 years, with the beginning of the Elliott wave marked (a) and in terms of a weekly average in a period of 6 years (b)
In the weekly breakdown, i.e. in model (2) for i = 1, …, 52, t = 1, …, 262, the multiplicative weekly percentage indices in Fig. 9 show four clear increases in water consumption, i.e. four jumps from a value below the average corresponding to the trend to values above the average corresponding to 100% of the linear trend. They demonstrate fluctuations in water consumption around the linear trend in model (2) in the form of a triple Elliott wave, also including the smaller components, with the initial lowest value falling into the 5th week, i.e. at the turn of January and February. The increases occur in the:
• 5th-10th week of the year (January/February to the beginning of March), an increase of 5%;
• 24th-26th week of the year (last two weeks of June), an increase of 4%;
• 29th-33rd week of the year (mid-July to mid-August), an increase of 2%;
• 39th-46th week of the year (the last week of September to mid-November), an increase of 4%.
Four clear drops in water consumption can also be noticed, i.e. four leaps from a value above the average corresponding to 100% of the linear trend to values below it. They occur in the:
• 16th-18th week of the year (the second half of April to the beginning of May), a decrease of 4%;
• 26th-29th week of the year (the last week of June and the first two weeks of July), a decrease of 2%;
• 35th-39th week of the year (the end of August and all of September), a decrease of 3%;
• 52nd-5th week of the following year (the last week of December to the end of January), a decrease of 7%.
Fig. 9 The multiplicative percentage indices of average weekly water consumption in three residential blocks in Swidnica within a year

Conclusions
Accumulated data collection over various longer periods, usually convenient for settlements with regular customers, is not conducive to the observation of daily and weekly fluctuations in water consumption. Moreover, unequal periods and moments of water consumption measurements for several multi-family buildings prevent the analysis of cumulative fluctuations in such consumption for subnets without data pre-processing. Maintaining systems at the interface between periodic and continuous processes requires the continuous improvement of research methodology. This work raises an issue that has yet to be considered with statistical applications in measuring fluctuations in consumption of utilities. The authors have made every effort to develop a methodology of statistical inference for specific time series of periods created on the basis of data obtained in a cumulative manner for random periods of measurement.
Correcting the six trend functions by multiplicative indexes means that the sensitivity of the first 4 models (1) to changes in water consumption depends on the days of the week, while the sensitivity of the last 2 models (2) depends on the months / weeks of the year. Thus, the initially determined daily averages for long-term measurements, which most often coincide with the seven-day simple moving average, can also be successfully corrected through multiplicative indexes that correspond to the j-th day of the week. As for the last 2 models, the existence of Elliott waves was observed among the multiplicative indexes, which cyclically correct the appropriate linear trend functions. The authors propose that multi-year observations of cumulative values should be supplemented with recording of the water consumption over several correspondingly long periods, including daily measurements within a few weeks, weekly measurements within a few months, and monthly measurements within one or two years.
The results obtained suggest that a planned maintenance of the water supply system in Swidnica should be carried out from Tuesday to Thursday, which would then be different from Wroclaw where it ought to be carried out on Tuesday or Friday. In Swidnica, the weeks from May to mid-June are the least inconvenient for residents of the examined buildings.