The proposed solution was installed in the department of the city of Lima and was evaluated in two respects. First, the error rate of the water consumption recorded by the system was evaluated. Second, the accuracy of the leak detection algorithm was measured: consumption was simulated on a data set to check whether or not the application detected a possible leak (Fig. 12).
To evaluate the margin of error of the smart meter's water consumption readings in liters, a test model was assembled and the water flow measurement algorithm was calibrated gradually. Figure 13 shows the design of the model, in which a flow sensor measures the water flow by recording the pulses generated by the passage of water. A bucket with marks (0.5 L, 1 L, 1.5 L, 2 L, 2.5 L, 3 L, 3.5 L, 4 L, 4.5 L, and 5 L) was then used to check the liters registered by the system against the actual liters that passed through the pipe. Calibration started with the factor recommended by the sensor documentation (Chung and Yoo 2015), which states that 330 pulses/min are equivalent to 1 L; because the margin of error was very high with that factor, calibration continued until a factor of 372 pulses/min was reached. The margin of error is calculated with the following metric:
$$ Error = \left| \frac{real\ value - recorded\ value}{real\ value} \times 100 \right| $$
(1)
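As an illustration of Eq. (1), the following minimal sketch converts a pulse count into liters and computes the percentage error against the volume read from the bucket. It assumes the calibration factor can be read as pulses counted per liter; the function names and the sample pulse count are hypothetical and are not taken from the system described here.

```python
def pulses_to_liters(pulse_count: int, pulses_per_liter: float = 372.0) -> float:
    """Convert a raw pulse count to liters.

    Assumption: the calibration factor is interpreted as pulses per liter.
    """
    if pulses_per_liter <= 0:
        raise ValueError("calibration factor must be positive")
    return pulse_count / pulses_per_liter


def percent_error(real_liters: float, recorded_liters: float) -> float:
    """Absolute percentage error between the real and the recorded volume, as in Eq. (1)."""
    if real_liters == 0:
        raise ValueError("real value must be non-zero")
    return abs((real_liters - recorded_liters) / real_liters * 100)


# Hypothetical reading: 725 pulses counted while 2 L were poured into the bucket.
recorded = pulses_to_liters(725)
error = percent_error(2.0, recorded)
print(f"recorded = {recorded:.3f} L, error = {error:.2f} %")
```

Re-running the same comparison with a different `pulses_per_liter` value is one way to reproduce the gradual calibration described above.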
Table 1 shows the ten tests carried out with the latter factor, which yielded an error margin of 4.63% for 10 random values from 0.5 to 5 L.
Table 1 Comparison of actual water consumption vs. consumption recorded by the system
To measure the precision of the leak detection algorithm, 10 different scenarios were simulated and the results were compared with those of other existing algorithms. First, test data were obtained (DAIAD 2019) to provide a consumption history. The data consist of 674,020 hourly consumption records from 92 consumers over 1 year, in many cases with fewer than 24 records per day. Because the data should be as recent as possible, the measurement dates were updated to the years 2018–2019, and only 9 consumers were randomly selected, generating a dataset of 69,194 records. Subsequently, the following scenarios were defined:
- Normal Consume Week (NCW): Hourly consumption between Monday and Friday with normal water consumption and no leak present.
- Normal Consume Weekend (NCWD): Hourly consumption on Saturday and Sunday with normal water consumption and no leak present.
- Normal Consume Night Work (NCNW): Hourly consumption on days when a person usually works at dawn and their water consumption is considered normal.
- Normal Consume First Day (NCFD): Hourly consumption on the first day of system use, when consumption should be normal.
- Normal High Consume Is at Home (NHCIAH): Hourly consumption on days with a sharp increase in water consumption while the user is at home, which is not considered an anomaly or a water leak.
- Anomalous High Consume Week (AHCW): Hourly consumption between Monday and Friday in which a leak is present, indicated by anomalously high consumption.
- Anomalous High Consume Weekend (AHCWD): Hourly consumption on Saturday and Sunday in which a leak is present, indicated by anomalously high consumption.
- Anomalous Consume Non-Zero (ACNZ): Hourly consumption in which water consumption has been registered continuously for the last 24 h, without at least 1 h of zero consumption.
- Anomalous Consume Similar (ACS): Hourly consumption with three consecutive readings of very similar value (±1 L), which is considered anomalous.
- Anomalous Consume Negative (ACN): Hourly consumption in which, during the last 24 h, the accumulated water consumption has shown a negative trend or a negative consumption has been registered.
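Two of the anomalous definitions above, ACS and ACN, reduce to simple checks on a list of hourly values and can be sketched as follows. This is only an illustration under stated assumptions, not the labelling code used for the data set: "very similar" is read as a spread of at most 1 L within the three readings, runs of zero readings are ignored for ACS, and the function names are hypothetical.

```python
from typing import Sequence


def is_acs(hourly: Sequence[float], tolerance: float = 1.0, run: int = 3) -> bool:
    """ACS check: `run` consecutive readings whose spread is within `tolerance` liters.

    Assumption: windows containing a zero reading are skipped, so that
    several idle hours in a row are not flagged as anomalous.
    """
    for start in range(len(hourly) - run + 1):
        window = hourly[start:start + run]
        if min(window) > 0 and max(window) - min(window) <= tolerance:
            return True
    return False


def is_acn(hourly: Sequence[float]) -> bool:
    """ACN check: at least one negative hourly consumption in the window."""
    return any(value < 0 for value in hourly)


# Hypothetical hourly consumption values in liters.
print(is_acs([0.0, 5.0, 5.4, 5.9, 0.0]))   # three similar consecutive readings
print(is_acn([3.0, -0.5, 2.0]))            # one negative reading
```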
The algorithms applied for comparison are Minimum Night Flow (MNF), Continuous Flow (CF), and Average per Hour (AVG). MNF assumes that any water consumption between 2:00 a.m. and 4:00 a.m. is an indication of a possible leak. CF considers a possible leak whenever there is no zero consumption within a 24-h range. Finally, AVG computes an average of the consumption per hour, and a consumption above that average is an indication of leakage.
To simulate anomalous consumption, consumption values at certain times from 2019 onward were altered, while the data for 2018 were left unchanged in order to keep a historical pattern of behavior that helps detect any anomalous behavior in 2019. The alterations generated the records for the anomalous scenarios and for the NHCIAH scenario. For example, for the ACNZ scenario a random value was added to consumptions that were zero, and for NHCIAH it was established that the user was at home with a high consumption.
The data set and test scenarios are available at https://github.com/henrygustavo/data_set. The test distribution by scenario is NCFD with 48, NCNW with 32, NCW with 120, NCWD with 46, NHCIAH with 29, ACN with 68, ACNZ with 96, ACS with 32, AHCW with 85, and AHCWD with 48, making a total of 275 tests for scenarios of normal consumption and 329 for anomalous consumption. In addition, each test has a field called “isAnomalous” with a value of 1 or 0 that indicates whether or not a leak alert should be issued for a specific consumption.
The algorithms are evaluated using the confusion matrix, which measures their performance against the reference or expected consumption (benchmark), as shown in Table 2. Its main values are:
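The three reference algorithms can be sketched directly from their descriptions above. The code below is a minimal illustration and not the implementation used in the tests: the function names are hypothetical, the 2:00 to 4:00 a.m. window is assumed to map to hours 2 and 3 of an hourly series, and AVG is assumed to compare the current hour with the mean of the historical hourly values.

```python
from typing import Mapping, Sequence


def mnf_alert(consumption_by_hour: Mapping[int, float]) -> bool:
    """MNF: any consumption recorded between 2:00 and 4:00 a.m. raises an alert."""
    # Assumption: the readings for hours 2 and 3 cover the 2:00-4:00 a.m. window.
    return any(consumption_by_hour.get(hour, 0.0) > 0 for hour in (2, 3))


def cf_alert(last_24h: Sequence[float]) -> bool:
    """CF: alert when none of the last 24 hourly readings is zero."""
    return len(last_24h) >= 24 and all(value > 0 for value in last_24h[-24:])


def avg_alert(current: float, history: Sequence[float]) -> bool:
    """AVG: alert when the current hourly consumption exceeds the historical average."""
    if not history:
        return False  # no history yet, so there is nothing to compare against
    return current > sum(history) / len(history)


# Hypothetical readings in liters.
print(mnf_alert({2: 0.0, 3: 1.5}))
print(cf_alert([0.5] * 24))
print(avg_alert(30.0, [10.0, 12.0, 8.0]))
```

Under these definitions a household that legitimately uses water at night, as in the NCNW scenario, would trigger MNF, which is the kind of false alarm the scenario list is designed to expose.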
- True Positive (TP): Water leak identified by the algorithm.
- False Positive (FP): Non-existent water leak, incorrectly identified by the algorithm (false alarm).
- False Negative (FN): Water leak not identified by the algorithm.
- True Negative (TN): Real absence of a water leak, for which the algorithm raises no alert (most cases).
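The four values can be counted by comparing each expected label with the alert raised by an algorithm. The sketch below assumes the expected labels come from a 0/1 field such as “isAnomalous” and that each algorithm outputs a 0/1 alert per test; the type and function names are hypothetical.

```python
from typing import NamedTuple, Sequence


class ConfusionCounts(NamedTuple):
    tp: int
    fp: int
    fn: int
    tn: int


def count_outcomes(expected: Sequence[int], predicted: Sequence[int]) -> ConfusionCounts:
    """Count TP, FP, FN and TN from expected labels and algorithm alerts (1 = leak)."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    tp = fp = fn = tn = 0
    for exp, pred in zip(expected, predicted):
        if exp and pred:
            tp += 1      # leak present and alert raised
        elif pred:
            fp += 1      # no leak, but alert raised (false alarm)
        elif exp:
            fn += 1      # leak present, but no alert
        else:
            tn += 1      # no leak and no alert
    return ConfusionCounts(tp, fp, fn, tn)


# Hypothetical labels for five tests.
print(count_outcomes([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```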
The following metrics were calculated based on the results provided by the confusion matrix:
$$ Accuracy = \frac{TP+TN}{TP+TN+FP+FN} $$
(2)
$$ Recall = \frac{TP}{TP+FN} $$
(3)
$$ Precision = \frac{TP}{TP+FP} $$
(4)
$$ F1\ score = 2 \times \frac{Precision \times Recall}{Precision + Recall} $$
(5)
Accuracy indicates the percentage of leak and non-leak scenarios correctly identified by the algorithm. Recall quantifies the algorithm’s ability to identify alarms, measured as the ratio of correctly identified alarms to the total number of true alarms. Precision measures the algorithm’s ability to avoid false alarms, based on the ratio of true alarms identified to the total number of alarms it raised. Finally, the F1 score evaluates, in a single metric, the algorithm’s ability to distinguish between hours with and without water loss, and is calculated as the harmonic mean of Recall and Precision.
Tables 3 and 4 show the results obtained from the tests carried out in the different scenarios of normal and anomalous consumption, respectively. The “# Tests” column is the number of hourly consumption records tested by the different algorithms on a given day. The “Leakage / day” column is the number of leaks to be detected on a given day. The columns “MNF”, “CF”, “AVG”, and “Proposed Algorithm” show the number of leaks that each algorithm detected during the day.
Table 3 Scenarios of normal water consumption
Table 4 Scenarios of anomalous water consumption
Table 5 shows the results of the confusion matrix, where it can be seen that the proposed algorithm reaches an Accuracy, Recall, Precision, and F1 score of 100%, superior to those of the other algorithms.
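Equations (2) to (5) translate directly into code. The sketch below is an illustration with hypothetical counts, not the values behind the reported results; returning 0.0 when a denominator is zero is a convention assumed here.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute Accuracy, Recall, Precision and F1 score from confusion-matrix counts."""
    total = tp + tn + fp + fn
    # Assumption: a metric with a zero denominator is reported as 0.0.
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


# Hypothetical counts: 8 leaks found, 2 false alarms, 2 missed leaks, 8 correct non-alerts.
for name, value in classification_metrics(tp=8, fp=2, fn=2, tn=8).items():
    print(f"{name}: {value:.2%}")
```

An algorithm with no false alarms and no missed leaks (FP = FN = 0) scores 1.0 on all four metrics.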
Table 5 Results of the confusion matrix by the algorithm