1 Introduction

With the advancement in sensory technologies, understanding of long-term performance of various civil infrastructure assets and developments in the field of computation and processing, the water sector is evolving rapidly, and various advancements are taking place in the way water is managed and used.

Utilities deal with several issues related to water, like conservation of water resources, optimal energy management, reducing water consumption, recovery of nutrients in water and wastewater, separation of wastewater resources, control and automation (Abdalla et al., 2021).

Water utilities are in the pursuit of keeping up with the latest technologies to manage their watershed efficiently and sustainably. The use of these technologies has also led to generation of huge volumes of unstructured and complex asset data during their lifecycle. However, dealing with this data is not a straightforward task as the data generated in real-world is complex and contains a variety of issues like lack of standardization, metadata, and validity, among other issues.

The era of Internet, big data and machine learning has led the water utilities to undergo a digital transformation. The proliferating use of AI across various industries has also found its presence in the water sector. Water utilities can leverage AI and big data analytics to make intelligent data driven decisions (Sinha and Sinha, 2000). A study by International Water Association (IWA) suggests that digitalization will enable this transition and AI will be the key technology to digitalize the water utilities (Kapelan et al., 2020). The scope of AI for water utilities is huge. AI can enable intelligent decision making for the conventional processes taking place at the utilities (pipe failure prediction, water demand forecasting, wastewater treatment, etc.). AI can also help the utilities in achieving enhanced business intelligence by helping in tasks like data integration, smart visualizations, human resource management (Jenny et al., 2020).

Civil engineering is a licensed profession and therefore, trust in any decision support model is critical for acceptance in real-world applications. License of practice for engineers can be at risk if any decision taken by AI leads to a legal issue. Therefore, it is important to identify and discuss the true capabilities of AI. This will help the decision makers realize opportunities, challenges, and risks of implementing AI for modeling different processes within a water cycle (Sinha & Karray, 2002).

aiWATERS, the framework proposed in this work, provides a guide for the water utilities which can help them to understand key concepts, benefits, limitations, challenges of AI and how they correlate to the water sector. The framework is built based on seven pillars of AI: Understanding AI and its Benefits, Data Readiness, Knowledge Integration, Model Development, Decision Support and Implementation. To identify these building blocks which are trust-worthy to achieve a holistic AI implementation in the water sector, we followed a seven-step methodology as described in Fig. 1. This framework is a novel proposition and no other state-of-the-art approaches have articulated such a framework yet. It should be noted that this framework is not a generic literature-based collection of AI practices, but rather it is a framework drafted specifically for the water sector based on its real-world operations.

Fig. 1
figure 1

Methodology used to build aiWATERS

  1. 1.

    Literature review: We conducted a comprehensive literature review by manually reviewing more than 100 articles published by water organizations, research papers from conferences and journals. The literature review captured various features of the work done by researchers, such as the domain of application in the water sector, types of AI models used and their purpose, reasoning for selecting a particular model, hyperparameter combinations tested for the model, validation process for the selected model, datasets used for training and testing, data collection and preprocessing steps followed, and programming languages/libraries used.

  2. 2.

    Practice review with real-world utilities: We observed that most of the AI applications discussed in the literature are academic, and hence, we interviewed various water utilities about how they use AI or data driven decision making in their system.

  3. 3.

    Building aiWATERS framework: Our literature review and practice review helped us identify the challenges of implementing AI in the water sector. We identified seven key aspects of AI, which we call the seven pillars of AI, that can address the AI challenges to the water community.

  4. 4.

    Piloting with Jacobs engineering: To ensure the framework’s efficacy, we piloted with Jacobs Engineering during a three-day working meeting and obtained feedback from a water domain expert's perspective.

  5. 5.

    Preparing a questionnaire: Once we finalized the framework, we prepared a questionnaire based on the 7 pillars of aiWATERS to capture water utilities' current practices in AI.

  6. 6.

    Feedback sessions with large scale utilities: We conducted a working meeting with HRSD and Houston water utilities to obtain feedback on the questionnaire from a utility's perspective and incorporated their suggestions.

  7. 7.

    Pilot interviews and evaluating response from utilities: Finally, we sent the questionnaire to various utilities to gather their current AI practices. We also conducted pilot interviews with small, medium, and large-scale utilities to understand their perspectives on the current challenges and future of AI at their utility.

Based on this framework, we conducted pilot interviews and surveys with water utilities of different sizes to capture their AI practices and assess their maturity of AI implementation. By capturing and analyzing the current AI practices of water utilities, we intend this framework would help water utilities across the globe to improve their maturity in AI through aiWATERS.

The goal of this research is to provide an AI-based framework (aiWATERS) which can help the water utilities to successfully implement AI in their system.

1.1 Research questions

The analysis based on our pilot interviews and surveys with water utilities will enable us to answer the following research questions regarding the water utilities in the United States:

  • RQ1: Are water utilities in the US willing to develop AI across their system for decision making?

    1. o

      Through this question, we aim to find the degree at which utilities is wanting to integrate AI within their system. Do they wish to apply AI to every process within the utility with a human in the loop?

  • RQ2: How do water utilities of different sizes in the US compare to each other in terms of AI implementation?

    1. o

      We divide piloting utilities into three sizes: small, medium, and large utilities, based on the population they serve. The level of AI implementation and the best practices followed by them may differ based on their focus area, available expertise, and prior experience with AI. We try to capture any pattern of differences observed within these 3 categories of utility.

  • RQ3: What aspects of AI technology are considered as the major challenges by water utilities across the US?

    1. o

      For this research question, we try to find what according to the water utilities are the major challenges for implementing AI in their system. It will also help us to find which pillars of aiWATERS are more important for the utilities.

The pilot interviews and survey that we have conducted in this study help us to answer these questions. While the sample of water utilities currently analyzed in the study is small, we cover responses from utilities across all sizes (small, medium, and large utilities) based in the United States. Based on our review and discussions with domain experts (as described in the framework methodology in Fig. 1), we find that water utilities of similar size also have similar maturity level of AI. Therefore, our work research work explores the three research questions mentioned in this section which represents the current state of AI maturity in water utilities across the United States. As we collect more responses from utilities across different countries, the framework should be extended to generalize globally in future.

2 Literature and practice review

2.1 Literature review

The use of AI in the water sector is spread across the Built, Natural and Social subsystems, which makes the larger water system. Each of these sub-systems contains multiple components and elements. We define the categorization of the water system as follow:

  1. 1.

    Built subsystem: The built sub-system includes the human-made components developed for wastewater collection, stormwater capture, and treatment infrastructure. The main components of this subsystem include household/commercial/industrial, wastewater, and stormwater infrastructure.

  2. 2.

    Natural subsystem: The natural sub-system comprises all the naturally occurring resources/ elements that exist and influence sewershed characteristics like the water, land, and climate.

  3. 3.

    Social subsystem: The social sub-system comprises the factors affecting societal aspects and having an economic impact on a sewershed and can be categorized into the community, policy, financial and economic components.

There are various AI applications associated with the component of these subsystems. The ever-increasing research efforts of AI applications lacks a comprehensive study of AI on each of these subsystems. In Table 1, we present a summary of various applications of AI in the water sector. While we have reviewed several articles to understand the current state of AI applications in the water sector during the course of this research, in order to highlight some of the most common applications of AI on various infrastructure components across the Built, Natural, and Socio Economic Subsystems, and the corresponding AI techniques used, we have picked a few articles and summarized them in Table 1 to provide a concise picture of AI applications found in the water sector literature. We highlight the limitations of these literature references to discuss how aiWATERS can help the water utilities when approaching their AI based solutions.

Table 1 Summary of AI applications in water sector found in literature

2.2 Practice review

The literature review outlined in the previous subsection helps us to understand state of the art AI techniques used in the water sector and their limitations. However, most of the open-source literature articles revolve around academic experiments carried out on a limited dataset. There are very few resources which describe how a water utility uses AI in their system. To highlight how real-world water utilities have attempted to introduce AI in their system, we reached out to a few water utilities such as Metropolitan Water Reclamation District (MWRD) of Greater Chicago, Clean Water Services (Oregon), and Hampton Roads Sanitation District (HRSD), Virginia to study their previous work and plans of integrating AI in the water system. The practice review helps us to explore beyond literature articles towards the real-world implementation.

2.2.1 Metropolitan water reclamation district (MWRD) of greater Chicago

Yang et al., (2021) explores the implementation of advanced machine learning techniques in the context of odor and corrosion control in a water resource recovery facility. The authors highlight the challenges associated with odor and corrosion control in such facilities and the need for effective and proactive management strategies. Traditional approaches often rely on manual inspection and reactive measures, which may not be efficient or cost-effective in the long term.

To address these challenges, the paper introduces the application of advanced machine learning techniques as a promising solution. The authors discuss the advantages of machine learning algorithms, which can analyze large volumes of data and identify complex patterns and correlations that are difficult for human operators to discern.

The study then presents a machine learning approach divided into 3 modules which is employed in the water resource recovery facility to predict H2S (Hydrogen sulfide) and VFA (Volatile Fatty Acids) concentration. It outlines the data collection process, including the use of sensors and monitoring devices to capture relevant data on odor and corrosion-related parameters. The data is then preprocessed to ensure its quality and suitability for machine learning analysis.

The authors experiment with RNN (Recurrent Neural Network), Random Forest and SVM (Support Vector Machine) models for odor and corrosion control. It explains the algorithm selection and the training process using historical data. The model's performance is evaluated through various metrics, including accuracy, precision, and recall. Since the authors used years of historical data, RNN based LSTM technique provided the most accurate results.

The results of the study demonstrate the effectiveness of the advanced machine learning approach in predicting and managing odor and corrosion issues. The model shows high accuracy in identifying potential problem areas and generating proactive recommendations for mitigation strategies. This enables the facility to take preventive measures, optimize resource allocation, and reduce operational costs.

2.2.2 Clean water services

Clean Water Services in Oregon utilized a soft sensor system for process control and optimization. The project was developed in response to The LIFT Intelligent Water Systems Challenge and involved collaboration between Clean Water Services and Princeton University.

The objective of the project was threefold: (i) to predict influent flows for the Rock Creek facility in advance; (ii) to predict the next calendar month’s average Tualatin River flow; and (iii) to develop a user-friendly dashboard for visualizing predictions and process control tools. The team collected 11 years of historical data for developing soft sensors to predict the influent wastewater flow. They also obtained data from USGS river monitoring sites and NOAA meteorological sites for predicting the next month's average river flow. Data preprocessing techniques such as feature scaling, selection, and outlier detection were applied, and the data was stored in SQL servers.

For model selection, the authors found that Extra Trees Regression (ETR), Kernel Ridge Regression (KRR), and Support Vector Regression (SVR) achieved good results for predicting the next day's influent wastewater flow. For predicting the next month's influent wastewater flow, K-Nearest Neighbors (KNN), ETR, and KRR were identified as the top-performing methods.

The performance of the models was evaluated using metrics such as adjusted R2 (coefficient of determination), mean absolute percentage error (MAPE), root mean squared error (RMSE), and mean absolute error (MAE).

To support decision-making, a dashboard was developed using Power BI as a decision support tool. Overall, this case study demonstrates the successful application of a soft sensor system and advanced machine learning techniques in the water sector for odor and corrosion control. It showcases the benefits of predictive modeling and data-driven decision-making in improving operational efficiency and reducing resource consumption for Clean Water Services.

2.2.3 Hampton roads sanitation district (HRSD), VA

HRSD has partnered with DC Water, MWRD Colorado, University of Michigan, Ann Arbor, Northwestern University, Oak Ridge National Laboratory to work on a project titled "Crossing the Finish Line: Integration of Data-Driven Process Control for Maximization of Energy and Resource Efficiency in Advanced Water Resource Recovery Facilities". This is an ongoing project, and the team has outlined the vision they want to achieve using machine learning tools. They aim to reduce energy and chemical inputs, maximize energy and nutrient recovery. Through this project, they want to (i) achieve maximum energy and resource efficiency; (ii) achieve performance goals; and (iii) better manage the risk of implementing new technologies in the water sector. The team intends to share the benefits of this project across the water sector through the development and demonstration of a Toolbox. Much like a cookbook, the Toolbox will illustrate how generic statistical and ML tools can be applied, given plant-specific data synthesis, and blending approaches, advanced data analysis techniques and the code associated with specific examples, hardware, e.g., control network servers, DCS (Distributed Control System)/SCADA (Supervisory Control and Data Acquisition), and data management software configurations.

The team is currently in the planning and scheduling phase and thus, precise technical details are not available at the time of writing this report. However, the project goals of this team clearly describe how the utilities are intending to switch towards AI to maximize benefits and operational efficiency in the water sector.

2.3 Limitations of current AI practices and motivation for aiWATERS

The extensive literature review identified several limitations in the existing AI practices in the water sector. One major concern is the lack of reproducibility of models. Researchers often mention the techniques and AI models used in their applications but fail to provide sufficient details on the model and experimentation configurations that led to optimal outputs. This hinders the ability of other researchers to replicate and build upon previous research in this domain. For instance, Damavandi et al., (2019) used an LSTM model for rainfall prediction but did not outline the number of layers, nodes in each layer, epochs, and other model configurations which can help others to reproduce the same model which is claimed to have provided accurate results for rainfall prediction.

Another limitation is the scale of the AI experiments conducted. Many papers relied on relatively small datasets (such as Manu and Thalla, 2017; Shirzad et al., 2014), while some of the papers did not mention about the size and samples of data used (Aslani et al., 2021; Facchini et al., 2021). In practice, like we saw in Sect. 2.2.2 for Clean Water Services, water utilities work with years of historical data. The academic models proposed in the literature may not be scalable to real-world scenarios due to their limited testing on small datasets. Real-world AI models are more complex and trained on large volumes of data. The absence of information regarding steps taken to address underfitting and overfitting scenarios further limits the generalization of AI models.

A lack of a standard dataset is also observed. Most studies use data from specific regions or utilities, making it difficult for researchers to build upon existing models and develop state-of-the-art methods. For instance, Shirzad et al. (2014) used a dataset which was limited to the Mashhad and Mahabad region in Iran for their pipe failure prediction use case. There are several factors such as the material of a pipe, the temperature around it, moisture and other environmental factors which are crucial in determining the health and age of a pipe. The conditions around water pipes are different in different regions. Thus, results which are focused on a dataset of a specific region cannot be generalized to other regions.

Additionally, researchers often did not provide insights into their technical decisions, such as the choice of one machine learning technique over another. The rationale behind selecting a particular model was often missing, making it challenging to understand the suitability of a given model for a specific use case. For instance, Facchini et al. (2021) could have described how ANN (the AI technique selected in their work) compared to other state of the art deep learning techniques like LSTM and CNN in terms of performance on sewer sludge treatment process.

The practice review in Sect. 2.2 revealed that water utilities collaborate with academic institutes and consultants due to a lack of in-house expertise. However, developing in-house expertise is preferable for utilities, as they possess domain knowledge necessary for effective integration of AI applications.

While ML applications in the water sector primarily focus on optimization and modeling, most efforts remain confined within academic boundaries and have not translated into decision-making tools. Non-academic ML applications in the water sector are challenging to find in research publications, making it difficult to capture industry best practices. Successful AI applications in the water sector require interdisciplinary collaboration between domain experts in civil engineering and computer science, an aspect that is often overlooked in academic publications.

The limitations presented in this subsection are based on all the articles and case studies we have reviewed during this research work to understand state of the art literature of AI in the water sector. Considering the limitations found in the literature and practice review, the water sector requires a well-articulated framework which can guide water utilities to successfully implement AI in their system. These limitations in the water sector motivated the authors to propose the aiWATERS framework to facilitate successful implementation of AI in the water industry. The aiWATERS framework provides instrumental guidelines and directions to the water utilities through its 7 pillars of AI. Sections 3 will discuss about the validation and verification process of the aiWATERS framework, showcasing that it is a trustworthy framework drafted through a rigorous review of existing literature and industry practices related to AI in the water sector and inputs from water sector domain experts. Section 3 also describes the 7 pillars of AI for the aiWATERS framework.

3 aiWATERS: the framework

The aiWATERS framework is intended to help the utilities successfully navigate the issues associated with the application of AI in the water sector. The framework considers various issues associated with AI applications and is built on seven pillars of AI: Understanding Benefits and Challenges, Application Goals, Data Readiness, Knowledge Integration, Model Development, Decision Support, and Implementation, as shown in Fig. 2. Data and Knowledge are crucial inputs to the AI model and thus, considered as AI left domain pillars. The literature and practice review helps us to find the best practices and limitations of AI in the water sector. We consider these findings while building each pillar of the aiWATERS framework.

Fig. 2
figure 2

Pillars of aiWATERS framework

Following is a brief about the seven pillars of this framework:

  1. 1.

    AI Understanding, Benefits, and Challenges: The first pillar on understanding AI and its benefits focuses on briefly explaining the different foundational concepts of AI, difference between AI in academia and industry, benefits and challenges.

  2. 2.

    AI Application Goals: The second pillar describes how AI should be applied and discusses topics like the application at different levels of the studied system, how to deploy AI in various categories, and the different modes of building AI.

  3. 3.

    AI Data Readiness: The third pillar describes how to evaluate the quality and quantity of collected data and how to preprocess data to make it ready for AI.

  4. 4.

    AI Knowledge Integration: The fourth pillar explains how the knowledge in the minds of the utility experts can be integrated into AI modeling frameworks to build more robust models.

  5. 5.

    AI Model Development: The fifth pillar discusses how to develop accurate and reliable AI models.

  6. 6.

    AI Decision Support: The sixth pillar explains methods to improve the trustworthiness of AI models and how to develop “human-in-the-loop” models.

  7. 7.

    AI Implementation: The seventh pillar on AI implementation explains methods for ensuring successful AI application in the real-world and continuous improvement of models.

3.1 aiWATERS framework verification and validation

As shown in Fig. 1, the aiWATERS framework was developed by the research team based on an extensive literature and practice review. The research team conducted verification and validation of the questionnaire as described below:

3.1.1 aiWATERS verification

The verification of the proposed framework was performed in 3 steps process:

  1. 1.

    Piloting with 2 large utilities: We conducted a series of two-hour workshops with Hampton Roads Sanitary District (HRSD) and Houston Water to get critical feedback on the structure of the framework, individual building block categories, the number of building blocks and the pillars of each building block.

  2. 2.

    Brainstorming session with Jacobs Consulting: The building blocks were iteratively and revised based on feedback received from a two-day brainstorming session with Jacobs Engineering, a large consulting firm in the water sector who have worked with many utilities across the world to help with their digital transformation journey.

  3. 3.

    Comments and feedback from iWIN Committee: We also collected inputs from the Intelligent Water Infrastructure Network (iWIN) Committee, which comprises of technical domain experts from Oak Ridge National Lab (ORNL), Jacobs, Arcadis, DC Water, City of Houston, MWRD Chicago, Clean Water Services and Virginia Tech.

Involving domain experts of water sector from various organizations verified that the aiWATERS building blocks are correct and relevant for utilities and can guide utilities in their digital transformation in the right way. After the building blocks and the overall framework was reviewed and verified with domain experiments, we developed questionnaires to validate the framework, which is discussed in the following section.

3.1.2 aiWATERS validation

We created a questionnaire to explore the current AI practices and willingness to implement the proposed building blocks by large, medium, and small utilities across the country and around the world. The questions were prepared based on the building blocks and associated pillars of the aiWATERS. We did a dry run of the initial version of the questionnaire with two large utilities, Hampton Roads Sanitation District (HRSD) and Houston Water. The aim of this dry run was to capture their input on improving the questionnaire and making it easy to comprehend for experts in the water sector. The suggestions and approval of the large utilities verified the effectiveness of the questions to extract information related to the applicability and relevance of the building blocks.

We sent the questionnaire to more than 100 utilities across the United States, Canada and the UK and requested them to share the details regarding the AI practices they follow by answering our questionnaire. However, at the time of writing this paper, we collected responses from 11 water utilities in the US. We also pilot interviewed some of those utilities through video meetings to find out about their current AI practices and their future plans. The results of the analysis of the utility responses are covered in Sect. 5 of this paper.

3.2 Pillars of aiWATERS framework

3.2.1 aiWATERS understanding the technology, its benefits, and challenges

With the rise of AI technology, terms like AI, machine learning, and deep learning have become popular buzzwords. There is a concern that some studies may exploit these terms, including “deep learning”, to take advantage of the current scientific trends.

It is important for water utilities to understand AI as a technology and its associated benefits and challenges. AI is a broader field of computer science focused on developing intelligent machines that can perform tasks requiring human intelligence. It encompasses subfields such as machine learning (ML) and deep learning (DL) (Goodfellow et. al., 2020). ML enables machines to learn from data without explicit programming, while DL is a specific subset of ML that uses artificial neural networks to mimic the structure and function of the human brain. DL is particularly effective in processing complex data like images and speech, finding applications in computer vision, speech recognition, and natural language processing.

Differentiating between AI, ML, and DL is crucial for the water industry as it helps professionals identify the appropriate techniques and tools for solving different problems. Understanding these differences allows them to determine the most suitable approach based on factors such as data nature, desired outcomes, and available resources (Janiesch et. al., 2021).

AI implementation in the water sector is still in its early stages. Utilities can refer to academic literature for insights on how AI can be implemented. However, they need to be aware of the differences between AI implementation in academia and industry, as most of the research work in literature works with clean and limited data to outperform state of the art models, whereas, in real world, utilities need to work on large datasets which can be noisy and unstructured.

Water utilities should adopt a “human in the loop” approach, where human expertise is integrated with AI solutions and AI is not treated as an autonomous technology (Wu et. al., 2022).

To effectively integrate AI in their systems for real-world water applications, stakeholders and decision-makers in the water sector should be well-versed in AI technology, its benefits, and its challenges.

3.2.2 aiWATERS application goals

AI applications currently existing within the utilities are minimal and limited to a specific and isolated to process or infrastructure, as shown in Fig. 3.

Fig. 3
figure 3

Various AI applications in the water system

Through System-of-Systems (SoS) thinking (Fortino et al., 2020; Kossiakoff et. al., 2020), water utilities can move towards expanding their internal AI applications and achieve a fully AI-driven functionality, where AI as a system engine guides all processes taking place in the utility at a system level. It is similar to a self-driving car where all operations are achieved by AI and the human is in the loop to monitor the flow of operations.

3.2.2.1 Using AI at different levels of the system

Water utilities around the world have varying levels of AI implementation within their systems. Figure 4 shows various levels at which AI can be applied within the water system. The area shaded with green is the natural subsystem comprising of land, water, and other natural components in the water system. The area marked in blue is the built subsystem, comprising of for wastewater collection, stormwater capture, and treatment infrastructure. The area marked with orange highlights the socio-economic subsystem, comprising of factors affecting societal aspects and having an economic impact on a sewershed (such as water demand forecasting, energy cost optimization, etc.)

Fig. 4
figure 4

Different levels of AI applications

System level application is when AI is applied across all the subsystems (i.e. Built Subsystem, Natural Subsystem, Socio-Economic Subsystem). Subsystem level application is defined as when AI is being applied across all processes of a particular subsystem of the water system. Component level application is the AI application of a particular component of a subsystem, like water pipes or wastewater treatment plants.

Currently, AI is mainly used on component level, i.e., specific processes or infrastructure components (like pipes, wastewater treatment plants, etc. as shown by “Component Level Application”), resulting in limited and isolated presence within utilities. Some utilities still rely on manual practices without AI implementation. Others apply AI techniques to certain components or problems while manually managing the rest of their processes and assets. At the sub-system level, AI is deployed across various components to achieve multiple applications, and a hybrid implementation allows different AI applications to interact and benefit each other. The highest level is system-level application, where AI governs all processes while human monitoring is present. Implementing AI at this level brings benefits such as optimized energy usage, reduced manual effort, increased automation, improved infrastructure maintenance, and enhanced decision-making by operators.

3.2.2.2 Various categories of AI deployment

AI deployment within a system can be categorized based on the degree of automation associated with the AI output. There are three categories: manual output, semi-automated output, and fully automated output. In manual output, the AI output is used for manual actions and decision making without any automation. Semi-automated output involves partial automation of system components based on the AI output, where a human operator controls the process. Fully automated output represents a scenario where all system components are operated and controlled by AI without human intervention, but a human is present to monitor and intervene if necessary. These categories define the level of human interaction and engagement in the AI deployment process.

3.2.2.3 Different modes of building AI

There are three main approaches to building AI applications within water utilities are in-house AI applications, vendor-based AI applications, and hybrid AI applications. In-house applications are developed by the utility itself, utilizing the expertise of water domain experts. Vendor-based applications involve collaboration with external vendors and consultants who provide AI solutions. Hybrid applications combine in-house development with guidance from vendors and consultants. Each approach has its own advantages and disadvantages. In-house models can leverage domain knowledge and accurately incorporate utility-specific scenarios but require significant investment of time and resources. Vendor-based models may offer advanced techniques but may lack accuracy in utility-specific conditions. Hybrid approaches combine the benefits of both options.

3.2.3 aiWATERS data readiness

“Garbage In, Garbage Out” is a popular maxim which highlights the importance of high-quality input data in machine learning and AI systems (Kilkenny & Robinson, 2018). The output of these systems is heavily dependent on the quality of the data used for training. Flawed or biased input data will lead to inaccurate and biased outputs. In the water sector, as AI and digital transformation become more prevalent, ensuring high-quality and AI-ready data is crucial for success. Research often focuses on improving AI models, but the process of preparing data for AI applications is equally important. Data scientists spend a significant amount of time on data processing and cleaning tasks. Data readiness poses a challenge in AI applications across various sectors, including water utilities. It is essential for utilities to understand and invest time in cleaning, organizing, and processing data to optimize its readiness for AI implementation. Data Readiness is determined by important factors like data quality, quantity and the need for data preprocessing and data augmentation. The quality of collected data is influenced by factors such as the data source, data integrity, data timeliness, and data relevance. Utilities need to assess the reliability of data sources before using them for AI applications. Data integrity involves the accuracy, completeness, and consistency of the data over time and across formats. Data timeliness and data relevance define how up-to-date and relevant the data is for the problem statement.

Data quantity evaluation is necessary to make sure the AI model has a good amount of data for training and testing. The quantity of data can be determined by 5 Vs of big data (Volume, Velocity, Value, Variety, and Veracity), as shown in Fig. 5 (Krenn et. al., 2022; Vekaria et. al., 2020).

Fig. 5
figure 5

5 Vs of big data

Along with ensuring adequate quantity and quality of data, data preprocessing is a crucial step in preparing data for AI applications, and it involves several techniques to ensure suitability for analysis. The key techniques include data extraction, compilation, and typecasting, categorical data transformation, handling missing values and outliers, flagging inconsistent and duplicate values, dimensionality reduction, discretization and binarization, data normalization, and considering data distribution. Data preprocessing techniques (Mishra et. al., 2020) help utilities to convert data into a suitable format for AI models, handle missing values and outliers, identify and address inconsistent and duplicate values, reduce the complexity of the dataset, transform continuous data into categorical data, normalize data to a standardized range (Singh & Singh, 2020), and consider the required data distribution and skewness for specific problem statements. By applying these techniques, utilities can improve the usefulness of the data used in AI models, enabling better analysis, predictions, and decision-making.

Ensuring adequate feature representation also plays an important role as there are cases where the available data may not fully represent all the important features. For example, when predicting pipe failure, it is crucial to consider different types of cracks, such as fatigue, circumferential, and longitudinal. Sufficient data samples of each crack type are necessary for the model to effectively learn and analyze all the different crack patterns present in pipes.

There are various instances where collecting real world data is difficult, and thus utilities have limited data samples for such cases. For example, there exist water pipes which may operate under extreme conditions, such as high temperatures, high pressures, or corrosive environments. Collecting data under such extreme conditions can be hazardous, expensive, or logistically difficult. For such scenarios, using data augmentation techniques like Synthetic data generation can help the utilities.

Data augmentation is a technique used in machine learning and deep learning to increase the size and diversity of a training dataset by generating modified versions of the original data (Taylor & Nitschke, 2018). It helps capture a wide range of real-world scenarios while retaining the characteristics of existing data.

Class imbalance occurs when the number of samples in different classes of a dataset is significantly unequal, leading to biased models. To address this, class imbalance techniques are used. Popular techniques include resampling methods such as under sampling (reducing the majority class samples) and oversampling (creating copies of the minority class). Another approach is using synthetic data, which is artificially generated data that can help balance the classes and improve model performance. Synthetic data can be generated using techniques like variational autoencoders, GAN models, and neural radiance fields. It offers benefits like handling sensitive data, customizing data to specific conditions, and generating datasets for testing and quality assurance.

3.2.4 aiWATERS knowledge integration

Integrating domain knowledge is essential for effective AI applications alongside data (Krenn et. al., 2022). Science-based models, rooted in scientific theory, are used in conjunction with ML models to address problem statements. While ML models heavily rely on data and are opaque, science-based models have a large number of parameters and incomplete knowledge, limiting their agility. Scientific models are valuable for knowledge discovery and predictive analysis, but inferring parameters and state variables can be challenging.

ML models are not designed to uncover new knowledge and may lack interpretability. In scientific problems, understanding the underlying phenomenon is often as important as making predictions (Von Rueden et. al., 2021). Knowledge-guided ML combines data and accumulated scientific knowledge to extract patterns while considering interpretability, as shown in Fig. 6. This approach ensures that knowledge is not disregarded, and ML models can incorporate decision-making rules informed by scientific research.

Fig. 6
figure 6

Knowledge and data integrated models

Inducing domain knowledge in AI extends beyond ML models to the broader system in which these models are applied. In complex systems like water management, integrating expertise from various domains is crucial for informed decision-making. For example, when modeling wastewater treatment and predicting effluent release, environmental experts should assess the impact of ML-guided predictions on natural bodies of water. Legal experts should also be consulted to ensure compliance with regulations and legal guidelines. The integration of knowledge in AI for water utilities involves collaboration among data scientists, water experts, environmental specialists, legal professionals, and other relevant sectors, as shown in Fig. 7.

Fig. 7
figure 7

Domain knowledge integration in AI applications for water sector

Domain knowledge comes in many forms when required to be used in AI processes. The three main sources of extracting domain knowledge for AI applications in the water sector are:

  • Domain experts: Expert interviews, surveys, conferences, seminars, workshops, and industry events are some good resources which can enable the model developer to capture domain knowledge from the experts who have years of experience in the water domain.

  • Data: Domain knowledge may also be extracted from studying the data related to the problem statement. Data in this context is not limited to data collected through instruments and existing historical data; they also include documents, metadata, prior decisions and results, and facts.

  • Experiments: Experiments bring out procedural knowledge, which is responsible for knowing how to do something and includes rules, strategies, and procedures. Experiments can be laboratory experiments, field experiments, or scale experiments (i.e., where developing and testing the actual target is not feasible, so the experiment is developed and tested at a smaller size). For example, to test a particular bridge design, it is not viable to build a real bridge at the target location and then test it as building a real bridge requires a lot of effort and money. Building a scale model in the laboratory can help in determining how the bridge would react in a real-world scenario.

3.2.5 aiWATERS model development

AI models are the core part of any AI system. Before starting out on developing the AI model, the developers should have complete understanding about the problem being solved and should be aware about the ideal input and output parameters for the application. Input parameters can be in many forms, such as images, variables, numeric values, and integer ranges. Model developers should decide what type of input is required for your problem and can best represent the problem statement in hand. The selection of the right output parameters is crucial to the success of an AI application, as it determines the usefulness and effectiveness of the application. The domain knowledge for input and output complements each other. Through domain knowledge, identify what is the desired state of result and accordingly select the parameters which will be representative of this desired state which we want to achieve with the help of the AI model. Model developers are suggested to follow 7 keys tips (Huyen, 2022) when building an AI model, which are listed in Table 2.

Table 2 Seven keys tips of AI model development

Model developers should be aware of the characteristics of various AI models to make an informed decision about the potential models that can work as a solution for the given problem statement. For instance, Nearest Neighbors Classifiers are highly susceptible to noise but can create decision boundaries of arbitrary shapes, Logistic Regression can handle irrelevant and redundant attributes but does not work well in case of missing data instances, neural networks take relatively longer to train but can handle the presence of interacting variables.

After selecting the potential models, developers should perform rigorous hyperparameter tuning to find the best set of parameters that provide the most accurate results. The model is found to be accurate when it is generalization, i.e., being able to perform well on unseen instances of data. In case of underfitting (poor performance on both train and test data), developers can increase the model complexity, use a more powerful model or perform feature engineering for better features. Whereas in case of overfitting (good performance on train data, poor performance on test data), developers can use techniques like dropout, regularization, early stopping and ensemble modeling (Ying, 2019).

Once the model is deployed, it is a common misconception that its performance will remain constant over time. There are several factors that can affect the model's performance and necessitate ongoing monitoring and adjustments. One such factor is data drift, which refers to changes in the statistical distribution of the data used for training the model. As the nature of the data associated with the problem statement evolves, the model's performance can degrade. To counter this, continuous learning and adaptation of the model are necessary to maintain its performance over time. This involves updating the model with new data and retraining it to ensure it stays relevant.

However, data is not the only thing that changes over time. The external conditions under which the model's decisions are applied also evolve. To ensure optimal decision-making, it is crucial to integrate the latest domain knowledge into the model. This means considering changing external factors and incorporating them into the decision-making process. For example, in the case of a self-driving car, adjusting the vehicle's speed limits based on weather conditions, such as reducing speed in snowy or slippery conditions, can help prevent accidents. Integrating such domain knowledge into the model’s decision-making system ensures it can adapt to changing conditions and make informed choices.

3.2.6 aiWATERS decision support

The implementation of AI in water utilities requires a multi-disciplinary approach that involves experts from various fields. The impact of AI in water utilities extends beyond the utility processes and involves social, environmental, legal, and ethical considerations. Equitable distribution of water resources, environmental impact, and protection of ecosystems should also be prioritized along with other benefits achieved through AI at the utility. While AI based decision support systems may optimize and enhance the processes, it is critical to ensure that the AI system does not have negative social and environmental applications, like depriving any section of society of basic water needs.

AI based decision support should provide trustworthy output, involve humans in the loop and provide effective visualizations for the end users (Eberhard, 2023). For ensuring trustworthiness in AI systems, the European Union (EU) has suggested a holistic framework considering lawful, ethical, and robust AI implementation for trustworthiness (Kaur et. al., 2022), as shown in Fig. 8.

Fig. 8
figure 8

Framework for AI trustworthiness

The three guidelines are supported by four ethical principles: Respect for Human Control, Prevention of Harm, Fairness, and Explicability. Respect for Human Control emphasizes that AI should complement human decision-making, with humans retaining agency and oversight. The level of autonomy should correspond to the level of risk involved. Prevention of Harm involves ensuring technical robustness, safety, privacy, and data governance. AI systems should prevent errors, protect sensitive data, and consider societal and environmental impacts.

Fairness requires AI systems to treat all social groups equally and without discrimination. Accessibility to AI services should be ensured, without bias towards any social or income group. Explicability focuses on transparency and interpretability of AI decisions. AI systems should be explainable, reproducible, and provide explanations for their predictions or actions.

Using AI in critical real-world decisions demands the involvement of humans in the process. Human involvement should include various roles such as model developers, end-users, and domain experts. Human involvement can be classified into three categories: humans before the loop, humans in the loop, and humans over the loop (Kaur et. al., 2022), as shown in Fig. 9.

Fig. 9
figure 9

Types of human involvements in an AI system

Humans before the loop are involved in the design phase of the AI system. They contribute to the planning, designing, and drafting of requirements for the AI system. Developers, policy and domain experts, and end-users are “in the loop” to state their requirements. Humans in the loop are involved in building and developing the AI system. They work on data collection, data processing, model development, knowledge integration, and continuous updating of the model. They monitor the flow and AI output at every step, which is automated by AI, and make required changes if the flow is not as expected. Humans over the loop monitor the output and decisions provided by the AI system and evaluate the system's performance. If necessary, they can override the AI-based decision if it is not suitable for real-world scenarios.

Visualizations are a powerful tool for enhancing decision support in the water sector. They allow model developers, decision-makers, and other stakeholders associated with AI systems to draw simple and actionable conclusions. Effective visualizations can impact decision-making and judgment in many ways, such as improving judgement/decision accuracy, response time, decision confidence, and empowering collaboration.

Table 3 provides a summary of what various stakeholders in a utility may require from a visualization for decision-making use cases (Mitchell et. al., 2019).

Table 3 Stakeholders vs relevant AI use cases for visualizations

3.2.7 aiWATERS implementation

Through various pillars of AI, we saw how successful implementation of AI can be fraught with various challenges affecting the long-term sustainability of AI-based platforms and solutions. The aiWATERS implementation pillar addresses discussion regarding the various challenges in successful implementation of AI to ensure utilities can develop long-term, usable decision-support platforms.

Water utilities should start by accepting and adopting the AI technology for their system. They need to understand the basic characteristics of this technology and should get acquainted with its existing advantages and drawbacks. Once they understand this technology, they should identify the area of application for AI and staff an AI driven team which not only includes computer experts but also experts from water, social and environment domains. The human-in-the-loop approach shifts the focus from building an intelligent AI system to incorporating meaningful human interaction and trustworthiness. While developing AI applications, developers should focus not only on data but also integrating the domain knowledge of the utility process. Before carrying out a large-scale implementation, the utilities should start small by doing a pilot project, as risk is high in large scale implementation. For successful implementation, AI teams should incorporate monitoring as an intrinsic part of the utility AI lifecycle and put their findings into practice to improve models over time.

4 Questionnaire and pilot interview results

In this section, we answer the research questions mentioned in the introduction section based on our pilot interviews and questionnaire surveys with various water utilities. Since water utilities have different characteristics based on the population they serve and their resources, we categories the targeted utilities into 3 categories, small utilities (serves a population less than 50,000), medium utilities (serves a population greater than 50,000 and less than 250,000), and large utilities (serves a population greater than 250,000). Small utilities have simpler treatment processes and distribution systems, with fewer wells, pumps, and storage tanks. Geographical location, workforce availability, and financial funding mechanisms are some of the key factors creating unique challenges for small utilities. Small utilities may be operated by local governments, private companies, or cooperatives, and may face challenges in funding and maintaining their infrastructure. Medium utilities have a more extensive water and wastewater infrastructure system compared to small utilities, including multiple sources of water supply, treatment plants, and storage facilities. Medium utilities are typically operated by public agencies, such as municipal or county governments, and are subject to regulatory oversight and reporting requirements. Large utilities have more complex treatment processes and distribution systems, with multiple sources of water supply, treatment plants, and storage facilities. Due to their size and complexity, large utilities may face challenges in managing their infrastructure and meeting the needs of a diverse customer base. They may also have more resources available for implementing new technologies, such as AI, to improve their operations and decision-making processes.

At the time of writing this paper, we managed to get response from 11 utilities in the United States. Majority of utilities responding to our study were large utilities, and the responding utilities were spread across different regions of the United States, as shown in Figs. 10 and 11.

Fig. 10
figure 10

Distribution of responding utilities

Fig. 11
figure 11

Geographic distribution of responding utilities

4.1 RQ1: Are utilities willing to develop AI solutions and use AI across the system?

As shown in Fig. 12, only a small portion of medium and large utilities are currently using AI at component level, i.e. specific to a process (like wastewater treatment, CCTV image classification, etc.). Thus, it is evident that the current usage of AI in the water industry is limited and fragmented within the utilities.

Fig. 12
figure 12

Current AI usage in utilities

However, 84% of the responding utilities said that they are willing to develop AI solutions in the near future. Although, not all utilities are in favor of applying AI to every process in the future, as seen in Fig. 13. Some utilities think this idea will overcomplicate the system as some processes are very simple, whereas a few utilities believe that applying AI for every process will improve decision making and system understanding. Some of the utilities are unsure about the success of this vision as they have trustworthiness and sustainability issues related to AI.

Fig. 13
figure 13

Willingness to use AI for every process at the utility

The questionnaire captured that some of the utilities are already applying AI in-house, and collectively follow similar steps as proposed in the aiWATERS framework. Even though many utilities are not currently using AI, they responded that they are willing to explore AI in the future and a framework like aiWATERS will immensely help them in holistic and sustainable adoption of AI.

4.2 RQ2: How do water utilities of different sizes compare with each other in terms of AI implementation?

It was found that only some of the large utilities have been able to develop in-house AI expertise so far, as shown in Fig. 14. As stated by utilities there are various tradeoffs of using vendor based, academic and in-house AI models. In-house based models provide fit-for-purpose development, higher understanding, requires more resources, increased efficiency of deployment, and more control and security. Vendor based models are black box, difficult to sustain and maintain, and carry trustworthiness issues. Academia models are not able to generalize due to limited and experimental datasets.

Fig. 14
figure 14

Mode of building AI in water utilities

While comparing these utilities in terms of data readiness, we found that all the utilities value data source, integrity, timeliness, and relevance for data quality. Large utilities have a better sense of data quantity evaluation, unlike small and medium which are focused on volume, as shown in Fig. 15. We also found that all the utilities follow only basic data preprocessing steps. None of the utilities are progressed till the stage to adopt more advanced and mathematical steps, as seen in Fig. 16.

Fig. 15
figure 15

Data quantity evaluation of utilities

Fig. 16
figure 16

Data preprocessing followed by water utilities

Moreover, none of the utilities were found to investigate sampling and synthetic data generation to handle class imbalance.

While comparing model development practices, we found that large utilities had a better understanding of model development compared with medium and small utilities. All large utilities prefer to use simpler techniques (like linear regression, fuzzy logic) over state-of-the-art techniques (like ANN, CNN, LSTM). As shown in Fig. 17, unlike medium utilities, large utilities have no bias/preference when experimenting with AI models.

Fig. 17
figure 17

Model experimentation preference

However, utilities didn’t have a deep understanding regarding hyperparameter tuning as only 25% of them followed it, as shown in Fig. 18, which signifies that their current AI usage is a blackbox for them.

Fig. 18
figure 18

Hyperparameter tuning by utilities

Moreover, all large utilities frequently face the problem of poor generalization, where their models fail on test data. Three large utilities said they perform feature engineering and increase model complexity to overcome the same. In case of overfitting, two large utilities responded to using Dropout for handling the same. We also saw that only 33% of the large utilities update their AI model once a month with new knowledge and data.

We also found that most of the decision makers in the utilities do not have complete knowledge of about the model being used at their utility, as seen in Fig. 19. Here, complete knowledge of the model refers to how the model makes decisions, why the selected model was chosen above other models, what are the characteristics, advantages and drawbacks of this model and its parameters, etc.

Fig. 19
figure 19

Do utilities have complete knowledge about the AI model used at the utility?

The utilities are aware about the importance of multidisciplinary experts when using AI applications for decision making. However, most of them haven’t involved multidisciplinary experts yet in their decision-making process, as shown in Fig. 20.

Fig. 20
figure 20

Do utilities involve multidisciplinary experts in their decision-making process?

Meanwhile, the utilities have a good understanding of what factors accounts for a trustworthy AI system, as many utilities responded to the crucial AI trustworthiness factors, as shown in Fig. 21.

Fig. 21
figure 21

AI trustworthiness factors for utilities

4.3 RQ3: What aspects of AI technology are considered as the major challenges by water utilities?

We ask the utilities about what the major challenges are according to them while implementing AI in their system. As per the Fig. 22, we find that quality data acquisition, developing AI based workforce and integration of AI in existing system are the top three key challenges for utilities. As we saw earlier in RQ2, utilities are currently not exploring any data generation techniques like synthetic data, which can help them to overcome data acquisition challenges.

Fig. 22
figure 22

Major AI implementation challenges for water utilities

5 Discussion

In this section, we capture the perspective of small, medium and large utilities on AI based on our pilot interviews and survey results.

5.1 Small utility perspective on AI

The small water utilities are at the preliminary stages of adopting AI, as they currently do not use AI within their system. However, that does not mean that they are against implementing AI in their system. They have insights into what benefits and challenges AI provides for them and how AI-based decision-making can support their operations, and they intend to explore AI applications within their utility over the next 5–10 years. Like their medium and large counterparts, the small utilities are willing to upgrade to AI and use cutting-edge technology for their system. However, there are various challenges that they need to address for successfully integrating AI in their system. Since small utilities do not have AI experts on their teams, they rely on consultants and AI vendors to integrate AI into their systems. They have funds and resources to collaborate with consultants and vendors, however, sustaining the AI operations at the utility is a challenge for them. Consultants and vendors are not permanent employees of the utilities and can help the small utilities to set up and integrate AI into the system. The onus of sustaining and monitoring the performance and reliability of AI is on the utilities and they do not have the required experts and resources who can keep a check on the AI system and maintain it over time. Lack of in-house expertise also justifies the small utilities being limited to basic data processing approaches and zero experience in model development (RQ2). While these utilities operate at a small scale, providing water to its county is equally important as that of providing water to a large city. Another issue is the reliability of the AI output. Since the utilities deal with real-world water operations and provide basic water needs to people, they need to fully trust the system which they are using for service, as the stakes are high. They haven’t reached a stage where they can fully trust AI technology instead of a human for making crucial decisions. The small utilities already have a small workforce and there is a general feeling that AI will replace jobs that involve repetitive tasks such as data entry, customer service, and certain types of water utility services.

5.2 Medium utility perspective on AI

The current state of using AI within the utility is similar for small and medium utilities, as most of them are not using any AI techniques at this point for their water operations. We found one of the medium utilities piloting ML models for computer vision applications to CCTV data. However, this application was dependent on vendor based and academic collaboration as the utility didn’t have any in-house expertise. Their data readiness approach is similar to that of small utilities, as they focus on volume and value of the data collected and the basic data preprocessing techniques (RQ2). They are not well versed with the best practices of AI model development as they do not have any in-house expertise and the models provided by the vendors are of blackbox nature to them. Since they understand this technology, their views are in line with the small and large utilities regarding human-in-the-loop integrated AI approach within water utilities. Given the blackbox nature of AI models for these utilities, reliability and sustainability of AI technology remains a concern for them as well. Medium utilities are one step ahead of the small utilities in terms of taking proactive measures towards developing an AI driven workforce for a digital transformation in the future. Currently, medium utilities have diversity and inclusion-driven workforce development programs for new hires to specialize in achieving the utility’s day-to-day water operations. As the medium utilities prepare to enter the digital era of water, they plan to build a data and AI specialized workforce that can help them achieve in-house AI solutions for their utility. Medium utilities are also aware of the interdisciplinary requirements of integrating AI in their system and thus, they aim to bring together Social, Environmental, Legal and AI based domain experts in future to achieve a holistic and robust AI implementation within their utility. Thus, medium utilities are in the middle of small and large utilities in the spectrum of AI implementation, more towards small utilities, but they are making efforts towards reducing the digital gap with their large counterpart. Utilities across all size are acquainted with AI as technology, its benefits, and challenges. However, there is a notable gap in the pace at which they are approaching a digital transformation. Medium utilities have not achieved the level of large utilities in terms of implementing AI operations. This can be attributed to the fact that many utilities grew from small to medium due to urbanization from rural areas in the past century and are still lagging in terms of data collection, modelling and data driven decision support compared with the large utilities. However, their proactive approach and plans of building an AI workforce speaks for the fact that they are racing towards reducing the digital gap with the large utilities.

5.3 Large utilities’ perspective on AI

The current state of AI used in large utilities is spread across a spectrum, where some are not implementing AI like small utilities, some are piloting with vendors like medium utilities, whereas a few have developed in-house expertise. One of the large utility participants said: “Since we don’t have much of an in-house team, I was answering based on the practice of past vendors who did pilot previously—not really what we wanted, but pretty much what the organization was able to procure before.” This signifies the impact of vendor dependency in water utilities. The lack of in-house expertise justifies the black box nature of AI at some of the utilities (as they were not aware about hyperparameter tuning, model selection tradeoffs/complete knowledge of the techniques). The presence of in-house expertise in some utilities accounts for better data quantity evaluation, preprocessing and model development strategy for large utilities when compared with small and medium utilities (large utilities didn’t show any bias/preference towards particular model), as observed through RQ2. It should be noted that most respondents in the questionnaire were civil engineering domain experts and were not designated in a AI specific role. These people may have knowledge about AI and might be leading the in-house AI operations. However, lack of advanced knowledge may account for not following advanced preprocessing steps, no timely updating of model, and use of simpler AI techniques over state-of-the-art techniques, as seen in RQ2. Moreover, the AI solutions created by the large utilities are not perfect yet. They face difficulty in generalization. For instance, a utility may not be able to apply the same model to its various treatment plants as it won’t give results with same accuracy for treatment plants operating under different conditions. Even vendor-based collaborations didn’t lead to generalization. Vendors are not domain experts and may not customize their baseline solutions as per the need of a utility.

The large utilities, like their small and medium counterparts, understand the limitations of this technology as trustworthiness is a concern for them. They do not support the idea of a fully AI operated system as they wish to integrate humans in the loop at every stage of AI lifecycle. The scale at which they provide services to people is huge and therefore, it is important for them to trust the AI system which they will use for decision support. Since large utilities have more resources and funds to invest in AI, they plan to expand their in-house expertise by building an AI specialized workforce which can help them become more proficient in their journey of digital transformation.

We got zero responses on the question which asked if the responding utility look up to some other utilities in terms of implementing AI. It shows that utilities are not aware about how other utilities are progressing in their AI journey. Considering the digital divide, small and medium utilities can pilot and learn from the large utilities about their AI based projects and plans.

A key takeaway from this discussion is that many water utilities want to get involved in applying more AI for their systems but are unsure where to start. For this purpose, the proposed framework can help these utilities start implementing AI at their system level. Utility managers have trust issues with AI as the decisions they make have wide social implications and may put their licenses at risk. As a result, utilities need a comprehensive framework such as aiWATERS to develop a systematic way of implementing and testing AI models.

5.4 Limitations and future work

The authors reached out to more than 100 utilities globally to capture the AI practices in the water sector. The current results include a discussion on 11 utilities which responded to the questionnaire at the time of writing this paper. Collecting and analyzing the response from more participating utilities from different geographic locations across the world can help to enrich the discussions and results obtained in this research. The participating utilities in this research were all public utilities. To get a more generalized sense of current AI practices and challenges in the water sector, this study should also be extended to private and federal utilities. We saw that many utilities are not currently implementing AI, and some are dependent on AI vendors for the same. The Strategic Implementation pillar of the aiWATERS framework can be extended to include a guide on how to develop in-house AI expertise for the utilities.

6 Conclusion

In this paper, we propose aiWATERS, an Artificial Intelligence framework designed to help water utilities successfully integrate AI in their system. We provide an extensive literature and practice review of the AI practices in the water industry, where we identified several challenges associated with AI implementation in the water sector. Based on the limitations and best practices found in literature and practice review, along with discussions from water domain experts, we proposed seven pillars of AI for the aiWATERS framework: Understanding the technology, its benefits and challenges, Data Readiness, Knowledge Integration, Model Development, Decision Support, and Strategic Planning. We conducted pilot interviews and administered questionnaire surveys with several public utilities in the United States to assess their current level of AI implementation and identify the challenges faced by them. By comparing the AI implementation among small, medium, and large utilities, we made several noteworthy findings. Utilities in each category are at a different stage in their journey of digital transformation. We found that none of the small utilities and only 50% of medium and large utilities implement AI in their system. Moreover, half of these utilities who implement AI are dependent on AI vendors and lack in-house expertise. It proves that AI in the water sector is still at a nascent stage and this technology is a black box for many utilities. The black box nature of AI for water utilities leads to trustworthiness and sustainability issues for the utilities. Therefore, the utilities require a comprehensive framework like aiWATERS to provide a systematic way of achieving a holistic AI implementation in their system. Based on the pilot interviews and questionnaire survey with the 11 utilities, we found that utilities that belong to the same scale/size have similar structure, workforce, and characteristics within their organization. Since we capture the AI practices of water utilities belonging to all three (small, medium, and large) sizes, we can generalize the findings of this work to water utilities across the US. In future, this study can be extended to private and federal utilities as well, as they may have different characteristics than public utilities which can lead to more diverse findings regarding the state of AI in the water sector.