1 Introduction

Maintenance encompasses all administrative, technical, and supervisory processes that ensure machines continue to run in their expected manner (Basri et al. 2017; Jianzhong et al. 2020). It generally includes activities such as inspecting, testing, servicing, overhauling, replacing, repairing, measuring, and detecting faults to avoid failures that could hinder smooth production processes (Zonta et al. 2020). Efficient maintenance minimizes failures and prolongs the useful life of machines. Historically, as shown in Fig. 1, maintenance can be categorized into corrective, preventive, condition-based, predictive, and prescriptive approaches, with prescriptive being the most recent (Hu et al. 2022).

At the beginning of automation, machines were simple. Thus, they could be operated for as long as possible and maintained only after they had failed. This reactive approach is called corrective maintenance (CM). However, as machines became more sophisticated, corrective maintenance proved to be naive. The outcome was a considerable rise in unanticipated downtimes, resulting in severe damage to machine parts and a substantial strain on the maintenance budget.

Fig. 1 The five types of maintenance approaches and their evolution

This paved the way for what is called preventive maintenance (PM), where the machine is maintained either at a regular frequency (calendar-based) or after a certain length of usage (usage-based) (Yang et al. 2019; Wang et al. 2020; Wong et al. 2022). The time or length of usage before maintenance is often the result of an educated guess. For example, a car may be declared due for service after it runs a certain number of kilometers. PM has the potential to reduce system downtimes, hazards due to part failures, and maintenance cost. While this type of maintenance has its merits, it also has limitations. The operating hours or time set for maintenance may be inaccurate. For example, a car driven on a rough road may require early maintenance despite running only a few kilometers; not maintaining the car until its scheduled service could therefore be dangerous. The reverse may also be true: a car driven on smooth terrain may not require maintenance even after reaching the preset kilometers, so maintaining it before the due time amounts to resource wastage. Huang et al. (2020) reported that in PM, the majority of machines are maintained too early, while they still have a large amount of their useful life available.

With the advancement of electronic sensors, the condition of many machine parts can now be determined, so technicians can check whether parts actually require maintenance. This gave rise to another proactive maintenance method called predictive maintenance (PdM) (Pech et al. 2021; Theissler et al. 2021; Wen et al. 2022). In PdM, data analytic tools are used to find defects and abnormalities in machine operation so that maintenance can be carried out before failure occurs. PdM goes hand-in-hand with condition monitoring, in which sensors are deployed to measure conditions of machine parts such as vibration, temperature, torque, and noise. These sensor signals are then used in PdM to determine when the system is likely to fail so that maintenance can be scheduled. From these definitions, it can be seen that although PdM may require a large deployment cost because of the sensors, it offers the lowest maintenance cost, since the usage time before maintenance can be set more accurately with information from the sensors. In the context of our car scenario, the vehicle can be equipped with sensors that monitor certain properties of the oil, such as temperature and viscosity. By utilizing this sensor data, well-informed maintenance decisions can be made to precisely predict the most suitable timing for an oil change. PdM techniques have been reported to result in a 900% increase in ROI, around a 30% decrease in maintenance costs, a 75% reduction in breakdowns, and a 45% drop in downtimes (Daily and Peterson 2017; Florian et al. 2021; Zhang et al. 2019).

Some authors define another category intermediate between PM and PdM, called condition-based maintenance (CBM) (van Staden and Boute 2021; Xu et al. 2021; Rath et al. 2022). In this category, sensors are also used to monitor the condition of the machines; however, failure predictions are not made. Instead, maintenance alarms are raised when the sensor signals hit certain preset thresholds. The main drawback of this method is that a significant amount of degradation must already have occurred before maintenance is performed.
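To make the CBM logic concrete, the following is a minimal, hedged sketch of a threshold-based alarm check in Python; the sensor names, threshold values, and the read_sensors() helper are hypothetical placeholders rather than values from any surveyed system.

```python
# A minimal sketch of a threshold-based CBM check. The sensor names,
# threshold values, and read_sensors() are hypothetical placeholders.

THRESHOLDS = {"vibration_rms_g": 2.5, "bearing_temp_c": 85.0}

def read_sensors():
    # Placeholder: in practice this would query the sensor hardware or data bus.
    return {"vibration_rms_g": 1.8, "bearing_temp_c": 91.2}

def cbm_check(readings, thresholds):
    """Return the names of signals that exceeded their preset thresholds."""
    return [name for name, value in readings.items()
            if value > thresholds.get(name, float("inf"))]

exceeded = cbm_check(read_sensors(), THRESHOLDS)
if exceeded:
    print("Maintenance alarm raised for:", ", ".join(exceeded))
```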

Prescriptive maintenance (RxM) takes predictive maintenance to the next level (Gordon and Pistikopoulos 2022; Tham and Sharma 2021; Momber et al. 2022). In RxM, not only are failure events estimated, but the system is also able to recommend action(s) to be taken and their corresponding consequences. For instance, when an engine is running with varying bearing temperature, PdM would be able to tell when the engine will probably fail given the temperature trend. In contrast, RxM would go further and tell us that if the engine speed is decreased to a certain level, the time before failure may be extended. Thus, while PdM can estimate the usage time of machines before failure, RxM allows us to know the effect of different operating conditions on the time to failure. The main driver of prescriptive maintenance is prescriptive analytics. This kind of analysis extends beyond predictions to exploring hypothetical events. Thus, prescriptive analytics can be regarded as a tool that uses mostly artificial intelligence techniques to provide multiple scenarios and simulations without them happening in real life (Meissner et al. 2021).

Table 1 Different maintenance approaches and some of their properties

In most of the literature, the field of detecting, diagnosing, and predicting faults in industrial machines is called prognostics and health management (PHM) (Che et al. 2019; Fan et al. 2019; Huang et al. 2019). Table 1 compares the four maintenance schemes on several characteristics. Prescriptive maintenance is not listed since it is an extension of predictive maintenance. The table shows the strengths and weaknesses of each method. For example, although PdM offers more ROI, it has a high initial deployment cost. One may wonder which scheme to adopt. The answer is subjective and depends on the asset under consideration. For simple machines, PdM may be overkill, while it may be the method of choice for sophisticated machines such as airplanes.

In data-based PHM, sensor data is collected from machines and sent to remote cloud servers for analysis. However, because of the time-critical nature of PHM, transmitting the data to the cloud introduces significant delays. Thus, modern methods have adopted edge computing, where the data analysis is done on devices closer to the sensors for a faster response. More details on cloud and edge computing are presented in Sect. 2.

1.1 Related reviews

Many researchers have provided surveys on the application of edge computing technology in maintenance. For example, Hafeez et al. (2021) reviewed works that use edge computing with machine learning for predictive maintenance; however, the papers reviewed were not restricted to maintenance applications. The review also discussed many data preprocessing and reduction techniques that can be performed on the edge. Moreover, the authors proposed a framework in which the edge could also conduct local model retraining and later merge its new model with that of the cloud, although no proof of concept was provided to demonstrate the feasibility of the architecture.

In a related review, Compare et al. (2019) presented practical challenges encountered in adopting predictive maintenance (PdM) in industry. The paper pointed out that PdM goes beyond the hardware and software used for tracking the health of machines; it involves all decisions taken from data collection to maintenance labor. They further highlighted procedures to consider before choosing the right maintenance strategy for machines. Moreover, the paper stated that it is essential that practitioners not only focus on collecting big data that fills up the memory of devices but also on fetching smart data (Alsharif et al. 2020). However, the paper did not discuss maintenance in the context of edge computing. Other review works that focus on AI with edge computing for the maintenance of industrial equipment include Chatterjee and Dethlefs (2021), Lu et al. (2023), Li et al. (2022), and Ucar et al. (2024). Table 2 shows a summary of these reviews and how they compare with this paper.

Table 2 Comparing the proposed work with existing reviews

To the best of our knowledge, this is one of the first reviews that focuses on works applying these three technologies, PHM, edge computing, and AI, together. Moreover, the paper details where most researchers perform the basic tasks of data preprocessing, model training, and final model deployment. The main contributions of the paper can be summarized as follows.

  • The explanation of maintenance techniques and their evolution over the years.

  • The review of recent works that employ edge computing with AI for machine maintenance.

  • The categorization of the works based on the location of the AI setup (model training).

  • Elaboration of design issues to consider before placing AI on edge devices.

  • Identification of trends and future research directions in AI-enabled edge computing for machine maintenance.

The rest of the paper is arranged as follows. Section 2 presents the edge computing paradigm. Section 3 presents a review of works that employ edge computing for machine maintenance. Section 4 discusses obstacles often encountered when placing AI on edge devices, along with trends and future research directions of the field. Finally, Sect. 5 concludes the paper.

2 IIoT and edge computing

This section describes the industrial internet of things (IIoT) and edge computing technology.

2.1 Industrial internet of things

The industrial internet of things (IIoT) originated from the internet of things (IoT), a term first used by Kevin Ashton in 1999 while describing his research work on RFID tags (Ashton 2009; Hazra et al. 2021; Kashani et al. 2021). IoT is an interconnection of mobile or stationary objects and devices embedded with sensor, communication, and actuator units. These additional modules make the devices “smart,” allowing them to function automatically with minimal human supervision (Sengupta et al. 2020).

Fig. 2 An IoT/IIoT architecture (Abosata et al. 2021)

Figure 2 shows a typical IoT/IIoT framework. It comprises the following four layers: perception, transport, processing, and application (Abosata et al. 2021).

The perception layer contains devices (such as cameras), actuators, and sensors. Sensors collect device data and may also track environmental factors like temperature, humidity, and pressure. The actuators, on the other hand, are responsible for converting digital or control signals from the higher layers into physical actions such as adjusting camera angles, switching devices on or off, or controlling flow valves.

The second layer is the transport layer, also called the communication layer. It transfers data collected from the perception layer to the processing layer, often via the Internet. Typical protocols adopted are WiFi, IEEE 802.15.4, NarrowBand-IoT, Bluetooth, LoRa, and 6LoWPAN.

The third layer is the processing layer. It performs data storage, data abstraction, and data analysis. The analysis involves extracting intelligent insights from data using powerful tools like machine learning.

Finally, the application layer decodes information from the processing layer and presents it in human-readable forms such as graphs and tables for end-user consumption. It also provides record-keeping and an interface for sending signals back to devices at the perception layer.

This signal is often generated as a result of processing the data from the previous layer and is usually used to activate actuators in the perception layer. As an illustration, if the processing layer detects that a device’s temperature is reaching critical levels, it can send a signal to an actuator, prompting the activation of cooling fans.

As later discussed, edge computing aims to minimize the distance traveled by data from the perception layer to the application layer and vice versa. This enables rapid decision-making, protecting human lives and avoiding multimillion-dollar losses caused by unexpected machine downtimes.

IoT has facilitated applications beyond our imagination. Today, farmers can remotely track the real-time condition of their fields from mobile devices. At their fingertips, they can monitor weather conditions and soil moisture, and even check whether their farms are about to be intruded upon by humans or birds (Mohamed et al. 2021; Rehman et al. 2022). Additionally, physicians can now access the health of home patients on a 24/7 basis, thus reducing the number of hospital visits (Kashani et al. 2021; Bharadwaj et al. 2021).

Another area that IoT has revolutionized is home automation (Stolojescu-Crisan et al. 2021). Here, home appliances are connected to the internet so that they can be remotely controlled and monitored. With the advent of IoT, it is now possible to effortlessly control light switches using smartphone buttons or voice commands. Additionally, smart thermostats allow us to adjust room temperatures and even provide energy usage reports. Furthermore, IoT-driven water sprinklers can efficiently water gardens at preset times, contributing to water conservation efforts (Islam et al. 2022).

When all these technologies are moved to industrial applications, another “I” is added to IoT, making it the industrial internet of things (IIoT) (Endres et al. 2019). IIoT is a connection of multiple industrial devices through a varied communication platform that enables the creation of a top-level system for collecting, monitoring, exchanging, analyzing, and providing valuable information. This system can aid industries in making faster and smarter business decisions (Al-Turjman and Alturjman 2018). Artificial intelligence (AI) tools (Ali 2018) can use the sensor data from IIoT to help in detecting, diagnosing, and predicting faults in machines; this is the backbone of modern PdM and CBM.

2.2 Edge computing technology

In PdM and CBM, there is often the need to analyze the sensor data collected from parts of the machine. This analysis is usually done on remote server stations, so the data must be transmitted either via cables or the internet. These long transmission paths can lead to challenges such as transmission of unwanted data, transmission delays, incorrect or incomplete data due to dropped packets, and privacy concerns. Due to these and other challenges, researchers have tried to perform data analysis and make decisions as close as possible to the machine without unnecessary transmission of data to remote stations. This technology is called edge computing. Edge computing is a distributed computing technology that brings applications closer to their sources of data, such as IoT devices and local edge servers (Cao et al. 2020; Khan et al. 2019; Satyanarayanan 2017). It earns its name “edge” from the fact that computing resources are now located at the edge of the network rather than in a remote core location as in cloud computing. This offers numerous advantages such as faster insights, shorter response times, better bandwidth availability, better battery life management, and enhanced data safety and privacy (Olaniyan et al. 2018; Cao et al. 2020; Dai et al. 2019).

Figure 3 compares edge computing with centralized computing. From the figure, it can be seen that in centralized computing, such as the cloud, all processing and analysis is done on the remote cloud, while in edge computing, the processing is moved to the proximity of the data sources. The advantages of edge computing are obvious on the upstream link, i.e., from IoT devices to the cloud. However, the reverse is also true: despite the enormous computing power of the cloud or datacenter, the edge can also offer certain services for downstream communication, i.e., from cloud to IoT devices, such as data caching, buffering, and routing of information to the right machine. In addition, Table 3 compares centralized computing to edge computing based on six parameters. From the table, it can be observed that there is no clear winner between the two. For example, while edge computing offers lower network latency, centralized computing may offer better security. The choice between the two will depend on the target application, and most large network systems combine the two architectures in a hybrid form to reap the benefits of each. For maintenance systems, however, the primary optimization goals revolve around maximizing system reliability, minimizing downtime, and optimizing overall performance. These objectives are critical in ensuring the smooth operation of the system and minimizing disruptions that may lead to inefficiencies. With these objectives in mind, it is clear that PHM systems require fast response times. Regarding response times (latency), Table 3 shows that edge computing outperforms centralized computing, which requires data to be sent to a remote server for processing before decisions can be made. Moreover, centralized computing often hits a stumbling block when plants are unwilling to share their data with the remote server for proprietary or privacy reasons. Thus, to achieve fast response times while ensuring data privacy, the architecture of choice for most systems is edge computing, where the central server works collaboratively with edge devices to provide fast and reliable inferences (Sriram 2022).

Fig. 3 Comparing edge computing architecture with typical centralized computing (Liu et al. 2019)

Table 3 Comparing centralized computing to edge computing

2.3 Notes on terminologies

Here, some common terminologies used in the literature are defined.

Fog computing: The term fog computing was coined by Cisco Systems Inc. (Bonomi et al. 2012; Naha et al. 2018; Yousefpour et al. 2019). It refers to an architecture in which a fog layer is sandwiched between the data sources and the cloud. This fog layer provides compute, networking, and storage for data, and it offers the same advantages outlined for edge computing. It is hard to make a distinction between fog computing and edge computing because they essentially mean the same thing; thus, most researchers use the terms interchangeably. This paper sticks with the name “edge computing” to mean both.

The cloud: In regular cloud computing, the cloud is often hosted in a remote data center, possibly on another continent, and clients subscribe to services. In this paper and in the context of IIoT, the term “cloud” can also mean the next hop above the edge layer having more compute resources. Thus, both the edge devices and the cloud devices could be in the same building or room.

3 Edge computing AI for machine maintenance

Figure 4 categorizes the edge computing-based maintenance research into three stages: data preprocessing, model training, and model deployment. The data preprocessing stage involves all steps executed to prepare the data for training, including data cleaning, denoising, normalization, standardization, feature engineering, and dimensionality reduction. Most surveyed papers carry out this operation on the edge, as shown by the thick arrow to the left of the figure, while only a few do it on the cloud (Dang et al. 2021). One reason for this is that these processes are not as computationally expensive as model training and can thus be performed on the edge.
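As an illustration of the kind of preprocessing typically pushed to the edge, the following Python sketch cleans, denoises, normalizes, and extracts a few summary features from one sensor window; the window length, smoothing width, and feature set are illustrative assumptions rather than the pipeline of any surveyed paper.

```python
# Illustrative edge-side preprocessing: cleaning, simple denoising,
# normalization, and compact feature extraction from one sensor window.
import numpy as np
from scipy.stats import kurtosis

def preprocess_window(raw, smooth=5):
    x = raw[~np.isnan(raw)]                                      # cleaning
    x = np.convolve(x, np.ones(smooth) / smooth, mode="valid")   # denoising
    return (x - x.mean()) / (x.std() + 1e-8)                     # normalization

def extract_features(x):
    # A compact feature vector can be sent onward instead of the raw signal.
    return np.array([
        np.sqrt(np.mean(x ** 2)),   # RMS energy
        np.max(np.abs(x)),          # peak amplitude
        kurtosis(x),                # impulsiveness indicator
    ])

window = np.random.randn(2048)          # stand-in for one vibration window
print(extract_features(preprocess_window(window)))
```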

Fig. 4 Framework for categorizing edge computing-based machine maintenance

Fig. 5 Classification of common models used

In stage 2, the model is trained. The prevalent models employed in this domain can be classified into two main categories, classical and deep learning methodologies, as illustrated in Fig. 5. Classical approaches include linear regression, random forest, and support vector machines. Deep learning approaches, on the other hand, consist of sophisticated models such as convolutional neural networks (CNN), long short-term memory (LSTM) networks, and gated recurrent units (GRU). Furthermore, some research papers propose hybrid techniques, integrating the strengths of two or more methods to achieve enhanced performance (Hsu et al. 2020).
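For readers unfamiliar with the deep learning branch of Fig. 5, the sketch below shows a minimal 1D-CNN fault classifier in PyTorch; the layer sizes and window length are illustrative assumptions and are not taken from any surveyed paper.

```python
# A minimal 1D-CNN fault classifier; layer sizes and window length are
# illustrative assumptions, not taken from any surveyed paper.
import torch
import torch.nn as nn

class FaultCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8, padding=28), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8),
        )
        self.classifier = nn.Linear(32 * 8, n_classes)

    def forward(self, x):                    # x: (batch, 1, window_length)
        return self.classifier(self.features(x).flatten(1))

model = FaultCNN()
logits = model(torch.randn(4, 1, 2048))      # four dummy sensor windows
print(logits.shape)                          # torch.Size([4, 2])
```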

Most researchers perform model training on the cloud because of its computational power (Wu et al. 2017). However, a few papers chose to perform model training on the edge using methods like federated learning (Imteaj et al. 2021; Yang et al. 2022).

Finally, in stage three, the model is deployed to perform the fault identification or prediction tasks. From the works surveyed, this process is mostly executed on the edge, partly because the inference is often required to be as fast as possible, so the model is moved closer to the data source. The whole process is often not static: as the machine operates, the stages are often repeated and the model is retrained to match the machine's evolution or changes in its environmental conditions.

The selection criteria for the papers are as follows. Scientific databases such as Google Scholar and IEEE Xplore were searched with a combination of the words “edge computing,” “prediction,” “fault detection,” “fog computing,” “predictive maintenance,” and “condition-based maintenance.” A further filtering process was performed, where the papers' abstracts were read to make sure they incorporated all three technologies in their methodologies, i.e., edge computing, PHM, and artificial intelligence. All papers that did not meet this requirement were left out.

Moreover, to review the selected research works, they are categorized into three groups: works that perform model training on the cloud or remote server, works that perform the training on the edge layer, and those that perform the training on both the edge and the cloud.

3.1 Model training on the cloud

For most works in this category, the data is collected via the edge devices and sent to the remote server for training. After the model is trained, it is deployed on the edge devices for inference.

For instance, Park et al. (2018) designed an edge computing-based maintenance technique termed the lightweight real-time fault detection system (LiReD). It consists of two parts, a frontend and a backend. In the beginning, the frontend edge device (a Raspberry Pi) takes data from machine sensors and transmits it to the backend. When enough data has been sent, the backend trains an LSTM to perform binary fault classification. Subsequently, the trained model from the backend is sent to the edge device at the frontend. Thus, the edge device can classify new sensor data without sending it to the backend. The method was tested on an industrial robot manipulator, and the LSTM was compared with other baseline techniques.
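The general frontend/backend pattern used by LiReD-style systems can be sketched as follows; this is an illustrative outline rather than the authors' implementation, and the stand-in model and read_window() helper are hypothetical.

```python
# Outline of the frontend inference step: the edge device applies the model
# shipped from the backend to each new sensor window. The stand-in model and
# read_window() are hypothetical; a real deployment would load the trained
# model, e.g. with torch.jit.load("model_from_backend.pt").
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(2048, 2))   # stand-in classifier
model.eval()

def read_window():
    # Placeholder for reading one buffered sensor window from the data bus.
    return torch.randn(1, 2048)

with torch.no_grad():
    pred = model(read_window()).argmax(dim=1).item()
    if pred == 1:                          # assumed label convention: 1 == fault
        print("Fault detected locally; notifying the backend.")
```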

In Zhang et al. (2020), the cloud trains a support vector machine (SVM) to perform condition monitoring of bearings from their vibration data, and the trained model is deployed on a Raspberry Pi Model 4B. Similarly, Wu et al. (2017) is one of the well-known works that developed an edge computing maintenance strategy. The framework emphasizes the advantages of having a hybrid of a private cloud at the edge and a public cloud in terms of data privacy and latency reduction. The model training is conducted in the cloud, and the model is deployed on a BeagleBone Black edge device. They tested the method on health prediction of pumps in a power plant and of a CNC cutting tool. The prediction models used are the random forest and the Predix model of General Electric; however, the paper does not provide detail on the random forest used or on the model contained within the Predix software. The research also provides recommendations on the communication protocols to be used at every interface of the framework.

In a compelling study, Ren et al. (2020) put forward a cloud-edge-based RUL prediction method for IIoT applications. It consists of two AI models, one at the edge and another at the cloud. Both models are improved, lightweight versions of the temporal convolutional network (TCN) (Bai et al. 2018). The developed lightweight TCN (LTCN) had fewer parameters and shorter training times. At the edge, the LTCN takes the preprocessed sequence data at a single sampling time and gives a rapid prediction, while in the cloud, the model takes the compressed features extracted by the edge LTCN at multiple sampling times and produces a more precise and smooth prediction. After enough data has been collected (the amount is not specified in the paper), an incremental learning system situated within the cloud retrains and updates selected parameters of both LTCNs. This can be useful in avoiding data drift (Mallick et al. 2022). They tested the method on the RUL prediction of roller bearings and compared it to the LSTM (Zheng et al. 2017) and GRU (Cho et al. 2014). The results showed that their technique gave faster and more accurate predictions.

Additionally, Zhang and Ji (2020) developed another technique for machine maintenance. It relies on energy consumption data to detect anomalies in machine operation. Thus, at the physical layer, energy meters are installed to collect energy consumption data. This data is then preprocessed on the edge. Subsequently, it is sent to the cloud where an LSTM classifier is trained and later deployed on the edge. The method was applied for fault detection in a milling machine and the LSTM was compared with other models.

In Panicucci et al. (2020), an end-to-end edge computing-based predictive maintenance system was proposed. In the work, the RUL predictor is trained in a Docker container within the cloud and later deployed on the edge (Alam et al. 2018). Since the model is within a container, the RUL prediction can be carried out either on the edge or the cloud. The model options are decision trees, random forest, and gradient boosted trees. Additionally, the system has a self-assessment module to retrain the model after data drift has been detected. They tested the method for the RUL prediction of a robotic arm as a proof of concept.

In related work, Hsu et al. (2020) proposed a method that uses a Raspberry Pi and an NVIDIA GeForce GTX 1080 Ti GPU at the edge for remaining useful life (RUL) prediction of an aircraft engine. They employ machine learning methods such as the convolutional neural network (CNN), long short-term memory (LSTM), and gated recurrent unit (GRU) for the RUL prediction. They state that the final RUL prediction results are sent to a MongoDB cloud. However, the paper falls short in that it does not elaborate on how the Raspberry Pi interacts with the GPU, nor does it state on which edge device the algorithms are run.

Liang et al. (2019) also proposed a three-layered maintenance method powered by fog/edge computing. The terminal layer consists of the sensors and analog-to-digital converters (ADCs) attached to the physical machine. The data collected from the terminal layer is then sent to the fog layer, which performs data preprocessing and fault identification based on a deployed CNN model. Finally, the cloud layer, given its high computational ability, is responsible for training the CNN. Moreover, if a fault is identified at the fog layer, signals are sent to the cloud for further actions. They tested the method on a CNC machine with encouraging results.

Furthermore, Huang et al. (2021) developed an edge AI-based predictive maintenance method. It follows a similar trend to other works, where the cloud does the model training and the edge carries out preprocessing and real-time fault prediction based on the deployed model. They used the gradient-boosted decision tree (GBDT) model with a Raspberry Pi 3B+ as the main edge device. The interesting part of the research is that as the sensor signals evolve with machine usage, the trained model in the cloud is also retrained. They applied the technique to a lithium bromide absorption chiller, which is the main unit of the central air conditioning systems used in most commercial buildings. The method was able to identify a fault one day before a human technician did.

Yu et al. (2022) proposed a three-layered framework for edge computing-based predictive maintenance, consisting of the edge layer, the application layer, and the cloud layer. As in most other works, the edge layer uses a pre-trained model (trained in the cloud layer) for the prediction. Moreover, the edge layer also performs feature engineering and preprocessing of sensor data, so only a summarized version of the data or the prediction results is sent to the cloud layer. As a result, the edge layer is able to make real-time predictions without needing to send data to the cloud. The application layer, on the other hand, is responsible for showing fault signatures via a dashboard. The deployed model is an autoencoder, and the technique was tested on predicting faults of a reciprocating compressor.

In Gültekin et al. (2022), another maintenance technique was developed. In the method, the model is trained offline and deployed on an NVIDIA Jetson TX2 GPU edge device, which then performs the preprocessing and inference in real time. The model they used is the LeNet-5 CNN fed with the short-time Fourier transform (STFT) (Wan et al. 2020). The approach was used for condition monitoring of an autonomous transfer vehicle, achieving a 43-fold reduction in bandwidth requirements as well as a 37-fold reduction in latency. One downside of the method is that the model is not automatically updated with new data.
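As a hedged illustration of the STFT preprocessing step that feeds a CNN such as LeNet-5, the sketch below converts a raw vibration window into a log-magnitude time-frequency map; the sampling rate and STFT parameters are assumptions, not the authors' settings.

```python
# Turning a raw vibration window into a log-magnitude STFT map that a small
# CNN could classify. The sampling rate and STFT parameters are assumptions.
import numpy as np
from scipy.signal import stft

fs = 10_000                                  # assumed sampling rate (Hz)
signal = np.random.randn(fs)                 # one second of dummy vibration data

f, t, Zxx = stft(signal, fs=fs, nperseg=256)
spectrogram = np.log1p(np.abs(Zxx))          # log-magnitude time-frequency map
print(spectrogram.shape)                     # (frequency bins, time frames)
```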

A similar maintenance strategy was developed by Huang et al. (2022). The method involves cloud and edge collaboration: like other techniques, model training is done on the cloud and the model is then deployed at the edge. Additionally, the deployed edge model was able to identify both faults and working conditions. The authors stressed that since data is time varying, the model should not be static. Thus, two approaches to model updating were outlined. The first is time-triggered, where the model is updated after a certain period has elapsed or a fixed amount of data has been collected. The second is event-triggered, where the model is updated after a specified event has occurred. Due to the difficulty of identifying the right moment for a model update in the time-triggered method, they employed the event-triggered method: when the threshold is hit at the edge, a model update trigger is sent to the cloud. The model used was dictionary learning (Garcia-Cardona and Wohlberg 2018), and it was tested on numerical data as well as on an industrial boiling roaster, with high accuracy obtained when compared to other techniques.
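A minimal sketch of the event-triggered update idea is shown below; the drift score, threshold value, and request_cloud_retraining() function are hypothetical stand-ins, not the dictionary-learning criterion used by the authors.

```python
# Sketch of an event-triggered model update: the edge tracks a simple drift
# score and only asks the cloud to retrain when a preset threshold is crossed.
# The drift proxy, threshold, and request_cloud_retraining() are hypothetical.
import numpy as np

DRIFT_THRESHOLD = 0.3                         # illustrative value

def drift_score(reference, recent):
    # Crude proxy: shift of the recent feature mean in reference-std units.
    return float(abs(recent.mean() - reference.mean()) / (reference.std() + 1e-8))

def request_cloud_retraining():
    print("Trigger sent to cloud: model update requested.")

reference_features = np.random.randn(5000)
recent_features = np.random.randn(500) + 0.5  # pretend the data has drifted

if drift_score(reference_features, recent_features) > DRIFT_THRESHOLD:
    request_cloud_retraining()
```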

Ren et al. (2022), on the other hand, proposed a cloud-edge collaborative method called the label-split multiple-inputs convolutional neural network (LM-CNN). The paper highlighted that the scarcity and imbalance of historical data are a bottleneck for most fault detection models. Thus, the research first developed an approach that enlarges the sample space of the available data and then performs data augmentation. This gave them ample data, which they then used to train ResNet-18 and ResNet-34 networks. The sample space enlargement, data augmentation, and training are carried out in the cloud, while the trained model is deployed on the edge. They tested the technique for fault identification in bearings and obtained competitive results.

Similarly, Mourtzis et al. (2022) developed a framework for integrating predictive maintenance and edge computing. To leverage the advantages of the edge, Raspberry Pi nodes are installed close to the sensors. The nodes perform preprocessing of the collected data and use a support vector machine (SVM) to check for the existence of a fault. If there is no fault, the digital twin in the cloud remains idle. However, if a fault is detected by an edge node, that node's data is sent to the cloud to determine the remaining useful life (RUL) of the equipment. They tested the framework on a set of refrigerators with interesting results.

3.2 Model training on the edge

As discussed in the previous section, the best place for model training or any major processing is the centralized cloud, which has an enormous amount of compute resources. However, in certain situations practitioners are almost forced to train on the edge of the network. One of the reasons for this is privacy: due to proprietary and privacy laws, some edge devices may not be able to share their data with the cloud for model training. To solve this issue, technologies like federated learning (FL) (Yang et al. 2019; Nguyen et al. 2021; Khan et al. 2021) are used. In FL (see Fig. 6), the model is trained on the edge devices. Rather than sharing data, the local models share training parameters, and after training, each edge device has the final trained model.

Fig. 6 A typical federated learning framework. Due to privacy, local data is not sent to the server. Local clients perform training on their data and share only model parameters via the server (Zhang and Li 2021)

Federated learning and edge computing are two distinct technological concepts with different motivating factors. Edge computing was primarily designed to address latency issues by pushing data processing closer to the source, thereby reducing response times for time-sensitive applications. On the other hand, federated learning aims to tackle data privacy concerns by enabling collaborative model training while keeping individual data decentralized and secure. Nevertheless, the two technologies can complement each other harmoniously, as federated learning can leverage the computational capabilities of edge computing to optimize its efficiency, achieving its goal of data privacy while benefiting from reduced communication overhead and low latency processing.

Some of the works that apply FL for machine maintenance include that of Qolomany et al. (2020). In this work, they developed a particle swarm optimization (PSO) (Elbes et al. 2019) technique to optimize the selection of hyperparameters of an LSTM network within the edge devices. They tested the method on the condition-based maintenance of 100 machines and compared the PSO method with other hyperparameter selection techniques. They found that their method was competitive and had less communication overhead.

In related work, Zhang et al. (2020) developed a blockchain-based FL method for fault detection. The technique uses blockchain to improve the integrity of client data. Moreover, each client is rewarded with an incentive for participating in training the global model of the central server. Additionally, to mitigate the effect of data heterogeneity, they developed a new aggregation procedure based on the distance between positive and negative classes of the client datasets. They tested the method on fault detection in air conditioning units and obtained interesting results.

Similarly, Zhang and Li (2021) developed an FL-based machine fault diagnostic method. Local edge devices are used to train the model; however, in each training cycle, a central server coordinates and receives local model parameters from each edge node. The server then uses a federated aggregation procedure to update the global model and send it back to the edge nodes for the next training epoch. Thus, the edge devices share model parameters but not data. At the end of training, each node has the updated global model. The method was tested on bearing fault data using a CNN, and the results proved to be competitive.
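The aggregation step at the heart of such FL schemes can be sketched as a weighted average of client parameters, as in the classic FedAvg algorithm; the snippet below is a simplified illustration using PyTorch state_dicts and is not the exact procedure of any surveyed paper.

```python
# Simplified FedAvg-style aggregation: the server averages client parameters,
# weighted by local dataset sizes, without ever seeing the client data.
import torch

def federated_average(client_states, client_sizes):
    """Weighted average of client state_dicts (weights = local sample counts)."""
    total = float(sum(client_sizes))
    return {
        key: sum(state[key].float() * (n / total)
                 for state, n in zip(client_states, client_sizes))
        for key in client_states[0]
    }

# Toy check with two "clients" holding a single weight tensor each:
a = {"w": torch.ones(2, 2)}
b = {"w": torch.zeros(2, 2)}
print(federated_average([a, b], [3, 1])["w"])   # 0.75 everywhere
# In practice: global_model.load_state_dict(federated_average(states, sizes))
# before broadcasting the global model back to the edge clients.
```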

In a similar study, Zhang and Li (2022) proposed a federated learning-based industrial fault diagnostic method. They state that traditional federated learning assumes that the data from different clients in the federation are independent and identically distributed (IID), meaning that client data are obtained from the same machine operating in the same condition. They highlighted that this is not the case in practice, and an FL model built on this assumption will not produce accurate predictions due to this domain-shift problem. Moreover, conventional transfer learning solutions to domain shift may not work because they rely on data availability in all domains (clients), which can be practically challenging and defeats FL's data privacy aims. Thus, they introduced a new method to solve this problem, which uses prior distributions to help in the training. This prior distribution is simulated from client data and used in the domain adaptation rather than the raw client data. The ML model used was a basic CNN, and they tested the method on two machinery fault diagnostic tasks to prove its power. The only downside of the paper is that the basic CNN was not compared with other ML models.

In Yang et al. (2021), a federated transfer learning method based on averaging shared layers (FTL-ASL) was developed. The technique aimed to solve the data imbalance and domain-shift problems. Clients in the federation have both shared and personalized parameters. Initially, a related large public dataset is used to train the CNN model within the server. The pre-trained model parameters obtained are then sent to each client, so client training does not start from scratch. In each federated learning iteration, the local clients update both the shared and their personal parameters using local data. Next, the clients send only the updated shared parameters back to the server for aggregation. The process keeps repeating until the global model has converged. The technique was tested on bearing data to prove its effectiveness.

In addition, Chen et al. (2022) explained that one of the shortcomings of the federated averaging (FedAvg) algorithm is that it assumes that clients in the federation have equal contributions to the global model. However, due to domain-shift explained earlier, some clients are more important than others in most practical fault diagnostic tasks. Hence, they developed a new FL aggregation method called discrepancy-based weighted federated averaging (D-WFA). The averaging method gives larger weights to clients contributing more to the global model. The weights assigned are based on the maximum mean discrepancy (MMD) between the source and target clients. The AI model used was a feature-aligned multiscale convolutional neural network (MSCNN-FA), and it was tested on a bearing data set. The technique was found to outperform other existing aggregation methods.

Furthermore, Wang et al. (2022) developed a federated transfer learning (FTL) technique for the intelligent diagnosis of insulation defects in gas-insulated switchgear. The FTL technique involves adversarial learning that attains domain adaptation while maintaining the data privacy of clients within the federation. Additionally, they developed a novel aggregation algorithm called federated minimax (FedMM) that minimizes gradient drift due to data imbalance while improving the global model's accuracy. The model used is a four-layered second-order attention CNN. The method outperformed other FL techniques.

In related work, Li et al. (2022) developed a technique called clustering federated learning (CFL) to diagnose industrial bearing faults. The method uses a self-attention network rather than a CNN. The self-attention model offers the advantage of directly extracting both global and local features within the data, thus obtaining better generalization than a CNN, which relies only on local features. In addition, to improve performance, K-means unsupervised learning is used to cluster client features before they are aggregated at the server; the aggregation is then carried out within each cluster. The CFL was found to be better than other aggregation methods and other models such as the LSTM.

In a recent work, Li et al. (2023) proposed a new CNN-based federated transfer learning method for detecting faults in rotating machines. The approach involves two stages. In the source training stage, they generated fake data through data time-stretching. The fake data creates additional health states (labels) at the source clients to reduce decision boundaries and thus help in the subsequent domain adaptation. At the target adaptation phase, they created a prediction alignment technique that achieves knowledge transfer without requiring the source data. Moreover, they introduced an instance-level consensus scheme that introduces noise to the target data to avoid over-fitting.

In Chen et al. (2023), a new bearing RUL prediction approach based on federated learning (FL) and Taylor-expansion network pruning is proposed. The model used is the multiscale convolutional neural network with a longish full connection in the first layer (LFMCNN). The model has three units: the multiscale feature augmentation module (MFAM), the deep feature extraction module (DFEM), and the prediction module (PM). The MFAM is utilized to expand shallow features from the data stored in each client. Subsequently, the Taylor-expansion pruning criterion is used by the DFEM to delete unnecessary network nodes. In addition, each client uses its local data to reconstruct the pruned model. Next, the server uses the federated averaging (FedAvg) technique to aggregate all rebirth models into a new global model. Network pruning and rebirth occur alternatively during the model training phase to develop a compact structure. The experimental results show that the proposed strategy offers a promising solution to prognostic difficulties in data privacy contexts. Moreover, when the model was deployed on an embedded Raspberry Pi board, it produced short inference times, proving its industrial applicability.

Additionally, Wang et al. (2023) developed another CNN-based federated transfer learning approach for fault detection in rotating machines. The method consists of three parts. First, a filter is used at the target clients to remove low-quality knowledge data from the training. Secondly, they employ batch normalized maximum mean discrepancy (BN-MMD)-based loss function during training at the target clients to reduce the domain gap between the source and target. Finally, instead of the often-used FedAvg aggregation at the central server, they used an adaptive aggregation process that assigns client weights based on their contributions. They tested the method on three benchmark datasets, and the results obtained showed its superiority.

Du et al. (2023) proposed a transformer-based RUL predictor and tested it on the remaining useful life prediction of aircraft engines. They also developed an FL version of the technique in which client data privacy is maintained. Instead of using the full structure of the transformer model, only the encoder part is used to make it lightweight. Furthermore, they employed Bayesian optimization to select the hyperparameters of the encoder. The FL version was deployed on four NVIDIA Jetson Nano B01 boards. While the method outperformed many other RUL techniques, the drawback is that, in the FL version, the model starts with hyperparameters optimized on combined client data, which can lead to privacy violations.

Li and Zhao (2023) developed a zero-shot learning-based federated learning for fault diagnostics. They explain that most existing transfer learning-based FL methods assume a closed fault set, i.e., faults unavailable in one client can be found in others. This is not always true in practice, as some faults are rare, and others never exist before inference. Zero-shot learning (Feng and Zhao 2020) solves this issue by using data from seen faults and their semantic description to learn characteristics of unseen faults. The model used is the variational autoencoder (VAE). The results of testing the approach on thermal power plants proved its efficiency.

Other non-FL methods that experiment with training on the edge layer include Natesha and Guddeti (2021), which proposed an edge computing condition-based maintenance procedure. In the method, machine operation sounds are collected using audio sensors and sent to the edge to train a machine learning binary classifier that categorizes the sounds into normal or abnormal conditions. The machine learning methods used are random forest, multilayer perceptron (MLP), logistic regression, AdaBoost, and support vector machine (SVM). Both training and deployment are performed on a fog server with an Intel Core i5 processor. They tested the technique on fault identification in a pump, a valve, and a fan, and found that on average the AdaBoost and MLP were the best. Moreover, they compared the method with a cloud-based architecture, where the training and model deployment are done in the remote cloud, and found that their edge-based method had a much faster response time.
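A generic sketch of this fog-side classical-ML setup is given below: an AdaBoost binary classifier is trained on feature vectors extracted from sound windows. The random features and labels are stand-ins for real acoustic features, and the pipeline is illustrative rather than the authors' exact configuration.

```python
# Generic fog-side classical-ML setup: an AdaBoost binary classifier trained
# on feature vectors extracted from sound windows. The random features and
# labels are stand-ins for real acoustic features.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 12))           # e.g., spectral features per window
y = rng.integers(0, 2, size=400)         # 0 = normal, 1 = abnormal (dummy labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```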

Similarly, Dang et al. (2021) developed a method called cloud-based digital twin for structural health monitoring (cDTSHM). To develop the digital twin of the physical structure in the cloud, they used an ensemble of a mathematical model, a finite element model (FEM), and deep learning (DL). The DL methods used were ResNet-34 (Gao et al. 2021) and a 1D-CNN. Both DL models rely on collected data from sensors and synthetic data generated from the FEM. Moreover, the paper highlighted that data preprocessing and training are conducted on the edge and the model is then deployed in the cloud. This contrasts with most surveyed works, which do preprocessing and inference on the edge and model training in the cloud. Furthermore, they incorporated a web-based application for visualization. The method was demonstrated for monitoring the health of an experimental bridge model and a real bridge with an accuracy of 92%. One downside of the research is that it does not specify the type of edge device used.

Another reason to opt for training AI on the edge is when the model employed is lightweight and does not require heavy computation. For example, Oyekanlu (2017) developed an edge computing-based condition monitoring system for industrial internet of things (IIoT) applications. The system employs a lightweight SQLite database at the network's edge to collect sensor signals from machines. In the beginning, the SQLite database is populated with healthy machine data. After deployment, any incoming data to the edge is compared to the reference healthy data using magnitude and frequency analysis statistical tools. If the difference between the two signals hits certain preset thresholds, an alarm is sent to the cloud. The paper tested the method on electric motors, and a significant reduction in bandwidth usage was obtained. Similarly, in Huo et al. (2019), an edge computing-based power distribution fault detection method is proposed; the method uses a wavelet transform on the edge of the network to detect faults. Additionally, Short and Twiddle (2019) proposed an edge computing-based condition monitoring and fault detection method for pumps. The condition monitoring and fault detection are implemented on a microcontroller edge device which hosts a digital twin of the system. Test results show that the method is robust and inexpensive.

3.3 Model training on both cloud and edge

Here, some works that do not fit into the classification above are presented. For example, in some new works, model training occurs on both the edge and cloud layers.

For instance, in Jing et al. (2022), a cloud-edge collaborative framework for remaining useful life prediction was proposed. The method consists of two distinct blueprint separable convolutional neural networks (BSCNNs) (Haase and Amthor 2020), one at the edge and another at the cloud, which cooperate for enhanced prediction. The edge BSCNN is a shallow network designed for fast predictions, while the cloud BSCNN is a deep network that provides more accurate results. To collaborate, they share lower-layer parameters. At first, both models are initialized, and the training data is processed (at the edge) and saved in the cloud layer. This data is used to train the cloud BSCNN, and its trained lower-layer parameters are then copied to the lower layers of the edge BSCNN. While keeping the lower layers fixed, the edge model then trains its deeper layers using the training data from the cloud. For inference, raw data obtained from the machine is preprocessed and denoised at the edge layer and then fed to the edge BSCNN for rapid RUL prediction. The same processed data is also sent to the cloud model for RUL prediction, and both the data and the prediction results are stored. Since it is common for models to get stale, partly due to data drift, the edge model needs to be updated after a while. After a certain amount of real-time data has been collected, the cloud BSCNN again shares its lower-layer parameters, stored prediction data, and saved processed data with the edge model for retraining (updating). As in the initial stage, this update only affects the deeper layers of the edge BSCNN. With an edge server hosting an Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz, 8 GB RAM, and an NVIDIA Quadro P400 GPU, the framework was tested on the RUL prediction of turbofan engines (Bala et al. 2020) and compared with other methods (Li et al. 2018; Pillai and Vadakkepat 2021). The proposed technique produced faster and more accurate RUL predictions.
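The lower-layer sharing idea can be sketched as copying the cloud model's lower-layer weights into the edge model, freezing them, and fine-tuning only the edge model's deeper layers; the toy two-block models below are illustrative assumptions, not the BSCNN architecture itself.

```python
# Sketch of lower-layer sharing: copy the cloud model's lower-layer weights
# into the edge model, freeze them, and train only the edge model's deeper
# layers. The toy two-block models are illustrative, not the BSCNN itself.
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(
        nn.Sequential(nn.Linear(32, 64), nn.ReLU()),   # block 0: lower layers
        nn.Sequential(nn.Linear(64, 1)),               # block 1: deeper layers
    )

cloud_model, edge_model = make_model(), make_model()

# 1. Transfer the (trained) lower-layer parameters from cloud to edge.
edge_model[0].load_state_dict(cloud_model[0].state_dict())

# 2. Freeze the shared lower layers on the edge.
for p in edge_model[0].parameters():
    p.requires_grad = False

# 3. Optimize only the deeper edge layers during the edge-side training/update.
optimizer = torch.optim.Adam(
    (p for p in edge_model.parameters() if p.requires_grad), lr=1e-3
)
```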

Similarly, the first federated learning-based RUL prediction method was developed by Guo et al. (2022). In the technique, several edge clients are used to train a global encoder and an RUL predictor without sharing data. Each edge device contains a convolutional autoencoder (CAE) (Chen et al. 2017) with an encoder and a decoder, while the cloud server contains a similar encoder and the RUL predictor. At the beginning of a training epoch, each edge CAE is trained on its local data. The trained encoders are then sent to the cloud server for aggregation into the global encoder. The aggregation is weight-based, with the weights depending on the performance of each encoder on validation data within the cloud. Subsequently, the global encoder is sent to each edge client to extract low-level features from their data. These features and their labels are then uploaded to the cloud and used to train the global RUL predictor; thus, training of the RUL predictor occurs on the server. The process is repeated until the global encoder and RUL predictor are fully trained. Finally, the global encoder and RUL predictor are deployed to each client. For inference, the global encoder first extracts low-level features and sends them to the trained RUL predictor for prediction. They tested the method on the RUL prediction of a milling cutter and on a bearing dataset. The only drawback of the approach is that by sharing their labels with the server during training, the privacy conditions of FL may be breached.

In addition, Yu et al. (2023) developed a new federated learning-based fault diagnostic model using convolutional autoencoders (CAEs). They stated that in other existing FL methods, all the training occurs in the clients, while the server is only responsible for aggregation; this approach fails to utilize the computational capacity of the server. Hence, in their technique, a global fault diagnosis classifier (GFDC) is trained within the server. In summary, the method is as follows. First, the CAEs at the clients train on local data. Then, client parameters are sent to the server for aggregation using an adaptive weighting method similar to Zhang and Li (2021). The aggregated CAE is then downloaded to each client and again fed with local data for feature extraction. These features and their labels are then uploaded to the server to train the GFDC. This entire process is repeated until the GFDC produces acceptable predictions on test data, after which the GFDC and the global CAE are deployed to the clients. Not only does their method utilize the server's computational ability, it also limits bandwidth usage, since the large GFDC does not need to be downloaded to each client at each training epoch. They tested the method on two bearing datasets to prove its effectiveness. The main loophole of the technique is that the features and labels sent from the clients to the server could be used to recreate client data, defeating FL's data privacy aims; however, with encryption, this issue may be solved.

3.4 Summary

Table 4 summarizes the works surveyed in this paper. It gives the type of maintenance implemented, the edge device used, and the algorithm employed, as well as the application the method was tested on. An “x” is placed in a table cell where the researchers did not specify the entry. From the table, it can be observed that most researchers perform condition-based maintenance, and the most used edge device is the Raspberry Pi. Moreover, the most prevalent model used is the CNN.

Table 4 Research papers that employ edge computing for maintenance

4 Trends, challenges, and future research directions

This section analyzes the patterns observed in the surveyed research papers. It also delves into the prominent challenges of deploying AI on edge devices. Furthermore, it highlights research areas that remain underexplored or have not received comprehensive investigation.

4.1 Trends

From the reviewed papers above, the following patterns can be deduced.

  1. First, most papers perform data preprocessing on edge devices. This is because most tasks like data cleaning, data reduction, and feature extraction are not computationally expensive and thus can be performed on the edge.

  2. Secondly, most researchers perform model training on the remote cloud. The reason is that training, especially when it involves deep learning, requires heavy computation that is better handled by the high-performance machines in the cloud datacenter.

  3. The trained models are mostly deployed on the edge to make inference as swift as possible.

  4. Another trend discovered is that most of the works tested their methods on existing benchmark datasets. This can be a double-edged sword. On one hand, it shows that their algorithms are able to perform well on difficult benchmark datasets. On the other hand, it does not explain how their techniques would be applied to machinery with little or no available data, or how they would handle concept and data drift.

4.2 Challenges of placing AI models on edge devices

This section highlights the challenges and adjustments needed before machine learning models are successfully deployed on edge devices.

  1. Limited computing resources: Edge devices, like IoT devices or smartphones, typically have limited processing power, memory, and storage. Thus, most AI models need to be optimized and lightweight to run efficiently in such resource-constrained environments (Zhu et al. 2020). This becomes a huge challenge since most existing models were developed with high-performance computers, such as cloud servers, in mind. Hence, many researchers have proposed lightweight versions of these models that can be placed on edge devices (a minimal sketch of two such adaptations is given after this list). For instance, Nikouei et al. (2018) leverage the depthwise separable convolutional network to develop a lightweight CNN algorithm for real-time human detection in video frames. Other works that propose lightweight models for edge devices include Almeida et al. (2022) and Huang et al. (2020).

  2. Power conservation: Power is another scarce resource within edge devices, especially when they are battery-powered. Thus, to conserve power, an appropriate model architecture and training algorithm should be selected. Other energy-saving approaches include quantization, where the model weights and activations are represented in lower-precision formats (Coelho et al. 2021), sparse computation, pruning of model connections (Li et al. 2020), and model distillation (Jang et al. 2020). In federated learning, methods that can preserve energy include reducing the communication frequency between clients and the central server, selective participation, offloading heavy computations to the server, and hardware optimization. In some cases, researchers may consider harvesting energy from the environment (for example, using solar cells) to charge the batteries of end devices (Shen et al. 2022).

  3. Data privacy and security: Since edge devices often process sensitive data, protecting user information and ensuring data integrity is paramount. Encryption, access control, and authentication mechanisms should therefore be employed, while secure communication protocols and data minimization techniques can further safeguard privacy. Techniques such as federated learning and anonymization can also preserve user privacy during model training and data processing. Finally, ethical considerations and transparency in data usage should not be overlooked, as they foster trust among users.

  4. Offline capability: This refers to the ability of AI models deployed on edge devices to operate without continuous internet connectivity. Such models perform computation and decision-making locally on the device rather than relying on cloud services, and they are pre-trained so that they do not require constant access to new data. Offline functionality can be enhanced through caching and local computation, together with optimizations such as model quantization and compression that reduce data and model size. The offline model can then be synchronized with cloud services during periodic internet access.

  5. Edge device heterogeneity: When deploying machine learning models at the edge, it is crucial to consider the diversity of hardware, processing capabilities, memory, and connectivity options among the devices in an edge computing network. These differences affect the feasibility and efficiency of deployment: some devices have limited processing power and memory, necessitating model optimization and quantization, while others have custom hardware accelerators that require specialized model adaptations. One solution is adaptive model selection (Marco et al. 2020; Taylor et al. 2018), in which multiple model variants are developed for different categories of edge devices according to their processing power and memory. Each device’s resource profile is assessed during deployment, and the most appropriate variant is selected to optimize resource efficiency, reduce latency, and maintain accuracy. This approach ensures that models match the capabilities of each device, improving performance, scalability, and adaptability as new edge devices are introduced or existing ones evolve.
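
To ground the first two challenges, the sketch below combines two of the techniques mentioned above: a depthwise separable 1-D convolutional network (in the spirit of Nikouei et al. 2018, though applied here to vibration windows rather than video frames) and post-training dynamic quantization in PyTorch. The architecture, layer sizes, and number of fault classes are illustrative assumptions, not a reproduction of any surveyed model.

```python
# Minimal sketch (not from any surveyed paper): a depthwise separable 1-D CNN
# for vibration-based fault classification, followed by post-training dynamic
# quantization of its linear head for cheaper CPU inference on the edge.
import torch
import torch.nn as nn

class DepthwiseSeparableConv1d(nn.Module):
    """Depthwise conv followed by a pointwise conv -- far fewer parameters
    than a standard convolution with the same receptive field."""
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        self.depthwise = nn.Conv1d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv1d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class TinyFaultClassifier(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            DepthwiseSeparableConv1d(1, 16, 9), nn.ReLU(), nn.MaxPool1d(4),
            DepthwiseSeparableConv1d(16, 32, 9), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

if __name__ == "__main__":
    model = TinyFaultClassifier().eval()
    # Dynamic quantization stores the weights of the linear head as int8,
    # shrinking the model and speeding up CPU inference.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8)
    dummy = torch.randn(1, 1, 2048)   # one window of vibration samples
    print(quantized(dummy).shape)     # torch.Size([1, 4])
```

Dynamic quantization here only affects the linear layer; for the convolutional layers, static quantization or pruning would typically be applied in addition if further compression is needed.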

4.3 Future research directions

The following are interesting challenges that few researchers have delved into.

  1. Data availability: One major issue with prognostic health management (PHM), a family of data-driven methods used to monitor the health of machines, remains data availability. Although there is a plethora of benchmark datasets for testing new techniques, the question is how a trained model can be applied to other machinery with little or no available data. An interesting research direction is to investigate how parameters trained on benchmark data can be fine-tuned to a new machine, much like transfer learning, which transformed computer vision research by using parameters trained on benchmark images as the initial parameters for new image sets (a minimal fine-tuning sketch is given after this list). Although a few works have been done in this regard (Guo et al. 2018; Shao et al. 2018; Han et al. 2019; Li et al. 2019), the area remains a gold mine for further research.

  2. Creating synthetic data: Another technique for addressing data shortage or imbalance is the creation of artificial data. One way of doing this is from a mathematical model of the machine or from its digital twin in the cloud. Generating such digital twins for new machinery, and carefully generating balanced synthetic fault data for model training, is a fascinating research area. Moreover, recent works use generative adversarial networks (GANs) to either generate synthetic data or perform data augmentation (Gao et al. 2020; Zhang and Li 2021; Liu et al. 2022).

  3. Model update: Another interesting research area is how an existing trained model can be retrained. As machines age, the model trained for their early life may no longer be applicable (concept drift). Another cause of model mismatch may be changes in the operation of the machine or in the environment in which it functions (data drift). When the model needs to be updated, should the update be usage-based (time-triggered) or event-triggered? For event-triggered updates, lightweight algorithms that can detect model mismatch are needed (a simple drift-monitor sketch is also given after this list). A further question is where the model update should take place: on the cloud or on the edge? It would be interesting if edge devices could perform model updates and later synchronize with the cloud.

  4. Edge communication protocols: One significant challenge in deploying AI on edge devices is that many of these devices are battery-powered, with limited capacity that demands efficient power management. To maximize battery life, communication protocols between sensors and edge devices should be as lightweight as possible, minimizing energy consumption during data transmission. Numerous wireless and wired communication protocols exist, each with its strengths and weaknesses; however, with the rise of edge computing and the need for energy-efficient communication, there is growing interest in developing new, robust protocols tailored specifically to edge devices. Such protocols should prioritize energy efficiency and responsiveness, given the constraints imposed by limited battery power. Another interesting area of research lies in fusing existing communication protocols into stronger hybrids, combining the best features of multiple protocols to balance performance, reliability, and power efficiency. Hybrid protocols can leverage the strengths of different communication methods, such as Wi-Fi, Bluetooth, Zigbee, LoRa, or cellular networks, to achieve optimal performance in diverse edge computing environments.

  5. Lightweight algorithms: It is true that certain problems require complex models to be solved. However, it is human nature to be impressed by the complex, a tendency known as complexity bias, which subconsciously suggests that the complex is always better. This tendency has flooded AI research, and journal reviewers are always looking to accept the next complex “novel” method. This has almost forced authors to write complicated methods, or cocktails known as “ensembles” and “hybrids,” involving intricate mathematics and jargon that only the first author of the paper can fully understand. Many researchers fail to try simpler models, which often perform better, and instead jump straight to the complex. The authors of this paper hold a different opinion: complex is not always better, and “simplicity is the ultimate sophistication.” Throughout history, it is the simple machines, such as the AK-47 rifle and the internal combustion engine, that have endured the test of time. Complexity should be added only when necessary, as the overly complex often remains only in theory. Thus, to fully democratize AI and move it beyond the papers it is written in, researchers should emphasize simple and light methods that work, and hopefully reviewers will follow. The creation of such simple models would enhance the deployment of edge AI in PHM. Due to their size and location, edge devices are resource-constrained in terms of memory, computation, energy, and bandwidth. As AI keeps moving closer to the edge, researchers therefore need to develop lightweight versions of algorithms that can be hosted on these constrained devices and still provide real-time results. This remains a less explored area. One application is lightweight algorithms for preprocessing and summarizing big data (Rani et al. 2018; Xu et al. 2019; Azar et al. 2019). Another direction is lightweight models, such as just another network (JANET) (Van Der Westhuizen and Lasenby 2018), a light version of the LSTM; the lightweight recurrent neural network (LLRNN) (Liu et al. 2019); and other tiny machine learning (TinyML) methods (Dutta and Bharali 2021; Signoretti et al. 2021; Asutkar et al. 2022). A further, less ventured field is model initialization and hyperparameter optimization using lightweight algorithms, such as light and fast versions of heuristics and metaheuristics (Palani et al. 2019).
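
As a minimal illustration of the transfer-learning direction discussed in the first item, the sketch below freezes a feature extractor that is assumed to have been pre-trained on a benchmark dataset and fine-tunes only a small classification head on a handful of labelled samples from a new machine. The network, the file name in the comment, and the data are hypothetical stand-ins, not taken from any surveyed work.

```python
# Minimal sketch of transfer learning for PHM (illustrative): reuse a
# pre-trained feature extractor and fine-tune only a small head on the
# scarce labelled data available for a new machine.
import torch
import torch.nn as nn

def build_model(n_classes):
    features = nn.Sequential(nn.Linear(64, 128), nn.ReLU(),
                             nn.Linear(128, 64), nn.ReLU())
    head = nn.Linear(64, n_classes)
    return nn.Sequential(features, head)

if __name__ == "__main__":
    torch.manual_seed(0)
    source_model = build_model(n_classes=10)
    # In practice the source weights would be loaded from a model trained on a
    # benchmark dataset, e.g. source_model.load_state_dict(torch.load("benchmark.pt")).

    # Reuse the feature extractor, attach a new head for the target machine's
    # fault classes, and freeze the transferred layers.
    features = source_model[0]
    for p in features.parameters():
        p.requires_grad = False
    target_model = nn.Sequential(features, nn.Linear(64, 3))  # 3 target classes

    # A handful of labelled feature vectors from the new machine (synthetic stand-ins).
    x = torch.randn(32, 64)
    y = torch.randint(0, 3, (32,))

    opt = torch.optim.Adam(
        (p for p in target_model.parameters() if p.requires_grad), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(20):                        # short fine-tuning loop
        opt.zero_grad()
        loss = loss_fn(target_model(x), y)
        loss.backward()
        opt.step()
    print(f"fine-tuning loss: {loss.item():.3f}")
```

Because only the small head is trained, this kind of fine-tuning is cheap enough that it could even be performed on a capable edge device rather than in the cloud.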
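For the event-triggered model update discussed in the third item, the following sketch shows a Page-Hinkley style drift monitor over the model's running prediction error, a computation light enough to run continuously on an edge device. The thresholds and the simulated error stream are illustrative assumptions.

```python
# Minimal sketch (illustrative) of an event-triggered model-update check:
# a Page-Hinkley style monitor flags drift when the model's prediction error
# departs persistently from its running mean.
import numpy as np

class PageHinkley:
    def __init__(self, delta=0.005, threshold=5.0):
        self.delta, self.threshold = delta, threshold
        self.mean, self.cum, self.min_cum, self.n = 0.0, 0.0, 0.0, 0

    def update(self, error: float) -> bool:
        """Return True when drift is detected and a model update should be requested."""
        self.n += 1
        self.mean += (error - self.mean) / self.n          # running mean of the error
        self.cum += error - self.mean - self.delta         # cumulative deviation
        self.min_cum = min(self.min_cum, self.cum)
        return (self.cum - self.min_cum) > self.threshold

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    monitor = PageHinkley()
    errors = np.concatenate([rng.normal(0.1, 0.02, 500),   # healthy regime
                             rng.normal(0.6, 0.05, 100)])  # drifted regime
    for t, e in enumerate(errors):
        if monitor.update(float(e)):
            print(f"drift detected at sample {t}; schedule model update")
            break
```

When the monitor fires, the edge device could request retraining in the cloud or trigger a local fine-tuning pass, and later synchronize the updated model, in line with the edge-cloud split discussed above.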

4.4 Limitations of AI-based maintenance systems

Despite the enormous potential of AI for the maintenance of industrial equipment described in the preceding sections, it has limitations. Although some of these have been stated as research gaps in Sect. 4.3, they can become the hurdles that make AI-based maintenance fail. These issues include data quality and availability, which make it hard for models to learn accurately. Additionally, understanding how AI arrives at its decisions can be difficult, as it often operates as a black box; this is a significant issue since most stakeholders want to know how decisions are made (explainable AI). Moreover, scaling across different systems and integrating with existing legacy ones can be challenging. There are also concerns about biases creeping into the algorithms, leading to unfair or discriminatory responses. In addition, ensuring the security and privacy of sensitive data can be daunting, and getting humans and AI to work collaboratively poses its own challenges. To overcome these obstacles and make AI-based maintenance systems dependable, a collaborative effort is required that combines expertise in AI, data management, ethics, and human–machine interaction.

5 Conclusions

This paper reviewed studies that employ artificial intelligence (AI) with edge computing for fault detection, diagnosis, and prognosis of industrial machines. The works were classified into three categories based on where model training is performed. The first is the traditional approach, into which most of the reviewed works fall: model training occurs exclusively in the cloud, and the trained model is subsequently deployed on the edge. This is the most natural arrangement, since the cloud has the vast computing resources suited for training. In the second category, the model is trained within the edge layer, partly for data privacy, since local data is not sent to the cloud; the cloud only coordinates the training process. In the final category, both the cloud and the edge participate in training the model.

The paper also presented the most common challenges faced when placing AI models on edge devices, as well as future research directions. These include the generation of synthetic data, the application of transfer learning, the development of lightweight algorithms for training and hyperparameter optimization, and the design of new, secure communication protocols for edge devices.