1 Introduction

Artificial intelligence (AI) is now at the forefront of the pursuit of Industry 4.0. Over the past several years, the accumulation of big data through Internet-of-Things (IoT) technology has driven the rapid growth of information retrieval and analysis techniques such as AI. These advances in handling large amounts of data are poised to revolutionize many manufacturing sectors. They are also the driving force behind smart factories, in which every cycle of the manufacturing process is conducted intelligently and automatically.

Industrial AI is a term coined to refer specifically to AI that serves the goals of the manufacturing industry. Industrial AI covers a wide range of machine learning, where the keys to success are pattern recognition for highly nonlinear data, unstructured data analysis, robustness in repetitive tasks, fast computation, and high interpretability. Of these traits, recognizing highly nonlinear patterns is essential, particularly because the relationship between input and output parameters is often only partially understood, and only under simplified conditions. Sometimes it is entirely unknown because of strong nonlinearity. To address this, deep learning, a subfield of machine learning, is beginning to replace traditional data analysis techniques. The power of deep learning is by now widely recognized. It not only captures complex patterns in training data but can also handle various types of unstructured data, hence its tremendous success in object detection, natural language processing, speech recognition, and realistic image synthesis. Despite its drawbacks in interpretability and extrapolation, its potential is considerable, since its performance depends largely on the amount and quality of data and on the design of its architecture. As such, it is widely studied worldwide, with substantial investment from both governments and companies.

Unfortunately, deep learning still faces considerable reluctance when it comes to direct adoption at manufacturing sites. One reason may be a lack of awareness of where and how it should be incorporated into the manufacturing pipeline. Another is the unsolved issues mentioned above, which make it less trustworthy. This review aims to raise awareness of possible AI applications by providing an extensive overview of their use in various industrial sectors. Because AI is applicable so broadly, the scope is limited to two objectives: product enhancement and manufacturing process enhancement. We hope this review will contribute to an even wider adoption of AI in industry. Readers are referred to [1] for details on machine learning, to [2] for deep learning, and to [3,4,5,6,7,8] for its sub-branches. The remainder of this paper is organized as follows. Section 2 provides an extensive literature survey in four subsections, each introducing AI applied to improve the performance of a specific product. Section 3 reviews the literature in two subsections, each covering AI applications within the manufacturing process. Finally, Sect. 4 concludes with a summary and a brief outlook on the future of AI.

2 Applications of AI for Product Enhancement

2.1 Autonomous Vehicle

Autonomous driving (AD) is a thriving field of study in which AI plays an active role. The main tasks of AD include road detection, lane detection, vehicle detection, pedestrian detection, drowsiness detection, collision avoidance, and traffic sign detection [9]. In computer vision terms, these tasks mainly involve image-based object detection, localization, and segmentation. Their performance is improved by using multiple sensors and appropriately fusing the data collected from them. Sensor fusion is therefore a vital aspect of self-driving cars: all of the detection schemes mentioned above become unreliable if the sensor signals contain substantial errors. Despite remarkable progress in recent years, sensors are still vulnerable to noise and manufacturing defects. One practical solution is to merge multiple sensor readings so that each sensor compensates for the shortcomings of the others, increasing overall reliability.

In AD, sensors serve two main purposes: environmental perception and localization. Environmental perception refers to the various object detection tasks in a self-driving scenario (e.g., road detection and pedestrian detection). Localization, by contrast, refers to determining the absolute and relative positions of the vehicle. Different combinations of sensors are typically used for each purpose, so the input to a deep neural network often comes from more than one sensor. For example, LiDAR, radar, thermal cameras, and RGB cameras are common choices for environmental perception. For localization, inertial measurement units (IMU), inertial navigation systems (INS), LiDAR, and global navigation satellite systems (GNSS) are used [10]. Moreover, multiple sensor readings can be fused and merged into a deep learning pipeline at different levels. A more detailed explanation of the possible integration routes can be found in [10]. Because mixing readings from multiple sensors makes perception and localization challenging, deep learning is adopted to address this difficulty. Here, we first introduce road detection and pedestrian detection as representative examples of general image-based environmental perception using deep learning, followed by applications based on multiple sensor fusion. Localization cases are then explained in detail.

Road detection is the task of distinguishing the road from the background. Limmer et al. [11] presented a CNN-based road course prediction system for augmented reality applications. The framework includes a multiscale CNN that receives the same input at multiple scales simultaneously, with each scale fed to its own branch of the network. The branches share no weights and are joined at the end by a fully connected layer for scene labeling. The approach was shown to perform well under various weather conditions. In addition, Cheng et al. [12] presented a cascaded end-to-end convolutional neural network (CasNet) for two tasks: road detection and road centerline extraction. The novelty of the model lies in using a cascaded network to bridge the two tasks. Specifically, the first network has the form of an autoencoder for road detection, and the feature maps generated at its last deconvolution layer are fed to the encoder of the second network for centerline extraction. In this way, the two tasks are solved concurrently in an end-to-end fashion. Other studies on road detection use a fully convolutional Siamese network [13] and a purpose-designed neural network (RBNet) [14].

Pedestrian detection is the task of distinguishing pedestrians on the road from other objects. It receives special attention because of its importance in preventing vehicle-to-human accidents. For this task, Wang et al. [15] proposed the part and context network (PCN), which leverages body-part semantic information and contextual information. The part branch, designed to handle occlusion, uses a long short-term memory (LSTM) module to pass semantic information among body parts. This was shown to improve classification performance even when some parts are occluded. The context branch handles contextual features at different scales for pedestrian localization. These features matter because they indicate whether an object should be classified as a pedestrian by considering its surroundings at several scales. In other studies on pedestrian detection, Ouyang et al. [16] demonstrated that performance could be improved by jointly handling feature extraction, deformation, occlusion, and classification using a simple CNN. Cai et al. [17] proposed a complexity-aware cascaded network that leverages features of different complexities.

For sensor fusion in environmental perception, the most frequently used baseline networks are R-CNN, Faster R-CNN, and YOLO, among others. These networks are designed specifically for object detection, localization, and segmentation. Wagner et al. [18] compared two types of fusion of RGB and thermal camera images: early fusion and late fusion. In early fusion, the images from both cameras are concatenated along the channel dimension and fed to R-CNN [19]. In late fusion, a separate network is built for each input, and the two are joined by a fully connected network right before the classifier. Adding thermal images was shown to solve the problem of low prediction accuracy at night. Schlosser et al. [20], on the other hand, used LiDAR and an RGB camera and performed early fusion. Features representing different aspects of the 3D scene were extracted from the LiDAR output and used as additional image channels for R-CNN. Other studies that fuse LiDAR features with visible images include [21, 22]. These studies used YOLO [23] for faster computation, and fusion with LiDAR generally improved discriminative performance in detecting pedestrians and objects on roads. Liu et al. [24] demonstrated that RGB and thermal images provide complementary information for pedestrian detection by building four fusion architectures based on Faster R-CNN [25]. The authors showed that halfway fusion, in which features extracted from both types of images are fused midway through the network, achieves the best performance. A similar line of work, with a few additional fusion architectures, was presented in [26].
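To make the distinction between early and late fusion concrete, the following NumPy sketch contrasts the two. It assumes the RGB and thermal images are already registered to the same pixel grid. The function names (`early_fusion`, `branch_features`, `late_fusion`) are hypothetical, and global average pooling stands in for each modality's CNN branch, so this illustrates only the data flow, not the networks used in [18].

```python
import numpy as np

def early_fusion(rgb, thermal):
    # Early fusion: stack modalities as channels so ONE network sees both.
    if thermal.ndim == 2:
        thermal = thermal[..., np.newaxis]          # (H, W) -> (H, W, 1)
    if rgb.shape[:2] != thermal.shape[:2]:
        raise ValueError("images must be registered to the same H x W grid")
    return np.concatenate([rgb, thermal], axis=-1)  # (H, W, 3 + 1)

def branch_features(image):
    # Hypothetical stand-in for a per-modality CNN branch:
    # global average pooling over the spatial axes.
    if image.ndim == 2:
        image = image[..., np.newaxis]
    return image.mean(axis=(0, 1))                  # (C,)

def late_fusion(rgb, thermal, weights, bias):
    # Late fusion: each modality has its own branch; the feature vectors are
    # concatenated and joined by a fully connected layer before the classifier.
    fused = np.concatenate([branch_features(rgb), branch_features(thermal)])
    return weights @ fused + bias                   # class scores (logits)

rng = np.random.default_rng(0)
rgb = rng.random((4, 4, 3))                         # toy RGB image
thermal = rng.random((4, 4))                        # toy thermal image
x_early = early_fusion(rgb, thermal)
logits = late_fusion(rgb, thermal, rng.standard_normal((2, 4)), np.zeros(2))
print(x_early.shape, logits.shape)  # (4, 4, 4) (2,)
```

Early fusion lets the first layer of a single network mix the modalities from the start. Late fusion keeps modality-specific branches and merges only their learned features.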

Localization is another important area of sensor fusion in which deep learning is widely adopted, although, as mentioned earlier, it normally relies on a different set of sensors. Two combinations are particularly effective: GNSS with INS or IMU, and RGB cameras with LiDAR. First, Dai et al. [27] recently introduced a deep learning-based framework integrating GNSS and INS. In this framework, the inaccurate GNSS signal is corrected using the output of an RNN fed with the time-varying INS signal. Kim et al. [28] also used an RNN variant, the LSTM, for vehicle localization, but their work differs in that GNSS and IMU signals are fed to the network simultaneously. Second, Gao et al. [29] fused RGB images and LiDAR point clouds to improve localization through accurate depth estimation. In that study, LiDAR point clouds are projected onto the RGB images to create sparse depth images, which are given as input to AlexNet [30]. Similar work was presented by Laidlow et al. [31], who fused the depth predictions of a multi-view stereo system with CNN output, and by Lee et al. [32], who added an RNN to an existing CNN model. Figure 1 summarizes the baseline deep learning models used in the surveyed literature for different pairs of fused sensors.
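The projection step described for Gao et al. [29] can be sketched with a standard pinhole camera model. This is an illustrative assumption rather than the authors' code. The points are assumed to be already transformed into the camera frame (extrinsic calibration applied), `K` is a hypothetical intrinsic matrix, and the rule of keeping the nearest point when several land on the same pixel is our own choice.

```python
import numpy as np

def sparse_depth_image(points_cam, K, height, width):
    """Project LiDAR points (N, 3), assumed to be already in the camera frame
    (x right, y down, z forward), through pinhole intrinsics K (3 x 3).
    Pixels without a LiDAR return stay 0; if several points hit one pixel,
    the nearest one (smallest z) is kept."""
    depth = np.zeros((height, width))
    pts = points_cam[points_cam[:, 2] > 0]        # drop points behind the camera
    uvw = pts @ K.T                               # homogeneous pixel coordinates
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    for ui, vi, zi in zip(u[inside], v[inside], pts[inside, 2]):
        if depth[vi, ui] == 0 or zi < depth[vi, ui]:
            depth[vi, ui] = zi
    return depth

# Hypothetical intrinsics for a tiny 5 x 5 image (fx = fy = 100, cx = cy = 2).
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 5.0],    # lands on pixel (row 2, col 2)
                   [0.0, 0.0, 3.0],    # same pixel but nearer, so it wins
                   [0.05, 0.0, 5.0],   # lands on pixel (row 2, col 3)
                   [0.0, 0.0, -1.0],   # behind the camera, dropped
                   [10.0, 0.0, 1.0]])  # projects outside the image, dropped
depth = sparse_depth_image(points, K, height=5, width=5)
```

The resulting array is mostly zeros, which is why such inputs are called sparse depth images. A network consuming them must learn to distinguish "no return" from "zero depth".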

Fig. 1

Baseline deep learning models found in the literature where different pairs of sensors are fused for improved perception and localization

In conclusion, AI applications in autonomous vehicles are driven mainly by image-based deep learning models such as CNNs. This is because autonomous vehicles generally collect information through sensors that represent data in a two-dimensional form that preserves rich spatial information. To advance these studies, the existing models need additional components that capture dynamics (i.e., temporal information) and complex patterns. Further research is needed to incorporate information from multiple sensors into deep learning to enable safer, fully automated driving. The studies reviewed in this section are summarized in Table 1 in terms of data type, data availability (public or not), investigated task, and the algorithm or model used.

Table 1 An overview of the surveyed literature regarding the autonomous vehicle

2.2 Battery

Recently, secondary batteries have attracted considerable interest worldwide owing to rising demand from electric vehicles (EVs) and hybrid electric vehicles (HEVs). One of the most commonly adopted secondary batteries for such vehicles is the lithium-ion battery, because of its high power density, long life, high durability, low self-discharge rate, and fast charging compared with other types of secondary cells. However, EVs and HEVs expose batteries to extremely harsh conditions, such as cold weather, long driving ranges, and repeated charge and discharge cycles. They therefore constantly demand batteries with higher charge capacity, durability, cycle life, and charge rate than existing ones. These traits are achievable not only by selecting the best materials (e.g., lithium iron phosphate and graphite) but also by optimizing the internal structure of batteries (e.g., separator thickness). They also depend on continuously monitoring the battery state to manage external parameters (e.g., temperature) so that batteries are less exposed to undesirable conditions (e.g., 0% state-of-charge). The former is performed ex situ by predicting battery behavior under specific conditions in advance. The latter is conducted in situ by the battery management system (BMS) contained in EV and HEV battery packs.

Two key parameters must be tracked for battery health monitoring: state-of-charge (SOC) and state-of-health (SOH). SOC is the ratio of the current capacity to the fully charged capacity, whereas SOH is the ratio of the fully charged capacity to the nominal capacity of the battery. Intuitively, SOC always ranges from 0 (fully discharged) to 1 (fully charged). SOH, by contrast, is 1 only at the time of manufacture and declines over the battery's life, with a value of 0.8 commonly taken as its end-of-life (EOL). Tracking these parameters accurately allows batteries to be designed and controlled much more effectively. SOC and SOH have conventionally been estimated with model-based and physics-based methods, but substantial work is now directed toward data-driven methods because of the limitations of the conventional approaches. Among data-driven models, machine learning has demonstrated high accuracy, low computational cost, and the ability to learn from historical data. This suggests that it could replace model-based and physics-based methods in the near future. The rest of this section describes various machine learning models for estimating SOC and SOH.
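The two ratios defined above reduce to simple arithmetic. The sketch below assumes capacities are given in ampere-hours and uses the 0.8 SOH end-of-life threshold mentioned in the text. The function names and the example cell are hypothetical.

```python
def state_of_charge(current_capacity_ah, full_capacity_ah):
    # SOC: charge currently stored relative to the fully charged capacity (0 to 1).
    return current_capacity_ah / full_capacity_ah

def state_of_health(full_capacity_ah, nominal_capacity_ah):
    # SOH: fully charged capacity relative to the nominal (rated) capacity.
    return full_capacity_ah / nominal_capacity_ah

def reached_end_of_life(soh, eol_threshold=0.8):
    # EOL check using the 0.8 SOH threshold mentioned in the text.
    return soh <= eol_threshold

# Hypothetical aged cell: rated 3.0 Ah, now holds 2.4 Ah when full, 1.2 Ah left.
soc = state_of_charge(1.2, 2.4)          # 0.5
soh = state_of_health(2.4, 3.0)          # approximately 0.8
eol = reached_end_of_life(soh)
# Hypothetical healthier cell that still holds 2.85 Ah of its rated 3.0 Ah.
fresh_eol = reached_end_of_life(state_of_health(2.85, 3.0))
```

Note that the two ratios use different denominators. The same half-empty battery reports SOC = 0.5 whether it is new or worn out, and only SOH reveals the capacity lost to aging.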

Data-driven SOC estimation in BMS has been achieved fairly well, with reported test errors ranging from 0.6 to 6.5% [33]. Hu et al. [34] presented SOC estimation using support vector regression (SVR) with a Gaussian kernel, whose parameters are optimized by a double-step search. Because EVs operate under diverse driving conditions, parameters optimized on the available training data usually do not fit unseen data well unless the training data sufficiently reflect real-world scenarios. The proposed method, however, generalized well to test data under different operating conditions. Sahinoglu et al. [35] introduced a recurrent Gaussian process regression (GPR), in which the SOC estimate from the previous time step is fed back to the model as part of the input vector. The probabilistic nature of GPR allows confidence intervals to be quantified for the estimates and influential input variables to be identified. These are practical advantages of GPR over other machine learning models. The model outperformed SVR, the relevance vector machine (RVM), and a neural network (NN) in RMSE and MAE but was slower to compute. Neal et al. [36] applied random forest, decision tree, and gradient boosted machine models to SOC estimation on data generated by a physics-based simulation model. These models were generally able to reproduce the dynamics of the simulation model at a much lower computational cost, suggesting that machine learning is well suited to building battery surrogate models.
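The recurrent feedback idea attributed to Sahinoglu et al. [35] can be illustrated with an exact GP regressor written in NumPy. This is a minimal sketch under stated assumptions, not the authors' model. The RBF kernel hyperparameters are fixed rather than optimized, the input vector `[current, voltage, previous SOC]` and the synthetic discharge data are hypothetical, and the class and function names are ours. The predictive standard deviation provides the kind of confidence band the text describes.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)

class SimpleGPR:
    """Exact GP regression with an RBF kernel and fixed (hypothetical) hyperparameters."""

    def __init__(self, length_scale=1.0, variance=1.0, noise=1e-6):
        self.length_scale, self.variance, self.noise = length_scale, variance, noise

    def fit(self, X, y):
        self.X = X
        K = rbf_kernel(X, X, self.length_scale, self.variance)
        self.L = np.linalg.cholesky(K + self.noise * np.eye(len(X)))
        # alpha = (K + noise*I)^-1 y, from the triangular systems L z = y, L^T alpha = z
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))
        return self

    def predict(self, Xs):
        Ks = rbf_kernel(Xs, self.X, self.length_scale, self.variance)
        mean = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = self.variance - (v ** 2).sum(axis=0)   # posterior variance of the latent SOC
        return mean, np.sqrt(np.clip(var, 0.0, None))

def recurrent_soc_estimate(model, current, voltage, soc0):
    """Roll the model forward, feeding each SOC estimate back as an input."""
    soc_prev, means, stds = soc0, [], []
    for i_t, v_t in zip(current, voltage):
        m, s = model.predict(np.array([[i_t, v_t, soc_prev]]))
        means.append(m[0])
        stds.append(s[0])
        soc_prev = m[0]                               # feedback of the estimate
    return np.array(means), np.array(stds)

# Hypothetical synthetic discharge: 1 A constant current, SOC drops 0.02 per
# step, and terminal voltage is a made-up linear function of SOC.
soc = 1.0 - 0.02 * np.arange(21)
current = np.ones(20)
voltage = 3.0 + 1.2 * soc[1:]
X_train = np.column_stack([current, voltage, soc[:-1]])   # [I_t, V_t, SOC_{t-1}]
y_train = soc[1:]
gpr = SimpleGPR().fit(X_train, y_train)
soc_hat, soc_std = recurrent_soc_estimate(gpr, current, voltage, soc0=1.0)
lower, upper = soc_hat - 1.96 * soc_std, soc_hat + 1.96 * soc_std  # ~95% band
```

During training, the true previous SOC is used as an input. At estimation time, each prediction is fed back in its place, which is what makes the regressor recurrent.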

Similarly, SOH has been estimated with many machine learning approaches, and this has proven more challenging than SOC estimation. Nuhic et al. [37] presented on-board SOH estimation through SVR. The authors built an input training vector from the operating history and the Load Collective, a feature reflecting changes in environmental, ambient, and load conditions. The training and test sets were split so that the test data came from a different driving profile, to mimic real-life use as closely as possible. Guo et al. [38] proposed a Bayesian formulation of capacity fade in which the coefficients of a linear regression model were treated as probability distributions (e.g., normal distributions) to account for random cell-to-cell variations. The study thereby models the random effects inherent in battery-to-battery variation, which prior studies usually neglected. Tseng et al. [39] argued that regression models using full-discharge voltage and internal resistance as aging parameters could estimate SOH better than those using cycle numbers. Their regression model uses exponential terms with the aging parameters as input, and its coefficients are determined adaptively through particle swarm optimization. Khumprom et al. [40] demonstrated a deep neural network-based approach and compared its performance with previously used machine learning algorithms, including linear regression, k-nearest neighbors, SVR, and NN. Ren et al. [41] also presented a deep neural network framework, which additionally includes a multi-dimensional feature extraction step using an autoencoder. This unsupervised feature extraction removes the need for domain knowledge about the aging parameters. Severson et al. [42] used a regularized linear model fitted with the elastic net, which adds a combination of L1 and L2 penalty terms to the loss. Several features, including the variance of the voltage-to-capacity slope, were used as inputs. This approach proved very effective, achieving state-of-the-art prognosis results with 9.1% test error in predicting cycle life using only the first 100 cycles of the provided open dataset.
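The elastic net penalty can be illustrated with a small coordinate-descent solver. This sketch minimizes the common form (1/2n)||y - Xw - b||^2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||^2, the same parameterization scikit-learn documents for `ElasticNet`. The synthetic features are hypothetical stand-ins for early-cycle battery features and are not the features used in [42]. The solver itself is our own minimal implementation.

```python
import numpy as np

def soft_threshold(z, gamma):
    # Proximal operator of the L1 norm: shrink toward zero, clip at zero.
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=500):
    """Coordinate descent for
    (1/2n)||y - Xw - b||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2.
    Data are centred internally so the intercept b is not penalised."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            partial = yc - Xc @ w + Xc[:, j] * w[j]   # residual ignoring feature j
            rho = Xc[:, j] @ partial / n
            w[j] = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
    b = y_mean - x_mean @ w
    return w, b

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))                  # hypothetical early-cycle features
true_w = np.array([1.5, 0.0, -2.0, 0.0, 0.0])      # only two features matter
y = X @ true_w + 3.0 + 0.01 * rng.standard_normal(100)
w, b = elastic_net(X, y, alpha=0.05, l1_ratio=0.9)
w_strong, _ = elastic_net(X, y, alpha=100.0, l1_ratio=0.5)  # heavy penalty
```

The L1 part can drive uninformative coefficients to exactly zero. The L2 part keeps the per-coordinate update well conditioned, and increasing `alpha` shrinks every coefficient until, with a large enough penalty, all of them vanish.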

To conclude, what mainly determines the accuracy of SOC and SOH prognosis is not the choice of machine learning approach. Rather, it is the aging parameters extracted on the basis of the authors' expert knowledge of battery degradation. Furthermore, there is still room for improvement. Most experimental conditions are not fully representative of real-world scenarios, and results for SOH and remaining useful life (RUL) estimation do not yet seem accurate enough. These open issues could be addressed by hybrid approaches that incorporate prior knowledge into more complex deep learning models while reducing the computational load. The studies reviewed in this section are summarized in Table 2.

Table 2 An overview of the surveyed literature regarding the battery

2.3 Robotics

Robotics is one area that the rise of AI has placed at the core of automation and intelligent manufacturing, because robots physically realize automation through the programmed movement of objects. A robot, however, can take different forms depending on its purpose. For example, a robot on an automobile assembly line is typically pictured as a dynamic multi-joint arm. A military robot, by contrast, may look like a four-legged animal that can run, jump, and crawl freely over bumpy terrain. Although their appearances differ, all robots share certain characteristics. To operate semi- or fully autonomously, they must be able to overcome the following challenges [43]:

  • Challenges 1 and 2: learn complex, high-dimensional dynamics and recognize objects

  • Challenge 3: learn control policy in a dynamic environment

Robots generally confront these challenges in scenarios where they must process and comprehend signals such as images and GPS signals (Challenge 1), spot nearby objects of interest (Challenge 2), and stop and find another route when faced with an obstacle (Challenge 3).

Understanding the complex meaning behind an observed scene or image requires a model to learn hidden patterns or knowledge from a large dataset in a similar context. Mariolis et al. [44] demonstrated that a robot could use a deep CNN to recognize the category (shirt, pants, or towel) and pose of hanging garments. This problem was previously considered very challenging in computer vision because of the intricate state space of such highly deformable objects. Gao et al. [45] fused two CNN branches, a visual CNN and a haptic CNN, so that robots could gain a better tactile understanding of objects. They noted that humans also understand things through a cognitive pattern that combines visual and haptic experience of the physical world. To imitate this pattern, the visual branch is fed with real-world images of an object, while the haptic branch is fed with signals of five types of physical quantities (e.g., fluid pressure and core temperature). The model achieved high accuracy in classifying objects according to 24 haptic adjectives (e.g., bumpy, soft, porous, compressible, sticky, and textured). Polydoros et al. [46] demonstrated the advantage of deep learning models in learning the inverse dynamics of a robotic manipulator. Conventional physics-based models cannot cope with changes in robot structure or dynamic environments. The study therefore proposes replacing them with the proposed model, essentially a DNN with one hidden layer modeled as an RNN. Similarly, Lenz et al. [47] used a deep learning-based framework called DeepMPC for robotic food cutting, in which a deep recurrent model captures the time-varying nonlinear dynamics involved in the task. The study shows that the model, which recursively updates its internal representation whenever a new control signal is given, continually improves its predictions. This implies that deep learning can be very efficient at learning complex, high-dimensional dynamics.

In particular, reinforcement learning, a subfield of machine learning, is widely adopted for learning control policies (Challenge 3) in robotics. This is mainly because reinforcement learning involves a computational agent that makes decisions by trial and error, an approach that has proven effective in modeling human-like cognitive behavior in the real world. Lillicrap et al. [48] presented a model-free algorithm, built on ideas from Deep Q-Network (DQN), that operates over continuous action spaces. Even without full access to the dynamics of the domain, it found policies competitive with those of a planning algorithm that had full access, on more than 20 simulated physics tasks, including cart-pole swing-up. This was not achieved by discretizing the action space, which scales poorly as the number of action dimensions grows. Instead, the authors combined DQN's experience replay and target networks with an actor-critic method based on the deterministic policy gradient. On the other hand, Levine et al. [49] developed end-to-end learning of control policies using a novel CNN architecture trained with a guided policy search method. The method was evaluated by learning control policies for several manipulation tasks, such as hanging a coat hanger on a rack with a robot. Such tasks require object localization, tracking, and an understanding of contact dynamics. The results demonstrate that training the perception and control systems jointly in an end-to-end fashion works better than training them separately. The studies reviewed in this section are summarized in Table 3.
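Two ingredients of the continuous-control algorithm of Lillicrap et al. [48] fit in a few lines of standard-library Python: an experience replay buffer and target networks, which in [48] are updated softly by Polyak averaging. The sketch is illustrative only. Parameters are plain lists of floats rather than network weights, the class and function names are hypothetical, and the default `tau` of 0.001 is an assumed setting.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of transitions; uniform random sampling breaks the
    temporal correlation between consecutive experiences."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are discarded first

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def soft_update(target_params, online_params, tau=0.001):
    """Polyak averaging: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online_params, target_params)]

random.seed(0)
buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.add((step, 0.0, 1.0, step + 1, False))   # hypothetical (s, a, r, s', done)
batch = buffer.sample(2)
target = soft_update([0.0, 2.0], [1.0, 4.0], tau=0.5)   # [0.5, 3.0]
```

With a small `tau`, the target network drifts slowly toward the online network. This keeps the bootstrapped learning targets from changing too quickly, which stabilizes training.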

Table 3 An overview of the surveyed literature regarding robotics

2.4 Renewable energy

This section discusses wind energy and photovoltaic energy. Wind energy is now one of the most important renewable energy sources. As its market grows and matures, operational stability, maintenance, and efficiency must be improved. Forecasting and monitoring energy production, fault detection and diagnosis (FDD), and parameter optimization in wind energy systems are therefore crucial. Here, we discuss FDD applied to wind turbines (WTs), whose components include the rotor, blades, gearbox, and bearings, as shown in Fig. 2 [50].

Fig. 2

Components of a wind turbine

First, the rotor blade is a WT component with a high failure rate and long downtime. Its main faults are structural damage such as splitting, fatigue, wear, deterioration, and deflection [51]. Manual inspection, the most common method, involves tedious human labor and prolongs maintenance downtime during shutdowns. To avoid this inefficient procedure, Reddy et al. [52] proposed using unmanned aerial vehicles (UAVs) with optical cameras to monitor surface damage and other visible features of WTs. By training a CNN on UAV photos to recognize damage features and their severity, structural health monitoring of WT blades can be carried out. Liu et al. [53] developed a deep learning-based method for predicting the stiffness of WT blades. Three models, CNN, LSTM, and CNN-LSTM, were trained on historical fatigue stiffness data. The results show that the models can learn features directly from raw stiffness data and estimate residual stiffness. CNN-LSTM, by combining the strengths of the single CNN and LSTM models, compensates for the weaknesses of each. Kong et al. [54] showed that the health state of WTs can be characterized precisely by attending to changes in SCADA data on both spatial and temporal scales. The authors proposed a WT condition monitoring system based on fusing spatiotemporal features extracted by a CNN and a gated recurrent unit (GRU). A WT is considered abnormal if the index computed from online SCADA data exceeds the established threshold for a sustained, consecutive period. Under normal operating conditions, the index fluctuates within the threshold.
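The sustained-exceedance rule described for Kong et al. [54] can be sketched as a run-length check. The threshold value, the required number of consecutive samples, and the function name are hypothetical. In practice, the index would come from the CNN-GRU model and the threshold from the authors' own procedure.

```python
def first_sustained_exceedance(index_values, threshold, min_consecutive):
    """Return the sample position at which the health index has stayed above
    `threshold` for `min_consecutive` successive samples, or None otherwise."""
    run = 0
    for t, value in enumerate(index_values):
        run = run + 1 if value > threshold else 0   # reset the run on any normal sample
        if run >= min_consecutive:
            return t
    return None

health_index = [0.2, 0.9, 0.3, 0.8, 0.85, 0.95, 0.4]   # hypothetical index from online SCADA data
alarm_at = first_sustained_exceedance(health_index, threshold=0.7, min_consecutive=3)
print(alarm_at)  # 5: the isolated spike at position 1 is ignored
```

Requiring several consecutive exceedances trades a short detection delay for robustness against isolated spikes in noisy SCADA data.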

The gearbox, a vital WT component, commonly suffers multiple failures, such as bearing damage, tooth breakage, and gear cracks. Because of long downtimes and complicated maintenance procedures, these failures result in high maintenance costs. Jiang et al. [55] introduced a new architecture, MSCNN, which learns high-level, robust fault features at various time scales directly from complex raw vibration signals through parallel hierarchical learning. It is more robust to a wide range of noise than conventional approaches that rely heavily on hand-crafted features, which greatly reduces the need for prior knowledge and time-consuming signal processing. Owing to this end-to-end feature learning capability, the method offers a valuable general-purpose classification technique for intelligent fault diagnosis. Radford et al. [56] proposed deep convolutional generative adversarial networks (DCGANs) for unsupervised representation learning. In contrast with existing unsupervised models such as autoencoders (AE), Chen et al. [57] used DCGAN for long-term, accurate health condition monitoring (HCM) of a WT bearing. The critical obstacle in such monitoring frameworks is setting a threshold for distinguishing different health conditions. To overcome this, a self-setting threshold scheme based on DCGAN was proposed: once the DCGAN is adequately trained, the HCM threshold is generated automatically from the output of its generator (G) network. The whole scheme thus provides a self-setting HCM threshold based on a DCGAN model for monitoring a WT bearing.
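One common way to present a raw vibration signal to a multiscale network is coarse-graining. This means averaging non-overlapping windows at several scales and feeding each scale to its own convolutional branch. We assume this scheme purely for illustration; the exact layer design of MSCNN [55] may differ. The synthetic signal, kernel sizes, and function names below are hypothetical.

```python
import numpy as np

def coarse_grain(signal, scale):
    # Average non-overlapping windows of `scale` samples; leftover samples
    # that do not fill a whole window are dropped.
    n = len(signal) // scale
    return signal[: n * scale].reshape(n, scale).mean(axis=1)

def multiscale_inputs(signal, scales=(1, 2, 4)):
    # One coarse-grained copy of the raw vibration signal per time scale.
    return {s: coarse_grain(signal, s) for s in scales}

def branch_feature(x, kernel):
    # Hypothetical stand-in for one CNN branch: valid 1-D cross-correlation
    # (what CNN "convolution" layers compute) followed by global max pooling.
    return np.max(np.correlate(x, kernel, mode="valid"))

rng = np.random.default_rng(0)
t = np.arange(64)
vibration = np.sin(2 * np.pi * t / 16) + 0.1 * rng.standard_normal(64)  # synthetic signal
inputs = multiscale_inputs(vibration)
kernels = {s: rng.standard_normal(3) for s in inputs}   # one untrained kernel per branch
features = np.array([branch_feature(inputs[s], kernels[s]) for s in sorted(inputs)])
```

Coarser scales smooth out high-frequency noise and let a fixed-size kernel span a longer stretch of the signal. Running branches in parallel lets the classifier draw on fine and coarse patterns at once.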

Photovoltaics (PV), which generate electricity directly from solar energy, are one of the most promising renewable energy technologies: PV cells convert sunlight photons directly into electricity. As shown in Fig. 3, PV cells are connected in series or in parallel to form PV modules, modules make up PV panels, and several panels make up a PV array. This part of the section primarily discusses FDD of PV systems.

Fig. 3

PV cell, module, panel, and array

Deitsch et al. [58] proposed two approaches, one based on an SVM and one on a CNN, for automatically identifying defects in images of single PV cells. Simonyan et al. [59] developed deep convolutional networks with up to 19 weight layers (VGG-19) and showed that representation depth benefits classification accuracy. Alcantarilla et al. [60] proposed KAZE features, which detect and describe multiscale 2D features in nonlinear scale spaces by analogy with nonlinear diffusion processes in the image domain. Deitsch et al. [58] used a fine-tuned regression CNN based on VGG-19, trained on enhanced module images. The SVM and CNN classifiers performed comparably on monocrystalline PV modules, with only a negligible average advantage for the CNN. On the more inhomogeneous polycrystalline cells, however, the CNN outperformed the SVM by around 6% in accuracy. The carefully designed SVMs are trained on various features extracted from electroluminescence (EL) images of PV cells and can run on arbitrary hardware. Both automated approaches make continuous, highly accurate monitoring of PV cells feasible. To diagnose various types of visible module defects, Li et al. [61] proposed an automated UAV-based inspection method using a deep CNN to determine the operational status of PV modules. The technique extracts deep features from module images and performs pattern recognition of defects. The CNN first applies alternating convolution and subsampling operations, followed by a fully connected multilayer network. The output is flattened into a vector, and a softmax function assigns the learned features to one of several classes. The results demonstrate high precision in diagnosing numerous types of general PV module defects, including dust shading, encapsulant delamination, gridline corrosion, snail trails, and yellowing.
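The pipeline attributed to Li et al. [61] can be traced with a forward pass in plain NumPy: alternating convolution and subsampling, flattening, a fully connected layer, and softmax. This is a sketch under assumptions. The layer sizes and class list are hypothetical, the weights are random and untrained, and the ReLU activations are our choice, so the predicted label is meaningless. The point is the sequence of operations and tensor shapes.

```python
import numpy as np

def conv2d_valid(x, kernels):
    # x: (C, H, W); kernels: (K, C, kh, kw) -> (K, H - kh + 1, W - kw + 1).
    # Cross-correlation without padding, as in typical CNN layers.
    n_k, _, kh, kw = kernels.shape
    _, h, w = x.shape
    out = np.empty((n_k, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = x[:, i:i + kh, j:j + kw]
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    # Non-overlapping max pooling (the "subsampling" step); ragged edges dropped.
    c, h, w = x.shape
    h2, w2 = h // size, w // size
    x = x[:, :h2 * size, :w2 * size]
    return x.reshape(c, h2, size, w2, size).max(axis=(2, 4))

def softmax(z):
    e = np.exp(z - z.max())    # shift by the max for numerical stability
    return e / e.sum()

# Hypothetical label set and layer sizes; weights are random, i.e. untrained.
CLASSES = ["normal", "dust shading", "encapsulant delamination",
           "gridline corrosion", "snail trail", "yellowing"]
rng = np.random.default_rng(0)
image = rng.random((1, 20, 20))                                              # stand-in module image
x = max_pool(relu(conv2d_valid(image, rng.standard_normal((6, 1, 3, 3)))))  # (6, 9, 9)
x = max_pool(relu(conv2d_valid(x, rng.standard_normal((8, 6, 3, 3)))))      # (8, 3, 3)
flat = x.reshape(-1)                                                         # flatten -> (72,)
hidden = relu(rng.standard_normal((32, flat.size)) @ flat)                   # fully connected layer
probs = softmax(rng.standard_normal((len(CLASSES), 32)) @ hidden)
predicted = CLASSES[int(np.argmax(probs))]
```

Each convolution-plus-pooling stage shrinks the spatial grid while adding feature channels. The softmax turns the final scores into a probability distribution over defect classes.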

For fault detection in PV panels, Herraiz et al. [62] proposed a novel approach that uses a UAV-mounted thermographic camera to detect hot spots and locate them on PV panels. Two novel region-based convolutional neural networks (R-CNNs) are combined to produce a robust detection framework. The key contribution is the combination of thermographic and telemetry data for panel condition monitoring. The data are collected and then processed automatically, facilitating fault detection during inspection. Automated FDD techniques for PV arrays are critical for improving the performance, durability, and safety of PV systems. Belaout et al. [63] proposed a multiclass adaptive neuro-fuzzy classifier (MC-NFC) for automated PV array fault detection and classification, with more discriminative capability than an artificial neural network (ANN) classifier. Adopting feature-space dimensionality reduction gives the classifier a principled way to select its inputs and high classification accuracy. The resulting lower-dimensional feature space also speeds up classification. Chen et al. [64] presented an intelligent FDD method for PV arrays based on a newly designed deep residual network trained with the adaptive moment estimation (Adam) algorithm. The model automatically extracts features from raw current–voltage (I-V) curves, irradiance, and temperature, and its deeper network effectively improves performance. Using the measured I-V characteristic curves and ambient conditions as input, the method can detect numerous types and levels of typical early PV array faults, including partial shading, loss, short-circuit, and open-circuit faults.
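Chen et al. [64] train their residual network with adaptive moment estimation (Adam), so a one-dimensional version of the Adam update may help readers unfamiliar with it. The sketch uses the commonly cited defaults beta1 = 0.9, beta2 = 0.999, and eps = 1e-8, together with a larger learning rate of 0.1. The toy objective (x - 3)^2 is hypothetical and unrelated to the PV data.

```python
import math

def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Minimize a scalar function with Adam, given a function returning its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients (first moment)
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients (second moment)
        m_hat = m / (1 - beta1 ** t)           # correct the bias from zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective f(x) = (x - 3)^2 with gradient 2(x - 3).
x_star = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Dividing by the square root of the second moment rescales each step by the typical gradient magnitude. Early steps therefore move at roughly `lr` regardless of how steep the loss is.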

One of the major difficulties in PV solar power production is keeping PV systems running at their optimal operating performance. Harrou et al. [65] proposed a model-based anomaly detection method for monitoring the DC side of PV systems and detecting transient shading. To replicate the characteristics of the monitored PV array, a model based on the one-diode model, combined with binary clustering algorithms for more accurate fault detection, is set up. The residuals from the simulation model are then passed to a one-class support vector machine (1-SVM) for fault detection.
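The model-based part of this scheme rests on the one-diode equation, I = Iph - I0*[exp((V + I*Rs)/(n*Vt)) - 1] - (V + I*Rs)/Rsh, which is implicit in the current I. The sketch below, assuming illustrative single-cell parameter values (not those of Harrou et al. [65]), solves it by bisection and forms residuals between measured and simulated current. In the method described above, these residuals would then go to the 1-SVM, which is not shown here.

```python
import math


def one_diode_current(v, i_ph=8.0, i_0=1e-9, n=1.3, v_t=0.02569, r_s=0.005, r_sh=10.0):
    """Solve the implicit one-diode equation for the cell current at voltage v.

    f(I) = I_ph - I_0*(exp((v + I*R_s)/(n*V_t)) - 1) - (v + I*R_s)/R_sh - I
    is strictly decreasing in I, so bisection on a bracketing interval converges.
    All default parameter values are hypothetical, for a single illustrative cell.
    """
    def f(i):
        vd = v + i * r_s  # voltage across the diode
        return i_ph - i_0 * (math.exp(vd / (n * v_t)) - 1.0) - vd / r_sh - i

    lo, hi = -10.0 * i_ph, i_ph  # f(lo) > 0 and f(hi) <= 0 in the operating range
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def residuals(voltages, measured_currents, **params):
    """Residual = measured current minus one-diode model current at each voltage."""
    return [i_meas - one_diode_current(v, **params)
            for v, i_meas in zip(voltages, measured_currents)]


voltages = [0.0, 0.3, 0.5, 0.6]
healthy = [one_diode_current(v) for v in voltages]
shaded = [0.7 * i for i in healthy]  # hypothetical 30% current drop under shading
print([round(r, 3) for r in residuals(voltages, shaded)])
```

Near-zero residuals indicate that the array behaves as modeled, whereas a systematic offset such as the simulated shading produces residuals that a one-class classifier trained only on healthy data can flag.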

This section discussed many deep learning methods for system monitoring and fault detection. Most studies appear applicable to only a small part of the system, so future work should focus on big data from various sensors. Some novel approaches have not yet been applied in practical operating environments because of limitations such as the lack of large-scale, high-quality data and heavily downscaled images. The reviewed studies in this section are summarized in Table 4.

Table 4 An overview of the surveyed literature regarding the renewable energy

3 Applications of AI for Manufacturing Process Enhancement

3.1 Steel

Steel mills, also known as steelworks, are facilities specializing in steel production and belong to one of the most fundamental industries in the modern world. This section introduces AI applications in various steelmaking processes, such as ironmaking, casting, rolling, and galvanizing. With the aim of more sustainable and environmentally beneficial production, it mostly discusses FDD, comparative studies of several techniques, and the modeling and forecasting of production.

In ironmaking, the blast furnace (BF) is a key unit that consumes more than 70% of the energy in the whole steelmaking process. Optimal operation of BF ironmaking relies heavily on estimating the molten iron quality (MIQ) indices. Zhou et al. [66] introduced a novel data-driven robust modeling method for the online estimation and control of multivariate MIQ indices. First, a nonlinear autoregressive exogenous (NARX) model of the MIQ indices is built to capture the nonlinear dynamics of the BF process. Multi-task transfer learning is then used to develop a new multi-output least-squares support vector regression (M-LS-SVR) to learn the NARX model, since the standard LS-SVR cannot directly handle multiple outputs. The resulting model not only provides operators with accurate MIQ information for making effective decisions on optimal operation, with good consistency, adaptability, and robustness, but also helps to implement input management of the BF process.
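A NARX model predicts the output at step k from lagged outputs and lagged inputs, and LS-SVR fits such a model by solving one linear system instead of a quadratic program. The sketch below is a simplified single-output version: the multi-task coupling of outputs in the M-LS-SVR of Zhou et al. [66] is not reproduced, and the lag orders, RBF width `sigma`, regularization constant `gamma`, and toy plant are all hypothetical.

```python
import math


def narx_dataset(u, y, ny=2, nu=2):
    """Build NARX regressors [y(k-1..k-ny), u(k-1..k-nu)] with targets y(k)."""
    start = max(ny, nu)
    X = [[y[k - j] for j in range(1, ny + 1)] + [u[k - j] for j in range(1, nu + 1)]
         for k in range(start, len(y))]
    return X, y[start:]


def rbf(a, b, sigma=1.0):
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / (2 * sigma ** 2))


def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def lssvr_fit(X, t, gamma=100.0, sigma=1.0):
    """Solve the LS-SVR dual [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; t]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(t))
    return sol[0], sol[1:]  # bias b, dual weights alpha


def lssvr_predict(x, X, b, alpha, sigma=1.0):
    return b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X))


# Hypothetical input signal and nonlinear plant, for illustration only.
u = [math.sin(0.3 * k) for k in range(30)]
y = [0.0, 0.0]
for k in range(2, 30):
    y.append(0.6 * y[k - 1] - 0.1 * y[k - 2] + 0.5 * math.tanh(u[k - 1]))
X, t = narx_dataset(u, y)
b, alpha = lssvr_fit(X, t)
print(round(lssvr_predict(X[-1], X, b, alpha), 4), round(t[-1], 4))
```

Because the squared-error loss turns the SVR constraints into equalities, training reduces to a single linear solve; the price is that every training point gets a nonzero weight `alpha`.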

The silicon content of the hot metal is another significant parameter in the ironmaking process, as it characterizes slag quality, tapping temperature, and hot metal quality. Han et al. [67] proposed a parallelized SVM solution algorithm on the Hadoop platform to speed up SVM solving on large sample sets, and achieved dynamic estimation of the BF Si content on this platform. The main benefit of this algorithm is that, based on structural risk minimization theory, it avoids the curse of dimensionality through kernel functions and attains optimal generalization performance. The algorithm is primarily applicable to small samples.

Another important task is forecasting the hot metal temperature (HMT) in a BF to ensure that the ironmaking process runs smoothly. Zhang et al. [68] compared deep and shallow predictive approaches for current-time and multi-step-ahead HMT prediction. Using an industrial BF, they studied three advanced deep predictive models, namely DNN, LSTM, and CNN, and seven established shallow predictive models, namely partial least squares (PLS), locally weighted PLS (LW-PLS), Gaussian process regression (GPR), support vector regression (SVR), random forest (RF), boosted regression trees (BRT), and a shallow neural network (SNN). PLS is the simplest approach, with the lowest computational cost, but its prediction accuracy is less competitive, whereas LW-PLS is more expensive to compute. SNN and DNN achieve better accuracy than the other techniques for current-time HMT prediction, and SNN is preferred because DNN has considerably higher model complexity and computational cost. GPR and SVR are selected for multi-step-ahead prediction, being particularly appropriate for one-hour-ahead and two-hour-ahead HMT forecasts. In contrast, LSTM and CNN proved particularly unsuitable for both current-time and multi-step-ahead HMT prediction.
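One common way to frame multi-step-ahead prediction (assumed here for illustration; the exact setup of [68] may differ) is the direct strategy: a separate model is trained for each horizon h, using a sliding window of past measurements as features and the value h steps later as the target. The window length and toy series below are hypothetical.

```python
def make_supervised(series, window, horizon):
    """Direct multi-step framing: features are `window` consecutive past values,
    and the target is the value `horizon` steps after the last feature."""
    X, y = [], []
    for end in range(window, len(series) - horizon + 1):
        X.append(series[end - window:end])
        y.append(series[end + horizon - 1])
    return X, y


series = list(range(10))  # hypothetical stand-in for an HMT time series
datasets = {h: make_supervised(series, window=3, horizon=h) for h in (1, 2)}
X2, y2 = datasets[2]
print(X2[0], y2[0], len(X2))
```

Each `(X, y)` pair would then be passed to its own regressor, such as GPR or SVR, which keeps the one-step and two-step predictors independent at the cost of training one model per horizon.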

Continuous casting is the process in which molten steel is solidified continuously into semi-finished products such as slabs. The continuity of the process reduces the cost of the cast steel, and carefully monitored and controlled casting yields high-quality steel casts. The main issues in continuous casting are the early detection and prediction of stickers, centerline segregation, mold level, mold breakout, and slab consistency. Therefore, fault identification and prediction in continuous casting are studied as the second aspect of the steel industry application. Fig. 4 illustrates the continuous casting process.

Fig. 4 Continuous casting process

The most costly and hazardous problem in continuous casting is breakout, which causes lost processing time and substantial yield penalties. The common cause of breakout is a sticker, a part of the strand shell that adheres to the mold surface. Stickers can be detected from characteristic temperature patterns in the mold heat map. Fasizullin et al. [69] presented a cyber-physical system that detects stickers by monitoring and analyzing temperature data from fiber-optic sensors installed in the mold. The authors developed a dedicated CNN that identifies the sticker pattern and can serve either as a full replacement for or as an assistant to the existing algorithm. Two configurations were implemented: in the sticker detection system (SDS) mode, the CNN works alone while the breakout prevention system (BPS) is idle; in the BPS + SDS mode, the SDS analyzes only the suspicious situations that follow a sticker warning from the BPS. The study demonstrates that the CNN reduces the number of false alarms of the existing algorithm.
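The two deployment modes can be sketched as simple decision functions. `bps_warns` and `cnn_sticker_probability` below are hypothetical stand-ins for the existing breakout prevention logic and the trained CNN of [69], and the 0.5 decision threshold is likewise an assumption.

```python
from typing import Callable, Sequence

Frame = Sequence[Sequence[float]]  # one mold thermal map (rows of sensor readings)


def sds_alarm(frame: Frame, cnn: Callable[[Frame], float], threshold: float = 0.5) -> bool:
    """Standalone SDS: the CNN alone decides while the BPS is idle."""
    return cnn(frame) >= threshold


def bps_sds_alarm(frame: Frame, bps: Callable[[Frame], bool],
                  cnn: Callable[[Frame], float], threshold: float = 0.5) -> bool:
    """BPS + SDS: the CNN only re-examines frames the BPS has already flagged.

    `and` short-circuits, so the CNN is never evaluated when the BPS is silent,
    and an alarm requires both to agree, which suppresses BPS false alarms.
    """
    return bps(frame) and cnn(frame) >= threshold


# Hypothetical stand-ins, for illustration only.
def bps_warns(frame: Frame) -> bool:
    # placeholder rule: flag any reading above a fixed temperature limit
    return max(max(row) for row in frame) > 150.0


def cnn_sticker_probability(frame: Frame) -> float:
    # placeholder score; a real system would run the trained CNN here
    return 0.9 if frame[0][0] > frame[-1][0] else 0.1


sticker = [[170.0, 100.0], [100.0, 100.0]]
bps_false_alarm = [[100.0, 100.0], [160.0, 100.0]]
normal = [[100.0, 100.0], [100.0, 100.0]]
for name, frame in [("sticker", sticker), ("bps_false_alarm", bps_false_alarm),
                    ("normal", normal)]:
    print(name, bps_sds_alarm(frame, bps_warns, cnn_sticker_probability))
```

In the cascaded mode the CNN acts as a second opinion on frames the BPS flags, which matches the reported effect of reducing false alarms without running the CNN on every frame.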

Reducing centerline segregation in continuously cast slabs is important for achieving better mechanical properties. For the early detection of centerline segregation from operating input parameters measured on continuously cast steel slabs, Nieto et al. [70] proposed a novel hybrid algorithm that combines SVM with particle swarm optimization (PSO). For comparison, the experiments also include a multilayer perceptron (MLP) and a multivariate adaptive regression splines (MARS) approach, each combined with PSO. First, the model is used to assess the importance of each physical–chemical variable for segregation. Second, models for forecasting segregation are obtained. Then, regression with optimal hyperparameters is performed. When this hybrid PSO-SVM model with an RBF kernel is tested on an experimental dataset, the coefficient of determination and average width are 0.98 and 0.97, respectively. Wu et al. [71] proposed a novel multiscale convolutional and recurrent neural network (MCRNN) architecture in which the input is transformed at various scales and frequencies, capturing both long-term patterns and short-term changes in the time series. The proposed system outperforms traditional time series classification approaches through improved feature representation. The experimental findings and a comprehensive comparison with state-of-the-art techniques indicate the superiority of the MCRNN framework, which offers good prediction performance and strong potential to improve the quality of cast slabs.
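In a hybrid PSO-SVM scheme, PSO searches the SVM hyperparameter space (for an RBF kernel, for example the penalty C and the kernel width) by minimizing a validation error. The sketch below implements a basic global-best PSO in plain Python. To keep it self-contained, the validation error is replaced by a hypothetical smooth surrogate of (log10 C, log10 gamma) with a known minimum, and the swarm size, inertia, and acceleration coefficients are common textbook values, not those of [70].

```python
import random


def pso_minimize(objective, bounds, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization over a box `bounds` = [(lo, hi), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # cognitive pull
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # social pull
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in the box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical stand-in for cross-validated SVR error over (log10 C, log10 gamma);
# a real run would train and validate an SVR at each evaluated point.
def surrogate_error(p):
    return (p[0] - 1.5) ** 2 + 2.0 * (p[1] + 2.0) ** 2 + 0.05


best, err = pso_minimize(surrogate_error, bounds=[(-2.0, 4.0), (-5.0, 1.0)])
print([round(b, 3) for b in best], round(err, 4))
```

Searching in log10 space is a common choice because SVM hyperparameters typically matter over several orders of magnitude.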

After casting, the steel passes through rolling mills to reduce its thickness and achieve high uniformity. In this process, a steel slab is passed between two rolls, and after several passes its thickness is reduced. The key factors in rolling are the strip crown, temperature, rolling force, bending force, and flatness. Zhang et al. [72] proposed a nonlinear full-condition monitoring model for the dynamic rolling process. First, a dissimilarity index (DI) is defined for condition recognition, and a support vector model is developed to check the idle condition. Second, t-distributed stochastic neighbor embedding (t-SNE) is used to extract nonlinear principal components for slow feature analysis and cointegration analysis. To obtain a coil with precise thickness after rolling, it is also important to determine the exact rolling force in advance.

Li et al. [73] addressed precise bending force prediction, which can improve the control precision of strip crown and flatness and further enhance strip shape quality. The authors applied six machine learning models, namely ANN, SVR, classification and regression trees (CART), bagging regression tree (BRT), least absolute shrinkage and selection operator (LASSO), and Gaussian process regression (GPR), to predict the bending force in the hot strip rolling (HSR) process. The findings indicate that GPR, with the best prediction accuracy, better stability, and reasonable computational cost, is the optimal model for bending force prediction.
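Since GPR came out as the preferred bending force model, a minimal GPR predictor may help make the method concrete. With a squared-exponential kernel k and Gaussian noise variance s2, the posterior mean at a test input x* is k*^T (K + s2*I)^-1 y and the posterior variance is k(x*, x*) - k*^T (K + s2*I)^-1 k*. The sketch below assumes a single hypothetical input feature, fixed hyperparameters, and toy targets; a practical model would use the actual rolling-process inputs and fit the hyperparameters to data.

```python
import math


def sq_exp(a, b, length=1.0, signal_var=1.0):
    return signal_var * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cho_solve(L, b):
    """Solve L L^T x = b by forward then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gpr_predict(x_train, y_train, x_star, noise_var=1e-4):
    """Posterior mean and variance of a zero-mean GP at one test input."""
    n = len(x_train)
    K = [[sq_exp(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = cho_solve(L, y_train)
    k_star = [sq_exp(x_star, xi) for xi in x_train]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = cho_solve(L, k_star)
    var = sq_exp(x_star, x_star) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, max(var, 0.0)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]       # hypothetical normalized input
ys = [math.sin(x) for x in xs]        # hypothetical normalized bending force
print([round(val, 4) for val in gpr_predict(xs, ys, 2.5)])
```

The predictive variance is what distinguishes GPR from point predictors such as SVR: it stays small near the training data and grows back toward the prior variance far from it.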

Strip shape prediction is crucial for high-quality products. Sun et al. [74] used random forest (RF), an ensemble algorithm, to forecast the crown of hot-rolled strips. Three machine learning models, namely SVM, regression tree (RT), and RF, were developed, with parameters tuned on the mean squared error. The results show that RF is the preferred model for strip crown prediction because of its accuracy. For profile and flatness prediction, Wang et al. [75] presented three hybrid models: GA-MLP, MEA-MLP, and PCA-MEA-MLP. Compared with the hybrid GA-MLP model, the hybrid PCA-MEA-MLP model, built after reducing the dimensionality of the input variables with PCA, shortens training time without decreasing prediction accuracy, which makes PCA an important means of model simplification.
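The two ingredients that make RF an ensemble method, bootstrap resampling of the training set and a random subset of candidate features, can be shown with a deliberately simplified forest whose trees are single-split regression stumps. This is an illustration under stated simplifications (real RF trees are grown much deeper, and the data, feature count, and number of trees here are hypothetical), not the model of Sun et al. [74].

```python
import random


def fit_stump(X, y, features):
    """Best single split (feature, threshold) by squared error; returns a predictor."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, ml, mr)
    if best is None:  # no valid split (e.g. constant feature): predict the mean
        mean = sum(y) / len(y)
        return lambda row: mean
    _, f, thr, ml, mr = best
    return lambda row: ml if row[f] <= thr else mr


def fit_forest(X, y, n_trees=50, max_features=1, seed=0):
    """Bagged stumps with random feature subsets (per tree = per split for a stump)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]       # bootstrap sample
        feats = rng.sample(range(d), max_features)       # random feature subset
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return lambda row: sum(tree(row) for tree in trees) / len(trees)  # average


# Hypothetical data: feature 0 drives the target, feature 1 is uninformative.
X = [[i / 10, (7 * i % 10) / 10] for i in range(20)]
y = [1.0 if i >= 10 else 0.0 for i in range(20)]
forest = fit_forest(X, y)
print(round(forest([0.2, 0.5]), 3), round(forest([1.7, 0.5]), 3))
```

Averaging many decorrelated trees reduces variance; trees that happened to draw only the uninformative feature contribute the same value to both test points and so dilute, but do not reverse, the signal from the informative trees.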

Hot-dip galvanizing is the process of immersing steel in a molten zinc bath to form a coating that protects the steel from corrosion in harsh environments. As the last part of the steel industry application, the rest of this section discusses the prediction and monitoring of tensile properties, yield strength, ultimate tensile strength, coating weight, and coating thickness in hot-dip galvanizing for a cost-effective process.

In the galvanizing line of a cold rolling mill, the target mechanical properties, namely yield strength and ultimate tensile strength, are obtained by controlling the main process parameters within defined limits. Lalam et al. [76] used an ANN to predict the mechanical properties of a coil. To avoid the effects of redundancy and collinearity among the ANN input variables, principal component analysis (PCA) is applied. An online quality management system is established to monitor the predicted mechanical properties and process parameters of each galvanized coil. Colla et al. [77] presented a machine learning-based system to improve the homogeneity of the tensile properties of steel strips. Two types of data-driven mechanical property prediction models were adopted: a first-order polynomial model and a feedforward neural network (FFNN). The proposed system can improve its performance over time and stay up to date with product development and evolving customer demands.
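PCA removes collinearity by rotating the centered inputs onto orthogonal directions of decreasing variance, so an ANN can be fed a few uncorrelated scores instead of many redundant variables. A minimal sketch computes the leading component by power iteration on the sample covariance matrix; the two process variables below are hypothetical and nearly collinear by construction, not data from [76].

```python
import math


def covariance(X):
    """Sample covariance matrix (n - 1 denominator) and column means."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in X) / (n - 1)
          for b in range(d)] for a in range(d)]
    return C, means


def leading_component(C, iters=200):
    """Power iteration: top eigenvector and eigenvalue of a symmetric matrix C."""
    d = len(C)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigval


# Hypothetical, nearly collinear inputs (e.g. two strongly coupled line settings).
X = [[t, 2.0 * t + 0.01 * ((-1) ** k)] for k, t in enumerate([0.1 * i for i in range(20)])]
C, means = covariance(X)
pc1, lam1 = leading_component(C)
explained = lam1 / (C[0][0] + C[1][1])  # total variance = trace of C
scores = [sum((row[j] - means[j]) * pc1[j] for j in range(2)) for row in X]
print([round(c, 3) for c in pc1], round(explained, 4))
```

When one component explains almost all the variance, its scores can replace the original correlated pair as a single ANN input.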

Pan et al. [78] proposed an advanced neural network-based coating weight control approach for hot-dip galvanizing lines. The framework consists of feedforward control (FFC) and feedback control (FBC), together with a neural network predictive model, a bias-update module, and a real-time optimizer. This framework addresses nonlinearity, large time-varying delays, disturbances, and the unsynchronized regulation of two manipulated variables (MVs), and it greatly reduced both the coating weight variance and the transition time. Mao et al. [79] introduced, for the first time, a neural network model combining the backpropagation (BP) algorithm and the genetic algorithm (GA) to model and predict the zinc coating thickness of hot-dip galvanized sheet. The main factors influencing coating thickness, namely strip line speed, air knife pressure, air-knife-to-strip distance, and air knife height, are used as model inputs, and the coating thickness of the hot-dip galvanizing system is the model output. Simulations demonstrate that the GA-BP algorithm improves estimation accuracy and converges faster than standard coating thickness models, and that it can be used in a closed-loop zinc layer thickness control scheme.
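A bias-update module in a framework like this corrects the model prediction with a filtered estimate of the plant-model mismatch, so that the optimizer works with an offset-free prediction. The sketch below assumes a hypothetical first-order filter with gain `lam` and a hypothetical linear stand-in for the neural network coating weight model; it is not the exact update law of Pan et al. [78].

```python
def bias_update(bias, measured, predicted, lam=0.3):
    """First-order filter on the plant-model mismatch (0 < lam <= 1)."""
    return (1.0 - lam) * bias + lam * (measured - predicted)


def model_prediction(line_speed, knife_pressure):
    # hypothetical stand-in for the neural network coating weight model (g/m^2)
    return 60.0 + 0.2 * line_speed - 0.5 * knife_pressure


bias = 0.0
for step in range(30):
    predicted = model_prediction(120.0, 40.0)   # raw model output
    measured = predicted + 4.0                  # hypothetical unmodeled plant offset
    bias = bias_update(bias, measured, predicted)
corrected = model_prediction(120.0, 40.0) + bias
print(round(bias, 3), round(corrected, 3))
```

A smaller `lam` filters measurement noise more strongly but tracks changes in the mismatch more slowly, which is the usual trade-off when tuning such a module.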

In summary, this section discussed newly adopted AI-based methodologies for each steelmaking process. These methods enable high-precision prediction and intensive monitoring, unlike conventional supervised processes, which are less profitable and efficient. However, further research is needed on more complex models, or on combinations with other algorithms, to improve performance and reduce computational load. The reviewed studies in this section are summarized in Table 5.

Table 5 An overview of the surveyed literature regarding the steel industry

3.2 Semiconductor

Advances in fast computing chips from the semiconductor industry have made the current state of AI possible across various engineering applications. However, this one-sided relationship is now beginning to become bidirectional, because the growth of each side benefits the other: AI is poised to change the semiconductor industry in return.

Semiconductor devices are generally fabricated on silicon wafers. The wafer undergoes several procedures: an insulating layer is formed by oxidation, a pattern is defined by photolithography, etching is performed with an etchant, a thin film is deposited by evaporation or sputtering, and so on. These processes demand extremely high precision and therefore cannot tolerate anything (e.g., tiny dust particles) that may cause defects. Even though most semiconductor fabs maintain a nearly dust-free environment, defects that severely affect yield are still detected in manufacturing lines because of other factors, including machine errors and human mistakes. Approaches to detecting and localizing wafer defects fall into two categories: image-based and signal-based fault detection. Studies using both approaches are discussed here.

The more common approach to wafer fault detection is image-based detection using deep learning. Defects not only indicate the specific fault location; the way they cluster and form patterns can also reveal the root causes of malfunctions. Imoto et al. [80] automated defect classification with a CNN-based transfer learning method to monitor the occurrence frequency of defect types, which helps identify the root causes of process failures. The authors propose transfer learning in a weakly supervised setting because storage holds a massive amount of inconsistently labeled data. The method involves pre-training and fine-tuning: pre-training uses the massive data set containing numerous incorrect labels, after which only the parameters of the final layers are fine-tuned using a small amount of highly reliable labeled data. Yang [81] achieved classification accuracies of 99.2% and 98.1% using a CNN and extreme gradient boosting (XGBoost), respectively. This high accuracy was achieved by carefully examining the wafer maps, finding that the background of the defect signatures appears to have spatial correlations, and transforming the images to enhance the contrast between the signal and the background. The images are pre-processed using singular value decomposition (SVD), which removes much of the noise around the defects. In addition, data augmentation (e.g., random cropping, rotation, resizing, and flipping) is used to obtain a better model. Tello et al. [82], on the other hand, classified both single-defect and mixed-defect patterns using a randomized general regression network (RGRN) and a CNN. The authors note that previously reported defect classification studies mainly focused on single-defect patterns and therefore used simple models. Their classification involves three stages. In the first stage, a spatial filter is applied to remove noise from the raw images.
Then, a splitter based on information gain theory generates rules to identify and separate single-defect and mixed-defect patterns. Lastly, the data classified as single-defect are fed to the RGRN, while the mixed-defect data are given to the CNN for training and testing. Separating the pattern types with the splitter proves more effective than the traditional end-to-end deep learning techniques reported previously. O’Leary et al. [83] presented an interesting study that classified the chemical composition of particle defects. Although a simple CNN model is used, the authors validated merging spectral data from energy-dispersive X-ray (EDX) spectroscopy into the fully connected layers of the CNN. The CNN therefore extracts features from the raw input images and the spectral data simultaneously, and the results show a significant increase in overall classification accuracy.
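An information-gain splitter, as used in the second stage of [82], chooses the rule that best separates the two pattern types. The sketch below assumes a single hypothetical scalar feature per wafer map (here, the number of separate defect clusters) and binary labels; the actual features and rules of [82] are not reproduced.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by the rule `value <= threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder


def best_rule(values, labels):
    """Threshold with the highest information gain among the observed values."""
    return max(sorted(set(values)), key=lambda t: information_gain(values, labels, t))


# Hypothetical feature: number of separate defect clusters on each wafer map.
clusters = [1, 1, 1, 2, 3, 2, 1, 4]
kind = ["single", "single", "single", "mixed", "mixed", "mixed", "single", "mixed"]
t = best_rule(clusters, kind)
print(t, round(information_gain(clusters, kind, t), 3))
```

A rule with maximal gain routes each map to the model suited to its pattern type, which is the role the splitter plays before the RGRN and CNN stages.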

Conventional visual inspection with a high-resolution camera is limited because all defect types and their possible shapes must be known in advance. Such inspection requires labeling numerous defects with a wide variety of shapes, which is error-prone and involves tedious human labor. It is therefore necessary to exploit the larger pool of unlabeled wafer maps available for training. Yu et al. [84] devised a stacked convolutional sparse denoising autoencoder (SCSDAE), which combines a CNN and SDAEs. A sparse autoencoder (SAE) learns relatively sparse features because a sparsity penalty term constrains its learning process, thereby optimizing the network weights better than standard AEs. Combined with sparsity, the denoising objective makes the feature representation robust even when the input data have been stochastically corrupted. Two SDAEs perform feature extraction, which is further enhanced by convolution and max-pooling to learn even more discriminative features, hence the name SCSDAE. Nakazawa et al. [85] not only detected abnormal defect patterns but also segmented them using an end-to-end deep convolutional encoder-decoder network. For segmentation, the proposed network is based on a fully convolutional network (FCN), which consists mostly of convolution layers. Segmenting defect patterns allows the extraction of supplementary information such as location, size, major and minor axis lengths, and orientation. The network is shown to surpass baseline architectures such as FCN, SegNet, and U-Net in detection performance.
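The two ingredients that distinguish a sparse denoising autoencoder from a plain autoencoder, corrupting the input and penalizing the average hidden activation, fit in a few lines. In this sketch the masking-noise corruption, the KL-divergence form of the sparsity penalty, and the values of `rho` and `beta` are common choices assumed for illustration; [84] may use different ones, and the encoder, decoder, and convolutional layers are omitted.

```python
import math
import random


def mask_noise(x, drop_prob, rng):
    """Denoising corruption: zero each input element with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else xi for xi in x]


def kl_sparsity(rho, mean_activations):
    """Sum over hidden units of KL(rho || rho_hat_j); activations must lie in (0, 1)."""
    return sum(rho * math.log(rho / r) + (1 - rho) * math.log((1 - rho) / (1 - r))
               for r in mean_activations)


def sdae_loss(clean_batch, reconstructions, hidden_batch, rho=0.05, beta=3.0):
    """Batch-averaged squared reconstruction error against the CLEAN input,
    plus beta times the sparsity penalty on the mean hidden activations.

    `reconstructions` are assumed to come from encoding/decoding the corrupted input.
    """
    n = len(clean_batch)
    sq_err = sum(sum((c - r) ** 2 for c, r in zip(cx, rx))
                 for cx, rx in zip(clean_batch, reconstructions)) / n
    n_hidden = len(hidden_batch[0])
    mean_act = [sum(h[j] for h in hidden_batch) / n for j in range(n_hidden)]
    return sq_err + beta * kl_sparsity(rho, mean_act)


rng = random.Random(0)
clean = [[1.0, 0.5, 0.0, 1.0]]
noisy = mask_noise(clean[0], 0.25, rng)   # what the encoder would receive
recon = [[0.9, 0.5, 0.1, 0.8]]            # hypothetical decoder output
hidden = [[0.04, 0.10, 0.02]]             # hypothetical sigmoid activations
print(noisy, round(sdae_loss(clean, recon, hidden), 4))
```

Scoring the reconstruction against the clean input, while the network only sees the corrupted one, is what forces the learned features to be robust to corruption.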

Even though image-based detection receives most of the attention, there is a substantial body of work on signal-based methods. Lee et al. [86] showed that fault diagnosis to find the root causes of process failures can be carried out effectively even with a black-box CNN model. This is enabled by tailoring the CNN’s receptive field over multivariate sensor signals along the time axis, which allows the features extracted in the hidden layers to be associated with the physical meaning of the raw data. This in turn makes it possible to locate the variable and time of process failures. Lee et al. [87] focused on reducing noise while retaining as much valuable information as possible for reliable and robust fault monitoring. To reduce noise, the authors propose an SDAE in which several DAEs are pre-trained, each taking the latent representation produced by the preceding DAE as input. Once pre-training is done, the final layer is replaced for defect classification, and further fine-tuning is carried out. The study shows that, as noise severity increases, the proposed model remains more robust than twelve other machine learning models used for comparison. Kim et al. [88] used status variable identification (SVID) sensor data of varying lengths, which are important for early fault detection. Unlike previous studies, the proposed method is robust to SVID sequences of different lengths. Inspired by text classification, in which sequences of varying length are encoded into a fixed-size vector, the authors implement a self-attention mechanism that distributes attention over a fixed-size vector so that the model can focus on the times at which faults occur. Azamfar et al. [89] addressed data drift, in which the data distribution shifts as operating conditions and the environment change.
In real-world scenarios, various disturbances often make the testing conditions differ from normal conditions, which leads to differently distributed training and test data. The proposed model is a standard CNN with an additional loss term, the maximum mean discrepancy (MMD) loss, which quantifies the distribution discrepancy between the source and target domains. The authors state that adding this loss term substantially increases classification accuracy. Unlike previous works, which mostly use CNNs, Kim et al. [90] showed that an RNN can be used for early-stage anomaly detection. A key strength of the method is that it can detect anomalies in advance even if the model has not been trained on defective data. The model, called DeepNAP, consists of a detection module and a prediction module. The latter exploits an LSTM to forecast the future signal, which is then passed to the detection module for latent representation and anomaly detection. In the detection module, the early part of the LSTM output is treated as the target signal for optimization under a partial reconstruction loss function. The partial reconstruction loss yields higher anomaly scores on the defective parts of the input signal, making it well suited to this application. The proposed model is shown to outperform other baseline architectures in the pre-detection of anomalies. The reviewed studies in this section are summarized in Table 6.
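The MMD term used by Azamfar et al. [89] measures how far apart the source and target feature distributions are in a kernel feature space. A minimal sketch of the standard biased estimate of squared MMD with a Gaussian kernel follows; the one-dimensional features and the kernel bandwidth are hypothetical, and in training the term would be computed on the CNN's learned feature vectors and added to the classification loss, typically with a weighting factor.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / (2 * sigma ** 2))


def mmd2_biased(source, target, sigma=1.0):
    """Biased estimate of squared MMD between two samples of feature vectors:
    mean k(s, s') + mean k(t, t') - 2 * mean k(s, t)."""
    def mean_k(A, B):
        return sum(gaussian_kernel(a, b, sigma) for a in A for b in B) / (len(A) * len(B))
    return mean_k(source, source) + mean_k(target, target) - 2 * mean_k(source, target)


source = [[0.0], [0.5], [1.0], [1.5]]    # hypothetical features, normal condition
same = [[0.0], [0.5], [1.0], [1.5]]
shifted = [[2.0], [2.5], [3.0], [3.5]]   # hypothetical features after data drift
print(round(mmd2_biased(source, same), 6), round(mmd2_biased(source, shifted), 4))
```

Minimizing this term alongside the classification loss pushes the network toward features whose distribution looks the same under both operating conditions, so a classifier trained on the source domain transfers better to the drifted one.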

Table 6 An overview of the surveyed literature regarding the semiconductor

4 Conclusion

Applications of AI in manufacturing industries have been particularly challenging because of the demand for near-flawless modeling of highly nonlinear phenomena in high-dimensional spaces. However, the large body of recent literature investigating AI in related industrial sectors (Table 7) implies that, although it is still in its infancy, AI has huge potential as a modeling, analysis, and automation technique that can change the manufacturing paradigm in the near future. Beyond the aforementioned industrial sectors, AI is widely studied for medical image analysis, bioinformatics, drug discovery, recommendation systems, financial fraud detection, visual art processing, and military applications. Well-known products that leverage AI include 'Alexa' by Amazon, 'Watson' by IBM, and 'AlphaGo' by DeepMind, and many lesser-known products have already permeated our daily lives. Furthermore, to overcome limitations that hinder broader industrial applications of AI, such as the lack of interpretability and performance degradation under data shortage, sub-branches of deep learning including physics-informed deep learning, explainable AI, domain adaptation, active learning, multi-task learning, and graph neural networks are being actively studied. The convergence of AI with other engineering sectors is promising and should not be overlooked. Through this review, we hope that the precision engineering and manufacturing community finds ways to use emerging AI effectively for future-oriented manufacturing.

Table 7 An overview of how deep learning models are used in the recent literature