Introduction

Regardless of the quality of their construction, all systems are unreliable: their capability to perform their intended function will degrade with usage or time and they will eventually fail. The function of maintenance is then to ensure that the systems can be retained in or restored to a state where they can perform their intended function (Ben-Daya et al., 2016). Maintenance strategies have evolved since the first industrial machines required repairs in the First Industrial Revolution. These strategies have matured from simple corrective actions to more complex strategies that support asset management and provide added value through the exploitation of information and computer technologies (GFMAM, 2021). An overview of this evolution as proposed by the Global Forum on Maintenance & Asset Management (GFMAM) is illustrated in Fig. 1 and is explained in the following points:

Fig. 1

Evolution of maintenance maturity in organizations, leading to the four main maintenance strategies; adapted from the GFMAM Maintenance Framework (GFMAM, 2021)

  • Reactive maintenance involves the handling of failures once breakdowns have occurred. This strategy can lead to unexpected machine downtime and high production costs (Lee et al., 2020). Reactive maintenance emerged around the year 1800, during the First Industrial Revolution, when the first equipment requiring maintenance appeared.

  • Planned maintenance is associated with maintenance actions (such as inspections and routine maintenance) that occur at periodic intervals based on time or usage (Ben-Daya et al., 2016). It can prevent certain failures by slowing down the deterioration process that leads to faults. However, it is unable to detect random failures that do not follow age-related patterns (Peng et al., 2010). Planned maintenance gained prevalence around the time of the Second World War (Ben-Daya et al., 2016).

  • Proactive maintenance aims to predict failures through the monitoring of a machine, allowing maintenance actions to occur when they are needed (Carvalho et al., 2019). Condition monitoring is made possible through the acquisition and processing of sensor data from the monitored machine (Barbieri et al., 2020). This acquisition can either be continuous or periodic (Jardine et al., 2006). Furthermore, proactive maintenance is characterized by the application of techniques like Reliability Centered Maintenance (RCM), Total Productive Maintenance (TPM), and Failure Modes and Effects Analysis (FMEA) that aim to learn from previous failures throughout the operational life of systems to improve maintenance management (GFMAM, 2021; Ben-Daya et al., 2016). Such techniques became widespread around the year 1970 (Ben-Daya et al., 2016).

  • Strategic maintenance adopts a perspective of asset management. Rather than only focusing on retaining or restoring the functionality of machines, asset management refers to the set of activities that “enables an organization to realize value from assets in the achievement of its organizational objectives” (ISO, 2014). In the strategic maintenance vision, maintenance is not considered in isolation. This means that instead of focusing on reducing the maintenance cost of systems and increasing their reliability and maintainability as maintenance traditionally does, strategic maintenance considers maintenance as one of the processes involved within the asset management landscape (Parra et al., 2019). Maintenance decisions are thus made considering the maintenance function in the context of the entire asset management process (GFMAM, 2016). In this context, the conflicting goals of multiple departments (such as maintenance versus production) must be balanced to obtain a solution that is in the best interest of the enterprise (Campos, 2009). An evolution that goes from considering maintenance activities in isolation to properly implementing asset management has the potential to improve overall financial performance, support risk management, and improve the efficiency and effectiveness towards the achievement of the organizational goals (ISO, 2014). Strategic maintenance began to gain prominence around the 2010s, when the BSI PAS 55 standard (BSI, 2008) and its successor ISO 55000 (ISO, 2014) were released.

Alongside this evolution of maintenance strategies, an evolution of technology has occurred. Technologies such as Cyber-Physical Systems (CPS) and the Internet of Things (IoT) have led a trend of automation in the manufacturing industry, known as the Fourth Industrial Revolution or Industry 4.0 (Oztemel & Gursev, 2020). This Industry 4.0 trend involves a vision of enabling transparency, which is the ability to discover manufacturing uncertainties and measure real manufacturing capability (Lee et al., 2013). Manufacturers can quantify what is usually invisible through the exploitation of Industry 4.0 technologies and the realization of transparency. For instance, the real availability (often wrongfully assumed to be complete availability) and real performance capability (often wrongfully assumed to be optimal performance) of manufacturing systems can now be measured (Lee et al., 2013). This insight into the real state of machinery enables the optimization of existing manufacturing processes (Monostori et al., 2016), such as maintenance. In particular, the increase of transparency in manufacturing through Industry 4.0 technology has enabled the rise of both proactive and strategic maintenance.

On the one hand, proactive maintenance benefits from Industry 4.0 because these technologies enable failure prediction. Many failures can be predicted in a system through the monitoring of its physical conditions (Nowlan & Heap, 1978); this monitoring has been paired with physics-based (Singh et al., 2014), data-based (Lee et al., 2021) and hybrid (Cofre-Martel et al., 2021) models to reach predictions about the system. This failure prediction capability is exploited through Condition-based Maintenance (CBM), a maintenance program that uses condition monitoring data to suggest maintenance actions (Jardine et al., 2006). Regardless of its specific type, CBM allows maintenance managers to enable maintenance transparency and more effectively decide which items should be scheduled for maintenance (Lee et al., 2013).

On the other hand, strategic maintenance can benefit from transparency by quantifying the contribution that maintenance brings to the enterprise. Decisions made in strategic maintenance intend to fulfill the maintenance functions while supporting asset management and organizational objectives (Márquez, 2007). As a consequence, all maintenance decisions must consider these higher objectives. Once maintenance decisions are made and the resulting actions are executed, their contribution can be evaluated so that further decisions can continue supporting asset management. This evaluation is possible through the analysis of indicators related to health, performance, quality, and resource usage metrics (Lee et al., 2013). Such indicators can be calculated thanks to the increased transparency enabled by Industry 4.0 technologies (Lee et al., 2013).

To exploit the benefits of transparency in maintenance, Industry 4.0 technologies must be integrated into the manufacturing devices of a company. Devices that do not have these capabilities are often referred to as legacy devices. Such devices must be either replaced or upgraded to enable transparency (Villar-Fidalgo et al., 2018). The latter process is known as smart retrofitting. Jaspert et al. (2021) define Smart Retrofitting as the integration of new technologies into legacy systems to enable the transition towards Industry 4.0. It is an action that involves machinery that was not designed for the Industry 4.0 vision, and it aims to transfer the requirements of this vision to such machinery using the lowest time and capital investment (Guerreiro et al., 2018). As a result, machines that undergo smart retrofitting are converted into CPS (Lins & Oliveira, 2020). The improvement of devices through smart retrofitting is a desirable outcome, as it enables Industry 4.0 transparency without the high investment and high risk associated with the replacement of machinery. Smart retrofitting has several advantages, such as ensuring competitiveness of enterprises, increasing equipment efficiency, and achieving sustainability (Jaspert et al., 2021).

This work is interested in the study of smart retrofitting in the maintenance context: Smart Retrofitting in Maintenance (SRM). While Jaspert et al. have already produced a literature review on smart retrofitting (Jaspert et al., 2021), they focus on the manufacturing domain. The main novelty of the present work is then to focus on the maintenance domain. Such a focus is of interest because there might be requirements or characteristics of smart retrofitting that are specific to maintenance.

Given the above, the rest of this paper is organized as follows. The definition and execution of the methodology utilized for the systematic review on SRM is presented in Section 2. The results that arise from its application are illustrated in Section 3 and discussed in Section 4. Finally, Section 5 draws conclusions and outlines future work.

Methodology

An inductive approach was followed to define the necessary steps to create a systematic review on SRM. In an effort to find common patterns, this approach involved studying multiple engineering-related systematic literature reviews. The objective was to identify the required steps, along with the methods and tools that were utilized. An initial analysis of reviews on manufacturing (Franciosi et al., 2020) and retrofitting (Jaspert et al., 2021), along with a review on systematic reviews (Kitchenham et al., 2009) led to the following workflow:

  • Research questions: define a set of Research Questions (also referred to as “RQ” in this document) that the literature review must answer.

  • Specific methodology: define a specific methodology for conducting the literature review.

  • Keyword matrix: define a keyword matrix for building the initial database of papers of the literature review.

  • Screening and eligibility: identify and apply a set of exclusion criteria to refine the initial database.

  • Analysis: read the complete texts and search for answers to the defined research questions.

The remainder of this section illustrates the identified steps.

Research questions

In an effort to select a proper set of research questions, the ones identified in different systematic literature reviews on engineering-related topics were analyzed (Franciosi et al., 2020; Cimino et al., 2019; Kitchenham et al., 2009; Jaspert et al., 2021). Then, the following research questions were chosen for this literature review:

  • RQ1: how is smart retrofitting in maintenance defined in the literature?

  • RQ2: how does smart retrofitting support different maintenance strategies?

  • RQ3: what are the benefits (drivers) of smart retrofitting in maintenance?

  • RQ4: what are the challenges for implementing smart retrofitting in maintenance?

Specific methodology

The PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) method (Liberati et al., 2009) is a frequently used approach for the realization of systematic literature reviews, with at least 700 recorded uses in engineering and computer science works. The method involves an identification phase, a screening phase, and an eligibility phase. Given its pervasiveness, PRISMA was chosen as the methodology for this review.

Fig. 2

Visual representation of a keyword matrix, using standard boolean notation where \(+\) corresponds to the OR operator and \(\times\) to the AND operator. A paper appears in the keyword search as long as it contains at least one keyword from each group

Fig. 3

Keyword groups in the final keyword matrix

Identification: keyword matrix

One of the most critical steps in this process was searching for works using keywords from a keyword matrix. Keyword matrices are a systematic method to select the specific keywords to use in a literature review. This method is described by Jaspert et al. (2021), and it consists of using AND and OR operators to find papers that discuss a specific subject within scientific databases such as Scopus and Web of Science.

As a general rule, a keyword matrix is created by defining certain groups of keywords that are thematically related, and linking said groups with the AND operator. Within each group, keywords are linked with OR operators. This means that a given work will appear in the search results if at least one keyword from each group is present in it. A visual guide to the method of the keyword matrices is depicted in Fig. 2. Note that the search results correspond to manuscripts that were published until around June 2021. The final matrix, shown in Fig. 3, includes a topic group pertaining to smart retrofitting, an activity group that considers maintenance, a narrowing of the areas of interest in the area group, and a group of related paradigms and technologies of which at least one is expected to be present in any reference related to SRM. This last group also has the purpose of searching for documents that are related to different maintenance strategies.
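The matching rule of a keyword matrix can be illustrated with a short Python sketch. It builds a generic boolean query string and checks whether a text satisfies the matrix. The group names and keywords below are hypothetical placeholders, not the contents of Fig. 3, and the query syntax is generic rather than that of any specific database.

```python
# Hypothetical keyword matrix: group names and keywords are illustrative only.
KEYWORD_MATRIX = {
    "topic": ["retrofit", "upgrade"],
    "activity": ["maintenance"],
    "paradigm": ["industry 4.0", "internet of things"],
}


def build_query(matrix):
    """Join keywords with OR inside a group and join groups with AND."""
    groups = []
    for keywords in matrix.values():
        joined = " OR ".join(f'"{kw}"' for kw in keywords)
        groups.append(f"({joined})")
    return " AND ".join(groups)


def matches(text, matrix):
    """Return True if at least one keyword from every group occurs in the text."""
    lowered = text.lower()
    return all(
        any(kw in lowered for kw in keywords) for keywords in matrix.values()
    )


if __name__ == "__main__":
    print(build_query(KEYWORD_MATRIX))
    print(matches("Retrofit of legacy pumps for Industry 4.0 maintenance", KEYWORD_MATRIX))
```

A text that lacks a keyword from any single group is rejected, which is why adding groups narrows the search while adding keywords within a group widens it.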

Screening and eligibility

Once the initial set of articles was found through Scopus and Web of Science using the keyword matrix, the PRISMA methodology was followed. The following exclusion criteria were used for the screening and eligibility phases:

  • Screening Exclusion Criteria (SEC)

    • SEC1: is the document an article published in a peer-reviewed journal or conference, or a book chapter?

    • SEC2: does the document show a possible relationship with both retrofitting and maintenance?

  • Eligibility Exclusion Criteria (EEC)

    • EEC1: is the full document available for downloading in English?

    • EEC2: does the document show a clear relationship with both retrofitting and maintenance?

    • EEC3: does the document present an answer to at least one of the research questions?

The SEC were chosen because they were easily verifiable by reading only the title and the abstract, while the EEC required the full document to be analyzed. A flow chart with the result of the application of the methodology is depicted in Fig. 4.
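The two-stage filtering described above can be sketched in Python as follows. This is a minimal illustration, assuming that each criterion is answered by a reviewer with yes or no and that a "no" to any criterion excludes the document. The `Record` class and its field names are hypothetical and do not correspond to any tool used in the review.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical record; each flag holds a reviewer's answer to one criterion."""
    title: str
    sec1_peer_reviewed: bool
    sec2_possible_relation: bool
    eec1_full_text_english: bool = False
    eec2_clear_relation: bool = False
    eec3_answers_rq: bool = False


def screen(records):
    """Keep records that pass both screening criteria (title and abstract only)."""
    return [r for r in records if r.sec1_peer_reviewed and r.sec2_possible_relation]


def assess_eligibility(records):
    """Keep records that pass all three eligibility criteria (full text)."""
    return [
        r for r in records
        if r.eec1_full_text_english and r.eec2_clear_relation and r.eec3_answers_rq
    ]
```

Applying `screen` first keeps the more expensive full-text assessment limited to documents that already passed the title and abstract check.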

Fig. 4

PRISMA stages for the SRM review

Analysis

After the PRISMA screening and eligibility phases, 82 papers were selected for a complete analysis. The method described in Saldaña (2016) and adopted by Jaspert et al. (2021) was utilized for said analysis. This consists of assigning descriptive keywords (codes) to text excerpts from each paper that could present an answer to the research questions, and then organizing the keywords into meaningful categories.
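The coding step can be illustrated with a small Python sketch that groups codes into categories and counts how many papers contribute to each category. The codes, the code-to-category mapping, and the counting rule (each paper counted once per category) are assumptions made for illustration; they are not the codes or the procedure reported in this review.

```python
from collections import Counter

# Hypothetical mapping from descriptive codes to categories.
CODE_TO_CATEGORY = {
    "sensor installation": "data collection",
    "existing sensors": "data collection",
    "wireless link": "data communication",
    "cloud analytics": "data processing",
}


def category_frequency(coded_excerpts, code_to_category):
    """Count the papers that contribute at least one code to each category.

    `coded_excerpts` is an iterable of (paper_id, code) pairs.
    """
    paper_category_pairs = {
        (paper_id, code_to_category[code]) for paper_id, code in coded_excerpts
    }
    return Counter(category for _, category in paper_category_pairs)
```

Counting distinct (paper, category) pairs avoids inflating a category when one paper yields several excerpts with related codes.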

The grouping of keywords into meaningful categories, along with the explanation and analysis of these categories, is presented in the following section.

Results

This section presents an overview of the answers found in the literature to each RQ. After the extraction of text excerpts, keywords were assigned to each document and meaningful categories were found.

RQ1: Definitions

RQ1 focuses on formulating a definition for SRM. It should be noted that simply looking for explicit definitions for SRM would have yielded poor results. Only a few of the reviewed papers mentioned the word retrofit (Uhlmann et al., 2017; Cattaneo & Macchi, 2019; Sepehri et al., 2018; Strauß et al., 2018; Hesser & Markert, 2019; Surico et al., 2020; Bernal et al., 2018), and they used it without attempting to formalize a definition. Because of this, the following approach was taken to answer this research question. In each paper, authors state the characteristics or requirements needed for a process to be considered a retrofitting (with maintenance purposes) of a legacy device. These characteristics and requirements were recorded for the building of a definition of SRM. For RQ1, the keywords fell into four main categories: data collection, data communication, data processing, and services. The resulting frequency of each category is depicted in Fig. 5, and the most cited works in each category are illustrated in Table 1. Next, a description of each group is presented.

Fig. 5

Frequency of the categories found regarding Research Question 1: Definition

Table 1 Most cited papers in the categories identified for RQ1

Data collection

If data analysis is to occur, devices must have the capability of collecting data from sensors (Barksdale et al., 2018; Ge et al., 2019; Liu et al., 2018). This can be done through preexisting sensing capabilities if they are available (Luo, 2020; Chau et al., 2015; Calabrese et al., 2020), but other times the installation of additional sensors is necessary (Huber et al., 2019; Prathima et al., 2020; Kiangala & Wang, 2018). Such sensors are sometimes added in the form of sensor nodes (Christou et al., 2020; Surico et al., 2020; Catenazzo et al., 2018), which are IoT devices that integrate both sensing and communication capabilities. Although it might not always be possible, some authors emphasize the need to collect data in a non-invasive manner. Jónasdóttir et al. (2018) propose a retrofitting solution that does not require machinery to be opened or heavily modified. Wiemer et al. (2019) cite the need to avoid interrupting running production during the upgrading process, which is achieved by performing any pilot test in a testbed rather than on the production line. Regardless of the specific details of the process, it becomes clear that any SRM process involves capturing data from the target legacy devices.

Data communication

Devices also need the ability to communicate with other entities. The OM2M platform (Alaya et al., 2014) defines three actors present in any Machine to Machine (M2M) architecture: M2M devices (which are natively capable of communicating with an M2M network), legacy devices (which are not capable of communicating with a network), and M2M gateways (which enable M2M communication capabilities for legacy devices). Communication in SRM closely resembles this paradigm. According to the reviewed literature, devices must be able to communicate with external entities such as smart mobile devices (Ranjbar et al., 2019; Hussain et al., 2020; Cologni et al., 2015), local central computers or processors (Atluru et al., 2012; Ashjaei & Bengtsson, 2017; Short & Twiddle, 2019), or a remote location such as the equipment maintainer or the cloud (Nordal & El-Thalji, 2021; Damanik et al., 2020; Bucci et al., 2020). Through communication, legacy devices can be integrated into management software, like Maintenance Execution Systems (MES) and Computerized Maintenance Management Systems (CMMS) (Balogh et al., 2018; Barton et al., 2019; Chau et al., 2015). Some authors identified that such communication should be wireless (Ziegelaar et al., 2020; Magadán et al., 2020; Uhlmann et al., 2017), sometimes referencing the use of wireless sensor networks (Talmoudi et al., 2019; Priyanka et al., 2021; Sadiki et al., 2018). Furthermore, enabling interoperability by being compatible with multiple vendors and protocols is also cited as a requirement for all these types of communication (Priller et al., 2014; Chen et al., 2018; Liang et al., 2020). When communication is not initially possible for a device, it is often enabled using a gateway device (Alexandru et al., 2016; Richter et al., 2019; Scholtz et al., 2018). The specific means of communication and the specific entity with which devices communicate change depending on the considered application, but the requirement of data communication remains constant in most of the analyzed works.
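The role of a gateway in this paradigm can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the class names, the JSON message layout, and the `send` callable are hypothetical and do not reproduce the OM2M API or any protocol used in the reviewed works.

```python
import json


class LegacyDevice:
    """Hypothetical legacy device: exposes a raw reading but cannot use a network."""

    def __init__(self, device_id, reading):
        self.device_id = device_id
        self._reading = reading

    def read_raw(self):
        return self._reading


class Gateway:
    """Hypothetical gateway: wraps raw readings in messages for external entities."""

    def __init__(self, send):
        # `send` is any callable that delivers a string, e.g. a network client.
        self._send = send

    def forward(self, device, variable):
        message = json.dumps(
            {
                "device": device.device_id,
                "variable": variable,
                "value": device.read_raw(),
            }
        )
        self._send(message)
        return message
```

Passing the delivery function as a parameter keeps the sketch independent of the transport, so the same gateway logic could feed a mobile device, a local computer, or a remote service.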

Data processing

Data that are collected and transmitted must then be processed, converting them into information, knowledge and eventually wisdom (Ackoff, 1989). Such processing involves any manipulation of data that requires computational capabilities before the data can lead to decision making and other added value services. Data are generally stored in databases (Bousdekis et al., 2019; Prudenzi et al., 2019a; Arosio et al., 2014), so that current and historical data can be retrieved at any point (Ardila et al., 2020; Romero et al., 2021). Processing then includes data cleaning or preprocessing (Uhlmann et al., 2017; Åkerman et al., 2018; Chen et al., 2018), data visualization (Yu et al., 2014; Bousdekis et al., 2019; Prathima et al., 2020), and in general, data analysis (Wiemer et al., 2019; Ranjbar et al., 2019; Calabrese et al., 2020). Although data processing is often performed entirely in the cloud (Gayathri & Vasudevan, 2018; Balogh et al., 2018; Iqbal et al., 2019), edge and fog computing can be utilized as alternatives or supplements to cloud computing (Strauß et al., 2018; Shapsough et al., 2020; Magadán et al., 2020).

Services

By enabling data collection, communication, and processing for a legacy device, the exploitation of service-based architectures can be achieved (Lesjak et al., 2014; McNally et al., 2020; Alonso et al., 2018). Such services can be classified according to the Prognostics and Health Management (PHM) architecture proposed in Li et al. (2020). These services consist of:

  • Fault Diagnosis Assessment (FDA): the current fault state of devices can be estimated through retrofitting (Lee et al., 2017; Barton et al., 2019; Sepehri et al., 2018). This fault estimation includes detecting the presence of failures, isolating the failed component, and identifying the specific failure mode.

  • Prognosis Assessment (PA): incipient failures can be detected before they can affect the device performance (Aqueveque et al., 2021; Bucci et al., 2020; Alves et al., 2020). This is possible through the estimation of the health state and the prediction of Remaining Useful Life (RUL).

  • Health Management (HM): once the fault and health state of a device have been assessed, decision making can take place (Ciancio et al., 2020; Vieira et al., 2018; Mourtzis & Vlachou, 2018). Health management services must first integrate the fault and health information. Then, they are able to offer maintenance advisory through the analysis of this information. This enables the conversion of information into knowledge and eventually wisdom (Ackoff, 1989).
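The way the three services build on each other can be sketched in Python as follows. This is a deliberately simplified illustration: the threshold-based fault check, the linear extrapolation of a health indicator, and the advisory strings are assumptions chosen for brevity, not the methods of Li et al. (2020) or of the reviewed works.

```python
def diagnose(value, fault_threshold):
    """FDA (simplified): flag a fault when the indicator reaches the threshold."""
    return value >= fault_threshold


def remaining_useful_life(history, fault_threshold):
    """PA (simplified): extrapolate the average degradation rate to the threshold.

    `history` holds equally spaced health-indicator samples. The result is
    expressed in sampling intervals, or None if no degradation trend is seen.
    """
    if len(history) < 2:
        return None
    rate = (history[-1] - history[0]) / (len(history) - 1)
    if rate <= 0:
        return None
    return max(0.0, (fault_threshold - history[-1]) / rate)


def advise(history, fault_threshold, planning_horizon):
    """HM (simplified): merge fault and health information into an advisory."""
    if diagnose(history[-1], fault_threshold):
        return "repair now"
    rul = remaining_useful_life(history, fault_threshold)
    if rul is not None and rul <= planning_horizon:
        return "schedule maintenance"
    return "continue operation"
```

For example, with samples `[1.0, 2.0, 3.0]` and a threshold of `5.0`, the indicator grows by one unit per interval, so the estimated RUL is two intervals and the advisory depends on whether that falls within the planning horizon.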

RQ2: maintenance strategies

Next, RQ2 is analyzed. Reactive, planned, proactive and strategic maintenance strategies are defined in the GFMAM Maintenance Framework (GFMAM, 2021), and these are the strategies considered in this review. Note that a single paper could discuss the use of SRM in the context of multiple strategies. The following criteria were used to classify the maintenance strategy or strategies considered in a paper:

  • If the studied maintenance actions occur after a failure, the paper is classified as reactive maintenance.

  • If maintenance actions occur before failures but there is no condition assessment, the paper is classified as planned maintenance.

  • If maintenance actions occur before the occurrence of failures, and the condition is assessed through the measurement of physical variables (e.g., acceleration, temperature), the paper is classified as proactive maintenance.

  • If maintenance actions occur following a framework that considers asset management and/or business objectives, the paper is classified as strategic maintenance.

The differentiation of planned and proactive maintenance is of special interest. A paper was classified as studying SRM in planned or proactive maintenance when SRM is used to collect data that allows maintenance actions to be planned before the occurrence of failures. The main difference between both categories then lies in the nature of the collected data. If measured variables enable the estimation of the current condition of the machinery (i.e. the current wear) then a paper was classified as studying proactive maintenance. In contrast, a paper was classified as studying planned maintenance if the measured variables enable the quantification of how long a given device has been operating.
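The classification criteria above can be summarized in a short Python sketch. The boolean flags are a hypothetical simplification of a reviewer's judgment about each paper; the function returns a set because a single paper can fall under several strategies.

```python
def classify_strategies(
    acts_after_failure,
    measures_operating_time,
    measures_condition,
    considers_asset_management,
):
    """Return the set of maintenance strategies a paper is classified under.

    All four flags are hypothetical reviewer judgments about the paper.
    """
    strategies = set()
    if acts_after_failure:
        strategies.add("reactive")
    if measures_operating_time:
        # Usage data only quantifies how long a device has been operating.
        strategies.add("planned")
    if measures_condition:
        # Physical variables allow the current condition (wear) to be estimated.
        strategies.add("proactive")
    if considers_asset_management:
        strategies.add("strategic")
    return strategies
```

A paper that records both work-hours and vibration, for instance, would be classified under planned and proactive maintenance at the same time.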

The frequency of each strategy is presented in Fig. 6, and the most cited works in each category are provided in Table 2. The most frequently studied maintenance strategy is proactive maintenance, followed by planned maintenance which was studied or mentioned with significantly less frequency. Regarding reactive and strategic maintenance, they were seldom mentioned in the analyzed works. The remainder of this section presents an overview of the state of SRM in each of the maintenance strategies.

Fig. 6

Frequency of the keywords found regarding Research Question 2: Maintenance Maturity

Table 2 Most cited papers in the categories identified for RQ2

Reactive maintenance

Only four of the analyzed works explicitly study SRM in the context of reactive maintenance. Alexandru et al. (2016) generate smartphone notifications when a machine’s programmable logic controller (PLC) triggers an alarm. Ramani et al. (2016) and Deroussi et al. (2018) use SRM to generate notifications that are sent to operators when a machine failure is detected. Priller et al. (2014) mention the potential of using retrofitting to execute predefined reaction patterns when a machine breaks down (such as automatically scheduling a repair action), rather than simply generating a notification. It can be noticed that the objective is the automation of reactive maintenance actions through SRM after the occurrence of functional failures. However, such exploitation of SRM in reactive maintenance policies appears to be relatively uncommon.

It is worth mentioning that this scarcity of reactive maintenance-related papers was expected. The chosen keyword matrix includes terms such as Industry 4.0 and Smart Maintenance, whose technologies are not necessarily required in reactive maintenance. Instead, traditional industrial automation technologies (such as PLCs) might be sufficient to enable maintenance-related services in reactive maintenance, e.g. fault detection. This low number of papers does not reflect a lack of research on smart retrofitting in reactive maintenance, but instead shows that the focus of this literature review lies elsewhere.

Planned maintenance

Planned maintenance was studied more frequently than reactive and strategic maintenance. Within planned maintenance papers, two frequently measured variables are the operational state and the operational mode of devices. An operational state refers to the level of activity within a structure. Operational states are commonly described in simple terms of binary ON and OFF values (Wasson, 2006). In contrast, an operational mode specifies an abstract user-selectable set of system activities that focuses on satisfying an objective. Operational modes are not limited to ON/OFF values, since they might also include initialization, calibration, configuration, cleaning, production, and even maintenance (Wasson, 2006).

Regarding operational states (ON/OFF), machinery was retrofitted so that start and stop times could be recorded, allowing maintenance to be done according to total work-hours (Damanik et al., 2020; Bhandari et al., 2020; Calabrese et al., 2020; Erazo Navas et al., 2021). When devices do not have the ability to measure their own state, variables such as voltage and current can be used to monitor the operational state (Bhandari et al., 2020; Mourtzis & Vlachou, 2018). The machine operational mode is also recorded in some of the analyzed works (Pistofidis & Emmanouilidis, 2012; Mourtzis & Vlachou, 2018; Cattaneo & Macchi, 2019). These operational measurements allow for an increased awareness of the shop floor condition, which in turn allows better maintenance and workshop planning (Mourtzis & Vlachou, 2018).
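As a hedged illustration of how an operational state could be derived from an electrical measurement, the following Python sketch accumulates work-hours from current samples. The threshold rule, the sample layout, and the function names are assumptions for illustration and are not taken from the cited works.

```python
def operating_hours(samples, on_threshold_amps):
    """Sum the time spent ON from time-ordered (timestamp_seconds, current_amps) samples.

    The state observed at a sample is assumed to hold until the next sample.
    """
    total_seconds = 0.0
    for (t0, current), (t1, _) in zip(samples, samples[1:]):
        if current > on_threshold_amps:
            total_seconds += t1 - t0
    return total_seconds / 3600.0


def maintenance_due(hours, interval_hours):
    """Usage-based rule: maintenance is due once accumulated hours reach the interval."""
    return hours >= interval_hours
```

In practice the threshold would need to be chosen per machine, since idle current differs between devices.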

Proactive maintenance

Proactive maintenance papers were by far the most frequent. Such works are characterized by the implementation of CBM techniques, which aim to predict potential failures through the analysis of physical variables (Jardine et al., 2006).

The most frequently measured variables were vibration (Surico et al., 2020; Aqueveque et al., 2021; Magadán et al., 2020) and temperature (Ciancio et al., 2020; Iqbal et al., 2019; Short & Twiddle, 2019), followed by electric current (Ge et al., 2019; Strauß et al., 2018; Barksdale et al., 2018). The most analyzed components were motors, induction or otherwise (Rubio et al., 2018; Eiskop et al., 2017; Talmoudi et al., 2019), followed by bearings (Richter et al., 2019; Bernal et al., 2018). Most research on SRM involves rotating machinery. Other variables of interest that were measured were fluid pressure (Schneider et al., 2019; Lalanda et al., 2017) and electric voltage (Al Kindhi & Pratama, 2021; Prudenzi et al., 2019b).

Strategic maintenance

Specific mentions of strategic maintenance (i.e. fulfillment of asset management or organizational goals) were seldom found. Only two of the analyzed works study SRM in the context of strategic maintenance, both of which belong to the same research group.

Huber et al. (2019) and Wiemer et al. (2019) worked on extending the CRoss-Industry Standard Process for Data Mining (CRISP-DM). CRISP-DM is an open standard that describes a common approach to data mining projects (Wirth & Hipp, 2000). It consists of six layers: Business Understanding, Data Understanding, Data Preparation, Model Building, Model Evaluation, and Deployment. The first two layers are of special interest. Business Understanding focuses on understanding the project objectives and requirements from a business perspective, while Data Understanding involves data collection (Wirth & Hipp, 2000). The authors defined intermediate steps between Business Understanding and Data Understanding. These are: i) business objectives are transformed into technical tasks that fulfill said objectives; ii) a selection of the data required to complete these technical tasks is made; iii) proper measurement equipment and methodology are selected. Such intermediate steps allow a direct link between organizational goals and technical implementations of SRM, making these papers examples of the implementation of strategic maintenance in an enterprise.

RQ3: drivers

To create cohesive groups of SRM drivers, the GFMAM Maintenance Framework was followed (GFMAM, 2021). This identifies three categories in which maintenance management adds value to businesses: performance, risk and cost. The resulting frequency of each category is illustrated in Fig. 7, and the most cited works in each category are provided in Table 3. A description of each group follows.

Fig. 7

Frequency of the categories found regarding Research Question 3: Drivers

Table 3 Most cited papers in the categories identified for RQ3

Performance

The introduction of maintenance services through smart retrofitting can increase the performance of manufacturing systems, optimizing their production capabilities (Tedeschi et al., 2017; Ramani et al., 2016; Jung & Jin, 2018). This is possible by improving the availability of physical assets, which is in turn a function of their reliability and maintainability (Parra & Crespo, 2015).

SRM has the potential to improve the reliability of assets (Cattaneo & Macchi, 2019; Deroussi et al., 2018; Hussain et al., 2020), by increasing the Mean Time To Failure (MTTF) and consequently decreasing the Failure Frequency (FF). Improving reliability is often cited as a result of an improved ability to schedule maintenance actions (Priller et al., 2014; Yiu et al., 2019; Catenazzo et al., 2018), and in some cases, as a result of implementing prognostics and health management through retrofitting (Ranjbar et al., 2019; Cattaneo & Macchi, 2019; Vogl et al., 2015). When failures occur, SRM can also reduce the Mean Down Time (MDT), effectively improving maintainability (Sezer et al., 2018; Lesjak et al., 2014; Alves et al., 2020). Besides an improved maintenance schedule, maintainability might be augmented by retrofitting through faster response times (Wang et al., 2020; Liang et al., 2020; Gayathri & Vasudevan, 2018), which enable the use of alarms, earlier detection of faults, and remote notifications.

Together, these improvements increase the availability of assets (Jónasdóttir et al., 2018; Bernal et al., 2018; Mykoniatis, 2020), which means that assets stay in a productive state for a greater amount of time. Furthermore, the optimization of production through workshop scheduling (rather than just maintenance scheduling) is also enabled by SRM (Zhang et al., 2017; Vogl et al., 2015; Chen et al., 2018). Finally, SRM can improve the quality of products and services (Prudenzi et al., 2019a; Sezer et al., 2018; Hsu et al., 2019).
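The relationship between reliability, maintainability and availability can be made concrete with a short calculation. The sketch below assumes the common steady-state approximation in which availability equals MTTF / (MTTF + MDT) and the failure frequency is the reciprocal of the MTTF. Neither formula is stated in the reviewed works, the function names are hypothetical, and the numerical values are purely illustrative.

```python
def availability(mttf_hours: float, mdt_hours: float) -> float:
    """Steady-state availability, assuming A = MTTF / (MTTF + MDT)."""
    if mttf_hours <= 0 or mdt_hours < 0:
        raise ValueError("MTTF must be positive and MDT non-negative")
    return mttf_hours / (mttf_hours + mdt_hours)


def failure_frequency(mttf_hours: float) -> float:
    """Failures per operating hour, assumed here to be 1 / MTTF."""
    if mttf_hours <= 0:
        raise ValueError("MTTF must be positive")
    return 1.0 / mttf_hours


# Hypothetical asset before and after retrofitting (illustrative numbers only).
before = availability(mttf_hours=400.0, mdt_hours=20.0)
after = availability(mttf_hours=500.0, mdt_hours=10.0)

print(f"availability before: {before:.3f}, after: {after:.3f}")
```

Under these assumptions, a longer MTTF (higher reliability) and a shorter MDT (higher maintainability) both raise availability, which matches the qualitative argument above.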

Risk

Another way in which maintenance adds value is by reducing risk. Risk is the potential impact to an asset or another source of value that may arise from a present process or from a future situation (Márquez, 2007). In maintenance, risk commonly refers to the safety, environmental, operational and non-operational effects of failures that might lead to monetary losses (Ben-Daya et al., 2016). Maintenance can reduce risk in an organization through: safety, environmental risk, and stakeholder confidence (GFMAM, 2021). SRM was found to address all these elements to varying degrees.

SRM has been said to improve operator safety. Safety sometimes comes from the increased autonomy of assets, which reduces the need for operators to interact directly with potentially hazardous machinery (Gayathri & Vasudevan, 2018; Catenazzo et al., 2018). Risk has also been minimized by reducing the chances of potential injuries due to damaged equipment (Hesser & Markert, 2019), preventing unsafe working conditions (Nordal & El-Thalji, 2021), or by providing clear instructions for operators to follow when performing risky tasks, reducing potential human errors (Arosio et al., 2014). Regarding human-asset interactions, some papers also tracked operator location through retrofitted machinery. These authors propose tracking the position of each operator and which tasks they have completed on which assets (Arosio et al., 2014; Pistofidis & Emmanouilidis, 2012). This tracking mechanism can lead to more human-centered maintenance, enabling greater situational awareness of operators (Oliveira et al., 2013) and increasing safety while reducing errors (Gavish et al., 2015).

Concerning environmental risks, some of the studied papers address environmental sustainability (Scholtz et al., 2018; Zhang et al., 2017). Although sustainability was seldom mentioned directly in SRM-related works, one of the main functions of maintenance is maintaining plant and environmental safety (Muchiri et al., 2011). This indicates that the improvement of the maintenance function has the potential to improve environmental sustainability. Conversely, an inadequate execution of maintenance can lead to hazardous emissions, production wastes due to untimely breakdowns, and inefficient use of energy and resources (Liyanage & Badurdeen, 2010). Proper maintenance (and SRM as an extension) then has a significant sustainability impact on organizations (Franciosi et al., 2020).

Finally, the literature indicates that SRM has the potential to increase stakeholder satisfaction (Jung & Jin, 2018; Gayathri & Vasudevan, 2018), by enabling working-culture-oriented and reliable business models (Nordal & El-Thalji, 2021). The studied works then indicate that SRM has the potential to bring a positive impact on risk management through safety, environmental risk, and stakeholder confidence.

Costs

The third main added value that maintenance offers is cost reduction. The costs associated with asset utilization can be divided into capital expenditures and operational expenditures. Maintenance can support the reduction of these costs (GFMAM, 2021), and SRM can aid this process.

Capital expenditures (CAPEX) involve money spent on acquiring or upgrading productive assets. This includes buying or replacing machinery to increase overall productivity or augment redundancy. One way SRM enables the reduction of this cost is by increasing the remaining useful life (RUL) of assets (Hesser & Markert, 2019; Aqueveque et al., 2021; Strauß et al., 2018). When machines can perform their functions for longer, fewer refurbishments and replacements are needed. Another driver that reduces CAPEX is enabling interoperability between machinery from different vendors (Mourtzis & Vlachou, 2018; Barton et al., 2019; Iqbal et al., 2019). Interoperability eliminates the need to replace functioning machinery simply because it uses a different or outdated communication interface, and removes the need to purchase separate software tools that are compatible with specific vendors only. Finally, SRM solutions usually aim to have a low-cost implementation, meaning that transitioning from legacy to retrofitted assets does not involve high CAPEX (Chen et al., 2016; Eiskop et al., 2017; Sezer et al., 2018).

Operational expenditures (OPEX) are related to money spent on operating costs such as raw materials, utilities (electricity, water, etc.) and labor. They also include maintenance itself, both labor-wise and parts-wise. Numerous examples can be found in which SRM is demonstrated to save resources, such as maintenance costs (Ramani et al., 2016; Surico et al., 2020), materials (including spare parts) (Bernal et al., 2018; Priller et al., 2014), and energy (Alonso et al., 2018; Pignatelli et al., 2015). Of special interest are ways in which SRM can simplify the maintenance process, potentially saving money and time. Retrofitting has enabled the use of remote services (Khademi et al., 2019; Liang et al., 2020) and outsourcing maintenance through after-sales services (Chen et al., 2016; Zhang et al., 2017), which reduce the need for expert personnel on-site. Meanwhile, it has enabled increased asset autonomy (Chau et al., 2015; Prathima et al., 2020) and ease of use (Lin et al., 2014; McNally et al., 2020), meaning that human resources may be reduced through retrofitting.

RQ4: Challenges

When selecting a set of categories to group definitions and drivers, pre-existing frameworks or standards were used. This approach does not apply well to RQ4, since challenges change as technology advances and research prioritizes certain areas over others. Instead, categories were identified only from the reading of the selected works. The resulting frequency of each category is depicted in Fig. 8, and the most cited works in each category are provided in Table 4. The identified categories consist of: unavailability of CBM data, lack of expert knowledge in plants, and various technical challenges that include security, interoperability, latency and data volume. These are described next.

Fig. 8
figure 8

Frequency of the categories found regarding Research Question 4: Challenges

Table 4 Most cited papers in the categories identified for RQ4

Security

Cybersecurity is the computer science field that devotes itself to protecting the privacy, confidentiality and integrity of data that is stored and transmitted. The importance of this field is ever-increasing, as cyber-attacks become more frequent and sophisticated (Babiceanu & Seker, 2016). In this context, multiple considerations are made regarding security of retrofitted machinery. Intellectual property becomes a concern, as poor security measures in data transmission might lead to third parties capturing production data (Eiskop et al., 2017; Mourtzis & Vlachou, 2018). Such considerations become a greater concern if a device is connected permanently to the internet (Lesjak et al., 2014), so alternatives like edge and fog computing become of interest (Ashjaei & Bengtsson, 2017).

Interoperability

Collecting data for maintenance purposes is not a new concept since many plants already do this. However, these datasets are usually fragmented across systems because of their heterogeneous semantics and formats (Christou et al., 2020; Barbieri & Gutierrez, 2021). The development of common standards for device interoperability was found to be one of the main challenges of health management in smart manufacturing (Weiss et al., 2015). Even though various machine to machine (M2M) communication standards have been developed to this day (such as MQTT, MTConnect, and OPC-UA), heterogeneity and the lack of interoperability are still challenges today (Barton et al., 2019; Prathima et al., 2020). Interoperability issues are not limited to machine communication. Existing data analysis and visualization dashboards (such as Power BI, Tableau, and Google Data Studio) have been found to be incompatible with sensor data streams and streaming technologies, operating properly only on classical data storage (Moens et al., 2020).
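To illustrate what heterogeneous semantics and formats can look like in practice, the following minimal sketch maps two differently structured sensor records onto a single schema. The vendor payloads, field names and common schema are all hypothetical and are not taken from any of the reviewed works or from the standards mentioned above.

```python
from datetime import datetime, timezone

# Two hypothetical vendor payloads describing the same kind of measurement
# with different field names, units, and timestamp formats.
vendor_a = {"ts": 1700000000, "temp_c": 71.5, "machine": "press-01"}
vendor_b = {
    "time": "2023-11-14T22:13:20+00:00",
    "temperature_f": 160.7,
    "asset_id": "press-02",
}


def from_vendor_a(record: dict) -> dict:
    """Map vendor A's record (epoch seconds, Celsius) to the common schema."""
    return {
        "asset_id": record["machine"],
        "timestamp": datetime.fromtimestamp(record["ts"], tz=timezone.utc),
        "temperature_c": float(record["temp_c"]),
    }


def from_vendor_b(record: dict) -> dict:
    """Map vendor B's record (ISO 8601 string, Fahrenheit) to the common schema."""
    return {
        "asset_id": record["asset_id"],
        "timestamp": datetime.fromisoformat(record["time"]),
        "temperature_c": (float(record["temperature_f"]) - 32.0) * 5.0 / 9.0,
    }


# After normalization, both records share keys, units, and timestamp type.
unified = [from_vendor_a(vendor_a), from_vendor_b(vendor_b)]
```

Each additional vendor format requires its own adapter in this approach, which hints at why the lack of common standards is reported as a persistent challenge.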

Latency

Cloud services are commonly used in retrofitting scenarios. However, cloud services usually do not guarantee the reactiveness required from time-critical operations (Khelifi et al., 2018; Wang et al., 2020). Even for applications that are not time-critical, network reliability has been identified as an issue in using the cloud (Wang et al., 2020). Some authors have also cited real-time data collection and processing as challenges (Mourtzis & Vlachou, 2018; Cologni et al., 2015). Consequently, edge computing has been proposed as a solution to these issues (Ashjaei & Bengtsson, 2017; Strauß et al., 2018). However, edge computing has several constraints that the cloud does not, such as computational power and available data storage (Scholtz et al., 2018). The aforementioned issues make the challenge of latency in data transfer still relevant.

Data volume

Volume is one of the main dimensions of Big Data, which is increasingly relevant in the manufacturing domain (Babiceanu & Seker, 2016). Big Data refers to amounts of data that are too large to be processed efficiently through traditional methods (Kaisler et al., 2013). The communication, storage and real-time processing of such large data volumes have become a challenge for IoT devices (and by extension, retrofitted devices) (Lee et al., 2017). Regarding communication, the bandwidth of existing networks in industrial scenarios can be insufficient for these data volumes (Catenazzo et al., 2018). Edge computing might be considered as a solution to this, but storage and processing of large data volumes are an even bigger challenge in edge computing than in cloud computing (Scholtz et al., 2018). Furthermore, enabling this communication, storage and processing of large data volumes might require large start-up (CAPEX) and maintenance (OPEX) investments in plants and businesses (Lee et al., 2017). Note that this challenge should be more relevant in proactive maintenance than in planned maintenance.
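A back-of-the-envelope estimate shows how quickly raw sensor data can grow. The sketch below is only an illustration: the number of channels, sampling rate, resolution and number of machines are hypothetical values, not figures reported in the reviewed works, and protocol overhead and compression are ignored.

```python
def raw_data_rate_bits_per_s(channels: int, sample_rate_hz: float, bits_per_sample: int) -> float:
    """Uncompressed data rate produced by one retrofitted machine."""
    return channels * sample_rate_hz * bits_per_sample


def daily_volume_gb(rate_bits_per_s: float) -> float:
    """Data accumulated in 24 hours, in gigabytes (1 GB = 1e9 bytes)."""
    seconds_per_day = 86_400
    return rate_bits_per_s * seconds_per_day / 8 / 1e9


# Hypothetical setup: 3 vibration channels sampled at 10 kHz with 16-bit resolution.
rate = raw_data_rate_bits_per_s(channels=3, sample_rate_hz=10_000, bits_per_sample=16)
per_machine_gb_per_day = daily_volume_gb(rate)

# Hypothetical plant with 50 such machines streaming continuously.
machines = 50
total_rate_mbps = machines * rate / 1e6

print(f"{per_machine_gb_per_day:.3f} GB/day per machine, {total_rate_mbps:.1f} Mbit/s in total")
```

With these assumed values, a single machine produces roughly 5 GB of raw data per day, and the plant as a whole needs a sustained 24 Mbit/s of network capacity before any overhead is counted.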

Data availability

When performing CBM, sensor data must be collected before being able to perform analyses. Run-to-failure data is necessary to train CBM models so they can detect the current state of a device (Calabrese et al., 2020; Alves et al., 2020). However, such data is not always available to plants that perform CBM on a device for the first time (Lei et al., 2018). The collection of sensor data through retrofitting well in advance of obtaining condition estimation capabilities must be a long-term commitment for enterprises that implement CBM. Much like with data volume, this challenge should be more relevant in proactive maintenance than planned maintenance.

Expert knowledge

To properly conduct a retrofitting project, an interdisciplinary team of domain experts is needed (Huber et al., 2019); e.g. data scientists, process engineers, control engineers, maintenance engineers, etc. After the retrofitting activity is completed, professional and technical personnel familiar with Information and Communication Technologies (ICT) are required to operate and maintain the retrofitted equipment (Liang et al., 2020; Bucci et al., 2020). Acquiring expert knowledge and/or personnel with such knowledge can be especially challenging for Small and Medium Enterprises (SMEs), whose budget might be more restricted than that of larger enterprises (Jung & Jin, 2018; Surico et al., 2020).

Discussion

In an effort to answer the RQs illustrated in this work, 82 different papers that discuss SRM were analyzed and categorized. This allowed the identification of: RQ1: a definition of SRM; RQ2: the maintenance strategies in which SRM is being researched the most; RQ3: the drivers that might push enterprises towards the implementation of SRM; RQ4: the challenges that hinder this implementation. This section summarizes these findings and proposes a roadmap for the implementation of SRM.

RQ1: definition

An analysis of the requirements for SRM that are found in the literature leads to a definition of it. This definition should reflect the need to enable data collection, communication, and processing in legacy devices for the exploitation of such capabilities to generate maintenance-related services. The following definition is proposed based on the previous analysis:

Smart retrofitting in maintenance refers to the development of maintenance services through the retrofitting of legacy devices with the following functionalities:

  • Data Collection: capability to collect operational data through preexisting or additional sensors;

  • Data Communication: capability to transmit sensor data within a network to local or remote actors;

  • Data Processing: capability to transform sensor data into information and knowledge through data preprocessing, visualization, and analysis.

Some devices might already have embedded some of these functionalities. The functionalities that must be retrofitted then depend on the current capabilities of the device. For instance, a device that already has data collection capabilities (such as a computer numerical control or CNC mill) would only require communication and processing capabilities to be added for the implementation of SRM.
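The idea that only the missing functionalities need to be retrofitted can be expressed as a simple set difference. The following is a minimal sketch; the `Functionality` enumeration and the `functionalities_to_retrofit` helper are hypothetical names introduced here for illustration, not part of any referenced framework.

```python
from enum import Enum


class Functionality(Enum):
    """The three functionalities named in the proposed SRM definition."""
    DATA_COLLECTION = "data collection"
    DATA_COMMUNICATION = "data communication"
    DATA_PROCESSING = "data processing"


# Every functionality is required for a device to support SRM.
REQUIRED = frozenset(Functionality)


def functionalities_to_retrofit(existing) -> frozenset:
    """Return the functionalities a device still lacks."""
    return REQUIRED - frozenset(existing)


# Example from the text: a CNC mill that already collects data.
cnc_mill = {Functionality.DATA_COLLECTION}
to_add = functionalities_to_retrofit(cnc_mill)
```

For the CNC mill example, `to_add` contains the communication and processing functionalities, mirroring the case described above.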

The definition proposed in this work is built from the requirements that are found in the literature for the retrofitting of legacy devices for maintenance purposes. This proposal is not based on other definitions, since these are not available in the analyzed works. Despite not being based on other definitions in the literature, the presented definition is found to be aligned with various Industry 4.0 architectures. For instance, the 5C CPS architecture proposed in Lee et al. (2015) and the hierarchical architecture of smart factories presented by Chen et al. (2017) pose similar technological needs. Such similarities between this definition and Industry 4.0 architectures are not surprising, considering that “Industry 4.0” is one of the keywords used to select the analyzed works. The proposed definition is then supported by pre-existing Industry 4.0 architectures while detailing specific requirements and considerations for the case of SRM.

RQ2: maintenance strategies

A second set of findings comes from maintenance strategies. By studying the concepts and the physical variables monitored by retrofitted devices, each work was identified as studying the use of SRM in the context of one or more maintenance strategies. It is notable that significantly more research has gone into SRM for proactive maintenance than for any other maintenance strategy. The ways in which SRM can support proactive maintenance are clear: it enables the collection of physical variables, which can then be used for FDA, PA and HM services. Ways in which SRM can support other maintenance models are then of interest.

In the domain of planned maintenance, increased transparency enables maintenance managers to know how much each device has been used and in which mode of operation, thus allowing better maintenance planning. However, research on SRM for planned maintenance is significantly less frequent than for proactive maintenance. Such a situation is undesirable considering that planned maintenance is still prevalent in industries today (Macchi et al., 2017).

Regarding reactive maintenance, almost no research on SRM was found. The papers that discuss this maintenance model mention that SRM can improve the response to failure by generating alarms and identifying which steps must be taken next. The scarcity of results in this area is probably caused not by a lack of research, but by a keyword matrix that is focused on different topics.

Finally, the reduced amount of research on SRM for strategic maintenance proves to be undesirable. Strategic maintenance is the most recent trend of maintenance evolution, and it has been regarded as the next step in increasing the added value of the maintenance function (GFMAM, 2021). The few works that studied SRM in strategic maintenance first consider what type of data must be collected and then use SRM to enable the collection of these data. However, only two out of 82 analyzed documents discuss strategic maintenance. This lack of works on how SRM can support strategic maintenance presents a strong opportunity for future research.

RQ3: drivers

In this work, SRM drivers were grouped into significant categories. Drivers were classified as supporting performance, risk and costs. Performance is improved by SRM through increased availability (achieved through higher reliability and maintainability), increased production, and better quality in products and services. Risk is reduced by increasing operator safety, considering sustainability goals, and enhancing stakeholder satisfaction. Finally, costs are reduced by minimizing both capital and operating expenses, increasing the useful life of assets, and decreasing the usage of resources. These value drivers are aligned with the objectives of asset management (ISO, 2014), and by extension, of strategic maintenance.

SRM then has the potential to support enterprises in fulfilling their organizational objectives through strategic maintenance. Despite this potential, SRM does not appear to be currently used to quantify how maintenance impacts the enterprise at the organizational level. Since this topic merges technology and organizations, collaboration between academia and industry is fundamental. The lack of research may suggest a need to increase this collaboration.

RQ4: challenges

The challenges found through this review were divided into security, interoperability between physical assets and software tools, latency in both data transmission and processing, data volume, data availability, and the lack of expert knowledge regarding SRM. The analyzed literature suggests that the challenges to implement SRM are mostly technical in nature. In contrast, a discussion of the challenges for the implementation of SRM in strategic maintenance is missing. Researching how SRM can support strategic maintenance processes is one of the challenges that academia should focus on, considering the course that maintenance is taking towards more strategic approaches (GFMAM, 2021). Undertaking this challenge will in turn help realize the potential of SRM to support enterprises in their fulfillment of asset management and organizational objectives.

SRM: implementation roadmap

From the previous discussion, an important outcome of this review is that the analyzed literature does not appear to consistently consider the impact that SRM can bring to strategic maintenance and its processes. By following an effective asset management policy (which is part of strategic maintenance), organizations are able to realize greater value and achieve a balance of performance, costs and risk (Parra et al., 2019). Specific roadmaps that allow SRM to be exploited in strategic maintenance and asset management were not found. Even though a few works propose architectures that consider asset management and/or organizational goals (Huber et al., 2019; Wiemer et al., 2019; Wirth & Hipp, 2000), these do not quantify the benefits that SRM may bring to the fulfillment of these goals. Instead, most of the analyzed literature focuses on the technical requirements and challenges of SRM.

This technological focus might be sufficient from the perspective of proactive maintenance, as the ability to predict failures is enough to realize value. However, it proves to be insufficient to realize value in strategic maintenance. Because of this, the tendency of many enterprises is to implement SRM and then try to realize value from it, following a “technology-push” approach to business.

This tendency is witnessed in Lubik et al. (2013). Here, the authors surveyed 25 different Italian manufacturing enterprises. Among the interviewed companies, 15 were found to start with a technology-push philosophy, as opposed to a market-pull or demand-pull one. Among technology-push companies, decisions were usually instinct-based and their first revenue took longer to materialize (4.2 years vs 2.9 years in demand-pull organizations). The study also found that the technology-push companies had to shift to a demand-pull strategy. One reason for this change was the realization that the real needs of their clients were different from the needs they had supposed. Another reason was the inability to fulfill profit targets through their current methods.

Lubik et al. (2013) are not the only researchers that reflect on the disadvantages of deploying a given technology or tool and then trying to realize value from it. Parra and Crespo (2015) warn that, without specific strategies, companies might fall into the trap of implementing tools (e.g. SRM) without a clear understanding of how or why to implement them. Such implementations can result in underutilized tools that do not bring as much value as they could. Indeed, Mittal et al. (2018) conclude that there is a need for strategies that allow organizations to: i) understand their own maturity from a smart manufacturing standpoint (which is closely related to the utilized maintenance strategy); ii) learn what steps (courses of action) should be taken, according to their current maturity.

It is now clear that a technology-first approach in which SRM is implemented before designing a strategy is not desirable. A demand-first plan in which technology supports an existing strategy is then a more effective way to exploit SRM as a tool. In this regard, strategic maintenance and asset management must be considered before any attempt to implement this (or any) tool. Enterprises first need to focus on their operational contexts, financial and regulatory constraints, and on the needs and requirements of their stakeholders (ISO, 2014). Once all of these factors have been considered, the following steps should take place:

  1. Process: organizations must first select the processes necessary to fulfill their strategic maintenance vision. These processes might be identified by following a preexisting management framework, such as the Maintenance Management Model from Márquez (2007), or the Maintenance Management Framework from GFMAM (either Version One (GFMAM, 2016) or Two (GFMAM, 2021)). Some of these processes might include identifying which assets in an organization are the most critical, analyzing which failure modes should be studied in those assets, and deciding which maintenance policy is the most appropriate for each failure mode (GFMAM, 2016).

  2. Method: after a set of processes is established, enterprises must assign a supporting method for each process (Márquez, 2007). For instance, the selection of maintenance policy might utilize Reliability Centered Maintenance (RCM) or Total Productive Maintenance (TPM). Supporting methods should be selected as a result of defining a set of processes, and not the other way around (Parra & Crespo, 2015).

  3. Tool: finally, after a set of processes and supporting methods has been defined, a tool or technology must be selected to support the implementation of each method. For instance, in the case of RCM, a computerized maintenance management system (CMMS) would act as an appropriate technology. The maintenance processes that are traditionally implemented in enterprises do not need to be completely replaced or overhauled by the technology. Instead, the idea of SRM is that such processes should be improved and supported by the transparency-enabling technologies of Industry 4.0 (Roda & Macchi, 2021).

A representation of this sequential procedure is depicted in Fig. 9. The figure illustrates how the approach should shift from a technology-first one to a process-first perspective. The matter of how to exploit SRM as a tool to support specific processes, realizing value at the process and organization levels, then arises as an area for future research.
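The process-method-tool ordering can also be sketched as a small data structure that refuses to accept a tool before a method has been chosen. This is only an illustration of the sequencing described above: the `RoadmapEntry` class and its method names are hypothetical, and the example values (maintenance policy selection, RCM, CMMS) are the ones mentioned in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoadmapEntry:
    """One strategic maintenance process with its supporting method and tool."""
    process: str
    method: Optional[str] = None
    tool: Optional[str] = None

    def assign_method(self, method: str) -> None:
        """Step 2: a method is chosen only for an already defined process."""
        self.method = method

    def assign_tool(self, tool: str) -> None:
        """Step 3: a tool may only be chosen once a method exists."""
        if self.method is None:
            raise ValueError("select a supporting method before choosing a tool")
        self.tool = tool


# Step 1 defines the process; steps 2 and 3 follow in order.
entry = RoadmapEntry(process="maintenance policy selection")
entry.assign_method("RCM")
entry.assign_tool("CMMS")
```

Attempting to call `assign_tool` on an entry without a method raises an error, which corresponds to the technology-first pattern that the roadmap is meant to avoid.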

Fig. 9
figure 9

Paradigm shift from technology-push to demand-pull for the realization of organizational value

Threats to validity

Although great care went into making this review as comprehensive as possible, some threats to validity should be considered when analyzing its results. First, documents from before 2010 were for the most part not considered. Second, the decision to only analyze English documents that were indexed in either Scopus or Web of Science might have left out relevant works that do not fulfill those conditions. Finally, the selected keyword matrix (see Fig. 3) can be considered a threat to validity. The paradigms and technologies group in particular contains keywords that, to the authors’ knowledge, would return documents related to proactive, planned, and strategic maintenance. There is always the possibility, however, that keywords that were not considered would have returned more documents related to each maintenance strategy, presenting a more accurate overview of the current literature.

Conclusions

This work aims to analyze the current state-of-the-art concerning Smart Retrofitting in Maintenance (SRM). SRM stands as an alternative to the complete replacement of machinery when the digitalization of maintenance in an enterprise is desired. In this context, it is of interest to analyze where research stands and where it might go. With this in mind, the PRISMA method for systematic literature reviews was followed, and 82 papers related to SRM were analyzed. The objective was to propose a definition for SRM, identify how SRM is currently exploited in multiple maintenance strategies, and define the current drivers and challenges towards the widespread implementation of SRM. The completion of this document led to multiple relevant outlooks.

First, a definition for SRM was provided:

Smart retrofitting in maintenance refers to the development of maintenance services through the retrofitting of legacy devices with the following functionalities:

  • Data Collection: capability to collect operational data through preexisting or additional sensors;

  • Data Communication: capability to transmit sensor data within a network to local or remote actors;

  • Data Processing: capability to transform sensor data into information and knowledge through data preprocessing, visualization, and analysis.

Then, an analysis was conducted concerning the maintenance strategies in which SRM is utilized. This analysis presents a trend towards the exploitation of SRM in proactive maintenance. Finally, a set of drivers and challenges was outlined concerning the implementation of SRM. Drivers were classified in accordance with the forms in which maintenance adds value in an asset management setting; i.e. performance, risk and cost. Concerning challenges, mainly technical ones were found in the analyzed literature, while management-focused or strategy-focused challenges were only partially considered.

These results showed that the analyzed literature mainly deals with technological issues without sufficient considerations on strategy. Such a technology-first focus in the management of companies has been deemed undesirable by various authors, who instead advocate for more demand-centric and strategy-focused approaches. Starting from this result, a roadmap for the implementation of SRM was proposed suggesting the utilization of SRM as a tool to support the processes and methods of enterprises.

Based on the outcomes of this investigation, a few opportunities for further research arise:

  • Strategic maintenance: given the recent trend in maintenance evolution, it is fundamental to implement SRM in strategic maintenance and asset management scenarios. The transparency brought by SRM has shown many benefits in proactive maintenance through FDA, PA, HM, and the implementation of CBM in general. However, the benefits that SRM may bring to strategic maintenance have not been sufficiently explored. Likewise, applying the principles of strategic maintenance to decide how to exploit SRM in a given scenario through a demand-first approach (for example, deciding whether a specific asset should undergo SRM, be replaced, or be left as it is) could be valuable. Achieving this goal might require close cooperation between academia and industry.

  • Planned maintenance: despite the fact that planned maintenance is still very relevant in enterprises, most of the analyzed works focus on enabling transparency in proactive maintenance. If SRM were to be exploited cost-effectively in planned maintenance scenarios, a considerable number of enterprises might benefit from it.