1 Introduction

The integration of digital technologies with industrial processes has led to a paradigm shift in manufacturing. Principles of Smart Manufacturing have enhanced production systems in terms of quality, cost, flexibility and decision-making capabilities (Mittal et al. 2020). In particular, using Machine Learning (ML) as part of analytics solutions is becoming increasingly popular as a method to meet the requirements of a rapidly evolving, dynamic manufacturing environment. Leveraging analytics solutions based on ML yields several benefits for small- and medium-sized manufacturing enterprises (SMEs). For example, ML is capable of handling high-dimensional but knowledge-sparse data, which is common in the manufacturing setting. Furthermore, the patterns inferred from existing data can form the basis for predicting the behaviour of the manufacturing system, in particular to provide decision support or enhance the system directly (Wuest et al. 2016). This paper explores the potential adoption of low-cost analytics solutions in the specific context of manufacturing SMEs in the UK. Compared to larger companies, SMEs have smaller budgets, fewer analytics skills, a reduced ability to assess and address risks, and limited access to data (Bianchini and Michalkova 2019; Bauer et al. 2020). Furthermore, existing commercial analytics solutions are usually perceived as expensive for the needs of such SMEs. Hence, we outline a systematic approach to facilitate the design, development and integration of simple-to-deploy, off-the-shelf, low-cost analytics solutions.

Indeed, this investigation is timely given the UK government’s Industrial Strategy and its Grand Challenges, which encourage the adoption of AI and ML technologies by the manufacturing sector while keeping in mind their potential impacts on society and the need to ensure that the public can benefit from such technologies. Special mention is made of the provision of support for the adoption of ML technologies by start-ups as well as other businesses.

As a basis for our work, we use the Digital Manufacturing on a Shoestring (DMS) approach (McFarlane et al. 2019), which aims to increase the digital capabilities of SMEs using low-cost, readily available, off-the-shelf technologies. DMS advocates digitally enabled manufacturing processes, which leverage data to enhance process performance and preserve system knowledge. In particular, DMS offers a framework for developing low-cost digital solutions in five stages: (1) needs assessment, (2) solution specification and development, (3) procurement, installation and testing, (4) training, and (5) operation and maintenance. Given the successful outcomes reported in de Silva et al. (2020) and Hawkridge et al. (2020, 2021), DMS is chosen as the foundation for the approach proposed in this paper.

In what follows, we characterise Machine Learning (ML) as algorithms that learn recurrent patterns from presented data and produce a computational model capable of inferring judgements on newly presented data (Goodfellow et al. 2016). ML activities are usually part of an overarching process such as the Cross-Industry Standard Process for Data Mining (CRISP-DM) (Jackson 2002), which defines an independent common approach to accelerate the development and increase the reliability of data mining projects. While such processes are designed to guide data mining efforts in phases, the preliminary studies presented here focus specifically on the data preparation phase and the modelling phase, with the aim of minimising cost and complexity when developing analytics solutions. Notably, it is often necessary to repeat the data preparation phase since some modelling techniques require specific data structures. For the remainder of this paper, we refer to Low-Cost Machine Learning as (pre-built) predictive models with few hyper-parameters or computationally inexpensive machine learning algorithms that handle well-defined, structured and intelligible data. Low-cost ML algorithms initially considered in this scope include off-the-shelf solutions that are usually found easy to interpret and simple to implement, and that therefore require low engineering effort and expertise to deploy in a production environment.

For the purpose of initial investigations, this research work introduces the term Machine Learning on a Shoestring (MLS) as the activity to identify low-cost ML solutions using the DMS framework. In particular, this paper seeks preliminary answers for the following research questions:

  • RQ1: What are the barriers manufacturing SMEs must overcome in order to adopt MLS solutions?

  • RQ2: What would be the characteristics of a MLS solution?

  • RQ3: What would be the potential technologies that could support a MLS solution?

  • RQ4: What are appropriate ML models for MLS solutions?

The objective of this paper is to analyse manufacturing SME requirements to identify ML adoption barriers and characterise MLS solutions. Demonstrating the application of the MLS concept to a manufacturing setting is out of the scope of these preliminary studies.

The remainder of this paper is structured as follows. Section 2 discusses existing approaches to develop low-cost analytics solutions and provides an overview of the identified knowledge gaps. Section 3 presents the approach proposed in this paper. Section 4 applies the approach to a group of SMEs, which includes the gathering and analysis of SME requirements, the characterisation of a MLS solution, a discussion of low-cost technologies, and an analysis of the model selection process. The paper concludes by presenting future research challenges.

2 Background

While a number of attempts have been made to develop low-cost ML solutions for specific applications, the majority of these focus on bespoke quality monitoring systems. For example, Koditala and Pandey (2018) propose a low-cost water quality monitoring system, which consists of low-cost temperature and turbidity sensors and uses a multiple linear regression model to predict the temperature of the environment. Others, like Kiruthika and Umamakeswari (2017), and Lim et al. (2019) developed low-cost air quality monitoring systems. While the former work relies on a Raspberry Pi to connect various sensors and a tree-like model to make decisions, the latter uses data from low-cost air quality sensors and data collected from smartphones to train a regression model. Furthermore, Souza et al. (2018) leverage smartphone sensors to monitor the road pavement condition and a supervised learning algorithm to extract relevant features for the classification process. In the manufacturing domain, Narayanan et al. (2016) combine cost-efficient vibration sensors with an open source analytics software to monitor milling operations for small shop floors. In addition to quality monitoring applications, Emanet et al. (2014) rely on low-cost microphones to accelerate and improve diagnostics decisions in healthcare.

Although the aforementioned approaches have successfully addressed associated requirements, their focus lies on leveraging inexpensive hardware to reduce the cost of the solution. While some monitoring applications rely on simple analytics, such as decision trees (Kiruthika and Umamakeswari 2017) and multiple linear regression (Koditala and Pandey 2018), most approaches are characterised by unknown data dependencies and an extensive data preparation task, and require laborious parameter tuning and a comprehensive understanding of the underlying problem. These clearly require advanced analytical skills, thereby raising the barrier to the adoption of analytics solutions.

Therefore, developing low-cost analytics solutions for manufacturing SMEs poses specific challenges that need to be addressed: (1) there is a lack of approaches which aim to minimise the expenditure of the modelling and data preparation phases of developing a ML solution, (2) a methodology is needed for analysing the requirements of SMEs to identify appropriate methods for low-cost analytics solutions, and (3) besides monitoring applications, which constitute the majority of current analytics approaches, SMEs require low-cost analytics solutions in areas that are underrepresented thus far (Schönfuß et al. 2020).

3 An approach to low-cost machine learning

This paper proposes a general approach to identify low-cost analytics solutions for manufacturing SMEs, with particular emphasis on ML. To this end, the proposed methodology involves five steps, which are summarised in Fig. 1: (1) define criteria for using ML in a digital solution to separate solutions which require ML from those that merely examine the data; (2) identify high-priority analytics solution areas for the purpose of addressing relevant needs of SMEs in the UK; (3) gather detailed solution requirements of SMEs by conducting semi-structured interviews, (4) analyse the qualitative data from the requirements to identify adoption barriers SMEs must overcome and the required characteristics of low-cost ML solutions; and (5) determine appropriate low-cost ML models and technologies based on the gathered solution requirements of SMEs, an assessment of the adoption barriers, and the characteristics of a low-cost ML solution. In the following, we describe the approach in more detail and apply it to a group of SMEs.

Fig. 1 An approach to identify low-cost ML solutions for manufacturing SMEs

4 Low-cost machine learning for SMEs

In this preliminary study, we define two criteria for using ML in a digital manufacturing solution: (a) the analytics solution is required to make observations on previously captured, historical data to infer judgement on newly presented data, and (b) the training data set is sufficiently separated and independent (Fedyk 2016). Therefore, if the analytics solution merely analyses the data and provides decision support (e.g. visualisation of bottlenecks in planning and operation) without inferring judgement on new data (e.g. apply a model to make predictions), ML is not required, and if the analytics solution relies on someone (manually) creating the training data set then the use of ML in this context should be avoided.
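The two criteria can be read as a simple decision rule. The following is a minimal sketch under our own assumptions: the names `SolutionProfile` and `recommend_approach`, and the three outcome labels, are hypothetical and serve only to illustrate how criteria (a) and (b) combine.

```python
from dataclasses import dataclass


@dataclass
class SolutionProfile:
    """Hypothetical description of a proposed analytics solution."""
    infers_on_new_data: bool         # criterion (a): judgement on newly presented data
    training_data_independent: bool  # criterion (b): training set not created manually


def recommend_approach(profile: SolutionProfile) -> str:
    """Map the two criteria to a coarse recommendation (illustrative labels)."""
    if not profile.infers_on_new_data:
        # The solution only analyses or visualises existing data.
        return "data analysis"
    if not profile.training_data_independent:
        # Someone would have to create the training set by hand.
        return "avoid ML"
    return "ML"


# Example: a bottleneck visualisation tool does not infer on new data.
print(recommend_approach(SolutionProfile(False, True)))
```

In this reading, ML is recommended only when both criteria hold; either failure leads to a simpler alternative.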

4.1 SME requirements

To identify the barriers to adopt analytics solutions (RQ1), it is essential to study the needs of SMEs. High-priority analytics solution areas are identified by starting from the work presented in Schönfuß et al. (2020), who studied the digitalisation requirements of manufacturing SMEs in the UK and provided a catalogue of (general) digital solution areas. Based on this catalogue, six analytics solution areas are identified, which we deemed to require data analysis or ML: (1) capacity monitoring of human and machine resources, (2) gathering and analysis of product or customer demand, (3) cost modelling of disruptions and changes, (4) predictive maintenance, (5) automated quality inspection, and (6) automated bottleneck identification in operations. To gather detailed requirements for the solution areas, semi-structured interviews were conducted with manufacturing and construction SMEs in the UK. Construction SMEs were included because they are similar to manufacturing SMEs, and they seem to have similar priorities based on our initial interactions and workshops with the construction sector; more importantly, the inclusion of construction SMEs produced a larger data set. During the interviews, each SME was asked to describe required features of the selected analytics solution area. A transcription of the gathered requirements is shown in Table 1.

Table 1 Transcript of the requirements and solution approach for each of the following high-priority analytics solution areas: (1) capacity monitoring of human and machine resources, (2) gathering and analysis of product or customer demand, (3) cost modelling of disruptions and changes, (4) predictive maintenance, (5) automated quality inspection, and (6) automated bottleneck identification in operations

For the capacity monitoring solution, the identified approach is not based on an analytics solution with ML, but rather on a visualisation tool for monitoring purposes. The same holds for the solution involving the gathering and analysis of product or customer demand, where the interviewed construction SME needed a solution that integrates with legacy systems and analyses the fetched data to identify disruptions and bottlenecks in operations. Two SMEs required analytics solutions for cost modelling of disruptions and changes. While the first needed a cost model for the price of the finished product of a construction project, the second required a model for measuring the wasted time between manufacturing processes. Both analytics solutions could be realised by performing a data analysis to detect bottlenecks and would not need additional learning capabilities. However, for the predictive maintenance requirement, the SME did require a ML-based analytics solution in the context of facility management, which comprised sensors to monitor the condition of machines and a prediction algorithm that utilised current and historical data for maintaining these machines. This analytics solution did benefit from the increased capabilities induced by the ML model, but in principle the problem could also be solved by a tree-like decision algorithm which does not infer judgement on new data. For automated quality inspection, two manufacturers were interviewed: the first relied on the capabilities of a Fast Fourier Transform (FFT) algorithm to evaluate the quality of manufactured composite braids, and the solution for the second could be realised using sensors in the manufacturing process to assess the quality of the parts, without using a ML model. Finally, two construction SMEs needed analytics solutions for automated bottleneck identification, which map and analyse the data fetched from another application to identify disruptions and bottlenecks in operations. In particular, the data are analysed by comparing the cycle times of distinct processes from the construction schedule to detect bottlenecks. Thus, these analytics solutions do not require ML either.

The initial results indicate that, except for the predictive maintenance and automated quality inspection solutions, analytics solutions needed by manufacturing and construction SMEs tend to be simpler than ML-based solutions. That is, it appears that the majority of solution requirements can be addressed by easier approaches based on mapping and analysing small data sets, for example to detect disruptions and bottlenecks in operations. For instance, to create the cost model for determining the price of the finished product of a construction project, the analytics solution could merely fetch the data, including the price of materials and energy costs, and detect disruptions, such as sudden price drops. In summary, our initial results seem to indicate that there is room for a simpler type of analytics approach than ML, which we characterise in the next section.
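The price-drop example can be handled with a single pass over the fetched data. The sketch below is a minimal illustration under stated assumptions: the function name `detect_price_drops`, the 20% relative threshold and the sample prices are hypothetical.

```python
from typing import List, Sequence


def detect_price_drops(prices: Sequence[float], max_drop: float = 0.2) -> List[int]:
    """Return the indices at which the price fell by more than `max_drop`
    (as a fraction of the previous value). The threshold is an assumption."""
    drops = []
    for i in range(1, len(prices)):
        previous, current = prices[i - 1], prices[i]
        if previous > 0 and (previous - current) / previous > max_drop:
            drops.append(i)
    return drops


# Hypothetical material prices over five consecutive periods.
material_prices = [100.0, 102.0, 101.0, 75.0, 76.0]
print(detect_price_drops(material_prices))  # → [3]
```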

It is worth noting that the considered SMEs may face challenges other than the adoption of ML. For instance, the development of low-cost analytics solutions is restricted by legacy systems, which need to be integrated into the digital solution. For example, for the gathering and analysis of product or customer demand, the low-cost analytics solution would need to interface with an existing cost estimation application to fetch relevant data for the analysis. Furthermore, if datasets are larger or describe complex data dependencies, SMEs may require a workforce with advanced analytical skills capable of systematically investigating the existing data before considering any ML technique. This is because using a method that is inappropriate for a given dataset might yield poor results. For predictive maintenance, for example, increasing the frequency of data collection could create larger datasets with potentially more dimensions to consider. If the new data are not studied before the ML model is adopted, the solution may yield unreliable judgements, as the model may not be capable of representing all dimensions accurately.

4.2 Machine learning on a shoestring

If a ML solution is deemed necessary as per the two criteria defined in Sect. 4, we argue here for the use of a Machine Learning on a Shoestring (MLS) solution, which is an approach to identify low-cost analytics solutions for manufacturing SMEs. Based on low-cost technologies, this methodology aims to facilitate the use of ML by creating ‘simple solutions’ (RQ2). Analytics problems that can be solved using such simple solutions are characterised by nearly zero data pre-processing tasks, well-defined data dependencies, and computationally inexpensive algorithms, which can run on low-cost hardware and return useful results within an acceptable period of time. Compared to the majority of scarce big data problems, MLS solutions handle intelligible data and lead to results that can be understood by the average end-user at an SME. The proposed characteristics of a MLS solution are presented in Table 2 (some of which are discussed in Sect. 4.3).

Table 2 Proposed characteristics of a Machine Learning on a Shoestring solution

Based on the requirements of manufacturing and construction SMEs in the UK, the following subsections review suitable low-cost technologies and provide guidelines for selecting appropriate ML methods for developing a MLS solution.

4.3 Low-cost technologies

With regard to technologies for developing low-cost analytics solutions (RQ3), we differentiate between cloud-based and on-device systems for training and inference. Cloud-based ML and data science platforms, such as IBM Watson, Azure Machine Learning Studio by Microsoft and the Google Cloud AI Platform, enable the development of scalable ML applications without requiring advanced data science skills. However, they are limited in terms of transparency: SMEs are for the most part unable to control and supervise the training process, and the results cannot be interpreted without in-depth knowledge. Furthermore, security might be compromised if the data is not thoroughly prepared (e.g. anonymised) before using a cloud-based service. In contrast, there are various cross-platform ML technologies, such as TensorFlow Lite and Caffe2, which enable on-device training and inference and meet the performance characteristics of a MLS solution (Zhang et al. 2018). While some of these systems provide pre-trained models, which simplify the development of specific applications, the majority of on-device toolkits require advanced technical skills for creating a MLS solution, especially for selecting an appropriate model.
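As an illustration of preparing data before it leaves the premises, the sketch below replaces sensitive fields with keyed hashes using only the Python standard library. It is a sketch under stated assumptions: the function name `pseudonymise`, the field names and the example key are hypothetical, and keyed hashing amounts to pseudonymisation, which is weaker than full anonymisation.

```python
import hashlib
import hmac
from typing import Dict, Iterable, List


def pseudonymise(records: Iterable[Dict[str, object]],
                 sensitive_fields: Iterable[str],
                 secret_key: bytes) -> List[Dict[str, object]]:
    """Return copies of `records` with sensitive fields replaced by keyed hashes.

    The same input value always maps to the same token, so records can still
    be grouped, but the original value is not sent to the cloud service.
    """
    fields = set(sensitive_fields)
    cleaned = []
    for record in records:
        row = dict(record)  # copy, so the caller's data is left untouched
        for field in fields & row.keys():
            value = str(row[field]).encode("utf-8")
            token = hmac.new(secret_key, value, hashlib.sha256).hexdigest()
            row[field] = token[:16]  # shortened token, purely for readability
        cleaned.append(row)
    return cleaned


# Hypothetical maintenance records; the key must be kept on site.
raw = [{"customer": "Acme Ltd", "machine": "M1", "vibration": 0.42},
       {"customer": "Acme Ltd", "machine": "M2", "vibration": 0.57}]
safe = pseudonymise(raw, ["customer"], secret_key=b"keep-this-key-on-site")
```

Whether such a step is sufficient depends on the data and the applicable regulations; it is shown here only to make the preparation step concrete.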

4.4 Selecting models for low-cost machine learning

Finally, we analyse the selection process of suitable ML models (RQ4). In this paper, we propose two methods for model selection: (1) filter and select common ML algorithms based on the MLS characteristics, and (2) leverage technologies which support the model selection process. While the first method provides a simple guideline to create a general list of suitable models, SMEs still require additional guidance to help them select a specific model to successfully develop and implement a low-cost analytics solution. In an attempt to alleviate this challenge, several technologies have been developed in recent years, most prominently AutoML, which describes an approach to build ML solutions with little human intervention. Although analytics solutions created by AutoML achieve results similar to those of manually designed solutions, they are limited in terms of interpretability, reproducibility and robustness (Hutter et al. 2019; He et al. 2021). In addition, there is a lack of real-world applications based on AutoML, and in most cases several parameters need to be defined manually, hence requiring in-depth knowledge.
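The first method, filtering common ML algorithms against the MLS characteristics, could be sketched as follows. This is an illustration only: the `Candidate` fields are a simplified stand-in for the characteristics in Table 2, and the ratings in the catalogue are rough qualitative judgements assumed for the example rather than results of this study.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical rating of an algorithm against simplified MLS characteristics."""
    name: str
    few_hyperparameters: bool
    interpretable: bool
    cheap_to_train: bool


# Illustrative catalogue; the ratings are assumptions made for this sketch.
CATALOGUE = [
    Candidate("linear regression", True, True, True),
    Candidate("shallow decision tree", True, True, True),
    Candidate("deep neural network", False, False, False),
]


def filter_candidates(catalogue: Iterable[Candidate]) -> List[str]:
    """Keep only the algorithms that satisfy every assumed MLS characteristic."""
    return [c.name for c in catalogue
            if c.few_hyperparameters and c.interpretable and c.cheap_to_train]


print(filter_candidates(CATALOGUE))
```

Such a filter yields only a general shortlist; choosing a specific model for a given problem still requires the additional guidance discussed above.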

For specific vision-based solutions, much effort can be saved by relying on deep learning for image processing. Besides AutoML, multiple technologies have been proposed, which are optimised for image analysis tasks. For example, the Transfer Learning Toolkit by Nvidia offers numerous pre-trained models which require no coding. However, these models are trained for specific domains, such as detecting pedestrians in an urban environment. Hence, they are harder to adapt and reuse in manufacturing or construction without retraining the model.

To achieve low prediction errors, the ML model needs to be sufficiently complex (Goodfellow et al. 2016). In the context of ML, the complexity of a model is defined by its capacity. A model which is too simple overly generalises the underlying data dependencies (underfitting), whereas a model whose capacity is too high may overfit the training data; both cases yield undesirable errors. Nevertheless, to improve the applicability of MLS solutions, it might be beneficial to accept less complex models with larger errors rather than more complex ML models of optimal capacity. There is a need for conducting case studies to analyse the costs and benefits of this approach by evaluating models of varying complexity.
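The effect of capacity can be made concrete with a small experiment. The sketch below is an illustration under stated assumptions: the synthetic data (a noisy linear relationship), the sample sizes and the three models (a constant, a least-squares line and a one-nearest-neighbour predictor) are chosen only to contrast low, adequate and high capacity, and do not originate from the interviewed SMEs.

```python
import random


def make_data(n, rng):
    """Synthetic data: y = 2x + 1 plus Gaussian noise (assumed for illustration)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def fit_mean(xs, ys):
    """Lowest capacity: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Adequate capacity here: ordinary least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_nearest(xs, ys):
    """High capacity: memorise the training set, predict the nearest target."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of `model` on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)
train, test = make_data(30, rng), make_data(30, rng)
results = {}
for name, fit in [("mean", fit_mean), ("line", fit_line), ("nearest", fit_nearest)]:
    model = fit(*train)
    results[name] = (mse(model, *train), mse(model, *test))  # (train, test) error
    print(name, results[name])
```

The memorising model reproduces the training targets exactly, so its training error says little about its performance on new data, whereas the constant model underfits on both sets. A cost–benefit analysis of the kind suggested above would weigh such differences in test error against the effort of deploying and maintaining each model.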

5 Conclusion

In this paper, we have presented a general approach to identify low-cost ML solutions for manufacturing SMEs. To facilitate their development and integration, we have gathered and analysed the requirements of six manufacturing and construction SMEs, discussed potential low-cost technologies and proposed two methods for the selection of appropriate ML models. Our preliminary results seem to suggest that, contrary to what is usually thought at first glance, SMEs seldom need digital solutions that use advanced ML algorithms which require extensive data preparation, laborious parameter tuning and a comprehensive understanding of the underlying problem. If an analytics solution does require learning capabilities, a ‘simple solution’, which we have characterised in this paper, should be sufficient. Although we have focused on low-cost analytics solutions in manufacturing and construction SMEs in this study, the general approach to low-cost ML solutions presented here can be applied to a variety of areas, such as logistics and healthcare.

In this paper, we have touched on two ethics principles, namely transparency and security. In particular, since there is a lack of transparency in cloud-based systems, we argue for leveraging on-device applications, even though in many cases developing a solution using cloud-based ML services requires fewer technical skills. While the cost of creating on-device solutions can be reduced by selecting appropriate ML models and technologies, there exists no cloud-based system that is sufficiently transparent. Moreover, cloud-based systems might compromise the security of data if it is not carefully prepared. Both issues significantly reduce the applicability of cloud-based systems for manufacturing SMEs. Consequently, our approach identifies low-cost ML solutions that provide acceptable levels of transparency, security and interpretability by selecting appropriate technologies and models.

There are a number of knowledge gaps that would benefit from further studies. Since the majority of low-cost technologies still require in-depth knowledge, there is a need for simplifying their utilisation for SMEs. Although AutoML supports the automatic generation of ML solutions, it still requires human intervention and technical skills to develop real-world applications. Finally, to select appropriate models for low-cost analytics solutions, a cost–benefit analysis based on case studies needs to be performed to compare models that underfit the data with models of optimal capacity.