Big Data Analytics in the Manufacturing Sector: Guidelines and Lessons Learned Through the Centro Ricerche FIAT (CRF) Case

Alexopoulos, Andreas; Becerra, Yolanda; Boehm, Omer; Bravos, George; Chatzigiannakis, Vassilis; Cugnasco, Cesare; Demetriou, Giorgos; Eleftheriou, Iliada; Fotis, Spiros; Genchi, Gianmarco; Ioannidis, Sotiris; Jakovetic, Dusan; Kallipolitis, Leonidas; Katusic, Vlatka; Kavakli, Evangelia; Kopanaki, Despina; Leventis, Christoforos; Martínez, Miquel; Mascolo, Julien; Milosevic, Nemanja; Montanera, Enric Pere Pages; Ristow, Gerald; Ruiz-Ocampo, Hernan; Sakellariou, Rizos; Sirvent, Raül; Skrbic, Srdjan; Spais, Ilias; Spennacchio, Giuseppe Danilo; Stamenkovic, Dusan; Vasiliadis, Giorgos; Vinov, Michael

doi:10.1007/978-3-030-78307-5_15

Abstract

Manufacturing processes are highly complex. Production lines have several robots and digital tools, generating massive amounts of data. Unstructured, noisy and incomplete data have to be collected, aggregated, pre-processed and transformed into structured messages of a common, unified format in order to be analysed not only for the monitoring of the processes but also for increasing their robustness and efficiency. This chapter describes the solution, best practices, lessons learned and guidelines for Big Data analytics in two manufacturing scenarios defined by CRF, within the I-BiDaaS project, namely ‘Production process of aluminium die-casting’, and ‘Maintenance and monitoring of production assets’. First, it reports on the retrieval of useful data from real processes taking into consideration the privacy policies of industrial data and on the definition of the corresponding technical and business KPIs. It then describes the solution in terms of architecture, data analytics and visualizations and assesses its impact with respect to the quality of the processes and products.

1 Introduction

The manufacturing industry transforms material or assembles components to produce finished goods that are ready to be sold in the marketplace. The organizational structure of manufacturing companies is very complex and involves many business and operative functions with different roles and responsibilities in order to guarantee efficiency at every level [1]. The fourth industrial revolution [2, 3] has initiated many changes in the industrial value chain, transforming the shop floor, which is the production part of the manufacturing industries. Companies are introducing process equipment provided with several robots and digital tools. In this way, it is possible to set and control processes in an automated manner that speeds up production with a high level of accuracy [4]. Furthermore, large volumes of data are generated every day that may be collected and analysed for increasing process robustness and efficiency and building a technical cycle that reduces the consumption of energy and material. However, despite the potential benefits offered by the exploitation of Big Data, its usage is still at an early stage in many manufacturing companies [5].

Centro Ricerche FIAT (CRF) is one of the main private research centres in Italy and represents Fiat Chrysler Automobiles (FCA) in European and national collaborative research projects. In the context of the European Horizon 2020 I-BiDaaS project,^{Footnote 1} CRF identified two use cases in which complex datasets are retrieved from real processes. By exploiting Big Data analytics in these two cases, CRF aims to improve process and product quality in a much more agile way through the collaborative effort of self-organizing and cross-functional teams, reducing costs due to further processing and predicting faults and unnecessary actions. This requires solutions that allow manufacturing experts to interact with Big Data [6] in order to understand how to easily utilize important information often hidden in raw data. In other words, the first best practice (1)^{Footnote 2} is the correlation between the value of Big Data technology and the skills of people involved in the data management process. The I-BiDaaS approach follows this best practice and develops a self-service [7] Big Data analytics platform that enables different CRF end-users to exploit Big Data in order to gain new insights that help them make the right decisions in a much more agile way.

The aim of this chapter is to demonstrate how advanced analytic tools can empower end-users [8] in the manufacturing domain (see Sect. 5) to create tangible value from the process data that they are producing, and to identify a number of best practices, guidelines and lessons learned. For future reference, we list here the main best practices with the identified guidelines and lessons learned; they are discussed in detail throughout the chapter:

The correlation between the value of Big Data technology and the skills of people involved in the data management process with the involvement of different departments belonging to the same or different organizations in order to extract the value of all data collected from several sources and levels (breaking data silos).
The alignment of the Big Data requirements with the business needs and the definition of appropriate experiments with the identification of Big Data technologies most suitable for the specific identified business requirements.
The management of the type of data generated with the identification of the types of data useful for the analysis, their anonymization and generation of synthetic data in parallel with the process of data anonymization.
The development of a solution that satisfies Big Data requirements of specific use cases by mapping the identified functional and non-functional concerns into a concrete software architecture with the development of Advanced Visualization tools for showing high-value Big Data analytics solutions for domain experts and operators.

The remainder of this chapter is organized as follows. Section 2 describes the process followed for the identification of the Big Data requirements in the manufacturing sector and demonstrates how it was applied to elicit the requirements of the CRF use cases, which shaped the design of the I-BiDaaS Big Data solution. Furthermore, CRF requirements guide the definition of the experiments for assessing the developed system, described in Sect. 3. The architecture of the I-BiDaaS solution is described in Sect. 4. Section 5 reports on the lessons learned, challenges and guidelines reflecting the experience of the I-BiDaaS project. Section 5 also provides the connection of the described work with the Big Data Value (BDV) reference model and its Strategic Research and Innovation Agenda (SRIA) [9]. Finally, Sect. 6 concludes the chapter.

2 Requirements for Big Data in the Manufacturing Sector

Alignment between business strategy and Big Data solutions is a critical factor for achieving value through Big Data [10]. Manufacturers must understand how the adoption of Big Data technologies [11] is related to their business objectives in order to identify the right datasets and increase the value of the analytics results. Therefore, tailoring Big Data requirements to the business needs is the second best practice (2) reported in this chapter.

In more detail, the I-BiDaaS methodology for eliciting CRF requirements draws on work in the area of early Requirements Engineering (RE), which considers the interplay between business intentions and system functionality [12, 13]. In particular, the requirements elicitation followed a (mostly) top-down approach whereby business goals reflecting the company’s vision were progressively refined in order to identify the user requirements of specific stakeholder groups (i.e. data providers, Big Data capability providers and data consumers). Their analysis resulted in the definition of system functional and non-functional requirements, which describe the behaviour that a Big Data system (or a system component) should expose in order to realize the intentions of its users. This process was facilitated by the use of appropriate questionnaires. Where information on the requirements was already available (either collected in the context of the project setup phase, or identified through a review of related literature [10, 14]), it was used to partly pre-fill the questionnaires and minimize end-users’ effort. Users were then asked to check the pre-filled fields and ensure that the documented information was valid and accurate.

Table 1 gives a summary of the CRF requirements. Although it provides only an excerpt of the elicited CRF requirements, it demonstrates the application of the I-BiDaaS way-of-working in the CRF use cases.

Table 1 CRF Big Data requirements

In more detail, the strategic CRF business goal (R1) was refined into a number of more operational business goals that need to be satisfied through Big Data analytics (R3). In addition, a number of relevant KPIs (R6) were defined that can be used to assess the proposed solution (see Sect. 3). Continuing, at the user requirements level, requirements were described in terms of the characteristics of different data sources that are planned to be used (requirements R7 and R8), the analytics capability of the proposed solution envisaged (R9) and the different interface requirements of the end-users that will consume the analytics results (R10–R12). Finally, analysis of the above user requirements resulted in the generation of the system requirements, both functional (R13) and non-functional (R14 and R15). Although described in a linear fashion, the above activities were carried out in an iterative manner, resulting in a stepwise refinement of the results being produced. The complete list of CRF requirements elicited is described in detail in [15].

Further to forming the baseline of the I-BiDaaS solution (see Sect. 4), these requirements also assist the definition of experiments as described in Sect. 3.

3 Use Cases Description and Experiments’ Definition: Technical and Business KPIs

The aim of experimentation is to assist stakeholders’ acceptance of any new Big Data solution. The definition of appropriate experiments is thus another best practice (3) reported in this chapter. In particular, the definition of CRF experiments aims at evaluating and validating the I-BiDaaS solution and its implementation in the context of CRF use cases. It follows a goal-oriented approach: first, the goal(s) of the experiment towards which the measurement will be performed are defined; then, a number of questions are formed to characterize the achievement of each goal; finally, a set of Key Performance Indicators (KPIs) and associated metrics is assigned to every question in order to answer it in a measurable way.

The definition of each experiment also involved the specification of the experiment’s workload in terms of the use case datasets and type of analysis envisaged, as well as the definition of the experimental subjects that will be involved in the experiment, as reported in the following Sects. 3.1 and 3.2 that discuss, respectively, the ‘Production process of aluminium die-casting’ and ‘Maintenance and monitoring of production assets’ use cases.

3.1 Production Process of Aluminium Die-Casting

The ‘Production process of aluminium die-casting’ use case generates complex datasets from the production process of the engine blocks. During the die-casting process [16, 17], molten aluminium is injected into a die cavity, mounted in a machine, in which it solidifies quickly. In this case, we have a large number of interconnected process parameters that influence the flow behaviour of molten metal inside the die cavity, and, consequently, the productivity and the quality [18,19,20]. Hence, the fourth best practice (4) is to identify the type of data generated. Data collected from several sources can be disorganized and arrive in different formats, in which case they may go unexploited.

In this use case, the data provided for the analyses consist of a collection of casting process parameters, such as piston speed in the first and second phase, intensification pressures and others. In addition to the process data, CRF also provided a large dataset of thermal images of the engine block casting process, under a hypothesis that there is a correlation among process data, thermal data and the outcome of the process.

Given the complexity of the process, it is important not only to carefully design parameters and temperatures but also to control them, because they have a direct impact on the quality of the casting.

Analysis of the datasets aims to predict whether an engine block will be produced correctly during the casting process in order to avoid further processing and scraps, which would lead to financial savings for the manufacturers.

To test the efficiency of the I-BiDaaS solution in this context, an experiment has been defined, as shown in Table 2. As seen in Table 2, the Business KPI ‘Product/service quality’ identified during requirements elicitation (see Sect. 2) was further elaborated in order to define appropriate metrics (quality control levels related to good and defective products) and to map it to appropriate indicators at the I-BiDaaS solution level (execution time, data quality, cost).

Table 2 Overview of the ‘Production process of aluminium die-casting’ experiment

For each KPI, a baseline value for evaluating the performance of the I-BiDaaS solution has also been defined. For example, an increase of 2–6% of the quality control level related to good products and a decrease of 1–4% and 0.05–2% of the two quality control levels related to defective products is sought in order to satisfy manufacturers’ requests in terms of product quality.

3.2 Maintenance and Monitoring of Production Assets

In this use case, data have been retrieved from sensors mounted on several machines (e.g. linear stages, robots, elevators) along the production line of vehicles. Much related work has been conducted in this field concerning, e.g., sensor applications in tool condition monitoring in machining [21], predictive maintenance of industrial robots [22] and assessing the health of sensors using data historians [23].

We focused on welding lines in which robots are used to assemble vehicle components, and flexibility is required for the continual changes of the types of components and vehicles. A data server gathers sensor data, which is categorized into two different datasets, namely SCADA and MES. The SCADA dataset contains production, process and control parameters of daily vehicle production and is structured as in Table 3.

Table 3 Structure of the dataset for the SCADA data

There are over 100 sensors and each one is identified by a specific number (id). The other columns report on the value of the specific sensor, the unit of measurement and the timestamp.
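As a minimal sketch of how such records might be organized for analysis, the snippet below groups SCADA rows into one chronologically sorted series per sensor. The field names (`sensor_id`, `value`, `unit`, `timestamp`) and the sample rows are assumptions made for illustration; Table 3 defines the actual structure.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class ScadaRecord(NamedTuple):
    # Hypothetical field names mirroring the columns described above.
    sensor_id: int
    value: float
    unit: str
    timestamp: datetime


def to_time_series(records):
    """Group records by sensor id and sort each group chronologically."""
    series = defaultdict(list)
    for rec in records:
        series[rec.sensor_id].append((rec.timestamp, rec.value))
    for points in series.values():
        points.sort(key=lambda p: p[0])
    return dict(series)


# Made-up sample rows, not real CRF data.
records = [
    ScadaRecord(7, 21.5, "C", datetime(2020, 1, 1, 8, 0, 5)),
    ScadaRecord(12, 3.2, "bar", datetime(2020, 1, 1, 8, 0, 1)),
    ScadaRecord(7, 21.1, "C", datetime(2020, 1, 1, 8, 0, 0)),
]
series = to_time_series(records)
```

Keeping one series per sensor id matches the way the measurements are later monitored separately for each sensor.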

The MES dataset contains specific data associated with the type of vehicle being produced and is structured as in Table 4.

Table 4 Structure of the dataset for the MES data

When OP020.Passo20 changes from 0 to 1, a new vehicle enters into the area provided with sensors and modello_op_020 indicates the model of the vehicle being processed.
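The rule above amounts to detecting a rising edge on `OP020.Passo20`. The following sketch shows one way this could be done, assuming (hypothetically) that MES rows arrive in time order as dictionaries keyed by the column names; the model codes are invented placeholders.

```python
def vehicle_entries(rows):
    """Return (row index, model) for each 0 -> 1 transition of OP020.Passo20."""
    entries = []
    previous = None
    for i, row in enumerate(rows):
        step = row["OP020.Passo20"]
        if previous == 0 and step == 1:
            # A new vehicle has entered the area equipped with sensors.
            entries.append((i, row["modello_op_020"]))
        previous = step
    return entries


# Made-up rows in time order; "A" and "B" are placeholder model codes.
rows = [
    {"OP020.Passo20": 0, "modello_op_020": "A"},
    {"OP020.Passo20": 1, "modello_op_020": "A"},
    {"OP020.Passo20": 1, "modello_op_020": "A"},
    {"OP020.Passo20": 0, "modello_op_020": "B"},
    {"OP020.Passo20": 1, "modello_op_020": "B"},
]
entries = vehicle_entries(rows)
```

Only the transition itself is recorded, so consecutive rows with the flag held at 1 are not counted as additional vehicles.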

Analysis of this data aims at predicting unnecessary actions and improving the efficiency of manufacturing plants by reducing production losses. Once again, an experiment has been defined in order to test the efficiency of the I-BiDaaS solution in this context. The key points of the ‘Maintenance and monitoring of production assets’ experiment are shown in Table 5. In particular, data was analysed to obtain thresholds for anomalous measurements for all sensors. The fifth best practice (5) is to build a foundational database with the history of anomalies, which may help end-users to predict asset failures and plan maintenance only when it is necessary.

Table 5 Overview of the ‘Maintenance and monitoring of production assets’ experiment

As shown in Table 5, the business KPIs reported during requirements elicitation were further elaborated to identify related metrics (Overall Equipment Effectiveness (OEE) [24, 25] and maintenance costs [26]) and to map them on specific indicators at the Big Data solution level (execution time, data quality and cost).

For each KPI, a baseline value for evaluating the performance of the I-BiDaaS solution has been defined. For example, the prediction of unnecessary actions and the improvement of the efficiency should reduce production losses and achieve greater competitiveness of the company by an increase of 0.05% of the current Overall Equipment Effectiveness (OEE) and a decrease of 50% in maintenance costs.

4 I-BiDaaS Solutions for the Defined Use Cases

The final best practice (6) reported in the following sections relates to the development of a solution that satisfies Big Data requirements of specific use cases by mapping the identified functional and non-functional concerns into a concrete software architecture [27]. In particular, the general requirements reported in Sect. 2 were further clarified, taking into consideration the specific context of each use case (described in Sect. 3), resulting in customized solutions per use case described in Sects. 4.1 and 4.2.

For both use cases, data gathered from the production lines are sent to CRF, where they are manipulated and masked. After the anonymization, data are sent to the I-BiDaaS Platform, hosted in a Virtual Machine. This represents a bridge between the I-BiDaaS infrastructure and CRF internal server, created by the I-BiDaaS technical partners. The same bridge is used to send the analytics results to the production plant end-users, as seen in Fig. 1.

4.1 Production Process of Aluminium Die-Casting

In this section, the architecture, data analytics, visualization and results for the ‘Production process of aluminium die-casting’ use case are described.

4.1.1 Architecture

Figure 2 shows the architecture of this use case, which consists of several well-defined components. The Universal Messaging component is used for communication with most of the other components. The data flow starts with the dataset: data is transferred from CRF’s internal server to the I-BiDaaS platform server. There, the data is pre-processed and cleaned, an important step that prepares it for model training and inference. The data is then passed to a Machine Learning algorithm from the I-BiDaaS pool of ML algorithms. In this use case, the model is a complex neural network implemented in PyTorch^{Footnote 3} and trained jointly on the thermal image and sensor datasets. The Machine Learning component outputs two results: training metrics/results, which are used for visualization in the Advanced Data Visualization component, and the trained model, which is used for inference. Both results are transferred through Universal Messaging. Finally, the Model Serving (Inference) Service component is used for inference. In the initial phases of development, before the real data is fully prepared (e.g. retrieved, anonymized, etc.), the architecture uses realistic synthetic data for initial component development. The use of synthetic data can make development significantly more agile, but it must be applied with care and under a quality assurance process. For example, the final trained ML model has to be delivered on real data. We refer to Sect. 4.1.5 for details on realistic synthetic data generation and quality assessment.

4.1.2 Data Analytics

In this section, we describe in more detail the data analytics solution that corresponds to the four respective modules in Fig. 2 (Data pre-processing, PyTorch neural network model, Trained model and Training results) and that analyses the thermal images and the sensors datasets.

Under the hypothesis that there is a correlation among sensor data, thermal data and the outcome of the process, a further task is to classify combined image and sensor data inputs to see whether the cast engine blocks are without any production faults. Formally, data analytics here corresponds to an M-ary supervised classification task [28]. As the dataset involves image classification, for this task we utilize Deep Convolutional Neural Networks [29].

Regarding the input data, we tried three approaches during the development of the analytics for this use case: unmodified thermal images, grayscale thermal images and raw sensor data. For raw sensor data, the thermal camera provides a matrix of values with the same dimensions as the image; once normalized, this matrix is, from a computational standpoint, very similar to the grayscale image (almost identical, depending on the normalization process). Although the grayscale images and the raw sensor data trained faster (one channel for convolutions versus three for the thermal images), in our experiments the thermal images gave the best accuracy, precision and recall, so we kept using them. We suspect that this is because the modern neural network architectures we use (e.g. DenseNet [29, 30]) are optimized for colour images (e.g. the ImageNet dataset [31]). The corresponding results are reported in Sect. 4.1.4.
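To illustrate why a normalized raw matrix ends up close to a grayscale image, the sketch below rescales a small matrix of raw readings to 8-bit intensities. Min-max scaling is an assumption here, since the chapter does not state which normalization was applied, and the readings are invented.

```python
def to_grayscale(matrix):
    """Min-max scale a 2D matrix of raw readings to integers in [0, 255]."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant matrix carries no contrast; map it to black.
        return [[0 for _ in row] for row in matrix]
    scale = 255.0 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in matrix]


# Made-up raw readings from a 2 x 2 "thermal camera".
raw = [[20.0, 120.0], [270.0, 420.0]]
gray = to_grayscale(raw)
```

The result is a single-channel matrix with the same dimensions as the input, which is the one-channel input format mentioned above.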

4.1.3 Visualizations

The approach to visualize the die-casting process results in real time involves the deployment of a number of constantly updated visualizations which offer a complete overview of the results. These include the values of monitored sensor variables and the final classification of the end products of the process.

As an example, we report here the Global Live Chart, which allows end-users to visualize in a timely manner the trend of the main parameters (e.g. velocity, pressure, standard deviation, etc.) and to check the classification levels (Fig. 3).

4.1.4 Results

The models described in Sect. 4.1.2 were trained on both the original and the newly balanced datasets. We favour the model trained on the balanced dataset, as it learns to recognize faulty engine blocks much better than the model trained on the imbalanced dataset, even though its overall accuracy is lower, simply because the balanced dataset contains fewer faultless engine blocks. In Fig. 4, we see the accuracies of both models on the training and testing datasets (standard 80/20 split). The orange (top) line is the model trained on the full dataset and the pink line is the model trained on the balanced dataset^{Footnote 4} [32].
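The trade-off between overall accuracy and the detection of faulty blocks can be illustrated with a small sketch. The chapter does not say how the dataset was balanced; random undersampling of the majority class is assumed here purely for illustration, and the labels and counts are made up.

```python
import random


def undersample(samples, label_of, seed=0):
    """Randomly drop samples so every class is as small as the rarest one."""
    rng = random.Random(seed)
    by_class = {}
    for s in samples:
        by_class.setdefault(label_of(s), []).append(s)
    n = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n))
    rng.shuffle(balanced)
    return balanced


def split_80_20(samples):
    """Split an already shuffled list into 80% training and 20% test parts."""
    cut = len(samples) * 8 // 10
    return samples[:cut], samples[cut:]


def recall(y_true, y_pred, positive):
    """Fraction of truly `positive` items that were predicted `positive`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return hits / total if total else 0.0


# Made-up labels: 90 faultless ("ok") and 10 faulty blocks.
data = [("ok", i) for i in range(90)] + [("faulty", i) for i in range(10)]
balanced = undersample(data, label_of=lambda s: s[0])
train, test = split_80_20(balanced)

# A trivial model that always predicts "ok" reaches 90% accuracy on `data`
# but never finds a faulty block, so accuracy alone can be misleading.
y_true = [label for label, _ in data]
y_pred = ["ok"] * len(data)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(data)
faulty_recall = recall(y_true, y_pred, "faulty")
```

On the balanced set the same trivial model would score only 50% accuracy, which is one reason a lower accuracy on balanced data need not indicate a worse model.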

4.1.5 Synthetic Data Generation and Quality Assessment

An initial development of the use case solution was carried out with realistic synthetic data. Generating synthetic data in parallel with the anonymization and structuring of the real data proved useful for the early development stages, provided that particular caution is taken when extracting insights from synthetic data.

The fabrication of synthetic data that exhibits similar characteristics and similar distribution as the real data is a challenging task. The IBM Test Data Fabrication technology (TDF) was used for that purpose. TDF requires constraint rules that model the relationships and dependencies between the data and leverages a Constraint Satisfaction Problems (CSP) solver to fabricate data that satisfies these constraints. The rules for the production of synthetic data were set by CRF with the help of IBM. The correlation between the real parameters and the synthesized parameters was further refined after reiteration of the data analysis.

For the initial evaluation of the synthetic data, we performed empirical and analytical validations. The empirical validation consisted of delivering the data to expert production technicians, who could not identify any difference from the actual production data. The analytical validation was carried out by the CRF research team using the K-Means algorithm [33]. Further evaluation was carried out by IBM, which sought a qualitative, generic process for comparing the real data with the fabricated data. This evaluation was concerned with methods to judge whether the distributions of the fabricated data and the original data were comparable, which is commonly referred to in the literature as the general utility of the datasets. In addition to the general utility, IBM also considered the specific utility, i.e. the similarity between the synthetic data and the original data.

The propensity mean-squared-error (pMSE) [34] was used as a general measure of data utility, applied to the specific case of synthetic data. Propensity scores represent probabilities of group memberships. If the propensity scores are well modelled, this general measure should capture relationships among the data that methods such as the empirical Cumulative Distribution Function (CDF) may miss.

The method is framed as a classification problem in which the desired outcome is poor classification (a 50% error rate, meaning that the classifier cannot tell real records from synthetic ones); lower pMSE values therefore indicate better utility.

Randomly sampling 5000 data points from the real and synthetic datasets, and using a logistic regression to provide the probability for the label classification, we were able to show that the measured mean pMSE score for the ‘Production process of aluminium die-casting’ dataset is 0.218 with a standard deviation of 0.0017, as shown in Fig. 5.
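A minimal sketch of the pMSE computation follows, under several assumptions: the score is taken as the mean squared difference between each record's propensity score and the share of synthetic records in the combined sample, the logistic regression is a small hand-written gradient-descent fit rather than the tooling actually used, and the data are random numbers with 200 points per group instead of 5000.

```python
import math
import random


def standardize(rows):
    """Scale every column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def propensity(w, x):
    """Predicted probability that row x is synthetic (bias is w[-1])."""
    return sigmoid(w[-1] + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Plain batch gradient descent on the logistic loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = propensity(w, xi) - yi
            for j, xij in enumerate(xi):
                grad[j] += err * xij
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def pmse(real, synthetic):
    """Mean squared deviation of propensity scores from the synthetic share c."""
    X = standardize(real + synthetic)
    y = [0] * len(real) + [1] * len(synthetic)
    c = len(synthetic) / len(X)
    w = fit_logistic(X, y)
    return sum((propensity(w, x) - c) ** 2 for x in X) / len(X)


rng = random.Random(0)
real = [[rng.gauss(0, 1)] for _ in range(200)]
similar = [[rng.gauss(0, 1)] for _ in range(200)]   # same distribution as `real`
shifted = [[rng.gauss(5, 1)] for _ in range(200)]   # clearly different

low = pmse(real, similar)
high = pmse(real, shifted)
```

With equal-sized samples the synthetic share is 0.5, so the score lies between 0 (the classifier cannot separate the groups) and 0.25 (perfect separation).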

4.2 Maintenance and Monitoring of Production Assets

In this section, the architecture, data analytics, visualization and results for the ‘Maintenance and monitoring of production assets’ use case are described.

4.2.1 Architecture

Figure 6 shows the architecture, which consists of several well-defined components. The Universal Messaging component is used for communication between most of the components. The data flow starts with the dataset: data are sent from CRF to the I-BiDaaS platform. There, the data is pre-processed and prepared for model training with an outlier detection model. The outlier detection model outputs two results: training results, which are used for visualization in the Advanced Data Visualization component, and the trained model, which is used for inference. Training results are transferred through Universal Messaging. Finally, the Model Inference Serving component is used for inference. All the components use a containerized (i.e. Docker^{Footnote 5}) backbone from the Storage and Container Orchestration Service. Data is visualized and the jobs are scheduled through the I-BiDaaS User Interface component.

4.2.2 Data Analytics

The data described in Sect. 3 were transformed into separate time series, one per sensor, so that each sensor can be monitored separately. Since the measurements were not labelled (anomalous/non-anomalous), outlier detection algorithms arose as natural candidates for this use case [35]. We constructed an outlier detection model for each of the time series. While more advanced algorithms can be used, we adopted a simple, easy-to-implement and computationally cheap, yet here effective, solution based on the Inter-Quartile Range (IQR) test. The results of these models can be used to suggest whether a measurement is an outlier and to discover the pairs of sensors that have anomalous measurements at similar timestamps. The models were prepared in Python, following these steps:

1. For each sensor, obtain thresholds for anomalous measurements using a modified interquartile range (IQR) test. Three different variants of IQR-like tests were performed: (Q₁, Q₃) ∈ {(5th, 95th), (10th, 90th), (25th, 75th)}, where Q₁ and Q₃ are the corresponding percentiles.
2. With obtained thresholds, filter the time series such that only anomalous measurements were kept, as shown in Fig. 8.
3. Calculate the Dynamic Time Warping (DTW) [36] distance between outlier time series.
4. Rescale distances to [0, 1].
5. Group pairs of sensors by the distance into groups: [0, 0.1), [0.1, 0.2)…[0.9, 1].

The time series with anomalous measurements obtained in step 2 enabled us to see the outlier trends for each sensor and to compare their behaviour. The comparison of anomalous trends was made using steps 3, 4 and 5. If the rescaled distance for a pair of sensors (steps 4 and 5) is small, the two sensors output anomalous measurements in a similar fashion. Therefore, if one of them fails, the other sensor in the pair should also be inspected. We present the distribution of sensors’ similarity in Fig. 9.
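The five steps can be sketched in plain Python as follows. Several details are assumptions rather than the project's actual choices: the 1.5 multiplier on the inter-percentile range, representing each outlier series as the anomalous values in time order, the absolute difference as the DTW cost, and min-max rescaling. The sensor names and readings are invented.

```python
from itertools import combinations


def percentile(values, q):
    """Percentile q (0-100) with linear interpolation between closest ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def iqr_thresholds(values, low_q=25, high_q=75, k=1.5):
    """Step 1: lower and upper bounds; k = 1.5 is an assumed multiplier."""
    q1, q3 = percentile(values, low_q), percentile(values, high_q)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def outliers(values, **kwargs):
    """Step 2: keep only the measurements outside the thresholds."""
    lower, upper = iqr_thresholds(values, **kwargs)
    return [v for v in values if v < lower or v > upper]


def dtw(a, b):
    """Step 3: classic dynamic-programming DTW with |x - y| as local cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf]
        for j, y in enumerate(b, start=1):
            cur.append(abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return prev[-1]


def rescale(distances):
    """Step 4: min-max rescale all pair distances to [0, 1]."""
    lo, hi = min(distances.values()), max(distances.values())
    span = (hi - lo) or 1.0
    return {pair: (d - lo) / span for pair, d in distances.items()}


def bin_pairs(scaled):
    """Step 5: bin index i covers [i/10, (i+1)/10); the last bin includes 1."""
    bins = {}
    for pair, d in scaled.items():
        bins.setdefault(min(int(d * 10), 9), []).append(pair)
    return bins


# Made-up readings for three hypothetical sensors.
sensors = {
    "s1": [10, 11, 10, 12, 11, 10, 50, 11, 10, 12, 11, 60],
    "s2": [10, 11, 10, 12, 11, 10, 52, 11, 10, 12, 11, 58],
    "s3": [10, 11, 10, 12, 11, 10, -40, 11, 10, 12, 11, 10],
}
anomalous = {name: outliers(vals) for name, vals in sensors.items()}
distances = {(a, b): dtw(anomalous[a], anomalous[b])
             for a, b in combinations(sorted(anomalous), 2)}
scaled = rescale(distances)
bins = bin_pairs(scaled)
```

The other two variants of the test correspond to calling `outliers(values, low_q=5, high_q=95)` or `outliers(values, low_q=10, high_q=90)`. In this toy data, `s1` and `s2` land in the lowest-distance group, which is the situation where inspecting one sensor would suggest inspecting the other.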

4.2.3 Visualizations

Data stemming from the aforementioned analysis are presented using a multi-step approach that allows operators to drill down to sensor data and detected anomalies in an intuitive and easy-to-use way. Starting from a given month, operators select the category of sensors they wish to see and immediately have an overview of the ones with detected anomalies, as shown in Fig. 7. Upon selection of a sensor, operators see the anomalies detected during the selected month and can furthermore select a specific day to see the actual values and therefore review the actual anomaly that was detected, as shown in Fig. 8.

4.2.4 Results

The obtained boundaries (from step 2 in Sect. 4.2.2) could be used for daily analysis of sensors and various visualization tasks, such as showing the number of anomalous measurements for the current day, as seen in Fig. 8, comparing the number of outliers between two sensors for the given time window, etc., as shown in Fig. 9.

5 Discussion

Reflecting on CRF’s experience and all the work done within the I-BiDaaS Project, this section develops several recommendations addressed to any manufacturing company willing to undertake Big Data projects. This section also positions the I-BiDaaS solution within the Big Data Value (BDV) reference model and the Strategic Research and Innovation Agenda (SRIA).

5.1 Lessons Learned, Challenges and Guidelines

The I-BiDaaS project developed an integrated platform for processing and extracting actionable knowledge from Big Data in the manufacturing sector. Based on the challenges experienced and lessons learned through our involvement in I-BiDaaS, we propose a set of guidelines for the implementation of Big Data analytics in the manufacturing sector, with respect to the following concerns:

1. Data storage and ingestion from various data sources and its preparation: In a production line deploying digital instruments, many devices set operating values and adjust and control parameters during the production processes. Depending on whether we want to act on the quality of the production process or on the maintenance of the equipment, the first challenge is to understand how data will be ingested and managed from data sources over time and who will be able to access them. This aspect also highlights the importance of breaking data silos by extracting the value of all data collected from several sources and levels, which may require involving different departments belonging to the same or different organizations.
2. Data cleaning: A second important aspect is to understand which types of data can be useful for analysis. This implies the importance of data cleaning in order to identify incomplete, inaccurate and irrelevant parts of the generated dataset.
3. Fabrication of realistic synthetic data for experimentation and testing: Data are strictly confidential, so another challenge is to decide how data will be shared if external analysis is required. In this case, manufacturers need to evaluate the possibility of fabricating realistic synthetic data for experimenting with the analytical models under development and then testing the same models with anonymized real data.
4. Batch and stream analytics for increasing the speed of data analysis: After collecting and analysing data, it is necessary to understand which Big Data technologies are most suitable for the specific identified business requirements. Together, batch and stream analytics cover the cases that occur in real-world environments, including those that require a deeper analysis of large amounts of data collected over a period of time (batch) and those that require velocity and agility for events that must be monitored in real or near-real time (streaming).
5. Simple, intuitive and effective visualization of results and interaction capabilities for the end-users: Advanced visualization tools which provide the insights, value and operational knowledge extracted from available data need to consider both expert and non-expert end-users (e.g. manufacturers, engineers and operators).
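
The difference between the data cleaning, batch and streaming concerns can be made concrete with a small sketch. It is an illustration under stated assumptions rather than the I-BiDaaS implementation: the record layout (a dictionary with a `value` field), the function names, the three-standard-deviation alert rule and the warm-up length are all hypothetical choices. The streaming part uses Welford's online algorithm for the running mean and variance.

```python
import math
import statistics


def clean(records):
    """Data cleaning: drop records whose value is missing or not a finite number."""
    return [
        r for r in records
        if isinstance(r.get("value"), (int, float)) and math.isfinite(r["value"])
    ]


def batch_summary(records):
    """Batch analytics: statistics over the whole historical dataset at once."""
    values = [r["value"] for r in records]
    return statistics.mean(values), statistics.pstdev(values)


class StreamMonitor:
    """Streaming analytics: checks each new value as it arrives.

    Running mean and variance are maintained with Welford's online algorithm,
    so no history has to be stored.
    """

    def __init__(self, k=3.0, warmup=5):
        self.k = k            # alert threshold in standard deviations (assumed)
        self.warmup = warmup  # values to observe before alerting (assumed)
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, value):
        """Return True if value deviates strongly from what was seen so far."""
        is_alert = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / self.n)
            is_alert = std > 0 and abs(value - self.mean) > self.k * std
        # Welford update of the running statistics.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return is_alert


# Hypothetical raw records, including one missing and one non-finite value.
raw = [{"value": v} for v in
       [10.0, 10.2, 9.8, None, 10.1, 9.9, float("nan"), 10.0, 25.0]]

cleaned = clean(raw)
mean, std = batch_summary(cleaned)

monitor = StreamMonitor()
alerts = [monitor.update(r["value"]) for r in cleaned]
```

In this sketch the batch summary needs the complete dataset before it can answer, whereas the monitor produces a decision for every incoming value, which is the trade-off described in guideline 4.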

5.2 Connection to BDV Reference Model, BDV SRIA, and AI, Data and Robotics SRIDA

The described solution for the defined manufacturing use cases can be contextualized within the BDV reference model defined in the BDV Strategic Research and Innovation Agenda (BDV SRIA). Regarding the BDV reference model horizontal concerns, we address:

Data visualization and user interaction: By developing several advanced and interactive visualization solutions applicable in the manufacturing sector, as detailed in Sects. 4.1.3 and 4.2.3.
Data analytics: By developing data analytics solutions for the two industrial use cases in the manufacturing sector, as described in Sects. 4.1.2 and 4.2.2. While the solutions may not correspond to state-of-the-art advances in AI/machine learning algorithms development, they clearly contribute to revealing novel insights and best practices on how Big Data analytics can improve manufacturing operations.
Data processing architectures: We develop architectures as shown in Figs. 2 and 6 that are well suited for manufacturing applications wherein both batch analytics (e.g. analysing historical data) and streaming analytics (e.g. online processing of the data that correspond to a newly manufactured engine) are required.
Data protection and data management: CRF anonymized the real data, manipulating and masking them after retrieving them from an internal proprietary server.
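
The chapter does not detail how CRF masked the data, so the following is only a generic sketch of one common masking technique, keyed hashing of identifying fields. The field names, the key and the functions `pseudonymize` and `mask_record` are hypothetical. Note that keyed hashing is a form of pseudonymization; on its own it does not guarantee full anonymization.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Replace an identifier with a keyed hash (HMAC-SHA256), truncated for readability."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def mask_record(record, sensitive_fields, secret_key):
    """Return a copy of the record with the sensitive fields replaced by pseudonyms."""
    return {
        key: pseudonymize(str(value), secret_key) if key in sensitive_fields else value
        for key, value in record.items()
    }


# Hypothetical record and key, for illustration only.
secret_key = b"kept-inside-the-company"
record = {"machine_id": "PRESS-07", "operator": "J. Doe", "temperature": 412.5}

masked = mask_record(record, {"machine_id", "operator"}, secret_key)
```

Because the same input always maps to the same pseudonym under a given key, masked records can still be grouped per machine for analysis, while the key itself stays with the data owner.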

Regarding the BDV reference model vertical concerns, we address the following:

Big data types and semantics: Our work here is mostly concerned with structured sensor data, metadata and thermal image data (which correspond to the Media, Image, Video and Audio data types according to the BDV nomenclature). The work also contributes to best practices in the generation of realistic synthetic data from the corresponding domain-defined metadata, as well as a systematic way to assess the quality and usefulness of the generated synthetic data.
Communication and connectivity: The work describes innovative ways to communicate with and retrieve data from a manufacturing company's internal proprietary server, as described in Sect. 4 and outlined in Fig. 1.

Therefore, in relation to the BDV SRIA, the I-BiDaaS solution contributes to the following technical priorities: Data Protection; Data Processing Architectures; Data Analytics; and Data Visualization and User Interaction.

Furthermore, in relation to the BDVA SRIA priority areas in connection with Factories of the Future with EFFRA, we address the following dimensions:

(a) Excellence in manufacturing: advanced manufacturing processes and services for zero-defect and innovative processes and products
(b) Sustainable value networks: manufacturing driving the circular economy
(c) Inter-operable digital manufacturing platforms: supporting an ecosystem of manufacturing services

In more detail, the CRF use cases were selected in order to develop innovative tools and solutions that may ensure better product quality towards zero-defect manufacturing. In particular, existing production lines may be improved to maximize the quality of their products through the integration of solutions that exploit Big Data technologies. Better process efficiency can result in energy savings and cost reduction in the context of the circular economy and allow manufacturers to reach a high level of competitiveness and sustainability.

Finally, the chapter relates to the following cross-sectorial technology enablers of the AI, Data and Robotics Strategic Research, Innovation & Deployment Agenda [37]: Knowledge and Learning; Reasoning and Decision Making; and Systems, Methodologies, Hardware and Tools.

6 Conclusion

The increasing level of digitalization in the manufacturing sector generates large amounts of data that often contain valuable hidden information. This is due to the complexity of real processes, which require several interconnected stages to obtain finished goods. Variables and parameters are set for the operation of each digital machine and, just as we assemble components, we need to pull together data generated from different sources and levels if we want to improve the quality of processes and products. I-BiDaaS developed an integrated platform that takes into consideration how complex data can be managed and that helps manufacturers who are not sufficiently equipped to analyse complex datasets by empowering them to easily utilize and interact with Big Data technologies.

Notes

1. http://www.ibidaas.eu/
2. As explained below, we identify throughout the chapter several best practices for the application of Big Data analytics in manufacturing.
3. http://pytorch.org
4. Visualization with TensorBoard: https://www.tensorflow.org/tensorboard
5. https://www.docker.com/

References

Nahm, A., Vonderembse, M., & Koufteros, X. (2003). The impact of organizational structure on time-based manufacturing and plant performance. Journal of Operations Management, 21, 281–306.
Schwab, K. (2016). The fourth industrial revolution. Franco Angeli.
Chen, B., Wan, J., Shu, L., Li, P., Mukherjee, M., & Yin, B. (2017). Smart factory of Industry 4.0: Key technologies, application case, and challenges. IEEE Access, 6, 6505–6519.
Groover, M. P. (2018). Automation, production systems, and computer-integrated manufacturing. Pearson.
Yadegaridehkordi, E., Hourmand, M., Nilashi, M., Shuib, L., Ahani, A., & Ibrahim, O. (2018). Influence of big data adoption on manufacturing companies’ performance: an integrated DEMATEL-ANFIS approach. Technological Forecasting and Social Change, 137, 199–210.
O’Donovan, P., Leahy, K., Bruton, K., & O’Sullivan, D. T. J. (2015). Big data in manufacturing: A systematic mapping study. Journal of Big Data, 2, 20.
Passlick, J., Lebek, B., & Breitner, M. H. (2017). A self-service supporting business intelligence and big data analytics architecture. In 13th international conference on Wirtschaftsinformatik, St. Gallen, Switzerland.
Bornschlegl, M. X., Berwind, K., & Hemmje, M. (2017). Modeling end user empowerment in big data applications. In 26th International conference on software engineering and data engineering at: San Diego, CA, USA.
Zillner, S., Curry, E., Metzger, A., Auer, S., & Seidl, R., (Eds.). (2017). European big data value. Strategic research & innovation agenda. Springer.
Arruda, D. (2018). Requirements engineering in the context of big data applications. SIGSOFT Software Engineering Notes, 43(1), 1–6.
Raguseo, E. (2018). Big data technologies: An empirical investigation on their adoption, benefits and risks for companies. International Journal of Information Management, 38(1), 187–195.
Nuseibeh, B., & Easterbrook, S. (2000). Requirements engineering: A roadmap. In Proceedings of the conference on the future of software engineering, ICSE’00 (pp. 35–46).
Paech, B., Dutoit, A. H., Kerkow, D., & Von Knethen, A. (2002). Functional requirements, non-functional requirements, and architecture should not be separated—A position paper. In Proceedings of the 8th international working conference on requirements engineering.
Horkoff, J., Aydemir, F. B., Cardoso, E., Li, T., Maté, A., Paja, E., Salnitri, M., Piras, L., Mylopoulos, J., & Giorgini, P. (2019). Goal-oriented requirements engineering: An extended systematic mapping study. Requirements Engineering, 24, 133–160.
I-BiDaaS Consortium. (2018). D1.3: Positioning of I-BiDaaS. Available at: https://doi.org/10.5281/zenodo.4088297.
Murray, M. T., & Murray, M. (2011). High pressure die casting of aluminium and its alloys. In Fundamentals of aluminium metallurgy production, processing and applications. Woodhead Publishing series in metals and surface engineering (pp. 217–261).
Lumley, R. N. (2011). Progress on the heat treatment of high pressure die castings. In Fundamentals of aluminium metallurgy production, processing and applications. Woodhead Publishing series in metals and surface engineering (pp. 262–303).
Winkler, M., Kallien, L., & Feyertag, T. (2015). Correlation between process parameters and quality characteristics in aluminum high pressure die casting. In Conference: NADCA.
Fiorese, E., & Bonollo, F. (2016). Process parameters affecting quality of high-pressure die-cast Al-Si alloy. Doctoral Thesis, University of Padova.
Chandrasekaran, R., Campilho, R. D. S. G., & Silva, F. J. G. (2019). Reduction of scrap percentage of cast parts by optimizing the process parameters. Procedia Manufacturing, 38, 1050–1057.
Bhuiyan, M. S. H., & Choudhury, I. A. (2014). Review of sensor applications in tool condition monitoring in machining. Reference Module in Materials Science and Materials Engineering, Comprehensive Materials Processing, 13, 539–569.
Borgi, T., Hidri, A., Neef, B., & Nauceur, M. S. (2017). Data analytics for predictive maintenance of industrial robots. In International conference on advanced systems and electric technologies (IC_ASET).
Eren, H. (2012). Assessing the health of sensors using data historians. In IEEE sensors applications symposium proceedings.
Dal, B., Tugwell, P., & Greatbanks, R. (2000). Overall equipment effectiveness as a measure of operational improvement—A practical analysis. International Journal of Operations & Production Management, 20(12), 1488–1502.
Ljungberg, Ö. (1998). Measurement of overall equipment effectiveness as a basis for TPM activities. International Journal of Operations & Production Management, 18(5), 495–507.
Galar, D., Sandborn, P., & Kumar, U. (2017). Maintenance costs and life cycle cost analysis. CRC Press.
Arapakis, I., Becerra, Y., Boehm, O., Bravos, G., Chatzigiannakis, V., Cugnasco, C., Demetriou, G., Eleftheriou, I., Mascolo, J. E., Fodor, L., Ioannidis, S., Jakovetic, D., Kallipolitis, L., Kavakli, E., Kopanaki, D., Kourtellis, N., Marcos, M. M., de Pozuelo, R. M., Milosevic, N., Morandi, G., Montanera, E. P., & Ristow, G. H. (2019). Towards specification of a software architecture for cross-sectoral big data applications. In IEEE world congress on services (SERVICES) (Vol. 2642). IEEE.
Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., Liu, T., Wang, X., Wang, G., Cai, J., & Chen, T. (2018). Recent advances in convolutional neural networks. Pattern Recognition, 77, 354–377.
Huang, G., Liu, Z., Van der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 4700–4708).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 770–778).
I-BiDaaS Consortium. (2020). D3.3: Batch Processing Analytics module implementation final report. Available at: https://doi.org/10.5281/zenodo.4608346
Bock, H. H. (2017). Clustering methods: a history of k-means algorithms. In Selected contributions in data analysis and classification. Springer (pp. 161–172).
Snoke, J., Raab, G. M., Nowok, B., Dibben, C., & Slavkovic, A. (2016). General and specific utility measures for synthetic data. Journal of the Royal Statistical Society Series A (Statistics in Society), 181(3).
Gupta, M., Gao, J., Aggarwal, C. C., & Han, J. (2014). Outlier detection for temporal data: a survey. IEEE Transactions on Knowledge and Data Engineering, 26(9), 2250–2267.
Petitjean, F., Forestier, G., Webb, G. I., Nicholson, A. E., Chen, Y., & Keogh, E. (2014). Dynamic time warping averaging of time series allows faster and more accurate classification. In IEEE international conference on data mining.
Zillner, S., Bisset, D., Milano, M., Curry, E., García Robles, A., Hahn, T., Irgens, M., Lafrenz, R., Liepert, B., O’Sullivan, B., & Smeulders, A. (Eds.) (2020, September). Strategic research, innovation and deployment agenda—AI, data and robotics partnership. Third release. Brussels. BDVA, euRobotics, ELLIS, EurAI and CLAIRE.

Acknowledgements

The work presented in this chapter is supported by the I-BiDaaS project, funded by the European Union’s Horizon 2020 research and innovation programme under Grant Agreement 780787. This publication reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.

Author information

Authors and Affiliations

Aegis IT Research LTD, London, UK
Andreas Alexopoulos, Spiros Fotis, Leonidas Kallipolitis & Ilias Spais
Barcelona Supercomputing Center, Barcelona, Spain
Yolanda Becerra, Cesare Cugnasco, Miquel Martínez & Raül Sirvent
IBM, Haifa, Israel
Omer Boehm & Michael Vinov
Information Technology for Market Leadership, Athens, Greece
George Bravos & Vassilis Chatzigiannakis
Ecole des Ponts ParisTech, Champs-sur-Marne, France
Giorgos Demetriou, Vlatka Katusic & Hernan Ruiz-Ocampo
University of Manchester, Manchester, UK
Iliada Eleftheriou, Evangelia Kavakli & Rizos Sakellariou
Centro Ricerche FIAT, Orbassano, Italy
Gianmarco Genchi, Julien Mascolo & Giuseppe Danilo Spennacchio
Technical University of Crete - School of Electrical and Computer Engineering, Crete, Greece
Sotiris Ioannidis
Foundation for Research and Technology, Hellas - Institute of Computer Science, Crete, Greece
Sotiris Ioannidis, Despina Kopanaki, Christoforos Leventis & Giorgos Vasiliadis
University of Novi Sad - Faculty of Sciences, Novi Sad, Serbia
Dusan Jakovetic, Nemanja Milosevic, Srdjan Skrbic & Dusan Stamenkovic
ATOS, Madrid, Spain
Enric Pere Pages Montanera
Software AG, Darmstadt, Germany
Gerald Ristow

Corresponding author

Correspondence to Giuseppe Danilo Spennacchio.

Editor information

Editors and Affiliations

Insight SFI Research Centre for Data Analytics, NUI Galway, Ireland
Edward Curry
Information Centre for Science and Technology, Leibniz University Hannover, Hannover, Germany
Sören Auer
SINTEF Digital, Oslo, Norway
Arne J. Berre
Paluno, University of Duisburg-Essen, Essen, Germany
Andreas Metzger
Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain
Maria S. Perez
Siemens Corporate Technology, München, Germany
Sonja Zillner

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this chapter

Cite this chapter

Alexopoulos, A. et al. (2022). Big Data Analytics in the Manufacturing Sector: Guidelines and Lessons Learned Through the Centro Ricerche FIAT (CRF) Case. In: Curry, E., Auer, S., Berre, A.J., Metzger, A., Perez, M.S., Zillner, S. (eds) Technologies and Applications for Big Data Value. Springer, Cham. https://doi.org/10.1007/978-3-030-78307-5_15

DOI: https://doi.org/10.1007/978-3-030-78307-5_15
Published: 29 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-78306-8
Online ISBN: 978-3-030-78307-5
eBook Packages: Computer Science, Computer Science (R0)
