1 Introduction

The manufacturing industry transforms materials or assembles components to produce finished goods that are ready to be sold in the marketplace. The organizational structure of manufacturing companies is complex and involves many business and operative functions with different roles and responsibilities in order to guarantee efficiency at every level [1]. The fourth industrial revolution [2, 3] has initiated many changes in the industrial value chain, transforming the shop floor, i.e. the production part of manufacturing companies. Companies are introducing process equipment fitted with robots and digital tools, which makes it possible to set and control processes in an automated manner, speeding up production with a high level of accuracy [4]. Furthermore, large volumes of data are generated every day that can be collected and analysed to increase process robustness and efficiency and to build a technical cycle that reduces the consumption of energy and material. However, despite the potential benefits offered by the exploitation of Big Data, its usage is still at an early stage in many manufacturing companies [5].

Centro Ricerche FIAT (CRF) is one of the main private research centres in Italy and represents Fiat Chrysler Automobiles (FCA) in European and national collaborative research projects. In the context of the European Horizon 2020 I-BiDaaS project, CRF identified two use cases in which complex datasets are retrieved from real processes. By exploiting Big Data analytics in these two cases, CRF aims to improve process and product quality in a much more agile way through the collaborative effort of self-organizing and cross-functional teams, reducing costs due to further processing and predicting faults and unnecessary actions. This requires solutions that allow manufacturing experts to interact with Big Data [6] in order to understand how to easily utilize important information that is often hidden in raw data. In other words, the first best practice (1) is the correlation between the value of Big Data technology and the skills of the people involved in the data management process. The I-BiDaaS approach follows this best practice and develops a self-service [7] Big Data analytics platform that enables different CRF end-users to exploit Big Data in order to gain new insights that assist them in making the right decisions in a much more agile way.

The aim of this chapter is to demonstrate how advanced analytics tools can empower end-users [8] in the manufacturing domain (see Sect. 5) to create tangible value from the process data that they are producing, and to identify a number of best practices, guidelines and lessons learned. For future reference, we list here the main best practices together with the identified guidelines and lessons learned; they are discussed in detail throughout the chapter:

  • The correlation between the value of Big Data technology and the skills of the people involved in the data management process, with the involvement of different departments belonging to the same or different organizations in order to extract the value of all data collected from several sources and levels (breaking data silos).

  • The alignment of the Big Data requirements with the business needs and the definition of appropriate experiments, including the identification of the Big Data technologies most suitable for the identified business requirements.

  • The management of the types of data generated, with the identification of the types of data useful for the analysis, their anonymization, and the generation of synthetic data in parallel with the data anonymization process.

  • The development of a solution that satisfies the Big Data requirements of specific use cases by mapping the identified functional and non-functional concerns onto a concrete software architecture, together with the development of advanced visualization tools that present high-value Big Data analytics results to domain experts and operators.

The remainder of this chapter is organized as follows. Section 2 describes the process followed for the identification of the Big Data requirements in the manufacturing sector and demonstrates how it was applied to elicit the requirements of the CRF use cases, which drove the design of the I-BiDaaS Big Data solution. Furthermore, the CRF requirements guide the definition of the experiments for assessing the developed system, described in Sect. 3. The architecture of the I-BiDaaS solution is described in Sect. 4. Section 5 reports on the lessons learned, challenges and guidelines reflecting the experience of the I-BiDaaS project, and also connects the described work with the Big Data Value (BDV) reference model and its Strategic Research and Innovation Agenda (SRIA) [9]. Finally, Sect. 6 concludes the chapter.

2 Requirements for Big Data in the Manufacturing Sector

Alignment between business strategy and Big Data solutions is a critical factor for achieving value through Big Data [10]. Manufacturers must understand how the adoption of Big Data technologies [11] relates to their business objectives in order to identify the right datasets and increase the value of the analytics results. Therefore, tailoring Big Data requirements to the business needs is the second best practice (2) reported in this chapter.

In more detail, the I-BiDaaS methodology for eliciting CRF requirements draws on work in the area of early Requirements Engineering (RE), which considers the interplay between business intentions and system functionality [12, 13]. In particular, the requirements elicitation followed a (mostly) top-down approach whereby business goals reflecting the company’s vision were progressively refined in order to identify the user requirements of specific stakeholder groups (i.e. data providers, Big Data capability providers and data consumers). Their analysis resulted in the definition of system functional and non-functional requirements, which describe the behaviour that a Big Data system (or a system component) should expose in order to realize the intentions of its users. This process was facilitated by the use of appropriate questionnaires. In cases where information on the requirements was already available (either collected during the project setup phase or identified through a review of related literature [10, 14]), it was used to partly pre-fill the questionnaires and minimize end-users’ effort. Naturally, users were asked to check the pre-filled fields and ensure that the documented information was valid and accurate.

Table 1 gives a summary of the CRF requirements. Although it provides only an excerpt of the elicited CRF requirements, it demonstrates the application of the I-BiDaaS way of working in the CRF use cases.

Table 1 CRF Big Data requirements

In more detail, the strategic CRF business goal (R1) was refined into a number of more operational business goals that need to be satisfied through Big Data analytics (R3). In addition, a number of relevant KPIs (R6) were defined that can be used to assess the proposed solution (see Sect. 3). At the user requirements level, requirements were described in terms of the characteristics of the different data sources to be used (R7 and R8), the envisaged analytics capability of the proposed solution (R9) and the different interface requirements of the end-users who will consume the analytics results (R10–R12). Finally, analysis of the above user requirements resulted in the generation of the system requirements, both functional (R13) and non-functional (R14 and R15). Although described in a linear fashion, the above activities were carried out in an iterative manner, resulting in a stepwise refinement of the results being produced. The complete list of elicited CRF requirements is described in detail in [15].

Further to forming the baseline of the I-BiDaaS solution (see Sect. 4), these requirements also assist the definition of experiments as described in Sect. 3.

3 Use Cases Description and Experiments’ Definition: Technical and Business KPIs

The aim of experimentation is to assist stakeholders’ acceptance of any new Big Data solution. The definition of appropriate experiments is thus another best practice (3) reported in this chapter. In particular, the definition of the CRF experiments aims at evaluating and validating the I-BiDaaS solution and its implementation in the context of the CRF use cases. It follows a goal-oriented approach: the experiment’s goal(s) towards which measurement will be performed are defined, a number of questions are then formulated to characterize the achievement of each goal and, finally, a set of Key Performance Indicators (KPIs) and associated metrics is attached to every question in order to answer it in a measurable way.
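To make this goal-question-KPI structure concrete, the sketch below shows one way such an experiment definition could be encoded; the goal and question texts are paraphrased examples rather than the exact CRF definitions, while the quoted target range is taken from Sect. 3.1.

```python
# A hedged sketch of a goal-question-KPI experiment definition.
# The goal/question wording is illustrative, not the exact CRF text.
from dataclasses import dataclass, field


@dataclass
class KPI:
    name: str
    metric: str
    target: str


@dataclass
class Question:
    text: str
    kpis: list[KPI] = field(default_factory=list)


@dataclass
class Experiment:
    goal: str
    questions: list[Question] = field(default_factory=list)


experiment = Experiment(
    goal="Improve product quality in the aluminium die-casting process",
    questions=[
        Question(
            text="Does the I-BiDaaS solution increase the share of good castings?",
            kpis=[KPI(name="Product/service quality",
                      metric="quality control level for good products",
                      target="increase of 2-6% over the current baseline")],
        )
    ],
)

print(experiment.questions[0].kpis[0].target)
```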

The definition of each experiment also involved the specification of the experiment’s workload in terms of the use case datasets and the type of analysis envisaged, as well as the definition of the experimental subjects to be involved in the experiment, as reported in Sects. 3.1 and 3.2, which discuss, respectively, the ‘Production process of aluminium die-casting’ and the ‘Maintenance and monitoring of production assets’ use cases.

3.1 Production Process of Aluminium Die-Casting

The ‘Production process of aluminium die-casting’ use case generates complex datasets from the production process of engine blocks. During the die-casting process [16, 17], molten aluminium is injected into a die cavity, mounted in a machine, in which it solidifies quickly. In this case, a large number of interconnected process parameters influence the flow behaviour of the molten metal inside the die cavity and, consequently, the productivity and the quality [18,19,20]. Hence, the fourth best practice (4) is to identify the type of data generated: data collected from several sources can be disorganized and stored in different formats, and such data may remain unexploited.

In this use case, the data provided for the analyses consist of a collection of casting process parameters, such as the piston speed in the first and second phases, the intensification pressures and others. In addition to the process data, CRF also provided a large dataset of thermal images of the engine block casting process, under the hypothesis that there is a correlation among the process data, the thermal data and the outcome of the process.

Given the complexity of the process, it is important not only to carefully design the process parameters and temperatures but also to control them, because they have a direct impact on the quality of the casting.

Analysis of the datasets aims to predict whether an engine block will be produced correctly during the casting process, in order to avoid further processing and scrap, which leads to financial savings for manufacturers.

To test the efficiency of the I-BiDaaS solution in this context, an experiment was defined, as shown in Table 2. The business KPI ‘Product/service quality’ identified during requirements elicitation (see Sect. 2) was further elaborated in order to define appropriate metrics (quality control levels related to good and defective products) and to map it to appropriate indicators at the I-BiDaaS solution level (execution time, data quality, cost).

Table 2 Overview of the ‘Production process of aluminium die-casting’ experiment

For each KPI, a baseline value for evaluating the performance of the I-BiDaaS solution has also been defined. For example, an increase of 2–6% in the quality control level related to good products and decreases of 1–4% and 0.05–2% in the two quality control levels related to defective products are sought in order to satisfy manufacturers’ requests in terms of product quality.

3.2 Maintenance and Monitoring of Production Assets

In this use case, data have been retrieved from sensors mounted on several machines (e.g. linear stages, robots, elevators) along the production line of vehicles. Many related works are conducted in this field concerning, e.g., sensor applications in tool condition monitoring in machining [21], predictive maintenance of industrial robots [22] and assessing the health of sensors using data historians [23].

We focused on welding lines in which robots are used to assemble vehicle components and where flexibility is required to cope with continual changes in the types of components and vehicles. A data server gathers the sensor data, which are categorized into two different datasets, namely SCADA and MES. The SCADA dataset contains production, process and control parameters of the daily vehicle production and is structured as in Table 3.

Table 3 Structure of the dataset for the SCADA data

There are over 100 sensors and each one is identified by a specific number (id). The other columns report on the value of the specific sensor, the unit of measurement and the timestamp.

The MES dataset contains specific data associated with the type of vehicle being produced and is structured as in Table 4.

Table 4 Structure of the dataset for the MES data

When OP020.Passo20 changes from 0 to 1, a new vehicle enters the area provided with sensors, and modello_op_020 indicates the model of the vehicle being processed.
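As an illustration of how the two datasets fit together, the following sketch builds toy SCADA and MES tables with the columns described above (all values are invented) and attaches each sensor reading to the most recent vehicle entry; the exact schemas and join logic used in the project may differ.

```python
# A minimal sketch of the SCADA/MES data layout and a timestamp-based join,
# assuming the columns described in Tables 3 and 4; values are illustrative.
import pandas as pd

# SCADA: one row per sensor reading (sensor id, value, unit, timestamp).
scada = pd.DataFrame({
    "id": [17, 42, 17],
    "value": [3.2, 71.0, 3.4],
    "unit": ["bar", "degC", "bar"],
    "timestamp": pd.to_datetime(
        ["2020-03-02 08:00:01", "2020-03-02 08:00:01", "2020-03-02 08:00:02"]),
})

# MES: OP020.Passo20 switching from 0 to 1 marks a new vehicle entering the
# sensorized area; modello_op_020 gives the model being processed.
mes = pd.DataFrame({
    "OP020.Passo20": [0, 1, 1, 0, 1],
    "modello_op_020": ["A", "A", "A", "B", "B"],
    "timestamp": pd.to_datetime(
        ["2020-03-02 07:59:58", "2020-03-02 08:00:00", "2020-03-02 08:00:03",
         "2020-03-02 08:04:55", "2020-03-02 08:05:00"]),
})

# Keep only the 0 -> 1 transitions, i.e. vehicle entries.
entries = mes[(mes["OP020.Passo20"] == 1) &
              (mes["OP020.Passo20"].shift(fill_value=0) == 0)]

# Attach to each sensor reading the most recent vehicle entry
# (merge_asof requires both frames to be sorted by the join key).
readings_per_vehicle = pd.merge_asof(
    scada.sort_values("timestamp"),
    entries[["timestamp", "modello_op_020"]].sort_values("timestamp"),
    on="timestamp", direction="backward")

print(readings_per_vehicle)
```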

Analysis of these data aims at predicting unnecessary actions and improving the efficiency of manufacturing plants by reducing production losses. Once again, an experiment was defined in order to test the efficiency of the I-BiDaaS solution in this context. The key points of the ‘Maintenance and monitoring of production assets’ experiment are shown in Table 5. In particular, the data were analysed to obtain thresholds for anomalous measurements for all sensors. The fifth best practice (5) is the building of a foundational database with the history of anomalies, which may help end-users to plan maintenance, through the prediction of asset failures, only when it is necessary.

Table 5 Overview of the ‘Maintenance and monitoring of production assets’ experiment

As shown in Table 5, the business KPIs reported during requirements elicitation were further elaborated to identify related metrics (Overall Equipment Effectiveness (OEE) [24, 25] and maintenance costs [26]) and to map them on specific indicators at the Big Data solution level (execution time, data quality and cost).

For each KPI, a baseline value for evaluating the performance of the I-BiDaaS solution has been defined. For example, the prediction of unnecessary actions and the improvement of efficiency should reduce production losses and increase the competitiveness of the company, with a target increase of 0.05% in the current Overall Equipment Effectiveness (OEE) and a decrease of 50% in maintenance costs.
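For reference, OEE is commonly computed as the product of availability, performance and quality; the sketch below uses this standard decomposition with illustrative shift figures, not CRF plant data.

```python
# A hedged sketch of the standard OEE decomposition
# (OEE = availability x performance x quality); all inputs are illustrative.
def oee(planned_time_min: float, downtime_min: float,
        ideal_cycle_time_min: float, total_count: int, good_count: int) -> float:
    run_time = planned_time_min - downtime_min
    availability = run_time / planned_time_min          # share of planned time actually run
    performance = (ideal_cycle_time_min * total_count) / run_time  # speed vs. ideal cycle
    quality = good_count / total_count                   # share of good parts
    return availability * performance * quality


# Example: 480 min shift, 45 min downtime, 1.2 min ideal cycle, 330 parts, 320 good.
print(f"OEE = {oee(480, 45, 1.2, 330, 320):.2%}")
```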

4 I-BiDaaS Solutions for the Defined Use Cases

The final best practice (6) reported in the following sections relates to the development of a solution that satisfies the Big Data requirements of specific use cases by mapping the identified functional and non-functional concerns onto a concrete software architecture [27]. In particular, the general requirements reported in Sect. 2 were further refined, taking into consideration the specific context of each use case (described in Sect. 3), resulting in the customized solutions per use case described in Sects. 4.1 and 4.2.

For both use cases, data gathered from the production lines are sent to CRF, where they are manipulated and masked. After anonymization, the data are sent to the I-BiDaaS platform, hosted in a Virtual Machine. This machine, created by the I-BiDaaS technical partners, acts as a bridge between the I-BiDaaS infrastructure and the CRF internal server. The same bridge is used to send the analytics results back to the production plant end-users, as shown in Fig. 1.

Fig. 1 Flow of data and results

4.1 Production Process of Aluminium Die-Casting

In this section, the architecture, data analytics, visualization and results for the ‘Production process of aluminium die-casting’ use case are described.

4.1.1 Architecture

Figure 2 shows the architecture of this use case, which consists of several well-defined components. The Universal Messaging component is used for communication with most of the other components. To describe the data flow for this use case, we first consider the dataset. Data are transferred from CRF’s internal server to the I-BiDaaS platform server, where they are pre-processed and cleaned; this step is important because the data need to be prepared for the model training and inference tasks. The data are then passed to a Machine Learning algorithm from the I-BiDaaS pool of ML algorithms. In this use case, the model is a complex neural network implemented in PyTorch and trained jointly on the thermal images and the sensor datasets. The Machine Learning component outputs two results: training metrics/results for visualization purposes, used in the Advanced Data Visualization component, and the trained model used for inference. Both results are transferred through Universal Messaging. Finally, for inference purposes, the Model Serving (Inference) Service component is used. In the initial phases of development, before the real data were fully prepared (e.g. retrieved, anonymized, etc.), the architecture used realistic synthetic data for initial component development. The use of synthetic data can make development significantly more agile, but it is employed with care and under a quality assurance process; for example, the final trained ML model has to be delivered on real data. We refer to Sect. 4.1.5 for details on realistic synthetic data generation and quality assessment.

Fig. 2 Architecture of the ‘production process of aluminium die-casting’ use case

4.1.2 Data Analytics

In this section, we describe in more detail the data analytics solution that corresponds to the four respective modules in Fig. 2 (Data pre-processing, PyTorch neural network model, Trained model and Training results) and that analyses the thermal images and the sensor datasets.

Under the hypothesis that there is a correlation among the sensor data, the thermal data and the outcome of the process, a further task is to classify combined image and sensor data inputs in order to determine whether the cast engine blocks are free of production faults. Formally, data analytics here corresponds to an M-ary supervised classification task [28]. As the dataset involves image classification, we utilize Deep Convolutional Neural Networks [29] for this task.

We tried three approaches regarding the input data during the development of this use case’s analytics: unmodified thermal images, grayscale thermal images and raw sensor data. For the raw sensor data, the thermal camera provides a matrix of values with the same dimensions as the image, which, when normalized, provides input that is very similar (almost identical, depending on the normalization process) to the grayscale image from a computational standpoint. While the grayscale images and the raw sensor data did yield faster training times (one channel for the convolutions versus three for the thermal images), in our experiments the thermal images gave the best accuracy/precision/recall metrics, so we decided to keep using them. We suspect that this is because the modern neural network architectures we use (e.g. DenseNet [29, 30]) are optimized to work with coloured images (e.g. the ImageNet dataset [31]). The corresponding results are reported in Sect. 4.1.4.
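As an illustration of such a joint model, the following sketch combines a DenseNet branch for the thermal images with a small multilayer perceptron for the casting parameters; the layer sizes, the number of sensor features and the number of classes are assumptions for illustration, not the actual I-BiDaaS network.

```python
# A minimal sketch of a joint image + sensor classifier in PyTorch.
# Feature dimensions, layer sizes and class count are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models


class JointCastingClassifier(nn.Module):
    def __init__(self, num_sensor_features: int = 20, num_classes: int = 3):
        super().__init__()
        # Image branch: DenseNet trunk with the classifier head removed (1024-d output).
        self.image_branch = models.densenet121(weights=None)
        self.image_branch.classifier = nn.Identity()
        # Sensor branch: small MLP over the casting process parameters.
        self.sensor_branch = nn.Sequential(
            nn.Linear(num_sensor_features, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        # Joint head over the concatenated representations.
        self.head = nn.Linear(1024 + 64, num_classes)

    def forward(self, image: torch.Tensor, sensors: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.image_branch(image), self.sensor_branch(sensors)], dim=1)
        return self.head(z)


if __name__ == "__main__":
    model = JointCastingClassifier()
    logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 20))
    print(logits.shape)  # torch.Size([4, 3])
```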

4.1.3 Visualizations

The approach to visualizing the die-casting process results in real time involves the deployment of a number of constantly updated visualizations that offer a complete overview of the results, including the values of the monitored sensor variables and the final classification of the end products of the process.

We report here, as an example, the Global Live Chart, which allows end-users to visualize in a timely manner the trend of the main parameters (e.g. velocity, pressure, standard deviation, etc.) and to check the classification levels (Fig. 3).

Fig. 3 Real-time aggregated results

4.1.4 Results

The models described in Sect. 4.1.2 were trained on both the original and the newly balanced datasets. We favour the model trained on the balanced dataset, as it learns to recognize faulty engine blocks much better than the model trained on the imbalanced dataset, even though its overall accuracy is lower, simply because there are fewer faultless engines. In Fig. 4, we show the accuracies of both models on the training and testing datasets (standard 80/20 split). The orange (top) line corresponds to the model trained on the full dataset and the pink line to the model trained on the balanced dataset [32].

Fig. 4 Training and testing accuracy for the two joint neural network models: when trained on the full imbalanced data (orange line) and when trained on the sub-sampled balanced data (pink line)
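A minimal sketch of the balancing (majority-class sub-sampling) and the 80/20 split mentioned above is given below, assuming the labels are available in a pandas DataFrame; the column name, split routine and seed are illustrative.

```python
# A hedged sketch of class balancing by sub-sampling and an 80/20 split.
# The 'faulty' column name and the helper functions are illustrative assumptions.
import pandas as pd


def balance_by_subsampling(df: pd.DataFrame, label_col: str = "faulty",
                           seed: int = 0) -> pd.DataFrame:
    # Down-sample every class to the size of the rarest class.
    n_min = df[label_col].value_counts().min()
    return (df.groupby(label_col, group_keys=False)
              .apply(lambda g: g.sample(n=n_min, random_state=seed))
              .reset_index(drop=True))


def train_test_split_80_20(df: pd.DataFrame, seed: int = 0):
    # Shuffle, then cut at 80% of the rows.
    shuffled = df.sample(frac=1.0, random_state=seed)
    cut = int(0.8 * len(shuffled))
    return shuffled.iloc[:cut], shuffled.iloc[cut:]
```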

4.1.5 Synthetic Data Generation and Quality Assessment

An initial development of the use case solution was carried out with realistic synthetic data. In parallel with the process of anonymizing and structuring the data, it was useful to generate synthetic data for the early development stages, while taking particular caution when extracting insights from it.

The fabrication of synthetic data that exhibit similar characteristics and a similar distribution to the real data is a challenging task. The IBM Test Data Fabrication (TDF) technology was used for this purpose. TDF requires constraint rules that model the relationships and dependencies between the data and leverages a Constraint Satisfaction Problem (CSP) solver to fabricate data that satisfy these constraints. The rules for the production of the synthetic data were set by CRF with the help of IBM. The correlation between the real parameters and the synthesized parameters was further refined after reiteration of the data analysis.
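The sketch below illustrates, in a generic way, what constraint-respecting fabrication looks like; it does not use the IBM TDF API, and the parameter names, ranges and dependency rule are invented for illustration only.

```python
# A generic illustration (not the IBM TDF API) of fabricating synthetic process
# records that respect simple domain constraints via rejection sampling.
# Parameter names, ranges and the rule below are illustrative assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)


def fabricate(n: int) -> pd.DataFrame:
    rows = []
    while len(rows) < n:
        speed_1 = rng.uniform(0.1, 0.6)       # piston speed, first phase (m/s)
        speed_2 = rng.uniform(1.0, 5.0)       # piston speed, second phase (m/s)
        pressure = rng.uniform(200.0, 900.0)  # intensification pressure (bar)
        # Constraint rule: second-phase speed must exceed first-phase speed, and
        # high pressure is only fabricated together with high second-phase speed.
        if speed_2 > speed_1 and (pressure < 600.0 or speed_2 > 3.0):
            rows.append((speed_1, speed_2, pressure))
    return pd.DataFrame(rows, columns=["speed_phase_1", "speed_phase_2",
                                       "intensification_pressure"])


print(fabricate(5))
```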

For the initial evaluation of the synthetic data, we performed empirical and analytical validations. The empirical technique consisted of delivering these data to expert production technicians, who were not able to indicate any difference from the actual production data, as there was no distinguishing factor for them. The analytical technique was carried out by the CRF research team, who used the K-Means algorithm [33] as their technique of choice. Further evaluation was carried out by IBM, striving to perform a qualitative, generic evaluation of the real data compared with the fabricated data. This evaluation was concerned with methods to judge whether the distributions of the fabricated data and the original data are comparable, which is commonly referred to in the literature as the general utility of the datasets. In addition to the general utility, IBM also considered the specific utility, i.e. the similarity between the synthetic data and the original data.

The propensity mean-squared-error (pMSE) [34] was used as a general measure of data utility, applied here to the specific case of synthetic data. Propensity scores represent probabilities of group membership. If the propensity scores are well modelled, this general measure should capture relationships in the data that methods such as the empirical Cumulative Distribution Function (CDF) may miss.

The method amounts to a classification problem in which the desired result is poor classification (a 50% error rate); lower pMSE values therefore indicate better utility.
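For reference, the pMSE is commonly computed from the propensity scores $\hat{p}_i$ estimated on the merged real-plus-synthetic sample of size $N$, where $c$ is the share of synthetic records in that sample (0.5 for equally sized samples):

$$\mathrm{pMSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{p}_i - c\right)^2$$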

By randomly sampling 5000 data points from the real and synthetic datasets and using a logistic regression to provide the probability for the label classification, we were able to show that the measured mean pMSE score for the ‘Production process of aluminium die-casting’ dataset is 0.218 with a standard deviation of 0.0017, as shown in Fig. 5.

Fig. 5 Results for 100 random samplings taken from the real and the synthetic data (5K data points each), with the pMSE calculated using a logistic model
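A minimal sketch of this pMSE computation is given below, assuming the real and synthetic process parameters are available as numeric pandas DataFrames with identical columns; the sample size mirrors the 5000-point samples described above, while the model settings are illustrative.

```python
# A hedged sketch of the pMSE utility check with a logistic propensity model.
# Assumes 'real' and 'synth' are numeric DataFrames with identical columns.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def pmse_score(real: pd.DataFrame, synth: pd.DataFrame, n: int = 5000,
               seed: int = 0) -> float:
    r = real.sample(n=n, random_state=seed)
    s = synth.sample(n=n, random_state=seed)
    X = pd.concat([r, s], ignore_index=True).to_numpy()
    y = np.r_[np.zeros(n), np.ones(n)]            # 0 = real, 1 = synthetic
    p = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
    c = 0.5                                       # synthetic share of the merged sample
    return float(np.mean((p - c) ** 2))


# Repeating the sampling (e.g. 100 times with different seeds) yields the mean
# and standard deviation of the pMSE reported in Fig. 5.
```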

4.2 Maintenance and Monitoring of Production Assets

In this section, the architecture, data analytics, visualization and results for the ‘Maintenance and monitoring of production assets’ use case are described.

4.2.1 Architecture

Figure 6 shows the architecture, which consists of several well-defined components. The Universal Messaging component is used for communication among most of the components. To describe the data flow, we start with the dataset. Data are sent from CRF to the I-BiDaaS platform, where they are pre-processed and prepared for the training of an outlier detection model. The outlier detection model outputs two results: training results for visualization purposes, used in the Advanced Data Visualization component, and the trained model used for inference. The training results are transferred through Universal Messaging. Finally, for inference purposes, the Model Inference Serving component is used. It is also worth noting that all the components run on a containerized (i.e. Docker) backbone provided by the Storage and Container Orchestration Service. Data are visualized and jobs are scheduled through the I-BiDaaS User Interface component.

Fig. 6 Architecture of the ‘maintenance and monitoring of production assets’ use case

4.2.2 Data Analytics

The data described in Sect. 3 have been transformed into separate time series, one per sensor, so that each sensor can be monitored separately. Since the measurements were not labelled (anomalous/non-anomalous), outlier detection algorithms arose as natural candidates for this use case [35]. We constructed an outlier detection model for each of the time series. While more advanced algorithms could be used, we adopted a simple, easy-to-implement and computationally cheap, yet effective, solution based on the Inter-Quartile Range (IQR) test. The results of these models can be used to suggest whether a measurement is an outlier and to discover pairs of sensors that have anomalous measurements at similar timestamps. The preparation of these models was done in Python and consisted of the following steps:

  1. For each sensor, obtain thresholds for anomalous measurements using a modified interquartile range (IQR) test. Three variants of IQR-like tests were performed: (Q1, Q3) ∈ {(5th, 95th), (10th, 90th), (25th, 75th)}, where Q1 and Q3 are the corresponding percentiles.

  2. With the obtained thresholds, filter each time series so that only anomalous measurements are kept, as shown in Fig. 8.

  3. Calculate the Dynamic Time Warping (DTW) [36] distance between the outlier time series.

  4. Rescale the distances to [0, 1].

  5. Group pairs of sensors by distance into the bins [0, 0.1), [0.1, 0.2), …, [0.9, 1].

The time series with anomalous measurements obtained in step 2 enabled us to see the outlier trends for each sensor and to compare their behaviour. The comparison of anomalous trends was made using steps 3, 4 and 5. If the distance obtained in step 5 is small, the two sensors output anomalous measurements in a similar fashion; therefore, if one of them fails, the other sensor in the pair should also be inspected. We present the distribution of sensors’ similarity in Fig. 9.
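A minimal sketch of steps 1–5 is given below, assuming each sensor’s measurements are available as a pandas Series indexed by timestamp; the DTW routine is a plain dynamic-programming implementation rather than the library used in the project, and the default percentile pair corresponds to the classic (25th, 75th) variant.

```python
# A hedged sketch of the per-sensor outlier pipeline (steps 1-5 above).
# Function names and defaults are illustrative assumptions.
import numpy as np
import pandas as pd
from itertools import combinations


def outlier_mask(series: pd.Series, q1: float = 0.25, q3: float = 0.75,
                 k: float = 1.5) -> pd.Series:
    # Step 1: IQR-like test; pass q1/q3 = 0.05/0.95 or 0.10/0.90 for the other variants.
    lo, hi = series.quantile(q1), series.quantile(q3)
    iqr = hi - lo
    return (series < lo - k * iqr) | (series > hi + k * iqr)


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Step 3: classic O(len(a) * len(b)) dynamic-programming DTW.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])


def sensor_similarity(series_by_sensor: dict[str, pd.Series]) -> pd.DataFrame:
    # Steps 2, 4 and 5: keep only anomalous measurements, compute pairwise DTW,
    # rescale to [0, 1] and bin into ten groups.
    outliers = {sid: s[outlier_mask(s)].to_numpy()
                for sid, s in series_by_sensor.items()}
    rows = [(a, b, dtw_distance(outliers[a], outliers[b]))
            for a, b in combinations(outliers, 2)]
    df = pd.DataFrame(rows, columns=["sensor_a", "sensor_b", "dtw"])
    df["dtw_scaled"] = df["dtw"] / df["dtw"].max()
    df["group"] = pd.cut(df["dtw_scaled"], bins=np.linspace(0, 1, 11),
                         include_lowest=True)
    return df
```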

4.2.3 Visualizations

Data stemming from the aforementioned analysis are presented using a multi-step approach that allows operators to drill down to the sensory data and the detected anomalies in an intuitive and easy-to-use way. Starting from a given month, operators select the category of sensors they wish to see and immediately get an overview of the ones with detected anomalies, as shown in Fig. 7. Upon selection of a sensor, operators see the anomalies detected during the selected month and can furthermore select a specific day to see the actual values and thus review the actual anomaly that was detected, as shown in Fig. 8.

Fig. 7 Sensor selection

Fig. 8 Sensor history and details

4.2.4 Results

The boundaries obtained (from step 2 in Sect. 4.2.2) can be used for the daily analysis of sensors and for various visualization tasks, such as showing the number of anomalous measurements for the current day, as seen in Fig. 8, or comparing the number of outliers between two sensors for a given time window, as shown in Fig. 9.

Fig. 9 Number of outliers between sensors

5 Discussion

Reflecting on CRF’s experience and the work done within the I-BiDaaS project, this section develops several recommendations addressed to any manufacturing company willing to undertake Big Data projects. It also positions the I-BiDaaS solution within the Big Data Value (BDV) reference model and the Strategic Research and Innovation Agenda (SRIA).

5.1 Lessons Learned, Challenges and Guidelines

The I-BiDaaS project developed an integrated platform for processing and extracting actionable knowledge from Big Data in the manufacturing sector. Based on the challenges experienced and lessons learned through our involvement in I-BiDaaS, we propose a set of guidelines for the implementation of Big Data analytics in the manufacturing sector, with respect to the following concerns:

  1. Data storage and ingestion from various data sources and their preparation: In a production line deploying digital instruments, many devices set up operating values and adjust and control parameters during the production processes. Depending on whether we want to act on the quality of the production process or on the maintenance of the equipment, the first challenge is to understand how data will be ingested and managed from the data sources over time and who will be able to access them. Furthermore, this aspect highlights the importance of breaking data silos by extracting the value of all data collected from several sources and levels, which may require the involvement of different departments belonging to the same or different organizations.

  2. Data cleaning: A second important aspect is to understand which types of data can be useful for the analysis. This implies the importance of data cleaning in order to identify incomplete, inaccurate and irrelevant parts of the generated dataset.

  3. Fabrication of realistic synthetic data for experimentation and testing: Data are strictly confidential, so another challenge is to decide how data will be shared if external analysis is required. In this case, manufacturers need to evaluate the possibility of fabricating realistic synthetic data for experimentation with the analytical models that will be developed and then testing the same models with anonymized real data.

  4. Batch and stream analytics for increasing the speed of data analysis: After collecting and analysing data, it is necessary to understand which Big Data technologies are most suitable for the specific identified business requirements. Batch and stream analytics cover the cases that occur in real-world environments, including those that require a deeper analysis of large amounts of data collected over a period of time (batch) and those that require velocity and agility for events that need to be monitored in real or near-real time (streaming).

  5. Simple, intuitive and effective visualization of results and interaction capabilities for the end-users: Advanced visualization tools that provide the insights, value and operational knowledge extracted from the available data need to consider both expert and non-expert end-users (e.g. manufacturers, engineers and operators).

5.2 Connection to BDV Reference Model, BDV SRIA, and AI, Data and Robotics SRIDA

The described solutions for the defined manufacturing use cases can be contextualized within the BDV reference model defined in the BDV Strategic Research and Innovation Agenda (BDV SRIA), and they contribute to it in the following ways. Specifically, regarding the BDV reference model horizontal concerns, we address:

  • Data visualization and user interaction: By developing several advanced and interactive visualization solutions applicable in the manufacturing sector, as detailed in Sects. 4.1.3 and 4.2.3.

  • Data analytics: By developing data analytics solutions for the two industrial use cases in the manufacturing sector, as described in Sects. 4.1.2 and 4.2.2. While the solutions may not correspond to state-of-the-art advances in AI/machine learning algorithms development, they clearly contribute to revealing novel insights and best practices on how Big Data analytics can improve manufacturing operations.

  • Data processing architectures: We develop architectures as shown in Figs. 2 and 6 that are well suited for manufacturing applications wherein both batch analytics (e.g. analysing historical data) and streaming analytics (e.g. online processing of the data that correspond to a newly manufactured engine) are required.

  • Data protection and data management: Real data were anonymized by CRF, which manipulated and masked them after they were retrieved from an internal proprietary server.

Regarding the BDV reference model vertical concerns, we address the following:

  • Big data types and semantics: Our work here is mostly concerned with structured sensory data, metadata and thermal image data (which correspond to the Media, Image, Video and Audio data types according to the BDV nomenclature). The work also contributes best practices for the generation of realistic synthetic data from the corresponding domain-defined metadata, as well as a systematic way to assess the quality and usefulness of the generated synthetic data.

  • Communication and connectivity: The work describes innovative ways to communicate with and retrieve data from a manufacturing company’s internal proprietary server, as described in Sect. 4 and outlined in Fig. 1.

Therefore, in relation to the BDV SRIA, the I-BiDaaS solution contributes to the following technical priorities: Data Protection; Data Processing Architectures; Data Analytics; and Data Visualization and User Interaction.

Furthermore, in relation to the BDVA SRIA priority areas defined in connection with the Factories of the Future and EFFRA, we address the following dimensions:

  (a) Excellence in manufacturing: advanced manufacturing processes and services for zero-defect and innovative processes and products

  (b) Sustainable value networks: manufacturing driving the circular economy

  (c) Inter-operable digital manufacturing platforms: supporting an ecosystem of manufacturing services

In more detail, the CRF use cases were selected in order to develop innovative tools and solutions that can ensure better product quality towards zero-defect manufacturing. In particular, existing production lines may be improved to maximize the quality of their products through the integration of solutions that exploit Big Data technologies. Better process efficiency can result in energy savings and cost reduction in the context of the circular economy and allow manufacturers to reach a high level of competitiveness and sustainability.

Finally, the chapter relates to the following cross-sectorial technology enablers of the AI, Data and Robotics Strategic Research, Innovation & Deployment Agenda [37], namely: Knowledge and Learning, Reasoning and Decision Making, and Systems, Methodologies, Hardware and Tools.

6 Conclusion

The increasing level of digitalization in the manufacturing sector generates a large amount of data that often contains highly valuable hidden information. This is due to the complexity of real processes, which require several interconnected stages to obtain finished goods. Variables and parameters are set for the operation of each digital machine and, just as components are assembled, data generated from different sources and levels need to be pulled together if the quality of processes and products is to be improved. I-BiDaaS developed an integrated platform that takes into consideration how complex data can be managed and how to help manufacturers who are not sufficiently equipped to analyse complex datasets, by empowering them to easily utilize and interact with Big Data technologies.