1 Introduction

Businesses are becoming increasingly connected to enhance collaboration and co-create value. The notion of “business networks” describes two or more linked businesses that act as “collective actors” [33]. Since the emergence and growth of the internet and the digitization of many aspects of a company, digital interaction has become more important in those networks [43]. Throughout every interaction and collaboration, masses of data are produced; this data can be analyzed to generate valuable insights, for example, to optimize processes [60] or build innovative services on top of existing offerings [41, 75, 87]. Davenport [24] describes data analytics as one of the most important activities to gain competitive advantage. Combined with the need to better understand relations and interactions in business networks [6], the need for inter-organizational analysis of distributed data sources is evident. Previous work presents concepts for centralized data analytics for distributed data sources [30, 67].

Therefore, the topic of machine learning across different entities within a value chain or business network is of high relevance [36]. However, as recent work points out, “a substantial potential for utilizing AI across company borders has remained largely untapped” [34, p. 1]. For the manufacturing industry, the World Economic Forum estimates the potential value of sharing analytical knowledge and associated data at over $100 billion [14]. Other sources confirm the economic and/or public benefits of inter-organizational machine learning for other domains like health care [65, 80, 90], mobility [71], or smart cities [48].

To address this research gap, Bach et al. [9] state that for the novel challenge of machine learning in business networks, “several approaches need to be extended or re-thought” [9, p. 1]. Consequently, the centralized analysis of data across businesses faces several challenges—and real-world examples unlocking the potential of cross-entity learning, i.e., acquiring analytical knowledge across different (legal) organizations, remain scarce [14]. Hirt et al. [46] analyze typical barriers for machine learning in systems—and carve out three main requirements for successful inter-organizational learning as depicted in Table 1. In the course of this work, we will design a machine learning artifact for business networks focusing on these requirements: ensuring data confidentiality (DR1) and reducing data volume (DR2)—while still ensuring superior prediction performance (DR3).

Table 1 Design requirements of this work

Companies are afraid of exposing sensitive information throughout the process of data analysis (DR1). The need to protect sensitive data is subject to research in the area of business networking [50, 88] or customer privacy protection and advertising [37, 42, 66]. In complex business networks, collaboration happens between multiple organizations of different legal units, hence data confidentiality is required.

As more and more data is produced, the respective transfer of large volumes of data (e.g., to a central analysis unit) can be challenging and should be addressed (DR2). Techniques like complex event processing or fog computing offer solutions to cope with growing data streams [16, 67], but still lack convincing concepts for data confidentiality preservation [91]. Additionally, as sensor sensitivity increases, not all data produced can be centralized [2]. In practice, this leads to selective centralization and/or collection of data and, thus, to a major loss of potentially relevant information.

An artifact addressing the previous requirements also needs to ensure that the resulting performance of a method leveraging the network is superior to cases where a single company only analyzes its own data (DR3). Especially the trade-off between ensuring confidentiality and allowing superior performance is worth exploring and of particular interest; it will be analyzed in detail in the course of this article.

In our work, we propose inter-organizational meta machine learning, a method that addresses all three requirements for machine learning in business networks. The kernel theory of meta machine learning [18] informs the design of our artifact. Meta machine learning combines the prediction of several base classifiers (multiple entities in a business network, e.g., suppliers) to create one aggregated prediction (single entity in a business network, e.g., original equipment manufacturer (OEM)). To demonstrate the feasibility of meta machine learning as a viable solution within business networks, we instantiate our proposed method within a working prototype and evaluate it regarding the three criteria of data confidentiality, transferred volume, and achieved predictive performance based on an industrial use case. We highlight that analytics within organizations is often a trade-off between full data confidentiality, centralization of data, and overall predictive performance. In summary, we contribute to the body of knowledge by showing that our meta machine learning method is suited to inter-organizational machine learning in terms of general technical feasibility, addressing the requirements of data confidentiality (DR1), data volume reduction (DR2), and performance (DR3). Additionally, we demonstrate its usefulness within the application context at our industry partner.

The remainder of this work is structured as follows: We first set the fundamentals for our work by elaborating on business networks and meta machine learning (Sect. 2), as well as on our methodology and research questions (RQs) (Sect. 3). We then review state-of-the-art literature as part of the theoretical background (Sect. 4). With these prerequisites, we present our concept of inter-organizational meta machine learning and explain its architecture, the data streams as well as the necessary processes in detail (Sect. 5). This concept is then applied to a real-world case and evaluated in a technical experiment for its usefulness (Sect. 6). We conclude with a summary, limitations, and an outlook for future research (Sect. 7).

2 Fundamentals

In this section, we first introduce business networks and distributed data sources. Then we describe meta machine learning as a foundation for comprehensive analyses across these networks.

2.1 Business networks

We base our conceptualization of a business network on Anderson et al. [6] for a common understanding. Every business network consists of two or more units, representing for example companies or other organizations, that have a dyadic relationship. Kambil and Short [49] describe the relationship as a linkage that can have different forms, such as an alliance or hierarchy. As businesses increasingly move towards digitalization to make processes more intelligent, data is produced at each company, leaving the network with various distributed heterogeneous data sources. Such networks can be described as smart business networks [43].

Fig. 1 Simplified business network between two or more units of a business network, based on Anderson et al. [6]

In a connected world, every unit in a network possesses a piece of the puzzle in the form of distributed data sources of a common context [10], the “big picture” as illustrated in Fig. 1. To identify this big picture, those distributed data sources must be analyzed comprehensively to derive holistic insights. In an ideal setting, all units would exchange their data and freely communicate with each other. In reality, practical barriers such as the sheer volume of data that would have to be transmitted and, foremost, the exposure of data outside company boundaries prohibit such an analysis, leaving huge potential untapped [14, 34]. While machine-learning-based solutions exist to enable secure data centralization and analysis in business networks (e.g., Amazon Macie), this centralization often does not happen because data is kept confidential and not exposed to other parties. What is lacking are methods and providers that enable machine-learning-based analysis of a distributed, unshared dataset which itself is confidential and therefore cannot be directly accessed by the analyzing party.

2.2 Meta machine learning

Basic machine learning techniques are commonly used to solve various real-world problems. Machine learning describes computational methods that use a series of examples (“past experience”) to learn about a given task [61]. Although statistical methods are used in the learning process, a manual adjustment or programming of rules or solution strategies to solve a problem is not required. In more detail, basic machine learning uses a model that is built by applying an algorithm on a set of known data to gain insight about an unknown set of data [18, 61].

The term “meta machine learning” describes methods that employ more than one layer of learning and is “concerned with accumulating experience on the performance of multiple applications of a learning system” [18]. Džeroski and Ženko [32] argue that meta machine learning enables systems to “learn about learning” [18, 83]. Based on Lemke et al. [56], we define meta learning as a system that includes a learning sub-system that builds meta knowledge. Meta knowledge is extracted from a previous learning episode on one or more data sets [56]. We further differentiate between two categories of meta learning: ensemble learning and stacked generalization. Ensemble learning methods such as bagging [19] or boosting [35] vary data selection and processing to build different sub-models. The outputs of these sub-models are then combined by a meta-model (e.g., majority voting).
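As an illustration of the simplest such meta-model, the combination step can be sketched as a majority vote over the sub-models’ categorical outputs (a minimal sketch with hypothetical labels; practical ensembles often weight or learn this combination instead):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the categorical outputs of several sub-models by a
    simple majority vote, one basic form of a meta-model."""
    # most_common(1) returns the label with the highest count
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical sub-models classify the same observation.
print(majority_vote(["defect", "ok", "defect"]))  # -> defect
```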

The same principle can be applied to perform comprehensive analyses on different data sources, using stacked generalization. A dedicated sub-model is built for every data source. Their predictions are then combined through a meta model (e.g., another trained machine learning model) to obtain an aggregated prediction [89]. Through the combination of predictions, the uncorrelated error between all models can be minimized, which leads to a performance increase [79].
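A minimal sketch of stacked generalization over two hypothetical data sources (the synthetic data, sample sizes, and the choice of logistic regression are illustrative assumptions, not the setup used later in this work): each source trains its own sub-model, and the meta model is fit only on the sub-models’ predicted probabilities, never on the raw features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical setup: two data sources observe different features
# of the same 200 instances; y is the shared target.
y = rng.integers(0, 2, 200)
X_a = y[:, None] + rng.normal(0.0, 1.0, (200, 3))   # source A's features
X_b = y[:, None] + rng.normal(0.0, 1.5, (200, 2))   # source B's features

# Level 0: one sub-model per data source, trained locally.
sub_a = LogisticRegression().fit(X_a[:100], y[:100])
sub_b = LogisticRegression().fit(X_b[:100], y[:100])

# Level 1: the meta model only sees the sub-models' predicted
# probabilities on held-out data, not the raw features of either source.
Z = np.column_stack([
    sub_a.predict_proba(X_a[100:])[:, 1],
    sub_b.predict_proba(X_b[100:])[:, 1],
])
meta = LogisticRegression().fit(Z[:50], y[100:150])
print("meta accuracy:", meta.score(Z[50:], y[150:]))
```

Fitting the sub-models and the meta model on disjoint partitions, as above, avoids the meta model overfitting to the sub-models’ training error.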

Meta machine learning is often used to combine heterogeneous types of data to perform a comprehensive analysis. Hirt et al. [45] use a stacked generalization approach to combine different types of data (e.g., pictures and text) by employing different sub-models and combining them through a meta model, mimicking a cognitive paradigm to predict attributes of Twitter users. In the area of financial fraud detection, Abbasi et al. [1] propose a meta learning method that combines heterogeneous data sources to improve prediction performance; they use meta learning to reduce the declarative and procedural bias [83] of classifiers working on company-internal and publicly available data in one specific use case. Similarly, in the course of this work, we consider stacked generalization and its potential to solve practical problems in business network analytics. In contrast to prior work in the area of meta machine learning, we do not solely focus on its performance-enhancing properties but utilize an underestimated characteristic: the information abstraction between the sub-layer and the meta layer.

In the context of business networks, this has two advantages. Considering that sub-models are deployed at different units of a business network and send their prediction to any desired unit that inherits a meta model, we suppose only a fraction of data needs to be transmitted, compared to a transfer of raw data. Additionally, confidential information is already (and possibly irreversibly) masked through the abstraction and pre-analysis of data, making a meta machine learning analysis confidentiality-preserving.
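The volume argument can be made concrete with a back-of-the-envelope comparison between a hypothetical raw record at a sub-unit and the abstract sub-model output that would actually be transmitted (the field names and record sizes are illustrative assumptions):

```python
import json

# Hypothetical raw record at a sub-unit: a 500-value sensor reading.
raw_record = {"machine_id": "m-17", "readings": [0.1] * 500}

# What the sub-unit actually transmits: only the abstract prediction.
sub_model_output = {"item_id": "m-17", "label": "anomalous", "certainty": 0.93}

raw_bytes = len(json.dumps(raw_record).encode())
out_bytes = len(json.dumps(sub_model_output).encode())
print(f"raw: {raw_bytes} B, transmitted: {out_bytes} B, "
      f"reduction: {1 - out_bytes / raw_bytes:.1%}")
```

The transmitted output stays constant in size regardless of how large or detailed the underlying raw data becomes.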

3 Methodology

The general research is based on evaluation-centric design science according to Venable et al. [82]. To guide the design of our artifact, meta machine learning [18] and service-oriented computing [47] act as kernel theories throughout the design process for construction [39, 84].

Prior studies address the issue of disclosing sensitive data during analysis by proposing to exchange only encrypted data or to mask sensitive information in data sources before exchanging them. Architectures and principles to reduce or handle the amount of transferred data during analysis do not ensure data confidentiality. Existing methods are prone to disclosing sensitive information, limiting analytical methods, or significantly decreasing predictive performance. As outlined in detail in the upcoming theoretical background, we identify a need for inter-organizational machine learning approaches which preserve data confidentiality [91] while reducing volume [73] and still allowing for reasonable performance [31]. For instance, there are methods capable of not exposing any raw data, but they fall short in performance [7] or are not suitable for machine learning endeavors [8]. Thus, we pose our general research question (GRQ):

General Research Question (GRQ):

How can we design a well-performing meta machine learning approach allowing the holistic analysis of distributed entities within business networks while preserving data confidentiality and reducing transferred data volume?

Fig. 2 Possible scenarios of comprehensive analyses in business networks

To better understand the effectiveness and efficiency of our proposed method, we consider three scenarios, as depicted in Fig. 2. In all three scenarios, data is distributed across different units of a business network. All units collaborate in some way with each other and could potentially optimize their own output, or the output of the overall network. In scenario 1, we assume that there is no communication of data or insights of any kind between business units caused by the stated obstacles (“isolated analysis”). Every unit only performs an isolated analysis of its own data to gain insights. In contrast, in scenario 2 we consider a situation where an analysis is performed through the proposed meta machine learning method that ensures data confidentiality and reduces the volume of transferred data (“inter-organizational meta machine learning method”, short IOMML). Lastly, in scenario 3, we depict an “ideal world” where obstacles such as data confidentiality and volume are non-existent, and all data is accessible by all units of a network (“shared data pool”).

The first main challenge that we address is the technical evaluation of whether the three design requirements (data confidentiality, reduction of data volume, performance evaluation) are met by the proposed method, thus stating RQ1 as follows:

Research Question 1 (RQ1):

Is the proposed method effective and efficient with regard to data confidentiality, volume reduction, and prediction performance?

We expect an increase in the predictive performance of an analytics method from scenario 1 (isolated analysis) to scenario 2 (inter-organizational meta machine learning method (IOMML)), as scenario 2 has more information available than scenario 1. Comparing scenario 2 (IOMML) with scenario 3 (shared data pool), we expect the meta machine learning method to yield a lower predictive performance than a scenario with complete data exchange and all raw data at its disposal. Narayanan and Shmatikov [62] describe this trade-off between anonymizing/masking data and prediction accuracy: while most public datasets released by companies are anonymized to protect user privacy, researchers hint that perfect anonymization is not possible without damaging the utility of the data. However, distributed analysis, like the one suggested in this work, yields the advantage of separate, specialized models [32]. We are interested in the performance of the meta machine learning method in comparison to a case with complete data exchange and a case with an isolated analysis.

Apart from the technical effectiveness and efficiency, we aim to gain insights on the perceived usefulness of the method in the field, more precisely, in the organizational context where it could be established. We measure perceived usefulness in our case company with the respective sub-construct from the well-established technology acceptance model (TAM) [25], similar to related work [20, 26, 44]. Thus, we state the second RQ as follows:

Research Question 2 (RQ2):

How is the proposed method perceived within its application context in terms of usefulness?

To answer both questions, we instantiate the proposed artifact within a real-world production line case with our industry partner. To strengthen generalizability, we implement an additional robustness check within a distributed sensor group system (see Appendix A.3 on page 27).

4 Theoretical background

Within the body of knowledge, we can identify two research streams in the context of enabling an analysis of distributed data sources within a business network, which are closely related to two design requirements of this work: preserving data confidentiality (DR1) and reducing the amount of transferred data in the process (DR2). To outline the research gap, we describe work in the area of data confidentiality, often called privacy preservation—an established field of research—as well as the distributed analysis of large data streams.

4.1 Preserving data confidentiality

Data privacy and confidentiality can have multiple facets and are driven by different motives and in different domains, such as social media [93], healthcare [36], industrial applications [69] or others. Belanger and Xu [12] describe privacy in online social networks and propose a multi-dimensional privacy concept fit to online social interactions. Wohlgemuth et al. [88] describe the role of security and privacy in business networking. Kieseberg et al. [50] propose an algorithm for collusion-resistant anonymization and fingerprinting of sensitive microdata. Especially the involvement of end users requires data privacy. Riquelme and Román [66] assess the influence of privacy and security on online trust for consumers. Goldfarb and Tucker [37] elaborate on privacy regulation and online advertising, while Hann et al. [42] develop a theoretical approach to overcoming online information privacy concerns.

Methods to preserve data confidentiality and privacy can be distinguished based on their main principle: masking, noising, or encryption of data (see Table 2). Additionally, there are approaches combining the previously described principles. The field of privacy-preserving data mining aims to build accurate models without disclosing an individual data record. In the following, we provide an overview of related work in the area of preserving data confidentiality in general and then describe approaches to realize confidentiality-preserving analyses and their suitability for our task at hand.

Data masking and noising are approaches originating from the statistical sciences that strive to perform analyses without compromising security and privacy [29]. These approaches reduce the problem to that of extracting usable information from noisy data [22, 28]. While data noising is fairly robust to standard security attacks like the man-in-the-middle attack or a structured query language (SQL) injection, the accuracy of the analysis result often suffers from the amount of noise introduced into the initial data [4].

Besides masking and noising, encryption is another key method for preserving private information. As a comprehensive analysis of distributed data requires the transport of all data sources to a central analytics unit, encryption could be used to secure the transmission. For analysis, this data needs to be decrypted, which might already disclose data to the central analytics unit. The efficacy of this approach, therefore, depends on the safety of data in “safe” zones.

Table 2 Overview of learning methods and strategies for data-confidential learning in business networks. Legend: ● = fully applies, ◐ = partially applies

In our work, similar to masking or noising techniques, we strive to transform data to preserve data confidentiality. However, in contrast to the mentioned techniques, we aggregate the data as part of the desired analysis to minimize the loss of information during the process (“aggregation”). By briefly elaborating on the drawbacks of existing approaches, we discuss the suitability of an aggregation technique like meta machine learning.

Compared to methods relying on encryption, our technique is able to leverage any machine learning method during analysis. Although there are novel approaches that perform mathematical and rudimentary learning techniques on encrypted data [38], these do not allow for flexible use of various machine learning methods. Furthermore, performing operations on encrypted data is known to cause high computational effort [15], and recent reports suggest that these techniques might also be vulnerable to external attacks [86]. The higher computational effort and the inability to perform arbitrary machine learning on encrypted data make encryption unsuitable for the task at hand. Bhattacharya et al. [15] describe a method for privacy-preserving analytics using homomorphic encryption of data among peers, enabling them to perform analyses. Their key proposition is to perform analysis on encrypted data, deducing the desired insight and, therefore, never exposing data to a central third party. Although they extend the tool set of analytical capabilities, their approach is limited to basic mathematical operations (i.e., calculating the sum of products). Additionally, the computational costs are high due to the necessity of homomorphic encryption.

In the case of masking techniques, critical fields of data entry are masked to ensure confidentiality. Especially in cases where only single elements of a data entry are critical (e.g., the name of a data entry about a person), masking might be a viable option to consider. However, in the case at hand the critical data itself is the one which needs to be analyzed, making masking techniques a non-viable option.
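For illustration, field-level masking might look as follows (a toy sketch with hypothetical field names); it only helps when the critical fields are not the ones that need to be analyzed:

```python
def mask_record(record, critical_fields=("name", "email")):
    """Replace critical fields with a placeholder before the record
    leaves the company boundary; all other fields pass through."""
    return {key: ("***" if key in critical_fields else value)
            for key, value in record.items()}

print(mask_record({"name": "Ada", "order_total": 42.0}))
# -> {'name': '***', 'order_total': 42.0}
```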

Noising techniques strive to preserve confidentiality by adding noise to the critical data elements. This makes it possible to expose the noised data and then apply any machine learning technique to it. However, as the noise added to the data increases and, therefore, data confidentiality rises, predictive performance drops significantly. Noising could thus be applied but has major drawbacks in terms of performance for the task at hand.
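This trade-off can be illustrated with Laplace noise, as commonly used in differential-privacy-style noising (the data and noise scales are illustrative assumptions): the stronger the noise, the larger the error of even a simple aggregate such as the mean.

```python
import numpy as np

rng = np.random.default_rng(1)
true_values = rng.normal(100.0, 5.0, 1000)   # hypothetical sensitive metric

errors = {}
for scale in (0.1, 10.0, 100.0):
    # Larger scale means stronger confidentiality but noisier data.
    noised = true_values + rng.laplace(0.0, scale, true_values.size)
    # Estimation error of the mean grows with the noise scale,
    # illustrating the confidentiality/performance trade-off.
    errors[scale] = abs(noised.mean() - true_values.mean())
    print(f"scale={scale:>5}: mean error {errors[scale]:.3f}")
```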

The method proposed in this work can be characterized as an aggregation technique that groups and summarizes critical data in a form whose result does not expose any private information. In contrast to plain aggregation, it realizes the aggregation through a subordinate layer of machine learning, leaving only the information relevant for further analysis in the aggregated result.

As we are striving towards realizing inter-organizational machine learning, in the remainder of this paper we focus on comparing our method with the noising technique.

While masking, noising, and encryption manipulate the source data but try to keep the information content and data structure as similar as possible, we propose to only transmit information that has a direct impact on the target of the analysis.

4.2 Reducing and processing large data streams

To realize analytics in distributed systems, large volumes of data originating from heterogeneous sources must be processed: sensors, transactional or social networks, or company-internal information systems. Reasons for performing analytics in those systems include the detection of undesired behavior or other specific patterns (e.g., misconduct, unusual events, runtime errors) and the derivation of higher-level information from them [27, 57]. However, the increasing number of devices with access to the internet and the increasing interconnectedness of data-producing units pose challenges for data analysis infrastructure and techniques.

In general, the literature offers two opposing strands of data processing and computing, i.e., analytics in networked systems [73]: centralized and decentralized paradigms. Centralized approaches aim to process real-time, possibly fluctuating data streams generated by heterogeneous, distributed units, gathering low-level information in the cloud [57, 78]. In contrast, there are approaches that directly process decentralized data, like edge or fog computing [68].

Centralizing analytics requires the data to be in one place. Driven by the need to transfer and process large data streams in real time at once in order to detect undesired behavioral patterns, complex event processing (CEP) has emerged. CEP represents a set of techniques to analyze event-driven information systems. According to Luckham [58], an event is a record of an activity within a system that may depend on other events. A set of events, including their dependencies, results in a complex event containing valuable, higher-level information. In order to be able to continuously process the events gathered on distributed units, a technology must be able to apply complex analyses in parallel on several data streams [67]. For instance, CEP is used in finance to detect fraud [74] and make automatic trading decisions [3]. In addition, CEP is also used to analyze time series collected by sensors to perform real-time analyses of complex interactions measured by independent sensors [30, 85].
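As a toy stand-in for CEP pattern matching, the following sketch flags a complex event whenever several low-level “error” events fall within a sliding window (window size, threshold, and the event stream are illustrative assumptions; real CEP engines offer far richer pattern languages):

```python
from collections import deque

def detect_complex_events(events, window=5, threshold=3):
    """Flag a complex event whenever `threshold` low-level 'error'
    events occur within a sliding window of `window` events."""
    recent = deque(maxlen=window)   # keeps only the last `window` events
    alerts = []
    for i, event in enumerate(events):
        recent.append(event)
        if sum(e == "error" for e in recent) >= threshold:
            alerts.append(i)        # index at which the pattern holds
    return alerts

stream = ["ok", "error", "ok", "error", "error", "ok", "ok"]
print(detect_complex_events(stream))  # -> [4, 5]
```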

In contrast, decentralized analytics is driven by the utilization of otherwise unused processing capabilities [72] or the need to reduce the transferred data volume [81]. Sarlis et al. [72] propose a decentralized analytics system for network traffic data, dynamically distributing parts of the decentralized data for processing and orchestrating an analysis. Uhlmann et al. [81] describe a decentralized data analytics framework for maintenance for connected manufacturers. The described system pre-analyzes sensors on site and sends the status of a machine to a central platform. Then the information is distributed and made accessible via a dashboard. Pournaras and Nikolic [64] describe on-demand self-adaptive data analytics in large-scale decentralized networks. They focus on the automated allocation of computational capacity in a network of multiple processing nodes. To strike a compromise between cloud-based and edge computing, cloudlets have evolved. A cloudlet represents a middle layer between a cloud and a mobile device, addressing latency issues of cloud architectures as well as centralization endeavors [73].

Centralized analytics imposes the necessity to transfer data to a central unit, which, for sensitive data, prohibits any analysis at all. Existing mechanisms for decentralized analytics exhibit first characteristics similar to this work; in most of them, however, there is no additional meta-analysis of the pre-analyzed content to gain further insights. Furthermore, in the area of processing large streams of data, confidentiality is often not considered a problem.

Like decentralized approaches, which attempt to significantly reduce the network load by pre-processing low-level information on the edge device, our approach processes data where it is produced, compressing it while deriving high-value information. At a central level, i.e., in a meta entity, the information from the distributed units is aggregated and combined, and analyses across multiple organizations can be performed. Thus, our proposed concept combines the advantages of central information processing with low latency due to low data volume.

5 Inter-organizational meta machine learning

To address the challenges of realizing a confidentiality-preserving method that minimizes the transferred data volume in decentralized business networks, we propose an IOMML. This method uses data aggregation techniques in order to avoid disclosing data and to reduce its volume.

In the following, we suggest and describe our artifact in a general way before we instantiate it in a use case in Sect. 6. We perform this description along three perspectives: the intended architecture, the data and model output exchange during analysis, and the life cycle of an instantiation.

5.1 Architecture

As elaborated, every business network consists of different units that interact with each other. Every unit might possess its own data sources, might be owned by a different legal organization, or might be geographically distributed. The architecture should therefore preserve data confidentiality and reduce the volume of data that is transferred during analysis.

We distinguish between two unit types: sub-units and meta units. A sub-unit possesses one or many data sources (e.g., a customer database and corresponding transaction history data) that, in combination with data of other units, might reveal a network-wide pattern after analysis. The meta unit represents a virtual unit that analyzes the overall situation based on the sub-units. All involved units, regardless of whether they are sub or meta, have a common understanding of the goal of the analysis. Every sub-unit possesses a certain data source that requires processing. By nature, data sources might store heterogeneous types of data that require an individual analysis. Although these separate data sources might reveal information and insights on their own, the core assumption is a pattern that is distributed throughout more than one unit. We aim to analyze and learn about these patterns, the “bigger picture”. For further consideration, we assume that every sub-unit’s data source reveals such a piece of the puzzle.

Fig. 3 Simplified structure of a meta model based on two or more sub-units with a superordinate meta unit

In Fig. 3, we depict a simplified structure of a meta analysis between multiple sub-units. Here, every sub-unit analyzes its own data using a customized sub-model. As a result, each sub-unit prepares a collection consisting of an item identifier, a (categorical) result, and a corresponding certainty value for the analysis. This output of the sub-model is sent to a virtual meta unit that uses a meta model to perform a comprehensive analysis. The final output of the meta analysis could then be sent back to every sub-unit as a basis for further activities or could be used by other units to build upon the insight. Note that the meta unit is just a virtual construct and could be represented by any (sub-) unit.
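The message exchange can be sketched as follows (item identifiers, labels, and the certainty-weighted voting rule are illustrative assumptions; in our method the combination is itself learned by a meta model rather than fixed):

```python
from collections import defaultdict

# Each sub-unit sends only (item identifier, predicted label, certainty);
# the meta unit groups these messages per item and combines them, here
# with a certainty-weighted vote as one simple, fixed combination rule.
messages = [
    ("part-01", "defect", 0.9),   # from supplier A's sub-model
    ("part-01", "ok",     0.6),   # from supplier B's sub-model
    ("part-01", "defect", 0.7),   # from the OEM's own sub-model
]

scores = defaultdict(float)
for item_id, label, certainty in messages:
    scores[label] += certainty    # accumulate certainty per label

print(max(scores, key=scores.get))  # -> defect
```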

Fig. 4 Flow chart of a prediction based on sub-unit data input

The data source of a sub-unit is processed by a sub-model that is highly customized to the corresponding data structure and predicts a certain attribute that might be an indicator of the big picture we are trying to reveal. As depicted in Fig. 4, after processing the sub-unit data point, the prediction output is sent to a meta unit that aggregates all incoming, subordinate predictions. Note that at no point throughout this process is raw data transferred or exposed outside the processing unit. The meta unit does not need any information about the input data of the sub-models or how the sub-models perform their analysis. No bare information or raw data that was not intended to be shared is distributed, thereby preserving intellectual property and the confidentiality of data. To aggregate all incoming predictions and to retrieve insights from them, the meta unit employs a meta layer of machine learning that learns which sub-model’s prediction is of importance in which situation. The output of such a meta model is an accumulated prediction over the distributed data. To make this prediction, the meta model draws on the stacked generalization paradigm from meta machine learning as a kernel theory [40].

The meta unit collects information from different, distributed, and independent sources to make a holistic prediction as an insight that is latently present in these data sources. It uses machine learning to gain information about the significance, relevance, and validity of each sub-model prediction and their interdependencies. Without communicating the meaning of a sub-model output, or even sub-unit data, to the meta unit, the stacked generalization meta model can still identify the desired big picture. The meta model prediction output can then be included in analytics applications or other smart services to create value.
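To illustrate this flow, the following sketch shows how a sub-unit reduces a raw observation to the triple that is actually transmitted, i.e., an item identifier, a categorical result, and a certainty value. This is a minimal, hypothetical example: the toy sub-model rule, message format, and names are our assumptions, not part of the original artifact.

```python
# Minimal sketch of the sub-unit -> meta-unit message flow.
# The raw observation never leaves the sub-unit; only the triple
# (item identifier, categorical result, certainty) is transmitted.

def sub_model_predict(raw_observation):
    """Stand-in for an arbitrary, highly customized sub-model.
    Here: a toy rule flagging observations with a high mean value."""
    values = [v for v in raw_observation.values() if v is not None]
    score = sum(values) / len(values) if values else 0.0
    label = "scrap" if score > 0.5 else "no scrap"
    certainty = abs(score - 0.5) + 0.5  # distance from 0.5, mapped to [0.5, 1.0]
    return label, round(min(certainty, 1.0), 2)

def build_message(item_id, raw_observation):
    """What a sub-unit actually sends to the meta unit."""
    label, certainty = sub_model_predict(raw_observation)
    return {"id": item_id, "result": label, "certainty": certainty}

# Confidential raw data stays local:
raw = {"F1": 0.9, "F2": 0.8, "F3": None}
message = build_message("part-0001", raw)
print(message)  # {'id': 'part-0001', 'result': 'scrap', 'certainty': 0.85}
```

Note that the message contains no feature names or values, mirroring the confidentiality argument above.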

5.2 Data and model output exchange during analysis

During the process of a meta prediction, data is analyzed by different sub-models and sent to the meta unit for aggregation by the meta model. The final result is then sent back to every sub-unit, e.g., to optimize local business processes. In Fig. 5, we depict the data and model output exchange during a meta analysis in a business network. Each sub-unit possesses confidential unit data. That data is analyzed on the sub-unit’s site by a sub-model, generating an abstract sub-model output. The sub-model output is then transferred to a meta unit. This step repeats for every sub-unit that is part of the analysis in a business network.

The architecture aims to maintain data confidentiality while minimizing the volume of data transmitted during analysis. In this context, it is important to consider the type of data being processed and transferred, as the reduction of volume has a different impact on preserving confidentiality for structured and unstructured data. For structured data, such as tables, the achievable reduction depends on the number of columns a row or observation possesses: since the prediction output of a sub-classifier has a fixed size, the more columns a row has, the larger the reduction in volume and the higher the achievable abstraction and security. This helps maintain data confidentiality by minimizing the amount of sensitive information that is transferred. For unstructured data, such as images or videos, even more information is naturally removed by sending only a classifier output instead of the complete visual information. In this case, data confidentiality is high compared to sharing the raw data. In conclusion, it is crucial to take into account the type of data being processed and the methods available for reducing its volume while maintaining its integrity.

Fig. 5

Exemplary communication in an inter-organizational meta learning landscape: data and prediction flow across involved units

At the meta unit’s site, all incoming abstract sub-model predictions are combined by feature aggregation, forming the input for a meta model. The meta model then processes this input and generates an output. As that output is based upon all underlying sub-model predictions, which are in turn based on sub-unit data, comprehensive insights can be derived. Afterward, the meta model output can be consumed by the meta unit, every participating sub-unit, or a possible third-party unit as a basis for any action or decision.

6 Evaluation in production line quality prediction

In order to evaluate the proposed IOMML artifact, we follow the FEDS framework and its application to a real-world use case according to Venable et al. [82], as depicted in Fig. 6. FEDS (Framework for Evaluation in Design Science) structures the evaluation of design science artifacts into a series of evaluation episodes (EEs). With each episode, the evaluation moves from artificial to more naturalistic settings and forms more summative than formative knowledge. The goal of these evaluations is to identify any weaknesses or limitations of the artifact so that they can be addressed and improved upon. We conduct two EEs in alignment with our two RQs: Evaluation episode 1 (EE1) covers the technical feasibility aspects of the artifact and its characteristics to meet our design requirements (RQ1). These requirements include privacy preservation (DR1), data volume reduction (DR2), as well as prediction performance (DR3). In the subsequent evaluation episode 2 (EE2), we cover the potential users of the artifact and the assessment of its usefulness, thus addressing our second research question (RQ2).

Fig. 6

Evaluation episodes according to the FEDS framework [82]

In the remainder of this section, we start by thoroughly describing the industrial use case that serves as the basis for our evaluation. We showcase the technical instantiation and describe its technological foundation and a possible user interface for demonstration. To understand the technical effectiveness and efficiency of the presented meta machine learning method as our design science research (DSR) artifact, we conduct four evaluations based on an industrial use case. First, we elaborate on the data confidentiality preserving capability of our approach. Second, we show the reduction of data volume that needs to be transmitted during analysis. Third, we measure the performance of the approach and compare it to two reference scenarios, as described in Sect. 3. Fourth and finally, we evaluate the artifact with experts from the related application field to assess its potential usefulness.

6.1 Use case description and suitability

The use case originates from industrial manufacturing and serves as a basis for simulating a business network with different units. We deliberately chose a network within one legal entity which enables us to compile a benchmark case where all data is available in one place (see Fig. 2 on page 5).

A global industrial manufacturing company has provided us with a dataset that contains data about 1,183,747 parts as well as information on whether each part has passed the quality control (“no scrap”) or not (“scrap”). During the production process, each part goes through a varying sequence of several lines and their stations. The present dataset comprises 52 stations across four lines. The dataset includes 968 numeric features, 1156 date features, and 2140 categorical features. In addition to the large number of existing features, the sparse nature of the data poses an additional challenge. Most of the data instances contain empty values for more than half of the features because a part only passes through a fraction of the stations. Figure 7 illustrates the paths of the parts through the different lines during the production process. Each horizontal bar represents an independent entity, in our case a production line. Each lane represents a subset of parts that undergo a production step in the respective line. As depicted in the graph, most of the parts pass lines 0 and 3 (77.4%), while only a small share passes all four lines (\(<0.1\%\)). The second and third most frequently taken paths comprise lines 1, 2, and 3 (20.7%), and lines 0, 2, and 3 (9.6%), respectively. The data itself is very imbalanced: the complete set contains only 6,879 parts labeled as faulty, which corresponds to a failure rate of 0.58%. For the overall production, it is desirable to reduce the number of faulty parts by predicting future failures in time and intervening. As data is often not accessible, there is a lack of quality prediction mechanisms that help to increase overall production quality by intervening in and improving the production during the process. Such an intervention could take place either during an ongoing production or afterwards. In the first case, potentially faulty parts could be inspected separately as they flow through the production line.
This could help to detect causes for the quality issue, such as degraded production gear, or to prevent a faulty part overall. In the second case, quality issues could be detected after production but before shipment. In some cases, quality checks are also a cost driver and are only performed on a sample of the overall produced parts. Therefore, having a predictive model which can pre-determine the sample for that quality check could decrease the cost of quality management but still increase the overall quality of production.

Fig. 7

Paths through lines L taken by different subsets of parts throughout the production process

To perform a comparative evaluation between an isolated scenario and one where data can be freely shared, we choose a use case where data sharing is possible in a test setting but not in a productive system. This enables us to create a measurable benchmark for our proposed method. The production lines may be distributed geographically and even owned by different legal units.

The use case is well suited for our instantiation of inter-organizational meta machine learning, as it allows us to develop an artifact addressing all design requirements (DR1–3). By encapsulating the lines, we can simulate completely independent entities (DR1). Interviews with experts working in the application context reveal that data transfer, especially in rural areas of production sites, can be quite challenging, as the amount of data produced exceeds 60 terabytes (TB) per day (DR2). Our industry partner provided the data set as part of a “Kaggle competition”Footnote 3 with the aim of benefiting from a community-driven increase of prediction performance (DR3).

6.2 Artifact instantiation

In Fig. 9, we depict the instantiation of the proposed meta machine learning method in our use case. For each line, a sub-model is generated that produces a sub-prediction. The meta unit, in our case a production control, receives all sub-predictions, aggregates them into a single feature array, and then analyzes it with the meta model. The result is a holistic quality prediction that can be used to improve production.

To evaluate the technical performance of the prediction, we use the Matthews correlation coefficient (MCC) as a metric for evaluation, which is particularly robust to class imbalance [17].

The MCC is calculated directly from the results of the binary predictions and lies in the interval [\(-1\), 1], with values of 1 denoting perfect classification, values of \(-1\) denoting complete disagreement and values of 0 denoting an uncorrelated relation between prediction and ground truth.
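For completeness, the MCC is computed from the confusion matrix counts (TP, TN, FP, FN) as \(\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}\). A minimal sketch of this computation (not part of the original artifact; the zero-denominator convention is one common choice):

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.
    Returns 0.0 for a degenerate denominator, a common convention."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Perfect classification yields 1, complete disagreement -1,
# and an uncorrelated prediction 0:
print(mcc(tp=50, tn=40, fp=0, fn=0))    # 1.0
print(mcc(tp=0, tn=0, fp=40, fn=50))    # -1.0
print(mcc(tp=25, tn=25, fp=25, fn=25))  # 0.0
```

Unlike accuracy, all four counts enter the formula symmetrically, which is why the metric stays informative under the 0.58% failure rate of this dataset.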

Fig. 8

Sparsity matrix depicting the large ratio of missing values in numeric data. Rows represent observations/parts and columns numeric features across four lines

Due to the high sparsity (\(>99\%\)) of the categorical data, previous work suggests omitting it [92]. Additionally, as we are interested in comparing methods across several scenarios, we exclude categorical data from our analysis and focus on numerical data. We cope with missing values (81%, cf. Fig. 8) in the numerical data by replacing them with a marker value [63]. The remaining dataset is adopted unchanged. In addition, the date information is compressed into four representative features. As shown in Fig. 9, we compare the inter-organizational meta learning approach (scenario 2) to a separate isolated analysis of data in each unit (scenario 1) and a comprehensive analysis with a shared data pool and all data in one model (scenario 3). We choose random forest classification as it offers good results on this dataset with comparatively little training time [92]. For training, the parameter search through the grid shown in Table 3, and the validation of the regular approach, we use threefold nested cross-validation to avoid overfitting [21]. Similarly, in the case of meta machine learning, we apply an adapted threefold nested cross-validation, which we alter towards the conditions of the two-stage process to prevent data leakage. The nested cross-validation uses three outer and two inner folds. The training set of the inner fold is used to train all sub-models, while the meta model is trained and evaluated on the sub-models’ predictions for the inner fold’s test set by another threefold cross-validation.
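The key idea behind this adapted procedure, namely training the meta model only on sub-model predictions for data the sub-models have not seen, can be sketched as follows. This is a simplified, stdlib-only illustration with a toy threshold "model"; the actual artifact uses random forests and the nested fold structure described above.

```python
# Leakage-free stacking sketch: the meta model must only see
# out-of-fold sub-model predictions, i.e. predictions for samples
# the sub-model was NOT trained on.

def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_threshold(xs, ys):
    """Toy 'sub-model': threshold halfway between the class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    thr = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > thr else 0

def out_of_fold_predictions(xs, ys, k=3):
    """Meta-model training features: every sample is predicted by a
    sub-model trained on the other folds only."""
    preds = [None] * len(xs)
    for test_idx in kfold_indices(len(xs), k):
        train_idx = [i for i in range(len(xs)) if i not in test_idx]
        model = fit_threshold([xs[i] for i in train_idx],
                              [ys[i] for i in train_idx])
        for i in test_idx:
            preds[i] = model(xs[i])
    return preds

xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 1, 1]
print(out_of_fold_predictions(xs, ys))  # [0, 0, 0, 1, 1, 1]
```

Training the meta model on in-fold predictions instead would let information about the true labels leak through the sub-models and inflate the measured performance.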

In addition to the meta machine learning classification model, we develop a microservice-based web service. This web service simulates the data generated in the individual lines, classifies it with the respective sub-model, and transfers the results to the meta model service. The meta model service in turn classifies the data originating from the sub-models and makes the results available to the frontend.

Fig. 9

Instantiation of an inter-organizational meta learning method in an industrial use case with four production lines representing the sub-units and a production control as a meta unit

Table 3 Parameter search space for random forest model

The microservice pattern is an architectural style for software applications whose basic idea is to split a heavyweight monolithic application into several independent, usually smaller, self-contained parts. This architecture is well suited to the concept of meta machine learning, as the individual components are loosely coupled and easily extendable. Each model, as well as an additional web frontend that visualizes the meta results in a web browser, simulating the production control, is encapsulated as a standalone microservice. Each service provides a uniform representational state transfer (REST) application programming interface (API) with exactly one endpoint. This endpoint accepts hypertext transfer protocol (HTTP) POST requests with attached JavaScript object notation (JSON) formatted text. The incoming data is processed within the service and passed to the subsequent service.
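A framework-agnostic sketch of such an endpoint's request handling could look as follows. The payload fields are hypothetical; the exact schema of the original services is not specified here, and the function merely illustrates the validate-then-forward pattern.

```python
import json

def handle_post(body: str) -> str:
    """Process one JSON-formatted HTTP POST body from an upstream
    sub-model service and return the JSON response body.
    Assumed payload: {"id": ..., "result": ..., "certainty": ...}."""
    payload = json.loads(body)
    for field in ("id", "result", "certainty"):
        if field not in payload:
            return json.dumps({"error": f"missing field: {field}"})
    # Here the service would feed the sub-prediction into the meta model;
    # we simply acknowledge and echo the identifier downstream.
    return json.dumps({"id": payload["id"], "status": "accepted"})

print(handle_post('{"id": "part-0001", "result": "scrap", "certainty": 0.85}'))
# {"id": "part-0001", "status": "accepted"}
```

In the prototype, a handler like this would sit behind the single POST endpoint of each service, keeping the services loosely coupled through a uniform message format.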

The result is a frontend (cf. Fig. 10) in which the classification results of the individual lines and the result of the meta model are displayed. For each part, the sub-model outputs are shown as they come in. After at least one sub-model output is available, the meta model predicts an output that is also shown in the production control dashboard. This prototype illustrates the two-layer architecture of the meta machine learning approach and the dependencies between sub-models and the meta model.

Fig. 10

Web front-end of the instantiated artifact

6.3 Evaluation episode 1: technical evaluation

6.3.1 Preserving data confidentiality (DR1)

On the basis of the artifact (IOMML) and its instantiation, we evaluate the confidentiality aspect of meta machine learning in business networks. We define a system as confidential when it ensures “that only authorized users access information” [23]. In our case, the users are the units. Each unit, regardless of whether it is a sub-unit or the meta unit, should only be able to access its own raw data.

To answer the first research question (RQ1), whether raw data can only be accessed by the unit it originates from, we compare the different scenarios of business network analyses as depicted in Table 4, based on our research framework (Fig. 2 on page 5). In the scenario of isolated analysis in separate units (scenario 1), no data is exchanged; therefore, data confidentiality is preserved. In the other extreme of comprehensive network-wide analysis using all available data based on a shared data pool (scenario 3), data is, by definition, distributed among all units in the business network. Data confidentiality is therefore violated. The scenario of the meta machine learning method is of interest for further evaluation, as all sub-units only have access to their own data, but the meta unit receives the output of the sub-units’ machine learning models. The question remains whether data from the sub-units can be reproduced from these abstract outputs, and, consequently, whether data confidentiality is preserved or violated.

Table 4 Comparison of scenarios in regard to data confidentiality

To answer this question, we first need to regard the raw data in each sub-unit. The dataset contains an extremely large number of anonymized features. Features are named according to a convention that encodes the production line, the station on the line, and a feature number. For example, L3_S36_F3939 is a feature measured on line 3, station 36, and is feature number 3939. An example of an observation is depicted in Table 5. Every row represents one part that is described by different features at each station. Every feature represents measurements performed for the specific part at the respective station during the production process.
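This naming convention can be parsed mechanically, e.g., to route each feature to its owning sub-unit. A small sketch follows; the convention itself comes from the dataset, while the function name and error handling are ours.

```python
import re

def parse_feature_name(name):
    """Split a feature name like 'L3_S36_F3939' into its components:
    (line, station, feature number). Raises ValueError on mismatch."""
    match = re.fullmatch(r"L(\d+)_S(\d+)_F(\d+)", name)
    if match is None:
        raise ValueError(f"not a valid feature name: {name}")
    line, station, feature = map(int, match.groups())
    return line, station, feature

print(parse_feature_name("L3_S36_F3939"))  # (3, 36, 3939)
```

With this mapping, the full feature set can be partitioned by line, which is exactly how the four sub-units are simulated in our evaluation.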

Table 5 Excerpt of raw data for sub-units 0 and 3

Now each sub-unit builds its own model with the goal of predicting the target value (scrap, no scrap) and communicates this prediction to the meta unit. The communicated output of prediction and its probability is depicted in Table 6. For each part, the subset of data is analyzed at each line towards the attribute “scrap” or “no scrap”. Each sub-prediction also contains a probability score for the respective prediction.

Table 6 Excerpt of sub model output and probabilities for sub-units 0 to 3

There is no possibility of reconstructing the raw data from Table 5 out of the abstract predictions in Table 6. The machine learning model of each sub-unit is highly complex, and a reconstruction from a binary value and a probability is practically infeasible, as the nature, amount, and type of the raw features are unknown to the meta unit. We can therefore positively answer RQ1, as data confidentiality is preserved in the scenario of meta machine learning.

6.3.2 Reduction of transferred data volume (DR2)

We also ask whether the transferred data volume can be drastically reduced during comprehensive analyses. In this section, we evaluate the reduction of transferred data volume in business networks between sub-units and a meta unit by comparing the different scenarios depicted in Fig. 2. In this case, units are represented by organizational business units, sites, or companies.

Table 7 Comparison of scenarios regarding data volume, with k: number of sub-units; n: number of input features; m: number of output features of sub-models; s: volume of a feature

Assuming that all features require the same amount of space s and there are k sub-units with a varying number of features, the transferred volume for a comprehensive network-wide analysis using all available data in a shared data pool (scenario 3) is composed of the sum of all sub-units’ features multiplied by the feature size. In contrast, no data is transferred in the case of isolated analysis within individual units (scenario 1) without data exchange. In comparison, applying the proposed architecture of inter-organizational meta machine learning (scenario 2), every sub-unit individually analyzes its own data (i.e., features produced by a certain organizational unit) and only transfers the output to the meta unit. These three scenarios with their transferred data volume between sub-units and the meta unit are depicted in Table 7. Thereby, the volume of data to be transferred in scenario 2 is reduced compared to scenario 3, assuming that the number of output features m of a sub-model is smaller than its number of input features n. This leads to savings of \((\sum _{i=1}^{k}{n_i}-k\cdot m)\cdot s\) when comparing scenario 2 to scenario 3. Accordingly, the reduction ratio is described by \(\frac{k\cdot m}{\sum _{i=1}^{k}{n_i}}\).

Regarding the industrial use case from our evaluation, we have four production lines as sub-units with different numbers \(n_i\) of features or columns per data instance: \(n_i\in \{173,519,48,251\}\). Each sub-model predicts a certain output based on its input features. Due to the very small number of output features (\(m = 2\)) compared to the number of input features (scenario 2), the data volume to be transferred to the meta unit is reduced to 0.81% of the volume in the case of complete information in a shared data pool (scenario 3) for our presented industrial use case. We can therefore further address RQ1 and demonstrate that our method enables a drastic reduction of the transferred data volume.
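Plugging the use case numbers into the reduction ratio \(\frac{k\cdot m}{\sum _{i=1}^{k}{n_i}}\) confirms the reported figure; this quick check is illustrative and not part of the original artifact.

```python
# Reduction ratio of transferred data volume: scenario 2 vs. scenario 3.
n = [173, 519, 48, 251]   # input features per sub-unit (production line)
k = len(n)                # number of sub-units
m = 2                     # output features per sub-model (prediction, probability)

ratio = (k * m) / sum(n)  # fraction of the scenario-3 volume still transferred
print(f"{ratio:.2%}")     # 0.81%
```

Equivalently, the savings amount to \((\sum _{i=1}^{k}{n_i}-k\cdot m)\cdot s = 983\,s\) per data instance in this use case.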

6.3.3 Performance of method (DR3)

Finally, we are interested in the performance of our method in comparison to meaningful benchmarks, and in estimating the “price of privacy” of a scenario with meta learning and distributed data sources in comparison to one shared data pool. In Sect. 3, we give an overview of our research design and consider three scenarios that require comparison: In the first scenario, units in a network perform an isolated analysis. In the second one, we consider our meta machine learning method to realize a comprehensive analysis. In the third scenario, we draw on a complete analysis of all data available in one shared data pool. By comparing the performances of scenarios 1 and 2, we expect to see a performance increase due to the comprehensive meta learning approach. Between scenarios 2 and 3, two effects could occur: increased performance (performance gain) due to the application of stacked generalization, or performance loss due to the processing of prediction outputs rather than raw data (loss of abstraction).

During the meta machine learning classification process, one sub-model per line is trained. However, in the chosen dataset, not all parts pass each of the four lines. Figure 11 depicts the relative number of parts passing a certain line. Accordingly, for parts that pass only a subset of lines, only predictions of the sub-models of these lines are used as input features for the meta model.
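One simple way to handle this at the meta layer is to assemble a fixed-width feature vector and fill the positions of absent sub-predictions with a marker value, mirroring the missing-value treatment described above. This is a sketch; the marker choice and message format are our assumptions.

```python
MISSING = -1.0  # marker value for lines a part did not pass

def meta_features(sub_outputs, n_lines=4):
    """Build the meta model's input vector from available sub-predictions.
    sub_outputs: dict mapping line index -> (predicted class, probability)."""
    features = []
    for line in range(n_lines):
        pred, proba = sub_outputs.get(line, (MISSING, MISSING))
        features.extend([pred, proba])
    return features

# A part that only passed lines 0 and 3 (the most common path):
print(meta_features({0: (0, 0.93), 3: (1, 0.61)}))
# [0, 0.93, -1.0, -1.0, -1.0, -1.0, 1, 0.61]
```

The meta model can then learn from the marker positions themselves, since which lines a part passes may itself carry predictive signal.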

We present an overview of our results in Table 8. The sub-model performances in the form of an MCC range from 0.1935 to 0.2326 (for additional metrics, see Appendix A.1 on page 25). As depicted in Figs. 5 and 9, not every part passes every line, making a comparison of the sub-models’ results difficult. However, we can see that the meta model yields a performance increase of 21.32% compared to the best performing sub-model by reaching an MCC of 0.2822. We can conclude that the meta model aggregates the information of the sub-model outputs and is able to draw comprehensive conclusions that are superior to those of the sub-models (performance gain).

Fig. 11

Share of parts passing a certain line

These results are consistent with the findings of Džeroski and Ženko [32]. We can therefore already partly answer RQ1, as we observe a significant increase in statistical performance when comparing an isolated scenario with the applied meta machine learning method.

As anticipated in line with Narayanan and Shmatikov [62], in scenario 3 (shared data pool) we reach a slightly superior performance of 0.2965 compared to all regarded baselines, surpassing the meta model’s performance by 5.07% (loss of abstraction). Despite the extremely low information content of the training data visible to the meta model compared to the complete classification, the performance deteriorates only slightly. We can therefore fully answer RQ1, as we have regarded the baselines of all scenarios.

Table 8 Technical performance of method compared to other scenarios

Table 8 also depicts the optimal model parameters, i.e., the number of estimators and the maximum tree depth of the respective models. The maximum tree depth ranges between 25 and 50 for all sub-models and the meta model, while the complete model trained on all lines performs best with a maximum tree depth of 200. The estimators parameter, representing the number of trees used in a model, varies between 50 and 200 for the sub-models and the meta model, and is 300 for the complete model.

Summarizing the results, we show the technical feasibility of our method regarding data confidentiality preservation (DR1) and data volume reduction during a comprehensive analysis (DR2). We identify a performance gain (DR3) that is enabled by our method in comparison to an isolated analysis (scenario 1 vs. scenario 2), but also a performance loss (scenario 2 vs. scenario 3) due to the analysis of abstract prediction outputs (loss of abstraction). Although this performance loss seems rather small, it represents a trade-off between performing a comprehensive analysis of all raw data sources at once and a confidentiality-preserving one. We denote the “price of privacy” as the difference between the effectiveness of a scenario with perfect data availability (but a violation of privacy) and a distributed meta analysis without the exposure of sensitive data. In our case, the price is rather small (5% loss of MCC), but we gain the possibility to hide the raw data from other units in a business network, and still allow them to cooperate in terms of holistic analyses. Compared to our proposed approach, noising as an alternative shows a significantly higher price of privacy (12% loss of MCC, cf. Appendix A.2 on page 27). In general, the entire business network can profit from such analyses, as the performance gain over isolated analyses is remarkable and the scenario with a shared data pool is highly improbable for different legal units [51]. Furthermore, we show the increased performance of our method in an additional industrial use case (see Appendix A.3 on page 27 for more details).

6.4 Evaluation episode 2: usefulness

After the technical evaluation of the artifact, we now aim to evaluate its usefulness within its designated application field (RQ2). To this end, we discuss the developed artifact with practitioners from our industry partner as part of a workshop. The aim is to gain feedback on the artifact in general as well as its perceived usefulness. The workshop participants are from different divisions with different roles in the company. An overview of their characteristics is depicted in Table 9.

Table 9 Overview of workshop participants

We elaborate on the artifact’s capabilities, demonstrate it, and let the participants interact with it. We discuss advantages and disadvantages and provide the experts with a short questionnaire on perceived usefulness using the measures developed by Davis [25]. As the artifact is in an early stage and usability aspects were not of interest, we omit measures of ease of use in this evaluation episode and focus on the more general aspect of artifact adoption, regardless of detailed user interface choices [77]. The perceived usefulness measure prompted participants to indicate their level of agreement on six items about whether the artifact would enable them to perform tasks more quickly, increase their performance on the job, increase their productivity, increase their effectiveness, make their job easier, as well as an assessment of its general usefulness. Responses range from “very unlikely” (1) to “very likely” (5) on a 5-point Likert-type scale. Several studies have indicated satisfactory reliability of perceived usefulness in TAM for artifacts in an early development stage [70]. The results of the aggregated questionnaire are depicted in Table 10.

Table 10 Results of an expert workshop on the perceived usefulness of IOMML. Items are rated on a Likert scale of 1 (“unlikely”) to 5 (“likely”). N=3

All participants (n = 3) demonstrate a positive attitude towards IOMML, with a median of “4” on all six questions. In the discussion with the experts, multiple aspects arise. First of all, \(\beta\) mentions that fast analyses are often important in their daily work: “With over 60 TB of transferred sensor data per day, any abstraction that still allows analyses is beneficial to us”. Participant \(\gamma\) notes approvingly that the incorporated process model also contains the training phase, which is often neglected when implementing IT artifacts. However, he is doubtful about the necessary incentive of the affected employees within an organization to implement a system that first has to be trained for a certain amount of time before it can be put into production. Both \(\alpha\) and \(\beta\) note that the live analysis of distributed data sources with meta machine learning would be highly beneficial, because in the current state such analyses (if possible at all) can only be done after something went wrong, e.g., a part failing to meet quality requirements. Then the department typically starts an intensive investigation, which becomes very complicated once it crosses company borders. When discussing a possible productive implementation, \(\alpha\) notes that some suppliers would even be open to sharing data for analyses to strengthen their unique selling point towards an OEM. Within the same legal entity, access to both the raw data and abstracted predictions would not be an issue (\(\alpha\) and \(\beta\)).

With regard to other application areas within their company, the experts note that only critical processes would be of interest. All three experts raise legal concerns and elaborate that this aspect needs more attention.

7 Conclusion

This work aims to overcome the data confidentiality and transfer volume barriers caused by distributed data sources across different units in business networks. Specifically, we propose an inter-organizational meta machine learning method (IOMML) built on meta machine learning and service-oriented computing as kernel theories. In our setup, we differentiate between various scenarios in a business network, instantiate our method based on an industrial use case, and evaluate it according to the feasibility of preserving data confidentiality, reducing the volume of transferred data during analysis, and the overall prediction performance. We demonstrate its applicability with a production control interface implemented via a service-oriented architecture. Furthermore, we discuss the potential usefulness of the artifact with practitioners. Our contribution to the body of knowledge is threefold: First, we propose a flexible method that can be used in business networks to perform comprehensive analyses on distributed data sources and show its technical feasibility in terms of a prototypical instantiation, preserved data confidentiality, reduced data volume, and statistical performance. Second, we show that the artifact is perceived as useful within its application context. Third, we show that IOMML is a feasible middle ground between the two scenarios of either sharing all data or no data within a business network.

In addition to these theoretical contributions, concrete managerial implications follow: The proposed method allows units in business networks to share insights without exposing data, a possibility that has so far been limited in traditional settings. Especially in co-opetition networks [13], such a method can lower the barrier for individual units to collaboratively work on insights that are of shared interest among all parties. However, even if all units would (in theory) agree to share all data, it would be technically challenging to transfer all of it, especially in production scenarios with large data streams [76]. With the drastic data volume reduction of the proposed method, analyses of large, distributed data sources become possible. Lastly, the application of the method would facilitate comparability among different units and drive standardization towards a uniform structure and schema of gathered data. This would be especially valuable for platforms thriving on shared data, for example in the area of predictive maintenance.

While there is potential for theory and practice, our work also has several limitations that need to be addressed in future research. As of now, we only instantiate the developed method in an artificial industrial use case to test its feasibility; additionally, we conduct a robustness check on a second industrial case (see Appendix A.3 on page 27). In our main evaluation, the test performed with the artifact involved units of the same organization. To generalize and deduce insights on its projectability to other problems and domains, further evaluation and studies are needed. Future work should elaborate on how the proposed method can be applied in a real-world business network. For example, a consortium of different value co-creating businesses could apply this method in an experimental setup to observe and quantify individual benefits. Furthermore, we do not include concrete aspects of the instantiation of our approach using IT systems or services, as we only address the conceptual aspects of the information flow between business entities, not infrastructure-specific properties. Additionally, we only evaluated the perceived usefulness of the artifact, not its actual usefulness and usability in use [11]. We evaluate the technical efficiency of the proposed approach to preserve the confidentiality of data originating from subordinate entities. However, we have to acknowledge the possibility of information leakage through the sub-predictions. By analyzing the aggregated sub-predictions, one could, for instance, derive insights into the reliability of each entity. Thus, we can only account for preserving the raw information values of each entity, not for overlying concepts or paradigms that might or might not materialize through abstract sub-predictions. However, we also observe a continuum between the absence of inter-organizational analytics and a full exchange and exposure of data. Along this continuum, the level of shared information increases. Organizations have to make a trade-off: living with a fraction of the analytical insights, or opening up, potentially exposing information through the sub-predictions, but receiving system-wide insights.

Regarding the technical dimensions of the proposed method, we only reviewed stacking as one possibility of meta machine learning. It would be interesting to explore alternative types and algorithms, for example, distributed deep networks. As a basis for these algorithms, the features for meta learning could be altered, and additional information could be communicated to the meta unit besides prediction and probability, such as the number of features, training parameters, or additional metadata. Apart from the technical aspects of our work, a thorough assessment of the organizational aspects of the proposed method is still required. This includes, but is not limited to, questions on how the proposed method would perform in a real-world scenario, how a system would need to be designed to incentivize all entities to participate, and how and where the meta unit is governed. This also covers legal dimensions, questions of ownership, and liability. Finally, while the method is able to preserve the confidentiality of sub-units’ attributes towards other units during analysis, it is not able to mask the existence of the instance itself, which limits its privacy-preserving characteristics. Despite these limitations, the proposed method could fundamentally change the way the units of a business network communicate, foster system-wide analytics, and, therefore, improve overall network productivity.