1 Introduction

The advent of Industry 4.0 requires the smartization of industrial processes, extending smart manufacturing to smart storage and smart logistics [1]. Industrial resources are the basis of the development and reconstruction of value chains, and the integration and sharing of such resources become increasingly important at the dawn of Industry 4.0 [2].

With the highly specialized division of labor and the continuous upgrading of products, industrial resources experience exponential growth across the entire product life cycle [3]. This growth inevitably leads to several problems. First, scattered systems impede enterprise collaboration, and the innovation and safety of the manufacturing value chain are difficult to guarantee [4]. Second, there is a lack of integration models for product innovation and resource scheduling. Third, massive long-tail resources (i.e., resources that are rarely referenced within a knowledge graph) cause the cold-start problem (i.e., the engine cannot make reliable recommendations due to an initial lack of ratings) in most of the reported recommendation engines [5]. These problems compromise the robustness and generalization of recommender systems and thereby hinder the sharing and reuse of industrial resources throughout the product life cycle.

Previous works focus on utilizing knowledge graphs to construct interpretable recommender systems [6,7,8]. Yet insufficient attention has been paid to building recommender systems for the value chains in the manufacturing industry. Several existing works use auxiliary information to overcome the data sparsity of user-item interactions [9], but their performance is far from satisfactory because auxiliary information is often unavailable for newly arising data and long-tail data [10]. Smart resource aggregation and collaborative recommendation require deep learning models [11, 12], yet these models are usually suboptimal under low-resource conditions.

To address the above challenges, a proper meta-model is needed to integrate and share heterogeneous industrial resources, for which sufficient annotated training data are rarely available. In this article, a novel framework for industrial resource recommendation in low-resource conditions is proposed to improve seamless collaboration between enterprises and thereby promote the production and operation efficiency of manufacturing enterprises.

Our method consists of the following steps. First, scattered industrial resources are integrated and correlated to form an industrial knowledge graph based on the resource schema. Second, to tackle the sparsity within the graph, we conduct schema-based reasoning to identify the potential implicit relations and heuristically complete the graph. Lastly, we propose a novel multi-head attention-based meta relational learning model that learns the latent representations of relation-meta to solve the cold-start problem caused by the long-tail resources. The predicted links are used to recommend industrial resources to corresponding entity nodes.

Our main contributions are summarized as follows.

  • 1) We propose an industrial knowledge graph model to integrate heterogeneous resources throughout the product life cycle. We further formulate long-tail recommendations as a few-shot relational learning problem of learning-to-recommend resources with few interactions.

  • 2) We conduct schema-based reasoning to mine the potential implicit relations, as well as to complete the industrial knowledge graph.

  • 3) We propose a novel multi-head attention-based meta relational learning model to improve the use of long-tail resources and to address the cold-start problem.

  • 4) We develop a graph-based platform in which the proposed recommendation algorithm is incorporated to recommend industrial resources in low-resource conditions. A business metric, net promoter score, is adopted to evaluate the recommender systems in low-resource conditions.

The rest of the paper is organized as follows. Section 2 presents the literature review on related methods. Our method is illustrated in Sect. 3. Section 4 reports the experimental results. Section 5 displays analysis and discussion. Section 6 concludes this work.

2 Background

2.1 Knowledge Graph-Based Recommendation

Recently, personalized recommender systems based on knowledge graphs have been proposed to improve performance and provide interpretability. In general, these recommendation algorithms can be categorized into three classes: feature-based methods, path-based methods, and embedding-based methods [13].

The feature-based methods, as their name indicates, extract the features of users and/or items for the machine learning process [14]. Sohail et al. [15] present an opinion mining-based recommendation technique to provide university students with promising books for their syllabus; however, this technique could be subjectively biased. Uyangoda et al. [16] apply a user-profile-feature-based approach to improve recommendations with few user records. Dai et al. [17] propose a feature-based Bayesian task recommendation scheme to overcome the challenge of emerging recommendations, but the scheme cannot address changes in users' interests. Yang et al. [18] propose a meta-feature-based approach, called the Explainable Recommendation Framework, to render explainable recommendations for both warm-start and cold-start users/items in a unified framework. Yet these methods typically neglect the implicit information in graph structures [19]. The structural information in the graph could help capture pairwise relations [20], and modeling structural patterns is therefore beneficial to recommender systems [21, 22].

The path-based methods exploit the paths between users and items in the knowledge graph. Yu et al. [23] investigate the entity recommendation problem in heterogeneous information networks (HINs) and propose the use of implicit feedback data for developing personalized recommendation models. Zhao et al. [24] introduce the concept of meta-graph to HIN-based recommendation and solve the information fusion problem with a novel approach. Unfortunately, these methods still exhibit poor generalization performance due to the manually designed paths.

The embedding-based methods embed users and items into a low-dimensional vector space to improve the accuracy of recommendations. Wang et al. [25] develop a deep knowledge-aware network that incorporates knowledge graphs into news recommendation. Zhao et al. [8] leverage heterogeneous information in a knowledge base to improve recommendation quality. Ripple Network [26] simulates the propagation of user preferences over knowledge entities by automatically and iteratively extending a user's potential interests along links in the knowledge graph. The multi-task feature learning approach [27] is a deep end-to-end framework that uses knowledge graph embeddings to assist resource recommendation. Yet these embedding-based methods have limited applicability under low-resource conditions, as shown in Sect. 4.2.2.

Improving the performance of knowledge graph-based recommender systems is a recent research hotspot. To this end, researchers have incorporated auxiliary information into recommendation algorithms, mining users' historical records for implicit information and improving model structures for better performance. However, the data sensitivity and knowledge specificity of industrial resources demand massive labeled data for training, and manual tagging is extremely costly and time-consuming. In addition, owing to their multi-disciplinary and heterogeneous nature, industrial resources are highly complicated and specialized, which makes it difficult to construct knowledge graphs for the manufacturing value chains, especially in low-resource conditions.

2.2 Industrial Knowledge Graph

There are several well-known knowledge graphs such as CN-DBpedia [28] and Wikidata [29], but limited attention has been paid to knowledge graphs for industrial resource integration such as procurement and manufacturing. The current research on industrial knowledge graphs can be categorized into three parts: the construction of industrial knowledge graphs, knowledge deduction, and applications of industrial knowledge graphs.

The construction of industrial knowledge graphs aims to aggregate massive data from industrial products and services to create artificial intelligence for industrial applications [30]. Knowledge extraction technologies, such as data mining, natural language processing, and deep learning, are designed to extract entities and relations from unstructured industrial resources [31,32,33]. Researchers also enhance industrial knowledge graphs with generic knowledge graphs and exploit the storage of industrial knowledge graphs [34,35,36,37]. However, at present, industrial knowledge graphs still struggle with emerging resources, the mining of possible implicit relations between resources, and the collaboration of domain experts during graph construction [38].

The knowledge deduction of industrial knowledge graphs focuses on the multi-hop semantic search and knowledge reasoning based on the industrial knowledge graphs [39,40,41]. The industrial knowledge deduction can be categorized into attribute deduction and relationship deduction. Practically, these methods map attributes and relations to a low-dimensional vector space and transform the knowledge deduction process into matrix operations [42,43,44,45]. Nevertheless, knowledge deduction also requires massive labeled data for training [38]. Moreover, the few-shot problems under low-resource conditions are usually overlooked.

Industrial knowledge graphs can be applied in a series of scenarios like industrial collaboration, resource representation learning, and intelligent search. The typical representative applications include question answering with knowledge graphs [46], visualization of knowledge graphs [47], and fault diagnosis [48]. However, these practical implementations impose constraints on the links in knowledge graphs by defining a schema or an ontology [49]. Worse still, knowledge representation is always much more complicated in the manufacturing industry [50].

2.3 Few-Shot Link Prediction

Knowledge deduction based on knowledge graph embedding (KGE) [51] usually assumes that there are sufficient triples of entities and relations for training. However, the applicability of this approach is limited in the following two aspects. First, long-tail resources are widespread in knowledge graphs, and newly added relations often lack sufficient known samples for training [52]. Second, emerging resources in the manufacturing industry also cause the few-shot learning problem, while such resources tend to be ignored in prior works [53].

Models trained with only a few samples typically perform worse than those trained with sufficient data [54]. Further, embedding-based methods for few-shot link prediction perform poorly on relations that have only a few associative triples [55]. Therefore, recommender systems are unable to make reliable recommendations under these long-tail constraints, which is also known as the cold-start problem.

In sum, the industrial knowledge graph is a promising solution for the integration, sharing, and management of domain knowledge, but industrial resources require a more canonical and formal form to fuse heterogeneous and multi-disciplinary resources for cooperation. Moreover, industrial recommender systems still suffer from the cold-start problem caused by newly arising resources and long-tail resources.

3 Method

Here, a novel industrial knowledge graph is constructed based on the predefined resource schema. We further conduct rule reasoning on the schema to heuristically complete the knowledge graph. The inference rules are added to the support sets to train the multi-head attention-based meta relational learning algorithm. The industrial knowledge graph aims at integrating resources across the entire product life cycle, and our meta relational learning model is developed to recommend resources under low-resource conditions.

3.1 Industrial Knowledge Graph

3.1.1 Industrial Knowledge Graph Construction

The industrial knowledge graph is a promising solution to integrate industrial resources and enhance knowledge sharing across different manufacturing sectors [38]. In this paper, an industrial knowledge graph is constructed to support our recommendation algorithm in low-resource conditions. Moreover, it is feasible to apply our method to other domains by modifying the resource schema and conducting rule-based reasoning with the proposed meta relational learning algorithm. As shown in Fig. 1, the resource schema is based on the resource classification tree that categorizes industrial resources into four classes, namely knowledge resources, business data resources, human resources, and product resources.

Fig. 1 Resource classification tree

Specifically, knowledge resources, including patents, standards, papers, achievements, regulations, and reports, enhance the knowledge service capabilities for product innovations [56]. Business data resources are accumulated from business processes, from user demands to records of maintenance. These data comprise the implicit and empirical knowledge for intelligent manufacturing. Human resources refer to the information of experts who ensure the orderly management of other resources. Product resources are the entities of modules that enable manufacturing enterprises to focus on the interplay between innovation and design processes, and they also embed a co-creation paradigm between firms and customers for mass customization [57].

The overview of the industrial knowledge graph construction process is shown in Fig. 2. Specifically, the resource schema [58] is designed based on the resource classification tree that describes the hierarchy concepts and their relations. Open Information Extraction (OpenIE) annotator [59] is used to extract open-domain relation triples within structured and unstructured data. Neo4j JDBC driver [35] and RDF2Neo4j interpreter [60] are employed for data mapping, and attribute-based fusion [61] is used to fuse industrial resources from scattered relational databases in a unified paradigm. In such a paradigm, the entity set represents heterogeneous resources, and the edge set represents the relations among industrial resources. The information and knowledge in resources are set as properties of the nodes. The industrial knowledge graph is then integrated and stored in the graph database Neo4j [35]. For KGE, entities and relations form triples that are embedded in a low-dimensional vector space using TransE [62]. Further, the community detection algorithm Cluster-GCN [63] and the few-shot multi-hop reasoning algorithm Meta-KGR [64] are used to support schema-based reasoning (see Sect. 3.1.2). Our few-shot relational learning algorithm (see Sect. 3.2) is proposed to complete the industrial knowledge graph and recommend industrial resources in low-resource conditions. Lastly, a graph-based platform that provides intelligent services like our recommendation engine is developed (as shown in Sect. 4.2).

Fig. 2 Overview of the industrial knowledge graph construction
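As a minimal sketch of the data-mapping step, the following code pushes extracted (head, relation, tail) triples into Neo4j with the official Python driver (v5); the resource names, the Resource label, and the connection settings are illustrative placeholders rather than the platform's actual configuration, which uses the JDBC driver and RDF2Neo4j interpreter mentioned above.

```python
# Minimal sketch: merging extracted triples into Neo4j.
# Resource names, the Resource label, and credentials are placeholders.
from neo4j import GraphDatabase

triples = [
    ("YL", "REQUIRES", "motor_spec_v2"),          # business data resource
    ("motor_spec_v2", "REFERENCES", "GB_4706"),   # knowledge resource (standard)
]

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def merge_triple(tx, head, rel, tail):
    # MERGE keeps the mapping idempotent when the same triple arrives twice.
    # Relationship types cannot be query parameters, hence the string insert;
    # rel is assumed to come from the trusted resource schema.
    tx.run(
        "MERGE (h:Resource {name: $head}) "
        "MERGE (t:Resource {name: $tail}) "
        "MERGE (h)-[:" + rel + "]->(t)",
        head=head, tail=tail,
    )

with driver.session() as session:
    for h, r, t in triples:
        session.execute_write(merge_triple, h, r, t)
driver.close()
```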

3.1.2 Schema-Based Reasoning

Schema-based reasoning is an inductive reasoning process that represents a generalized form of case-based reasoning [65]. We conduct schema-based reasoning to complete the industrial knowledge graph and resolve the cold-start problem by generating more training data for the few-shot link prediction in Sect. 3.2.

Schema-based reasoning on the industrial knowledge graph can be categorized into four classes, namely historical rules, performance rules, community rules, and path rules. The symbols in Table 1 are defined as follows. Assume that \(a\) is a manufacturing enterprise that would benefit from the industrial knowledge graph, \(b\) and \(b^{\prime}\) are potential suppliers of \(a\), and \(c\) and \(c^{\prime}\) are parts and components required by \(a\).

Table 1 Schema-based Reasoning

By their definitions, (1) historical rules associate newly added resources based on historical or empirical data, i.e., if the enterprise \(a\) cooperated with the supplier \(b\) in the past, and the suppliers \(b\) and \(b^{\prime}\) are similar in production, then \(b^{\prime}\) would be linked to \(a\). (2) Performance rules link candidate resources that achieve comparable performance. For instance, if the product \(c^{\prime}\) achieves machining accuracy comparable to the product \(c\) that is required in the production process of the manufacturing enterprise \(a\), then \(c^{\prime}\) would be linked to \(a\). (3) Community rules associate resources in the same community based on graph structure information, i.e., if industrial enterprises \(a\), \(b\), and \(b^{\prime}\) are densely connected in the knowledge graph, then \(b\) and \(b^{\prime}\) would be linked to \(a\). (4) Path rules are generated from paths in the graph to discover long-tail resources via multi-hop reasoning, i.e., if enterprises \(a\) and \(b\) are partners, and \(b\) and \(b^{\prime}\) are also partners, then \(b^{\prime}\) would be linked to \(a\). Two of these rules are sketched in code below.
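The sketch below illustrates how the historical rule and the path rule could be expressed as graph queries; it uses NetworkX with fabricated edge labels in place of the platform's actual schema and similarity predicates.

```python
# Illustrative encoding of two schema-based rules as queries over a NetworkX
# view of the knowledge graph. Edge labels are placeholders for the schema.
import networkx as nx

G = nx.MultiDiGraph()
G.add_edge("a", "b", label="cooperated_with")
G.add_edge("b", "b_prime", label="similar_production")
G.add_edge("a", "b2", label="partner")
G.add_edge("b2", "b2_prime", label="partner")

def historical_rule(g):
    # cooperated_with(a, b) AND similar_production(b, b') => link(a, b')
    links = []
    for a, b, d in g.edges(data=True):
        if d["label"] != "cooperated_with":
            continue
        for _, b_prime, d2 in g.edges(b, data=True):
            if d2["label"] == "similar_production":
                links.append((a, b_prime))
    return links

def path_rule(g):
    # partner(a, b) AND partner(b, b') => link(a, b') (two-hop path)
    links = []
    for a, b, d in g.edges(data=True):
        if d["label"] != "partner":
            continue
        for _, b_prime, d2 in g.edges(b, data=True):
            if d2["label"] == "partner" and b_prime != a:
                links.append((a, b_prime))
    return links

print(historical_rule(G))  # [('a', 'b_prime')]
print(path_rule(G))        # [('a', 'b2_prime')]
```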

In sum, schema-based reasoning can address the sparsity issue of industrial knowledge graphs. Such rules can discover and unveil the implicit relations among resources and associate long-tail resources to complete the industrial knowledge graph. Moreover, schema-based reasoning generates more training data for few-shot link prediction [55], which is regarded as a promising solution to the cold-start problem in recommender systems [66, 67].

3.2 Multi-head Attention-Based Meta Relational Learning Algorithm with Schema-Based Reasoning

The sparsity of knowledge graphs brings about long-tail data, which are the principal cause of the cold-start problem of recommendation algorithms. To fix this problem, we conduct schema-based reasoning to generate more training data and employ few-shot relational learning on the industrial knowledge graph, which transforms the recommendation task into a simple ranking problem over candidate predictions in low-resource conditions.

An example of the recommendation algorithm based on the few-shot link prediction is shown in Fig. 3. First, the schema-based reasoning strategy is conducted on the industrial knowledge graph to generate more data in the support set and query set for few-shot learning. The meta relational learning model is further proposed to learn the relation-meta. Finally, a recommendation engine that is based on the rankings of predicted links among long-tail resources is introduced to resolve the cold-start problem.

Fig. 3 Resource recommendation algorithm based on the few-shot link prediction

Different from traditional deep learning models that are slow and computationally expensive, few-shot link prediction aims to gain the capability of predicting new triples about a specific relation by observing only a few triples. Hence, our model is designed to predict any newly added relation without fine-tuning, whereas existing models always require massive training data to adapt to newly added relations.
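For concreteness, one few-shot task \(\mathcal{T}_r = \{\mathcal{S}_r, \mathcal{Q}_r\}\) can be laid out as a support set of k observed facts and a query set of facts to predict; the entity and relation names below are fabricated.

```python
# One few-shot task for a single relation: k support facts and query facts.
# All names are fabricated for illustration.
task = {
    "relation": "supplies_component_to",
    "support": [             # S_r: the k observed facts (k = 2 here)
        ("NBKQ", "YL"),
        ("ZJLB", "YL"),
    ],
    "query": [               # Q_r: facts to score after the meta-update
        ("BJHC", "YL"),
    ],
}
```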

The structure of the multi-head attention-based meta relational learning is shown in Fig. 4, in which a representation layer, a relation-meta encoder, and a training module are included. The representation layer maps entities and relations to a hyper embedding space. A relation-meta encoder is proposed to learn a mapping from head entities to corresponding tail entities in the support set. The training module constructs the objective function and updates the relation-meta that is transferred from the support set to the query set, enabling our model to address the few-shot link prediction.

Fig. 4 Multi-head attention-based meta relational learning for few-shot link prediction

3.2.1 Representation Layer

Within a support set \({\mathcal{S}}_{r}\) and a query set \({\mathcal{Q}}_{r}\), a fact \(\left( {h_{i} ,t_{i} } \right)\) is defined as an entity pair comprising the head entity \(h_{i}\) and the tail entity \(t_{i}\); the corresponding relation of the fact is \(r_{i}\). Entities and relations are first embedded in a hyper vector space, either by random initialization or by pre-trained embeddings [54]. Given one few-shot link prediction task \({\mathcal{T}}_{r} = \left\{ {{\mathcal{S}}_{r} ,{\mathcal{Q}}_{r} } \right\}\), each fact \(\left( {h_{i} ,t_{i} } \right)\) of the task \({\mathcal{T}}_{r}\) is mapped to the embeddings of its entities and corresponding relation in Eq. (1).

$$x_{h_i} = e(h_i), \qquad x_{r_i} = e(r_i), \qquad x_{t_i} = e(t_i)$$
(1)

where \(e(\cdot)\) denotes an embedding lookup table, \(k = \left| {{\mathcal{S}}_{r} } \right|\) is the shot number, and \(i \in \{1, 2, \ldots, k\}\).

We concatenate entity embeddings both for the positive fact \(\left( {h_{i} ,t_{i} } \right)\) and the negative one \(\left( {h_{i} ,t^{\prime}_{i} } \right)\), generating inputs of the relation-meta encoder in Eq. (2). The symbol \(\oplus\) represents the concatenation function.

$$x_{p_i}^{0} = x_{h_i} \oplus x_{t_i}, \qquad x_{n_i}^{0} = x_{h_i} \oplus x_{t^{\prime}_i}$$
(2)

where \(x_{h_i}\), \(x_{t_i}\), and \(x_{t^{\prime}_i}\) are the embeddings of the head entity \(h_{i}\), the positive tail entity \(t_{i}\), and the negative tail entity \(t^{\prime}_{i}\), respectively.
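A minimal PyTorch sketch of this representation layer (Eqs. (1)-(2)) is given below; the vocabulary size, embedding dimension, and entity ids are toy placeholders.

```python
# Representation layer sketch: embedding lookup and head/tail concatenation.
import torch
import torch.nn as nn

num_entities, dim = 1000, 100
entity_emb = nn.Embedding(num_entities, dim)     # the lookup table e(.)

# k = 3 support facts: head, positive tail, and corrupted (negative) tail ids
h     = torch.tensor([4, 17, 256])
t     = torch.tensor([9, 32, 640])
t_neg = torch.tensor([7, 88, 123])

x_p = torch.cat([entity_emb(h), entity_emb(t)], dim=-1)      # (k, 2*dim), Eq. (2)
x_n = torch.cat([entity_emb(h), entity_emb(t_neg)], dim=-1)  # (k, 2*dim)
```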

3.2.2 Relation-meta Encoder

The relation-meta encoder consists of two stacked encoding modules and an inference long short-term memory (LSTM) sublayer. The previous implementation is based on a simple multi-layer perceptron (MLP) [55], which can suffer from vanishing gradients as the network goes deeper [68]. In comparison, we extract fact-specific relation-meta via a multi-head attention-enhanced LSTM, allowing the encoder to capture implicit long-range semantic dependencies [69, 70] between head entities and tail entities.

As shown in Fig. 4, an encoding module consists of an LSTM sublayer and a multi-head attention sublayer, learning a deeper representation of facts for relation encoding. In Eq. (3), both the positive embedding \({\mathbf{x}}_{{{\text{p}}_{i} }}^{0}\) and the negative one \({\mathbf{x}}_{{{\text{n}}_{i} }}^{0}\) are fed into an LSTM sublayer that yields intermediate output representations \(\left\{ {{\mathbf{H}}_{{{\text{p}}_{i} }}^{l} ,{\mathbf{H}}_{{{\text{n}}_{i} }}^{l} } \right\} \in {\mathbb{R}}^{k \times d}\), where \(k\) and \(d\) denote the shot number of the task \({\mathcal{T}}_{r}\) and the LSTM hidden size, respectively, and \(l\) is the layer number of the encoding module.

$$\mathbf{H}_{p_i}^{1} = \mathrm{LSTM}\left( \mathbf{x}_{p_i}^{0} \right), \qquad \mathbf{H}_{n_i}^{1} = \mathrm{LSTM}\left( \mathbf{x}_{n_i}^{0} \right)$$
(3)

Then a multi-head attention sublayer is stacked to the above LSTM sublayer, forcing the encoder to focus on the commonalities of facts. For the multi-head attention mechanism, we assign \({\mathbf{Q}} = {\mathbf{H}}_{{{\text{p}}_{i} }}^{1} ,{\mathbf{K}} = {\mathbf{V}} = {\mathbf{H}}_{{{\text{n}}_{i} }}^{1}\). Hence, the attention mechanism is calculated using Eq. (4).

$${\text{attention}}\left( {{\mathbf{Q}},{\mathbf{K}},{\mathbf{V}}} \right) = {\text{softmax}}\left( {\frac{{{\mathbf{QK}}^{\rm T} }}{\sqrt d }} \right){\mathbf{V}}$$
(4)

We use multiple heads to capture the distribution difference between positive facts and negative ones in parallel.

$$\mathrm{head}_i = \mathrm{attention}\left( \mathbf{Q}\mathbf{W}_i^{Q}, \mathbf{K}\mathbf{W}_i^{K}, \mathbf{V}\mathbf{W}_i^{V} \right), \qquad \mathbf{H}_{a_i}^{1} = \mathrm{concat}(\mathrm{head}_1, \ldots, \mathrm{head}_q) + \mathbf{H}_{p_i}^{1}$$
(5)

where \({\mathbf{W}}_{i}^{Q}, {\mathbf{W}}_{i}^{K}, {\mathbf{W}}_{i}^{V} \in {\mathbb{R}}^{{d \times \tfrac{d}{q}}}\) are trainable parameters, and \(q\) is the number of parallel heads. \({\mathbf{H}}_{{a_{i} }}^{1}\) denotes the intermediate output representation of the first multi-head attention sublayer. The output of the first encoding module \({\mathbf{H}}_{i}^{1}\) is then the concatenation of \({\mathbf{H}}_{{{\text{p}}_{i} }}^{1}\) and \({\mathbf{H}}_{{a_{i} }}^{1}\).

$${\mathbf{H}}_{i}^{1} = {\mathbf{H}}_{{{\text{p}}_{i} }}^{1} \oplus {\mathbf{H}}_{{a_{i} }}^{1}$$
(6)

We stack two encoding modules in the relation-meta encoder. Here, \({\mathbf{H}}_{i}^{2} = {\mathbf{H}}_{{{\text{p}}_{i} }}^{2} \oplus {\mathbf{H}}_{{a_{i} }}^{2}\) denotes the output of the second encoding module. Then, an inference LSTM sublayer, identical except that its dropout rate is set to 0, is added to generate the fact-specific relation-meta \(R_{i}\) in Eq. (7).

$$R_{i} = LSTM\left( {H_{i}^{2} } \right)$$
(7)

At last, the relation-meta in Eq. (8) is the average of all the fact-specific relation-meta \(R_{i}\), where \(k\) denotes the shot number of the task \({\mathcal{T}}_{r}\).

$$\mathcal{R}_{\mathcal{T}_r} = \frac{1}{k}\sum\nolimits_{i = 1}^{k} R_i$$
(8)
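The following PyTorch sketch illustrates one encoding module followed by the inference LSTM, using the NELL-One dimensions from Sect. 4.1.2.1 (d = 200, q = 5); it is a structural illustration of Eqs. (3)-(8) rather than the full two-module implementation.

```python
# Relation-meta encoder sketch: LSTM sublayer, multi-head attention with a
# residual connection, concatenation, inference LSTM, and averaging.
import torch
import torch.nn as nn

k, d, q = 3, 200, 5                       # shots, hidden size, attention heads

lstm = nn.LSTM(input_size=d, hidden_size=d, batch_first=True)
attn = nn.MultiheadAttention(embed_dim=d, num_heads=q, batch_first=True)
inference_lstm = nn.LSTM(input_size=2 * d, hidden_size=d // 2, batch_first=True)

x_p = torch.randn(1, k, d)                # positive fact embeddings (Eq. 2)
x_n = torch.randn(1, k, d)                # negative fact embeddings

# Eq. (3): a shared LSTM over the positive and negative fact sequences
H_p, _ = lstm(x_p)
H_n, _ = lstm(x_n)

# Eqs. (4)-(5): multi-head attention with Q = H_p, K = V = H_n, plus residual
H_a, _ = attn(H_p, H_n, H_n)
H_a = H_a + H_p

# Eq. (6): concatenate to form the module output (a second module stacks here)
H1 = torch.cat([H_p, H_a], dim=-1)        # (1, k, 2d)

# Eq. (7): the inference LSTM yields the fact-specific relation-meta R_i
R, _ = inference_lstm(H1)                 # (1, k, d/2)

# Eq. (8): averaging over the k facts gives the task-level relation-meta
relation_meta = R.mean(dim=1)             # (1, d/2)
```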

3.2.3 Training

A score function is used to evaluate the relation-meta, and a loss function is employed to update the whole model. The score function of the support set is defined in Eq. (9) based on TransE [62]. Inspired by Lin et al. [71], a penalty term is added to keep the relation-meta \(\mathcal{R}_{\mathcal{T}_r}\) from drifting too far away from the original relation embedding \(x_{r_i}\). \(\lambda\) is the weight parameter of this constraint, and \(\left\| \cdot \right\|_2^2\) is the squared L2 norm.

$$s_{(h_i, t_i)} = \left\| x_{h_i} + \mathcal{R}_{\mathcal{T}_r} - x_{t_i} \right\|_2^2 + \lambda \left\| \mathcal{R}_{\mathcal{T}_r} - x_{r_i} \right\|_2^2$$
(9)

In Eq. (10), the margin ranking loss [72] is used to compute the loss of the support set \({\mathcal{S}}_{r}\) for parameter updating.

$${\mathcal{L}}\left( {S_{r} } \right) = \sum\limits_{{\left( {h_{i} ,t_{i} } \right) \in S_{r} }} {\left[ {m + s_{{\left( {h_{i} ,t_{i} } \right)}} - s_{{\left( {h_{i} ,t^{\prime}_{i} } \right)}} } \right]}_{ + }$$
(10)

where \(m\) represents the margin hyperparameter, \(s_{(h_i, t_i)}\) is the score of the positive fact, \(s_{(h_i, t^{\prime}_i)}\) is the score of the corresponding negative fact, and \([\cdot]_+\) denotes the positive part function.

In Eq. (11), we compute the gradient of the loss with respect to the relation-meta, denoted as \(\nabla_{\mathcal{R}_{\mathcal{T}_r}} \mathcal{L}(\mathcal{S}_r)\), on the support set \({\mathcal{S}}_{r}\) to update the relation-meta, and transfer it to the query set \({\mathcal{Q}}_{r}\) to obtain the updated relation-meta \(\tilde{\mathcal{R}}_{\mathcal{T}_r}\) and the updated relation embedding \(\tilde{x}_{r_j}\). \(\beta\) indicates the updating step size.

$$\tilde{\mathcal{R}}_{\mathcal{T}_r} = \mathcal{R}_{\mathcal{T}_r} - \beta \nabla_{\mathcal{R}_{\mathcal{T}_r}} \mathcal{L}(\mathcal{S}_r), \qquad \tilde{x}_{r_j} = x_{r_j} - \beta \nabla_{\mathcal{R}_{\mathcal{T}_r}} \mathcal{L}(\mathcal{S}_r)$$
(11)

Thus, the scores and loss of the query set in Eqs. (12) and (13) follow the same procedure as those of the support set.

$$s_{(h_j, t_j)} = \left\| x_{h_j} + \tilde{\mathcal{R}}_{\mathcal{T}_r} - x_{t_j} \right\|_2^2 + \lambda \left\| \tilde{\mathcal{R}}_{\mathcal{T}_r} - \tilde{x}_{r_j} \right\|_2^2$$
(12)
$${\mathcal{L}}\left( {{\mathcal{Q}}_{r} } \right) = \sum\limits_{{\left( {h_{j} ,t_{j} } \right) \in {\mathcal{Q}}_{r} }} {\left[ {m + s_{{\left( {h_{j} ,t_{j} } \right)}} - s_{{\left( {h_{j} ,t^{\prime}_{j} } \right)}} } \right]}_{ + }$$
(13)
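The procedure of Eqs. (9)-(13) can be sketched as follows; for brevity the penalty term of Eq. (9) is omitted, and the embeddings are toy placeholders.

```python
# Support-to-query meta-update sketch: TransE-style score, margin ranking
# loss, and one gradient step on the relation-meta (Eq. 11).
import torch

def score(h, r, t):
    # Eq. (9) without the penalty term: ||h + r - t||_2^2
    return ((h + r - t) ** 2).sum(dim=-1)

def margin_loss(pos, neg, m=1.0):
    # Eqs. (10)/(13): sum of [m + s_pos - s_neg]_+
    return torch.clamp(m + pos - neg, min=0).sum()

beta = 5.0
h, t, t_neg = torch.randn(3, 100), torch.randn(3, 100), torch.randn(3, 100)
R = torch.randn(100, requires_grad=True)   # relation-meta from the encoder

# Support-set loss, then Eq. (11): one gradient step on the relation-meta
loss_s = margin_loss(score(h, R, t), score(h, R, t_neg))
grad = torch.autograd.grad(loss_s, R, create_graph=True)[0]
R_updated = R - beta * grad

# Query-set loss (Eq. 13) with the updated relation-meta drives the final update
loss_q = margin_loss(score(h, R_updated, t), score(h, R_updated, t_neg))
loss_q.backward()                          # gradients flow back to the encoder
```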

Our training objective is to minimize the loss on both the support set and the query set. Finally, the updated relation-meta is used to predict tail entities, and recommendations are based on the rankings of the predictions \({\mathcal{Y}}\) on the test set. In Eq. (14), top-k querying [73] is adopted to return the k samples with the highest scores as the candidate recommendations.

$$P_k(s) \in \left\{ \mathbf{y} \in \mathcal{Y}^{(k)} : \forall i \in \{1, \ldots, k\},\ s_{y_i} \ge s_{[k]} \right\}$$
(14)

where \({\mathcal{Y}}^{\left( k \right)}\) represents the set of k-tuples with k distinct samples of \({\mathcal{Y}}\), and \({\mathbf{y}}\) denotes a tuple as distinguished from a single prediction \(y \in {\mathcal{Y}}\). \(s\) refers to the score computed as in Eq. (12), and \(s_{\left[ k \right]}\) is the kth maximum score.
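In code, Eq. (14) reduces to a top-k selection over the candidate scores; the scores below are fabricated.

```python
# Top-k querying sketch: return the k candidates with the highest scores.
import torch

scores = torch.tensor([0.12, 0.91, 0.33, 0.78, 0.05])  # one score per candidate
k = 3
top_scores, top_idx = torch.topk(scores, k)
print(top_idx.tolist())   # [1, 3, 2] -> candidate ids ranked by score
```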

4 Experiment

4.1 Multi-head Attention-Based Meta Relational Learning

In this section, we conduct experiments on two public datasets to demonstrate the validity of our method. Our meta relational learning model outperforms previous methods, GMatching [54], MetaR [55], and GANA [74], which achieve state-of-the-art results on the few-shot link prediction benchmarks. We further develop an industrial knowledge graph-based platform embedded with our meta relational learning model to support the recommender system in low-resource conditions.

4.1.1 Datasets and Evaluation Metrics

4.1.1.1 Datasets

In the experiments, two public datasets, NELL-One and Wiki-One, are employed to evaluate the performance of our meta relational learning model. These datasets were first constructed by Xiong et al. [54] and later reused by Chen et al. [55]; they are commonly used in few-shot link prediction [54, 55, 74]. The statistics of the datasets are shown in Table 2.

Table 2 Statistics of datasets. The Pre-Train setting uses the background graph to train entity embeddings in advance, while the In-Train setting fits the background graph into the training tasks. # Train, # Dev, and # Test count the numbers of relations in the training, validation, and test sets. Regardless of the setting, # Ent, # R, and # Triples denote the numbers of entities, relations, and triples

Since GMatching considers both learned embeddings and one-hop graph structures, a background graph is constructed from relations outside the training/validation/test sets to obtain the pre-trained embeddings and the local graph. Except for the one-shot task relations, GMatching selects the remaining relations as background relations that provide crucial background knowledge for training. MetaR further uses the background graph to derive two dataset settings: (1) the Pre-Train setting uses the background graph for pretraining, and (2) the In-Train setting fits the background graph into the training tasks with random initialization.

4.1.1.2 Evaluation

For evaluation, MRR (mean reciprocal rank) and Hits@N are widely adopted for few-shot link prediction algorithms. MRR averages the reciprocal ranks over all the true triples [55, 75, 76], and Hits@N counts the proportion of correct entities ranked in the top N in link prediction [54, 55]. A reference implementation is sketched below.
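Given the rank of each true triple (1 being the best), both metrics reduce to a few lines; the ranks below are illustrative.

```python
# MRR and Hits@N from the ranks of the true triples.
def mrr(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_n(ranks, n):
    return sum(1 for r in ranks if r <= n) / len(ranks)

ranks = [1, 3, 12, 2, 40]
print(round(mrr(ranks), 3))   # 0.388
print(hits_at_n(ranks, 10))   # 0.6
```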

4.1.2 Implementation and Baselines

4.1.2.1 Implementation

Our model uses the Adam optimizer [77] for adaptive convergence, and we use the Leaky ReLU function [78] to address the dying ReLU problem. The initial learning rate is 0.001 and gradually drops to 0.0001. The dropout rate is set to 0.5. We set the margin \(m = 1\) and the updating step size \(\beta = 5\). Training is stopped early when the validation performance on Hits@10 fails to improve 30 times.

Parameters of the relation-meta encoder in Sect. 3.2.2 are as follows. Following GMatching [54], the initial embedding dimension is 100 for NELL-One and 50 for Wiki-One. For the first LSTM sublayer, the input dimension is 200 for NELL-One and 100 for Wiki-One, with a hidden state size of 200 for NELL-One and 100 for Wiki-One. For the multi-head attention sublayers, the input size is 200 for NELL-One and 100 for Wiki-One, and the number of parallel attention heads is 5. Due to the concatenation operation, the inference LSTM sublayer has an input dimension of 400 for NELL-One and 200 for Wiki-One, with an output size of 100 for NELL-One and 50 for Wiki-One.
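A hedged sketch of the optimizer and stopping rule described above follows; the StepLR schedule is one plausible way to realize the decay from 0.001 to 0.0001, and the model object is a stand-in.

```python
# Optimizer, learning-rate decay, and patience-based early stopping sketch.
import torch

model = torch.nn.Linear(10, 10)            # stand-in for the full model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10_000, gamma=0.5)

best_hits10, patience, bad_evals = 0.0, 30, 0

def should_stop(hits10):
    # Stop when validation Hits@10 has failed to improve `patience` times.
    global best_hits10, bad_evals
    if hits10 > best_hits10:
        best_hits10, bad_evals = hits10, 0
    else:
        bad_evals += 1
    return bad_evals >= patience
```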

4.1.2.2 Baselines

Since GMatching and MetaR are two notable baselines for the few-shot link prediction task, we compare our method with them and with the embedding-based methods below, following the implementations in Chen's report [55]:

  • (1) TransE [62]: a classical method that transforms the triples into a low-dimensional vector space.

  • (2) TransH [79]: a hyperplane-based translation model.

  • (3) RESCAL [80]: a factorization-based method that combines the collective learning of a three-way tensor with the factorization to learn the latent relation space of the triples.

  • (4) DistMult [81]: a method that restricts the relation matrices of RESCAL to diagonal matrices and constructs the objective function accordingly, simplifying the computation.

  • (5) ComplEx [82]: a latent factorization method that uses the composition of complex embeddings to handle both the symmetric and antisymmetric relations.

  • (6) GMatching [54]: a one-shot relational learning model that is trained to match local graph patterns. Notably, GMatching_RESCAL, GMatching_TransE, GMatching_DistMult, GMatching_ComplEx, and GMatching_Random in Table 3 and Table 4 represent different choices of the matching processor in the GMatching model.

  • (7) CogKR [83]: a cognitive graph approach that coordinates retrieval and reasoning.

  • (8) GANA [74]: a global–local framework for few-shot relational learning.

Table 3 Results of few-shot link prediction on NELL-One. Underlined numbers are the best results of our model. Our model and MetaR are tested on both the Pre-Train setting and the In-Train setting, specified in (brackets)
Table 4 Results of few-shot link prediction on Wiki-One. Underlined numbers are the best results of our model. Both the Pre-Train setting and the In-Train setting results are also specified in (brackets)

4.1.3 Results

We use two initialization strategies, Pre-Train and In-Train, for 1-shot and 5-shot tasks on NELL-One and Wiki-One, following MetaR [55]. The results on the test sets are shown in Table 3 and Table 4. Baseline results with different KGE initializations are copied from the original papers.

In Table 3 and Table 4, our model outperforms MetaR on all evaluation metrics on both datasets. (1) With the Pre-Train setting, our model improves 1-shot link prediction on NELL-One by 43.9%, 8.5%, 25.6%, and 80.6% on MRR, Hits@10, Hits@5, and Hits@1 respectively, an average improvement of 39.1%; the corresponding improvements on Wiki-One are 6.1%, 10.1%, 5.9%, and 6.0%, an average of 7.0%. For 5-shot link prediction, our model improves NELL-One by 19.1%, 6%, 12.5%, and 21.3%, an average of 14.9%, and Wiki-One by 9.6%, 12.4%, 12.2%, and 9.6%, an average of 11.0%. (2) With the In-Train setting, our model improves 1-shot link prediction on NELL-One by 30%, 21.4%, 24.4%, and 39.4%, an average of 28.8%; the improvements on Wiki-One are 11.4%, 10.7%, 12.4%, and 11%, an average of 11.4%. For 5-shot link prediction, our model improves NELL-One by 29.1%, 19.2%, 27.7%, and 39.3%, an average of 28.8%, and Wiki-One by 11.7%, 16.6%, 16.3%, and 8.4%, an average of 13.3%.

Moreover, we conclude that the performance of few-shot link prediction is affected by the quantity and quality of training data. On the one hand, on large and sparse datasets the relations are too sparse to learn the relation-meta directly; the Pre-Train setting provides a few supervised signals for such sparse data and enables fast convergence, and we conduct schema-based reasoning based on the same intuition. On the other hand, the In-Train setting performs better on the small-scale dataset because our meta relational learning model focuses on learning task-specific relation-meta. We also tried Bi-LSTM layers in the relation-meta encoder, but this implementation performs worse, because the reverse relations of facts should be mapped to a different relation-meta.

4.2 Case Study

In this section, a graph-based platform is developed to demonstrate the feasibility of our method. The proposed recommendation algorithm is embedded in the platform with the graph search engine to support product development and service innovation for Zhejiang Yueli Electrical Co., Ltd. (YL) [84]. YL is one of the top three home appliance manufacturing bases in China, with annual sales exceeding 1.7 billion yuan. Its three factories have more than 50 assembly lines, 7 automatic painting lines, 286 patents, and 258 injection molding machines.

4.2.1 Evaluation

To evaluate the impact of our recommendation algorithm on the industrial knowledge graph of YL, a business metric, the net promoter score (NPS) [85, 86], is introduced in this section. NPS is a measure of customer loyalty that evaluates users' preferences for the recommended results; note that NPS based on customer surveys may suffer from subjectivity and randomness. Assume that the threshold between promoters and passives is m, the threshold between passives and detractors is n, and the number of candidate resources for each query entity is k, with m < n ≤ k. We then define the promoters, the passives, and the detractors as follows.

  • (1) Promoters are correct predictions ranked in the top m.

  • (2) Passives are correct predictions ranked between the mth and nth positions.

  • (3) Detractors are correct predictions ranked between the nth and kth positions.

Based on our definition, NPS is calculated by subtracting the percent of detractors from the percent of promoters, as shown in Eq. (15).

$${\text{NPS}} = \frac{1}{k}\sum\limits_{i = 1}^{k} {\mathbb{I}}\left( {\text{rank}}_{i} \le m \right) - \frac{1}{k}\sum\limits_{i = 1}^{k} {\mathbb{I}}\left( n \le {\text{rank}}_{i} \le k \right)$$
(15)

where \({\text{rank}}_{i}\) is the ranking of the ith prediction, and \(\mathbb{I}(\cdot)\) is an indicator function counting rankings that meet the inequality constraints.
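In code, Eq. (15) reduces to counting promoters and detractors among the k candidate ranks; the ranks below are fabricated.

```python
# NPS from the ranks of correct predictions among k candidates per query.
def nps(ranks, m, n, k):
    promoters  = sum(1 for r in ranks if r <= m)
    detractors = sum(1 for r in ranks if n <= r <= k)
    return (promoters - detractors) / k

ranks = [1, 4, 6, 9, 10]           # example ranks for k = 10 recommendations
print(nps(ranks, m=5, n=8, k=10))  # (2 - 2) / 10 = 0.0
```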

Following the construction of NELL-One, we select 50/5/10 task relations from the industrial knowledge graph constructed for YL for training/validation/testing, ensuring sufficient triples for evaluation [54]. Here we use random initialization. The results are shown in Table 5; the NPS depends on the ranking threshold settings of m and n.

Table 5 NPS results with different ranking thresholds

Table 5 shows that NPS increases with higher threshold settings of m and n, which indicates that users would accept more of the recommended resources. If the manufacturing process requires more accurate recommendations, a stricter threshold should be chosen. Since our recommendation engine is based on few-shot link prediction, the threshold settings of m and n in Table 5 follow the widely used Hits@N cutoffs [55], namely (1, 5), (1, 8), and (5, 8), with 10 recommendations for each query; this reflects the intuition that redundant and mismatched recommendations are worthless. Additionally, because random initialization is adopted for testing here, the slight fluctuations of MRR and Hits@N demonstrate that our meta relational learning model can predict newly added relations without fine-tuning.

4.2.2 Comparative Experiment

To compare with prior recommendation approaches under low-resource conditions, the following algorithms are employed as baselines on the YL data described in Sect. 4.2.1:

  • (1) CKFG [19]: an explainable recommendation approach using knowledge-based embeddings.

  • (2) CKE [8]: a collaborative knowledge base embedding approach that uses the heterogeneous information in a knowledge base to improve the quality of recommender systems.

  • (3) NFM [87]: a novel neural factorization machine for predicting recommendations under sparse settings.

  • (4) KGAT [88]: a framework that investigates the utility of the knowledge graph to provide explainable recommendations.

  • (5) MetapathRS [89]: a unified recommendation method with embedding-based learning and graph-based learning.

Here, we reimplement all the approaches, setting the thresholds m and n to 5 and 8. All embeddings are randomly initialized, and the learning rate is 0.001. The evaluation metrics introduced in Sect. 4.1.1 are adopted. Experimental results are shown in Table 6.

Table 6 Comparative Experiment

From Table 6, our model outperforms previous approaches on all the evaluation metrics. Compared with our model, prior knowledge graph-based recommendation approaches such as KGAT show significant gaps in long-tail recommendation. We postulate that the performance divergence stems from two factors. First, long-tail recommendation, formulated as a few-shot problem of learning-to-recommend long-tail resources with few interactions, poses extreme difficulties for the previously reported approaches. Second, the industrial knowledge graph of YL includes resources with various labels and diverse relations, and methods such as KGAT and MetapathRS that were developed for user-item graphs cannot be directly adapted to this scenario. Based on these observations, we propose handling such problems by learning robust relation-meta between potentially associated resources.

4.2.3 Applications

The platform integrates resources throughout the product life cycle based on the industrial knowledge graph. Additionally, newly arising resources produced by the business process system are added to the graph, forming a closed loop. Once YL restarts a business process, candidate resources are recommended to corresponding nodes according to the links in the knowledge graph.

Figures 5 and 6 show applications of the graph-based platform, including the resource map, the graph search engine, and the recommendation engine based on the approach proposed in Sect. 3.

Fig. 5 Graph-based platform: home page of our recommender system

Fig. 6 Graph-based platform: knowledge graph search

We further analyze an outsourcing procurement process of the YL enterprise for a case study. It is a collaborative step in the product design process. Resources involved in the 11 sub-processes are shown in Table 7. Knowledge resources are standards and patents. Business data resources include purchase requisitions, purchase order letters, test reports, sample confirmation letters, and specifications. Product resources are product samples and drawings.

Table 7 Outsourcing Procurement of YL

The core enterprise YL can manage the entire outsourcing process by the process engine in the platform, as shown in Fig. 7. Meanwhile, the industrial resources involved in the process are continuously added to the graph-based platform. The core enterprise releases resources to cooperative suppliers that upload resources back to the core enterprise for industrial process control. Figure 8 shows the knowledge graph that integrates all the resources involved in the outsourcing process. Yet, such data are still static and sparse. In this sense, our recommendation algorithm would recommend resources under low-resource conditions.

Fig. 7 Outsourcing procurement of YL: process engine

Fig. 8 Outsourcing procurement of YL: knowledge graph visualization

In Fig. 9, the black links, annotated as rel, specify the correlations among resources based on the process engine. The platform further links resources based on schema-based reasoning (blue links in Fig. 9), forming the training set for few-shot link prediction.

Fig. 9 An example of recommending industrial resources for the outsourcing procurement of YL

Specifically, the supplier BJHC (denoted as BJHC in Fig. 9) could be recommended to the core enterprise because of the link annotated as rel, based on the historical rule. Besides, according to the performance rule, the platform links the supplier ZJLB (denoted as ZJLB in Fig. 9) to the core enterprise YL (denoted as YL in Fig. 9) because both the supplier ZJLB and the supplier NBKQ (denoted as NBKQ in Fig. 9) satisfy the requirements of the specification. Third, the patents and standards (nodes in black ovals in Fig. 9) are linked to the supplier NBJH (denoted as NBJH in Fig. 9) based on the community rule. Lastly, with the path rule, several resources such as drawings can be recommended to the core enterprise YL according to the two-hop paths in the graph.

Also, schema-based reasoning generates relations and triples for few-shot recommendations. The platform conducts multi-head attention-based meta relational learning to link implicit resources for correlative recommendation. As shown in Fig. 9, the model learns the relation-meta (red links in Fig. 9) to correlate long-tail resources. For example, the supplier NBJH is linked to the core enterprise YL because both suppliers (ZJLB and NBJH) satisfy the knowledge resource requirements, and the schema-based relation in the graph serves as the heuristic rule for meta training. Moreover, the meta relational learning model learns the relation-meta for the product specification, drawing, purchase order, and suppliers shown in the graph. These industrial resources are recommended to the suppliers (NBJH and BJHC) because of their strong relevance in the outsourcing process. All the predicted links practically improve YL's outsourcing process and further complete the industrial knowledge graph.

5 Analysis and Discussion

To the best of our knowledge, we are among the first to construct an industrial knowledge graph and use few-shot relational learning to address the cold-start problem in recommender systems. We develop a meta relational learning model for recommender systems based on the intuition that the close correlations among industrial resources help in conditions where massive labeled data are unavailable. The case study demonstrates that the industrial knowledge graph contributes to resource ordering, and that our meta relational learning model with schema-based reasoning helps solve the cold-start problem in recommender systems.

Inevitably, our method also suffers from several limitations. Specifically, the construction of the resource graph is heavily influenced by the specialists using top-down methods, while the performance of bottom-up methods, such as unsupervised learning, is unsatisfactory in most industrial scenarios. Moreover, the meta relational learning model greatly affects the performance of the recommender systems that are troubled by the cold-start problem. Thus, we examine the components and modules of our model through an ablation study.

The ablation study on NELL-One uses the Hits@10 metric with four settings for evaluation. The results are shown in Table 8, where the marks (Pre-Train/In-Train) represent the two dataset settings in the experiments. First, we remove the multi-head attention sublayers, denoted as -mh. Second, we remove the training module, denoted as -g following MetaR for convenience. Third, we remove the entire relation-meta encoder and the penalty term in the training module, which reduces the model to a simple TransE model, denoted as -g -r; the result of TransE is copied from GANA [74] and corresponds to neither Pre-Train nor In-Train. The last setting is our complete model, denoted as standard.

Table 8 Ablation Study on NELL-One with Hits@10 Metric

First, removing the multi-head attention sublayers decreases performance by 29.5% and 24.4% for the 1-shot task, and by 21.5% and 11.1% for the 5-shot task, under the Pre-Train and In-Train settings respectively. Second, removing the translating embedding module decreases performance by 31.8% and 27.5% for the 1-shot task, and by 25.5% and 14.2% for the 5-shot task. Lastly, removing the entire relation-meta encoder decreases performance by 37% and 48.7% for the 1-shot task, and by 8.5% and 33.8% for the 5-shot task.

The results demonstrate that all the components of our model contribute substantially, with the relation-meta encoder contributing more than the translating module. Moreover, additional shot samples benefit few-shot learning algorithms because they enhance the robustness of the models. This also illustrates that few-shot link prediction is a practical solution to the cold-start problem of recommender systems.

However, all evaluation metrics of our model still have room for improvement, especially compared with deep learning models that have sufficient data for training. Thus, refinement and optimization of meta relational learning models are expected in the future.

6 Conclusion and Future Work

Here we propose an industrial knowledge graph to recommend resources in low-resource conditions for industrial collaboration. We construct the industrial knowledge graph with the predefined schema to enhance resource sharing and improve resource reuse. The nodes in the graph represent different resources, and the links in the graph are correlations among resources. Then the schema-based reasoning links resources to heuristically construct the training data for few-shot learning. The multi-head attention-based meta relational learning further learns the relation-meta that supports the correlative recommendation within the manufacturing value chain.

In the future, we will apply our recommendation algorithm to other domains based on the corresponding domain knowledge graphs. In addition, reinforcement learning and generative adversarial learning could be used to improve the performance of few-shot relational learning. Finally, multi-modal knowledge graphs with industrial applications are to be developed to enhance industrial cooperation.

7 List of Abbreviations and symbols

A list of abbreviations in this paper is provided in Table 9.

Table 9 Abbreviations

A symbol table for Sect. 3.2 is provided in Table 10.

Table 10 Symbol Table