Optimizing subgraph matching over distributed knowledge graphs using partial evaluation

The partial evaluation and assembly framework has recently been applied for processing subgraph matching queries over large-scale knowledge graphs in the distributed environment. The framework is implemented on the master-slave architecture, endowed with outstanding scalability. However, there are two drawbacks of partial evaluation: if the volume of intermediate results is large, a large number of repeated partial matches will be generated; and the assembly computation handled by the master would be a bottleneck. In this paper, we propose an optimal partial evaluation algorithm and a filter method to reduce partial matches by exploring the computing characteristics of partial evaluation and assembly framework. (1) An index structure named inner boundary node index (IBN-Index) is constructed to prune for graph exploration to improve the searching efficiency of the partial evaluation phase. (2) The boundary characteristics of local partial matches are utilized to construct a boundary node index (BN-Index) to reduce the number of local partial matches. (3) The experimental results over benchmark datasets show that our approach outperforms the state-of-the-art methods.

In the Semantic Web community, the Resource Description Framework (RDF) has become a de-facto standard format for knowledge graphs and has been extensively applied [1]. An RDF dataset consists of a set of triples 〈s, p, o〉 and can be transformed into a graph where the resources denoted by s and o are vertices, and the attributes denoted by p are labeled edges. SPARQL [2] is the standard graph query language on RDF graphs. A SPARQL query can be regarded as the subgraph homomorphism problem [3], which is recognized as an NP-complete problem [4,5].
The efficient processing of subgraph matching queries over large-scale RDF graphs in a distributed setting is a challenging problem. Recently, the partial evaluation technique [6] has been applied to solve the problem of regular path queries on distributed environment [7][8][9]. The queries Q are partially evaluated in parallel to obtain partial results on each fragment of data F i on each site S i , then all the partial results are transmitted to a master site. Finally, assemble these partial results to get the final results of Q.
Based on partial evaluation technique, the partial evaluation and assembly framework has been proposed to answer SPARQL queries [10]. However, a large number of partial results can be generated during the partial evaluation phase, making the assembly phase a computational bottleneck. To improve the efficiency of assembly phase, Peng et al. [11] proposed the LEC feature-based optimization strategy to prune some unpromising intermediate results. However, existing works only focus on the assembly phase of query processing, while ignoring the partial evaluation phase, and efficient index-based methods are not effectively utilized by the partial evaluation phase to speed up the search process. The following example demonstrates the drawback of computing partial matching results by the method in [11]. Fig. 1a, given a distributed RDF graph G 0 , and a query Q = (?a, spouse, ?s) ∧ (?a, director, ?b) ∧ (?b, country, ?c) ∧ (?c, capital, ?d), the query engine traverses the dataset on F i according to the query graph to obtain all the candidate sets of query variables. In fragment F i , the candidate sets of query is denoted as B g i , where the subscript g i is used to distinguish among different fragments. Based on the candidate generation strategy, the candidates of internal query nodes of query Q are B g 1  ?s and ?d are not internal query nodes because there only exists one edge connected with them in query graph Q). The candidate vertices are in yellow, green, and blue, respectively, in Fig. 1a. When computing local partial matches on each site, each candidate vertex will start a graph exploration, so that the search process will iterate six times on the first site and six times on the second. To further explore, we can find that a graph exploration from either v 2 or from v 3 can obtain the local partial match (〈?

Example 1 As shown in
As a result, this strategy can generate a large number of repeated local partial results, leading to a degradation in the performance of partial evaluation.
To handle this problem, we propose an effective optimization strategy to accelerate the partial evaluation phase by a constructed index named inner boundary node index which exploits characteristics of partial evaluation and assembly framework (the formal definition will be explained in detail in Section 4). For the example shown in Fig. 1, the following candidate sets are generated after filtered by IBN-Index: 13 , v 14 )〉, 〈?c, (v 9 )〉}. These candidate vertices of ?a, ?b and ?c are colored yellow, green and blue respectively in Fig. 1b. The size of the whole candidate sets is much smaller than the candidates mentioned in Example 1. Furthermore, the local partial match (〈?s, v 1 〉, 〈?a, v 2 〉, 〈?b, v 3 〉, 〈?c, v 9 〉, 〈?d, v 10 〉) will be only generated once which is started from v 3 on fragment F 1 .
Since the growing number of partial matches heavily influence the assembly stage, filtering out part of the partial results becomes an effective way to speed up the query. Therefore, we propose another method that utilizes constructed boundary node index (the formal definition will be explained in detail in Section 5) to filter out part of the false local partial matches in advance, reducing the cost of communication and centralized computation.
We summarize the contributions of the paper with the following three aspects: • Based on partial evaluation and assembly framework, we propose an inner boundary node index (IBN-Index) and a partial evaluation algorithm that utilizes IBN-Index to filter the candidate sets of query nodes, which can significantly speed up partial evaluation phase. • To reduce the number of local partial matches, the boundary node index (BN-Index) is constructed by exploiting the characteristics of local partial matches. Furthermore, a BN-Index-based filter algorithm is proposed. • Extensive experiments on benchmark datasets have been conducted to verify the efficiency and scalability of our method. The experimental results show that our method outperforms the state-of-the-art method.
The rest of this paper is organized as follows. Section 2 reviews the related work. In Section 3, we present the preliminaries and problem definition. An overview of the methods is depicted in Section 4. In Section 5 and Section 6, we propose the inner boundary node index and the boundary node index, with their corresponding algorithm, respectively. Finally, experimental evaluations are presented in Section 7 and we draw conclusions in Section 8.

Related work
Due to performance, confidentiality, and security factors [12,13], the cluster-based distributed data management architecture has become the inevitable research trend to deal with the knowledge graph. In this section, we will review several distributed subgraph matching research on large-scale RDF graphs, which can be classified into two categories, including MapReduce-based graph systems, and specialized RDF systems. Furthermore, existing works on partial evaluation and assembly and graph indexing methods are summarized. SHARD [14], a MapReduce-based triple store for RDF graphs, is able to process SPARQL queries, which decomposes the query graph into a set of triples (a triple containing variables) and binds variables to the vertices of the data graph by iterating on the triple patterns. Meanwhile, it is necessary to satisfy all the constraints in the query. For example, each round of the MapReduce operation adds only one query clause through the join operation. Likewise, the smallest decomposition unit of the query graph in HadoopRDF [15] is also the triple pattern, and it also utilizes the MapReduce framework to divide the RDF triples into multiple small files based on the predicate. However, both methods mentioned above ignore the structural information of the query graph, require multiple MapReduce iterations, and require a large number of join transactions, resulting in high query cost.

Specialized RDF systems
Trinity.RDF [16], a distributed in-memory key-value store, stores RDF data in the native form, with vertex identifiers as keys and adjacent lists of vertices as values. Trinity.RDF finds the optimal exploration plan and reduces the number of intermediate results using the graph exploration instead of join operations, while the final results need to be obtained using a single thread on the master node. In addition, systems based on partial evaluation and assembly framework have also been extensively and deeply studied in recent years.
The method of partial evaluation and assembly is first applied in distributed XML data management by Peter et al. [17]. The key idea is to transmit the whole query graph to each site that is partially evaluated in parallel, and after each node computes the partial results of the query, the results are transmitted as compact Boolean functions to the master node, which are combined to obtain the result. Fan et al. [7,18,19] subsequently propose a series of algorithms based on partial evaluation and assembly framework to deal with XQuery on distributed XML data, reachability query and graph simulation on distributed graph. Peng et al. [10,11] design a subgraph matching query algorithm based on partial evaluation and assembly framework to process SPARQL queries on distributed RDF data and propose relevant optimization strategies.
However, the huge overhead caused by repeated partial matches during partial evaluation is not handled in aforementioned methods. Furthermore, when the number of intermediate results obtained in the local computation phase is extensive, the aforementioned methods may suffer from a performance bottleneck in the assembly phase.

Graph indexing
As a classic space-for-time strategy, graph indexing has been researched extensively and deeply in the past years. Graph indexing methods can be classified into value-based indexing and structure-based indexing.
Through value-based methods, a graph index is usually constructed on one or more properties of an entity. SB-Tree [20] is a variant of B-Tree, which has a better performance on dynamic data. Hexastore [21] and RDF-3X [22] index the RDF data in six possible ways. Based on S-Tree [23], gStore [24] proposed VS-Tree to prune the search space efficiently.
Structure-based methods focus on mining the features of a graph, such as a path, subtree, or other substructure, indexing them to filter the search space and accelerate the query process. Closure-Tree [25] proposed graph closure, which is a generalized graph that represents several graphs, and based on it to organize graphs as a tree. Both subgraph queries and similarity queries can benefit from this method. SPath [26] takes the shortest paths around the vertex neighborhood as the basic process unit and decomposes the query into a set of shortest paths to exploit its indexing. K-path-bisimulation [27] is a path-based index, the path with identical length and the same edge label sequence is divided into the same catalog, and the query is decomposed into a set of paths.
In this paper, the features of partial evaluation and assembly framework are exploited profoundly, and two kinds of indexes are constructed to compensate for the shortcomings of previous methods. Although the proposed index-based methods are value-based, the structural information of the graph is also included in the indexes.

Preliminaries
Let U and L be the disjoint infinite sets of URIs and literals. Then, a tuple in the form of 〈s, p, o〉 ∈ U × U × (U ∪ L) is called an RDF triple, where s is the subject, p the predicate, and o the object. Given an RDF dataset as a finite set of triples, it can be converted to its corresponding RDF graph. In this paper, we focus on the problem of subgraph matching query over a distributed RDF graph. This section will present preliminaries for distributed RDF graphs and subgraph matching queries. Table 1 lists the notations frequently used in this paper.

Definition 1 (RDF Graph)
Given an RDF dataset as a finite set of triples in the form 〈s, p, o〉, its corresponding RDF graph is G = (V, E, Σ), where the set of vertices V is the union of all s and o. For each 〈s, p, o〉, there is a directed edge e ∈ E from the vertex s to the vertex o, where p is the label of that edge e. Here, Σ is the set of all labels, i.e., Σ = {p | 〈s, p, o〉 ∈ G}.
To ensure the integrity and consistency of the RDF graph when partitioned in a distributed system, each computing node needs to store some copies of the edges that cross between different entity sets. Let the copy of the associated edges with other partition be denoted as N c i .
In other words, G can be considered as a distributed RDF graph w.r.t. F , such that: and Σ i represent the set of internal vertices, the set of edges, and the set of labels in F i , respectively. Formally, where E c i is the set of crossing edges between F i and other fragments. If an internal vertex of F i has a direct edge with any vertex Let S = {S 0 , S 1 , … , S n } be a set of n + 1 computing nodes, i.e., sites, in a cluster. Without loss of generality, each fragment F i is stored at a slave site S i for i ∈ {1, … , n}. Fig. 2, given a distributed RDF graph G 1 extracted from the DBpedia dataset, G 1 can be divided into four parts F = {F 1 , F 2 , F 3 , F 4 }, which are respectively  15 ), (v 13 , v 11 ), (v 11 , v 12 )}. The copy between F 2 and other frag-

Example 2 As shown in
In particular, we colored the incoming and outgoing vertices of fragment F 2 in blue.
Given an RDF graph G and a query graph Q as a set of triple patterns, a subgraph matching problem is to find the subgraphs over G that satisfy all the triple patterns in Q. Such a subgraph matching problem is a conjunctive query (CQ) on G, which is the focus of this paper. In the following, we formally present the query graphs and the other necessary definitions adapted from [28].
A query graph includes m triple patterns 〈s r , p r , o r 〉, where the value of each s r , o r can either be a member of V, or 'not labeled'. If a s r or o r is 'not labeled', s r or o r belongs to a special set Var, and the name of each element in Var starts with the character '?'. Similarly, the value of each p r can either be a member of Σ, or Var.

Definition 4 (Query Graph) Given an RDF graph
A CQ Q is also referred to as a query graph.
Before defining subgraph matching, we recapitulate certain definitions of the mapping. For a mapping μ, dom(μ) is its domain. Two mappings μ 1 and μ 2 are compatible, i.e., μ 1 ∼ μ 2 , iff for every element v ∈ dom(μ 1 ) ∩ dom(μ 2 ), it holds that μ 1 (v) = μ 2 (v). Furthermore, the set-union of two compatible mappings, i.e., μ 1 ∪ μ 2 , is also a mapping. Q consists of five query vertices and its semantic is to find the films directed by a person with his spouse, the film's country and the capital of the country. The corresponding query graph is shown in Fig. 3, with one of the query results being highlighted in purple in Fig. 2.

Overview
The partial evaluation and assembly framework is extended to answer SPARQL queries over a distributed RDF graph G, as shown in Fig. 4. In the execution model, there are two phases: the partial evaluation phase and the assembly phase. In addition, two optimization strategies are designed and embedded in this framework. Before the query starts, the entire RDF graph G is divided into multiple fragments according to a certain partitioning strategy, and an index named BN-Index is built on each fragment. The fragments and corresponding BN-Index are then transmitted to each site, and an index named IBN-Index is further constructed on each site locally. When querying, the master node sends the entire query graph to all slave nodes, and the subsequent partial evaluation phase can be summarized into three processes. (1) Each site S i first receives the complete query graph Q and finds all candidate sets of the query graph variables. (2) The query engine uses the IBN-Index to filter the candidates, and executes the graph exploration algorithm according to the filtered candidates to find local matches.
(3) Finally, BN-Index is utilized to filter the local partial matches after graph exploration.
The local partial matches are then sent to the master site to compute the complete SPARQL matches, which is called the assembly stage. Benefiting from the filtering effect of BN-Index, the number of partial matches is drastically reduced, which alleviates the assembly bottleneck problem to a certain extent.
To better illustrate the effect of IBN-Index, it is necessary to explain the partial evaluation process of gStoreD in detail. After each site S i receives the complete query graph, the candidate set of each query variable is obtained according to the predicates connected with the variables. Specifically, the vertices in the candidate set can be classified into internal candidate vertices and boundary candidate vertices. The internal candidate vertices denote the vertices contained in the subgraph F i allocated on S i , while the boundary candidate vertices denote those vertices connected with F i but do not belong to it. After obtaining the  Fig. 4 Overview of the optimized partial evaluation and assembly method candidate sets, all the sites transmit the internal candidate vertices to the master site, and the collection of all internal candidates is resent to all sites.
In order to find partial results on F i , graph exploration starts with each internal candidate vertex of the internal query variables. Specifically, for a query graph, the query variables can be classified into internal query variables and satellite variables, depending on the edges connected with the query node. If a query node only has one edge, it is denoted as a satellite variable.
The reasons why we choose these special candidate vertices as the starting vertices are considered from two aspects. (1) A boundary vertex is also an internal vertex on another site simultaneously so that it will be set as a starting vertex on that site, and the path connected with it will not be lost. (2) If a partial match on S i only matches a satellite node, it will also be found on other sites. Take the partial match in purple on S 1 in Fig. 2 as an example, the partial match will be found on S 2 and connected with v 6 as a complete SPARQL match. Therefore, only starting with internal query variables will not lose any partial results.
At each expansion step of graph exploration, query engine judges whether the matching node belongs to the collection of all the internal candidates. The graph exploration process will not stop until all maximal partial matches are obtained.

Inner boundary node-based algorithm
Recall that in partial evaluation and assembly framework, given a distributed RDF graph G, each site S i receives a part of the graph F i and constructs IBN-Index according to the subgraph. Then, when answering query Q, each site S i computes local partial matches utilizing the constructed index. In this section, the structure of the IBN-Index is defined, and the IBN-Index construction algorithm is introduced. Then we present the subgraph matching algorithm utilizing IBN-Index on each site and give the complexity analysis of the proposed method.

Inner boundary node
The local partial match computation algorithm based on partial evaluation and assembly framework have been proposed in [10]. First, an in-depth analysis of the performance problem of existing work in computing local matches during partial evaluation is carried out, and on this basis, an optimization using the inner boundary node index is proposed.
When computing local partial matches, each internal candidate vertex of the candidate vertices sets starts the graph exploration to find the local partial matches. It should be noted that it is not necessary to traverse all internal nodes as the starting point of graph exploration. The reasons can be considered from the following two aspects.
(1) Intuitively, a complete SPARQL match only needs to be traversed from one vertex so that the candidate vertices in other candidate sets can be discarded. (2) Besides, as mentioned in [10], a local partial match is the overlapping part of an unknown crossing match and a fragment F i . There must be a crossing edge derived from an internal vertex connected with other fragments. Therefore, only searching from the vertices connected with other fragments can get all the partial matches. These kinds of vertices are defined as inner boundary nodes, and the formal definition is given as follows: Definition 6 (Inner Boundary Node.) Given a distributed RDF graph Q and a fragmentation F = {F 1 ,...,F n }. In fragment F i = E i ∪ N c i , if an internal vertex v of F i has a direct edge with any vertex u in F j , where i≠j, then v is an inner boundary node.
Based on inner boundary nodes, the internal entity set V i of fragment F i can be divided into two mutually exclusive subsets, pure internal node set P i and inner boundary node set D i . Formally, The definition of the inner boundary node index is presented as follows: Definition 7 (Inner Boundary Node Index.) Given a fragment F i of RDF graph G, the inner boundary node index, i,e., IBN-Index of F i is a key-value map I IBN where 1) for any tuple (v, tag) ∈ I IBN , the key is a vertex v ∈ V i , and the value tag is a Boolean value denoting if v is an inner boundary node or not; if a vertex v is an inner boundary node, its corresponding tag will be set to True, otherwise it will be set to False; 2) for any vertex v ∈ V i , there exists a unique tuple in I IBN with v as the key and a Boolean tag as the value. Fig. 2 To improve space efficiency of IBN-Index, the strategy of dictionary encoding is adopted, such that each vertex is encoded into an integer by hash operation.

Example 4 As shown in
The construction approach of the IBN-Index is shown in Algorithm 1. First, the inner boundary node set and IBN-Index are initialized by the fragment identifier F i (line 1). Then, for each triple 〈s, p, o〉, if it is a crossing edge and the subject s (or object o) is an internal vertex, the s (or o) is put into the inner boundary node set D i (lines 2-6). For each node v in the internal node set V i , if it is also an inner boundary node in D i , a mapping (v, True) will be put into I IBN i ; otherwise, a mapping (v, False) will be put into I IBN i (lines 7-11).

IBN-index based partial evaluation
In order to answer query Q, each site S i computes the local partial matches based on the known fragment F i . The formal definition of local partial match was defined in [10]. Intuitively, a local partial match PM i is an overlapping part between a crossing match M and fragment F i . Algorithm 2 describes the local partial match computation process utilizing the IBN-Index. The key idea is to use the inner boundary node index to filter the candidate sets, thereby reducing the search space when finding local partial results. Since the result of partial evaluation can be divided into complete SPARQL matches and local partial matches, the correctness of Algorithm 2 can be considered from the following two aspects. (1) Since some complete SPARQL matches contain only pure internal vertices (e.g., the dashed partial match on S 1 in Fig. 2), i.e. they do not have any inner boundary nodes, in order to ensure that all complete SPARQL matches are obtained, we keep a complete candidate set in which all vertices will start graph exploration (line 8). Here a greedy strategy is applied, selecting the candidate set with the smallest size as the reserved set (line 6). (2) In other candidate sets, only the candidate vertex judged as an inner boundary node will start the graph exploration (lines [13][14][15][16][17][18][19].

Example 5
We take site S 1 as an example. As shown in Fig. 2, for query Q, the candidate sets of each internal query variables on site 〈?c, (v 33 )〉}. Firstly, the candidate sets are sorted to find the candidate set with the smallest size and the set 〈?c, (v 33 )〉 is reserved. In the candidate set of ?a, the nodes v 2 and v 31 are not inner boundary nodes according to the IBN-Index, so they are all filtered out. As for the candidate set of ?b, v 3 will not be filtered, while v 32 will be discarded. The filtered set of candidate sets is B g1 = {〈?a, (v 8 )〉, 〈?b, (v 3 )〉, 〈?c, (v 33 )〉}. As a result, graph exploration can find all partial matches on S 1 starting only from v 8 , v 3 , and v 33 . Likewise, the candidate sets on other sites are also filtered out of a large number of candidate vertices using the same method.

Filter local partial matches with boundary node index
As shown in Fig. 5, since the RDF data graph is distributed and stored on multiple sites, the boundary node on each site becomes a bridge connecting any two sites. Node 1 (in blue) on F 1 is a boundary node for F 2 , while it is an internal node in the view of F 1 . As a result, it is named an inner boundary node on F 1 , while it is a boundary node for F 2 . In partial evaluation and assembly framework, after the local partial matches are obtained, all of them are sent to the master site uniformly. However, not all the partial results can continue to be joined to form a complete SPARQL match on the master site. Therefore, it is unnecessary to transmit local partial matches, which are not associated with partial matches on other sites, to the master site. Based on the above problem, this paper proposes an optimization strategy for pre-judging edge labels from boundary nodes to reduce the communication overhead between local sites and the master site.
The definition of boundary nodes (see the definition of V e i ) has been given in Section 3, which means vertices that belong to other fragments but are directly connected to internal vertices in F i . For an RDF graph G, when dividing the data, we record each boundary node's out-edge and in-edge information on the fragment F i as boundary node index. The formal definition of boundary node index is as follows:  Fig. 6, the local partial matches on S 3 is PM 3 = {(〈?s, null〉,

Example 7 As shown in
According to the boundary node index, the last two local partial matches could not constitute any final results. Then these two matching pairs will not be sent to the master node as a message.
The filtering partial matches process is presented briefly in Algorithm 5. On site S i , each partial match is iterated to judge on the boundary nodes (line 3). Only when the value (also known as the predicates belong to other fragments) of BN-Index of the boundary node contains the unmatched predicates on the query graph, can the partial match be further matched and reserved to transmit to the master site.

Experimental evaluation
In order to verify the effectiveness and efficiency of the IBN-Index method and BN-Index filtering method under the partial evaluation and assembly framework, a comparative experiment with gStoreD [10,11] was performed over several benchmark RDF datasets. The proposed algorithm is implemented on top of gStoreD, and is deployed on a 3-node cluster, of which each node is in a Docker. The three dockers are all deployed on a machine with 16 cores Intel Xeon Silver 4216 2.10 GHz processors, 512 GB of RAM, and 1.92 TB SSD, running the 64-bit CentOS 7.7 operating system.

Datasets and queries
In this experiment, the proposed method and gStoreD are evaluated using the LUBM [29] synthetic benchmark dataset of different scales. The statistics of the datasets are shown in Table 2. We need to compare the query efficiency of the method based on IBN-Index, the method based on BN-Index and the combination of the two, and the original gStoreD on different queries. It is necessary to gradually change the number of intermediate results that the query satisfies while limiting the basic structure of the query. Therefore, we choose benchmark datasets rather than real-world datasets to keep the dataset size positively correlated with the number of intermediate results.
In addition, to eliminate the impact of the partitioning strategy on query performance, we use a random partitioning method to divide each dataset into four fragments. As shown in Table 3, eight complex queries of different scales on the LUBM dataset are presented, i.e., Q 1 ∼ Q 8 . To exhibit the effect of IBN-Index and BN-Index, the proposed queries may generate large amount of intermediate results in the partial evaluation phase, as showed in Fig. 7a.

Experimental results
To verify the effectiveness of the IBN-Index and BN-Index based partial evaluation algorithm, extensive experiments were conducted.

Exp 1. Number of partial matching results.
To make it more intuitive to observe and evaluate the performance of the partial evaluation algorithm with different queries and datasets, the number of partial matching results and complete SPARQL matches of Q 1 ∼ Q 8 over LUBM10, LUBM20, and LUBM30 is recorded. The maximal total number of all partial evaluation results (including complete SPARQL matches and local partial matches) generated from each site are depicted in Fig. 7a, which determines the time consumption of the partial evaluation and influences assembly phases. As shown in Fig. 7a, for Q 1 , the number of partial evaluation results increases approximately linearly with the size of the datasets. For the query Q 2 ∼ Q 8 , their partial evaluation results on LUBM20 are the most, which is affected by the partitioning strategy.    For IBN-Index, since the value of each node is only a Boolean value, the space required for the index is small, which guarantees the time and space complexity of the proposed method. As for BN-Index, the space required is correlated with the number of boundary nodes, which depends on the graph partitioning strategy. Although the random partitioning method we use will produce a large number of intermediate results, the experimental results prove the time and space complexity of BN-Index. Overall, the size of the BN-index is proportional to the size of the graph except that on LUBM20. The reason is that the number of crossing edges on LUBM20 is even more than that on LUBM30, which can be verified by the partial evaluation results depicted in Fig. 7a.

Exp 3. Efficiency of IBN-Index Based Optimization.
A measurement of the statistical consequences of graph exploration time in the partial evaluation phase of Q 1 ∼ Q 8 over different datasets is shown in Fig. 9. It can be observed that the partial evaluation method based on IBN-Index outperforms gStoreD on all queries and can improve query performance by 1.64 times in the best case. Furthermore, the method combining IBN-Index and BN-Index (the grey lines) improves the query efficiency by 1.79 times in the best case. As the size of the dataset increases, the proportion of time reduction also increases.
Interestingly, the optimization becomes more significant as the number of query nodes increases. The reasons can be summarized in two aspects. (1) When dealing with more  Fig. 9 The graph exploration in partial evaluation on LUBM datasets query nodes, the number of candidate sets also rises, leading to more repetitive partial evaluation results. Therefore, the IBN-Index can be affected on more candidate sets resulting in a better pruning efficiency. (2) As the length of the result of partial evaluation expands, the candidate nodes are more likely to be pure internal nodes that can be filtered by IBN-Index.

Exp 4. Efficiency of BN-Index Based Optimization.
Although gStoreD already has a partial match filtering strategy, the BN-Index-based method could have the same filtering effect and even higher efficiency than gStoreD. As shown in Fig. 9, the efficiency of graph exploration during the partial evaluation of the BN-Index-based method (the blue lines) exceeds gStoreD in most cases, and the strength is expanded as the scale of datasets grows. The reason is that when dealing with a large number of candidate vertices, gStoreD collects all the internal candidate vertices from all slave sites and transmits them back to all sites, which enlarges the searching space of graph exploration seriously. However, BN-Index will not enlarge the candidate sets.
Exp 5. Scalability. To prove the scalability of the IBN-Index and BN-Index based methods, the whole query times of the improved partial evaluation and assembly method over eight queries are presented in Fig. 7b. It is obvious that the query time of our method is nearly linear with the scale of the datasets. Unfortunately, all runs of query Q 3 , including gStoreD and the IBN-Index and BN-Index based methods, are failed on the LUBM30 dataset due to the limited memory.

Conclusion
In this paper, we proposed an inner boundary node index-based method and a boundary node index-based method to improve the computing efficiency of the subgraph matching queries in distributed settings based on the partial evaluation and assembly framework. Moreover, we also proved that the IBN-Index and BN-Index are both time-efficient and space-effective. The extensive experimental results on benchmark datasets verified the efficiency and scalability of the proposed method, which clearly outperforms gStoreD when large-scale intermediate results need to be processed.