FLAG: Towards Graph Query Autocompletion for Large Graphs

Graph query autocompletion (GQAC) takes a user’s graph query as input and generates top-k query suggestions as output, to help alleviate the verbose and error-prone graph query formulation process in a visual interface. To compose a target query with GQAC, the user may iteratively adopt suggestions or manually add edges to augment the existing query. The current state-of-the-art of GQAC, however, focuses only on large collections of small- or medium-sized graphs. The subgraph features exploited by existing GQAC are either too small or too scarce in large graphs. In this paper, we present Flexible graph query autocompletion for LArge Graphs, called FLAG. We are the first to propose wildcard labels in the context of GQAC, which summarize query structures that differ only in their labels. FLAG allows augmenting users’ queries with subgraph increments containing wildcard labels to form suggestions. To support wildcard-enabled suggestions, a new suggestion ranking function is proposed. We propose an efficient ranking algorithm and extend an index to further optimize the online suggestion ranking. We have conducted a user study and a set of large-scale simulations to verify both the effectiveness and efficiency of FLAG. The results show that the query suggestions save roughly 50% of mouse clicks and FLAG returns suggestions in a few seconds.


Introduction
Researchers and practitioners perform different types of queries on large graphs [30]. Formulating a subgraph matching query, among others, requires significant user effort. A popular approach to providing query formulation aids is to build visual query interfaces (a.k.a. GUIs) that facilitate the drawing of query graphs in an easy and intuitive manner. Real-world visual query interfaces (e.g., PubChem, ChemSpider, and Scaffold Hunter) are already offered. However, composing graph queries in a visual environment may still be cumbersome. To alleviate the burden of visual graph formulation, graph query autocompletion (GQAC) [36,37] has been proposed. Consider a scenario in which a user formulates a target query graph q t iteratively via the GUI. Given an existing partially formulated query graph q, GQAC aims to suggest a subgraph increment Δq to q to form a query suggestion, such that the suggestion is closer to the target query q t .
Since users' intentions are hard to predict, GQAC typically returns k suggestions on a visual interface for users to choose from. An example GUI, the user's current query, and the suggestions of GQAC are shown in Fig. 1. We mimicked the example figure style of a related work [37] for presentation consistency.
Existing studies only consider the GQAC problem for large collections of small graphs, e.g., chemical databases, and cannot be directly applied to large graphs. In particular, previous studies construct query suggestions based on some popular substructures (a.k.a. features) of the graph data. This assumes users want to construct queries to retrieve some graphs. For instance, we may set the minimum support of frequent substructures to 10% of the dataset size for PubChem and obtain approximately a thousand features for GQAC. In large graphs, however, such features are smaller in size and scarcer in quantity. For example, frequent subgraphs are very few in CiteSeer: fewer than 10 were reported for various support threshold values [8]. This phenomenon leads to two main challenges of GQAC for large graphs. First, there are a large number of distinct subgraphs, each of which has small support in the graph. Candidate suggestions generated from them are numerous but rare in the graph data. Further, the visual interface shows only k suggestions, and in practice humans can interpret only a small set of suggestions. Such k suggestions may not be useful. We illustrate the second challenge with Example 1.
Example 1 Suppose the current query is q. The first suggestion in Fig. 1 ( q + Δq 1 ) increments q by one edge, which may not save much query formulation effort. The last three suggestions increment q by two edges. They have, however, become overly specific: each of them appears only a few times in the data graph. The three suggestions also occupy a relatively large area of the GUI. It is desirable to summarize the specific suggestions efficiently, rank the generalized one high, and leave room for others.
To address the aforementioned challenges, we propose Flexible graph query autocompletion for LArge Graphs (FLAG). To tackle the first challenge, we introduce the wildcard label to GQAC. A wildcard label represents any label of the data graph. It is suitable for GQAC on a large graph for two reasons. First, FLAG can provide suggestions that contain wildcard labels. An example is shown as q ′ 2 of Fig. 2. q ′ 2 summarizes suggestions q + Δq 2 , q + Δq 3 , and q + Δq 4 of Fig. 1. It is evident that q ′ 2 summarizes (or generalizes) the three suggestions, each of which has only little support from the graph data, and spares space among the top-k suggestions for others. Second, wildcards can be used naturally when users are not sure about the labels of the nodes/edges of the query graph; FLAG still suggests new edges. To avoid having wildcards appear in arbitrary places of query suggestions, we propose well-formed suggestions.
To address the second challenge, we introduce query generalization and query specialization of suggestions for GQAC. Query specialization is an operator for augmenting an existing query into one that is closer to the target query. It also quantifies how much a suggestion augments the existing query. In each specialization, the user either i) adds a wildcard edge or ii) changes a wildcard label to an exact label. The introduction of wildcard labels does not alter the asymptotic complexity of query processing (e.g., subgraph matching) or graph autocompletion (e.g., suggestion ranking). Next, we propose query generalization, which is the opposite of query specialization. Recall from Fig. 2 that three suggestions are generalized into one so that the support of the generalized suggestion becomes higher. q ′ 2 is more specific than q + Δq 1 but more generalized than q + Δq 2 , q + Δq 3 and q + Δq 4 .
Wildcard-enabled GQAC may generate numerous candidate suggestions, and ranking them can be inefficient. We propose a novel linear submodular ranking function that involves not only a query suggestion's specialization of the current query but also its summarization of the possible candidate suggestions. Specifically, we propose the specialization value to quantify how much a suggestion augments the existing query, and the summarization value to quantify how many candidate suggestions a suggestion summarizes. The approximation of the ranking function is differentiable. Hence, we can adopt a stochastic gradient descent algorithm to learn the parameters of the ranking function. It is also not surprising that the ranking problem is NP-hard. Since the ranking function is submodular, we propose an efficient greedy algorithm for computing the top-k suggestions. To further optimize efficiency, we extend an existing index with support for wildcards for ranking.

Fig. 2 Example of a query and the suggestions with wildcards

When query logs (which are also graphs) are available, GQAC may generate suggestions from them as well.
In conclusion, this paper makes the following contributions.
1. We propose wildcard labels for query graphs and query suggestions. We propose a notion of well-formed wildcard graphs for GQAC.
2. We propose the specialization value and the summarization value to measure how much a suggestion specializes an existing query and summarizes other candidate suggestions.
3. We propose a ranking function based on these two values.
4. To optimize the efficiency of online ranking of query suggestions, we present the techniques needed to extend an existing index for wildcard-enabled GQAC.
5. We use a stochastic gradient descent algorithm to learn the parameters of the ranking function in experiments. We investigate the usefulness and efficiency of FLAG via a user study and extensive simulations. The results show that FLAG saves about 50% of mouse clicks in query formulation and the suggestions are returned within several seconds under a large variety of settings.
The rest of the paper is organized as follows. Section 2 provides the background of GQAC. Section 3 proposes wildcard labels for GQAC. Section 4 proposes the specialization value and the summarization value for query suggestions. Section 5 provides details of the efficient online suggestion ranking. We present a performance study in Sect. 6. We discuss the related work in Sect. 7. Section 8 concludes the paper and presents some future work.

Preliminaries
This section provides the preliminaries of graph query autocompletion (GQAC) and presents the problem being studied. Some frequently used notations are listed in Table 1.

Graph Data
We consider a single large graph G = (V, E, l) , consisting of a set of nodes V, a set of edges E, and a labeling function l that assigns labels to nodes and edges. The size of a graph is defined by |E|, and deg(v) denotes the degree of a vertex v. For example, Fig. 1 shows the CiteSeer graph. Node labels represent the area of the publication (e.g., DB, DM, IR) and edge labels represent the distance between the pair of publications. This dataset will be used for subsequent examples. For presentation simplicity, all examples illustrate undirected graphs with a single label for each node and edge.

Query Formalism
This paper adopts subgraph isomorphism, a popular and fundamental query formalism, for the technical discussions. Subgraph isomorphism is recalled below.

Definition 1 (Subgraph isomorphism) Given two graphs g = (V, E, l) and g ′ = (V ′ , E ′ , l ′ ), g is subgraph isomorphic to g ′ , denoted as g ⊆ g ′ , if there is an injective function λ: V → V ′ such that i) for each v ∈ V, Θ(l(v), l ′ (λ(v))) holds and ii) for each (u, v) ∈ E, (λ(u), λ(v)) ∈ E ′ and Θ(l(u, v), l ′ (λ(u), λ(v))) holds, where Θ(l 1 , l 2 ) = true iff l 1 = l 2 . Multiple subgraph isomorphic embeddings of g may exist in g ′ , denoted as λ 0 , λ 1 , …, λ m . For succinct presentation, we refer to each λ i as an embedding λ, when the subscripts and superscripts are clear from or irrelevant to the context.

Definition 2 (Subgraph query) Given a single large graph G and a query graph q, the answer (or result set) of q is G q = {λ | λ is an embedding of q in G}.
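To make Definitions 1 and 2 concrete, the following is a minimal sketch of embedding enumeration. The dict-based graph encoding (node id → label, frozenset edge → label) and the function name are our own; the brute-force search is only practical for the tiny example graphs used in this paper, not for real large graphs.

```python
from itertools import permutations

def embeddings(q_nodes, q_edges, g_nodes, g_edges):
    """Enumerate the subgraph-isomorphic embeddings of query q in graph g.

    q_nodes/g_nodes: dict node_id -> label.
    q_edges/g_edges: dict frozenset({u, v}) -> label.
    Returns a list of dicts mapping each query node to a data node.
    Brute force over injective node assignments (permutations).
    """
    results = []
    q_ids = list(q_nodes)
    for cand in permutations(list(g_nodes), len(q_ids)):
        m = dict(zip(q_ids, cand))
        # node labels must match exactly (Theta of Def. 1)
        if not all(q_nodes[u] == g_nodes[m[u]] for u in q_ids):
            continue
        # every query edge must exist in g with a matching label
        if all(
            frozenset({m[u], m[v]}) in g_edges
            and q_edges[e] == g_edges[frozenset({m[u], m[v]})]
            for e in q_edges for (u, v) in [tuple(e)]
        ):
            results.append(m)
    return results
```

The answer of a subgraph query (Def. 2) is then simply the list this function returns on the data graph G.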

Visual Graph Query Construction
Graphs and their query graphs can be intuitively displayed and drawn in a visual environment (e.g., the GUI in Fig. 1). In the process of visual query construction, the user draws the current query q in the query editor while having the target query q t in mind; he/she then performs an action (e.g., adding an edge or subgraph to q) to make the current query closer to the target. Performing this process manually can be error-prone.

Graph Query Autocompletion (GQAC)
A visual environment often provides visual aids for query construction, in addition to the basic constructs (e.g., the node and edge labels shown in the label panel of Fig. 1).
Recently, Yi et al. [36] proposed GQAC, which aims at relieving users of cumbersome actions by providing useful subgraph suggestions. The process of GQAC is sketched in Fig. 3. Here, we present its major steps. The details related to FLAG are postponed to later sections.
1. GQAC takes the user's current query q and the user's suggestion preference as input. Voluminous candidate query suggestions are generated and ranked. A query suggestion is a graph that augments the current query with structure and/or labels. Note that the increments to the query can be subgraphs. A small set of ranked query suggestions is efficiently generated for the user's review.
2. The user composes the query by either adopting a suggestion or manually adding other edges.
3. The above steps are repeated until the target query is constructed.

Formalizing GQAC
Recall that query suggestions are formed by incrementing the current query with a subgraph. The current state-of-the-art of GQAC [36] exploits the concept of graph features (or simply features). Graph features are generally understood as subgraphs that carry important characteristics of graph data. Features have also been considered the tokens of GQAC. For example, an existing work of GQAC [36] decomposes the current query into a set of features and augments the current query with another feature to form a query suggestion.
The intuition is that users may want to specify some characteristics of the graph in their target queries. While existing work uses c-prime features as the features for GQAC, other features can be plugged into GQAC, depending on the users' applications. The composition of two subgraphs (incrementing a subgraph with another) can be intuitively understood as a one-step construction of a query suggestion, which can be formally defined as a function. We recall some relevant definitions below.
Definition 3 (Common subgraph (cs)) Given two graphs g 1 and g 2 , a common subgraph of g 1 and g 2 is a connected subgraph containing at least one edge that is a subgraph of both g 1 and g 2 (denoted as cs(g 1 , g 2 ), or simply cs when g 1 and g 2 are clear from the context), i.e., cs ⊆ g 1 via λ 1 and cs ⊆ g 2 via λ 2 , for some embeddings λ 1 and λ 2 .

Definition 4 (Query composition) [36] Composition is a function that takes two graphs, g 1 and g 2 , and the corresponding embeddings (λ 1 and λ 2 ) of a common subgraph cs as input, and returns the graph g that is composed from g 1 and g 2 via λ 1 and λ 2 of cs, respectively, denoted by g = compose(g 1 , g 2 , cs, λ 1 , λ 2 ).

Example 2 An example of query composition is shown in Fig. 4. Assume that f 10 is the current query and f 13 is the graph feature used to increment f 10 . Then, g is the query graph formed by adding f 13 to f 10 via the common subgraph f 4 , i.e., g = compose( f 10 , f 13 , f 4 , λ 1 , λ 2 ). The increment is highlighted in blue with the gray background. The embeddings λ 1 and λ 2 specify the locations of f 4 in f 10 and f 13 , respectively.
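A one-step query composition (Def. 4) can be sketched as follows. The encoding (label dicts, frozenset edges) and the function name `compose` are our own; the embeddings are given as dicts mapping each common-subgraph node to its image in g1 and g2, respectively.

```python
def compose(g1_nodes, g1_edges, g2_nodes, g2_edges, emb1, emb2):
    """Glue g2 onto g1 via a common subgraph cs (sketch of Def. 4).

    emb1/emb2: dict cs_node -> node in g1 / node in g2. Nodes of g2 that
    are images of cs nodes are merged with their g1 counterparts; the
    remaining nodes and edges of g2 form the increment.
    """
    # map each g2 node onto the merged graph: cs-matched nodes reuse g1 ids
    to_g1 = {emb2[c]: emb1[c] for c in emb1}
    nodes = dict(g1_nodes)
    edges = dict(g1_edges)
    for v, lab in g2_nodes.items():
        if v not in to_g1:
            to_g1[v] = ("new", v)          # fresh id for an increment node
            nodes[to_g1[v]] = lab
    for e, lab in g2_edges.items():
        u, v = tuple(e)
        # shared edges keep the g1 label; new edges carry the g2 label
        edges.setdefault(frozenset({to_g1[u], to_g1[v]}), lab)
    return nodes, edges
```

For instance, gluing a DB-DM edge feature and a DB-DM-IR path feature along their common DB-DM edge yields a three-node query, mirroring how f 13 is attached to f 10 via f 4 in Example 2.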
Definition 5 (Useful suggestion) Given a target query q t and an existing (or current) query q, a query suggestion q ′ is useful if and only if q ⊂ q ′ and q ′ ⊆ q t , for some embeddings λ 1 and λ 2 .

Fig. 4 An illustration of query composition: forming a large query graph from small graphs

As motivated, users' target queries are hard to predict. Recently, GQAC systems have proposed various ranking mechanisms (according to users' preferences and a ranking function) to efficiently compute a small list of suggestions with the hope that they are useful. Some ranking factors include the result counts of the suggested queries and the structural diversity of the suggestions. It is not surprising that the suggestion ranking problems are generally intractable; hence, greedy algorithms have been proposed to efficiently rank the useful query suggestions.

Problem Statement
Given a large graph G, an existing query q, a ranking function, a user preference, and a parameter k, the paper investigates how to return the query suggestions Q ′ k = {q ′ 1 , …, q ′ k }, where each q ′ i is composed by adding an increment to q and Q ′ k is the top-k set of suggestions w.r.t. the ranking function and the user preference.
To the best of our knowledge, this paper is the first work that computes query suggestions for querying a single large graph and wildcards for GQAC have not been proposed before.

Wildcard Labels for GQAC
In this section, we propose wildcard labels to generalize similar substructures into a summary structure. We further discuss how to introduce wildcard labels to the process of GQAC (e.g., graph features and query compositions).

Wildcard Labels and Graphs
A wildcard label (or simply wildcard) represents any possible label of nodes/edges and is assigned to new unlabeled nodes and edges by default, meaning that the labels are not yet specified. Figure 5 shows an ordinary feature ( f 13 ) and features having a wildcard label on an edge ( f 8 and f 9 ). The query formalism of subgraph isomorphism can be readily extended with wildcards by simply replacing the matching function Θ of Def. 1 with Θ * , where Θ * (l 1 , l 2 ) = true iff l 1 = "*" or l 1 = l 2 .
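The wildcard-aware matching function is a one-line change; the sketch below (our own encoding, with "*" as the wildcard constant) makes its one-directional nature explicit: a wildcard in the query matches any data label, but not vice versa.

```python
WILDCARD = "*"

def match_label(l_query, l_data):
    """Theta* of the extended Def. 1: a query label matches a data label
    iff the query label is the wildcard or the two labels are equal.
    Note the asymmetry: wildcards appear on the query side only."""
    return l_query == WILDCARD or l_query == l_data
```

Substituting this predicate for exact label equality in any subgraph matching routine yields wildcard-enabled matching without changing its asymptotic complexity, as noted in the introduction.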
Definition 6 (Wildcard graph) A graph with wildcard labels "*", denoted as G * , 8 is defined as a 3-ary tuple (V, E, l * ), where V and E are the node and edge sets and the label function l * assigns an ordinary label or "*" to each node or edge.
A wildcard can be introduced to query graphs manually by users or suggested by GQAC. When introducing wildcards to GQAC, the features to be added to an existing query must allow wildcards. However, this leads to an exponential blowup in the number of features used in existing GQAC for constructing query suggestions. Having too many wildcards in queries or suggestions is not only computationally costly to generate and rank but also confusing to users. Furthermore, suggestions having wildcards should be neither too generic nor too specific with respect to the closest ordinary suggestion. To this end, we restrict wildcards to occur only at leaf nodes/edges (see Def. 7). Hence, users may often expand their query graphs at the boundaries.
Definition 7 (Well-formed wildcard graph) A graph G * is a well-formed wildcard graph if it is a wildcard graph and all wildcard labels are on one leaf edge and the incident leaf node.
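A possible check for Def. 7 is sketched below, under our reading that the set of wildcard-labeled elements must be confined to a single leaf edge and/or the leaf node incident to it. The graph encoding and function name are ours.

```python
def is_well_formed(nodes, edges):
    """Def. 7 sketch: every '*' label lies on one leaf edge and/or the
    leaf node incident to it. nodes: id -> label; edges: frozenset -> label."""
    deg = {v: 0 for v in nodes}
    for e in edges:
        for v in e:
            deg[v] += 1
    wc_nodes = {v for v, lab in nodes.items() if lab == "*"}
    wc_edges = {e for e, lab in edges.items() if lab == "*"}
    if not wc_nodes and not wc_edges:
        return True                      # an ordinary graph is trivially well-formed
    if len(wc_edges) > 1 or len(wc_nodes) > 1:
        return False
    # all wildcards must sit on one leaf edge / its incident leaf node
    for e in edges:
        leaves = [v for v in e if deg[v] == 1]
        if leaves and wc_edges <= {e} and wc_nodes <= set(leaves):
            return True
    return False
```

For example, a path DB-DM-* whose last edge is also labeled "*" is well-formed, whereas a wildcard on an interior node is not.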

Wildcard Features for GQAC
While wildcards may still significantly increase the number of features, and hence query suggestions, not every wildcard feature is useful. Consider an extreme case where two frequent features f and f * of the same size have the same result set, i.e., f * ⊆ f , |f * | = |f | and D f * = D f ; it is then unnecessary to consider f * in GQAC. Among wildcard features and ordinary features with the same result set, it is sufficient to increment the existing query with the ordinary feature. Thus, such f * can be omitted from GQAC. Recall that GQAC generates query suggestions by adding a feature from a feature set to the existing query. We propose independent wildcard features such that the features retrieve different results from the data.

Fig. 5 Adding wildcard labels to features

We use G * to denote a graph having wildcards but may omit "*" when the wildcards are not relevant to the discussion.

The independent ratio has the following properties: i) it is at least 1; and ii) the larger its value, the more the feature result sets differ and, intuitively, the more independent the wildcard feature is with respect to its closest ordinary feature. The detailed process of generating independent well-formed wildcard features for graph query autocompletion is presented in Algo. 1. We adopt existing studies of feature mining [35] to obtain a set of features F = {f 1 , f 2 , … , f n } . Then, we add wildcard labels one by one to each f ∈ F to obtain wildcard features F * that are both independent and well-formed. Applying the concepts introduced in Defs. 8 and 7, we iteratively generate all wildcard features by substituting labels on one leaf edge with wildcards. Meanwhile, we eliminate the wildcard features that are dependent on existing ordinary features.
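The generation step of this process (without the independence filtering, which needs result-set counts from the data) can be sketched as below, mirroring Example 3: for each leaf edge, replace the edge label, the leaf-node label, or both with "*". The encoding and function name are our own.

```python
def wildcard_variants(nodes, edges):
    """Generate well-formed wildcard variants of an ordinary feature by
    substituting labels on one leaf edge with '*' (generation step only;
    independence filtering against ordinary features is omitted)."""
    deg = {v: 0 for v in nodes}
    for e in edges:
        for v in e:
            deg[v] += 1
    out = []
    for e in edges:
        for leaf in (v for v in e if deg[v] == 1):
            # three generalizations per leaf edge: edge only, node only, both
            for wc_edge, wc_node in ((True, False), (False, True), (True, True)):
                n2, e2 = dict(nodes), dict(edges)
                if wc_edge:
                    e2[e] = "*"
                if wc_node:
                    n2[leaf] = "*"
                out.append((n2, e2))
    return out
```

On a feature with one leaf edge this yields three variants, matching the three wildcard features f 9 , f 8 , and f 6 derived from f 13 in Example 3.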

Example 3
We illustrate the process of adding wildcard labels to features with Fig. 5. Given an ordinary feature f 13 , an edge DB-DB connects a leaf node DB. We replace the labels on the edge DB-DB with wildcard labels to obtain the wildcard features f 9 , f 8 , and f 6 , which can be regarded as generalizing the labels on the edge DB-DB with wildcard labels.

Composition of Well-Formed Wildcard Features
The features discussed earlier can be the tokens for query autocompletion. Suggestions with wildcards are constructed by adding a feature to an existing query graph. The query composition (a one-step query suggestion construction, Def. 4) can be readily extended. Given a query composition compose(g 1 , g 2 , cs, λ 1 , λ 2 ), g 1 and g 2 may be wildcard features, and the composition is restricted to return a well-formed suggestion.

Example 4 Recall the query composition in Example 2, compose( f 10 , f 13 , f 4 , λ 1 , λ 2 ). We added wildcard labels to f 13 in Example 3 and obtained the wildcard features { f 9 , f 8 , f 6 }. A wildcard composition can be obtained by simply substituting f 13 of the composition with any of these wildcard features. One of the wildcard compositions, compose( f 10 , f 9 , f 4 , λ 1 , λ 2 ), is illustrated in Fig. 6.

Query Specialization and Query Summarization
The previous sections presented the features and their composition. In this section, we formalize query specialization for modeling the whole query suggestion construction process. We propose the specialization value to quantify how a query graph is specialized from an empty graph, and the summarization value to quantify how one wildcard query suggestion summarizes other suggestions.

Specialization Order ( ≺)
Specialization order is a partial order defined between two query graphs. The intuition is that a more specialized query is closer to the target query. It also models how one query is constructed from the other. We formally define the specialization operators and the specialization order as follows.
The specialization operators are the following two:

1. Edge addition (q, e:(u, v)): add a new edge e, where l(e) is a "*" label; if v is a new node, l(v) is a "*" label as well.
2. Label specialization (q, e): replace a "*" label of the edge or node of e with a specific label.

Fig. 6 An illustration of wildcard composition

Definition 9 (Specialization order ( ≺ )) Given two query graphs q = (V, E, l) and q ′ = (V ′ , E ′ , l ′ ), q ≺ q ′ iff q ′ can be formulated by applying a sequence of specialization operators to q.

Specialization Value ( )
To further measure the degree of specialization of query graphs, we propose the specialization value, based on the specialization operators, in Def. 10. In addition, given a suggestion to an existing query, the difference of their specialization values captures how much the suggestion augments the query. For simplicity, Def. 10 assumes that all operators have equal weight.
Definition 10 (Specialization value) The specialization value of a query graph q is the number of specialization operators needed to formulate q from an empty graph q ∅ , denoted as spec(q).
Example 5 We illustrate specialization order and specialization value with Fig. 7. The specialization order of the query graphs is q ≺ q ′ ≺ q ′′ ≺ q ′′′ . The existing query is q (leftmost) with a wildcard. The specialization value of q is 13, indicated in bold at the center of q. After specializing the wildcard of q into the label DB, the user obtains q ′ with a specialization value increased by 1. Then, the user adopts a suggestion with a wildcard to get q ′′ , with specialization value increased by 7. At last, the user specializes the wildcard to a specific label and obtains the target query q ′′′ .
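Under Def. 10, a possible closed form follows from counting operators: each edge takes one edge-addition operator (new edges and new nodes arrive with "*" labels), and each non-wildcard node or edge label takes one label-specialization operator. The sketch below encodes this reading; the encoding and the closed form are our own interpretation, not the paper's formula.

```python
def spec_value(nodes, edges):
    """Specialization value (Def. 10) under our reading: operators to build
    q from the empty graph = one edge addition per edge, plus one label
    specialization per non-'*' node/edge label."""
    ops = len(edges)                                    # edge-addition operators
    ops += sum(1 for lab in nodes.values() if lab != "*")  # node label fixes
    ops += sum(1 for lab in edges.values() if lab != "*")  # edge label fixes
    return ops
```

For instance, a single DB-DM edge with an exact edge label costs 4 operators (one addition, three specializations), and specializing one remaining wildcard always increases the value by exactly 1, consistent with the +1 step in Example 5.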

Summarization Set ( )
To model how likely a query q is useful to the user, we compute how many suggestions can be specialized from q. We formally define the summarization set to denote such suggestions.
Definition 11 (Summarization set) The summarization set of a query q, denoted as sum(q), contains all the non-empty query graphs that specialize q:

sum(q) = {q ′ | q ≺ q ′ and G q ′ ≠ ∅},

where G q ′ is the subgraph query result set of q ′ . In other words, q summarizes all the query graphs that specialize q. The summarization set of a set of graphs Q is the union of the summarization sets of its members, i.e., sum(Q) = ∪ q∈Q sum(q).

Example 6 Continuing with Fig. 7, given the four query graphs, the specialization order of the query graphs is q ≺ q ′ ≺ q ′′ ≺ q ′′′ ; hence, for instance, q ′′′ belongs to the summarization set of each of the other three.
When the user formulates the query graph, both the number of query results and the number of possible suggestions decrease. This property (see Prop. 1) can be used to reduce the number of candidate suggestions for efficient GQAC. In particular, if g is an answer for query q ′ , then g is an answer for every query q that summarizes q ′ . On the other hand, if g is not an answer for q, then g is not an answer for any query q ′ that specializes q. This is formally described as follows.

Autocompletion Framework for Large Graphs
The overall query autocompletion is presented in Algo. 3 and illustrated with Fig. 3. FLAG assumes that (1) the user submits a query and an intent, and (2) the query is decomposed into a set of embeddings of wildcard features of the data graph. FLAG then supports wildcards in two main steps of GQAC. First, in the candidate generation step, (3) we determine the possible candidate suggestions, i.e., the well-formed wildcard features to attach to the current query to form suggestions that may yield non-empty answers. In Sect. 5.1, we propose pruning and sampling techniques for large graphs. Second, in Sect. 5.2, (4) we present a new ranking function that combines the specialization value and the summarization set size.

Query decomposition
During online autocompletion, the query decomposition procedure (Algo. 1 from AutoG [36]) is adopted. The query graph q is decomposed into a feature set F * q , along with the embeddings of the features in the query. The detailed process is presented in Algo. 2.

Fig. 7 An illustration of specialization orders and values
To generate well-formed query suggestions, in which wildcards appear only at leaf nodes/edges, the non-leaf wildcards (if any) in F * q need to be specialized before generating candidate suggestions (Lines 3-9).

Non-empty candidate suggestions
Candidate suggestions can specialize the existing query in multiple ways. First, suggestions can replace wildcards in the query with specific labels. Second, candidates can increment the query with (wildcard) features. Specifically, given a set of features, the number of possible candidates is, in the worst case, exponential in the query and feature sizes. However, many of the composed queries do not make sense, as they retrieve no results from the underlying data graph. Such queries are known as empty queries. Furthermore, the problem of deciding the emptiness of a subgraph matching query is NP-hard.
Existing work [36] has proposed a necessary condition for compositions of non-empty query candidates. It has been reported that the condition reduced the number of query compositions by 13% and 45% for AIDS and PubChem, respectively, which consist of large collections of modest-sized graphs. When directly applied to the CiteSeer dataset, [36] prunes only 0.1% of the possible compositions. Therefore, in Prop. 2, we propose a necessary condition for non-empty query compositions based on the large graph and sampling techniques.
We illustrate how to efficiently prune empty compositions using the embedding information. The queries that are not pruned are considered candidate suggestions.
Consider a large graph G, a set of sampled graphs D obtained from G using existing graph sampling techniques (Sect. 6), and the set of frequent features F extracted from D using existing frequent subgraph mining techniques offline. The embeddings M f of the features in the sampled graphs and the embeddings M g of the sampled graphs in the large graph can be computed offline. For a composition compose( f 1 , f 2 , cs, λ 1 , λ 2 ), the embeddings of f 1 and f 2 in the large graph are obtained using M f and M g .

Proposition 2 A query q is a non-empty query of the sampled graphs only if, for each query composition compose( f 1 , f 2 , cs, λ 1 , λ 2 ) of q, some embeddings of f 1 and f 2 in the sampled graphs agree on the common subgraph cs.

Proposition 2 verifies whether each composition of the query can find at least one instance in the large graph from the sampled portion. There can be false negatives simply because the sampled graphs may not cover all possible compositions of the large graph, even though one may increase the sampling size for higher accuracy. Prop. 2 is used both in online candidate generation and in the offline indexing of query compositions.
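The check behind this pruning condition can be sketched as follows. This is our simplification: embeddings are dicts from feature nodes to data nodes, and the common subgraph is given as pairs of glued nodes; a composition survives pruning only if some pair of embeddings agrees on every glued node.

```python
def may_be_nonempty(embs_f1, embs_f2, cs_pairs):
    """Necessary condition in the spirit of Prop. 2 (our simplification):
    a composition of f1 and f2 glued along cs_pairs = [(u_in_f1, u_in_f2),
    ...] can only be non-empty if some embedding of f1 and some embedding
    of f2 (dicts feature_node -> data_node) agree on every glued pair."""
    return any(
        all(m1[u1] == m2[u2] for u1, u2 in cs_pairs)
        for m1 in embs_f1 for m2 in embs_f2
    )
```

Compositions failing this test are pruned before ranking; since the embeddings come from a sample, a surviving composition may still be empty, matching the false-negative caveat above.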

Suggestion Ranking
From our preliminary experiments, we observed that the number of candidate suggestions can be in the thousands. Considering that users may only be able to interpret a small subset of them, FLAG returns the top-k suggestions w.r.t. a ranking function and a user preference. Suggestion ranking criteria of existing studies [36] are either infeasible to obtain from large graphs for efficient online autocompletion or indistinguishable among the candidate suggestions, because the increment parts share no common subgraphs and yield the same value. As the first attempt at GQAC for large graphs, we present a ranking function that prefers query suggestions that i) augment the existing query more and ii) summarize more candidate suggestions. The first preference simply reflects the user's intent to adopt larger useful increments, whereas the second one recognizes the importance of summarizing more suggestions that can be useful to the user. These two preferences are quantified as specialization power and summarization power. We then combine these two criteria to measure the utility of a set of query suggestions.

Definition 12 (Specialization power) Given a set of candidate suggestions U to an existing query q, the specialization power of a suggestion q ′ ∈ U w.r.t. q is defined as

sp(q ′ , q) = (spec(q ′ ) − spec(q)) / max q ′′ ∈U (spec(q ′′ ) − spec(q)).

That is, the specialization power of a suggestion q ′ is the increment of the specialization value if the user adopts the suggestion, normalized by the maximum specialization value increment of all candidate suggestions.

Definition 13 (Summarization power) Given a set of candidate suggestions U to an existing query q, the summarization power of a subset of candidate suggestions Q ′ ⊆ U w.r.t. U is defined as

sm(Q ′ ) = |sum(Q ′ ) ∩ U| / |U|.

That is, the summarization power of a set of suggestions is the number of candidate suggestions summarized by them, normalized by the total number of candidate suggestions.
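The two powers reduce to simple arithmetic once specialization-value increments and summarization sets are available; the sketch below uses our own encoding, where summarization sets are precomputed as plain Python sets of suggestion ids.

```python
def spec_power(spec_incr, max_incr):
    """Specialization power (Def. 12): a suggestion's specialization-value
    increment over the current query, normalized by the largest increment
    among all candidate suggestions."""
    return spec_incr / max_incr

def summ_power(chosen, summarizes, universe):
    """Summarization power (Def. 13): fraction of candidate suggestions in
    `universe` summarized by at least one suggestion in `chosen`.
    summarizes: suggestion id -> set of candidate ids it summarizes."""
    covered = set()
    for q in chosen:
        covered |= summarizes[q] & universe
    return len(covered) / len(universe)
```

With the numbers of Example 7, an increment of 1 against a maximum increment of 2 gives a specialization power of 0.5, matching sp(q ′ 1 , q 14 ) = 0.5.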

Example 7 We illustrate specialization power and summarization power using Fig. 8. The user manually adds a wildcard edge (with a wildcard node) to query q 12 and obtains q 14 . There are 7 candidate suggestions for the current query q 14 , i.e., q ′ 1 , q ′ 2 , ..., q ′ 7 . According to Def. 12, sp(q ′ 1 , q 14 ) = 0.5 and sp(q ′ 5 , q 14 ) = 1. The summarization powers follow from Defs. 11 and 13.

Definition 14 (Utility of query suggestions) Given a set of query suggestions Q ′ = {q ′ 1 , q ′ 2 , … , q ′ k }, the specialization power of each suggestion with respect to the existing query q, the summarization power of Q ′ with respect to all candidate suggestions, a user preference component α ∈ [0, 1], and scaling factors β and γ, the utility of Q ′ is defined as follows:

u(Q ′ ) = α · β · (Σ q ′ ∈Q ′ sp(q ′ , q)) / k + (1 − α) · γ · sm(Q ′ ).

The bi-criteria ranking function combines the specialization power and summarization power of the query suggestions. α is a parameter to set the preference between the two criteria, and the constant denominator k is for normalization. Since the values of the two criteria can be of very different ranges in practice, which makes α sensitive and difficult to tune, we introduce the scaling factors β and γ. The parameters α, β, and γ are data-specific. To tune the parameters, we adopt a machine learning method, which requires all the functions involved to be differentiable. However, the maximum function in Def. 12 is not continuous and differentiable. We adopt a differentiable approximation to the maximum function [4]. Hence, in the experiments, we can use a stochastic gradient descent algorithm to learn the parameters.

Example 8 Continuing with Fig. 8, we illustrate the utility of query suggestions defined in Def. 14. β and γ are set to 1. There are 7 candidate suggestions to the existing query q 14 , i.e., q ′ 1 , q ′ 2 , ..., q ′ 7 . When GQAC only considers how much the suggestions specialize the existing query (i.e., α = 1), q ′ 5 and q ′ 6 would be the top-2 suggestions. When GQAC only considers how much the suggestions summarize other candidate suggestions (i.e., α = 0), then q ′ 1 and q ′ 3 would be the top-2 suggestions. When α is set to 0.5, then q ′ 1 and q ′ 5 would be the top-2 suggestions.
The ranking task is then to find the top-k candidate suggestions with the highest utility. It can be noted that the two objectives, specialization and summarization, can be competing: in practice, the summarization power of smaller queries is often larger, as more candidate suggestions are summarized by smaller ones, whereas smaller queries provide smaller specialization power. It is not surprising that the problem of determining the query suggestions with the highest utility is NP-hard.
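Because the specialization term is modular and the summarization term is a coverage function, the utility is monotone submodular, and the greedy algorithm mentioned in the introduction applies with the classic (1 − 1/e) approximation guarantee. The sketch below is our own simplification with both scaling factors fixed to 1 and the summarization term normalized by the number of candidates.

```python
def greedy_topk(cands, sp, summarizes, k, alpha=0.5):
    """Greedy top-k selection for a submodular bi-criteria utility.

    cands: list of suggestion ids; sp: id -> specialization power;
    summarizes: id -> set of candidate ids it summarizes; alpha is the
    user preference between specialization (1) and summarization (0).
    Repeatedly picks the candidate with the largest marginal utility gain.
    """
    n = len(cands)
    chosen, covered = [], set()
    for _ in range(min(k, n)):
        def gain(q):
            # marginal gain: own sp term + newly covered candidates
            return alpha * sp[q] + (1 - alpha) * len(summarizes[q] - covered) / n
        best = max((q for q in cands if q not in chosen), key=gain)
        if gain(best) <= 0:
            break
        chosen.append(best)
        covered |= summarizes[best]
    return chosen
```

With alpha = 1 the greedy picks the strongest specializers; with alpha = 0 it behaves like greedy set cover over the summarization sets, mirroring the two extremes of Example 8.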

Definition 15 (Ranked Subgraph Query Suggestions for Large Graphs (RSQL)) Given a query q, a set of candidate query suggestions Q′, the ranking function f, a user preference component α, and a user-specified constraint k, the ranked subgraph query suggestions problem is to determine a subset Q′′ ⊆ Q′ with |Q′′| ≤ k such that f(Q′′) is maximized, i.e., there is no other Q′′′ ⊆ Q′ with |Q′′′| ≤ k such that f(Q′′′) > f(Q′′).

Proposition 3 The RSQL problem is NP-hard.
(Proof sketch) The maximization of the utility function is NP-hard, by a reduction from the Set Cover (SC) problem. Given an instance of the SC problem, each subset Si of elements {oi1, ..., oim} is converted to a candidate suggestion q′i that summarizes q′i1, ..., q′im; k remains the same. α and γ of RSQL are set to 0 and 1, respectively. Finding the query suggestion set is then to find the i query suggestions, where i is smaller than or equal to k, that cover the candidate suggestions the most. This can be trivially mapped to the SC problem.

Efficient Summarization Computation
This subsection presents efficient algorithms for determining sm, which enables efficient ranking for the online autocompletion. We remark that the computation of sp is straightforward given q, and hence is omitted.
The computation of sm depends on the specialization orders between suggestions (Defs. 12 and 13). To determine whether suggestions summarize others, i.e., the specialization orders between them, we would need to compute subgraph isomorphism between each pair of suggestions online. Hence, we derive a necessary condition for the specialization order between candidate suggestions and index it. Such indexing is efficient for two reasons: (i) some query suggestions are similar because they are composed by adding small increments to the same existing query graph; and (ii) the specialization orders between the wildcard features (i.e., the increments) are available offline.
We formalize a necessary condition for the specialization order between candidate suggestions, and illustrate how to (i) compute and index all possible specialization orders offline and (ii) prune the false ones online based on the current query graph q. The proposition can be established by a simple proof by contradiction. The first condition of Prop. 4 can be computed offline and then indexed; the second condition can be used during online autocompletion to prune false specialization orders using the current query.

Indexing Wildcard Features
We extend the Feature DAG index (FDAG) [36] to support wildcards. Due to space limitations, we highlight the main ideas of the extensions but skip the verbose index definition. An illustration of the index is shown in Fig. 9. In particular, we index the wildcard features (shown at the bottom) in a DAG, where each index node represents a feature and each edge represents a specialization order between features. All possible subgraph isomorphism embeddings are indexed (shown in M of the index edge and of the indexed content). That is, all the possible ways that two well-formed features can be composed have been precomputed and indexed, which avoids computing specializations of features online. The features are further indexed by their values.
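The wildcard-extended feature DAG described above can be sketched as the following data structure. This is an illustrative sketch, not the FDAG definition from [36]: the field names, the embedding cache, and the traversal helper are all assumptions chosen to mirror the prose (nodes are features, edges are specialization orders, embeddings are precomputed offline).

```python
from dataclasses import dataclass, field

@dataclass
class FeatureNode:
    """One node of a wildcard-extended feature DAG (illustrative sketch).

    `children` point to strict specializations of this feature; `embeddings`
    caches precomputed subgraph-isomorphism embeddings into each child, so no
    isomorphism test is needed online."""
    feature_id: int
    size: int                                       # number of edges in the feature
    children: list = field(default_factory=list)    # FeatureNode specializations
    embeddings: dict = field(default_factory=dict)  # child id -> precomputed embeddings

def specializations(node):
    """All feature ids reachable from `node` via specialization edges (DFS)."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n.feature_id not in seen:
            seen.add(n.feature_id)
            stack.extend(n.children)
    return seen - {node.feature_id}
```

Because specialization edges form a DAG, a simple DFS with a visited set suffices to enumerate all (transitive) specializations of a feature.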

Efficient Ranking Algorithm
Given that the ranking function f of a set of candidate suggestions Q can be efficiently computed, we present a greedy ranking algorithm in Lines 16-20 of Algorithm 3. Greedy algorithms are typical approximation algorithms for RSQL because f is submodular. Recall that a function is submodular if the marginal gain from adding an element to a set S is at least as high as the marginal gain from adding it to a superset of S. In particular, it satisfies f(S ∪ {o}) − f(S) ≥ f(T ∪ {o}) − f(T) for every element o and every pair of sets S ⊆ T. We can analyze f as follows. Firstly, the specialization component Σ sp(q′i, q) is linear and monotone submodular, since it is a sum of non-negative numbers. Secondly, sm is monotone submodular, because adding new suggestions can only summarize more candidate suggestions. Hence, f is a non-negative linear combination of the two scaled monotone submodular components, and is thus monotone submodular. The problem of maximizing a monotone submodular function subject to a cardinality constraint admits a (1 − 1/e)-approximation algorithm [27].
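The standard greedy scheme for this class of problems can be sketched as follows: repeatedly add the candidate with the largest marginal gain until k elements are selected. This is a generic (1 − 1/e)-approximation sketch for monotone submodular maximization, not the paper's Algorithm 3; the function name and interface are illustrative.

```python
def greedy_top_k(candidates, f, k):
    """Greedy (1 - 1/e)-approximation for maximizing a monotone submodular
    set function f under the cardinality constraint |S| <= k (sketch)."""
    selected, remaining = [], list(candidates)
    for _ in range(min(k, len(candidates))):
        # Pick the candidate with the largest marginal gain f(S + c) - f(S).
        best = max(remaining, key=lambda c: f(selected + [c]) - f(selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, with f as a coverage function (the size of the union of the sets summarized by the selected suggestions), the greedy step reproduces the classic set-cover heuristic used in the NP-hardness reduction above.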

Experimental Evaluation
This section presents an experimental evaluation of FLAG. We first investigated the suggestion quality via user study and then conducted an extensive performance evaluation via simulation on popular real datasets. In particular, we studied the overall performance of FLAG, the effectiveness of the optimizations, and the effects of the parameters of FLAG.

Software and Hardware
We implemented the FLAG prototype on top of AutoG [36]. The prototype was mainly implemented in C++, using VF2 [7] for subgraph query processing and McGregor's algorithm [21] (with minor adaptations) for determining common subgraphs. We used gSpan [35] for frequent subgraph mining. We conducted all the experiments on a machine with a 2.2GHz Xeon E5-2630 processor and 256GB of memory, running Linux. All the indexes were built offline, loaded from the hard disk, and then made fully memory-resident for online query autocompletion.

Datasets
We conducted experiments on several different workload settings by employing real graph datasets with various characteristics. Table 2 reports some dataset characteristics.
1. twitter. This dataset models the Twitter social network. It consists of ∼11M vertices and ∼85M edges. Each vertex represents a user and each edge represents the friendship/followership relation between two users. The original graph has no labels. We randomly added labels to the vertices; the number of distinct labels was set to 32 and the randomization follows a Gaussian distribution (μ = 50 and σ = 3).
2. wordnet. This dataset models the lexical network of words. It consists of ∼74K vertices and ∼234K edges. Each vertex represents an English word and each edge represents a relationship between words, such as synonym, antonym, and meronym. The original graph has no labels. We randomly added labels to the vertices, in the same way as for twitter.
3. CiteSeer. This dataset models publications in CiteSeer. It consists of ∼3K vertices and ∼4K edges. Each vertex represents a publication and each edge represents the citation relation between two publications. Each vertex is labeled with its Computer Science area (e.g., DB, DM, IR) and each edge is labeled with the Jaccard distance between the pair of publications. The distance is computed from the word attributes of the publications and further evenly categorized into three types (small, medium, and large distances).

Query Sets
We generated numerous sets of query graphs of different query sizes |q| (the number of edges) and various frequencies in the large graph. Each query set contained 100 graphs. 12 In particular, we generated queries that yield different result set sizes (i.e., |G_q| > |G_q^min| for all query graphs). These query sets enable us to investigate the usefulness and performance of FLAG under different user workloads. The query sizes ranged from 2 to 9.

Graph Sampling
Instead of running expensive frequent subgraph mining algorithms on the single large graph, we scaled the large graph down using Random Walk sampling [16] before frequent subgraph mining. We sampled min{|V(G)|, 10^6} graphs of 10 edges from the large graph. 13 In particular, we randomly selected a vertex as the starting vertex and then simulated a random walk on the graph. At each step, with probability 0.15 (the value commonly used in the literature), we jumped back to the starting vertex and continued the random walk. If we could not reach the required sample graph size after a large number of steps (e.g., 100·|V(G)|), or the random walk had exhausted the neighbors of the starting vertex, we selected another starting vertex and restarted the random walk.
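The sampling loop above can be sketched as follows. This is a minimal sketch of the described procedure, with a jump-back probability of 0.15 and a step budget as fallback; the function name, the adjacency-dict representation, and the restart handling for a single start vertex are assumptions.

```python
import random

def sample_walk_graph(adj, target_edges=10, restart_p=0.15, max_steps=1000):
    """Sample one small graph of up to `target_edges` edges by a random walk
    with restarts (illustrative sketch; names and interface are assumptions).

    adj: undirected adjacency dict, vertex -> list of neighbors."""
    start = random.choice(sorted(adj))
    current, edges = start, set()
    for _ in range(max_steps):
        if len(edges) >= target_edges:
            break
        # With probability restart_p, jump back to the starting vertex.
        if random.random() < restart_p or not adj[current]:
            current = start
            continue
        nxt = random.choice(adj[current])
        edges.add(tuple(sorted((current, nxt))))  # canonical undirected edge
        current = nxt
    return edges
```

In the full procedure this routine would be invoked up to min{|V(G)|, 10^6} times (with fresh start vertices on failure) to build the corpus of sample graphs fed to the frequent subgraph miner.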

Feature Mining
We followed AutoG in using gSpan [35] to obtain a sufficient number of features (frequent subgraphs) to build the index offline. 14 In particular, we set the default minimum support value (σ) to 0.2, 0.3, and 0.5% for twitter, wordnet, and CiteSeer, respectively. These minimum support values are an order of magnitude smaller than those used in AutoG; we set smaller σ values because frequent subgraphs are relatively scarce in large graphs. The maximum feature size maxL was set to 10 for all datasets. Some statistics of the features are summarized in Table 3.

Index
With the frequent features mined by gSpan, we adopted the AutoG procedure (i.e., Algorithm 4 of [36]) to enumerate the possible compositions of feature pairs. We discovered that the pruning technique proposed in AutoG for composition enumeration is ineffective for the employed large graphs. That technique prunes 13% and 45% of the empty compositions for the AIDS and PubChem datasets, respectively. It is not surprising that this necessary condition prunes only 0.1% of the compositions on CiteSeer, since the characteristics of a citation network differ considerably from those of chemical and biological structures.
After applying the embedding-based necessary condition for non-empty query compositions (introduced in Sect. 5.1), 41% of the compositions for the CiteSeer dataset are pruned. Table 4 briefly summarizes the characteristics of constructing an index and enumerating compositions, respectively.

Quality Metrics
We adopted several popular metrics to measure suggestion quality [25,36]. We report the number of suggestion adoptions (i.e., #Auto) and the total profit metric (i.e., TPM). Specifically, TPM [25,36] quantifies the percentage of mouse clicks saved by adopting suggestions during the visual query formulation:

TPM = (no. of clicks saved by suggestions / no. of clicks without suggestions) × 100%.

12 A query was generated following the Random Walk sampling (same as graph sampling). We checked that the constructed query q had not been generated before and had a result set size |G_q| larger than |G_q^min|.
13 This limitation is due to the gSpan binary executable.
14 We investigated several existing feature mining works before opting to apply gSpan to graph samples.
In addition to #Auto and TPM, we report the number of specializations resulting from adopted suggestions, the average number of specializations per adoption, and the useful suggestion ratio U, defined as (no. of useful suggestions / no. of returned suggestions) × 100%. Each reported number is the average over the 100 queries in each query set. Note that even when the suggestions are correct, users still need at least one mouse click to adopt them to obtain the target query. The employed quality metrics are listed in Table 5.
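For concreteness, the two ratio metrics can be computed as below. This is a trivial sketch of the formulas above; the function names are illustrative.

```python
def tpm(clicks_saved, clicks_without):
    """Total profit metric (TPM): % of mouse clicks saved by suggestions."""
    return 100.0 * clicks_saved / clicks_without

def useful_ratio(num_useful, num_returned):
    """Useful suggestion ratio U: % of returned suggestions that were useful."""
    return 100.0 * num_useful / num_returned
```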

Learning Scaling Factors
We used a stochastic gradient descent algorithm to learn the default scaling factors of Definition 14. We generated 100 random simple queries from each dataset. Each initial query contains 1 edge and its target query contains 4 edges. We divided the queries into 10 groups, and each group was used to learn the parameters for around 33 iterations. The learning rate was set to 0.01. The learning algorithm converged at around 300 iterations. For the twitter dataset, the default β and γ are 3.8 and 7.6, respectively. For the CiteSeer dataset, we obtained the defaults similarly; their values are 3.6 and 7.2, respectively. The learned α for twitter and CiteSeer are 0.58 and 0.45, respectively. The β, γ, and α values for the wordnet dataset are 1, 1, and 0.5, respectively.
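Gradient-based learning of these parameters requires the non-differentiable maximum in Def. 12 to be replaced by a smooth surrogate (the text cites [4] for the approximation used). One common differentiable surrogate, shown here as an assumption rather than the paper's exact choice, is the log-sum-exp soft maximum:

```python
import math

def soft_max(values, t=10.0):
    """Log-sum-exp smooth approximation of max(values); a common
    differentiable surrogate (the paper's exact choice may differ).
    Larger t brings the approximation closer to the true maximum."""
    m = max(values)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(t * (v - m)) for v in values)) / t
```

The surrogate always upper-bounds the true maximum and tends to it as t grows, so it can be substituted into the utility function before running stochastic gradient descent.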

User Study
We first conducted a user test with 10 volunteers. Each user was given 3 queries each with high, medium, and low TPM values, respectively, from the simulation, i.e., 9 queries in total. We randomly shuffled these 9 queries. The users were asked to formulate the target queries via the visual aid shown in Fig. 1. They expressed their level of agreement with the statement "FLAG is useful when I draw the query." via a symmetric 5-level agree-disagree Likert scale, where 1 means "strongly disagree" and 5 means "strongly agree".
Consistent with [36,37], our results showed that the correlation coefficient between the TPMs and the users' ratings is 0.819 with a p-value of 0.007. Thus, TPM is a good quality indicator for FLAG. The average ratings of the queries with high, medium, and low TPM values are 3.57 (between "agree" and "neither agree nor disagree"), 2.63 (around "neither agree nor disagree"), and 1.83 (between "disagree" and "strongly disagree"), respectively.

Large-Scale Simulations
We investigated the suggestion qualities via simulations under a large variety of parameter settings. α is set to 0.5 so that both sp and sm contribute to the ranking. For each target query, we started with a random edge with one node label (the other node and the edge carry wildcard labels). In each step, we called FLAG and chose the useful suggestion with the largest number of specializations. If no useful suggestion was returned, we specialized the query by a random specialization operator toward the target query. Each target query set contains 100 queries.
We studied the effects of the major parameters of FLAG on CiteSeer, wordnet, and twitter. We report the representative simulation results in Tables 6-17. The performance characteristics presented here can help users set their default parameter values, which can be dataset-specific. Table 6 shows the quality metrics of Q5 (i.e., queries of 5 edges) with various max values on CiteSeer. The results show that the qualities decrease as max increases. #Auto shows that the suggestions were used in multiple iterations of the query formulation. In particular, the formulation process of each query adopted around 5.7 suggestions on average, and the number of specializations added by FLAG was around 15. TPM shows that FLAG saved roughly 53% of the manual specializations in query formulation, and each adoption introduced 2-3 specializations to the existing query. U shows that FLAG generally produced useful suggestions. Tables 7 and 8 show the quality metrics of Q4 with various max values on wordnet and twitter. The results on wordnet and twitter share the same trends as those on CiteSeer. The values of the quality metrics of wordnet and twitter were lower than those of CiteSeer, since the numbers of compositions of wordnet and twitter were relatively small.

Varying the Target Query Sizes (|q|)
Tables 9, 10 and 11 show the quality metrics for various |q|. It is not surprising that FLAG achieved more suggestion adoptions as |q| increased; the number of adoptions (#Auto) and the number of adopted specializations increased with |q|. TPM and U on CiteSeer, wordnet, and twitter generally held steady as |q| increased. Tables 12, 13 and 14 show the suggestion quality when we varied k. The results show that #Auto, the number of adopted specializations, TPM, and the average number of specializations per adoption generally increased with k. This is not surprising: as more suggestions are returned, there is a higher chance that some of them are adopted. Importantly, the useful suggestion ratio is higher when k is smaller, mainly because the useful suggestions of CiteSeer usually rank higher than those of wordnet and twitter. Tables 15, 16 and 17 show the suggestion quality with various α values. The results show that the suggestion qualities were generally good when α was small. The optimal α for CiteSeer was 0.2, that for wordnet was around 0, and that for twitter was 0.0-0.4; the quality decreased as α increased further. The learned α values from Sect. 6.8 produced slightly lower TPM than the optimal ones. FLAG generally produced high-quality suggestions when α was smaller than 0.8 for CiteSeer, 0.2 for wordnet, and 0.8 for twitter.

Efficiency of FLAG
We conducted a detailed evaluation of the online FLAG processing. We report the average response time (ART) of FLAG under the default setting in Fig. 11. 15 For CiteSeer, we obtained ARTs of around 3s. For twitter, we obtained short ARTs, as the number of compositions was relatively small. Thus, the response time of FLAG is generally very short. The rest of this section reports the average response time when we vary the major parameters of FLAG, i.e., α, k, and |q|.

Varying α of the Ranking Function
We ranged α from 0 to 1. Figure 12 shows the effects of α on the ARTs. The ART was always less than 3.5s. We also noticed that the ART decreased as α approached 1: the higher the value of α, the more the GQAC process prefers suggestions with large specialization and small summarization, which results in shorter times for updating the summarization powers of the candidate suggestions.

Varying the User-Specified Constant k
We varied k from 10 to 50 and report the ARTs for CiteSeer and twitter in Fig. 13. The largest value of k tested was 50, which is large enough for common visual interfaces. The results show that the ARTs increased as k increased. FLAG returned suggestions within 5s when k was less than 20; the GQAC process may need 8s to provide suggestions when k is up to 50. Figure 14 shows the ART as the query size increased. The results show that the autocompletion process of FLAG finished within 6s for queries with up to 8 edges. The ART increased with the query size |q|, mainly because large queries required more time to generate more candidate suggestions and then rank them.

15 We remark that query decomposition takes less than a few milliseconds, which is negligible, and hence is not shown separately.

Related Work
Query formulation aids have recently gained increasing research attention. Firstly, recent work has proposed a variety of innovative approaches to help query formulation. For example, GestureQuery [26] proposes to use gestures for specifying SQL queries. SnapToQuery [15] guides users to explore query specification via snapping users' likely intended queries. [3] proposed a data-driven approach for GUI construction. Exploratory search has been demonstrated to be useful for enhancing interactions between users and search systems (e.g., [20,22,23]). QUBLE [11] allows users to explore regions of a graph that contain at least one query answer. Wang et al. [32] recently proposed efficient visual exploratory search in graph databases. Huang et al. [10] studied canned subgraph patterns for GUIs. SeeDB [31] proposes visualization recommendations for supporting data analysis. [18] introduces the Meaningful Query Focus (MQF) of a given keyword search to generate XQuery. While keyword search (e.g., [33]) has been proposed to query graphs, this approach does not allow users to precisely specify query structures. This paper contributes to query autocompletion for query formulation. Secondly, there is existing work on query autocompletion for various query types. For instance, there is work on query autocompletion for keyword search (e.g., [2,25,34]) and structured queries (e.g., [24]). Li et al. [9] extended keyword search autocompletion to XML queries. [18] associated structures to query keywords. LotusX provides position-aware autocompletion for XML [19]. An autocompletion learning editor for XML provides intelligent autocompletion [1]. [12] presents a conversational mechanism that accepts incomplete SQL queries and then matches and replaces a part (the user focus) of previously issued queries. There has been a stream of work on extending Query-By-Example to construct structural queries, e.g., [5,6,14]. In contrast, this paper focuses on structural queries for graphs.
Hence, we only include related work on graphs.
Regarding GQAC, Yi et al. [36] proposed AutoG to rank subgraph suggestions for graphs of small or modest sizes. The recent work [28,37] introduced user focus to GQAC. In [22], Mottin et al. proposed graph query reformulation, which determines a set of reformulated queries that maximally cover the results of the current query. In Pienta et al. [29] and Li et al. [13], the authors demonstrated interactive methods to produce edge or node suggestions for visual graph query construction. In contrast, this paper considers flexible subgraph suggestions for large graphs.

Conclusion
We proposed FLAG, which exploits the notion of wildcard labels to generate top-k query suggestions that aid query formulation for large graphs. Since the graph features exploited by existing GQAC studies are either absent or rare in large graphs, we introduced wildcard labels for query graphs and query suggestions to allow more candidate query suggestions. Candidate query suggestions are ranked by a new ranking function that considers both how much a suggestion augments the existing query and how many other suggestions it summarizes. We proposed an efficient algorithm for suggestion ranking. Our user study and experiments verified both the effectiveness and efficiency of FLAG.
This paper leads to a variety of interesting future work. We are extending the study of histories of users' activities [38] into the ranking. We are studying explanations of the few cases (e.g., [17]) where GQAC returned incorrect suggestions.