K-Connected Cores Computation in Large Dual Networks

Computing k-cores is a fundamental and important graph problem, which can be applied in many areas, such as community detection, network visualization, and network topology analysis. Due to the complex relationships between different entities, dual graphs widely exist in applications. A dual graph contains a physical graph and a conceptual graph, both of which have the same vertex set. Given that there exist no previous studies on the k-core in dual graphs, we formulate a k-connected core (k-CCO) model in dual graphs. A k-CCO is a k-core in the conceptual graph, and also connected in the physical graph.
Given a dual graph and an integer k, we propose a polynomial time algorithm for computing all k-CCOs. We also propose three algorithms for computing all maximum-connected cores (MCCO), which are the existing k-CCOs such that a (k+1)-CCO does not exist. We further study a subgraph search problem, which is computing a k-CCO that contains a set of query vertices. We propose an index-based approach to efficiently answer the query for any given parameter k.
We conduct extensive experiments on six real-world datasets and four synthetic datasets. The experimental results demonstrate the effectiveness and efficiency of our proposed algorithms.


Introduction
Graph models have been used to represent the relationships between entities in many real-world applications, such as social networks, web graphs, collaboration networks, and biological networks. Given a graph G(V, E), vertices in V represent the entities of interest and edges in E represent the relationships between entities. Significant research efforts have been devoted to many fundamental problems in managing and analyzing graph data. Among them, cohesive subgraph detection has been extensively studied recently [5,9,13,17,31].
Given a graph G and an integer k, a k-core of G is a maximal connected subgraph in which each vertex has degree at least k [29]. The problem of computing k-cores has drawn a lot of attention [7,17,28,32] due to the elegant structural properties of the k-core [29] and its linear time solution [3]. It can be applied in many areas including but not limited to community detection [11], dense subgraph discovery [2,6], graph visualization [1], and system analysis [10].
Motivations In many real-world applications, a single simple graph can hardly express the complex relationships between entities. The authors of [33] model a dual graph containing two complementary graphs with the same vertex set, one of which represents the physical interaction between vertices, and the other represents the conceptual interaction. They study the problem of computing the subgraph, namely the DCS, which is the densest in the conceptual graph and also connected in the physical graph. However, computing the DCS in dual graphs is NP-hard. Even though an approximate solution with relatively poorer result quality is proposed in [33], the time cost for this problem is still large and the solution does not scale to big graphs. Additionally, they do not restrict the connectivity of the DCS in the conceptual graph.

The result subgraph may therefore be disconnected in the conceptual graph, in which case it is not cohesive.
Given that there exists no research on k-core computation in dual graphs, in this paper, we adopt the classic k-core definition to model a k-Connected COre (k-CCO) in dual graphs. Given a dual graph and an integer k, a k-CCO is a dual subgraph g satisfying the following three conditions: (1) the minimum degree of g is not less than k in the conceptual graph; (2) g is connected in the conceptual graph; and (3) g is connected in the physical graph.
An example of a dual graph is given in Fig. 1. Figure 1a shows a physical graph, and Fig. 1b shows a conceptual graph. Given an integer k = 3, there exists only one 3-CCO, that is, the induced dual subgraph of {v_2, v_3, v_4, v_5}. Its minimum degree is not less than 3, and the subgraph is connected in both graphs.
Our k-CCO model restricts the connectivity in both graphs and guarantees the cohesiveness of the result graph through the given integer parameter k. Based on this model, we formulate two global detection problems. Given a dual graph G and an integer k, the first problem is computing all k-CCOs in G. It offers a flexible selection of the degree constraint k and returns a result that depends on the user's choice of k. Similar to the DCS problem in [33], we also study a parameter-free problem, that is, computing the Maximum-Connected COres (MCCOs) in a given dual graph G. Here, an MCCO is a k-CCO in G such that there does not exist any (k + 1)-CCO in G. The 3-CCO in the dual graph G in Fig. 1 is also an MCCO, since there does not exist any 4-CCO in G.
We further study a subgraph search problem for the purpose of personalized query. Specifically, given an integer k and a set of query vertices, we aim to compute a k-CCO containing these vertices. In Fig. 1, given an integer k = 3 and a set of vertices {v_2, v_5}, the 3-CCO containing {v_2, v_5} is the induced dual subgraph of {v_2, v_3, v_4, v_5}.
Applications Computing k-CCOs and MCCOs can be applied in many areas. For example, to mine a research group, the researchers in the group should be connected in their collaboration network (physical graph), in which each edge indicates that two researchers have co-authored a paper. Simultaneously, each researcher should have enough neighbors in a similarity network (conceptual graph), in which each edge indicates that two researchers have similar research interests. In social networks, each user may have many interest labels, such as soccer, basketball, and cartoons. A conceptual graph can be built by computing the interest similarity between any two users. A physical graph can be built by checking whether any two users follow each other. A social community should be connected in the physical graph, and each user in the community should have enough neighbors with similar interests.
Challenges It is nontrivial to compute all k-CCOs. A k-core in the conceptual graph may be disconnected in the physical graph, and a connected component in the physical graph may conversely violate the degree constraint and the connectivity constraint in the conceptual graph. For the problem of computing the MCCOs, let k_max be the maximum k in a dual graph such that a k-CCO exists. Given the solution for computing k-CCOs, the MCCOs can be obtained if k_max is known. Therefore, a main challenge in computing the MCCOs is computing k_max. For the k-CCO search problem, a straightforward idea is to invoke the k-CCO detection procedure as a preprocessing step and return the k-CCO that contains all query vertices. However, it is costly to start by processing the whole graph, given that the result subgraph is normally quite small.
Our Approaches and Contributions We propose a polynomial algorithm to compute all k-CCOs in dual graphs. It works by recursively removing the vertices that violate the k-CCO definition. For the problem of computing MCCOs, we first follow a similar idea to that used in computing all k-CCOs, and give a bottom-up solution. More specifically, we compute the MCCOs by iteratively removing all unsatisfied vertices. We also propose a top-down algorithm, which selects k_max following a top-down strategy and returns the k-CCOs if they exist. To further improve the algorithmic efficiency, we propose a binary search algorithm for computing all MCCOs. We propose an index-based solution to compute the k-CCO containing the query vertices. With the proposed index, the index size is well bounded and a query is processed with a complexity that is related only to the result subgraph. The experimental results show the excellent performance of our optimized algorithms. More details are given in Sect. 6. We summarize the main contributions of this paper as follows.
- A k-connected core model in dual graphs. We design a k-connected core model, which inherits the properties of the classic k-core model, in dual graphs. To the best of our knowledge, this is the first work that studies the k-core concept in dual graphs.
- A polynomial time algorithm for computing all k-connected cores. Given a dual graph G and an integer k, we propose a polynomial peeling-style algorithm to compute all k-CCOs in G. We prove that its time complexity is O(h × m). Here, m is the number of edges in the conceptual graph, and h is a value that is theoretically bounded roughly by, but practically much less than, the number of vertices in G.
- Three algorithms for computing the maximum-connected cores. We give a bottom-up algorithm and a top-down algorithm for the MCCO computation. An optimized binary search algorithm is finally proposed to achieve significant speedup.
- An index-based solution for searching the k-connected core. We design an elegant index structure to compute a k-CCO containing a set of query vertices. Based on the proposed index structure, we give an efficient query-processing algorithm and a polynomial time index construction algorithm. The size of the index is bounded by O(n) and the time complexity of the index-based query algorithm is O(m_b). Here, n is the number of vertices in G and m_b is the number of physical edges in the result subgraph.
- Extensive performance studies. We conduct extensive performance studies on four synthetic graphs and six real large graphs. We also present a case study. The results demonstrate the effectiveness and efficiency of our proposed model and algorithms.
Organization The rest of this paper is organized as follows. Section 2 introduces preliminary concepts and defines the problems. Section 3 proposes an algorithm for computing all k-CCOs. Section 4 studies the problem of computing all MCCOs. Section 5 studies the k-CCO search problem and proposes an index-based solution. Section 6 evaluates our proposed algorithms in extensive experiments. Section 7 introduces the related work, and Sect. 8 concludes the paper.
A short version of this paper is published in [34]. The current version extends the original paper by studying the k-CCO search problem, which is computing a k-CCO that contains a set of query vertices. We propose an index-based solution and conduct extensive experiments to show the high efficiency of our method.

Preliminaries
Cores in Simple Graphs Before studying dual graphs, we briefly introduce several definitions and recall the problem of k-core computation in simple graphs. Let G(V, E) be an undirected graph, where V is the set of vertices and E is the set of edges. Given a vertex u in G, we use N_G(u) to denote the neighbor set of u in G, i.e., N_G(u) = {v ∈ V | (u, v) ∈ E}. The degree of a vertex u in G is denoted by deg_G(u), i.e., deg_G(u) = |N_G(u)|. Given a vertex set S, the induced subgraph of S in G is denoted by G[S], i.e., G[S] = (S, {(u, v) ∈ E | u, v ∈ S}). The formal definition of k-core is given below.

Definition 1 (k-Core) Given a graph G(V, E) and an integer k, a k-core of G(V, E) is a maximal connected subgraph in which each vertex has degree at least k. [29]

Definition 2 (Core Number) The core number of a vertex u in G, denoted by core(u), is the maximum k such that u is contained in a k-core.

Definition 3 (Degeneracy)
The degeneracy of a graph G, denoted by δ(G), is the maximum k such that a k-core exists, i.e., δ(G) = max_{u∈V} core(u).
We denote the k-core containing a given vertex u by G_k(u), and have the following lemma.
Let V_k(u) be the set of vertices in which each vertex v can be reached from u via a path such that every vertex w on the path satisfies core(w) ≥ k. The following lemma holds:
Given the core numbers of all vertices, all k-cores can be easily found based on Lemma 2. The algorithm for computing all core numbers [3] is given in Algorithm 1. It works by iteratively removing the vertex with minimum degree together with its incident edges. The time complexity of Algorithm 1 is O(m), where m is the number of edges.
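As a rough illustration of the peeling idea behind Algorithm 1, the following Python sketch computes core numbers with a bucket queue. It is a minimal sketch rather than the implementation of [3]: the function name `core_numbers` and the adjacency-dictionary input format are assumptions made here for illustration, and the graph is assumed to be undirected, with symmetric adjacency sets and no self-loops.

```python
from typing import Dict, Hashable, List, Set


def core_numbers(adj: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, int]:
    """Return the core number of every vertex by min-degree peeling.

    `adj` maps each vertex to the set of its neighbours (assumed symmetric).
    """
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    max_deg = max(deg.values(), default=0)
    # buckets[d] holds the not-yet-removed vertices whose current degree is d.
    buckets: List[Set[Hashable]] = [set() for _ in range(max_deg + 1)]
    for u, d in deg.items():
        buckets[d].add(u)

    core: Dict[Hashable, int] = {}
    k = 0  # largest minimum degree seen so far
    d = 0  # pointer to the smallest possibly non-empty bucket
    for _ in range(len(adj)):
        while not buckets[d]:
            d += 1
        u = buckets[d].pop()          # a vertex of minimum current degree
        k = max(k, d)
        core[u] = k
        for v in adj[u]:
            if v not in core:         # v is still in the graph
                buckets[deg[v]].remove(v)
                deg[v] -= 1
                buckets[deg[v]].add(v)
        d = max(d - 1, 0)             # the minimum degree can drop by at most 1
    return core
```

Under these assumptions, the degeneracy of the graph is simply `max(core_numbers(adj).values(), default=0)`.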
Cores in Dual Graphs In this paper, we focus on an undirected dual graph G(V, E_a, E_b), where E_a and E_b represent the edge sets of the physical graph G_a and the conceptual graph G_b, respectively. An example of a dual graph is shown in Fig. 1. Based on the aforementioned classic k-core concept, we define the k-Connected COre (k-CCO) in dual graphs.
Algorithm 1 Core-Decomposition [3]
Input: A graph G(V, E)
Output: The core numbers of all vertices in G
…
while ∃u ∈ V, deg_G(u) < k + 1 do
  core(u) ← k;
  remove u and its incident edges from G;
return core(u) for all u ∈ V;

Note that in the existing work [33] for computing the densest connected subgraph in dual graphs, only the connectivity in the physical graph is required. This condition is insufficient to support the cohesiveness of result subgraphs, since the subgraph may be disconnected in the conceptual graph. To overcome this drawback, our k-CCO definition guarantees connectivity in both the physical and conceptual graphs. Based on Definition 4, we further define the Maximum-Connected COre (MCCO) below.
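Definition 5 (Maximum-Connected Core) Given a dual graph G, a dual subgraph G[C] is a maximum-connected core (MCCO) of G if G[C] is a k-CCO, and (k + 1)-CCO does not exist.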

Definition 6 (Maximum CCO Number)
Given a dual graph G, the maximum CCO number of G, denoted by k_max(G), is the maximum value of k such that a k-CCO exists.
Based on Definitions 4 and 5, we formally define the two problems studied in this paper as follows.

Computing K-Connected Cores
Given an integer k, we study the problem of computing all k-CCOs in this section. We first give several lemmas about k-CCOs based on Definition 4. Given a dual graph G(V, E_a, E_b) and a k-CCO G[C] ⊂ G, the following lemmas hold.
Based on Lemmas 3 and 4, we propose a peeling algorithm for computing all k-CCOs. The pseudocode is given in Algorithm 2.
The algorithm works by recursively removing the vertices that do not satisfy the degree constraint or the connectivity constraint in Definition 4. We compute all connected components of G_a in line 2. Lemma 4 guarantees that we will not lose any k-CCO in this step. We add G_b[C] to the result set if G_b[C] is connected and satisfies the degree constraint (lines 3-4). Otherwise, lines 6 to 8 of the algorithm compute the k-cores. The correctness of this step is guaranteed by Lemma 3.
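The peeling procedure described above can be sketched in Python as follows. This is a simplified illustration and not the authors' Algorithm 2: the names `components`, `k_core_vertices`, and `k_ccos`, the adjacency-dictionary representation, and the use of an explicit stack in place of recursion are all assumptions made here. Each pending vertex set is split into physical components, peeled to its conceptual k-core, and then either reported or split again.

```python
from typing import Dict, Hashable, Iterable, List, Set

Adj = Dict[Hashable, Set[Hashable]]


def components(vertices: Set[Hashable], adj: Adj) -> List[Set[Hashable]]:
    """Connected components of the subgraph of `adj` induced by `vertices`."""
    seen: Set[Hashable] = set()
    comps: List[Set[Hashable]] = []
    for s in vertices:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            for v in adj.get(u, ()):
                if v in vertices and v not in seen:
                    seen.add(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def k_core_vertices(vertices: Set[Hashable], adj: Adj, k: int) -> Set[Hashable]:
    """Vertices left after repeatedly deleting those with induced degree < k."""
    alive = set(vertices)
    deg = {u: sum(1 for v in adj.get(u, ()) if v in alive) for u in alive}
    queue = [u for u in alive if deg[u] < k]
    while queue:
        u = queue.pop()
        if u not in alive:            # already deleted via an earlier entry
            continue
        alive.discard(u)
        for v in adj.get(u, ()):
            if v in alive:
                deg[v] -= 1
                if deg[v] < k:
                    queue.append(v)
    return alive


def k_ccos(vertices: Iterable[Hashable], phys: Adj, conc: Adj, k: int) -> List[Set[Hashable]]:
    """Vertex sets of all k-CCOs of the dual graph (phys = G_a, conc = G_b)."""
    result: List[Set[Hashable]] = []
    pending = [set(vertices)]
    while pending:
        cur = pending.pop()
        for comp in components(cur, phys):          # connected in G_a
            core = k_core_vertices(comp, conc, k)   # degree constraint in G_b
            if not core:
                continue
            parts = components(core, conc)          # connectivity in G_b
            if len(parts) == 1 and core == comp:
                result.append(comp)                 # all three conditions hold
            else:
                pending.extend(parts)               # strictly smaller inputs
    return result
```

Every set pushed back onto `pending` is strictly smaller than the component it came from, so the loop terminates; the nesting depth of these splits plays the role of the tree height h discussed in the text.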
The process of Algorithm 2 can be represented by a DFS tree as depicted in Fig. 3. Each node in the tree represents the input dual graph G of one invocation of Algorithm 2. Let h be the height of the tree. The time complexity of Algorithm 2 is given as follows.

Theorem 1 Given a graph G(V, E_a, E_b) and an integer k, the time complexity of Algorithm 2 is O(h|E_b|).
Proof Obtaining all connected components in line 2 of Algorithm 2 costs O(|E_a|) time. Checking the degree constraint and connectivity of G_b in line 3 costs O(|E_b|) time. From line 6 to line 8, Algorithm 2 also costs O(|E_b|) time to remove all vertices whose degree is less than k and compute the connected components of G_b[H]. Normally, we have |E_a| < |E_b|, so the time cost for each node in the DFS tree is linear in the number of conceptual edges of its input graph. Let 𝒢_l be the set of all input graphs at height l of the DFS tree, where the height of a node is the distance from the root to that node. There does not exist any vertex or edge overlap between different connected components in line 8, so the input graphs in 𝒢_l are pairwise disjoint. Given the tree height h, we have ∑_{0 ≤ l ≤ h} ∑_{g ∈ 𝒢_l} |E_b(g)| ≤ (h + 1)|E_b|, which gives the bound O(h|E_b|). □

The time complexity of Algorithm 2 is the product of two parts:
- The first part is the tree height h. Note that in the DFS tree, the size of the input graph in each node must be less than that in its parent node. Therefore, h is roughly bounded by |V|. However, h is much smaller than |V| in practice. In our experiments, h is not larger than 5 on all datasets.
- The second part is the graph size |E_b|. Given that vertices violating the degree constraint are removed in line 7 of Algorithm 2, the graph size becomes small when the tree height increases. The practical performance of Algorithm 2 is given in Sect. 6.

Computing Maximum-Connected Cores
We study the problem of computing all MCCOs in this section. A straightforward bottom-up solution is first given in Sect. 4.1, followed by a top-down approach in Sect. 4.2 and a binary search algorithm.

A Bottom-Up Approach
We give a straightforward algorithm for computing all MCCOs in this section. Similar to the concept of k-core, a nested property of k-CCOs can also be easily obtained according to Definition 4.

Lemma 5 Given an integer
Inspired by the lemma above, we propose a bottom-up algorithm. More specifically, we iteratively compute the k-CCOs based on the computed (k − 1)-CCOs when increasing k. The detailed pseudocode is given in Algorithm 3. We first compute all 1-CCOs in G (line 2). Then we iteratively increase k (line 6), and compute the k-CCOs in ℂ, where ℂ is the set of the previously computed (k − 1)-CCOs (line 9). The algorithm terminates once no k-CCO is found. The time complexity of Algorithm 3 is given as follows.

Proof Computing all 1-CCOs costs O(h|E_b|) time in line 2. The number of iterations in line 4 is at most δ(G_b). Since there does not exist any overlap between any two components in ℂ (line 8), each iteration costs O(h|E_b|) time, and the time complexity of the loop starting in line 4 is O(h · δ(G_b) · |E_b|). The total time complexity of Algorithm 3 is obtained. □
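The bottom-up strategy can be illustrated with a short Python sketch. It is not the authors' Algorithm 3: the function name `mcco_bottom_up` is hypothetical, and `k_ccos_within` is an assumed callback that stands in for a k-CCO computation restricted to a given vertex set (for example, an implementation of Algorithm 2 applied to the induced dual subgraph).

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple

Vertex = Hashable
KccoFn = Callable[[Set[Vertex], int], List[Set[Vertex]]]


def mcco_bottom_up(vertices: Iterable[Vertex],
                   k_ccos_within: KccoFn) -> Tuple[int, List[Set[Vertex]]]:
    """Return (k_max, MCCO vertex sets); (0, []) if no 1-CCO exists."""
    current = k_ccos_within(set(vertices), 1)   # all 1-CCOs of the whole graph
    if not current:
        return 0, []
    k = 1
    while True:
        nxt: List[Set[Vertex]] = []
        for comp in current:
            # By the nested property, every (k+1)-CCO lies inside some k-CCO.
            nxt.extend(k_ccos_within(comp, k + 1))
        if not nxt:                              # no (k+1)-CCO: current are MCCOs
            return k, current
        current, k = nxt, k + 1
```

The number of rounds grows with k_max, which is the weakness that the top-down and binary search variants try to address.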

A Top-Down Approach
A bottom-up solution is given in the previous section. Given that k_max may be very large, the running time of the bottom-up algorithm may also be very large. To handle this problem, we propose a top-down algorithm in this section.
Given a dual graph G, computing all MCCOs is equivalent to computing all k_max(G)-CCOs. We adopt a top-down strategy to select k_max. An upper bound for k_max(G) can be easily obtained according to Lemma 3:

Lemma 6 Given a dual graph G(V, E_a, E_b), we have k_max(G) ≤ δ(G_b).

Based on Lemma 6, we propose the top-down algorithm in Algorithm 4.
Core decomposition (Algorithm 1) is invoked in line 2, and we initialize k_max by the graph degeneracy in line 3. For each k_max, the vertex set C of the k_max-cores of G_b is obtained in line 5 based on Lemma 3.
Algorithm 2 is invoked to compute all k_max-CCOs in G[C] (line 6). We terminate the algorithm as soon as any k_max-CCO is found.

Theorem 3 Given an input dual graph
Proof The proof is similar to that for Theorem 2 and is omitted here. □
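A minimal Python sketch of the top-down strategy is given below. It is an illustration under stated assumptions rather than the authors' Algorithm 4: `mcco_top_down` is a hypothetical name, `core` is assumed to hold the core numbers of all vertices in the conceptual graph G_b, and `k_ccos_within` is an assumed callback that computes the k-CCOs inside a given vertex set.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

Vertex = Hashable
KccoFn = Callable[[Set[Vertex], int], List[Set[Vertex]]]


def mcco_top_down(core: Dict[Vertex, int],
                  k_ccos_within: KccoFn) -> Tuple[int, List[Set[Vertex]]]:
    """Try k = degeneracy(G_b), degeneracy - 1, ... and stop at the first hit."""
    k = max(core.values(), default=0)            # upper bound on k_max
    while k >= 1:
        # Only vertices with core number >= k in G_b can belong to a k-CCO.
        candidates = {v for v, c in core.items() if c >= k}
        found = k_ccos_within(candidates, k)
        if found:
            return k, found
        k -= 1
    return 0, []
```

The loop runs once per value between the degeneracy of G_b and k_max, so it is fast when the two are close and slow when the gap is large.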

Binary Searching MCCOs
We propose the bottom-up and top-down algorithms in Sects. 4.1 and 4.2, respectively. Even though they can successfully compute all MCCOs in the given dual graph G, both of them incur up to δ(G_b) invocations of the k-CCO computation in their time complexity. To overcome this drawback, we propose a binary search algorithm in this section. Similar to the conventional binary search, we maintain a lower bound $\underline{k}$ and an upper bound $\overline{k}$ of k, and attempt to find all k-CCOs in each iteration, where k = ⌊($\underline{k}$ + $\overline{k}$)/2⌋. If no k-CCO is found, we know that there does not exist any k′-CCO for k < k′ < $\overline{k}$ according to Lemma 5. In this case, we set the upper bound to k, and continue the search. Otherwise, we set the lower bound to k. The procedure terminates once we find all k-CCOs and (k + 1)-CCO does not exist. The initial lower bound for k is 1, and the initial upper bound is δ(G_b) based on Lemma 6. The detailed pseudocode is given in Algorithm 5.
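The binary search over k can be sketched in Python as follows. This is a hedged illustration, not the authors' Algorithm 5: the name `mcco_binary_search` and the callback `k_ccos` (assumed to return all k-CCOs of the dual graph for a given k) are hypothetical, and the sketch uses the common `lo = mid + 1` / `hi = mid - 1` bound updates, which differ slightly from the bound assignments described in the text. Its correctness relies on the nested property: if a k-CCO exists, then a k′-CCO exists for every k′ < k.

```python
from typing import Callable, Hashable, List, Set, Tuple

Vertex = Hashable


def mcco_binary_search(upper: int,
                       k_ccos: Callable[[int], List[Set[Vertex]]]
                       ) -> Tuple[int, List[Set[Vertex]]]:
    """Return (k_max, MCCO vertex sets), searching k in [1, upper].

    `upper` is an upper bound on k_max, e.g. the degeneracy of G_b.
    """
    lo, hi = 1, upper
    best_k: int = 0
    best: List[Set[Vertex]] = []
    while lo <= hi:
        mid = (lo + hi) // 2
        found = k_ccos(mid)
        if found:                 # a mid-CCO exists, so k_max >= mid
            best_k, best = mid, found
            lo = mid + 1
        else:                     # no mid-CCO, so k_max < mid
            hi = mid - 1
    return best_k, best
```

With an upper bound of 7 and k_max = 3, for instance, the sketch probes k = 4, 2, 3, so the number of k-CCO computations is logarithmic in the upper bound.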

Efficient K-Connected Cores Search
In this section, we study the k-connected core search problem, that is, computing a k-connected core containing a set of given vertices. For ease of presentation, we mainly study the problem for a single query vertex, and we will show that the solution can be naturally extended to handle more than one query vertex. The problem is formally defined as follows.
Problem 3 Given a dual network G(V, E_a, E_b), an integer k and a vertex u ∈ V, compute a k-CCO in G that contains u.

Online Search Algorithm
To solve Problem 3, we can naturally extend Algorithm 2 and obtain an online search algorithm. Specifically, we compute all k-CCOs in the dual graph and return the one that contains the query vertex. Note that we can directly return an empty set if the query vertex is removed due to the degree constraint (line 7 of Algorithm 2), as there is no k-CCO containing the query vertex. The search algorithm has the same time complexity as Algorithm 2. The detailed pseudocode is given in Algorithm 6. For a query with several vertices, we only need to check whether there is a k-CCO containing all query vertices.

Index-Based Search Algorithm
Even though the online algorithm successfully computes the k-CCO containing the query vertex, it has several drawbacks. First, the algorithm is inefficient. The result k-CCO is normally much smaller than the original graph, while the online algorithm still needs to scan the whole graph at least once to get the result. Additionally, in certain special cases, no k-CCO exists for the query vertex, and the algorithm may still be costly in computing the k-CCOs. To address these drawbacks, we propose an index-based search algorithm in this section.

The Index Structure
Before giving the detailed index structure, we first give the following observation based on Lemma 5.

Observation 1 Given a dual network G(V, E a , E b ) , and a vertex v ∈ V , if G[C] is a k-CCO and
According to the above observation, for each vertex u, we save the maximum k such that u is contained in a k-CCO. We call such a value the connected-core number, and it is formally defined as follows.

Definition 7 (Connected Core Number) Given a dual graph G(V, E_a, E_b) and a vertex v ∈ V, the connected-core number of v, denoted by ccn(v), is the maximum k such that v is contained in a k-CCO.
We have the following lemma based on Definition 7.

Lemma 7 Given a dual graph G(V, E a , E b ) and a vertex
We compute the connected-core numbers of all vertices as the proposed index, named Connected COre-Index (CCO-Index). An example is given as follows.

Example 2
We give an example of the CCO-Index in Fig. 4. The connected-core number is shown over each vertex. The MCCO of G is the 3-CCO containing v_2, v_3, v_4 and v_5, and the connected-core numbers of these vertices are 3. The vertices v_6, v_7, v_8, v_9 and v_10 are not contained in the 3-CCO, and the maximum k-CCO containing them is the 2-CCO; thus their connected-core numbers are 2. Similarly, the connected-core numbers of v_0, v_1 and v_11 are 1.

Theorem 5 Given a dual graph G(V, E_a, E_b), the space complexity of the CCO-Index is O(|V|).
Proof The proof is omitted here. □

The Query-Processing Algorithm
Based on the CCO-Index, we propose a query-processing algorithm. The idea of this algorithm is based on Theorem 6, which is stated in terms of A_k(u), the set of vertices in which each vertex v can be reached from u via a path in G_b such that every vertex w on the path satisfies ccn(w) ≥ k.
Given that 1 ≤ k ≤ ccn(u), based on Lemma 7, there exists S ⊆ V such that G[S] is a k-CCO and u ∈ S. In addition, according to Lemma 5 … Thus, the maximal-connected core containing v′ is k′-CCO. However, this contradicts the precondition that ccn(v′) ≥ k. Therefore, …
Based on Theorem 6, we propose an algorithm for k-CCO search. The pseudocode is given in Algorithm 7.
In line 2, the algorithm first checks whether ccn(v) is lower than k. If it is, the algorithm terminates. Otherwise, in line 6, the algorithm recursively invokes a depth-first search subroutine, CCO-DFS, to find the vertex set ℂ. Each vertex v in ℂ can be reached from u via a path in G_b, and every vertex w on the path satisfies ccn(w) ≥ k (lines 8-9). The first parameter of CCO-DFS is the set of vertices in the k-CCO containing v. We terminate the algorithm once all vertices in the k-CCO are found. The correctness of CCO-Query is guaranteed by Theorem 6.

Algorithm 7 Query Processing Algorithm (CCO-Query)
…
9: C ← CCO-DFS(C, u, k);
10: return C;

Example 3 We give an example of CCO-Query. Consider the graph G in Fig. 2. Given k = 3 and the query vertex v_3, the CCO-Index of G is shown in Fig. 4. The algorithm performs a depth-first search on the physical graph with v_3 as the source. {v_2, v_4} are the vertices in N_{G_b}(v_3) whose connected-core number is greater than or equal to 3. Suppose the search first visits v_2, and v_5 is the vertex in N_{G_b}(v_2) whose connected-core number is greater than or equal to 3. The search expands to v_5 and finds no vertex to be expanded. It next visits v_4, and the only qualified neighbor v_5 has been visited. Then the depth-first search terminates. Finally, we get the result vertex set {v_2, v_3, v_4, v_5}.

Theorem 7 Given a graph G(V, E_a, E_b), a vertex v ∈ V and an integer k, the time complexity of Algorithm 7 is O(m_b), where m_b is the number of physical edges in the result subgraph.
Proof The theorem is obvious and the proof is omitted here. □
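The index-based query can be illustrated with the following Python sketch. It is a minimal sketch and not the authors' Algorithm 7: the function name `cco_query` and the dictionary-based inputs are assumptions made here. It follows the definition of A_k(u) given in the text, that is, it collects the vertices reachable from the query vertex through conceptual edges while visiting only vertices whose connected-core number is at least k, and it uses an explicit stack instead of recursion.

```python
from typing import Dict, Hashable, Set

Vertex = Hashable


def cco_query(conc: Dict[Vertex, Set[Vertex]],
              ccn: Dict[Vertex, int],
              u: Vertex,
              k: int) -> Set[Vertex]:
    """Vertex set reached from u via conceptual edges using only ccn >= k.

    `conc` is the adjacency of the conceptual graph G_b and `ccn` is the
    precomputed index of connected-core numbers.
    """
    if ccn.get(u, 0) < k:             # u is in no k-CCO: empty answer
        return set()
    result, stack = {u}, [u]
    while stack:
        w = stack.pop()
        for v in conc.get(w, ()):
            if v not in result and ccn.get(v, 0) >= k:
                result.add(v)
                stack.append(v)
    return result
```

The search only touches the returned vertices and their incident conceptual edges, so its cost depends on the size of the answer rather than on the size of the whole graph.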

The Index Construction Algorithm
In order to construct CCO-Index , we propose a peeling algorithm as follows. The detailed pseudocode is given in Algorithm 8. The idea of Algorithm 8 is similar to that of Algorithm 3. If

Algorithm 8 The Index Construction Algorithm (CCO-Construct)
…
ccn(u) = k − 1;
N ← ccn(u);
…

The time complexity of Algorithm 8 is given as follows.

Theorem 8 Given an input dual graph G(V, E_a, E_b), let h be the height of the DFS tree of Algorithm 8 and δ be the degeneracy of G_b. The time complexity of Algorithm 8 is O(h · δ · |E_b|).
Proof The proof is similar to that for Algorithm 3 and is omitted here. □
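One way to picture the index construction is the level-by-level Python sketch below. It is an assumption-laden illustration rather than the authors' Algorithm 8: `build_cco_index` is a hypothetical name, `k_ccos_within` is an assumed callback that returns the k-CCOs inside a given vertex set, and assigning 0 to vertices that belong to no 1-CCO is a convention chosen here.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set

Vertex = Hashable
KccoFn = Callable[[Set[Vertex], int], List[Set[Vertex]]]


def build_cco_index(vertices: Iterable[Vertex],
                    k_ccos_within: KccoFn) -> Dict[Vertex, int]:
    """Compute ccn(v) for every vertex by increasing k level by level."""
    ccn: Dict[Vertex, int] = {v: 0 for v in vertices}
    current: List[Set[Vertex]] = [set(ccn)]   # search space for level k
    k = 1
    while current:
        nxt: List[Set[Vertex]] = []
        for comp in current:
            for cco in k_ccos_within(comp, k):
                for v in cco:
                    ccn[v] = k                # v survives at level k
                nxt.append(cco)
        current = nxt                         # (k+1)-CCOs lie inside k-CCOs
        k += 1
    return ccn
```

A vertex keeps the last level at which it still belonged to some k-CCO, which corresponds to setting ccn(u) = k − 1 when u is dropped at level k.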

Experiments
We conduct extensive experiments to evaluate the performance of our proposed solutions. We obtain the code for computing the DCS [33] from its authors as a comparison. All other algorithms are implemented in C++. All the experiments are conducted on a machine running a Windows Server operating system with an Intel Xeon 2.0 GHz CPU and 32 GB of 1333 MHz DDR3 RAM. The time cost of an algorithm is measured as the amount of wall-clock time elapsed during the program execution.
Real-World Datasets We evaluate the algorithms on six real graphs. The detailed statistics of these graphs are summarized in Table 1, where d_b is the average degree in the conceptual graph.
We adopt an idea similar to that in [33] to construct the dual graphs. DBLP [30] is constructed based on the computer science bibliography DBLP. We select several conferences and journals in the database research area. The vertices represent the authors of the published papers. An edge exists in the physical graph if two authors have a common paper, and edges in the conceptual graph are constructed by measuring the similarity between the abstracts of papers published by any two authors. Hep-TH [18] is a theory collaboration network in the high energy physics area. The construction of Hep-TH is the same as that of DBLP.
Epinions [23] and CiaoDVD are recommendation networks. Each vertex represents a user. A physical edge exists if a user expresses a positive trust statement on the other user. To construct the conceptual graph, we calculate the correlation coefficient [22] of the common ratings between users, and connect two users by a conceptual edge if their coefficient value is larger than a threshold.
Brightkite [8] and Gowalla [8] are geosocial networks. Each vertex represents a user. The physical edges represent the friend relationship between users, and the conceptual edges are constructed based on the Euclidean distance between the locations of users.
Synthetic Datasets We adopt the same method as in [33] to generate several synthetic graphs. Specifically, we use the graph generator GTgraph to construct both physical graphs and conceptual graphs. The statistics of the generated graphs are summarized in Table 2.
Algorithms The experiments involve one algorithm for computing all k-CCOs, three algorithms for computing MCCOs, two algorithms for searching a k-CCO, and one algorithm for index construction. The algorithms that appear in the experiments are summarized as follows:

Performance Studies of the Algorithm for Computing all k-CCOs
Eval-I: Evaluating the algorithm for computing all k-CCOs. The running time of Algorithm 2 on six real-world graphs is reported in Fig. 5. For each dataset, we select 20% × k_max, 40% × k_max, 60% × k_max, 80% × k_max and k_max as the input integer k, and present a line chart. We can see that the time cost of the algorithm decreases as k increases. This is mainly because a large number of vertices are removed when the degree constraint k is large, and the result subgraph is small.

Eval-II: Evaluating the algorithms on real-world graphs
The running time of the bottom-up, top-down, and binary search algorithms on six real-world graphs is reported in Fig. 6a. Given that there exists no previous work on this problem, we give the time cost of the algorithm for computing the DCS [33] as a comparison in the figure. Note that the time cost of the DCS algorithm is not given for some datasets, since the procedure cannot terminate within 4 hours.
As we can see from the figure, the binary search algorithm is the fastest. It costs about 13 s on Gowalla and less than 4 s on all other datasets.
The top-down algorithm is the second fastest on all datasets, while the bottom-up algorithm is slightly slower than the top-down algorithm. For example, on Brightkite, the top-down and bottom-up algorithms cost about 77 s and 113 s, respectively. The binary search algorithm costs about 2 s, which is almost two orders of magnitude faster than the other two. As a comparison, the DCS algorithm costs over 3000 s and 750 s on DBLP and CiaoDVD, respectively, while the binary search algorithm costs only about 1.3 s and 0.7 s, respectively, on those two datasets. The result demonstrates the high efficiency of the binary search algorithm.

Eval-III: Evaluating the algorithms on synthetic graphs
The running time for computing MCCOs on synthetic graphs is given in Fig. 6b.
The binary search algorithm is the fastest on all graph sizes. The bottom-up algorithm has a slower rate of increase than the top-down algorithm, and eventually even becomes faster than it. This is mainly because the gap between k_max and δ is large when the graph is big.

Scalability Testing of the Algorithm for Computing MCCOs
We test the scalability of our proposed algorithms in this section. For each real-world dual graph, we randomly sample physical edges, conceptual edges, and vertices, respectively, from 20% to 100%. When sampling physical edges, we take the incident vertices of the sampled edges as the vertex set and preserve the subgraph induced by this vertex set in the conceptual graph. The sampling strategy for conceptual edges is the same as that for physical edges. When sampling vertices, we take the dual subgraph induced by the sampled vertices. Due to space limitations, we only report the charts for DBLP,  Fig. 7a-c when sampling physical edges. We can see that is the fastest, and the time cost of all algorithms shows a slight downward trend in all datasets. This is mainly due to the speedup of performing . Specifically, when the physical edge size is large, a k-core in the conceptual graph is more likely to be connected in the physical graph, which means the depth of the invocation tree depicted in Fig. 3   is the fastest algorithm, and the lines for in all datasets are stable.
is the second fastest algorithm. The time cost of presents a relatively obvious increase from 20% to 100% in all datasets, and the gap between and decreases when the edge size increases. This is mainly because the graph degeneracy of G_b increases as |E_b| increases, and the gap between the degeneracy and k_max increases. Therefore, more iterations in are performed, and the efficiency of declines.
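The sampling strategy used in the scalability tests can be sketched as follows. This is only an illustration under stated assumptions: edges are stored as lists of vertex pairs, the sampled side keeps exactly the sampled edges, and the other side keeps the subgraph induced by their endpoints. The helper names (`induced`, `sample_by_edges`, `sample_by_vertices`) are hypothetical.

```python
import random


def induced(edges, vertices):
    """Edges whose two endpoints both lie in `vertices`."""
    return [(u, v) for u, v in edges if u in vertices and v in vertices]


def sample_by_edges(sampled_side, other_side, ratio, rng):
    """Sample a fraction of one edge set and restrict the other graph.

    Use it with (physical, conceptual) to sample physical edges, or with
    (conceptual, physical) to sample conceptual edges.
    """
    kept = rng.sample(sampled_side, round(ratio * len(sampled_side)))
    vertices = {x for edge in kept for x in edge}  # incident vertices
    return vertices, kept, induced(other_side, vertices)


def sample_by_vertices(vertices, physical, conceptual, ratio, rng):
    """Sample a fraction of the vertices and take the induced dual subgraph."""
    pool = sorted(vertices)
    kept = set(rng.sample(pool, round(ratio * len(pool))))
    return kept, induced(physical, kept), induced(conceptual, kept)


rng = random.Random(7)  # fixed seed so that a run is repeatable
physical_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
conceptual_edges = [(0, 2), (1, 3), (2, 4), (0, 5), (3, 5)]
for ratio in (0.2, 0.4, 0.6, 0.8, 1.0):
    vs, phys, conc = sample_by_edges(physical_edges, conceptual_edges, ratio, rng)
    print(ratio, len(vs), len(phys), len(conc))
```

Passing an explicit `random.Random` instance keeps each sampling ratio reproducible, which is useful when the same sampled graphs are fed to several algorithms.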
Eval-VI: Sampling vertices. The running time of the three MCCO algorithms is reported in Fig. 7g-i when sampling vertices. We can see that is still the fastest in all scenarios. The chart for presents a slight increase when increasing the vertex size.
is faster than -, and in some datasets, the gap between them decreases when increasing vertex size. For example, in Epinions, costs about 0.5 s on 20% and reaches about 25 s on 100%. By contrast, costs about 5.3 s on 20% and reaches about 29 s on 100%. The main reason is similar to that in sampling conceptual edges. From the three scalability experiments, we can see the high efficiency and stability of -. The top-down solution is the second fastest, while the efficiency of highly depends on the graph structure and the gap between k_max and the graph degeneracy. The bottom-up solution is the slowest but is more stable than -.
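The dependence on the gap between k_max and the graph degeneracy can be made concrete with a small sketch. It assumes, as a simplification and not as a description of the paper's algorithms, that a bottom-up search raises k from 1 until no k-CCO exists, and that a top-down search lowers k from the degeneracy of the conceptual graph until a k-CCO appears. All function names are hypothetical, and the third, fastest algorithm is not reproduced here.

```python
def adj_of(edges):
    """Undirected adjacency map {vertex: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def peel(vs, adj, k):
    """Repeatedly drop vertices whose degree inside the set is below k."""
    vs = set(vs)
    while True:
        low = {v for v in vs if len(adj.get(v, set()) & vs) < k}
        if not low:
            return vs
        vs -= low


def split(vs, adj):
    """Connected components of the subgraph induced by `vs`."""
    vs, parts = set(vs), []
    while vs:
        stack = [vs.pop()]
        part = set(stack)
        while stack:
            for u in adj.get(stack.pop(), set()) & vs:
                vs.discard(u)
                part.add(u)
                stack.append(u)
        parts.append(part)
    return parts


def k_ccos(vs, phys, conc, k):
    """Worklist version: peel conceptually, split physically, until stable."""
    todo, found = [set(vs)], []
    while todo:
        cur = todo.pop()
        parts = split(peel(cur, conc, k), phys)
        if parts == [cur]:
            found.append(cur)
        else:
            todo.extend(parts)
    return found


def degeneracy(vs, conc):
    """Largest k for which the conceptual graph has a non-empty k-core."""
    k = 0
    while peel(vs, conc, k + 1):
        k += 1
    return k


def bottom_up(vs, phys, conc):
    """Return (k_max, number of k-CCO computations) when raising k."""
    k, calls = 0, 0
    while True:
        calls += 1
        if not k_ccos(vs, phys, conc, k + 1):
            return k, calls
        k += 1


def top_down(vs, phys, conc):
    """Return (k_max, number of k-CCO computations) when lowering k."""
    k, calls = degeneracy(vs, conc), 0
    while k > 0:
        calls += 1
        if k_ccos(vs, phys, conc, k):
            break
        k -= 1
    return k, calls


# Conceptual graph: a 4-clique (degeneracy 3) attached to a triangle.
conc = adj_of([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
               (4, 5), (4, 6), (5, 6), (3, 4)])
# Physical graph in which the 4-clique is not connected, so k_max is 2.
phys = adj_of([(0, 1), (2, 3), (4, 5), (5, 6)])
print(bottom_up(range(7), phys, conc))  # prints (2, 3)
print(top_down(range(7), phys, conc))   # prints (2, 2)
```

Under these assumptions the bottom-up search performs k_max + 1 computations, while the top-down search performs one computation per level between the degeneracy and k_max, so the latter slows down as that gap widens.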

Performance Studies of the Search Algorithm
Eval-VII: Query processing (vary k). We vary k from 5 to 25 and evaluate the efficiency of our two proposed subgraph search algorithms ( -, -). For each dataset, we select 5, 10, 15, 20, and 25 as the input integer k, and present a line chart. To assess query performance, we randomly generate 1000 queries in each dataset (each query contains one query vertex) and compute the average query time. Due to space limitations, we only report the charts for three real-world graphs (DBLP, Hep-TH, and Brightkite) and three synthetic graphs; the results on the other datasets show similar trends.
The results are reported in Fig. 8. We can see that is around two orders of magnitude faster than -. When k increases, the change for is not obvious in all datasets, while the processing time of shows a downward trend in all graphs. This is because the running time of is only related to the size of the result subgraph, which becomes small when k is large.
Eval-VIII: Index size. We report the size of our proposed index for all datasets. The size of the original graph is also given as a comparison. The results are depicted in Fig. 9. Over all the datasets, the size of the CCO-Index is equal to the size of the vertex set and much smaller than the size of the original graph. For example, the index size of Gowalla is about 0.8 GB, while the size of the whole graph is more than 22 GB.
Eval-IX: Index construction. We report the running time of the index construction algorithm for all datasets. The result is shown in Fig. 10. The algorithm takes about 231 s on the Gowalla dataset and only about 14 s on the CiaoDVD dataset.

Effectiveness Evaluation
Eval-X: Case study in Gowalla. We conduct a case study to present the effectiveness of our solution. Due to space limitations, we select a subgraph of Gowalla and compute the MCCO in this subgraph. The result is reported in Fig. 11b. As a comparison, we also give the result of DCS on the same subgraph in Fig. 11a. We can see that there exist several vertices whose degree is less than three in the DCS. This demonstrates that the approximate solution for DCS may produce a sparse subgraph. By contrast, the degree of each vertex is not less than k_max in Fig. 11b, and the resulting MCCO is cohesive.
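The property checked visually in the case study can also be verified programmatically. The sketch below assumes the definition from the abstract (minimum degree at least k in the conceptual graph, connectivity in the physical graph) and does not test maximality. The function name `is_k_connected_core` and the toy graph are hypothetical.

```python
def adjacency(edges):
    """Undirected adjacency map {vertex: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_k_connected_core(vertices, physical, conceptual, k):
    """Check the degree and connectivity constraints for a vertex set."""
    vs = set(vertices)
    if not vs:
        return False
    # Degree constraint: every vertex has >= k conceptual neighbours in the set.
    if any(len(conceptual.get(v, set()) & vs) < k for v in vs):
        return False
    # Connectivity constraint: the set is connected in the physical graph.
    start = next(iter(vs))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in physical.get(v, set()) & vs:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen == vs


# Conceptual triangle {0, 1, 2} with a pendant vertex 3; physical path 0-1-2-3.
conceptual = adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
physical = adjacency([(0, 1), (1, 2), (2, 3)])
print(is_k_connected_core({0, 1, 2}, physical, conceptual, 2))     # prints True
print(is_k_connected_core({0, 1, 2, 3}, physical, conceptual, 2))  # prints False
```

The second call fails because vertex 3 has only one conceptual neighbour, which is the kind of low-degree vertex the case study points out in the DCS result.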

Related Works
Computing k-core. The k-core model and a linear-time solution for core decomposition are introduced in [3,29]. k-cores in directed graphs and weighted graphs are studied in [14] and [15], respectively. [7] proposes a partition-based external-memory algorithm for computing k-cores. [17,32] apply a semi-external model and further speed up the core decomposition algorithm for big graphs. [25] gives a distributed algorithm for core decomposition. Given that real-world graphs are highly dynamic, core number maintenance is studied in [20,28]. Locally estimating core numbers is studied in [26]. Several works study k-cores in different graph models, such as uncertain graphs [4], random graphs [16,21,24,27], and attribute graphs [12]. [11,19] use k-cores to detect communities in the graph.
Cohesive subgraph detection in dual networks. [33] studies the cohesive subgraph problem in dual networks. An approximate algorithm is proposed for computing the densest connected subgraph in the input dual graph.

Conclusion
Computing k-cores is a fundamental and important graph problem. In this paper, we define the k-connected core in dual graphs. A subgraph g is a k-connected core if the minimum degree of g is at least k in the conceptual graph, and g is connected in both the conceptual graph and the physical graph. We propose a polynomial-time algorithm for computing all k-connected cores in the dual graph. We also propose three algorithms for computing all maximum-connected cores, which are the k-connected cores with the largest k, that is, such that a (k + 1)-connected core does not exist. Given a set of query vertices, we study the k-CCO search problem and propose an index-based solution. We conduct extensive experiments to demonstrate the effectiveness and efficiency of our proposed algorithms.