Abstract
Despite being very successful within the pattern recognition and machine learning community, graph-based methods are often unusable because of the lack of mathematical operations defined in the graph domain. Graph embedding, which maps graphs to a vectorial space, has been proposed as a way to tackle these difficulties, enabling the use of standard machine learning techniques. However, it is well known that graph embedding functions usually suffer from the loss of structural information. In this paper, we consider the hierarchical structure of a graph as a way to mitigate this loss of information. The hierarchical structure is constructed by topologically clustering the graph nodes and considering each cluster as a node in the upper hierarchical level. Once this hierarchical structure is constructed, we consider several configurations to define the mapping into a vector space given a classical graph embedding; in particular, we propose to make use of the stochastic graphlet embedding (SGE). Broadly speaking, SGE produces a distribution of uniformly sampled low-to-high-order graphlets as a way to embed graphs into the vector space. In this way, the coarse-to-fine structure of a graph hierarchy and the statistics fetched by SGE complement each other and capture important structural information in varied contexts. Altogether, these two techniques substantially mitigate the usual information loss involved in graph embedding techniques, yielding a more robust graph representation. This has been corroborated through a detailed experimental evaluation on various benchmark graph datasets, where we outperform the state-of-the-art methods.
1 Introduction
Graph-based methods have been very successful for pattern recognition, computer vision and machine learning tasks [16, 25, 77]. However, due to their symbolic and relational nature, graphs have some limitations compared with traditional statistical (vector-based) representations. Some trivial mathematical operations have no equivalent in the graph domain. For example, computing pairwise sums or products (which are elementary operations in many classification and clustering algorithms) is not defined in a standard way in the graph domain. In the literature, one way this problem has been addressed is by means of embedding functions. Given a graph space \({\mathbb {G}}\), an explicit embedding function is defined as \(\varphi :{\mathbb {G}}\rightarrow {\mathbb {R}}^n\), which maps a given graph to a vector representation [12, 29, 47, 65, 68], whereas an implicit embedding function is defined as \(\varphi :{\mathbb {G}}\rightarrow {\mathcal {H}}\), which maps a given graph to a high-dimensional Hilbert space \({\mathcal {H}}\) where a dot product defines the similarity between two graphs, \(K(G,G')=\langle \varphi (G),\varphi (G') \rangle\), \(G,G'\in {\mathbb {G}}\) [18, 27, 32, 35]. In the graph domain, implicit embedding is carried out by a graph kernel, which basically defines a way to compute the similarity between two graphs. However, defining such embedding functions is extremely challenging when time efficiency and the preservation of the underlying structural information are concerned. The problem becomes even more difficult as graphs grow in size, because the increased structural complexity raises the possibility of noise and distortion in the structure, and hence the risk of losing information. Hierarchical representation is often used as a way to deal with noise and distortion [50, 76], since it provides a stable delineation of an underlying object.
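To make the distinction between the two kinds of embedding concrete, the following minimal sketch uses a toy explicit embedding (a clipped degree histogram, chosen purely for illustration and not one of the cited methods) and derives the kernel \(K(G,G')=\langle \varphi (G),\varphi (G') \rangle\) from it. Graphs are given as plain edge lists, and the names `degree_histogram` and `dot_kernel` are hypothetical. Practical graph kernels usually evaluate K without ever materializing \(\varphi\), which is what makes the implicit route attractive.

```python
from collections import Counter

def degree_histogram(edges, n_bins=4):
    """Toy explicit embedding phi: G -> R^n (hypothetical, for illustration only).

    Entry d counts the nodes of degree d; degrees >= n_bins - 1 share the last bin.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    hist = [0] * n_bins
    for d in degree.values():
        hist[min(d, n_bins - 1)] += 1
    return hist

def dot_kernel(edges_a, edges_b, n_bins=4):
    """K(G, G') = <phi(G), phi(G')> for the toy embedding above."""
    phi_a = degree_histogram(edges_a, n_bins)
    phi_b = degree_histogram(edges_b, n_bins)
    return sum(x * y for x, y in zip(phi_a, phi_b))

triangle = [(0, 1), (1, 2), (2, 0)]
path = [(0, 1), (1, 2)]
print(degree_histogram(triangle))  # [0, 0, 3, 0]
print(dot_kernel(triangle, path))  # 3
```

Here \(\varphi\) is tiny and explicit, so K is computed through it; the same dot-product structure is what an implicit embedding guarantees without exposing \(\varphi\).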
Hierarchical representations allow the graph to be incrementally contracted in a scale-space representation, so that the salient features (relevant subgraphs) remain in the hierarchy. Thus, the top levels become a compact and stable summarization.
Processing information using a multiscale representation has been successfully employed in computer vision and image processing algorithms, mostly inspired by its resemblance to human visual perception [1]. It has been observed that a naturalistic visual interpretation demands a data structure able to represent scattered local information as well as summarized global facts [33]. Hierarchical representation is often used as a paradigm to efficiently extract global information from local features. Apart from that, hierarchical models are also believed to provide time- and space-efficient solutions [76]. Motivated by the above-mentioned intuition and the existing works in related fields, many authors have proposed different hierarchical graph structures for solving various problems [22, 23, 48, 76]. In this sense, it is worth mentioning the work of Mousavi et al. [50], who presented a hierarchical framework for graph embedding, although they did not explore a complex encoding of the hierarchy.
In this paper, motivated by the success of hierarchical models and the efficiency of graph embedding theory, we propose a general hierarchical graph embedding formulation that first creates a hierarchical structure from a given graph and then utilizes this multiscale structure to explicitly embed the graph in a real vector space by means of local graphlets. First, we use the graph clustering algorithm proposed in [31] to obtain a hierarchical representation of a given input graph. Here, each cluster of nodes at level i is depicted as a single node in the upper hierarchical level \(i+1\); the edges within a level are created depending on the original topology of the base graph, and the hierarchical edges are created by joining the node representing a cluster to all the nodes of that cluster in the lower level. Thus, we propose a richer encoding than Mousavi et al. [50], because our hierarchy not only contains different graph abstractions but also encodes useful hierarchical contractions through the hierarchical edges.
Once the hierarchical structure of a graph is created, we propose a novel use of the Stochastic Graphlet Embedding (SGE) [21] to exploit this hierarchical information. On the one hand, thanks to the SGE design, we can exploit the local configuration in the form of graphlets, which provide information at different neighborhood sizes. On the other hand, the hierarchical connections allow more abstract information to be encoded and hence help to deal with the noise present in the data. As a result, the Hierarchical Stochastic Graphlet Embedding (HSGE) encodes a global and compact representation of the graph in a vector space. Considering the entire graph hierarchy for the embedding, instead of only the base graph, increases the representation power and mitigates the loss of information that usually occurs in graph embedding methods. Moreover, the statistics obtained from uniformly sampled graphlets of increasing size model the complex interactions among the different object parts represented as graph nodes. In this way, the hierarchical graph structure and the statistics of increasingly sized graphlets capture important structural information in varied contexts.
As a result, our approach produces robust representations that benefit from the advantages of the two above-mentioned strategies: first, the ability of embeddings to map symbolic relational representations to n-dimensional spaces, so that machine learning approaches can be used; and second, the ability of hierarchical structures to reduce the noise and distortion inherent in graph representations of real data, keeping the more stable and relevant substructures in a compact way.
In conclusion, the main contribution of our work is the exploitation of the hierarchical structure of a given graph, rather than only the base graph, for graph embedding purposes. Assessing the hierarchical information of a graph pyramid extends the representation power of the embedded graph and makes it tolerant to the instability caused by noise and distortion. Our proposal is robust because, on the one hand, it organizes the structural information in a hierarchical abstraction and, on the other hand, it considers the relations between object parts and their complex interactions with the help of uniformly sampled graphlets of unbounded size. Additionally, the proposed framework is generic and can incorporate any other graph embedding algorithm. We extensively validated the proposed algorithm on many benchmark graph datasets coming from different application domains.
The rest of this paper is organized as follows: Sect. 2 describes the related works in the literature. In Sect. 3, we introduce some definitions and notations related to the work. Our generic hierarchical graph representation is presented in Sect. 4. Section 5 introduces the Stochastic Graphlet Embedding as the base embedding we will use. Afterward, Sect. 7 reports our experimental validation and compares the proposed method with available state-of-the-art algorithms. Finally, in Sect. 8 we draw the conclusions and describe the future directions of the present work.
2 Related work
In what follows, we review related works on explicit and implicit graph embedding techniques, hierarchical models and graph summarization methods, which we believe to be relevant to the main focus of the present paper.
2.1 Graph embedding
Graph embedding methods are mainly divided into two different categories: (1) explicit graph embedding, (2) implicit graph embedding or graph kernel.
2.1.1 Explicit graph embedding
Explicit graph embedding refers to those techniques that aim to explicitly map graphs to vector spaces. The methods belonging to this category can be further divided into four classes. The first one, known as graph probing [47], measures the frequency of specific substructures (that capture content and topology) in graphs. Depending on the graph substructures considered (e.g., nodes, edges, subgraphs), different embedding techniques have been proposed. For example, Shervashidze et al. [68] studied non-isomorphic graphlets, whereas node label and edge relation statistics are considered by Gibert et al. [29]. Saund [65] introduced a bottom-up graph lattice in order to efficiently extract subgraph features in preprocessed administrative documents, while Dutta and Sahbi [21] proposed a distribution of stochastic graphlets for embedding graphs into a vector space. The second class of graph embedding techniques is based on spectral graph theory [13, 34, 37, 39, 64, 82], which aims to analyze the structural properties of graphs in terms of the eigenvectors/eigenvalues of the adjacency or Laplacian matrices of a graph [82]. Recently, Verma and Zhang [78] proposed a family of graph spectral distances for robust graph feature representation. Despite their relative success, spectral methods are quite sensitive to structural noise and distortions. The third class of methods is inspired by the dissimilarity measurements proposed in [56]; in this context, Bunke and Riesen have presented several works on the vectorial description of a given graph by its distances to a number of preselected prototype graphs [9, 12, 62, 63]. Motivated by recent advances in deep learning and neural networks, many researchers have proposed to use neural networks to obtain a vectorial representation of graphs [4, 17, 30, 36, 55], which results in the fourth category of methods, called geometric deep learning.
2.1.2 Implicit graph embedding
Implicit graph embedding, or graph kernel, methods are another way to embed graphs into a vector space. They are also popular for their ability to efficiently extend existing machine learning algorithms to non-linear data, such as graphs and strings. Graph kernel methods can be roughly divided into three categories. The first one, known as diffusion kernels, is based on similarity measures among the subparts of two graphs, which are propagated over the entire structure to obtain a global similarity measure for the two graphs [43, 72]. The second class of methods, called convolution kernels, aims to measure the similarity of composite objects (modeled with graphs) from the similarity of their parts (i.e., nodes) [80]. This type of graph kernel derives the similarity between two graphs G and \(G'\) from the sum, over all decompositions, of the similarity products of the subparts of G and \(G'\) [52]. Recently, Kondor and Pan [38] proposed the multiscale Laplacian graph kernel, which has the property of lifting a base kernel defined on the vertices of two graphs to a kernel between graphs. The third class of methods, termed substructure kernels, is based on the analysis of the common substructures that belong to both graphs. This family includes graph kernels that consider random walks [27, 79], backtrackless walks [5], shortest paths [8], subtrees [68] or graphlets [70] as the substructure. Different from the above three categories, Shervashidze et al. [69] proposed a family of efficient graph kernels based on the Weisfeiler–Lehman test of graph isomorphism, which maps the original graph to a sequence of graphs. More recently, inspired by the success of deep learning, Yanardag and Vishwanathan [83] presented a unified framework to learn latent representations of substructures for graphs.
They claimed that, given a precomputed graph kernel, their technique produces an improved representation that leverages hidden representations of substructures.
2.2 Hierarchical graph representation
In general, hierarchical models have been successfully employed in many different domains within computer vision and image processing, such as image segmentation [22, 48], scene categorization [23], action recognition [54], shape classification [18], graphics recognition [10] and 3D object recognition [76]. These approaches usually exploit some kind of pyramidal structure containing information at various resolutions. Usually, at the finest level of the pyramid, the captured information is related to local features, whereas at coarser levels, global aspects of the underlying data are represented. This kind of representation helps to interpret knowledge in a naturalistic way [33].
Inspired by the above intuition, hierarchical structures are often employed to extract coarse-to-fine information from a graph representation. Pelillo et al. [57] proposed to match two hierarchical structures by casting the problem as clique detection on their association graph, which was solved with a dynamic programming approach. In [71], Shokoufandeh et al. presented a framework based on spectral characterization for indexing hierarchical structures that embed the topological information of a directed acyclic graph. A hierarchical representation of objects and an elastic matching procedure have also been proposed for deformable shape matching in [24]. In [46], Liu et al. utilized a hierarchical graph representation and a stochastic sampling strategy for layered shape matching and registration. A graph kernel based on hierarchical bags of paths, where each path is associated with a hierarchy encoding successive simplifications, is presented in [18]. Ahuja and Todorovic [2] used a hierarchical graph of segmented regions for object recognition. Motivated by them, Broelemann et al. [10, 11] proposed two closely related approaches based on hierarchical graphs for error-tolerant matching of graphical symbols. Mousavi et al. [50] proposed a graph embedding strategy based on a hierarchical graph representation, which considers different levels of a graph pyramid. They claimed that the proposed framework is generic enough to incorporate any kind of graph embedding technique. However, the authors did not take advantage of the complex and rich encoding of the hierarchy.
From the literature review, we can conclude that although some works in the graph domain exploit the hierarchical graph structure, most of them focus on some kind of error tolerance or elastic matching. The use of this type of multiscale graph representation for vector space embedding is quite rare and has not been properly explored yet. This has motivated us to work on a hierarchical graph structure for the explicit graph embedding task.
3 Definitions and notations
In this section, we introduce some definitions and notations, which are relevant to the proposed work.
Definition 1
(Attributed Graph) An attributed graph is a 4-tuple \(G=(V,E,L_V,L_E)\) comprising a set V of vertices together with a set \(E\subseteq V\times V\) of edges and two mappings \(L_V:V\rightarrow {\mathbb {R}}^m\) and \(L_E:E\rightarrow {\mathbb {R}}^n\), which respectively assign attributes to the nodes and edges.
Attributed graphs have been widely used for all sorts of real-world problems. The most common methodologies are error-tolerant graph matching [51, 67], graph kernels and embedding techniques [41].
Definition 2
(Subgraph) Given an attributed graph \(G=(V,E,L_V,L_E)\), another attributed graph \(G'=(V',E',L_V',L_E')\) is said to be a subgraph of G and is denoted by \(G'\subseteq G\) iff,

\(V'\subseteq V\)

\(E'=E\cap (V'\times V')\)

\(L_V'(u)=L_V(u)\), \(\forall u \in V'\)

\(L_E'(e)=L_E(e)\), \(\forall e \in E'\)
A graphlet g of G is simply a subgraph, which inherits the topology and the attributes of G. In the literature, subgraphs are often used for error-tolerant matching [7, 19, 66, 73, 75] and frequent pattern discovery problems [2, 6, 42].
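Definitions 1 and 2 translate directly into a small data structure. The sketch below assumes undirected edges stored as `frozenset({u, v})` and attribute vectors stored as tuples; the class and function names are illustrative, not taken from the original work. It builds the subgraph of Definition 2 (keeping every edge of G whose endpoints both lie in \(V'\)) and copies the attributes of the retained nodes and edges.

```python
from dataclasses import dataclass, field

@dataclass
class AttributedGraph:
    """G = (V, E, L_V, L_E) as in Definition 1.

    Assumption: edges are undirected and stored as frozenset({u, v}).
    """
    V: set
    E: set
    L_V: dict = field(default_factory=dict)  # node -> attribute tuple in R^m
    L_E: dict = field(default_factory=dict)  # edge -> attribute tuple in R^n

def induced_subgraph(G, nodes):
    """Subgraph of Definition 2: V' = nodes & V, E' = edges of G inside V', inherited labels."""
    V_sub = set(nodes) & G.V
    E_sub = {e for e in G.E if e <= V_sub}  # keep edges with both endpoints in V'
    L_V_sub = {u: G.L_V[u] for u in V_sub if u in G.L_V}
    L_E_sub = {e: G.L_E[e] for e in E_sub if e in G.L_E}
    return AttributedGraph(V_sub, E_sub, L_V_sub, L_E_sub)

# A triangle {0, 1, 2} with a pendant node 3 attached to node 2.
edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 0), (2, 3)]}
G = AttributedGraph(
    V={0, 1, 2, 3},
    E=edges,
    L_V={u: (float(u),) for u in range(4)},
    L_E={e: (1.0,) for e in edges},
)
g = induced_subgraph(G, {0, 1, 2})
print(len(g.E))  # 3: the triangle, with attributes inherited from G
```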
Definition 3
(Hierarchical graph) A hierarchical graph H is defined as a 6-tuple \(H=(V,E_N,E_H,L_V,L_{E_N},L_{E_H})\), where V is the set of nodes; \(E_N \subseteq V\times V\) are the neighborhood edges; \(E_H \subseteq V\times V\) are the hierarchical edges; and \(L_V\), \(L_{E_N}\) and \(L_{E_H}\) are three labeling functions defined as \(L_V:V \rightarrow \Sigma _V \times A^k_V\), \(L_{E_N}: E_N \rightarrow \Sigma _{E_N} \times A^l_{E_N}\) and \(L_{E_H}: E_H \rightarrow \Sigma _{E_H} \times A^m_{E_H}\), where \(\Sigma _V\), \(\Sigma _{E_N}\) and \(\Sigma _{E_H}\) are three sets of symbolic labels for vertices and edges, \(A_V\), \(A_{E_N}\) and \(A_{E_H}\) are three sets of attributes for vertices and edges, respectively, and \(k,l,m\in {\mathbb {N}}\).
Prior works used hierarchical structures to allow a reasonable tolerance in the representation paradigm [11, 18, 24] and also to bring robustness to the feature representation [46].
4 Hierarchical embedding
In the literature, only a few embedding approaches exploit the idea of multiscale or abstraction information [38]. The works that have been proposed to exploit such multiscale information [20, 50, 59] discard the information provided by the hierarchical edges and focus only on abstractions of the original graph. This section is devoted to a framework able to include this information given any graph embedding.
4.1 Graph clustering
Graph clustering has been widely used in several fields, such as social and biological networks [31] and recommendation systems [28, 44]. It can be roughly described as the task of grouping graph nodes into clusters depending on the graph structure. Ideally, the grouping should be performed in such a way that intra-cluster nodes are densely connected whereas the connections between nodes of different clusters are sparse. For example, Girvan and Newman [31] proposed a graph clustering algorithm to detect community structure in social and biological networks. Li et al. [28, 40, 44, 45] have proposed several graph clustering techniques for recommendation systems based on different strategies: context awareness [28], inclusion of the frequency property [44], distributed clustering confidence [40], etc. We do not further review graph clustering algorithms here, since they are not within the main scope of this paper. However, we would like to remark that one of the most important aspects of graph clustering is the evaluation of cluster quality, which is crucial not only to measure the effectiveness of clustering algorithms, but also to give insights into the dynamics of relationships in a given graph. For a detailed overview of effective graph clustering metrics, interested readers are referred to [3].
Even though any graph clustering algorithm can be used, we use the standard divisive Girvan–Newman algorithm [31] for our purpose, because it provides structurally meaningful clusters of a given graph. The Girvan–Newman algorithm is an intuitive and well-known algorithm for community detection in complex systems. It is a global divisive algorithm that iteratively removes edges until all of them are deleted. At each iteration, new clusters can emerge as connected components. The idea is that edges with higher centrality are the most likely to connect two clusters. Therefore, the betweenness centrality of the edges [26] is used to decide which edge to remove. The betweenness centrality of an edge \(e \in E\) is defined as the number of shortest paths between pairs of nodes that pass through e (when a pair is joined by several shortest paths, each path contributes the fraction it represents). The output of this algorithm is a dendrogram encoding a hierarchical clustering of the nodes. The algorithm consists of four steps:

1. Calculate the betweenness centrality for all edges in the network.
2. Remove the edge with the highest betweenness and generate a cluster for each connected component.
3. Recalculate the betweenness for all edges affected by the removal.
4. Repeat from step 2 until no edges remain.
In this work, the Girvan–Newman algorithm is stopped early given a reduction ratio \(r \in {\mathbb {R}}\): the number of clusters is forced to be \(\lfloor r \cdot |V| \rfloor\).
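The four steps and the early stop at \(\lfloor r \cdot |V| \rfloor\) clusters can be sketched in pure Python as follows. Assumptions: the graph is unweighted and undirected, edge betweenness is computed with Brandes' algorithm, step 3 simply recomputes the betweenness of every remaining edge (simpler but slower than updating only the affected ones), and ties are broken arbitrarily. The function names are hypothetical; NetworkX also provides a ready-made implementation in `networkx.algorithms.community.girvan_newman`.

```python
import math
from collections import deque

def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    score = {}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist = {v: -1 for v in adj}
        sigma = {v: 0 for v in adj}
        preds = {v: [] for v in adj}
        dist[s], sigma[s] = 0, 1
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path fractions from the farthest nodes towards s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((v, w))
                score[edge] = score.get(edge, 0.0) + share
                delta[v] += share
    # Every unordered pair was counted once from each endpoint.
    return {e: b / 2.0 for e, b in score.items()}

def connected_components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        comp, queue = {s}, deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps

def girvan_newman_early_stop(nodes, edges, r):
    """Remove highest-betweenness edges until floor(r * |V|) clusters exist."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    target = max(1, math.floor(r * len(adj)))
    clusters = connected_components(adj)
    while len(clusters) < target:
        scores = edge_betweenness(adj)  # steps 1 and 3 (full recomputation)
        if not scores:
            break  # no edges left (step 4)
        u, v = tuple(max(scores, key=scores.get))  # step 2
        adj[u].discard(v)
        adj[v].discard(u)
        clusters = connected_components(adj)
    return clusters

# Two 4-cliques joined by the bridge (3, 4); r = 0.25 asks for 2 clusters.
clique = [(a, b) for a in range(4) for b in range(a + 1, 4)]
edges = clique + [(a + 4, b + 4) for a, b in clique] + [(3, 4)]
print(sorted(sorted(c) for c in girvan_newman_early_stop(range(8), edges, 0.25)))
# [[0, 1, 2, 3], [4, 5, 6, 7]]
```

The bridge carries all 16 shortest paths between the two cliques, so it is the first edge removed, which is exactly the intuition behind using betweenness to separate clusters.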
4.2 Hierarchical construction
Given a graph G and a clustering \(C = \{C_1,\ldots ,C_k\}\), each cluster is summarized into a new node with a representative label (see line 5). This label can be defined, for instance, as the result of an embedding function applied to the subgraph defined by the clustered nodes and their edges. Moreover, edges between the new nodes are created depending on a connection ratio between clusters: an edge is only created if there are enough connections between the sets of nodes defined by both clusters (see line 7). Finally, hierarchical edges are created connecting each new node \(v_{C_i}\) with all the nodes belonging to the summarized cluster \(C_i\) (see line 12). The proposed hierarchical construction is similar to the one proposed by Mousavi et al. [50], but it explicitly includes the summarization generated by the clustering algorithm by means of the hierarchical edges. Thus, the proposed hierarchical construction obtains a representation that encodes abstract information by means of the clusters while keeping the relation with the original graph.
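One summarization step can be sketched as below, under explicit assumptions not fixed by the text: node attributes are scalars and the representative label of \(v_{C_i}\) is their mean (a simple stand-in for the embedding-based label mentioned above), and the connection ratio between \(C_i\) and \(C_j\) is taken to be the number of edges between them divided by \(|C_i|\,|C_j|\), compared with a hypothetical threshold `tau`. New nodes are named `(level, i)` so that identifiers stay unique across levels; `build_next_level` is an illustrative name.

```python
from collections import Counter
from itertools import combinations

def build_next_level(edges, labels, clusters, level, tau=0.1):
    """Summarize a clustered graph into the next hierarchy level (sketch).

    edges:    iterable of (u, v) neighborhood edges of the current level
    labels:   dict node -> scalar attribute (assumption: scalar labels)
    clusters: list of sets of nodes C_1..C_k partitioning the current level
    Returns the new nodes, their labels, the new neighborhood edges E_N and
    the hierarchical edges E_H linking each v_{C_i} to the members of C_i.
    """
    member_of = {u: i for i, c in enumerate(clusters) for u in c}
    new_nodes = [(level, i) for i in range(len(clusters))]
    # Representative label: mean of the member attributes (hypothetical choice).
    new_labels = {(level, i): sum(labels[u] for u in c) / len(c)
                  for i, c in enumerate(clusters)}
    # Count edges running between every pair of distinct clusters.
    between = Counter()
    for u, v in edges:
        i, j = sorted((member_of[u], member_of[v]))
        if i != j:
            between[(i, j)] += 1
    # Connect two summary nodes if the (assumed) connection ratio reaches tau.
    new_edges = []
    for i, j in combinations(range(len(clusters)), 2):
        ratio = between[(i, j)] / (len(clusters[i]) * len(clusters[j]))
        if ratio >= tau:
            new_edges.append(((level, i), (level, j)))
    hier_edges = [((level, i), u) for i, c in enumerate(clusters) for u in c]
    return new_nodes, new_labels, new_edges, hier_edges

# Two triangles joined by one edge, clustered into the two triangles.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
labels = {u: float(u) for u in range(6)}
nodes, lbl, e_n, e_h = build_next_level(edges, labels, [{0, 1, 2}, {3, 4, 5}], level=1)
print(e_n)  # [((1, 0), (1, 1))] since 1 / (3 * 3) >= 0.1
print(lbl)  # {(1, 0): 1.0, (1, 1): 4.0}
```

Repeating this step on the summarized graph, with the clusters of the next Girvan–Newman run, stacks further levels of \(H_G\).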
Let us introduce some notation that will be used in the following sections. Given a graph G and a number of levels L, \(H_G\) denotes the corresponding hierarchical graph computed from G with L levels. \(H_G^l\), where \(l \in \{0,\ldots ,L\}\), is the graph without hierarchical edges corresponding to the l-th level of summarization; therefore, \(H_G^0 = G\). Moreover, \(H_G^{l_1,l_2}\), where \(l_1,l_2 \in \{0,\ldots ,L\}\) and \(l_1\le l_2\), corresponds to the hierarchical graph comprised between levels \(l_1\) and \(l_2\). Hence, \(H_G = H_G^{0,L}\) and \(H_G^l = H_G^{l,l}\). Finally, \(H_G^{l_1} \cup H_G^{l_2}\) corresponds to the union of two graphs without hierarchical edges.
Figure 1a shows the construction of the hierarchy given a graph G. Each level shows an abstraction of the input graph where the nodes have been reduced.
4.3 Hierarchical embedding
This section introduces a novel way to encode the hierarchical information of a graph into an embedding. The proposed technique is generic in the sense that it can be used with any graph embedding function.
Given a graph G to be mapped into a vectorial space and an embedding function \(\varphi :{\mathbb {G}}\rightarrow {\mathbb {R}}^n\), we first obtain the hierarchical representation \(H_G\) following the methodology proposed in Sect. 4.2. \(H_G\) thus enriches the original graph with abstract information over L levels. Then, we make use of this hierarchical information to construct a hierarchical embedding. The general form of the proposed embedding takes into account graphs at multiple scales and hierarchical relations. Thus, the embedding function not only compactly encodes the contextual information of nodes at different abstraction levels, but also encodes the hierarchical contraction. The embedding function is defined as follows:
where
where \(K \le L\) are the hierarchical levels taken into account and \(k_1,k_2 \le K\) indicate the number of levels taken into account at the same time. Note that \(K=L\), \(k_1=K\) and \(k_2=K\) will take into account the whole hierarchy and possible combinations. From this general representation of the proposed embedding, we have evaluated some particular cases (the reader is referred to Sect. 7 for more details on the experimental evaluation).
Baseline embedding This embedding is the one used as a baseline. In this scenario \(K=0\), \(k_1=0\) and \(k_2=0\), therefore \(\Phi (H_G) = \varphi (H_G^0)\). No abstract information is taken into consideration, hence, \(\Phi (H_G) = \varphi (G)\).
Pyramidal embedding This embedding has been previously proposed in the literature [20, 50]. It combines information from the abstract levels of the graph, i.e., \(H_G^i\), without taking hierarchical information into account. Therefore, the hierarchical edges are discarded and no relation between levels is considered: \(K\ge 1\), \(k_1=0\) and \(k_2=0\). We define \(\Phi _{\text {pyr}}(H_G) = [\varphi (H_G^0),\ldots ,\varphi (H_G^K)]\). Note that each element corresponds to an independent level of the hierarchy without hierarchical edges.
Generalized pyramidal embedding Following the previous idea, the information of the abstract levels of the graph, i.e., \(H_G^i\), is combined. Now, relations between levels are taken into account by embedding unions of levels, i.e., \(H_G^{i_1} \cup H_G^{i_2}\), while hierarchical edges are still discarded (no clustering information is taken into account). In this scenario \(K\ge 1\), \(k_1=0\) and \(k_2\ge 1\); therefore, we define \(\Phi _{\text {gen}\_\text {pyr}}(H_G) = [\varphi (H_G^0),\ldots ,\varphi (H_G^K),\varphi (H_G^0 \cup H_G^1),\ldots ,\varphi (H_G^{K-1} \cup H_G^K), \ldots , \varphi (H_G^0 \cup \cdots \cup H_G^{k_2}),\ldots ,\varphi (H_G^{K-k_2} \cup \cdots \cup H_G^K)]\).
Hierarchical embedding This embedding is computed by mixing different levels, considering them as a single graph through the hierarchical edges: \(K \ge 1\), \(k_1 \ge 1\) and \(k_2=0\). The idea is to create an embedding able to encode both graph and clustering information. Depending on the embedding, hierarchical edges can carry a special label so that they are treated differently. The hierarchical embedding is defined as \(\Phi _{\text {hier}}(H_G) = [\varphi (H_G^0),\ldots ,\varphi (H_G^K),\varphi (H_G^{0,1}),\ldots ,\varphi (H_G^{K-1,K}) ,\ldots , \varphi (H_G^{0,k_1}), \ldots ,\varphi (H_G^{K-k_1,K})]\). Note that each element corresponds to the sub-hierarchy comprised between the specified levels.
Exhaustive embedding Finally, in order to take the whole hierarchy into consideration, we can make use of the complete embedding \(\Phi\) as defined in Eq. (1), where \(K \ge 1\) and \(k_1, k_2 \ge 1\).
Figure 1b shows the graphs taken into consideration when the hierarchical embeddings are computed.
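The pyramidal, generalized pyramidal and hierarchical cases can be sketched as follows, assuming the hierarchy is stored as per-level `(nodes, edges)` pairs plus, for each l, the list of hierarchical edges between levels l+1 and l (node identifiers unique across levels). The base embedding `phi` is a hypothetical placeholder returning \([|V|, |E|]\); in this work it would be SGE (Sect. 5). Each function returns the concatenation of the \(\varphi\) vectors in the order of the definitions above.

```python
def phi(nodes, edges):
    """Placeholder base embedding (hypothetical): [|V|, |E|]."""
    return [len(nodes), len(edges)]

def union_of_levels(levels, first, last):
    """H^first U ... U H^last, without hierarchical edges."""
    nodes = [v for l in range(first, last + 1) for v in levels[l][0]]
    edges = [e for l in range(first, last + 1) for e in levels[l][1]]
    return nodes, edges

def sub_hierarchy(levels, hier, first, last):
    """H^{first,last}: levels first..last plus the hierarchical edges between them."""
    nodes, edges = union_of_levels(levels, first, last)
    edges = edges + [e for l in range(first, last) for e in hier[l]]
    return nodes, edges

def pyramidal(levels, K):
    return [x for i in range(K + 1) for x in phi(*levels[i])]

def generalized_pyramidal(levels, K, k2):
    out = pyramidal(levels, K)
    for span in range(1, k2 + 1):  # unions of span + 1 consecutive levels
        for i in range(K - span + 1):
            out += phi(*union_of_levels(levels, i, i + span))
    return out

def hierarchical(levels, hier, K, k1):
    out = pyramidal(levels, K)
    for span in range(1, k1 + 1):  # sub-hierarchies H^{i, i + span}
        for i in range(K - span + 1):
            out += phi(*sub_hierarchy(levels, hier, i, i + span))
    return out

# A 3-level toy hierarchy: a 4-node path, 2 cluster nodes, 1 root node.
levels = [
    ([(0, i) for i in range(4)], [((0, 0), (0, 1)), ((0, 1), (0, 2)), ((0, 2), (0, 3))]),
    ([(1, 0), (1, 1)], [((1, 0), (1, 1))]),
    ([(2, 0)], []),
]
hier = [
    [((1, 0), (0, 0)), ((1, 0), (0, 1)), ((1, 1), (0, 2)), ((1, 1), (0, 3))],
    [((2, 0), (1, 0)), ((2, 0), (1, 1))],
]
print(pyramidal(levels, K=2))                       # [4, 3, 2, 1, 1, 0]
print(generalized_pyramidal(levels, K=2, k2=2)[6:])  # [6, 4, 3, 1, 7, 4]
print(hierarchical(levels, hier, K=2, k1=1)[6:])     # [6, 8, 3, 3]
```

Note how \(\varphi (H_G^{0,1})\) sees 8 edges (3 + 1 level edges plus 4 hierarchical edges) whereas \(\varphi (H_G^0 \cup H_G^1)\) sees only 4: this difference is precisely the clustering information that the hierarchical variant adds.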
5 Stochastic graphlet embedding
The Stochastic Graphlet Embedding (SGE) can be defined as a function \(\varphi :{\mathbb {G}} \rightarrow {\mathbb {R}}^n\) that explicitly embeds a graph \(G\in {\mathbb {G}}\) into a high-dimensional vector space \({\mathbb {R}}^n\) [21]. The entire procedure of SGE can be described in two stages (see Fig. 2): in the first stage, the method samples graphlets from G in a stochastic manner, and in the second stage, it counts the frequency of each isomorphic graphlet among the extracted ones in an approximate but nearly accurate manner. The entire procedure yields a precise distribution of connected graphlets with an increasing number of edges in G, with a controlled complexity, which captures the relations among the entities represented as nodes and their complex interactions.
5.1 Stochastic graphlets sampling
Considering a graph \(G=(V,E,L_V,L_E)\), the goal of the graphlet extraction procedure is to obtain statistics of stochastic graphlets with an increasing number of edges in G. The graphlets are extracted stochastically: the procedure uniformly samples graphlets with an unboundedly increasing number of edges, without constraining their topology or structural properties such as maximum degree, maximum number of nodes, etc. Our graphlet sampling procedure, outlined in Algorithm 2, is recurrent, and the number of recurrences is controlled by a parameter M that indicates the number of distinct graphlets to be sampled (see line 2 of Algorithm 2). Each of these M recurrent processes is in turn regulated by another parameter T that denotes the maximum number of iterations a single recurrent process can perform (see line 5). Since each of these iterations adds an edge to the graphlet under construction, T indirectly specifies the maximum number of distinct edges each graphlet can contain. Denoting by \(U_t\) and \(A_t\), respectively, the aggregated sets of visited nodes and edges up to iteration t, they are initialized at the beginning of each recurrent step as \(A_0=\emptyset\) and \(U_0=\lbrace u \rbrace\), with a node u uniformly sampled from V (see line 4). Thereafter, at the t-th iteration (with \(t\ge 1\)), the sampling procedure randomly selects an edge \((u,v)\in E \backslash A_{t-1}\) incident to some node \(u\in U_{t-1}\) (see line 7). Accordingly, the process updates \(U_t \leftarrow U_{t-1} \cup \lbrace v \rbrace\) and \(A_{t} \leftarrow A_{t-1} \cup \lbrace (u,v) \rbrace\) (see line 8). These steps are repeated T times within a recurrent step to sample a graphlet with at most T edges. M is set to relatively large values in order to make the graphlet generation statistically meaningful.
Theoretically, the values of M are guided by the theorem of sample complexity [81], which is widely studied and used in the bioinformatics domain [58, 70]; however, its discussion and proof are out of the scope of the current paper. Intuitively, the graphlet sampling procedure explained in this section follows a random walk process with restart that efficiently parses G and extracts the desired number of connected graphlets with an increasing number of edges. This algorithm samples connected graphlets from a given graph while avoiding their expensive exact extraction. The underlying hypothesis is that if a sufficient number of graphlets are sampled, the empirical distribution will be close to the actual distribution of graphlets in the graph. Furthermore, note that from the above process one can extract, in total, \(M \times T\) graphlets, each with a number of edges varying from 1 to T.
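The sampling loop described above can be sketched as follows. This is an illustrative reading of Algorithm 2 under stated assumptions, not the authors' code: the graph is an undirected edge list with mutually comparable node labels (edges are normalized as sorted tuples), the start node is drawn uniformly among nodes with at least one edge (isolated nodes cannot yield graphlets), the next edge is drawn uniformly among the unvisited edges incident to \(U_{t-1}\), and the graphlet \(A_t\) is recorded after every iteration, which gives at most \(M \times T\) graphlets. `sample_graphlets` is a hypothetical name.

```python
import random

def sample_graphlets(edges, M, T, seed=None):
    """Random walks with restart that grow connected graphlets edge by edge."""
    rng = random.Random(seed)
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = sorted(adj)  # sorted for reproducibility with a fixed seed
    graphlets = []
    for _ in range(M):
        start = rng.choice(nodes)          # U_0 = {u}
        visited, chosen = {start}, set()   # U_t and A_t
        for _ in range(T):
            # Candidate edges: incident to a visited node and not yet in A_{t-1}.
            frontier = sorted({tuple(sorted((u, v))) for u in visited for v in adj[u]} - chosen)
            if not frontier:
                break  # the whole connected component is already covered
            edge = rng.choice(frontier)
            chosen.add(edge)        # A_t = A_{t-1} plus the new edge
            visited.update(edge)    # U_t = U_{t-1} plus its endpoints
            graphlets.append(frozenset(chosen))
    return graphlets

cycle_with_chord = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
gs = sample_graphlets(cycle_with_chord, M=4, T=3, seed=7)
print(len(gs), max(len(g) for g in gs))  # 12 3
```

Because every new edge touches an already visited node, each recorded graphlet is connected by construction, which is the property the hashing stage relies on.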
5.2 Hashed graphlets distribution
To obtain a distribution of the graphlets extracted from G, it is necessary to identify sets of isomorphic graphlets among the sampled ones and then count the cardinality of each isomorphic set. A trivial way of doing this involves checking graph isomorphism for all possible pairs of graphlets in order to detect the partitions that might exist among them. Nevertheless, graph isomorphism is a GI-complete problem [49] for general graphs, so this scheme is extremely costly, as the method samples a huge number of graphlets with many edges. An alternative, efficient and approximate way of partitioning isomorphic graphlets is graph hashing. A graph hash function can be defined as a mapping \(h:{\mathbb {G}} \rightarrow {\mathbb {R}}^m\) that maps a graph into a hash code (a sequence of real numbers) based on the local as well as holistic topological characteristics of the graph. An ideal graph hash function should map two isomorphic graphs to the same hash code, and two non-isomorphic graphs to two different hash codes. While it is easy to design hash functions satisfying the condition that two isomorphic graphs have the same hash code, it is extremely difficult to find a hash function that ensures different hash codes for every pair of non-isomorphic graphs. An alternative is to design graph hash functions with low collision probability, i.e., that map two non-isomorphic graphs to the same hash code only with a very low probability. For obtaining a distribution of graphlets, the main aim of graph hashing is to assign the graphlets extracted from G to the corresponding subsets of isomorphic graphlets (a.k.a. partition indices or histogram bins) in order to count them and quantify their distribution.
The proposed mechanism for obtaining the distribution of uniformly sampled graphlets, outlined in Algorithm 3, maintains a global hash table \({\mathbf {H}}\), each entry of which is the hash code of a graphlet g produced by the graph hash function. \({\mathbf {H}}\) grows incrementally as the algorithm encounters new graph hash codes and stores all the unique hash codes encountered by the system. Note that the position of each unique hash code is kept fixed, because each position corresponds to a partition index or histogram bin. To allocate a given graphlet g to its histogram bin, its hash code h(g) is mapped to the index of the entry of \({\mathbf {H}}\) that matches h(g) (see line 8). If h(g) does not yet exist in \({\mathbf {H}}\), it is considered a new hash code (and hence g a new graphlet) encountered by the system, and h(g) is appended to the end of \({\mathbf {H}}\) (see line 6).
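A minimal Python sketch of the bookkeeping described for Algorithm 3 follows. The list `table` plays the role of the global hash table \({\mathbf {H}}\) (positions never move, so each position is a histogram bin), and a dict is used as an auxiliary index to make lookups constant time; that index, the function names and the final normalisation step are our assumptions, and `hash_fn` stands for whichever graph hash function is used.

```python
def graphlet_histogram(graphlets, hash_fn, table):
    """Count graphlets per bin of the global hash table `table` (grown in place)."""
    index = {code: i for i, code in enumerate(table)}  # hash code -> fixed bin position
    counts = [0] * len(table)
    for g in graphlets:
        code = hash_fn(g)
        if code not in index:          # unseen code: new graphlet type, append a new bin
            index[code] = len(table)
            table.append(code)
            counts.append(0)
        counts[index[code]] += 1       # hit: increment the existing bin
    return counts


def to_distribution(counts, n_bins):
    """Zero-pad to the final table size and normalise the counts to sum to 1."""
    padded = counts + [0] * (n_bins - len(counts))
    total = sum(padded)
    return [c / total for c in padded] if total else padded
```

Because the table is shared across graphs, histograms computed earlier are shorter than later ones; padding them to `len(table)` once all graphs have been processed keeps every embedding aligned bin by bin.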
Designing hash functions that yield identical hash codes for two isomorphic graphlets is quite simple, whereas designing ones that yield distinct hash codes for any two non-isomorphic graphs is very challenging. The chance of mapping two non-isomorphic graphs to the same hash code is termed the probability of collision. Denoting by \(H_0\) the set of all pairs of non-isomorphic graphs, the probability of collision of a hash function f can be expressed as the following energy function:
\[ E(f) = P\big(f(g) = f(g') \;\big|\; (g, g') \in H_0\big) \qquad (4) \]
Hence, in terms of collision probability, hash functions that produce lower values of E(f) in Eq. (4) are considered more reliable for checking graph isomorphism. It has been shown that the sorted node degree has zero collision probability for all graphs with at most 4 edges [21]. Moreover, it is well known that two graphs with the same sorted betweenness centrality are isomorphic with high probability [15, 53]. For example, sorted betweenness centrality has collision probabilities of \(3.2\times 10^{-4}\), \(1.9\times 10^{-4}\) and \(1.1\times 10^{-4}\) for graphlets with 7, 8 and 9 edges, respectively. Interested readers can refer to [21] for further discussion and analysis of various graph hash functions and their probability of collision. Considering the above facts, in this work we use the sorted node degree for graphlets with \(t\le 4\) and the sorted betweenness centrality for graphlets with \(t\ge 5\).
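The hash selection rule above can be sketched as follows. This is an illustration, not the SGE reference implementation: graphlets are assumed to be lists of undirected edges, betweenness is computed with Brandes' algorithm (unweighted, unnormalised, each unordered pair counted once), and rounding the real-valued centralities to `decimals` places is our assumption, intended to keep floating-point noise from splitting isomorphic graphlets into different bins.

```python
from collections import deque


def sorted_degrees(edges):
    """Sorted node-degree sequence of a graphlet given as a list of edges."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return tuple(sorted(deg.values()))


def betweenness(edges):
    """Node betweenness centrality via Brandes' algorithm (unweighted, undirected)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search counting shortest paths from s.
        order, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


def graphlet_hash(edges, decimals=6):
    """Sorted degrees for t <= 4 edges, sorted betweenness for t >= 5."""
    if len(edges) <= 4:
        return ("deg",) + sorted_degrees(edges)
    values = sorted(round(b, decimals) for b in betweenness(edges).values())
    return ("bc",) + tuple(values)
```

A relabelled copy of the same graphlet yields the same code, while a triangle and a three-edge path (same edge count) yield different degree sequences.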
Note that the distribution of sampled graphlets obtained so far only considers the topological structure of a graph and ignores the node and edge attributes. However, the stochastic graphlet embedding can take a small set of node and edge attributes into account by creating attribute signatures and appending them to the hash code that encodes the topology of the graphlet. In this work, when needed, we first discretize the continuous attributes by combining a clustering algorithm such as k-means with a pooling technique. The sorted discrete node and edge labels are then used as attribute signatures and combined with the hash code.
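The attribute extension can be illustrated as follows. This is a hedged sketch: the nearest-centroid rule assumes that centroids for a scalar continuous attribute were obtained beforehand (for instance with k-means), the pooling step mentioned above is not shown, and the separator tokens `"|N"` and `"|E"` are hypothetical choices that keep the topology code and the two label signatures from running into each other.

```python
def discretize(value, centroids):
    """Map a scalar continuous attribute to the index of its nearest centroid."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


def attributed_hash(topology_code, node_labels, edge_labels):
    """Append sorted node- and edge-label signatures to a topological hash code.

    Sorting makes the signature independent of the order in which the
    graphlet's nodes and edges were visited during sampling.
    """
    return (tuple(topology_code)
            + ("|N",) + tuple(sorted(node_labels))
            + ("|E",) + tuple(sorted(edge_labels)))
```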
5.3 Hierarchical stochastic graphlet embedding
In this work, we propose to combine the properties of the proposed Stochastic Graphlet Embedding with the Hierarchical Embedding introduced in the previous section.
On the one hand, SGE provides statistical information about local structures with a varying number of edges. It therefore gives fine-grained insight into the graph, but on its own it cannot deal with very noisy data. On the other hand, the abstractions provided by the graph hierarchy increase the receptive field of each graphlet, moving to coarser information that provides insight into the global structure of the graph. Moreover, using the hierarchical edges during the computation makes it possible to combine information across levels, i.e., to combine different levels of detail (see Eq. (1)). From now on, we will denote this embedding as Hierarchical Stochastic Graphlet Embedding (HSGE).
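To make the receptive-field argument concrete, the following sketch contracts a clustered graph into its upper level and concatenates one embedding per level. It rests on assumptions that should be checked against Sect. 4: an upper-level edge is placed between two clusters whenever some base edge crosses them, and the pyramidal configuration is read as a plain concatenation of per-level embeddings. `embed` is any hypothetical function mapping an edge list to a vector (e.g. an SGE histogram); the hierarchical edges between levels, used by the other configurations, are omitted.

```python
def contract(edges, cluster_of):
    """Upper-level edge list: clusters become nodes, crossing edges become edges."""
    upper = set()
    for u, v in edges:
        a, b = cluster_of[u], cluster_of[v]
        if a != b:  # edges inside a cluster vanish at the upper level
            upper.add((min(a, b), max(a, b)))
    return sorted(upper)


def pyramidal_embedding(levels, embed):
    """Concatenate the embeddings of every level, from base graph to coarsest."""
    vector = []
    for level_edges in levels:
        vector.extend(embed(level_edges))
    return vector
```

For the path 0-1-2-3 with clusters {0, 1} and {2, 3}, the upper level is the single edge (0, 1), so a one-edge graphlet there already summarises all four base nodes.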
6 Computational complexity
This section is devoted to studying the computational complexity of the proposed approach given a graph \(G=(V,E,L_V,L_E)\) where \(|V|=n\) and \(|E|=m\).
6.1 Hierarchical embedding complexity
Graph clustering algorithms usually have a high computational complexity. As stated in Sect. 4.3, the Girvan–Newman algorithm has been chosen as the graph clustering technique. The Girvan–Newman algorithm is based on the edge betweenness centrality, which can be computed in \({\mathcal {O}}(n \cdot m)\) for unweighted graphs and \({\mathcal {O}}(n \cdot m + n\cdot (n+m) \log (n))\) for weighted graphs. Since the Girvan–Newman algorithm removes the edges one at a time and recomputes the betweenness after each removal, it runs in \({\mathcal {O}}(n \cdot m^2)\) for unweighted graphs and \({\mathcal {O}}(n \cdot m^2 + n\cdot m \cdot (n+m) \log (n))\) for weighted graphs.
Assuming an embedding function \(\varphi\) with a complexity of \({\mathcal {O}}(N)\), a hierarchical graph construction with a complexity of \(C_1\) and L levels, the pyramidal configuration has a complexity of \({\mathcal {O}}(C_1 + L\cdot N)\), whereas the hierarchical and exhaustive configurations have a complexity of \({\mathcal {O}}(C_1 + L^2\cdot N)\).
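As a worked reading of the bounds above (no new result), the unweighted Girvan–Newman cost can be plugged into \(C_1\): recomputing the edge betweenness after each of the m edge removals gives the construction cost, and hence the totals per configuration:

\[
C_1 = \underbrace{m}_{\text{removals}} \cdot \underbrace{{\mathcal {O}}(n\cdot m)}_{\text{edge betweenness}} = {\mathcal {O}}(n\cdot m^2),
\qquad
\text{pyramid: } {\mathcal {O}}(n\cdot m^2 + L\cdot N),
\qquad
\text{hierarchy, exhaustive: } {\mathcal {O}}(n\cdot m^2 + L^2\cdot N).
\]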
6.2 Stochastic graphlet embedding complexity
The computational complexity of Algorithm 2 is \({\mathcal {O}}(M \cdot T)\), where M is the number of graphlets to be sampled and T is the maximum size of the graphlets in terms of the number of edges. Assuming a hash function with a complexity of \({\mathcal {O}}(C_2)\), Algorithm 3 has a time complexity of \({\mathcal {O}}(M \cdot T \cdot C_2)\) for computing the stochastic graphlet embedding. Note that “degree of nodes” and “betweenness centrality” have time complexities of \({\mathcal {O}}(n)\) and \({\mathcal {O}}(n \cdot m)\), respectively, where n and m here denote the numbers of nodes and edges of the graphlet being hashed. From the above explanation, it is clear that the complexity of these two algorithms does not depend on the size of the input graph G, but only on the parameters M, T and the hash functions used.
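Because the hash functions are applied to graphlets rather than to G, and a connected graphlet with at most T edges has at most T + 1 nodes, \(C_2\) can be bounded in terms of T alone (a worked bound under this reading, not a result stated above):

\[
C_2^{\text{degree}} = {\mathcal {O}}(T+1), \qquad C_2^{\text{betweenness}} = {\mathcal {O}}\big((T+1)\cdot T\big) \;\;\Longrightarrow\;\; {\mathcal {O}}(M\cdot T\cdot C_2) \subseteq {\mathcal {O}}(M\cdot T^{3}).
\]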
7 Experimental validation
This section presents the experimental results obtained by our proposed Hierarchical Stochastic Graphlet Embedding method. The main aim of this experimental study is to validate the proposed graph embedding technique on the graph classification task, which demands a robust technique for mapping a graph into a vector space. For experimentation, we have considered many widely used graph datasets with varied characteristics. All these graphs come from real data generated in the fields of Biology, Chemistry, Graphics and Handwriting recognition. The MATLAB code of our experiment is available at https://github.com/priba/hierarchicalSGE.
7.1 Experiments on molecular graph datasets
The first set of experiments is conducted on various benchmarks of molecular graphs. Below, we provide a brief description of them followed by the experimental setup, results and discussions.
7.1.1 Dataset description
Several bioinformatics datasets have been used: MUTAG, PTC, PROTEINS, NCI1, NCI109, D&D and MAO. These datasets have been widely used as benchmarks in the literature. The MUTAG dataset contains graph representations of 188 aromatic and heteroaromatic nitro compounds labeled according to their mutagenic effect, where nodes can have 7 discrete labels. The PTC (Predictive Toxicology Challenge) dataset consists of 344 chemical compounds known to cause or not cause cancer in rats and mice. It has 19 discrete node labels. In the PROTEINS dataset, nodes represent secondary structure elements (SSEs) and edges connect SSEs that are neighbors in the amino-acid sequence or in 3D space. It has 3 discrete labels, viz. helix, sheet or turn. The NCI1 and NCI109 datasets come from the National Cancer Institute (NCI) and are two balanced subsets of chemical compounds screened for their ability to suppress or inhibit the growth of a panel of human tumor cell lines, having 37 and 38 discrete node labels, respectively. The D&D dataset consists of enzyme and non-enzyme protein structures, whose nodes are amino acids. The MAO database, taken from the GREYC Chemistry graph dataset collection, is composed of 68 graphs representing molecules that either do or do not inhibit monoamine oxidase (MAO inhibitors are a class of antidepressant drugs). Further details on these bioinformatics datasets are provided in Table 1.
7.1.2 Experimental setup
We have performed two different experiments: the first one does not use the attribute information encoded in the nodes and edges of the graphs, whereas the second one does use the available node and edge features. To evaluate the proposed embedding technique, we have used a C-SVM solver [14] as a classifier. Since the datasets considered in this set of experiments do not contain predefined train and test sets, we have used a 10-fold cross-validation scheme and report the mean accuracies in Tables 2 and 3 for the unlabeled and labeled datasets, respectively. We follow a classical graph classification pipeline: in the first stage, the graph embedding is computed by our proposed scheme, and in the second stage, the embedded graphs are classified using a previously trained classifier.
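The evaluation protocol can be sketched as follows. This is not the authors' MATLAB pipeline: it is a minimal pure-Python illustration assuming a random, non-stratified split into k folds (which requires at least k samples), and the nearest-centroid classifier is a hypothetical stand-in for the C-SVM of [14], used only so that the example runs without external libraries.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    if n < k:
        raise ValueError("need at least k samples")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Mean test accuracy over k folds; each fold is held out exactly once."""
    accuracies = []
    for test in k_fold_indices(len(X), k, seed):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


def fit_nearest_centroid(X, y):
    """Stand-in classifier: one mean embedding per class."""
    sums, counts = {}, {}
    for x, c in zip(X, y):
        s = sums.setdefault(c, [0.0] * len(x))
        for j, v in enumerate(x):
            s[j] += v
        counts[c] = counts.get(c, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_nearest_centroid(model, X):
    """Assign each embedding to the class with the closest centroid."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(model, key=lambda c: sq_dist(x, model[c])) for x in X]
```

Swapping in an SVM only requires passing different `fit` and `predict` callables; the fold logic is unchanged.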
7.1.3 Results and discussion
In Table 2, we present the experimental results obtained by our proposed hierarchical embedding techniques together with other existing works on the unlabeled datasets. The three configurations of our hierarchical embedding mentioned previously are denoted as pyramidal, hierarchical and exhaustive, respectively. For unlabeled datasets, we have considered 10 different state-of-the-art methods: (1) random walk kernel (RW) [27], (2) shortest path kernel (SP) [8], (3) graphlet kernel (GK) [70], (4) Weisfeiler-Lehman kernel (WL) [69], (5) deep graph kernel (DGK) [83], (6) multiscale Laplacian graph kernel (MLK) [38], (7) diffusion CNNs (DCNN) [4], (8) strong graph spectrums (SGS) [37], (9) family of graph spectral distances (F_GSD) [78], and (10) stochastic graphlet embedding (SGE) [21].
From the quantitative results shown in Table 2, it can be observed that for most datasets the highest accuracy is achieved by one of our hierarchical configurations, which sets new state-of-the-art results on those datasets. In particular, the best accuracies are obtained by either the pyramidal or the exhaustive configuration, which indicates the importance of considering hierarchical information for the graph embedding problem. As expected, the proposed hierarchical embeddings achieve better performance than SGE, which is the baseline of our proposal. It should be observed that with this experimental setting, the hierarchical configuration performs quite poorly compared to the other two configurations. This might suggest that the hierarchical edges together with the connecting levels alone do not contain sufficient information for a robust graph representation. The information captured in the multiscale graphs appears to play a vital role in graph embedding, as supported by the excellent performance of the pyramidal and exhaustive configurations.
In Table 3, we present the results obtained by the three configurations of our proposed hierarchical embedding on the labeled graph datasets. For comparison with other state-of-the-art methods, we have considered two additional techniques: (1) PATCHY-SAN (PSCN) [55] and (2) graphlet spectrum (GS) [39]. Some of the previously considered state-of-the-art techniques do not work with labeled graphs, so they have not been evaluated in this experiment.
The results presented in Table 3 show that, except on the MUTAG dataset, our proposed hierarchical embedding techniques achieve the best performance on all the other datasets. This indicates the usefulness of considering hierarchical information when embedding graphs into a vector space. Contrary to the previous experiments on unlabeled datasets, in this case the hierarchical configuration performs comparatively better. This suggests that on labeled graphs, the hierarchical edges together with the connecting levels might provide important structural information. It is also worth noting that the level information performs consistently on all the datasets.
7.2 Experiments on AIDS, GREC, COIL-DEL and HistoGraph datasets
While the datasets considered in the previous set of experiments were mostly molecular in nature, the experiments discussed in this section consider graphs from various fields, such as Biology, Computer Vision, Graphics Recognition and Handwriting Recognition. Below, we give a brief description of the datasets considered, followed by the experimental setup, results and discussion.
7.2.1 Dataset description
In this experiment, we consider four different datasets, three of which, viz. AIDS, GREC and COIL-DEL, are taken from the IAM graph database repository^{Footnote 1} [60]. The AIDS database consists of 2000 graphs representing molecular compounds, constructed from the AIDS Antiviral Screen Database of Active Compounds.^{Footnote 2} This dataset consists of two classes, viz., active (400 elements) and inactive (1600 elements), which represent molecules with and without possible activity against HIV, respectively. The GREC dataset consists of 1100 graphs representing 22 different classes (characterizing architectural and electronic symbols) with 50 instances per class; these instances have different noise levels. The COIL-DEL database includes 3900 graphs belonging to 100 different classes with 39 instances per class; each instance has a different rotation angle. The HistoGraph dataset^{Footnote 3} [74] consists of graphs representing words from the letters written by the first US president, George Washington. It consists of 293 graphs generated from 30 distinct words; given a word graph, the task of the classifier is to predict which of the 30 words it represents. Nodes are only labeled with their position in the image. Furthermore, this dataset uses six different graph representation paradigms for converting a single word into a graph, which results in six different subsets of graphs. The entire dataset is divided into 90, 60 and 143 graphs for training, validation and testing, respectively. See Table 4 for the relevant statistics on these four datasets.
7.2.2 Experimental setup
In this case as well, we have employed a C-SVM solver [14] as a classifier. Since the datasets used in this set of experiments contain well-defined train and test sets, we report the accuracies obtained on the test set of each dataset in Table 5.
7.2.3 Results and discussion
Similar to the results obtained in the previous section, in this set of experiments our proposed hierarchical embeddings also achieve the best results on most datasets. Here, the leading scores are mostly obtained by the exhaustive configuration, which indicates the effectiveness of combining multiscale structural information with the hierarchical connections. On the datasets where our hierarchical embedding does not achieve the best results, it still performs very competitively, which supports the robustness of the hierarchical graph representation.
7.3 Discussion on the parameters involved in the algorithm
Our algorithm is mainly controlled by three parameters: (1) the number of levels L of the graph pyramid, (2) the reduction ratio R and (3) the maximum number of edges T of a graphlet. To illustrate how these three parameters affect the performance of the system, we plot the classification accuracy while varying the number of levels of the graph pyramid (see Fig. 3), the reduction ratio (see Fig. 4) and T (see Fig. 5). For simplicity, for each level we only report the maximum accuracy obtained by any of the configurations mentioned in Sect. 4.3. From Fig. 3, we can observe that for all the datasets, adding a second level to the base graph increases the classification accuracy. However, adding further hierarchical levels does not always increase the performance. For smaller graphs (with fewer nodes and edges, e.g., the graphs from MUTAG), including further levels of abstraction decreases the performance of the system, which suggests that for small graphs a higher-level abstraction can introduce noise or distortion.
The reduction ratio R directly determines the number of clusters in a given level, and hence the number of nodes in the next higher level of the hierarchy. For example, \(R=1\) indicates that the number of clusters equals the number of nodes, while \(R=2\) indicates that the number of clusters is half the number of nodes in that level. Figure 4 shows the behavior of our method with different values of R for a fixed \(L=2\). These plots show that the effect of R depends strongly on the dataset, irrespective of the size of the graphs it contains. For the PTC, PROTEINS and MAO datasets, the performance mostly increases with R, while for MUTAG it improves until \(R=2\) and then decreases for all hierarchical configurations.
For the MAO dataset, all the hierarchical configurations behave in exactly the same way as R increases, possibly because its graphs are small, so that the contributions of the different hierarchical configurations are indistinguishable.
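The effect of R on the size of the hierarchy can be illustrated with a few lines of Python. This is a hypothetical helper: it assumes that every level is clustered with the same ratio R and that the number of clusters \(n/R\) is rounded up; the rounding actually used by the clustering step is not specified here.

```python
import math


def level_sizes(n_nodes, R, L):
    """Number of nodes at each of the L levels when level l+1 has n_l / R clusters."""
    if R < 1:
        raise ValueError("R must be at least 1")
    sizes = [n_nodes]
    for _ in range(L - 1):
        sizes.append(max(1, math.ceil(sizes[-1] / R)))
    return sizes
```

For a hypothetical 18-node graph, \(R=1\) keeps 18 nodes at every level, whereas \(R=2\) with \(L=3\) gives 18, 9 and 5 nodes, which illustrates how quickly small graphs run out of structure at higher levels.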
In Fig. 5, we show the performance trend on six datasets (including MUTAG, PTC, PROTEINS, NCI1 and NCI109) using only the SGE algorithm, which is the baseline graph embedding technique that we considered. The hierarchical configurations are not considered in this case because they use different graphlet sizes at different hierarchical levels, which would make their behavior hard to interpret. Fig. 5 shows that increasing T improves the performance of the system on most datasets. There are, however, some exceptions (e.g., \(T=6\) on the PTC dataset), which suggests that graphlets with that many edges are less informative for that particular graph dataset.
7.4 Discussion on the stochasticity of the algorithm
It is important to note that our proposed algorithm is stochastic in nature because of the stochastic graphlet sampling involved in the graph embedding procedure. The graphlet sampling used here samples graphlets uniformly from a given graph, and by the law of large numbers, this sampling guarantees that the empirical distribution of graphlets is asymptotically close to the actual distribution [58]. To demonstrate that the stochastic behavior of our algorithm does not heavily affect the experimental results, we repeated the last experiment on all the datasets considered for 10 iterations, each time with a different random seed for the sampling algorithm. The mean and standard deviation of the classification accuracy obtained for each dataset are reported in Table 6. The mean accuracies reported in the table are quite close to those reported in Table 5, and the standard deviations are low (all of them are below 1.0). This suggests that the proposed graph embedding technique, although it employs a stochastic process, is consistent in terms of performance.
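One way to make "close to the actual distribution" quantitative, and to connect it with the sample-complexity argument of [81] mentioned in Sect. 5.1, is the L1-deviation inequality of Weissman et al.: if a distribution over a bins is estimated from M independent samples, then \(P(\lVert P-\hat{P}_M\rVert_1 \ge \epsilon) \le (2^a-2)\,e^{-M\epsilon^2/2}\). The sketch below solves this bound for M. It is only indicative here, since graphlets collected by random walks are not independent draws and a (the number of distinct graphlet hash codes) is not known in advance.

```python
import math


def required_samples(a, eps, delta):
    """Smallest M with (2**a - 2) * exp(-M * eps**2 / 2) <= delta.

    a: number of histogram bins (distinct graphlet types), a >= 2
    eps: tolerated L1 deviation; delta: tolerated failure probability.
    """
    if a < 2:
        raise ValueError("the bound needs at least two bins")
    return math.ceil(2.0 / eps ** 2 * math.log((2 ** a - 2) / delta))
```

For example, with two bins, \(\epsilon = 0.1\) and \(\delta = 0.05\) the bound asks for 738 samples; because \(\ln(2^a-2)\) grows roughly linearly in a, richer graphlet vocabularies require proportionally more samples.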
8 Conclusions
In this paper, we have proposed to enhance the information encoded in graph embeddings by means of hierarchical representations. We have experimentally validated that the abstract information is able to improve graph classification performance. The embedding function is based on a stochastic sampling of graphlets to obtain the graphlet distribution within the graph. Graphlets of different sizes are considered to allow a change in the node context. Moreover, hash functions are used to identify graphlets efficiently. Even though considering graphlets of different sizes provides robustness to graph distortions, they still capture only local information when larger graphs are considered. Therefore, building a graph hierarchy makes it possible to enlarge the graphlet context without increasing the time needed to identify the graphlets. In this work, we have carefully validated the performance of our approach in different application scenarios, showing that it outperforms the state-of-the-art approaches on most of the evaluated datasets in the graph classification task using an SVM as a classifier.
Further research will focus on improving the hierarchical graph construction. Even though the Girvan–Newman algorithm is able to exploit the desired properties of the graph and produces clusterings that lead to good abstractions, its time complexity is a drawback that should be addressed when considering large graphs.
Notes
Available at http://www.fki.inf.unibe.ch/databases/iam-graph-database.
Available at http://www.histograph.ch.
References
Adelson EH, Anderson CH, Bergen JR, Burt PJ, Ogden JM (1984) Pyramid methods in image processing. RCA Eng 29(6):33–41
Ahuja N, Todorovic S (2010) From region based image representation to object discovery and recognition. In: S+SSPR, vol 6218, pp 1–19
Almeida H, Guedes D, Meira W, Zaki MJ (2011) Is there a best quality metric for graph clusters? In: MLKDD, pp 44–59
Atwood J, Towsley D (2016) Diffusion-convolutional neural networks. In: NIPS, pp 1993–2001
Aziz F, Wilson R, Hancock E (2013) Backtrackless walks on a graph. IEEE Trans Neural Netw Learn Syst 24(6):977–989
Barbu E, Héroux P, Adam S, Trupin E (2005) Frequent graph discovery: application to line drawing document images. Electron Lett Comput Vis Image Anal 5(2):47–54
Bodic PL, Héroux P, Adam S, Lecourtier Y (2012) An integer linear program for substitution-tolerant subgraph isomorphism and its use for symbol spotting in technical drawings. Pattern Recognit 45(12):4214–4224
Borgwardt K, Kriegel HP (2005) Shortest-path kernels on graphs. In: ICDM, pp 74–81
Borzeshi EZ, Piccardi M, Riesen K, Bunke H (2013) Discriminative prototype selection methods for graph embedding. Pattern Recognit 46(6):1648–1657
Broelemann K, Dutta A, Jiang X, Lladós J (2012) Hierarchical graph representation for symbol spotting in graphical document images. In: S+SSPR, vol 7626. Springer, Berlin, pp 529–538
Broelemann K, Dutta A, Jiang X, Lladós J (2013) Hierarchical plausibility-graphs for symbol spotting in graphical documents. In: GREC, pp 13–18
Bunke H, Riesen K (2010) Improving vector space embedding of graphs through feature selection algorithms. Pattern Recognit 44(9):1928–1940
Caelli T, Kosinov S (2004) An eigenspace projection clustering method for inexact graph matching. IEEE Trans Pattern Anal Mach Intell 26(4):515–519
Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol (TIST) 2(3):27
Comellas F, Paz-Sánchez J (2008) Reconstruction of networks from their betweenness centrality. In: AEC. Springer, Berlin, pp 31–37
Conte D, Foggia P, Sansone C, Vento M (2004) Thirty years of graph matching in pattern recognition. Int J Pattern Recognit Artif Intell 18(3):265–298
Defferrard M, Bresson X, Vandergheynst P (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In: NIPS, pp 1–14
Dupé F-X, Brun L (2010) Hierarchical bag of paths for kernel based shape classification. In: S+SSPR, pp 227–236
Dutta A, Lladós J, Bunke H, Pal U (2017) Product graph-based higher order contextual similarities for inexact subgraph matching. Pattern Recognit 76:596–611
Dutta A, Riba P, Lladós J, Fornés A (2017) Pyramidal stochastic graphlet embedding for document pattern classification. In: ICDAR, pp 33–38
Dutta A, Sahbi H (2019) Stochastic graphlet embedding. IEEE Trans Neural Netw Learn Syst 30(8):2369–2382
Farabet C, Couprie C, Najman L, LeCun Y (2013) Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell 35(8):1915–1929
Fei-Fei L, Perona P (2005) A Bayesian hierarchical model for learning natural scene categories. In: CVPR, pp 524–531
Felzenszwalb P, Schwartz J (2007) Hierarchical matching of deformable shapes. In: CVPR, pp 1–8
Foggia P, Percannella G, Vento M (2014) Graph matching and learning in pattern recognition in the last 10 years. Int J Pattern Recognit Artif Intell 28(1):1–40
Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry 40(1):35–41
Gärtner T (2003) A survey of kernels for structured data. ACM SIGKDD Explor Newslett 5(1):49–58
Gentile C, Li S, Kar P, Karatzoglou A, Zappella G, Etrue E (2017) On context-dependent clustering of bandits. In: ICML, pp 1253–1262. JMLR.org
Gibert J, Valveny E, Bunke H (2012) Graph embedding in vector spaces by node attribute statistics. Pattern Recognit 45(9):3072–3083
Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE (2017) Neural message passing for quantum chemistry. In: ICML, pp 1263–1272
Girvan M, Newman M (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99(12):7821–7826
Horváth T, Gärtner T, Wrobel S (2004) Cyclic pattern kernels for predictive graph mining. In: KDD, pp 158–167
Jolion JM, Rosenfeld A (1994) A pyramid framework for early vision: multiresolutional computer vision. Kluwer Academic Publishers, Norwell
Jouili S, Tabbone S (2010) Graph embedding using constant shift embedding. In: ICPR, pp 83–92
Kashima H, Tsuda K, Inokuchi A (2004) Kernels for graphs. Kernel Methods Comput Biol 39(1):101–113
Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. In: ICLR, pp 1–10
Kondor R, Borgwardt KM (2008) The skew spectrum of graphs. In: ICML, pp 496–503
Kondor R, Pan H (2016) The multiscale Laplacian graph kernel. In: NIPS, pp 2982–2990
Kondor R, Shervashidze N, Borgwardt KM (2009) The graphlet spectrum. In: ICML, pp 529–536
Korda N, Szörényi B, Li S (2016) Distributed clustering of linear bandits in peer to peer networks. In: ICML
Kriege N, Mutzel P (2012) Subgraph matching kernels for attributed graphs. In: ICML, pp 1015–1022
Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: IEEE
Lafferty J, Lebanon G (2005) Diffusion kernels on statistical manifolds. J Mach Learn Res 6:129–163
Li S, Chen W, Li S, Leung K (2019) Improved algorithm on online clustering of bandits. In: IJCAI
Li S, Karatzoglou A, Gentile C (2016) Collaborative filtering bandits. In: SIGIR
Liu X, Lin L, Li H, Jin H, Tao W (2008) Layered shape matching and registration: Stochastic sampling with hierarchical graph representation. In: ICPR, pp 1–4
Luqman MM, Ramel JY, Lladós J, Brouard T (2013) Fuzzy multilevel graph embedding. Pattern Recognit 46(2):551–565
Marfil R, Molina-Tanco L, Bandera A, Sandoval F (2007) The construction of bounded irregular pyramids with a union-find decimation process. In: GbRPR, pp 307–318
Mehlhorn K (1984) Graph algorithms and NP-completeness. Springer, New York
Mousavi SF, Safayani M, Mirzaei A, Bahonar H (2017) Hierarchical graph embedding in vector space by graph pyramid. Pattern Recognit 61:245–254
Neuhaus M, Bunke H (2004) An error-tolerant approximate matching algorithm for attributed planar graphs and its application to fingerprint classification. In: S+SSPR, pp 180–189
Neuhaus M, Bunke H (2007) Bridging the gap between graph edit distance and kernel machines. World Scientific, Singapore
Newman MJ (2005) A measure of betweenness centrality based on random walks. Soc Netw 27(1):39–54
Niebles J, Fei-Fei L (2007) A hierarchical model of shape and appearance for human action classification. In: CVPR, pp 1–8
Niepert M, Ahmed M, Kutzkov K (2016) Learning convolutional neural networks for graphs. In: ICML, pp 2014–2023
Pekalska E, Duin RPW (2005) The dissimilarity representation for pattern recognition: foundations and applications. World Scientific, Hackensack
Pelillo M, Siddiqi K, Zucker SW (1999) Matching hierarchical structures using association graphs. IEEE Trans Pattern Anal Mach Intell 21(11):1105–1120
Pržulj N (2007) Biological network comparison using graphlet degree distribution. Bioinformatics 23(2):e177
Riba P, Lladós J, Fornés A (2017) Error-tolerant coarse-to-fine matching model for hierarchical graphs. In: International workshop on graph-based representations in pattern recognition. Springer, pp 107–117
Riesen K, Bunke H (2008) IAM graph database repository for graph based pattern recognition and machine learning. In: S+SSPR, pp 287–297
Riesen K, Bunke H (2009) Approximate graph edit distance computation by means of bipartite graph matching. Image Vis Comput 27(7):950–959
Riesen K, Bunke H (2009) Graph classification by means of Lipschitz embedding. IEEE Trans Syst Man Cybern Part B 39(6):1472–1483
Riesen K, Neuhaus M, Bunke H (2007) Bipartite graph matching for computing the edit distance of graphs. In: Escolano F, Vento M (eds) Graph-based representations in pattern recognition, LNCS, vol 4538. Springer, Berlin, pp 1–12
Robles-Kelly A, Hancock ER (2007) A Riemannian approach to graph embedding. Pattern Recognit 40(3):1042–1056
Saund E (2013) A graph lattice approach to maintaining and learning dense collections of subgraphs as image features. IEEE Trans Pattern Anal Mach Intell 35(10):2323–2339
Schellewald C, Schnörr C (2005) Probabilistic subgraph matching based on convex relaxation. In: EMMCVPR, pp 171–186
Serratosa F, Alquézar R, Sanfeliu A (2000) Efficient algorithms for matching attributed graphs and function-described graphs. In: International conference on pattern recognition, vol 2, pp 867–872
Shervashidze N, Borgwardt KM (2009) Fast subtree kernels on graphs. In: NIPS, pp 1660–1668
Shervashidze N, Schweitzer P, van Leeuwen EJ, Mehlhorn K, Borgwardt KM (2011) Weisfeiler-Lehman graph kernels. J Mach Learn Res 12:2539–2561
Shervashidze N, Vishwanathan SVN, Petri T, Mehlhorn K, Borgwardt K (2009) Efficient graphlet kernels for large graph comparison. In: AISTATS, pp 488–495
Shokoufandeh A, Macrini D, Dickinson S, Siddiqi K, Zucker S (2005) Indexing hierarchical structures using graph spectra. IEEE Trans Pattern Anal Mach Intell 27(7):1125–1140
Smola AJ, Kondor R (2003) Kernels and regularization on graphs. In: COLT, pp 144–158
Solnon C (2010) AllDifferent-based filtering for subgraph isomorphism. Artif Intell 174(12–13):850–864
Stauffer M, Fischer A, Riesen K (2016) A novel graph database for handwritten word images. In: S+SSPR, pp 553–563
Suh Y, Adamczewski K, Mu Lee K (2015) Subgraph matching using compactness prior for robust feature correspondence. In: CVPR
Ulrich M, Wiedemann C, Steger C (2012) Combining scale-space and similarity-based aspect graphs for fast 3D object recognition. IEEE Trans Pattern Anal Mach Intell 34(10):1902–1914
Vento M (2015) A long trip in the charming world of graphs for pattern recognition. Pattern Recognit 48(2):291–301
Verma S, Zhang ZL (2017) Hunt for the unique, stable, sparse and fast feature learning on graphs. In: NIPS, pp 87–97
Vishwanathan SVN, Schraudolph NN, Kondor R, Borgwardt KM (2010) Graph kernels. J Mach Learn Res 11:1201–1242
Watkins C (1999) Kernels from matching operations. Technical report, Computer Science Department, University of London
Weissman T, Ordentlich E, Seroussi G, Verdu S, Weinberger MJ (2003) Inequalities for the l1 deviation of the empirical distribution. Technical report, HP Labs, Palo Alto
Wilson R, Hancock E, Luo B (2005) Pattern vectors from algebraic graph theory. IEEE Trans Pattern Anal Mach Intell 27(7):1112–1124
Yanardag P, Vishwanathan S (2015) Deep graph kernels. In: KDD, pp 1365–1374
Acknowledgements
This work has been partially supported by the European Union’s research and innovation program under the Marie Skłodowska-Curie Grant Agreement No. 665919 (P-SPHERE project), the Spanish projects RTI2018-102285-A-I00 and RTI2018-095645-B-C21, the FPU fellowship FPU15/06264 from the Spanish Ministerio de Educación, Cultura y Deporte, the Ramon y Cajal Fellowship RYC20141683, and the CERCA Program/Generalitat de Catalunya. Anjan Dutta was a Marie-Curie Fellow (under the P-SPHERE project) at the Computer Vision Center of Barcelona, where most of the work was done and the paper was written.
Ethics declarations
Conflict of interest
Anjan Dutta, Pau Riba, Josep Lladós and Alicia Fornés declare that they do not have any conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Dutta, A., Riba, P., Lladós, J. et al. Hierarchical stochastic graphlet embedding for graph-based pattern recognition. Neural Comput & Applic 32, 11579–11596 (2020). https://doi.org/10.1007/s00521-019-04642-7
DOI: https://doi.org/10.1007/s00521-019-04642-7
Keywords
 Graph embedding
 Hierarchical graph
 Stochastic graphlets
 Graph hashing
 Graph classification