Hierarchical stochastic graphlet embedding for graph-based pattern recognition

Dutta, Anjan; Riba, Pau; Lladós, Josep; Fornés, Alicia

doi:10.1007/s00521-019-04642-7

Original Article
Open access
Published: 06 December 2019

Volume 32, pages 11579–11596, (2020)
Neural Computing and Applications

Anjan Dutta ORCID: orcid.org/0000-0002-1667-2245¹^na1,
Pau Riba²^na1,
Josep Lladós² &
…
Alicia Fornés²

Abstract

Despite being very successful within the pattern recognition and machine learning community, graph-based methods are often unusable because of the lack of mathematical operations defined in the graph domain. Graph embedding, which maps graphs to a vectorial space, has been proposed as a way to tackle these difficulties, enabling the use of standard machine learning techniques. However, it is well known that graph embedding functions usually suffer from the loss of structural information. In this paper, we consider the hierarchical structure of a graph as a way to mitigate this loss of information. The hierarchical structure is constructed by topologically clustering the graph nodes and considering each cluster as a node in the upper hierarchical level. Once this hierarchical structure is constructed, we consider several configurations to define the mapping into a vector space given a classical graph embedding; in particular, we propose to make use of the stochastic graphlet embedding (SGE). Broadly speaking, SGE produces a distribution of uniformly sampled low-to-high-order graphlets as a way to embed graphs into the vector space. The coarse-to-fine structure of a graph hierarchy and the statistics fetched by the SGE complement each other and include important structural information with varied contexts. Altogether, these two techniques substantially mitigate the usual information loss involved in graph embedding techniques, yielding a more robust graph representation. This has been corroborated through a detailed experimental evaluation on various benchmark graph datasets, where we outperform the state-of-the-art methods.

1 Introduction

Graph-based methods have been very successful for pattern recognition, computer vision and machine learning tasks [16, 25, 77]. However, due to their symbolic and relational nature, graphs have some limitations compared with traditional statistical (vector-based) representations. Some elementary mathematical operations have no equivalent in the graph domain. For example, computing pairwise sums or products (which are elementary operations in many classification and clustering algorithms) is not defined in a standard way in the graph domain. In the literature, one way of addressing this problem is by means of embedding functions. Given a graph space ${\mathbb {G}}$, an explicit embedding function is defined as $\varphi :{\mathbb {G}}\rightarrow {\mathbb {R}}^n$, which maps a given graph to a vector representation [12, 29, 47, 65, 68], whereas an implicit embedding function is defined as $\varphi :{\mathbb {G}}\rightarrow {\mathcal {H}}$, which maps a given graph to a high-dimensional Hilbert space ${\mathcal {H}}$ where a dot product defines the similarity between two graphs, $K(G,G')=\langle \varphi (G),\varphi (G') \rangle$, $G,G'\in {\mathbb {G}}$ [18, 27, 32, 35]. In the graph domain, implicitly embedding a graph is termed a graph kernel, which essentially defines a way to compute the similarity between two graphs. However, defining such embedding functions is extremely challenging when constraints on time efficiency and on preserving the underlying structural information are taken into account. The problem becomes even more difficult as graphs grow in size, since the structural complexity increases the possibility of noise and distortion in the structure and raises the risk of losing information. Hierarchical representation is often used as a way to deal with noise and distortion [50, 76], as it provides a stable delineation of an underlying object.
Hierarchical representations allow the graph to be contracted incrementally, in a space-scale representation, so that the salient features (relevant subgraphs) remain in the hierarchy. Thus, the top levels become a compact and stable summarization.

Processing information using a multiscale representation has been successfully employed in computer vision and image processing algorithms, mostly inspired by its resemblance to human visual perception [1]. It is observed that a naturalistic visual interpretation always demands a data structure able to represent scattered local information as well as summarized global facts [33]. Hierarchical representation is often used as a paradigm to efficiently extract the global information from the local features. Apart from that, hierarchical models are also believed to provide time- and space-efficient solutions [76]. Motivated by this intuition and by existing works in related fields, many authors have proposed different hierarchical graph structures for solving various problems [22, 23, 48, 76]. In this sense, it is worth mentioning the work of Mousavi et al. [50], who presented a hierarchical framework for graph embedding, although they did not explore the complex encoding of the hierarchy.

In this paper, motivated by the successes of hierarchical models and the efficiency of graph embedding theory, we propose a general hierarchical graph embedding formulation that first creates a hierarchical structure from a given graph and then utilizes the multiscale structure to explicitly embed a graph in a real vector space by means of local graphlets. First, we make use of the graph clustering algorithm proposed in [31] to obtain a hierarchical graph representation of a given input graph. Here, each cluster of nodes at level i is depicted as a single node at the upper hierarchical level $i+1$, whereas the edges within a level are connected depending on the original topology of the base graph, and the hierarchical edges are created by joining a node representing a cluster to all of that cluster's nodes in the lower level. Thus, we propose a richer encoding than Mousavi et al. [50], because our hierarchy not only contains different graph abstractions but also encodes useful hierarchical contractions through the hierarchical edges.

Once the hierarchical structure of a graph is created, we propose a novel use of the Stochastic Graphlet Embedding (SGE) [21] to exploit this hierarchical information. On the one hand, we can exploit the local configuration in the form of graphlets thanks to the SGE design, because graphlets provide information at different neighborhood sizes. On the other hand, the hierarchical connections allow us to encode more abstract information and hence to deal with noise present in the data. As a result, the Hierarchical Stochastic Graphlet Embedding (HSGE) encodes a global and compact representation of the graph that is embedded in a vector space. Considering the entire graph hierarchy for the embedding, instead of only the base graph, strengthens the representation ability and mitigates the loss of information that usually occurs in graph embedding methods. Moreover, the statistics obtained from the uniformly sampled graphlets of increasing size model the complex interactions among different object parts represented as graph nodes. Here, the hierarchical graph structure and the statistics of graphlets of increasing size capture important structural information of varied contexts.

As a result, our approach produces robust representations that can benefit from the advantages of the two above-mentioned strategies: we first take advantage of the embedding ability for mapping symbolic relational representations to n-dimensional spaces, so machine learning approaches can be used; and second, the ability of hierarchical structures to reduce noise and distortion inherently involved in graph representations of real data, keeping the more stable and relevant substructures in a compact way.

In conclusion, the main contribution of our work is the exploitation of the hierarchical structure of a given graph, rather than only studying the base graph for graph embedding purposes. Assessing the hierarchical information of a graph pyramid allows us to extend the representation power of the embedded graph and to tolerate the instability caused by noise and distortion. Our proposal is robust because, on the one hand, it organizes the structural information in the hierarchical abstraction, and on the other hand, it considers the relation between object parts and their complex interactions with the help of uniformly sampled graphlets of unbounded size. Additionally, the proposed method is generic and can accommodate any other graph embedding algorithm in the framework. In this sense, we extensively validated our proposed algorithm on many different benchmark graph datasets coming from different application domains.

The rest of this paper is organized as follows: Sect. 2 describes the related works in the literature. In Sect. 3, we introduce some definitions and notations related to the work. Our generic hierarchical graph representation is presented in Sect. 4. Section 5 introduces the Stochastic Graphlet Embedding as the base embedding we will use. Afterward, Sect. 7 reports our experimental validation and compares the proposed method with available state-of-the-art algorithms. Finally, in Sect. 8 we draw the conclusions and describe the future direction of the present work.

2 Related work

In what follows, we review the related works on explicit and implicit graph embedding techniques, different hierarchical models and graph summarization methods, which we believe to be relevant to the main focus of the present paper.

2.1 Graph embedding

Graph embedding methods are mainly divided into two different categories: (1) explicit graph embedding, (2) implicit graph embedding or graph kernel.

2.1.1 Explicit graph embedding

Explicit graph embedding refers to those techniques that aim to explicitly map graphs to vector spaces. The methods belonging to this category can be further divided into four different classes. The first one, known as graph probing [47], measures the frequency of specific substructures (which capture content and topology) in graphs. Depending on the graph substructures considered (e.g., nodes, edges, subgraphs), different embedding techniques have been proposed. For example, Shervashidze et al. [68] studied non-isomorphic graphlets, whereas Gibert et al. [29] considered node label and edge relation statistics. Saund [65] introduced a bottom-up graph lattice in order to efficiently extract subgraph features in preprocessed administrative documents, while Dutta and Sahbi [21] proposed a distribution of stochastic graphlets for embedding graphs into a vector space. The second class of graph embedding techniques is based on spectral graph theory [13, 34, 37, 39, 64, 82], which aims to analyze the structural properties of graphs in terms of the eigenvectors/eigenvalues of the adjacency or Laplacian matrices of a graph [82]. Recently, Verma and Zhang [78] proposed a family of graph spectral distances for robust graph feature representation. Despite their relative successes, spectral methods are quite prone to structural noise and distortions. The third class of methods is inspired by the dissimilarity measurements proposed in [56]; in this context, Bunke and Riesen have presented several works on the vectorial description of a given graph by its distances to a number of pre-selected prototype graphs [9, 12, 62, 63]. Motivated by the recent advancements in deep learning and neural networks, many researchers have proposed to use neural networks for obtaining a vectorial representation of graphs [4, 17, 30, 36, 55], which results in the fourth category of methods, called geometric deep learning.

2.1.2 Implicit graph embedding

Implicit graph embedding, or graph kernel, methods are another way to embed graphs into a vector space. They are also popular for their ability to efficiently extend existing machine learning algorithms to nonlinear data such as graphs and strings. Graph kernel methods can be roughly divided into three different categories. The first one, known as the diffusion kernel, is based on similarity measures among the subparts of two graphs, which are propagated over the entire structure to obtain a global similarity measure for the two graphs [43, 72]. The second class of methods, called convolution kernels, aims to measure the similarity of composite objects (modeled with graphs) from the similarity of their parts (i.e., nodes) [80]. This type of graph kernel derives the similarity between two graphs G, $G'$ from the sum, over all decompositions, of the similarity products of the subparts of G and $G'$ [52]. Recently, Kondor and Pan [38] proposed the multiscale Laplacian graph kernel, which has the property of lifting a base kernel defined on the vertices of two graphs to a kernel between graphs. The third class of methods is based on the analysis of the common substructures that belong to both graphs and is termed the substructure kernel. This family includes the graph kernel methods that consider random walks [27, 79], backtrackless walks [5], shortest paths [8], subtrees [68] or graphlets [70] as the substructure. Different from the above three categories, Shervashidze et al. [69] proposed a family of efficient graph kernels based on the Weisfeiler-Lehman test of graph isomorphism, which maps the original graph to a sequence of graphs. More recently, inspired by the successes of deep learning, Yanardag and Vishwanathan [83] presented a unified framework to learn latent representations of substructures for graphs.
They claimed that given a pre-computed kernel of graphs, their proposed technique produces an improved representation that leverages hidden representations of substructures.

2.2 Hierarchical graph representation

In general, hierarchical models have been successfully employed in many different domains within the computer vision and image processing field, such as, image segmentation [22, 48], scene categorization [23], action recognition [54], shape classification [18], graphic recognition [10], 3D object recognition [76] etc. These approaches usually exploit some kind of pyramidal structure containing information at various resolutions. Usually, at the finest level of the pyramid, the captured information is related to local features, whereas, at coarser levels, global aspects of the underlying data are represented. This way of representation helps to interpret knowledge in a naturalistic way [33].

Inspired by the above intuition, hierarchical structures are often employed to extract coarse-to-fine information from a graph representation. Pelillo et al. [57] proposed to match two hierarchical structures as a clique detection problem on their association graph, which was solved with a dynamic programming approach. In [71], Shokoufandeh et al. presented a framework based on spectral characterization for indexing hierarchical structures that embed the topological information of a directed acyclic graph. A hierarchical representation of objects and an elastic matching procedure are also proposed for deformable shape matching in [24]. In [46], Liu et al. utilized a hierarchical graph representation and a stochastic sampling strategy for the layered shape matching and registration problem. A graph kernel based on hierarchical bag-of-paths, where each path is associated with a hierarchy encoding successive simplifications, is presented in [18]. Ahuja and Todorovic [2] used a hierarchical graph of segmented regions for object recognition. Motivated by them, Broelemann et al. [10, 11] proposed two closely related approaches based on hierarchical graphs for error-tolerant matching of graphical symbols. Mousavi et al. [50] proposed a graph embedding strategy based on hierarchical graph representation, which considers different levels of a graph pyramid. They claimed that the proposed framework is generic enough to incorporate any kind of graph embedding technique. However, the authors did not take advantage of the complex and rich encoding of the hierarchy.

From the literature review, we can conclude that although there are some works in the graph domain exploiting the hierarchical graph structure, most of them focus on some kind of error tolerance or elastic matching. The use of this type of multiscale graph representation for vector space embedding is quite rare and has not been properly explored yet. This motivated us to work on a hierarchical graph structure for the explicit graph embedding task.

3 Definitions and notations

In this section, we introduce some definitions and notations, which are relevant to the proposed work.

Definition 1

(Attributed Graph) An attributed graph is a 4-tuple $G=(V,E,L_V,L_E)$ comprising a set V of vertices together with a set $E\subseteq V\times V$ of edges and two mappings $L_V:V\rightarrow {\mathbb {R}}^m$ and $L_E:E\rightarrow {\mathbb {R}}^n$ which, respectively, assign attributes to the nodes and edges.

Attributed graphs have been widely used for all sorts of real-world problems. The most common methodologies are error-tolerant graph matching [51, 67], graph kernels and embedding techniques [41].

Definition 2

(Subgraph) Given an attributed graph $G=(V,E,L_V,L_E)$, another attributed graph $G'=(V',E',L_V',L_E')$ is said to be a subgraph of G and is denoted by $G'\subseteq G$ iff,

$V'\subseteq V$
$E'=E\cap (V'\times V')$
$L_V'(u)=L_V(u)$, $\forall u \in V'$
$L_E'(e)=L_E(e)$, $\forall e \in E'$

A graphlet g of G is simply a subgraph that inherits the topology and the attributes of G. In the literature, subgraphs are often used for error-tolerant matching [7, 19, 66, 73, 75] and frequent pattern discovery problems [2, 6, 42].
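The four conditions of Definition 2 can be sketched in a few lines of Python. This is a minimal illustration only: the function name `induced_subgraph` and the choice of plain sets and dictionaries for $V$, $E$, $L_V$ and $L_E$ are assumptions made here, not part of the original formulation.

```python
def induced_subgraph(V, E, L_V, L_E, V_sub):
    """Return the subgraph of (V, E, L_V, L_E) induced by the node set V_sub.

    V is a set of nodes, E a set of (u, v) pairs, and L_V / L_E are
    dictionaries mapping nodes / edges to their attributes.
    """
    V_sub = set(V_sub)
    if not V_sub <= set(V):
        raise ValueError("V_sub must be a subset of V")
    # E' = E intersected with V' x V': keep edges with both endpoints in V'.
    E_sub = {(u, v) for (u, v) in E if u in V_sub and v in V_sub}
    # The attribute mappings are simply restricted to the kept nodes/edges.
    L_V_sub = {u: L_V[u] for u in V_sub}
    L_E_sub = {e: L_E[e] for e in E_sub}
    return V_sub, E_sub, L_V_sub, L_E_sub
```

Restricting the two label dictionaries guarantees that every kept node and edge carries exactly the attribute it had in G, which is what the last two conditions require.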

Definition 3

(Hierarchical graph) A hierarchical graph H is defined as a 6-tuple $H=(V,E_N,E_H,L_V,L_{E_N},L_{E_H})$ where V is the set of nodes; $E_N \subseteq V\times V$ are the neighborhood edges; $E_H \subseteq V\times V$ are the hierarchical edges; $L_V$, $L_{E_N}$ and $L_{E_H}$ are three labeling functions defined as $L_V:V \rightarrow \Sigma _V \times A^k_V$, $L_{E_N}: E_N \rightarrow \Sigma _{E_N} \times A^l_{E_N}$ and $L_{E_H}: E_H \rightarrow \Sigma _{E_H} \times A^m_{E_H}$, where $\Sigma _V$, $\Sigma _{E_N}$ and $\Sigma _{E_H}$ are three sets of symbolic labels for vertices and edges, $A_V$, $A_{E_N}$ and $A_{E_H}$ are three sets of attributes for vertices and edges, respectively, and $k,l,m\in {\mathbb {N}}$.

Prior works used hierarchical structures for allowing a reasonable tolerance in the representation paradigm [11, 18, 24] and also for bringing robustness in the feature representation [46].

4 Hierarchical embedding

In the literature, only a few embedding approaches exploit the idea of multiscale or abstraction information [38]. This section is devoted to providing a framework able to include this information given a graph embedding. Some works that have been proposed to exploit this multiscale information [20, 50, 59] discard the hierarchical information provided by the hierarchical edges and focus on abstractions of the original graph.

4.1 Graph clustering

Graph clustering has been widely used in several fields such as social and biological networks [31] and recommendation systems [28, 44]. It can be roughly described as the task of grouping graph nodes into clusters depending on the graph structure. Ideally, the grouping should be performed in such a way that intra-cluster nodes are densely connected whereas the connections among inter-cluster nodes are sparse. For example, Girvan and Newman [31] propose a graph clustering algorithm to detect community structures for studying social and biological networks. Li et al. [28, 40, 44, 45] have proposed several graph clustering techniques for recommendation systems based on different strategies: context awareness [28], inclusion of frequency property [44], distributed clustering confidence [40], etc. We do not review graph clustering algorithms further here, since this is not within the main scope of this paper. However, we would like to remark that one of the most important aspects of graph clustering is the evaluation of cluster quality, which is crucial not only to measure the effectiveness of clustering algorithms, but also to give insights into the dynamics of relationships in a given graph. For a detailed overview of effective graph clustering metrics, interested readers are referred to [3].

Even though any graph clustering algorithm can be used, we use the standard divisive Girvan–Newman algorithm [31] for our purpose, because it provides structurally meaningful clusters of a given graph. The Girvan–Newman algorithm is an intuitive and well-known algorithm used for community detection in complex systems. It is a global divisive algorithm which iteratively removes the appropriate edge until all the edges are deleted. At each iteration, new clusters can emerge as connected components. The idea is that the edges with higher centrality are the most likely to connect two clusters. Therefore, the betweenness centrality measure of the edges [26] is used to decide which edge is removed. The betweenness centrality of an edge $e \in E$ is defined as the number of shortest paths between any pair of nodes that pass through e. The output of this algorithm is a dendrogram codifying a hierarchical clustering of nodes. This algorithm consists of 4 steps:

1. Calculate the betweenness centrality for all edges in the network.
2. Remove the edge with highest betweenness and generate a cluster for each connected component.
3. Recalculate betweennesses for all edges affected by the removal.
4. Repeat from step 2 until no edges remain.

In this work, the Girvan–Newman algorithm is stopped early given a reduction ratio $r \in {\mathbb {R}}$. Therefore, the number of clusters is forced to be $\lfloor r \cdot |V| \rfloor$.
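The four steps and the early-stopping rule can be sketched as follows. This is a minimal pure-Python sketch under several assumptions: the graph is unweighted, undirected and given as a dictionary from nodes to neighbor sets; the function names are hypothetical; and, for simplicity, betweenness is recomputed for every edge after each removal rather than only for the affected ones. At least one cluster is always returned, even when $\lfloor r \cdot |V| \rfloor = 0$.

```python
import math
from collections import deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness (Brandes-style accumulation)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Propagate path dependencies back onto the edges.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += c
                delta[v] += c
    return score


def connected_components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman_early_stop(adj, r):
    """Remove high-betweenness edges until floor(r * |V|) clusters exist."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    target = max(1, math.floor(r * len(adj)))
    comps = connected_components(adj)
    while len(comps) < target and any(adj.values()):
        score = edge_betweenness(adj)
        u, v = max(score, key=score.get)  # edge with highest betweenness
        adj[u].discard(v)
        adj[v].discard(u)
        comps = connected_components(adj)
    return comps
```

Because removing a single edge can create at most one additional connected component, the loop stops exactly when the requested number of clusters is reached, provided the input graph does not already have more components than that.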

4.2 Hierarchical construction

Given a graph G and a clustering $C = \{C_1,\ldots ,C_k\}$, each cluster is summarized into a new node with a representative label (see line 5). Let us consider that this label can be defined as the result of an embedding function applied to the subgraph defined by the clustered nodes and their edges. Moreover, edges between the new nodes are created depending on a connection ratio between clusters. That means that an edge is only created if there are enough connections between the set of nodes defined by both clusters (see line 7). Finally, hierarchical edges are created connecting the new node $v_{C_i}$ with all the nodes belonging to the summarized cluster $C_i$ (see line 12). The proposed hierarchical construction is similar to the one proposed by Mousavi et al. [50] but including explicitly the summarization generated by the clustering algorithm by means of the hierarchical edges. Thus, the proposed hierarchical construction obtains a representation which encodes abstract information by means of the clusters while keeping the relation with the original graph.
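One level of this construction can be sketched as below. The text does not specify how the connection ratio is computed, so the sketch assumes it is the number of base-graph edges between two clusters divided by the number of possible node pairs between them; the threshold `min_ratio`, the function name and the data layout are likewise hypothetical. Node labels of the new level (the embedding of each cluster's subgraph) are omitted for brevity.

```python
def build_upper_level(adj, clusters, min_ratio=0.0):
    """Summarize each cluster into one node of the next hierarchical level.

    adj maps each base node to the set of its neighbors; clusters is a list
    of disjoint node sets covering all nodes. Returns (upper_adj, hier_edges).
    """
    owner = {v: i for i, c in enumerate(clusters) for v in c}

    # Count base-graph edges running between each pair of distinct clusters.
    # Each undirected edge is visited twice; the i < j test keeps one visit.
    cross = {}
    for u in adj:
        for v in adj[u]:
            i, j = owner[u], owner[v]
            if i < j:
                cross[(i, j)] = cross.get((i, j), 0) + 1

    # Create an edge between two cluster nodes only if they are connected
    # densely enough (assumed definition of the connection ratio).
    upper_adj = {i: set() for i in range(len(clusters))}
    for (i, j), n_edges in cross.items():
        ratio = n_edges / (len(clusters[i]) * len(clusters[j]))
        if ratio > min_ratio:
            upper_adj[i].add(j)
            upper_adj[j].add(i)

    # Hierarchical edges join each cluster node to every node it summarizes.
    hier_edges = [(i, v) for i, c in enumerate(clusters) for v in sorted(c)]
    return upper_adj, hier_edges
```

Applying the function repeatedly, each time to the `upper_adj` of the previous call together with a fresh clustering, would yield the successive levels of the hierarchy, while the accumulated `hier_edges` keep the relation with the lower levels.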

Let us introduce some notations that will be used in the following sections. Given a graph G and a number of levels L, $H_G$ denotes the corresponding hierarchical graph computed from G with L levels. $H_G^l$, where $l \in \{0,\ldots ,L\}$, is a graph without hierarchical edges corresponding to the l-th level of summarization; therefore, $H_G^0 = G$. Moreover, $H_G^{l_1,l_2}$, where $l_i \in \{0,\ldots ,L\}$ and $l_1\le l_2$, corresponds to the hierarchical graph comprised between levels $l_1$ and $l_2$. Hence, $H_G = H_G^{0,L}$ and $H_G^l = H_G^{l,l}$. Finally, $H_G^{l_1} \cup H_G^{l_2}$ corresponds to the union of two graphs without hierarchical edges.

Figure 1a shows the construction of the hierarchy given a graph G. Each level shows an abstraction of the input graph where the nodes have been reduced.

4.3 Hierarchical embedding

This section introduces a novel way to encode hierarchical information of a graph into an embedding. Moreover, the proposed technique is generic in the sense that it can be used with any graph embedding function.

Given a graph G which should be mapped into a vectorial space and an embedding function $\varphi :{\mathbb {G}}\rightarrow {\mathbb {R}}^n$, we first obtain the hierarchical representation $H_G$ following the methodology proposed in Sect. 4.2. Thus, $H_G$ enriches the original graph with abstract information considering L levels. We then propose to make use of the hierarchical information to construct a hierarchical embedding. The general form of the proposed embedding takes into account graphs at multiple scales and hierarchical relations. Thus, the embedding function not only compactly encodes the contextual information of nodes at different abstraction levels, but also encodes the hierarchy contraction. The embedding function is defined as follows:

$$\begin{aligned} \Phi (H_G) = [&\varphi (H_G^0),\ldots ,\varphi (H_G^K), \\&\phi _1^1(H_G),\ldots ,\phi _1^{k_1}(H_G), \\&\phi _2^1(H_G), \ldots , \phi _2^{k_2}(H_G) ], \end{aligned} \tag{1}$$

where

$$\phi _1^k(H_G) = [ \varphi (H_G^{0,k}),\ldots ,\varphi (H_G^{K-k,K})] \tag{2}$$

$$\phi _2^k(H_G) = [ \varphi (H_G^0 \cup \cdots \cup H_G^{k}),\ldots ,\varphi (H_G^{K-k} \cup \cdots \cup H_G^K)] \tag{3}$$

where $K \le L$ are the hierarchical levels taken into account and $k_1,k_2 \le K$ indicate the number of levels taken into account at the same time. Note that $K=L$, $k_1=K$ and $k_2=K$ will take into account the whole hierarchy and possible combinations. From this general representation of the proposed embedding, we have evaluated some particular cases (the reader is referred to Sect. 7 for more details on the experimental evaluation).

Baseline embedding This embedding is the one used as a baseline. In this scenario $K=0$, $k_1=0$ and $k_2=0$, therefore $\Phi (H_G) = \varphi (H_G^0)$. No abstract information is taken into consideration, hence, $\Phi (H_G) = \varphi (G)$.

Pyramidal embedding This embedding has been previously proposed in the literature [20, 50]. It combines information of the abstract levels of the graph, i.e., $H_G^i$ not taking into account hierarchical information. Therefore, the hierarchical edges are discarded and no relation between levels is considered, $K\ge 1$, $k_1=0$ and $k_2=0$. We define $\Phi _{\text {pyr}}(H_G) = [\varphi (H_G^0),\ldots ,\varphi (H_G^K)]$. Note that each element corresponds to independent levels of the hierarchy without hierarchical edges.

Generalized pyramidal embedding Following the previous idea, the information of the abstract levels of the graph, i.e., $H_G^i$ is combined. Now, hierarchical information is taken into account by embedding unions of levels, i.e., $H_G^{i_1} \cup H_G^{i_2}$ but discarding hierarchical edges (no clustering information is taken into account). In this scenario $K\ge 1$, $k_1=0$ and $k_2\ge 1$, therefore, we define $\Phi _{\text {gen}\_\text {pyr}}(H_G) = [\varphi (H_G^0),\ldots ,\varphi (H_G^K),\varphi (H_G^0 \cup H_G^1),\ldots ,\varphi (H_G^{K-1} \cup H_G^K), \ldots , \varphi (H_G^0 \cup \cdots \cup H_G^{k_2}),\ldots ,\varphi (H_G^{K-k_2} \cup \cdots \cup H_G^K)]$.

Hierarchical embedding This embedding is computed by mixing different levels, considering them as a single graph through the hierarchical edges, with $K \ge 1$, $k_1 \ge 1$ and $k_2=0$. The idea is to create an embedding able to codify both graph and clustering information. Depending on the embedding, hierarchical edges can be given a special label so that they are treated differently. The hierarchical embedding is defined as $\Phi _{\text {hier}}(H_G) = [\varphi (H_G^0),\ldots ,\varphi (H_G^K),\varphi (H_G^{0,1}),\ldots ,\varphi (H_G^{K-1,K}) ,\ldots , \varphi (H_G^{0,k_1}), \ldots ,\varphi (H_G^{K-k_1,K})]$. Note that each element corresponds to the subhierarchy comprised between the specified levels.

Exhaustive embedding Finally, in order to take into consideration the whole hierarchy, we can make use of the whole embedding $\Phi$ as defined in Eq. (1) where $K \ge 1$, $k_1, k_2 \ge 1$.
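To make the indexing of Eqs. (1)–(3) concrete, the following sketch enumerates which graphs are passed to $\varphi$ for given $K$, $k_1$ and $k_2$. It only lists the components symbolically and does not compute any embedding; the function name `embedding_plan` and the tuple encoding are assumptions introduced here for illustration.

```python
def embedding_plan(K, k1, k2):
    """List the graphs embedded by Phi in Eq. (1), in order.

    ("level", i)      stands for H_G^i
    ("span", a, b)    stands for H_G^{a,b} (levels a..b with hierarchical edges)
    ("union", a, b)   stands for H_G^a U ... U H_G^b (no hierarchical edges)
    """
    plan = [("level", i) for i in range(K + 1)]
    # phi_1^k for k = 1..k1: sub-hierarchies spanning k + 1 consecutive levels.
    for k in range(1, k1 + 1):
        plan += [("span", a, a + k) for a in range(K - k + 1)]
    # phi_2^k for k = 1..k2: unions of k + 1 consecutive levels.
    for k in range(1, k2 + 1):
        plan += [("union", a, a + k) for a in range(K - k + 1)]
    return plan
```

With this encoding, `embedding_plan(0, 0, 0)` reduces to the baseline embedding, `embedding_plan(K, 0, 0)` to the pyramidal one, `embedding_plan(K, 0, k2)` to the generalized pyramidal one, `embedding_plan(K, k1, 0)` to the hierarchical one, and `embedding_plan(K, k1, k2)` to the exhaustive one. The final vector would be the concatenation of $\varphi$ applied to each listed graph.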

Figure 1b shows the graphs taken into consideration when the hierarchical embeddings are computed.

5 Stochastic graphlet embedding

The Stochastic Graphlet Embedding (SGE) can be defined as a function $\varphi :{\mathbb {G}} \rightarrow {\mathbb {R}}^n$ that explicitly embeds a graph $G\in {\mathbb {G}}$ into a high-dimensional vector space ${\mathbb {R}}^n$ [21]. The entire procedure of SGE can be described in two stages (see Fig. 2): in the first stage, the method samples graphlets from G in a stochastic manner, and in the second stage, it counts the frequency of each isomorphic graphlet among the extracted ones in an approximate but nearly accurate manner. The entire procedure yields, with controlled complexity, a precise distribution of connected graphlets with an increasing number of edges in G, which captures the relations among the pieces of information represented as nodes and their complex interactions.

5.1 Stochastic graphlets sampling

Considering a graph $G=(V,E,L_V,L_E)$, the goal of the graphlet extraction procedure is to obtain statistics of stochastic graphlets with an increasing number of edges in G. The extraction is stochastic: it uniformly samples graphlets with an increasing number of edges without constraining their topology or structural properties such as maximum degree, maximum number of nodes, etc. Our graphlet sampling procedure, outlined in Algorithm 2, is recurrent, and the number of recurrences is controlled by a parameter M that indicates the number of distinct graphlets to be sampled (see line 2 of Algorithm 2). Each of these M recurrent processes is in turn regulated by another parameter T that denotes the maximum number of iterations a single recurrent process should have (see line 5). Since each iteration adds an edge to the graphlet under construction, T indirectly specifies the maximum number of distinct edges each graphlet can contain. Considering $U_t$ and $A_t$, respectively, as the aggregated sets of visited nodes and edges up to iteration t, they are initialized at the beginning of each recurrent step as $A_0=\emptyset$ and $U_0=\lbrace u \rbrace$, with a node u sampled uniformly at random from V (see line 4). Thereafter, at the t-th iteration (with $t\ge 1$), the sampling procedure randomly selects an edge $(u,v)\in E \backslash A_{t-1}$ that is incident to some node $u\in U_{t-1}$ (see line 7). Accordingly, the process updates $U_t \leftarrow U_{t-1} \cup \lbrace v \rbrace$ and $A_{t} \leftarrow A_{t-1} \cup \lbrace (u,v) \rbrace$ (see line 8). These steps are repeated T times within a recurrent step to sample a graphlet with at most T edges. M is set to relatively large values in order to make the graphlet generation statistically meaningful. Theoretically, the values of M are guided by the theorem of sample complexity [81], which is widely studied and used in the Bioinformatics domain [58, 70].
However, the discussion and proof of this theorem are outside the scope of the current paper. Intuitively, the graphlet sampling procedure explained in this section follows a random walk process with restart that efficiently parses G and extracts the desired number of connected graphlets with an increasing number of edges. This algorithm samples connected graphlets from a given graph while avoiding the expense of extracting them exhaustively. The hypothesis is that if a sufficient number of graphlets are sampled, then the empirical distribution will be close to the actual distribution of graphlets in the graph. Furthermore, it is important to note that the above process extracts, in total, $M \times T$ graphlets, with the number of edges varying from 1 to T.
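
The sampling loop described in this section can be sketched in Python as follows. This is a minimal illustration, not the authors' Algorithm 2 or their MATLAB code: the function name `sample_graphlets`, the edge-list input format, and the choice to pick uniformly among all unvisited edges incident to the visited nodes are assumptions made for the example. It also assumes a simple undirected graph whose node identifiers can be sorted.

```python
import random


def sample_graphlets(edges, M, T, seed=None):
    """Grow M random connected edge sets, recording one graphlet per step.

    edges: iterable of undirected edges (u, v) of a simple graph.
    Returns a list of graphlets, each a sorted list of (u, v) tuples.
    """
    rng = random.Random(seed)
    # Map every node to the set of edges incident to it.
    incident = {}
    for u, v in edges:
        e = frozenset((u, v))
        incident.setdefault(u, set()).add(e)
        incident.setdefault(v, set()).add(e)
    nodes = sorted(incident)

    graphlets = []
    for _ in range(M):
        visited_nodes = {rng.choice(nodes)}  # U_0 = {u}
        visited_edges = set()                # A_0 = empty set
        for _step in range(T):
            # Unvisited edges touching at least one visited node.
            frontier = {e for n in visited_nodes for e in incident[n]}
            candidates = sorted(frontier - visited_edges, key=sorted)
            if not candidates:
                break  # the connected component has fewer than T edges
            e = rng.choice(candidates)
            visited_edges.add(e)
            visited_nodes |= e
            # Snapshot of the graphlet after adding this edge.
            graphlets.append(sorted(tuple(sorted(x)) for x in visited_edges))
    return graphlets


if __name__ == "__main__":
    triangle_with_tail = [(1, 2), (2, 3), (3, 1), (3, 4)]
    for g in sample_graphlets(triangle_with_tail, M=2, T=3, seed=0):
        print(g)
```

When every connected component has at least T edges, the sketch returns exactly M × T graphlets, with 1 to T edges in each recurrence; otherwise a recurrence stops early.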

5.2 Hashed graphlets distribution

To obtain a distribution of the graphlets extracted from G, it is necessary to identify sets of isomorphic graphlets among the sampled ones and then count the cardinality of each isomorphic set. A trivial way of doing so involves checking graph isomorphism for all possible pairs of graphlets in order to detect the partitions that might exist among them. Nevertheless, graph isomorphism is a GI-complete problem [49] for general graphs, so this scheme is extremely costly, as the method samples a huge number of graphlets with many edges. An alternative, efficient and approximate way of partitioning isomorphic graphlets is graph hashing. A graph hash function can be defined as a mapping $h:{\mathbb {G}} \rightarrow {\mathbb {R}}^m$ that maps a graph into a hash code (a sequence of real numbers) based on the local as well as holistic topological characteristics of the graph. An ideal graph hash function should map two isomorphic graphs to the same hash code and two non-isomorphic graphs to two different hash codes. While it is easy to design hash functions satisfying the condition that two isomorphic graphs should have the same hash code, it is extremely difficult to find a hash function that ensures different hash codes for every pair of non-isomorphic graphs. An alternative is to design graph hash functions with low collision probability, i.e., functions that map two non-isomorphic graphs to the same hash code only with a very low probability. For obtaining a distribution of graphlets, the main aim of graph hashing is to assign the graphlets extracted from G to the corresponding subsets of isomorphic graphlets (a.k.a. partition indices or histogram bins) in order to count and quantify their distributions.
The proposed mechanism for obtaining the distribution of uniformly sampled graphlets, outlined in Algorithm 3, maintains a global hash table ${\mathbf {H}}$, each entry of which corresponds to the hash code of a graphlet g produced by the graph hash function. ${\mathbf {H}}$ grows incrementally as the algorithm encounters new graph hash codes and stores all the unique hash codes encountered by the system. Note that the position of each unique hash code is kept fixed, because each position corresponds to a partition index or histogram bin. To allocate a given graphlet g to its corresponding histogram bin, its hash code h(g) is mapped to the index of the entry of ${\mathbf {H}}$ whose hash code matches h(g) (see line 8). If h(g) does not yet exist in ${\mathbf {H}}$, it is considered a new hash code (and hence g a new graphlet) encountered by the system, and h(g) is appended at the end of ${\mathbf {H}}$ (see line 6).
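
The incremental hash table can be illustrated with a short Python sketch. It is only an approximation of the idea behind Algorithm 3 under stated assumptions: the names `graphlet_histogram` and `toy_hash` are hypothetical, and `toy_hash` (node count and edge count) is a deliberately crude placeholder, not one of the hash functions used in this work.

```python
def toy_hash(edge_list):
    """Placeholder hash code: (number of nodes, number of edges)."""
    nodes = {n for edge in edge_list for n in edge}
    return (len(nodes), len(edge_list))


def graphlet_histogram(graphlets, hash_fn):
    """Assign each graphlet to a bin identified by its hash code.

    Returns (table, counts): table maps a hash code to its fixed bin
    index, and counts[i] is the number of graphlets that fell in bin i.
    """
    table = {}   # hash code -> bin index; an index never changes once set
    counts = []  # histogram, grows as new hash codes are encountered
    for g in graphlets:
        code = hash_fn(g)
        index = table.get(code)
        if index is None:          # unseen code: append a new bin at the end
            index = len(counts)
            table[code] = index
            counts.append(0)
        counts[index] += 1
    return table, counts


if __name__ == "__main__":
    sampled = [[(1, 2)], [(3, 4)], [(1, 2), (2, 3)], [(5, 6)]]
    table, counts = graphlet_histogram(sampled, toy_hash)
    print(table)   # {(2, 1): 0, (3, 2): 1}
    print(counts)  # [3, 1]
```

Because the table is shared across graphs, the same bin index refers to the same hash code in every histogram, which is what makes the resulting vectors comparable.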

Designing hash functions that yield identical hash codes for two isomorphic graphlets is quite simple, whereas, prototyping those providing two distinct hash codes for two non-isomorphic graphs is very challenging. The chance of mapping two non-isomorphic subgraphs to the same hash code is termed as probability of collision. Indicating $H_0$ as the set of all pairs of non-isomorphic graphs, the probability of collision can be expressed as the following energy function:

$$\begin{aligned} E(f) = P((g,g') \in H_0 \quad | \quad h(g) = h(g')) \end{aligned}$$

(4)

So, in terms of collision probability, the hash functions that produce comparatively lower E(f) values in Eq. (4) are considered to be more reliable for checking graph isomorphism. It has been shown that the sorted degree of nodes has zero collision probability for all graphs with at most 4 edges [21]. Moreover, it is also well known that two graphs with the same (sorted) betweenness centrality are isomorphic with high probability [15, 53]. For example, sorted betweenness centrality has collision probabilities equal to $3.2\times 10^{-4}$, $1.9\times 10^{-4}$ and $1.1\times 10^{-4}$, respectively, for graphlets with 7, 8 and 9 edges. Interested readers are referred to [21] for further discussion and analysis of various graph hash functions and their probabilities of collision. Considering the above facts, in this work we use the sorted degree of nodes for graphlets with $t\le 4$ and the betweenness centrality for graphlets with $t\ge 5$.

$$\begin{aligned} \text {Hash function}= {\left\{ \begin{array}{ll} \text { degree of nodes},&{} \quad \text {if}\, t\le 4\\ \text { betweenness centrality},&{} \quad \text {otherwise} \end{array}\right. } \end{aligned}$$

(5)
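
A dependency-free Python sketch of the rule in Eq. (5) is given below. It is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, graphlets are assumed to be edge lists without duplicates, betweenness is computed with Brandes' algorithm for unweighted undirected graphs, and the rounding to six decimals is an added safeguard against floating-point noise.

```python
from collections import deque


def adjacency(edge_list):
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Unnormalised node betweenness (Brandes, unweighted, undirected)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while order:                    # back-propagate dependencies
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice


def graphlet_hash(edge_list):
    """Sorted degrees for t <= 4 edges, sorted betweenness otherwise."""
    adj = adjacency(edge_list)
    if len(edge_list) <= 4:
        return ("deg",) + tuple(sorted(len(nb) for nb in adj.values()))
    values = betweenness(adj).values()
    return ("bc",) + tuple(sorted(round(b, 6) for b in values))


if __name__ == "__main__":
    print(graphlet_hash([(0, 1), (1, 2)]))                # ('deg', 1, 1, 2)
    print(graphlet_hash([(i, i + 1) for i in range(5)]))  # path with 5 edges
```

Both codes are invariant to node relabeling because only sorted values are kept, so isomorphic graphlets always receive the same code.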

It should be observed that the distribution of sampled graphlets obtained as described so far only considers the topological structure of a graph and ignores the node and edge attributes. However, the stochastic graphlet embedding can take into account a small set of node and edge attributes by creating respective signatures and appending them to the hash code encoding the topology of the graphlet. In this work, if needed, we first discretize the existing continuous attributes using a combination of a clustering algorithm such as k-means and a pooling technique. The sorted discrete node and edge labels are then used as the attribute signatures and combined with the hash code.
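
As a small illustration of this step, the following Python sketch appends sorted label signatures to a topological hash code. It assumes the labels are already discrete (the k-means and pooling stage is not shown), and the function name `labeled_hash` and the tuple layout are hypothetical choices made for the example.

```python
def labeled_hash(topology_code, node_labels, edge_labels):
    """Combine a topological hash code with sorted attribute signatures."""
    return (
        tuple(topology_code),
        tuple(sorted(node_labels)),  # node signature, order-independent
        tuple(sorted(edge_labels)),  # edge signature, order-independent
    )


if __name__ == "__main__":
    topology = (1, 1, 2)  # e.g. sorted degrees of a two-edge path
    a = labeled_hash(topology, ["C", "N", "C"], [1, 2])
    b = labeled_hash(topology, ["C", "C", "O"], [1, 2])
    print(a)       # ((1, 1, 2), ('C', 'C', 'N'), (1, 2))
    print(a == b)  # False: same topology, different node labels
```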

5.3 Hierarchical stochastic graphlet embedding

In this work, we propose to combine the properties of the proposed Stochastic Graphlet Embedding with the Hierarchical Embedding introduced in the previous section.

On the one hand, SGE provides statistical information about local structures with a varying number of edges. It therefore provides fine-grained insights into the graph, but it cannot cope with very noisy data. On the other hand, the abstractions provided by the graph hierarchy increase the receptive field of each graphlet, moving to coarser information that provides insights into the global structure of the graph. Moreover, the use of hierarchical edges during the computation allows information from several levels to be combined, i.e., combining different levels of detail (see Eq. (1)). From now on, we will denote this embedding as Hierarchical Stochastic Graphlet Embedding (HSGE).

6 Computational complexity

This section is devoted to study the computational complexity of the proposed approach given a graph $G=(V,E,L_V,L_E)$ where $|V|=n$ and $|E|=m$.

6.1 Hierarchical embedding complexity

Graph clustering algorithms are usually computationally expensive. As stated in Sect. 4.3, the Girvan–Newman algorithm has been chosen as the graph clustering technique. It is based on the betweenness centrality of the edges, which has a time complexity of ${\mathcal {O}}(n \cdot m)$ for unweighted graphs and ${\mathcal {O}}(n \cdot m + n\cdot (n+m) \log (n))$ for weighted graphs. Since the Girvan–Newman algorithm recomputes this centrality while removing all m edges, it can be computed in ${\mathcal {O}}(n \cdot m^2)$ for unweighted graphs and ${\mathcal {O}}(n \cdot m^2 + n\cdot m \cdot (n+m) \log (n))$ for weighted graphs.

Assuming an embedding function $\varphi$ with a complexity of ${\mathcal {O}}(N)$, a hierarchical graph construction with a complexity of $C_1$, and L levels, the proposed configurations have a complexity of ${\mathcal {O}}(C_1 + L\cdot N)$ in the case of the pyramid and ${\mathcal {O}}(C_1 + L^2\cdot N)$ for the hierarchy and the exhaustive embeddings.

6.2 Stochastic graphlet embedding complexity

The computational complexity of Algorithm 2 is ${\mathcal {O}}(M \cdot T)$, where M is the number of graphlets to be sampled and T is the maximum size of graphlets in terms of the number of edges. Assuming a hash function with a complexity of ${\mathcal {O}}(C_2)$, Algorithm 3 has a time complexity of ${\mathcal {O}}(M \cdot T \cdot C_2)$ for computing the stochastic graphlet embedding. Here it is worth mentioning that “degree of nodes” and “betweenness centrality” have time complexities of ${\mathcal {O}}(n)$ and ${\mathcal {O}}(n \cdot m)$, respectively, where n and m here refer to the numbers of nodes and edges of the graphlet being hashed. From the above explanation, it is clear that the complexity of these two algorithms does not depend on the size of the input graph G, but only on the parameters M, T and the hash functions used.
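
To see why the hashing cost is independent of the size of G, one can bound the betweenness cost per graphlet. The short derivation below is a sketch under the assumption that the hash is applied to a connected graphlet with t edges, whose numbers of nodes and edges are written $n_g$ and $m_g$ (symbols introduced here for illustration only):

$$\begin{aligned} n_g \le t+1, \quad m_g = t \quad \Rightarrow \quad C_2(t) = {\mathcal {O}}(n_g \cdot m_g) = {\mathcal {O}}(t^2), \qquad \sum _{t=1}^{T} C_2(t) = {\mathcal {O}}(T^3) \end{aligned}$$

Summed over the M recurrences, the hashing cost is therefore at most ${\mathcal {O}}(M \cdot T^3)$, which agrees with ${\mathcal {O}}(M \cdot T \cdot C_2)$ for $C_2 = {\mathcal {O}}(T^2)$ and involves neither $|V|$ nor $|E|$ of G.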

7 Experimental validation

This section presents the experimental results obtained by our proposed Hierarchical Stochastic Graphlet Embedding method. The main aim of this experimental study is to validate the proposed graph embedding technique for the graph classification task, which demands a robust embedding technique for mapping a graph into a vector space. For experimentation, we have considered many different widely used graph datasets with varied characteristics. All these graphs come from real data generated in the fields of Biology, Chemistry, Graphics and Handwriting recognition. The MATLAB code of our experiment is available at https://github.com/priba/hierarchicalSGE.

7.1 Experiments on molecular graph datasets

The first set of experiments is conducted on various benchmarks of molecular graphs. Below, we provide a brief description of them followed by the experimental setup, results and discussions.

7.1.1 Dataset description

Several bioinformatics datasets have been used: MUTAG, PTC, PROTEINS, NCI1, NCI109, D&D and MAO. These datasets have been widely used as benchmarks in the literature. The MUTAG dataset contains graph representations of 188 chemical compounds, which are either mutagenic aromatic or heteroaromatic nitro compounds, where nodes can have 7 discrete labels. The PTC or Predictive Toxicology Challenge dataset consists of 344 chemical compounds known to cause or not cause cancer in rats and mice. It has 19 discrete node labels. The PROTEINS dataset contains relations between secondary structure elements (SSEs), represented by nodes, and neighborhood in the amino-acid sequence or in 3D space, represented by edges. It has 3 discrete labels, viz. helix, sheet or turn. The NCI1 and NCI109 datasets come from the National Cancer Institute (NCI) and are two balanced subsets of chemical compounds screened for their ability to suppress or inhibit the growth of a panel of human tumor cell lines, having 37 and 38 discrete node labels, respectively. The D&D dataset consists of enzyme and non-enzyme protein structures, in which the nodes are amino acids. The MAO database, taken from the GREYC Chemistry graph dataset collection, is composed of 68 graphs representing molecules that either inhibit or do not inhibit monoamine oxidase, an enzyme whose inhibitors are used as antidepressant drugs. More details on these bioinformatics datasets are provided in Table 1.

Table 1 Details of the molecular graph datasets


7.1.2 Experimental setup

We have performed two different experiments: the first one does not use the attribute information encoded in the nodes and edges of the graphs, whereas the second experiment does use the available node and edge features. For evaluating the performance of the proposed embedding technique, we have used a C-SVM solver [14] as a classifier. Since the datasets considered in this set of experiments do not contain predefined train and test sets, we have used a 10-fold cross-validation scheme to obtain accuracies and have reported the mean accuracies, respectively, in Tables 2 and 3 for unlabeled and labeled datasets. We follow a classical graph classification pipeline, where, in the first stage, graph embedding is computed by our proposed scheme, whereas in the second step, embedded graphs are classified using a previously trained classifier.
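
The evaluation protocol can be sketched as follows. This is a dependency-free illustration under assumptions, not the experimental code: the function names are hypothetical, the folds are shuffled but not stratified (the text does not say how folds were formed), and a 1-nearest-neighbour rule stands in for the C-SVM solver [14] so that the example runs without external libraries.

```python
import random


def one_nn(train_X, train_y, test_X):
    """Stand-in classifier: label of the closest training vector."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    predictions = []
    for x in test_X:
        nearest = min(range(len(train_X)),
                      key=lambda j: sq_dist(train_X[j], x))
        predictions.append(train_y[nearest])
    return predictions


def k_fold_accuracy(X, y, fit_predict, k=10, seed=0):
    """Mean accuracy over k folds; requires len(X) >= k."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint test folds
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        predictions = fit_predict([X[i] for i in train_idx],
                                  [y[i] for i in train_idx],
                                  [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


if __name__ == "__main__":
    # Toy "embedded graphs": two well-separated classes of 1-D vectors.
    X = [[0.0]] * 10 + [[1.0]] * 10
    y = [0] * 10 + [1] * 10
    print(k_fold_accuracy(X, y, one_nn, k=10))  # 1.0
```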

7.1.3 Results and discussion

In Table 2, we present the experimental results obtained by our proposed hierarchical embedding techniques together with other existing works on the unlabeled datasets. The previously mentioned three configurations of our hierarchical embedding are, respectively, denoted as: pyramidal, hierarchical and exhaustive. For unlabeled datasets, we have considered 10 different state-of-the-art methods: (1) random walk kernel (RW) [27], (2) shortest path kernel (SP) [8], (3) graphlet kernel (GK) [70], (4) Weisfeiler-Lehman kernel (WL) [69], (5) deep graph kernel (DGK) [83], (6) multiscale Laplacian graph kernel (MLK) [38], (7) diffusion CNNs (DCNN) [4], (8) strong graph spectrums (SGS) [37], (9) family of graph spectral distances (F_GSD) [78], and (10) stochastic graphlet embedding (SGE) [21].

From the quantitative results shown in Table 2, it can be observed that for most datasets, the highest accuracy is achieved by one of our proposed hierarchical configurations, setting new state-of-the-art results on those datasets. In particular, the best accuracies are obtained either by the pyramidal or the exhaustive configuration, which indicates the importance of considering hierarchical information for the graph embedding problem. As expected, the proposed hierarchical embeddings achieve better performance than SGE, which is regarded as the baseline of our proposal. With this experimental setting, the hierarchical configuration performs quite poorly compared to the other two configurations. This might suggest that hierarchical edges together with the connecting levels alone do not contain sufficient information for a robust graph representation. Information captured in the multiscale graphs is thought to play a vital role in graph embedding, which is supported by the excellent performance obtained with the pyramidal and exhaustive configurations.

Table 2 Classification accuracies on unlabeled molecular graph datasets


In Table 3, we demonstrate the results acquired by three different configurations of our proposed hierarchical embedding on the labeled graph datasets. For comparing with other state-of-the-art methods, we have considered two additional techniques: (1) PATCHY-SAN (PSCN) [55] and (2) graphlet spectrum (GS) [39]. Some of the previously considered state-of-the-art techniques do not work with labeled graphs, so they have not been evaluated in this experimentation.

The results presented in Table 3 show that, except on the MUTAG dataset, our proposed hierarchical embedding techniques achieve the best performance on all the datasets. This demonstrates the usefulness of considering hierarchical information for embedding graphs into a vector space. Contrary to the previous experiments on unlabeled datasets, in this case the hierarchical configuration performs reasonably well. This suggests that on labeled graphs, the hierarchical edges together with the connecting levels might provide important structural information. It is also important to note that the level information performed consistently on all the datasets.

Table 3 Classification accuracy on labeled molecular graph datasets


7.2 Experiments on AIDS, GREC, COIL-DEL and histograph datasets

While the datasets considered in the previous set of experiments were mostly molecular in nature, the experiments discussed in this section consider graphs from various fields, such as Biology, Computer Vision, Graphics Recognition and Handwriting Recognition. Below, we give a brief description of the datasets considered, followed by the experimental setup, results and discussions.

7.2.1 Dataset description

In this experiment, we consider four different datasets; three of them, viz. AIDS, GREC and COIL-DEL, are taken from the IAM graph database repository^{Footnote 1} [60]. The first one, viz. the AIDS database, consists of 2000 graphs representing molecular compounds, which are constructed from the AIDS Antiviral Screen Database of Active Compounds.^{Footnote 2} This dataset consists of two classes, viz. active (400 elements) and inactive (1600 elements), which, respectively, represent molecules with and without activity against HIV. The GREC dataset consists of 1100 graphs representing 22 different classes (characterizing architectural and electronic symbols) with 50 instances per class; these instances have different noise levels. The COIL-DEL database includes 3900 graphs belonging to 100 different classes with 39 instances per class; each instance has a different rotation angle. The HistoGraph dataset^{Footnote 3} [74] consists of graphs representing words from the letters written by the first US president, George Washington. It consists of 293 graphs generated from 30 distinct words. Therefore, given a word, the task of the classifier is to predict its class, which should be among the 30 words. Nodes are only labeled with their position in the image. Furthermore, this dataset uses six different graph representation paradigms for delineating a single word into a graph, which results in six different subsets of graphs. The entire dataset is divided into 90, 60 and 143 graphs, respectively, for training, validation and testing. See Table 4 for the relevant statistics on these four datasets.

Table 4 Details of the AIDS, GREC, COIL-DEL and HistoGraph datasets


7.2.2 Experimental setup

In this case as well, we have employed a C-SVM solver [14] as a classifier. Since the datasets used in this set of experiments contain well defined train and test sets, we have reported the obtained accuracies on the test set of the respective datasets in Table 5.

7.2.3 Results and discussion

Similar to the experimental results obtained in the previous section, in this set of experiments our proposed hierarchical embeddings also achieve the best results on most datasets. The leading scores are mostly obtained by the exhaustive configuration, which shows the effectiveness of combining multiscale structural information together with the hierarchical connections. For some datasets, our hierarchical embedding does not achieve the best results, but it remains very competitive. This further supports the robustness of the hierarchical graph representation.

Table 5 Results obtained on the AIDS, GREC, COIL-DEL and HistoGraph datasets


7.3 Discussion on the parameters involved in the algorithm

Our algorithm is mainly controlled by three different parameters: (1) the number of levels L of the graph pyramid, (2) the reduction ratio R and (3) the maximum number of edges T of a graphlet. To illustrate how these three parameters control the performance of the system, we plot the classification accuracy while varying the number of levels of the graph pyramid (see Fig. 3), the reduction ratio (see Fig. 4) and T (see Fig. 5). For the sake of simplicity, for each level we just consider the maximum accuracy obtained by any configuration mentioned in Sect. 4.3. From Fig. 3, we can observe that for all the datasets, considering a second level together with the base graph increases the classification accuracy. However, the successive inclusion of hierarchical levels does not always increase the performance. It has been observed that for smaller graphs (with fewer nodes and edges, e.g., the graphs from MUTAG), the further inclusion of hierarchical abstraction decreases the performance of the system; this means that for smaller graphs a higher level abstraction can introduce noise or distortion. The reduction ratio R directly decides the number of clusters in a given level, and hence the number of nodes in the next higher level of the hierarchy. For example, $R=1$ indicates that the number of clusters should be the same as the number of nodes, while $R=2$ indicates that the number of clusters should be half the number of nodes in that level. Figure 4 shows the behavior of our method with different values of R while we have fixed $L=2$. From these plots, one can observe that the best value of R depends on the dataset, irrespective of the size of the graphs it contains. For the PTC, PROTEINS and MAO datasets, the performance mostly increases with the increase of R, while for MUTAG it improves until $R=2$ and then decreases for all hierarchical configurations.
For the MAO dataset, all the hierarchical configurations behave in exactly the same way with the increase of R, which might be because its graphs are small, so that the contributions of the different hierarchical configurations are indistinguishable.
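
The effect of L and R on the size of the hierarchy can be made concrete with a small Python sketch. The function name `level_sizes` is hypothetical, and rounding the number of clusters up (with at least one cluster per level) is an assumption, since the text only states the ratio.

```python
import math


def level_sizes(n, R, L):
    """Number of nodes at each of L levels, starting from n base nodes."""
    sizes = [n]
    for _ in range(L - 1):
        # Each level keeps roughly 1/R of the nodes of the level below.
        sizes.append(max(1, math.ceil(sizes[-1] / R)))
    return sizes


if __name__ == "__main__":
    print(level_sizes(17, 2, 3))  # [17, 9, 5]
    print(level_sizes(17, 1, 3))  # [17, 17, 17]
```

For small graphs the upper levels shrink to a handful of nodes very quickly, which is consistent with the observation that additional levels can hurt performance on datasets such as MUTAG.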

In Fig. 5, we show the performance trend on the MUTAG, PTC, PROTEINS, NCI1 and NCI109 datasets using only the SGE algorithm, which is the baseline graph embedding technique that we considered. The hierarchical configurations are not considered in this case because they have different graphlet sizes in different hierarchical levels, so understanding their behavior would have been complicated. From Fig. 5, it is clear that increasing T mostly improves the performance of the system on all the datasets. There are, however, some exceptions (e.g., $T=6$ for the PTC dataset), which suggests that graphlets with that number of edges are less informative for that particular graph dataset.

7.4 Discussion on the stochasticity of the algorithm

It is important to note that our proposed algorithm is stochastic in nature because of the stochastic graphlet sampling and the subsequent graph embedding procedure. The graphlet sampling used here uniformly samples graphlets from a given population of graphs, and by the law of large numbers, this sampling guarantees that the empirical distribution of graphlets is asymptotically close to the actual distribution [58]. To demonstrate that the stochastic behavior of our algorithm does not heavily impact the experimental results, we repeated the last experiment on all the datasets considered for 10 iterations, and in each iteration we randomly seeded the sampling algorithm. The mean and standard deviation of the classification accuracy obtained for each dataset are reported in Table 6. The mean accuracies reported in the table are quite close to the ones reported in Table 5, and the standard deviations are comparatively low (all of them are less than 1.0). This suggests that the proposed graph embedding technique, although it employs a stochastic process, is consistent in terms of performance.
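
The repetition protocol can be sketched in a few lines of Python. The names `summarize_runs` and `toy_experiment` are hypothetical, the toy experiment merely simulates an accuracy that fluctuates with the seed, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
import random
import statistics


def summarize_runs(run_experiment, seeds):
    """Run the experiment once per seed; return (mean, sample std)."""
    accuracies = [run_experiment(seed) for seed in seeds]
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def toy_experiment(seed):
    """Simulated accuracy in percent; a real run would re-embed the
    graphs with a sampler seeded by `seed` and retrain the classifier."""
    rng = random.Random(seed)
    return 90.0 + rng.uniform(-0.5, 0.5)


if __name__ == "__main__":
    mean_acc, std_acc = summarize_runs(toy_experiment, range(10))
    print(f"{mean_acc:.2f} +/- {std_acc:.2f}")
```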

Table 6 Mean and standard deviation of the accuracies obtained by repeating the classification task on the AIDS, GREC, COIL-DEL and HistoGraph datasets for 10 iterations. Here the mean accuracies consistent with the ones in Table 5 and the low standard deviations show that the proposed graph embedding is not sensitive to the stochasticity involved in the algorithm


8 Conclusions

In this paper, we have proposed to enhance the information encoded in graph embeddings by means of hierarchical representations. We have experimentally validated that the abstract information is able to improve the graph classification performance. The embedding function is based on a stochastic sampling of graphlets to obtain the graphlet distribution within the graph. Graphlets of different sizes are considered to allow a change in the node context. Moreover, hash functions are used to identify graphlets in an efficient way. Even though considering graphlets of different sizes provides robustness in terms of graph distortions, they still provide only local information when we consider larger graphs. Therefore, building a graph hierarchy makes it possible to increase the graphlet context without increasing the time needed for identifying the graphlet. In this work, we have carefully validated the performance of our approach in different application scenarios, showing that we outperform the state-of-the-art approaches in the graph classification task using an SVM as a classifier.

Further research will focus on improving the hierarchical graph construction. Even though the Girvan–Newman algorithm is able to exploit the desired properties of the graph, creating clusterings that yield good abstractions, its time complexity is a drawback that should be studied when considering large graphs.

Notes

Available at http://www.fki.inf.unibe.ch/databases/iam-graph-database.
See at http://dtp.nci.nih.gov/docs/aids/aids_data.html.
Available at http://www.histograph.ch.

References

Adelson EH, Anderson CH, Bergen JR, Burt PJ, Ogden JM (1984) Pyramid methods in image processing. RCA Eng 29(6):33–41
Ahuja N, Todorovic S (2010) From region based image representation to object discovery and recognition. In: S+SSPR, vol 6218, pp 1–19
Almeida H, Guedes D, Meira W, Zaki MJ (2011) Is there a best quality metric for graph clusters? In: MLKDD, pp 44–59
Atwood J, Towsley D (2016) Diffusion-convolutional neural networks. In: NIPS, pp 1993–2001
Aziz F, Wilson R, Hancock E (2013) Backtrackless walks on a graph. IEEE Trans Neural Netw Learn Syst 24(6):977–989
Barbu E, Héroux P, Adam S, Trupin E (2005) Frequent graph discovery: application to line drawing document images. Electron Lett Comput Vis Image Anal 5(2):47–54
Bodic PL, Héroux P, Adam S, Lecourtier Y (2012) An integer linear program for substitution-tolerant subgraph isomorphism and its use for symbol spotting in technical drawings. Pattern Recognit 45(12):4214–4224
Borgwardt K, Kriegel HP (2005) Shortest-path kernels on graphs. In: ICDM, pp 74–81
Borzeshi EZ, Piccardi M, Riesen K, Bunke H (2013) Discriminative prototype selection methods for graph embedding. Pattern Recognit 46(6):1648–1657
Broelemann K, Dutta A, Jiang X, Lladós J (2012) Hierarchical graph representation for symbol spotting in graphical document images. In: S+SSPR, vol 7626. Springer, Berlin, pp 529–538
Broelemann K, Dutta A, Jiang X, Lladós J (2013) Hierarchical plausibility-graphs for symbol spotting in graphical documents. In: GREC, pp 13–18
Bunke H, Riesen K (2010) Improving vector space embedding of graphs through feature selection algorithms. Pattern Recognit 44(9):1928–1940
Caelli T, Kosinov S (2004) An eigenspace projection clustering method for inexact graph matching. IEEE Trans Pattern Anal Mach Intell 26(4):515–519
Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol (TIST) 2(3):27
Comellas F, Paz-Sánchez J (2008) Reconstruction of networks from their betweenness centrality. In: AEC. Springer, Berlin, pp 31–37
Conte D, Foggia P, Sansone C, Vento M (2004) Thirty years of graph matching in pattern recognition. Int J Pattern Recognit Artif Intell 18(3):265–298
Defferrard M, Bresson X, Vandergheynst P (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In: NIPS, pp 1–14
Dupé FX, Brun L (2010) Hierarchical bag of paths for kernel based shape classification. In: S+SSPR, pp 227–236
Dutta A, Lladós J, Bunke H, Pal U (2017) Product graph-based higher order contextual similarities for inexact subgraph matching. Pattern Recognit 76:596–611
Dutta A, Riba P, Lladós J, Fornés A (2017) Pyramidal stochastic graphlet embedding for document pattern classification. In: ICDAR, pp 33–38
Dutta A, Sahbi H (2019) Stochastic graphlet embedding. IEEE Trans Neural Netw Learn Syst 30(8):2369–2382
Farabet C, Couprie C, Najman L, LeCun Y (2013) Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell 35(8):1915–1929
Fei-Fei L, Perona P (2005) A Bayesian hierarchical model for learning natural scene categories. In: CVPR, pp 524–531
Felzenszwalb P, Schwartz J (2007) Hierarchical matching of deformable shapes. In: CVPR, pp 1–8
Foggia P, Percannella G, Vento M (2014) Graph matching and learning in pattern recognition in the last 10 years. Int J Pattern Recognit Artif Intell 28(1):1–40
Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry 40(1):35–41
Gärtner T (2003) A survey of kernels for structured data. ACM SIGKDD Explor Newslett 5(1):49–58
Gentile C, Li S, Kar P, Karatzoglou A, Zappella G, Etrue E (2017) On context-dependent clustering of bandits. In: ICML, pp 1253–1262. JMLR.org
Gibert J, Valveny E, Bunke H (2012) Graph embedding in vector spaces by node attribute statistics. Pattern Recognit 45(9):3072–3083
Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE (2017) Neural message passing for quantum chemistry. In: ICML, pp 1263–1272
Girvan M, Newman M (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99(12):7821–7826
Horváth T, Gärtner T, Wrobel S (2004) Cyclic pattern kernels for predictive graph mining. In: KDD, pp 158–167
Jolion JM, Rosenfeld A (1994) A pyramid framework for early vision: multiresolutional computer vision. Kluwer Academic Publishers, Norwell
Jouili S, Tabbone S (2010) Graph embedding using constant shift embedding. In: ICPR, pp 83–92
Kashima H, Tsuda K, Inokuchi A (2004) Kernels for graphs. Kernel Methods Comput Biol 39(1):101–113
Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. In: ICLR, pp 1–10
Kondor R, Borgwardt KM (2008) The skew spectrum of graphs. In: ICML, pp 496–503
Kondor R, Pan H (2016) The multiscale Laplacian graph kernel. In: NIPS, pp 2982–2990
Kondor R, Shervashidze N, Borgwardt KM (2009) The graphlet spectrum. In: ICML, pp 529–536
Korda N, Szörényi B, Li S (2016) Distributed clustering of linear bandits in peer to peer networks. In: ICML
Kriege N, Mutzel P (2012) Subgraph matching kernels for attributed graphs. In: ICML, pp 1015–1022
Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: IEEE
Lafferty J, Lebanon G (2005) Diffusion kernels on statistical manifolds. J Mach Learn Res 6:129–163
Li S, Chen W, Li S, Leung K (2019) Improved algorithm on online clustering of bandits. In: IJCAI
Li S, Karatzoglou A, Gentile C (2016) Collaborative filtering bandits. In: SIGIR
Liu X, Lin L, Li H, Jin H, Tao W (2008) Layered shape matching and registration: Stochastic sampling with hierarchical graph representation. In: ICPR, pp 1–4
Luqman MM, Ramel JY, Lladós J, Brouard T (2013) Fuzzy multilevel graph embedding. Pattern Recognit 46(2):551–565
Marfil R, Molina-Tanco L, Bandera A, Sandoval F (2007) The construction of bounded irregular pyramids with a union-find decimation process. In: GbRPR, pp 307–318
Mehlhorn K (1984) Graph algorithms and NP-completeness. Springer, New York
Mousavi SF, Safayani M, Mirzaei A, Bahonar H (2017) Hierarchical graph embedding in vector space by graph pyramid. Pattern Recognit 61:245–254
Neuhaus M, Bunke H (2004) An error-tolerant approximate matching algorithm for attributed planar graphs and its application to fingerprint classification. In: S+SSPR, pp 180–189
Neuhaus M, Bunke H (2007) Bridging the gap between graph edit distance and kernel machines. World Scientific, Singapore
Newman MJ (2005) A measure of betweenness centrality based on random walks. Soc Netw 27(1):39–54
Niebles J, Fei-Fei L (2007) A hierarchical model of shape and appearance for human action classification. In: CVPR, pp 1–8
Niepert M, Ahmed M, Kutzkov K (2016) Learning convolutional neural networks for graphs. In: ICML, pp 2014–2023
Pekalska E, Duin RPW (2005) The dissimilarity representation for pattern recognition: foundations and applications. World Scientific, Hackensack
Pelillo M, Siddiqi K, Zucker SW (1999) Matching hierarchical structures using association graphs. IEEE Trans Pattern Anal Mach Intell 21(11):1105–1120
Pržulj N (2007) Biological network comparison using graphlet degree distribution. Bioinformatics 23(2):e177
Riba P, Lladós J, Fornés A (2017) Error-tolerant coarse-to-fine matching model for hierarchical graphs. In: International workshop on graph-based representations in pattern recognition. Springer, pp 107–117
Riesen K, Bunke H (2008) IAM graph database repository for graph based pattern recognition and machine learning. In: S+SSPR, pp 287–297
Riesen K, Bunke H (2009) Approximate graph edit distance computation by means of bipartite graph matching. Image Vis Comput 27(7):950–959
Riesen K, Bunke H (2009) Graph classification by means of Lipschitz embedding. IEEE Trans Syst Man Cybern Part B 39(6):1472–1483
Riesen K, Neuhaus M, Bunke H (2007) Bipartite graph matching for computing the edit distance of graphs. In: Escolano F, Vento M (eds) Graph-based representations in pattern recognition, LNCS, vol 4538. Springer, Berlin, pp 1–12
Robles-Kelly A, Hancock ER (2007) A Riemannian approach to graph embedding. Pattern Recognit 40(3):1042–1056
Saund E (2013) A graph lattice approach to maintaining and learning dense collections of subgraphs as image features. IEEE Trans Pattern Anal Mach Intell 35(10):2323–2339
Schellewald C, Schnörr C (2005) Probabilistic subgraph matching based on convex relaxation. In: EMMCVPR, pp 171–186
Serratosa F, Alquézar R, Sanfeliu A (2000) Efficient algorithms for matching attributed graphs and function-described graphs. In: International conference on pattern recognition, vol 2, pp 867–872
Shervashidze N, Borgwardt KM (2009) Fast subtree kernels on graphs. In: NIPS, pp 1660–1668
Shervashidze N, Schweitzer P, van Leeuwen EJ, Mehlhorn K, Borgwardt KM (2011) Weisfeiler-Lehman graph kernels. J Mach Learn Res 12:2539–2561
Shervashidze N, Vishwanathan SVN, Petri T, Mehlhorn K, Borgwardt K (2009) Efficient graphlet kernels for large graph comparison. In: AISTATS, pp 488–495
Shokoufandeh A, Macrini D, Dickinson S, Siddiqi K, Zucker S (2005) Indexing hierarchical structures using graph spectra. IEEE Trans Pattern Anal Mach Intell 27(7):1125–1140
Smola AJ, Kondor R (2003) Kernels and regularization on graphs. In: COLT, pp 144–158
Solnon C (2010) All different-based filtering for subgraph isomorphism. Artif Intell 174(12–13):850–864
Stauffer M, Fischer A, Riesen K (2016) A novel graph database for handwritten word images. In: S+SSPR, pp 553–563
Suh Y, Adamczewski K, Mu Lee K (2015) Subgraph matching using compactness prior for robust feature correspondence. In: CVPR
Ulrich M, Wiedemann C, Steger C (2012) Combining scale-space and similarity-based aspect graphs for fast 3D object recognition. IEEE Trans Pattern Anal Mach Intell 34(10):1902–1914
Vento M (2015) A long trip in the charming world of graphs for pattern recognition. Pattern Recognit 48(2):291–301
Verma S, Zhang ZL (2017) Hunt for the unique, stable, sparse and fast feature learning on graphs. In: NIPS, pp 87–97
Vishwanathan SVN, Schraudolph NN, Kondor R, Borgwardt KM (2010) Graph kernels. J Mach Learn Res 11:1201–1242
Watkins C (1999) Kernels from matching operations. Technical report, Computer Science Department, University of London
Weissman T, Ordentlich E, Seroussi G, Verdu S, Weinberger MJ (2003) Inequalities for the l1 deviation of the empirical distribution. Technical report, HP Labs, Palo Alto
Wilson R, Hancock E, Luo B (2005) Pattern vectors from algebraic graph theory. IEEE Trans Pattern Anal Mach Intell 27(7):1112–1124
Yanardag P, Vishwanathan S (2015) Deep graph kernels. In: KDD, pp 1365–1374

Acknowledgements

This work has been partially supported by the European Union’s research and innovation program under the Marie Skłodowska-Curie Grant Agreement No. 665919 (P-SPHERE project), the Spanish projects RTI2018-102285-A-I00 and RTI2018-095645-B-C21, the FPU fellowship FPU15/06264 from the Spanish Ministerio de Educación, Cultura y Deporte, the Ramon y Cajal Fellowship RYC-2014-1683, and the CERCA Program/Generalitat de Catalunya. Anjan Dutta was a Marie-Curie Fellow (under the P-SPHERE Project) at the Computer Vision Center of Barcelona, where most of the work was done and the paper was written.

Author information

Anjan Dutta and Pau Riba have contributed equally to this work.

Authors and Affiliations

Department of Computer Science, University of Exeter, Innovation Centre, Streatham Campus, Exeter, EX4 4RN, UK
Anjan Dutta
Computer Vision Center, Computer Science Department, Autonomous University of Barcelona, Edifici O, Campus UAB, Bellaterra, 08193, Barcelona, Spain
Pau Riba, Josep Lladós & Alicia Fornés

Authors

Anjan Dutta
Pau Riba
Josep Lladós
Alicia Fornés

Corresponding author

Correspondence to Anjan Dutta.

Ethics declarations

Conflict of interest

Anjan Dutta, Pau Riba, Josep Lladós and Alicia Fornés declare that they do not have any conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Dutta, A., Riba, P., Lladós, J. et al. Hierarchical stochastic graphlet embedding for graph-based pattern recognition. Neural Comput & Applic 32, 11579–11596 (2020). https://doi.org/10.1007/s00521-019-04642-7

Received: 01 August 2019
Accepted: 22 November 2019
Published: 06 December 2019
Issue Date: August 2020
DOI: https://doi.org/10.1007/s00521-019-04642-7

