Abstract
Developing effective techniques for learning from structured data is becoming increasingly important. In this context, kernel methods are the state-of-the-art tools, widely adopted in real-world applications that involve learning on structured data. Conversely, on unstructured domains, deep learning methods are a competitive, and often superior, choice. In this paper we propose a new family of graph kernels that exploits an abstract representation of the information, inspired by the multilayer perceptron architecture. Our proposal combines the advantages of both worlds: on one side, it leverages the expressiveness of state-of-the-art graph node kernels; on the other, it builds a multilayer architecture as a series of stacked kernel pre-image estimators, trained in an unsupervised fashion via convex optimization. The hidden layers of the proposed framework are trained in a forward manner, which avoids the greedy layerwise training of classical deep learning. Results on real-world graph datasets confirm the quality of the proposal.
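To make the stacking idea concrete, the following is a minimal Python sketch of one way to chain kernel pre-image estimators forward. It is an illustration under stated assumptions, not the paper's implementation: the RBF kernel, the toy data, and the names rbf_kernel and preimage_layer are stand-ins, since the paper builds its first layer from graph node kernels such as LEDK, MDK, and RLK.

import numpy as np

def rbf_kernel(X, gamma):
    # Pairwise squared Euclidean distances, then a Gaussian kernel.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def preimage_layer(K, X, lam):
    # Regularized pre-image estimation as kernel ridge regression from the
    # kernel feature space back to the input space: the coefficients A solve
    # the convex problem (K + lam * I) A = X in closed form, and K @ A is
    # the reconstructed representation fed to the next layer.
    n = K.shape[0]
    A = np.linalg.solve(K + lam * np.eye(n), X)
    return K @ A

# Stack layers forward: each layer is fit once, with no backpropagation.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))  # toy features; the paper starts from a graph node kernel
H = X
for lam, gamma in [(1e-2, 0.5), (1e-1, 0.5)]:
    K = rbf_kernel(H, gamma)
    H = preimage_layer(K, H, lam)
# H can now feed a standard convex classifier such as an SVM.

Because each layer reduces to a regularized least-squares problem, every training step stays convex, which is the property the stacking scheme is designed to maintain.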
Appendix: Hyperparameter Selection Results
In this appendix we report the hyperparameters selected during the model selection phase of the experiments presented in Section 5.1. From Table 3 we can draw some observations. First, the C parameter of the SVM is generally high, while the \(\lambda \) parameter of the pre-image estimator (the first layer) shows more variability; this indicates that regularization happens mostly in the first layer. Second, the kernel parameters are fairly stable across datasets (within the same order of magnitude) for the architectures involving the LEDK and RLK kernels. In contrast, architectures involving the MDK kernel show more variability in the selected parameters.
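As a hedged illustration of the selection procedure described above, the following Python sketch grid-searches the first-layer regularizer \(\lambda \), a kernel parameter, and the SVM C via cross-validation. The grids, the RBF kernel, the toy data, and the helper name forward_rep are assumptions made for this example, not the values or kernels used in the experiments.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def forward_rep(X, lam, gamma):
    # One pre-image layer: RBF kernel, then kernel ridge reconstruction.
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    return K @ np.linalg.solve(K + lam * np.eye(len(X)), X)

rng = np.random.default_rng(0)
X, y = rng.standard_normal((60, 8)), rng.integers(0, 2, 60)

best_cfg, best_acc = None, -np.inf
for lam in [1e-4, 1e-2, 1.0]:          # first-layer regularizer
    for gamma in [1e-2, 1e-1, 1.0]:    # kernel parameter
        H = forward_rep(X, lam, gamma)
        for C in [1.0, 1e2, 1e4]:      # SVM C (selected values tend to be high)
            acc = cross_val_score(SVC(C=C, kernel="linear"), H, y, cv=5).mean()
            if acc > best_acc:
                best_cfg, best_acc = (lam, gamma, C), acc
print("selected:", best_cfg, "cv accuracy:", round(best_acc, 3))

In this toy setup the representation is rebuilt once per (\(\lambda \), kernel parameter) pair and only the SVM is refit across values of C, mirroring the fact that the hidden layers are trained forward and independently of the final classifier.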