Predicting Graph Operator Output over Multiple Graphs

Bakogiannis, Tasos; Giannakopoulos, Ioannis; Tsoumakos, Dimitrios; Koziris, Nectarios

doi:10.1007/978-3-030-19274-7_9


Conference paper
First Online: 26 April 2019


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11496)

Abstract

A growing number of domains, foremost among them Web data and applications, are modeled by graph representations. In content-driven graph analytics, knowledge must be extracted from large numbers of available data graphs. As the number of datasets (a different type of volume) can reach immense sizes, a thorough evaluation of each input is prohibitively expensive. To date, there exists no efficient method to quantify the impact of numerous available datasets over different graph analytics tasks. To address this challenge, we propose an efficient graph operator modeling methodology. Our novel, operator-agnostic approach focuses on the inputs themselves, utilizing graph similarity to infer knowledge about them. An operator is executed for a small subset of the available inputs and its behavior is modeled for the rest of the graphs utilizing machine learning. We propose a family of similarity measures based on the degree distribution that prove capable of producing high-quality models for many popular graph tasks, even compared to modern, state-of-the-art similarity functions. Our evaluation over both real-world and synthetic graph datasets indicates that our method achieves extremely accurate modeling of many commonly encountered operators, managing massive speedups over a brute-force alternative.
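The idea in the abstract (run the operator on a few graphs, then predict its output for the rest from degree-distribution similarity) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration, not the authors' implementation: the Bhattacharyya coefficient as the similarity measure, a similarity-weighted nearest-neighbour average as the learner, and the hypothetical helper names `degree_distribution`, `bhattacharyya`, and `predict`. Graphs are plain undirected edge lists.

```python
from collections import Counter
from math import sqrt


def degree_distribution(edges):
    """Return {degree: fraction of nodes} for an undirected edge list."""
    node_degree = Counter()
    for u, v in edges:
        node_degree[u] += 1
        node_degree[v] += 1
    histogram = Counter(node_degree.values())
    n = len(node_degree)
    return {d: count / n for d, count in histogram.items()}


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two discrete distributions.

    Ranges from 0.0 (disjoint support) to 1.0 (identical distributions).
    Assumed here as one possible degree-distribution similarity.
    """
    return sum(sqrt(p[d] * q[d]) for d in p.keys() & q.keys())


def predict(target_edges, labelled, k=2):
    """Estimate the operator output for an unseen graph.

    `labelled` is a list of (edges, operator_value) pairs: the small
    subset of graphs on which the operator was actually executed.
    The estimate is the similarity-weighted mean over the k most
    similar labelled graphs (an assumed, deliberately simple learner).
    """
    p = degree_distribution(target_edges)
    scored = sorted(
        ((bhattacharyya(p, degree_distribution(g)), y) for g, y in labelled),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(s for s, _ in scored)
    if total == 0:
        # No overlap with any labelled graph: fall back to a plain mean.
        return sum(y for _, y in scored) / len(scored)
    return sum(s * y for s, y in scored) / total


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3)]   # diameter 3
    star = [(0, 1), (0, 2), (0, 3)]   # diameter 2
    unseen = [(10, 11), (11, 12), (12, 13)]
    # The "operator" here is graph diameter, computed by hand for the toy inputs.
    print(predict(unseen, [(path, 3.0), (star, 2.0)], k=1))
```

The sketch keeps the operator itself out of the prediction step, which is what makes the approach operator-agnostic: only the labelled outputs and the input similarities are used.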


Notes

1. https://github.com/giagiannis/data-profiler


Author information

Authors and Affiliations

CSLab, School of ECE, National Technical University of Athens, Athens, Greece
Tasos Bakogiannis, Ioannis Giannakopoulos & Nectarios Koziris
Department of Informatics, Ionian University, Corfu, Greece
Dimitrios Tsoumakos


Corresponding author

Correspondence to Tasos Bakogiannis.

Editor information

Editors and Affiliations

Novosibirsk State Technical University, Novosibirsk, Russia
Maxim Bakaev
Erasmus University Rotterdam, Rotterdam, The Netherlands
Flavius Frasincar
Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
In-Young Ko

About this paper

Cite this paper

Bakogiannis, T., Giannakopoulos, I., Tsoumakos, D., Koziris, N. (2019). Predicting Graph Operator Output over Multiple Graphs. In: Bakaev, M., Frasincar, F., Ko, IY. (eds) Web Engineering. ICWE 2019. Lecture Notes in Computer Science, vol 11496. Springer, Cham. https://doi.org/10.1007/978-3-030-19274-7_9

DOI: https://doi.org/10.1007/978-3-030-19274-7_9
Published: 26 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19273-0
Online ISBN: 978-3-030-19274-7
eBook Packages: Computer Science, Computer Science (R0)
