Abstract
A growing list of domains, in the forefront of which are Web data and applications, are modeled by graph representations. In content-driven graph analytics, knowledge must be extracted from large numbers of available data graphs. As the number of datasets (a different type of volume) can reach immense sizes, a thorough evaluation of each input is prohibitively expensive. To date, there exists no efficient method to quantify the impact of numerous available datasets over different graph analytics tasks. To address this challenge, we propose an efficient graph operator modeling methodology. Our novel, operator-agnostic approach focuses on the inputs themselves, utilizing graph similarity to infer knowledge about them. An operator is executed for a small subset of the available inputs and its behavior is modeled for the rest of the graphs utilizing machine learning. We propose a family of similarity measures based on the degree distribution that prove capable of producing high quality models for many popular graph tasks, even compared to modern, state of the art similarity functions. Our evaluation over both real-world and synthetic graph datasets indicates that our method achieves extremely accurate modeling of many commonly encountered operators, managing massive speedups over a brute-force alternative.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Aherne, F.J., et al.: The bhattacharyya metric as an absolute similarity measure for frequency coded data. Kybernetika 34(4), 363–368 (1998)
Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, New Orleans, Louisiana, USA, pp. 1027–10357–9 January 2007
Barabási, A.L., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)
Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)
Bhattacharyya, A.: On a measure of divergence between two statistical populations defined by their probability distributions. Bull. Calcutta Math. Soc. 35(1), 99–109 (1943)
Bonacich, P.: Power and centrality: a family of measures. Am. J. Sociol. 92(5), 1170–1182 (1987)
Bounova, G., de Weck, O.: Overview of metrics and their correlation patterns for multiple-metric topology analysis on heterogeneous graph ensembles. Phys. Rev. E 85, 016117 (2012)
Brandes, U., Erlebach, T.: Network Analysis: Methodological Foundations. Springer, New York (2005). https://doi.org/10.1007/b106453
Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. 30(1–7), 107–117 (1998)
Csardi, G., Nepusz, T.: The Igraph software package for complex network research. Inter J. Complex Syst. 1695, 1–9 (2006)
Eppstein, D., Wang, J.: Fast approximation of centrality. J. Graph Algorithms Appl. 8, 39–45 (2004)
da, F., Costa, L., et al.: Characterization of complex networks: a survey of measurements. Adv. Phys. 56(1), 167–242 (2007)
Freeman, L.C.: A set of measures of centrality based on betweenness. Sociometry 40(1), 35–41 (1977)
Gandomi, et al.: Beyond the hype. Int. J. Inf. Manage. 35(2), 137–144 (2015)
Gärtner, T.: A survey of kernels for structured data. SIGKDD 5(1), 49–58 (2003)
Gärtner, T., Flach, P., Wrobel, S.: On graph kernels: hardness results and efficient alternatives. In: Schölkopf, B., Warmuth, M.K. (eds.) COLT-Kernel 2003. LNCS (LNAI), vol. 2777, pp. 129–143. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45167-9_11
Ghosh, S., Das, N., Gonçalves, T., Quaresma, P., Kundu, M.: The journey of graph kernels through two decades. Comput. Sci. Rev. 27, 88–111 (2018)
Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: SIGKDD, pp. 855–864. ACM (2016)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer, New York (2009). https://doi.org/10.1007/978-0-387-21606-5
Hernández, J.M., Mieghem, P.V.: Classification of graph metrics, pp. 1–20 (2011)
Jamakovic, A., et al.: Robustness of networks against viruses: the role of the spectral radius. In: Symposium on Communications and Vehicular Technology, pp. 35–38 (2006)
Jamakovic, A., Uhlig, S.: On the relationships between topological measures in real-world networks. NHM 3(2), 345–359 (2008)
Kaufmann, L., Rousseeuw, P.: Clustering by means of medoids, pp. 405–416 (1987)
Koutra, D., Parikh, A., Ramdas, A., Xiang, J.: Algorithms for graph similarity and subgraph matching. Technical Report Carnegie-Mellon-University (2011). https://people.eecs.berkeley.edu/~aramdas/reports/DBreport.pdf
Leskovec, J., Adamic, L.A., Huberman, B.A.: The dynamics of viral marketing. TWEB 1(1), 5 (2007)
Leskovec, J., Krevl, A.: SNAP Datasets: Stanford large network dataset collection, June 2014. http://snap.stanford.edu/data
Leskovec, J., Sosič, R.: Snap: a general-purpose network analysis and graph-mining library. ACM Trans. Intell. Syst. Technol. 8(1), 1 (2016)
McAuley, J.J., Leskovec, J.: Learning to discover social circles in ego networks. In: Bartlett, P.L., Pereira, F.C.N., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3–6, 2012, Lake Tahoe, Nevada, United States, pp. 548–556 (2012)
Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004)
Riondato, M., Kornaropoulos, E.M.: Fast approximation of betweenness centrality through sampling. Data Min. Knowl. Discov. 30(2), 438–475 (2016)
Sanfeliu, A., Fu, K.: A distance measure between attributed relational graphs for pattern recognition. IEEE Trans. Syst. Man Cybern. 13(3), 353–362 (1983)
Schieber, T.A., et al.: Quantification of network structural dissimilarities. Nat. Commun. 8, 13928 (2017)
Sugiyama, M., Borgwardt, K.M.: Halting in random walk kernels. In: Annual Conference on Neural Information Processing Systems, pp. 1639–1647 (2015)
Bakogiannis, T., Giannakopoulos, I., Tsoumakos, D., Koziris, N.: Graph operator modeling over large graph datasets. CoRR abs/1802.05536 (2018). http://arxiv.org/abs/1802.05536
Vishwanathan, S.V.N., Schraudolph, N.N., Kondor, R., Borgwardt, K.M.: Graph kernels. J. Mach. Learn. Res. 11, 1201–1242 (2010)
Zager, L.A., Verghese, G.C.: Graph similarity scoring and matching. Appl. Math. Lett. 21(1), 86–94 (2008)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Bakogiannis, T., Giannakopoulos, I., Tsoumakos, D., Koziris, N. (2019). Predicting Graph Operator Output over Multiple Graphs. In: Bakaev, M., Frasincar, F., Ko, IY. (eds) Web Engineering. ICWE 2019. Lecture Notes in Computer Science(), vol 11496. Springer, Cham. https://doi.org/10.1007/978-3-030-19274-7_9
Download citation
DOI: https://doi.org/10.1007/978-3-030-19274-7_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19273-0
Online ISBN: 978-3-030-19274-7
eBook Packages: Computer ScienceComputer Science (R0)