Learning and Scaling Directed Networks via Graph Embedding
Reliable evaluation of network mining tools requires significance and scalability testing. This is usually achieved by picking several graphs of various sizes from different domains. However, graph properties, and thus evaluation results, can differ dramatically from one domain to another, so results need to be aggregated over a multitude of graphs within each domain.
The paper introduces an approach that automatically learns the features of a directed graph from any domain and generates similar graphs while scaling the input graph size by a real-valued factor. Generating multiple graphs of similar size allows significance testing, while scaling the graph size makes scalability evaluation possible. The proposed method relies on embedding an input graph into a low-dimensional space, thus encoding graph features in a set of node vectors. Edge weights and node communities can also be imitated in optional steps.
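The embed-then-generate idea can be illustrated with a small sketch. It is not the paper's learning procedure: a truncated SVD of the adjacency matrix stands in for the learned embedding, and the function names `embed_directed` and `generate_scaled`, the resampling of node vectors with replacement, and the division of edge probabilities by the scaling factor (intended to keep the mean degree roughly constant) are all assumptions made for illustration.

```python
import numpy as np

def embed_directed(adj, dim):
    """Give every node a source vector and a target vector.

    Assumption: truncated SVD replaces the learned embedding, so that
    out_vecs @ in_vecs.T is a rank-`dim` approximation of `adj`.
    """
    u, s, vt = np.linalg.svd(np.asarray(adj, dtype=float), full_matrices=False)
    root = np.sqrt(s[:dim])
    out_vecs = u[:, :dim] * root    # node acting as the source of an edge
    in_vecs = vt[:dim, :].T * root  # node acting as the target of an edge
    return out_vecs, in_vecs

def generate_scaled(out_vecs, in_vecs, factor, rng):
    """Sample a directed graph with about factor * n nodes."""
    n = out_vecs.shape[0]
    m = max(1, int(round(factor * n)))
    # Assumption: new nodes reuse vectors of randomly chosen original nodes.
    idx = rng.integers(0, n, size=m)
    scores = out_vecs[idx] @ in_vecs[idx].T
    # Assumption: dividing by the factor keeps the mean degree roughly stable.
    probs = np.clip(scores / factor, 0.0, 1.0)
    np.fill_diagonal(probs, 0.0)    # no self-loops
    return (rng.random((m, m)) < probs).astype(np.int8)

rng = np.random.default_rng(0)
original = (rng.random((20, 20)) < 0.2).astype(np.int8)
np.fill_diagonal(original, 0)

out_vecs, in_vecs = embed_directed(original, dim=4)
bigger = generate_scaled(out_vecs, in_vecs, factor=1.5, rng=rng)
print(original.shape, bigger.shape)
```

Calling `generate_scaled` repeatedly with the same factor yields different graphs of the same size, which is the property needed for significance testing, while varying `factor` gives the size sweep needed for scalability testing.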
We demonstrate that the embedding-based approach ensures variability of the synthetic graphs while keeping their degree and subgraph distributions close to those of the original graphs. Therefore, the method could make significance and scalability testing of network algorithms more reliable without the need to collect additional data. We also show that the embedding-based approach preserves various features in the generated graphs that other generators imitating a given graph fail to preserve.
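One simple way to check how close two degree distributions are is sketched below. The choice of metric is an assumption: the abstract does not say how closeness was measured, so the two-sample Kolmogorov-Smirnov distance is used here only as a common example, and the helper names `degree_sequences` and `ks_distance` are hypothetical.

```python
import numpy as np

def degree_sequences(adj):
    """Return (out-degrees, in-degrees) of a directed adjacency matrix."""
    adj = np.asarray(adj)
    return adj.sum(axis=1), adj.sum(axis=0)

def ks_distance(a, b):
    """Largest gap between the empirical CDFs of two samples (0 = identical)."""
    a = np.sort(np.asarray(a))
    b = np.sort(np.asarray(b))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Toy comparison of two random directed graphs with different densities.
rng = np.random.default_rng(0)
graph_a = (rng.random((50, 50)) < 0.10).astype(int)
graph_b = (rng.random((50, 50)) < 0.30).astype(int)

out_a, in_a = degree_sequences(graph_a)
out_b, in_b = degree_sequences(graph_b)
print("out-degree KS distance:", ks_distance(out_a, out_b))
print("in-degree KS distance:", ks_distance(in_a, in_b))
```

A distance near 0 means the synthetic graph reproduces the degree distribution of the original; the same comparison can be applied to counts of small subgraphs once they have been computed.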
Keywords: Random graph generating, Graph embedding, Representation learning
This research was carried out in collaboration with, and supported by, Huawei Technologies Co., Ltd. under contract YB2015110136.
We are also thankful to Ilya Kozlov and Sergey Bartunov for their ideas and valuable contributions.