Abstract
Graphs are widely used to model data in many applications. With the rapid growth of emerging applications such as the Internet of Things, there is an urgent need for the capability to process large-scale graphs with billions of vertices. The Web graph is a typical example of graph data and is widely used for analyzing the structure, behavior and evolution of the World Wide Web. In this paper, we focus on the optimal representation of large-scale Web graphs. Our work is motivated by the need to fit large-scale graphs into main memory and carry out analysis on them. By analyzing the adjacency matrix of Web graphs, we find two characteristics of the distribution of 1s in the matrix. First, only a very small proportion of the elements in the matrix are 1s. Second, the majority of 1s gather around the principal diagonal and form a small number of clusters in the matrix. Based on these characteristics, we first develop a clustering mechanism to locate the clusters of 1s in the adjacency matrix. Then, we combine this clustering mechanism with a structure named K2-tree and propose an approach for representing large-scale Web graphs compactly. The basic idea of the approach is to represent a large region of zeros as a single zero. Experimental results show that our approach reduces not only the space needed to represent a Web graph, but also the time needed for operations such as retrieving the neighbors of any node in the graph; compared with existing approaches, our approach achieves the best space/time tradeoff.
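To make the idea of representing a large region of zeros as a single zero concrete, the following is a minimal sketch of a plain K2-tree over a small adjacency matrix. It is not the clustered variant proposed in this paper. The function names `build_k2tree` and `neighbors`, the single combined bit list, and the prefix-sum rank are assumptions chosen for illustration; practical implementations typically store the internal levels and the leaf level in separate bitmaps with a succinct rank structure.

```python
from collections import deque


def build_k2tree(matrix, k=2):
    """Return (bits, n): the level-order bit sequence of a K2-tree and the
    padded side length n (a power of k, at least k) of the matrix."""
    size = len(matrix)
    n = k
    while n < size:
        n *= k

    def has_one(r0, c0, side):
        # True if the submatrix contains a 1; padding cells count as 0.
        return any(matrix[r][c]
                   for r in range(r0, min(r0 + side, size))
                   for c in range(c0, min(c0 + side, size)))

    bits = []
    queue = deque([(0, 0, n)])  # non-empty submatrices still to be split
    while queue:
        r0, c0, side = queue.popleft()
        child = side // k
        for i in range(k):
            for j in range(k):
                r, c = r0 + i * child, c0 + j * child
                bit = 1 if has_one(r, c, child) else 0
                bits.append(bit)  # an all-zero submatrix costs one 0 bit
                if bit and child > 1:
                    queue.append((r, c, child))
    return bits, n


def neighbors(bits, n, row, k=2):
    """Return the sorted column indices j with matrix[row][j] == 1."""
    # prefix[i] = number of 1s in bits[:i]; recomputed per call for brevity.
    prefix = [0]
    for b in bits:
        prefix.append(prefix[-1] + b)

    result = []

    def visit(start, side, r, col0):
        # start: index in bits of the first child of the current node.
        child = side // k
        i = r // child  # which row of children contains the queried row
        for j in range(k):
            pos = start + i * k + j
            if bits[pos]:
                c = col0 + j * child
                if child == 1:
                    result.append(c)
                else:
                    # Children of the 1 at pos start at rank(pos) * k^2.
                    visit(prefix[pos + 1] * k * k, child, r % child, c)

    visit(0, n, row, 0)
    return result


# Hypothetical 8x8 adjacency matrix: sparse, with 1s near the diagonal.
example = [[0] * 8 for _ in range(8)]
for r, c in [(0, 1), (1, 0), (1, 2), (6, 7), (7, 6)]:
    example[r][c] = 1

bits, n = build_k2tree(example)
print(len(bits), "bits instead of", n * n)
print(neighbors(bits, n, 1))
```

In this sketch, the two off-diagonal quadrants of the example matrix are empty, so each is stored as a single 0 bit and never subdivided. This is the property that makes the structure compact when the 1s are few and clustered.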
Acknowledgements
This work is supported by the Natural Science Foundation of China (Nos. 61572146, 61363030, 61572231, U1501252); the Natural Science Foundation of Guangxi Province (Nos. 2015GXNSFAA139285, 2016GXNSFDA380006); and the High Level of Innovation Team of Colleges and Universities in Guangxi and Outstanding Scholars Program.
Cite this article
Chang, L., Zeng, X., Xu, Z. et al. Optimal Representation of Large-Scale Graph Data Based on K2-Tree. Wireless Pers Commun 95, 2271–2284 (2017). https://doi.org/10.1007/s11277-017-4087-5
DOI: https://doi.org/10.1007/s11277-017-4087-5