Optimal Representation of Large-Scale Graph Data Based on K2-Tree

Wireless Personal Communications

Abstract

Graphs are widely used to model data in many applications. With the rapid growth of emerging applications such as the Internet of Things, there is an urgent need to process large-scale graphs with billions of vertices. The Web graph is a typical case of graph data and is widely used to analyze the structure, behavior, and evolution of the World Wide Web. In this paper, we focus on the optimal representation of large-scale Web graphs. Our work is motivated by the need to fit large-scale graphs into main memory and analyze them there. By analyzing the adjacency matrices of Web graphs, we find two characteristics of the distribution of 1s in the matrix. First, only a very small proportion of the elements in the matrix are 1s. Second, the majority of the 1s gather around the principal diagonal and form a small number of clusters. Based on these characteristics, we first develop a clustering mechanism to locate the clusters of 1s in the adjacency matrix. We then combine this clustering mechanism with a structure named the K2-tree and propose an approach for representing large-scale Web graphs compactly. The basic idea of the approach is to compress a large number of zeros into a single zero. Experimental results show that our approach reduces both the space needed to represent a Web graph and the time needed for operations such as retrieving the neighbors of any node in the graph; compared with existing approaches, it achieves the best space/time tradeoff.
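
To make the idea of collapsing many zeros into a single zero concrete, the following is a minimal sketch of a plain K2-tree in the layout introduced by Brisaboa, Ladra, and Navarro. It is not the clustering-based approach proposed in this article, whose details the abstract does not give. The function names `build_k2tree` and `successors`, the 4×4 example matrix, and the use of plain Python lists in place of compact bitmaps with constant-time rank are all illustrative assumptions. The matrix side is assumed to be a power of k and at least k.

```python
from itertools import accumulate


def build_k2tree(matrix, k=2):
    """Return (bits, n, k): the level-order bitmap of a K2-tree for `matrix`.

    Assumption: `matrix` is square and its side n is a power of k, n >= k.
    Each block is split into k*k sub-blocks; an all-zero sub-block is stored
    as a single 0 bit and never split further.
    """
    n = len(matrix)
    bits = []
    queue = [(0, 0, n)]  # (top row, left column, side) of blocks still to split
    while queue:
        next_queue = []
        for row, col, size in queue:
            sub = size // k
            for i in range(k):
                for j in range(k):
                    r0, c0 = row + i * sub, col + j * sub
                    nonempty = any(
                        matrix[r][c]
                        for r in range(r0, r0 + sub)
                        for c in range(c0, c0 + sub)
                    )
                    bits.append(1 if nonempty else 0)
                    if nonempty and sub > 1:
                        next_queue.append((r0, c0, sub))
        queue = next_queue
    return bits, n, k


def successors(tree, node):
    """Return the sorted out-neighbors of `node` (the 1s in its matrix row)."""
    bits, n, k = tree
    ranks = list(accumulate(bits))  # ranks[x] = number of 1s in bits[0..x]
    result = []

    def descend(size, row, col_offset, pos):
        # `pos` is the index in `bits` of the first child of the current block.
        sub = size // k
        i = row // sub  # which row of sub-blocks contains `row`
        for j in range(k):
            x = pos + i * k + j
            if bits[x]:
                if sub == 1:
                    result.append(col_offset + j)  # leaf bit: an actual edge
                else:
                    # Children of the 1 bit at x start at rank1(x) * k^2.
                    descend(sub, row % sub, col_offset + j * sub,
                            ranks[x] * k * k)

    descend(n, node, 0, 0)
    return result


# Hypothetical 4x4 adjacency matrix with its 1s near the diagonal.
example = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
]
tree = build_k2tree(example)
print(tree[0])              # [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1]
print(successors(tree, 3))  # [2, 3]
```

In this example the two empty off-diagonal quadrants cost one bit each, so 12 bits stand in for the 16 matrix cells. The saving grows with the size of the all-zero blocks, which is why 1s clustered near the diagonal matter for this kind of representation.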

Acknowledgements

This work is supported by the Natural Science Foundation of China (Nos. 61572146, 61363030, 61572231, U1501252); the Natural Science Foundation of Guangxi Province (Nos. 2015GXNSFAA139285, 2016GXNSFDA380006); and the High Level of Innovation Team of Colleges and Universities in Guangxi and Outstanding Scholars Program.

Author information

Correspondence to Tianlong Gu or Houbing Song.

Cite this article

Chang, L., Zeng, X., Xu, Z. et al. Optimal Representation of Large-Scale Graph Data Based on K2-Tree. Wireless Pers Commun 95, 2271–2284 (2017). https://doi.org/10.1007/s11277-017-4087-5

  • DOI: https://doi.org/10.1007/s11277-017-4087-5

Keywords

Navigation