Analysis of Link Prediction Algorithms in Hashtag Graphs

Big Data and Social Media Analytics

Abstract

Twitter is a prominent multilingual social networking site where users post messages known as “tweets”. Like other social networking sites such as Facebook, Twitter allows users to categorize tweets with “hashtags”. Communication on Twitter can be mapped as hashtag graphs, in which vertices correspond to hashtags and edges correspond to co-occurrences of hashtags within the same tweet. A vertex can further be weighted with the number of tweets in which its hashtag has occurred, and an edge with the number of tweets in which both of its hashtags have co-occurred, yielding a “weighted hashtag graph”. In this chapter, we describe additions to some well-known link prediction methods that allow the weights of both vertices and edges in a weighted hashtag graph to be taken into account. We base these additions on the assumption that more popular hashtags are more likely to appear with other hashtags in the future. We then apply the improved methods to three sets of Twitter data with the aim of predicting future hashtag co-occurrences. In addition, we investigate the performance of SEAL, a new graph neural network-based framework that has been shown in past trials to outperform heuristic-based approaches such as the Katz index, SimRank and rooted PageRank. Experiments were conducted on real-life data sets comprising more than 3,000,000 unique tweets and more than 250,000 unique hashtags in total. The results show that simpler heuristic-based scoring methods achieve only marginal performance, which decreases as more data is added over time. SEAL, by contrast, shows superior performance in hashtag graph link prediction over the approaches it has previously been compared against in other domains.
The AUC score of 0.959 obtained with SEAL in our experiments significantly exceeds those of our benchmark link prediction approaches, which include the Katz index, SimRank, and rooted PageRank.
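
To make the graph construction described in the abstract concrete, the following is a minimal Python sketch, not the chapter's implementation. It builds vertex and edge weights from a handful of made-up tweets and scores one unconnected hashtag pair. The regular expression, the function names, and in particular `popularity_scaled_score` are assumptions introduced here for illustration. The abstract does not state the chapter's scoring formulas, so the popularity-scaled variant only illustrates the stated intuition that more popular hashtags are more likely to co-occur in the future.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a hashtag is '#' followed by word characters, case-insensitive.
HASHTAG_RE = re.compile(r"#(\w+)")

def build_weighted_hashtag_graph(tweets):
    """Return (vertex_weight, edge_weight) for an iterable of tweet texts.

    vertex_weight[h]    = number of tweets containing hashtag h
    edge_weight[(a, b)] = number of tweets containing both a and b (a < b)
    """
    vertex_weight = Counter()
    edge_weight = Counter()
    for text in tweets:
        # Deduplicate within a tweet so each tweet counts once per hashtag.
        tags = sorted({t.lower() for t in HASHTAG_RE.findall(text)})
        vertex_weight.update(tags)
        edge_weight.update(combinations(tags, 2))
    return vertex_weight, edge_weight

def adjacency(edge_weight):
    """Build an undirected adjacency map from the edge keys."""
    adj = {}
    for a, b in edge_weight:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def common_neighbors(adj, u, v):
    """Classic unweighted common-neighbours score."""
    return len(adj.get(u, set()) & adj.get(v, set()))

def popularity_scaled_score(adj, vertex_weight, u, v):
    """Hypothetical variant: common neighbours scaled by both vertex weights.

    This is an illustrative assumption, not the chapter's formula.
    """
    return common_neighbors(adj, u, v) * vertex_weight[u] * vertex_weight[v]

# Made-up example data.
tweets = [
    "Training a model #ai #ml",
    "New paper out #ai #graphs",
    "More results #AI #ML",
    "Plotting again #graphs",
]
vertex_weight, edge_weight = build_weighted_hashtag_graph(tweets)
adj = adjacency(edge_weight)
candidate_score = popularity_scaled_score(adj, vertex_weight, "graphs", "ml")
```

Here `graphs` and `ml` never co-occur but share the neighbour `ai`, so they form a candidate link. Any real evaluation would rank all such candidate pairs and compare the ranking against co-occurrences observed in a later time window.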

Authors who have equal contribution

Notes

  1. www.twitter.com

  2. https://pypi.org/project/TwitterSearch/

  3. https://docs.python.org/3.5/

  4. https://dev.mysql.com/doc/refman/5.7/en

  5. https://docs.oracle.com/javase/8/docs/api/overview-summary.html

Corresponding author

Correspondence to Gautam Srivastava.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Praznik, L., Qudar, M.M.A., Mendhe, C., Srivastava, G., Mago, V. (2021). Analysis of Link Prediction Algorithms in Hashtag Graphs. In: Çakırtaş, M., Ozdemir, M.K. (eds) Big Data and Social Media Analytics. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-030-67044-3_11
