Analysis of Link Prediction Algorithms in Hashtag Graphs

Praznik, Logan; Qudar, Mohiuddin Md Abdul; Mendhe, Chetan; Srivastava, Gautam; Mago, Vijay

doi:10.1007/978-3-030-67044-3_11

Part of the book series: Lecture Notes in Social Networks (LNSN)

Abstract

Twitter is a prominent multilingual social networking site where users post messages known as “tweets”. Like other social networking sites such as Facebook, Twitter lets users categorize tweets with “hashtags”. Communication on Twitter can be mapped as a hashtag graph, in which vertices correspond to hashtags and edges correspond to co-occurrences of hashtags within the same distinct tweet. Furthermore, each vertex can be weighted with the number of tweets in which its hashtag has occurred, and each edge with the number of tweets in which both of its hashtags have co-occurred, creating a “weighted hashtag graph”. In this chapter, we describe additions to some well-known link prediction methods that allow the weights of both vertices and edges in a weighted hashtag graph to be taken into account. We base these additions on the assumption that more popular hashtags have a higher probability of appearing with other hashtags in the future. We then apply the extended methods to three sets of Twitter data with the intent of predicting future hashtag co-occurrences. In addition, we investigate the performance of SEAL, a graph neural network-based framework that has been shown in past trials to perform better than heuristic-based approaches such as the Katz index, SimRank, and rooted PageRank. Experiments were conducted on real-life data sets consisting of over 3,000,000 combined unique tweets and over 250,000 combined unique hashtags. The results show that simpler heuristic-based scoring methods perform only marginally, and that their performance decreases as more data is added over time. SEAL, on the other hand, shows superior performance in hashtag graph link prediction over the approaches it has previously been compared against in other domains: the AUC score of 0.959 obtained in our experiments using SEAL significantly exceeds those of our benchmark approaches for link prediction, which include the Katz index, SimRank, and rooted PageRank.
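
The weighted hashtag graph and the AUC measure mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation. The input format (each tweet as a list of hashtag strings, already deduplicated), the lower-casing of hashtags, and the function names build_weighted_hashtag_graph and auc are assumptions made here for illustration. The AUC function uses the standard pairwise definition: the probability that a randomly chosen true link receives a higher score than a randomly chosen non-link.

```python
from collections import Counter
from itertools import combinations


def build_weighted_hashtag_graph(tweets):
    """Return (vertex_weights, edge_weights) for an iterable of tweets.

    Assumption: each tweet is an iterable of hashtag strings and the
    tweets themselves are already distinct.
    """
    vertex_weights = Counter()  # hashtag -> number of tweets containing it
    edge_weights = Counter()    # (tag_a, tag_b), tag_a < tag_b -> co-occurrence count
    for tags in tweets:
        # Assumption: hashtags are compared case-insensitively, and a
        # hashtag repeated inside one tweet is counted only once.
        unique = sorted({tag.lower() for tag in tags})
        vertex_weights.update(unique)
        edge_weights.update(combinations(unique, 2))
    return vertex_weights, edge_weights


def auc(positive_scores, negative_scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Toy data, invented for this example only.
tweets = [
    ["#ai", "#ml"],
    ["#ai", "#ml", "#data"],
    ["#AI"],
    ["#data", "#viz"],
]
vertex_weights, edge_weights = build_weighted_hashtag_graph(tweets)
```

In this toy example the vertex "#ai" has weight 3 and the edge ("#ai", "#ml") has weight 2. A link prediction method would assign a score to each currently unconnected hashtag pair, and auc would then compare the scores of pairs that later co-occur against those that do not.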

Authors who have equal contribution

References

Adamic, L. A., & Adar, E. (2003). Friends and neighbors on the web. Social Networks, 25(3), 211–230.
Badami, M., & Nasraoui, O. (2018). Cross-domain hashtag recommendation and story revelation in social media. In 2018 IEEE International Conference on Big Data (Big Data) (pp. 4294–4303). IEEE.
Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.
Barabási, A. L., Jeong, H., Néda, Z., Ravasz, E., Schubert, A., & Vicsek, T. (2002). Evolution of the social network of scientific collaborations. Physica A: Statistical Mechanics and Its Applications, 311(3–4), 590–614.
Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2(1), 1–8.
Bruna, J., Zaremba, W., Szlam, A., & LeCun, Y. (2013). Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203.
Chakrabarti, S. (2007). Dynamic personalized pagerank in entity-relation graphs. In Proceedings of the 16th International Conference on World Wide Web (pp. 571–580). ACM.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–874.
Grover, A., & Leskovec, J. (2016). node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 855–864).
Jeh, G., & Widom, J. (2002). Simrank: a measure of structural-context similarity. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 538–543). ACM.
Katz, L. (1953). A new status index derived from sociometric analysis. Psychometrika, 18(1), 39–43.
Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8), 30–37.
Kovács, I. A., Luck, K., Spirohn, K., Wang, Y., Pollis, C., Schlabach, S., Bian, W., Kim, D. K., Kishore, N., Hao, T., et al. (2019). Network-based prediction of protein interactions. Nature Communications, 10(1), 1240.
Liben-Nowell, D., & Kleinberg, J. (2007). The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7), 1019–1031.
Lü, L., & Zhou, T. (2011). Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and Its Applications, 390(6), 1150–1170.
Luxburg, U. V., Radl, A., & Hein, M. (2010). Getting lost in space: Large sample analysis of the resistance distance. In Advances in Neural Information Processing Systems (pp. 2622–2630).
Martinčić-Ipšić, S., Močibob, E., & Perc, M. (2017). Link prediction on twitter. PloS one, 12(7), e0181079.
Mendhe, C. H., Henderson, N., Srivastava, G., & Mago, V. (2020). A scalable platform to collect, store, visualize, and analyze big data in real time. IEEE Transactions on Computational Social Systems, 2020, 1–10.
Monti, F., Bronstein, M., & Bresson, X. (2017). Geometric matrix completion with recurrent multi-graph neural networks. In Advances in Neural Information Processing Systems (pp. 3697–3707).
Murata, T., & Moriyasu, S. (2007). Link prediction of social networks based on weighted proximity measures. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (pp. 85–88). IEEE Computer Society.
Newman, M. E. (2001). Clustering and preferential attachment in growing networks. Physical Review E, 64(2), 025102.
Nickel, M., Murphy, K., Tresp, V., & Gabrilovich, E. (2015). A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 104(1), 11–33.
Patel, K. D., Zainab, K., Heppner, A., Srivastava, G., & Mago, V. (2020). Using Twitter for diabetes community analysis. Network Modeling Analysis in Health Informatics and Bioinformatics, 9, 1–6.
Perozzi, B., Al-Rfou, R., & Skiena, S. (2014). Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 701–710).
Praznik, L., Srivastava, G., Mendhe, C., & Mago, V. (2019). Vertex-weighted measures for predicting links in hashtag graphs (pp. 1–8).
Qudar, M., & Mago, V. (2020). A survey on language models. https://www.researchgate.net/publication/344158120_A_Survey_on_Language_Models.
Quercia, D., Askham, H., & Crowcroft, J. (2012). Tweetlda: Supervised topic classification and link prediction in twitter. In Proceedings of the 4th Annual ACM Web Science Conference (pp. 247–250). ACM.
Ribeiro, L. F., Saverese, P. H., & Figueiredo, D. R. (2017). struc2vec: Learning node representations from structural identity. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 385–394).
Sandhu, M., Giabbanelli, P. J., & Mago, V. K. (2019). From social media to expert reports: The impact of source selection on automatically validating complex conceptual models of obesity. In International Conference on Human-Computer Interaction (pp. 434–452). Springer.
Sasaki, Y., et al. (2007). The truth of the f-measure. Teach Tutor Mater, 1(5), 1–5.
Sharma, G., Srivastava, G., & Mago, V. (2019). A framework for automatic categorization of social data into medical domains. IEEE Transactions on Computational Social Systems, 7(1), 129–140.
Shervashidze, N., Schweitzer, P., Van Leeuwen, E. J., Mehlhorn, K., & Borgwardt, K. M. (2011). Weisfeiler-lehman graph kernels. Journal of Machine Learning Research, 12(77), 2539–2561.
Sokolova, K., & Perez, C. (2018). Elections and the twitter community: The case of right-wing and left-wing primaries for the 2017 french presidential election. In 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (pp. 1021–1026). IEEE.
Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., & Mei, Q. (2015). Line: Large-scale information network embedding. In Proceedings of the 24th international conference on world wide web (pp. 1067–1077).
Tassone, J., Yan, P., Simpson, M., Mendhe, C., Mago, V., & Choudhury, S. (2020). Utilizing deep learning to identify drug use on twitter data. arXiv preprint arXiv:2003.11522.
Valverde-Rebaza, J., & de Andrade Lopes, A. (2013). Exploiting behaviors of communities of twitter users for link prediction. Social Network Analysis and Mining, 3(4), 1063–1074.
Wang, W., Wu, L., Huang, Y., Wang, H., & Zhu, R. (2019). Link prediction based on deep convolutional neural network. Information, 10(5), 172.
Wang, X., Wei, F., Liu, X., Zhou, M., & Zhang, M. (2011). Topic sentiment analysis in twitter: a graph-based hashtag sentiment classification approach. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management (pp. 1031–1040). ACM.
Wang, Y., Liu, J., Huang, Y., & Feng, X. (2016). Using hashtag graph-based topic model to connect semantically-related words without co-occurrence in microblogs. IEEE Transactions on Knowledge and Data Engineering, 28(7), 1919–1933.
Zhang, M., & Chen, Y. (2018). Link prediction based on graph neural networks. In Advances in Neural Information Processing Systems (pp. 5165–5175).
Zhang, M., Cui, Z., Neumann, M., & Chen, Y. (2018). An end-to-end deep learning architecture for graph classification. In Thirty-Second AAAI Conference on Artificial Intelligence.
Zhao, H., Du, L., & Buntine, W. (2017). Leveraging node attributes for incomplete relational data. In Proceedings of the 34th International Conference on Machine Learning-Volume 70 (pp. 4072–4081). JMLR.org.

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Brandon University, Brandon, MB, Canada
Logan Praznik & Gautam Srivastava
DaTALab, Department of Computer Science, Lakehead University, Thunder Bay, ON, Canada
Mohiuddin Md Abdul Qudar, Chetan Mendhe & Vijay Mago
Research Center for Interneural Computing, China Medical University, Taichung, Taiwan, Republic of China
Gautam Srivastava

Corresponding author

Correspondence to Gautam Srivastava.

Editor information

Editors and Affiliations

Bilkent yerleşkesi, Turkish Ministry of Health, Çankaya, Ankara, Turkey
Mehmet Çakırtaş
Computer Engineering, Istanbul Medipol University, Istanbul, Turkey
Mehmet Kemal Ozdemir

About this chapter

Cite this chapter

Praznik, L., Qudar, M.M.A., Mendhe, C., Srivastava, G., Mago, V. (2021). Analysis of Link Prediction Algorithms in Hashtag Graphs. In: Çakırtaş, M., Ozdemir, M.K. (eds) Big Data and Social Media Analytics. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-030-67044-3_11

DOI: https://doi.org/10.1007/978-3-030-67044-3_11
Published: 19 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67043-6
Online ISBN: 978-3-030-67044-3
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
