Abstract
Twitter is a prominent multilingual social networking site where users post messages known as “tweets”. Like other social networking sites such as Facebook, Twitter allows users to categorize tweets with “hashtags”. Communication on Twitter can be mapped as a hashtag graph, where vertices correspond to hashtags and edges correspond to co-occurrences of hashtags within the same tweet. Furthermore, each vertex can be weighted with the number of tweets in which its hashtag has occurred, and each edge with the number of tweets in which both of its hashtags have co-occurred, creating a “weighted hashtag graph”. In this chapter, we describe additions to some well-known link prediction methods that allow the weights of both vertices and edges in a weighted hashtag graph to be taken into account. We base these additions on the assumption that more popular hashtags have a higher probability of appearing with other hashtags in the future. We then apply the improved methods to three sets of Twitter data with the intent of predicting future hashtag co-occurrences. In addition, we investigate the performance of SEAL, a newer graph neural network-based framework that has been shown in past trials to outperform heuristic-based approaches such as the Katz index, SimRank, and rooted PageRank. Experiments were conducted on real-life data sets consisting of over 3,000,000 combined unique tweets and over 250,000 combined unique hashtags. The results show that simpler heuristic-based scoring methods have marginal performance that decreases as more data is added over time. SEAL, on the other hand, shows superior performance in hashtag graph link prediction over the approaches it has previously been compared against in other domains.
The AUC score of 0.959 obtained in our experiments by using SEAL significantly exceeds those of our benchmark approaches for link prediction, which include the Katz index, SimRank, and rooted PageRank.
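The weighted hashtag graph described in the abstract can be illustrated with a minimal sketch. The function name `build_weighted_hashtag_graph`, the hashtag regular expression, the lowercasing of hashtags, and the sample tweets are all assumptions made for illustration; they are not taken from the chapter and may differ from the authors' preprocessing.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed hashtag pattern: '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def build_weighted_hashtag_graph(tweets):
    """Return (vertex_weights, edge_weights) for a list of tweet texts.

    vertex_weights[h]    = number of tweets in which hashtag h occurs
    edge_weights[(a, b)] = number of tweets in which a and b co-occur
    Edge keys are sorted pairs, so the graph is undirected.
    """
    vertex_weights = Counter()
    edge_weights = Counter()
    for text in tweets:
        # Deduplicate within a tweet so each tweet counts at most once
        # per hashtag. Lowercasing is an assumption of this sketch.
        tags = sorted({tag.lower() for tag in HASHTAG_RE.findall(text)})
        vertex_weights.update(tags)
        edge_weights.update(combinations(tags, 2))
    return vertex_weights, edge_weights


# Hypothetical sample data, for illustration only.
tweets = [
    "Training a model today #ML #AI",
    "New paper out #AI #graphs",
    "#ml #ai #graphs all in one tweet",
    "No hashtags here",
]
vertex_weights, edge_weights = build_weighted_hashtag_graph(tweets)
print(dict(vertex_weights))
print(dict(edge_weights))
```

Storing vertex and edge weights in two separate counters keeps both quantities available to a scoring method that, as the abstract proposes, takes vertex popularity into account alongside co-occurrence counts.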
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Praznik, L., Qudar, M.M.A., Mendhe, C., Srivastava, G., Mago, V. (2021). Analysis of Link Prediction Algorithms in Hashtag Graphs. In: Çakırtaş, M., Ozdemir, M.K. (eds) Big Data and Social Media Analytics. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-030-67044-3_11
DOI: https://doi.org/10.1007/978-3-030-67044-3_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67043-6
Online ISBN: 978-3-030-67044-3
eBook Packages: Mathematics and Statistics (R0)