Abstract
In today’s world, most readers prefer to read news online because it gives them instant access to what is happening right now. Furthermore, personalized recommendations help keep users engaged. Along with these virtues, online news has some vices as well. One such vice is the presence of alluring social media posts (tweets) relating to news articles whose sole purpose is to draw the attention of users rather than to direct them to the actual content. Such posts are referred to as click baits. The objective of this paper is to develop a system capable of predicting how likely social media posts (tweets) relating to news articles are to be click baits. GloVe embeddings [Pennington et al. in: Empirical methods in natural language processing (EMNLP), pp 1532–1543, 2014] have been used to represent text data numerically. Various novel features (such as Word Mover’s Distance (Kusner et al. in: Proceedings of the 32nd international conference on machine learning, ICML’15, vol 37, pp 957–966, 2015), the subjectivity and polarity of the tweets, and so on) have been engineered. Several machine learning-based models such as Logistic Regression, Random Forest, XGBoost and LightGBM have been trained for classification. Moreover, we have also implemented a few deep learning-based models, such as Deep Neural Networks and Long Short Term Memory networks, for developing this predictive system.
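To make the distance-based feature concrete, the following is a minimal sketch of how a Word Mover's style distance between a post and its linked article title could be computed. It rests on several assumptions that are not from the paper: the two-dimensional vectors in `TOY_EMBEDDINGS` are made-up stand-ins for pretrained GloVe vectors, the helper names are hypothetical, and the function computes the relaxed lower bound described by Kusner et al. (each word sends all of its mass to its nearest counterpart) rather than the full optimal-transport distance.

```python
import math
from collections import Counter

# Hypothetical 2-d vectors standing in for pretrained GloVe embeddings.
TOY_EMBEDDINGS = {
    "shocking": (0.9, 0.1),
    "secret": (0.8, 0.2),
    "revealed": (0.7, 0.3),
    "budget": (0.1, 0.9),
    "parliament": (0.2, 0.8),
    "passes": (0.3, 0.7),
}


def nbow(tokens):
    """Normalized bag-of-words weights over in-vocabulary tokens."""
    counts = Counter(t for t in tokens if t in TOY_EMBEDDINGS)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_sided_cost(src, dst):
    """Cost when every source word moves all its mass to its nearest target word."""
    return sum(
        weight * min(euclidean(TOY_EMBEDDINGS[w], TOY_EMBEDDINGS[x]) for x in dst)
        for w, weight in src.items()
    )


def relaxed_wmd(tokens_a, tokens_b):
    """Relaxed Word Mover's Distance: the larger of the two one-sided costs."""
    a, b = nbow(tokens_a), nbow(tokens_b)
    if not a or not b:
        return float("inf")  # no known words, so no meaningful distance
    return max(one_sided_cost(a, b), one_sided_cost(b, a))


post = "shocking secret revealed".split()
article = "parliament passes budget".split()

print(relaxed_wmd(post, post))     # identical texts
print(relaxed_wmd(post, article))  # post and article use unrelated vocabulary
```

In a pipeline like the one the abstract outlines, a value of this kind would presumably be one column of the feature matrix, alongside sentiment features such as subjectivity and polarity, before a classifier is trained. A large distance between a post and the article it links to is one plausible signal that the post does not reflect the content.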
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs41870-020-00473-1/MediaObjects/41870_2020_473_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs41870-020-00473-1/MediaObjects/41870_2020_473_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs41870-020-00473-1/MediaObjects/41870_2020_473_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs41870-020-00473-1/MediaObjects/41870_2020_473_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs41870-020-00473-1/MediaObjects/41870_2020_473_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs41870-020-00473-1/MediaObjects/41870_2020_473_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs41870-020-00473-1/MediaObjects/41870_2020_473_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs41870-020-00473-1/MediaObjects/41870_2020_473_Fig8_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs41870-020-00473-1/MediaObjects/41870_2020_473_Fig9_HTML.png)
References
Chakraborty A, Sarkar R, Mrigen A, Ganguly N (2017) Tabloids in the era of social media? Understanding the production and consumption of clickbaits in twitter. In: PACMHCI, pp 1–21
Chakraborty A, Paranjape B, Kakarla S, Ganguly N (2016) Stop clickbait: detecting and preventing clickbaits in online news media. In: 2016 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM), pp 9–16
Rony MMU, Hassan N, Yousuf M (2017) Diving deep into clickbaits: Who use them to what extents in which topics with what effects? In: ASONAM
Deudon M (2018) Learning semantic similarity in a continuous space. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems, vol 31, Curran Associates, Inc., pp 986–997
Kusner MJ, Sun Y, Kolkin NI, Weinberger KQ (2015) From word embeddings to document distances. In: Proceedings of the 32nd international conference on machine learning, ICML’15, JMLR.org, vol 37, pp 957–966
Anand A, Chakraborty T, Park N (2016) We used neural networks to detect clickbaits: You won’t believe what happened next! CoRR. abs/1612.01340. arXiv:1612.01340v2
Biyani P, Tsioutsiouliklis K, Blackmer J (2016) “8 amazing secrets for getting more clicks”: detecting clickbaits in news streams using article informality. In: Proceedings of the thirtieth AAAI conference on artificial intelligence, AAAI’16, AAAI Press, pp 94–100
Cao X, Le T, Zhang J (2017) Machine learning based detection of clickbait posts in social media. CoRR. abs/1710.01977. arXiv:1710.01977v1
Elyashar A, Bendahan J, Puzis R (2017) Detecting clickbait in online social media: You won’t believe how we did it. CoRR. abs/1710.06699. arXiv:1710.06699v1
Glenski M, Ayton E, Arendt D, Volkova S (2017) Fishing for clickbaits in social images and texts with linguistically-infused neural network models. CoRR. abs/1710.06390. arXiv:1710.06390v1
Grigorev A (2017) Identifying clickbait posts on social media with an ensemble of linear models. CoRR. abs/1710.00399. arXiv:1710.00399v1
Hu B, Lu Z, Li H, Chen Q (2014) Convolutional neural network architectures for matching natural language sentences. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems, vol 27, Curran Associates, Inc., pp 2042–2050
Indurthi V, Oota SR (2017) Clickbait detection using word embeddings. CoRR. abs/1710.02861. arXiv:1710.02861v1
Omidvar A, Jiang H, An A (2018) Using neural network for identifying clickbait in online news media. CoRR. abs/1806.07713. arXiv:1806.07713v1
Papadopoulou O, Zampoglou M, Papadopoulos S, Kompatsiaris I (2017) A two-level classification approach for detecting clickbait posts using text-based features. CoRR. abs/1710.08528. arXiv:1710.08528v1
Thomas P (2017) Clickbait identification using neural networks. arXiv e-prints: arXiv:1710.08721. arXiv:1710.08721v1
Zhou Y (2017) Clickbait detection in tweets using self-attentive network. CoRR. abs/1710.05364. arXiv:1710.05364v1
Potthast M, Gollub T, Komlossy K, Schuster S, Wiegmann M, Fernandez EPG, Hagen M, Stein B (2018) Crowdsourcing a large corpus of clickbait on twitter. In: COLING
Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. In: Empirical methods in natural language processing (EMNLP), pp 1532–1543
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, KDD’16, ACM, New York, NY, USA, pp 785–794
Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu TY (2017) LightGBM: a highly efficient gradient boosting decision tree. In: NIPS
Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th international conference on neural information processing systems, NIPS’13, vol 2, Curran Associates Inc, USA, pp 3111–3119
Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz M, Brew J (2019) HuggingFace’s transformers: state-of-the-art natural language processing. CoRR. abs/1910.03771 arXiv:1910.03771v3
Acknowledgements
We thank Mr Yashu Kant Gupta, Mr Asif Iquebal Ajazi, Mr Uttam Kumar Pandey and Dr Swati Agarwal for their valuable time, guidance and support.
Additional information
Part of this work was done while the author was associated with Times Internet and BITS, Pilani.
About this article
Cite this article
Ghosh, S. Identifying click baits using various machine learning and deep learning techniques. Int. j. inf. tecnol. 13, 1235–1242 (2021). https://doi.org/10.1007/s41870-020-00473-1
DOI: https://doi.org/10.1007/s41870-020-00473-1