Identifying click baits using various machine learning and deep learning techniques

  • Original Research
International Journal of Information Technology

Abstract

In today’s world, most readers prefer to read news online because it gives them instant access to what is happening right now. Furthermore, personalized recommendations help keep users engaged. Along with these virtues, online news has some vices as well. One such vice is the presence of alluring social media posts (tweets) relating to news articles whose sole purpose is to draw the attention of users rather than to direct them to read the actual content. Such posts are referred to as click baits. The objective of this paper is to develop a system capable of predicting how likely social media posts (tweets) relating to news articles are to be click baits. GloVe embeddings [Pennington et al. in: Empirical methods in natural language processing (EMNLP), pp 1532–1543, 2014] have been used to represent text data numerically. Various novel features (such as word mover’s distance [Kusner et al. in: Proceedings of the 32nd international conference on international conference on machine learning, ICML’15, vol 37, pp 957–966, 2015], subjectivity and polarity of the tweets) have been engineered. Several machine learning-based models such as Logistic Regression, Random Forest, XGBoost and LightGBM have been trained for classification. Moreover, we have also implemented a few deep learning-based models, such as Deep Neural Networks and Long Short-Term Memory networks, for developing this predictive system.
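The feature ideas named in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the three-dimensional vectors below are invented stand-ins for pretrained GloVe embeddings, the helper names (`tokenize`, `mean_vector`, `relaxed_wmd`) are hypothetical, and the distance computed is the relaxed word mover's distance (a cheaper lower bound described by Kusner et al.) rather than the full word mover's distance, which requires solving an optimal transport problem.

```python
import math
from collections import Counter

# Toy 3-dimensional vectors standing in for pretrained GloVe embeddings.
# ASSUMPTION: these values are invented purely for illustration.
TOY_EMBEDDINGS = {
    "shocking":  [0.9, 0.1, 0.0],
    "secret":    [0.8, 0.2, 0.1],
    "revealed":  [0.7, 0.3, 0.1],
    "minister":  [0.0, 0.8, 0.3],
    "announces": [0.1, 0.7, 0.4],
    "budget":    [0.1, 0.9, 0.2],
}
DIM = 3


def tokenize(text):
    """Lowercase, split on whitespace, and drop out-of-vocabulary words."""
    return [tok for tok in text.lower().split() if tok in TOY_EMBEDDINGS]


def mean_vector(tokens):
    """Represent a text as the average of its word vectors."""
    if not tokens:
        return [0.0] * DIM
    sums = [0.0] * DIM
    for tok in tokens:
        for i, value in enumerate(TOY_EMBEDDINGS[tok]):
            sums[i] += value
    return [s / len(tokens) for s in sums]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_sided_relaxed_wmd(src, dst):
    """Each source word sends all of its mass to its nearest target word."""
    counts = Counter(src)
    total = sum(counts.values())
    targets = set(dst)
    cost = 0.0
    for word, count in counts.items():
        nearest = min(
            euclidean(TOY_EMBEDDINGS[word], TOY_EMBEDDINGS[t]) for t in targets
        )
        cost += (count / total) * nearest
    return cost


def relaxed_wmd(a, b):
    """Relaxed word mover's distance: the larger of the two one-sided costs."""
    if not a or not b:
        return float("inf")
    return max(one_sided_relaxed_wmd(a, b), one_sided_relaxed_wmd(b, a))


# Hypothetical post text and article title.
post = tokenize("Shocking secret revealed")
title = tokenize("Minister announces budget")

# One feature row: averaged embedding of the post plus a post-to-title distance.
features = mean_vector(post) + [relaxed_wmd(post, title)]
print(features)
```

A row like `features` could then be passed, alongside other engineered features, to any of the classifiers listed in the abstract. A large post-to-title distance suggests that the post wording diverges from the article it promotes.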

References

  1. Chakraborty A, Sarkar R, Mrigen A, Ganguly N (2017) Tabloids in the era of social media? Understanding the production and consumption of clickbaits in Twitter. In: PACMHCI, pp 1–21

  2. Chakraborty A, Paranjape B, Kakarla S, Ganguly N (2016) Stop clickbait: detecting and preventing clickbaits in online news media. In: 2016 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM), pp 9–16

  3. Rony MMU, Hassan N, Yousuf M (2017) Diving deep into clickbaits: Who use them to what extents in which topics with what effects? In: ASONAM

  4. Deudon M (2018) Learning semantic similarity in a continuous space. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems, vol 31, Curran Associates, Inc., pp 986–997

  5. Kusner MJ, Sun Y, Kolkin NI, Weinberger KQ (2015) From word embeddings to document distances. In: Proceedings of the 32nd international conference on international conference on machine learning, ICML’15, JMLR.org, vol 37, pp 957–966

  6. Anand A, Chakraborty T, Park N (2016) We used neural networks to detect clickbaits: You won’t believe what happened next! CoRR. abs/1612.01340. arXiv:1612.01340v2

  7. Biyani P, Tsioutsiouliklis K, Blackmer J (2016) “8 amazing secrets for getting more clicks”: detecting clickbaits in news streams using article informality. In: Proceedings of the thirtieth AAAI conference on artificial intelligence, AAAI’16, AAAI Press, pp 94–100

  8. Cao X, Le T, Zhang J (2017) Machine learning based detection of clickbait posts in social media. CoRR. abs/1710.01977. arXiv:1710.01977v1

  9. Elyashar A, Bendahan J, Puzis R (2017) Detecting clickbait in online social media: You won’t believe how we did it. CoRR. abs/1710.06699. arXiv:1710.06699v1

  10. Glenski M, Ayton E, Arendt D, Volkova S (2017) Fishing for clickbaits in social images and texts with linguistically-infused neural network models. CoRR. abs/1710.06390. arXiv:1710.06390v1

  11. Grigorev A (2017) Identifying clickbait posts on social media with an ensemble of linear models. CoRR. abs/1710.00399. arXiv:1710.00399v1

  12. Hu B, Lu Z, Li H, Chen Q (2014) Convolutional neural network architectures for matching natural language sentences. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems, vol 27, Curran Associates, Inc., pp 2042–2050

  13. Indurthi V, Oota SR (2017) Clickbait detection using word embeddings. CoRR. abs/1710.02861. arXiv:1710.02861v1

  14. Omidvar A, Jiang H, An A (2018) Using neural network for identifying clickbait in online news media. CoRR. abs/1806.07713. arXiv:1806.07713v1

  15. Papadopoulou O, Zampoglou M, Papadopoulos S, Kompatsiaris I (2017) A two-level classification approach for detecting clickbait posts using text-based features. CoRR. abs/1710.08528. arXiv:1710.08528v1

  16. Thomas P (2017) Clickbait identification using neural networks. CoRR. abs/1710.08721. arXiv:1710.08721v1

  17. Zhou Y (2017) Clickbait detection in tweets using self-attentive network. CoRR. abs/1710.05364. arXiv:1710.05364v1

  18. Potthast M, Gollub T, Komlossy K, Schuster S, Wiegmann M, Fernandez EPG, Hagen M, Stein B (2018) Crowdsourcing a large corpus of clickbait on Twitter. In: COLING

  19. Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. In: Empirical methods in natural language processing (EMNLP), pp 1532–1543

  20. Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  21. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, KDD’16, ACM, New York, NY, USA, pp 785–794

  22. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu TY (2017) LightGBM: a highly efficient gradient boosting decision tree. In: NIPS

  23. Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th international conference on neural information processing systems, NIPS’13, vol 2, Curran Associates Inc, USA, pp 3111–3119

  24. Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805

  25. Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz M, Brew J (2019) HuggingFace’s transformers: state-of-the-art natural language processing. CoRR. abs/1910.03771. arXiv:1910.03771v3

Acknowledgements

We thank Mr Yashu Kant Gupta, Mr Asif Iquebal Ajazi, Mr Uttam Kumar Pandey and Dr Swati Agarwal for their valuable time, guidance and support.

Author information

Corresponding author

Correspondence to Sohom Ghosh.

Additional information

Part of this work has been done when the author was previously associated with Times Internet and BITS, Pilani.

About this article

Cite this article

Ghosh, S. Identifying click baits using various machine learning and deep learning techniques. Int. j. inf. tecnol. 13, 1235–1242 (2021). https://doi.org/10.1007/s41870-020-00473-1

  • DOI: https://doi.org/10.1007/s41870-020-00473-1
