
Clickbait Detection

  • Martin Potthast
  • Sebastian Köpsel
  • Benno Stein
  • Matthias Hagen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9626)

Abstract

This paper proposes a new model for the detection of clickbait, i.e., short messages that lure readers to click a link. Clickbait is primarily used by online content publishers to increase their readership, whereas its automatic detection will give readers a way of filtering their news stream. We contribute by compiling the first clickbait corpus of 2992 tweets, 767 of which are clickbait, and by developing a clickbait model based on 215 features that enables a random forest classifier to achieve 0.79 ROC-AUC at 0.76 precision and 0.76 recall.
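For readers unfamiliar with the evaluation measures named in the abstract, the following minimal sketch shows how precision, recall, and ROC-AUC can be computed for a binary clickbait classifier. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and the abstract does not state whether the reported precision and recall are per-class or averaged values.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the positive class (1 = clickbait)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive example
    receives a higher score than a random negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 clickbait tweets and 5 non-clickbait tweets.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.3, 0.8, 0.4, 0.2, 0.1, 0.05]
# Assumed decision threshold of 0.5 turns scores into hard predictions.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

if __name__ == "__main__":
    p, r = precision_recall(toy_labels, toy_preds)
    print(f"precision={p:.2f} recall={r:.2f} roc_auc={roc_auc(toy_labels, toy_scores):.2f}")
```

Precision and recall depend on the chosen threshold, whereas ROC-AUC summarizes ranking quality across all thresholds, which is why the abstract reports the measures together.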

Keywords

Clickbait Random Forest Content Publishing Twitter Tweets News Streams 

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Martin Potthast (1)
  • Sebastian Köpsel (1)
  • Benno Stein (1)
  • Matthias Hagen (1)
  1. Bauhaus-Universität Weimar, Weimar, Germany
