Clickbait Detection

  • Martin Potthast
  • Sebastian Köpsel
  • Benno Stein
  • Matthias Hagen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9626)


This paper proposes a new model for the detection of clickbait, i.e., short messages that lure readers to click a link. Clickbait is primarily used by online content publishers to increase their readership, whereas its automatic detection gives readers a way of filtering their news stream. We contribute by compiling the first clickbait corpus of 2992 tweets, 767 of which are clickbait, and by developing a clickbait model based on 215 features that enables a random forest classifier to achieve 0.79 ROC-AUC at 0.76 precision and 0.76 recall.
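To make the reported metrics concrete, the following is a minimal sketch of how precision, recall, and ROC-AUC can be computed from classifier scores. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the paper may define its precision and recall differently (for example, averaged over both classes). Only the corpus counts 2992 and 767 come from the abstract.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the positive class (1 = clickbait)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive example
    receives a higher score than a random negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the paper's corpus.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.5]
# Assumed decision threshold of 0.5 for turning scores into labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)

# Share of clickbait in the corpus, using the counts from the abstract.
clickbait_share = 767 / 2992

print(f"precision={precision:.2f} recall={recall:.2f} auc={auc:.2f}")
print(f"clickbait share={clickbait_share:.3f}")
```

With roughly a quarter of the corpus labeled as clickbait, the classes are imbalanced, which is one reason a threshold-independent measure such as ROC-AUC is useful alongside precision and recall.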



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Martin Potthast (1)
  • Sebastian Köpsel (1)
  • Benno Stein (1)
  • Matthias Hagen (1)

  1. Bauhaus-Universität Weimar, Weimar, Germany
