Detection and visualization of misleading content on Twitter

Abstract

The problems of online misinformation and fake news have gained prominence in an age in which user-generated content and social media platforms are key forces in shaping and diffusing news stories. Unreliable information and misleading content are often posted and widely disseminated through popular social media platforms such as Twitter and Facebook. As a result, journalists and editors need new tools that can help them speed up the verification of content sourced from social media. Motivated by this need, in this paper we present a system that supports the automatic classification of multimedia Twitter posts as credible or misleading. The system leverages credibility-oriented features extracted from the tweet and from the user who published it, and trains a two-step classification model based on a novel semisupervised learning scheme. This scheme uses the agreement between two independent pretrained models on new posts as a guiding signal for retraining the classification model. We analyze a large labeled dataset of tweets that shared debunked fake and confirmed real images and videos, and show that integrating the newly proposed features, using bagging in the initial classifiers, and applying the semisupervised learning scheme significantly improve classification accuracy. Moreover, we present a Web-based application for visualizing and communicating the classification results to end users.
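The two-step, agreement-based scheme summarized above can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy nearest-centroid base learner (a stand-in for whatever classifiers the paper actually uses), bagging with a stratified bootstrap, and one plausible reading of the retraining step. Under that reading, a single classifier is trained on the concatenated tweet-based and user-based features, using the training set plus the new posts on which the two initial models agree (with their agreed labels as pseudo-labels). That classifier then decides only the posts on which the initial models disagree. All names (`NearestCentroid`, `BaggedClassifier`, `concat`, `agreement_retraining`) are hypothetical.

```python
import random
from collections import Counter


class NearestCentroid:
    """Toy base learner (assumption): predicts the label whose feature centroid is closest."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
        self.centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
        return self

    def predict(self, x):
        def sq_dist(lab):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[lab]))
        return min(self.centroids, key=sq_dist)


class BaggedClassifier:
    """Bagging: majority vote over base learners fit on bootstrap resamples."""

    def __init__(self, n_models=9, seed=0):
        self.n_models, self.seed = n_models, seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        by_label = {}
        for i, label in enumerate(y):
            by_label.setdefault(label, []).append(i)
        self.models = []
        for _ in range(self.n_models):
            # Stratified bootstrap (assumption): resample each class with replacement,
            # keeping its size, so every base learner sees every class.
            idx = [rng.choice(members) for members in by_label.values() for _ in members]
            self.models.append(NearestCentroid().fit([X[i] for i in idx], [y[i] for i in idx]))
        return self

    def predict(self, x):
        votes = Counter(m.predict(x) for m in self.models)
        return votes.most_common(1)[0][0]


def concat(tb, ub):
    """Join a tweet-based and a user-based feature vector."""
    return tuple(tb) + tuple(ub)


def agreement_retraining(train_tb, train_ub, y, test_tb, test_ub):
    # Step 1: two independent bagged models, one per feature family.
    cl_tb = BaggedClassifier().fit(train_tb, y)
    cl_ub = BaggedClassifier().fit(train_ub, y)
    pred_tb = [cl_tb.predict(x) for x in test_tb]
    pred_ub = [cl_ub.predict(x) for x in test_ub]
    agreed = [i for i, (a, b) in enumerate(zip(pred_tb, pred_ub)) if a == b]
    disagreed = [i for i, (a, b) in enumerate(zip(pred_tb, pred_ub)) if a != b]

    labels = list(pred_tb)  # agreed posts keep the label both models chose
    if disagreed:
        # Step 2: retrain on the training set plus the agreed posts (pseudo-labelled),
        # then relabel only the posts on which the two models disagreed.
        X = [concat(a, b) for a, b in zip(train_tb, train_ub)]
        X += [concat(test_tb[i], test_ub[i]) for i in agreed]
        Y = list(y) + [pred_tb[i] for i in agreed]
        retrained = BaggedClassifier().fit(X, Y)
        for i in disagreed:
            labels[i] = retrained.predict(concat(test_tb[i], test_ub[i]))
    return labels


# Toy data: two clearly separated clusters per feature family.
train_tb = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
train_ub = list(train_tb)
y = ["fake"] * 4 + ["real"] * 4
test_tb = [(0.5, 0.5), (10.5, 10.5), (0.5, 0.5)]
test_ub = [(0.5, 0.5), (10.5, 10.5), (9, 9)]
print(agreement_retraining(train_tb, train_ub, y, test_tb, test_ub))
# → ['fake', 'real', 'fake']
```

On this toy data the two initial models disagree only on the third post. The retrained model sides with the tweet-based prediction because, in the combined feature space, that post lies much closer to the "fake" cluster than to the "real" one.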

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Notes

  1. snopes.com/photos/airplane/malaysia.asp.

  2. www.theguardian.com/business/2013/apr/23/ap-tweet-hack-wall-street-freefall.

  3. github.com/MKLab-ITI/computational-verification.

  4. github.com/MKLab-ITI/image-verification-corpus.

  5. code.google.com/p/language-detection/.

  6. github.com/jeffreybreen/twitter-sentiment-analysis-tutorial-201107.

  7. onlineslangdictionary.com/word-list/0-a/.

  8. www.languagerealm.com/spanish/spanishslang.php.

  9. github.com/ipeirotis/ReadabilityMetrics.

  10. www.mywot.com/.

  11. wwwranking.webdatacommons.org/more.html.

  12. data.alexa.com/data?cli=10&url=google.com.

  13. Using github.com/socialsensor/geo-util.

  14. The selection is based on their performance on the training set during cross-validation.

  15. reveal-mklab.iti.gr/reveal/fake/.

  16. The VC has since been expanded with new data used in the VMU 2016 task. However, most of the experiments reported here use the 2015 version of the data, so any reference to VC means the 2015 edition of the dataset unless otherwise stated.

  17. In MediaEval, each team can submit up to five runs.

Author information

Correspondence to Symeon Papadopoulos.

Additional information

This work was supported by the REVEAL and InVID projects, funded by the European Commission under Contract Nos. 610928 and 687786, respectively.

About this article

Cite this article

Boididou, C., Papadopoulos, S., Zampoglou, M. et al. Detection and visualization of misleading content on Twitter. Int J Multimed Info Retr 7, 71–86 (2018). https://doi.org/10.1007/s13735-017-0143-x

Keywords

  • Social media
  • Verification
  • Fake detection
  • Information credibility