Deep Learning with Self-Attention Mechanism for Fake News Detection

Chapter in: Combating Fake News with Computational Intelligence Techniques

Part of the book series: Studies in Computational Intelligence (SCI, volume 1001)

Abstract

Fake news, a form of news consisting of deliberate disinformation or hoaxes spread via traditional news media or online social media, is one of the major concerns in today's society. This study therefore explores state-of-the-art methods for detecting fake news in order to design and implement classification models. Four classification models based on deep learning with a self-attention mechanism were trained and evaluated on datasets currently available for this purpose. Three models used traditional supervised learning, while the fourth used transfer learning by fine-tuning a pre-trained language model for the same task. All four models yield comparable results, with the fourth model achieving the best classification accuracy.
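To make the supervised approach concrete, below is a minimal sketch, assuming PyTorch, of a self-attention-based text classifier of the kind the abstract describes. It is not the authors' implementation; the vocabulary size, embedding dimension, number of attention heads, and mean-pooling step are all illustrative assumptions.

import torch
import torch.nn as nn

class SelfAttentionClassifier(nn.Module):
    """Illustrative sketch: embed tokens, apply multi-head self-attention,
    mean-pool over the sequence, and classify as real vs. fake."""
    def __init__(self, vocab_size=30000, embed_dim=128, num_heads=4, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Scaled dot-product multi-head self-attention over the token sequence
        self.attention = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):
        x = self.embedding(token_ids)          # (batch, seq_len, embed_dim)
        attended, _ = self.attention(x, x, x)  # each token attends to all others
        pooled = attended.mean(dim=1)          # average over the sequence
        return self.classifier(pooled)         # logits for the two classes

model = SelfAttentionClassifier()
dummy_batch = torch.randint(0, 30000, (8, 64))  # 8 articles, 64 token ids each
print(model(dummy_batch).shape)                 # torch.Size([8, 2])

A cross-entropy loss over these logits with a standard optimizer such as Adam would complete the training loop; the fourth model's transfer-learning variant would instead fine-tune a pre-trained encoder such as BERT end to end on the same binary labels.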


Author information

Corresponding author: Marina Bagić Babac


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Cvitanović, I., & Babac, M. B. (2022). Deep learning with self-attention mechanism for fake news detection. In M. Lahby, A.-S. K. Pathan, Y. Maleh, & W. M. S. Yafooz (Eds.), Combating fake news with computational intelligence techniques. Studies in Computational Intelligence (vol. 1001). Springer, Cham. https://doi.org/10.1007/978-3-030-90087-8_10
