Mining for Fake News

Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 450)

Abstract

Fake news is an ever-growing concern in the modern age of the internet. Discerning fake information from truthful information is an important task given how easily information can be shared digitally. In this paper, we present a data mining solution that classifies articles as real or fake by using bag-of-words (BoW) and sequential mining techniques, and we compare the reliability of these techniques for detecting fake news on various datasets. Specifically, our solution first cleans the input news by normalizing words and removing “filler” words. It then uses the BoW or sequential mining techniques to vectorize the cleaned data. Afterwards, it trains classification models on the vectorized data and classifies unseen news as real or fake. Evaluation on real-life data shows the feasibility of our solution for mining and classifying fake news.
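
The clean, vectorize, train, and classify steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the lowercasing used as "normalization", the choice of a multinomial naive Bayes classifier, and the four toy headlines are all assumptions made purely for illustration, and only the Python standard library is used.

```python
import math
import re
from collections import Counter

# Hypothetical "filler" (stop) words; the list used in the paper is not given here.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "that", "this", "it"}


def clean(text):
    """Normalize by lowercasing, keep alphabetic tokens, drop filler words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(tokens):
    """Vectorize a token list as a word -> count mapping (order is ignored)."""
    return Counter(tokens)


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (an assumed classifier)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(bag_of_words(clean(doc)))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, doc):
        bow = bag_of_words(clean(doc))
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for word, n in bow.items():
                if word in self.vocab:
                    score += n * math.log((self.word_counts[c][word] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


# Toy training data, invented for this sketch (not from the paper's datasets).
train_docs = [
    "Scientists publish peer reviewed study on vaccine safety",
    "City council approves budget for new public library",
    "Shocking miracle cure doctors hate revealed",
    "You will not believe this shocking celebrity secret",
]
train_labels = ["real", "real", "fake", "fake"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("Shocking secret cure revealed"))
print(model.predict("Council approves library budget"))
```

A BoW representation discards word order, which is presumably why the abstract also considers sequential mining as an alternative way to vectorize the cleaned text; that alternative is not sketched here.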

Keywords

  • Data mining
  • Big data analytics
  • Social networks
  • Fake news


Notes

  1. https://www.kaggle.com/mrisdal/fake-news/kernels, https://www.kaggle.com/c/fake-news/, https://components.one/datasets/all-the-news-2-news-articles-dataset/, https://www.kaggle.com/mdepak/fakenewsnet.

  2. https://lit.eecs.umich.edu/downloads.html, http://web.eecs.umich.edu/~mihalcea/downloads/fakeNewsDatasets.zip.

Acknowledgments

This project is partially supported by (a) Natural Sciences and Engineering Research Council of Canada (NSERC) and (b) University of Manitoba.

Author information

Corresponding author

Correspondence to Carson K. Leung.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cabusas, R.M., Epp, B.N., Gouge, J.M., Kaufmann, T.N., Leung, C.K., Tully, J.R.A. (2022). Mining for Fake News. In: Barolli, L., Hussain, F., Enokido, T. (eds) Advanced Information Networking and Applications. AINA 2022. Lecture Notes in Networks and Systems, vol 450. Springer, Cham. https://doi.org/10.1007/978-3-030-99587-4_14
