Text and Data Mining to Detect Phishing Websites and Spam Emails

  • Conference paper
Swarm, Evolutionary, and Memetic Computing (SEMCCO 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8298)

Abstract

In this paper, we performed phishing and spam detection using text and data mining. For phishing website detection, we extracted 17 features from the source code and URLs of the websites; for spam email detection, we applied text and data mining in tandem. In both studies, we achieved high sensitivity compared to previous studies and also provided decision rules.
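
As a rough illustration of what URL-based feature extraction can look like, the sketch below computes a handful of features from a URL string. The abstract does not list the 17 features used in the paper, so the function name `url_features` and every feature shown here are assumptions chosen for illustration only, not the authors' feature set.

```python
import re
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Compute a few illustrative URL features.

    These features are hypothetical examples; they are NOT the 17
    features described in the paper, which the abstract does not list.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        # Total number of characters in the URL.
        "url_length": len(url),
        # Number of dots in the host name (a proxy for subdomain depth).
        "num_dots_in_host": host.count("."),
        # Whether an '@' character appears anywhere in the URL.
        "has_at_symbol": "@" in url,
        # Whether the host looks like a dotted-quad IPv4 address.
        "host_is_ip": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        # Whether the scheme is HTTPS.
        "uses_https": parsed.scheme == "https",
    }


if __name__ == "__main__":
    for example in ("https://www.example.com/", "http://192.168.0.1/login@secure"):
        print(example, url_features(example))
```

A feature dictionary of this kind could then be passed to any classifier or rule learner; which learner the paper actually used is not stated in the abstract.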



Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Pandey, M., Ravi, V. (2013). Text and Data Mining to Detect Phishing Websites and Spam Emails. In: Panigrahi, B.K., Suganthan, P.N., Das, S., Dash, S.S. (eds) Swarm, Evolutionary, and Memetic Computing. SEMCCO 2013. Lecture Notes in Computer Science, vol 8298. Springer, Cham. https://doi.org/10.1007/978-3-319-03756-1_50

  • DOI: https://doi.org/10.1007/978-3-319-03756-1_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03755-4

  • Online ISBN: 978-3-319-03756-1

  • eBook Packages: Computer Science, Computer Science (R0)
