Abstract
In this paper, we performed phishing and spam detection using text and data mining. For phishing website detection, we extracted 17 features from the source code and URLs of the websites; for spam email detection, we applied text and data mining in tandem. In both studies, we achieved high sensitivity compared to previous studies and also provided decision rules.
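As a rough illustration of what URL-based feature extraction can look like, the sketch below derives a handful of lexical features from a URL using only the Python standard library. The abstract does not list the 17 features the authors used, so the function name `url_features` and every feature shown here are hypothetical examples and are not taken from the paper.

```python
import re
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Return a few illustrative lexical features of a URL.

    These features are assumptions chosen for illustration only;
    they are not the feature set described in the paper.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        # Total number of characters in the URL string.
        "url_length": len(url),
        # True when the host is a dotted-quad IPv4 address rather than a name.
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        # True when an '@' appears anywhere in the URL.
        "has_at_symbol": "@" in url,
        # Number of dots in the host part (a crude subdomain-depth signal).
        "num_dots_in_host": host.count("."),
        # True when the scheme is https.
        "uses_https": parsed.scheme == "https",
    }
```

A feature dictionary of this kind could be computed per URL and passed, together with features drawn from the page source, to any classifier or rule learner.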
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Pandey, M., Ravi, V. (2013). Text and Data Mining to Detect Phishing Websites and Spam Emails. In: Panigrahi, B.K., Suganthan, P.N., Das, S., Dash, S.S. (eds) Swarm, Evolutionary, and Memetic Computing. SEMCCO 2013. Lecture Notes in Computer Science, vol 8298. Springer, Cham. https://doi.org/10.1007/978-3-319-03756-1_50
DOI: https://doi.org/10.1007/978-3-319-03756-1_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03755-4
Online ISBN: 978-3-319-03756-1
eBook Packages: Computer Science, Computer Science (R0)