Abstract
In this work we propose a method to capture the writing style of spam and non-spam messages by preserving the sequentiality of the text in the feature space. More specifically, we build the feature vector according to the order in which features appear in the text. We extract features from messages using three techniques: Extrinsic Information, Sequential Labeling Extraction, and Term Clustering. The resulting low-dimensional feature space yields competitive classification accuracy for the tested classifiers.
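The idea of an order-preserving feature vector can be illustrated with a minimal sketch. This is not the authors' implementation: the label set (`WORD`, `UPPER`, `NUM`, `URL`, `OTHER`), the helper names `label` and `ordered_features`, and the fixed vector length are all assumptions chosen for illustration. The abstract does not specify how the three extraction techniques are realized.

```python
import re
from typing import List

# Hypothetical coarse label set; the paper's actual labels are not given here.
LABEL_IDS = {"PAD": 0, "WORD": 1, "UPPER": 2, "NUM": 3, "URL": 4, "OTHER": 5}


def label(token: str) -> str:
    """Map a token to a coarse, illustrative category."""
    if re.fullmatch(r"https?://\S+|www\.\S+", token):
        return "URL"
    if re.fullmatch(r"\d+", token):
        return "NUM"
    if token.isalpha() and token.isupper() and len(token) > 1:
        return "UPPER"
    if token.isalpha():
        return "WORD"
    return "OTHER"


def ordered_features(message: str, length: int = 8) -> List[int]:
    """Build a fixed-length vector whose i-th entry is the label of the
    i-th token, so the order of appearance is kept (unlike bag-of-words)."""
    ids = [LABEL_IDS[label(tok)] for tok in message.split()][:length]
    # Pad short messages so every vector has the same low dimension.
    return ids + [LABEL_IDS["PAD"]] * (length - len(ids))


print(ordered_features("WIN cash now call 08001234 www.example.com"))
# → [2, 1, 1, 1, 3, 4, 0, 0]
print(ordered_features("now cash WIN"))
# → [1, 1, 2, 0, 0, 0, 0, 0]
```

The two messages "WIN cash now" and "now cash WIN" would be identical under a bag-of-words representation, whereas here the position of the `UPPER` label differs, which is the kind of sequential information the abstract refers to.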
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Serrano, J.M.B., Hernández Palancar, J., Cumplido, R. (2014). The Evaluation of Ordered Features for SMS Spam Filtering. In: Bayro-Corrochano, E., Hancock, E. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2014. Lecture Notes in Computer Science, vol 8827. Springer, Cham. https://doi.org/10.1007/978-3-319-12568-8_47
DOI: https://doi.org/10.1007/978-3-319-12568-8_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12567-1
Online ISBN: 978-3-319-12568-8
eBook Packages: Computer Science, Computer Science (R0)