Using Negation and Phrases in Inducing Rules for Text Classification

Chua, Stephanie; Coenen, Frans; Malcolm, Grant; Fernando, Matías; Constantino, García

doi:10.1007/978-1-4471-2318-7_11

Included in the following conference series:

International Conference on Innovative Techniques and Applications of Artificial Intelligence

Abstract

An investigation into the use of negation in Inductive Rule Learning (IRL) for text classification is described. The use of negated features in the IRL process has been shown to improve the effectiveness of classification. However, although for small datasets it is perfectly feasible to include the potential negation of every feature as part of the feature space, this is not possible for datasets with large numbers of features, such as those used in text mining applications. Instead, a process whereby the features to be negated are identified dynamically is required. Such a process is described in the paper and compared with established techniques (JRip, NaiveBayes, Sequential Minimal Optimization (SMO) and OlexGreedy). The work is also directed at an approach to text classification based on a “bag of phrases” representation, the motivation being that a phrase contains semantic information that is not present in a single keyword. In addition, a given text corpus typically contains many more key-phrase features than keyword features, thereby providing more potential features to be negated.
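The abstract does not state exactly how the features to be negated are identified. The following Python sketch illustrates one plausible way of doing it dynamically. It assumes the following:

- Each document is a set of phrase features (a "bag of phrases").
- A rule is a conjunction of features that must be present plus features that must be absent.
- Negation candidates are restricted to features that occur in negative documents the rule currently covers but in none of the positive documents it covers.

The names `Rule`, `negation_candidates` and `refine_with_negation` are hypothetical. So is the selection criterion: pick the candidate that excludes the most covered negatives, breaking ties alphabetically. This is an illustration, not the authors' published algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical conjunctive rule: fires when every `positive` feature is
    present in a document and none of the `negated` features are."""
    label: str
    positive: set[str] = field(default_factory=set)
    negated: set[str] = field(default_factory=set)

    def covers(self, doc: set[str]) -> bool:
        return self.positive <= doc and not (self.negated & doc)


def negation_candidates(rule: Rule, docs: list[set[str]], labels: list[str]) -> set[str]:
    """Assumed heuristic: features seen in covered negative documents but in
    no covered positive document, so negating one cannot lose a true positive."""
    covered = [(d, y) for d, y in zip(docs, labels) if rule.covers(d)]
    in_pos = set().union(*(d for d, y in covered if y == rule.label))
    in_neg = set().union(*(d for d, y in covered if y != rule.label))
    return in_neg - in_pos


def refine_with_negation(rule: Rule, docs: list[set[str]], labels: list[str]) -> Rule:
    """Return a copy of `rule` with the single candidate that excludes the most
    covered negative documents added as a negated feature (ties: alphabetical)."""
    cands = negation_candidates(rule, docs, labels)
    if not cands:
        return rule
    covered_neg = [d for d, y in zip(docs, labels) if rule.covers(d) and y != rule.label]
    best = max(sorted(cands), key=lambda f: sum(f in d for d in covered_neg))
    return Rule(rule.label, set(rule.positive), rule.negated | {best})


# Toy corpus of phrase features (invented for illustration only).
docs = [
    {"interest rate", "central bank"},   # finance
    {"interest rate", "exchange rate"},  # finance
    {"interest rate", "crop yield"},     # other
    {"crop yield", "harvest"},           # other
]
labels = ["finance", "finance", "other", "other"]

rule = Rule("finance", {"interest rate"})
refined = refine_with_negation(rule, docs, labels)
print(sorted(refined.negated))             # ['crop yield']
print([refined.covers(d) for d in docs])   # [True, True, False, False]
```

The rule "IF interest rate THEN finance" initially also covers the third (non-finance) document. Adding "NOT crop yield" removes that false positive without excluding either finance document. Candidates are generated only from the documents a rule covers, never from the whole vocabulary. That matches the abstract's point that negating every possible feature up front is infeasible for large phrase-based feature spaces.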

References

Apté, C., Damerau, F. J., Weiss, S. M.: Automated learning of decision rules for text categorization. ACM Transactions on Information Systems 12, 233-251 (1994)
Bakus, J., Kamel, M.: Document classification using phrases. In: Caelli, T., Amin, A., Duin, R., de Ridder, D., Kamel, M. (eds.) Structural, Syntactic, and Statistical Pattern Recognition. Lecture Notes in Computer Science, vol. 2396, pp. 341-354. Springer, Berlin/Heidelberg (2002)
Chang, M., Poon, C. K.: Using phrases as features in email classification. Journal of Systems and Software 82, 1036-1045 (2009)
Chua, S., Coenen, F., Malcolm, G.: Classification Inductive Rule Learning with Negated Features. In: Proceedings of the 6th International Conference on Advanced Data Mining and Applications (ADMA’10), Part 1, Springer LNAI, pp. 125-136 (2010)
Cohen, W.: Fast effective rule induction. In: Proceedings of the 12th Int. Conf. on Machine Learning (ICML), pp. 115-123. Morgan Kaufmann (1995)
Fürnkranz, J., Mitchell, T., Riloff, E.: A case study in using linguistic phrases for text categorization on the WWW. In: Working Notes of the AAAI/ICML Workshop on Learning for Text Categorization, pp. 5-12. AAAI Press (1998)
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I. H.: The WEKA data mining software: An update. SIGKDD Explorations 11, 10-18 (2009)
Holmes, G., Trigg, L.: A diagnostic tool for tree based supervised classification learning algorithms. In: Proceedings of the 6th Int. Conf. on Neural Information Processing (ICONIP), pp. 514-519 (1999)
Joachims, T.: Text categorization with support vector machines: Learning with many relevant features. In: Proceedings of the 10th European Conf. on Machine Learning (ECML), pp. 137-142 (1998)
Johnson, D. E., Oles, F. J., Zhang, T., Goetz, T.: A decision-tree-based symbolic rule induction system for text categorization. IBM Systems Journal 41, 428-437 (2002)
Lang, K.: Newsweeder: Learning to filter netnews. In: Proceedings of the 12th Int. Conf. on Machine Learning, pp. 331-339 (1995)
Lewis, D. D.: Reuters-21578 text categorization test collection, Distribution 1.0, README file (v 1.3). Available at http://www.daviddlewis.com/resources/testcollections/reuters21578/readme.txt (2004)
Li, Z., Li, P., Wei, W., Liu, H., He, J., Liu, T., Du, X.: AutoPCS: A phrase-based text categorization system for similar texts. In: Li, Q., Feng, L., Pei, J., Wang, S., Zhou, X., Zhu, Q.-M. (eds.) Advances in Data and Web Management. Lecture Notes in Computer Science, vol. 5446, pp. 369-380. Springer, Berlin/Heidelberg (2009)
McCallum, A., Nigam, K.: A comparison of event models for naive Bayes text classification. In: Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, pp. 41-48 (1998)
Rullo, P., Cumbo, C., Policicchio, V. L.: Learning rules with negation for text categorization. In: Proceedings of the 22nd ACM Symposium on Applied Computing, pp. 409-416. ACM (2007)
Rullo, P., Policicchio, V., Cumbo, C., Iiritano, S.: Olex: Effective rule learning for text categorization. IEEE Transactions on Knowledge and Data Engineering 21, 1118-1132 (2009)
Scott, S., Matwin, S.: Feature engineering for text classification. In: Proceedings of the 16th Int. Conf. on Machine Learning (ICML), pp. 379-388 (1999)
Wang, Y. J.: Language-independent pre-processing of large documentbases for text classification. PhD thesis (2007)
Yang, Y., Liu, X.: A re-examination of text categorization methods. In: Proceedings of the 22nd ACM Int. Conf. on Research and Development in Information Retrieval, pp. 42-49 (1999)

Author information

Authors and Affiliations

Department of Computer Science, University of Liverpool, Ashton Building, Ashton Street, L69 3BX, Liverpool, UK
Stephanie Chua, Frans Coenen, Grant Malcolm, Matías Fernando & García Constantino

Corresponding author

Correspondence to Stephanie Chua.

Editor information

Editors and Affiliations

University of Portsmouth, Lion Terrace, Portsmouth, PO1 3HE, United Kingdom
Max Bramer
School of Computing & Mathematical Sciences, University of Greenwich, Park Row 30, London, SE10 9LS, United Kingdom
Miltos Petridis
School of Computing and Informatics, Nottingham Trent University, Burton Street, Nottingham, NG1 4BU, United Kingdom
Lars Nolle

About this paper

Cite this paper

Chua, S., Coenen, F., Malcolm, G., Fernando, M., Constantino, G. (2011). Using Negation and Phrases in Inducing Rules for Text Classification. In: Bramer, M., Petridis, M., Nolle, L. (eds) Research and Development in Intelligent Systems XXVIII. SGAI 2011. Springer, London. https://doi.org/10.1007/978-1-4471-2318-7_11

DOI: https://doi.org/10.1007/978-1-4471-2318-7_11
Published: 14 October 2011
Publisher Name: Springer, London
Print ISBN: 978-1-4471-2317-0
Online ISBN: 978-1-4471-2318-7
eBook Packages: Computer Science, Computer Science (R0)