Automatic Foldering of Email Messages: A Combination Approach

Tam, Tony; Ferreira, Artur; Lourenço, André

doi:10.1007/978-3-642-28997-2_20

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7224)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Automatic organization of email messages into folders is both an open problem and a challenge for machine learning techniques. Besides the effect of email overload, which affects many email users worldwide, there are growing difficulties caused by the semantics each user applies to their folders. The number of folders and their meanings are personal and, in many cases, pose difficulties for learning methods. This paper addresses the automatic organization of email messages into folders using supervised learning algorithms. The textual fields of the email message (subject and body) are considered for learning, with different representations, feature selection methods, and classifiers. The participant fields are embedded into a vector-space model representation. The classification decisions from the different email fields are combined by majority voting. Experiments on a subset of the Enron Corpus and on a private email data set show a significant improvement both over single classifiers on these fields and over previous work.
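Two mechanisms named in the abstract can be made concrete: embedding the participant fields in a vector-space model, and combining the decisions of per-field classifiers by majority voting. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The binary address encoding, the address normalization, the tie-breaking rule, and all function names (`build_vocabulary`, `participants_to_vector`, `majority_vote`) are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def normalize(address: str) -> str:
    # Hypothetical normalization: trim whitespace and case-fold the address.
    return address.strip().lower()


def build_vocabulary(messages: Iterable[Iterable[str]]) -> List[str]:
    # Collect every participant address seen in training (e.g. From/To/Cc),
    # sorted so that each address gets a stable vector dimension.
    return sorted({normalize(a) for participants in messages for a in participants})


def participants_to_vector(participants: Iterable[str],
                           vocabulary: Sequence[str]) -> List[int]:
    # Assumed binary vector-space embedding: component i is 1 if the
    # i-th vocabulary address takes part in the message, else 0.
    present = {normalize(a) for a in participants}
    return [1 if address in present else 0 for address in vocabulary]


def majority_vote(predictions: Sequence[str]) -> str:
    # Combine the folder predicted by each per-field classifier
    # (e.g. subject, body, participants). Ties go to the label that
    # appears first in `predictions`, which is an assumed rule.
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable")


if __name__ == "__main__":
    vocab = build_vocabulary([["alice@example.com", "bob@example.com"],
                              ["carol@example.com"]])
    print(participants_to_vector(["Bob@example.com"], vocab))  # [0, 1, 0]
    print(majority_vote(["projects", "personal", "projects"]))  # projects
```

With three field classifiers (for example subject, body, and participants), an odd number of voters rules out two-way ties, but a three-way split still needs a tie-breaking rule such as the first-listed one assumed here.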

Author information

Authors and Affiliations

Instituto Superior de Engenharia de Lisboa, Lisboa, Portugal
Tony Tam, Artur Ferreira & André Lourenço
Instituto de Telecomunicações, Lisboa, Portugal
Artur Ferreira & André Lourenço

Editor information

Editors and Affiliations

Yahoo! Research, Diagonal 177, 08018, Barcelona, Spain
Ricardo Baeza-Yates & B. Barla Cambazoglu
Centrum Wiskunde & Informatica, Science Park 123, Amsterdam, The Netherlands
Arjen P. de Vries
Websays, Nàpols 294 7-4, 08025, Barcelona, Spain
Hugo Zaragoza
Yahoo! Research, Diagonal 177, 08018, Barcelona, Spain
Vanessa Murdock
Yahoo! Labs, Tower 3, Matam Park, 31905, Haifa, Israel
Ronny Lempel
ISTI-CNR, via G. Moruzzi, 1, 56124, Pisa, Italy
Fabrizio Silvestri

About this paper

Cite this paper

Tam, T., Ferreira, A., Lourenço, A. (2012). Automatic Foldering of Email Messages: A Combination Approach. In: Baeza-Yates, R., et al. Advances in Information Retrieval. ECIR 2012. Lecture Notes in Computer Science, vol 7224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28997-2_20

DOI: https://doi.org/10.1007/978-3-642-28997-2_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28996-5
Online ISBN: 978-3-642-28997-2
eBook Packages: Computer Science, Computer Science (R0)
