Abstract
We propose a novel solution to the email classification problem: integrating temporal information with traditional content-based classification approaches. We discover temporal relations in an email sequence in the form of temporal sequential patterns and embed the discovered information into content-based learning methods. The resulting heterogeneous classification system performs well, reducing the classification error by up to 22%.
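The idea of embedding temporal information into a content-based representation can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify how the temporal sequential patterns are mined or encoded, so the sketch substitutes a much simpler temporal signal (which categories appeared shortly before the current message). The function names, the `window` parameter, and the `w=` / `prev=` feature prefixes are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def content_features(text):
    """Bag-of-words counts: the content-based part of the representation."""
    return Counter("w=" + tok for tok in text.lower().split())


def temporal_features(history, timestamp, window):
    """Binary features naming the categories seen shortly before a message.

    `history` is a list of (timestamp, category) pairs for earlier emails.
    A category is included if an email of that category arrived within
    `window` seconds before `timestamp`. This is an assumed, simplified
    stand-in for mined temporal sequential patterns.
    """
    recent = {cat for ts, cat in history if 0 < timestamp - ts <= window}
    return {"prev=" + cat: 1 for cat in sorted(recent)}


def combined_features(text, history, timestamp, window=3600):
    """Merge content and temporal features into one feature dictionary."""
    feats = dict(content_features(text))
    feats.update(temporal_features(history, timestamp, window))
    return feats


# Hypothetical email history: (arrival time in seconds, assigned category).
history = [(1000, "order"), (1500, "newsletter"), (9000, "order")]
feats = combined_features("Your order has shipped", history, timestamp=4000)
```

The combined dictionary could then be passed to any learner that accepts sparse features, so the temporal signal is handled in the same way as word counts.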
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Kiritchenko, S., Matwin, S., Abu-Hakima, S. (2004). Email Classification with Temporal Features. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds) Intelligent Information Processing and Web Mining. Advances in Soft Computing, vol 25. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39985-8_61
DOI: https://doi.org/10.1007/978-3-540-39985-8_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21331-4
Online ISBN: 978-3-540-39985-8