Abstract
Documents in wide use today include many atypical characteristics. In particular, non-standard language appears more frequently in e-mail documents than in other documents because of the extensive use of informal expressions such as slang and abbreviations. The results of automatic document classification may differ significantly depending on the characteristics of the documents being classified, as well as on the classifier’s performance. We propose a three-step preprocessing algorithm for accurate automatic classification into e-mail categories. This research identifies the characteristics of e-mail documents and applies the three-step preprocessing algorithm to minimize their atypical characteristics.
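The abstract does not spell out the three steps, so the sketch below is not the authors' algorithm. It only illustrates, under stated assumptions, what reducing informal expressions before classification could look like: the lookup table `INFORMAL_TO_STANDARD`, the token pattern, and the function name `normalize_email_text` are all hypothetical.

```python
import re
from typing import List

# Hypothetical lookup table of informal forms; the entries are illustrative
# and are not taken from the paper.
INFORMAL_TO_STANDARD = {
    "u": "you",
    "pls": "please",
    "thx": "thanks",
    "asap": "as soon as possible",
    "fyi": "for your information",
}

# Assumed tokenization: runs of lowercase letters, digits, and apostrophes.
TOKEN_PATTERN = re.compile(r"[a-z0-9']+")


def normalize_email_text(text: str) -> List[str]:
    """Lowercase the text, tokenize it, and expand known informal tokens."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    normalized: List[str] = []
    for token in tokens:
        # Unknown tokens pass through unchanged.
        replacement = INFORMAL_TO_STANDARD.get(token, token)
        # An expansion may contain several words, so split it back into tokens.
        normalized.extend(replacement.split())
    return normalized


if __name__ == "__main__":
    print(normalize_email_text("Pls reply ASAP, thx!"))
```

A classifier would then see the expanded tokens rather than the informal ones, which is one plausible way to make e-mail text look more like standard text; the paper's actual steps may differ.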
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jeong, OR., Cho, DS. (2005). A Three-Step Preprocessing Algorithm for Minimizing E-Mail Document’s Atypical Characteristics. In: Wang, L., Jin, Y. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2005. Lecture Notes in Computer Science, vol 3614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11540007_68
DOI: https://doi.org/10.1007/11540007_68
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28331-7
Online ISBN: 978-3-540-31828-6
eBook Packages: Computer Science, Computer Science (R0)