Abstract
OPTICS (Ordering Points To Identify the Clustering Structure) is an algorithm for finding density-based clusters in data. We introduce an adaptive dynamical clustering algorithm based on OPTICS. The algorithm is applied to clustering emails that are represented by quantitative profiles. Performance of the algorithm is assessed on the public email corpora TREC and CEAS.
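For orientation, the following is a minimal sketch of plain OPTICS, not of the adaptive variant introduced in the paper. It assumes Euclidean distance, a brute-force neighborhood search, and toy two-dimensional points standing in for quantitative profiles. The function names `optics` and `extract_clusters`, the parameters `min_pts`, `eps` and `threshold`, and the simple threshold cut of the reachability values are illustrative assumptions and are not taken from the paper.

```python
import heapq
import math


def optics(points, min_pts, eps=math.inf):
    """Return (ordering, reachability) for a list of coordinate tuples."""
    n = len(points)
    reach = [math.inf] * n        # reachability distance of each point
    processed = [False] * n
    ordering = []                 # cluster ordering produced by OPTICS

    def neighbors(i):
        # Brute-force eps-neighborhood query; the point itself is included.
        result = []
        for j in range(n):
            d = math.dist(points[i], points[j])
            if d <= eps:
                result.append((d, j))
        return result

    def core_distance(nbrs):
        # Distance to the min_pts-th nearest neighbor (the point counts as
        # its own neighbor); infinite if the point is not a core point.
        if len(nbrs) < min_pts:
            return math.inf
        return sorted(d for d, _ in nbrs)[min_pts - 1]

    def update(nbrs, core, seeds):
        # Lower the reachability of unprocessed neighbors where possible.
        for d, j in nbrs:
            if processed[j]:
                continue
            new_reach = max(core, d)
            if new_reach < reach[j]:
                reach[j] = new_reach
                # Lazy decrease-key: stale heap entries are skipped on pop.
                heapq.heappush(seeds, (new_reach, j))

    for start in range(n):
        if processed[start]:
            continue
        processed[start] = True
        ordering.append(start)
        nbrs = neighbors(start)
        core = core_distance(nbrs)
        if core == math.inf:
            continue
        seeds = []
        update(nbrs, core, seeds)
        while seeds:
            _, j = heapq.heappop(seeds)
            if processed[j]:
                continue
            processed[j] = True
            ordering.append(j)
            nbrs_j = neighbors(j)
            core_j = core_distance(nbrs_j)
            if core_j != math.inf:
                update(nbrs_j, core_j, seeds)
    return ordering, reach


def extract_clusters(ordering, reach, threshold):
    """Simplified extraction: cut the reachability sequence at a threshold.

    A point whose reachability exceeds the threshold starts a new group;
    groups with a single member are treated as noise.
    """
    clusters, current = [], []
    for i in ordering:
        if reach[i] > threshold:
            if len(current) > 1:
                clusters.append(current)
            current = [i]
        else:
            current.append(i)
    if len(current) > 1:
        clusters.append(current)
    return clusters


# Toy data: two tight groups of four points and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
ordering, reach = optics(points, min_pts=3)
clusters = extract_clusters(ordering, reach, threshold=2.0)
print(clusters)
```

On this toy input the two tight groups are recovered as clusters and the isolated point is left as noise. The fixed `threshold` is exactly the kind of hand-set parameter that an adaptive procedure would aim to avoid; how the paper's algorithm does so is not described in the abstract.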
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Špitalský, V., Grendár, M. (2013). OPTICS-Based Clustering of Emails Represented by Quantitative Profiles. In: Omatu, S., Neves, J., Rodriguez, J., Paz Santana, J., Gonzalez, S. (eds) Distributed Computing and Artificial Intelligence. Advances in Intelligent Systems and Computing, vol 217. Springer, Cham. https://doi.org/10.1007/978-3-319-00551-5_7
DOI: https://doi.org/10.1007/978-3-319-00551-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-00550-8
Online ISBN: 978-3-319-00551-5
eBook Packages: Engineering, Engineering (R0)