OPTICS-Based Clustering of Emails Represented by Quantitative Profiles

Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 217)

Abstract

OPTICS (Ordering Points To Identify the Clustering Structure) is an algorithm for finding density-based clusters in data. We introduce an adaptive dynamical clustering algorithm based on OPTICS. The algorithm is applied to clustering emails represented by quantitative profiles. Its performance is assessed on the public email corpora TREC and CEAS.
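
To make the density-based idea concrete, the following is a minimal sketch of the standard OPTICS ordering step in plain Python. It is not the adaptive dynamical variant introduced in the paper, and the toy two-dimensional "profiles", the function name `optics_order`, and the parameter values `eps=1.0` and `min_pts=2` are assumptions chosen purely for illustration.

```python
import heapq
import math


def optics_order(points, eps, min_pts):
    """Return (order, reach): the OPTICS visiting order and reachability distances.

    Illustrative sketch only: neighbors are found by a brute-force linear scan.
    """
    n = len(points)
    reach = [math.inf] * n      # inf stands for "undefined" reachability
    processed = [False] * n
    order = []

    def neighbors(i):
        # All points within eps of point i (including i), sorted by distance.
        found = []
        for j in range(n):
            d = math.dist(points[i], points[j])
            if d <= eps:
                found.append((d, j))
        return sorted(found)

    def core_distance(nbrs):
        # Distance to the min_pts-th nearest neighbor; None if i is not a core point.
        return nbrs[min_pts - 1][0] if len(nbrs) >= min_pts else None

    for start in range(n):
        if processed[start]:
            continue
        seeds = [(math.inf, start)]  # priority queue keyed by reachability
        while seeds:
            _, i = heapq.heappop(seeds)
            if processed[i]:
                continue             # stale queue entry, already handled
            processed[i] = True
            order.append(i)
            nbrs = neighbors(i)
            core = core_distance(nbrs)
            if core is None:
                continue
            for d, j in nbrs:
                new_reach = max(core, d)
                if not processed[j] and new_reach < reach[j]:
                    reach[j] = new_reach
                    heapq.heappush(seeds, (new_reach, j))
    return order, reach


# Hypothetical 2-D profiles: two dense groups, interleaved, plus one outlier.
profiles = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (10.0, 0.0),
            (5.1, 5.0), (0.0, 0.1), (5.0, 5.1)]
order, reach = optics_order(profiles, eps=1.0, min_pts=2)
print(order)  # ordering: [0, 2, 5, 1, 4, 6, 3]
```

Points from the same dense group end up adjacent in the ordering, with small reachability distances, while the first point of each group and the outlier keep an undefined (infinite) reachability. Clusters are then read off as "valleys" in the reachability values taken in this order.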

Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  1. Slovanet a.s., Bratislava, Slovakia
