Advertisement

Active Distance-Based Clustering Using K-Medoids

  • Amin Aghaee
  • Mehrdad Ghadiri
  • Mahdieh Soleymani Baghshah
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9651)

Abstract

k-medoids algorithm is a partitional, centroid-based clustering algorithm which uses pairwise distances of data points and tries to directly decompose the dataset with n points into a set of k disjoint clusters. However, k-medoids itself requires all distances between data points that are not so easy to get in many applications. In this paper, we introduce a new method which requires only a small proportion of the whole set of distances and makes an effort to estimate an upper-bound for unknown distances using the inquired ones. This algorithm makes use of the triangle inequality to calculate an upper-bound estimation of the unknown distances. Our method is built upon a recursive approach to cluster objects and to choose some points actively from each bunch of data and acquire the distances between these prominent points from oracle. Experimental results show that the proposed method using only a small subset of the distances can find proper clustering on many real-world and synthetic datasets.

Keywords

Active k-medoids Active clustering Distance-based clustering Centroid-based clustering 

References

  1. 1.
    Amsterdam library of object images (aloi) (2004). http://aloi.science.uva.nl/
  2. 2.
    Example data sets for elki (2013). http://elki.dbs.ifi.lmu.de/wiki/DataSets
  3. 3.
    Arbelaez, A., Quesada, L.: Parallelising the k-meds clustering problem using space-partitioning. In: Proceedings of the Sixth Annual Symposium on Combinatorial Search, SOCS, Leavenworth, Washington, USA, 11–13 July 2013Google Scholar
  4. 4.
    Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pp. 1027–1035, New Orleans, Louisiana, USA, 7–9 January 2007Google Scholar
  5. 5.
    Asuncion, A., Newman, D.: UCI machine learning repository datasets (2007). https://archive.ics.uci.edu/ml/datasets.html
  6. 6.
    Basu, S., Banerjee, A., Mooney, R.J.: Active semi-supervision for pairwise constrained clustering. In: Proceedings of the Fourth SIAM International Conference on Data Mining, pp. 333–344, Lake Buena Vista, Florida, USA, 22–24 April 2004Google Scholar
  7. 7.
    Biswas, A., Jacobs, D.W.: Active image clustering with pairwise constraints from humans. Int. J. Comput. Vis. 108(1–2), 133–147 (2014)MathSciNetCrossRefzbMATHGoogle Scholar
  8. 8.
    Chen, M.: Synthesized dataset for k-medoids. http://www.mathworks.com/matlabcentral/fileexchange/28898-k-medoids/
  9. 9.
    Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. The MIT Press, Cambridge (2009)zbMATHGoogle Scholar
  10. 10.
    Eriksson, B., Dasarathy, G., Singh, A., Nowak, R.D.: Active clustering: robust and efficient hierarchical clustering using adaptively selected similarities. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, AISTATS, pp. 260–268, Fort Lauderdale, USA, 11–13 April 2011Google Scholar
  11. 11.
    Grira, N., Crucianu, M., Boujemaa, N.: Active semi-supervised fuzzy clustering. Pattern Recogn. 41(5), 1834–1844 (2008)CrossRefzbMATHGoogle Scholar
  12. 12.
    Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, Hoboken (1990)CrossRefGoogle Scholar
  13. 13.
    Krishnamurthy, A., Balakrishnan, S., Xu, M., Singh, A.: Efficient active algorithms for hierarchical clustering. In: Proceedings of the 29th International Conference on Machine Learning, ICML, Edinburgh, Scotland, UK, June 26-July 1, 2012Google Scholar
  14. 14.
    Mai, S.T., He, X., Hubig, N., Plant, C., Böhm, C.: Active density-based clustering. In: IEEE 13th International Conference on Data Mining, pp. 508–517, Dallas, TX, USA, 7–10 December 2013Google Scholar
  15. 15.
    Mobahi, H., Collobert, R., Weston, J.: Deep learning from temporal coherence in video. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 737–744, ICML, Montreal, Quebec, Canada, 14–18 June 2009Google Scholar
  16. 16.
    Nayar, S., Nene, S.A., Murase, H.: Columbia object image library (coil 100). Department of Computer Science, Columbia University, Technical report, CUCS-006-96 (1996)Google Scholar
  17. 17.
    Nguyen, X.V., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: is a correction for chance necessary? In: Proceedings of the 26th Annual International Conference on Machine Learning, ICML, pp. 1073–1080, Montreal, Quebec, Canada, 14–18 June 2009Google Scholar
  18. 18.
    Settles, B.: Active learning literature survey. Univ. Wis. Madison 52(55–66), 11 (2010)Google Scholar
  19. 19.
    Shamir, O., Tishby, N.: Spectral clustering on a budget. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, AISTATS, pp. 661–669, Fort Lauderdale, USA, 11–13 April 2011Google Scholar
  20. 20.
    Singh, S.S., Chauhan, N.: K-means v/s k-medoids: a comparative study. In: National Conference on Recent Trends in Engineering and Technology, vol. 13 (2011)Google Scholar
  21. 21.
    Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. J. Mach. Learn. Res. 2, 45–66 (2001)zbMATHGoogle Scholar
  22. 22.
    Ultsch, A.: Fundamental clustering problems suite (fcps). Technical report, University of Marburg (2005)Google Scholar
  23. 23.
    Voevodski, K., Balcan, M., Röglin, H., Teng, S., Xia, Y.: Active clustering of biological sequences. J. Mach. Learn. Res. 13, 203–225 (2012)MathSciNetzbMATHGoogle Scholar
  24. 24.
    Vu, V., Labroche, N., Bouchon-Meunier, B.: Improving constrained clustering with active query selection. Pattern Recogn. 45(4), 1749–1758 (2012)CrossRefGoogle Scholar
  25. 25.
    Wagstaff, K., Cardie, C., Rogers, S., Schrödl, S.: Constrained k-means clustering with background knowledge. In: Proceedings of the Eighteenth International Conference on Machine Learning (ICML), Williams College, pp. 577–584, Williamstown, MA, USA, June 28–July 1, 2001Google Scholar
  26. 26.
    Wang, X., Davidson, I.: Active spectral clustering. In: ICDM 2010, The 10th IEEE International Conference on Data Mining, pp. 561–568, Sydney, Australia, 14–17 December 2010Google Scholar
  27. 27.
    Wauthier, F.L., Jojic, N., Jordan, M.I.: Active spectral clustering via iterative uncertainty reduction. In: The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2012, pp. 1339–1347, Beijing, China, 12–16 August 2012Google Scholar
  28. 28.
    Xiong, C., Johnson, D.M., Corso, J.J.: Active clustering with model-based uncertainty reduction. CoRR, abs/1402.1783 (2014)Google Scholar
  29. 29.
    Chen, Y., Keogh, E., Batista, G.: UCR time series classification archive (2015). http://www.cs.ucr.edu/~eamonn/time_series_data/

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. 1.Department of Computer EngineeringSharif University of TechnologyTehranIran

Personalised recommendations