Fading affect bias: improving the trade-off between accuracy and efficiency in feature clustering

  • Ziyin Wang
  • Sepehr Farhand
  • Gavriil Tsechpenakis
Special Issue Paper

Abstract

We present a fast and accurate center-based, single-pass clustering method, focused on improving the trade-off between accuracy and speed in computer vision problems such as building visual vocabularies. We use a stochastic mean-shift procedure to seek the local density peaks within a single pass over the data. We also present a dynamic kernel generation scheme, along with a density test that finds the most promising kernel initializations. Our algorithm uses two data structures: a dictionary of permanent kernels, and a short memory that determines which emerging kernels to maintain and which outliers to discard. We further develop a hierarchical realization of the algorithm that executes faster than most stream clustering algorithms, and whose resulting tree serves as an efficient data structure to speed up the encoding procedure. We use our method to learn low-level image features on the fly from unmanned aerial vehicle cameras, aiming at vision-driven maneuvering with reduced computational cost. In our experiments, we compare extensively against popular clustering algorithms with respect to accuracy and efficiency. Our algorithm shows improved accuracy and speed on datasets with sufficient cluster patterns. With noisy visual features, where natural clusters present inherent challenges (intra-cluster variability and inter-cluster similarities), we achieve high accuracy compared to algorithms of higher complexity, while maintaining high efficiency: only one competing method achieved lower run times, though with lower accuracy.
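The two-structure design described in the abstract — a dictionary of permanent kernels plus a short memory that either promotes emerging kernels or discards outliers — can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name, the fixed `bandwidth`, learning rate `lr`, and `min_support` promotion threshold are all illustrative assumptions, standing in for the paper's dynamic kernel generation and density test.

```python
import numpy as np

def single_pass_cluster(stream, bandwidth=1.0, lr=0.1, min_support=3):
    """Illustrative single-pass, center-based clustering sketch.

    Two structures, loosely mirroring the abstract:
    - kernels: permanent cluster centers (the "dictionary"),
    - memory:  recent unassigned points with support counts
               (the "short memory"); entries either gather enough
               support to become kernels or are discarded implicitly.
    """
    kernels = []   # permanent kernel centers
    memory = []    # list of [center, support_count]
    for x in stream:
        x = np.asarray(x, dtype=float)
        if kernels:
            dists = [np.linalg.norm(x - k) for k in kernels]
            i = int(np.argmin(dists))
            if dists[i] < bandwidth:
                # stochastic mean-shift step: nudge the nearest
                # kernel toward the incoming sample
                kernels[i] = kernels[i] + lr * (x - kernels[i])
                continue
        # no nearby permanent kernel: accumulate support in short memory
        matched = False
        for m in memory:
            if np.linalg.norm(x - m[0]) < bandwidth:
                m[0] = m[0] + lr * (x - m[0])
                m[1] += 1
                matched = True
                break
        if not matched:
            memory.append([x, 1])
        # crude density test: promote well-supported entries to kernels
        keep = []
        for m in memory:
            if m[1] >= min_support:
                kernels.append(m[0])
            else:
                keep.append(m)
        memory = keep
    return kernels
```

On a stream containing two well-separated dense blobs, the sketch forms one permanent kernel per blob after a handful of samples, while isolated points linger in the short memory and never reach the promotion threshold.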

Keywords

Non-parametric clustering · Stream clustering · Unsupervised detection


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Indiana University - Purdue University Indianapolis, Indianapolis, USA