
Fading affect bias: improving the trade-off between accuracy and efficiency in feature clustering

  • Special Issue Paper
  • Published in Machine Vision and Applications

Abstract

We present a fast and accurate center-based, single-pass clustering method, with a focus on improving the trade-off between accuracy and speed in computer vision problems such as creating visual vocabularies. We use a stochastic mean-shift procedure to seek the local density peaks within a single pass over the data. We also present a dynamic kernel generation scheme, along with a density test procedure that finds the most promising kernel initializations. Our algorithm uses two data structures: a dictionary of permanent kernels, and a short memory that determines which emerging kernels to maintain and which outliers to discard. We further develop a hierarchical realization of the algorithm such that it executes faster than most stream clustering algorithms, and the resulting tree serves as an efficient data structure that accelerates the encoding procedure. We use our method for learning low-level image features on the fly from unmanned aerial vehicle cameras, aiming at vision-driven maneuvering with reduced computational cost. In our experiments, we make extensive comparisons with popular clustering algorithms with respect to accuracy and efficiency. Our algorithm showed improved accuracy and speed on datasets with sufficient cluster patterns. With noisy visual features, where natural clusters present inherent challenges (intra-cluster variability and inter-cluster similarities), we achieved high accuracy compared to algorithms of higher complexity, while maintaining high efficiency: only one competing method achieved lower run times, and it did so with lower accuracy.
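The paper's full algorithm is not reproduced on this page; the following is a minimal, illustrative Python sketch of the two-structure design the abstract outlines — a dictionary of permanent kernels updated by a mean-shift-like step, plus a short memory from which emerging kernels are promoted and outliers evicted. Every name and threshold here (StreamClusterer, bandwidth, promote_after, memory_size) is an assumption for illustration, not the authors' implementation.

    import numpy as np

    class StreamClusterer:
        """Sketch of a single-pass, center-based stream clusterer with a
        dictionary of permanent kernels and a short memory buffer.
        Parameter names and thresholds are illustrative, not the paper's."""

        def __init__(self, bandwidth=1.0, promote_after=5, memory_size=50):
            self.bandwidth = bandwidth          # kernel radius for assignment
            self.promote_after = promote_after  # hits needed to promote a candidate
            self.memory_size = memory_size      # capacity of the short memory
            self.kernels = []                   # permanent kernels: (center, count)
            self.memory = []                    # emerging candidates: (center, hits)

        def _nearest(self, pool, x):
            """Index and distance of the pool entry closest to x, or (None, inf)."""
            if not pool:
                return None, np.inf
            d = [np.linalg.norm(c - x) for c, _ in pool]
            i = int(np.argmin(d))
            return i, d[i]

        def update(self, x):
            """Process one data point in a single pass."""
            x = np.asarray(x, dtype=float)

            # 1. Shift the nearest permanent kernel toward the incoming point
            #    (a running-mean step toward the local density peak).
            i, d = self._nearest(self.kernels, x)
            if i is not None and d < self.bandwidth:
                c, n = self.kernels[i]
                self.kernels[i] = (c + (x - c) / (n + 1), n + 1)
                return

            # 2. Otherwise update or add a candidate in the short memory.
            j, d = self._nearest(self.memory, x)
            if j is not None and d < self.bandwidth:
                c, hits = self.memory[j]
                c = c + (x - c) / (hits + 1)
                if hits + 1 >= self.promote_after:      # enough support: treat as
                    self.kernels.append((c, hits + 1))  # a real density peak
                    self.memory.pop(j)
                else:
                    self.memory[j] = (c, hits + 1)
            else:
                self.memory.append((x, 1))
                if len(self.memory) > self.memory_size:
                    self.memory.pop(0)  # evict the oldest candidate as an outlier

Typical use under these assumptions: construct clf = StreamClusterer(bandwidth=0.5), call clf.update(x) for each incoming feature vector, and read the learned centers from clf.kernels afterward.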


Notes

  1. \(n_k\) in Eq. (1) denotes the number of data points that previously contributed to updating kernel k, while \(N_k\) is the number of data points inside the current kernel location.

  2. \(p_f\) and \(\theta \) are related to each other: for smaller values of \(\theta \), higher values of \(\frac{N_k}{N}\) should be considered.

  3. If \(\Phi (x) = 0.99\), then \(x \approx 2.33\) (see the numerical check after these notes).
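As a quick check of note 3 — assuming \(\Phi \) denotes the standard normal CDF, since the paper's own definition is not shown on this page — SciPy's inverse CDF gives the quoted quantile:

    # Check of note 3, assuming Phi is the standard normal CDF.
    from scipy.stats import norm

    x = norm.ppf(0.99)     # inverse CDF (quantile function) at 0.99
    print(f"x = {x:.4f}")  # prints x = 2.3263, i.e. x ~ 2.33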


Author information

Correspondence to Gavriil Tsechpenakis.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Wang, Z., Farhand, S. & Tsechpenakis, G. Fading affect bias: improving the trade-off between accuracy and efficiency in feature clustering. Machine Vision and Applications 30, 255–268 (2019). https://doi.org/10.1007/s00138-019-01008-w

