Abstract
We present a fast and accurate center-based, single-pass clustering method, focused on improving the trade-off between accuracy and speed in computer vision problems such as building visual vocabularies. We use a stochastic mean-shift procedure to seek local density peaks within a single pass over the data. We also present a dynamic kernel generation scheme, along with a density test that identifies the most promising kernel initializations. The algorithm maintains two data structures: a dictionary of permanent kernels, and a short memory used to decide which emerging kernels to keep and which outliers to discard. We further develop a hierarchical realization of the algorithm that runs faster than most stream clustering algorithms, and the resulting tree serves as an efficient data structure that speeds up the encoding procedure. We apply our method to learning low-level image features on the fly from unmanned aerial vehicle cameras, aiming at vision-driven maneuvering with reduced computational cost. In our experiments, we compare extensively against popular clustering algorithms with respect to accuracy and efficiency. Our algorithm shows improved accuracy and speed on datasets with sufficiently distinct cluster patterns. On noisy visual features, where natural clusters present inherent challenges (intra-cluster variability and inter-cluster similarity), it achieves high accuracy compared to algorithms of higher complexity while maintaining high efficiency: only one competing method achieved lower run times, and it did so with lower accuracy.
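The abstract's core mechanism (permanent kernels plus a short memory for emerging kernels, updated in one pass) can be illustrated with a minimal 1-D sketch. This is not the paper's implementation: the names `single_pass_cluster`, `promote_after`, and the fixed-radius running-mean update are assumptions standing in for the stochastic mean-shift step and density test described above.

```python
def _nearest(kernels, x, radius):
    """Return the kernel [center, count] closest to x within `radius`,
    or None if no center is that close (1-D for illustration)."""
    best, best_d = None, radius
    for k in kernels:
        d = abs(k[0] - x)
        if d <= best_d:
            best, best_d = k, d
    return best


def single_pass_cluster(stream, radius=1.0, promote_after=3):
    """Single pass over `stream`: each point either shifts the nearest
    kernel toward the local density peak (running-mean step) or seeds a
    candidate kernel in short memory. Candidates that attract
    `promote_after` points are promoted to the permanent dictionary;
    the rest are eventually discarded as outliers."""
    permanent = []     # permanent kernels: [center, count]
    short_memory = []  # emerging candidate kernels: [center, count]
    for x in stream:
        k = _nearest(permanent, x, radius) or _nearest(short_memory, x, radius)
        if k is None:
            short_memory.append([x, 1])  # new candidate kernel
            continue
        k[0] = (k[0] * k[1] + x) / (k[1] + 1)  # shift center toward x
        k[1] += 1
        if k in short_memory and k[1] >= promote_after:
            short_memory.remove(k)
            permanent.append(k)  # promote emerging kernel
    return [c for c, _ in permanent]
```

On a stream with two well-separated groups, e.g. points near 0 and near 10, the sketch ends with two permanent kernels centered near those density peaks, while isolated points never leave short memory.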
Notes
\(n_k\) in Eq. (1) denotes the number of data points that previously contributed to updating kernel k, while \(N_k\) is the number of data points inside the current kernel location.
\(p_f\) and \(\theta \) are related to each other: for smaller values of \(\theta \), higher values for \(\frac{N_k}{N}\) should be considered.
If \(\Phi (x) = 0.99\), then \(x = 2.23\).
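Eq. (1) itself is not reproduced on this page, but the first footnote suggests a weighted running mean: the old center, weighted by its \(n_k\) past contributors, combined with the \(N_k\) points currently inside the kernel. A minimal sketch of that hypothetical update follows; the exact form in the paper may differ, and `kernel_update` is an illustrative name, not the authors' code.

```python
def kernel_update(center, n_k, window_points):
    """Hypothetical form of the Eq. (1) update: a weighted mean of the
    old center (weight n_k, its past contributors) and the N_k points
    currently inside the kernel window. Illustrates only the roles of
    n_k and N_k described in the footnote."""
    N_k = len(window_points)
    new_center = (n_k * center + sum(window_points)) / (n_k + N_k)
    return new_center, n_k + N_k  # updated center and contributor count
```

For example, a kernel at 0.0 with two past contributors, seeing two new points at 3.0, moves to the weighted mean 1.5 and now counts four contributors.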
Cite this article
Wang, Z., Farhand, S. & Tsechpenakis, G. Fading affect bias: improving the trade-off between accuracy and efficiency in feature clustering. Machine Vision and Applications 30, 255–268 (2019). https://doi.org/10.1007/s00138-019-01008-w