Learning Feature Representations with K-Means

  • Adam Coates
  • Andrew Y. Ng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7700)


Many algorithms are available to learn deep hierarchies of features from unlabeled data, especially images. In many cases, these algorithms involve multi-layered networks of features (e.g., neural networks) that are sometimes tricky to train and tune and are difficult to scale up to many machines effectively. Recently, it has been found that K-means clustering can be used as a fast alternative training method. The main advantage of this approach is that it is very fast and easily implemented at large scale. On the other hand, employing this method in practice is not completely trivial: K-means has several limitations, and care must be taken to combine the right ingredients to get the system to work well. This chapter will summarize recent results and technical tricks that are needed to make effective use of K-means clustering for learning large-scale representations of images. We will also connect these results to other well-known algorithms to make clear when K-means can be most useful and convey intuitions about its behavior that are useful for debugging and engineering new systems.
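
As a rough illustration of the idea of using K-means as a feature learner, the sketch below runs plain Lloyd-style K-means on a handful of toy vectors standing in for flattened image patches, then encodes each patch by its nearest centroid. This is a minimal sketch under stated assumptions, not the procedure from the chapter: the function names (`kmeans`, `nearest`, `encode`), the random initialisation, the toy data, and the hard one-hot encoding are all hypothetical choices made for illustration, and none of the preprocessing or tuning that the abstract alludes to is included.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def kmeans(points, k, iterations=10, seed=0):
    """Plain Lloyd-style K-means on a list of equal-length float vectors."""
    rng = random.Random(seed)
    # Assumption: initialise centroids from k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def encode(p, centroids):
    """Hard-assignment (one-hot) feature vector for point p."""
    code = [0.0] * len(centroids)
    code[nearest(p, centroids)] = 1.0
    return code


# Toy "patches": six 2-D vectors (real patches would be flattened pixel blocks).
patches = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
           [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
centroids = kmeans(patches, k=2)
features = [encode(p, centroids) for p in patches]
```

Here the learned centroids play the role of a feature dictionary, and `features` holds one K-dimensional code per patch. The result depends on the initialisation, which is one small instance of the care the abstract says is needed to make K-means work well in practice.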


Keywords (added by machine, not by the authors): Image Patch, Unlabeled Data, Sparse Code, Convolutional Neural Network, Neural Information Processing System





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Adam Coates (Stanford University, Stanford, USA)
  • Andrew Y. Ng (Stanford University, Stanford, USA)
