Knowledge and Information Systems, Volume 54, Issue 1, pp 65–94

Iterative column subset selection

  • Bruno Ordozgoiti
  • Sandra Gómez Canaval
  • Alberto Mozo
Regular Paper


Dimensionality reduction is often a crucial step for the successful application of machine learning and data mining methods. One way to achieve this reduction is feature selection. Because many data sets cannot be labelled, unsupervised approaches are frequently the only option. The column subset selection problem lends itself naturally to this purpose and has received considerable attention over the last few years, as it provides simple linear models for low-rank data reconstruction. Recently, it was empirically shown that an iterative algorithm, which can be implemented efficiently, provides better subsets than other state-of-the-art methods. In this paper, we describe this algorithm and provide a more in-depth analysis. We carry out numerous experiments to gain insight into its behaviour and derive a simple bound for the norm recovered by the resulting matrix. To the best of our knowledge, this is the first theoretical result of this kind for this algorithm.
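The abstract does not spell out the algorithm, so the following is only a rough illustration under stated assumptions. It takes the column subset selection objective to be the squared Frobenius norm of A recovered by projecting onto k of its columns, ||C C^+ A||_F^2 with C = A[:, cols], and searches for a good subset with a simple swap-one-column-at-a-time local search. The function names `recovered_norm` and `iterative_css` and the swap rule itself are hypothetical; this is not the authors' exact method or their efficient implementation.

```python
import numpy as np


def recovered_norm(A, cols):
    """Squared Frobenius norm of A projected onto the span of A[:, cols].

    Equivalent to ||A||_F^2 - ||A - C C^+ A||_F^2 with C = A[:, cols].
    """
    C = A[:, cols]
    # X = C^+ A solves min ||C X - A||_F, so C @ X is the projection of A.
    X, *_ = np.linalg.lstsq(C, A, rcond=None)
    return float(np.linalg.norm(C @ X, "fro") ** 2)


def iterative_css(A, k, max_sweeps=20, seed=0):
    """Hypothetical swap-based local search for k columns of A.

    Each sweep visits every position of the subset, holds the other k - 1
    columns fixed and swaps in the column that recovers the most norm.
    A swap happens only on strict improvement, so the objective never
    decreases; the search stops after a sweep with no swaps.
    """
    n = A.shape[1]
    rng = np.random.default_rng(seed)
    cols = [int(j) for j in rng.choice(n, size=k, replace=False)]
    tol = 1e-12 * max(1.0, np.linalg.norm(A) ** 2)
    for _ in range(max_sweeps):
        changed = False
        for i in range(k):
            others = cols[:i] + cols[i + 1:]
            current = recovered_norm(A, cols)
            candidates = [j for j in range(n) if j not in others]
            best = max(candidates, key=lambda j: recovered_norm(A, others + [j]))
            if recovered_norm(A, others + [best]) > current + tol:
                cols[i] = best
                changed = True
        if not changed:
            break
    return cols


# Toy example: the two largest-norm columns of a diagonal matrix are optimal.
A = np.diag([10.0, 5.0, 1.0, 0.5])
cols = iterative_css(A, k=2)
print(sorted(cols), round(recovered_norm(A, cols), 6))  # prints [0, 1] 125.0
```

Each candidate evaluation here solves a full least-squares problem, which keeps the sketch short but is far slower than an implementation that updates the projection incrementally; the efficient implementation mentioned in the abstract is not reproduced here.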


Keywords: Column subset selection; Unsupervised feature selection; Dimensionality reduction; Machine learning; Data mining



We would like to thank José Ramón Sánchez Couso for the valuable discussions on the theoretical analysis. The research leading to these results has received funding from the European Union under the FP7 Grant Agreement No. 619633 (project ONTIC) and H2020 Grant Agreement No. 671625 (project CogNet).


  1. 1.
    Altschuler J, Bhaskara A, Fu G, Mirrokni V, Rostamizadeh A, Zadimoghaddam M (2016) Greedy column subset selection: new bounds and distributed algorithms. In: International conference on machine learning, pp 2539–2548Google Scholar
  2. 2.
    Arai H, Maung C, Schweitzer H (2015) Optimal column subset selection by a-star search. In: Twenty-ninth AAAI conference on artificial intelligenceGoogle Scholar
  3. 3.
    Bertin-Mahieux T, Ellis DP, Whitman B, Lamere P (2011) The million song dataset. In: Proceedings of the 12th international conference on music information retrieval (ISMIR 2011)Google Scholar
  4. 4.
    Boutsidis C, Drineas P, Magdon-Ismail M (2014) Near-optimal column-based matrix reconstruction. SIAM J Comput 43(2):687–717MathSciNetCrossRefzbMATHGoogle Scholar
  5. 5.
    Boutsidis C, Mahoney MW, Drineas P (2008) Unsupervised feature selection for principal components analysis. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 61–69Google Scholar
  6. 6.
    Boutsidis C, Mahoney MW, Drineas P (2009) An improved approximation algorithm for the column subset selection problem. In: Proceedings of the 20th annual ACM-SIAM symposium on discrete algorithms, Society for Industrial and Applied Mathematics, pp 968–977Google Scholar
  7. 7.
    Businger P, Golub GH (1965) Linear least squares solutions by householder transformations. Numer Math 7(3):269–276MathSciNetCrossRefzbMATHGoogle Scholar
  8. 8.
    Buza K (2014) Feedback Prediction for Blogs. In: Spiliopoulou M, Schmidt-Thieme L, Janning R (eds) Data analysis, machine learning and knowledge discovery. Studies in classification, Data analysis, and knowledge organization, Springer, Cham, pp 145–152Google Scholar
  9. 9.
    Cai D, Zhang C, He X (2010) Unsupervised feature selection for multi-cluster data. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 333–342Google Scholar
  10. 10.
    Chan TF (1987) Rank revealing QR factorizations. Linear Algebra Appl 88:67–82MathSciNetzbMATHGoogle Scholar
  11. 11.
    Chan TF, Hansen PC (1992) Some applications of the rank revealing QR factorization. SIAM J Sci Stat Comput 13(3):727–741MathSciNetCrossRefzbMATHGoogle Scholar
  12. 12.
    Civril A, Magdon-Ismail M (2012) Column subset selection via sparse approximation of SVD. Theor Comput Sci 421:1–14MathSciNetCrossRefzbMATHGoogle Scholar
  13. 13.
    Dy JG, Brodley CE (2004) Feature selection for unsupervised learning. J Mach Learn Res 5:845–889MathSciNetzbMATHGoogle Scholar
  14. 14.
    Farahat AK, Elgohary A, Ghodsi A, Kamel MS (2013) Distributed column subset selection on mapreduce. In: Data mining (ICDM), 2013 IEEE 13th international conference on, IEEE, pp 171–180Google Scholar
  15. 15.
    Farahat AK, Ghodsi A, Kamel MS (2011) An efficient greedy method for unsupervised feature selection. In: Data mining (ICDM), 2011 IEEE 11th international conference on, IEEE, pp 161–170Google Scholar
  16. 16.
    Fernandes K, Vinagre P, Cortez P (2015) A proactive intelligent decision support system for predicting the popularity of online news. In: Pereira F, Machado P, Costa E, Cardoso A (eds) Progress in artificial intelligence, EPIA, vol 9273. Lecture Notes in Computer Science. Springer, Cham, pp 535–546Google Scholar
  17. 17.
    Foster LV (1986) Rank and null space calculations using matrix decomposition without column interchanges. Linear Algebra Appl 74:47–71MathSciNetCrossRefzbMATHGoogle Scholar
  18. 18.
    Georghiades AS, Belhumeur PN, Kriegman DJ (2001) From few to many: illumination cone models for face recognition under variable lighting and pose. IEEE Trans Pattern Anal Mach Intell 23(6):643–660CrossRefGoogle Scholar
  19. 19.
    Golub G (1965) Numerical methods for solving linear least squares problems. Numer Math 7(3):206–216MathSciNetCrossRefzbMATHGoogle Scholar
  20. 20.
    Golub GH, Reinsch C (1970) Singular value decomposition and least squares solutions. Numer Math 14(5):403–420MathSciNetCrossRefzbMATHGoogle Scholar
  21. 21.
    Golub GH, Van Loan CF (2012) Matrix computations, vol 3. JHU Press, Baltimore, p 290Google Scholar
  22. 22.
    Gu M, Eisenstat SC (1996) Efficient algorithms for computing a strong rank-revealing qr factorization. SIAM J Sci Comput 17(4):848–869MathSciNetCrossRefzbMATHGoogle Scholar
  23. 23.
    Guruswami V, Sinop AK (2012) Optimal column-based low-rank matrix reconstruction. In: Proceedings of the twenty-third annual ACM-SIAM symposium on discrete algorithms, SIAM, pp 1207–1214Google Scholar
  24. 24.
    Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182zbMATHGoogle Scholar
  25. 25.
    He X, Cai D, Niyogi P (2005) Laplacian score for feature selection. In: Advances in neural information processing systems, pp 507–514Google Scholar
  26. 26.
    Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507MathSciNetCrossRefzbMATHGoogle Scholar
  27. 27.
    Jolliffe I (2002) Principal component analysis. Wiley Online Library, New YorkzbMATHGoogle Scholar
  28. 28.
    Lee K-C, Ho J, Kriegman DJ (2005) Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans Pattern Anal Mach Intell 27(5):684–698CrossRefGoogle Scholar
  29. 29.
    Lichman M (2013) UCI Machine Learning Repository. Irvine, CA. Accessed 24 Oct 2017
  30. 30.
    Mahoney MW, Drineas P (2009) Cur matrix decompositions for improved data analysis. Proc Natl Acad Sci 106(3):697–702MathSciNetCrossRefzbMATHGoogle Scholar
  31. 31.
    Meyer CD Jr (1973) Generalized inversion of modified matrices. SIAM J Appl Math 24(3):315–323MathSciNetCrossRefzbMATHGoogle Scholar
  32. 32.
    Mitra P, Murthy C, Pal SK (2002) Unsupervised feature selection using feature similarity. IEEE Trans Pattern Anal Mach Intell 24(3):301–312CrossRefGoogle Scholar
  33. 33.
    Nene SA, Nayar SK, Murase H (1996) Columbia object image library (coil-20). Technical Report CUCS-005-96, Columbia UniversityGoogle Scholar
  34. 34.
    Ordozgoiti B, Canaval SG, Mozo A (2016) A fast iterative algorithm for improved unsupervised feature selection. In: Data mining (ICDM), 2016 IEEE 16th international conference on, IEEE, pp 390–399Google Scholar
  35. 35.
    Papailiopoulos D, Kyrillidis A, Boutsidis C (2014) Provable deterministic leverage score sampling. In: Proceedings of the 20th ACM SIGKDD, ACM, pp 997–1006Google Scholar
  36. 36.
    Paul S, Magdon-Ismail M, Drineas P (2015) Column selection via adaptive sampling. In: Advances in neural information processing systems, pp 406–414Google Scholar
  37. 37.
    Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238CrossRefGoogle Scholar
  38. 38.
    Pudil P, Novovičová J, Kittler J (1994) Floating search methods in feature selection. Pattern Recognit Lett 15(11):1119–1125CrossRefGoogle Scholar
  39. 39.
    Samaria FS, Harter AC (1994) Parameterisation of a stochastic model for human face identification. In: Proceedings of the second IEEE workshop on applications of computer vision, 1994, IEEE, pp 138–142Google Scholar
  40. 40.
    Yu L, Liu H (2004) Efficient feature selection via analysis of relevance and redundancy. J Mach Learn Res 5:1205–1224MathSciNetzbMATHGoogle Scholar
  41. 41.
    Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Franklin MJ, Shenker S, Stoica I (2012) Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX conference on networked systems design and implementation, USENIX Association, pp 2–2Google Scholar
  42. 42.
    Zhao Z, Liu H (2007) Spectral feature selection for supervised and unsupervised learning. In: Proceedings of the 24th international conference on machine learning, ACM, pp 1151–1157Google Scholar
  43. 43.
    Zhu P, Zuo W, Zhang L, Hu Q, Shiu SC (2015) Unsupervised feature selection by regularized self-representation. Pattern Recognit 48(2):438–446CrossRefzbMATHGoogle Scholar

Copyright information

© Springer-Verlag London Ltd. 2017

Authors and Affiliations

Department of Computer Systems, Universidad Politécnica de Madrid, Madrid, Spain
