Modular Dimensionality Reduction

  • Henry W. J. Reeve
  • Tingting Mu
  • Gavin Brown
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11051)


Abstract

We introduce an approach to modular dimensionality reduction, allowing efficient learning of multiple complementary representations of the same object. Modules are trained by optimising an unsupervised cost function that balances two competing goals: maintaining the inner-product structure of the original space, and encouraging structural diversity between complementary representations. We derive an efficient learning algorithm which outperforms gradient-based approaches without the need to choose a learning rate. We also demonstrate an intriguing connection with Dropout. Empirical results demonstrate the efficacy of the method for image retrieval and classification.
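The trade-off described in the abstract can be illustrated with a small sketch. The cost function below is a hypothetical, simplified rendering (linear projection modules, a Frobenius-norm fidelity term on the Gram matrix, and a cross-module Gram-overlap penalty); the paper's actual objective and its closed-form, learning-rate-free algorithm are not reproduced here.

```python
import numpy as np

def modular_dr_cost(X, Ws, lam=0.1):
    """Illustrative cost for modular dimensionality reduction.

    X  : (n, d) data matrix.
    Ws : list of (d, k) projection matrices, one per module.
    lam: weight of the (hypothetical) diversity penalty.

    Each module m maps X to Z_m = X @ W_m. The first term keeps each
    module's inner products Z_m Z_m^T close to the original Gram matrix
    X X^T; the second penalises overlap between the modules' Gram
    matrices, encouraging structurally diverse representations.
    """
    K = X @ X.T                         # Gram matrix of the original data
    Zs = [X @ W for W in Ws]            # low-dimensional module outputs
    Gs = [Z @ Z.T for Z in Zs]          # per-module Gram matrices
    fidelity = sum(np.linalg.norm(K - G, 'fro') ** 2 for G in Gs)
    diversity = sum(np.sum(Gs[i] * Gs[j])   # trace(G_i G_j), >= 0 for PSD
                    for i in range(len(Gs)) for j in range(i))
    return fidelity + lam * diversity
```

Setting `lam=0` recovers independent kernel-PCA-style fidelity objectives; increasing `lam` trades reconstruction quality for diversity between modules.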


Keywords: Ensemble learning · Dimensionality reduction · Dropout · Kernel principal components analysis



Acknowledgements

H. Reeve was supported by the EPSRC through the Centre for Doctoral Training Grant [EP/1038099/1]. G. Brown was supported by the EPSRC LAMBDA project [EP/N035127/1].

Supplementary material

Supplementary material 1: 478880_1_En_37_MOESM1_ESM.pdf (PDF, 1058 KB)



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Birmingham, Birmingham, UK
  2. University of Manchester, Manchester, UK