Maximum Entropy Linear Manifold for Learning Discriminative Low-Dimensional Representation

  • Wojciech Marian Czarnecki
  • Rafal Jozefowicz
  • Jacek Tabor
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9284)


Representation learning is currently a central topic in modern machine learning, largely due to the success of deep learning methods. In particular, a low-dimensional representation that discriminates between classes can not only enhance classification but also make it faster and, unlike high-dimensional embeddings, can be used efficiently for visual exploratory data analysis.

In this paper we propose Maximum Entropy Linear Manifold (MELM), a multidimensional generalization of the Multithreshold Entropy Linear Classifier, which finds a low-dimensional linear projection of the data that maximizes the discriminativeness of the projected classes. As a result we obtain a linear embedding which can be used for classification, class-aware dimensionality reduction, and data visualization. MELM provides highly discriminative 2D projections of the data, which can serve as a basis for constructing robust classifiers.

We provide both an empirical evaluation and several theoretical properties of our objective function, such as scale and affine transformation invariance, connections with PCA, and a bound on the expected balanced accuracy error.
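To make the idea concrete, the sketch below searches for an orthonormal linear map W projecting the data to 2D so that the projected classes are maximally separated. The objective here is a Cauchy-Schwarz divergence between Gaussian kernel density estimates of the two projected classes, and the optimizer is plain random search; both are illustrative stand-ins for the paper's exact entropy-based objective and its optimization procedure, and the fixed bandwidth `h` is an assumption of this sketch.

```python
import numpy as np

def cs_divergence_2d(X0, X1, h=0.5):
    """Cauchy-Schwarz divergence between Gaussian KDEs of two 2D samples.
    (Simplified stand-in for MELM's entropy-based objective; h is a
    fixed kernel bandwidth chosen for illustration.)"""
    def cross_ip(A, B):
        # Integral of the product of two Gaussian KDEs with bandwidth h:
        # the mean of Gaussian kernels with variance 2*h^2 over all pairs.
        diff = A[:, None, :] - B[None, :, :]
        sq = (diff ** 2).sum(-1)
        return (np.exp(-sq / (4 * h ** 2)) / (4 * np.pi * h ** 2)).mean()
    ip01 = cross_ip(X0, X1)
    # By Cauchy-Schwarz, ip01^2 <= ip00 * ip11, so the value is >= 0,
    # and it grows as the two projected densities become more distinct.
    return -np.log(ip01 ** 2 / (cross_ip(X0, X0) * cross_ip(X1, X1)))

def melm_like_projection(X, y, n_iter=200, seed=0):
    """Random search for an orthonormal d x 2 projection W maximizing the
    divergence of the projected classes (the paper uses a proper
    optimization method; random search is only for demonstration)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    best_W, best_val = None, -np.inf
    for _ in range(n_iter):
        # QR of a random Gaussian matrix yields orthonormal columns.
        W, _ = np.linalg.qr(rng.standard_normal((d, 2)))
        val = cs_divergence_2d(X[y == 0] @ W, X[y == 1] @ W)
        if val > best_val:
            best_W, best_val = W, val
    return best_W, best_val

# Toy usage: two Gaussian classes in 5D, projected to a discriminative 2D plane.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (40, 5)), rng.normal(3.0, 1.0, (40, 5))])
y = np.array([0] * 40 + [1] * 40)
W, val = melm_like_projection(X, y, n_iter=50)
```

The resulting `X @ W` is a 2D embedding suitable for visualization, and any downstream classifier can be trained on it, which is the workflow the abstract describes.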


Keywords: Dense representation learning · Data visualization · Entropy · Supervised dimensionality reduction





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Wojciech Marian Czarnecki (1)
  • Rafal Jozefowicz (2)
  • Jacek Tabor (1)
  1. Faculty of Mathematics and Computer Science, Jagiellonian University, Krakow, Poland
  2. Google, New York, USA
