Maximum Entropy Linear Manifold for Learning Discriminative Low-Dimensional Representation
Representation learning is currently a very active topic in machine learning, largely owing to the success of deep learning methods. In particular, a low-dimensional representation that discriminates between classes can not only improve classification but also make it faster and, in contrast to high-dimensional embeddings, can be used efficiently for visual exploratory data analysis.
In this paper we propose the Maximum Entropy Linear Manifold (MELM), a multidimensional generalization of the Multithreshold Entropy Linear Classifier model, which finds a low-dimensional linear projection of the data that maximizes the discriminativeness of the projected classes. As a result, we obtain a linear embedding that can be used for classification, class-aware dimensionality reduction and data visualization. MELM provides highly discriminative 2D projections of the data, which can serve as a basis for constructing robust classifiers.
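To make the idea of scoring a linear projection by the discriminativeness of the projected classes concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name the objective, so the sketch assumes one common choice: the Cauchy-Schwarz divergence between Gaussian kernel density estimates of the two projected classes. The function names, the fixed isotropic bandwidth `h`, and the QR orthonormalisation step are all illustrative assumptions, and no optimisation over the projection is performed.

```python
import numpy as np


def cross_information_potential(a, b, h):
    """Integral of the product of two Gaussian KDEs (closed form).

    a, b: arrays of shape (n, k) and (m, k); h: assumed isotropic bandwidth.
    """
    k = a.shape[1]
    # Pairwise squared distances between every point of a and every point of b.
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    # Convolving two N(0, h^2 I) kernels gives a Gaussian with variance 2 h^2.
    var = 2.0 * h ** 2
    return np.exp(-sq / (2.0 * var)).mean() / (2.0 * np.pi * var) ** (k / 2.0)


def cs_divergence(X_pos, X_neg, V, h=0.5):
    """Cauchy-Schwarz divergence of two classes projected onto span(V).

    V has shape (d, k); its columns are orthonormalised before projecting.
    This divergence is an assumed stand-in for the paper's objective.
    """
    Q, _ = np.linalg.qr(V)  # orthonormal basis of the k-dimensional subspace
    a, b = X_pos @ Q, X_neg @ Q
    return (np.log(cross_information_potential(a, a, h))
            + np.log(cross_information_potential(b, b, h))
            - 2.0 * np.log(cross_information_potential(a, b, h)))
```

Under these assumptions, a 2D projection that contains the direction separating the classes should receive a higher score than one that does not, and a search over `V` (for example by gradient ascent) would aim to maximise that score.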
We provide an empirical evaluation as well as several theoretical properties of our objective function, such as scale and affine transformation invariance, connections with PCA, and a bound on the expected balanced accuracy error.
Keywords: Dense representation learning, Data visualization, Entropy, Supervised dimensionality reduction