Abstract
Dimensionality reduction is a common preprocessing step in many machine learning tasks. The goal is to design data representations that, on the one hand, reduce the dimension of the data (thereby allowing faster processing) and, on the other hand, retain as much task-relevant information as possible. We consider generic dimensionality reduction approaches that do not rely on much task-specific prior knowledge, focusing on scenarios in which unlabeled samples are available and can be used to evaluate the usefulness of candidate data representations.
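As a concrete illustration of such a generic, task-agnostic approach, the sketch below applies a Johnson–Lindenstrauss-style Gaussian random projection. This is a minimal example under assumed data and target dimension, not the construction studied in this paper.

import numpy as np

def random_projection(X, k, seed=0):
    # Project n points in R^d down to R^k via a Gaussian random matrix.
    # Pairwise distances are approximately preserved with high probability
    # when k grows like log(n) / eps^2 (Johnson-Lindenstrauss).
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R = rng.normal(size=(d, k)) / np.sqrt(k)  # scaling keeps expected norms
    return X @ R

# Hypothetical example: 1000 points in 500 dimensions, reduced to 20.
X = np.random.default_rng(1).normal(size=(1000, 500))
Z = random_projection(X, k=20)
print(Z.shape)  # (1000, 20)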
We aim to provide theoretical principles that help explain the success of certain dimensionality reduction techniques in classification tasks, and that can guide the choice of dimensionality reduction tool and its parameters. Our analysis is based on formalizing the often implicit assumption that “similar instances are likely to have similar labels”. The theoretical analysis is supported by experimental results.
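One standard way to make this assumption precise is probabilistic Lipschitzness (a sketch in the style of the literature; the exact definition used in the paper may differ): a deterministic labeling function \ell over a distribution D on a metric space (X, d) is \phi-Lipschitz, for \phi : \mathbb{R}^{+} \to [0, 1], if for every \lambda > 0

\Pr_{x \sim D}\left[\, \exists\, y \in \mathrm{supp}(D) : \ell(y) \neq \ell(x) \ \text{and} \ d(x, y) \leq \lambda \,\right] \;\leq\; \phi(\lambda).

Small \phi(\lambda) for small \lambda captures the idea that points close to each other rarely carry different labels.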
Cite this paper
Kushagra, S., Ben-David, S.: Information Preserving Dimensionality Reduction. In: Chaudhuri, K., Gentile, C., Zilles, S. (eds.) Algorithmic Learning Theory (ALT 2015). Lecture Notes in Computer Science, vol. 9355. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24486-0_16
DOI: https://doi.org/10.1007/978-3-319-24486-0_16
Print ISBN: 978-3-319-24485-3
Online ISBN: 978-3-319-24486-0