Abstract
We are interested in the following questions. Given a finite data set \(\mathcal {S}\), with neither labels nor side information, and an unsupervised learning algorithm \(\mathsf {A}\), can the generalization of \(\mathsf {A}\) be assessed on \(\mathcal {S}\)? Similarly, given two unsupervised learning algorithms, \(\mathsf {A}_1\) and \(\mathsf {A}_2\), for the same learning task, can one assess whether one will generalize “better” on future data drawn from the same source as \(\mathcal {S}\)? In this paper, we develop a general approach to answering these questions in a reliable and efficient manner under mild assumptions on \(\mathsf {A}\). We first propose a concrete generalization criterion for unsupervised learning that is analogous to prediction error in supervised learning. We then develop a computationally efficient procedure that realizes the generalization criterion on finite data sets, and propose an extension for comparing the generalization of two algorithms on the same data set. We validate the overall framework on algorithms for clustering and dimensionality reduction (linear and nonlinear).
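The criterion described above parallels prediction error in supervised learning: fit an unsupervised model on part of the data and score it on held-out points with a task-appropriate loss. The sketch below is not the paper's procedure, only a minimal illustration of the idea, using hold-out reconstruction error to compare two PCA models (rank 1 vs. rank 2); all function names and the toy data are assumptions for this example.

```python
import numpy as np

def pca_fit(X, k):
    """Fit a rank-k PCA model: mean plus top-k principal directions."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def pca_reconstruction_error(X, model):
    """Mean squared reconstruction error of X under a fitted PCA model."""
    mu, V = model
    Z = (X - mu) @ V.T          # project onto the principal subspace
    Xhat = Z @ V + mu           # reconstruct in the original space
    return np.mean(np.sum((X - Xhat) ** 2, axis=1))

def holdout_generalization(X, fit, loss, train_frac=0.7, seed=0):
    """Fit on a random training split, report loss on the held-out split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr = int(train_frac * len(X))
    model = fit(X[idx[:n_tr]])
    return loss(X[idx[n_tr:]], model)

# Toy data lying near a 2-D subspace of R^5, plus small isotropic noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 5)) \
    + 0.05 * rng.normal(size=(500, 5))

err_k1 = holdout_generalization(X, lambda S: pca_fit(S, 1),
                                pca_reconstruction_error)
err_k2 = holdout_generalization(X, lambda S: pca_fit(S, 2),
                                pca_reconstruction_error)
# On this data the rank-2 model should attain lower held-out error,
# i.e. "generalize better" under the reconstruction-error criterion.
```

The same template applies to other unsupervised tasks by swapping the loss, e.g. quantization error for clustering, making the comparison of two algorithms on one data set a matter of comparing held-out losses.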
Keywords
- Principal Component Analysis
- Supervised Learning
- Reconstruction Error
- Unsupervised Learning
- Locally Linear Embedding
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Abou-Moustafa, K.T., Schuurmans, D. (2015). Generalization in Unsupervised Learning. In: Appice, A., Rodrigues, P., Santos Costa, V., Soares, C., Gama, J., Jorge, A. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2015. Lecture Notes in Computer Science(), vol 9284. Springer, Cham. https://doi.org/10.1007/978-3-319-23528-8_19
Print ISBN: 978-3-319-23527-1
Online ISBN: 978-3-319-23528-8