Abstract
Manifold landmarks can approximately represent the low-dimensional nonlinear manifold structure embedded in a high-dimensional ambient feature space. Because many learning algorithms have quadratic complexity in the number of training samples, selecting a subset of samples as manifold landmarks has become an important issue for scalable learning. Unfortunately, state-of-the-art Gaussian process methods for selecting manifold landmarks do not themselves scale to large datasets. To speed up landmark learning, the state-of-the-art approach uses stochastic gradient descent with uniformly selected minibatches, but this goes only part way toward making manifold learning tractable. We propose two adaptive sample selection approaches for gradient-descent optimization, which can improve both accuracy and computational time. Our methods exploit the compatibility of locality-sensitive hashing (via LSH and DBH) with the manifold assumption, thereby limiting expensive optimization to relevant regions of the data. Landmarks selected by our methods achieve higher accuracy than those obtained by training the state-of-the-art learner with randomly selected minibatches. We also demonstrate that our methods can find manifold landmarks without learning Gaussian processes at all, which yields an orders-of-magnitude speedup with only a minimal decrease in accuracy.
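To make the hashing idea concrete, here is a minimal Python sketch of how locality-sensitive hashing could group nearby samples so that landmark candidates are drawn from populated regions of the data. It is an illustration under stated assumptions, not the procedure described in the paper: the function names (`pstable_lsh_keys`, `landmarks_from_buckets`), the parameters (`n_hashes`, `bucket_width`), and the "one representative per largest bucket" heuristic are all hypothetical. The hash itself follows the general p-stable form h(x) = floor((a·x + b) / w) with Gaussian projections.

```python
import numpy as np

def pstable_lsh_keys(X, n_hashes=4, bucket_width=1.0, seed=0):
    """Hash each row of X with Gaussian (2-stable) random projections.

    Each hash is h(x) = floor((a . x + b) / w); a bucket key is the
    tuple of n_hashes such values. All parameters are illustrative.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    A = rng.standard_normal((d, n_hashes))        # random projection directions
    b = rng.uniform(0.0, bucket_width, n_hashes)  # random offsets in [0, w)
    codes = np.floor((X @ A + b) / bucket_width).astype(np.int64)
    return [tuple(row) for row in codes.tolist()]

def landmarks_from_buckets(X, n_landmarks, **lsh_kwargs):
    """Hypothetical heuristic: take one representative from each of the
    n_landmarks most populated buckets (the member closest to the bucket mean).
    Returns fewer indices if there are fewer buckets than requested.
    """
    keys = pstable_lsh_keys(X, **lsh_kwargs)
    buckets = {}
    for i, key in enumerate(keys):
        buckets.setdefault(key, []).append(i)
    ordered = sorted(buckets.values(), key=len, reverse=True)
    chosen = []
    for members in ordered[:n_landmarks]:
        centroid = X[members].mean(axis=0)
        dists = np.linalg.norm(X[members] - centroid, axis=1)
        chosen.append(members[int(np.argmin(dists))])
    return np.array(chosen)

# Usage on a synthetic 2-D spiral (a toy one-dimensional manifold).
rng = np.random.default_rng(1)
t = rng.uniform(0.0, 3.0 * np.pi, 500)
X = np.column_stack([t * np.cos(t), t * np.sin(t)])
idx = landmarks_from_buckets(X, n_landmarks=10, n_hashes=2, bucket_width=2.0)
landmarks = X[idx]
```

Hashing costs one pass over the data, which is why bucket-based selection of this kind can be far cheaper than optimizing a Gaussian process objective; how the buckets are then used for minibatch or landmark selection is a design choice that this sketch only guesses at.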
Acknowledgments
We acknowledge partial support from ARC DP150103710.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Aye, Z.M.M., Rubinstein, B.I.P., Ramamohanarao, K. (2018). Fast Manifold Landmarking Using Locality-Sensitive Hashing. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_36
DOI: https://doi.org/10.1007/978-3-319-93040-4_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93039-8
Online ISBN: 978-3-319-93040-4
eBook Packages: Computer Science, Computer Science (R0)