Large Scale Data

  • Harry Strange
  • Reyer Zwiggelaar
Part of the SpringerBriefs in Computer Science book series (BRIEFSCOMPUTER)


In this chapter the problems of applying spectral dimensionality reduction to large scale datasets are outlined, along with various solutions to these problems. The computational complexity of several spectral dimensionality reduction algorithms is examined in detail. There is often considerable overlap between the solutions in this chapter and those discussed previously with regard to incremental learning. Finally, some aspects of parallel and GPU-based implementation are discussed.
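As a rough illustration of why computational complexity matters here: a dense eigendecomposition of an n × n kernel matrix costs O(n^3) time and O(n^2) memory. The Nyström extension instead eigendecomposes only an m × m block built from m ≪ n landmark points and extends the result to all n points, at roughly O(nm) kernel evaluations plus O(m^3) for the small eigenproblem. The following is a minimal, hypothetical Python/NumPy sketch of that standard approximation, assuming a Gaussian (RBF) kernel and uniform random landmark sampling; the names `rbf_kernel` and `nystrom_eig` and the `gamma` parameter are illustrative assumptions, not the chapter's own implementation.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel; this kernel choice is an assumption for illustration.
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))


def nystrom_eig(X, m, k, gamma=1.0, seed=0):
    """Approximate the top-k eigenpairs of the n x n kernel matrix from m landmarks."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=m, replace=False)  # uniform landmark sampling (one simple choice)
    C = rbf_kernel(X, X[idx], gamma)            # n x m block: n*m kernel evaluations
    W = C[idx]                                  # m x m landmark-landmark block
    vals, vecs = np.linalg.eigh(W)              # O(m^3) instead of O(n^3)
    order = np.argsort(vals)[::-1][:k]          # keep the k largest eigenvalues
    vals, vecs = vals[order], vecs[:, order]
    approx_vals = (n / m) * vals                # rescale eigenvalues to the full problem
    approx_vecs = np.sqrt(m / n) * (C @ vecs) / vals  # extend eigenvectors to all n points
    return approx_vals, approx_vecs, idx


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 3))
    vals, vecs, idx = nystrom_eig(X, m=50, k=2, gamma=0.5)
    print(vals.shape, vecs.shape)
```

When m = n the approximation reproduces the exact leading eigenpairs; for m < n its quality depends on how the landmarks are chosen, and uniform sampling is only the simplest option.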


Keywords: Large scale learning · Approximations · Nyström extension · Parallel programming



Copyright information

© The Author(s) 2014

Authors and Affiliations

  1. Department of Computer Science, Aberystwyth University, Aberystwyth, UK
