Improving Random Projections Using Marginal Information

  • Ping Li
  • Trevor J. Hastie
  • Kenneth W. Church
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4005)


We present an improved version of random projections that takes advantage of marginal norms. Using a maximum likelihood estimator (MLE), margin-constrained random projections can improve estimation accuracy considerably. Theoretical properties of this estimator are analyzed in detail.
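One way to make the idea concrete is the sketch below. It rests on assumptions that the abstract does not state: the quantity being estimated is the inner product a = u1·u2, the projection matrix R has i.i.d. N(0, 1) entries, and the margins m1 = ||u1||^2 and m2 = ||u2||^2 are known exactly. Under that model each projected pair is bivariate normal, and setting the derivative of the log-likelihood to zero gives a cubic equation in a. The function name, the root-selection rule (pick the feasible real root with the highest likelihood), and the demo data are all illustrative, not taken from the paper.

```python
import numpy as np


def margin_mle_inner_product(v1, v2, m1, m2):
    """Estimate a = <u1, u2> from projections v1, v2 and known margins.

    Assumed model (not stated in the abstract): v = R^T u / sqrt(k) with
    R having i.i.d. N(0, 1) entries, m1 = ||u1||^2, m2 = ||u2||^2.
    """
    p = float(v1 @ v2)                              # plain estimate of a
    s = m2 * float(v1 @ v1) + m1 * float(v2 @ v2)
    bound = np.sqrt(m1 * m2)                        # Cauchy-Schwarz: |a| <= bound

    # Likelihood equation under the assumed model:
    #   a^3 - p a^2 + (s - m1 m2) a - m1 m2 p = 0
    roots = np.roots([1.0, -p, s - m1 * m2, -m1 * m2 * p])
    candidates = [r.real for r in roots
                  if abs(r.imag) <= 1e-9 * bound and abs(r.real) < bound]
    if not candidates:
        # Numerical fallback (an assumption): clip the plain estimate.
        return float(np.clip(p, -bound, bound))

    def loglik_per_sample(a):
        # Log-likelihood divided by k, constants dropped.
        det = m1 * m2 - a * a
        return -0.5 * np.log(det) - (s - 2.0 * a * p) / (2.0 * det)

    # The cubic can have several real roots; keep the most likely one.
    return float(max(candidates, key=loglik_per_sample))


# Illustrative data: two strongly correlated vectors in D dimensions.
rng = np.random.default_rng(0)
D, k = 1000, 50
u1 = rng.standard_normal(D)
u2 = 0.9 * u1 + 0.1 * rng.standard_normal(D)

R = rng.standard_normal((D, k))
v1 = u1 @ R / np.sqrt(k)
v2 = u2 @ R / np.sqrt(k)
m1, m2 = float(u1 @ u1), float(u2 @ u2)

print("true inner product :", float(u1 @ u2))
print("plain estimate     :", float(v1 @ v2))
print("margin-based MLE   :", margin_mle_inner_product(v1, v2, m1, m2))
```

The margins cost one pass over the data to compute, which is the same order of work as forming the projections themselves, so using them adds little overhead in this setting.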


Asymptotic Normality · Moment Generating Function · Random Projection · Multiple Root · Marginal Norm





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ping Li (1)
  • Trevor J. Hastie (1)
  • Kenneth W. Church (2)
  1. Department of Statistics, Stanford University, Stanford, USA
  2. Microsoft Research, Redmond, USA
