Useful Tools for Statistics and Machine Learning

Part of the book series: Springer Texts in Statistics (STS)

Abstract

As much as we would like to have analytical solutions to important problems, it is a fact that many of them are simply too difficult to admit closed-form solutions. Common examples of this phenomenon are finding exact distributions of estimators and statistics, computing the value of an exact optimal procedure, such as a maximum likelihood estimate, and numerous combinatorial algorithms of importance in computer science and applied probability. Unprecedented advances in computing power and availability have inspired creative new methods and algorithms for solving old problems; often, these new methods are better than what we had in our toolbox before. This chapter provides a glimpse into a few selected computing tools and algorithms that have had a significant impact on the practice of probability and statistics: specifically, the bootstrap, the EM algorithm, and the use of kernels for smoothing and modern statistical classification. The treatment is intended to be introductory, with references to more advanced parts of the literature.
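By way of illustration, here is a minimal sketch of the first of the tools named above, the nonparametric bootstrap, used to estimate the standard error of a sample median. The exponential data, the sample size, and the number of replicates B are arbitrary choices for this example, not taken from the chapter.

```python
# Nonparametric bootstrap sketch: estimate the standard error of the
# sample median by resampling the data with replacement.
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=100)  # assumed observed sample

B = 2000  # number of bootstrap replicates (arbitrary choice)
medians = np.empty(B)
for b in range(B):
    # Draw a resample of the same size, with replacement, from the data
    resample = rng.choice(data, size=data.size, replace=True)
    medians[b] = np.median(resample)

# The sample standard deviation of the replicated statistic is the
# bootstrap estimate of the standard error of the median.
boot_se = medians.std(ddof=1)
```

The same resampling loop applies to essentially any statistic whose exact sampling distribution is intractable; only the line computing the statistic changes.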


Author information


Correspondence to Anirban DasGupta.

Copyright information

© 2011 Springer Science+Business Media, LLC

Cite this chapter

DasGupta, A. (2011). Useful Tools for Statistics and Machine Learning. In: Probability for Statistics and Machine Learning. Springer Texts in Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9634-3_20
