Probability Theory

Statistics for Data Scientists

Part of the book series: Undergraduate Topics in Computer Science ((UTICS))

Abstract

Statistics is a science concerned with principles, methods, and techniques for collecting, processing, analyzing, presenting, and interpreting (numerical) data. As we have already suggested, statistics can be divided roughly into descriptive statistics (Chap. 1) and inferential statistics (Chap. 2). Descriptive statistics summarizes and visualizes the observed data. It is usually not very difficult, but it forms an essential part of reporting (scientific) results. Inferential statistics tries to draw conclusions from the data that hold true for part or the whole of the population from which the data were collected. The theory of probability, which is the topic of the next two theoretical chapters, makes it possible to connect the two disciplines of descriptive and inferential statistics. We have already encountered some ideas from probability theory in the previous chapter. To start with, we discussed the probability \(\pi _k\) of selecting a specific sample, and we briefly defined the notion of probability based on the throwing of a die. In this chapter we work out these ideas more formally: we define probabilities, discuss the probabilities of events, and show how to calculate with probabilities. In the previous chapter, when discussing bias, we also encountered the expected population parameter \(\mathbb {E}(T)\), but we have not yet detailed what expectations are exactly; this is something we cover in Chap. 4.

Notes

  1. It should be noted here that a probability of zero does not necessarily mean that the event will never occur. This seems contradictory, but we will explain this later. On the other hand, if the event can never occur, the probability is zero.

  2. Using the definition in Eq. (3.3), we can write \(\Pr (A\cap B)\) as \(\Pr (A|B)\Pr (B)\), as we did in Table 3.1, but also as \(\Pr (B|A)\Pr (A)\). Which one to use mostly depends on the practical situation. In Table 3.1 we could have used \(\Pr (B|A)\Pr (A)\) as well.

  3. If, in this case, the population size(s) were known, we could calculate weighted averages to estimate the population parameters as we did in Chap. 2.

  4. Note that Simpson’s Paradox, and its solutions, are still heavily debated (see Armistead 2014 for examples).
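
The identity in note 2 can be checked numerically. The following is a minimal sketch, assuming a hypothetical 2×2 table of counts (the numbers are illustrative and are not taken from Table 3.1); the function name `joint_two_ways` is likewise an assumption made for this example. It computes \(\Pr (A\cap B)\) once as \(\Pr (A|B)\Pr (B)\) and once as \(\Pr (B|A)\Pr (A)\), using exact fractions so that the comparison is not affected by rounding.

```python
from fractions import Fraction

# Hypothetical counts for illustration only (NOT the data of Table 3.1).
# Keys are (A occurred?, B occurred?).
counts = {
    (True, True): 30,
    (True, False): 20,
    (False, True): 10,
    (False, False): 40,
}


def joint_two_ways(counts):
    """Return Pr(A and B) computed as Pr(A|B)Pr(B) and as Pr(B|A)Pr(A)."""
    total = sum(counts.values())
    n_ab = counts[(True, True)]
    n_a = sum(n for (a, _), n in counts.items() if a)  # all units with A
    n_b = sum(n for (_, b), n in counts.items() if b)  # all units with B

    # Conditional probabilities: restrict the table to B (or A) first.
    pr_a_given_b = Fraction(n_ab, n_b)
    pr_b_given_a = Fraction(n_ab, n_a)

    # Marginal probabilities over the whole table.
    pr_a = Fraction(n_a, total)
    pr_b = Fraction(n_b, total)

    return pr_a_given_b * pr_b, pr_b_given_a * pr_a


via_b, via_a = joint_two_ways(counts)
print(via_b, via_a)
```

Both factorizations give the same joint probability, here the fraction of units in the (A, B) cell; which one is more convenient depends on which conditional probability is available in practice.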

References

  • T.W. Armistead, Resurrecting the third variable: a critique of Pearl’s causal analysis of Simpson’s paradox. Am. Stat. 68(1), 1–7 (2014)

  • C.R. Charig, D.R. Webb, S.R. Payne, J.E. Wickham, Comparison of treatment of renal calculi by open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave lithotripsy. Br. Med. J. (Clin. Res. Ed.) 292(6524), 879–882 (1986)

  • G. Grimmett, D. Stirzaker et al., Probability and Random Processes (Oxford University Press, Oxford, 2001)

  • N.P. Jewell, Statistics for Epidemiology (Chapman and Hall/CRC, Boca Raton, 2003)

  • R. Lanting, E.R. Van Den Heuvel, B. Westerink, P.M. Werker, Prevalence of Dupuytren disease in the Netherlands. Plast. Reconstr. Surg. 132(2), 394–403 (2013)

  • K.J. Rothman, S. Greenland, T.L. Lash et al., Modern Epidemiology, vol. 3 (Wolters Kluwer Health/Lippincott Williams & Wilkins, Philadelphia, 2008)

  • E.H. Simpson, The interpretation of interaction in contingency tables. J. Roy. Stat. Soc.: Ser. B (Methodol.) 13(2), 238–241 (1951)

  • E.P. Veening, R.O.B. Gans, J.B.M. Kuks, Medische Consultvoering (Bohn Stafleu van Loghum, Houten, 2009)

  • E. White, B.K. Armstrong, R. Saracci, Principles of Exposure Measurement in Epidemiology: Collecting, Evaluating and Improving Measures of Disease Risk Factors (OUP, Oxford, 2008)

  • F.N. David, Studies in the History of Probability and Statistics I. Dicing and Gaming (A Note on the History of Probability). Biometrika 42(1/2), 1–5 (1955)

  • O.B. Sheynin, Early history of the theory of probability. Archive for History of Exact Sciences 17(3), 201–259 (1977)

  • S.M. Stigler, Studies in the History of Probability and Statistics. XXXIV: Napoleonic statistics: The work of Laplace. Biometrika 62(2), 503–517 (1975)

Author information

Correspondence to Maurits Kaptein.

Copyright information

© 2022 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Kaptein, M., van den Heuvel, E. (2022). Probability Theory. In: Statistics for Data Scientists. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-030-10531-0_3

  • DOI: https://doi.org/10.1007/978-3-030-10531-0_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-10530-3

  • Online ISBN: 978-3-030-10531-0

  • eBook Packages: Computer Science, Computer Science (R0)