Encyclopedia of Machine Learning and Data Mining

2017 Edition
Editors: Claude Sammut, Geoffrey I. Webb

Online Controlled Experiments and A/B Testing

  • Ron Kohavi
  • Roger Longbotham
Reference work entry
DOI: https://doi.org/10.1007/978-1-4899-7687-1_891

Abstract

The Internet connectivity of client software (e.g., apps running on phones and PCs), websites, and online services provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called A/B tests, split tests, randomized experiments, control/treatment tests, and online field experiments. Unlike most data mining techniques, which find correlational patterns, controlled experiments allow a causal relationship to be established with high probability. Experimenters can use the scientific method to pose a hypothesis of the form “If a specific change is introduced, will it improve key metrics?” and evaluate it with real users.

The theory of the controlled experiment dates back to Sir Ronald A. Fisher’s experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, and the topic of offline experiments is well developed in Statistics (Box et al., Statistics for experimenters: design, innovation, and discovery. Wiley, Hoboken, 2005). Online controlled experiments came into use in the late 1990s with the growth of the Internet. Today, many large sites, including Amazon, Bing, Facebook, Google, LinkedIn, and Yahoo!, run thousands to tens of thousands of experiments each year, testing user interface (UI) changes, enhancements to algorithms (search, ads, personalization, recommendation, etc.), changes to apps, content management systems, and more. Online controlled experiments are now considered an indispensable tool, and their use is growing among startups and smaller websites. Controlled experiments are especially useful in combination with Agile software development (Martin, Clean code: a handbook of Agile software craftsmanship. Prentice Hall, Upper Saddle River, 2008; Rubin, Essential scrum: a practical guide to the most popular Agile process. Addison-Wesley Professional, Upper Saddle River, 2012), Steve Blank’s Customer Development process (Blank, The four steps to the epiphany: successful strategies for products that win. Cafepress.com, 2005), and MVPs (minimum viable products) popularized by Eric Ries’s Lean Startup (Ries, The lean startup: how today’s entrepreneurs use continuous innovation to create radically successful businesses. Crown Business, New York, 2011).


Recommended Reading

  1. Biau DJ, Jolles BM, Porcher R (2010) P value and the theory of hypothesis testing. Clin Orthop Relat Res 468(3):885–892
  2. Bickel PJ, Doksum KA (1981) An analysis of transformations revisited. J Am Stat Assoc 76(374):296–311. doi:10.1080/01621459.1981.10477649
  3. Blank SG (2005) The four steps to the epiphany: successful strategies for products that win. Cafepress.com
  4. Box GEP, Hunter JS, Hunter WG (2005) Statistics for experimenters: design, innovation, and discovery. Wiley, Hoboken
  5. Casella G, Berger RL (2001) Statistical inference, 2nd edn. Cengage Learning. http://www.amazon.com/Statistical-Inference-George-Casella
  6. Deng A, Hu V (2015) Diluted treatment effect estimation for trigger analysis in online controlled experiments. In: WSDM, Shanghai 2015
  7. Deng A, Xu Y, Kohavi R, Walker T (2013) Improving the sensitivity of online controlled experiments by utilizing pre-experiment data. In: WSDM, Rome 2013
  8. Deng S, Longbotham R, Walker T, Xu Y (2011) Choice of randomization unit in online controlled experiment. In: Joint statistical meetings proceedings, Miami Beach, pp 4866–4877
  9. Efron B, Tibshirani RJ (1993) An introduction to the bootstrap. Chapman & Hall, New York
  10. Fieller EC (1954) Some problems in interval estimation. J R Stat Soc Ser B 16(2):175–185. doi:JSTOR2984043
  11. Good PI (2005) Permutation, parametric and bootstrap tests of hypotheses, 3rd edn. Springer, New York
  12. Goward C (2012) You should test that: conversion optimization for more leads, sales and profit or the art and science of optimized marketing. Sybex. http://www.amazon.com/You-Should-Test-That-Optimization/dp/1118301307
  13. Hochberg Y, Benjamini Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 57(1):289–300
  14. Kaushik A (2006) Experimentation and testing: a primer. Occam’s razor. http://www.kaushik.net/avinash/2006/05/experimentation-and-testing-a-primer.html. Accessed 22 May 2008
  15. Kohavi R, Deng A, Frasca B, Longbotham R, Walker T, Xu Y (2012) Trustworthy online controlled experiments: five puzzling outcomes explained. In: Proceedings of the 18th conference on knowledge discovery and data mining. http://bit.ly/expPuzzling
  16. Kohavi R, Deng A, Frasca B, Walker T, Xu Y, Pohlmann N (2013) Online controlled experiments at large scale. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2013). http://bit.ly/ExPScale
  17. Kohavi R, Deng A, Longbotham R, Xu Y (2014) Seven rules of thumb for web site experimenters. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’14). http://bit.ly/expRulesOfThumb
  18. Kohavi R, Longbotham R (2010) Unexpected results in online controlled experiments. In: SIGKDD Explorations. http://bit.ly/expUnexpected
  19. Kohavi R, Longbotham R, Walker T (2010) Online experiments: practical lessons. IEEE Comput Sept:82–85. http://bit.ly/expPracticalLessons
  20. Kohavi R, Longbotham R, Sommerfield D, Henne RM (2009) Controlled experiments on the web: survey and practical guide. Data Min Knowl Discov 18:140–181. http://bit.ly/expSurvey
  21. Kohavi R, Crook T, Longbotham R (2009) Online experimentation at microsoft. In: Third workshop on data mining case studies and practice prize. http://bit.ly/expMicrosoft
  22. Malinas G, Bigelow J (2009) Simpson’s paradox. Stanford Encyclopedia of Philosophy. http://plato.stanford.edu/entries/paradox-simpson/
  23. Manzi J (2012) Uncontrolled: the surprising payoff of trial-and-error for business, politics, and society. Basic Books. https://www.amazon.com/Uncontrolled-Surprising-Trial-Error-Business-ebook/dp/B007V2VEQO
  24. Martin RC (2008) Clean code: a handbook of Agile software craftsmanship. Prentice Hall, Upper Saddle River
  25. McFarland C (2012a) Experiment!: website conversion rate optimization with A/B and multivariate. New Riders. http://www.amazon.com/Experiment-Website-conversion-optimization-multivariate/dp/0321834607
  26. McFarland C (2012b) Experiment!: website conversion rate optimization with A/B and multivariate testing. New Riders. http://www.amazon.com/Experiment-Website-conversion-optimization-multivariate/dp/0321834607
  27. McKinley D (2013) Testing to cull the living flower. http://mcfunley.com/testing-to-cull-the-living-flower
  28. Moran M (2007) Do it wrong quickly: how the web changes the old marketing rules. IBM Press. http://www.amazon.com/Do-Wrong-Quickly-Changes-Marketing/dp/0132255960/
  29. Moran M (2008) Multivariate testing in action: Quicken Loan’s Regis Hadiaris on multivariate testing. www.biznology.com/2008/12/multivariate_testing_in_action/
  30. Peterson ET (2004) Web analytics demystified: a marketer’s guide to understanding how your web site affects your business. Celilo Group Media and CafePress. http://www.amazon.com/Web-Analytics-Demystified-Marketers-Understanding/dp/0974358428/
  31. Ries E (2011) The lean startup: how today’s entrepreneurs use continuous innovation to create radically successful businesses. Crown Business, New York
  32. Rubin KS (2012) Essential scrum: a practical guide to the most popular Agile process. Addison-Wesley Professional, Upper Saddle River
  33. Schrage M (2014) The innovator’s hypothesis: how cheap experiments are worth more than good ideas. MIT Press. http://www.amazon.com/Innovators-Hypothesis-Cheap-Experiments-Worth/dp/0262528967
  34. Siroker D, Koomen P (2013) A/B testing: the most powerful way to turn clicks into customers. Wiley. http://www.amazon.com/Testing-Most-Powerful-Clicks-Customers/dp/1118792416
  35. Stone JV (2013) Bayes’ rule: a tutorial introduction to Bayesian analysis. Sebtel Press. http://www.amazon.com/Bayes-Rule-Tutorial-Introduction-Bayesian/dp/0956372848
  36. Tang D, Agarwal A, O’Brien D, Meyer M (2010) Overlapping experiment infrastructure: more, better, faster experimentation. In: KDD 2010: The 16th ACM SIGKDD international conference on knowledge discovery and data mining, Washington, DC, 25–28 July
  37. Ugander J, Karrer B, Backstrom L, Kleinberg J (2013) Graph cluster randomization: network exposure to multiple universes. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’13), Chicago

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  • Ron Kohavi (1)
  • Roger Longbotham (2)
  1. Application Services Group, Microsoft, Bellevue, USA
  2. Data and Decision Sciences Group, Microsoft, Redmond, USA