Abstract
The Internet connectivity of client software (e.g., apps running on phones and PCs), websites, and online services provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called A/B tests, split tests, randomized experiments, control/treatment tests, and online field experiments. Unlike most data mining techniques, which find correlational patterns, controlled experiments allow establishing a causal relationship with high probability. Experimenters can use the scientific method to form a hypothesis of the form “If a specific change is introduced, will it improve key metrics?” and evaluate it with real users.
The theory of a controlled experiment dates back to Sir Ronald A. Fisher’s experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, and the topic of offline experiments is well developed in Statistics (Box et al., Statistics for experimenters: design, innovation, and discovery. Wiley, Hoboken, 2005). Online controlled experiments started to be used in the late 1990s with the growth of the Internet. Today, many large sites, including Amazon, Bing, Facebook, Google, LinkedIn, and Yahoo!, run thousands to tens of thousands of experiments each year, testing user interface (UI) changes, enhancements to algorithms (search, ads, personalization, recommendation, etc.), changes to apps, content management systems, etc. Online controlled experiments are now considered an indispensable tool, and their use is growing among startups and smaller websites. Controlled experiments are especially useful in combination with Agile software development (Martin, Clean code: a handbook of Agile software craftsmanship. Prentice Hall, Upper Saddle River, 2008; Rubin, Essential scrum: a practical guide to the most popular Agile process. Addison-Wesley Professional, Upper Saddle River, 2012), Steve Blank’s Customer Development process (Blank, The four steps to the epiphany: successful strategies for products that win. Cafepress.com, 2005), and MVPs (minimum viable products) popularized by Eric Ries’s Lean Startup (Ries, The lean startup: how today’s entrepreneurs use continuous innovation to create radically successful businesses. Crown Business, New York, 2011).
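As a rough illustration of how a hypothesis of the form “If a specific change is introduced, will it improve key metrics?” might be evaluated, the sketch below randomly assigns users to control or treatment and compares conversion rates with a two-sided two-proportion z-test. This is a minimal sketch under stated assumptions, not the procedure described in the entry: the function names, the hash-based 50/50 assignment, the choice of conversion rate as the key metric, and the counts in the usage lines are all hypothetical.

```python
import hashlib
import math


def assign_variant(user_id: str, experiment: str = "exp-1") -> str:
    """Deterministically map a user to 'control' or 'treatment' (assumed 50/50 split)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 else "control"


def two_proportion_z_test(conv_c: int, n_c: int, conv_t: int, n_t: int):
    """Return (difference in rates, z statistic, two-sided p-value)."""
    p_c = conv_c / n_c
    p_t = conv_t / n_t
    # Pooled rate under the null hypothesis of no treatment effect.
    p_pool = (conv_c + conv_t) / (n_c + n_t)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_t - p_c, z, p_value


# Hypothetical counts: 100/1000 conversions in control, 150/1000 in treatment.
diff, z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"diff={diff:.3f}, z={z:.2f}, p={p_value:.4f}")
```

Because assignment is random rather than self-selected, a small p-value here supports a causal reading of the difference, which is the distinction from correlational analysis drawn above.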
Recommended Reading
Biau DJ, Jolles BM, Porcher R (2010) P value and the theory of hypothesis testing. Clin Orthop Relat Res 468(3):885–892
Bickel PJ, Doksum KA (1981) An analysis of transformations revisited. J Am Stat Assoc 76(374):296–311. doi:10.1080/01621459.1981.10477649
Blank SG (2005) The four steps to the epiphany: successful strategies for products that win. Cafepress.com.
Box GEP, Hunter JS, Hunter WG (2005) Statistics for experimenters: design, innovation, and discovery. Wiley, Hoboken
Casella G, Berger RL (2001) Statistical inference, 2nd edn. Cengage Learning. http://www.amazon.com/Statistical-Inference-George-Casella
Deng A, Hu V (2015) Diluted treatment effect estimation for trigger analysis in online controlled experiments. In: WSDM, Shanghai 2015
Deng A, Xu Y, Kohavi R, Walker T (2013) Improving the sensitivity of online controlled experiments by utilizing pre-experiment data. In: WSDM, Rome 2013
Deng S, Longbotham R, Walker T, Xu Y (2011) Choice of randomization unit in online controlled experiment. In: Joint statistical meetings proceedings, Miami Beach, pp 4866–4877
Efron B, Tibshirani RJ (1993) An introduction to the bootstrap. Chapman & Hall, New York
Fieller EC (1954) Some problems in interval estimation. J R Stat Soc Ser B 16(2):175–185. JSTOR 2984043
Good PI (2005) Permutation, parametric and bootstrap tests of hypotheses, 3rd edn. Springer, New York
Goward C (2012) You should test that: conversion optimization for more leads, sales and profit or the art and science of optimized marketing. Sybex. http://www.amazon.com/You-Should-Test-That-Optimization/dp/1118301307
Hochberg Y, Benjamini Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 57(1):289–300
Kaushik A (2006) Experimentation and testing: a primer. Occam’s razor. http://www.kaushik.net/avinash/2006/05/experimentation-and-testing-a-primer.html. Accessed 22 May 2008
Kohavi R, Deng A, Frasca B, Longbotham R, Walker T, Xu Y (2012) Trustworthy online controlled experiments: five puzzling outcomes explained. In: Proceedings of the 18th conference on knowledge discovery and data mining. http://bit.ly/expPuzzling
Kohavi R, Deng A, Frasca B, Walker T, Xu Y, Pohlmann N (2013) Online controlled experiments at large scale. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2013). http://bit.ly/ExPScale
Kohavi R, Deng A, Longbotham R, Xu Y (2014) Seven rules of thumb for web site experimenters. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’14). http://bit.ly/expRulesOfThumb
Kohavi R, Longbotham R (2010) Unexpected results in online controlled experiments. In: SIGKDD Explorations. http://bit.ly/expUnexpected
Kohavi R, Longbotham R, Walker T (2010) Online experiments: practical lessons. IEEE Comput Sept:82–85. http://bit.ly/expPracticalLessons
Kohavi R, Longbotham R, Sommerfield D, Henne RM (2009) Controlled experiments on the web: survey and practical guide. Data Min Knowl Discov 18:140–181. http://bit.ly/expSurvey
Kohavi R, Crook T, Longbotham R (2009) Online experimentation at Microsoft. In: Third workshop on data mining case studies and practice prize. http://bit.ly/expMicrosoft
Malinas G, Bigelow J (2009) Simpson’s paradox. Stanford Encyclopedia of Philosophy. http://plato.stanford.edu/entries/paradox-simpson/
Manzi J (2012) Uncontrolled: the surprising payoff of trial-and-error for business, politics, and society. Basic Books. https://www.amazon.com/Uncontrolled-Surprising-Trial-Error-Business-ebook/dp/B007V2VEQO
Martin RC (2008) Clean code: a handbook of Agile software craftsmanship. Prentice Hall, Upper Saddle River
McFarland C (2012) Experiment!: website conversion rate optimization with A/B and multivariate testing. New Riders. http://www.amazon.com/Experiment-Website-conversion-optimization-multivariate/dp/0321834607
McKinley D (2013) Testing to cull the living flower. http://mcfunley.com/testing-to-cull-the-living-flower
Moran M (2007) Do it wrong quickly: how the web changes the old marketing rules. IBM Press. http://www.amazon.com/Do-Wrong-Quickly-Changes-Marketing/dp/0132255960/
Moran M (2008) Multivariate testing in action: Quicken Loan’s Regis Hadiaris on multivariate testing. www.biznology.com/2008/12/multivariate_testing_in_action/
Peterson ET (2004) Web analytics demystified: a marketer’s guide to understanding how your web site affects your business. Celilo Group Media and CafePress. http://www.amazon.com/Web-Analytics-Demystified-Marketers-Understanding/dp/0974358428/
Ries E (2011) The lean startup: how today’s entrepreneurs use continuous innovation to create radically successful businesses. Crown Business, New York
Rubin KS (2012) Essential scrum: a practical guide to the most popular Agile process. Addison-Wesley Professional, Upper Saddle River
Schrage M (2014) The innovator’s hypothesis: how cheap experiments are worth more than good ideas. MIT Press. http://www.amazon.com/Innovators-Hypothesis-Cheap-Experiments-Worth/dp/0262528967
Siroker D, Koomen P (2013) A/B testing: the most powerful way to turn clicks into customers. Wiley. http://www.amazon.com/Testing-Most-Powerful-Clicks-Customers/dp/1118792416
Stone JV (2013) Bayes’ rule: a tutorial introduction to Bayesian analysis. Sebtel Press. http://www.amazon.com/Bayes-Rule-Tutorial-Introduction-Bayesian/dp/0956372848
Tang D, Agarwal A, O’Brien D, Meyer M (2010) Overlapping experiment infrastructure: more, better, faster experimentation. In: KDD 2010: The 16th ACM SIGKDD international conference on knowledge discovery and data mining, Washington, DC, 25–28 July
Ugander J, Karrer B, Backstrom L, Kleinberg J (2013) Graph cluster randomization: network exposure to multiple universes. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’13), Chicago
Copyright information
© 2017 Springer Science+Business Media New York
About this entry
Cite this entry
Kohavi, R., Longbotham, R. (2017). Online Controlled Experiments and A/B Testing. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_891
DOI: https://doi.org/10.1007/978-1-4899-7687-1_891
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4899-7685-7
Online ISBN: 978-1-4899-7687-1
eBook Packages: Computer Science; Reference Module Computer Science and Engineering