Controlled experiments on the web: survey and practical guide

Kohavi, Ron; Longbotham, Roger; Sommerfield, Dan; Henne, Randal M.

doi:10.1007/s10618-008-0114-1

Published: 30 July 2008

Volume 18, pages 140–181, (2009)

Data Mining and Knowledge Discovery

Abstract

The web provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called randomized experiments, A/B tests (and their generalizations), split tests, Control/Treatment tests, MultiVariable Tests (MVT) and parallel flights. Controlled experiments embody the best scientific design for establishing a causal relationship between changes and their influence on user-observable behavior. We provide a practical guide to conducting online experiments, where end-users can help guide the development of features. Our experience indicates that significant learning and return-on-investment (ROI) are seen when development teams listen to their customers, not to the Highest Paid Person’s Opinion (HiPPO). We provide several examples of controlled experiments with surprising results. We review the important ingredients of running controlled experiments, and discuss their limitations (both technical and organizational). We focus on several areas that are critical to experimentation, including statistical power, sample size, and techniques for variance reduction. We describe common architectures for experimentation systems and analyze their advantages and disadvantages. We evaluate randomization and hashing techniques, which we show are not as simple in practice as is often assumed. Controlled experiments typically generate large amounts of data, which can be analyzed using data mining techniques to gain deeper understanding of the factors influencing the outcome of interest, leading to new hypotheses and creating a virtuous cycle of improvements. Organizations that embrace controlled experiments with clear evaluation criteria can evolve their systems with automated optimizations and real-time analyses. Based on our extensive practical experience with multiple systems and organizations, we share key lessons that will help practitioners in running trustworthy controlled experiments.
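
To make the sample-size and statistical-power discussion concrete, here is a minimal sketch. It assumes the widely used rule of thumb n ≈ 16σ²/Δ² users per variant, which corresponds to a two-sided test at α = 0.05 with 80% power and a 50/50 split (2·(1.96 + 0.84)² ≈ 15.7, rounded up to 16). It is not necessarily the exact procedure described in the paper, and the function names are hypothetical. Because n grows linearly with the metric's variance, any technique that reduces variance directly reduces the number of users an experiment needs.

```python
import math


def sample_size_per_variant(variance: float, delta: float) -> int:
    """Approximate number of users needed in EACH of Control and Treatment.

    Assumes the rule of thumb n ~= 16 * variance / delta**2, i.e.
    2 * (z_0.975 + z_0.80)**2 ~= 15.7 rounded up to 16: a two-sided test
    at alpha = 0.05 with 80% power and a 50/50 split.
    `delta` is the smallest absolute change in the metric worth detecting.
    """
    if variance < 0:
        raise ValueError("variance must be non-negative")
    if delta <= 0:
        raise ValueError("delta must be positive")
    return math.ceil(16 * variance / delta ** 2)


def conversion_sample_size(rate: float, relative_change: float) -> int:
    """Same rule for a conversion-rate metric (Bernoulli variance p * (1 - p))."""
    if not 0 < rate < 1:
        raise ValueError("rate must be in (0, 1)")
    variance = rate * (1 - rate)
    delta = rate * relative_change  # absolute change implied by the relative one
    return sample_size_per_variant(variance, delta)


if __name__ == "__main__":
    # 5% baseline conversion, detect a 5% relative change (0.05 -> 0.0525).
    print(conversion_sample_size(0.05, 0.05))
```

Under these assumptions, detecting a 5% relative change on a 5% conversion rate requires roughly 121,600 users in each variant. Halving the detectable change quadruples the required sample.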

Author information

Authors and Affiliations

Microsoft, One Microsoft Way, Redmond, WA, 98052, USA
Ron Kohavi, Roger Longbotham, Dan Sommerfield & Randal M. Henne

Corresponding author

Correspondence to Ron Kohavi.

Additional information

Responsible editor: R. Bayardo.

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License ( https://creativecommons.org/licenses/by-nc/2.0 ), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Kohavi, R., Longbotham, R., Sommerfield, D. et al. Controlled experiments on the web: survey and practical guide. Data Min Knowl Disc 18, 140–181 (2009). https://doi.org/10.1007/s10618-008-0114-1

Received: 14 February 2008
Accepted: 30 June 2008
Published: 30 July 2008
Issue Date: February 2009
DOI: https://doi.org/10.1007/s10618-008-0114-1
