Abstract
The web provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called randomized experiments, A/B tests (and their generalizations), split tests, Control/Treatment tests, MultiVariable Tests (MVT) and parallel flights. Controlled experiments embody the best scientific design for establishing a causal relationship between changes and their influence on user-observable behavior. We provide a practical guide to conducting online experiments, where end-users can help guide the development of features. Our experience indicates that significant learning and return-on-investment (ROI) are seen when development teams listen to their customers, not to the Highest Paid Person’s Opinion (HiPPO). We provide several examples of controlled experiments with surprising results. We review the important ingredients of running controlled experiments, and discuss their limitations (both technical and organizational). We focus on several areas that are critical to experimentation, including statistical power, sample size, and techniques for variance reduction. We describe common architectures for experimentation systems and analyze their advantages and disadvantages. We evaluate randomization and hashing techniques, which we show are not as simple in practice as is often assumed. Controlled experiments typically generate large amounts of data, which can be analyzed using data mining techniques to gain deeper understanding of the factors influencing the outcome of interest, leading to new hypotheses and creating a virtuous cycle of improvements. Organizations that embrace controlled experiments with clear evaluation criteria can evolve their systems with automated optimizations and real-time analyses. Based on our extensive practical experience with multiple systems and organizations, we share key lessons that will help practitioners in running trustworthy controlled experiments.
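As a rough illustration of two ingredients named above, hash-based randomization and sample-size estimation, the following minimal sketch may help. It is not the authors' implementation. The function names, the choice of MD5, the 1000-bucket scheme, and the parameter names are all assumptions made for this example. The sample-size function uses the common rule of thumb n = 16σ²/Δ² per variant, which approximates 80% power at a 5% two-sided significance level.

```python
import hashlib
import math

NUM_BUCKETS = 1000  # assumed granularity for this sketch


def assign_variant(user_id: str, experiment_id: str,
                   treatment_fraction: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment'.

    Hashing the experiment id together with the user id is one way to
    keep assignments in different experiments from lining up with each
    other (hypothetical scheme, chosen for illustration).
    """
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    digest = hashlib.md5(key).digest()
    # Use the first 8 bytes of the digest as an integer bucket index.
    bucket = int.from_bytes(digest[:8], "big") % NUM_BUCKETS
    if bucket < treatment_fraction * NUM_BUCKETS:
        return "treatment"
    return "control"


def users_per_variant(sigma: float, delta: float) -> int:
    """Rule-of-thumb sample size per variant: n = 16 * sigma^2 / delta^2.

    sigma is the standard deviation of the metric; delta is the smallest
    absolute difference between variants that should be detectable.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    return math.ceil(16 * sigma ** 2 / delta ** 2)


if __name__ == "__main__":
    print(assign_variant("user-123", "checkout-redesign"))
    print(users_per_variant(sigma=1.0, delta=0.5))
```

Because the assignment depends only on the hashed key, the same user sees the same variant on every visit without any stored state. Whether a given hash function randomizes well enough in practice is exactly the kind of question the paper evaluates, so the choice here should be treated as a placeholder.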
Additional information
Responsible editor: R. Bayardo.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License ( https://creativecommons.org/licenses/by-nc/2.0 ), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Kohavi, R., Longbotham, R., Sommerfield, D. et al. Controlled experiments on the web: survey and practical guide. Data Min Knowl Disc 18, 140–181 (2009). https://doi.org/10.1007/s10618-008-0114-1
DOI: https://doi.org/10.1007/s10618-008-0114-1