The prospect of conducting rigorous, randomized, controlled psychology experiments on the Web first received substantial consideration in 1996, at the 26th annual meeting of the Society for Computers in Psychology (SCiP), in a pair of symposia on Internet experiments (SCiP, 1996). Twenty years later, Ulf-Dietrich Reips organized the SCiP symposium 20 Years of Internet-Based Research at SCiP: Surviving Concepts, New Methodologies, for which I served as discussant.

The 2016 symposium included talks titled “Update on the Health and Use of the Web for Psychological Research” (Krantz & Reips, 2016), “The Measurement of Pace of Life: Results From an Experience Sampling Smartphone App Study” (Stieger, Lewetz, & Reips, 2016), “Smartphone Tilt as a Measure of Well-Being? Results From a Longitudinal Smartphone App Study” (Kuhlmann, Reips, & Stieger, 2016), “PageFocus: Using Paradata to Detect Window Switching Behavior in Online Tests” (Diedenhofen & Musch, 2016), “‘Is There Anybody Out There?’: Sleepsex.Org, a Longstanding Hub for Research and Exchange on Sexsomnia” (Mangan, 2016), and “Svendroid: A Generic Smartphone App Configurator for Mobile Assessment Studies” (Reips, Stieger, Heinrichs, & de Spindler, 2016). By way of contrast, Table 1 lists the titles of the papers presented at the 1996 SCiP symposia on Internet experiments. The discussion here is organized into three sections: New Methodologies, A Persistent Issue, and Living Up to the Promise of SCiP ’96.

Table 1. Papers presented at the 1996 Society for Computers in Psychology symposia on Internet experiments

New methodologies

It is evident that mobile computing is taking center stage as a new platform for psychological research, with capabilities that were not envisioned 20 years ago. As smartphones become ubiquitous throughout the developed (and even much of the developing) world, researchers are adapting to the realities of how people currently interact online, and to the opportunities and challenges that mobile computing presents to the research community. For example, Stieger, Lewetz, and Reips (2016) used an experience-sampling approach to investigate the “pace of life,” comparing self-reports to results from a tapping task on participants’ smartphones, and found that the touch interface was able to provide useful data. Smartphones afford researchers the opportunity to collect ecologically valid data in situ, taking advantage of automatically generated sources of data such as real-time GPS coordinates and gyro sensor data on the position or tilt of cell phones. Kuhlmann, Reips, and Stieger (2016) explored the use of smartphone tilt over a period of three weeks as an index of participants’ posture, which is associated with well-being (Briñol, Petty, & Wagner, 2009). They found a negative correlation between tilt and well-being, such that the more the device was tilted relative to the “normal” position, the lower the well-being score. Tapping on touch screens and collecting gyro sensor tilt data represent relatively new interfaces with the potential to provide useful behavioral data from which researchers can make inferences about cognitive and affective phenomena.
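Although Kuhlmann, Reips, and Stieger (2016) gathered tilt data through a native smartphone app, the same class of measurement is exposed to browser-based studies through the standard DeviceOrientationEvent interface. The sketch below is a hypothetical illustration of how such tilt data might be sampled in a Web experiment; it is not the instrumentation used in the cited study.

```typescript
// Hypothetical sketch: logging device tilt during a Web-based study.
// Assumes a browser that supports DeviceOrientationEvent; this is not
// the native app used by Kuhlmann, Reips, and Stieger (2016).

interface TiltSample {
  timestamp: number;    // ms since page load
  beta: number | null;  // front-to-back tilt, in degrees
  gamma: number | null; // left-to-right tilt, in degrees
}

const tiltLog: TiltSample[] = [];

window.addEventListener("deviceorientation", (event: DeviceOrientationEvent) => {
  tiltLog.push({
    timestamp: performance.now(),
    beta: event.beta,
    gamma: event.gamma,
  });
});

// At the end of a session, the log could be serialized and submitted
// along with the participant's other data.
function exportTiltData(): string {
  return JSON.stringify(tiltLog);
}
```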

Researchers are also developing solutions to problems that were less pressing in the early days of Web-based psychology experiments. As multitasking has become more common, it is increasingly likely that the subjects in psychology experiments will have multiple browser windows open during experimental sessions. This is potentially problematic, especially when researchers wish to assess knowledge or unaided judgment. Diedenhofen and Musch (2016) developed PageFocus, a JavaScript routine that determines whether, and how frequently, participants leave a target webpage by switching to another window or browser tab. In two validation experiments, they found that PageFocus effectively detected and helped prevent cheating on unproctored online tests.
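The browser mechanics underlying this kind of paradata are straightforward: the Page Visibility API fires an event whenever a participant switches away from, or back to, the experiment’s tab. The sketch below is a minimal illustration of that idea; it is not the actual PageFocus implementation.

```typescript
// Minimal sketch of window/tab-switching detection, in the spirit of
// PageFocus (Diedenhofen & Musch, 2016); not their actual code.

interface FocusRecord {
  type: "left" | "returned";
  timestamp: number; // ms since page load
}

const focusLog: FocusRecord[] = [];

// visibilitychange fires when the participant switches tabs,
// minimizes the window, or returns to the page.
document.addEventListener("visibilitychange", () => {
  focusLog.push({
    type: document.hidden ? "left" : "returned",
    timestamp: performance.now(),
  });
});

// A summary statistic that could be submitted as paradata.
function defocusCount(): number {
  return focusLog.filter((record) => record.type === "left").length;
}
```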

A persistent issue

One persistent issue is the challenge of conducting scientific research with consumer-grade electronics (Wolfe, 2006). As former Behavior Research Methods, Instruments, & Computers editor Jonathan Vaughan told me over a dozen years ago,

When we use computers, video monitors, keyboards, and videocams in our research we are using equipment designed to meet the needs of the average consumer. Thus, we confront challenges and sources of error in the areas of timing and stimulus control that are attributable to the equipment because we are trying to conduct professional-grade scientific research using consumer-grade electronic equipment. (Wolfe, 2006, p. 248)

This statement remains true today, as can be seen in the smartphone work presented in this SCiP symposium on Internet-based research. Both the touchscreen-tapping interface employed by Stieger, Lewetz, and Reips (2016) and the app created by Kuhlmann, Reips, and Stieger (2016) to assess posture by monitoring built-in gyro sensors lack some of the control and precision of specialized scientific apparatus. Indeed, much of the work of the Society for Computers in Psychology can be seen as devising research methods that overcome the shortcomings of consumer-grade electronic equipment so as to conduct “professional-grade scientific research.”

Living up to the promise of SCiP ’96

The 1990s were a heady time for psychologists interested in conducting true experiments on the Web. Researchers began to formally compare the results of laboratory-based and Web-based experiments (e.g., Krantz, Ballard, & Scher, 1996), with the general consensus being that the two produce comparable results. As a service to the field, some researchers began to create sites that brought together in one place links to an array of Web-based experiments, both for the convenience of participants and to aid researchers in recruitment (see Krantz & Reips, 2017). Influential examples include Psychological Research on the Net (Krantz, 1996), The Web Experimental Psychology Lab (Reips, 2000), and several sites related to Amazon’s Mechanical Turk (e.g., Buhrmester, Kwang, & Gosling, 2011; Litman, Robinson, & Rosenzweig, 2015; Mason & Suri, 2012; Summerville & Chartier, 2013), including PsiTurk (Gureckis et al., 2015) and TurkPrime (Litman, Robinson, & Abberbock, 2016). Over the years, researchers have also developed and shared a number of tools to aid investigators in developing sophisticated experimental designs for Internet-based research, including WEXTOR (Reips & Neuhaus, 2002), Visual DMDX (Garaizar & Reips, 2015), jsPsych (de Leeuw, 2015), QRTEngine for Qualtrics (Barnhoorn, Haasnoot, Bocanegra, & van Steenbergen, 2015), and JATOS (Lange, Kühn, & Filevich, 2015). Tools such as these have made it much easier for potential participants to find studies matching their interests, and for researchers without special programming skills or large budgets to conduct controlled experiments employing a wide range of measures and designs.

In a 1996 SCiP symposium on Internet-based research, Reips (1996) identified four advantages of Web-based psychology experiments:

(1) easy access to a geographically unlimited subject population, including subjects from very specific and previously inaccessible target populations; (2) bringing the experiment to the subject instead of the opposite; (3) high statistical power through high sample size while keeping a conventional α-level; and (4) reduced cost, because neither laboratory rooms nor experimenters are needed.

Thus, it is reasonable to ask, after 20 years, whether these perceived advantages of Web-based psychological research have withstood the test of time.

  • (1) Easy access to a geographically unlimited subject population, including subjects from very specific and previously inaccessible target populations. Recruiting special populations has been a goal of Web-based psychological research since the earliest days. For example, as he described at the 1996 SCiP symposium Internet Experiments II: Definition and Examples, Young (1996) recruited 396 dependent Internet users for a study of Internet addiction. In the same symposium, Schiano (1996) described studying members of emerging “on-line communities,” and in my own work I have recruited women interested in learning about genetic breast cancer risk (Widmer et al., 2015). In the 2016 SCiP symposium, Mangan (2016) reported on 16 years of psychological research on “sleep-related abnormal sexual behaviors” (American Academy of Sleep Medicine, 2005), or “Sexsomnia” (Mangan & Reips, 2006): that is, engaging in sexual behavior in one’s sleep. Research with special populations was often prohibitively expensive when researchers had to rely on telephones and the postal service. The Web, however, makes it possible to do rigorous research, including controlled experiments, with small, scattered, and difficult-to-reach populations. Unfortunately, investigators often fail to appreciate access to special populations as a reason for doing Web-based research (Krantz & Reips, 2017).

  • (2) Bringing the experiment to the subject instead of the opposite. This is clearly still the case, and the saturation of smartphones in society has made it possible to conduct psychological research practically anywhere and anytime. For example, Kuhlmann, Reips, and Stieger (2016) were able to unobtrusively collect data on Christmas Eve and New Year’s Eve. In my own research, I was able to recruit participants from Africa, Asia, Australia, Europe, North America, and South America in a study in which 151 subjects answered all of the questions (Wolfe & Fisher, 2013). Of course, the “downside” is a loss of control over experimental conditions. Although this is likely to remain an issue for the foreseeable future, researchers may be able to use smartphone capabilities to at least measure critical aspects of the participant’s environment. For example, by using the built-in microphone to measure decibel levels in the participant’s immediate environment, researchers may be able to identify potentially contaminated data in experiments in which a noisy environment would be problematic (see the first sketch following this list).

  • (3) High statistical power through high sample size while keeping a conventional α-level. Large sample sizes continue to be a major reason that researchers are interested in Internet-based research. For example, in a study of 201 researchers, “nearly all listed fast data collection and large sample sizes as benefits of online data collection” (Gureckis et al., 2015, p. 831). Some Internet-based studies do indeed draw large numbers of participants: Salganik, Dodds, and Watts (2006) recruited 14,341 participants for an experiment on artificial music markets, and Germine and colleagues (2012) had 4,080 participants in the Web-based arm of a perception experiment comparing Web and laboratory samples.

    However, it appears that some investigators in the 1990s were unduly optimistic about the potential for routinely conducting experiments with large numbers of participants. At that time, some scholars talked about going to conferences and exchanging ideas with colleagues from other institutions in the morning, collecting data online in the afternoon, and then bringing those data to conversations that same evening. This has not been the experience of most researchers. To illustrate, in this 2016 symposium, researchers reported N = 261 in the smartphone-tapping study (Stieger et al., 2016), N = 98 in the smartphone tilt study (Kuhlmann et al., 2016), N = 186 in the PageFocus study (Diedenhofen & Musch, 2016), and an average of N = 265 in studies on sexsomnia collected on Sleepsex.org since 1999 (Mangan, 2016). In my own work, I had 140 participants in a Web-based experiment on joint probability estimation (Wolfe & Reyna, 2010), and 106 participants in the Web-based arm of a 90-min experiment on dialogues about breast cancer risk (Widmer et al., 2015).

    These numbers suggest that some (if not most) controlled Web-based psychology experiments have statistical power roughly comparable to that of experiments conducted in the traditional laboratory. Indeed, to maintain the same level of statistical power, a Web-based experiment with more variation, stemming from a more heterogeneous sample, will require a larger sample size than a laboratory study with less variation from a more homogeneous sample (the second sketch following this list works through the arithmetic). It is worth noting that most behavioral research conducted online is survey research rather than true, controlled experimentation, and that survey researchers were able to recruit large numbers of participants long before the Internet. Experimentalists working online, however, must contend with a number of methodological issues whose solutions often reduce N, including unequal dropout rates across conditions and the need to limit recruitment to subjects who are naive to key elements of the experiment (e.g., excluding people who attempt to repeat the experiment and those who have participated in similar studies). Other aspects of controlled Web-based experiments that limit sample size include the special conditions required for precision in the presentation of stimuli and, most importantly, inherently uninteresting experimental tasks. In retrospect, it appears that in the early years the “demand” for the experience of being a subject in psychology experiments exceeded the “supply,” whereas today the two are closer to balanced, with supply exceeding demand for the more difficult and less intrinsically interesting experimental tasks.

    Although there are many examples of true Web-based experiments with large samples, and a general perception that the Web makes it easy to recruit large numbers of subjects, few data are available on the actual statistical power and numbers of subjects in controlled Web-based experiments in “apples-to-apples” comparisons with experiments using similar techniques in traditional laboratories. A systematic study of sample size and actual statistical power in highly comparable Web-based and lab-based studies would provide a valuable service to the field, particularly if it compared published studies with those presented at conferences but never published.

  • (4) Reduced cost, because neither laboratory rooms nor experimenters are needed. This appears to be the case. In an informal study of 201 researchers, Gureckis and colleagues (2015) found that 75% endorsed cost as a benefit of online data collection. Krantz and Reips (2016, 2017) surveyed psychological researchers who conduct Web-based studies about their reasons for doing so; cost was one of the highest-rated reasons, ranking significantly higher than in a comparable survey conducted by Musch and Reips in 2000.
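To make concrete the ambient-noise idea raised under advantage (2), the sketch below uses the standard getUserMedia and Web Audio APIs to estimate the sound level at the participant’s device. It is a hypothetical example of the approach, not an instrument from any of the studies discussed here; note that the values it returns are in dBFS (decibels relative to the microphone’s full scale), not calibrated sound-pressure levels, since consumer microphones vary widely in sensitivity.

```typescript
// Hypothetical sketch: estimating ambient noise during a Web experiment.
// Returns a level in dBFS (relative to full scale), not calibrated SPL.

async function sampleNoiseLevel(): Promise<number> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const context = new AudioContext();
  const analyser = context.createAnalyser();
  analyser.fftSize = 2048;
  context.createMediaStreamSource(stream).connect(analyser);

  // Wait briefly so the analyser's buffer fills with live samples.
  await new Promise((resolve) => setTimeout(resolve, 250));

  const buffer = new Float32Array(analyser.fftSize);
  analyser.getFloatTimeDomainData(buffer);

  // Root-mean-square amplitude of the sampled waveform.
  const rms = Math.sqrt(
    buffer.reduce((sum, x) => sum + x * x, 0) / buffer.length,
  );

  stream.getTracks().forEach((track) => track.stop());
  return 20 * Math.log10(rms); // more negative = quieter environment
}
```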
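The sample-size argument under advantage (3) can also be made concrete. Under the standard normal approximation for a two-group comparison, the required sample size per group is roughly n ≈ 2((z₁₋α/₂ + z₁₋β)/d)². The sketch below, with the z quantiles hard-coded for a two-tailed α = .05 and power = .80, shows how the greater heterogeneity of a Web sample, by shrinking the standardized effect size d, inflates the required N; the specific numbers are purely illustrative.

```typescript
// Worked example: approximate n per group for a two-sample comparison,
// using n ≈ 2 * ((zAlpha + zBeta) / d)^2 (normal approximation).

const Z_ALPHA = 1.960; // two-tailed alpha = .05
const Z_BETA = 0.842;  // power = .80

function nPerGroup(d: number): number {
  return Math.ceil(2 * ((Z_ALPHA + Z_BETA) / d) ** 2);
}

// A "medium" effect (d = 0.5) in a relatively homogeneous lab sample:
console.log(nPerGroup(0.5)); // ~63 per group

// The same raw mean difference in a Web sample whose standard deviation
// is 25% larger: d shrinks to 0.4, and the required N grows sharply.
console.log(nPerGroup(0.5 / 1.25)); // ~99 per group
```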

Discussion

The studies reviewed here exemplify many of the advantages of Web-based psychology experiments. Although the most optimistic early dreams about large sample sizes have not always been borne out, the best research conducted on the Web has kept pace with the promise of the early days of psychological research on the Internet. Addressing the shortcomings of consumer-grade electronics remains a burden, yet a new generation of investigators appears to be up to the challenge.

Despite these signs of scientific progress, there are also causes for concern. “Fast data collection” was the benefit of conducting research online most commonly endorsed by investigators in Gureckis et al.’s (2015) study of researchers in psychology, linguistics, marketing, neuroscience, and economics; indeed, it was endorsed by almost all 201 researchers. Krantz and Reips (2016, 2017) reported that ease of use and cost appear to be the primary motivations for many researchers to turn to the Internet. However, Krantz and Reips also noted that there is no coherent curricular approach to teaching Web research methods. In standard textbooks on psychology research methods, topics such as N = 1 case studies continue to receive far more attention than Web-based psychological research (Krantz & Reips, 2017). More troubling yet, Krantz and Reips (2016, 2017) have reported that in Krantz’s work with the Psychological Research on the Net website, he routinely finds submissions that fail to meet even minimal scientific standards.

Excellent resources are available for psychologists who wish to conduct “professional-grade” psychological research on the Internet. These include numerous peer-reviewed articles on Web-based research and related topics in the pages of both Behavior Research Methods and the International Journal of Internet Science. There are also a number of good books, including Introduction to Behavioral Research on the Internet (Birnbaum, 2001), The Oxford Handbook of Internet Psychology (Joinson, McKenna, Postmes, & Reips, 2007), and Advanced Methods for Behavior Research on the Internet (Gosling & Johnson, 2010). We have good data on basic methodological issues for online studies, such as the consequences of using Likert scales, visual analog scales, or sliders (Funke & Reips, 2012; Funke, Reips, & Thomas, 2011). Yet too many researchers appear to be insufficiently acquainted with this literature before they post a study online. Until those who write methods textbooks and those who teach research methods to undergraduates, and especially to graduate students, begin to take this burgeoning literature seriously, the future of Internet-based psychological research will remain in doubt.