Almost all existing Web-based RT research has used individual, visually presented stimuli. There are several likely reasons for this. First, it reflects a focus in cognitive psychology more toward visual than auditory perception and processing. (To illustrate, in the UK, the core introductory cognitive psychology textbooks devote several chapters to aspects of visual processing, but only a few pages to auditory perception.)
Second, there may have been more practical barriers to running Web-based experiments with auditory stimuli. The ability for users to see visually presented stimuli is a given, as all computers use a visual interface—that is, a monitor—for interaction. Audio has traditionally been more optional: Early PCs could only produce simple beeps via the internal speaker, and only a little over a decade ago, many business PCs included sound cards only as an optional extra. A more recent issue has been user-driven: People cannot always listen to sounds presented on their computer because they lack speakers or headphones. However, the increasing use of the Web to play video, and of applications requiring audio such as Skype, is likely to have made the ability to use audio more widespread.
Third, researchers have legitimate concerns about the uncontrolled environment in which Web-based studies are run (for a more detailed overview, see Slote & Strand, 2016). Although the appearance of visual stimuli varies substantially from system to system—in size, hue, saturation, and contrast, among other things—the fact that people need to be able to view a display in order to use a computer means that the basic properties of a visual stimulus will be perceived by a participant. Auditory stimuli, on the other hand, may be too quiet to be perceived; they may be distorted; they may be played in a noisy environment that makes discriminating them impossible. They may also be presented monaurally or binaurally, which can affect perception (Feuerstein, 1992), in mono or stereo (which would affect dichotic listening tasks), and so on.
Fourth, the presentation of browser-based audio is generally more complicated than the presentation of visual stimuli. For example, no single audio format is supported by all current popular PC browsers, and until the recent development of the HTML5 standards, the optimal methods for playing audio varied by browser (for an introduction to earlier methods for presenting audio, see Huckvale, 2011).
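Under the HTML5 standard mentioned above, a browser can be asked which audio encodings it is able to decode before a source file is chosen, via the standard `canPlayType` method of a media element. The following is a minimal sketch of that approach; the helper function and file names are illustrative, not taken from any study described here:

```javascript
// Sketch: choose the first candidate audio format the current browser
// reports it can decode. Per the HTML5 media spec, canPlayType returns
// "probably", "maybe", or "" (empty string = not supported).
// The stimulus file names below are hypothetical examples.
function pickPlayableSource(audioEl, candidates) {
  for (const { src, type } of candidates) {
    const support = audioEl.canPlayType(type);
    if (support === "probably" || support === "maybe") {
      return src;
    }
  }
  return null; // No supported format: fall back or abort the trial.
}

// In a browser this would be called with a real element, e.g.:
// const src = pickPlayableSource(new Audio(), [
//   { src: "tone.ogg", type: "audio/ogg" },
//   { src: "tone.mp3", type: "audio/mpeg" },
// ]);
```

In practice, experimenters typically supply the same stimulus in at least two encodings (commonly Ogg Vorbis and MP3) so that one of them is playable in each of the major browsers.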
Finally, there are concerns regarding the variability in audio presentation onset times. Most notably, Babjack et al. (2015) reported substantial variability in the latency between executing the code to present a sound and that sound actually being presented. In their research, a Black Box ToolKit (see below) was used to measure the latency between a pulse generated by the testing system, which would be detected immediately, and a sound that was coded to begin at the same time. The results showed that the mean latency varied substantially across different hardware and software combinations, from 25 to 126 ms, and that one-off latencies could be as much as 356 ms.
Experiments using auditory stimuli in Web-based studies started in the very earliest days of internet-mediated research (Welch & Krantz, 1996). However, in the intervening 20 years, very little published research appears to have presented auditory stimuli over the Web (for overviews, see Knoll, Uther, & Costall, 2011, and Slote & Strand, 2016), and less still has systematically examined the accuracy of doing so.
Some existing research has used long audio files embedded in a webpage (e.g., Honing, 2006) or downloaded to the user’s computer (e.g., Welch & Krantz, 1996). Auditory stimuli have included excerpts of musical performance (Honing, 2006), unpleasant sounds such as vomiting and dentists’ drills (Cox, 2008), and speech (Knoll et al., 2011; Slote & Strand, 2016).
In Knoll et al.'s (2011) study, participants listened to 30-second samples of low-pass filtered speech, spoken by a UK-based mother (a) to her infant, (b) to a British confederate, and (c) to a foreign confederate. Participants made a series of affective ratings for each speech clip. The experiment was run in two conditions: One was a traditional lab-based setup; the other used participants recruited and tested over the Web. The pattern of results was very similar across the two conditions, with participants in both conditions rating infant-directed speech as more positive, more comforting, and more encouraging than either of the two other speech types.