Behavior Research Methods, Volume 50, Issue 6, pp 2597–2605

Is automatic speech-to-text transcription ready for use in psychological experiments?

  • Kirsten Ziman
  • Andrew C. Heusser
  • Paxton C. Fitzpatrick
  • Campbell E. Field
  • Jeremy R. Manning


Verbal responses are a convenient and naturalistic way for participants to provide data in psychological experiments (Salzinger, 1959). However, audio recordings of verbal responses typically require additional processing, such as transcribing the recordings into text, as compared with other behavioral response modalities (e.g., typed responses, button presses, etc.). Further, the transcription process is often tedious and time-intensive, requiring human listeners to manually examine each moment of recorded speech. Here we evaluate the performance of a state-of-the-art speech recognition algorithm (Halpern et al., 2016) in transcribing audio data into text during a list-learning experiment. We compared transcripts made by human annotators with the computer-generated transcripts. The two sets of transcripts matched to a high degree and exhibited similar statistical properties, in terms of the participants' recall performance and recall dynamics that the transcripts captured. This proof-of-concept study suggests that speech-to-text engines could provide a cheap, reliable, and rapid means of automatically transcribing speech data in psychological experiments. Further, our findings open the door for verbal response experiments that scale to thousands of participants (e.g., administered online), as well as a new generation of experiments that decode speech on the fly and adapt experimental parameters based on participants' prior responses.
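The core comparison described in the abstract — checking how many human-annotated recall words a speech-to-text transcript also captures — can be sketched as a simple fuzzy word-matching routine. The function, word lists, and similarity threshold below are illustrative assumptions for a minimal example; they are not the authors' actual pipeline, which used the Quail toolbox (Heusser et al., 2017).

```python
from difflib import SequenceMatcher

def match_rate(human_words, machine_words, threshold=0.8):
    """Fraction of human-annotated recall words also present in the
    machine transcript, allowing near-matches so that minor
    transcription errors (e.g., 'guitar' vs. 'guitars') still count."""
    if not human_words:
        return 1.0
    matched = 0
    for word in human_words:
        # A word counts as recovered if any machine-transcribed word
        # is sufficiently similar under difflib's ratio metric.
        if any(SequenceMatcher(None, word.lower(), w.lower()).ratio() >= threshold
               for w in machine_words):
            matched += 1
    return matched / len(human_words)

# Hypothetical transcripts of a single free-recall trial
human = ["apple", "table", "ocean", "guitar"]
machine = ["apple", "tables", "ocean", "guitars"]
print(match_rate(human, machine))  # → 1.0 (near-matches accepted)
```

The threshold trades off tolerance for speech-recognition errors against false matches between distinct list items; a stricter study would tune it against a held-out set of human annotations.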


Keywords: Annotation · Free recall · Mechanical Turk · Memory · Speech-to-text · Verbal response



We are grateful for useful discussions with Justin Hulbert and Talia Manning. Our work was supported in part by NSF EPSCoR Award Number 1632738. The content is solely the responsibility of the authors and does not necessarily represent the official views of our supporting organizations.


  1. Angelakis, E., Stathopoulou, S., Frymiare, J. L., Green, D. L., Lubar, J. F., & Kounios, J. (2007). EEG neurofeedback: A brief overview and an example of peak alpha frequency training for cognitive enhancement in the elderly. The Clinical Neuropsychologist, 21(1), 110–129.
  2. Bamberg, P., Chow, Y.-L., Gillick, L., Roth, R., & Sturtevant, D. (1990). The Dragon continuous speech recognition system: A real-time implementation. In Proceedings of DARPA Speech and Natural Language Workshop (pp. 78–81).
  3. Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality, data. Perspectives on Psychological Science, 6(1), 3–5.
  4. Carlini, N., & Wagner, D. (2018). Audio adversarial examples: Targeted attacks on speech-to-text. arXiv:1801.01944
  5. Cohen, M. S. (2001). Real-time functional magnetic resonance imaging. Methods, 25, 201–220.
  6. Col, J. (2017). Enchanted learning. Retrieved from
  7. Cornsweet, T. N. (1962). The staircase-method in psychophysics. The American Journal of Psychology, 75(3), 485–491.
  8. Cox, R. W., & Jesmanowicz, A. (1999). Real-time 3D image registration for functional MRI. Magnetic Resonance in Medicine, 42, 1014–1018.
  9. Cox, R. W., Jesmanowicz, A., & Hyde, J. S. (1995). Real-time functional magnetic resonance imaging. Magnetic Resonance in Medicine, 33, 230–236.
  10. Crump, M. J. C., McDonnell, J. V., & Gureckis, T. M. (2013). Evaluating Amazon’s Mechanical Turk as a tool for experimental behavioral research. PLoS One, 8(3), e57410.
  11. deBettencourt, M. T., Cohen, J. D., Lee, R. F., Norman, K. A., & Turk-Browne, N. B. (2015). Closed-loop training of attention with real-time brain imaging. Nature Neuroscience, 18(3), 470–475.
  12. deCharms, R. C. (2008). Applications of real-time fMRI. Nature Reviews Neuroscience, 9(9), 720–729.
  13. de Leeuw, J. R. (2015). jsPsych: A JavaScript library for creating behavioral experiments in a web browser. Behavior Research Methods, 47(1), 1–12.
  14. Gureckis, T. M., Martin, J., McDonnell, J., Rich, A. S., Markant, D., Coenen, A., & Chan, P. (2015). psiTurk: An open-source framework for conducting replicable behavioral experiments online. Behavior Research Methods, 48(3), 829–842.
  15. Halpern, Y., Hall, K. B., Schogol, V., Riley, M., Roark, B., Skobeltsyn, G., & Bäuml, M. (2016). Contextual prediction models for speech recognition. In Interspeech (pp. 2338–2342).
  16. Heusser, A. C., Fitzpatrick, P. C., Field, C. E., Ziman, K., & Manning, J. R. (2017). Quail: A Python toolbox for analyzing and plotting free recall data. The Journal of Open Source Software.
  17. Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A.-R., Jaitly, N., & Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6), 82–97.
  18. Huggins-Daines, D., Kumar, M., Chan, A., Black, A. W., Ravishankar, M., & Rudnicky, A. I. (2006). Pocketsphinx: A free, real-time continuous speech recognition system for hand-held devices. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (Vol. 1, pp. 185–188).
  19. Kahana, M. J. (1996). Associative retrieval processes in free recall. Memory & Cognition, 24, 103–109.
  20. Kahana, M. J. (2012). Foundations of human memory. New York: Oxford University Press.
  21. Kahana, M. J. (2017). Memory search. In J. H. Byrne (Ed.), Learning and memory: A comprehensive reference, second edition (pp. 181–200). Oxford: Academic Press.
  22. Kurzweil, R., Richter, R., Kurzweil, R., & Schneider, M. L. (1990). The age of intelligent machines. Cambridge: MIT Press.
  23. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083
  24. Manning, J. R., Norman, K. A., & Kahana, M. J. (2015). The role of context in episodic memory. In M. Gazzaniga (Ed.), The cognitive neurosciences, 5th edition (pp. 557–566). Cambridge: MIT Press.
  25. Manning, J. R., Polyn, S. M., Baltuch, G., Litt, B., & Kahana, M. J. (2011). Oscillatory patterns in temporal lobe reveal context reinstatement during memory search. Proceedings of the National Academy of Sciences, USA, 108(31), 12893–12897.
  26. Murdock, B. B. (1962). The serial position effect of free recall. Journal of Experimental Psychology, 64, 482–488.
  27. Paolacci, G., Chandler, J., & Ipeirotis, P. G. (2010). Running experiments on Amazon Mechanical Turk. Judgement and Decision Making, 5(5), 411–419.
  28. Park, M., & Pillow, J. W. (2012). Bayesian active learning with localized priors for fast receptive field characterization. In Advances in Neural Information Processing Systems (pp. 2348–2356).
  29. Polyn, S. M., & Kahana, M. J. (2008). Memory search and the neural representation of context. Trends in Cognitive Sciences, 12(1), 24–30.
  30. Rabiner, L. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–286.
  31. Rundus, D. (1971). Analysis of rehearsal processes in free recall. Journal of Experimental Psychology, 89(1), 63–77.
  32. Salzinger, K. (1959). Experimental manipulation of verbal behavior: A review. The Journal of General Psychology, 61(1), 65–94.
  33. Tan, L., & Ward, G. (2000). A recency-based account of the primacy effect in free recall. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 1589–1626.
  34. Tan, L., & Ward, G. (2008). Rehearsal in immediate serial recall. Psychonomic Bulletin & Review, 15(3), 535–542.
  35. UPenn Computational Memory Lab (2015). Penn TotalRecall. Computer software.
  36. van der Linden, W. J., & Glas, C. A. (2000). Computerized adaptive testing: Theory and practice. Berlin: Springer.

Copyright information

© Psychonomic Society, Inc. 2018

Authors and Affiliations

  • Kirsten Ziman (1)
  • Andrew C. Heusser (1)
  • Paxton C. Fitzpatrick (1)
  • Campbell E. Field (1)
  • Jeremy R. Manning (1)

  1. Department of Psychological and Brain Sciences, Dartmouth College, Hanover, USA
