Is automatic speech-to-text transcription ready for use in psychological experiments?
Verbal responses are a convenient and naturalistic way for participants to provide data in psychological experiments (Salzinger, 1959, The Journal of General Psychology, 61(1), 65–94). However, unlike other behavioral response modalities (e.g., typed responses or button presses), audio recordings of verbal responses typically require additional processing, such as transcribing the recordings into text. Further, the transcription process is often tedious and time-intensive, requiring human listeners to manually examine each moment of recorded speech. Here we evaluate the performance of a state-of-the-art speech recognition algorithm (Halpern et al., 2016) in transcribing audio data into text during a list-learning experiment. We compare transcripts made by human annotators to the computer-generated transcripts. The two sets of transcripts matched to a high degree and exhibited similar statistical properties in terms of the participants’ recall performance and recall dynamics that the transcripts captured. This proof-of-concept study suggests that speech-to-text engines could provide a cheap, reliable, and rapid means of automatically transcribing speech data in psychological experiments. Further, our findings open the door for verbal response experiments that scale to thousands of participants (e.g., administered online), as well as a new generation of experiments that decode speech on the fly and adapt experimental parameters based on participants’ prior responses.
Keywords: Annotation, Free recall, Mechanical Turk, Memory, Speech-to-text, Verbal response
We are grateful for useful discussions with Justin Hulbert and Talia Manning. Our work was supported in part by NSF EPSCoR Award Number 1632738. The content is solely the responsibility of the authors and does not necessarily represent the official views of our supporting organizations.
- Bamberg, P., Chow, Y.-L., Gillick, L., Roth, R., & Sturtevant, D. (1990). The Dragon continuous speech recognition system: a real-time implementation. In Proceedings of DARPA Speech and Natural Language Workshop (pp. 78–81).
- Carlini, N., & Wagner, D. (2018). Audio adversarial examples: targeted attacks on speech-to-text. arXiv:1801.01944
- Col, J. (2017). Enchanted learning. Retrieved from http://www.enchantedlearning.com
- Halpern, Y., Hall, K. B., Schogol, V., Riley, M., Roark, B., Skobeltsyn, G., & Bäuml, M. (2016). Contextual prediction models for speech recognition. In Interspeech (pp. 2338–2342).
- Huggins-Daines, D., Kumar, M., Chan, A., Black, A. W., Ravishankar, M., & Rudnicky, A. I. (2006). Pocketsphinx: A free, real-time continuous speech recognition system for hand-held devices. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (Vol. 1, pp. 185–188).
- Kahana, M. J. (2012). Foundations of human memory. New York: Oxford University Press.
- Kahana, M. J. (2017). Memory search. In J. H. Byrne (Ed.) Learning and memory: A comprehensive reference, second edition (pp. 181–200). Oxford: Academic Press.
- Kurzweil, R., Richter, R., Kurzweil, R., & Schneider, M. L. (1990). The age of intelligent machines. Cambridge: MIT Press.
- Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083
- Manning, J. R., Norman, K. A., & Kahana, M. J. (2015). The role of context in episodic memory. In M. Gazzaniga (Ed.) The cognitive neurosciences, 5th edition (pp. 557–566). Cambridge: MIT Press.
- Paolacci, G., Chandler, J., & Ipeirotis, P. G. (2010). Running experiments on Amazon Mechanical Turk. Judgment and Decision Making, 5(5), 411–419.
- Park, M., & Pillow, J. W. (2012). Bayesian active learning with localized priors for fast receptive field characterization. In Advances in Neural Information Processing Systems (pp. 2348–2356).
- UPenn Computational Memory Lab (2015). Penn TotalRecall. Computer Software.