
Using Audio Transformations to Improve Comprehension in Voice Question Answering

  • Conference paper
Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2019)


Many popular form factors of digital assistants, such as Amazon Echo or Google Home, enable users to converse with speech-based systems. The lack of a screen presents unique challenges: to satisfy users’ information needs, the presentation of answers has to be optimized for voice-only interactions. We evaluate the usefulness of audio transformations (i.e., prosodic modifications) for voice-only question answering. We introduce a crowdsourcing setup that evaluates the quality of our proposed modifications along multiple dimensions corresponding to informativeness, naturalness, and the ability of users to identify key parts of the answer. We offer a set of prosodic modifications that highlight potentially important parts of the answer using various acoustic cues. Our experiments show that different modifications lead to better comprehension at the expense of slightly degraded naturalness of the audio.
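The abstract does not spell out how a prosodic modification is applied to an answer. The sketch below is a minimal illustration. It assumes the answer is rendered by a TTS engine that accepts SSML and that the key span has already been identified. Under those assumptions, it marks the span with one of three standard SSML cues: `<emphasis>`, `<prosody>`, or `<break>`. The function name `highlight_span`, the cue names, and the parameter values (`rate="slow"`, `pitch="+10%"`, 300 ms pauses) are hypothetical choices made for illustration. They are not the modifications or settings evaluated in the paper.

```python
from xml.sax.saxutils import escape


def highlight_span(answer: str, start: int, end: int, cue: str) -> str:
    """Return an SSML document in which answer[start:end] is acoustically highlighted.

    Hypothetical helper; the cues and their values are illustrative assumptions:
      "emphasis": wrap the span in <emphasis level="strong">
      "prosody":  speak the span more slowly and at a higher pitch
      "pause":    insert short breaks before and after the span
    """
    if not 0 <= start < end <= len(answer):
        raise ValueError("span must be a non-empty slice of the answer")

    # Escape &, < and > so that plain answer text cannot break the SSML markup.
    before = escape(answer[:start])
    key = escape(answer[start:end])
    after = escape(answer[end:])

    if cue == "emphasis":
        key = f'<emphasis level="strong">{key}</emphasis>'
    elif cue == "prosody":
        key = f'<prosody rate="slow" pitch="+10%">{key}</prosody>'
    elif cue == "pause":
        key = f'<break time="300ms"/>{key}<break time="300ms"/>'
    else:
        raise ValueError(f"unknown cue: {cue!r}")

    return f"<speak>{before}{key}{after}</speak>"


if __name__ == "__main__":
    answer = "The store closes at 9 pm on Sundays."
    i = answer.index("9 pm")
    for cue in ("emphasis", "prosody", "pause"):
        print(highlight_span(answer, i, i + len("9 pm"), cue))
```

The SSML specification leaves the rendering of `<emphasis>` to the synthesizer, so the audible effect of the same markup can differ between engines. Explicit `<prosody>` and `<break>` cues make the acoustic change easier to control.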

For an extended version of this paper, see Chuklin et al. [2].



Notes

  2. Experiments performed under Ethics Application BSEH 10–14 at RMIT University.

  4. The emphasis feature is currently available only in Google TTS, and its implementation details are specified neither in the SSML standard nor in its documentation.


References

  1. Chuklin, A., de Rijke, M.: Incorporating clicks, attention and satisfaction into a search engine result page evaluation model. In: CIKM (2016)

  2. Chuklin, A., Severyn, A., Trippas, J.R., Alfonseca, E., Silen, H., Spina, D.: Prosody modifications for question-answering in voice-only settings. CoRR abs/1806.03957 (2018)

  3. Cutler, A., Foss, D.J.: On the role of sentence stress in sentence processing. Lang. Speech 20, 1–10 (1977)

  4. Filippova, K., Alfonseca, E., Colmenares, C.A., Kaiser, L., Vinyals, O.: Sentence compression by deletion with LSTMs. In: EMNLP (2015)

  5. Kumar, A.J., Schmidt, C., Köhler, J.: A knowledge graph-based speech interface for question answering systems. Speech Commun. 92, 1–12 (2017)

  6. Mishra, T., Bangalore, S.: Qme!: a speech-based question-answering system on mobile devices. In: Proceedings of NAACL 2010, pp. 55–63 (2010)

  7. Mitra, B., Simon, G., Gao, J., Craswell, N., Deng, L.: A proposal for evaluating answer distillation from web data. In: Proceedings of the SIGIR 2016 WebQA Workshop (2016)

  8. Nguyen, T., et al.: MS MARCO: a human generated machine reading comprehension dataset (2016)

  9. Pannekamp, A., Toepel, U., Alter, K., Hahne, A., Friederici, A.D.: Prosody-driven sentence processing: an event-related brain potential study. J. Cogn. Neurosci. 17, 407–421 (2005)

  10. Philips, L.: The double metaphone search algorithm. C/C++ Users J. 18(6), 38–43 (2000)

  11. Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: EMNLP (2016)

  12. Ratcliff, J.W., Metzener, D.E.: Pattern matching: the gestalt approach. Dr. Dobb’s J. 13(7), 46 (1988)

  13. Sanderman, A.A., Collier, R.: Prosodic phrasing and comprehension. Lang. Speech 40(4), 391–409 (1997)

  14. Whittaker, E.W.D., Mrozinski, J., Furui, S.: Factoid question answering with web, mobile and speech interfaces. In: NAACL (2006)


Author information

Corresponding author

Correspondence to Aleksandr Chuklin.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Chuklin, A., Severyn, A., Trippas, J.R., Alfonseca, E., Silen, H., Spina, D. (2019). Using Audio Transformations to Improve Comprehension in Voice Question Answering. In: Crestani, F., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2019. Lecture Notes in Computer Science, vol 11696. Springer, Cham.


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28576-0

  • Online ISBN: 978-3-030-28577-7

  • eBook Packages: Computer Science, Computer Science (R0)
