Abstract
By definition, the goal of natural language (NL) speech research has been “natural” human language—that is, faithfully simulating human-human conversation with human-computer dialogues. The word “natural” is enclosed in quotation marks here to emphasize its narrow definition; attempts to explore alternatives to that definition have attracted little interest. But there are other ways to incorporate speech into effective and intelligent user interfaces. Text-to-speech synthesis, for example, can easily exceed human capabilities: encompassing a wider pitch range, speaking faster or slower, or pronouncing tongue-twisters. Similarly, non-speech audio can provide prompts and feedback more quickly than speech, and can also exploit musical syntax and semantics. Speech recognition dialogues, in turn, can master pidgin languages, shortcut expressions, cockney-like underground tongues, and non-speech sounds. User interfaces designed around these super-human technology capabilities are known collectively as super-natural language dialogues (SNLD). Such interfaces are inherently multimodal, and their study is inherently interdisciplinary.
Notes
1. Audio searching and skimming designs have been well studied; see Arons (1994) for one of the earliest.
2. “Musical sense” is often not consciously recognized, but it can still be measured through imaging, observed fluctuations in arousal, or other physiological experiments.
3. In a personal communication, Koelsch challenges this statement, arguing that “Meyer’s referential/absolute classification is rather reminiscent of extra- and intramusical meaning (terms that Meyer also used).” Koelsch suggests instead that “The distinction between indexical extramusical meaning (i.e., the recognition of an emotional expression) and emotional musicogenic meaning (i.e., meaning emerging from the presence of a particular feeling state) is reminiscent of Gabrielsson’s ‘perceived’ and ‘felt’ emotion.”
4. Not exactly true: sound waves must propagate through space to the eardrum before a human user can hear them. So at least the body-sized envelope of sound-passing air (in the case of open-air loudspeakers), or the head-sized envelope implied by headphones or earbuds, defines the minimum spatial requirement for HCI audio. But because sound perception is so abstract—and because the user cannot see the sounds—the statement is practical.
5. Remember that this description focuses on maximum integration of auditory media with other modalities. There are many ways to accomplish such integration, and this method, of course, presents practical product-development obstacles. But it is conceptually easy to understand, and thus makes a good example for the purposes of this article.
6. These directed dialogues are well known in the speech industry and are not belabored here.
7. Post-recognition analysis includes n-best list traversal, interpretation of confidence values, and other executive decision-making.
References
Arons B (1994) Interactively skimming recorded speech. Dissertation, MIT, Cambridge
Balentine B (1994) A multimedia interface: speech, sound, sight, and touch. In: AVIOS ’94 Proceedings, San Jose, Sept 1994
Balentine B (1999) Re-engineering the speech menu. In: Gardner-Bonneau D (ed) Human factors and voice interactive systems. Kluwer Academic, Boston, pp 205–235
Balentine B (2007a) It’s better to be a good machine than a bad person. ICMI Press, Annapolis
Balentine B (2007b) Online articles regarding menu guidelines for HCI. Click on “Lists and User Memory”. http://www.beagoodmachine.com/extras/cuttingroomfloor.php
Balentine B, Morgan DP (2001) How to build a speech recognition application. EIG Press, San Ramon
Balentine B, Melaragno R, Stringham R (2000) Speech recognition 1999 R&D program final report: user interface design recommendations, EIG Press, San Ramon
Blattner MM, Greenberg RM (1992a) In: Edwards ADN, Holland S (eds) Multimedia interface design in education, Springer, Berlin
Blattner MM, Sumikawa DA, Greenberg RM (1989) Earcons and icons: their structure and common design principles. Hum Comput Interact 4:11–44
Brewster SA, Crease MG (1999) Correcting menu usability problems with sound. Behav Info Technol 18(3):165–177
Buxton W, Baecker R, Arnott J (1985) A holistic approach to user interface design. Unpublished manuscript
Christian B (2011) The most human human: what artificial intelligence teaches us about being alive. Anchor Books, New York
Commarford PM, Lewis JR, Smither JA-A, Gentzler MD (2008) A comparison of broad versus deep auditory menu structures. Hum Factors 50(1):77–89
Gaver WW (1986) Auditory icons: using sound in computer interfaces. Hum Comput Interact 4:167–177
Keller P, Stevens C (2004) Meaning from environmental sounds: types of signal-referent relations and their effect on recognizing auditory icons. J Exp Psychol Appl 10(1):3–12
Koelsch S (2012) Neural correlates of processing musical semantics. In: First international workshop on segregation and integration in music and language, Tübingen, Feb 2012
Kramer G (1994) An introduction to auditory display. In: Kramer G (ed) Auditory display: sonification, audification, and auditory interfaces. Addison Wesley, Reading, pp 1–78
Kramer G, Walker B, Bonebright T, Cook P, Flowers JH, Miner N, Neuhoff J (2010) Sonification report: status of the field and research agenda. University of Nebraska–Lincoln, Lincoln. http://digitalcommons.unl.edu/psychfacpub/444
Lewis JR (2011) Practical speech user interface design. CRC Press, Boca Raton
Martin P, Crabbe F, Adams S, Baatz E, Yankelovich N (1996) Speech acts: a spoken language framework. Computer 29(7):33–40
Meyer LB (1956) Emotion and meaning in music. The University of Chicago Press, Chicago
Miller G (1956) The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol Rev 63:81–97
Muller M, Farrell R, Cebulka K, Smith J (1992) In: Blattner M, Dannenberg R (eds) Multimedia interface design. ACM Press, Michigan, p 30
Palladino DK, Walker BN (2007) Learning rates for auditory menus enhanced with spearcons versus earcons. In: International conference on auditory display, Montreal
Palladino DK, Walker BN (2008) Efficiency of spearcon-enhanced navigation of one dimensional electronic menus. In: International conference on auditory display, Paris
Pitt I, Edwards A (2003) Design of speech-based devices. Springer, London
Rennyson D, Bouzid A (2012) Personal conversation. Vienna, Virginia
Serafin S, Franinović K, Hermann T, Lemaitre G, Rinott M, Rocchesso D (2011) Sonic interaction design. In: Hermann T, Hunt A, Neuhoff JG (eds) The sonification handbook. Logos-Verlag, Berlin, pp 87–110
Suhm B, Freeman B, Getty D (2001) Curing the menu blues in touch-tone voice interfaces. In: Proceedings of CHI 2001, ACM, The Hague, pp 131–132
Walker BN, Nees MA (2011) Theory of sonification. In: Hermann T, Hunt A, Neuhoff JG (eds) The sonification handbook. Logos-Verlag, Berlin, pp 9–39
Walker BN, Nance A, Lindsay J (2006) Spearcons: speech-based earcons improve navigation performance in auditory menus. In: Proceedings of the international conference on auditory display (ICAD 2006), London, June 20–24, pp 63–68
Winograd T, Flores F (1986) Understanding computers and cognition. Addison-Wesley, Menlo Park
Yalla P, Walker BN (2007) Advanced auditory menus. (No. GIT-GVU-07-12.): Georgia Institute of Technology GVU Center
Yalla P, Walker BN (2008) Advanced auditory menus: design and evaluation of auditory scroll bars. In: ASSETS’08, ACM Press, Halifax
Acknowledgments
Illustrations by Alexander T. Klein. Spearcons and Spindex are trademarks of the Georgia Institute of Technology. Etch-a-Sketch is a registered trademark of The Ohio Art Company. ShadowPrompt is a registered trademark of Enterprise Integration Group.
Copyright information
© 2013 Springer Science+Business Media New York
Balentine, B. (2013). “Super-Natural” Language Dialogues: In Search of Integration. In: Neustein, A., Markowitz, J. (eds) Mobile Speech and Advanced Natural Language Solutions. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6018-3_14
Print ISBN: 978-1-4614-6017-6
Online ISBN: 978-1-4614-6018-3