Abstract
By definition, the goal of natural language (NL) speech research has been “natural” human language—that is, faithfully simulating human-human conversation with human-computer dialogues. The word “natural” is enclosed in quotation marks here to emphasize its narrow definition; attempts to explore alternatives to that definition have attracted little interest. But there are other ways to incorporate speech into effective and intelligent user interfaces. Text-to-speech synthesis, for example, can easily exceed human capabilities: encompassing a wider pitch range, speaking faster or slower, or pronouncing tongue-twisters. Similarly, non-speech audio can provide prompts and feedback more quickly than speech, and can also exploit musical syntax and semantics. Speech recognition dialogues, in turn, can master pidgin languages, shortcut expressions, cockney-like underground tongues, and non-speech sounds. User interfaces designed around these super-human technology capabilities are known collectively as super-natural language dialogues (SNLD). Such interfaces are inherently multimodal, and their study is inherently interdisciplinary.
Notes
1. Audio searching and skimming designs have been well studied; see Arons (1994) for one of the earliest.
2. “Musical sense” is often not consciously recognized, but it can still be measured through imaging, observed fluctuations in arousal, or other physiological experiments.
3. In a personal communication, Koelsch challenges this statement, arguing that “Meyer’s referential/absolute classification is rather reminiscent of extra- and intramusical meaning (terms that Meyer also used).” Koelsch suggests instead that “The distinction between indexical extramusical meaning (i.e., the recognition of an emotional expression) and emotional musicogenic meaning (i.e., meaning emerging from the presence of a particular feeling state) is reminiscent of Gabrielsson’s ‘perceived’ and ‘felt’ emotion.”
4. Not exactly true: sound waves must propagate through space to the eardrum before a human user can hear them. So at least the body-sized envelope of sound-passing air (in the case of open-air loudspeakers), or the head-sized envelope implied by headphones or earbuds, defines the minimum spatial requirement for HCI audio. But because sound perception is so abstract—and because the user cannot see the sounds—the statement is practical.
5. Remember that this description focuses on maximum integration of auditory media with other modalities. There are many ways to accomplish such integration, and this method, of course, presents practical product-development obstacles. But it is conceptually easy to understand, and thus makes a good example for the purposes of this article.
6. These directed dialogues are well known in the speech industry and are not belabored here.
7. Post-recognition analysis includes n-best list traversal, interpretation of confidence values, and other executive decision-making.
References
Arons B (1994) Interactively skimming recorded speech. Dissertation, MIT, Cambridge
Balentine B (1994) A multimedia interface: speech, sound, sight, and touch. In: AVIOS ’94 Proceedings, San Jose, Sept 1994
Balentine B (1999) Re-engineering the speech menu. In: Gardner-Bonneau D (ed) Human factors and voice interactive systems. Kluwer Academic, Boston, pp 205–235
Balentine B (2007a) It’s better to be a good machine than a bad person. ICMI Press, Annapolis
Balentine B (2007b) Online articles regarding menu guidelines for HCI. Click on “Lists and User Memory”. http://www.beagoodmachine.com/extras/cuttingroomfloor.php
Balentine B, Morgan DP (2001) How to build a speech recognition application. EIG Press, San Ramon
Balentine B, Melaragno R, Stringham R (2000) Speech recognition 1999 R&D program final report: user interface design recommendations, EIG Press, San Ramon
Blattner MM, Greenberg RM (1992a) In: Edwards ADN, Holland S (eds) Multimedia interface design in education, Springer, Berlin
Blattner MM, Sumikawa DA, Greenberg RM (1989) Earcons and icons: their structure and common design principles. Hum Comput Interact 4:11–44
Brewster SA, Crease MG (1999) Correcting menu usability problems with sound. Behav Info Technol 18(3):165–177
Buxton W, Baecker R, Arnott J (1985) A holistic approach to user interface design. Unpublished manuscript
Christian B (2011) The most human human: what artificial intelligence teaches us about being alive. Anchor Books, New York
Commarford PM, Lewis JR, Smither JA-A, Gentzler MD (2008) A comparison of broad versus deep auditory menu structures. Hum Factors 50(1):77–89
Gaver WW (1986) Auditory icons: using sound in computer interfaces. Hum Comput Interact 4:167–177
Keller P, Stevens C (2004) Meaning from environmental sounds: types of signal-referent relations and their effect on recognizing auditory icons. J Exp Psychol Appl 10(1):3–12
Koelsch S (2012) Neural correlates of processing musical semantics. In: First international workshop on segregation and integration in music and language, Tübingen, Feb 2012
Kramer G (1994) An introduction to auditory display. In: Kramer G (ed) Auditory display: sonification, audification, and auditory interfaces. Addison Wesley, Reading, pp 1–78
Kramer G, Walker B, Bonebright T, Cook P, Flowers JH, Miner N, Neuhoff J (2010) Sonification report: status of the field and research agenda. University of Nebraska–Lincoln, Lincoln. http://digitalcommons.unl.edu/psychfacpub/444
Lewis JR (2011) Practical speech user interface design. CRC Press, Boca Raton
Martin P, Crabbe F, Adams S, Baatz E, Yankelovich N (1996) Speech acts: a spoken language framework. Computer 29(7):33–40
Meyer LB (1956) Emotion and meaning in music. The University of Chicago Press, Chicago
Miller G (1956) The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol Rev 63:81–97
Muller M, Farrell R, Cebulka K, Smith J (1992) In: Blattner M, Dannenberg R (eds) Multimedia interface design. ACM Press, Michigan, p 30
Palladino DK, Walker BN (2007) Learning rates for auditory menus enhanced with spearcons versus earcons. In: International conference on auditory display, Montreal
Palladino DK, Walker BN (2008) Efficiency of spearcon-enhanced navigation of one dimensional electronic menus. In: International conference on auditory display, Paris
Pitt I, Edwards A (2003) Design of speech-based devices. Springer, London
Rennyson D, Bouzid A (2012) Personal conversation. Vienna, Virginia
Serafin S, Franinović K, Hermann T, Lemaitre G, Rinott M, Rocchesso D (2011) Sonic interaction design. In: Hermann T, Hunt A, Neuhoff JG (eds) The sonification handbook. Logos-Verlag, Berlin, pp 87–110
Suhm B, Freeman B, Getty D (2001) Curing the menu blues in touch-tone voice interfaces. In: Proceedings of CHI 2001, ACM, The Hague, pp 131–132
Walker BN, Nees MA (2011) Theory of sonification. In: Hermann T, Hunt A, Neuhoff JG (eds) The sonification handbook. Logos-Verlag, Berlin, pp 9–39
Walker BN, Nance A, Lindsay J (2006) Spearcons: speech-based earcons improve navigation performance in auditory menus. In: Proceedings of the international conference on auditory display (ICAD 2006), London, June 20–24, pp 63–68
Winograd T, Flores F (1986) Understanding computers and cognition. Addison-Wesley, Menlo Park
Yalla P, Walker BN (2007) Advanced auditory menus. (No. GIT-GVU-07-12.): Georgia Institute of Technology GVU Center
Yalla P, Walker BN (2008) Advanced auditory menus: design and evaluation of auditory scroll bars. In: ASSETS’08, ACM Press, Halifax
Acknowledgments
Illustrations by Alexander T. Klein. Spearcons and Spindex are trademarks of the Georgia Institute of Technology. Etch-a-Sketch is a registered trademark of The Ohio Art Company. ShadowPrompt is a registered trademark of Enterprise Integration Group.
Copyright information
© 2013 Springer Science+Business Media New York
Balentine, B. (2013). “Super-Natural” Language Dialogues: In Search of Integration. In: Neustein, A., Markowitz, J. (eds) Mobile Speech and Advanced Natural Language Solutions. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6018-3_14
Print ISBN: 978-1-4614-6017-6
Online ISBN: 978-1-4614-6018-3