“Super-Natural” Language Dialogues: In Search of Integration

  • Chapter

Abstract

By definition, the goal of NL speech research has been “natural” human language—that is, faithfully simulating human-human conversation with human-computer dialogues. The word “natural” here is enclosed in quotation marks to emphasize its narrow definition. Attempts to explore alternatives to this narrow definition have attracted little interest. But there are other ways to incorporate speech into effective and intelligent user interfaces. Text-to-speech synthesis, for example, can easily exceed human capabilities—encompassing a wider pitch range, speaking faster or slower, and/or pronouncing tongue-twisters. Similarly, non-speech audio can provide prompts and feedback more quickly than speech, and can also exploit musical syntax and semantics. Speech recognition dialogues, in turn, can master pidgin languages, shortcut expressions, cockney-like underground tongues, and non-speech sounds. User interfaces designed around these super-human technology capabilities are known collectively as super-natural language dialogues (SNLD). Such interfaces are inherently multimodal, and their study is inherently interdisciplinary.
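The prosody controls standardized in the W3C SSML specification give a concrete sense of how synthesis can exceed human delivery in rate and pitch. The sketch below is illustrative only; the function name and the particular rate/pitch values are hypothetical, not drawn from any product discussed in this chapter:

```python
# Sketch: assembling an SSML prompt whose delivery exceeds typical human
# speech, using the standard W3C SSML <prosody> element (rate, pitch).
def super_natural_prompt(text: str, rate: str = "140%", pitch: str = "+30%") -> str:
    """Wrap text in SSML prosody markup for fast, high-pitched delivery."""
    return (
        '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis">'
        f'<prosody rate="{rate}" pitch="{pitch}">{text}</prosody>'
        "</speak>"
    )

ssml = super_natural_prompt("She sells seashells by the seashore.")
print(ssml)
```

A synthesizer accepting SSML could render such a prompt at a tempo and pitch range no human announcer could sustain, which is precisely the "super-natural" capability the abstract describes.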


Notes

  1. Audio searching and skimming designs have been well studied. See Arons (1994) for one of the earliest.

  2. “Musical sense” is often not consciously recognized but can still be measured through imaging, observed fluctuations in arousal, or other physiological experiments.

  3. In a personal communication, Koelsch challenges this statement, arguing that “Meyer’s referential/absolute classification is rather reminiscent of extra- and intramusical meaning (terms that Meyer also used).” Koelsch suggests instead that “The distinction between indexical extramusical meaning (i.e., the recognition of an emotional expression) and emotional musicogenic meaning (i.e., meaning emerging from the presence of a particular feeling state) is reminiscent of Gabrielson’s ‘perceived’ and ‘felt’ emotion.”

  4. Not strictly true: sound waves must propagate through space to the eardrum before a human user can hear them. So at least the body-sized envelope of sound-passing air (in the case of open-air loudspeakers), or the head-sized envelope implied by headphones or earbuds, defines the minimum spatial requirement for HCI audio. But because sound perception is so abstract, and because the user cannot see the sounds, the statement is practical.

  5. Remember that this description is focused on maximum integration of auditory media with other modalities. There are many ways to accomplish such integration, and this method, of course, presents practical product-development obstacles. But it is conceptually easy to understand, and thus makes a good example for the purposes of this article.

  6. These directed dialogues are well known in the speech industry and are not belabored here.

  7. Post-recognition analysis includes n-best list traversal, interpretation of confidence values, and other executive decision-making.
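As a rough illustration of the post-recognition analysis described above: the recognizer returns a ranked n-best list, and the dialogue layer traverses it, comparing confidence values against thresholds before deciding how to proceed. The data structure, threshold values, and function name here are hypothetical, not taken from any particular recognizer:

```python
# Sketch of post-recognition n-best analysis: traverse hypotheses in rank
# order and accept the first whose confidence clears a threshold, else ask
# the user to confirm, else reject.  Thresholds are illustrative only.
ACCEPT = 0.80   # accept the hypothesis outright
CONFIRM = 0.45  # play a confirmation prompt instead

def analyze_nbest(nbest):
    """nbest: list of (utterance, confidence) pairs, sorted best-first."""
    for text, conf in nbest:
        if conf >= ACCEPT:
            return ("accept", text)
        if conf >= CONFIRM:
            return ("confirm", text)
    return ("reject", None)

print(analyze_nbest([("transfer funds", 0.91), ("transfer firms", 0.40)]))
```

The "executive decision-making" the note mentions would sit on top of a routine like this, weighing dialogue state and cost of error as well as raw confidence.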

References

  • Arons B (1994) Interactively skimming recorded speech. Dissertation, MIT, Boston

  • Balentine B (1994) A multimedia interface: speech, sound, sight, and touch. In: AVIOS ’94 proceedings, San Jose, Sept 1994

  • Balentine B (1999) Re-engineering the speech menu. In: Gardner-Bonneau D (ed) Human factors and voice interactive systems. Kluwer Academic, Boston, pp 205–235

  • Balentine B (2007a) It’s better to be a good machine than a bad person. ICMI Press, Annapolis

  • Balentine B (2007b) Online articles regarding menu guidelines for HCI. Click on “Lists and User Memory”. http://www.beagoodmachine.com/extras/cuttingroomfloor.php

  • Balentine B, Morgan DP (2001) How to build a speech recognition application. EIG Press, San Ramon

  • Balentine B, Melaragno R, Stringham R (2000) Speech recognition 1999 R&D program final report: user interface design recommendations. EIG Press, San Ramon

  • Blattner MM, Greenberg RM (1992a) In: Edwards ADN, Holland S (eds) Multimedia interface design in education. Springer, Berlin

  • Blattner MM, Sumikawa DA, Greenberg RM (1989) Earcons and icons: their structure and common design principles. Hum Comput Interact 4:11–44

  • Brewster SA, Crease MG (1999) Correcting menu usability problems with sound. Behav Inf Technol 18(3):165–177

  • Buxton W, Baecker R, Arnott J (1985) A holistic approach to user interface design. Unpublished manuscript

  • Christian B (2011) The most human human: what artificial intelligence teaches us about being alive. Anchor Books, New York

  • Commarford PM, Lewis JR, Smither JA-A, Gentzler MD (2008) A comparison of broad versus deep auditory menu structures. Hum Factors 50(1):77–89

  • Gaver WW (1986) Auditory icons: using sound in computer interfaces. Hum Comput Interact 2:167–177

  • Keller P, Stevens C (2004) Meaning from environmental sounds: types of signal-referent relations and their effect on recognizing auditory icons. J Exp Psychol Appl 10(1):3–12

  • Koelsch S (2012) Neural correlates of processing musical semantics. In: First international workshop on segregation and integration in music and language, Tübingen, Feb 2012

  • Kramer G (1994) An introduction to auditory display. In: Kramer G (ed) Auditory display: sonification, audification, and auditory interfaces. Addison-Wesley, Reading, pp 1–78

  • Kramer G, Walker B, Bonebright T, Cook P, Flowers JH, Miner N, Neuhoff J (2010) Sonification report: status of the field and research agenda. University of Nebraska–Lincoln. http://digitalcommons.unl.edu/psychfacpub/444

  • Lewis JR (2011) Practical speech user interface design. CRC Press, Boca Raton

  • Martin P, Crabbe F, Adams S, Baatz E, Yankelovich N (1996) Speech acts: a spoken language framework. Computer 29(7):33–40

  • Meyer LB (1956) Emotion and meaning in music. The University of Chicago Press, Chicago

  • Miller G (1956) The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol Rev 63:81–97

  • Muller M, Farrell R, Cebulka K, Smith J (1992) In: Blattner M, Dannenberg R (eds) Multimedia interface design. ACM Press, New York, p 30

  • Palladino DK, Walker BN (2007) Learning rates for auditory menus enhanced with spearcons versus earcons. In: International conference on auditory display, Montreal

  • Palladino DK, Walker BN (2008) Efficiency of spearcon-enhanced navigation of one-dimensional electronic menus. In: International conference on auditory display, Paris

  • Pitt I, Edwards A (2003) Design of speech-based devices. Springer, London

  • Rennyson D, Bouzid A (2012) Personal conversation. Vienna, Virginia

  • Serafin S, Franinović K, Hermann T, Lemaitre G, Rinott M, Rocchesso D (2011) Sonic interaction design. In: Hermann T, Hunt A, Neuhoff JG (eds) The sonification handbook. Logos-Verlag, Berlin, pp 87–110

  • Suhm B, Freeman B, Getty D (2001) Curing the menu blues in touch-tone voice interfaces. In: Proceedings of CHI 2001. ACM, The Hague, pp 131–132

  • Walker BN, Nees MA (2011) Theory of sonification. In: Hermann T, Hunt A, Neuhoff JG (eds) The sonification handbook. Logos-Verlag, Berlin, pp 9–39

  • Walker BN, Nance A, Lindsay J (2006) Spearcons: speech-based earcons improve navigation performance in auditory menus. In: Proceedings of the international conference on auditory display (ICAD 2006), London, June 20–24, pp 63–68

  • Winograd T, Flores F (1986) Understanding computers and cognition. Addison-Wesley, Menlo Park

  • Yalla P, Walker BN (2007) Advanced auditory menus (No. GIT-GVU-07-12). Georgia Institute of Technology GVU Center, Atlanta

  • Yalla P, Walker BN (2008) Advanced auditory menus: design and evaluation of auditory scroll bars. In: ASSETS ’08. ACM Press, Halifax

Acknowledgments

Illustrations by Alexander T. Klein. Spearcons and Spindex are trademarks of the Georgia Institute of Technology. Etch-a-Sketch is a registered trademark of The Ohio Art Company. ShadowPrompt is a registered trademark of Enterprise Integration Group.

Author information

Correspondence to Bruce Balentine, M.Mus.

Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

Balentine, B. (2013). “Super-Natural” Language Dialogues: In Search of Integration. In: Neustein, A., Markowitz, J. (eds) Mobile Speech and Advanced Natural Language Solutions. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6018-3_14

  • DOI: https://doi.org/10.1007/978-1-4614-6018-3_14

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-6017-6

  • Online ISBN: 978-1-4614-6018-3

  • eBook Packages: Engineering (R0)
