Telematics: Artificial Passenger and Beyond

  • Dimitri Kanevsky
Part of the Signals and Communication Technology book series (SCT)

This chapter describes conversational human-machine interfaces for in-vehicle technologies. The Artificial Passenger concept of speech interactivity, originally designed to keep a driver from falling asleep, serves as the backbone for a number of technology innovations in future Telematics applications.


Keywords: Telematics · Voice technology · Driving · Voice user interface · Natural language understanding





Copyright information

© Springer Science + Business Media, LLC 2008

Authors and Affiliations

  • Dimitri Kanevsky
  1. IBM T.J. Watson Research Center, Yorktown Heights, USA
