Real Time Challenges to Handle the Telephonic Speech Recognition System

  • Joyanta Basu
  • Milton Samirakshma Bepari
  • Rajib Roy
  • Soma Khan
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 222)

Abstract

This paper describes the real-time challenges in designing a telephonic Automatic Speech Recognition (ASR) system. Telephonic speech data are collected automatically from all geographical regions of West Bengal to cover the major dialectal variations of spoken Bangla. All incoming calls are handled by an Asterisk server, which serves as the computer telephony interface (CTI). The system asks a set of queries, and the users' spoken responses are stored and manually transcribed for ASR system training. When the telephonic ASR is in use, each voice query first passes through the Signal Analysis and Decision (SAD) module; depending on its decision, the speech signal may be passed on to the back-end ASR engine, and the relevant information is then delivered automatically to the user. In a real-time scenario, telephonic speech contains channel drops, silence or no-speech events, truncated speech and noisy signals along with the desired speech event. This paper presents some techniques that handle such unwanted signals in telephonic speech to a certain extent and provide the ASR system with a signal close to the desired speech. Real-time telephonic ASR system performance increased by 8.91 % after implementing the SAD module.
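The abstract does not specify how the SAD module reaches its decision. As a purely illustrative sketch, the following Python code shows one way a gate of this kind could reject no-speech and truncated utterances using a single temporal feature, short-time energy. The function names, the frame length, the thresholds and the decision labels are all assumptions made for this example and are not taken from the paper; channel-drop and noise handling are not covered.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def sad_decision(samples, sample_rate=8000, frame_ms=20,
                 energy_threshold=1e-4, min_speech_frames=5, edge_frames=2):
    """Return 'no_speech', 'truncated' or 'accept' for one utterance.

    `samples` are floats in [-1.0, 1.0]. Every threshold here is a
    hypothetical value chosen for illustration, not one from the paper.
    """
    frame_len = sample_rate * frame_ms // 1000
    energies = frame_energies(samples, frame_len)
    active = [e > energy_threshold for e in energies]

    # Too few energetic frames: treat as silence or a no-speech event.
    if sum(active) < min_speech_frames:
        return "no_speech"

    # Energy in the very first or last frames suggests that the
    # recording started late or ended early, cutting the utterance.
    if any(active[:edge_frames]) or any(active[-edge_frames:]):
        return "truncated"

    return "accept"
```

In such a design, only utterances labelled `accept` would be forwarded to the ASR engine, while the other labels could trigger a re-prompt to the caller.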

Keywords

Asterisk server; Interactive voice response; Transcription tool; Temporal and spectral features; Knowledge base

Copyright information

© Springer India 2013

Authors and Affiliations

  • Joyanta Basu (1)
  • Milton Samirakshma Bepari (1)
  • Rajib Roy (1)
  • Soma Khan (1)
  1. Centre for Development of Advanced Computing, Kolkata, India
