Development of Assamese Speech Corpus and Automatic Transcription Using HTK

  • Himangshu Sarma
  • Navanath Saharia
  • Utpal Sharma
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 264)


The exact pronunciation of a word cannot be determined from its written form alone. Phonetic transcription is therefore a step towards speech processing for a language. This is especially important for a language like Assamese, which is spoken differently in different regions of the state. In this paper we report automatic transcription of Assamese speech using the Hidden Markov Model Toolkit (HTK). We obtain an accuracy of 65.26% in an experiment. We transcribed the recorded speech files using both IPA symbols and ASCII for automatic transcription. We used 34 phones for IPA transcription and 38 for ASCII transcription.
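The abstract does not say how the accuracy figure was computed. As a minimal sketch, assuming it follows the convention of HTK's HResults tool, percent accuracy is (N - D - S - I) / N * 100 and percent correct is (N - D - S) / N * 100, where N is the number of reference labels and D, S, I are the deletion, substitution and insertion counts. The function names and the counts below are hypothetical and are not taken from the paper.

```python
def htk_accuracy(n_labels: int, deletions: int, substitutions: int, insertions: int) -> float:
    """Percent accuracy in the HResults sense: (N - D - S - I) / N * 100."""
    if n_labels <= 0:
        raise ValueError("n_labels must be positive")
    return 100.0 * (n_labels - deletions - substitutions - insertions) / n_labels


def htk_correct(n_labels: int, deletions: int, substitutions: int) -> float:
    """Percent correct in the HResults sense: (N - D - S) / N * 100 (ignores insertions)."""
    if n_labels <= 0:
        raise ValueError("n_labels must be positive")
    return 100.0 * (n_labels - deletions - substitutions) / n_labels


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not the paper's data.
    n, d, s, i = 1000, 50, 200, 100
    print(f"Correct:  {htk_correct(n, d, s):.2f}%")
    print(f"Accuracy: {htk_accuracy(n, d, s, i):.2f}%")
```

Because insertions are penalised only in the accuracy measure, accuracy is never higher than percent correct for the same alignment.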


Keywords: Automatic transcription, Speech corpus, Assamese, HTK





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Himangshu Sarma (1)
  • Navanath Saharia (1)
  • Utpal Sharma (1)
  1. Tezpur University, Assam, India
