Knowledge and Information Systems

Volume 48, Issue 2, pp 253–275

Phoneme sequence recognition via DTW-based classification

  • Hossein Hamooni
  • Abdullah Mueen
  • Amy Neel
Regular Paper

Abstract

Phonemes are the smallest units of sound produced by a human being. Automatic classification of phonemes is a well-researched topic in linguistics due to its potential for robust speech recognition. With the recent advancement of phonetic segmentation algorithms, it is now possible to generate datasets of millions of phonemes automatically. Phoneme classification on such datasets is a challenging data mining task because of the large number of classes (over a hundred) and the complexity of existing methods. In this paper, we introduce the phoneme classification problem as a data mining task. We propose a dual-domain (time and frequency) hierarchical classification algorithm. Our method uses a dynamic time warping (DTW)-based classifier in the top layers and time–frequency features in the lower layer. We cross-validate our method on phonemes from three online dictionaries and achieve up to 35% improvement in classification compared with existing techniques. We further modify our vowel classifier by adopting DTW distance over time–frequency coefficients and gain an additional 3% improvement. We provide case studies on classifying accented phonemes and speaker-invariant phoneme classification. Finally, we demonstrate how phoneme classification can be used to recognize speech.
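
To make the idea of a DTW-based classifier concrete, the following is a minimal sketch of DTW distance combined with a one-nearest-neighbor rule. It is an illustration only and not the authors' implementation: the function names `dtw_distance` and `classify_1nn`, the squared point-wise cost, the optional `window` constraint, and the toy numeric series are all assumptions made for this example. The hierarchical, dual-domain structure described in the abstract is not reproduced here.

```python
import math


def dtw_distance(a, b, window=None):
    """DTW distance between two 1-D sequences (hypothetical helper).

    Uses the textbook dynamic-programming recurrence with a squared
    point-wise cost and an optional band constraint of `window` steps.
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)  # no constraint: the full matrix is explored
    # The band must be at least |n - m| wide, or the end cell is unreachable.
    window = max(window, abs(n - m))

    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        lo = max(1, i - window)
        hi = min(m, i + window)
        for j in range(lo, hi + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return math.sqrt(cost[n][m])


def classify_1nn(query, labeled_examples):
    """Return the label of the example closest to `query` under DTW."""
    best_label, best_dist = None, math.inf
    for series, label in labeled_examples:
        d = dtw_distance(query, series)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

Given labeled examples as `(series, label)` pairs, `classify_1nn(query, examples)` returns the label of the nearest example. Because DTW aligns sequences of different lengths, a stretched or compressed copy of a training series can still match it closely, which is the property that makes the distance attractive for variable-duration signals such as phonemes.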

Keywords

Phoneme classification; DTW-based classification; Phonetic time series; Big data; Sequence recognition

Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  1. Department of Computer Science, University of New Mexico, Albuquerque, USA
  2. Department of Speech and Hearing Sciences, University of New Mexico, Albuquerque, USA
