Manifold Learning-Based Feature Transformation for Phone Classification

  • Andrew Errity
  • John McKenna
  • Barry Kirkpatrick
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4885)

Abstract

This study investigates low dimensional speech feature transformation using manifold learning. It has recently been shown that speech sounds may lie on a low dimensional manifold nonlinearly embedded in a high dimensional space. A number of manifold learning techniques developed in recent years attempt to discover this type of underlying geometric structure. Two of these techniques, locally linear embedding (LLE) and Isomap, are considered in this study. The low dimensional representations produced by applying them to MFCC feature vectors are evaluated in several phone classification tasks on the TIMIT corpus. Classification accuracy is analysed and compared with that obtained using conventional MFCC features and MFCC features transformed with PCA, a linear dimensionality reduction method. Features resulting from manifold learning are shown to be capable of yielding higher classification accuracy than these baseline features. In general, the best phone classification accuracy is obtained with features transformed by Isomap.
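
As an illustration of the kind of transformation the abstract describes, the following is a minimal, dependency-free Python sketch of the Isomap idea: estimate geodesic distances through a k-nearest-neighbour graph, then embed them with classical multidimensional scaling. It is not the authors' implementation. The data are a synthetic arc rather than MFCC vectors from TIMIT, and the function names, the neighbourhood size `k=3`, and the power-iteration eigen-solver are all assumptions chosen for brevity.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length vectors."""
    return [[math.dist(p, q) for q in points] for p in points]


def geodesic_distances(dist, k):
    """Approximate manifold distances by shortest paths in a k-NN graph.

    Assumes the resulting graph is connected; otherwise some entries stay inf.
    """
    n = len(dist)
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for j in others[:k]:
            graph[i][j] = graph[j][i] = dist[i][j]  # keep the graph symmetric
    # Floyd-Warshall all-pairs shortest paths.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = graph[i][m] + graph[m][j]
                if via < graph[i][j]:
                    graph[i][j] = via
    return graph


def classical_mds(dist, n_components, iterations=300, seed=0):
    """Embed a distance matrix using the leading eigenvectors of B = -J D^2 J / 2."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centring (the matrix is symmetric, so column means equal row means).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * n_components for _ in range(n)]
    for c in range(n_components):
        # Power iteration: converges to the eigenvector of largest magnitude.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iterations):
            w = [sum(x * y for x, y in zip(row, v)) for row in b]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        bv = [sum(x * y for x, y in zip(row, v)) for row in b]
        eigval = sum(x * y for x, y in zip(v, bv))  # Rayleigh quotient
        scale = math.sqrt(max(eigval, 0.0))  # negative eigenvalues give no coordinate
        for i in range(n):
            coords[i][c] = scale * v[i]
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords


def isomap(points, k, n_components):
    """Isomap sketch: k-NN graph, geodesic distances, then classical MDS."""
    return classical_mds(geodesic_distances(pairwise_distances(points), k), n_components)


# Synthetic stand-in for feature vectors: 40 points on three quarters of a unit circle.
n = 40
angles = [1.5 * math.pi * i / (n - 1) for i in range(n)]
arc = [(math.cos(t), math.sin(t)) for t in angles]
embedding = isomap(arc, k=3, n_components=1)  # one coordinate per point
```

On this toy curve the single Isomap coordinate should track position along the arc, which a linear projection cannot do because the arc folds back on itself. In practice a library implementation, such as `sklearn.manifold.Isomap` in scikit-learn, would normally be preferred over this sketch, with the transformed features then passed to whichever classifier is being evaluated.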

Keywords

Locally Linear Embedding; Dimensionality Reduction Method; Manifold Learning; Nonlinear Dimensionality Reduction; MFCC Feature

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Andrew Errity (1)
  • John McKenna (1)
  • Barry Kirkpatrick (1)
  1. School of Computing, Dublin City University, Dublin 9, Ireland
