NOLISP 2007: Advances in Nonlinear Speech Processing, pp. 132–141
Manifold Learning-Based Feature Transformation for Phone Classification
Abstract
This study investigates approaches to low-dimensional speech feature transformation using manifold learning. It has recently been shown that speech sounds may lie on a low-dimensional manifold nonlinearly embedded in a high-dimensional space. A number of manifold learning techniques that attempt to discover this type of underlying geometric structure have been developed in recent years. Two of them, locally linear embedding (LLE) and Isomap, are considered in this study. The low-dimensional representations produced by applying these techniques to MFCC (mel-frequency cepstral coefficient) feature vectors are evaluated in several phone classification tasks on the TIMIT corpus. Classification accuracy is analysed and compared with that of conventional MFCC features and of MFCC features transformed with PCA, a linear dimensionality reduction method. It is shown that features produced by manifold learning can yield higher classification accuracy than these baseline features. In general, the best phone classification accuracy is obtained with Isomap-transformed features.
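The contrast the abstract draws between a linear method (PCA) and a manifold learning method (Isomap) can be illustrated on a toy curve. The following is a minimal NumPy sketch, not the authors' implementation: it assumes a synthetic 3-D spiral (a 1-D manifold) as a hypothetical stand-in for MFCC vectors, uses an arbitrary neighbourhood size `k=8`, and the helper names `pca`, `isomap` and `rank_corr` are illustrative. Isomap is built from its three textbook steps (k-nearest-neighbour graph, shortest-path geodesic distances, classical MDS); LLE is not shown, and the O(n³) Floyd-Warshall step is only suitable for small toy data.

```python
import numpy as np


def pca(X, d):
    """Linear baseline: project X (n x D) onto its top-d principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centred data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T


def isomap(X, d, k=8):
    """Minimal Isomap: k-NN graph -> graph geodesics -> classical MDS."""
    n = X.shape[0]
    # Pairwise Euclidean distances between all input vectors.
    sq = np.sum(X ** 2, axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    # Keep only edges from each point to its k nearest neighbours (symmetrised).
    G = np.full((n, n), np.inf)
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    G[rows, cols] = D[rows, cols]
    G = np.minimum(G, G.T)
    np.fill_diagonal(G, 0.0)
    # Floyd-Warshall: shortest graph paths approximate geodesic distances.
    for m in range(n):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase k")
    # Classical MDS on the squared geodesic distance matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)  # eigenvalues returned in ascending order
    top = np.argsort(vals)[::-1][:d]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


def rank_corr(a, b):
    """Absolute Spearman rank correlation (no ties expected for float data)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return abs(np.corrcoef(ra, rb)[0, 1])


# Hypothetical toy data: a 1-D spiral curve embedded in 3-D space.
t = np.linspace(1.5 * np.pi, 4.5 * np.pi, 200)
X = np.column_stack([t * np.cos(t), t * np.sin(t), t / 4.0])

y_pca = pca(X, 1)[:, 0]
y_iso = isomap(X, 1, k=8)[:, 0]

# How well does each 1-D feature preserve position along the curve?
pca_rho = rank_corr(y_pca, t)
iso_rho = rank_corr(y_iso, t)
print(f"PCA rank correlation with curve position:    {pca_rho:.3f}")
print(f"Isomap rank correlation with curve position: {iso_rho:.3f}")
```

On this toy curve the Isomap coordinate should track position along the spiral almost perfectly, whereas any linear projection of a spiral that wraps more than once folds distant parts of the curve onto the same values. Whether a comparable advantage carries over to real MFCC data and phone classification is what the study itself evaluates; this sketch says nothing about that.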
Keywords
Locally Linear Embedding; Dimensionality Reduction Method; Manifold Learning; Nonlinear Dimensionality Reduction; MFCC Feature