Summary
Hidden Control Neural Networks (HCN networks) are suitable for a wide range of pattern recognition tasks. This paper describes a speech recognition method for speaker-independent isolated-word recognition. The method makes it possible to implement user interfaces that control devices through simple word commands. To evaluate the method, minimal pairs were used, i.e. word pairs in which the two words differ by only a single phoneme. The recognition rate was increased by placing greater emphasis on those time segments of the recording that proved during training to be relevant for recognition.
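The summary names the approach but does not spell out how an HCN network scores a word. The general hidden-control idea works in three steps:

- A single predictive network receives the previous feature frame plus a control input that encodes a hidden state.
- The network predicts the current frame.
- Dynamic programming finds the state sequence with the smallest total prediction error.

The following is a minimal sketch under stated assumptions, not the author's implementation. The names `HiddenControlMLP`, `hcn_score` and `recognize` are hypothetical. The network has one tanh hidden layer, no biases and random (untrained) weights. Each vocabulary word is assumed to have its own network, and the states are assumed to form a strict left-to-right chain.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical sketch of hidden-control scoring; not the paper's code.
Predictor = Callable[[Sequence[float], int], Sequence[float]]


class HiddenControlMLP:
    """Hypothetical HCN predictor with one tanh hidden layer (no biases).

    Input: previous feature frame plus a one-hot control vector that
    encodes the hidden state. Output: the predicted current frame.
    Weights are random here; a real system would train them.
    """

    def __init__(self, dim: int, n_states: int, n_hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.dim, self.n_states = dim, n_states
        n_in = dim + n_states
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(dim)]

    def predict(self, prev: Sequence[float], state: int) -> List[float]:
        control = [1.0 if k == state else 0.0 for k in range(self.n_states)]
        x = list(prev) + control
        h = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.w1]
        return [sum(w * v for w, v in zip(row, h)) for row in self.w2]


def hcn_score(frames: Sequence[Sequence[float]], predict: Predictor,
              n_states: int) -> Tuple[float, List[int]]:
    """Left-to-right DP over squared prediction errors.

    Frame t (t >= 1) is predicted from frame t-1 under hidden state s.
    The state stays or advances by one per frame, starts at 0 and ends
    at n_states - 1. Returns (total error, state for frames 1..T-1).
    """
    n = len(frames) - 1  # number of predicted frames
    if n < n_states:
        raise ValueError("need at least n_states + 1 frames")
    inf = float("inf")

    def err(i: int, s: int) -> float:
        pred = predict(frames[i], s)  # predict frame i+1 from frame i
        return sum((a - b) ** 2 for a, b in zip(frames[i + 1], pred))

    cost = [[inf] * n_states for _ in range(n)]
    back = [[0] * n_states for _ in range(n)]
    cost[0][0] = err(0, 0)
    for i in range(1, n):
        for s in range(n_states):
            stay = cost[i - 1][s]
            move = cost[i - 1][s - 1] if s > 0 else inf
            best, prev_s = (stay, s) if stay <= move else (move, s - 1)
            if best < inf:
                cost[i][s] = best + err(i, s)
                back[i][s] = prev_s
    path = [n_states - 1]
    for i in range(n - 1, 0, -1):  # follow back-pointers to the start
        path.append(back[i][path[-1]])
    path.reverse()
    return cost[n - 1][n_states - 1], path


def recognize(frames: Sequence[Sequence[float]], models: Dict[str, HiddenControlMLP]) -> str:
    """Pick the word whose network yields the lowest aligned prediction error."""
    return min(models, key=lambda w: hcn_score(frames, models[w].predict, models[w].n_states)[0])


if __name__ == "__main__":
    # Toy check of the DP: state s "predicts" the constant frame [s].
    frames = [[0.0], [0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]
    print(hcn_score(frames, lambda prev, s: [float(s)], 3))  # (0.0, [0, 0, 1, 1, 2, 2])
```

During training, a scheme like this would typically alternate two steps. First, the DP alignment assigns a state to each frame. Second, backpropagation updates the network weights on the resulting prediction errors. That loop is omitted here.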
Abstract
Hidden control neural networks (HCN networks) are suitable for a variety of pattern recognition tasks. The speech recognizer described here performs speaker-independent isolated-word recognition. It is intended for user interfaces that control devices through simple spoken word commands. To evaluate the recognizer, it was applied to minimal pairs: pairs of words that differ in only a single phoneme. The recognition rate was increased by giving greater weight to those time segments of the recording that training showed to contain the relevant difference.
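The abstract does not state how the recognition-relevant time segments are weighted. One possible reading, sketched below purely as an assumption, has three steps:

- Sum the prediction error per aligned state (segment).
- Learn a weight per segment from training utterances.
- Compare the two words of a minimal pair by their weighted errors.

The functions `segment_errors`, `learn_weights`, `weighted_score` and `classify_pair` are hypothetical. So is the weight heuristic, in which a segment's weight grows with the average error gap between the competing and the correct model. Both word models are assumed to have the same number of states, so that segments correspond by index.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers; NOT the weighting scheme from the paper.


def segment_errors(frame_errors: Sequence[float], path: Sequence[int],
                   n_states: int) -> List[float]:
    """Sum per-frame prediction errors within each DP-aligned state."""
    totals = [0.0] * n_states
    for err, state in zip(frame_errors, path):
        totals[state] += err
    return totals


def learn_weights(train_pairs: Sequence[Tuple[Sequence[float], Sequence[float]]],
                  n_states: int, boost: float = 1.0) -> List[float]:
    """Heuristic segment weights from training utterances.

    Each pair holds (segment errors of the correct model, segment errors
    of the competing minimal-pair model) for one training utterance.
    Segments where the competitor errs more on average are assumed to
    carry the distinguishing phoneme and get weights up to 1 + boost;
    all other segments keep weight 1.
    """
    gap = [0.0] * n_states
    for correct, competitor in train_pairs:
        for s in range(n_states):
            gap[s] += competitor[s] - correct[s]
    gap = [max(g / len(train_pairs), 0.0) for g in gap]
    top = max(gap) or 1.0  # avoid division by zero if no segment helps
    return [1.0 + boost * g / top for g in gap]


def weighted_score(seg_err: Sequence[float], weights: Sequence[float]) -> float:
    return sum(w * e for w, e in zip(weights, seg_err))


def classify_pair(seg_a: Sequence[float], seg_b: Sequence[float],
                  weights: Sequence[float]) -> str:
    """Return 'a' or 'b', whichever model has the lower weighted error."""
    return "a" if weighted_score(seg_a, weights) <= weighted_score(seg_b, weights) else "b"


if __name__ == "__main__":
    weights = learn_weights([([1.0, 1.0, 1.0], [1.0, 4.0, 1.0]),
                             ([2.0, 1.0, 1.0], [2.0, 3.0, 1.0])], n_states=3)
    print(weights)                                   # [1.0, 2.0, 1.0]
    seg_a, seg_b = [3.0, 1.0, 3.0], [1.0, 2.5, 3.0]  # an utterance of word "a"
    print(classify_pair(seg_a, seg_b, [1.0] * 3))    # b (unweighted: 7.0 vs 6.5)
    print(classify_pair(seg_a, seg_b, weights))      # a (weighted: 8.0 vs 9.0)
```

In the toy usage, the middle segment is where the two models differ most during training. Emphasizing it reverses a wrong unweighted decision, which illustrates the kind of effect the abstract reports.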
Additional information
This work received a GIT award on the occasion of the most recent General Assembly of the ÖVE, held on 27 November 1997.
Cite this article
Hickersberger, H. Spracherkennung mit Hidden Control Neural Networks. Elektrotech. Inftech. 115, 245–250 (1998). https://doi.org/10.1007/BF03159578
Keywords
- Speech recognition
- Neural networks
- Hidden Control Neural Networks
- Dynamic programming
- Backpropagation