Experiments with Linguistic Categories for Language Model Optimization
In this work we obtain robust category-based language models for integration into speech recognition systems. Deductive rules are used to select linguistic categories and to assign words to those categories. Statistical techniques are then used to build n-gram language models over lexicons that consist of sets of categories. The categorization procedure and the language model evaluation were carried out on a task-oriented Spanish corpus. The cooperation between deductive and inductive approaches has proved efficient in building small, reliable language models for speech understanding purposes.
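As a rough illustration of the two-stage idea described above, the following minimal Python sketch first maps words to categories with hand-written rules and then estimates a bigram model over the resulting category sequences. The lexicon, the category labels, the suffix rule, the toy sentences, and the add-one smoothing are all assumptions made for illustration; they are not the rule set, corpus, or estimation method used in the paper.

```python
from collections import Counter

# Hypothetical hand-written lexicon (illustrative only, not from the paper).
LEXICON = {"quiero": "VERB", "un": "DET", "billete": "NOUN",
           "a": "PREP", "madrid": "CITY", "bilbao": "CITY"}

def categorize(word):
    """Deductive step: map a word to a category via lexicon, then a suffix rule."""
    w = word.lower()
    if w in LEXICON:
        return LEXICON[w]
    if w.endswith(("ar", "er", "ir")):  # assumed rule: Spanish infinitive endings
        return "VERB"
    return "UNK"

def train_bigram(sentences):
    """Inductive step: count category histories and category bigrams."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        cats = ["<s>"] + [categorize(w) for w in sentence.split()] + ["</s>"]
        histories.update(cats[:-1])               # every symbol that has a successor
        bigrams.update(zip(cats[:-1], cats[1:]))  # adjacent category pairs
    return histories, bigrams

def bigram_prob(prev, cur, histories, bigrams, n_successors):
    """Add-one smoothed estimate of P(cur | prev) over the successor symbols."""
    return (bigrams[(prev, cur)] + 1) / (histories[prev] + n_successors)

corpus = ["quiero un billete a madrid", "quiero un billete a bilbao"]
histories, bigrams = train_bigram(corpus)
successors = {cur for (_, cur) in bigrams}  # categories plus the end marker
p_city = bigram_prob("PREP", "CITY", histories, bigrams, len(successors))
```

Because both city names collapse into a single CITY category, the model needs only one parameter for "preposition followed by city" instead of one per city name, which is the kind of size reduction that category-based models aim for.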
Keywords: Training Corpus, Speech Recognition System, Word Class, Word Error Rate, Application Task