Abstract
We present a general weighted grammar software library, the GRM Library, that can be used in a variety of applications in text, speech, and biosequence processing. The underlying algorithms were designed to support a wide variety of semirings and to represent and use very large grammars and automata with several hundred million rules or transitions. We describe several algorithms and utilities of this library and, for each, point out its application to text and speech processing tasks.
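To illustrate what supporting a variety of semirings can mean in practice, here is a minimal Python sketch of one weighted-automaton computation written once and run over two semirings. It is not the GRM Library's API: the names `Semiring`, `REAL`, `TROPICAL`, `ARCS`, `FINALS`, and `string_weight` are hypothetical, and the toy automaton is made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Semiring:
    """Hypothetical container for the two operations and two identities."""
    plus: Callable[[float, float], float]   # combines alternative paths
    times: Callable[[float, float], float]  # extends a path by one transition
    zero: float                             # identity for plus
    one: float                              # identity for times


# Real (probability) semiring: sum over paths, product along a path.
REAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
# Tropical semiring: minimum over paths, sum of costs along a path.
TROPICAL = Semiring(min, lambda a, b: a + b, float("inf"), 0.0)

# A transition is (source state, label, weight, destination state).
Arc = Tuple[int, str, float, int]

# Toy automaton (an assumption for this example): two paths accept "ab".
ARCS: List[Arc] = [
    (0, "a", 0.5, 1),
    (0, "a", 0.25, 2),
    (1, "b", 0.5, 3),
    (2, "b", 1.0, 3),
]
FINALS: Set[int] = {3}  # final states, each with implicit final weight `one`


def string_weight(arcs: List[Arc], start: int, finals: Set[int],
                  text: str, sr: Semiring) -> float:
    """Weight the automaton assigns to `text`, computed in semiring `sr`."""
    forward: Dict[int, float] = {start: sr.one}
    for symbol in text:
        reached: Dict[int, float] = {}
        for src, label, weight, dst in arcs:
            if label == symbol and src in forward:
                extended = sr.times(forward[src], weight)
                reached[dst] = sr.plus(reached.get(dst, sr.zero), extended)
        forward = reached
    total = sr.zero
    for state, weight in forward.items():
        if state in finals:
            total = sr.plus(total, weight)
    return total


real_weight = string_weight(ARCS, 0, FINALS, "ab", REAL)
tropical_weight = string_weight(ARCS, 0, FINALS, "ab", TROPICAL)
```

With the same transitions, the real semiring sums the weights of both accepting paths, while the tropical semiring keeps only the cheapest one. The algorithm itself never changes; only the `Semiring` value passed in does.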
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Allauzen, C., Mohri, M., Roark, B. (2005). A General Weighted Grammar Library. In: Domaratzki, M., Okhotin, A., Salomaa, K., Yu, S. (eds) Implementation and Application of Automata. CIAA 2004. Lecture Notes in Computer Science, vol 3317. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30500-2_3
DOI: https://doi.org/10.1007/978-3-540-30500-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24318-2
Online ISBN: 978-3-540-30500-2
eBook Packages: Computer Science, Computer Science (R0)