A Fast Algorithm for Words Reordering Based on Language Model
In all languages, words cannot be ordered randomly within a sentence; they must be arranged in certain ways, both globally and locally. Scrambling the words of a sentence renders it meaningless. Although manually collected grammatical rules can improve a grammar checker's performance in diagnosing word order errors, repairing those errors remains very difficult. This work proposes a method for repairing word order errors in English sentences by reordering the words of a sentence and choosing the version that maximizes the number of trigram hits according to a language model. The novelty of the method lies in a permutation filtering approach that reduces the search space of possible reordered sentences. The filtering is based on bigram probabilities, and the search space is further reduced by applying a threshold to those probabilities. Experimental results show that more than 95% of the test sentences can be repaired using this technique. The advantage of the method is that it is not restricted to a specific set of words and avoids the laborious and costly process of collecting word order errors to create error patterns. Unlike most approaches, it is applicable to any language, since a language model can be computed for any language, and it requires neither a parser nor a tagger.
Keywords: Language Model, Fast Algorithm, Confusion Matrix, Training Corpus, Test Sentence
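The pipeline described in the abstract (generate reorderings, prune them using bigram probabilities and a threshold, then keep the ordering with the most trigram hits) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`train`, `bigram_prob`, `candidates`, `repair`), the maximum-likelihood bigram estimate, the whitespace tokenization, and the default threshold of 0.0 are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train(corpus):
    """Collect unigram counts, bigram counts and the set of seen trigrams."""
    unigrams, bigrams, trigrams = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(ngrams(tokens, 2))
        trigrams.update(ngrams(tokens, 3))
    return unigrams, bigrams, trigrams


def bigram_prob(pair, unigrams, bigrams):
    """Maximum-likelihood estimate of P(second | first); an assumed estimator."""
    first_count = unigrams[pair[0]]
    return bigrams[pair] / first_count if first_count else 0.0


def candidates(words, model, threshold):
    """Yield reorderings of `words`, pruning any prefix whose newest bigram
    has probability at or below `threshold` (the filtering step)."""
    unigrams, bigrams, _ = model

    def extend(prefix, remaining):
        if not remaining:
            yield prefix
            return
        tried = set()  # avoid duplicate branches for repeated words
        for i, word in enumerate(remaining):
            if word in tried:
                continue
            tried.add(word)
            if prefix and bigram_prob((prefix[-1], word), unigrams, bigrams) <= threshold:
                continue
            yield from extend(prefix + [word], remaining[:i] + remaining[i + 1:])

    yield from extend([], words)


def repair(sentence, model, threshold=0.0):
    """Return the surviving reordering with the most trigram hits,
    or the input unchanged if every reordering is filtered out."""
    trigrams = model[2]
    words = sentence.lower().split()
    best, best_hits = words, -1
    for candidate in candidates(words, model, threshold):
        hits = sum(1 for t in ngrams(candidate, 3) if t in trigrams)
        if hits > best_hits:
            best, best_hits = candidate, hits
    return " ".join(best)


if __name__ == "__main__":
    model = train(["the cat sat on the mat", "the dog sat on the rug"])
    print(repair("cat the sat on mat the", model))
```

Pruning while each permutation is being built, rather than after it is complete, is one plausible way to obtain the search space reduction the abstract mentions; a real system would train on a large corpus and would likely use smoothed probabilities.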