Abstract
We are still only able to create computer programs that display the most rudimentary natural language processing capabilities. One of the greatest barriers to advanced natural language processing is our inability to overcome the linguistic knowledge acquisition bottleneck. In this paper, we describe recent work in a number of areas, including grammar checker development, automatic question answering, and language modeling, where state-of-the-art accuracy is achieved using very simple methods whose power comes entirely from the plethora of text now available to these systems, rather than from deep linguistic analysis or sophisticated machine learning techniques. This suggests that the field of NLP might benefit from concentrating less on technology development and more on data acquisition.
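The kind of simple, data-powered method the abstract alludes to can be illustrated with a minimal sketch: context-sensitive word-choice disambiguation (as in grammar checking), decided purely by trigram counts over a corpus. This is an illustrative assumption, not the paper's actual system; the function names and the toy corpus below are hypothetical.

```python
from collections import Counter

def build_trigram_counts(corpus):
    """Count (left, word, right) trigrams in a list of sentences."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i in range(1, len(tokens) - 1):
            counts[(tokens[i - 1], tokens[i], tokens[i + 1])] += 1
    return counts

def disambiguate(left, right, candidates, counts):
    """Pick the candidate seen most often between `left` and `right`.

    Ties (including all-zero counts) fall back to the first candidate.
    """
    return max(candidates, key=lambda w: counts[(left, w, right)])

# Hypothetical toy corpus; in practice the counts would come from
# millions or billions of words of raw text.
corpus = [
    "they went to their house",
    "put it over there please",
]
counts = build_trigram_counts(corpus)
print(disambiguate("to", "house", ["their", "there"], counts))
```

The method has no linguistic knowledge at all; its accuracy depends entirely on corpus size, which is precisely the abstract's point that more data, not deeper analysis, drives the gains.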
Keywords
- Natural Language Processing
- Latent Semantic Analysis
- Question Answering
- Training Corpus
- British National Corpus
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Brill, E. (2003). Processing Natural Language without Natural Language Processing. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2003. Lecture Notes in Computer Science, vol 2588. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36456-0_37
Print ISBN: 978-3-540-00532-2
Online ISBN: 978-3-540-36456-6