Processing Natural Language without Natural Language Processing

  • Eric Brill
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2588)


We can still only create computer programs that display the most rudimentary natural language processing capabilities. One of the greatest barriers to advanced natural language processing is our inability to overcome the linguistic knowledge acquisition bottleneck. In this paper, we describe recent work in a number of areas, including grammar checker development, automatic question answering, and language modeling, where state-of-the-art accuracy is achieved using very simple methods whose power comes entirely from the plethora of text currently available to these systems, as opposed to deep linguistic analysis or the application of state-of-the-art machine learning techniques. This suggests that the field of NLP might benefit from concentrating less on technology development and more on data acquisition.


Keywords: Natural Language Processing; Latent Semantic Analysis; Question Answering; Training Corpus; British National Corpus
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Eric Brill (Microsoft Research, Redmond)
