The Trend towards Statistical Models in Natural Language Processing

  • Conference paper

Part of the book series: ESPRIT Basic Research Series ((ESPRIT BASIC))

Abstract

Over the past few years, we have seen a significant increase in the number and sophistication of computational studies of large bodies of text and speech. Such studies have a wide variety of topics and motives, from lexicography and studies of language change, to methods for automated indexing and information retrieval, tagging and parsing algorithms, techniques for generating idiomatic text, cognitive models of language acquisition, and statistical models for application in speech recognizers, text or speech compression schemes, optical character readers, machine translation systems, and spelling correctors.

Copyright information

© 1991 ECSC - EEC - EAEC, Brussels - Luxembourg

About this paper

Cite this paper

Liberman, M.Y. (1991). The Trend towards Statistical Models in Natural Language Processing. In: Klein, E., Veltman, F. (eds) Natural Language and Speech. ESPRIT Basic Research Series. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-77189-7_1

  • DOI: https://doi.org/10.1007/978-3-642-77189-7_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-77191-0

  • Online ISBN: 978-3-642-77189-7

  • eBook Packages: Springer Book Archive
