The Trend towards Statistical Models in Natural Language Processing

Liberman, Mark Y.

doi:10.1007/978-3-642-77189-7_1

Conference paper

Part of the book series: ESPRIT Basic Research Series (ESPRIT BASIC)

Abstract

Over the past few years, we have seen a significant increase in the number and sophistication of computational studies of large bodies of text and speech. Such studies have a wide variety of topics and motives, from lexicography and studies of language change, to methods for automated indexing and information retrieval, tagging and parsing algorithms, techniques for generating idiomatic text, cognitive models of language acquisition, and statistical models for application in speech recognizers, text or speech compression schemes, optical character readers, machine translation systems, and spelling correctors.

Author information

Authors and Affiliations

Department of Computer and Information Science, University of Pennsylvania, USA
Mark Y. Liberman

Editor information

Editors and Affiliations

Centre of Cognitive Science, University of Edinburgh, 2 Buccleuch Place, Edinburgh, EH8 9LW, Scotland
Ewan Klein
Institute for Language, Logic and Computation, University of Amsterdam, Nieuwe Doelenstraat 15, 1012 CP, Amsterdam, The Netherlands
Frank Veltman

About this paper

Cite this paper

Liberman, M.Y. (1991). The Trend towards Statistical Models in Natural Language Processing. In: Klein, E., Veltman, F. (eds) Natural Language and Speech. ESPRIT Basic Research Series. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-77189-7_1

DOI: https://doi.org/10.1007/978-3-642-77189-7_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-77191-0
Online ISBN: 978-3-642-77189-7
eBook Packages: Springer Book Archive
