The Data-Oriented Parsing Approach: Theory and Application

  • Rens Bod
Part of the Studies in Computational Intelligence book series (SCI, volume 115)

Parsing models have many applications in AI, ranging from natural language processing (NLP) and computational music analysis to logic programming and computational learning. Broadly conceived, a parsing model seeks to uncover the underlying structure of an input, that is, the various ways in which elements of the input combine to form phrases or constituents and how those phrases recursively combine to form a tree structure for the whole input. During the last fifteen years, a major shift has taken place from rule-based, deterministic parsing to corpus-based, probabilistic parsing. A quick glance over the NLP literature from the last ten years, for example, indicates that virtually all natural language parsing systems are currently probabilistic. The same development can be observed in (stochastic) logic programming and (statistical) relational learning. This trend towards probabilistic parsing is not surprising: the increasing availability of very large collections of text, music, images and the like allows for inducing statistically motivated parsing systems from actual data.
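Inducing a statistically motivated parser from data can be sketched with the simplest such model: a probabilistic context-free grammar whose rule probabilities are relative frequencies read off a treebank. The toy treebank, the tuple encoding of trees, and all names below are illustrative assumptions, not taken from the chapter:

```python
from collections import Counter

# A toy two-tree "treebank"; trees are (label, child1, child2, ...),
# lexical leaves are plain strings. (Hypothetical data for illustration.)
TREEBANK = [
    ("S", ("NP", "he"), ("VP", ("V", "ran"))),
    ("S", ("NP", "he"), ("VP", ("V", "saw"), ("NP", "it"))),
]

def rules(tree):
    """Yield the context-free rules (depth-1 subtrees) used in `tree`."""
    label, children = tree[0], tree[1:]
    yield (label, tuple(c[0] if isinstance(c, tuple) else c
                        for c in children))
    for c in children:
        if isinstance(c, tuple):
            yield from rules(c)

counts = Counter(r for t in TREEBANK for r in rules(t))

# Total occurrences of each left-hand-side nonterminal.
lhs_totals = Counter()
for (lhs, _rhs), n in counts.items():
    lhs_totals[lhs] += n

# Relative-frequency (maximum-likelihood) estimate of each rule:
# P(lhs -> rhs) = count(lhs -> rhs) / count(lhs).
probs = {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}
# e.g. P(NP -> he) = 2/3, since "he" accounts for 2 of the 3 observed NPs.
```

Real treebank grammars add smoothing and lexicalization, but the estimation principle is this relative-frequency count.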

A corpus-based parsing approach that has been quite successful in various fields of AI is known as Data-Oriented Parsing or DOP. DOP was originally developed as an NLP technique but has been generalized to music analysis, problem-solving and unsupervised structure learning [7, 8, 14, 81]. The distinctive feature of the DOP approach, when it was first presented, was to model sentence structures on the basis of previously observed frequencies of sentence-structure fragments, without imposing any constraints on the size of these fragments. Fragments include, for instance, subtrees of depth 1 (corresponding to context-free rules), as well as entire trees.
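The all-fragments idea can be made concrete by enumerating every subtree fragment of a single parse tree, from the depth-1 subtrees (context-free rules) up to the entire tree. This is a minimal sketch under an assumed tuple encoding of trees; the example sentence and all names are illustrative, not the chapter's own notation:

```python
from itertools import product

# A toy parse tree: (label, child1, child2, ...); lexical leaves are
# plain strings. (Illustrative encoding.)
TREE = ("S",
        ("NP", "she"),
        ("VP", ("V", "saw"), ("NP", "it")))

def rooted(node):
    """All DOP fragments rooted at `node`: at each internal child we
    either cut, keeping just its label as a frontier nonterminal
    (represented here as a bare string, a deliberate simplification),
    or substitute one of that child's own rooted fragments."""
    label, children = node[0], node[1:]
    choices = []
    for child in children:
        if isinstance(child, tuple):
            choices.append([child[0]] + rooted(child))
        else:
            choices.append([child])      # lexical leaf: always kept
    return [(label, *combo) for combo in product(*choices)]

def fragments(tree):
    """The DOP fragment bag: fragments rooted at every node of `tree`."""
    result = list(rooted(tree))
    for child in tree[1:]:
        if isinstance(child, tuple):
            result.extend(fragments(child))
    return result

frags = fragments(TREE)
# The shallowest fragments are the tree's context-free rules, e.g.
# ("S", "NP", "VP"); the deepest fragment is the whole tree itself.
```

Even this five-node tree yields 17 fragments; the exponential growth of the fragment set is why efficient DOP implementations (e.g. PCFG reductions [54]) matter.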

Keywords

Natural Language Processing, Wall Street Journal, Parse Tree, Inductive Logic Programming, Training Corpus


References

  1. Abeillé A (ed.) (2003) Treebanks. Kluwer Academic Publishers, Dordrecht, The Netherlands.
  2. Alonso M, Finn E (1996) Physics. Addison Wesley, Reading, MA.
  3. Baader F, Nipkow T (1998) Term Rewriting and All That. Cambridge University Press, Cambridge, UK.
  4. Black E, Abney S, Flickinger D, Gdaniec C, Grishman R, Harrison P, Hindle D, Ingria R, Jelinek F, Klavans J, Liberman M, Marcus M, Roukos S, Santorini B, Strzalkowski T (1991) A procedure for quantitatively comparing the syntactic coverage of English. In: Proc. 5th DARPA Speech and Natural Language Workshop, Pacific Grove, CA, Morgan Kaufmann, San Mateo, CA: 306-311.
  5. Black E, Lafferty J, Roukos S (1992) Development and evaluation of a broad-coverage probabilistic grammar of English-language computer manuals. In: Proc. 30th Association Computer Linguistics Conf. (ACL'92), Newark, DE, Association for Computer Linguistics, Stroudsburg, PA: 185-192.
  6. Black E, Garside R, Leech G (1993) Statistically-Driven Computer Grammars of English: The IBM/Lancaster Approach. Rodopi, Amsterdam, The Netherlands.
  7. Bod R (1992) Data-oriented parsing. In: Proc. Computational Linguistics Conf. (COLING'92), Nantes, France, Association for Computer Linguistics, Stroudsburg, PA: 854-859.
  8. Bod R (1998) Beyond Grammar: An Experience-Based Theory of Language. CSLI Publications (Lecture Notes number 88), Stanford, CA; distributed by Cambridge University Press, Cambridge, UK.
  9. Bod R (1999) Context-sensitive spoken dialogue processing with the DOP model. Natural Language Engineering, 5(4): 309-323.
  10. Bod R (2000) Parsing with the shortest derivation. In: Proc. 18th Computational Linguistics Conf. (COLING'2000), Saarbrücken, Germany, Association for Computer Linguistics, Stroudsburg, PA: 69-75.
  11. Bod R (2001) What is the minimal set of subtrees that achieves maximal parse accuracy? In: Proc. 39th Association Computer Linguistics Conf. (ACL'2001), Toulouse, France, Association for Computer Linguistics, Stroudsburg, PA: 66-73.
  12. Bod R (2002) A unified model of structural organization in language and music. J. Artificial Intelligence Research, 17: 289-308.
  13. Bod R (2002) Memory-based models of melodic analysis: challenging the Gestalt principles. J. New Music Research, 31(1): 27-37.
  14. Bod R (2003) An efficient implementation of a new DOP model. In: Proc. 10th European Association Computer Linguistics Conf. (EACL'03), 12-17 April, Budapest, Hungary, Association for Computer Linguistics, Stroudsburg, PA: 19-26.
  15. Bod R (2004) Exemplar-based explanation. In: Proc. Computation and Philosophy Conf. (ECAP04), 3-5 June, Pavia, Italy.
  16. Bod R (2005) Modeling scientific problem solving by DOP. In: Proc. Cognitive Science Conf. (CogSci'05), Stresa, Italy: 103.
  17. Bod R (2006) Unsupervised parsing with U-DOP. In: Proc. 10th Computational Natural Language Learning Conf. (CONLL'2006), 8-9 June, New York, NY, Association for Computer Linguistics, Stroudsburg, PA: 85-92.
  18. Bod R (2006) An all-subtrees approach to unsupervised parsing. In: Proc. ACL Computational Linguistics Conf. (COLING'2006), Sydney, Australia, Association for Computer Linguistics, Stroudsburg, PA: 865-872.
  19. Bod R (2006) Towards a general model of applying science. Intl. Studies in the Philosophy of Science, 20(1): 5-25.
  20. Bod R (2006) Exemplar-based reasoning with the shortest derivation. In: Magnani L (ed.) Model-Based Reasoning in Science and Engineering. College Publications, London, UK: 119-140.
  21. Bod R (2006) Exemplar-based syntax: how to get productivity from examples. The Linguistic Review (Special Issue on Exemplar-Based Models in Linguistics), 23(3): 289-318.
  22. Bod R, Kaplan R (1998) A probabilistic corpus-driven model for lexical-functional analysis. In: Proc. ACL Computational Linguistics Conf. (COLING'98), 10-14 August, Montreal, Canada, Association for Computer Linguistics, Stroudsburg, PA: 145-152.
  23. Bod R, Hay J, Jannedy S (eds.) (2003) Probabilistic Linguistics. MIT Press, Cambridge, MA.
  24. Bod R, Scha R, Sima'an K (eds.) (2003) Data-Oriented Parsing. University of Chicago Press, Chicago, IL.
  25. Bod R, Kaplan R (2003) A DOP model for lexical-functional grammar. In: Bod R, Scha R, Sima'an K (eds.) Data-Oriented Parsing. University of Chicago Press, Chicago, IL.
  26. Bonnema R, Bod R, Scha R (1997) A DOP model for semantic interpretation. In: Proc. 4th European Association Computer Linguistics Conf. (EACL'97), Madrid, Spain, Association for Computer Linguistics, Stroudsburg, PA: 159-167.
  27. Briscoe T, Waegner N (1992) Robust stochastic parsing using the inside-outside algorithm. In: Proc. AAAI Workshop Statistically-Based Techniques in Natural Language Processing, Menlo Park, CA, AAAI Press/MIT Press, Cambridge, MA: 39-53.
  28. Carbonell J (1993) Derivational analogy: a theory of reconstructive problem solving and expertise acquisition. In: Michalski RS, Carbonell J, Mitchell T (eds.) Machine Learning II. Morgan Kaufmann, San Francisco, CA: 371-392.
  29. Charniak E (1997) Statistical techniques for natural language parsing. AI Magazine, Winter: 32-43.
  30. Charniak E (2000) A maximum-entropy-inspired parser. In: Proc. 1st North American ACL Chapter Conf. (ANLP-NAACL'2000), Seattle, WA, Morgan Kaufmann, San Francisco, CA: 132-139.
  31. Chater N (1999) The search for simplicity: a fundamental cognitive principle? The Quarterly J. Experimental Psychology, 52A(2): 273-302.
  32. Chiang D (2000) Statistical parsing with an automatically extracted tree adjoining grammar. In: Proc. 38th Association Computer Linguistics Conf. (ACL'2000), October, Hong Kong, China, Association for Computer Linguistics, Stroudsburg, PA: 456-463.
  33. Clark A (2001) Unsupervised induction of stochastic context-free grammars using distributional clustering. In: Proc. Computational Natural Language Learning Conf. (CoNLL'2001), July, Toulouse, France, Association for Computer Linguistics, Stroudsburg, PA: 97-104.
  34. Chomsky N (1965) Aspects of the Theory of Syntax. MIT Press, Cambridge, MA.
  35. Collins M (1996) A new statistical parser based on bigram lexical dependencies. In: Proc. 34th Association Computer Linguistics Conf. (ACL'96), 23-28 June, Santa Cruz, CA, Association for Computer Linguistics, Stroudsburg, PA: 184-191.
  36. Collins M (1997) Three generative lexicalised models for statistical parsing. In: Proc. 35th Association Computer Linguistics Conf. (ACL'97), July, Madrid, Spain, Association for Computer Linguistics, Stroudsburg, PA: 16-23.
  37. Collins M (1999) Head-Driven Statistical Models for Natural Language Parsing. PhD Thesis, University of Pennsylvania, PA.
  38. Collins M (2000) Discriminative reranking for natural language parsing. In: Proc. 17th Intl. Conf. Machine Learning (ICML-2000), Stanford, CA: 175-182.
  39. Collins M, Duffy N (2001) Convolution kernels for natural language. In: Dietterich TG, Becker S, Ghahramani Z (eds.) Advances in NIPS 14 (Proc. NIPS'2001), 3-8 December, Vancouver, Canada, MIT Press, Cambridge, MA: 617-624.
  40. Collins M, Duffy N (2002) New ranking algorithms for parsing and tagging: kernels over discrete structures, and the voted perceptron. In: Proc. 40th Association Computer Linguistics Conf. (ACL'2002), Philadelphia, PA, Association for Computer Linguistics, Stroudsburg, PA: 263-270.
  41. Conklin D (2006) Melodic analysis with segment classes. Machine Learning, 65(2-3): 349-360.
  42. Cussens J (2001) Parameter estimation in stochastic logic programs. Machine Learning, 44(3): 245-271.
  43. Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statistical Society, Series B, 39: 1-38.
  44. De Raedt L, Kersting K (2004) Probabilistic inductive logic programming. In: Proc. Algorithmic Learning Theory (ALT) Conf., Lecture Notes in Computer Science 3244, Springer-Verlag, Berlin: 19-36.
  45. Douglas J, Matthews R (1996) Fluid Mechanics 1 (3rd ed.). Longman, Essex, UK.
  46. Eisner J (1996) Three new probabilistic models for dependency parsing: an exploration. In: Proc. 16th Computational Linguistics Conf. (COLING'96), August, Copenhagen, Denmark, Association for Computer Linguistics, Stroudsburg, PA: 340-345.
  47. Ferrand M, Nelson P, Wiggins G (2003) Unsupervised learning of melodic segmentation: a memory-based approach. In: Proc. 5th European Society for the Cognitive Sciences of Music Conf. (ESCOM'2003), 8-13 September, Hanover, Germany.
  48. Frazier L (1978) On Comprehending Sentences: Syntactic Parsing Strategies. PhD Thesis, University of Connecticut.
  49. Fujisaki T, Jelinek F, Cocke J, Black E, Nishino T (1989) A probabilistic method for sentence disambiguation. In: Proc. 1st Intl. Workshop Parsing Technologies, 28-31 August, Pittsburgh, PA: 85-94.
  50. Gahl S, Garnsey S (2004) Knowledge of grammar, knowledge of usage: syntactic probabilities affect pronunciation variation. Language, 80(4): 748-775.
  51. Giere R (1988) Explaining Science: A Cognitive Approach. University of Chicago Press, Chicago, IL.
  52. Goldberg A (2006) Constructions at Work. Oxford University Press, Oxford, UK.
  53. Goodman J (1996) Efficient algorithms for parsing the DOP model. In: Proc. Empirical Methods in Natural Language Processing, Philadelphia, PA: 143-152.
  54. Goodman J (2003) Efficient parsing of DOP with PCFG-reductions. In: Bod R, Scha R, Sima'an K (eds.) Data-Oriented Parsing. University of Chicago Press, Chicago, IL.
  55. Hearne M, Way A (2003) Seeing the wood for the trees: data-oriented translation. In: Proc. Machine Translation Summit IX, September, New Orleans, LA: 165-172.
  56. Hearne M, Way A (2004) Data-oriented parsing and the Penn Chinese Treebank. In: Proc. 1st Intl. Joint Conf. Natural Language Processing, May, Hainan Island, China: 406-413.
  57. Hearne M, Way A (2006) Disambiguation strategies for data-oriented translation. In: Proc. 11th Intl. Conf. European Association for Machine Translation, 19-20 June, Oslo, Norway.
  58. Hoogweg L (2003) Extending DOP with insertion. In: Bod R, Scha R, Sima'an K (eds.) Data-Oriented Parsing. University of Chicago Press, Chicago, IL.
  59. Huron D (1996) The melodic arch in western folksongs. Computing in Musicology, 10: 2-23.
  60. Johnson M (1998) PCFG models of linguistic tree representations. Computational Linguistics, 24(4): 613-632.
  61. Johnson M (2002) The DOP estimation method is biased and inconsistent. Computational Linguistics, 28(1): 71-76.
  62. Jurafsky D (2003) Probabilistic modeling in psycholinguistics. In: Bod R, Scha R, Sima'an K (eds.) Data-Oriented Parsing. University of Chicago Press, Chicago, IL: 39-96.
  63. Klein D (2005) The Unsupervised Learning of Natural Language Structure. PhD Thesis, Department of Computer Science, Stanford University, CA.
  64. Klein D, Manning C (2002) A general constituent-context model for improved grammar induction. In: Proc. 40th Association Computer Linguistics Conf. (ACL'2002), July, Philadelphia, PA, Association for Computer Linguistics, Stroudsburg, PA: 128-135.
  65. Klein D, Manning C (2004) Corpus-based induction of syntactic structure: models of dependency and constituency. In: Proc. 42nd Association Computer Linguistics Conf. (ACL'2004), 21-26 July, Barcelona, Spain, Association for Computer Linguistics, Stroudsburg, PA: 438.
  66. Kudo T, Suzuki J, Isozaki H (2005) Boosting-based parse reranking with subtree features. In: Proc. 43rd Association Computer Linguistics Conf. (ACL'2005), June, Ann Arbor, MI, Association for Computer Linguistics, Stroudsburg, PA: 189-196.
  67. Kuhn T (1970) The Structure of Scientific Revolutions (2nd ed.). University of Chicago Press, Chicago, IL.
  68. Lerdahl F, Jackendoff R (1983) A Generative Theory of Tonal Music. MIT Press, Cambridge, MA.
  69. Longuet-Higgins H (1976) Perception of melodies. Nature, 263 (21 October): 646-653.
  70. Longuet-Higgins H, Lee C (1987) The rhythmic interpretation of monophonic music. In: Longuet-Higgins H (ed.) Mental Processes: Studies in Cognitive Science. MIT Press, Cambridge, MA.
  71. Makatchev M, Jordan P, VanLehn K (2004) Abductive theorem proving for analyzing student explanations to guide feedback in intelligent tutoring systems. J. Automated Reasoning (Special Issue: Automated Reasoning and Theorem Proving in Education), 32(3): 187-226.
  72. Manning C (2003) Probabilistic syntax. In: Bod R, Hay J, Jannedy S (eds.) Probabilistic Linguistics. MIT Press, Cambridge, MA: 289-342.
  73. Manning C, Schuetze H (1999) Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.
  74. Marcus M, Santorini B, Marcinkiewicz M (1993) Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2): 313-330.
  75. McClosky D, Charniak E, Johnson M (2006) Effective self-training for parsing. In: Proc. North American Chapter of ACL Conf. Human Language Technology (NAACL-HLT 2006), June, New York, NY, Association for Computer Linguistics, Stroudsburg, PA: 152-159.
  76. Mitchell T, Keller R, Kedar-Cabelli S (1986) Explanation-based learning: a unifying view. Machine Learning, 1: 47-80.
  77. Mooney J, Zelle J (1994) Integrating ILP and EBL. SIGART Bulletin, 5(1): 12-21.
  78. Muggleton S (1996) Stochastic logic programs. In: De Raedt L (ed.) Advances in Inductive Logic Programming (Proc. 5th Intl. Workshop Inductive Logic Programming), IOS Press, Amsterdam, The Netherlands: 254-264.
  79. Neumann G (2003) A data-oriented approach to HPSG. In: Bod R, Scha R, Sima'an K (eds.) Data-Oriented Parsing. University of Chicago Press, Chicago, IL.
  80. Pereira F, Schabes Y (1992) Inside-outside reestimation from partially bracketed corpora. In: Proc. 30th Association Computer Linguistics Conf. (ACL'92), Newark, DE, Association for Computer Linguistics, Stroudsburg, PA: 128-135.
  81. Scha R (1990) Taaltheorie en taaltechnologie; competence en performance [Language theory and language technology; competence and performance]. In: de Kort Q, Leerdam G (eds.) Computertoepassingen in de Neerlandistiek. Landelijke Vereniging van Neerlandici (LVVN-jaarboek), Almere, The Netherlands.
  82. Schaffrath H (1995) The Essen Folksong Collection in the Humdrum Kern Format. In: Huron D (ed.) Probabilistic Grammars for Music. Center for Computer Assisted Research in the Humanities, Menlo Park, CA.
  83. Sima'an K (1996) Computational complexity of probabilistic disambiguation by means of tree grammars. In: Proc. 16th Computational Linguistics Conf. (COLING'96), 5-9 August, Copenhagen, Denmark, Association for Computer Linguistics, Stroudsburg, PA: 1175-1180.
  84. Sima'an K (1999) Learning Efficient Disambiguation. ILLC Dissertation Series 1999-02, Utrecht University, The Netherlands.
  85. Sima'an K, Itai A, Winter Y, Altman A, Nativ N (2001) Building a treebank of modern Hebrew text. J. Traitement Automatique des Langues (Special Issue on Natural Language Processing and Corpus Linguistics), 42(2): 347-380.
  86. Temperley D (2001) The Cognition of Basic Musical Structures. MIT Press, Cambridge, MA.
  87. Tomasello M (2003) Constructing a Language. Harvard University Press, Cambridge, MA.
  88. VanLehn K (1998) Analogy events: how examples are used during problem solving. Cognitive Science, 22(3): 347-388.
  89. van Zaanen M (2000) ABL: alignment-based learning. In: Proc. 18th Computational Linguistics Conf. (COLING'2000), 31 July - 4 August, Saarbrücken, Germany, Association for Computer Linguistics, Stroudsburg, PA: 961-967.
  90. van Zaanen M (2002) Bootstrapping Structure into Language. PhD Thesis, School of Computing, University of Leeds, UK.
  91. van Zaanen M, Bod R, Honing H (2003) A memory-based approach to meter induction. In: Proc. 5th European Society for the Cognitive Sciences of Music Conf. (ESCOM5), September, Hanover, Germany: 250-253.
  92. Veloso M, Carbonell J (1993) Derivational analogy in PRODIGY: automating case acquisition, storage, and utilization. Machine Learning, 10(3): 249-278.
  93. Wertheimer M (1923) Untersuchungen zur Lehre von der Gestalt. Psychologische Forschung, 4: 301-350.
  94. Younger D (1967) Recognition and parsing of context-free languages in time n³. Information and Control, 10(2): 189-208.
  95. Zollmann A, Sima'an K (2005) A consistent and efficient estimator for data-oriented parsing. J. Automata, Languages and Combinatorics, 10: 367-388.
  96. Zuidema W (2006) What are the productive units of natural language grammar? A DOP approach to the automatic identification of constructions. In: Proc. 10th Computational Natural Language Learning Conf. (CONLL'2006), 8-9 June, New York, NY, Association for Computer Linguistics, Stroudsburg, PA: 29-36.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Rens Bod
  1. School of Computer Science, University of St Andrews, Scotland, UK
