
Abstract

This chapter investigates which formal properties make a formal grammar suitable for automatic language analysis and which do not. For this purpose, context-free PS Grammar and its parsers will be used as our main example.


Notes

1. As explained in Sect. 1.3, a parser is a computer program which takes language expressions as input and produces some other representation, e.g., a grammatical analysis, as output.
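As an illustration (not taken from the chapter), a minimal recursive-descent parser in Python for the language a^k b^k, assuming the grammar S -> a S b | a b; it takes a space-separated expression as input and returns a nested-list grammatical analysis:

```python
# Hypothetical sketch: a recursive-descent parser for a^k b^k under the
# assumed grammar S -> a S b | a b, returning a nested-list analysis.
def parse_s(tokens, i=0):
    """Parse one S starting at position i; return (tree, next_position)."""
    if i < len(tokens) and tokens[i] == "a":
        if i + 1 < len(tokens) and tokens[i + 1] == "a":
            sub, j = parse_s(tokens, i + 1)            # rule S -> a S b
            if j < len(tokens) and tokens[j] == "b":
                return ["S", "a", sub, "b"], j + 1
        elif i + 1 < len(tokens) and tokens[i + 1] == "b":
            return ["S", "a", "b"], i + 2              # rule S -> a b
    raise ValueError(f"no parse at position {i}")

def parse(expr):
    tokens = expr.split()
    tree, j = parse_s(tokens)
    if j != len(tokens):
        raise ValueError("trailing input")
    return tree
```

For example, `parse("a a b b")` yields the analysis `["S", "a", ["S", "a", "b"], "b"]`, i.e., the input together with its grammatical structure.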

2. For example, a context-free PS Grammar like 7.1.3, 8.3.4, or 8.3.5, or a C LAG like 10.2.2, 10.2.3, 11.5.1, 11.5.2, 11.5.4, or 11.5.6–11.5.8.

3. There is general agreement that the formal rules of a grammar should not be formulated directly in the programming language. Such an implicit use of a formal grammar has the disadvantage that the resulting computer program does not show which of its properties are theoretically accidental (reflecting the programming environment or stylistic idiosyncrasies of the programmer) and which are theoretically necessary (reflecting the formal analysis of the language described). Another disadvantage of failing to separate the formal grammar from the parsing algorithm is that the software works only for a single language rather than for a whole subtype of formal grammars and their languages.
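The separation can be sketched as follows, under assumptions of my own (names and grammar are illustrative): the grammar is stated declaratively as a data structure, here in Chomsky Normal Form, and handed to a generic CYK recognizer that works for any grammar of that subtype, not just one hard-coded language:

```python
# Hypothetical sketch: a declarative CNF grammar as data, separate from a
# generic CYK recognizer applicable to the whole subtype of CNF grammars.
def cyk_recognize(grammar, start, tokens):
    """grammar: dict mapping a nonterminal to a list of right-hand sides,
    each RHS either (terminal,) or (nonterminal, nonterminal)."""
    n = len(tokens)
    # table[i][l-1] = set of nonterminals deriving tokens[i:i+l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):                     # spans of length 1
        for lhs, rhss in grammar.items():
            if (tok,) in rhss:
                table[i][0].add(lhs)
    for length in range(2, n + 1):                       # longer spans
        for i in range(n - length + 1):
            for split in range(1, length):
                for lhs, rhss in grammar.items():
                    for rhs in rhss:
                        if (len(rhs) == 2
                                and rhs[0] in table[i][split - 1]
                                and rhs[1] in table[i + split][length - split - 1]):
                            table[i][length - 1].add(lhs)
    return start in table[0][n - 1]

# a^k b^k in CNF: S -> A T | A B, T -> S B, A -> a, B -> b
G = {"S": [("A", "T"), ("A", "B")],
     "T": [("S", "B")],
     "A": [("a",)],
     "B": [("b",)]}
```

Exchanging `G` for the CNF version of some other context-free grammar requires no change to `cyk_recognize`, which is exactly the property the note argues for.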

4. “Literally trillions and trillions of rules” (Shieber et al. 1983).

5. Uszkoreit and Peters (1986).

6. For example, context-free a^k b^k (7.1.3, 10.2.2) and context-sensitive a^k b^k c^k (8.3.5, 10.2.3) are classified in LA Grammar as elements of the same linear class of C1 LAGs. Correspondingly, context-free WW^R (8.3.5, 11.5.4) and context-sensitive WW (11.5.6) are classified in LA Grammar as elements of the same polynomial (n^2) class of C2 LAGs.
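As a rough illustration of why the context-free/context-sensitive boundary need not matter for complexity (this is not the LA Grammar formalism itself, merely a sketch of the underlying intuition): both languages can be recognized in a single left-to-right pass in linear time.

```python
# Illustrative sketch only: linear-time, left-to-right recognizers for
# context-free a^k b^k and context-sensitive a^k b^k c^k.
def recognize_akbk(s):
    """Accept a^k b^k for k >= 1."""
    k = 0
    while k < len(s) and s[k] == "a":    # count the leading a's
        k += 1
    return k > 0 and s[k:] == "b" * k    # demand exactly k b's

def recognize_akbkck(s):
    """Accept a^k b^k c^k for k >= 1."""
    k = 0
    while k < len(s) and s[k] == "a":
        k += 1
    return k > 0 and s[k:] == "b" * k + "c" * k
```

Reading time-linearly from left to right, the second recognizer is no more expensive than the first, mirroring the classification of both languages in the same linear class.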

7. See, for example, Church (1956), p. 52, footnote 119.

8. Jay Earley developed his parsing algorithm for context-free PS Grammars in his dissertation at the Computer Science Department of Carnegie Mellon University in Pittsburgh, USA. After this achievement he became a clinical psychologist specializing in group therapy and Internal Family Systems Therapy (IFS).

9. The second state is also still active at this point because it allows a further predictor-operation, resulting in the new states

    a a a a . b b b b

    a a a a . S b b b b

   The subsequent scan-operations on the input

    a a a b . b b

   do not succeed, however.
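For comparison, a compact Python sketch (my own, hypothetical formulation) of an Earley recognizer for the assumed grammar S -> a S b | a b, with the three classic operations, predictor, scanner, and completer, driving the chart:

```python
# Hypothetical sketch: an Earley recognizer for S -> a S b | a b.
# A state is (lhs, rhs, dot, origin), the dot marking the parse position.
GRAMMAR = {"S": [("a", "S", "b"), ("a", "b")]}

def earley_recognize(tokens, start="S"):
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in GRAMMAR[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(len(tokens) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in GRAMMAR:                           # predictor
                    for prod in GRAMMAR[sym]:
                        new = (sym, prod, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
                elif i < len(tokens) and tokens[i] == sym:   # scanner
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                            # completer
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in chart[-1])
```

On an input like a a a b b b the predictor keeps hypothesizing a deeper S after each a, the scanner advances the dot over matching input symbols, and failed scans, as in the states shown above, simply leave no state in the next chart position.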

10. According to the writing conventions of the Greco-Roman tradition.

11. Chomsky has emphasized tirelessly that it was not the goal of his nativist program to model the communication procedure of the speaker-hearer. See, for example, Chomsky (1965), p. 9.

12. In the history of science, a field of research is regarded as developing positively if its different areas converge, i.e., if improvements in one area lead to improvements in others. Conversely, a field of science is regarded as developing negatively if improvements in one area lead to a deterioration in other areas.

13. This number is computed on the basis of the open alternatives within the nativist theory of language. McCawley’s calculation amounts to an average of 2,055 different grammar theories a day, or a new theory every 42 seconds, for the duration of 40 years.

14. PS Grammar in combination with constituent structure analysis exhibits descriptive aporia in, for example, declarative main clauses of German such as Peter hat die Tür geschlossen. Because of its discontinuous elements, this sentence cannot be provided with a legal constituent structure analysis within context-free PS Grammar (8.5.1 and 8.5.2).

    To resolve this and other problems, transformations were added to context-free PS Grammar (8.5.3). This resulted in many problems of the type embarrassment of riches. For example, there arose the question of whether the main clauses of German should be derived transformationally from the deep structure of subordinate clauses (e.g., weil Peter die Tür geschlossen hat) or vice versa.

    The transformational derivation of main clauses from subordinate clauses, championed by Bach (1962), was motivated by the compliance of subordinate clauses with constituent structure, while the transformational derivation of subordinate clauses from main clauses, advocated by Bierwisch (1963), was motivated by the feeling that main clauses are more basic. For a treatment of German main and subordinate clauses without transformations see Chap. 18, especially Sect. 18.5.

15. Changing from one formalism to another may be costly. It means giving up the hard-earned status of expert in a certain area, implies revising or discarding decades of previous work, and may have major social repercussions in one’s peer group, both at home and abroad.

    However, fields of science are known to sometimes shift in unexpected ways such that established research groups suddenly find their funds, their influence, and their membership drastically reduced. When this happens, changing from one formalism to another may become a necessity of survival.

    In light of these possibilities, the best long-term strategy is to evaluate scientific options, such as the choice of grammar formalism, rationally. As in all good science, linguistic research should be conducted with conviction, based on the broadest and deepest possible knowledge of the field.

References

  • Bach, E. (1962) “The Order of Elements in a Transformational Grammar of German,” Language 38:263–269

  • Bar-Hillel, Y. (1964) Language and Information. Selected Essays on Their Theory and Application, Reading: Addison-Wesley

  • Berwick, R.C., and A.S. Weinberg (1984) The Grammatical Basis of Linguistic Performance: Language Use and Acquisition, Cambridge: MIT Press

  • Bierwisch, M. (1963) Grammatik des deutschen Verbs, Studia Grammatica II, Berlin: Akademie-Verlag

  • Chomsky, N. (1957) Syntactic Structures, The Hague: Mouton

  • Chomsky, N. (1965) Aspects of the Theory of Syntax, Cambridge: MIT Press

  • Church, A. (1956) Introduction to Mathematical Logic, Vol. I, Princeton: Princeton University Press

  • Earley, J. (1970) “An Efficient Context-Free Parsing Algorithm,” Communications of the ACM 13.2:94–102, reprinted in B. Grosz, K. Sparck Jones, and B.L. Webber (eds.), 1986

  • Gazdar, G. (1982) “Phrase Structure Grammar,” in P. Jacobson and G.K. Pullum (eds.)

  • Harman, G. (1963) “Generative Grammar Without Transformational Rules: A Defense of Phrase Structure,” Language 39:597–616

  • Harris, Z. (1951) Methods in Structural Linguistics, Chicago: University of Chicago Press

  • Hockett, C.F. (1958) A Course in Modern Linguistics, New York: Macmillan

  • Hopcroft, J.E., and J.D. Ullman (1979) Introduction to Automata Theory, Languages, and Computation, Reading: Addison-Wesley

  • Kay, M. (1980) “Algorithmic Schemata and Data Structures in Syntactic Processing,” reprinted in Grosz, Sparck Jones, and Webber (eds.), 1986

  • McCawley, J.D. (1982) Thirty Million Theories of Grammar, Chicago: The University of Chicago Press

  • Miller, G., and N. Chomsky (1963) “Finitary Models of Language Users,” in R. Luce, R. Bush, and E. Galanter (eds.)

  • Post, E. (1936) “Finite Combinatory Processes – Formulation I,” The Journal of Symbolic Logic I:103

  • Shieber, S. (1985) “Evidence Against the Non-Contextfreeness of Natural Language,” Linguistics and Philosophy 8:333–343

  • Shieber, S., S. Stucky, H. Uszkoreit, and J. Robinson (1983) “Formal Constraints on Metarules,” Proceedings of the 21st Annual Meeting of the Association for Computational Linguistics, Cambridge, Mass.

  • Uszkoreit, H., and S. Peters (1986) “On Some Formal Properties of Metarules,” Report CSLI-85-43, Stanford University: Center for the Study of Language and Information


Exercises

Section 9.1

1. What is the origin of the term parser and what are the functions of a parser?

2. What is the relation between morphology, syntax, and semantic parsers, and how do they differ?

3. Describe two different ways of using a formal grammar in a parser and evaluate the alternatives.

4. Explain the notions declarative and procedural. How do they show up in parsers?

5. Is it possible to write different parsers for the same grammatical formalism?

Section 9.2

1. Describe a context-free structure in natural language.

2. Are there context-sensitive structures in natural language?

3. What follows from the assumption that natural language is not context-free? Does the answer depend on the grammar formalism used? Would it be possible to parse natural language in linear time even if it were context-sensitive?

4. Explain the possible equivalence relations between two formalisms of grammar.

5. How is bidirectional C Grammar related to the kinds of PS Grammar?

6. Do artificial languages depend on their formal grammars?

7. Do language classes depend on formalisms of grammar?

8. What impact does the complexity of a language class have on the possible existence of a practical parser for it?

9. What is the inherent complexity of a language and how is it determined?

Section 9.3

1. Explain the notion of type transparency.

2. For what purpose did Post (1936) develop his production systems?

3. When are a grammar formalism and a parser input-output equivalent?

4. What is the difference between a top-down and a bottom-up derivation in a context-free PS Grammar?

5. Why is it that a context-free PS Grammar is not input-output equivalent with its parsers? Base your explanation on a top-down and a bottom-up derivation.

6. Explain the functioning of the Earley algorithm using an expression of a^k b^k. How does the Earley algorithm manage to get around the substitution-based derivation order of PS Grammar?

7. Explain how the Earley algorithm makes crucial use of the pairwise inverse structure of context-free PS Grammars.

8. Is it possible to parse a^k b^k c^k using the Earley algorithm?

9. Are there type-transparent parsers for context-free PS Grammar?

10. Name two practical disadvantages of an automatic language analysis which is not type-transparent.

Section 9.4

1. Explain why a nativist theory of language based on PS Grammar is incompatible with the principle form follows function, using the notion of input-output equivalence.

2. Demonstrate with an example that the derivation order of PS Grammar is incompatible with the time-linear structure of natural language.

3. Does an additional transformational component (Sects. 2.4 and 8.5) diminish or increase the incompatibility between the PS Grammar derivation order and the time-linear order of natural language?

Section 9.5

1. Explain the notion of convergence, as used in the history of science.

2. How does a lack of convergence show up in the historical development of nativism, and what are its reasons?

3. How did McCawley calculate the number in his title Thirty Million Theories of Grammar? Does this number indicate a positive development in linguistics?

4. Why can changing to another grammar formalism be costly?

5. Do you see a relation between ‘descriptive aporia’ and ‘embarrassment of riches,’ on the one hand, and the proposal of ever new derived formalisms with high mathematical complexity, on the other?

6. Describe the mathematical, computational, and empirical properties of PS Grammar.

7. Which desiderata must be satisfied by a formal grammar in order for it to be suitable for a computational analysis of natural language?


Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Hausser, R. (2014). Basic Notions of Parsing. In: Foundations of Computational Linguistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41431-2_9


  • Print ISBN: 978-3-642-41430-5

  • Online ISBN: 978-3-642-41431-2
