
Abstract

Part I has explained the mechanism of natural communication, based on the [2+1] level structure of the Slim theory of language and the different kinds of language signs. Part II turns to the combinatorial buildup of complex signs within the syntax component of grammar. The methods are those of formal language theory, a wide field reaching far into the foundations of mathematics and logic. The purpose here is to introduce the linguistically relevant concepts and formalisms as simply as possible, explaining their historical origin and motivation as well as their respective strengths and weaknesses. Formal proofs are kept to a minimum.


Notes

  1. In formal language theory, the lexicon of an artificial language is sometimes called the alphabet, a word a letter, and a sentence a word. From a linguistic point of view this practice is unnecessarily misleading. Therefore a basic expression of an artificial or a natural language is here uniformly called a word (even if the word consists of only a single letter, e.g., a), and a complete well-formed expression is here uniformly called a sentence (even if it consists only of a sequence of one-letter words, e.g., aaabbb).

  2. In other words: the free monoid over LX equals \(\mathrm{LX}^{\mathsf{+}} \cup \{\varepsilon\}\) (Harrison 1978, p. 3).
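As a concrete illustration (a minimal Python sketch, not part of the text): for a toy lexicon LX = {a, b}, the positive closure LX+ contains all non-empty concatenations, while the Kleene closure LX* adds the empty sequence ε.

```python
from itertools import product

def positive_closure(lexicon, max_len):
    """Enumerate LX+ : all non-empty concatenations up to max_len words."""
    for n in range(1, max_len + 1):
        for combo in product(lexicon, repeat=n):
            yield "".join(combo)

def kleene_closure(lexicon, max_len):
    """Enumerate LX* = LX+ ∪ {ε}, i.e., the free monoid over the lexicon."""
    yield ""  # the empty sequence ε
    yield from positive_closure(lexicon, max_len)

lx = ["a", "b"]
print(list(kleene_closure(lx, 2)))
# → ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

The only difference between the two closures is the single element ε, exactly as the note states.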

  3. The subsets of infinite sets may themselves be infinite. For example, the even numbers 2, 4, 6, … form an infinite subset of the natural numbers 1, 2, 3, 4, 5, …. The latter are formed from the finite lexicon of the digits 1, 2, 3, 4, 5, 6, 7, 8, 9, and 0 by means of concatenation, e.g., 12 or 21.

  4. This is because an explicit list of the well-formed sentences is by nature finite. It would therefore be impossible to list, for example, all the natural numbers. Instead, the infinitely many surfaces of possible natural numbers are produced from the digits via the structural principle of concatenation.
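The point can be made concrete with a small sketch (illustrative Python, assuming nothing beyond the note itself): rather than attempting an impossible finite list, the surfaces of the natural numbers are enumerated by concatenating digits.

```python
from itertools import count, product

DIGITS = "0123456789"

def number_surfaces():
    """Generate the surfaces of the natural numbers by concatenating
    digits of increasing length, instead of listing them all."""
    for length in count(1):
        for combo in product(DIGITS, repeat=length):
            if combo[0] != "0":  # exclude leading zeros
                yield "".join(combo)

gen = number_surfaces()
print([next(gen) for _ in range(12)])
# → ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']
```

The generator never terminates, reflecting that the language of number surfaces is infinite although its lexicon (the ten digits) is finite.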

  5. A detailed introduction to PS Grammar is given in Chap. 8.

  6. This latter approach is taken by Left-Associative Grammar (LA Grammar, Chap. 10).

  7. An algebraic definition of C Grammar is provided in 7.4.2, of PS Grammar in 8.1.1, and of LA Grammar in 10.2.1, respectively.

  8. The relation between C and PS Grammar is described in Sect. 9.2. The different language hierarchies of PS and LA Grammar are compared in Chap. 12.

  9. For example, Chomsky originally thought that the recoverability condition of deletions would keep transformational grammar decidable (see Sect. 8.5). However, Peters and Ritchie proved in 1972 that TG is undecidable despite this condition.

     When Gazdar (1981) proposed the additional formalism of metarules for context-free PS Grammar, he formulated the finite closure condition to ensure that metarules would not increase complexity beyond context-free. However, the condition was widely rejected as linguistically unmotivated, leading Uszkoreit and Peters (1986) to the conclusion that GPSG is in fact undecidable.

  10. In linguistics, examples of ungrammatical structures are marked with an asterisk *, a convention which dates back at least to Bloomfield (1933).

  11. The mathematical properties of informal descriptions, on the other hand, cannot be investigated because their structures are not sufficiently clearly specified.

  12. Programs which are not based on a declarative specification may still run. However, as long as it is not clear which of their properties are theoretically necessary and which are an accidental result of the programming environment and the programmer’s idiosyncrasies, such programs – called hacks – are of little theoretical interest. From a practical point of view, they are difficult to scale up and hard to debug. The relation between grammar systems and their implementation is further discussed in Sect. 15.1.

  13. Quechua is an indigenous language of South America.

  14. A good intuitive summary may be found in Geach (1972). See also Lambek (1958) and Bar-Hillel (1964), Chap. 14, pp. 185–189.

  15. In contrast, square_root is not a function but a relation, because it may assign more than one value to an argument in its domain. The square root of 4, for example, has two values, namely 2 and −2.
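A minimal sketch (illustrative; modeling the relation as a set-valued mapping is an assumption of this example, not the book's formalism) of why square_root is a relation rather than a function:

```python
import math

def square_root(x):
    """The square-root *relation*: returns the set of all real r with
    r * r == x. A relation may assign more than one value, whereas a
    function assigns at most one."""
    if x < 0:
        return set()      # no real roots
    if x == 0:
        return {0.0}
    r = math.sqrt(x)
    return {r, -r}        # two values for every positive argument

print(square_root(4))     # the two roots 2.0 and -2.0
```

Since square_root(4) yields two values, the mapping fails the single-value requirement that defines a function.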

  16. An alternative algebraic definition of C Grammar may be found in Bar-Hillel (1964), p. 188.

  17. The names and the number of elementary categories (here, u and v) are in principle unrestricted. For example, Ajdukiewicz used only one elementary category, Geach and Montague used two, and others three.
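For illustration, the functor-argument cancellation that operates over such categories can be sketched as a toy Python model (the tuple encoding and the name cancel are assumptions of this example, not the book's notation; arguments are written before the slash, as in note 20):

```python
# A complex category (u/v) is encoded as the tuple (argument, result):
# it combines with an expression of category u to yield one of category v.

def cancel(functor, argument):
    """If the functor category takes this argument, cancel to the
    result category; otherwise the combination is undefined (None)."""
    if isinstance(functor, tuple) and functor[0] == argument:
        return functor[1]
    return None

u, v = "u", "v"
functor = (u, v)            # the complex category (u/v)
print(cancel(functor, u))   # → 'v'   (successful cancellation)
print(cancel(functor, v))   # → None  (category mismatch)
```

Which strings serve as elementary categories is immaterial to the mechanism; only the functor-argument fit matters, which is why the number and names of elementary categories are in principle unrestricted.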

  18. The term fragment is used to refer to that subset of a natural language which a given formal grammar is designed to handle.

  19. Sect. 19.4 and CoL, pp. 292–295.

  20. For simplicity and consistency, our notation differs from Montague’s in that the distinction between syntactic categories and semantic types is omitted, with arguments positioned before the slash.

  21. For an attempt see SCG.

References

  • Ajdukiewicz, K. (1935) “Die syntaktische Konnexität,” Studia Philosophica 1:1–27

  • Bar-Hillel, Y. (1953) “Some Linguistic Problems Connected with Machine Translation,” Philosophy of Science 20:217–225

  • Bar-Hillel, Y. (1964) Language and Information. Selected Essays on Their Theory and Application, Reading: Addison-Wesley

  • Bloomfield, L. (1933) Language, New York: Holt, Rinehart, and Winston

  • Gazdar, G. (1981) “Unbounded Dependencies and Coordinate Structure,” Linguistic Inquiry 12.2:155–184

  • Geach, P. (1972) “A Program for Syntax,” in D. Davidson and G. Harman (eds.), 483–497

  • Halliday, M.A.K. (1985) An Introduction to Functional Grammar, London: Edward Arnold

  • Harrison, M. (1978) Introduction to Formal Language Theory, Reading: Addison-Wesley

  • Kleene, S.C. (1952) Introduction to Metamathematics, Amsterdam: North-Holland

  • Lamb, S. (1996) Outline of Stratificational Grammar, Washington: Georgetown University Press

  • Lambek, J. (1958) “The Mathematics of Sentence Structure,” The American Mathematical Monthly 65:154–170

  • Leśniewski, S. (1929) “Grundzüge eines neuen Systems der Grundlagen der Mathematik,” Fundamenta Mathematicae 14:1–81

  • Tesnière, L. (1959) Éléments de syntaxe structurale, Paris: Éditions Klincksieck

  • Uszkoreit, H., and S. Peters (1986) “On Some Formal Properties of Metarules,” Report CSLI-85-43, Stanford University: Center for the Study of Language and Information


Exercises

Section 7.1

  1. How is the notion of a language defined in formal grammar?

  2. Explain the notion of a free monoid as it relates to formal grammar.

  3. What is the difference between positive closure and Kleene closure?

  4. In what sense can a generative grammar be viewed as a filter?

  5. Explain the role of recursion in the derivation of aaaabbbb using definition 7.1.3.

  6. Why is PS Grammar called a generative grammar?

  7. What is an algebraic definition and what is its purpose?

  8. What is the difference between elementary, derived, and semi-formal formalisms?

  9. What is the reason for the development of derived formalisms?

Section 7.2

  1. Explain the difference in well-formedness between artificial and natural languages.

  2. Why is a formal characterization of grammatical well-formedness a descriptive goal of theoretical linguistics?

  3. Name three reasons for using formal grammar in modern linguistics.

  4. Why is the use of formal grammars a necessary, but not a sufficient, condition for a successful language analysis?

Section 7.3

  1. Under what circumstances is a formal grammar descriptively adequate?

  2. What is meant by the mathematical complexity of a grammar formalism and why is it relevant for practical work?

  3. What is the difference between functional and nonfunctional grammar theories?

  4. Which three aspects should be jointly taken into account in the development of a generative grammar and why?

Section 7.4

  1. Who invented C Grammar, when, and for what purpose?

  2. When was C Grammar first applied to natural language and by whom?

  3. What is the structure of a logical function?

  4. Give an algebraic definition of C Grammar.

  5. Explain the interpretation of complex C Grammar categories as functors.

  6. Why is the set of categories in C Grammar infinite and the lexicon finite?

  7. Name the formal principle allowing the C Grammar 7.4.4 to generate infinitely many expressions even though its lexicon and its rule set are finite.

  8. Why is the grammar formalism defined in 7.4.4 called bidirectional C Grammar?

  9. Would it be advisable to use C Grammar as the syntactic component of the Slim theory of language?

Section 7.5

  1. Why is C Grammar prototypical of a lexical approach?

  2. What is meant by a fragment of a natural language in formal grammar?

  3. Explain the relation between a functional interpretation of complex categories in C Grammar and the model-theoretic interpretation of natural language.

  4. Explain the recursive structure in the C Grammar 7.5.4.

  5. Explain how the semantic interpretation of C Grammar works in principle.

  6. Extend the C Grammar 7.5.4 to generate the sentences The man sent the girl a letter, The girl received a letter from the man, The girl was sent a letter by the man. Explain the semantic motivation of your categories.

  7. Why are there no large-scale descriptions of natural language in C Grammar?

  8. Why are there no efficient implementations of C Grammar?

  9. Why is the absence of efficient implementations a serious methodological problem for C Grammar?

  10. Does C Grammar provide a mechanism of natural communication? Would it be suitable as a component of such a mechanism?


Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Hausser, R. (2014). Formal Grammar. In: Foundations of Computational Linguistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41431-2_7

  • Print ISBN: 978-3-642-41430-5

  • Online ISBN: 978-3-642-41431-2
