Volume 4, Issue 4, pp 471-504

Natural languages and context-free languages

Conclusions

Notice that this paper has not claimed that all natural languages are CFL's. What it has shown is that every published argument purporting to demonstrate the non-context-freeness of some natural language is invalid, either formally or empirically or both. Whether non-context-free characteristics can be found in the stringset of some natural language remains an open question, just as it was a quarter century ago.

Whether the question is ultimately answered in the negative or the affirmative, there will be interesting further questions to ask. If it turns out that natural languages are indeed always CFL's, it will be reasonable to ask whether this helps to explain why speakers apparently recognize so quickly whether a presented utterance corresponds to a grammatical sentence or not, and associate structural and semantic details with it. It might also be reasonable to speculate about the explanation for the universally context-free character of the languages used by humans, and to wonder whether evolutionary biological factors are implicated in some way (Sampson (1979) could be read in this light). And naturally, it will be reasonable to pursue the program put forward by Gazdar (1981, in press) to see to what extent CFL-inducing grammatical devices can be exploited to yield insightful descriptions of natural languages that capture generalizations in revealing ways.

If a human language that is not a CFL is proved to exist, on the other hand, a different question will be raised: given the non-context-free character of human languages in general, why has this property been so hard to demonstrate that it has taken over twenty-five years to bring it to light since the issue was first explicitly posed? If human languages do not have to be CFL's, why do so many (most?) of them come so close to having the property of context-freeness? And, since the CFL's certainly constitute a very broad class of mathematically natural and computationally tractable languages, what property of human beings or their communicative or cognitive needs is it that has caused some linguistic communities to reach beyond the boundaries of this class in the course of evolving a linguistic system?

Either way, we shall be interested to see our initial question resolved, and further questions raised. One cautionary word should be said, however, about the implications (or lack of them) that the answer will have for grammatical studies. Chomsky has repeatedly stated that he does not see weak generative capacity as a theme of central importance in the theory of grammar, and we agree. It is very far from being the case that the recent resurgence of interest in exploring the potential of CF-PSG or equivalent systems will, or should, be halted dead in its tracks by the discovery (if it is ever forthcoming) that some natural language is not a CFL. In the area of parsing, for instance, it seems possible that natural languages are not only parsed on the basis of constituent structure such as a CF-PSG would assign, but are parsed as if they were finite state languages (see Langendoen (1975) and Church (1980) for discussion along these lines). That is, precisely those construction-types that figure in the various proofs that English is not an FSL appear to cause massive difficulty in the human processing system; the sentences crucial to the proofs are for the most part unprocessable unless they are extremely short (yet the arguments for English not being an FSL only go through if length is not an issue). This means that in practice properties of finite state grammars are still of great potential importance to linguistic theory despite the fact that they do not provide the framework for defining the total class of grammatical sentences. The same would almost certainly be true of CF-PSG's if they were shown to be inadequate in a similar sense. It is highly unlikely that the advances made so far in phrase structure description could be nullified by a discovery about weak generative capacity.

Moreover, there are known to be numerous ways in which the power of CF-PSG's can be marginally enhanced to permit, for example, xx languages to be generated without allowing anything like the full class of recursively enumerable or even context-sensitive languages (see Hopcroft and Ullman (1979), Chapter 14, for an introduction to this topic, noting especially Figure 14.7 on p. 393). The obvious thing to do if natural languages were ever shown not to be CFL's in the general case would be to start exploring such minimal enhancements of expressive power to determine exactly what natural languages call for in this regard and how it could be effectively but parsimoniously provided in a way that closely modelled human linguistic capacities.
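
The kind of stringset at issue can be made concrete with a small sketch. Over an alphabet of at least two symbols, the language whose strings consist of some string x followed by its reversal is context-free, whereas the xx language (x followed by an exact copy of x) is not, although it is context-sensitive. The Python functions below are purely illustrative membership tests; their names, and the choice of Python, are assumptions made for this illustration and are not drawn from the literature cited. They show only that the two patterns differ in whether the dependencies between the two halves are nested or crossed, and say nothing about how a grammar would generate either language.

```python
def is_mirror_string(s: str) -> bool:
    """True if s = x + reverse(x) for some string x (nested dependencies).

    Over two or more symbols, this stringset is context-free.
    """
    half, remainder = divmod(len(s), 2)
    # An odd-length string cannot be split into two equal halves.
    return remainder == 0 and s[:half] == s[half:][::-1]


def is_copy_string(s: str) -> bool:
    """True if s = x + x for some string x (crossed dependencies).

    Over two or more symbols, this stringset (the xx language) is not
    context-free.
    """
    half, remainder = divmod(len(s), 2)
    return remainder == 0 and s[:half] == s[half:]
```

Under these definitions, `is_mirror_string("abba")` and `is_copy_string("abab")` both hold, while `is_copy_string("abba")` does not.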

In the meantime, it seems reasonable to assume that the natural languages are a proper subset of the infinite-cardinality CFL's until such time as they are validly shown not to be.

A brief, preliminary statement of the view developed in this paper appeared in an unpublished paper by Gazdar, ‘English As a Context-free Language’, in April 1979. The authors jointly presented an early version of the present paper at the University of York in January 1980, and a more recent version was presented by Pullum at the University of California, San Diego, in May 1981. We thank Paul Postal, Mark Steedman, Thomas Wasow, David Watt, and our anonymous referees for their detailed comments on the whole paper, some of which improved it enormously. Wallace Chafe, David Dowty, Elisabet Engdahl, Aravind Joshi, D. Terence Langendoen, Alexis Manaster-Ramer, Marianne Mithun, Stanley Peters, Robert Ritchie, Jerrold Sadock, Ivan Sag, Geoffrey Sampson, Paul Schachter and Annie Zaenen also helped us with correspondence or suggestions. Some of the people mentioned take strong exception to our views, so their willingness to help must be seen as courtesy rather than concurrence. Our work was partially funded by grants from the National Science Foundation (grant No. BNS-8102406) and the Sloan Foundation to Stanford University, where the hospitality of the Department of Linguistics gave us the conditions under which we could finish the paper, and also by a grant from the Social Science Research Council, U.K. (grant No. HR-5767) to the University of Sussex. Offprint requests should be directed to Pullum at Cowell College, University of California (UCSC), Santa Cruz, California 95064.