Maxwell M. (2013) A System for Archivable Grammar Documentation. In: Mahlow C., Piotrowski M. (eds) Systems and Frameworks for Computational Morphology. SFCM 2013. Communications in Computer and Information Science, vol 380. Springer, Berlin, Heidelberg
This paper describes a number of criteria for archivable documentation of grammars of natural languages, extending the work of Bird and Simons’ “Seven dimensions of portability for language documentation and description.” We then describe a system for writing and testing morphological and phonological grammars of languages, a system which satisfies most of these criteria (where it does not, we discuss plans to extend the system).
The core of this system is based on an XML schema which allows grammars to be written in a stable and linguistically-based formalism, a formalism which is independent of any particular parsing engine. This core system also includes a converter program, analogous to a programming language compiler, which translates grammars written in this format, plus a dictionary, into the programming language of a suitable parsing engine (currently the Stuttgart Finite State Tools). The paper describes some of the decisions which went into the design of the formalism; for example, the decision to aim for observational adequacy, rather than descriptive adequacy. We draw out the implications of this decision in several areas, particularly in the treatment of morphological reduplication.
We have used this system to produce formal grammars of Bangla, Urdu,Pashto, and Persian (Farsi), and we have derived parsers from those formal grammars. In the future we expect to implement similar grammars of other languages, including Dhivehi, Swahili, and Somali. In further work (briefly described in this paper), we have embedded formal grammars produced in this core system into traditional descriptive grammars of several of these languages. These descriptive grammars serve to document the formal grammars, and also provide automatically extractable test cases for the parser.