Urdu in a parallel grammar development environment

Butt, Miriam; King, Tracy Holloway

doi:10.1007/s10579-007-9042-8

Published: 16 October 2007

Language Resources and Evaluation, Volume 41, pages 191–207 (2007)

Abstract

In this paper, we report on the role of the Urdu grammar in the Parallel Grammar (ParGram) project (Butt et al. 1999; Butt et al. 2002). The Urdu grammar was able to take advantage of standards in analyses set by the original grammars in order to speed development. However, novel constructions, such as correlatives and extensive complex predicates, resulted in expansions of the analysis feature space as well as extensions to the underlying parsing platform. These improvements are now available to all the project grammars.

Notes

The languages now also include Arabic, Chinese, Hungarian, Korean, Malagasy, Norwegian, Vietnamese, and Welsh. Some of these grammars are broad coverage grammars used in applications; some are still at initial stages of development; and some have been developed primarily to test aspects of linguistic theory.
In general, these grammars have focused on edited, written texts such as newspaper text and manuals. The Urdu grammar is also geared towards such texts.
These structures can be manipulated via the ordered rewrite system (transfer component) that is part of the XLE grammar development platform, making them more specialized for a given application.
ParGram does not adopt a more pervasive grammar sharing approach such as that found in Bender and Flickinger (2005).
One significant effort is the Hindi Verb Project run by Prof. Alice Davison at the University of Iowa; further information is available via their web site.
Unfortunately, unlike the other grammars, there has been no full-time grammar writer on the Urdu grammar.
The c-structures are less parallel in that the languages differ significantly in their word order possibilities. Japanese and Urdu are SOV languages while English is an SVO language. However, the standards for naming the nodes in the trees and the types of constituents formed in the trees, such as NPs, are similar.
German and possibly French have some complex predicate constructions. The ParGram grammars for these use a less linguistically satisfying complex clause analysis. The wider range of complex predicate phenomena in Urdu makes this approach infeasible.
XLE implements lexical rules which can be used to delete and rename arguments, e.g., for the English passive in which the obj becomes the subj and the subj becomes the obl-ag. However, adding arguments and composing preds is not possible.
The Japanese grammar (Masuichi and Ohkuma 2003) was also evaluated against the Japanese bunsetsu standard which is a type of dependency measure; see Masuichi et al. (2003) for details.
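
The note above on XLE lexical rules can be pictured with a small sketch. This is not XLE notation or XLE code; it is illustrative Python in which a predicate's argument list is modeled as a list of grammatical-function names, and the function `apply_lexical_rule` and the rule tables `PASSIVE` and `SHORT_PASSIVE` are hypothetical names introduced only for this example. It shows the two operations the note mentions (renaming and deleting arguments) and, by construction, has no way to add an argument or compose two predicates.

```python
from typing import Dict, List, Optional


def apply_lexical_rule(args: List[str],
                       rule: Dict[str, Optional[str]]) -> List[str]:
    """Rename or delete grammatical functions in an argument list.

    Each key in `rule` is a grammatical function to rewrite. A string
    value renames it; None deletes it. All rewrites are applied
    simultaneously, so OBJ -> SUBJ and SUBJ -> OBL-AG do not interfere.
    Functions not mentioned in the rule are kept unchanged.
    """
    result = []
    for gf in args:
        if gf in rule:
            new_gf = rule[gf]
            if new_gf is None:
                continue  # argument is deleted
            result.append(new_gf)
        else:
            result.append(gf)
    return result


# Hypothetical rule tables, assumed for illustration only.
# English passive as described in the note: obj -> subj, subj -> obl-ag.
PASSIVE = {"OBJ": "SUBJ", "SUBJ": "OBL-AG"}
# A passive without an expressed agent: the original subj is deleted.
SHORT_PASSIVE = {"OBJ": "SUBJ", "SUBJ": None}

active_args = ["SUBJ", "OBJ"]
print(apply_lexical_rule(active_args, PASSIVE))        # ['OBL-AG', 'SUBJ']
print(apply_lexical_rule(active_args, SHORT_PASSIVE))  # ['SUBJ']
```

Because the output can only contain renamed or surviving members of the input list, a rule of this shape cannot express a complex predicate in which a light verb contributes an extra argument, which is the limitation the note points to.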

References

Asahara, M., & Matsumoto, Y. (2000). Extended models and tools for high-performance part-of-speech tagger. In: Proceedings of COLING.
Beesley, K., & Karttunen, L. (2003). Finite-state morphology. CSLI Publications.
Bender, E., & Flickinger, D. (2005). Rapid prototyping of scalable grammars: Towards modularity in extensions to a language-independent core. In: Proceedings of IJCNLP-05 (Posters/Demos).
Butt, M. (1995). The structure of complex predicates in Urdu. CSLI Publications.
Butt, M., Dyvik, H., King, T. H., Masuichi, H., & Rohrer, C. (2002). The parallel grammar project. In: Proceedings of COLING 2002, Workshop on Grammar Engineering and Evaluation, pp. 1–7.
Butt, M., Forst, M., King, T. H., & Kuhn, J. (2003a). The feature space in parallel grammar writing. In: ESSLLI 2003 Workshop on ideas and strategies for multilingual grammar development.
Butt, M., & King, T. H. (2002). Urdu and the parallel grammar project. In: Proceedings of COLING 2002, Workshop on Asian Language Resources and International Standardization, pp. 39–45.
Butt, M., & King, T. H. (2005a). Case systems: Beyond structural distinctions. In: New perspectives on case theory (pp. 53–87). CSLI Publications.
Butt, M., & King, T. H. (2005b). The status of case. In V. Dayal & A. Mahajan (Eds.), Clause structure in South Asian languages. Kluwer.
Butt, M., & King, T. H. (2006a). Restriction for morphological valency alternations: The Urdu causative. In M. Butt, M. Dalrymple, & T. H. King (Eds.), Intelligent linguistic architectures: Variations on themes by Ronald M. Kaplan (pp. 235–258). CSLI Publications.
Butt, M., & King, T. H. (2006b). Restriction for morphological valency alternations: The Urdu causative. In: Intelligent linguistic architectures: Variations on themes by Ronald M. Kaplan (pp. 235–258). CSLI Publications.
Butt, M., King, T. H., & Maxwell, J. T. (2003b). Complex predicates via restriction. In: Proceedings of the LFG03 Conference. CSLI On-line Publications.
Butt, M., King, T. H., Niño, M.-E., & Segond, F. (1999). A grammar writer’s cookbook. CSLI Publications.
Cahill, A., Forst, M., Burke, M., McCarthy, M., O’Donovan, R., Rohrer, C., van Genabith, J., & Way, A. (2005). Treebank-based acquisition of multilingual unification grammar resources. Journal of Research on Language and Computation; Special Issue on Shared Representations in Multilingual Grammar Engineering, pp. 247–279.
Chanod, J.-P., & Tapanainen, P. (1995). Creating a tagset, lexicon, and guesser for a French tagger. In: Proceedings of the ACL SIGDAT Workshop: From texts to tags. Issues in Multilingual Language Analysis, pp. 58–64.
Crouch, D., Dalrymple, M., Kaplan, R., King, T. H., Maxwell, J., & Newman, P. (2007). XLE Documentation. Available on-line at http://www.2.parc.com/isl/groups/nltt/xle/doc/xle_toc.html. Accessed 10 Oct 2007.
Crouch, D., & King, T. H. (2006). Semantics via f-structure rewriting. In: Proceedings of LFG06. CSLI On-line Publications.
Crouch, R., Kaplan, R., King, T. H., & Riezler, S. (2002). A comparison of evaluation metrics for a broad coverage parser. In: workshop on beyond PARSEVAL at the language resources and evaluation conference.
Crouch, R., King, T. H., Maxwell, J. T., Riezler, S., & Zaenen, A. (2004). Exploiting f-structure input for sentence condensation. In: Proceedings of LFG04, pp. 167–187. CSLI On-line Publications.
Dalrymple, M. (2001). Lexical functional grammar, Vol. 34 of Syntax and semantics. Academic Press.
Dalrymple, M., Dyvik, H., & King, T. H. (2004a). Copular complements: Closed or open? In: Proceedings of the LFG04 conference. CSLI On-line Publications.
Dalrymple, M., Kaplan, R., & King, T. H. (2004b). Linguistic generalizations over descriptions. In: Proceedings of the LFG04 conference. CSLI On-line Publications.
Forst, M. (2003a). Treebank conversion—Creating a German f-structure bank from the TIGER corpus. In: Proceedings of the LFG03 conference. CSLI On-line Publications.
Forst, M. (2003b). Treebank conversion—Establishing a testsuite for a broad-coverage LFG from the TIGER Treebank. In: Proceedings of the EACL workshop on linguistically interpreted corpora (LINC ’03).
Forst, M. (2007). Disambiguation for a linguistically precise German LFG parser. Ph.D. thesis, IMS Stuttgart (in press).
Forst, M., Bertomeu, N., Crysmann, B., Fouvry, F., Hansen-Schirra, S., & Kordoni, V. (2004). Towards a dependency-based gold standard for German parsers—The TiGer Dependency Bank. In: Proceedings of the COLING workshop on linguistically interpreted corpora (LINC ’04).
Frank, A. (1999). From parallel grammar development towards machine translation. In: Proceedings of MT Summit VII, pp. 134–142.
Kaplan, R. (1988). Correspondences and their Inverses. In: Presented at the Titisee workshop on unification formalisms: Syntax, semantics, and implementation, Titisee, Germany.
Kaplan, R., King, T. H., & Maxwell, J. (2002). Adapting existing grammars: The XLE experience. In: Proceedings of COLING2002, Workshop on Grammar Engineering and Evaluation, pp. 29–35.
Kaplan, R., Maxwell, J. T., King, T. H., & Crouch, R. (2004a). Integrating finite-state technology with deep LFG grammars. In: Proceedings of the workshop on combining shallow and deep processing for NLP (ESSLLI).
Kaplan, R., & Wedekind, J. (1993). Restriction and correspondence-based translation. In: Proceedings of the sixth European conference of the association for computational linguistics, pp. 193–202.
Kaplan, R. M., Riezler, S., King, T. H., Maxwell, J. T., Vasserman, A., & Crouch, R. (2004b). Speed and accuracy in shallow and deep stochastic parsing. In: Proceedings of the human language technology conference and the 4th annual meeting of the North American chapter of the association for computational linguistics (HLT-NAACL’04).
Khader, R. (2003). Evaluation of an English LFG-based grammar as error checker. MSc thesis, UMIST.
King, T. H., Crouch, R., Riezler, S., Dalrymple, M., & Kaplan, R. (2003). The PARC700 dependency bank. In: Proceedings of the EACL03: 4th international workshop on linguistically interpreted corpora (LINC-03).
King, T. H., Forst, M., Kuhn, J., & Butt, M. (2005). The feature space in parallel grammar writing. Research on Language and Computation, 3(2), 139–163.
Malik, A. (2006). Hindi Urdu machine transliteration system. MSc Thesis, University of Paris 7.
Masuichi, H., & Ohkuma, T. (2003). Constructing a practical Japanese parser based on lexical-functional grammar. Journal of Natural Language Processing, 10, 79–109 (In Japanese).
Masuichi, H., Ohkuma, T., Yoshimura, H., & Harada, Y. (2003). Japanese parser on the basis of the lexical-functional grammar formalism and its evaluation. In: Proceedings of The 17th Pacific Asia conference on language, information and computation (PACLIC17), pp. 298–309.
Maxwell, J. T., & Kaplan, R. (1993). The interface between phrasal and functional constraints. Computational Linguistics, 19, 571–589.
Riezler, S., King, T. H., Crouch, R., & Zaenen, A. (2003). Statistical sentence condensation using ambiguity packing and stochastic disambiguation methods for Lexical-Functional Grammar. In: Proceedings of the human language technology conference and the 3rd meeting of the North American chapter of the association for computational linguistics.
Riezler, S., King, T. H., Kaplan, R., Crouch, D., Maxwell, J., & Johnson, M. (2002). Parsing the Wall Street Journal using a lexical-functional grammar and discriminative estimation techniques. In: Proceedings of the annual meeting of the association for computational linguistics.
Riezler, S., & Maxwell, J. T. (2006). Grammatical machine translation. In: Proceedings of human language technology conference—North American chapter of the association for computational linguistics.
Rizvi, S. M. J. (2006). Development of algorithms and computational grammar of Urdu for the machine translation between English and Urdu languages. Ph.D. thesis, Pakistan Institute of Engineering and Applied Sciences.
Rohrer, C., & Forst, M. (2006a). Broad-coverage grammar development—How far can it go? In M. Butt, M. Dalrymple, & T. H. King (Eds.), Intelligent linguistic architectures—Variations on themes by Ronald M. Kaplan. CSLI Publications.
Rohrer, C., & Forst, M. (2006b). Improving coverage and parsing quality of a large-scale LFG for German. In: Proceedings of the Language Resources and Evaluation Conference (LREC-2006). Genoa, Italy.
Umemoto, H. (2006). Implementing a Japanese semantic parser based on glue approach. In: Proceedings of The 20th Pacific Asia conference on language, information and computation.

Acknowledgements

We would like to thank the audience of the COLING Workshop on Asian Languages in which an earlier version of this paper appeared (Butt and King 2002) and three anonymous reviewers who provided detailed comments. We would also like to thank John Maxwell for extensive discussion and implementation help over the years.

Author information

Authors and Affiliations

Universität Konstanz, Konstanz, 78462, Germany
Miriam Butt
Palo Alto Research Center, Palo Alto, CA, USA
Tracy Holloway King

Corresponding author

Correspondence to Miriam Butt.

About this article

Cite this article

Butt, M., King, T.H. Urdu in a parallel grammar development environment. Lang Resources & Evaluation 41, 191–207 (2007). https://doi.org/10.1007/s10579-007-9042-8

Received: 26 August 2006
Accepted: 25 September 2007
Published: 16 October 2007
Issue Date: May 2007
DOI: https://doi.org/10.1007/s10579-007-9042-8
