A Universal Feature Schema for Rich Morphological Annotation and Fine-Grained Cross-Lingual Part-of-Speech Tagging

Part of the Communications in Computer and Information Science book series (CCIS, volume 537)

Abstract

Semantically detailed and typologically informed morphological analysis that is broadly applicable cross-linguistically has the potential to improve many NLP applications, including machine translation, n-gram language models, information extraction, and co-reference resolution. In this paper, we present a universal morphological feature schema, a set of features that represent the finest distinctions in meaning that are expressed by inflectional morphology across languages. We first present the schema’s guiding theoretical principles, construction methodology, and contents. We then present a method of measuring cross-linguistic variability in the semantic distinctions conveyed by inflectional morphology along the multiple dimensions spanned by the schema. This method relies on representing inflected wordforms from many languages in our universal feature space, and then testing for agreement across multiple aligned translations of pivot words in a parallel corpus (the Bible). The results of this method are used to assess the effectiveness of cross-linguistic projection of a multilingual consensus of these fine-grained morphological features, both within and across language families. We find high cross-linguistic agreement for a diverse range of semantic dimensions expressed by inflectional morphology.
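
The agreement test described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each wordform is represented as a mapping from a schema dimension to a feature value, that the consensus of the aligned translations is a simple per-dimension majority vote, and that agreement is the fraction of pivot words whose value matches that consensus. The function names, the feature labels (`SG`, `NOM`, and so on), and the toy data are all hypothetical placeholders.

```python
from collections import Counter


def consensus(translations):
    """Per-dimension majority value across the translations' feature dicts.

    Assumption: a plain majority vote; ties go to the value seen first.
    """
    votes = {}
    for feats in translations:
        for dim, val in feats.items():
            votes.setdefault(dim, Counter())[val] += 1
    return {dim: counts.most_common(1)[0][0] for dim, counts in votes.items()}


def agreement_by_dimension(aligned):
    """Fraction of pivot words that match the translation consensus.

    aligned: list of (pivot_feats, [translation_feats, ...]) pairs.
    A dimension is only scored for a pivot when at least one aligned
    translation also marks that dimension.
    """
    hits, totals = Counter(), Counter()
    for pivot, translations in aligned:
        cons = consensus(translations)
        for dim, val in pivot.items():
            if dim in cons:
                totals[dim] += 1
                hits[dim] += int(cons[dim] == val)
    return {dim: hits[dim] / totals[dim] for dim in totals}


# Hypothetical toy data: two pivot words, each with three aligned translations.
toy_aligned = [
    (
        {"Number": "SG", "Case": "NOM"},
        [
            {"Number": "SG", "Case": "NOM"},
            {"Number": "SG", "Case": "NOM"},
            {"Number": "PL", "Case": "NOM"},
        ],
    ),
    (
        {"Number": "PL", "Case": "ACC"},
        [
            {"Number": "PL", "Case": "DAT"},
            {"Number": "PL", "Case": "DAT"},
            {"Number": "PL"},
        ],
    ),
]

scores = agreement_by_dimension(toy_aligned)
```

On this toy input, `scores` gives an agreement of 1.0 for `Number` and 0.5 for `Case`. Scoring each dimension separately mirrors the abstract's point that variability is measured along the multiple dimensions spanned by the schema, not as a single aggregate.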

Keywords

  • Inflectional morphology
  • Linguistic typology
  • Universal schema
  • Cross-linguistic projection

The first two authors contributed equally to this paper.

Notes

  1. The local case morphemes can be organized within each category through the use of abstract features that are more general than the feature labels employed in the schema.

  2. http://paralleltext.info/data/all/.

  3. http://www.wiktionary.org.

  4. Some disagreement in the data will be due to errors in our Wiktionary data, or the automated Bible alignment. We do not discuss these sources of noise in this paper, but they should affect all measurements in a uniform way, and thus do not preclude the comparisons we make.

  5. When comparing Albanian and Latin pivots to the consensus of their translations, no Albanian and Latin translations were used. Using only cross-language consensus prevents unfair advantage from self-similarity.

Author information

Corresponding author

Correspondence to John Sylak-Glassman.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Sylak-Glassman, J., Kirov, C., Post, M., Que, R., Yarowsky, D. (2015). A Universal Feature Schema for Rich Morphological Annotation and Fine-Grained Cross-Lingual Part-of-Speech Tagging. In: Mahlow, C., Piotrowski, M. (eds) Systems and Frameworks for Computational Morphology. SFCM 2015. Communications in Computer and Information Science, vol 537. Springer, Cham. https://doi.org/10.1007/978-3-319-23980-4_5

  • DOI: https://doi.org/10.1007/978-3-319-23980-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23978-1

  • Online ISBN: 978-3-319-23980-4

  • eBook Packages: Computer Science (R0)