A Universal Feature Schema for Rich Morphological Annotation and Fine-Grained Cross-Lingual Part-of-Speech Tagging

  • John Sylak-Glassman
  • Christo Kirov
  • Matt Post
  • Roger Que
  • David Yarowsky
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 537)


Semantically detailed and typologically informed morphological analysis that is broadly applicable cross-linguistically has the potential to improve many NLP applications, including machine translation, n-gram language models, information extraction, and coreference resolution. In this paper, we present a universal morphological feature schema: a set of features that represent the finest distinctions in meaning expressed by inflectional morphology across languages. We first present the schema’s guiding theoretical principles, construction methodology, and contents. We then present a method of measuring cross-linguistic variability in the semantic distinctions conveyed by inflectional morphology along the multiple dimensions spanned by the schema. This method represents inflected wordforms from many languages in our universal feature space and then tests for agreement across multiple aligned translations of pivot words in a parallel corpus (the Bible). The results are used to assess the effectiveness of cross-linguistic projection of a multilingual consensus of these fine-grained morphological features, both within and across language families. We find high cross-linguistic agreement for a diverse range of semantic dimensions expressed by inflectional morphology.
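The agreement test summarized above can be illustrated with a small sketch. This is not the authors' implementation; it rests on our own assumptions: each aligned translation of a pivot word is taken to be already mapped to a set of universal feature labels, the labels and their grouping into dimensions (the hypothetical `DIMENSIONS` table) are illustrative, the languages and data are invented, and the "consensus" is a simple leave-one-out majority vote. For one dimension (e.g. Number), each language's value is predicted from the other languages' translations of the same pivot word and compared with its own value.

```python
from collections import Counter
from typing import Dict, List, Optional, Set

# Hypothetical grouping of illustrative feature labels into dimensions.
DIMENSIONS: Dict[str, Set[str]] = {
    "Number": {"SG", "DU", "PL"},
    "Case": {"NOM", "ACC", "GEN", "DAT"},
}

# One pivot word = {language: feature set of its aligned translation}.
Pivot = Dict[str, Set[str]]


def value_on(dim: str, feats: Set[str]) -> Optional[str]:
    """Return the single value a wordform carries on `dim`, or None if absent/ambiguous."""
    vals = DIMENSIONS[dim] & feats
    return next(iter(vals)) if len(vals) == 1 else None


def consensus(dim: str, pivot: Pivot, exclude: Optional[str] = None) -> Optional[str]:
    """Majority value on `dim` across languages, optionally leaving one language out."""
    votes: Counter = Counter()
    for lang, feats in pivot.items():
        if lang == exclude:
            continue
        v = value_on(dim, feats)
        if v is not None:
            votes[v] += 1
    # Ties go to the value counted first (Counter.most_common ordering).
    return votes.most_common(1)[0][0] if votes else None


def leave_one_out_accuracy(dim: str, pivots: List[Pivot]) -> float:
    """Fraction of (pivot, language) pairs whose value matches the others' consensus."""
    correct = total = 0
    for pivot in pivots:
        for lang, feats in pivot.items():
            gold = value_on(dim, feats)
            pred = consensus(dim, pivot, exclude=lang)
            if gold is None or pred is None:
                continue  # nothing to compare on this dimension
            total += 1
            correct += int(pred == gold)
    return correct / total if total else 0.0


# Invented toy data: two pivot words translated into four placeholder languages.
PIVOTS: List[Pivot] = [
    {"lang_a": {"N", "PL", "NOM"}, "lang_b": {"N", "PL", "NOM"},
     "lang_c": {"N", "PL", "NOM"}, "lang_d": {"N", "SG", "NOM"}},
    {"lang_a": {"N", "SG", "ACC"}, "lang_b": {"N", "SG", "ACC"},
     "lang_c": {"N", "SG"}, "lang_d": {"N", "SG", "ACC"}},
]

if __name__ == "__main__":
    for dim in DIMENSIONS:
        print(dim, leave_one_out_accuracy(dim, PIVOTS))
```

On this toy data, Number agreement is 7/8 (the outlier `lang_d` disagrees on the first pivot) and Case agreement is 7/7, since `lang_c` carries no case value on the second pivot and is skipped. A majority vote is only one possible consensus rule; weighting votes by language family, for example, would be a natural variant for within- versus across-family comparisons.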


Keywords: Inflectional morphology; Linguistic typology; Universal schema; Cross-linguistic projection


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • John Sylak-Glassman (1)
  • Christo Kirov (1)
  • Matt Post (2, 3)
  • Roger Que (3)
  • David Yarowsky (3)
  1. Center for Language and Speech Processing, Johns Hopkins University, Baltimore, USA
  2. Human Language Technology Center of Excellence, Johns Hopkins University, Baltimore, USA
  3. Department of Computer Science, Johns Hopkins University, Baltimore, USA
