Automatic Extraction of Typological Linguistic Features from Descriptive Grammars

  • Shafqat Mumtaz Virk
  • Lars Borin
  • Anju Saxena
  • Harald Hammarström
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10415)


This paper describes experiments on automatically extracting typological linguistic features of natural languages from traditional written descriptive grammars. The feature-extraction task has high potential value in typological, genealogical, historical, and other related areas of linguistics that make use of databases of structural features of languages. Until now, such features have been extracted from grammars manually, which is time-consuming and labor-intensive and becomes prohibitive when extended to the thousands of languages for which linguistic descriptions are available. The system we describe here starts from semantically parsed text, over which a set of rules is applied in order to extract feature values. We evaluate the system’s performance using the manually curated Grambank database as the gold standard and report the first measures of precision and recall for this problem.
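The pipeline outlined in the abstract (rules applied over semantically parsed text, then scoring against a gold standard) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the `Frame` shape, the role labels `A0`/`A1`, the toy rule, the feature names `ADJ_NOUN_ORDER` and `GENDER`, and the example sentences are hypothetical and are not taken from the paper or from Grambank. Precision is computed here as correct extractions over all extractions, and recall as correct extractions over all gold-standard entries; the paper may define them differently.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One predicate-argument structure from a semantic parser (hypothetical shape)."""
    predicate: str
    args: dict = field(default_factory=dict)  # role label -> argument text


def rule_adjective_order(frame):
    """Toy rule: match 'the adjective precedes/follows the noun' statements.

    Returns a (feature, value) pair, or None if the rule does not fire.
    """
    if frame.predicate not in ("precede", "follow"):
        return None
    a0 = frame.args.get("A0", "").lower()
    a1 = frame.args.get("A1", "").lower()
    if "adjective" in a0 and "noun" in a1:
        value = "adj-noun" if frame.predicate == "precede" else "noun-adj"
        return ("ADJ_NOUN_ORDER", value)
    return None


def extract(frames, rules):
    """Apply every rule to every frame; keep the first value found per feature."""
    found = {}
    for frame in frames:
        for rule in rules:
            hit = rule(frame)
            if hit is not None:
                feature, value = hit
                found.setdefault(feature, value)
    return found


def precision_recall(predicted, gold):
    """Score extracted feature values against a gold-standard mapping."""
    correct = sum(1 for f, v in predicted.items() if gold.get(f) == v)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical parsed sentences from a grammar and a hypothetical gold standard.
frames = [
    Frame("precede", {"A0": "The adjective", "A1": "the noun it modifies"}),
    Frame("have", {"A0": "The language", "A1": "two genders"}),
]
gold = {"ADJ_NOUN_ORDER": "adj-noun", "GENDER": "yes"}

predicted = extract(frames, [rule_adjective_order])
precision, recall = precision_recall(predicted, gold)
print(predicted, precision, recall)
```

In this toy run the single rule fires once and agrees with the gold value, so precision is 1.0, while recall is 0.5 because no rule covers the second gold feature. This mirrors the usual trade-off for hand-written extraction rules: they tend to be accurate where they fire but leave features uncovered.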


Information extraction · Semantic parsing · Language typology · Typological database



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Shafqat Mumtaz Virk (1)
  • Lars Borin (1)
  • Anju Saxena (2)
  • Harald Hammarström (2)
  1. Språkbanken, Department of Swedish, University of Gothenburg, Gothenburg, Sweden
  2. Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden
