Language Resources and Evaluation, Volume 49, Issue 1, pp 107–145

Parsing Hebrew CHILDES transcripts

  • Shai Gretz
  • Alon Itai
  • Brian MacWhinney
  • Bracha Nir
  • Shuly Wintner
Original Paper

Abstract

We present a syntactic parser of (transcripts of) spoken Hebrew: a dependency parser of the Hebrew CHILDES database. CHILDES is a corpus of child–adult linguistic interactions. Its Hebrew section has recently been morphologically analyzed and disambiguated, paving the way for syntactic annotation. This paper describes a novel annotation scheme of dependency relations reflecting constructions of child and child-directed Hebrew utterances. A subset of the corpus was annotated with dependency relations according to this scheme, and was used to train two parsers (MaltParser and MEGRASP) with which the rest of the data were parsed. The adequacy of the annotation scheme for the CHILDES data is established through numerous evaluation scenarios. The paper also discusses different annotation approaches to several linguistic phenomena, as well as the contribution of morphological features to the accuracy of parsing.

Keywords

Parsing; Dependency grammar; Child language; Syntactic annotation

Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  • Shai Gretz (1)
  • Alon Itai (1)
  • Brian MacWhinney (2)
  • Bracha Nir (3)
  • Shuly Wintner (4)

  1. Department of Computer Science, Technion, Haifa, Israel
  2. Department of Psychology, Carnegie Mellon University, Pittsburgh, USA
  3. Department of Communication Disorders, University of Haifa, Haifa, Israel
  4. Department of Computer Science, University of Haifa, Haifa, Israel
