A Hebrew verb–complement dictionary

  • Original Paper
  • Language Resources and Evaluation

Abstract

We present a verb–complement dictionary of Modern Hebrew, automatically extracted from text corpora. Carefully examining a large set of examples, we defined ten types of verb complements that cover the vast majority of the occurrences of verb complements in the corpora. We explored several collocation measures as indicators of the strength of the association between the verb and its complement. We then used these measures to automatically extract verb complements from corpora. The result is a wide-coverage, accurate dictionary that lists not only the likely complements for each verb, but also the likelihood of each complement. We evaluated the quality of the extracted dictionary both intrinsically and extrinsically. Intrinsically, we showed high precision and recall on randomly (but systematically) selected verbs. Extrinsically, we showed that using the extracted information is beneficial for two applications, prepositional phrase attachment disambiguation and Arabic-to-Hebrew machine translation.
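
As an illustration of what a collocation measure computes for a verb and a candidate complement, here is a minimal sketch. It assumes pointwise mutual information (Church and Hanks 1990) as the measure; the abstract does not say which measures were explored. The toy counts, the complement labels, and the function name `pmi_scores` are invented for this example.

```python
import math
from collections import Counter

def pmi_scores(pairs):
    """Return {(verb, complement): PMI} for a list of observed pairs.

    PMI(v, c) = log2( P(v, c) / (P(v) * P(c)) ), with all probabilities
    estimated by relative frequency over the observations.
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(v for v, _ in pairs)
    comp_counts = Counter(c for _, c in pairs)
    total = len(pairs)
    scores = {}
    for (verb, comp), n in pair_counts.items():
        p_joint = n / total
        p_verb = verb_counts[verb] / total
        p_comp = comp_counts[comp] / total
        scores[(verb, comp)] = math.log2(p_joint / (p_verb * p_comp))
    return scores

# Hypothetical observations, not taken from the paper's corpora.
observations = (
    [("give", "to-PP")] * 6 + [("give", "NP")] * 2
    + [("see", "NP")] * 7 + [("see", "to-PP")] * 1
)
scores = pmi_scores(observations)
```

In this toy data the pair ("give", "to-PP") gets a positive score and ("see", "to-PP") a negative one, which is the kind of contrast an association measure is meant to expose.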

Notes

  1. For readability, we use a straightforward, one-to-one transliteration of Hebrew in which the letters, in traditional Hebrew alphabetic order, are represented by abgdhwzxTiklmnsypcqršt.

  2. Different binyanim (verb patterns) constitute different lemmas.

  3. Under ‘Lexicons’, at http://www.mila.cs.technion.ac.il.

  4. Since we did not use a parser to determine the PPs to be attached, we are immune in this experiment to the criticism of Atterer and Schütze (2007), whereby using an “oracle” distorts the actual performance of the attachment module.

  5. This experiment was reported in Shilon et al. (2012a).
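
The transliteration scheme in note 1 can be sketched as a character mapping. This is a minimal sketch: the pairing of the 22 letters follows the note, while the treatment of final letter forms (each mapped to the same Latin letter as its non-final form) and the function name `transliterate` are assumptions not stated in the source.

```python
# The 22 Hebrew letters in traditional alphabetic order.
HEBREW = "אבגדהוזחטיכלמנסעפצקרשת"
# Their Latin counterparts as listed in note 1 (\u0161 is "š").
LATIN = "abgdhwzxTiklmnsypcqr\u0161t"

TABLE = dict(zip(HEBREW, LATIN))

# Assumption: final forms are transliterated like the non-final letters.
TABLE.update({"ך": "k", "ם": "m", "ן": "n", "ף": "p", "ץ": "c"})

def transliterate(text):
    """Map each Hebrew letter to its Latin symbol; keep other characters."""
    return "".join(TABLE.get(ch, ch) for ch in text)
```

Under these assumptions, `transliterate("מילון")` yields `milwn`.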

References

  • Albert, A., MacWhinney, B., Nir, B., & Wintner, S. (2012). A morphologically annotated Hebrew CHILDES corpus. In Proceedings of the Workshop on Computational Models of Language Acquisition and Loss (pp. 20–22), Avignon, France, April 2012. Association for Computational Linguistics. http://www.aclweb.org/anthology/W/W12/W12-0904.

  • Atterer, M., & Schütze, H. (2007). Prepositional phrase attachment without oracles. Computational Linguistics, 33(4), 469–476. ISSN 0891-2017. doi:10.1162/coli.2007.33.4.469.

  • Baldewein, U. (2004). Modeling attachment decisions with a probabilistic parser: The case of head final structures. In Proceedings of the 26th Annual Conference of the Cognitive Science Society (pp. 73–78). Erlbaum.

  • Belletti, A., & Shlonsky, U. (1995). The order of verbal complements: A comparative study. Natural Language and Linguistic Theory, 13(3), 489–526.

  • Brent, M. R. (1991). Automatic acquisition of subcategorization frames from untagged text. In Proceedings of the 29th annual meeting on Association for Computational Linguistics (pp. 209–214), Stroudsburg, PA, USA. Association for Computational Linguistics. doi:10.3115/981344.981371.

  • Brent, M. R. (1993). From grammar to lexicon: Unsupervised learning of lexical syntax. Computational Linguistics, 19(2), 243–262.

  • Briscoe, T., & Carroll, J. (1993). Generalised probabilistic LR parsing of natural language (corpora) with unification-based grammars. Computational Linguistics, 19(1), 25–59.

  • Briscoe, T., & Carroll, J. (1997). Automatic extraction of subcategorization from corpora. In Proceedings of the 5th ACL Conference on Applied Natural Language Processing (pp. 356–363).

  • Carroll, J., Minnen, G., & Briscoe, T. (1998). Can subcategorisation probabilities help a statistical parser? In Proceedings of the 6th ACL/SIGDAT Workshop on Very Large Corpora (pp. 118–126).

  • Chang, B., Danielsson, P., & Teubert, W. (2002). Extraction of translation unit from Chinese–English parallel corpora. In Proceedings of the first SIGHAN workshop on Chinese language processing (pp. 1–5), Morristown, NJ, USA. Association for Computational Linguistics. doi:10.3115/1118824.1118825.

  • Chesley, P., & Salmon-Alt, S. (2006). Automatic extraction of subcategorization frames for French. In Proceedings of the Language Resources and Evaluation Conference, LREC 2006 (pp. 253–258). European Language Resources Association (ELRA).

  • Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.

  • Church, K. W., & Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1), 22–29. ISSN 0891-2017.

  • Dahlgren, K., & McDowell, J. P. (1986). Using commonsense knowledge to disambiguate prepositional phrase modifiers. In T. Kehler (Ed.), Proceedings of the 5th National Conference on Artificial Intelligence (pp. 589–593). Morgan Kaufmann.

  • Dȩbowski, Ł. (2009). Valence extraction using EM selection and co-occurrence matrices. Language Resources and Evaluation, 43(4), 301–327.

  • Denkowski, M., & Lavie, A. (2011). Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems. In Proceedings of the Sixth Workshop on Statistical Machine Translation (pp. 85–91). Association for Computational Linguistics, July 2011. http://www.aclweb.org/anthology/W11-2107.

  • Dunning, T. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19, 61–74.

  • Garnsey, S. M., Pearlmutter, N. J., Myers, E., & Lotocky, M. A. (1997). The contributions of verb bias and plausibility to the comprehension of temporarily ambiguous sentences. Journal of Memory and Language, 37(1), 58–93.

  • Goldberg, Y. (2011). Automatic Syntactic Processing of Modern Hebrew. PhD thesis, Ben Gurion University of the Negev, Israel.

  • Goldberg, Y., & Elhadad, M. (2009). Hebrew dependency parsing: Initial results. In Proceedings of the 11th International Workshop on Parsing Technologies (IWPT-2009), 7–9 October 2009, Paris, France (pp. 129–133). The Association for Computational Linguistics.

  • Goldberg, Y., & Elhadad, M. (2010). An efficient algorithm for easy-first non-directional dependency parsing. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 742–750). Stroudsburg, PA, USA. Association for Computational Linguistics. ISBN 1-932432-65-5. http://dl.acm.org/citation.cfm?id=1857999.1858114.

  • Guthmann, N., Krymolowski, Y., Milea, A., & Winter, Y. (2009). Automatic annotation of morpho-syntactic dependencies in a Modern Hebrew treebank. In Proceedings of Trees in Linguistic Theory (TLT-2009), January 2009.

  • Hajič, J., Čmejrek, M., Dorr, B., Ding, Y., Eisner, J., Gildea, D., Koo, T., Parton, K., Penn, G., Radev, D., & Rambow, O. (2004). Natural language generation in the context of machine translation. Technical report, Center for Language and Speech Processing, Johns Hopkins University, March 2004. http://cs.jhu.edu/~jason/papers/ws02. Final report from 2002 CLSP summer workshop (p. 87).

  • Han, X., Zhao, T., Qi, H., & Yu, H. (2004). Subcategorization acquisition and evaluation for Chinese verbs. In Proceedings of the 20th international conference on Computational Linguistics (COLING ’04), Stroudsburg, PA, USA. Association for Computational Linguistics. doi:10.3115/1220355.1220459.

  • Hindle, D., & Rooth, M. (1993). Structural ambiguity and lexical relations. Computational Linguistics, 19(1), 103–120. ISSN 0891-2017.

  • Hirst, G. (1988). Semantic interpretation and ambiguity. Artificial Intelligence, 34(2), 131–177.

  • Huddleston, R., & Pullum, G. K. (2002). The Cambridge Grammar of the English Language. Cambridge: Cambridge University Press.

  • Ienco, D., Villata, S., & Bosco, C. (2008). Automatic extraction of subcategorization frames for Italian. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08). European Language Resources Association (ELRA), May 2008. ISBN 2-9517408-4-0. http://www.lrec-conf.org/proceedings/lrec2008/.

  • Itai, A., & Wintner, S. (2008). Language resources for Hebrew. Language Resources and Evaluation, 42(1), 75–98.

  • Jensen, K., & Binot, J.-L. (1987). Disambiguating prepositional phrase attachments by using on-line dictionary definitions. Computational Linguistics, 13(3–4), 251–260. ISSN 0891-2017.

  • Korhonen, A. (2000). Using semantically motivated estimates to help subcategorization acquisition. In Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (pp. 216–223), Stroudsburg, PA, USA. Association for Computational Linguistics. doi:10.3115/1117794.1117821.

  • Korhonen, A. (2002a). Semantically motivated subcategorization acquisition. In Proceedings of the ACL-02 workshop on Unsupervised lexical acquisition (pp. 51–58), Stroudsburg, PA, USA. Association for Computational Linguistics. doi:10.3115/1118627.1118634.

  • Korhonen, A. (2002b). Subcategorisation acquisition. PhD thesis, Computer Laboratory, University of Cambridge. Technical Report UCAM-CL-TR-530.

  • Korhonen, A., Gorrell, G., & McCarthy, D. (2000). Statistical filtering and subcategorization frame acquisition. In Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora (pp. 199–206), Stroudsburg, PA, USA. Association for Computational Linguistics. doi:10.3115/1117794.1117819.

  • Korhonen, A., Krymolowski, Y., & Briscoe, T. (2006). A large subcategorization lexicon for natural language processing applications. In Proceedings of the Language Resources and Evaluation Conference, LREC 2006 (pp. 1015–1020). European Language Resources Association (ELRA).

  • Korhonen, A., & Preiss, J. (2003). Improving subcategorization acquisition using word sense disambiguation. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics (pp. 48–55), Stroudsburg, PA, USA. Association for Computational Linguistics. doi:10.3115/1075096.1075103.

  • Kummerfeld, J. K., Hall, D., Curran, J. R., & Klein, D. (2012). Parser showdown at the Wall Street corral: An empirical investigation of error types in parser output. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (pp. 1048–1059), Jeju Island, South Korea, July 2012. Association for Computational Linguistics. http://www.aclweb.org/anthology/D12-1096.

  • Kummerfeld, J. K., Tse, D., Curran, J. R., & Klein, D. (2013). An empirical examination of challenges in Chinese parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 98–103), Sofia, Bulgaria, August 2013. Association for Computational Linguistics. http://www.aclweb.org/anthology/P13-2018.

  • Lapata, M., Keller, F., & Schulte im Walde, S. (2001). Verb frame frequency as a predictor of verb bias. Journal of Psycholinguistic Research, 30(4), 419–435.

  • Lavie, A. (2008). Stat-XFER: A general search-based syntax-driven framework for machine translation. In A. F. Gelbukh (Ed.), CICLing, vol. 4919 of Lecture Notes in Computer Science (pp. 362–375). Springer. ISBN 978-3-540-78134-9.

  • Levin, B. (1993). English Verb Classes and Alternations: A Preliminary Investigation. Chicago: University of Chicago Press. ISBN 9780226475332.

  • Li, J., & Brew, C. (2005). Automatic extraction of subcategorization frames from spoken corpora. In Proceedings of the Interdisciplinary Workshop on the Identification and Representation of Verb Features and Verb Classes (pp. 74–79).

  • Lin, D. (1998). Dependency-based evaluation of MINIPAR. In Proceedings of the Workshop on the Evaluation of Parsing Systems (pp. 317–330). Springer.

  • Merlo, P., & Ferrer, E. E. (2006). The notion of argument in prepositional phrase attachment. Computational Linguistics, 32(3), 341–378. ISSN 0891-2017. doi:10.1162/coli.2006.32.3.341.

  • Messiant, C., Poibeau, T., & Korhonen, A. (2008). LexSchem: a large subcategorization lexicon for French verbs. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08). European Language Resources Association (ELRA), May 2008. ISBN 2-9517408-4-0. http://www.lrec-conf.org/proceedings/lrec2008/.

  • Nir, B., MacWhinney, B., & Wintner, S. (2010). A morphologically-analyzed CHILDES corpus of Hebrew. In Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC’10) (pp. 1487–1490). European Language Resources Association (ELRA), May 2010. ISBN 2-9517408-6-7.

  • Ó Séaghdha, D. (2010). Latent variable models of selectional preference. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (pp. 435–444), Stroudsburg, PA, USA. Association for Computational Linguistics. http://dl.acm.org/citation.cfm?id=1858681.1858726.

  • Pantel, P., & Lin, D. (2000). An unsupervised approach to prepositional phrase attachment using contextually similar words. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics (pp. 101–108), Stroudsburg, PA, USA. Association for Computational Linguistics. doi:10.3115/1075218.1075232.

  • Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: A method for automatic evaluation of machine translation. In ACL ’02: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (pp. 311–318), Morristown, NJ, USA. Association for Computational Linguistics. doi:10.3115/1073083.1073135.

  • Pecina, P. (2005). An extensive empirical study of collocation extraction methods. In Proceedings of the ACL Student Research Workshop (pp. 13–18), Ann Arbor, Michigan, June 2005. Association for Computational Linguistics. http://www.aclweb.org/anthology/P/P05/P05-2003.

  • Ratnaparkhi, A., Reynar, J., & Roukos, S. (1994). A maximum entropy model for prepositional phrase attachment. In Proceedings of the workshop on Human Language Technology (pp. 250–255), Stroudsburg, PA, USA. Association for Computational Linguistics. ISBN 1-55860-357-3. doi:10.3115/1075812.1075868.

  • Resnik, P., & Hearst, M. A. (1993). Structural ambiguity and conceptual relations. In Proceedings of the Workshop on Very Large Corpora: Academic and Industrial Perspectives (pp. 58–64).

  • Ritter, A., Mausam, & Etzioni, O. (2010). A latent Dirichlet allocation method for selectional preferences. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (pp. 424–434), Stroudsburg, PA, USA. Association for Computational Linguistics. http://dl.acm.org/citation.cfm?id=1858681.1858725.

  • Ross, J. R. (1967). Constraints on variables in syntax. PhD thesis, Massachusetts Institute of Technology, Department of Modern Languages and Linguistics.

  • Sarkar, A., & Zeman, D. (2000). Automatic extraction of subcategorization frames for Czech. In Proceedings of the 18th Conference on Computational Linguistics (pp. 691–697), Stroudsburg, PA, USA. Association for Computational Linguistics. doi:10.3115/992730.992746.

  • Schulte im Walde, S., & Brew, C. (2002). Inducing German semantic verb classes from purely syntactic subcategorisation information. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 223–230), Philadelphia, PA.

  • Shilon, R., Fadida, H., & Wintner, S. (2012a). Incorporating linguistic knowledge in statistical machine translation: Translating prepositions. In Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data (pp. 106–114), Avignon, France. Association for Computational Linguistics. http://www.aclweb.org/anthology/W/W12/W12-0514.

  • Shilon, R., Habash, N., Lavie, A., & Wintner, S. (2010). Machine translation between Hebrew and Arabic: Needs, challenges and preliminary solutions. In Proceedings of AMTA 2010: The Ninth Conference of the Association for Machine Translation in the Americas.

  • Shilon, R., Habash, N., Lavie, A., & Wintner, S. (2012b). Machine translation between Hebrew and Arabic. Machine Translation, 26, 177–195. ISSN 0922-6567. http://dx.doi.org/10.1007/s10590-011-9103-z.

  • Sima’an, K., Itai, A., Winter, Y., Altman, A., & Nativ, N. (2001). Building a tree-bank of modern Hebrew text. Traitement Automatique des Langues, 42(2), 247–380.

  • Stern, N. (1994). Milon ha-Poal. Bar Ilan University. ISBN 965-226-164-5. In Hebrew.

  • Stetina, J., & Nagao, M. (1997). Corpus based PP attachment ambiguity resolution with a semantic dictionary. In J. Zhou & K. W. Church (Eds.), Proceedings of the Fifth Workshop on Very Large Corpora (pp. 66–80).

  • Sun, L., & Korhonen, A. (2009). Improving verb clustering with automatically acquired selectional preferences. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (pp. 638–647), Stroudsburg, PA, USA. Association for Computational Linguistics. ISBN 978-1-932432-62-6.

  • Sun, L., Korhonen, A., & Krymolowski, Y. (2008a). Automatic classification of English verbs using rich syntactic features. In Proceedings of the Third International Joint Conference on Natural Language Processing (pp. 769–774). http://aclweb.org/anthology-new/I/I08/I08-2107.pdf.

  • Sun, L., Korhonen, A., & Krymolowski, Y. (2008b). Verb class discovery from rich syntactic data. In Proceedings of the 9th International Conference on Computational Linguistics and Intelligent Text Processing (pp. 16–27), Berlin, Heidelberg. Springer-Verlag. ISBN 3-540-78134-X, 978-3-540-78134-9.

  • Surdeanu, M., Harabagiu, S., Williams, J., & Aarseth, P. (2003). Using predicate-argument structures for information extraction. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics (pp. 8–15), Stroudsburg, PA, USA. Association for Computational Linguistics. doi:10.3115/1075096.1075098.

  • Tsvetkov, Y., & Wintner, S. (2010). Extraction of multi-word expressions from small parallel corpora. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010) (pp. 1256–1264).

  • Tsvetkov, Y., & Wintner, S. (2012). Extraction of multi-word expressions from small parallel corpora. Natural Language Engineering, 18(4), 549–573. doi:10.1017/S1351324912000101.

  • Villavicencio, A., Kordoni, V., Zhang, Y., Idiart, M., & Ramisch, C. (2007). Validation and evaluation of automatically acquired multiword expressions for grammar engineering. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) (pp. 1034–1043). http://www.aclweb.org/anthology/D/D07/D07-1110.

  • Volk, M. (2002). Combining unsupervised and supervised methods for PP attachment disambiguation. In Proceedings of the 19th international conference on Computational linguistics (vol. 1, pp. 1–7), Stroudsburg, PA, USA. Association for Computational Linguistics. doi:10.3115/1072228.1072232.

  • Wilks, Y., Huang, X., & Fass, D. (1985). Syntax, preference, and right attachment. In Proceedings of the 9th International Joint Conference on Artificial Intelligence, vol. 2 of IJCAI’85 (pp. 779–784), San Francisco, CA, USA. Morgan Kaufmann Publishers Inc. ISBN 0-934613-02-8, 978-0-934-61302-6.

  • Yeh, A. S., & Vilain, M. B. (1998). Some properties of preposition and subordinate conjunction attachments. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics (vol. 2, pp. 1436–1442), Stroudsburg, PA, USA. Association for Computational Linguistics. doi:10.3115/980691.980803.

  • Zanette, A., Scarton, C., & Zilio, L. (2012). Automatic extraction of subcategorization frames from corpora: An approach to Portuguese. In Proceedings of PROPOR 2012: International Conference on Computational Processing of the Portuguese Language. http://www.propor2012.org/demos/DemoSubcategorization.pdf.

  • Zeman, D. (2002). Can subcategorization help a statistical dependency parser? In Proceedings of the 19th international conference on Computational linguistics (COLING-02) (pp. 1156–1162), Stroudsburg, PA, USA. Association for Computational Linguistics. doi:10.3115/1072228.1072346.

Acknowledgments

This research was supported by the Israel Science Foundation (Grants No. 1269/07, 505/11). We are grateful to Reshef Shilon for his help with the machine translation experiments, and to Yoav Goldberg for his help with the Hebrew parser. Thanks are also due to Kayla Jacobs for several useful comments. We benefitted greatly from the constructive comments of three anonymous reviewers.

Author information

Corresponding author

Correspondence to Shuly Wintner.

About this article

Cite this article

Fadida, H., Itai, A. & Wintner, S. A Hebrew verb–complement dictionary. Lang Resources & Evaluation 48, 249–278 (2014). https://doi.org/10.1007/s10579-013-9259-7

  • DOI: https://doi.org/10.1007/s10579-013-9259-7
