How to pick out token instances of English verb-particle constructions

Language Resources and Evaluation

Abstract

We propose a method for automatically identifying individual instances of English verb-particle constructions (VPCs) in raw text. Our method employs the RASP parser and analysis of the sentential context of each VPC candidate to differentiate VPCs from simple combinations of a verb and prepositional phrase. We show that our proposed method has an F-score of 0.974 at VPC identification over the Brown Corpus and Wall Street Journal.

Notes

  1. VPCs are found in a number of languages, including English, German and Dutch, but in this paper, we target English VPCs exclusively; VPCs are also commonly termed “phrasal verbs” in the literature.

  2. Prepositional verbs are obligatorily transitive, so there is no ambiguity with intransitive VPCs.

  3. Focusing exclusively on the subject and object argument positions.

  4. All sense definitions are derived from WordNet 2.1, based on the first sense of each word; note that all examples are based on corpus examples, but simplified for expository purposes.

  5. The reason we chose to hand-check the instances rather than simply using the gold-standard POS tags in the original Brown Corpus and Wall Street Journal (which distinguish between particles and transitive prepositions) was that the POS tags were found to be highly unreliable.

  6. The choice of 3 levels was made empirically.

  7. Note that no instances with compositionality 0–2 were observed in our data, so we were unable to track this trend down to the level of full non-compositionality.

References

  • Baldwin, T. (2005a). The deep lexical acquisition of English verb-particles. Computer Speech and Language, Special Issue on Multiword Expressions, 19(4), 398–414.

  • Baldwin, T. (2005b). Looking for prepositional verbs in corpus data. In Proceedings of the 2nd ACL-SIGSEM workshop on the linguistic dimensions of prepositions and their use in computational linguistics formalisms and applications (pp. 115–126). Colchester, UK.

  • Baldwin, T., Bannard, C., Tanaka, T., & Widdows, D. (2003). An empirical model of multiword expression decomposability. In Proceedings of the ACL-2003 workshop on multiword expressions: Analysis, acquisition and treatment (pp. 89–96). Sapporo, Japan.

  • Baldwin, T., Beavers, J., Van Der Beek, L., Bond, F., Flickinger, D., & Sag, I. A. (2006). In search of a systematic treatment of determinerless PPs. In P. Saint-Dizier (Ed.), Syntax and semantics of prepositions. Dordrecht: Springer.

  • Baldwin, T., & Kim, S. N. (2009). Multiword expressions. In N. Indurkhya & F. J. Damerau (Eds.), Handbook of natural language processing (2nd ed.). Boca Raton, USA: CRC Press.

  • Baldwin, T., & Villavicencio, A. (2002). Extracting the unextractable: A case study on verb-particles. In Proceedings of the 6th conference on natural language learning (CoNLL-2002) (pp. 98–104). Taipei, Taiwan.

  • Bannard, C. (2003). Statistical techniques for automatically inferring the semantics of verb-particle constructions. Master’s Thesis, University of Edinburgh.

  • Bannard, C., Baldwin, T., & Lascarides, A. (2003). A statistical approach to the semantics of verb-particles. In Proceedings of the ACL-2003 workshop on multiword expressions: Analysis, acquisition and treatment (pp. 65–72). Sapporo, Japan.

  • Bolinger, D. (1976). The phrasal verb in English. Boston, USA: Harvard University Press.

  • Briscoe, T., & Carroll, J. (2002). Accurate statistical annotation of general text. In Proceedings of the 3rd international conference on language resources and evaluation (LREC-2002) (pp. 1499–1504). Las Palmas, Canary Islands.

  • Calzolari, N., Fillmore, C., Grishman, R., Ide, N., Lenci, A., MacLeod, C., & Zampolli, A. (2002). Towards best practice for multiword expressions in computational lexicons. In Proceedings of the 3rd international conference on language resources and evaluation (LREC 2002) (pp. 1934–1940). Las Palmas, Canary Islands.

  • Charniak, E. (2000). A maximum-entropy-inspired parser. In Proceedings of the 1st annual meeting of the North American chapter of association for computational linguistics (pp. 132–139). Seattle, USA.

  • Cook, P., & Stevenson, S. (2006). Classifying particle semantics in English verb-particle constructions. In Proceedings of the ACL-2006 workshop on multiword expressions: Identifying and exploiting underlying properties (pp. 45–53). Sydney, Australia.

  • Daelemans, W., Zavrel, J., van der Sloot, K., & van den Bosch, A. (2004). TiMBL: Tilburg memory based learner, version 5.1, reference guide.

  • Dehe, N. (2002). Particle verbs in English: Syntax, information structure and intonation. Amsterdam, Netherlands/Philadelphia, USA: John Benjamins Publishing.

  • Dehe, N., Jackendoff, R., McIntyre, A., & Urban, S. (Eds.). (2001). Verb-particle explorations. Berlin, Germany/New York, USA: Mouton de Gruyter.

  • Fellbaum, C. (Ed.). (1998). WordNet, an electronic lexical database. Cambridge, MA: MIT Press.

  • Fraser, B. (1976). The verb-particle combination in English. The Hague: Mouton.

  • Grefenstette, G., & Teufel, S. (1995). A corpus-based method for automatic identification of support verbs for nominalizations. In Proceedings of the 7th conference of the European chapter of the association for computational linguistics (EACL-1995) (pp. 98–103). Dublin, Ireland.

  • Huddleston, R., & Pullum, G. K. (2002). The Cambridge grammar of the English language. Cambridge, UK: Cambridge University Press.

  • Jackendoff, R. (1973). The base rules for prepositional phrases. In S. Anderson & P. Kiparsky (Eds.), A festschrift for Morris Halle (pp. 345–356). New York, USA: Rinehart and Winston.

  • Jackendoff, R. (2002). Foundations of language. Oxford, UK: Oxford University Press.

  • Katz, G., & Giesbrecht, E. (2006). Automatic identification of non-compositional multi-word expressions using latent semantic analysis. In Proceedings of the ACL-2006 workshop on multiword expressions: Identifying and exploiting underlying properties (pp. 28–35). Sydney, Australia.

  • Kim, S. N., & Baldwin, T. (2007). Detecting compositionality of English verb-particle constructions using semantic similarity. In Proceedings of conference of the Pacific association for computational linguistics (pp. 40–48). Melbourne, Australia.

  • Landes, S., Leacock, C., & Tengi, R. I. (1998). Building semantic concordances. In C. Fellbaum (Ed.), WordNet: An electronic lexical database. Cambridge, USA: MIT Press.

  • Li, W., Zhang, X., Niu, C., Jiang, Y., & Srihari, R. K. (2003). An expert lexicon approach to identifying English phrasal verbs. In Proceedings of the ACL-2003 workshop on multiword expressions: Analysis, acquisition and treatment (pp. 513–520). Sapporo, Japan.

  • Lindner, S. (1983). A lexico-semantic analysis of English verb particle constructions with OUT and UP. Ph.D. Thesis, University of Indiana at Bloomington.

  • Lin, D. (1993). Principle-based parsing without overgeneration. In Proceedings of the 31st annual meeting of the association for computational linguistics (ACL-1993) (pp. 112–120). Columbus, OH.

  • Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.

  • McCarthy, D., Keller, B., & Carroll, J. (2003). Detecting a continuum of compositionality in phrasal verbs. In Proceedings of the ACL-2003 workshop on multiword expressions: Analysis, acquisition and treatment (pp. 73–80). Sapporo, Japan.

  • McCarthy, D., Koeling, R., Weeds, J., & Carroll, J. (2004). Finding predominant senses in untagged text. In Proceedings of the 42nd annual meeting of the association for computational linguistics (pp. 280–287). Barcelona, Spain.

  • Ngai, G., & Florian, R. (2001). Transformation-based learning in the fast lane. In Proceedings of the 2nd annual meeting of the North American chapter of association for computational linguistics (NAACL) (pp. 40–47). Pittsburgh, USA.

  • O’Dowd, E. M. (1998). Prepositions and particles in English. Oxford: Oxford University Press.

  • O’Hara, T., & Wiebe, J. (2003). Preposition semantic classification via Treebank and FrameNet. In Proceedings of the 7th conference on natural language learning (pp. 79–86). Edmonton, Canada.

  • Sag, I. A., Baldwin, T., Bond, F., Copestake, A., & Flickinger, D. (2002). Multiword expressions: A pain in the neck for NLP. In Proceedings of the 3rd international conference on intelligent text processing and computational linguistics (CICLing-2002) (pp. 1–15). Mexico City, Mexico.

  • Stevenson, S., Fazly, A., & North, R. (2004). Statistical measures of the semi-productivity of light verb constructions. In Proceedings of the 2nd ACL workshop on multiword expressions: Integrating processing (pp. 1–8). Barcelona, Spain.

  • van der Beek, L. (2005). The extraction of determinerless PPs. In Proceedings of the second ACL-SIGSEM workshop on the linguistic dimensions of prepositions and their use in computational linguistics formalisms and applications (pp. 190–199). Colchester, UK.

  • Villavicencio, A. (2003a). Verb-particle constructions and lexical resources. In Proceedings of the ACL-2003 workshop on multiword expressions: Analysis, acquisition and treatment (pp. 57–64). Sapporo, Japan.

  • Villavicencio, A. (2003b). Verb-particle constructions in the World Wide Web. In Proceedings of the ACL-SIGSEM workshop on the linguistic dimensions of prepositions and their use in computational linguistics formalisms and applications. Toulouse, France.

  • Widdows, D., & Dorow, B. (2005). Automatic extraction of idioms using graph analysis and asymmetric lexicosyntactic patterns. In Proceedings of the ACL-2005 workshop on deep lexical acquisition (pp. 48–56). Ann Arbor, MI, USA.

Acknowledgement

This research was carried out in part with support from Australian Research Council Grant No. DP0663879.

Author information

Corresponding author

Correspondence to Su Nam Kim.

About this article

Cite this article

Kim, S.N., Baldwin, T. How to pick out token instances of English verb-particle constructions. Lang Resources & Evaluation 44, 97–113 (2010). https://doi.org/10.1007/s10579-009-9099-7

  • DOI: https://doi.org/10.1007/s10579-009-9099-7
