Machine Learning, Volume 60, Issue 1–3, pp 11–39

Support Vector Learning for Semantic Argument Classification

  • Sameer Pradhan
  • Kadri Hacioglu
  • Valerie Krugler
  • Wayne Ward
  • James H. Martin
  • Daniel Jurafsky

Abstract

The natural language processing community has recently experienced a growth of interest in domain-independent shallow semantic parsing, the process of assigning a Who did What to Whom, When, Where, Why, How etc. structure to plain text. This process entails identifying groups of words in a sentence that represent these semantic arguments and assigning specific labels to them. It could play a key role in NLP tasks like Information Extraction, Question Answering and Summarization. We propose a machine learning algorithm for semantic role parsing, extending the work of Gildea and Jurafsky (2002), Surdeanu et al. (2003) and others. Our algorithm is based on Support Vector Machines, which we show give a large improvement in performance over earlier classifiers. We show performance improvements through a number of new features designed to improve generalization to unseen data, such as automatic clustering of verbs. We also report on various analytic studies examining which features are most important, comparing our classifier to other machine learning algorithms in the literature, and testing its generalization to a new test set from a different genre. On the task of assigning semantic labels to the PropBank (Kingsbury, Palmer, & Marcus, 2002) corpus, our final system has a precision of 84% and a recall of 75%, which are the best results currently reported for this task. Finally, we explore a completely different architecture which does not require a deep syntactic parse. We reformulate the task as a combined chunking and classification problem, thus allowing our algorithm to be applied to new languages or genres of text for which statistical syntactic parsers may not be available.
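
The chunking reformulation mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not say which tagging scheme is used, so the code below assumes an IOB2-style encoding (one tag per word: B- for the first word of an argument, I- for the following words, O for words outside any argument). The function name `spans_to_iob2`, the span format, and the example sentence with its labels are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end_exclusive, label) over token indices.
Span = Tuple[int, int, str]


def spans_to_iob2(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping argument spans as one IOB2 tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if not (0 <= start < end <= n_tokens):
            raise ValueError(f"span out of range: {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, label)}")
        tags[start] = f"B-{label}"          # first word of the argument
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining words of the argument
    return tags


# Illustrative sentence with PropBank-style labels for the predicate "approved".
tokens = ["The", "committee", "approved", "the", "budget", "yesterday"]
spans = [(0, 2, "ARG0"), (3, 5, "ARG1"), (5, 6, "ARGM-TMP")]
tags = spans_to_iob2(len(tokens), spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Under this assumed encoding, a per-word classifier that predicts one tag for each token performs argument identification and labeling in a single pass, which is why no full syntactic parse is needed to define the candidate arguments.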

Keywords

shallow semantic parsing, support vector machines

References

  1. Allwein, E. L., Schapire, R. E., & Singer, Y. (2000). Reducing multiclass to binary: A unifying approach for margin classifiers. In Proceedings of the 17th International Conference on Machine Learning (pp. 9–16). San Francisco, CA: Morgan Kaufmann.
  2. Baker, C. F., Fillmore, C. J., & Lowe, J. B. (1998). The Berkeley Framenet Project. In Proceedings of the International Conference on Computational Linguistics (COLING/ACL-98) (pp. 86–90). Montreal.
  3. Bikel, D. M., Schwartz, R., & Weischedel, R. M. (1999). An algorithm that learns what’s in a name. Machine Learning, 34, 211–231.
  4. Blaheta, D., & Charniak, E. (2000). Assigning function tags to parsed text. In Proceedings of the 1st Annual Meeting of the North American Chapter of the ACL (NAACL) (pp. 234–240). Seattle, Washington.
  5. Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:2, 121–167.
  6. Charniak, E. (2001). Immediate-head parsing for language models. In Proceedings of the 39th Annual Conference of the Association for Computational Linguistics (ACL-01). Toulouse, France.
  7. Chen, J., & Rambow, O. (2003). Use of deep linguistics features for the recognition and labeling of semantic arguments. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Sapporo, Japan.
  8. Collins, M. J. (1999). Head-driven statistical models for natural language parsing. Ph.D. thesis, University of Pennsylvania, Philadelphia.
  9. Daniel, K., Schabes, Y., Zaidel, M., & Egedi, D. (1992). A freely available wide coverage morphological analyzer for English. In Proceedings of the 14th International Conference on Computational Linguistics (COLING-92). Nantes, France.
  10. Fleischman, M., & Hovy, E. (2003). A maximum entropy approach to framenet tagging. In Proceedings of the Human Language Technology Conference. Edmonton, Canada.
  11. Gildea, D., & Hockenmaier, J. (2003). Identifying semantic roles using combinatory categorial grammar. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Sapporo, Japan.
  12. Gildea, D., & Jurafsky, D. (2000). Automatic labeling of semantic roles. In Proceedings of the 38th Annual Conference of the Association for Computational Linguistics (ACL-00) (pp. 512–520). Hong Kong.
  13. Gildea, D., & Jurafsky, D. (2002). Automatic labeling of semantic roles. Computational Linguistics, 28:3, 245–288.
  14. Gildea, D., & Palmer, M. (2002). The necessity of syntactic parsing for predicate argument recognition. In Proceedings of the 40th Annual Conference of the Association for Computational Linguistics (ACL-02). Philadelphia, PA.
  15. Hacioglu, K., Pradhan, S., Ward, W., Martin, J., & Jurafsky, D. (2003). Shallow semantic parsing using support vector machines. Technical Report TR-CSLR-2003-1, Center for Spoken Language Research, Boulder, Colorado.
  16. Hacioglu, K., & Ward, W. (2003). Target word detection and semantic role chunking using support vector machines. In Proceedings of the Human Language Technology Conference. Edmonton, Canada.
  17. Hearst, M. (1999). Untangling text data mining. In Proceedings of the 37th Annual Meeting of the ACL (pp. 3–10). College Park, Maryland.
  18. Hofmann, T., & Puzicha, J. (1998). Statistical models for co-occurrence data. Memo, Massachusetts Institute of Technology Artificial Intelligence Laboratory.
  19. Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the European Conference on Machine Learning (ECML).
  20. Kingsbury, P., Palmer, M., & Marcus, M. (2002). Adding semantic annotation to the Penn Treebank. In Proceedings of the Human Language Technology Conference. San Diego, CA.
  21. Kressel, U. H. G. (1999). Pairwise classification and support vector machines. In B. Scholkopf, C. Burges, & A. J. Smola (Eds.), Advances in kernel methods. The MIT Press.
  22. Kudo, T., & Matsumoto, Y. (2000). Use of support vector learning for chunk identification. In Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000 (pp. 142–144).
  23. Kudo, T., & Matsumoto, Y. (2001). Chunking with support vector machines. In Proceedings of the 2nd Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2001).
  24. LDC: (2002). The AQUAINT Corpus of English News Text, Catalog no. LDC2002T31.
  25. Lin, D. (1998). Automatic retrieval and clustering of similar words. In Proceedings of the International Conference on Computational Linguistics (COLING/ACL-98). Montreal, Canada.
  26. Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., & Watkins, C. (2002). Text classification using string kernels. Journal of Machine Learning Research, 2:Feb, 419–444.
  27. Magerman, D. (1994). Natural language parsing as statistical pattern recognition. Ph.D. thesis, Stanford University, CA.
  28. Marcus, M., Kim, G., Marcinkiewicz, M. A., MacIntyre, R., Bies, A., Ferguson, M., Katz, K., & Schasberger, B. (1994). The Penn treebank: Annotating predicate argument structure.
  29. Platt, J. (2000). Probabilities for support vector machines. In A. Smola, P. Bartlett, B. Scholkopf, & D. Schuurmans (Eds.), Advances in large margin classifiers. Cambridge, MA: MIT Press.
  30. Pradhan, S., Hacioglu, K., Ward, W., Martin, J., & Jurafsky, D. (2003). Semantic role parsing: Adding semantic structure to unstructured text. In Proceedings of the International Conference on Data Mining (ICDM 2003). Melbourne, Florida.
  31. Pradhan, S., Ward, W., Hacioglu, K., Martin, J., & Jurafsky, D. (2004). Shallow semantic parsing using support vector machines. In Proceedings of the Human Language Technology Conference/North American Chapter of the Association of Computational Linguistics (HLT/NAACL). Boston, MA.
  32. Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1:1, 81–106.
  33. Quinlan, R. (2003). Data Mining Tools See5 and C5.0. http://www.rulequest.com.
  34. Ramshaw, L. A., & Marcus, M. P. (1995). Text chunking using transformation-based learning. In Proceedings of the Third Annual Workshop on Very Large Corpora (pp. 82–94).
  35. Sang, E. F. T. K., & Veenstra, J. (1999). Representing text chunks. In Proceedings of the EACL (pp. 173–179).
  36. Surdeanu, M., Harabagiu, S., Williams, J., & Aarseth, P. (2003). Using predicate-argument structures for information extraction. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics. Sapporo, Japan.
  37. Thompson, C. A., Levy, R., & Manning, C. D. (2003). A generative model for semantic role labeling. In Proceedings of the European Conference on Machine Learning (ECML).
  38. Vapnik, V. (1998). Statistical learning theory. New York: John Wiley and Sons Inc.
  39. Wallis, S., & Nelson, G. (2001). Knowledge discovery in grammatically analysed corpora. Data Mining and Knowledge Discovery, 5:4, 305–335.

Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  • Sameer Pradhan (1)
  • Kadri Hacioglu (1)
  • Valerie Krugler (1, 2)
  • Wayne Ward (1)
  • James H. Martin (1)
  • Daniel Jurafsky (1, 3)
  1. The Center for Spoken Language Research, University of Colorado, Boulder
  2. Stanford University, Stanford
  3. Stanford University, Stanford