Starting from Scratch in Semantic Role Labeling: Early Indirect Supervision

  • Michael Connor
  • Cynthia Fisher
  • Dan Roth
Part of the Theory and Applications of Natural Language Processing book series (NLP)


A fundamental step in sentence comprehension involves assigning semantic roles to sentence constituents. To accomplish this, the listener must parse the sentence, find constituents that are candidate arguments, and assign semantic roles to those constituents. Where do children learning their first languages begin in solving this problem? To experiment with different representations that children may use to begin understanding language, we have built a computational model for this early point in language acquisition. This system, Latent BabySRL, learns from transcriptions of natural child-directed speech and makes use of psycholinguistically plausible background knowledge and realistically noisy semantic feedback to improve both an intermediate syntactic representation and its final semantic role classification. Using this system we show that it is possible for a simple learner in a plausible (noisy) setup to begin comprehending the meanings of simple sentences, when initialized with a small amount of concrete noun knowledge and some simple syntax-semantics mapping biases, before acquiring any specific verb knowledge.


Hidden Markov Model, Content Word, Function Word, Semantic Role, Input Sentence



We wish to thank Yael Gertner for insightful discussion that led up to this work as well as the various annotators who helped create the semantically tagged data. This research is supported by NSF grant BCS-0620257 and NIH grant R01-HD054448.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Department of Computer Science, University of Illinois, Urbana, USA
  2. Department of Psychology, University of Illinois, Champaign, USA
