Modeling Relations and Their Mentions without Labeled Text

  • Sebastian Riedel
  • Limin Yao
  • Andrew McCallum
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6323)


Several recent works on relation extraction apply the distant supervision paradigm: instead of relying on annotated text to learn how to predict relations, they employ existing knowledge bases (KBs) as a source of supervision. Crucially, these approaches are trained on the assumption that each sentence that mentions the two related entities is an expression of the given relation. Here we argue that this leads to noisy patterns that hurt precision, particularly when the knowledge base is not directly related to the text we are working with. We present a novel approach to distant supervision that can alleviate this problem based on the following two ideas: first, we use a factor graph to explicitly model the decision whether two entities are related, and the decision whether this relation is mentioned in a given sentence; second, we apply constraint-driven semi-supervision to train this model without any knowledge about which sentences express the relations in our training KB. We apply our approach to extract relations from the New York Times corpus and use Freebase as the knowledge base. When compared to a state-of-the-art approach for relation extraction under distant supervision, we achieve a 31% error reduction.
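The labeling assumption criticized above can be illustrated with a minimal sketch. It is not the authors' model: the toy knowledge base, the sentences, and the function name `distant_labels` are hypothetical, and entity matching is reduced to plain substring search. The sketch only shows how labeling every co-mentioning sentence with the KB relation produces noisy training examples.

```python
from typing import List, Tuple

# Hypothetical toy knowledge base of (entity1, relation, entity2) triples.
KB: List[Tuple[str, str, str]] = [("Barack Obama", "born_in", "Honolulu")]

# Hypothetical toy corpus; only the first sentence expresses born_in.
SENTENCES: List[str] = [
    "Barack Obama was born in Honolulu.",
    "Barack Obama visited Honolulu last week.",
    "Honolulu is the capital of Hawaii.",
]


def distant_labels(
    kb: List[Tuple[str, str, str]], sentences: List[str]
) -> List[Tuple[str, str, str, str]]:
    """Label every sentence that mentions both entities of a KB triple
    with that triple's relation (the standard distant supervision heuristic)."""
    labeled = []
    for e1, rel, e2 in kb:
        for sentence in sentences:
            # Assumption: naive substring matching stands in for entity linking.
            if e1 in sentence and e2 in sentence:
                labeled.append((sentence, e1, rel, e2))
    return labeled


labeled = distant_labels(KB, SENTENCES)
for sentence, e1, rel, e2 in labeled:
    print(f"{rel}({e1}, {e2}) <- {sentence}")
```

In this toy run, the sentence about visiting Honolulu is labeled `born_in` even though it does not express that relation, which is the kind of noisy pattern the abstract refers to. The approach described above avoids committing to such per-sentence labels by modeling, as separate decisions, whether the entities are related and whether a given sentence mentions the relation.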


Keywords: Relation Type, Factor Graph, Computational Linguistics, Related Entity, Relation Extraction
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Sebastian Riedel (1)
  • Limin Yao (1)
  • Andrew McCallum (1)
  1. University of Massachusetts, Amherst, Amherst, U.S.
