Unsupervised Discovery of Compound Entities for Relationship Extraction

  • Conference paper
Knowledge Engineering: Practice and Patterns (EKAW 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5268)

Abstract

In this paper we investigate unsupervised population of a biomedical ontology via information extraction from biomedical literature. Relationships in text seldom connect simple entities. We therefore focus on identifying compound entities rather than mentions of simple entities. We present a method based on rules over grammatical dependency structures for unsupervised segmentation of sentences into compound entities and relationships. We complement the rule-based approach with a statistical component that prunes structures with low information content, thereby reducing false positives in the prediction of compound entities, their constituents and relationships. The extraction is manually evaluated with respect to the UMLS Semantic Network by analyzing the conformance of the extracted triples with the corresponding UMLS relationship type definitions.
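The two-stage idea described in the abstract (rules over dependency structures, followed by statistical pruning) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy sentence, the Stanford-style relation labels, the rule set in `MODIFIER_RELATIONS`, the corpus counts, and the use of negative log relative frequency as a stand-in for "information content".

```python
import math

# Toy dependency parse of "certain fatty acids inhibit platelet aggregation"
# as (head, relation, dependent) edges. Hypothetical example data.
EDGES = [
    ("inhibit", "nsubj", "acids"),
    ("acids", "amod", "certain"),
    ("acids", "amod", "fatty"),
    ("inhibit", "dobj", "aggregation"),
    ("aggregation", "nn", "platelet"),
]
WORD_ORDER = ["certain", "fatty", "acids", "inhibit", "platelet", "aggregation"]

# Assumed rule: these relations attach a modifier to an entity head.
MODIFIER_RELATIONS = {"amod", "nn"}

# Hypothetical corpus counts used to score how informative a word is.
COUNTS = {"certain": 5000, "fatty": 40, "acids": 200,
          "platelet": 30, "aggregation": 60}
TOTAL = 100_000


def information_content(word):
    """Negative log2 relative frequency; unseen words count as 1."""
    return -math.log2(COUNTS.get(word, 1) / TOTAL)


def compound_entity(head, edges, min_ic):
    """Collect the head plus its modifiers whose score reaches min_ic."""
    words = {head}
    stack = [head]
    while stack:
        node = stack.pop()
        for h, rel, dep in edges:
            if (h == node and rel in MODIFIER_RELATIONS
                    and information_content(dep) >= min_ic):
                words.add(dep)
                stack.append(dep)
    # Restore surface order so the entity reads as it does in the sentence.
    return " ".join(w for w in WORD_ORDER if w in words)


def extract_triples(edges, min_ic=0.0):
    """Segment a parse into (subject entity, relationship, object entity)."""
    subjects = {h: d for h, rel, d in edges if rel == "nsubj"}
    objects = {h: d for h, rel, d in edges if rel == "dobj"}
    return [
        (compound_entity(subjects[v], edges, min_ic), v,
         compound_entity(objects[v], edges, min_ic))
        for v in subjects if v in objects
    ]


print(extract_triples(EDGES))              # rules only
print(extract_triples(EDGES, min_ic=6.0))  # rules plus pruning
```

With rules alone the subject is the full phrase "certain fatty acids"; with the threshold set to 6.0 the frequent modifier "certain" falls below the cutoff and the subject shrinks to "fatty acids", while "platelet aggregation" is unchanged.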

Editor information

Aldo Gangemi, Jérôme Euzenat

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ramakrishnan, C., Mendes, P.N., Wang, S., Sheth, A.P. (2008). Unsupervised Discovery of Compound Entities for Relationship Extraction. In: Gangemi, A., Euzenat, J. (eds) Knowledge Engineering: Practice and Patterns. EKAW 2008. Lecture Notes in Computer Science, vol 5268. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87696-0_15

  • DOI: https://doi.org/10.1007/978-3-540-87696-0_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87695-3

  • Online ISBN: 978-3-540-87696-0

  • eBook Packages: Computer Science, Computer Science (R0)
