Unsupervised Discovery of Compound Entities for Relationship Extraction

Ramakrishnan, Cartic; Mendes, Pablo N.; Wang, Shaojun; Sheth, Amit P.

doi:10.1007/978-3-540-87696-0_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5268)

Included in the following conference series:

International Conference on Knowledge Engineering and Knowledge Management

Abstract

In this paper we investigate unsupervised population of a biomedical ontology via information extraction from biomedical literature. Relationships in text seldom connect simple entities. We therefore focus on identifying compound entities rather than mentions of simple entities. We present a method based on rules over grammatical dependency structures for unsupervised segmentation of sentences into compound entities and relationships. We complement the rule-based approach with a statistical component that prunes structures with low information content, thereby reducing false positives in the prediction of compound entities, their constituents and relationships. The extraction is manually evaluated with respect to the UMLS Semantic Network by analyzing the conformance of the extracted triples with the corresponding UMLS relationship type definitions.
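
The pipeline described in the abstract (dependency-based segmentation into compound entities and relationships, followed by pruning of low-information structures) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy sentence, the Stanford-style labels (`nsubj`, `dobj`, `nn`, `amod`), the corpus counts, the `min_ic` threshold, and the function names `information_content`, `compound_entity`, and `extract_triples` are all assumptions made for illustration. Information content is approximated here as the negative log of a word's relative frequency, which may differ from the measure used in the paper.

```python
import math

# Assumed set of dependency labels that attach a modifier to a head noun.
MODIFIER_LABELS = {"nn", "amod"}


def information_content(word, counts, total):
    """Return -log2 of the word's relative frequency (unseen words count as 1)."""
    return -math.log2(counts.get(word, 1) / total)


def compound_entity(head, tokens, edges, counts, total, min_ic):
    """Join a head noun with its sufficiently informative modifiers, in sentence order."""
    members = [head]
    stack = [head]
    while stack:
        node = stack.pop()
        for h, d, label in edges:
            if h == node and label in MODIFIER_LABELS:
                # Prune modifiers whose information content is below the threshold.
                if information_content(tokens[d], counts, total) >= min_ic:
                    members.append(d)
                    stack.append(d)
    return " ".join(tokens[i] for i in sorted(members))


def extract_triples(tokens, edges, counts, total, min_ic=5.0):
    """Build (subject entity, relationship, object entity) triples from a parse."""
    subjects = {h: d for h, d, label in edges if label == "nsubj"}
    objects = {h: d for h, d, label in edges if label == "dobj"}
    triples = []
    for verb in sorted(subjects.keys() & objects.keys()):
        subj = compound_entity(subjects[verb], tokens, edges, counts, total, min_ic)
        obj = compound_entity(objects[verb], tokens, edges, counts, total, min_ic)
        triples.append((subj, tokens[verb], obj))
    return triples


# Hypothetical sentence and hand-written parse: edges are (head, dependent, label).
tokens = ["dietary", "magnesium", "deficiency", "induces", "vascular", "spasm"]
edges = [(3, 2, "nsubj"), (2, 1, "nn"), (2, 0, "amod"), (3, 5, "dobj"), (5, 4, "amod")]

# Hypothetical corpus counts; "dietary" is frequent, so it carries little information.
counts = {"dietary": 50, "magnesium": 5, "deficiency": 10, "vascular": 8, "spasm": 4}
total = 1000

triples = extract_triples(tokens, edges, counts, total)
print(triples)  # [('magnesium deficiency', 'induces', 'vascular spasm')]
```

In this toy run the frequent modifier "dietary" falls below the threshold and is dropped, while "magnesium" and "vascular" are kept as constituents of their compound entities. A real system would obtain the parse from a dependency parser and the counts from a biomedical corpus.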

Author information

Authors and Affiliations

Kno.e.sis Center, Dept. of Computer Science & Engineering, Wright State University, 3640 Colonel Glenn Hwy., Dayton, Ohio
Cartic Ramakrishnan, Pablo N. Mendes, Shaojun Wang & Amit P. Sheth

Editor information

Aldo Gangemi, Jérôme Euzenat

Cite this paper

Ramakrishnan, C., Mendes, P.N., Wang, S., Sheth, A.P. (2008). Unsupervised Discovery of Compound Entities for Relationship Extraction. In: Gangemi, A., Euzenat, J. (eds) Knowledge Engineering: Practice and Patterns. EKAW 2008. Lecture Notes in Computer Science, vol 5268. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87696-0_15

DOI: https://doi.org/10.1007/978-3-540-87696-0_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87695-3
Online ISBN: 978-3-540-87696-0
eBook Packages: Computer Science, Computer Science (R0)