Abstract
In this paper we investigate unsupervised population of a biomedical ontology via information extraction from biomedical literature. Relationships in text seldom connect simple entities. We therefore focus on identifying compound entities rather than mentions of simple entities. We present a method based on rules over grammatical dependency structures for unsupervised segmentation of sentences into compound entities and relationships. We complement the rule-based approach with a statistical component that prunes structures with low information content, thereby reducing false positives in the prediction of compound entities, their constituents and relationships. The extraction is manually evaluated with respect to the UMLS Semantic Network by analyzing the conformance of the extracted triples with the corresponding UMLS relationship type definitions.
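The two-stage idea in the abstract (dependency rules to assemble compound entities, then a statistical filter on information content) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the hand-written toy parse, the modifier labels `amod` and `nn`, the subject and object labels `nsubj` and `dobj`, the toy word counts, the use of negative log relative frequency as the information-content score, and the threshold `min_ic`. The paper's actual rules and statistical component may differ.

```python
import math
from typing import Dict, List, Tuple

# A token is (index, word, head_index, dependency_label); head_index 0 marks the root.
Token = Tuple[int, str, int, str]

# Assumed label set: modifier relations that get folded into the head noun's entity.
MODIFIER_LABELS = {"amod", "nn"}


def information_content(word: str, counts: Dict[str, int], total: int) -> float:
    """Return -log2 of the word's relative frequency; unseen words are counted once."""
    return -math.log2(counts.get(word.lower(), 1) / total)


def compound_entity(head: int, tokens: List[Token], counts: Dict[str, int],
                    total: int, min_ic: float) -> str:
    """Join a head noun with its direct modifiers whose score reaches min_ic."""
    kept = [head]
    for idx, word, parent, label in tokens:
        if parent == head and label in MODIFIER_LABELS:
            if information_content(word, counts, total) >= min_ic:
                kept.append(idx)
    words = {idx: word for idx, word, _, _ in tokens}
    # Sorting by index restores the original word order of the sentence.
    return " ".join(words[i] for i in sorted(kept))


def extract_triples(tokens: List[Token], counts: Dict[str, int], total: int,
                    min_ic: float = 3.0) -> List[Tuple[str, str, str]]:
    """Emit (subject entity, predicate, object entity) for each root predicate."""
    triples = []
    for idx, word, parent, _ in tokens:
        if parent != 0:
            continue  # only the root token is treated as the relationship
        subjects = [t[0] for t in tokens if t[2] == idx and t[3] == "nsubj"]
        objects = [t[0] for t in tokens if t[2] == idx and t[3] == "dobj"]
        for s in subjects:
            for o in objects:
                triples.append((
                    compound_entity(s, tokens, counts, total, min_ic),
                    word,
                    compound_entity(o, tokens, counts, total, min_ic),
                ))
    return triples


# Hand-written toy parse of "Such dietary fish oil reduces high blood viscosity".
TOKENS: List[Token] = [
    (1, "Such", 4, "amod"), (2, "dietary", 4, "amod"), (3, "fish", 4, "nn"),
    (4, "oil", 5, "nsubj"), (5, "reduces", 0, "root"), (6, "high", 8, "amod"),
    (7, "blood", 8, "nn"), (8, "viscosity", 5, "dobj"),
]
# Toy corpus counts (invented); frequent words receive a low score.
COUNTS = {"such": 200, "high": 150, "blood": 40, "fish": 20, "dietary": 10}
TOTAL = 1000

print(extract_triples(TOKENS, COUNTS, TOTAL))
```

With these toy counts the frequent modifiers "Such" and "high" fall below the threshold and are pruned, so the sketch yields the triple ("dietary fish oil", "reduces", "blood viscosity"). Setting `min_ic=0.0` keeps every modifier.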
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ramakrishnan, C., Mendes, P.N., Wang, S., Sheth, A.P. (2008). Unsupervised Discovery of Compound Entities for Relationship Extraction. In: Gangemi, A., Euzenat, J. (eds) Knowledge Engineering: Practice and Patterns. EKAW 2008. Lecture Notes in Computer Science, vol 5268. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87696-0_15
DOI: https://doi.org/10.1007/978-3-540-87696-0_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87695-3
Online ISBN: 978-3-540-87696-0
eBook Packages: Computer Science, Computer Science (R0)