Unsupervised Discovery of Compound Entities for Relationship Extraction

  • Cartic Ramakrishnan
  • Pablo N. Mendes
  • Shaojun Wang
  • Amit P. Sheth
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5268)


In this paper we investigate unsupervised population of a biomedical ontology via information extraction from biomedical literature. Relationships in text seldom connect simple entities. We therefore focus on identifying compound entities rather than mentions of simple entities. We present a method based on rules over grammatical dependency structures for unsupervised segmentation of sentences into compound entities and relationships. We complement the rule-based approach with a statistical component that prunes structures with low information content, thereby reducing false positives in the prediction of compound entities, their constituents and relationships. The extraction is manually evaluated with respect to the UMLS Semantic Network by analyzing the conformance of the extracted triples with the corresponding UMLS relationship type definitions.
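The combination of dependency-based grouping and statistical pruning described above can be illustrated with a minimal sketch. It is not the authors' implementation: the dependency labels (`nn`, `amod`), the use of smoothed self-information as the measure of "information content", the threshold `min_ic`, and the toy sentence and word counts are all assumptions made for illustration.

```python
import math

# Assumed dependency labels that attach a modifier to a head noun.
MODIFIER_RELS = {"nn", "amod"}

def information_content(word, counts, total):
    """Self-information -log2 p(word), with add-one smoothing for unseen words."""
    p = (counts.get(word, 0) + 1) / (total + len(counts) + 1)
    return -math.log2(p)

def compound_entities(tokens, edges, counts, min_ic):
    """Group each head noun with its modifiers, dropping modifiers whose
    information content falls below min_ic.

    tokens: list of words in sentence order.
    edges:  (head_index, relation, dependent_index) triples.
    Heads left with no surviving modifier are not returned, since they
    are simple rather than compound entities.
    """
    total = sum(counts.values())
    groups = {}
    for head, rel, dep in edges:
        if rel not in MODIFIER_RELS:
            continue
        if information_content(tokens[dep], counts, total) >= min_ic:
            groups.setdefault(head, [head]).append(dep)
    # Emit each group in sentence order, ordered by head position.
    return [" ".join(tokens[i] for i in sorted(idx))
            for _, idx in sorted(groups.items())]

# Toy sentence, hand-written dependency edges, and made-up corpus counts.
tokens = ["various", "dietary", "fish", "oil", "reduces", "blood", "viscosity"]
edges = [
    (3, "amod", 0), (3, "amod", 1), (3, "nn", 2),  # modifiers of "oil"
    (4, "nsubj", 3),                                # "oil" is subject of "reduces"
    (6, "nn", 5),                                   # "blood" modifies "viscosity"
    (4, "dobj", 6),                                 # "viscosity" is object of "reduces"
]
counts = {"various": 900, "dietary": 5, "fish": 8, "oil": 10,
          "reduces": 20, "blood": 30, "viscosity": 3}

result = compound_entities(tokens, edges, counts, min_ic=2.0)
print(result)  # ['dietary fish oil', 'blood viscosity']
```

In this toy run the very frequent modifier "various" carries little information and is pruned, while the rarer modifiers are kept. The two surviving compound entities, linked by the verb "reduces", would then form a candidate triple.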


Keywords: Information extraction, compound entity identification, relationship extraction, relational knowledge acquisition




Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Cartic Ramakrishnan (1)
  • Pablo N. Mendes (1)
  • Shaojun Wang (1)
  • Amit P. Sheth (1)

  1. Kno.e.sis Center, Dept. of Computer Science & Engineering, Wright State University, 3640 Colonel Glenn Hwy., Dayton, Ohio
