A Framework for Schema-Driven Relationship Discovery from Unstructured Text

  • Cartic Ramakrishnan
  • Krys J. Kochut
  • Amit P. Sheth
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4273)


We address the problem of extracting implicit and explicit relationships between entities in biomedical text. We argue that entities seldom occur in text in their simple form, and that relationships in text connect the modified, complex forms of entities with each other. We present a rule-based method for (1) extracting such complex entities, (2) extracting the relationships between them, and (3) converting these relationships into RDF. We also present results demonstrating the utility of the generated RDF for discovering knowledge in text corpora by locating paths composed of the extracted relationships.
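The pipeline described in the abstract (complex entities, relationships between them, RDF, path discovery) can be illustrated with a minimal sketch. It is not the authors' rule set. It assumes relationships have already been extracted as (subject, predicate, object) strings, substitutes a simple longest-known-term match for the paper's extraction rules, links each complex entity to its simple form with a hypothetical `has_head` predicate, writes N-Triples under a placeholder `http://example.org/` namespace, and finds a connecting path with breadth-first search. All function names, predicates, and toy data are hypothetical.

```python
from collections import deque
from urllib.parse import quote

# Hypothetical namespace: the paper's actual RDF vocabulary is not given here.
BASE = "http://example.org/"


def contains_term(phrase, term):
    """True if `term` occurs as a contiguous run of whole tokens in `phrase`."""
    p, t = phrase.split(), term.split()
    return any(p[i:i + len(t)] == t for i in range(len(p) - len(t) + 1))


def attach_heads(phrases, vocabulary):
    """Map each complex phrase to the longest known term it contains.

    A crude stand-in (assumption) for rule-based complex-entity recognition:
    "increased blood viscosity" is linked to its simple form "blood viscosity".
    """
    heads = {}
    for phrase in phrases:
        found = [t for t in vocabulary if t != phrase and contains_term(phrase, t)]
        if found:
            heads[phrase] = max(sorted(found), key=len)
    return heads


def uri(text):
    """Turn a term into a percent-encoded N-Triples IRI in the placeholder namespace."""
    return f"<{BASE}{quote(text.replace(' ', '_'))}>"


def to_ntriples(relations, heads):
    """Serialize relations, plus hypothetical complex -> simple links, as N-Triples."""
    lines = [f"{uri(s)} {uri(p)} {uri(o)} ." for s, p, o in relations]
    for complex_entity, head in sorted(heads.items()):
        lines.append(f"{uri(complex_entity)} {uri('has_head')} {uri(head)} .")
    return lines


def find_path(relations, heads, start, goal):
    """Breadth-first search for a shortest path between two terms.

    Edges are traversed in both directions (an assumption for illustration);
    each returned step is (from_node, predicate, to_node) in traversal order.
    """
    graph = {}
    edges = list(relations) + [(c, "has_head", h) for c, h in heads.items()]
    for s, p, o in edges:
        graph.setdefault(s, []).append((p, o))
        graph.setdefault(o, []).append((p, s))
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while prev[node] is not None:
                parent, pred = prev[node]
                path.append((parent, pred, node))
                node = parent
            return path[::-1]
        for pred, nxt in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = (node, pred)
                queue.append(nxt)
    return None


# Toy inputs, not output of the paper's extractor.
RELATIONS = [
    ("fish oil", "reduces", "blood viscosity"),
    ("increased blood viscosity", "associated_with", "Raynaud's disease"),
]
VOCABULARY = {"fish oil", "blood viscosity", "Raynaud's disease"}

if __name__ == "__main__":
    phrases = {x for s, _, o in RELATIONS for x in (s, o)}
    heads = attach_heads(phrases, VOCABULARY)
    print("\n".join(to_ntriples(RELATIONS, heads)))
    print(find_path(RELATIONS, heads, "fish oil", "Raynaud's disease"))
```

In this toy graph, `fish oil` and `Raynaud's disease` are unconnected without the `has_head` links. With them, the search returns a three-edge path through `blood viscosity` and `increased blood viscosity`. This suggests why keeping complex entities tied to their simple forms can matter for path-based discovery.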


Keywords: Relationship Extraction; Knowledge-Driven Text Mining



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Cartic Ramakrishnan (1)
  • Krys J. Kochut (1)
  • Amit P. Sheth (1)
  1. LSDIS Lab, Dept. of Computer Science, University of Georgia, Athens
