Visualization and Language Processing for Supporting Analysis across the Biomedical Literature

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6279)


Finding relevant publications in the large and rapidly growing body of biomedical literature is challenging. Search queries on PubMed often return thousands of publications, and filtering out the irrelevant ones to arrive at a manageable set to read is tedious. We have developed a visual analytics system, named Bio-Jigsaw, which acts as a visual index to a document collection and supports biologists in investigating and understanding connections between biological entities. We apply natural language processing techniques to identify biological entities such as genes and pathways, and we visualize connections among them via multiple representations. Connections are based on co-occurrence in abstracts and are also drawn from ontologies or annotations in digital libraries. We demonstrate how Bio-Jigsaw can be used to analyze the results of a PubMed search query on a gene related to breast cancer, which returns over 1500 primary papers.
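
The co-occurrence-based connections mentioned in the abstract can be illustrated with a minimal sketch. It is not Bio-Jigsaw's implementation: it assumes entity identification has already been done, and the function name `cooccurrence_counts`, the document ids, and the entity names (`geneA`, `geneB`, `pathwayX`) are all hypothetical placeholders. Two entities count as connected once for every abstract that mentions both.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(doc_entities):
    """Count, for each unordered entity pair, the documents mentioning both.

    doc_entities maps a document id to the list of entities found in its
    abstract (assumed to come from an earlier entity-identification step).
    """
    counts = Counter()
    for entities in doc_entities.values():
        # De-duplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy input, not data from the paper.
abstracts = {
    "doc1": ["geneA", "geneB", "pathwayX"],
    "doc2": ["geneA", "geneB"],
    "doc3": ["geneA", "pathwayX", "geneA"],
}

counts = cooccurrence_counts(abstracts)
for (left, right), n in sorted(counts.items()):
    print(f"{left} - {right}: {n} shared abstract(s)")
```

Counting each abstract at most once per pair keeps a frequently repeated entity from inflating a connection. A real system would likely also weight or filter these counts, which this sketch leaves out.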


KEGG Pathway, Biological Entity, Biomedical Literature, PubMed Abstract, Natural Language Processing Technique
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. School of Interactive Computing & GVU Center, Georgia Institute of Technology, Atlanta
  2. Center for Computational Pharmacology, University of Colorado Denver School of Medicine, Aurora
