Linking Literature, Information, and Knowledge for Biology

Volume 6004 of the series Lecture Notes in Computer Science, pp. 23-32

Structured Literature Image Finder: Extracting Information from Text and Images in Biomedical Literature

  • Luís Pedro Coelho. Affiliated with: Lane Center for Computational Biology, Carnegie Mellon University; Joint Carnegie Mellon University-University of Pittsburgh Ph.D. Program in Computational Biology; Center for Bioimage Informatics, Carnegie Mellon University
  • Amr Ahmed. Affiliated with: Machine Learning Department, Carnegie Mellon University; Language Technologies Institute, Carnegie Mellon University
  • Andrew Arnold. Affiliated with: Machine Learning Department, Carnegie Mellon University
  • Joshua Kangas. Affiliated with: Lane Center for Computational Biology, Carnegie Mellon University; Joint Carnegie Mellon University-University of Pittsburgh Ph.D. Program in Computational Biology; Center for Bioimage Informatics, Carnegie Mellon University
  • Abdul-Saboor Sheikh. Affiliated with: Center for Bioimage Informatics, Carnegie Mellon University
  • Eric P. Xing. Affiliated with: Lane Center for Computational Biology, Carnegie Mellon University; Joint Carnegie Mellon University-University of Pittsburgh Ph.D. Program in Computational Biology; Center for Bioimage Informatics, Carnegie Mellon University; Machine Learning Department, Carnegie Mellon University; Language Technologies Institute, Carnegie Mellon University; Department of Biological Sciences, Carnegie Mellon University
  • William W. Cohen. Affiliated with: Lane Center for Computational Biology, Carnegie Mellon University; Joint Carnegie Mellon University-University of Pittsburgh Ph.D. Program in Computational Biology; Center for Bioimage Informatics, Carnegie Mellon University; Machine Learning Department, Carnegie Mellon University
  • Robert F. Murphy. Affiliated with: Lane Center for Computational Biology, Carnegie Mellon University; Joint Carnegie Mellon University-University of Pittsburgh Ph.D. Program in Computational Biology; Center for Bioimage Informatics, Carnegie Mellon University; Machine Learning Department, Carnegie Mellon University; Department of Biological Sciences, Carnegie Mellon University; Department of Biomedical Engineering, Carnegie Mellon University

Abstract

Slif uses a combination of text-mining and image processing to extract information from figures in the biomedical literature. It also uses innovative extensions to traditional latent topic modeling to provide new ways to traverse the literature. Slif provides a publicly available searchable database (http://slif.cbi.cmu.edu).

Slif originally focused on fluorescence microscopy images. We have now extended it to classify panels into more image types. We also improved the classification into subcellular classes by building a more representative training set. To get the most out of the human labeling effort, we used active learning to select images to label.
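
The abstract does not say which active learning strategy was used. As a minimal sketch of the general idea, and assuming entropy-based uncertainty sampling (one common strategy, not necessarily the one used in Slif), the following picks the unlabeled images whose predicted class distribution is least certain. The image ids, probabilities, and function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions, batch_size):
    """Return the ids of the `batch_size` most uncertain unlabeled images.

    `predictions` maps a (hypothetical) image id to the class-probability
    vector produced by the current classifier for that image.
    """
    ranked = sorted(
        predictions,
        key=lambda image_id: entropy(predictions[image_id]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:batch_size]


# Hypothetical classifier outputs over three subcellular classes.
predictions = {
    "panel_a": [0.98, 0.01, 0.01],  # confident prediction
    "panel_b": [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
    "panel_c": [0.60, 0.30, 0.10],  # moderately uncertain
}

# These two images would be sent to a human annotator next.
chosen = select_for_labeling(predictions, batch_size=2)
```

In a full loop, the newly labeled images would be added to the training set, the classifier retrained, and the selection repeated until the labeling budget runs out.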

We developed models that take into account the structure of the document (with panels inside figures inside papers) and the multi-modality of the information (free and annotated text, images, information from external databases). This has allowed us to provide new ways to navigate a large collection of documents.
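
To make the nesting concrete, here is a small sketch of the document structure described above (panels inside figures inside papers). It only illustrates the hierarchy; the class and field names are assumptions, not the Slif database schema.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Panel:
    label: str       # e.g. "A", "B"
    image_type: str  # e.g. "fluorescence microscopy"


@dataclass
class Figure:
    caption: str  # free text attached to the whole figure
    panels: List[Panel] = field(default_factory=list)


@dataclass
class Paper:
    title: str
    figures: List[Figure] = field(default_factory=list)


def iter_panels(paper: Paper) -> Iterator[Tuple[int, Panel]]:
    """Yield (figure_index, panel) pairs, walking the hierarchy top-down."""
    for figure_index, figure in enumerate(paper.figures):
        for panel in figure.panels:
            yield figure_index, panel


# Hypothetical example: one paper, two figures, three panels in total.
paper = Paper(
    title="Example paper",
    figures=[
        Figure("Localization of protein X.", [
            Panel("A", "fluorescence microscopy"),
            Panel("B", "fluorescence microscopy"),
        ]),
        Figure("Expression levels.", [Panel("A", "gel")]),
    ],
)

fluorescence_panels = [
    (index, panel.label)
    for index, panel in iter_panels(paper)
    if panel.image_type == "fluorescence microscopy"
]
```

Keeping the caption at the figure level and the image type at the panel level mirrors the point that text and image information attach to different levels of the same document.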