Structured Literature Image Finder: Extracting Information from Text and Images in Biomedical Literature

  • Luís Pedro Coelho
  • Amr Ahmed
  • Andrew Arnold
  • Joshua Kangas
  • Abdul-Saboor Sheikh
  • Eric P. Xing
  • William W. Cohen
  • Robert F. Murphy
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6004)

Abstract

SLIF uses a combination of text mining and image processing to extract information from figures in the biomedical literature. It also uses innovative extensions to traditional latent topic modeling to provide new ways to traverse the literature. SLIF provides a publicly available searchable database (http://slif.cbi.cmu.edu).

SLIF originally focused on fluorescence microscopy images. We have now extended it to classify panels into more image types. We also improved the classification into subcellular classes by building a more representative training set. To get the most out of the human labeling effort, we used active learning to select images to label.

We developed models that take into account the structure of the document (with panels inside figures inside papers) and the multi-modality of the information (free and annotated text, images, information from external databases). This has allowed us to provide new ways to navigate a large collection of documents.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Luís Pedro Coelho (1, 2, 3)
  • Amr Ahmed (4, 5)
  • Andrew Arnold (4)
  • Joshua Kangas (1, 2, 3)
  • Abdul-Saboor Sheikh (3)
  • Eric P. Xing (1, 2, 3, 4, 5, 6)
  • William W. Cohen (1, 2, 3, 4)
  • Robert F. Murphy (1, 2, 3, 4, 6, 7)

  1. Lane Center for Computational Biology, Carnegie Mellon University
  2. Joint Carnegie Mellon University-University of Pittsburgh Ph.D. Program in Computational Biology
  3. Center for Bioimage Informatics, Carnegie Mellon University
  4. Machine Learning Department, Carnegie Mellon University
  5. Language Technologies Institute, Carnegie Mellon University
  6. Department of Biological Sciences, Carnegie Mellon University
  7. Department of Biomedical Engineering, Carnegie Mellon University
