Structured Literature Image Finder: Extracting Information from Text and Images in Biomedical Literature

  • Conference paper
Linking Literature, Information, and Knowledge for Biology

Abstract

SLIF (Structured Literature Image Finder) combines text mining and image processing to extract information from figures in the biomedical literature. It also extends traditional latent topic modeling to provide new ways to traverse the literature. SLIF provides a publicly available, searchable database (http://slif.cbi.cmu.edu).

SLIF originally focused on fluorescence microscopy images. We have now extended it to classify panels into additional image types. We also improved the classification into subcellular classes by building a more representative training set. To get the most out of the human labeling effort, we used active learning to select which images to label.
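As an illustration of how active learning can direct labeling effort, the sketch below ranks unlabeled panels by how uncertain a classifier is about them and picks the most uncertain ones for a human to label. This is margin-based uncertainty sampling, chosen here only as a simple example; the abstract does not say which selection strategy SLIF uses. The function names, panel identifiers, and probabilities are all hypothetical.

```python
from typing import Dict, List, Sequence


def margin(probs: Sequence[float]) -> float:
    """Gap between the two most probable classes; a small gap means high uncertainty."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_for_labeling(
    predictions: Dict[str, Sequence[float]], budget: int
) -> List[str]:
    """Return the `budget` image ids with the smallest margin (least certain first)."""
    ranked = sorted(predictions, key=lambda image_id: margin(predictions[image_id]))
    return ranked[:budget]


# Hypothetical class probabilities over three subcellular classes (assumed values).
predictions = {
    "panel_a": [0.90, 0.05, 0.05],
    "panel_b": [0.40, 0.35, 0.25],
    "panel_c": [0.55, 0.30, 0.15],
    "panel_d": [0.34, 0.33, 0.33],
}

# Ask a human to label only the two panels the classifier is least sure about.
chosen = select_for_labeling(predictions, budget=2)
print(chosen)
```

In a full loop, the newly labeled panels would be added to the training set, the classifier retrained, and the selection repeated until the labeling budget is spent.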

We developed models that take into account the structure of the document (panels inside figures inside papers) and the multi-modality of the information (free and annotated text, images, and information from external databases). These models allow us to provide new ways to navigate a large collection of documents.
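The nested structure described above (panels inside figures inside papers) can be pictured as a small data model. The sketch below is only an assumed representation for illustration, not SLIF's actual schema; every class name, field name, and sample value is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass
class Panel:
    label: str                                # e.g. "A"
    image_type: str                           # e.g. "fluorescence microscopy"
    subcellular_class: Optional[str] = None   # set only when a class was predicted
    caption_text: str = ""                    # caption portion tied to this panel


@dataclass
class Figure:
    number: int
    caption: str
    panels: List[Panel] = field(default_factory=list)


@dataclass
class Paper:
    title: str
    figures: List[Figure] = field(default_factory=list)
    external_annotations: List[str] = field(default_factory=list)  # e.g. database ids


def panels_of_type(
    papers: Iterable[Paper], image_type: str
) -> Iterator[Tuple[Paper, Figure, Panel]]:
    """Walk paper -> figure -> panel and yield panels of the requested image type."""
    for paper in papers:
        for figure in paper.figures:
            for panel in figure.panels:
                if panel.image_type == image_type:
                    yield paper, figure, panel


# Hypothetical paper with one figure and two panels of different types.
paper = Paper(
    title="Example paper",
    figures=[
        Figure(
            number=1,
            caption="(A) Localization of protein X. (B) Western blot.",
            panels=[
                Panel("A", "fluorescence microscopy", "nucleus", "Localization of protein X."),
                Panel("B", "gel", None, "Western blot."),
            ],
        )
    ],
)

matches = list(panels_of_type([paper], "fluorescence microscopy"))
print([(p.title, f.number, pn.label) for p, f, pn in matches])
```

Keeping each panel attached to its figure and paper means a search hit on one panel can lead back to the full caption and to the paper-level annotations.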




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Coelho, L.P. et al. (2010). Structured Literature Image Finder: Extracting Information from Text and Images in Biomedical Literature. In: Blaschke, C., Shatkay, H. (eds) Linking Literature, Information, and Knowledge for Biology. Lecture Notes in Computer Science, vol 6004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13131-8_4

  • DOI: https://doi.org/10.1007/978-3-642-13131-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13130-1

  • Online ISBN: 978-3-642-13131-8

  • eBook Packages: Computer Science (R0)
