Structured Literature Image Finder: Extracting Information from Text and Images in Biomedical Literature

Coelho, Luís Pedro; Ahmed, Amr; Arnold, Andrew; Kangas, Joshua; Sheikh, Abdul-Saboor; Xing, Eric P.; Cohen, William W.; Murphy, Robert F.

doi:10.1007/978-3-642-13131-8_4

Luís Pedro Coelho^{21,22,23},
Amr Ahmed^{24,25},
Andrew Arnold^{24},
Joshua Kangas^{21,22,23},
Abdul-Saboor Sheikh^{23},
Eric P. Xing^{21,22,23,24,25,26},
William W. Cohen^{21,22,23,24} &
Robert F. Murphy^{21,22,23,24,26,27}

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6004)

Abstract

SLIF uses a combination of text-mining and image processing to extract information from figures in the biomedical literature. It also uses innovative extensions to traditional latent topic modeling to provide new ways to traverse the literature. SLIF provides a publicly available searchable database (http://slif.cbi.cmu.edu).

SLIF originally focused on fluorescence microscopy images. We have now extended it to classify panels into more image types. We also improved the classification into subcellular classes by building a more representative training set. To get the most out of the human labeling effort, we used active learning to select images to label.
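
As an illustration of how active learning can direct labeling effort, the sketch below ranks unlabeled images by the entropy of a classifier's predicted class probabilities and returns the most uncertain ones. This is plain uncertainty sampling, chosen for brevity; the abstract does not say which selection criterion the system actually uses, and the image ids, probabilities, and function names here are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions, budget):
    """Return the ids of the `budget` most uncertain unlabeled images.

    `predictions` maps a hypothetical image id to the class probabilities
    produced by the current classifier for that image.
    """
    ranked = sorted(
        predictions,
        key=lambda image_id: entropy(predictions[image_id]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:budget]


# Made-up predictions over three classes for three unlabeled panels.
predictions = {
    "panel_a": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "panel_b": [0.40, 0.35, 0.25],  # nearly uniform, high entropy
    "panel_c": [0.70, 0.20, 0.10],
}

chosen = select_for_labeling(predictions, budget=2)
print(chosen)
```

In a full loop, the chosen images would be labeled by a person, added to the training set, and the classifier retrained before the next round of selection.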

We developed models that take into account the structure of the document (with panels inside figures inside papers) and the multi-modality of the information (free and annotated text, images, information from external databases). This has allowed us to provide new ways to navigate a large collection of documents.
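
The nesting described above (panels inside figures inside papers) can be pictured as a small tree of records. The following is a minimal sketch of such a structure and of one way to walk it. The class names, fields, and example values are assumptions made for illustration, not the schema of the actual database.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Panel:
    label: str       # panel letter within the figure, e.g. "A"
    image_type: str  # hypothetical class name, e.g. "fluorescence"


@dataclass
class Figure:
    number: int
    caption: str
    panels: List[Panel] = field(default_factory=list)


@dataclass
class Paper:
    title: str
    figures: List[Figure] = field(default_factory=list)


def panels_of_type(
    papers: List[Paper], image_type: str
) -> Iterator[Tuple[Paper, Figure, Panel]]:
    """Yield every panel of the given type together with its figure and paper."""
    for paper in papers:
        for figure in paper.figures:
            for panel in figure.panels:
                if panel.image_type == image_type:
                    yield paper, figure, panel


# Made-up example: one paper, one figure, two panels of different types.
paper = Paper(
    title="Example paper",
    figures=[
        Figure(
            number=1,
            caption="(A) Stained cells. (B) Western blot.",
            panels=[Panel("A", "fluorescence"), Panel("B", "gel")],
        )
    ],
)

hits = list(panels_of_type([paper], "fluorescence"))
print([(p.title, f.number, pn.label) for p, f, pn in hits])
```

Returning the enclosing figure and paper with each panel keeps the caption and document context available to whatever consumes the search result.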

Author information

Authors and Affiliations

Lane Center for Computational Biology, Carnegie Mellon University
Luís Pedro Coelho, Joshua Kangas, Eric P. Xing, William W. Cohen & Robert F. Murphy
Joint Carnegie Mellon University-University of Pittsburgh Ph.D. Program in Computational Biology
Luís Pedro Coelho, Joshua Kangas, Eric P. Xing, William W. Cohen & Robert F. Murphy
Center for Bioimage Informatics, Carnegie Mellon University
Luís Pedro Coelho, Joshua Kangas, Abdul-Saboor Sheikh, Eric P. Xing, William W. Cohen & Robert F. Murphy
Machine Learning Department, Carnegie Mellon University
Amr Ahmed, Andrew Arnold, Eric P. Xing, William W. Cohen & Robert F. Murphy
Language Technologies Institute, Carnegie Mellon University
Amr Ahmed & Eric P. Xing
Department of Biological Sciences, Carnegie Mellon University
Eric P. Xing & Robert F. Murphy
Department of Biomedical Engineering, Carnegie Mellon University
Robert F. Murphy

Editor information

Editors and Affiliations

Bioalma, C/Ronda de Poniente, 4, 2-C, 28760, Tres Cantos, Madrid, Spain
Christian Blaschke
Computational Biology and Machine Learning Lab, School of Computing, Queen’s University, K7L 3N6, Kingston, ON, Canada
Hagit Shatkay

Cite this paper

Coelho, L.P. et al. (2010). Structured Literature Image Finder: Extracting Information from Text and Images in Biomedical Literature. In: Blaschke, C., Shatkay, H. (eds) Linking Literature, Information, and Knowledge for Biology. Lecture Notes in Computer Science, vol 6004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13131-8_4

DOI: https://doi.org/10.1007/978-3-642-13131-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13130-1
Online ISBN: 978-3-642-13131-8
eBook Packages: Computer Science (R0)