High-Throughput Identification of Chemistry in Life Science Texts

  • Conference paper
Computational Life Sciences II (CompLife 2006)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4216)

Abstract

OSCAR3 is an open, extensible system for the automated annotation of chemistry in scientific articles, capable of processing thousands of articles per hour. The XML annotation it produces supports applications such as interactive browsing and chemically aware searching, and is designed for integration with larger text-analysis systems. We report its application to the high-throughput analysis of the small-molecule chemistry content of life-science texts, such as PubMed abstracts.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Corbett, P., Murray-Rust, P. (2006). High-Throughput Identification of Chemistry in Life Science Texts. In: Berthold, M.R., Glen, R.C., Fischer, I. (eds) Computational Life Sciences II. CompLife 2006. Lecture Notes in Computer Science, vol 4216. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875741_11

  • DOI: https://doi.org/10.1007/11875741_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45767-1

  • Online ISBN: 978-3-540-45768-8

  • eBook Packages: Computer Science, Computer Science (R0)
