High-Throughput Identification of Chemistry in Life Science Texts

  • Peter Corbett
  • Peter Murray-Rust
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4216)

Abstract

OSCAR3 is an open, extensible system for the automated annotation of chemistry in scientific articles; it can process thousands of articles per hour. Its XML annotation supports applications such as interactive browsing and chemically aware searching, and has been designed for integration with larger text-analysis systems. We report its application to the high-throughput analysis of the small-molecule chemistry content of life-science texts, such as PubMed abstracts.
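
To make the idea of inline XML annotation concrete, the following is a minimal sketch of dictionary-based tagging. It is not OSCAR3's method or schema: the `chem` element, the `annotate` function, and the three-term lexicon are all hypothetical names chosen for illustration, and a real system would rely on much richer name recognition than a fixed word list.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical toy lexicon (an assumption for illustration only).
CHEMICAL_TERMS = ["acetic acid", "ethanol", "glucose"]


def annotate(text, terms=CHEMICAL_TERMS):
    """Wrap each known chemical term in a hypothetical <chem> element.

    Returns a well-formed XML fragment rooted at <p>.
    """
    # Try longer terms first so multi-word names are matched before
    # any shorter term they might contain.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untouched text between matches so the output stays valid XML.
        parts.append(escape(text[last:match.start()]))
        parts.append("<chem>" + escape(match.group(0)) + "</chem>")
        last = match.end()
    parts.append(escape(text[last:]))
    return "<p>" + "".join(parts) + "</p>"


if __name__ == "__main__":
    print(annotate("Glucose is fermented to ethanol & CO2."))
```

Because the annotation is inline and the surrounding text is escaped, the result can be parsed by any XML tool, which is the property that makes downstream uses such as browsing or chemically aware search straightforward to build.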

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Peter Corbett (1)
  • Peter Murray-Rust (1)
  1. Unilever Centre for Molecular Science Informatics, Cambridge
