Modern Approaches to Chemical Image Recognition

Current Challenges in Patent Information Retrieval

Part of the book series: The Information Retrieval Series (INRE, volume 37)

Abstract

Millions of existing patent documents and journal articles dealing with chemistry describe chemical structures by way of structure images (so-called Kekulé structures). Although human-readable, these images cannot be interpreted by a computer and are therefore unusable in most chemoinformatics applications, such as structure and substructure searches and the calculation of chemical and biological properties. Many computer-readable formats exist for storing structural information, but converting millions of images by hand is cumbersome and time-consuming, so there is a need for tools that convert images into structures automatically. One of the first such tools, OROCS, was presented at ICDAR in 1993. We present modern developments in optical structure recognition that build upon these earlier ideas and add enhancements to two stages of the process: the automatic extraction of structure images from the surrounding text and graphics, and the conversion of the extracted images into a molecular format. We describe in detail two top-performing chemical OCR applications, one open-source and one academic software package, whose performance was judged in the TREC-CHEM 2011 and CLEF 2012 challenges.
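The last step named above, converting an extracted structure image into a molecular format, can be pictured with a small example. What follows is a minimal sketch under stated assumptions, not the method of either application described in the chapter: it assumes a recognizer has already turned the image into a graph of atoms (element symbol plus 2D coordinates) and bonds (1-based atom indices plus an integer bond order), and it writes that graph as an MDL V2000 molfile, one of the CTfile formats in the reference list. The `Atom` and `Bond` classes and the `to_molfile` function are hypothetical names chosen for illustration, and only the header, counts line, atom block and bond block are emitted; charges, isotopes, stereo bonds and abbreviation labels are ignored.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """Hypothetical recognizer output (not any real tool's API): one atom."""
    symbol: str  # element symbol, e.g. "C", "O"
    x: float     # 2D position in drawing units
    y: float


@dataclass
class Bond:
    """Hypothetical recognizer output (not any real tool's API): one bond."""
    a: int       # 1-based index of the first atom
    b: int       # 1-based index of the second atom
    order: int   # 1 = single, 2 = double, 3 = triple


def to_molfile(name: str, atoms: list[Atom], bonds: list[Bond]) -> str:
    """Serialize a recognized 2D structure as a minimal MDL V2000 molfile."""
    # Header block: molecule name, program line, comment line (last two blank).
    lines = [name, "", ""]
    # Counts line: atom and bond counts, eight zero-valued optional fields,
    # then the "999 V2000" version tag.
    lines.append(f"{len(atoms):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000")
    for atom in atoms:
        # x, y, z in 10.4f columns (z = 0 for a flat drawing), a space, the
        # symbol left-aligned in 3 columns, then the mass difference (2 wide),
        # the charge and ten further atom properties (3 wide each), all zero.
        lines.append(
            f"{atom.x:10.4f}{atom.y:10.4f}{0.0:10.4f} {atom.symbol:<3}"
            + " 0" + "  0" * 11
        )
    for bond in bonds:
        # First atom, second atom, bond order, stereo flag (0 = none).
        lines.append(f"{bond.a:3d}{bond.b:3d}{bond.order:3d}  0")
    lines.append("M  END")
    return "\n".join(lines) + "\n"


# Usage: acetaldehyde, CH3-CH=O, as a recognizer might report it.
atoms = [Atom("C", 0.0, 0.0), Atom("C", 1.299, 0.75), Atom("O", 2.598, 0.0)]
bonds = [Bond(1, 2, 1), Bond(2, 3, 2)]
mol = to_molfile("acetaldehyde", atoms, bonds)
print(mol, end="")
```

The V2000 atom and bond blocks are fixed-width, which is why the sketch relies on f-string width specifiers rather than separators. A file of this shape should be readable by toolkits such as Open Babel, also in the reference list, which is what makes the recognized structure usable for the searches mentioned in the abstract.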

References

  1. Casey R, Boyer S, Healey P, Miller A, Oudot B, Zilles Z (1993) Optical recognition of chemical graphics. In: Proceedings of the international conference on document analysis and recognition, pp 627–632

  2. McDaniel J, Balmuth J (1992) Kekulé: OCR-optical chemical (structure) recognition. J Chem Inf Comput Sci 32:373–378

  3. Contreras M, Allendes C, Alvarez L, Rozas R (1990) Computational perception and recognition of digitized molecular structures. J Chem Inf Comput Sci 30:302–307

  4. Ibison P, Jacquot M, Kam F, Neville A, Simpson R, Tonnelier C, Venczel T, Johnson A (1993) Chemical literature data extraction: the CLiDE project. J Chem Inf Comput Sci 33:338–344

  5. Park J, Rosania G, Shedden K, Nguyen M, Lyu N, Saitou K (2009) Automated extraction of chemical structure information from digital raster images. Chem Cent J 3(1):4

  6. Zimmermann M, Thi L, Hofmann M (2005) Combating illiteracy in chemistry: towards computer-based chemical structure reconstruction. ERCIM News 60:40–41

  7. Zimmermann M (2006) Large scale evaluation of chemical structure recognition. In: Proceedings of the 4th text mining symposium in life sciences

  8. Filippov IV, Nicklaus MC (2009) Optical structure recognition software to recover chemical information: OSRA, an open source solution. J Chem Inf Model 49(3):740–743

  9. Valko AT, Johnson AP (2009) CLiDE Pro: the latest generation of CLiDE, a tool for optical chemical structure recognition. J Chem Inf Model 49(4):780–787

  10. Sadawi NM, Sexton AP, Sorge V (2012) Chemical structure recognition: a rule-based approach. In: Viard-Gaudin C, Zanibbi R (eds) 19th Document recognition and retrieval conference (DRR 2012), SPIE, Bellingham

  11. Cychosz JM (1994) Efficient binary image thinning using neighborhood maps. In: Graphics gems IV. Academic, San Diego, pp 465–473

  12. Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9:62–66

  13. Guo Z, Hall RW (1989) Parallel thinning with two subiteration algorithms. Commun ACM 32(3):359–373

  14. Douglas DH, Peucker TK (1973) Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica 10(2):112–122

  15. Trier ØD, Jain AK, Taxt T (1996) Feature extraction methods for character recognition: a survey. Pattern Recogn 29(4):641–662

  16. Accelrys (2011) CTfile format. http://accelrys.com/products/collaborative-science/biovia-draw/ctfile-no-fee.html

  17. O’Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011) Open Babel: an open chemical toolbox. J Cheminform 3:33

  18. Lupu M, Jiashu Z, Huang J, Gurulingappa H, Filippov I, Tait J (2011) Overview of the TREC 2011 chemical IR track. In: Proceedings of TREC

  19. Piroi F, Lupu M, Hanbury A, Sexton A, Magdy W, Filippov I (2012) CLEF-IP 2012: retrieval experiments in the intellectual property domain. In: Working notes of CLEF

  20. Heifets A, Jurisica I (2012) SCRIPDB: a portal for easy access to syntheses, chemicals and reactions in patents. Nucleic Acids Res 40(D1):D428–D433. doi:10.1093/nar/gkr919

  21. Wendling L, Tabbone S (2003) Recognition of arrows in line drawings based on the aggregation of geometric criteria using the Choquet integral. In: Seventh international conference on document analysis and recognition–ICDAR 2003, Edinburgh, pp 299–303

Author information

Corresponding author

Correspondence to Igor V. Filippov.

Copyright information

© 2017 Springer-Verlag GmbH Germany

About this chapter

Cite this chapter

Filippov, I.V., Lupu, M., Sexton, A.P. (2017). Modern Approaches to Chemical Image Recognition. In: Lupu, M., Mayer, K., Kando, N., Trippe, A. (eds) Current Challenges in Patent Information Retrieval. The Information Retrieval Series, vol 37. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-53817-3_14

  • DOI: https://doi.org/10.1007/978-3-662-53817-3_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-53816-6

  • Online ISBN: 978-3-662-53817-3

  • eBook Packages: Computer Science, Computer Science (R0)
