Catching the Drift – Indexing Implicit Knowledge in Chemical Digital Libraries

  • Conference paper
Theory and Practice of Digital Libraries (TPDL 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7489)

Abstract

In chemistry, information gathering focuses heavily on chemical entities. Because of synonyms and differing entity representations, however, indexing chemical documents is challenging. In drug design the task is even more complex: domain experts in this field are usually interested not in a particular chemical entity, but in representatives of a chemical class that show a specific reaction behavior. The functional groups of an entity are the most informative parts for describing such reaction behavior. The boundaries of each chemical class are related to the entities’ reaction behavior, but they also rest on the chemist’s implicit knowledge. In this paper we present an approach that captures this implicit knowledge by clustering chemical entities based on their functional groups. Since such clusters are generally too unspecific and contain entities from different chemical classes, we further divide them into sub-clusters using fingerprint-based similarity measures. We analyze several uncorrelated fingerprint/similarity measure combinations and show that the entities most similar to a query entity can be found in the respective sub-cluster. Furthermore, we use our approach for document retrieval, introducing a new similarity measure based on Wikipedia categories. Our evaluation shows that the sub-clustering leads to suitable results, enabling sophisticated document retrieval in chemical digital libraries.
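
The two-stage grouping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which fingerprints, similarity coefficients, or clustering algorithm are used. The sketch assumes that each entity already comes with a set of functional-group labels and a fingerprint given as a set of on-bit indices, uses the Tanimoto coefficient as one possible similarity measure, and swaps in a simple greedy leader scheme with a hypothetical threshold for the sub-clustering step. All names and the toy data are made up for illustration.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_functional_groups(entities):
    """Stage 1: group entity names by their exact set of functional groups."""
    clusters = defaultdict(list)
    for name, (groups, _fingerprint) in entities.items():
        clusters[frozenset(groups)].append(name)
    return dict(clusters)


def sub_cluster(names, entities, threshold=0.5):
    """Stage 2 (assumed scheme): greedy leader sub-clustering.

    An entity joins the first sub-cluster whose leader (its first member)
    has a Tanimoto similarity of at least `threshold`; otherwise it starts
    a new sub-cluster.
    """
    subs = []
    for name in names:
        fingerprint = entities[name][1]
        for sub in subs:
            leader_fingerprint = entities[sub[0]][1]
            if tanimoto(fingerprint, leader_fingerprint) >= threshold:
                sub.append(name)
                break
        else:
            subs.append([name])
    return subs


# Hypothetical toy data: name -> (functional groups, fingerprint on-bits).
ENTITIES = {
    "A": ({"hydroxyl", "carboxyl"}, {1, 2, 3, 4}),
    "B": ({"hydroxyl", "carboxyl"}, {1, 2, 3, 5}),
    "C": ({"hydroxyl", "carboxyl"}, {7, 8, 9}),
    "D": ({"amine"}, {1, 2, 3, 4}),
}

if __name__ == "__main__":
    for groups, names in cluster_by_functional_groups(ENTITIES).items():
        print(sorted(groups), sub_cluster(names, ENTITIES))
```

In this toy data, A, B, and C share the same functional groups and therefore land in one cluster, while D shares A's fingerprint but sits in a different cluster because its functional groups differ. Within the first cluster, the fingerprint comparison then separates C from A and B. A query entity would be matched to its functional-group cluster first and compared only against the members of the closest sub-cluster.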

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Köhncke, B., Tönnies, S., Balke, WT. (2012). Catching the Drift – Indexing Implicit Knowledge in Chemical Digital Libraries. In: Zaphiris, P., Buchanan, G., Rasmussen, E., Loizides, F. (eds) Theory and Practice of Digital Libraries. TPDL 2012. Lecture Notes in Computer Science, vol 7489. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33290-6_41

  • DOI: https://doi.org/10.1007/978-3-642-33290-6_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33289-0

  • Online ISBN: 978-3-642-33290-6

  • eBook Packages: Computer Science, Computer Science (R0)
