Similarity Search for Multi-dimensional NMR-Spectra of Natural Products

  • Karina Wolfram
  • Andrea Porzel
  • Alexander Hinneburg
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4213)


Searching and mining nuclear magnetic resonance (NMR)-spectra of naturally occurring products is an important task in the investigation of new, potentially useful chemical compounds. We develop a set-based similarity function, which, however, does not sufficiently capture more abstract aspects of similarity. NMR-spectra are like documents, but consist of continuous multi-dimensional points instead of words. Probabilistic latent semantic indexing (PLSI) is a retrieval method that learns hidden topics. We develop several mappings from continuous NMR-spectra to discrete text-like data. The new mappings introduce redundancies into the discrete data, which proves helpful for the PLSI-model used afterwards. Our experiments show that PLSI, which is designed for text data created by humans, can effectively handle the mapped NMR-data originating from natural products. Additionally, PLSI combined with the new mappings is able to find meaningful "topics" in the NMR-data.
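The abstract does not spell out the mappings, so the following is only a minimal sketch of how a continuous-to-discrete mapping with redundancy could look. It assumes two-dimensional peaks, a regular grid whose cells act as "words", and a second grid shifted by half a cell so that each peak emits more than one token. The function name `peaks_to_words`, the cell sizes, and the shifted-grid scheme are illustrative assumptions, not the authors' actual method.

```python
from collections import Counter
from math import floor


def peaks_to_words(peaks, cell_size=(0.5, 5.0), shifts=((0.0, 0.0), (0.5, 0.5))):
    """Map 2D peaks (x, y) to a bag of discrete grid-cell tokens.

    Hypothetical scheme: every entry in `shifts` defines one grid, offset by
    that fraction of a cell along each axis. Each peak emits one token per
    grid, so neighbouring peaks split by a cell border in one grid can still
    share a token in another grid.
    """
    words = Counter()
    for x, y in peaks:
        for g, (sx, sy) in enumerate(shifts):
            ix = floor(x / cell_size[0] + sx)  # cell index along the first axis
            iy = floor(y / cell_size[1] + sy)  # cell index along the second axis
            words[f"g{g}_{ix}_{iy}"] += 1
    return words


# Two nearby peaks on opposite sides of a cell border in the unshifted grid.
doc_a = peaks_to_words([(7.49, 128.0)])
doc_b = peaks_to_words([(7.51, 128.0)])
shared = set(doc_a) & set(doc_b)  # tokens the two "documents" have in common
print(sorted(doc_a), sorted(doc_b), shared)
```

The resulting token counts have the same shape as a bag-of-words document, so under these assumptions they could be passed to a topic model such as PLSI without further changes. The grid cell size controls how coarse the vocabulary is and would need to be tuned per data set.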


Grid Cell, Similarity Search, Latent Dirichlet Allocation, Text Retrieval, Grid Cell Size



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Karina Wolfram (1)
  • Andrea Porzel (2)
  • Alexander Hinneburg (1)
  1. Institute of Computer Science, Martin-Luther-University of Halle-Wittenberg, Germany
  2. Leibniz Institute of Plant Biochemistry (IPB), Germany
