Similarity Search for Multi-dimensional NMR-Spectra of Natural Products

Wolfram, Karina; Porzel, Andrea; Hinneburg, Alexander

doi:10.1007/11871637_67

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4213)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

Searching and mining nuclear magnetic resonance (NMR)-spectra of naturally occurring products is an important task in the investigation of new, potentially useful chemical compounds. We develop a set-based similarity function, which, however, does not sufficiently capture more abstract aspects of similarity. NMR-spectra are like documents, but consist of continuous multi-dimensional points instead of words. Probabilistic latent semantic indexing (PLSI) is a retrieval method that learns hidden topics. We develop several mappings from continuous NMR-spectra to discrete text-like data. The new mappings introduce redundancies into the discrete data, which proves helpful for the PLSI-model used afterwards. Our experiments show that PLSI, which is designed for text data created by humans, can effectively handle the mapped NMR-data originating from natural products. Additionally, PLSI combined with the new mappings is able to find meaningful "topics" in the NMR-data.
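
The abstract does not spell out the mappings, so the following is only a minimal sketch of one way a continuous two-dimensional peak list could be turned into redundant, text-like tokens. It assumes overlapping grids shifted by half a cell; the function name `peaks_to_words`, the cell sizes, and the token format are all hypothetical and not taken from the paper.

```python
from collections import Counter
from math import floor


def peaks_to_words(peaks, cell=(0.5, 5.0), shifts=(0.0, 0.5)):
    """Map 2-D peaks, e.g. (1H ppm, 13C ppm), to a bag of discrete tokens.

    Assumption: each peak is binned on several grids that are offset
    against each other by a fraction of a cell. One peak therefore yields
    one token per grid, which is the redundancy in this sketch.
    """
    words = Counter()
    for x, y in peaks:
        for k, s in enumerate(shifts):
            ix = floor(x / cell[0] + s)  # cell index along the first axis
            iy = floor(y / cell[1] + s)  # cell index along the second axis
            words[f"g{k}:{ix}:{iy}"] += 1
    return words


# Two nearby peaks that fall on opposite sides of a cell border in grid 0
a = peaks_to_words([(7.49, 129.9)])
b = peaks_to_words([(7.51, 130.1)])
shared = set(a) & set(b)  # tokens the two spectra have in common
```

In this sketch the two peaks get different tokens on the unshifted grid but share a token on the shifted one, so a bag-of-words model such as PLSI could still see them as related. Whether the paper's mappings work this way is not stated in the abstract.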

Author information

Authors and Affiliations

Institute of Computer Science, Martin-Luther-University of Halle-Wittenberg, Germany
Karina Wolfram & Alexander Hinneburg
Leibniz Institute of Plant Biochemistry (IPB), Germany
Andrea Porzel

Editor information

Editors and Affiliations

Knowledge Engineering Group, Technische Universität Darmstadt,
Johannes Fürnkranz
Max Planck Institute for Computer Science, Saarbrücken, Germany
Tobias Scheffer
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany
Myra Spiliopoulou

Cite this paper

Wolfram, K., Porzel, A., Hinneburg, A. (2006). Similarity Search for Multi-dimensional NMR-Spectra of Natural Products. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Knowledge Discovery in Databases: PKDD 2006. PKDD 2006. Lecture Notes in Computer Science, vol 4213. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871637_67

DOI: https://doi.org/10.1007/11871637_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45374-1
Online ISBN: 978-3-540-46048-0
eBook Packages: Computer Science, Computer Science (R0)
