Efficient Transformation of Protein Sequence Databases to Columnar Index Schema

Zoun, Roman; Schallert, Kay; Broneske, David; Trifonova, Ivayla; Chen, Xiao; Heyer, Robert; Benndorf, Dirk; Saake, Gunter

doi:10.1007/978-3-030-27684-3_10


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1062)

Included in the following conference series:

International Conference on Database and Expert Systems Applications


Abstract

Mass spectrometry is used to sequence proteins and extract bio-markers of biological environments. These bio-markers can be used to diagnose thousands of diseases and to optimize biological environments such as bio-gas plants. Indexing the protein sequence data makes it possible to streamline experiments and speed up the analysis. In this work, we present a schema for distributed column-based database management systems that uses a column-oriented index to store sequence data. This raises the problem of how to transform protein sequence data from the standard format into the new schema. We analyze and evaluate four transformation methods. The results show that our proposed extended radix tree performs best in terms of memory consumption and calculation time. Hence, the radix tree proves to be a suitable data structure for transforming protein sequences into the indexed schema.
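The abstract names a radix tree as the structure behind the transformation but does not describe how the authors extend it. As a rough illustration only, the following Python sketch shows a plain (non-extended) radix tree that stores sequences with shared prefixes compactly and then emits one row per distinct sequence. The names `Node`, `insert` and `rows`, the example sequences, and the idea of attaching protein identifiers to each stored sequence are assumptions made for this sketch; they are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Outgoing edges keyed by first character; each value is (edge_label, child).
    edges: dict = field(default_factory=dict)
    # Hypothetical payload: identifiers of the proteins that contain this sequence.
    protein_ids: set = field(default_factory=set)


def common_prefix_len(a: str, b: str) -> int:
    """Return the length of the longest common prefix of a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def insert(root: Node, sequence: str, protein_id: str) -> None:
    """Insert a sequence, splitting an edge where it diverges from stored data."""
    node, rest = root, sequence
    while rest:
        entry = node.edges.get(rest[0])
        if entry is None:
            # No edge starts with this character: attach the remainder as one edge.
            leaf = Node()
            node.edges[rest[0]] = (rest, leaf)
            node, rest = leaf, ""
            break
        label, child = entry
        k = common_prefix_len(label, rest)
        if k < len(label):
            # Only part of the edge matches: split it at the point of divergence.
            mid = Node()
            mid.edges[label[k]] = (label[k:], child)
            node.edges[rest[0]] = (label[:k], mid)
            child = mid
        node, rest = child, rest[k:]
    node.protein_ids.add(protein_id)


def rows(node: Node, prefix: str = ""):
    """Yield (sequence, sorted protein ids) pairs, one per distinct sequence."""
    if node.protein_ids:
        yield prefix, sorted(node.protein_ids)
    for first in sorted(node.edges):
        label, child = node.edges[first]
        yield from rows(child, prefix + label)


if __name__ == "__main__":
    root = Node()
    for seq, pid in [("PEPTIDE", "P1"), ("PEPTIDE", "P2"),
                     ("PEPSIN", "P3"), ("PEP", "P4")]:
        insert(root, seq, pid)
    for seq, ids in rows(root):
        print(seq, ids)
```

In this sketch the shared prefix `PEP` is stored once, and a sequence that occurs in several proteins produces a single row carrying all of their identifiers, which is the kind of deduplication an index-oriented schema benefits from.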

Supported by de.NBI and Bruker Daltonik GmbH.



Acknowledgments

The authors sincerely thank Niya Zoun, Gabriel Campero Durand, Marcus Pinnecke, Sebastian Krieter, Sven Helmer, Sven Brehmer and Andreas Meister for their support and advice. This work is partly funded by the BMBF (Fkz: 031L0103), the European Regional Development Fund (no.: 11.000sz00.00.0 17 114347 0), the DFG (grant no.: SA 465/50-1), and the German Federal Ministry of Food and Agriculture (grant no.: 22404015), and is dedicated to the memory of Mikhail Zoun.

Author information

Authors and Affiliations

University of Magdeburg, Magdeburg, Germany
Roman Zoun, Kay Schallert, David Broneske, Ivayla Trifonova, Xiao Chen, Robert Heyer & Gunter Saake
Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
Dirk Benndorf


Corresponding author

Correspondence to Roman Zoun.

Editor information

Editors and Affiliations

Institute of Telecooperation, Johannes Kepler University of Linz, Linz, Oberösterreich, Austria
Gabriele Anderst-Kotsis
Software Competence Center Hagenberg, Hagenberg, Austria
A Min Tjoa
Institute of Telecooperation, Johannes Kepler University of Linz, Linz, Oberösterreich, Austria
Ismail Khalil
ENSIT, LaTICE, University of Tunis, Tunis, Tunisia
Mourad Elloumi
Software Competence Center, Hagenberg, Austria
Atif Mashkoor
Steyregg, Oberösterreich, Austria
Johannes Sametinger
Edificio 204, ICT Division, TECNALIA, Derio, Vizcaya, Spain
Xabier Larrucea
Top 74, Innsbruck, Tirol, Austria
Anna Fensel
Hagenberg Gmbh, Software Competence Center, Hagenberg im Mühlkreis, Oberösterreich, Austria
Jorge Martinez-Gil
Software Competence Center Hagenberg, SC, Hagenberg im Mühlkreis, Oberösterreich, Austria
Bernhard Moser
University of Twente, Enschede, Overijssel, The Netherlands
Christin Seifert
Bauhaus Universität Weimar, Weimar, Thüringen, Germany
Benno Stein
MiCS, Media Computer Science, University of Passau, Passau, Bayern, Germany
Michael Granitzer


About this paper

Cite this paper

Zoun, R. et al. (2019). Efficient Transformation of Protein Sequence Databases to Columnar Index Schema. In: Anderst-Kotsis, G., et al. Database and Expert Systems Applications. DEXA 2019. Communications in Computer and Information Science, vol 1062. Springer, Cham. https://doi.org/10.1007/978-3-030-27684-3_10


DOI: https://doi.org/10.1007/978-3-030-27684-3_10
Published: 01 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27683-6
Online ISBN: 978-3-030-27684-3
eBook Packages: Computer Science (R0)
