Data Mining in Life Sciences: A Case Study on SAP’s In-Memory Computing Engine

  • Joos-Hendrik Boese
  • Gennadi Rabinovitch
  • Matthias Steinbrecher
  • Miganoush Magarian
  • Massimiliano Marcon
  • Cafer Tosun
  • Vishal Sikka
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 154)

Abstract

While column-oriented in-memory databases have been primarily designed to support fast OLAP queries and business intelligence applications, their analytical performance makes them a promising platform for data mining tasks found in life sciences. One such system is the HANA database, SAP’s in-memory data management solution. In this contribution, we show how HANA meets some inherent requirements of data mining in life sciences. Furthermore, we conducted a case study in the area of proteomics research. As part of this study, we implemented a proteomics analysis pipeline in HANA. We also implemented a flexible data analysis toolbox that can be used by life sciences researchers to easily design and evaluate their analysis models.

Topics

  • Data mining and data analysis in real-time
  • Case studies

Submission type

Industry paper 

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Joos-Hendrik Boese (1)
  • Gennadi Rabinovitch (1)
  • Matthias Steinbrecher (1)
  • Miganoush Magarian (1)
  • Massimiliano Marcon (1)
  • Cafer Tosun (1)
  • Vishal Sikka (2)

  1. SAP Innovation Center, Potsdam, Germany
  2. SAP, Palo Alto, USA
