Abstract
Although column-oriented in-memory databases were designed primarily to support fast OLAP queries and business intelligence applications, their analytical performance makes them a promising platform for data mining tasks in the life sciences. One such system is the HANA database, SAP's in-memory data management solution. In this contribution, we show how HANA meets some inherent requirements of data mining in the life sciences. We also report a case study in proteomics research, for which we implemented a proteomics analysis pipeline in HANA, together with a flexible data analysis toolbox that life sciences researchers can use to design and evaluate their analysis models.
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Boese, JH. et al. (2013). Data Mining in Life Sciences: A Case Study on SAP's In-Memory Computing Engine. In: Castellanos, M., Dayal, U., Rundensteiner, E.A. (eds) Enabling Real-Time Business Intelligence. BIRTE 2012. Lecture Notes in Business Information Processing, vol 154. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39872-8_2
DOI: https://doi.org/10.1007/978-3-642-39872-8_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39871-1
Online ISBN: 978-3-642-39872-8
eBook Packages: Computer Science (R0)