Data Mining in Life Sciences: A Case Study on SAP's In-Memory Computing Engine
While column-oriented in-memory databases have been primarily designed to support fast OLAP queries and business intelligence applications, their analytical performance makes them a promising platform for data mining tasks found in life sciences. One such system is the HANA database, SAP’s in-memory data management solution. In this contribution, we show how HANA meets some inherent requirements of data mining in life sciences. Furthermore, we conducted a case study in the area of proteomics research. As part of this study, we implemented a proteomics analysis pipeline in HANA. We also implemented a flexible data analysis toolbox that can be used by life sciences researchers to easily design and evaluate their analysis models.
Topics: Data mining and data analysis in real-time; Case studies
Submission type: Industry paper