Towards an Integrated Platform for Big Data Analysis

  • Mahdi Bohlouli
  • Frank Schulz
  • Lefteris Angelis
  • David Pahor
  • Ivona Brandic
  • David Atlan
  • Rosemary Tate

Abstract

The amount of data in the world is expanding rapidly. Every day, huge amounts of data are created by scientific experiments, companies, and end users’ activities. These large data sets have been labeled “Big Data”, and their storage, processing, and analysis present a plethora of new challenges to computer science researchers and IT professionals. Beyond efficient data management, further complexity arises from dealing with semi-structured or unstructured data and from time-critical processing requirements. To make sense of these massive amounts of data, advanced visualization and data exploration techniques are required.

Innovative approaches to these challenges have been developed in recent years and will remain a hot topic for research and industry. An investigation of current approaches reveals that they usually address only one or two of these aspects: data management, processing, analysis, or visualization. This paper presents the vision of an integrated platform for big data analysis that combines all of these aspects. The main benefits of this approach are enhanced scalability of the whole platform, better parameterization of algorithms, more efficient use of system resources, and improved usability throughout the end-to-end data analysis process.

Keywords

Scalable Decision Support, Complex Event Processing, Big Data, Cloud Computing

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Mahdi Bohlouli, Institute of Knowledge Based Systems & Knowledge Management, University of Siegen, Siegen, Germany
  • Frank Schulz, SAP Research, Karlsruhe, Germany
  • Lefteris Angelis, Aristotle University, Thessaloniki, Greece
  • David Pahor, Arctur d.o.o., Nova Gorica, Slovenia
  • Ivona Brandic, Technical University of Vienna, Vienna, Austria
  • David Atlan, Phenosystems SA, Bruxelles, Belgium
  • Rosemary Tate, University of Sussex, Brighton, United Kingdom