Cloud Computing and Big Data Analytics: What Is New from Databases Perspective?

  • Conference paper
Big Data Analytics (BDA 2012)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7678))

Abstract

Many industries, such as telecom, health care, retail, pharmaceuticals, and financial services, generate large amounts of data. Gaining critical business insights by querying and analyzing such massive amounts of data has become a pressing need. Data warehouses and the solutions built around them cannot provide reasonable response times as data volumes expand: one can either run analytics over large volumes once every few days, or run transactions over small amounts of data in seconds. The new requirements call for real-time or near real-time responses over huge amounts of data. In this paper we outline the challenges in analyzing big data, both data at rest and data in motion. For big data at rest we describe two kinds of systems: (1) NoSQL systems for interactive data-serving environments; and (2) systems for large-scale analytics based on the MapReduce paradigm, such as Hadoop. NoSQL systems are designed around a simpler key-value data model with built-in sharding; hence, they work seamlessly in a distributed, cloud-based environment. In contrast, Hadoop-based systems are suited to long-running decision-support and analytical queries that consume, and possibly produce, bulk data. For processing data in motion, we present use cases and illustrative algorithms for data stream management systems (DSMS). We also illustrate applications that can use these two kinds of systems to quickly process massive amounts of data.
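The contrast the abstract draws between key-value serving and MapReduce-style batch analytics can be made concrete with a small sketch. The Python code below is illustrative only and is not taken from the paper: `ShardedKVStore`, `map_reduce`, and the call-record data are hypothetical names and values, the shards are plain in-memory dictionaries rather than separate machines, and the map-reduce function runs in a single process.

```python
import hashlib
from collections import defaultdict


class ShardedKVStore:
    """Toy key-value store that hash-partitions keys across in-memory shards.

    Assumption: each dict stands in for one node of a distributed store.
    """

    def __init__(self, num_shards=4):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # A stable hash of the key decides which shard owns it.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key, default=None):
        # A point lookup touches exactly one shard.
        return self.shards[self._shard_for(key)].get(key, default)


def map_reduce(records, mapper, reducer):
    """Single-process sketch of the map, shuffle, and reduce phases."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: values are grouped by key.
            groups[key].append(value)
    # Reduce phase: each key's values are combined into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


if __name__ == "__main__":
    # Interactive serving: a single-key write followed by a single-key read.
    store = ShardedKVStore(num_shards=3)
    store.put("subscriber:42", {"plan": "prepaid"})
    print(store.get("subscriber:42"))

    # Batch analytics: total call duration per caller over all records.
    calls = [("alice", 30), ("bob", 10), ("alice", 5)]
    totals = map_reduce(
        calls,
        mapper=lambda call: [(call[0], call[1])],
        reducer=lambda caller, durations: sum(durations),
    )
    print(totals)
```

The point lookup touches a single shard, which is what makes interactive response times feasible, whereas the aggregation scans every record, which is why such jobs suit batch execution.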

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gupta, R., Gupta, H., Mohania, M. (2012). Cloud Computing and Big Data Analytics: What Is New from Databases Perspective?. In: Srinivasa, S., Bhatnagar, V. (eds) Big Data Analytics. BDA 2012. Lecture Notes in Computer Science, vol 7678. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35542-4_5

  • DOI: https://doi.org/10.1007/978-3-642-35542-4_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35541-7

  • Online ISBN: 978-3-642-35542-4

  • eBook Packages: Computer Science, Computer Science (R0)