Data-Intensive Cloud Computing: Requirements, Expectations, Challenges, and Solutions

Journal of Grid Computing

Abstract

Data-intensive systems encompass terabytes to petabytes of data. Such systems require massive storage and intensive computational power in order to execute complex queries and generate timely results. Further, the rate at which this data is generated poses substantial challenges for data storage, linking, and processing. A data-intensive cloud provides an abstraction of high availability, usability, and efficiency to users. However, beneath this abstraction lie stringent requirements and challenges in delivering scalable and resourceful services through effective physical infrastructure, smart networking solutions, intelligent software tools, and useful software approaches. This paper analyzes the extensive requirements of data-intensive clouds, describes the challenges related to the paradigm, and assesses numerous solutions for meeting these requirements and challenges. It provides a detailed study of the solutions and analyzes their capabilities in meeting the emerging needs of widespread applications.

Author information

Corresponding author

Correspondence to Jawwad Shamsi.

About this article

Cite this article

Shamsi, J., Khojaye, M.A. & Qasmi, M.A. Data-Intensive Cloud Computing: Requirements, Expectations, Challenges, and Solutions. J Grid Computing 11, 281–310 (2013). https://doi.org/10.1007/s10723-013-9255-6

  • DOI: https://doi.org/10.1007/s10723-013-9255-6
