Journal of Grid Computing, Volume 11, Issue 2, pp 281–310

Data-Intensive Cloud Computing: Requirements, Expectations, Challenges, and Solutions

  • Jawwad Shamsi
  • Muhammad Ali Khojaye
  • Mohammad Ali Qasmi

Abstract

Data-intensive systems encompass terabytes to petabytes of data. Such systems require massive storage and intensive computational power in order to execute complex queries and generate timely results. Further, the rate at which this data is being generated creates extensive challenges for data storage, linking, and processing. A data-intensive cloud provides an abstraction of high availability, usability, and efficiency to users. However, underlying this abstraction, there are stringent requirements and challenges in facilitating scalable and resourceful services through effective physical infrastructure, smart networking solutions, intelligent software tools, and useful software approaches. This paper analyzes the extensive requirements which exist in data-intensive clouds, describes various challenges related to the paradigm, and assesses numerous solutions in meeting these requirements and challenges. It provides a detailed study of the solutions and analyzes their capabilities in meeting the emerging needs of widespread applications.

Keywords

Data-intensive cloud computing; Scalability; Fault tolerance; Heterogeneity; Large scale data management; Cloud data storage

Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  • Jawwad Shamsi (1)
  • Muhammad Ali Khojaye (1)
  • Mohammad Ali Qasmi (1)
  1. Systems Research Laboratory, FAST-National University of Computer and Emerging Sciences, Karachi, Pakistan