Performance analysis of data intensive cloud systems based on data management and replication: a survey

Distributed and Parallel Databases

Abstract

As we delve deeper into the ‘Digital Age’, we witness explosive growth in the volume, velocity, and variety of the data available on the Internet. For example, in 2012 about 2.5 quintillion bytes of data were created daily, originating from a myriad of sources and applications, including mobile devices, sensors, individual archives, social networks, the Internet of Things, enterprises, cameras, and software logs. Such a ‘Data Explosion’ has led to one of the most challenging research issues of the current Information and Communication Technology era: how to optimally manage (e.g., store, replicate, and filter) such large amounts of data and identify new ways to analyze them for unlocking information. It is clear that such large data streams cannot be managed by setting up on-premises enterprise database systems, as doing so incurs a large up-front cost in buying and administering the hardware and software systems. Therefore, next-generation data management systems must be deployed on the cloud. The cloud computing paradigm provides scalable and elastic resources, such as data and services accessible over the Internet. Every Cloud Service Provider must assure that data are efficiently processed and distributed in a way that does not compromise end-users’ Quality of Service (QoS) in terms of data availability, data search delay, data analysis delay, and the like. From this perspective, data replication is used in the cloud to improve the performance (e.g., read and write delay) of applications that access data. Through replication, a data-intensive application or system can achieve high availability, better fault tolerance, and data recovery. In this paper, we survey data management and replication approaches (from 2007 to 2011) that were developed by both industrial and research communities.
The focus of the survey is to discuss and characterize the existing data replication and management approaches that tackle resource usage and QoS provisioning with different levels of efficiency. Moreover, we discuss how the two influential aspects (data replication and data management) each contribute to providing different QoS attributes. Furthermore, the performance advantages and disadvantages of data replication and management approaches in cloud computing environments are analyzed. Open issues and future challenges related to data consistency, scalability, load balancing, processing, and placement are also reported.

Acknowledgments

The authors are thankful to Kashif Bilal and Osman Khalid for the valuable reviews, suggestions, and comments.

Author information

Corresponding author

Correspondence to Saif Ur Rehman Malik.

About this article

Cite this article

Malik, S.U.R., Khan, S.U., Ewen, S.J. et al. Performance analysis of data intensive cloud systems based on data management and replication: a survey. Distrib Parallel Databases 34, 179–215 (2016). https://doi.org/10.1007/s10619-015-7173-2

  • DOI: https://doi.org/10.1007/s10619-015-7173-2
