Performance analysis of data intensive cloud systems based on data management and replication: a survey

Malik, Saif Ur Rehman; Khan, Samee U.; Ewen, Sam J.; Tziritas, Nikos; Kolodziej, Joanna; Zomaya, Albert Y.; Madani, Sajjad A.; Min-Allah, Nasro; Wang, Lizhe; Xu, Cheng-Zhong; Malluhi, Qutaibah Marwan; Pecero, Johnatan E.; Balaji, Pavan; Vishnu, Abhinav; Ranjan, Rajiv; Zeadally, Sherali; Li, Hongxiang

doi:10.1007/s10619-015-7173-2

Performance analysis of data intensive cloud systems based on data management and replication: a survey

Published: 14 March 2015

Volume 34, pages 179–215, (2016)
Cite this article

Distributed and Parallel Databases Aims and scope Submit manuscript

Saif Ur Rehman Malik¹,
Samee U. Khan²,
Sam J. Ewen²,
Nikos Tziritas³,
Joanna Kolodziej⁴,
Albert Y. Zomaya⁵,
Sajjad A. Madani¹,
Nasro Min-Allah⁶,
Lizhe Wang⁷,
Cheng-Zhong Xu⁸,
Qutaibah Marwan Malluhi⁹,
Johnatan E. Pecero¹⁰,
Pavan Balaji¹¹,
Abhinav Vishnu¹²,
Rajiv Ranjan¹³,
Sherali Zeadally¹⁴ &
…
Hongxiang Li¹⁵

1882 Accesses
39 Citations
Explore all metrics

Abstract

As we delve deeper into the ‘Digital Age’, we witness an explosive growth in the volume, velocity, and variety of the data available on the Internet. For example, in 2012 about 2.5 quintillion bytes of data was created on a daily basis that originated from myriad of sources and applications including mobile devices, sensors, individual archives, social networks, Internet of Things, enterprises, cameras, software logs, etc. Such ‘Data Explosions’ has led to one of the most challenging research issues of the current Information and Communication Technology era: how to optimally manage (e.g., store, replicated, filter, and the like) such large amount of data and identify new ways to analyze large amounts of data for unlocking information. It is clear that such large data streams cannot be managed by setting up on-premises enterprise database systems as it leads to a large up-front cost in buying and administering the hardware and software systems. Therefore, next generation data management systems must be deployed on cloud. The cloud computing paradigm provides scalable and elastic resources, such as data and services accessible over the Internet Every Cloud Service Provider must assure that data is efficiently processed and distributed in a way that does not compromise end-users’ Quality of Service (QoS) in terms of data availability, data search delay, data analysis delay, and the like. In the aforementioned perspective, data replication is used in the cloud for improving the performance (e.g., read and write delay) of applications that access data. Through replication a data intensive application or system can achieve high availability, better fault tolerance, and data recovery. In this paper, we survey data management and replication approaches (from 2007 to 2011) that are developed by both industrial and research communities. The focus of the survey is to discuss and characterize the existing approaches of data replication and management that tackle the resource usage and QoS provisioning with different levels of efficiencies. Moreover, the breakdown of both influential expressions (data replication and management) to provide different QoS attributes is deliberated. Furthermore, the performance advantages and disadvantages of data replication and management approaches in the cloud computing environments are analyzed. Open issues and future challenges related to data consistency, scalability, load balancing, processing and placement are also reported.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A survey on security challenges in cloud computing: issues, threats, and solutions

Article 28 February 2020

Energy efficiency in cloud computing data centers: a survey on software technologies

Article 30 August 2022

Big data analytics on Apache Spark

Article 13 October 2016

References

Mell, P., Grance, T.: Definition of cloud computing. Technical report, National Institute of Standard and Technology (NIST) (2009)
Bell, G., Gray, J., Szalay, A.: Petascale computational systems. IEEE Comp. 39(1), 110–112 (2006)
Article Google Scholar
Lamanna, M.: High-energy physics applications on the grid. In: Wang, Lizhe, Jie, Wei, Chen, Jinjun (eds.) Grid Computing: Infrastructure, Service, and Applications, pp. 433–458. CRC Press, Boca Raton (2009)
Chapter Google Scholar
Khatib, Y., Edwards, C.: A Survey-Based Study of Grid Traffic. In: Proceedings of GridNets, pp. 41–48 (2007)
Gartner: Gartner top ten disruptive technologies for 2008 to 2012. Emerging trends and technologies roadshow http://www.gartner.com/it/page.jsp?id=681107, Accessed (2011)
Abadi, D.: Data management in the cloud: limitations and opportunities. IEEE Data Eng. Bull. 32(1), 3–12 (2009)
Google Scholar
Leinwand, A.: The Hidden Cost of the cloud: Bandwidth Charges, GIGAom, Jul. 17 2009, http://gigaom.com/2009/07/17/the-hidden-cost-of-the-cloud-bandwidth-charges/, Accessed May 12 (2011)
Sakr, S., Liu, A., Batista, D., Alomari, M.: A survey of large scale data management approaches in cloud environments. IEEE Commun. Survey Tutor. 09, 1–26 (2011)
Google Scholar
Cassandra: Available at http://incubator.apache.org/cassandra/, Accessed (2011)
Thusoo, A., Sarma, J., Jain, N., Shao, Z., Chakka, P., Anthony, S., Liu, H., Wyckoff, P., Murthy, R.: Hive-A warehousing solution over a MapReduce framework. In VLDB, pp. 1626–1629 (2009)
HBase: Available at http://hadoop.apache.org/hbase/, Accessed (2011)
Loukopoulos, Thanasis, Ahmad, Ishfaq, Papadias, Dimitris: An overview of data replication on the internet. In: Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN.02), pp. 27–32 (2002)
Kia, H.S., Khan, S.U.: Server replication in multicast networks. In: 10th IEEE International Conference on Frontiers of Information Technology (FIT), Islamabad, Pakistan, pp. 337–341 (2012)
Khan, S.U., Ahmad, I.: A pure Nash equilibrium based game theoretical method for data replication across multiple servers. IEEE Trans. Knowl. Data Eng. 21(4), 537–553 (2009)
Article MathSciNet Google Scholar
Khan, S.U.: A frugal auction technique for data replication in large distributed computing systems. In: International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), Las Vegas, NV, USA, pp. 17–23 (2009)
Khan, S.U., Ardil, C.: A competitive replica placement methodology for Ad Hoc networks. In: International Conference on Parallel and Distributed Computing Systems (ICPDCS), Oslo, Norway, pp. 128–133 (2009)
Khan, S.U., Ahmad, I.: Comparison and analysis of ten static heuristics-based internet data replication techniques. J. Parallel Distrib. Comput. 68(2), 113–136 (2008)
Article MATH Google Scholar
Khan, S.U., Maciejewski, A.A., Siegel, H.J., Ahmad, I.: A game theoretical data replication technique for mobile Ad Hoc networks. In: 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS). Miami (2008)
Khan, S.U., Ahmad, I.: A pure Nash equilibrium guaranteeing game theoretical replica allocation method for reducing web access time. In: 12th International Conference on Parallel and Distributed Systems (ICPADS), Minneapolis pp. 169–176 (2006)
Khan, S.U., Ahmad, I.: Game theoretical solutions for data replication in distributed computing systems. In: Rajasekaran, S., Reif, J. (eds.), Handbook of Parallel Computing: Models, Algorithms, and Applications. Chapman & Hall/CRC Press, Boca Raton (2007). ISBN 1-584-88623-4, Chapter 45
Khan, S.U., Ahmad, I.: Data replication in large distributed computing systems using supergames. In: International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), Las Vegas, pp. 38–44 (2006)
Khan, S.U., Ardil, C.: A frugal bidding procedure for replicating WWW content. Int. J. Inform. Technol. 5(1), 67–80 (2009)
Google Scholar
Khan, S.U., Maciejewski, A.A., Siegel, H.J.: Robust CDN replica placement techniques. In: 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS). Italy, Rome (2009)
Khan, S.U., Ardil, C.: A fast replica placement methodology for large-scale distributed computing systems. In: International Conference on Parallel and Distributed Computing Systems (ICPDCS), Oslo, pp. 121–127 (2009)
Wu, Y., Li, G., Wang, L., Ma, Y., Kolodziej, J., Khan, S.U.: A review of data intensive computing. In: 12th International Conference on Scalable Computing and Communications (ScalCom), Changzhou, (2012)
Khan, S.U., Ahmad, I.: A cooperative game theoretical replica placement technique. In: 13th International Conference on Parallel and Distributed Systems (ICPADS), Hsinchu, (2007)
Khan, S.U., Ahmad, I.: Replicating data objects in large-scale distributed computing systems using extended Vickery auction. Int. J. Comput. Intell. 3(1), 14–22 (2006)
Google Scholar
Gao, Aiqiang, Diao, Luhong: Lazy update propagation for data replication in cloud computing. In: 5th International Conference on Pervasive Computing and Applications (ICPCA), pp. 250–254 (2010)
Ikeda, Takahiko, Ohara, Mamoru, Fukumoto, Satoshi, Arai, Masayuki, Iwasaki, Kazuhiko: A distributed data replication protocol for file versioning with optimal node assignments. In: Proceedings of IEEE International Pacific Rim International Symposium on Dependable Computing 2010, pp. 117–125 (2011)
Khan, S.U., Ahmad, I.: Discriminatory algorithmic mechanism design based WWW content replication. Informatica 31(1), 105–119 (2007)
MathSciNet Google Scholar
Khan, S.U., Ahmad, I.: A semi-distributed axiomatic game theoretical mechanism for replicating data objects in large distributed computing systems. In: 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS). Long Beach (2007)
Kohavi, R., Henne, R.M., Sommerfield, D.: Practical guide to controlled experiments on the Web: Listen to your customers not to the HiPPO. In: Proceedings of ACM International Conference on Knowledge Discovery and Data Mining (KDD 2007), pp. 959–967
Gulati, A., Merchant, A., Varman, P.: pClock: An arrival curve based approach for QoS in shared storage systems. In: Proceedings ACM International Conference on Measurement and Modeling of Computer System (SIGMETRICS), (2007)
Gulati, A., Merchant, A., Varman, P.: mClock: Handling throughput variability for hypervisor IO scheduling. In: Proceedings of the 9th OSDI, (2010)
Wang, J., Varmany, P., Xie, C.: Avoiding performance fluctuation in cloud storage. In: Proceeding of High Performance Computing (HiPC), pp. 1–9 (2010)
Goiri, I., Julia, F., Fito, J., Macias, M., Guitart, J.: Resource-level QoS metric for CPU-based guarantees in Cloud providers. In: 7th international workshop on economics of grids, Clouds, systems, and services, pp. 34–47 (2010)
Amrhein, D., Anderson, P., de Andrade, A., Armstrong, J., Arasan, E., Bartlett, J., Bruklis, R., Cameron, K., Cohen, R., Crawford, T. M., Deolaliker, V., Easton, A., Flores, R., Fourcade, G.: Review and summary of cloud service level agreements. http://public.dhe.ibm.com/software/dw/cloud/library/cl-rev2sla-pdf.pdf
Kliazovich, D., Bouvry, P., Khan, S.U.: Simulation and Performance Analysis of Data Intensive and Workload Intensive Cloud Computing Data Centers. In: Kachris, C., Bergman, K., Tomkos, I. (eds.) Optical Interconnects for Future Data Center Networks.Springer, New York, USA, ISBN: 978-1-4614-4629-3, Chapter 4
Goel, S., Buyya, R.: Data Replication Strategies in Wide Area Distributed Systems. Enterprise Service Computing: From Concept to Deployment, Robin G. Qiu (ed), pp. 211–241, ISBN 1-599044181-2, Idea Group Inc., Hershey (2006)
Pallickara, S.L., Pallickara, S., Pierce, M.: Scientific Data Management in the Cloud: A Survey of Technologies, Approaches and Challenges. Chapter 22: pp. 517–534, Handbook of Cloud Computing. Springer. ISBN: 978-1-4419-6523-3 (2010)
Ramakrishnan, R.: Data Management in the Cloud. In: Proceedings of IEEE 25th International Conference on Data Engineering(ICDE ’09), pp. 5–5 (2009)
Gonzalez, L., Merino, L., Caceres, J., Lindner, M.: A break in the clouds: towards a cloud definition. Comp. Commun. Rev. 39(1), 50–55 (2009)
Google Scholar
Plummer, D., Bittman, T., Austin, T., Cearley, D., Smith, D.: Cloud Computing: Defining and Describing an Emerging Phenomenon. Technical report, Gartner (2008)
Staten, J., Yates, S., Gillett, F., Saleh, W., Dines, R.: Is cloud computing ready for the enterprise?. Technical Report, Forrester Research (2008)
Bojanova, I., Samba, A.: Analysis of cloud computing delivery architecture models. In: IEEE Workshops of International Conference on Advanced Information Networking and Applications (WAINA), Biopolis, pp. 453–458 (2011)
Abouzeid, A., Bajda-Pawlikowski, K., Abadi, D., Rasin, A., Silberschatz, A.: Hadoopdb: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. Publ. Very Large Database (PVLDB) 2(1), 922–933 (2009)
Google Scholar
Cooper, B., Baldeschwieler, E., Fonseca, R., Kistler, J., Narayan, P., Neerdaels, C., Negrin, T., Ramakrishnan, R., Silberstein, A., Srivastava, U., Stata, R.: Building a cloud for Yahoo!. IEEE Data Eng. Bull. 32(1), 36–43 (2009)
Google Scholar
Pfleeger, C.P., Pfleeger, S.L.: Security in Computing, 4th edn. Prentice Hall PTR, Upper Saddle River (2006)
MATH Google Scholar
Chen, Y., Paxson, V., Katz, R.H.: What’s New about cloud Computing Security?, Technical Report UCB/EECS-2010-5, EECS Department, University of California, Berkeley (2010)
Ristenpart et al.: Hey, you, get off of my cloud! Exploring information leakage in third- party compute clouds. In: Proceedings of the 16th ACM Conference on Computer and Communication Security (CCS-09), pp. 199–212. ACM Press (2009)
Habib, S.M., Ries, S., Muhlhauser, M.: Cloud Computing Landscape and Research Challenges regarding Trust and Reputation. In: 7th International Conference on Ubiquitous Intelligence & Computing and 7th International Conference on Autonomic & Trusted Computing (UIC/ATC), 2010, pp. 410–415 (2010)
Person, S.: Taking account of privacy when designing cloud computing services, Technical Report, HPL-2009-54, HP Laboratories (2009)
Everett, C.: Cloud computing: a question of trust. Comput. Fraud Security 2009(6), 5–7 (2009)
Article Google Scholar
Dillon, T.S., Wu, C., Chang, E.: Cloud computing: issues and challenges, In: Proceedings of 24th IEEE International Conference on Advanced Information Networking and Applications (AINA-2010), pp. 27–33 (2010)
Mouline, I.: Why assumptions about cloud performance can be dangerous to your business. J. Cloud Comput. 2(3), 24–28 (2009)
Google Scholar
Goel, S., Buyya, R.: Data replication strategies in wide area distributed systems. In: Qiu, Robin G. (ed.) Enterprise Service Computing: From Concept to Deployment, pp. 211–241, ISBN 1-599044181-2, Idea Group Inc., Hershey, PA, USA (2006)
Ghemawat, S., Gobioff, H., Leung, S.-T.: The Google file system. In: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (Bolton Landing, NY, USA, 2003). SOSP ’03. pp. 29–43 (2003)
Amazon.com: Amazon simple storage service (Amazon S3), http://aws.amazon.com/s3, Accessed on 2011
Gray, J., Helland, P., O’Neil, P., Shasha, D.: The danger of replication and a solution. In: Proceedings of International Conference on Management of Data ACM SIGMOD, Montreal, pp. 173–182 (1996)
Loukopoulos, T., Ahmad, I.: Static and adaptive distributed data replication using genetic algorithms. J. Parallel Distrib. Comput. 64(11), 1270–1285 (2004)
Article MATH Google Scholar
Ullah Khan, Samee, Ahmad, Ishfaq: A pure Nash equilibrium-based game theoretical method for data replication across multiple servers. IEEE Trans. Knowl. Data Eng. 21(4), 537-553 (2009)
Wei, Q., Veeravalli, B., Gong, B., Zeng, L., Feng, D.: CDRM: A cost-effective dynamic replication management scheme for cloud storage cluster. In: IEEE International Conference on Cluster Computing 2010, pp. 188–197 (2010)
Kangasharju, J., Roberts, J., Ross, K.: Object replication strategies in content distribution networks. In: Proceedings of Sixth International Workshop on Web Caching and Content Distribution (WCW ’01), pp. 455–456 (2001)
Dowdy, L., Foster, D.: Comparative models of the file assignment problem. ACM Comput. Surveys 14(2), 287–313 (1982)
Article Google Scholar
Khan, S., Ahmad, I.: Heuristic-based replication schemas for fast information retrieval over the internet. In: Proceedings of 17th International Conference on Parallel and Distributed Computing Systems (PDCS ’04), pp. 278–283 (2004)
Li, B., Golin, M., Italiano, G., Deng, X.: On the optimal placement of Web Proxies in the internet. Proc. IEEE INFOCOM ’00 1(1), 1282–1290 (2000)
Google Scholar
Qiu, L., Padmanabhan, V., Voelker, G.: On the placement of web server replicas. Proc. IEEE INFOCOM ’01 1(2), 1587–1596 (2000)
Google Scholar
Loukopoulos, T., Lampsas, P., Ahmad, I.: Continuous replica placement schemes in distributed systems. In: International Conference on Supercomputing (ICS’05) Boston, June 20–22
Chu, W.W.: Optimal file allocation in a multiple-computer information system. IEEE Trans. Comput. C–18, 885–889 (1969)
Article MATH Google Scholar
Chu, W.W.: Optimal file allocation in a computer network. In: Abramson, N., Kuo, F.F. (eds.) Computer-Communication Networks, pp. 83–94. Prentice-Hall, Englewood Cliffs (1973)
Google Scholar
Casey, R.G.: Allocation of copies of files in an information network. In: Proceedings of AFZPS 1972 SJCC, vol. 40, pp. 617–625. AFIPS Press (1972)
Eswaran, K.P.: Placement of records in a file and file allocation in a computer network. In: Proceedings of the ZFZP Congress on Information Processing 1974, pp. 304–307. North-Holland, Amsterdam (1974)
Mahmoud, S., Riordon, J.S.: Optimal allocation of resources in distributed information networks. ACM Trans. Database Syst. 1(1), 66–78 (1976)
Article Google Scholar
Ramamoorthy, C.V., Wah, B.W.: The placement of relations on a distributed relational database. In: Proceedings of the 1st International Conference on Distributed Computing Systems (Huntsville, Ala., Oct. 1979). IEEE, New York, pp. 642–650 (1979)
Wah, B.W., Lien, Y.-N.: Design of distributed databases on local computer systems with a multiaccess network. IEEE Trans. Softw. Eng. SE–11(7), 606–619 (1985)
Article Google Scholar
Wang, F., Oral, S., Shipman, G., Drokin, O., Wang, T., Huang, I.: Understanding lustre filesystem internals. Technical Report ORNL/TM-2009/117, Oak Ridge National Lab., National Center for Computational Sciences (2009)
Cloudstore (kosmosfs), http://code.google.com/p/kosmosfs/. Accessed 12 June 2012
Haddad, I.F.: PVFS: A parallel virtual file system for linux clusters. In: 4th Annual Linux Showcase and Conference, pp. 317–328. Atlanta (2000)
Huang, H., Hung, W., Shin, K.G.: FS2: dynamic data replication in free disk space for improving disk performance and energy consumption. In: Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP 2005), (2005)
Bonvin, N., Papaioannou, T.G., Aberer, K.: A self-organized, fault tolerant and scalable replication scheme for cloud storage. In: Proceedings of the Symposium on Cloud Computing, pp. 205–216. Indianapolis, USA (2010)
Decandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., Vogels, W.: Dynamo: amazon’s highly available key-value store. In: Proceedings of ACM Symposium on Operating Systems Principles, pp. 205–220. New York (2007)
Silvestre, G., Monnet, S., Krishnaswamy, R., Sens, P.: AREN: A Popularity aware replication scheme for cloud storage. In: IEEE International Conference on Parallel and Distributed Systems (ICPADS), pp. 189–196 (2012)
Ye, Y., Xiao, L., Yen, I., Bastani, F.B.: Cloud storage design based on hybrid of replication and data partitioning. In: Proceedings of IEEE Sixteenth International Conference on Parallel and Distributed Systems (ICPADS), pp. 415\(\sim \)422. (2010)
Ye, Y., Yen, I., Xiao, L., Bastani, F.: Secure. Dependable and high performance cloud storage. Technical Report: UTDCS-10-10
Gupta, A., Liskov, B., Rodrigues, R.: One Hop lookups for peer-to-peer overlays. In: Proceedings of the Hot Topics in Operating Systems, Hawaii (2003)
Wang, F., Qiu, J., Yang, J., Dong, B., Li, X., Li, Ying: Hadoop high availability through metadata replication. In: Proceeding of the first international workshop on cloud data management, pp. 37–44 (2009)
Skeen, D., Stonebraker, M.: A formal model of crash recovery in a distributed system. IEEE Trans. Softw. Eng. 9(3), 219–228 (1983)
Article Google Scholar
Suresh, A.: HadoopT: Breaking the Scalability Limits of Hadoop. Diss, Rochester Institute of Technology, Rochester (2011)
Google Scholar
Bessani, A., Correia, M., Quaresma, B., Andr’e, F., Sousa, P.: DepSky: Dependable and secure storage in a cloud-of-clouds. In: Proceedings of the European Conference on Computer Systems (EuroSys), pp. 31–46 (2011)
Francisco, R., Correia, M.: Lucy in the sky without diamonds: Stealing confidential data in the cloud. In: IEEE/IFIP 41st International Conference on Dependable Systems and Networks Workshops (DSN-W) (2011)
Tsai, W., Zhong, P., Elston, J., Bai, X., Chen, Y.: Service replication with MapReduce in clouds. In: Tenth International Symposium on Autonomous Decentralized Systems, pp. 381–388 (2011)
Cecchet, E., Singh, R., Sharma, U., Shenoy, P.: Dolly: virtualization-driven database provisioning for the cloud. In: Proceedings of the 7th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pp. 51–62 (2011)
Twin Peaks Software Inc. http://www.TwinPeakSoft.com. Accessed 04 May 2012
Twin Peaks Software Inc., Mirror File System for Cloud Computing. U.S Patent number: 7418439
MFS presentation at usenix.org fast 08, http://www.usenix.org/events/fast08/wips_posters/slides/wong.pdf
Curino, C., Jones, E., Zhang, Y., Madden, S.: Schism: a workload-driven approach to database replication and partitioning. In: VLDB, pp. 48–57 (2010)
Armbrust, M., Fox, A., Rean, G., Joseph, A., Katz, R., Konwinski, A., Gunho, L., David, P., Rabkin, A., Stoica, I., Zaharia, M.: Above the clouds: A Berkeley View of cloud Computing. Tech. Rep. UCB/EECS-2009-28, EECS Department, U.C. Berkeley (2009)
Khan, S.U., Min-Allah, N.: A goal programming based energy efficient resource allocation in data centers. J. Supercomput. 61(3), 502–519 (2012)
Article Google Scholar
Zhang, X., Ai, J., Wang, Z., Lu, J., Meng, X.: An efficient multi-dimensional index for cloud data management. In: Proceedings of cloudDB’2009, pp. 17–24
Sellis, T.K., Roussopoulos, N., Faloutsos, C.: The R -tree: a dynamic index for multi-dimensional objects. VLDB J., pp. 507–518 (1987)
Garcia-Molina, H., Ullman, J.D., Widom, J.: Database System Implementation. Prentice Hall Inc, Upper Saddle River (1999)
Google Scholar
Haojun, L., Han, J., Fang, J.: Multi-Dimensional index on Hadoop Distributed File System. In: IEEE 5th International Conference on Networking, Architecture and Storage (NAS) (2010)
Tiwari, R.G., Navathe, S.B., Kulkarni, G. J.: Towards transactional data management over the cloud. In: proceedings of Second International Symposium on Data, Privacy, and E-Commerce, pp. 100–107 (2010)
Chang, F., Dean, J., Ghemawat, S., Hsieh, W., Wallach, D., Burrows, M., Chandra, T., Fikes, A., Gruber, R.: Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst. 26(2), 4 (2008)
Burrows, M.: The chubby lock service for loosely-coupled distributed systems. In: Operating systems design and implementation, pp. 335–350 (2006)
Cooper, B., Ramakrishnan, R., Srivastava, U., Silberstein, A., Bohannon, P., Jacobsen, H., Puz, N., Weaver, D., Yerneni, R.: Pnuts: Yahoo!’s hosted data serving platform. Publ. Very Large Database (PVLDB) 1(2), 1277–1288 (2008)
Google Scholar
Simmhan, Y., Barga, R., van Ingen, C., Lazowska, E., Szalay, A.: Building the Trident Scientific Workflow Workbench for Data Management in the cloud. In: Third International Conference on Advanced Engineering Computing and Applications in Sciences, 2009. ADVCOMP ’09, pp. 41–50 (2009)
Hey, T., Trefethen, A.: The Data Deluge: An e-Science Perspective, in Grid Computing: Making the Global Infrastructure a Reality. Wiley, Chichester (2003)
Google Scholar
Barnes, C.R., Bornhold, B.D., Juniper, S.K., Pirenne, B., Phibbs, P.: The NEPTUNE Project–a cabled ocean observatory in the NE Pacific: Overview, challenges and scientific objectives for the installation and operation of Stage I in Canadian waters. In: Symposium on Underwater Technology and Workshop on Scientific Use of Submarine Cables and Related Technologies, pp. 308–313 (2007)
Isard, M., Budiu, M., Yu, Y., Birrell, A., Fetterly, D.: Dryad: distributed data-parallel programs from sequential building blocks. In: European Professional Society for Systems (EuroSys), pp. 59–72 (2007)
Yu, Y., Isard, M., Fetterly, D., Budiu, M., Erlingsson, U., Gunda, P., Currey, J.: Dryad linq: A system for general-purpose distributed dataparallel computing using a high-level language. In: OSDI, pp. 1–14 (2008)
Das, S., Agrawal, D., Abbadi, A.E.: Elastras: An elastic transactional data store in the cloud. In: Workshop on Hot Topics in Cloud Computing (2009)
Gray, J., Reuter, A.: Transaction Processing: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco (1992)
MATH Google Scholar
Weikum, G., Vossen, G.: Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery. Morgan Kaufmann Publishers Inc., San Francisco (2001)
Google Scholar
Aguilera, M.K., Merchant, A., Shah, M., Veitch, A., Karamanolis, C.: Sinfonia, A new paradigm for building scalable distributed systems. In: SOSP, pp. 159–174 (2007)
Hsieh, M., Chang, C., Ho, L.Y., Wu, J., Liu, P.: SQLMR : A scalable database management system for cloud computing. In: Proceedings of International Conference on Parallel Processing, pp. 315–324 (2011)
Gilbert, S., Lynch, N.: Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News 33(2), 51–59 (2002)
Article Google Scholar
Youn, H., Lee, D., Lee, B., Choi, J., Kim, H., Park, C., Su, L.: An efficient hybrid replication protocol for highly available distributed system. In: Proceedings of IASTED International Conference on Communications and Computer Networks, pp. 508–513 (2002)
Gifford, D.K.: Weighted voting for replicated data. In: Proceedings of 7th ACM Symposium on Operating Systems Principles, pp. 150–162 (1979)
Agrawal, D., Abbadi, A.: The tree Quorum protocol: an efficient approach for managing replicated data. In: Proceedings of 16th Very Large Database Conference, pp. 243–254 (1990)
Taheri, J., Zomaya, A.Y., Bouvry, P., Khan, S.U.: Hopfield neural network for simultaneous job scheduling and data replication in grids. Future Gener. Comput. Syst. 29(8), 1885–1900 (2013)
Article Google Scholar
Khan, S.U., Ahmad, I.: Replicating data objects in large distributed database systems: an axiomatic game theoretical mechanism design approach. Distrib. Parallel Databases 28(2–3), 187–218 (2010)
Article Google Scholar
Moiz, S.A., Sailaja, P., Venkataswamy, G., Supriya, N.: Database replication: a survey of open source and commercial tools. Int. J. Comput. Appl. 13(6), 1–8 (2011)
Google Scholar
Khan, S.U., Ahmad, I.: Non-cooperative, semi-cooperative, and cooperative games-based grid resource allocation. In: 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS). Rhodes Island, (2006)
Garcia-Molina, H., Lindsay, B.: Research directions for distributed databases. IEEE Q. Bull. Database Eng. 13(4), 12–17 (1990)
Google Scholar
Stonebraker, M.: Future trends in database systems. IEEE Trans. Knowl. Data Eng. 1(1), 33–44 (1989)
Article Google Scholar
Razavi, A., Moschoyiannis, S., Krause, P.: Concurrency control and recovery management in open e-Business transactions. In: WoTUG Communicating Process Architectures, pp. 267–285 (2007)
Christmann, P., Härder, T.H., meyer-wegener, K., Sikeler, A.: Which kinds of OS mechanisms should be provided for database management. In: Nehmer, J. (ed.), Experiences with Distributed Systems, pp. 213–251. Springer, New York
GORDA Project: State of the Art in Database Replication Deliverable D1.1, http://gorda.di.uminho.pt/deliverables, Accessed on 08 June 2013 (2006)
Abdellatif, T., Cecchet, E., Lachaize, R.: Evaluation of a Group Communication Middleware for Clustered J2EE Application Servers. ODBASE, Cyprus (2004)
Book Google Scholar
Energy, STAR Data Center Energy Efficiency Initiatives, http://www.energystar.gov/ia/partners/prod_development/downloads/EPA_Datacenter_Report_Congress_Final1.pdf?d7a4-0cec. Accessed 16 Aug 2012
Andersen, D.G., Franklin, J., Kaminsky, M., Phanishayee, A., Tan, L., Vasudevan, V.: FAWN: A fast array of wimpy nodes. In: Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, pp. 1–14 (2009)
Szalay, A.S., Bell, G.C., Huang, H.H., Terzis, A., White, A.: Low-power amdahl-balanced blades for data intensive computing. ACM SIGOPS Oper. Syst. Rev. 44(1), 71–75 (2010)
Article Google Scholar
Nedevschi, S., Popa, L., Iannaccone, G., Ratnasamy, S., Wetherall, D.: Reducing network energy consumption via sleeping and rate-adaptation. In: NSDI’08: Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation, pp. 323–336, Berkeley (2008). USENIX Association
Goiri, I., Le, K., Haque, M.E., Beauchea, R., Nguyen, T.D., Guitart, J., Torres, J., Bianchini, R.: GreenSlot: Scheduling Energy Consumption in Green Datacenters. In: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, p. 20 (2011)
Khan, S.U., Bouvry, P., Engel, T.: Energy-efficient high-performance parallel and distributed computing. J. Supercomput. 60(2), 163–164 (2012)
Article Google Scholar
Marzolla, M., Babaoglu, O., Panzieri, F.: Server Consolidation in Clouds through Gossiping, TR UBLCS-2011-01. Department of Computer Science, University of Bologna, Italy (2011)
Google Scholar
Shen, X., Liao, W., Choudhary, A., Memik, G., Kandemir, M.: A high-performance application data environment for large-scale scientific computations. IEEE Trans. Parallel Distrib. Syst. 14(12), 1262–1274 (2003)
Article Google Scholar

Download references

Acknowledgments

The authors are thankful to Kashif Bilal and Osman Khalid for the valuable reviews, suggestions, and comments.

Author information

Authors and Affiliations

COMSATS Institute of Information Technology, Islamabad, Pakistan
Saif Ur Rehman Malik & Sajjad A. Madani
North dakota State University, Fargo, USA
Samee U. Khan & Sam J. Ewen
Shenzhen Institute of Advanced Technology, Shenzhen, Guangdong, China
Nikos Tziritas
University of Bielsko-Biala, Bielsko-Biala, Poland
Joanna Kolodziej
University of Sydney, Sydney, Australia
Albert Y. Zomaya
University of Dammam, Dammam, Saudi Arabia
Nasro Min-Allah
Chinese Academy of Sciences, Beijing, China
Lizhe Wang
Wayne State University, Detroit, MI, USA
Cheng-Zhong Xu
Qatar University, Doha, Qatar
Qutaibah Marwan Malluhi
University of Luxembourg, Walferdange, Luxembourg
Johnatan E. Pecero
Argonne National Laboratory, Lemont, IL, USA
Pavan Balaji
Pacific Northwest National Laboratory, Richland, USA
Abhinav Vishnu
CSIRO ICT Center, Marsfield, NSW, Australia
Rajiv Ranjan
University of the District of Columbia, Washington, DC, 20008, USA
Sherali Zeadally
University of Louisville, Louisville, Kentucky
Hongxiang Li

Authors

Saif Ur Rehman Malik
View author publications
You can also search for this author in PubMed Google Scholar
Samee U. Khan
View author publications
You can also search for this author in PubMed Google Scholar
Sam J. Ewen
View author publications
You can also search for this author in PubMed Google Scholar
Nikos Tziritas
View author publications
You can also search for this author in PubMed Google Scholar
Joanna Kolodziej
View author publications
You can also search for this author in PubMed Google Scholar
Albert Y. Zomaya
View author publications
You can also search for this author in PubMed Google Scholar
Sajjad A. Madani
View author publications
You can also search for this author in PubMed Google Scholar
Nasro Min-Allah
View author publications
You can also search for this author in PubMed Google Scholar
Lizhe Wang
View author publications
You can also search for this author in PubMed Google Scholar
Cheng-Zhong Xu
View author publications
You can also search for this author in PubMed Google Scholar
Qutaibah Marwan Malluhi
View author publications
You can also search for this author in PubMed Google Scholar
Johnatan E. Pecero
View author publications
You can also search for this author in PubMed Google Scholar
Pavan Balaji
View author publications
You can also search for this author in PubMed Google Scholar
Abhinav Vishnu
View author publications
You can also search for this author in PubMed Google Scholar
Rajiv Ranjan
View author publications
You can also search for this author in PubMed Google Scholar
Sherali Zeadally
View author publications
You can also search for this author in PubMed Google Scholar
Hongxiang Li
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Saif Ur Rehman Malik.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Malik, S.U.R., Khan, S.U., Ewen, S.J. et al. Performance analysis of data intensive cloud systems based on data management and replication: a survey. Distrib Parallel Databases 34, 179–215 (2016). https://doi.org/10.1007/s10619-015-7173-2

Download citation

Published: 14 March 2015
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10619-015-7173-2

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Performance analysis of data intensive cloud systems based on data management and replication: a survey

Abstract

Access this article

Similar content being viewed by others

A survey on security challenges in cloud computing: issues, threats, and solutions

Energy efficiency in cloud computing data centers: a survey on software technologies

Big data analytics on Apache Spark

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Performance analysis of data intensive cloud systems based on data management and replication: a survey

Abstract

Access this article

Similar content being viewed by others

A survey on security challenges in cloud computing: issues, threats, and solutions

Energy efficiency in cloud computing data centers: a survey on software technologies

Big data analytics on Apache Spark

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation