Abstract
In cloud storage systems, data set replication improves data availability, and thereby system reliability, by copying commonly used data sets to geographically distinct data centers. Most current approaches focus on improving system performance by placing replicas of independent data sets and overlook the generation relationships among data sets. Cost is also an important factor in deciding how many replicas to create and where to store them: in a pay-as-you-go paradigm, the cost of replica storage and consistency maintenance grows with the number of replicas and can impose a heavy financial burden on cloud clients. In this paper, we propose a strategy that combines real-replicas with pseudo-replicas (data sets regenerated by computation from their provenance) from a cost-effectiveness point of view, in order to minimize data set management cost, both for independent data sets and for related data sets with generation relationships. We first define cost models that fit the cloud computing paradigm, covering data set storage, computation and transfer costs, and then develop a new data set management cost model that supports multi-criteria optimization of data set management. Next, we propose a minimum cost benchmarking approach that finds the best trade-off between real-replicas and pseudo-replicas once the decision to add a replica has been made. We then give a genetic algorithm, a more practical alternative procedure that produces optimal or near-optimal solutions, to identify suitable storage locations for the replicas. Finally, we present simulation setups and results that provide a first validation of our strategy. Both the theoretical analysis and the simulations, conducted on general (random) data sets as well as specific real-world applications, show the efficiency and effectiveness of the proposed strategy in a cloud computing environment.
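The trade-off between a real-replica and a pseudo-replica can be illustrated with a minimal sketch. It is not the cost model defined in the paper: the class, the function names, the prices and the simple monthly formulas are all assumptions made for illustration, and transfer and consistency-maintenance costs are left out.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    # All fields are hypothetical inputs chosen for this illustration.
    size_gb: float          # size of the data set
    compute_hours: float    # time needed to regenerate it from its provenance
    uses_per_month: float   # expected accesses at the target data center


def real_replica_cost(ds, storage_price):
    """Monthly cost of keeping a stored copy (price per GB per month)."""
    return ds.size_gb * storage_price


def pseudo_replica_cost(ds, compute_price):
    """Monthly cost of regenerating the data set on every access."""
    return ds.uses_per_month * ds.compute_hours * compute_price


def cheaper_option(ds, storage_price, compute_price):
    """Return 'real' or 'pseudo', whichever has the lower monthly cost."""
    real = real_replica_cost(ds, storage_price)
    pseudo = pseudo_replica_cost(ds, compute_price)
    return "real" if real <= pseudo else "pseudo"


rarely_used = DataSet(size_gb=100, compute_hours=2, uses_per_month=1)
often_used = DataSet(size_gb=100, compute_hours=2, uses_per_month=100)
print(cheaper_option(rarely_used, storage_price=0.25, compute_price=0.5))  # pseudo
print(cheaper_option(often_used, storage_price=0.25, compute_price=0.5))   # real
```

Under these assumed prices, a rarely accessed data set is cheaper to regenerate on demand, while a frequently accessed one is cheaper to store as a real-replica.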
Additional information
This work is partly supported by Shandong Provincial Natural Science Foundation (ZR2016FM01), China; the Doctor Foundation of Shandong University of Finance and Economics under Grant (2010034), and the Project of Jinan High-tech Independent and Innovation (201303015), China.
Rights and permissions
This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
About this article
Cite this article
Wu, X. Combination Replicas Placements Strategy for Data sets from Cost-effective View in the Cloud. Int J Comput Intell Syst 10, 521–539 (2017). https://doi.org/10.2991/ijcis.10.1.521
DOI: https://doi.org/10.2991/ijcis.10.1.521