Combination Replicas Placements Strategy for Data sets from Cost-effective View in the Cloud

Wu, Xiuguo

doi:10.2991/ijcis.10.1.521

Research Article
Open access
Published: 01 January 2017

Volume 10, pages 521–539 (2017)

International Journal of Computational Intelligence Systems

Xiuguo Wu¹

Abstract

In cloud storage systems, data set replication improves data availability, and thereby system reliability, by copying frequently used data sets to geographically distinct data centers. Most current approaches focus on improving system performance by placing replicas of independent data sets, and they ignore the generation relationships among data sets. Cost is also an important factor in deciding how many replicas to create and where to store them: under a pay-as-you-go paradigm, the cost of replica storage and consistency maintenance grows with the number of replicas and can become a heavy financial burden for cloud clients. In this paper, we propose a strategy that combines real replicas with pseudo-replicas (data sets regenerated by computation from their provenance) from a cost-effective view, in order to minimize data set management cost, not only for independent data sets but also for related data sets with generation relationships. We first define cost models that fit the cloud computing paradigm, covering data set storage, computation, and transfer costs, and then develop a new data set management cost model that supports multi-criteria optimization of data set management. Next, once the decision to add a replica has been made, we propose a minimum-cost benchmarking approach for finding the best trade-off between real replicas and pseudo-replicas. We then give a more practical genetic algorithm as an alternative procedure that generates optimal or near-optimal solutions for identifying suitable replica storage locations. Finally, we present simulation setups and results that provide a first validation of our strategy. Both the theoretical analysis and the simulations, conducted on general (random) data sets as well as on specific real-world applications, show the efficiency and effectiveness of the proposed strategy in a cloud computing environment.
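
The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the paper's cost model: the class `DataSet`, the functions `monthly_costs` and `cheapest_option`, the three option names, and all prices below are hypothetical, and the sketch assumes simple linear costs per month. It compares keeping a stored copy (real replica), regenerating the data set from its provenance on every access (pseudo-replica), and fetching it from a replica held in another data center.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    """Hypothetical description of one data set at one data center."""
    size_gb: float             # size of the data set
    compute_cost: float        # cost of regenerating it once from its provenance
    accesses_per_month: float  # expected number of local accesses per month


def monthly_costs(ds: DataSet, storage_price: float, transfer_price: float) -> dict:
    """Return the assumed monthly cost of each way to serve the data set.

    storage_price  -- price per GB per month for a stored copy (assumption)
    transfer_price -- price per GB moved between data centers (assumption)
    """
    return {
        # Real replica: pay for storage whether or not the data is used.
        "real": ds.size_gb * storage_price,
        # Pseudo-replica: store nothing, recompute on every access.
        "pseudo": ds.compute_cost * ds.accesses_per_month,
        # Remote copy: store nothing locally, transfer on every access.
        "remote": ds.size_gb * transfer_price * ds.accesses_per_month,
    }


def cheapest_option(ds: DataSet, storage_price: float, transfer_price: float) -> str:
    """Pick the option with the lowest assumed monthly cost."""
    costs = monthly_costs(ds, storage_price, transfer_price)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    rarely_used = DataSet(size_gb=100, compute_cost=0.5, accesses_per_month=2)
    often_used = DataSet(size_gb=100, compute_cost=0.5, accesses_per_month=20)
    print(cheapest_option(rarely_used, storage_price=0.03, transfer_price=0.01))
    print(cheapest_option(often_used, storage_price=0.03, transfer_price=0.01))
```

Under these made-up prices, a rarely accessed data set is cheaper to regenerate on demand, while a frequently accessed one is cheaper to store. This is the kind of benchmark comparison the abstract refers to. The sketch leaves out both the generation relationships among data sets and the genetic search over storage locations.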

Author information

Authors and Affiliations

School of Management Science and Engineering, Shandong University of Finance and Economics, No. 7366, Erhuan Road, 250014, Jinan, Shandong, China
Xiuguo Wu

Corresponding author

Correspondence to Xiuguo Wu.

Additional information

This work is partly supported by Shandong Provincial Natural Science Foundation (ZR2016FM01), China; the Doctor Foundation of Shandong University of Finance and Economics under Grant (2010034), and the Project of Jinan High-tech Independent and Innovation (201303015), China.

Rights and permissions

This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

About this article

Cite this article

Wu, X. Combination Replicas Placements Strategy for Data sets from Cost-effective View in the Cloud. Int J Comput Intell Syst 10, 521–539 (2017). https://doi.org/10.2991/ijcis.10.1.521

Received: 25 July 2015
Accepted: 13 December 2016
Published: 01 January 2017
Issue Date: January 2017
DOI: https://doi.org/10.2991/ijcis.10.1.521
