A Distributed Data Storage Strategy Based on LOPs

Wang, Qianqiu; Ye, Xiaoping; Luo, Xianlu; Li, Lunjie; Chen, Hainan

doi:10.1007/s13369-021-06371-3

Research Article-Computer Engineering and Computer Science
Published: 30 November 2021

Volume 47, pages 9767–9779, (2022)

Arabian Journal for Science and Engineering

Qianqiu Wang ORCID: orcid.org/0000-0002-8586-0653¹,
Xiaoping Ye²,
Xianlu Luo²,
Lunjie Li² &
…
Hainan Chen¹

166 Accesses
1 Citation

Abstract

Distributed data management requires data to be partitioned and deployed at the storage level, and data querying requires the query subresults produced at each site to be configured and integrated. The partitioning strategy strongly affects the overhead of a distributed system, so both the partitioning strategy and the update strategy must be chosen to suit the application. This paper proposes a distributed storage and processing scheme for a distributed linear order partition (DLOP) based on time stamps. Building on the "equivalent division" characteristic of a linear order partition (LOP), the scheme provides two partitioning strategies: partitioning based on time interval equilibrium and partitioning based on query expectation. Each site in the distributed system is uniformly configured with an index-based data query mechanism to complete the distributed management of data. Experiments verify the practicability and efficiency of the proposed storage strategy and show that the method scales by itself as the data volume grows and reduces the hardware configuration requirements of the cluster.
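The two partitioning strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' DLOP algorithm, whose details are not given on this page. It assumes that "time interval equilibrium" means cutting the time range into equal-width intervals, and that "query expectation" means choosing cut points so that each site carries a similar share of an expected query weight attached to each record. The function names, the weight values, and the rule that a record equal to a cut point belongs to the earlier site are all hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def equal_interval_cuts(t_min: float, t_max: float, sites: int) -> List[float]:
    """Assumed reading of 'time interval equilibrium':
    split [t_min, t_max] into `sites` equal-width intervals
    and return the interior cut points."""
    width = (t_max - t_min) / sites
    return [t_min + i * width for i in range(1, sites)]


def query_weighted_cuts(timestamps: Sequence[float],
                        weights: Sequence[float],
                        sites: int) -> List[float]:
    """Assumed reading of 'query expectation':
    pick cut points so that the cumulative expected query weight
    is spread roughly evenly over the sites."""
    pairs = sorted(zip(timestamps, weights))
    target = sum(w for _, w in pairs) / sites
    cuts: List[float] = []
    acc = 0.0
    for t, w in pairs:
        acc += w
        # Close the current partition once its share of the weight is reached.
        if len(cuts) < sites - 1 and acc >= target * (len(cuts) + 1):
            cuts.append(t)
    return cuts


def site_of(timestamp: float, cuts: Sequence[float]) -> int:
    """Map a timestamp to a site index; a timestamp equal to a cut
    point stays in the earlier site (hypothetical convention)."""
    return bisect_left(cuts, timestamp)


if __name__ == "__main__":
    cuts_a = equal_interval_cuts(0, 100, 4)
    print(cuts_a, [site_of(t, cuts_a) for t in (10, 25, 26, 100)])

    # Hypothetical expected query weights: later records are queried more often.
    cuts_b = query_weighted_cuts([1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 2, 6], 2)
    print(cuts_b, [site_of(t, cuts_b) for t in (1, 5, 6)])
```

Under these assumptions, the equal-width variant needs only the time range, while the weighted variant needs a per-record estimate of query frequency, which in practice would have to come from a query log or a workload model.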



Funding

This research was funded by the Postdoctoral Research Foundation of China (2019M663239).

Author information

Authors and Affiliations

Sun Yat-Sen University, Guangzhou, China
Qianqiu Wang & Hainan Chen
Neusoft Institute Guangdong, Foshan, China
Xiaoping Ye, Xianlu Luo & Lunjie Li

Corresponding author

Correspondence to Qianqiu Wang.

About this article

Cite this article

Wang, Q., Ye, X., Luo, X. et al. A Distributed Data Storage Strategy Based on LOPs. Arab J Sci Eng 47, 9767–9779 (2022). https://doi.org/10.1007/s13369-021-06371-3

Received: 25 June 2021
Accepted: 31 October 2021
Published: 30 November 2021
Issue Date: August 2022
DOI: https://doi.org/10.1007/s13369-021-06371-3
