A Distributed Data Storage Strategy Based on LOPs

  • Research Article-Computer Engineering and Computer Science
  • Arabian Journal for Science and Engineering

Abstract

Distributed data management requires partitioning and deploying data at the storage level, and querying requires assembling and integrating the query subresults produced at each site. Because the data partitioning strategy is closely tied to the overhead of the distributed system, the partitioning and update strategies must be chosen to fit the application. This paper proposes a widely distributed storage and processing scheme for a distributed linear order partition (DLOP) based on time stamps. Drawing on the "equivalent division" property of a linear order partition (LOP), the scheme offers two partitioning strategies: partitioning based on time interval equilibrium and partitioning based on query expectation. Each site in the distributed system is uniformly configured with an index-based data query mechanism to complete the distributed management of data. Experiments verify the practicability and efficiency of the proposed storage strategy and show that the method supports self-scaling as the data scale grows and reduces the hardware configuration requirements of the cluster.
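The abstract names the two partitioning strategies but does not give their definitions, so the following Python sketch is only one plausible reading, not the authors' algorithm. It assumes that "time interval equilibrium" means cutting the time-stamp range into intervals of equal width, and that "query expectation" means placing the cuts so that each site receives a roughly equal share of the expected query load. The function names, the bucketed `query_weights` input, and the site numbering are all hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def equal_interval_boundaries(t_min: float, t_max: float, sites: int) -> List[float]:
    """Assumed reading of 'time interval equilibrium': equal-width intervals.

    Returns the sites - 1 inner cut points of [t_min, t_max].
    """
    width = (t_max - t_min) / sites
    return [t_min + i * width for i in range(1, sites)]


def expectation_boundaries(
    bucket_edges: Sequence[float],
    query_weights: Sequence[float],
    sites: int,
) -> List[float]:
    """Assumed reading of 'query expectation': balance expected query load.

    bucket_edges has len(query_weights) + 1 entries, and query_weights[i] is a
    hypothetical estimate of how many queries hit
    [bucket_edges[i], bucket_edges[i + 1]).
    """
    total = sum(query_weights)
    cuts: List[float] = []
    running = 0.0
    for i, weight in enumerate(query_weights):
        running += weight
        # Close a partition each time the cumulative load reaches the next
        # equal share. A single very hot bucket can yield a repeated cut
        # point (an empty site); finer buckets avoid that.
        while len(cuts) < sites - 1 and running >= total * (len(cuts) + 1) / sites:
            cuts.append(bucket_edges[i + 1])
    return cuts


def site_for(timestamp: float, boundaries: Sequence[float]) -> int:
    """Map a time stamp to a site index by binary search over the cut points."""
    return bisect_right(boundaries, timestamp)


if __name__ == "__main__":
    uniform = equal_interval_boundaries(0, 100, 4)
    weighted = expectation_boundaries([0, 10, 20, 30, 40], [1, 1, 6, 0], 2)
    print(uniform, weighted, site_for(42, uniform))
```

Under these assumptions, routing a record or a point query to a site is a single binary search over the cut points, so every site can apply the same lookup rule regardless of which of the two strategies produced the boundaries.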

Funding

This research was funded by the Postdoctoral Research Foundation of China (2019M663239).

Author information

Corresponding author

Correspondence to Qianqiu Wang.

Cite this article

Wang, Q., Ye, X., Luo, X. et al. A Distributed Data Storage Strategy Based on LOPs. Arab J Sci Eng 47, 9767–9779 (2022). https://doi.org/10.1007/s13369-021-06371-3

  • DOI: https://doi.org/10.1007/s13369-021-06371-3