
Cluster Computing, Volume 20, Issue 4, pp 2833–2844

Data storage optimization strategy in distributed column-oriented database by considering spatial adjacency

  • Kun Zheng
  • Danpeng Gu
  • Falin Fang (corresponding author)
  • Miao Zhang
  • Kang Zheng
  • Qi Li
Article

Abstract

In distributed column-oriented databases, a scan operation may involve many fragments and trigger many extra, invalid partition query operations, which seriously degrades query efficiency, especially for spatial data. To address this problem, this paper draws on the partitioning strategies used in distributed column-oriented databases and proposes a spatial data storage optimization strategy named ‘SPPS’. By considering spatial adjacency, this strategy stores adjacent spatial objects in the same data fragment and retains the spatial information (extent) of each fragment. A spatial query can therefore locate the relevant fragments from their spatial information, which reduces extra invalid partition scans and improves storage and query efficiency. To verify the validity of the ‘SPPS’ optimization strategy, this paper conducts experiments on HBase and records spatial query efficiency with and without ‘SPPS’. The experimental results indicate that the ‘SPPS’ strategy can improve storage and query efficiency in distributed column-oriented databases.
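The abstract does not spell out the SPPS algorithm itself. As a minimal sketch, assuming 2-D point objects and a fixed per-fragment capacity, the Python code below illustrates the two ideas described above: grouping spatially adjacent objects into the same fragment, and keeping each fragment's bounding box so that a range query can skip fragments that cannot contain results. The grouping step uses a Sort-Tile-Recursive (STR) style sort-and-tile pass, which is a swapped-in technique and not necessarily what SPPS does. The names `Fragment`, `partition_by_adjacency`, `range_query` and `capacity` are hypothetical and are not HBase APIs.

```python
import math


class Fragment:
    """Hypothetical data fragment (e.g. one region) that stores its spatial extent."""

    def __init__(self, points):
        self.points = list(points)
        xs = [x for x, _ in self.points]
        ys = [y for _, y in self.points]
        # Minimum bounding rectangle (MBR) kept as fragment metadata.
        self.mbr = (min(xs), min(ys), max(xs), max(ys))

    def intersects(self, qx0, qy0, qx1, qy1):
        """True if this fragment's MBR overlaps the query window."""
        min_x, min_y, max_x, max_y = self.mbr
        return not (max_x < qx0 or min_x > qx1 or max_y < qy0 or min_y > qy1)


def partition_by_adjacency(points, capacity):
    """Group nearby points into fragments with an STR-style sort-and-tile pass.

    Assumption: this stands in for SPPS's adjacency-aware placement.
    """
    if not points:
        return []
    n = len(points)
    num_fragments = math.ceil(n / capacity)
    num_slices = math.ceil(math.sqrt(num_fragments))
    slice_size = num_slices * capacity
    by_x = sorted(points, key=lambda p: p[0])
    fragments = []
    # Cut the x-sorted points into vertical slices, then tile each slice by y.
    for i in range(0, n, slice_size):
        vertical_slice = sorted(by_x[i:i + slice_size], key=lambda p: p[1])
        for j in range(0, len(vertical_slice), capacity):
            fragments.append(Fragment(vertical_slice[j:j + capacity]))
    return fragments


def range_query(fragments, qx0, qy0, qx1, qy1):
    """Return matching points and the number of fragments actually scanned."""
    hits = []
    scanned = 0
    for frag in fragments:
        if not frag.intersects(qx0, qy0, qx1, qy1):
            continue  # pruned by stored extent: no invalid scan of this fragment
        scanned += 1
        hits.extend(p for p in frag.points
                    if qx0 <= p[0] <= qx1 and qy0 <= p[1] <= qy1)
    return hits, scanned
```

For example, on a 10 × 10 grid of points split into four fragments of 25 points each, a query over the window (0, 0) to (2, 2) scans only one fragment and returns 9 points; without the stored extents, all four fragments would have to be scanned.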

Keywords

Partitioning · Spatial data · Spatial adjacency · Distributed column-oriented database

Notes

Acknowledgements

The authors would like to thank the following foundations for their support: the National Key Research and Development Program of China (No. 2016YFB0502603), the National Key Research and Development Program of China (No. 2017YFB0503704), the Natural Science Foundation of Hubei Province of China (No. ZRY2015001543) and the Fundamental Research Funds for National University, China University of Geosciences (Wuhan) (1610491B20).


Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  • Kun Zheng (1)
  • Danpeng Gu (1)
  • Falin Fang (1), corresponding author
  • Miao Zhang (1)
  • Kang Zheng (1)
  • Qi Li (1)
  1. Faculty of Information Engineering, China University of Geosciences (Wuhan), Wuhan, China
