Skip to main content
Log in

SLC-index: A scalable skip list-based index for cloud data processing

SLC: 基于跳表的可扩展云数据索引

  • Published:
Journal of Central South University Aims and scope Submit manuscript

Abstract

Due to the increasing number of cloud applications, the amount of data in the cloud shows signs of growing faster than ever before. The nature of cloud computing requires cloud data processing systems that can handle huge volumes of data and have high performance. However, most cloud storage systems currently adopt a hash-like approach to retrieving data that only supports simple keyword-based enquiries, but lacks various forms of information search. Therefore, a scalable and efficient indexing scheme is clearly required. In this paper, we present a skip list-based cloud index, called SLC-index, which is a novel, scalable skip list-based indexing for cloud data processing. The SLC-index offers a two-layered architecture for extending indexing scope and facilitating better throughput. Dynamic load-balancing for the SLC-index is achieved by online migration of index nodes between servers. Furthermore, it is a flexible system due to its dynamic addition and removal of servers. The SLC-index is efficient for both point and range queries. Experimental results show the efficiency of the SLC-index and its usefulness as an alternative approach for cloud-suitable data structures.

摘要

随着基于云平台的应用的增加, 云存储系统中的数据呈现出爆炸式增长的趋势, 要求云数据处理系统具备高效的海量数据处理能力, 然而, 现有的云存储系统大多采用哈希方法检索数据, 主要提供针对键值的查询, 范围查询效率较低。 因此, 有必要为云存储系统构建辅助数据索引。 提出了一种基于跳表的云数据索引结构, 简称 SLC 索引。 SLC 索引采用双层体系结构, 该索引结构契合云存储系统的分布式存储特性, 易于在多个服务器节点上灵活扩展。 局部索引节点基于查询耗费计算模型向全局索引节点发布索引信息, 保证 SLC 索引结构的整体高效性。 通过动态的索引节点分裂与合并机制, 降低数据倾斜带来的性能影响, 实现索引结构负载均衡。 实验结果表明, SLC 索引能够支持高效的单点查询和范围查询, 是一种适用于云计算系统的具有高可扩展性的辅助数据索引。

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. ARMBRUST M, FOX A, GRIFFITH R. A view of cloud computing [J]. Communications of the ACM, 2010, 53(4): 50–58. DOI: https://doi.org/10.1145/1721654.1721672.

    Article  Google Scholar 

  2. BEN-YEHUDA O A, BEN-YEHUDA M, SCHUSTER A, TSAFRIR D. Deconstructing Amazon EC2 spot instance pricing [J]. ACM Transactions on Economics and Computation, 2013, 1(3): 16. DOI: https://doi.org/10.1145/2509413.2509416.

    Google Scholar 

  3. WANG Li-zhe, von LASXEWSKI G, YOUNGE A. Cloud computing: A perspective study [J]. New Generation Computing, 2010, 28(2): 137–146. DOI: https://doi.org/10.1007/s00354-008-0081-5.

    Article  MATH  Google Scholar 

  4. CALDER B, WANG J, OGUS A W. Windows Azure storage: A highly available cloud storage service with strong consistency [C]// ACM Symposium on Operating Systems Principles. Cascais: ACM, 2011: 143–157. DOI: https://doi.org/10.1145/2043556.2043571.

    Google Scholar 

  5. LIU An-feng, LIU Xiao, LI He. MDMA: A multi-data and multi-ACK verified selective forwarding attack detection scheme in WSNs [J]. IEICE Transactions on Information and Systems, 2016, E99-D(8): 2010–2018. [2016-08-01] https://doi.org/search.ieice.org/bin/summary.php?id=e99-d_8_2010.

    Article  Google Scholar 

  6. DING Y S, HAO K R. Multi-objective workflow scheduling in cloud system based on cooperative multi-swarm optimization algorithm [J]. Journal of Central South University, 2017, 24(5):1050–1062. DOI: https://doi.org/10.1007/s11771-017-3508-7.

    Article  Google Scholar 

  7. CHANG F, DEAN J, GHEMAWAT S. Bigtable: A distributed storage system for structured data [J]. ACM Transactions on Computer Systems, 2008, 26(2): 1–26. DOI: https://doi.org/10.1145/1365815.1365816.

    Article  Google Scholar 

  8. GHEMAWAT S, GOBIOFF H, LEUNG S T. The Google file system [C]// ACM Symposium on Operating Systems Principles. Bolton landing: ACM, 2003, 37(5): 29–43. DOI: https://doi.org/10.1145/1165389.945450.

    Article  Google Scholar 

  9. VAVILAPALLI V K, MURTHY A C, DOUGLAS C. Apache hadoop yarn: Yet another resource negotiator [C]// ACM Annual Symposium on Cloud Computing. Santa clara: ACM, 2013(5): 1–16. DOI: https://doi.org/10.1145/2523616.2523633.

    Google Scholar 

  10. DECANDIA G, HASTORUN D, JAMPANI M. Dynamo: Amazon’s highly available key-value store [C]// ACM Symposium on Operating Systems Principles. Stevenson: ACM, 2007, 41(6): 205–220. DOI: https://doi.org/10.1145/1323293.1294281.

    Article  Google Scholar 

  11. LAKSHMAN A, MALIK P. Cassandra: A decentralized structured storage system [J]. ACM SIGOPS Operating Systems Review, 2010, 44(2): 35–40. DOI: https://doi.org/10.1145/1773912.1773922.

    Article  Google Scholar 

  12. DITTRICH J, QUIANERUIZ J, JINDAL A. Hadoop++: Making a yellow elephant run like a cheetah (without it even noticing) [J]. Proceedings of The VLDB Endowment, 2010, 3(1, 2): 515–529. DOI: https://doi.org/10.14778/1920841.1920908.

    Article  Google Scholar 

  13. YANG H C, PARKER D S. Traverse: Simplified indexing on large map-reduce-merge clusters [C]// International Conference on Database Systems for Advanced Applications. Brisbane: Springer, 2009: 308–322. DOI: https://doi.org/10.1007/978-3-642-00887-0_27.

    Chapter  Google Scholar 

  14. LIN J, RYABOY D, WEIL K. Full-text indexing for optimizing selection operations in large-scale data analytics [C]// The Second International Workshop on Map Reduce and Its Applications. San Jose: ACM, 2011: 59–66. DOI: https://doi.org/10.1145/1996092.1996105.

    Google Scholar 

  15. LU Peng, CHEN Guang, OOI B C. ScalaGiST: Scalable generalized search trees for mapreduce systems [innovative systems paper] [J]. Proceedings of the VLDB Endowment, 2014, 7(14): 1797–1808. DOI: https://doi.org/10.14778/2733085.2733087.

    Article  Google Scholar 

  16. RICHTER S, QUIANERUIZ J, SCHUH S. Towards zero-overhead static and adaptive indexing in Hadoop [J]. Proceedings of the VLDB Endowment, 2014, 23(3): 469–494. DOI: https://doi.org/10.1007/s00778-013-0332-z.

    Google Scholar 

  17. CHANG B R, TSAI H F, HSU H T. Secondary ındex to big data NoSQL database–ıncorporating solr to HBase approach [J]. Journal of Information Hiding and Multimedia Signal Processing, 2016, 7(1): 80–89. https://doi.org/bit.kuas.edu.tw/~jihmsp/2016/vol7/JIH-MSP-2016-01-009.pdf.

    Google Scholar 

  18. AGUILERA M K, GOLAB W, SHAH M A. A practical scalable distributed B-Tree [J]. Proceedings of the VLDB Endowment, 2008, 1(1): 598–609. DOI: https://doi.org/10.14778/1453856.1453922.

    Article  Google Scholar 

  19. HUANG Bin, PENG Yu-xing. An efficient two-level bitmap index for cloud data management [C]// IEEE International Conference on Communication Software and Networks. Xi’an: IEEE, 2011: 509–513. DOI: https://doi.org/10.1109/ICCSN.2011.6014776.

    Google Scholar 

  20. ZHOU Wei, LU Jin, LUAN Zhong-zhi. SNB-index: A SkipNet and B+ tree based auxiliary Cloud index [J]. Cluster Computing, 2014, 17(2): 453–462. DOI 10.1007/s10586-013-0246-y.

    Article  Google Scholar 

  21. WANG Jin-bao, WU Sai, GAO Hong. Indexing multidimensional data in a cloud system [C]// ACM SIGMOD International Conference on Management of Data. Indianapolis: ACM, 2010: 591–602. DOI: https://doi.org/10.1145/1807167.1807232.

    Google Scholar 

  22. CHENG Chun-lin, SUN Chun-ju, XU Xiao-long. A multi-dimensional ındex structure based on ımproved VA-file and CAN in the cloud [J]. International Journal of Automation and Computing, 2014, 11(1): 109–117. DOI: https://doi.org/10.1007/s11633-014-0772-y.

    Article  Google Scholar 

  23. LU Peng, WU Sai, SHOU Li-dan. An efficient and compact indexing scheme for large-scale data store [C]// IEEE International Conference on Data Engineering. Brisbane: IEEE, 2013: 326–337. DOI: https://doi.org/10.1109/ICDE.2013.6544836.

    Google Scholar 

  24. DINARI H, NADERI H. A method for improving graph queries processing using positional inverted index (PII) idea in search engines and parallelization techniques [J]. Journal of Central South University, 2016, 23(1): 150–159 DOI: https://doi.org/10.1007/s11771-016-3058-4.

    Article  Google Scholar 

  25. HE Jing, WU Yue, DONG Yun-yun. Dynamic multidimensional index for large-scale cloud data [J]. Journal of Cloud Computing Advances Systems & Applications, 2016, 5(1): 1–12. DOI: https://doi.org/10.1186/s13677-016-0060-1.

    Article  Google Scholar 

  26. PUGH W. Skip lists: A probabilistic alternative to balanced trees [J]. Communications of The ACM, 1990, 33(6): 668–676. DOI: https://doi.org/10.1145/78973.78977.

    Article  Google Scholar 

  27. JESI G. P. Peersim HOWTO: Build a new protocol for the PeerSim 1.0 simulator [EB/OL]. [2011-08-04] https://doi.org/peersim.sourceforge.net/.

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Wei Zhou  (周维).

Additional information

Foundation item: Projects(61363021, 61540061, 61663047) supported by the National Natural Science Foundation of China; Project(2017SE206) supported by the Open Foundation of Key Laboratory in Software Engineering of Yunnan Province, China

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

He, J., Yao, Sw., Cai, L. et al. SLC-index: A scalable skip list-based index for cloud data processing. J. Cent. South Univ. 25, 2438–2450 (2018). https://doi.org/10.1007/s11771-018-3927-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11771-018-3927-0

Key words

关键词

Navigation