SLC-index: A scalable skip list-based index for cloud data processing

He, Jing; Yao, Shao-wen; Cai, Li; Zhou, Wei

doi:10.1007/s11771-018-3927-0

SLC-index: A scalable skip list-based index for cloud data processing

SLC: 基于跳表的可扩展云数据索引

Published: 12 November 2018

Volume 25, pages 2438–2450, (2018)
Cite this article

Journal of Central South University Aims and scope Submit manuscript

Jing He (何婧)^1,2,
Shao-wen Yao (姚绍文)^1,2,
Li Cai (蔡莉)^1,2 &
…
Wei Zhou (周维) ORCID: orcid.org/0000-0002-5881-9436^1,2

82 Accesses
5 Citations
Explore all metrics

Abstract

Due to the increasing number of cloud applications, the amount of data in the cloud shows signs of growing faster than ever before. The nature of cloud computing requires cloud data processing systems that can handle huge volumes of data and have high performance. However, most cloud storage systems currently adopt a hash-like approach to retrieving data that only supports simple keyword-based enquiries, but lacks various forms of information search. Therefore, a scalable and efficient indexing scheme is clearly required. In this paper, we present a skip list-based cloud index, called SLC-index, which is a novel, scalable skip list-based indexing for cloud data processing. The SLC-index offers a two-layered architecture for extending indexing scope and facilitating better throughput. Dynamic load-balancing for the SLC-index is achieved by online migration of index nodes between servers. Furthermore, it is a flexible system due to its dynamic addition and removal of servers. The SLC-index is efficient for both point and range queries. Experimental results show the efficiency of the SLC-index and its usefulness as an alternative approach for cloud-suitable data structures.

摘要

随着基于云平台的应用的增加, 云存储系统中的数据呈现出爆炸式增长的趋势, 要求云数据处理系统具备高效的海量数据处理能力, 然而, 现有的云存储系统大多采用哈希方法检索数据, 主要提供针对键值的查询, 范围查询效率较低。因此, 有必要为云存储系统构建辅助数据索引。提出了一种基于跳表的云数据索引结构, 简称 SLC 索引。 SLC 索引采用双层体系结构, 该索引结构契合云存储系统的分布式存储特性, 易于在多个服务器节点上灵活扩展。局部索引节点基于查询耗费计算模型向全局索引节点发布索引信息, 保证 SLC 索引结构的整体高效性。通过动态的索引节点分裂与合并机制, 降低数据倾斜带来的性能影响, 实现索引结构负载均衡。实验结果表明, SLC 索引能够支持高效的单点查询和范围查询, 是一种适用于云计算系统的具有高可扩展性的辅助数据索引。

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Dynamic multidimensional index for large-scale cloud data

Article Open access 15 July 2016

SmallClient for big data: an indexing framework towards fast data retrieval

Article 20 December 2016

ABR-Tree: An Efficient Distributed Multidimensional Indexing Approach for Massive Data

References

ARMBRUST M, FOX A, GRIFFITH R. A view of cloud computing [J]. Communications of the ACM, 2010, 53(4): 50–58. DOI: https://doi.org/10.1145/1721654.1721672.
Article Google Scholar
BEN-YEHUDA O A, BEN-YEHUDA M, SCHUSTER A, TSAFRIR D. Deconstructing Amazon EC2 spot instance pricing [J]. ACM Transactions on Economics and Computation, 2013, 1(3): 16. DOI: https://doi.org/10.1145/2509413.2509416.
Google Scholar
WANG Li-zhe, von LASXEWSKI G, YOUNGE A. Cloud computing: A perspective study [J]. New Generation Computing, 2010, 28(2): 137–146. DOI: https://doi.org/10.1007/s00354-008-0081-5.
Article MATH Google Scholar
CALDER B, WANG J, OGUS A W. Windows Azure storage: A highly available cloud storage service with strong consistency [C]// ACM Symposium on Operating Systems Principles. Cascais: ACM, 2011: 143–157. DOI: https://doi.org/10.1145/2043556.2043571.
Google Scholar
LIU An-feng, LIU Xiao, LI He. MDMA: A multi-data and multi-ACK verified selective forwarding attack detection scheme in WSNs [J]. IEICE Transactions on Information and Systems, 2016, E99-D(8): 2010–2018. [2016-08-01] https://doi.org/search.ieice.org/bin/summary.php?id=e99-d_8_2010.
Article Google Scholar
DING Y S, HAO K R. Multi-objective workflow scheduling in cloud system based on cooperative multi-swarm optimization algorithm [J]. Journal of Central South University, 2017, 24(5):1050–1062. DOI: https://doi.org/10.1007/s11771-017-3508-7.
Article Google Scholar
CHANG F, DEAN J, GHEMAWAT S. Bigtable: A distributed storage system for structured data [J]. ACM Transactions on Computer Systems, 2008, 26(2): 1–26. DOI: https://doi.org/10.1145/1365815.1365816.
Article Google Scholar
GHEMAWAT S, GOBIOFF H, LEUNG S T. The Google file system [C]// ACM Symposium on Operating Systems Principles. Bolton landing: ACM, 2003, 37(5): 29–43. DOI: https://doi.org/10.1145/1165389.945450.
Article Google Scholar
VAVILAPALLI V K, MURTHY A C, DOUGLAS C. Apache hadoop yarn: Yet another resource negotiator [C]// ACM Annual Symposium on Cloud Computing. Santa clara: ACM, 2013(5): 1–16. DOI: https://doi.org/10.1145/2523616.2523633.
Google Scholar
DECANDIA G, HASTORUN D, JAMPANI M. Dynamo: Amazon’s highly available key-value store [C]// ACM Symposium on Operating Systems Principles. Stevenson: ACM, 2007, 41(6): 205–220. DOI: https://doi.org/10.1145/1323293.1294281.
Article Google Scholar
LAKSHMAN A, MALIK P. Cassandra: A decentralized structured storage system [J]. ACM SIGOPS Operating Systems Review, 2010, 44(2): 35–40. DOI: https://doi.org/10.1145/1773912.1773922.
Article Google Scholar
DITTRICH J, QUIANERUIZ J, JINDAL A. Hadoop++: Making a yellow elephant run like a cheetah (without it even noticing) [J]. Proceedings of The VLDB Endowment, 2010, 3(1, 2): 515–529. DOI: https://doi.org/10.14778/1920841.1920908.
Article Google Scholar
YANG H C, PARKER D S. Traverse: Simplified indexing on large map-reduce-merge clusters [C]// International Conference on Database Systems for Advanced Applications. Brisbane: Springer, 2009: 308–322. DOI: https://doi.org/10.1007/978-3-642-00887-0_27.
Chapter Google Scholar
LIN J, RYABOY D, WEIL K. Full-text indexing for optimizing selection operations in large-scale data analytics [C]// The Second International Workshop on Map Reduce and Its Applications. San Jose: ACM, 2011: 59–66. DOI: https://doi.org/10.1145/1996092.1996105.
Google Scholar
LU Peng, CHEN Guang, OOI B C. ScalaGiST: Scalable generalized search trees for mapreduce systems [innovative systems paper] [J]. Proceedings of the VLDB Endowment, 2014, 7(14): 1797–1808. DOI: https://doi.org/10.14778/2733085.2733087.
Article Google Scholar
RICHTER S, QUIANERUIZ J, SCHUH S. Towards zero-overhead static and adaptive indexing in Hadoop [J]. Proceedings of the VLDB Endowment, 2014, 23(3): 469–494. DOI: https://doi.org/10.1007/s00778-013-0332-z.
Google Scholar
CHANG B R, TSAI H F, HSU H T. Secondary ındex to big data NoSQL database–ıncorporating solr to HBase approach [J]. Journal of Information Hiding and Multimedia Signal Processing, 2016, 7(1): 80–89. https://doi.org/bit.kuas.edu.tw/~jihmsp/2016/vol7/JIH-MSP-2016-01-009.pdf.
Google Scholar
AGUILERA M K, GOLAB W, SHAH M A. A practical scalable distributed B-Tree [J]. Proceedings of the VLDB Endowment, 2008, 1(1): 598–609. DOI: https://doi.org/10.14778/1453856.1453922.
Article Google Scholar
HUANG Bin, PENG Yu-xing. An efficient two-level bitmap index for cloud data management [C]// IEEE International Conference on Communication Software and Networks. Xi’an: IEEE, 2011: 509–513. DOI: https://doi.org/10.1109/ICCSN.2011.6014776.
Google Scholar
ZHOU Wei, LU Jin, LUAN Zhong-zhi. SNB-index: A SkipNet and B+ tree based auxiliary Cloud index [J]. Cluster Computing, 2014, 17(2): 453–462. DOI 10.1007/s10586-013-0246-y.
Article Google Scholar
WANG Jin-bao, WU Sai, GAO Hong. Indexing multidimensional data in a cloud system [C]// ACM SIGMOD International Conference on Management of Data. Indianapolis: ACM, 2010: 591–602. DOI: https://doi.org/10.1145/1807167.1807232.
Google Scholar
CHENG Chun-lin, SUN Chun-ju, XU Xiao-long. A multi-dimensional ındex structure based on ımproved VA-file and CAN in the cloud [J]. International Journal of Automation and Computing, 2014, 11(1): 109–117. DOI: https://doi.org/10.1007/s11633-014-0772-y.
Article Google Scholar
LU Peng, WU Sai, SHOU Li-dan. An efficient and compact indexing scheme for large-scale data store [C]// IEEE International Conference on Data Engineering. Brisbane: IEEE, 2013: 326–337. DOI: https://doi.org/10.1109/ICDE.2013.6544836.
Google Scholar
DINARI H, NADERI H. A method for improving graph queries processing using positional inverted index (PII) idea in search engines and parallelization techniques [J]. Journal of Central South University, 2016, 23(1): 150–159 DOI: https://doi.org/10.1007/s11771-016-3058-4.
Article Google Scholar
HE Jing, WU Yue, DONG Yun-yun. Dynamic multidimensional index for large-scale cloud data [J]. Journal of Cloud Computing Advances Systems & Applications, 2016, 5(1): 1–12. DOI: https://doi.org/10.1186/s13677-016-0060-1.
Article Google Scholar
PUGH W. Skip lists: A probabilistic alternative to balanced trees [J]. Communications of The ACM, 1990, 33(6): 668–676. DOI: https://doi.org/10.1145/78973.78977.
Article Google Scholar
JESI G. P. Peersim HOWTO: Build a new protocol for the PeerSim 1.0 simulator [EB/OL]. [2011-08-04] https://doi.org/peersim.sourceforge.net/.

Download references

Author information

Authors and Affiliations

National Pilot School of Software, Yunnan University, Kunming, 650091, China
Jing He (何婧), Shao-wen Yao (姚绍文), Li Cai (蔡莉) & Wei Zhou (周维)
Key Laboratory in Software Engineering of Yunnan Province, Kunming, 650091, China
Jing He (何婧), Shao-wen Yao (姚绍文), Li Cai (蔡莉) & Wei Zhou (周维)

Authors

Jing He (何婧)
View author publications
You can also search for this author in PubMed Google Scholar
Shao-wen Yao (姚绍文)
View author publications
You can also search for this author in PubMed Google Scholar
Li Cai (蔡莉)
View author publications
You can also search for this author in PubMed Google Scholar
Wei Zhou (周维)
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Wei Zhou (周维).

Additional information

Foundation item: Projects(61363021, 61540061, 61663047) supported by the National Natural Science Foundation of China; Project(2017SE206) supported by the Open Foundation of Key Laboratory in Software Engineering of Yunnan Province, China

Rights and permissions

Reprints and permissions

About this article

Cite this article

He, J., Yao, Sw., Cai, L. et al. SLC-index: A scalable skip list-based index for cloud data processing. J. Cent. South Univ. 25, 2438–2450 (2018). https://doi.org/10.1007/s11771-018-3927-0

Download citation

Received: 11 July 2017
Accepted: 09 January 2018
Published: 12 November 2018
Issue Date: October 2018
DOI: https://doi.org/10.1007/s11771-018-3927-0

Key words

关键词

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

SLC-index: A scalable skip list-based index for cloud data processing

Abstract

摘要

Access this article

Similar content being viewed by others

Dynamic multidimensional index for large-scale cloud data

SmallClient for big data: an indexing framework towards fast data retrieval

ABR-Tree: An Efficient Distributed Multidimensional Indexing Approach for Massive Data

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Key words

关键词

Navigation

SLC-index: A scalable skip list-based index for cloud data processing

Abstract

摘要

Access this article

Similar content being viewed by others

Dynamic multidimensional index for large-scale cloud data

SmallClient for big data: an indexing framework towards fast data retrieval

ABR-Tree: An Efficient Distributed Multidimensional Indexing Approach for Massive Data

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Key words

关键词

Search

Navigation