Handling query skew in large indexes: a view based approach
Indexing is one of the most important techniques for query processing over a multi-dimensional dataset. A common strategy is to keep the tree-structured index balanced. This strategy reduces query processing cost in the worst case and handles all queries equally well. In other words, it implicitly assumes that queries are issued uniformly, partly because the query distribution is generally not known in advance and changes over time in practice. A key issue we study in this work is whether it is best to rely fully on a balanced tree-structured index, in particular as datasets become larger and larger in the big data era. When a dataset becomes very large, it is unreasonable to assume that the data in every subspace are equally important and are accessed uniformly by all queries at the index level. Given the existence of query skew and its possible changes over time, in this paper we study how to handle query skew and its changes at the index level, without sacrificing support for any possible query in a well-balanced tree index and without high overhead. To tackle the issue, we propose index-views at the index level, where an index-view is a short-cut in a balanced tree-structured index to the objects in subspaces that are accessed more frequently, and we propose a new index-view-centric framework for query processing that uses index-views in a bottom-up manner. We study the index-view selection problem in both static and dynamic settings, and we confirm the effectiveness of our approach using large real and synthetic datasets.
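The idea of a short-cut into a balanced tree index can be illustrated with a minimal sketch. This is not the paper's algorithm or data structure: it assumes a simple 2-D kd-tree style partition, treats a "view" as a plain reference to an internal node covering a frequently queried region, and falls back to the root when no view covers the query. All names (`Node`, `build`, `deepest_covering`, `query`) are hypothetical and introduced only for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
WHOLE_SPACE: Rect = (-math.inf, -math.inf, math.inf, math.inf)


@dataclass
class Node:
    cell: Rect   # region owned by this node: [xmin, xmax) x [ymin, ymax)
    depth: int
    points: List[Point] = field(default_factory=list)  # filled only at leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(pts, cell=WHOLE_SPACE, depth=0, leaf_size=2):
    """Build a kd-tree style index by median splits, alternating the axis."""
    node = Node(cell, depth)
    if len(pts) > leaf_size:
        axis = depth % 2
        split = sorted(p[axis] for p in pts)[len(pts) // 2]
        low = [p for p in pts if p[axis] < split]
        if low:  # a non-trivial split exists on this axis
            high = [p for p in pts if p[axis] >= split]
            left_cell, right_cell = list(cell), list(cell)
            left_cell[axis + 2] = split   # upper bound, exclusive
            right_cell[axis] = split      # lower bound, inclusive
            node.left = build(low, tuple(left_cell), depth + 1, leaf_size)
            node.right = build(high, tuple(right_cell), depth + 1, leaf_size)
            return node
    node.points = list(pts)  # leaf
    return node


def covers(cell: Rect, q: Rect) -> bool:
    """True if the closed query rectangle q lies entirely inside cell."""
    return cell[0] <= q[0] and cell[1] <= q[1] and q[2] < cell[2] and q[3] < cell[3]


def overlaps(cell: Rect, q: Rect) -> bool:
    """True if the closed query rectangle q intersects cell."""
    return cell[0] <= q[2] and q[0] < cell[2] and cell[1] <= q[3] and q[1] < cell[3]


def search(node: Node, q: Rect, stats: dict) -> List[Point]:
    """Standard top-down range search starting at node; counts visited nodes."""
    stats["visited"] += 1
    if node.left is None:
        return [p for p in node.points
                if q[0] <= p[0] <= q[2] and q[1] <= p[1] <= q[3]]
    out: List[Point] = []
    for child in (node.left, node.right):
        if overlaps(child.cell, q):
            out += search(child, q, stats)
    return out


def deepest_covering(root: Node, q: Rect) -> Node:
    """Pick the deepest node whose cell covers q (one way to place a view)."""
    node = root
    while node.left is not None:
        nxt = next((c for c in (node.left, node.right) if covers(c.cell, q)), None)
        if nxt is None:
            break
        node = nxt
    return node


def query(root: Node, views: List[Node], q: Rect, stats: dict) -> List[Point]:
    """Start at the deepest view covering q; otherwise fall back to the root."""
    candidates = [v for v in views if covers(v.cell, q)]
    start = max(candidates, key=lambda v: v.depth, default=root)
    return search(start, q, stats)
```

In this sketch, a query that falls inside a viewed region skips the upper levels of the tree, while any other query is still answered from the root, so the balanced index remains the fallback for all queries. How views are selected and maintained as the query distribution shifts is the subject of the paper and is not modeled here.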
Keywords: multi-dimensional index, query adaptive, index-view
This work was supported by a grant of the Research Grants Council of the Hong Kong SAR, China (14209314).