Slicing the Dimensionality: Top-k Query Processing for High-Dimensional Spaces

Guzun, Gheorghi; Tosado, Joel; Canahuate, Guadalupe

doi:10.1007/978-3-662-45714-6_2


Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 8800)


Abstract

Top-k (preference) queries are used in several domains to retrieve the set of \(k\) tuples that most closely match a given query. For high-dimensional spaces, evaluating top-k queries is expensive, because data- and space-partitioning indices perform worse than a sequential scan. An alternative approach is to use sorted lists to speed up query evaluation. This approach extends the performance gains over sequential scan to about ten dimensions. However, the data sets for which preference queries are considered are often high-dimensional. In this paper, we explore the use of bit-sliced indices (BSI) to encode the attributes or score lists and perform top-k queries over high-dimensional data using bit-wise operations. Our approach does not require sorting or random access to the index. Additionally, bit-sliced indices require less space than other types of indices. The size of the bit-sliced index (without compression) for a normalized data set with 3 decimals is 60 times smaller than the size of the sorted lists. Furthermore, our experimental evaluation shows that using BSI for top-k query processing is more efficient than sequential scan for high-dimensional data. Compared to the Sequential Top-k Algorithm (STA), BSI is one order of magnitude faster.
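To make the idea of answering a top-k query with bit-wise operations concrete, here is a minimal sketch. It is not the chapter's implementation: it assumes scores are non-negative integers (for example, normalized values with 3 decimals scaled by 1000), uses plain Python integers as uncompressed bitmaps, and aggregates attributes by an unweighted sum. The function names `build_bsi`, `add_bsi`, and `top_k` are hypothetical. The top-k step is a slice-by-slice scan from the most significant slice down, a common technique for bit-sliced indices, and it needs neither sorting nor random access.

```python
def build_bsi(values):
    """Encode non-negative ints as bit slices: bit r of slice i is bit i of values[r]."""
    n_slices = max(max(values).bit_length(), 1)
    slices = [0] * n_slices
    for row, v in enumerate(values):
        for i in range(n_slices):
            if (v >> i) & 1:
                slices[i] |= 1 << row
    return slices


def add_bsi(a, b):
    """Add two BSIs row-wise using a ripple-carry adder over whole bitmaps."""
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        x = a[i] if i < len(a) else 0
        y = b[i] if i < len(b) else 0
        out.append(x ^ y ^ carry)
        carry = (x & y) | (carry & (x ^ y))
    if carry:
        out.append(carry)
    return out


def popcount(bitmap):
    """Number of set bits (rows) in a bitmap."""
    return bin(bitmap).count("1")


def top_k(slices, n_rows, k):
    """Return the row ids of the k largest values; ties are broken by lowest row id."""
    all_rows = (1 << n_rows) - 1
    greater = 0        # rows already known to be in the top k
    equal = all_rows   # rows still tied on all slices examined so far
    for s in reversed(slices):  # most significant slice first
        candidate = greater | (equal & s)
        n = popcount(candidate)
        if n > k:
            equal &= s
        elif n < k:
            greater = candidate
            equal &= ~s & all_rows
        else:
            equal &= s
            break
    need = k - popcount(greater)
    rows = [r for r in range(n_rows) if (greater >> r) & 1]
    ties = [r for r in range(n_rows) if (equal >> r) & 1]
    return sorted(rows + ties[:need])


# Illustrative data (made up): two attributes for five rows, scaled by 1000.
attr_a = [120, 875, 430, 990, 305]
attr_b = [700, 150, 660, 20, 910]
scores = add_bsi(build_bsi(attr_a), build_bsi(attr_b))
print(top_k(scores, n_rows=5, k=2))  # rows 2 and 4 have the largest sums
```

Each loop iteration touches one slice with a handful of AND/OR/NOT operations over all rows at once, so the cost grows with the number of slices rather than with a per-row comparison. Weighted preference functions and compressed bitmaps are left out of this sketch.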



Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, The University of Iowa, Iowa City, IA, USA
Gheorghi Guzun, Joel Tosado & Guadalupe Canahuate

Corresponding author

Correspondence to Gheorghi Guzun.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Linz, Austria
Josef Küng
Linz, Austria
Roland Wagner

About this chapter

Cite this chapter

Guzun, G., Tosado, J., Canahuate, G. (2014). Slicing the Dimensionality: Top-k Query Processing for High-Dimensional Spaces. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XIV. Lecture Notes in Computer Science, vol 8800. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45714-6_2

DOI: https://doi.org/10.1007/978-3-662-45714-6_2
Published: 21 November 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45713-9
Online ISBN: 978-3-662-45714-6
eBook Packages: Computer Science, Computer Science (R0)
