Slicing the Dimensionality: Top-k Query Processing for High-Dimensional Spaces

Transactions on Large-Scale Data- and Knowledge-Centered Systems XIV

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 8800)

Abstract

Top-k (preference) queries are used in several domains to retrieve the set of \(k\) tuples that most closely match a given query. For high-dimensional spaces, evaluating top-k queries is expensive, as data- and space-partitioning indices perform worse than sequential scan. An alternative approach is to use sorted lists to speed up query evaluation. This approach extends the performance gains over sequential scan to about ten dimensions. However, the data-sets for which preference queries are considered are often high-dimensional. In this paper, we explore the use of bit-sliced indices (BSI) to encode the attributes or score lists and perform top-k queries over high-dimensional data using bit-wise operations. Our approach does not require sorting or random access to the index. Additionally, bit-sliced indices require less space than other types of indices. The size of the bit-sliced index (without using compression) for a normalized data-set with 3 decimals is 60 times smaller than the size of sorted lists. Furthermore, our experimental evaluation shows that the use of BSI for top-k query processing is more efficient than sequential scan for high-dimensional data. When compared to the Sequential Top-k Algorithm (STA), BSI is one order of magnitude faster.
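
The idea of answering a top-k query with bit-wise operations only can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes scores normalized to three decimals and scaled to integers in 0..1000, an unweighted sum as the scoring function, and plain Python integers standing in for uncompressed bitmaps. The function names build_bsi, add_bsi and top_k are hypothetical.

```python
from functools import reduce
from typing import List


def build_bsi(values: List[int]) -> List[int]:
    """Encode non-negative ints as bit slices: bit `row` of slices[j]
    is bit j of values[row]. Each slice is a Python int used as a bitmap."""
    n_slices = max(values, default=0).bit_length()
    slices = [0] * n_slices
    for row, v in enumerate(values):
        for j in range(n_slices):
            if (v >> j) & 1:
                slices[j] |= 1 << row
    return slices


def add_bsi(a: List[int], b: List[int]) -> List[int]:
    """Row-wise sum of two BSIs with a ripple-carry adder over the slices."""
    out, carry = [], 0
    for j in range(max(len(a), len(b))):
        x = a[j] if j < len(a) else 0
        y = b[j] if j < len(b) else 0
        out.append(x ^ y ^ carry)
        carry = (x & y) | (carry & (x ^ y))
    if carry:
        out.append(carry)
    return out


def top_k(slices: List[int], n_rows: int, k: int) -> List[int]:
    """Return the ids of k rows with the largest values, scanning the slices
    from most to least significant. No sorting, no per-row random access."""
    if k <= 0:
        return []
    greater = 0                     # rows already known to be in the answer
    equal = (1 << n_rows) - 1       # rows still tied on the bits seen so far
    for b in reversed(slices):
        candidate = greater | (equal & b)
        n = bin(candidate).count("1")
        if n > k:                   # too many: keep only rows with this bit set
            equal &= b
        elif n < k:                 # too few: accept them, continue with the rest
            greater = candidate
            equal &= ~b
        else:                       # exactly k rows: done
            equal &= b
            break
    # Rows left in `equal` are tied; take as many as needed (lowest ids first).
    rows = [r for r in range(n_rows) if (greater >> r) & 1]
    for r in range(n_rows):
        if len(rows) >= k:
            break
        if (equal >> r) & 1:
            rows.append(r)
    return sorted(rows)


# Three attributes for four rows, each score scaled by 1000 (assumed data).
attributes = [
    [500, 100, 900, 300],
    [200, 800, 100, 300],
    [100, 100, 950, 300],
]
total = reduce(add_bsi, (build_bsi(col) for col in attributes))
print(top_k(total, n_rows=4, k=2))  # rows 1 and 2 (sums 1000 and 1950)
```

With scores limited to three decimals, each attribute needs at most ten slices, so the number of bitmap operations per query grows with the number of attributes and slices rather than with a sort over the rows.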

Corresponding author

Correspondence to Gheorghi Guzun.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Guzun, G., Tosado, J., Canahuate, G. (2014). Slicing the Dimensionality: Top-k Query Processing for High-Dimensional Spaces. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XIV. Lecture Notes in Computer Science, vol 8800. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45714-6_2

  • DOI: https://doi.org/10.1007/978-3-662-45714-6_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-45713-9

  • Online ISBN: 978-3-662-45714-6

  • eBook Packages: Computer Science (R0)
