Fast Top-Q and Top-K Query Answering

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10646)

Abstract

Efficient retrieval of the most relevant tuples (e.g. top-k, k-NN) is an important requirement in information systems that access large amounts of data. Top-k (or k-nearest-neighbor) queries retrieve the k objects that score best for a specified objective function. However, retrieving the closest objects does not tell the user how close or similar those objects are to the ideal object described by the input query. To support the query issuer more appropriately, we introduce top-q query answering (TQQA), which does not return a fixed number of result tuples but all tuples that are similar to the searched optimum with at least some minimum degree q. We show how to combine top-q queries with top-k queries, enabling the user to pose a large number of interesting queries. To the best of our knowledge, neither such a top-q query answering approach nor a combination with top-k has been proposed before. We implemented our approach and evaluated it against the best position algorithm BPA-2, which has proved to be among the fastest threshold-based top-k query answering approaches. Our experiments showed an improvement of one to two orders of magnitude in time and memory requirements.
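
The difference between the two query types can be illustrated with a minimal sketch. It is not the TQQA algorithm evaluated in the paper, only a naive full scan that shows the query semantics. The similarity function (one minus the mean absolute difference over attributes normalised to [0, 1]), the function names, and the sample data are all assumptions made for illustration.

```python
import heapq
from typing import List, Sequence, Tuple

Scored = Tuple[float, Sequence[float]]


def similarity(candidate: Sequence[float], ideal: Sequence[float]) -> float:
    """Hypothetical similarity in [0, 1]; assumes attributes normalised to [0, 1]."""
    diffs = [abs(c - i) for c, i in zip(candidate, ideal)]
    return 1.0 - sum(diffs) / len(diffs)


def top_k(data: List[Sequence[float]], ideal: Sequence[float], k: int) -> List[Scored]:
    """Return the k best-scoring tuples, however dissimilar they may be."""
    scored = [(similarity(t, ideal), t) for t in data]
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


def top_q(data: List[Sequence[float]], ideal: Sequence[float], q: float) -> List[Scored]:
    """Return every tuple whose similarity is at least q, best first."""
    scored = [(similarity(t, ideal), t) for t in data]
    qualifying = [pair for pair in scored if pair[0] >= q]
    return sorted(qualifying, key=lambda pair: pair[0], reverse=True)


def top_q_k(data: List[Sequence[float]], ideal: Sequence[float], q: float, k: int) -> List[Scored]:
    """One possible combination: at most k tuples, each with similarity >= q."""
    return top_q(data, ideal, q)[:k]


# Illustrative data only: four tuples with two normalised attributes each.
IDEAL = (0.5, 0.5)
DATA = [(0.5, 0.5), (0.5, 0.75), (0.0, 0.5), (1.0, 0.0)]

three_best = top_k(DATA, IDEAL, 3)          # always three tuples
close_enough = top_q(DATA, IDEAL, 0.8)      # only tuples with similarity >= 0.8
combined = top_q_k(DATA, IDEAL, 0.4, 2)     # at most two tuples, each >= 0.4
```

With this data, the top-k query with k = 3 always returns three tuples, including one with similarity 0.75, whereas the top-q query with q = 0.8 returns only the two tuples that meet the threshold. The result size of a top-q query therefore depends on the data rather than on a fixed count.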

Keywords

Top-Q query answering, Top-K query answering, Approximate querying, Result ranking

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Informatics-Systems, Alpen-Adria Universität Klagenfurt, Klagenfurt, Austria
