Advertisement

Fast Distributed Top-q and Top-k Query Processing

  • Claus Dabringer
  • Johann EderEmail author
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11390)

Abstract

Top-k queries retrieve the k results of a query which score best for an objective function representing the preferences of users. To require that the returned results also have to satisfy the preferences to a certain degree we introduce top-q queries which return all results which approximate the user preferences to at least some minim degree q. We show how top-q queries and top-k queries can be combined enabling the user to post a large number of interesting queries. Furthermore, we show that the calculation of top-q queries can be integrated in algorithms efficiently processing top-k queries. We implemented our approach and evaluated it against the fastest threshold based top-k query answering approaches (BPA-2). Our experiments showed an improvement by one to two orders of magnitude regarding time and memory requirements. Furthermore, we show how such queries can be processed in highly distributed peer-to-peer databases in an efficient way and propose an adaptive algorithm which takes several parameters of the network of databases into account to optimize the processing of distributed top-k queries.

Keywords

Top-q query answering Top-k query answering Approximate querying Result ranking Distributed Top-k queries Adaptive query processing p2p databases 

References

  1. 1.
    UCI Machine Learning Repository, US Census Data 1990 (2012). http://archive.ics.uci.edu/ml/datasets/US+Census+Data+(1990)
  2. 2.
    Agrawal, S., Chaudhuri, S.: Automated ranking of database query results. In: CIDR, pp. 888–899 (2003)Google Scholar
  3. 3.
    Akbarinia, R., Pacitti, E., Valduriez, P.: Reducing network traffic in unstructured P2P systems using top-k queries. Distrib. Parallel Databases 19, 67–86 (2006)CrossRefGoogle Scholar
  4. 4.
    Akbarinia, R., Pacitti, E., Valduriez, P.: Best position algorithms for top-k queries. In: Proceedings of the 33rd Internatinal Conference on Very Large Databases, pp. 495–506. VLDB Endowment (2007)Google Scholar
  5. 5.
    Asslaber, M., Abuja, P., et al.: The Genome Austria Tissue Bank (GATIB). Pathobiology 74, 251–258 (2007)CrossRefGoogle Scholar
  6. 6.
    Balke, W.-T., Nejdl, W., Siberski, W., Thaden, U.: Progressive distributed top-k retrieval in peer-to-peer networks. In: Proceedings of the 21st International Conference on Data Engineering, ICDE 2005, pp. 174–185. IEEE Computer Society (2005)Google Scholar
  7. 7.
    Church, K., Gale, W.: Inverse document frequency (IDF): a measure of deviations from Poisson. In: Armstrong, S., Church, K., Isabelle, P., Manzi, S., Tzoukermann, E., Yarowsky, D. (eds.) Natural Language Processing Using Very Large Corpora. Text, Speech and Language Technology, vol. 11, pp. 283–295. Springer, Dordrecht (1999).  https://doi.org/10.1007/978-94-017-2390-9_18CrossRefGoogle Scholar
  8. 8.
    Ciglic, M., Eder, J., Koncilia, C.: Anonymization of data sets with NULL values. In: Hameurlain, A., Küng, J., Wagner, R., Decker, H., Lhotska, L., Link, S. (eds.) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXIV. LNCS, vol. 9510, pp. 193–220. Springer, Heidelberg (2016).  https://doi.org/10.1007/978-3-662-49214-7_7CrossRefGoogle Scholar
  9. 9.
    Conner, W., Hwang, S.-W., Nahrstedt, K.: Unified framework for top-k query processing in peer-to-peer networks. Technical report, University of Illinois (2007)Google Scholar
  10. 10.
    Dabringer, C.: Efficient local and distributed query processing in a biomedical environment. Ph.D. thesis, Alpen Adria Universität Klagenfurt (2012)Google Scholar
  11. 11.
    Dabringer, C., Eder, J.: Efficient top-k retrieval for user preference queries. In: Proceedings of the 26th ACM Symposium on Applied Computing (2011)Google Scholar
  12. 12.
    Dabringer, C., Eder, J.: Fast top-k query answering. In: Proceedings of the 22th International Conference on Database and Expert Systems Applications (2011)Google Scholar
  13. 13.
    Dabringer, C., Eder, J.: Towards adaptive distributed top-k query processing. In: Ivanović, M., et al. (eds.) ADBIS 2016. CCIS, vol. 637, pp. 37–44. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-44066-8_4CrossRefGoogle Scholar
  14. 14.
    Dabringer, C., Eder, J.: Fast top-Q and top-K query answering. In: Dang, T.K., Wagner, R., Küng, J., Thoai, N., Takizawa, M., Neuhold, E.J. (eds.) FDSE 2017. LNCS, vol. 10646, pp. 43–63. Springer, Cham (2017).  https://doi.org/10.1007/978-3-319-70004-5_3CrossRefGoogle Scholar
  15. 15.
    Eder, J., Dabringer, C., Schicho, M., Stark, K.: Information systems for federated biobanks. In: Hameurlain, A., Küng, J., Wagner, R. (eds.) Transactions on Large-Scale Data- and Knowledge-Centered Systems I. LNCS, vol. 5740, pp. 156–190. Springer, Heidelberg (2009).  https://doi.org/10.1007/978-3-642-03722-1_7CrossRefGoogle Scholar
  16. 16.
    Eder, J., Frank, H., Liebhart, W.: Optimization of object-oriented queries by inverse methods. In: Eder, J., Kalinichenko, L.A. (eds.) East/West Database Workshop. Springer, LondonI (1995).  https://doi.org/10.1007/978-1-4471-3577-7_8CrossRefGoogle Scholar
  17. 17.
    Eder, J., Gottweis, H., Zatloukal, K.: IT solutions for privacy protection in biobanking. Publ. Health Genom. 15(5), 254–262 (2012)CrossRefGoogle Scholar
  18. 18.
    Eder, J., Koncilia, C., Morzy, T.: A model for a temporal data warehouse. In: Open Enterprise Solutions: Systems, Experiences and Organizations (OES-SEO 2001). Luiss Edizioni (2001)Google Scholar
  19. 19.
    Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: Proceedings of the 2001 ACM Symposium on Principles of Database Systems, pp. 102–113. ACM, New York (2001)Google Scholar
  20. 20.
    Fang, Q., Yang, G.: Efficient top-k query processing algorithms in highly distributed environments. J. Comput. 9(9), 2000–2006 (2014)CrossRefGoogle Scholar
  21. 21.
    Fang, Q., Zhao, Y., Yang, G., Wang, B., Zheng, W.: Best position algorithms for top-k query processing in highly distributed environments. In: Proceedings of the 2010 First International Conference on Networking and Distributed Computing, ICNDC 2010, pp. 397–401. IEEE Computer Society, Washington, DC (2010)Google Scholar
  22. 22.
    Feuerstein, S., Pribyl, B.: Oracle PL/SQL Programming, 5th edn. Paperback, Sebastopol (2009)zbMATHGoogle Scholar
  23. 23.
    Frank, A., Asuncion, A.: UCI Machine Learning Repository (2010)Google Scholar
  24. 24.
    Guntzer, U., Balke, W.-T., Kiessling, W.: Optimizing multi-feature queries for image databases. In: Proceedings of the 26th International Conference on Very Large Databases, pp. 419–428. Morgan Kaufmann Publishers Inc., San Francisco (2000)Google Scholar
  25. 25.
    Guntzer, U., Balke, W.-T., Kiessling, W.: Towards efficient multi-feature queries in heterogeneous environments. In: Proceedings of the IEEE International Conference on IT: Coding and Computing, pp. 622–628 (2001)Google Scholar
  26. 26.
    Hagihara, R., Shinohara, M., Hara, T., Nishio, S.: A message processing method for top-k query for traffic reduction in ad hoc networks. In: Proceedings of the Tenth Interenational Conference on Mobile Data Management, MDM 2009, pp. 11–20. IEEE Computer Society (2009)Google Scholar
  27. 27.
    Hofer-Picout, P., et al.: Conception and implementation of an Austrian biobank directory integration framework. Biopreservation Biobanking 15(4), 332–340 (2017)CrossRefGoogle Scholar
  28. 28.
    Hristidis, V., Hu, Y., Ipeirotis, P.G.: Ranked queries over sources with Boolean query interfaces without ranking support. In: 26th IEEE International Conference on Data Engineering (2010)Google Scholar
  29. 29.
    Hua, M., Pei, J., Fu, A.W.C., Lin, X., Leung, H.-F.: Efficiently answering top-k typicality queries on large databases. In: Proceedings of the 33rd Interenational Conference on Very Large Databases, pp. 890–901. VLDB Endowment (2007)Google Scholar
  30. 30.
    Ilyas, I.F., Beskales, G., Soliman, M.A.: A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. 40(4), 1–58 (2008)CrossRefGoogle Scholar
  31. 31.
    Levandoski, J.J., Mokbel, M.F., Khalefa, M.E., Korukanti, V.R.: FlexPref: a framework for extensible preference evaluation in database systems. In: ICDE, New York, NY, USA (2010)Google Scholar
  32. 32.
    Litton, J.-E.: Launch of an infrastructure for health research: BBMRI-ERIC. Biopreservation Biobanking 16, 233–241 (2018)CrossRefGoogle Scholar
  33. 33.
    Mamoulis, N., Yiu, M.L., Cheng, K.H., Cheung, D.W.: Efficient top-k aggregation of ranked inputs. ACM Trans. Database Syst. 32(3), 19 (2007)CrossRefGoogle Scholar
  34. 34.
    Marian, A., Bruno, N., Gravano, L.: Evaluating top-k queries over web-accessible databases. ACM Trans. Database Syst. 29(2), 319–362 (2004)CrossRefGoogle Scholar
  35. 35.
    Nepal, S., Ramakrishna, M.: Query processing issues in image (multimedia) databases. In: ICDE, pp. 22–29 (1999)Google Scholar
  36. 36.
    Owens, K.T.: Building Intelligent Databases with Oracle PL/SQL, Triggers, and Stored Procedures, 2nd edn. Prentice-Hall Inc., Upper Saddle River (1998)Google Scholar
  37. 37.
    Robertson, S.: Understanding inverse document frequency: on theoretical arguments for idf. J. Doc. 60, 503–520 (2004)CrossRefGoogle Scholar
  38. 38.
    Ryeng, N.H., Vlachou, A., Doulkeridis, C., Nørvåg, K.: Efficient distributed top-k query processing with caching. In: Yu, J.X., Kim, M.H., Unland, R. (eds.) DASFAA 2011. LNCS, vol. 6588, pp. 280–295. Springer, Heidelberg (2011).  https://doi.org/10.1007/978-3-642-20152-3_21CrossRefGoogle Scholar
  39. 39.
    Sparck Jones, K.: A statistical interpretation of term specificity and its application in retrieval. In: Willett, P. (ed.) Document Retrieval Systems, pp. 132–142. Taylor Graham Publishing, London (1988). http://dl.acm.org/citation.cfm?id=106765.106782. ISBN 0-947568-21-2Google Scholar
  40. 40.
    Vlachou, A., Doulkeridis, C., Nørvåg, K., Vazirgiannis, M.: On efficient top-k query processing in highly distributed environments. In: Procedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, pp. 753–764. ACM (2008)Google Scholar
  41. 41.
    Wichmann, H.-E., Kuhn, K., et al.: Comprehensive catalog of European biobanks. Nat. Biotechnol. 29(9), 795–797 (2011)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. 1.Department of Informatics-SystemsAlpen-Adria UniversitätKlagenfurtAustria

Personalised recommendations