AQUAGP: Approximate QUery Answers Using Genetic Programming

  • Jason B. Peltzer
  • Ankur M. Teredesai
  • Garrett Reinard
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3905)


Speed, cost, and accuracy are crucial performance parameters when evaluating the quality of information and query retrieval within any Database Management System. When exact results are not required, it may be possible for some queries to derive a similar result set using an approximate query answering algorithm or tool. Query approximation is useful when the following conditions hold: (a) a high percentage of the relevant data is retrieved correctly, (b) irrelevant or extra data is minimized, and (c) the approximate answer (if available) yields significant savings in overall query cost and retrieval time. In this paper we discuss a novel approach to approximate query answering using Genetic Programming (GP) paradigms. We have developed an evolutionary-computing-based framework for query space exploration which, given an input query and the database schema, uses tree-based GP to automatically generate and evaluate approximate query candidates. We highlight and discuss various avenues of exploration, evaluate the success of our experiments based on the speed, cost, and accuracy of the results retrieved by the reformulated (GP-generated) queries, and present results on a variety of query types for TPC-benchmark and PKDD-benchmark datasets.
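One way to read conditions (a) to (c) is as recall, precision, and relative cost saving of an approximate result set measured against the exact one. The sketch below illustrates that reading only; it is not the fitness function used by the authors, which the abstract does not specify. The names `ApproxScore` and `score_candidate`, the representation of rows as tuples, and the cost units are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ApproxScore:
    recall: float       # condition (a): share of relevant rows retrieved
    precision: float    # condition (b): share of retrieved rows that are relevant
    cost_saving: float  # condition (c): fraction of the exact query's cost saved


def score_candidate(exact_rows, approx_rows, exact_cost, approx_cost):
    """Score an approximate query result against the exact result.

    Hypothetical helper: rows are hashable tuples, and costs are positive
    numbers in any consistent unit (seconds, optimizer cost units, ...).
    """
    exact = set(exact_rows)
    approx = set(approx_rows)
    hits = len(exact & approx)

    # If nothing is relevant, nothing can be missed.
    recall = hits / len(exact) if exact else 1.0
    # Convention chosen here: an empty answer contains no irrelevant rows.
    precision = hits / len(approx) if approx else 1.0
    # Negative when the approximate query is more expensive than the exact one.
    cost_saving = 1.0 - approx_cost / exact_cost

    return ApproxScore(recall, precision, cost_saving)


# Usage: the approximate query finds 3 of 4 relevant rows, returns 1 extra
# row, and costs 4 units instead of 10.
exact = [(1,), (2,), (3,), (4,)]
approx = [(1,), (2,), (3,), (9,)]
score = score_candidate(exact, approx, exact_cost=10.0, approx_cost=4.0)
```

Keeping the three measures separate, instead of collapsing them into a single number, leaves open how a search procedure would trade accuracy against cost.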


Keywords: Parse Tree, Optimal Query, Original Query, Query Plan, Input Query
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jason B. Peltzer (1)
  • Ankur M. Teredesai (1)
  • Garrett Reinard (1)
  1. Department of Computer Science, Rochester Institute of Technology (RIT), Rochester, USA
