
Journal of Intelligent Information Systems, Volume 23, Issue 2, pp 145–178

Use and Maintenance of Histograms for Large Scientific Database Access Planning: A Case Study of a Pharmaceutical Data Repository

  • Zina Ben Miled
  • Jin Liu
  • Omran Bukhres
  • Huian Li
  • Jesse Martin
  • Chavali Balagopalakrishna
  • Robert Oppelt

Abstract

Scientific databases, and in particular chemical and biological databases, have reached massive sizes in recent years due to improvements in the bench-side high-throughput screening tools used by scientists. This rapid growth has shifted the bottleneck in discovery and product development from the bench side to the computational side, creating a need for new computational tools that facilitate access to and interpretation of such massive data.

This paper discusses the design and implementation of a computation histogram to speed up access to large pharmaceutical databases. Unlike traditional histograms, in which approximate value distributions are obtained by grouping attribute values into buckets, the computation histogram proposed in this paper records the retrieval time and the calculation time of descriptors in a pharmaceutical drug candidate database. Both on-line and off-line techniques are proposed for updating the computation histogram so that an efficient query plan can be generated.

The efficiency of the proposed computation histogram is demonstrated on a drug candidate database used in the pharmaceutical drug discovery process. The histogram allows the result of a query to be either computed using a computational algorithm or retrieved from the database. In addition to the pharmaceutical drug candidate database, the proposed approach is applicable to other scientific databases, such as biological and agroscience databases.
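
To make the idea concrete, the following is a minimal sketch of how a computation histogram might be organized. It is an illustration only, not the implementation described in the paper: the class names, the use of a running average, the replay-based off-line rebuild, the fallback to retrieval when timings are missing, and the descriptor name `molecular_weight` are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CostEntry:
    """Running averages of observed times (in seconds) for one descriptor."""
    retrieve_avg: float = 0.0
    compute_avg: float = 0.0
    retrieve_n: int = 0
    compute_n: int = 0


@dataclass
class ComputationHistogram:
    """Hypothetical structure: maps a descriptor name to its timing statistics."""
    entries: dict = field(default_factory=dict)

    def record(self, descriptor, method, seconds):
        """On-line update: fold one observed timing into the running average."""
        entry = self.entries.setdefault(descriptor, CostEntry())
        if method == "retrieve":
            entry.retrieve_n += 1
            entry.retrieve_avg += (seconds - entry.retrieve_avg) / entry.retrieve_n
        elif method == "compute":
            entry.compute_n += 1
            entry.compute_avg += (seconds - entry.compute_avg) / entry.compute_n
        else:
            raise ValueError(f"unknown method: {method!r}")

    def rebuild(self, log):
        """Off-line update: discard old statistics and replay a timing log."""
        self.entries.clear()
        for descriptor, method, seconds in log:
            self.record(descriptor, method, seconds)

    def plan(self, descriptor):
        """Pick the cheaper access path; assume retrieval if timings are missing."""
        entry = self.entries.get(descriptor)
        if entry is None or entry.retrieve_n == 0 or entry.compute_n == 0:
            return "retrieve"
        return "compute" if entry.compute_avg < entry.retrieve_avg else "retrieve"


# Usage with a made-up descriptor name and made-up timings.
hist = ComputationHistogram()
hist.record("molecular_weight", "retrieve", 0.4)
hist.record("molecular_weight", "retrieve", 0.6)
hist.record("molecular_weight", "compute", 0.1)
choice = hist.plan("molecular_weight")
```

In this sketch the planner simply compares the two averages per descriptor. A real system would likely also account for factors such as result-set size and system load, which the sketch leaves out.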

Keywords: database, drug candidate, histogram, drug discovery, query planning



Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Zina Ben Miled (1)
  • Jin Liu (2)
  • Omran Bukhres (2)
  • Huian Li (2)
  • Jesse Martin (3)
  • Chavali Balagopalakrishna (3)
  • Robert Oppelt (3)

  1. Electrical & Computer Engineering, School of Eng. & Tech., Indiana University Purdue University, Indianapolis, USA
  2. Computer & Information Science, School of Science, Indiana University Purdue University, Indianapolis, USA
  3. Eli Lilly & Company, Indianapolis, USA
