Exploring Heterogeneous Molecular Biology Databases in the Context of the Object-Protocol Model
Abstract
Solutions currently promoted for exploring heterogeneous molecular biology databases (MBDs) include providing Web links between MBDs or constructing MBDs consisting of links, physically integrating MBDs into data warehouses, and accessing MBDs using multidatabase query systems. Arguably the most difficult tasks in exploring heterogeneous MBDs are understanding the semantics of component MBDs and their connections, and specifying and interpreting queries expressed over MBDs. However, most existing solutions address these problems only superficially.
We propose a tool-based strategy for exploring heterogeneous MBDs in the context of the Object-Protocol Model (OPM). Our strategy involves developing tools that provide facilities for examining the semantics of MBDs; constructing and maintaining OPM views for MBDs; assembling MBDs into an OPM-based multidatabase system, while documenting MBD schemas and known schema links between MBDs; supporting multidatabase queries via uniform OPM interfaces; and assisting scientists in specifying and interpreting multidatabase queries. Each of these tools can be used independently and therefore represents a valuable resource in its own right. We discuss the implementation status of this strategy and our plans for pursuing it further.
Keywords
Query Processing, Query Language, Data Warehouse, Global Schema, Common Data Model