Abstract
In this paper we discuss first experiences and results of ongoing work on the BioBench, an integrated information system for bioinformatics. Since most bioinformatics data is distributed across many heterogeneous systems all over the world, one has to deal with the problems of integrating such systems. In particular, semi-structured data presented via WWW interfaces has to be integrated. Therefore, we focus on the acquisition, integration, and management of the data for the BioBench. First we give a short motivation for the project and an overview of the system. The main part discusses schema derivation for the WWW interfaces, including the application of domain knowledge and automatic grammar generation. Finally, we briefly describe an automatic wrapper generation approach that supports high-quality wrappers as well as wrapper modification in response to local schema or format evolution.
This research was partially supported by the German State Sachsen-Anhalt under FKZ: 1987A/0025 and 1987/2527B and the Kurt-Eberhard-Bode-Foundation under FKZ: T 122/4.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Höding, M., Hofestädt, R., Saake, G., Scholz, U. (1998). Schema derivation for WWW information sources and their integration with databases in Bioinformatics. In: Litwin, W., Morzy, T., Vossen, G. (eds) Advances in Databases and Information Systems. ADBIS 1998. Lecture Notes in Computer Science, vol 1475. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0057742
DOI: https://doi.org/10.1007/BFb0057742
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64924-3
Online ISBN: 978-3-540-68309-4
eBook Packages: Springer Book Archive