Schema derivation for WWW information sources and their integration with databases in Bioinformatics

  • Regular Papers
  • Conference paper
Advances in Databases and Information Systems (ADBIS 1998)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1475))

Abstract

In this paper we discuss first experiences and results of current work on the BioBench, an integrated information system for Bioinformatics. Since most bioinformatics data is distributed across many heterogeneous systems all over the world, one has to deal with the problems of integrating heterogeneous systems. In particular, semi-structured data presented via WWW interfaces has to be integrated. We therefore focus on the acquisition, integration, and management of the data for the BioBench. First, we give a short motivation of the project and an overview of the system. The main part discusses schema derivation for the WWW interfaces, including the application of domain knowledge and automatic grammar generation. Finally, we briefly describe an automatic wrapper generation approach that supports high-quality wrappers as well as wrapper modification in response to local schema or format evolution.

This research was partially supported by the German State Sachsen-Anhalt under FKZ: 1987A/0025 and 1987/2527B and by the Kurt-Eberhard-Bode-Foundation under FKZ: T 122/4.

Editor information

Witold Litwin Tadeusz Morzy Gottfried Vossen

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Höding, M., Hofestädt, R., Saake, G., Scholz, U. (1998). Schema derivation for WWW information sources and their integration with databases in Bioinformatics. In: Litwin, W., Morzy, T., Vossen, G. (eds) Advances in Databases and Information Systems. ADBIS 1998. Lecture Notes in Computer Science, vol 1475. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0057742

  • DOI: https://doi.org/10.1007/BFb0057742

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64924-3

  • Online ISBN: 978-3-540-68309-4

  • eBook Packages: Springer Book Archive
