On the Integration of a Large Number of Life Science Web Databases

  • Conference paper
Data Integration in the Life Sciences (DILS 2004)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 2994)

Abstract

The integration of life science web databases is an important research subject that affects the rate at which new biological discoveries are made. However, making life science databases interoperable presents serious challenges, particularly when the databases are accessed through their web interfaces. Among these challenges are that life science databases are numerous and that their access interfaces may change often. This paper proposes techniques that take these challenges into account and shows how they were implemented in BACIIS, a federation of life science web databases.

This work is supported in part by NSF CAREER DBI-0133946 and NSF DBI-0110854




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ben Miled, Z., Li, N., Liu, Y., He, Y., Lynch, E., Bukhres, O. (2004). On the Integration of a Large Number of Life Science Web Databases. In: Rahm, E. (ed.) Data Integration in the Life Sciences. DILS 2004. Lecture Notes in Computer Science, vol 2994. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24745-6_12

  • DOI: https://doi.org/10.1007/978-3-540-24745-6_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21300-0

  • Online ISBN: 978-3-540-24745-6

  • eBook Packages: Springer Book Archive
