A Survey on Data Integration in Bioinformatics

  • Cheo Thiam Yui
  • Lim Jun Liang
  • Wong Jik Soon
  • Wahidah Husain
Part of the Communications in Computer and Information Science book series (CCIS, volume 254)


The need for data integration is widely acknowledged in bioinformatics. Several very large biological databanks are now available across the world, in different formats. Characterizing data or mapping it between several sources requires integrating all related data fields. The integration problem can be addressed with a variety of approaches; some are widely used, while others are less so because they fail to meet the basic requirements of data integration. In this paper, we discuss three techniques for data integration: the federated database system approach, the data warehousing approach and the link-driven approach. Since each approach has its strengths and weaknesses, it is important to identify which is best suited to a given user’s needs. We also discuss some database systems that use these three approaches to solve the problem of data integration.


Keywords: Data warehouse, Link-driven approach, Federated databases, Data integration, Bioinformatics





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Cheo Thiam Yui (1)
  • Lim Jun Liang (1)
  • Wong Jik Soon (1)
  • Wahidah Husain (1)

  1. School of Computer Sciences, Universiti Sains Malaysia, Minden, Malaysia
