Data Integration Systems for Scientific Applications

  • Bastian Roth
  • Bernhard Volz
  • Robin Hecht
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6428)


The integration of data from heterogeneous sources has challenged computer science research for years, if not decades. Consequently, many methods, frameworks and tools have been developed, and are still being developed, that promise to solve the data integration problem. This work describes those we consider most promising and relates them to each other. Since our focus is on scientific applications, we consider properties that are important within this domain, such as data provenance. We also consider more general aspects, such as the extensibility of an approach.


Data Integration; Data Fusion; Scientific Application; Schema Match; Data Integration System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Bastian Roth (1)
  • Bernhard Volz (1)
  • Robin Hecht (1)
  1. Chair for Applied Computer Science IV, University of Bayreuth, Bayreuth, Germany
