CloudFuice: A Flexible Cloud-Based Data Integration System

  • Andreas Thor
  • Erhard Rahm
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6757)


The advent of cloud computing technologies shows great promise for web engineering and facilitates the development of flexible, distributed, and scalable web applications. Data integration can notably benefit from cloud computing because integrating web data is usually an expensive task. This paper introduces CloudFuice, a data integration system that follows a mashup-like specification of advanced dataflows for data integration. CloudFuice’s task-based execution approach allows for an efficient, asynchronous, and parallel execution of dataflows in the cloud and utilizes recent cloud-based web engineering instruments. We demonstrate and evaluate CloudFuice’s applicability for mashup-based data integration in the cloud with the help of a first prototype implementation.


Cloud Data Management · Data Integration · Mashups



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Andreas Thor (1)
  • Erhard Rahm (2)
  1. Institute for Advanced Computer Studies, University of Maryland, USA
  2. Department of Computer Science, University of Leipzig, Germany
