Abstract
Cloud computing technologies show great promise for web engineering and facilitate the development of flexible, distributed, and scalable web applications. Data integration can benefit notably from cloud computing because integrating web data is usually an expensive task. This paper introduces CloudFuice, a data integration system in which advanced dataflows for data integration are specified in a mashup-like fashion. CloudFuice’s task-based execution approach enables efficient, asynchronous, and parallel execution of dataflows in the cloud and utilizes recent cloud-based web engineering instruments. We demonstrate and evaluate CloudFuice’s applicability for mashup-based data integration in the cloud using a first prototype implementation.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Thor, A., Rahm, E. (2011). CloudFuice: A Flexible Cloud-Based Data Integration System. In: Auer, S., Díaz, O., Papadopoulos, G.A. (eds) Web Engineering. ICWE 2011. Lecture Notes in Computer Science, vol 6757. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22233-7_21
DOI: https://doi.org/10.1007/978-3-642-22233-7_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22232-0
Online ISBN: 978-3-642-22233-7
eBook Packages: Computer Science, Computer Science (R0)