DSToolkit: An Architecture for Flexible Dataspace Management

Hedeler, Cornelia; Belhajjame, Khalid; Mao, Lu; Guo, Chenjuan; Arundale, Ian; Lóscio, Bernadette Farias; Paton, Norman W.; Fernandes, Alvaro A. A.; Embury, Suzanne M.

doi:10.1007/978-3-642-28148-8_6

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 7100)

Abstract

The vision of dataspaces is to provide many of the benefits of classical data integration, but with reduced up-front costs. Combining this with opportunities for incremental refinement enables a ‘pay-as-you-go’ approach to data integration, resulting in simplified integrated access to distributed data. It has been speculated that model management could provide the basis for dataspace management; however, this has not been investigated until now.

Here, we present DSToolkit, the first dataspace management system that is based on model management. DSToolkit benefits from the flexibility that model management provides for managing schemas represented in heterogeneous models. It supports the complete dataspace lifecycle, which includes automatic initialisation, maintenance and improvement of a dataspace, and it allows the user to provide feedback by annotating the result tuples returned by queries the user has posed. The feedback gathered is used to improve the dataspace by annotating, selecting and refining mappings. When a new data source is added, these techniques can also be applied, without requiring additional feedback on that source, to determine its perceived quality with respect to the feedback already gathered and to identify the best mappings over all sources, including the new one.
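The abstract does not say how mappings are annotated from feedback, so the following is only a minimal sketch of one plausible reading: each mapping is annotated with precision and recall estimated from the tuples the user has marked as expected or unexpected, and mappings are then selected by a precision threshold. The names `annotate_mapping` and `select_mappings`, the threshold, and the toy data are all hypothetical and are not taken from DSToolkit.

```python
from typing import Dict, List, Optional, Set, Tuple

Annotation = Tuple[Optional[float], Optional[float]]  # (precision, recall)


def annotate_mapping(returned: Set[str], feedback: Dict[str, bool]) -> Annotation:
    """Estimate precision and recall of one mapping from user feedback.

    returned: ids of the tuples the mapping produces (hypothetical ids).
    feedback: tuple id -> True if the user marked the tuple as expected,
              False if the user marked it as unexpected.
    Tuples without feedback are ignored in both estimates.
    """
    tp = sum(1 for t in returned if feedback.get(t) is True)
    fp = sum(1 for t in returned if feedback.get(t) is False)
    # Expected tuples that this mapping failed to return.
    fn = sum(1 for t, ok in feedback.items() if ok and t not in returned)
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall


def select_mappings(annotations: Dict[str, Annotation],
                    min_precision: float) -> List[str]:
    """Keep mappings whose estimated precision meets the threshold,
    best first (ordered by precision, then recall)."""
    kept = [(name, p, r if r is not None else 0.0)
            for name, (p, r) in annotations.items()
            if p is not None and p >= min_precision]
    kept.sort(key=lambda item: (item[1], item[2]), reverse=True)
    return [name for name, _, _ in kept]


# Toy data (assumed): two candidate mappings and feedback on four tuples.
mappings = {"m1": {"t1", "t2", "t3"}, "m2": {"t1", "t4"}}
feedback = {"t1": True, "t2": False, "t3": True, "t4": True}

annotations = {name: annotate_mapping(tuples, feedback)
               for name, tuples in mappings.items()}
selected = select_mappings(annotations, min_precision=0.9)
print(annotations)
print(selected)
```

Because the estimates depend only on the feedback already gathered, the same functions can be applied to the mappings of a newly added source without asking the user for anything further, which is the reuse the abstract describes.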

The work reported in this paper was supported by a grant from the EPSRC.

Author information

Authors and Affiliations

School of Computer Science, The University of Manchester, Oxford Road, Manchester, M13 9PL, UK
Cornelia Hedeler, Khalid Belhajjame, Lu Mao, Chenjuan Guo, Ian Arundale, Norman W. Paton, Alvaro A. A. Fernandes & Suzanne M. Embury
Centro de Informática, Cidade Universitária, Universidade Federal de Pernambuco, 50740-540, Recife, PE, Brasil
Bernadette Farias Lóscio

Editor information

Abdelkader Hameurlain Josef Küng Roland Wagner

About this chapter

Cite this chapter

Hedeler, C. et al. (2012). DSToolkit: An Architecture for Flexible Dataspace Management. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems V. Lecture Notes in Computer Science, vol 7100. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28148-8_6

DOI: https://doi.org/10.1007/978-3-642-28148-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28147-1
Online ISBN: 978-3-642-28148-8
eBook Packages: Computer Science, Computer Science (R0)
