Skip to main content

Abstract

The vision of dataspaces is to provide various of the benefits of classical data integration, but with reduced up-front costs. Combining this with opportunities for incremental refinement enables a ‘pay-as-you-go’ approach to data integration, resulting in simplified integrated access to distributed data. It has been speculated that model management could provide the basis for Dataspace Management, however, this has not been investigated until now.

Here, we present DSToolkit, the first dataspace management system that is based on model management, and therefore, benefits from the flexibility provided by the approach for the management of schemas represented in heterogeneous models, supports the complete dataspace lifecycle, which includes automatic initialisation, maintenance and improvement of a dataspace, and allows the user to provide feedback by annotating result tuples returned as a result of queries the user has posed. The user feedback gathered is utilised for improvement by annotating, selecting and refining mappings. Without the need for additional feedback on a new data source, these techniques can also be applied to determine its perceived quality with respect to already gathered feedback and to identify the best mappings over all sources including the new one.

The work reported in this paper was supported by a grant from the EPSRC.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Atzeni, P., Bellomarini, L., Bugiotti, F., Gianforme, G.: Mism: A platform for model-independent solutions to model management problems. J. Data Semantics 14, 133–161 (2009)

    Article  Google Scholar 

  2. Atzeni, P., Gianforme, G., Cappellari, P.: A universal metamodel and its dictionary. T. Large-Scale Data- and Knowledge-Centered Systems 1, 38–62 (2009)

    Google Scholar 

  3. Aumueller, D., Do, H.H., Massmann, S., Rahm, E.: Schema and ontology matching with coma++. In: SIGMOD Conference, pp. 906–908 (2005)

    Google Scholar 

  4. Belhajjame, K., Paton, N.W., Embury, S.M., Fernandes, A.A.A., Hedeler, C.: Feedback-based annotation, selection and refinement of schema mappings for dataspaces. In: EDBT, pp. 573–584 (2010)

    Google Scholar 

  5. Belhajjame, K., Paton, N.W., Fernandes, A.A.A., Hedeler, C., Embury, S.M.: User feedback as a first class citizen in information integration systems. In: CIDR, pp. 175–183 (2011)

    Google Scholar 

  6. Bernstein, P.A.: Applying model management to classical meta data problems. In: CIDR, pp. 209–220 (2003)

    Google Scholar 

  7. Bernstein, P.A., Halevy, A.Y., Pottinger, R.A.: A vision for management of complex models. SIGMOD Record 29(4), 55–63 (2000)

    Article  Google Scholar 

  8. Bernstein, P.A., Melnik, S.: Model management 2.0: manipulating richer mappings. In: SIGMOD Conference, pp. 1–12 (2007)

    Google Scholar 

  9. Bernstein, P.A., Melnik, S., Petropoulos, M., Quix, C.: Industrial-strength schema matching. SIGMOD Record 33(4), 38–43 (2004)

    Article  Google Scholar 

  10. Cao, H., Qi, Y., Candan, K.S., Sapino, M.L.: Feedback-driven result ranking and query refinement for exploring semi-structured data collections. In: EDBT, pp. 3–14 (2010)

    Google Scholar 

  11. Chai, X., Vuong, B.Q., Doan, A., Naughton, J.F.: Efficiently incorporating user feedback into information extraction and integration programs. In: SIGMOD Conference, pp. 87–100 (2009)

    Google Scholar 

  12. Chiticariu, L., Kolaitis, P.G., Popa, L.: Interactive generation of integrated schemas. In: SIGMOD Conference, pp. 833–846 (2008)

    Google Scholar 

  13. Chiticariu, L., Tan, W.C.: Debugging schema mappings with routes. In: VLDB, pp. 79–90 (2006)

    Google Scholar 

  14. Das Sarma, A., Dong, X., Halevy, A.: Bootstrapping pay-as-you-go data integration systems. In: SIGMOD, pp. 861–874 (2008)

    Google Scholar 

  15. Dittrich, J., Salles, M.A.V., Blunschi, L.: imemex: From search to information integration and back. IEEE Data Eng. Bull. 32(2), 28–35 (2009)

    Google Scholar 

  16. Do, H.H., Rahm, E.: Coma: a system for flexible combination of schema matching approaches. In: VLDB, pp. 610–621 (2002)

    Google Scholar 

  17. Do, H.H., Rahm, E.: Matching large schemas: Approaches and evaluation. Inf. Syst. 32(6), 857–885 (2007)

    Article  Google Scholar 

  18. Dong, X., Halevy, A.Y.: A platform for personal information management and integration. In: CIDR, pp. 119–130 (2005)

    Google Scholar 

  19. Franklin, M.J., Halevy, A.Y., Maier, D.: From databases to dataspaces: a new abstraction for information management. SIGMOD Record 34(4), 27–33 (2005)

    Article  Google Scholar 

  20. Garcia-Molina, H., Ullman, J.D., Widom, J.: Database Systems The Complete Book. Pearson International edn., 2nd edn. (2009)

    Google Scholar 

  21. Graefe, G.: Encapsulation of parallelism in the volcano query processing system. In: SIGMOD Conference, pp. 102–111 (1990)

    Google Scholar 

  22. Haas, L.: Beauty and the Beast: The Theory and Practice of Information Integration. In: Schwentick, T., Suciu, D. (eds.) ICDT 2007. LNCS, vol. 4353, pp. 28–43. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  23. Haas, L., Lin, E., Roth, M.: Data integration through database federation. IBM Systems Journal 41(4), 578–596 (2002)

    Article  Google Scholar 

  24. Halevy, A.Y.: Answering queries using views: A survey. The VLDB Journal 10(4), 270–294 (2001)

    Article  MATH  Google Scholar 

  25. Halevy, A.Y., Franklin, M.J., Maier, D.: Principles of dataspace systems. In: PODS, pp. 1–9 (2006)

    Google Scholar 

  26. Hedeler, C., Belhajjame, K., Fernandes, A.A.A., Embury, S.M., Paton, N.W.: Dimensions of Dataspaces. In: Sexton, A.P. (ed.) BNCOD 2009. LNCS, vol. 5588, pp. 55–66. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  27. Hedeler, C., Belhajjame, K., Paton, N.W., Campi, A., Fernandes, A.A.A., Embury, S.M.: Dataspaces. In: SeCO Workshop, pp. 114–134 (2009)

    Google Scholar 

  28. Hedeler, C., Paton, N.W.: Utilising the MISM Model Independent Schema Management Platform for Query Evaluation. In: Fernandes, A.A.A., Gray, A.J.G., Belhajjame, K. (eds.) BNCOD 2011. LNCS, vol. 7051, pp. 108–117. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  29. Ives, Z.G., Green, T.J., Karvounarakis, G., Taylor, N.E., Tannen, V., Talukdar, P.P., Jacob, M., Pereira, F.: The orchestra collaborative data sharing system. SIGMOD Record 37(3), 26–32 (2008)

    Article  Google Scholar 

  30. Jeffery, S.R., Franklin, M.J., Halevy, A.Y.: Pay-as-you-go user feedback for dataspace systems. In: SIGMOD Conference, pp. 847–860 (2008)

    Google Scholar 

  31. Kensche, D., Quix, C., Li, X., Li, Y., Jarke, M.: Generic schema mappings for composition and query answering. Data & Knowledge Engineering (DKE) 68(7), 599–621 (2009)

    Article  Google Scholar 

  32. Kim, W., Choi, I., Gala, S.K., Scheevel, M.: On resolving schematic heterogeneity in multidatabase systems. Distributed and Parallel Databases 1(3), 251–279 (1993)

    Article  Google Scholar 

  33. Kim, W., Seo, J.: Classifying schematic and data heterogeneity in multidatabase systems. IEEE Computer 24(12), 12–18 (1991)

    Article  Google Scholar 

  34. Lynden, S., Mukherjee, A., Hume, A.C., Fernandes, A.A.A., Paton, N.W., Sakellariou, R., Watson, P.: The design and implementation of OGSA-DQP: A service-based distributed query processor. Future Generation Comp. Syst. 25(3), 224–236 (2009)

    Article  Google Scholar 

  35. Madhavan, J., Cohen, S., Dong, X.L., Halevy, A.Y., Jeffery, S.R., Ko, D., Yu, C.: Web-scale data integration: You can afford to pay as you go. In: CIDR, pp. 342–350 (2007)

    Google Scholar 

  36. Mao, L., Belhajjame, K., Paton, N.W., Fernandes, A.A.A.: Defining and Using Schematic Correspondences for Automatically Generating Schema Mappings. In: van Eck, P., Gordijn, J., Wieringa, R. (eds.) CAiSE 2009. LNCS, vol. 5565, pp. 79–93. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  37. McBrien, P., Poulovassilis, A.: P2P Query Reformulation over Both-As-View Data Transformation Rules. In: Moro, G., Bergamaschi, S., Joseph, S., Morin, J.-H., Ouksel, A.M. (eds.) DBISP2P 2005 and DBISP2P 2006. LNCS, vol. 4125, pp. 310–322. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  38. McCann, R., Kramnik, A., Shen, W., Varadarajan, V., Sobulo, O., Doan, A.: Integrating data from disparate sources: A mass collaboration approach. In: ICDE, pp. 487–488 (2005)

    Google Scholar 

  39. Melnik, S., Rahm, E., Bernstein, P.A.: Rondo: a programming platform for generic model management. In: SIGMOD, pp. 193–204 (2003)

    Google Scholar 

  40. Michalewicz, Z., Fogel, D.: How to solve it: modern heuristics. Springer, Heidelberg (2000)

    Book  MATH  Google Scholar 

  41. Mork, P., Seligman, L., Rosenthal, A., Korb, J., Wolf, C.: The harmony integration workbench. J. Data Semantics 11, 65–93 (2008)

    Google Scholar 

  42. Naumann, F., Leser, U., Freytag, J.C.: Quality-driven integration of heterogenous information systems. In: VLDB, pp. 447–458 (1999)

    Google Scholar 

  43. Rahm, E., Bernstein, P.A.: A survey of approaches to automatic schema matching. VLDB Journal 10(4), 334–350 (2001)

    Article  MATH  Google Scholar 

  44. Scannapieco, M., Virgillito, A., Marchetti, C., Mecella, M., Baldoni, R.: The architecture: a platform for exchanging and improving data quality in cooperative information systems. Inf. Syst. 29(7), 551–582 (2004)

    Article  Google Scholar 

  45. Seligman, L., Mork, P., Halevy, A.Y., Smith, K.P., Carey, M.J., Chen, K., Wolf, C., Madhavan, J., Kannan, A., Burdick, D.: Openii: an open source information integration toolkit. In: SIGMOD Conference, pp. 1057–1060 (2010)

    Google Scholar 

  46. Smith, A., Rizopoulos, N., McBrien, P.: AutoMed Model Management. In: Li, Q., Spaccapietra, S., Yu, E., Olivé, A. (eds.) ER 2008. LNCS, vol. 5231, pp. 542–543. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  47. Talukdar, P.P., Ives, Z.G., Pereira, F.: Automatically incorporating new sources in keyword search-based data integration. In: SIGMOD Conference, pp. 387–398 (2010)

    Google Scholar 

  48. Talukdar, P.P., Jacob, M., Mehmood, M.S., Crammer, K., Ives, Z.G., Pereira, F., Guha, S.: Learning to create data-integrating queries. PVLDB 1(1), 785–796 (2008)

    Google Scholar 

  49. Wang, R.Y.: A product perspective on total data quality management. Commun. ACM 41(2), 58–65 (1998)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Abdelkader Hameurlain Josef Küng Roland Wagner

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hedeler, C. et al. (2012). DSToolkit: An Architecture for Flexible Dataspace Management. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems V. Lecture Notes in Computer Science, vol 7100. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28148-8_6

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-28148-8_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28147-1

  • Online ISBN: 978-3-642-28148-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics