A Functional Model for Dataspace Management Systems

  • Cornelia Hedeler
  • Alvaro A. A. Fernandes
  • Khalid Belhajjame
  • Lu Mao
  • Chenjuan Guo
  • Norman W. Paton
  • Suzanne M. Embury
Part of the Intelligent Systems Reference Library book series (ISRL, volume 36)

Abstract

Dataspace management systems (DSMSs) hold the promise of pay-as-you-go data integration. We describe a comprehensive model of DSMS functionality using an algebraic style. We begin by characterizing a dataspace life cycle and highlighting opportunities for both automation and user-driven improvement techniques. Building on the observation that many of the techniques developed in model management are also of use in data integration contexts, we briefly introduce the model management area and explain how previous work on both data integration and model management needs extending if the full dataspace life cycle is to be supported. We show that many model management operators already enable important functionalities (e.g., the merging of schemas and the composition of mappings) and formulate these capabilities in an algebraic structure, thereby giving rise to the notion of the core functionality of a DSMS as a many-sorted algebra. Given this view, we show how core tasks in the dataspace life cycle can be enacted by means of algebraic programs. An extended case study illustrates how such algebraic programs capture a challenging, practical scenario.
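
As a rough illustration of the many-sorted-algebra view described in the abstract, the following minimal sketch treats schemas and mappings as two sorts and gives `merge` and `compose` operators typed signatures over them. The class names, fields, example attribute names, and operator semantics are all assumptions made for illustration; they are not the operator definitions given in the chapter.

```python
from dataclasses import dataclass


# Two sorts of a hypothetical algebra: Schema and Mapping.
@dataclass(frozen=True)
class Schema:
    name: str
    attributes: frozenset  # attribute names


@dataclass(frozen=True)
class Mapping:
    source: Schema
    target: Schema
    pairs: frozenset  # (source attribute, target attribute) pairs


def merge(a: Schema, b: Schema) -> Schema:
    """Schema x Schema -> Schema: naive union of attribute names (assumed semantics)."""
    return Schema(f"{a.name}+{b.name}", a.attributes | b.attributes)


def compose(m1: Mapping, m2: Mapping) -> Mapping:
    """Mapping x Mapping -> Mapping: chain attribute pairs through the shared schema."""
    if m1.target != m2.source:
        raise ValueError("m1.target must equal m2.source")
    pairs = frozenset(
        (s1, t2)
        for (s1, t1) in m1.pairs
        for (s2, t2) in m2.pairs
        if t1 == s2
    )
    return Mapping(m1.source, m2.target, pairs)


# An "algebraic program" is then a composition of operator applications.
a = Schema("A", frozenset({"gene_id"}))
b = Schema("B", frozenset({"gid"}))
c = Schema("C", frozenset({"identifier"}))
ab = Mapping(a, b, frozenset({("gene_id", "gid")}))
bc = Mapping(b, c, frozenset({("gid", "identifier")}))

ac = compose(ab, bc)  # maps A directly to C
merged = merge(a, c)  # schema holding the attributes of both A and C
```

Because each operator declares the sorts it consumes and produces, ill-typed programs (for example, composing two mappings that do not share an intermediate schema) can be rejected before they run.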

Keywords

Model Management; Data Integration; Functional Model; Data Resource; User Feedback
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Cornelia Hedeler (1)
  • Alvaro A. A. Fernandes (1)
  • Khalid Belhajjame (1)
  • Lu Mao (1)
  • Chenjuan Guo (1)
  • Norman W. Paton (1)
  • Suzanne M. Embury (1)

  1. University of Manchester, Manchester, UK
