Uncertainty in Data Integration and Dataspace Support Platforms

  • Anish Das Sarma
  • Xin Luna Dong
  • Alon Y. Halevy
Part of the Data-Centric Systems and Applications book series (DCSA)


Data integration has been an important area of research for several years. However, data integration systems suffer from one of the main drawbacks of database systems: the need to invest significant modeling effort upfront. Dataspace support platforms (DSSPs) are envisioned as systems that offer useful services on their data without any setup effort and that improve over time in a pay-as-you-go fashion. We argue that to support DSSPs, the system needs to model uncertainty at its core. We describe probabilistic mediated schemas and probabilistic mappings as enabling concepts for DSSPs.


Keywords: Schema Mapping, Keyword Query, Conjunctive Query, Source Attribute, Query Answer



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Anish Das Sarma (corresponding author), Yahoo! Research, Santa Clara, USA
  • Xin Luna Dong, AT&T Labs – Research, Florham Park, USA
  • Alon Y. Halevy, Google Inc., Mountain View, USA
