Querying Conflicting Web Data Sources

  • Gilles Nachouki
  • Mohamed Quafafou
  • Omar Boucelma
  • François-Marie Colonna
Part of the Intelligent Systems Reference Library book series (ISRL, volume 36)

Abstract

Over the last twenty years, information integration has received considerable effort from both industry and academia. The approaches developed so far fall into two categories: (1) first-generation approaches, which require the definition of a global schema and a semantic integration performed upfront (before query execution); (2) second-generation approaches, well illustrated by the dataspace management concept, which promote pay-as-you-go data integration. The first category has led to well-known mediation approaches such as GAV (Global as View), LAV (Local as View), GLAV (Generalized Local As View), BAV (Both As View), and BGLAV (BYU Global-Local-as-View). Approaches in the second category are geared towards the development of dataspace management systems and are currently gaining a lot of attention. In this chapter we are interested in exploiting both types of approaches to query conflicting data spread over multiple web sources. To this end, we first show how an XML-based BGLAV approach can handle these conflicting data sources, then describe how the same problem can be addressed with the Multi Fusion Approach (MFA), a second-generation technique. Both BGLAV and MFA are illustrated using genomic data sources accessible through the Web.

Keywords

Data Integration, Global Schema, Semantic Variable, Data Integration System, Semantic Query
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Gilles Nachouki (1)
  • Mohamed Quafafou (2)
  • Omar Boucelma (2)
  • François-Marie Colonna (3)

  1. LINA-UMR CNRS 6241, Nantes University, Nantes, France
  2. LSIS-UMR CNRS 6168, Aix-Marseille University, Marseille, France
  3. Institut Supérieur de l’Electronique et du Numérique, Rennes, France
