Abstract
Evaluating data non-quality in database or data warehouse systems is a preliminary step before any data usage and analysis, particularly in data integration, where several sources provide more or less redundant or contradictory information items whose quality is often unknown, imprecise, and very heterogeneous. Our application domain is bioinformatics, where more than five hundred semi-structured databanks offer biological information without any quality information (i.e., metadata and statistics describing the production and management of the biological data). To facilitate multi-source data integration in the context of distributed biological databanks, we propose a technique based on the concepts of quality contract and data source negotiation for a standard wrapper-mediator architecture. A source quality contract specifies the quality dimensions the mediator needs in order to extract data from several distributed resources. Source selection is computed dynamically through contract negotiation, which we propose to include in mediation and global query processing before data acquisition. Integration of the multi-source biological data is deferred until the results of the user's global query are returned and combined, using data recommendation techniques that take source quality requirements into account.
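As a rough illustration of the idea of selecting sources against a quality contract, the sketch below filters candidate sources by per-dimension minimum scores and ranks the survivors. It is not the negotiation protocol of the paper: the class names, the 0.0 to 1.0 score scale, the dimension names, the databank names, and the ranking by smallest margin are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class QualityContract:
    """Hypothetical contract: minimum acceptable score per quality dimension."""
    requirements: dict[str, float]


@dataclass
class Source:
    """Hypothetical databank with the quality scores its wrapper advertises."""
    name: str
    quality: dict[str, float]


def negotiate(contract: QualityContract, sources: list[Source]) -> list[str]:
    """Return names of sources meeting every requirement, best margin first."""
    accepted = []
    for source in sources:
        # A dimension the source does not advertise is treated as score 0.0.
        margins = [
            source.quality.get(dimension, 0.0) - minimum
            for dimension, minimum in contract.requirements.items()
        ]
        if all(margin >= 0 for margin in margins):
            # Rank by the tightest margin; an empty contract accepts everything.
            accepted.append((min(margins, default=0.0), source.name))
    accepted.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in accepted]


# Illustrative data only; these are not real databanks or measured scores.
contract = QualityContract({"freshness": 0.5, "completeness": 0.5})
sources = [
    Source("bank_a", {"freshness": 0.75, "completeness": 1.0}),
    Source("bank_b", {"freshness": 1.0, "completeness": 0.25}),
    Source("bank_c", {"freshness": 1.0, "completeness": 1.0}),
    Source("bank_d", {"freshness": 1.0}),
]
print(negotiate(contract, sources))  # ['bank_c', 'bank_a']
```

In this sketch the mediator would query only the returned sources, so a source that fails the contract is excluded before any data is acquired.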
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Berti-Equille, L. (2001). Integration of Biological Data and Quality-Driven Source Negotiation. In: Kunii, H.S., Jajodia, S., Sølvberg, A. (eds) Conceptual Modeling — ER 2001. ER 2001. Lecture Notes in Computer Science, vol 2224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45581-7_20
DOI: https://doi.org/10.1007/3-540-45581-7_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42866-4
Online ISBN: 978-3-540-45581-3
eBook Packages: Springer Book Archive