Estimating data accuracy in a federated database environment

Reddy, M. P.; Wang, Richard Y.

doi:10.1007/3-540-60584-3_27

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1006)

Included in the following conference series:

International Conference on Information Systems and Management of Data

Abstract

The need for integration of data in a heterogeneous or federated database environment creates a corresponding need for estimating the accuracy of the integrated data as a function of the accuracy of the originating data sources. Even in a single database system, different base relations are frequently characterized by dissimilar levels of accuracy; however, no technique exists for defining the accuracy of this single database system in terms of the accuracy of the base relations. This need is further heightened in the case of federated environments involving multiple heterogeneous databases. To address this need, a generalized method is proposed for estimating the overall data accuracy in terms of the accuracy of relevant base relations and the actual database query. The query is examined in terms of its underlying set of base operators. A rigorous theoretical framework encompassing all these possible base operators is presented in this paper using the relational model. While the accuracy estimates are postulated on the basis of uniform distribution, the implications of non-uniform error distributions are also examined in theoretical terms. Finally, a running example is utilized to highlight the practical implications of the proposed theoretical framework.
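The idea of propagating base-relation accuracy through a query's operators can be illustrated with a minimal sketch. It does not reproduce the paper's formulas, which are not given in this abstract. It assumes that accuracy means the fraction of accurate tuples in a relation, that errors are uniformly distributed over tuples, and that errors in different relations are independent. The names `Rel`, `select`, `product`, and `union_disjoint` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rel:
    """Summary of a relation: tuple count and fraction of accurate tuples."""
    size: int
    accuracy: float  # assumed to lie in [0, 1]


def select(r: Rel, selectivity: float) -> Rel:
    # Assumption: errors are uniform over tuples, so any selected subset
    # is expected to have the same accuracy as the input relation.
    return Rel(round(r.size * selectivity), r.accuracy)


def product(r: Rel, s: Rel) -> Rel:
    # Assumption: a combined tuple is accurate only if both component
    # tuples are accurate, and errors in r and s are independent.
    return Rel(r.size * s.size, r.accuracy * s.accuracy)


def union_disjoint(r: Rel, s: Rel) -> Rel:
    # Assumption: r and s share no tuples, so the result accuracy is the
    # size-weighted average of the two input accuracies.
    total = r.size + s.size
    if total == 0:
        return Rel(0, 1.0)
    weighted = r.size * r.accuracy + s.size * s.accuracy
    return Rel(total, weighted / total)


if __name__ == "__main__":
    orders = Rel(size=100, accuracy=0.9)
    customers = Rel(size=50, accuracy=0.8)
    # Estimated accuracy of a selection over the product of two relations.
    print(select(product(orders, customers), selectivity=0.1))
```

Under these assumptions the estimate for a composite query is obtained by composing the per-operator rules in the order the operators appear in the query; a non-uniform error distribution would invalidate the selection rule in particular.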

Author information

Authors and Affiliations

Kenan Systems Corporation, One Main Street, Cambridge, MA 02142
M. P. Reddy
Sloan School of Management, MIT, Cambridge, MA 02139
Richard Y. Wang

Editor information

Subhash Bhalla

Cite this paper

Reddy, M.P., Wang, R.Y. (1995). Estimating data accuracy in a federated database environment. In: Bhalla, S. (eds) Information Systems and Data Management. CISMOD 1995. Lecture Notes in Computer Science, vol 1006. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60584-3_27

DOI: https://doi.org/10.1007/3-540-60584-3_27
Published: 02 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60584-3
Online ISBN: 978-3-540-47799-0
eBook Packages: Springer Book Archive