Estimating data accuracy in a federated database environment

  • M. P. Reddy
  • Richard Y. Wang
Distributed Systems
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1006)

Abstract

The need for integration of data in a heterogeneous or federated database environment creates a corresponding need for estimating the accuracy of the integrated data as a function of the accuracy of the originating data sources. Even in a single database system, different base relations are frequently characterized by dissimilar levels of accuracy; however, no technique exists for defining the accuracy of this single database system in terms of the accuracy of the base relations. This need is further heightened in the case of federated environments involving multiple heterogeneous databases. To address this need, a generalized method is proposed for estimating the overall data accuracy in terms of the accuracy of relevant base relations and the actual database query. The query is examined in terms of its underlying set of base operators. A rigorous theoretical framework encompassing all these possible base operators is presented in this paper using the relational model. While the accuracy estimates are postulated on the basis of uniform distribution, the implications of non-uniform error distributions are also examined in theoretical terms. Finally, a running example is utilized to highlight the practical implications of the proposed theoretical framework.
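As a rough illustration of the idea in the abstract, the sketch below propagates a tuple-level accuracy figure through a few relational operators. It is not the paper's formulation, which the abstract does not state. The `Relation` class, the three functions, and the propagation rules are all assumptions made for this example: errors are taken to be uniformly distributed and independent across relations, and a derived tuple counts as accurate only if every source tuple it was built from is accurate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical summary of a relation: its size and tuple-level accuracy."""
    name: str
    size: int        # number of tuples
    accuracy: float  # assumed fraction of correct tuples, in [0, 1]


def select(r: Relation, selectivity: float) -> Relation:
    # Assumption: errors are uniformly distributed, so a selection keeps
    # accurate and inaccurate tuples in the same proportion.
    return Relation(f"select({r.name})", round(r.size * selectivity), r.accuracy)


def product(r: Relation, s: Relation) -> Relation:
    # Assumption: errors in r and s are independent, and a combined tuple
    # is accurate only when both component tuples are accurate.
    return Relation(f"({r.name} x {s.name})", r.size * s.size,
                    r.accuracy * s.accuracy)


def union_disjoint(r: Relation, s: Relation) -> Relation:
    # Assumption: r and s share no tuples, so the result's accuracy is the
    # size-weighted average of the two input accuracies.
    total = r.size + s.size
    if total == 0:
        raise ValueError("accuracy is undefined for an empty result")
    accuracy = (r.size * r.accuracy + s.size * s.accuracy) / total
    return Relation(f"({r.name} u {s.name})", total, accuracy)


# Example query: a product of a selection on one source with a second source.
customers = Relation("customers", 1000, 0.9)
orders = Relation("orders", 500, 0.8)
result = product(select(customers, 0.1), orders)
print(result.name, result.size, round(result.accuracy, 3))
```

Under these assumptions the estimate depends only on the base accuracies and the operator tree of the query. Non-uniform error distributions, which the abstract says are also examined, would break the selection rule above, because a selection could then concentrate or filter out erroneous tuples.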

Key words

Data Quality, Relational Algebra, Database Management Systems


Copyright information

© Springer-Verlag Berlin Heidelberg 1995

Authors and Affiliations

  • M. P. Reddy (1)
  • Richard Y. Wang (2)
  1. Kenan Systems Corporation, Cambridge
  2. Sloan School of Management, MIT, Cambridge
