Reliability Models for Data Integration Systems

Part of the book series: Springer Series in Reliability Engineering (RELIABILITY)

Abstract

Data integration systems (DIS) provide information by integrating and transforming data extracted from external sources. Examples of DIS include mediators, data warehouses, federations of databases, and web portals. Data quality is an essential issue in DIS because it determines the confidence of users in the supplied information. One of the main challenges in this field is to offer rigorous and practical means to evaluate the quality of DIS. In this sense, DIS reliability represents the capability of a DIS to provide data with a certain level of quality, taking into account not only current quality values but also the changes that may occur in data quality at the external sources. Simulation techniques constitute a non-traditional approach to data quality evaluation, and more specifically to DIS reliability evaluation. This chapter presents techniques for DIS reliability evaluation that apply simulation in addition to exact computation models. Simulation addresses two important drawbacks of exact techniques: the scalability of the reliability computation when the set of data sources grows, and the modeling of data sources with inter-related (non-independent) quality properties.
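The contrast between exact computation and simulation can be illustrated with a minimal sketch. It is not the chapter's model: it assumes, purely for illustration, that each source meets its quality requirement independently with a known probability, and that the DIS meets its own requirement when at least two sources do. The names `dis_ok`, `exact_reliability`, and `simulated_reliability` are hypothetical.

```python
import itertools
import random


def dis_ok(states):
    """Hypothetical structure function (an assumption, not from the chapter):
    the DIS meets its quality requirement when at least two sources do."""
    return sum(states) >= 2


def exact_reliability(probs, structure=dis_ok):
    """Exact value by enumerating all 2**n source states.

    Assumes independent sources; the cost doubles with every added source.
    """
    total = 0.0
    for states in itertools.product((False, True), repeat=len(probs)):
        weight = 1.0
        for ok, p in zip(states, probs):
            weight *= p if ok else 1.0 - p
        if structure(states):
            total += weight
    return total


def simulated_reliability(probs, structure=dis_ok, samples=100_000, seed=0):
    """Crude Monte Carlo estimate: sample source states, count successes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        states = tuple(rng.random() < p for p in probs)
        hits += structure(states)
    return hits / samples


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.7]  # illustrative probabilities, not measured data
    print(exact_reliability(probs))
    print(simulated_reliability(probs))
```

The enumeration visits every combination of source states, so its cost grows exponentially with the number of sources, while the cost of the simulation grows with the number of samples. Under this sketch, inter-related quality properties could be handled by replacing the independent draw of `states` with a sampler that generates correlated states; the enumeration has no equally simple adaptation.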

Notes

  1. In the solution given in this work, we do not differentiate between “quality factor” and “quality dimension.”

Copyright information

© 2010 Springer-Verlag London Limited

About this chapter

Cite this chapter

Marotta, A., Cancela, H., Peralta, V., Ruggia, R. (2010). Reliability Models for Data Integration Systems. In: Faulin, J., Juan, A., Martorell, S., Ramírez-Márquez, JE. (eds) Simulation Methods for Reliability and Availability of Complex Systems. Springer Series in Reliability Engineering. Springer, London. https://doi.org/10.1007/978-1-84882-213-9_6

  • DOI: https://doi.org/10.1007/978-1-84882-213-9_6

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84882-212-2

  • Online ISBN: 978-1-84882-213-9

  • eBook Packages: Engineering, Engineering (R0)
