
Discovery of Domain Values for Data Quality Assurance

Chapter in: Developing Concepts in Applied Intelligence

Part of the book series: Studies in Computational Intelligence (SCI, volume 363)


Abstract

Data profiling is a crucial step in the data quality process, as it reveals the data quality rules that currently hold in the data. In this paper we present experimental results comparing our DOMAIN method for discovering domain constraint values with the commercially available Oracle Warehouse Builder (OWB). The results show that our approach discovers domain values for textual data affected by data quality problems more effectively than OWB does.
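The abstract does not describe how DOMAIN works. To make the task itself concrete, here is a naive frequency-and-similarity baseline for domain value discovery in a textual column: frequent values are taken as candidate domain values, and rare values that closely resemble one of them are flagged as probable misspellings. This is a minimal sketch under stated assumptions. It is not the DOMAIN method and not OWB's algorithm. The function name `discover_domain`, the thresholds `min_support` and `min_similarity`, and the use of Python's `difflib.SequenceMatcher` ratio as the string similarity measure are all hypothetical choices made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, Iterable, List, Tuple


def discover_domain(
    values: Iterable[str],
    min_support: float = 0.05,
    min_similarity: float = 0.8,
) -> Tuple[List[str], Dict[str, str], List[str]]:
    """Hypothetical baseline (not the DOMAIN method): split a textual column
    into candidate domain values, likely misspellings, and outliers."""
    # Normalise case and surrounding whitespace; skip empty/blank entries.
    normalised = [v.strip().upper() for v in values if v and v.strip()]
    if not normalised:
        return [], {}, []
    counts = Counter(normalised)
    total = len(normalised)

    # Assumption: a value belongs to the domain if its relative
    # frequency reaches min_support.
    domain = sorted(v for v, c in counts.items() if c / total >= min_support)

    corrections: Dict[str, str] = {}
    outliers: List[str] = []
    for value in counts:
        if value in domain:
            continue
        # Score the rare value against every candidate domain value.
        scored = [(SequenceMatcher(None, value, d).ratio(), d) for d in domain]
        best_score, best = max(scored, default=(0.0, None))
        if best is not None and best_score >= min_similarity:
            corrections[value] = best  # probable misspelling of `best`
        else:
            outliers.append(value)  # neither frequent nor similar enough
    return domain, corrections, sorted(outliers)
```

For example, calling it on ten `Warsaw`, eight `Krakow`, and one each of `Warsw`, `Krakw`, and `xyz` yields the domain `['KRAKOW', 'WARSAW']`, maps the two misspellings to their nearest domain value, and leaves `XYZ` as an outlier. One weakness of a purely frequency-based cut-off like this is that it can miss valid but rare domain values, or admit a frequently repeated misspelling into the domain.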




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ciszak, Ł. (2011). Discovery of Domain Values for Data Quality Assurance. In: Mehrotra, K.G., Mohan, C., Oh, J.C., Varshney, P.K., Ali, M. (eds) Developing Concepts in Applied Intelligence. Studies in Computational Intelligence, vol 363. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21332-8_2


  • DOI: https://doi.org/10.1007/978-3-642-21332-8_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21331-1

  • Online ISBN: 978-3-642-21332-8

  • eBook Packages: Engineering, Engineering (R0)
