Abstract
Data profiling is a crucial step in the data quality process because it establishes the data quality rules that currently hold in the data. In this paper we present experimental results comparing our DOMAIN method for discovering domain constraint values with the commercially available Oracle Warehouse Builder (OWB). The results show that our approach is more effective than OWB at discovering domain values for textual data affected by data quality problems.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Ciszak, Ł. (2011). Discovery of Domain Values for Data Quality Assurance. In: Mehrotra, K.G., Mohan, C., Oh, J.C., Varshney, P.K., Ali, M. (eds) Developing Concepts in Applied Intelligence. Studies in Computational Intelligence, vol 363. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21332-8_2
DOI: https://doi.org/10.1007/978-3-642-21332-8_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21331-1
Online ISBN: 978-3-642-21332-8
eBook Packages: Engineering, Engineering (R0)