Abstract
High data quality is defined as the reliability and application efficiency of the data in a system. Maintaining high data quality has become a key concern for most organizations. Data quality tools are used to extract, clean, and match data from different sources. In this paper, we first introduce state-of-the-art open source data quality tools, specifically Talend Open Studio, DataCleaner, WinPure, Data Preparator, Data Match, DataMartist, Pentaho Kettle, SQL Power Architect, SQL Power DQguru, and DQ Analyzer. Second, we compare these tools on their key features and their performance in data profiling, integration, and cleaning. Overall, DataCleaner scores highest among the tools considered.
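To make the task categories named above more concrete, here is a minimal Python sketch of what profiling, cleaning, and matching mean on a toy record set. It is not taken from any of the surveyed tools and does not reflect their APIs. The function names (`profile`, `normalize`, `match_duplicates`) and the sample records are hypothetical. The matching step uses exact comparison after normalization, which is only an assumed simplification; the surveyed tools implement these steps with their own configurable components.

```python
from collections import Counter

def profile(records, column):
    """Profiling (hypothetical helper): row count, missing values, distinct values, most common value."""
    values = [r.get(column) for r in records]
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }

def normalize(value):
    """Cleaning (hypothetical helper): trim, lowercase, and collapse internal whitespace."""
    return " ".join(value.strip().lower().split())

def match_duplicates(records, column):
    """Matching (hypothetical helper): group record indices whose normalized values are identical."""
    groups = {}
    for i, r in enumerate(records):
        v = r.get(column)
        if v:  # skip missing or empty values
            groups.setdefault(normalize(v), []).append(i)
    return [idxs for idxs in groups.values() if len(idxs) > 1]

# Toy records (illustrative only, not data from the paper).
records = [
    {"name": "Alice  Smith", "city": "Austin"},
    {"name": "alice smith", "city": "Austin"},
    {"name": "Bob Jones", "city": ""},
    {"name": "Carol White", "city": None},
]

print(profile(records, "city"))          # {'rows': 4, 'missing': 2, 'distinct': 1, 'most_common': 'Austin'}
print(match_duplicates(records, "name"))  # [[0, 1]]
```

Here the profile flags that half of the `city` values are missing, and the matching step finds that records 0 and 1 refer to the same name once case and spacing are cleaned.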
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Pulla, V.S.V., Varol, C., Al, M. (2016). Open Source Data Quality Tools: Revisited. In: Latifi, S. (eds) Information Technology: New Generations. Advances in Intelligent Systems and Computing, vol 448. Springer, Cham. https://doi.org/10.1007/978-3-319-32467-8_77
DOI: https://doi.org/10.1007/978-3-319-32467-8_77
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32466-1
Online ISBN: 978-3-319-32467-8
eBook Packages: Engineering, Engineering (R0)