Open Source Data Quality Tools: Revisited

  • Conference paper
Information Technology: New Generations

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 448)

Abstract

High data quality is defined as the reliability and application efficiency of the data present in a system. Maintaining high data quality has become a key concern for most organizations. Different data quality tools are used for extracting, cleaning, and matching data sources. In this paper, we first introduce state-of-the-art open-source data quality tools, specifically Talend Open Studio, DataCleaner, WinPure, Data Preparator, Data Match, DataMartist, Pentaho Kettle, SQL Power Architect, SQL Power DQguru, and DQ Analyzer. Second, we compare these tools based on their key features and their performance in data profiling, integration, and cleaning. Overall, DataCleaner scores highest among the tools considered.



Author information

Corresponding author

Correspondence to Cihan Varol.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Pulla, V.S.V., Varol, C., Al, M. (2016). Open Source Data Quality Tools: Revisited. In: Latifi, S. (eds) Information Technology: New Generations. Advances in Intelligent Systems and Computing, vol 448. Springer, Cham. https://doi.org/10.1007/978-3-319-32467-8_77

  • DOI: https://doi.org/10.1007/978-3-319-32467-8_77

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32466-1

  • Online ISBN: 978-3-319-32467-8

  • eBook Packages: Engineering, Engineering (R0)
