Skip to main content

Towards a Unified Framework for Data Cleaning and Data Privacy

  • Conference paper
  • First Online:
Web Information Systems Engineering – WISE 2015 (WISE 2015)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 9419))

Included in the following conference series:

Abstract

Data quality has become a pervasive challenge for organizations as they wrangle with large, heterogeneous datasets to extract value. Existing data cleaning solutions have focused on scalable techniques to resolve inconsistencies quickly. However, given the proliferation of sensitive, confidential user information, data privacy concerns have largely remained unexplored in data cleaning techniques. In this work, we present a new privacy-aware, data cleaning framework that aims to resolve data inconsistencies while minimizing the amount of information disclosed. We present a set of data disclosure operations that facilitate the data cleaning process, and propose two information-theoretic measures for privacy loss and data utility that are used to correct inconsistencies in the data.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    For illustration purposes of this example, we assume persons of a unique age determine a unique disease.

References

  1. Beskales, G., Ilyas, I., Golab, L.: Sampling the repairs of functional dependency violations under hard constraints. In: VLDB, pp. 197–207 (2010)

    Google Scholar 

  2. Beskales, G., Ilyas, I., Golab, L., Galiullin, A.: On the relative trust between inconsistent data and inaccurate constraints. In: ICDE, pp. 541–552 (2013)

    Google Scholar 

  3. Chiang, F., Miller, R.J.: Active repair of data quality rules. In: ICIQ, pp. 174–188 (2011)

    Google Scholar 

  4. Deb, K.: Multi-objective optimization. In: Burke, E.K., Kendall, G. (eds.) Search Methodologies, pp. 403–449. Springer, New York (2014)

    Chapter  Google Scholar 

  5. Fung, B., Wang, K., Chen, R., Yu, P.S.: Privacy-preserving data publishing: a survey of recent developments. ACM Comp. Surv. 42(4), 14 (2010)

    Article  Google Scholar 

  6. Howard, P.: The business case for data quality. Bloor Research, White Paper (2012)

    Google Scholar 

  7. Sweeney, L.: k-anonymity: a model for protecting privacy. J. Uncertainty Fuzziness Knowl. Based Syst. 10(5), 557–570 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  8. Thomas, J., Cover, T.: Elements of Information Theory, vol. 2. Wiley, New York (2006)

    MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yu Huang .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Huang, Y., Chiang, F. (2015). Towards a Unified Framework for Data Cleaning and Data Privacy. In: Wang, J., et al. Web Information Systems Engineering – WISE 2015. WISE 2015. Lecture Notes in Computer Science(), vol 9419. Springer, Cham. https://doi.org/10.1007/978-3-319-26187-4_34

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-26187-4_34

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26186-7

  • Online ISBN: 978-3-319-26187-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics