A DP Canopy K-Means Algorithm for Privacy Preservation of Hadoop Platform

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 10581)

Abstract

The K-means algorithm for data mining has been combined with differential privacy (DP) preservation. Although this improves data security, the number of clusters and the initial center points are still selected blindly and at random. In this paper, we integrate an optimized Canopy algorithm with the DP K-means algorithm and apply it to the Hadoop platform. First, we optimize the Canopy algorithm according to the minimum and maximum principle and implement it with the functions of the MapReduce framework. Second, we use the resulting number and set of center points to implement the DP K-means algorithm on MapReduce. As a result, the improved Canopy algorithm optimizes the selection of the number of clusters and of the initial centers on the Hadoop platform, so the proposed K-means algorithm improves security, usability, and computational efficiency.
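
The two stages named in the abstract can be illustrated with a small single-machine sketch. It is not the authors' MapReduce implementation, which the abstract does not specify. It assumes that the "minimum and maximum principle" means max-min distance selection (repeatedly pick the point farthest from its nearest chosen center), that all coordinates lie in [0, 1], and that Laplace noise is added to per-cluster sums and counts with sensitivity d + 1. The function names, the `threshold` stopping rule, and the per-iteration budget `epsilon` are hypothetical.

```python
import math
import random


def max_min_centers(points, threshold):
    """Pick initial centers by max-min distance (assumed reading of the principle).

    Start from the first point, then repeatedly add the point whose distance to
    its nearest chosen center is largest. Stop once that distance falls below
    `threshold` (a hypothetical stopping rule for this sketch).
    """
    centers = [points[0]]
    while len(centers) < len(points):
        # Distance from each point to its nearest already-chosen center.
        nearest = [min(math.dist(p, c) for c in centers) for p in points]
        best = max(range(len(points)), key=nearest.__getitem__)
        if nearest[best] < threshold:
            break
        centers.append(points[best])
    return centers


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_kmeans_step(points, centers, epsilon, rng):
    """One K-means update with Laplace noise on cluster sums and counts.

    Assumption: every coordinate lies in [0, 1], so adding or removing one
    point changes the (sum, count) query by at most dim + 1 in L1 norm.
    `epsilon` is the privacy budget spent on this single iteration.
    """
    dim = len(points[0])
    scale = (dim + 1) / epsilon
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p in points:
        # Assign the point to its nearest current center.
        k = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        counts[k] += 1
        for j in range(dim):
            sums[k][j] += p[j]
    new_centers = []
    for k in range(len(centers)):
        # Clamp the noisy count so the division below stays well defined.
        noisy_count = max(counts[k] + laplace(scale, rng), 1.0)
        new_centers.append(
            tuple((sums[k][j] + laplace(scale, rng)) / noisy_count for j in range(dim))
        )
    return new_centers


points = [(0.1, 0.1), (0.15, 0.2), (0.9, 0.85), (0.8, 0.9)]
centers = max_min_centers(points, threshold=0.5)
noisy_centers = dp_kmeans_step(points, centers, epsilon=1.0, rng=random.Random(0))
```

In this sketch the first stage fixes both the number of clusters and the starting centers, so the noisy second stage does not begin from a random initialization. In a MapReduce setting, the assignment loop would plausibly run in mappers and the noisy sum and count aggregation in reducers, but that mapping is an assumption here.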

Acknowledgment

This project was supported by the National Key Research and Development Program of China (No. 2016YFC1000307) and the National Natural Science Foundation of China (No. 61571024).

Author information

Corresponding author

Correspondence to Tao Shang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Shang, T., Zhao, Z., Guan, Z., Liu, J. (2017). A DP Canopy K-Means Algorithm for Privacy Preservation of Hadoop Platform. In: Wen, S., Wu, W., Castiglione, A. (eds) Cyberspace Safety and Security. CSS 2017. Lecture Notes in Computer Science, vol 10581. Springer, Cham. https://doi.org/10.1007/978-3-319-69471-9_14

  • DOI: https://doi.org/10.1007/978-3-319-69471-9_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69470-2

  • Online ISBN: 978-3-319-69471-9

  • eBook Packages: Computer Science, Computer Science (R0)
