Joint European Conference on Machine Learning and Knowledge Discovery in Databases

ECML PKDD 2015: Machine Learning and Knowledge Discovery in Databases pp 458-473

Differentially Private Analysis of Outliers

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9285)

Abstract

This paper investigates differentially private analysis of distance-based outliers. Outlier detection aims to identify instances that are apparently distant from other instances. Meanwhile, the objective of differential privacy is to conceal the presence (or absence) of any particular instance. Outlier detection and privacy protection are therefore intrinsically conflicting tasks. In this paper, we present differentially private queries that count the outliers appearing in a given subspace, instead of reporting the detected outliers themselves. Our analysis of the global sensitivity of outlier counts reveals that standard global sensitivity-based methods can make the outputs too noisy, particularly when the dimensionality of the given subspace is high. Noting that the count of outliers is typically expected to be small compared to the number of data points, we introduce a mechanism based on a smooth upper bound of the local sensitivity. This study is the first attempt to ensure differential privacy for distance-based outlier analysis. Experimental results show that our method achieves better utility than global sensitivity-based methods.
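The global sensitivity-based baseline that the abstract compares against can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a common distance-based definition (a point is an outlier if fewer than `min_neighbors` other points lie within `radius`) and applies the standard Laplace mechanism. The function names, the parameters, and the caller-supplied `sensitivity` value are all hypothetical; the abstract does not state the sensitivity bound, and the smooth-sensitivity mechanism the paper proposes is not reproduced here.

```python
import math
import random


def count_distance_outliers(points, radius, min_neighbors):
    """Count points with fewer than `min_neighbors` other points within `radius`.

    This definition of a distance-based outlier is an assumption of the sketch.
    """
    count = 0
    for i, p in enumerate(points):
        neighbors = sum(
            1
            for j, q in enumerate(points)
            if i != j and math.dist(p, q) <= radius
        )
        if neighbors < min_neighbors:
            count += 1
    return count


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_outlier_count(points, radius, min_neighbors, sensitivity, epsilon, rng=None):
    """Laplace mechanism: true count plus noise of scale sensitivity / epsilon.

    `sensitivity` must be a valid upper bound on how much the count can change
    when one record changes; it is supplied by the caller, not derived here.
    """
    rng = rng or random.Random()
    true_count = count_distance_outliers(points, radius, min_neighbors)
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Toy data: four clustered points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
true_count = count_distance_outliers(points, radius=0.5, min_neighbors=2)
private_count = noisy_outlier_count(
    points, radius=0.5, min_neighbors=2, sensitivity=1.0, epsilon=1.0,
    rng=random.Random(0),
)
```

Because the noise scale is `sensitivity / epsilon`, a larger sensitivity bound translates directly into a noisier count, which is the utility problem the abstract attributes to global sensitivity-based methods.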

Keywords

Differential privacy, Outlier detection, Smooth sensitivity

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. University of Tsukuba, Tsukuba, Japan
