Privacy Preserving BIRCH Algorithm for Clustering over Vertically Partitioned Databases

  • P. Krishna Prasad
  • C. Pandu Rangan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4165)


The BIRCH algorithm, introduced by Zhang et al. [15], is a well-known algorithm for effectively finding clusters in a large data set. Its two major components are CF tree construction and global clustering. However, BIRCH was designed to work on a single database. We propose the first method for running BIRCH in a privacy-preserving manner over vertically partitioned data sets distributed across two different databases. We first provide efficient solutions to cryptographic primitives such as finding the index of the minimum in a vector sum and checking whether the sum of two private values exceeds a given threshold. We then use these primitives as basic tools to arrive at secure solutions to CF tree construction and single-link clustering for implementing the BIRCH algorithm.
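To make the two primitives concrete, the following is a minimal plaintext sketch of what each one computes. It is not the secure protocol from the paper: it contains no cryptography and simply adds the two parties' values in the clear. The function names, the sample data, and the use of squared Euclidean distance to a centroid as a simplified stand-in for BIRCH's threshold test are all assumptions made for illustration. The sketch relies on one certain fact: under vertical partitioning each party holds a disjoint subset of the attributes, so a squared Euclidean distance splits into a sum of per-party partial distances.

```python
# Plaintext illustration only: a secure version would compute the same
# outputs without either party revealing its inputs to the other.


def partial_sq_dists(point_part, centroid_parts):
    """Squared distance from one party's attributes of a point to that
    party's attributes of each candidate centroid."""
    return [
        sum((p - c) ** 2 for p, c in zip(point_part, centroid))
        for centroid in centroid_parts
    ]


def min_index_of_vector_sum(a, b):
    """Index of the smallest entry of the element-wise sum a + b."""
    totals = [x + y for x, y in zip(a, b)]
    return min(range(len(totals)), key=lambda i: totals[i])


def sum_exceeds_threshold(x, y, threshold):
    """True if the sum of the two private values is above the threshold."""
    return x + y > threshold


# Hypothetical data: party A holds the first two attributes, party B the third.
point_a, point_b = [1.0, 2.0], [3.0]
centroids_a = [[0.0, 0.0], [1.0, 2.0], [5.0, 5.0]]
centroids_b = [[0.0], [4.0], [3.0]]

dists_a = partial_sq_dists(point_a, centroids_a)  # party A's partial distances
dists_b = partial_sq_dists(point_b, centroids_b)  # party B's partial distances

# Closest candidate entry, determined from the sum of the partial distances.
closest = min_index_of_vector_sum(dists_a, dists_b)

# Simplified absorb test against a hypothetical threshold of 4.0.
within = not sum_exceeds_threshold(dists_a[closest], dists_b[closest], 4.0)
```

In this sketch the full distances never need to exist at either party: only the index of the minimum and the one-bit outcome of the threshold comparison are outputs, which is the information a privacy-preserving version would aim to reveal.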


Privacy Preserve; Cluster Feature; Global Cluster; Single Link Cluster; Threshold Graph
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Agrawal, D., Aggarwal, C.C.: On the Design and Quantification of Privacy Preserving Data Mining Algorithms. In: Proceedings of the Twentieth ACM SIGACT - SIGMOD - SIGART Symposium on Principles of Database Systems, May 21-23, 2001, pp. 247–255. ACM, Santa Barbara (2001)
  2. Agrawal, R., Srikant, R.: Privacy preserving data mining. In: Proceedings of the 2000 ACM SIGMOD Conference on Management of Data, Dallas, TX, May 14-19, 2000. ACM Press, New York (2000)
  3. Goethals, B., Laur, S., Lipmaa, H., Mielikainen, T.: On private scalar product computation for privacy-preserving data mining. In: Park, C.-s., Chee, S. (eds.) ICISC 2004. LNCS, vol. 3506, pp. 104–120. Springer, Heidelberg (2005)
  4. Cachin, C.: Efficient private bidding and auctions with an oblivious third party. In: Proceedings of 6th ACM Computer and communications security, SIGSAC, pp. 120–127. ACM Press, New York (1999)
  5. Damgard, I., Jurik, M.: A Generalisation, a Simplification and Some Applications of Paillier’s Probabilistic Public-Key System. In: Kim, K.-c. (ed.) PKC 2001. LNCS, vol. 1992, pp. 119–136. Springer, Heidelberg (2001)
  6. Jagannathan, G., Pillaipakkamnatt, K., Wright, R.N.: A New Privacy-Preserving Distributed k-Clustering Algorithm. In: Proceedings of the 2006 SIAM International Conference on Data Mining (SDM) (2006)
  7. Jagannathan, G., Wright, R.N.: Privacy-preserving distributed k-means clustering over arbitrarily partitioned data. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2005, Chicago, Illinois, USA, August 21-24, 2005. ACM, New York (2005)
  8. Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data, ch. 3. Prentice-Hall Inc., Englewood Cliffs (1988)
  9. Lindell, Y., Pinkas, B.: Privacy preserving data mining. In: Bellare, M. (ed.) CRYPTO 2000. LNCS, vol. 1880, pp. 36–54. Springer, Heidelberg (2000)
  10. Natan, R.B.: Implementing Database Security and Auditing, ch. 11. Elsevier, Amsterdam (2005)
  11. Oliveira, S., Zaiane, O.R.: Privacy preserving clustering by data transformation. In: Proceedings of the 18th Brazilian Symposium on Databases, pp. 304–318 (2003)
  12. Paillier, P.: Public-key Cryptosystems Based on Composite Degree Residuosity Classes. In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, vol. 1592, pp. 223–238. Springer, Heidelberg (1999)
  13. Rivest, R., Adleman, L., Dertouzos, M.: On data banks and privacy homomorphisms. In: Foundations of Secure Computation, pp. 169–178. Academic Press, London (1978)
  14. Jha, S., Kruger, L., McDaniel, P.: Privacy Preserving Clustering. In: di Vimercati, S.d.C., Syverson, P.F., Gollmann, D. (eds.) ESORICS 2005. LNCS, vol. 3679, pp. 397–417. Springer, Heidelberg (2005)
  15. Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: An Efficient Data Clustering Method for Very Large Databases. In: Proceedings of the ACM SIGMOD Conference on Management of Data, Montreal, Canada, pp. 103–114 (June 1996)
  16. Vaidya, J., Clifton, C.: Privacy preserving association rule mining in vertically partitioned data. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, July 23-26, 2002, pp. 639–644. ACM, New York (2002)
  17. Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 24-27, 2003. ACM, New York (2003)
  18. Yao, A.C.: Protocols for secure computation. In: Proceedings of 23rd IEEE Symposium on Foundations of Computer Science, pp. 160–164. IEEE Computer Society Press, Los Alamitos (1982)
  19. Yao, A.C.: How to generate and exchange secrets. In: Proceedings of the 27th IEEE Symp. on Foundations of Computer Science, Toronto, Ontario, Canada, October 27 - 29, 1986, pp. 162–167 (1986)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • P. Krishna Prasad (1)
  • C. Pandu Rangan (1)
  1. Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India
