Privacy Protected Mining Using Heuristic Based Inherent Voting Spatial Cluster Ensembles

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 236)

Abstract

Spatial data mining, i.e., the discovery of implicit knowledge in spatial databases, is crucial for the effective use of spatial data. Clustering is an important task, mostly used in the preprocessing phase of data analysis. It is widely recognized that combining multiple models typically gives better results than a single, well-tuned model. Combining object partitions without accessing the original objects' features leads to a form of knowledge reuse termed cluster ensembles. Their most important advantage is that they provide a platform on which vertical slices of data can be fused. This offers an easy and effective solution to two persistent problems in data mining applications: preserving privacy and the curse of dimensionality. We have designed four approaches to implement spatial cluster ensembles and have used them to merge vertical slices of attribute data. We show that a guided approach to combining the outputs of the various clusterers reduces the intensive distance matrix computations and also generates robust clusters. We propose a hybrid and layered cluster merging approach for the fusion of spatial clusterings and use it in our three-phase clustering combination technique. The major challenge in the fusion of ensembles is the creation and manipulation of a voting matrix or proximity matrix of order \(n^{2}\), where \(n\) is the number of data points. For spatial data sets this is very expensive in both time and space. We have eliminated the computation of this expensive voting matrix. Compatible clusterers are identified for the partially fused clusterers, so that the acquired knowledge can be used for further fusion. The apparent advantage is that we can prune the data sets after every \((m-1)/2\) layers. Privacy preservation has become a very important concern because data sharing between organizations is difficult, and we have tried to provide a solution to this problem. We obtain clusters from the partial datasets and then, without access to the original data, use those clusters to help merge similar clusters obtained from other partial datasets. Our ensemble fusion models are tested extensively with both intrinsic and extrinsic metrics.
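To make the cost argument concrete, the following minimal Python sketch contrasts the point-level voting (co-association) matrix of order \(n^{2}\) with a cluster-level comparison that needs only cluster memberships. It is an illustration under stated assumptions, not the fusion method of the paper: the function names, the Jaccard overlap measure, and the toy labels are all hypothetical choices made for this example.

```python
from collections import defaultdict


def co_association_counts(partitions):
    """Build the n x n voting matrix: entry [i][j] counts the partitions
    that place points i and j in the same cluster."""
    n = len(partitions[0])
    votes = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    votes[i][j] += 1
    return votes


def members_by_cluster(labels):
    """Map each cluster label to the set of point indices it contains."""
    members = defaultdict(set)
    for index, label in enumerate(labels):
        members[label].add(index)
    return dict(members)


def cluster_agreement(labels_a, labels_b):
    """Jaccard overlap between every cluster of one partition and every
    cluster of another, using point indices only (no feature values)."""
    clusters_a = members_by_cluster(labels_a)
    clusters_b = members_by_cluster(labels_b)
    return {
        (a, b): len(pts_a & pts_b) / len(pts_a | pts_b)
        for a, pts_a in clusters_a.items()
        for b, pts_b in clusters_b.items()
    }


# Hypothetical cluster labels for six points, one list per vertical slice.
slice_1 = [0, 0, 0, 1, 1, 1]
slice_2 = ["x", "x", "y", "y", "y", "y"]

votes = co_association_counts([slice_1, slice_2])  # 6 x 6 = 36 entries
agreement = cluster_agreement(slice_1, slice_2)    # 2 x 2 = 4 entries
```

With \(n\) points and \(k\) clusters per partition, the voting matrix holds \(n^{2}\) entries while the cluster-level table holds only \(k^{2}\), and neither function reads the original attribute values. Any measure of agreement between clusters could stand in for the Jaccard overlap assumed here.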

Keywords

Cluster ensembles, Degree of agreement, Performance metrics, Spatial attribute data

Copyright information

© Springer India 2014

Authors and Affiliations

  1. Department of Computer Science and Engg, The Oxford College of Engineering, Bangalore, India
  2. Department of Information Science and Engineering, PESIT, Bangalore, India
