Applying Electromagnetic Field Theory Concepts to Clustering with Constraints

  • Huseyin Hakkoymaz
  • Georgios Chatzimilioudis
  • Dimitrios Gunopulos
  • Heikki Mannila
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5781)

Abstract

This work shows how concepts from electromagnetic field theory can be used efficiently in clustering with constraints. The proposed framework transforms vector data into a fully connected graph, or operates directly on given graph data. User constraints are represented by electromagnetic fields that adjust the weights of the graph's edges. A clustering algorithm is then applied to the adjusted graph, using k-distinct shortest paths as the distance measure. Our framework achieves better accuracy than MPCK-Means, SS-Kernel-KMeans and Kmeans+Diagonal Metric even when very few constraints are used, significantly improves clustering performance on some datasets that other methods fail to partition successfully, and can cluster both vector and graph datasets. These advantages are demonstrated through a thorough experimental evaluation.
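The abstract describes the pipeline only at a high level. The Python sketch below illustrates one possible reading of it under simplifying assumptions: constraints act as attractive or repulsive "fields" that rescale the edge weights of a complete graph, and clustering is then performed on path distances over the adjusted graph. The inverse-square-style field decay, the use of ordinary shortest-path distances in place of the paper's k-distinct shortest paths, the average-linkage clustering step, and the names field_adjusted_graph and cluster_on_paths are all illustrative assumptions, not the authors' formulation.

```python
# Minimal sketch (not the paper's exact method): build a complete graph on the
# vectors, let must-link constraints attract (shrink) and cannot-link
# constraints repel (stretch) nearby edges, then cluster on path distances.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import squareform


def field_adjusted_graph(X, must_link, cannot_link, strength=0.5):
    """Complete graph on X; constraints act as fields that rescale edge weights."""
    W = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # Euclidean edge weights
    mids = (X[:, None, :] + X[None, :, :]) / 2.0                # midpoint of every edge
    for pairs, sign in ((must_link, -1.0), (cannot_link, +1.0)):
        for i, j in pairs:
            c = (X[i] + X[j]) / 2.0                             # "charge" placed between the pair
            d2 = np.sum((mids - c) ** 2, axis=-1)               # squared distance of each edge to the charge
            # Attraction (sign < 0) shrinks nearby edges, repulsion (sign > 0)
            # stretches them; the effect decays with distance from the constraint.
            W = W * np.exp(sign * strength / (1.0 + d2))
    np.fill_diagonal(W, 0.0)
    return W


def cluster_on_paths(W, k):
    """Cluster using shortest-path distances over the adjusted graph."""
    D = shortest_path(W, method="D", directed=False)            # all-pairs Dijkstra
    Z = linkage(squareform(D, checks=False), method="average")  # hierarchical clustering on path distances
    return fcluster(Z, t=k, criterion="maxclust")


# Toy usage: two Gaussian blobs, one must-link and one cannot-link pair.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(4.0, 1.0, (20, 2))])
W = field_adjusted_graph(X, must_link=[(0, 5)], cannot_link=[(0, 25)])
labels = cluster_on_paths(W, k=2)
```

In this toy run, the must-link pair lowers edge weights around its midpoint so paths through that region become cheaper, while the cannot-link pair raises them, so the subsequent path-based clustering is steered toward respecting the constraints.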

Keywords

Data Clustering · User Constraints · Electromagnetic Field Theory

References

  1. Asuncion, A., Newman, D.J.: UCI Machine Learning Repository
  2. Bar-Hillel, A., Hertz, T., Shental, N., Weinshall, D.: Learning distance functions using equivalence relations. In: ICML 2003, Washington, DC (August 2003)
  3. Basu, S., Bilenko, M., Mooney, R.J.: A Probabilistic Framework for Semi-Supervised Clustering. In: KDD 2004, Seattle, WA (August 2004)
  4. Bilenko, M., Basu, S., Mooney, R.J.: Integrating Constraints and Metric Learning in Semi-Supervised Clustering. In: ICML 2004, Canada, July 2004, pp. 81–88 (2004)
  5. Brander, A., Sinclair, M.: A comparative study of k-shortest path algorithms. In: Proceedings of the 11th UK Performance Engineering Workshop for Computer and Telecommunication Systems (1995)
  6. Davidson, I., Wagstaff, K., Basu, S.: Measuring Constraint-Set Utility for Partitional Clustering Algorithms. In: Proceedings of the 17th European Conference on Machine Learning, Berlin, Germany, September 18–22 (2006)
  7. Dhillon, I., Guan, Y., Kulis, B.: A fast kernel-based multilevel algorithm for graph clustering. In: Proceedings of ACM SIGKDD 2005, Chicago, Illinois, USA, August 21–24 (2005)
  8. Gallagher, B., Tong, H., Eliassi-Rad, T., Faloutsos, C.: Using ghost edges for classification in sparsely labeled networks. In: KDD 2008, Las Vegas, NV, USA, August 24–27 (2008)
  9. Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing 20(1), 359–392 (1999)
  10. Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, New York (1990)
  11. Koren, Y., North, S.C., Volinsky, C.: Measuring and Extracting Proximity in Networks. In: KDD 2006, Philadelphia, Pennsylvania, USA, August 20–23 (2006)
  12. Kulis, B., Basu, S., Dhillon, I., Mooney, R.: Semi-supervised graph clustering: a kernel approach. In: Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, August 7–11, 2005, pp. 457–464 (2005)
  13. Law, M.H.C., Topchy, A.P., Jain, A.K.: Model-based clustering with probabilistic constraints. In: SDM 2005 (2005)
  14. Lutz, H., Stocker, H., Harris, J.W.: Handbook of Physics, 1st edn., pp. 439–444 (2002)
  15. Raytchev, B., Murase, H.: Unsupervised Face Recognition from Image Sequences Based on Clustering with Attraction and Repulsion. In: CVPR 2001, vol. 2, p. 25 (2001)
  16. Suhir, E.: Applied Probability for Engineers and Scientists. McGraw-Hill, New York (1997)
  17. Tong, H., Faloutsos, C., Pan, J.: Fast Random Walk with Restart and Its Applications. In: ICDM 2006, Hong Kong (2006)
  18. Wagstaff, K., Cardie, C., Rogers, S., Schroedl, S.: Constrained K-Means clustering with background knowledge. In: ICML 2001, pp. 577–584 (2001)
  19. Weber, R., Schek, H., Blott, S.: A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. In: VLDB 1998, New York, USA, pp. 194–205 (1998)
  20. Xing, E., Ng, A.Y., Jordan, M., Russell, S.: Distance metric learning, with application to clustering with side-information. In: Advances in NIPS, vol. 15. MIT Press, Cambridge (2002)
  21. Xu, X., Yuruk, N., Feng, Z., Schweiger, T.A.J.: SCAN: a structural clustering algorithm for networks. In: KDD 2007, San Jose, California, USA, August 12–15 (2007)
  22. Yan, B., Domeniconi, C.: An Adaptive Kernel Method for Semi-supervised Clustering. In: Proceedings of the 17th European Conference on Machine Learning, Berlin, Germany (September 2006)
  23. Klein, D., Kamvar, S.D., Manning, C.D.: From Instance-level Constraints to Space-Level Constraints: Making the Most of Prior Knowledge in Data Clustering. In: Proceedings of the 19th International Conference on Machine Learning, San Francisco, CA, USA (2002)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Huseyin Hakkoymaz (1)
  • Georgios Chatzimilioudis (1)
  • Dimitrios Gunopulos (1, 2)
  • Heikki Mannila (3)

  1. Dept. of Computer Science, University of California, Riverside, USA
  2. Dept. of Informatics and Telecommunications, Univ. of Athens, Greece
  3. HIIT, Helsinki University of Technology and University of Helsinki, Finland