Restoring: A Greedy Heuristic Approach Based on Neighborhood for Correlation Clustering

Wang, Ning; Li, Jie

doi:10.1007/978-3-642-53914-5_30

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8346)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

Correlation Clustering has received considerable attention in the machine learning literature because it does not require the number of clusters to be specified in advance. Many approximation algorithms for Correlation Clustering have been proposed with worst-case theoretical guarantees, but few have been evaluated experimentally. These methods consider only the direct associations between vertices and perform poorly on real datasets. In this paper, we propose a neighborhood-based method called Restoring, in which we argue that the neighborhood around two connected vertices is important and that two vertices belonging to the same cluster should have the same neighborhood. Our algorithm iteratively chooses two connected vertices and restores their neighborhood. We also define the cost of keeping or removing a non-common neighbor and identify a restoring order based on neighborhood similarity. Experiments conducted on five sub-datasets of Cora show that our method outperforms existing well-known methods in both result quality and objective value.
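
The two quantities the abstract relies on, the clustering objective and the similarity of two vertices' neighborhoods, can be illustrated with a minimal sketch. It is not the authors' implementation. The standard disagreement count (connected pairs that are split, plus unconnected pairs that are grouped) is assumed as the objective. Jaccard overlap of closed neighborhoods is assumed as a stand-in for the paper's similarity measure, which this preview does not define. The function names build_adjacency, disagreements, and neighborhood_similarity are hypothetical.

```python
from itertools import combinations


def build_adjacency(n, positive_edges):
    """Adjacency sets for an unweighted graph whose edges are the '+' pairs."""
    adj = {v: set() for v in range(n)}
    for u, v in positive_edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def disagreements(n, positive_edges, labels):
    """Count connected pairs split across clusters plus unconnected pairs kept together."""
    adj = build_adjacency(n, positive_edges)
    cost = 0
    for u, v in combinations(range(n), 2):
        same_cluster = labels[u] == labels[v]
        connected = v in adj[u]
        if connected != same_cluster:
            cost += 1
    return cost


def neighborhood_similarity(adj, u, v):
    """Jaccard overlap of closed neighborhoods (an assumed stand-in measure)."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = build_adjacency(6, edges)
two_clusters = [0, 0, 0, 1, 1, 1]
one_cluster = [0, 0, 0, 0, 0, 0]

print(disagreements(6, edges, two_clusters))
print(disagreements(6, edges, one_cluster))
print(neighborhood_similarity(adj, 0, 1))
print(neighborhood_similarity(adj, 2, 3))
```

On this toy graph the bridge endpoints 2 and 3 share few neighbors, while vertices inside a triangle share all of them, which matches the intuition that vertices in the same cluster should have the same neighborhood.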

Author information

Authors and Affiliations

School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
Ning Wang & Jie Li

Editor information

Editors and Affiliations

US Air Force Office of Scientific Research, 106-0032, Tokyo, Japan
Hiroshi Motoda
School of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
Zhaohui Wu
Faculty of Engineering and Information Technology, University of Technology, Chippendale, 2008, Sydney, NSW, Australia
Longbing Cao
Department of Computing Science, University of Alberta, T6G 2E8, Edmonton, Canada
Osmar Zaiane
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Min Yao
School of Computer Science, Fudan University, 200433, Shanghai, China
Wei Wang

About this paper

Cite this paper

Wang, N., Li, J. (2013). Restoring: A Greedy Heuristic Approach Based on Neighborhood for Correlation Clustering. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8346. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53914-5_30

DOI: https://doi.org/10.1007/978-3-642-53914-5_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53913-8
Online ISBN: 978-3-642-53914-5
eBook Packages: Computer Science, Computer Science (R0)
