An Efficient Local Search Algorithm for Correlation Clustering on Large Graphs

  • Conference paper
Combinatorial Optimization and Applications (COCOA 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14461)

Abstract

Correlation clustering (CC) is a widely used clustering paradigm, with many applications to problems such as classification, database deduplication, and community detection. CC instances represent objects as graph nodes, and clustering is performed based on relationships between objects (positive or negative edges between pairs of nodes). The CC objective is to obtain a graph clustering that minimizes the number of incorrectly assigned edges (negative edges within clusters, and positive edges between clusters).
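As an illustration of the objective described above, the following minimal sketch counts the incorrectly assigned edges of a given clustering. It assumes a complete signed graph in which every pair of nodes not listed as a positive edge is treated as a negative edge; the function name `cc_cost` and the input layout are illustrative choices, not taken from the paper.

```python
from itertools import combinations


def cc_cost(nodes, positive_edges, labels):
    """Count disagreements of a clustering.

    Assumption: the graph is complete, so any pair that is not a
    positive edge is treated as a negative edge.
    labels maps each node to its cluster identifier.
    """
    positive = {frozenset(edge) for edge in positive_edges}
    cost = 0
    for u, v in combinations(nodes, 2):
        same_cluster = labels[u] == labels[v]
        is_positive = frozenset((u, v)) in positive
        # A pair is a disagreement when it is a positive edge between
        # clusters or a negative edge within a cluster.
        if same_cluster != is_positive:
            cost += 1
    return cost


# A path 0-1-2-3 of positive edges, split into two clusters.
path_nodes = [0, 1, 2, 3]
path_edges = [(0, 1), (1, 2), (2, 3)]
split_cost = cc_cost(path_nodes, path_edges, {0: "a", 1: "a", 2: "b", 3: "b"})
```

In this small example the only disagreement is the positive edge (1, 2), which is cut by the split.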

For large CC instances, lightweight algorithms like the Pivot method have been preferred due to their scalability. Because these algorithms do not have state-of-the-art approximation guarantees, LocalSearch (LS) methods are often applied afterward to refine their clustering results. Unfortunately, LS does not scale as well, since it is inherently sequential and can converge slowly.
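The two-stage pipeline described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the graph is a complete signed graph given by its positive edges, `pivot` follows the standard randomized Pivot scheme, and `local_search` is a plain sequential best-one-element-move loop that only accepts strictly improving moves. All function names, the `max_passes` parameter, and the `("new", k)` labels for freshly opened clusters are hypothetical.

```python
import random
from itertools import combinations


def build_adj(nodes, positive_edges):
    """Positive-neighbor sets; every other pair is assumed negative."""
    adj = {v: set() for v in nodes}
    for u, v in positive_edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def disagreements(nodes, adj, labels):
    """Number of pairs whose sign conflicts with the clustering."""
    return sum((labels[u] == labels[v]) != (v in adj[u])
               for u, v in combinations(nodes, 2))


def pivot(nodes, adj, rng):
    """Visit nodes in random order; each unclustered node becomes a
    pivot and absorbs its still-unclustered positive neighbors."""
    order = list(nodes)
    rng.shuffle(order)
    labels = {}
    for p in order:
        if p in labels:
            continue
        labels[p] = p
        for v in adj[p]:
            if v not in labels:
                labels[v] = p
    return labels


def local_search(nodes, adj, labels, max_passes=100):
    """Move one node at a time to its best cluster (or a new singleton)
    until a full pass makes no strictly improving move."""
    labels = dict(labels)
    sizes = {}
    for v in nodes:
        sizes[labels[v]] = sizes.get(labels[v], 0) + 1
    fresh = 0  # counter for labels of newly opened singleton clusters
    for _ in range(max_passes):
        moved = False
        for v in nodes:
            cur = labels[v]
            sizes[cur] -= 1  # take v out of its cluster
            pos = {}  # positive neighbors of v, counted per cluster
            for u in adj[v]:
                pos[labels[u]] = pos.get(labels[u], 0) + 1
            # Relative to a singleton (score 0), joining cluster C
            # changes the cost by |C| - 2 * pos[C].
            best, best_score = cur, sizes[cur] - 2 * pos.get(cur, 0)
            for c, k in pos.items():
                score = sizes[c] - 2 * k
                if score < best_score:
                    best, best_score = c, score
            if best_score > 0:  # a new singleton is strictly better
                best = ("new", fresh)
                fresh += 1
                sizes[best] = 0
            labels[v] = best
            sizes[best] += 1
            moved = moved or best != cur
        if not moved:
            break
    return labels


# Two triangles joined by the single positive edge (2, 3).
demo_nodes = list(range(6))
demo_adj = build_adj(demo_nodes, [(0, 1), (0, 2), (1, 2),
                                  (3, 4), (3, 5), (4, 5), (2, 3)])
start = pivot(demo_nodes, demo_adj, random.Random(0))
refined = local_search(demo_nodes, demo_adj, start)
```

Because every accepted move strictly lowers the number of disagreements, the loop terminates, but each move depends on the labels left by the previous one, which is the sequential dependency noted above.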

We propose a lightweight, parallelizable LS method called InnerLocalSearch (ILS) to use in conjunction with the Pivot algorithm. We show that ILS still provides a significant improvement to clustering quality while dramatically reducing the additional running time costs incurred by LS. We demonstrate our algorithm’s effectiveness against several LS benchmarks and other popular CC methods on real and synthetic data sets.

Notes

  1. Also called “Best One Element Move” (BOEM).

  2. Code available at github.com/cc-conf-sub/ils-improvement.

  3. Available at snap.stanford.edu/data/#communities.

Author information

Correspondence to Nathan Cordner.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cordner, N., Kollios, G. (2024). An Efficient Local Search Algorithm for Correlation Clustering on Large Graphs. In: Wu, W., Guo, J. (eds) Combinatorial Optimization and Applications. COCOA 2023. Lecture Notes in Computer Science, vol 14461. Springer, Cham. https://doi.org/10.1007/978-3-031-49611-0_1

  • DOI: https://doi.org/10.1007/978-3-031-49611-0_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-49610-3

  • Online ISBN: 978-3-031-49611-0

  • eBook Packages: Computer Science, Computer Science (R0)
