Approximate Correlation Clustering Using Same-Cluster Queries

Ailon, Nir; Bhattacharya, Anup; Jaiswal, Ragesh

doi:10.1007/978-3-319-77404-6_2

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10807)

Included in the following conference series:

Latin American Symposium on Theoretical Informatics

Abstract

Ashtiani et al. (NIPS 2016) introduced a semi-supervised framework for clustering (SSAC) in which a learner is allowed to make same-cluster queries. More specifically, their model provides a query oracle that answers queries of the form “given any two vertices, do they belong to the same optimal cluster?”. In many clustering contexts, oracle queries of this kind are feasible. Ashtiani et al. showed the usefulness of such a query framework by giving a polynomial-time algorithm for the k-means clustering problem where the input dataset satisfies some separation condition. Ailon et al. extended this work to the approximation setting by giving an efficient \((1+\varepsilon )\)-approximation algorithm for k-means for any small \(\varepsilon > 0\) and any dataset within the SSAC framework. In this work, we extend this line of study to the correlation clustering problem. Correlation clustering is a graph clustering problem where pairwise similarity (or dissimilarity) information is given for every pair of vertices and the objective is to partition the vertices into clusters so as to minimise the disagreement (or maximise the agreement) with the pairwise information given as input. These problems are popularly known as the \(\mathsf {MinDisAgree}\) and \(\mathsf {MaxAgree}\) problems, and \(\mathsf {MinDisAgree}[k]\) and \(\mathsf {MaxAgree}[k]\) are versions of these problems where the number of optimal clusters is at most k. There exist Polynomial Time Approximation Schemes (PTAS) for \(\mathsf {MinDisAgree}[k]\) and \(\mathsf {MaxAgree}[k]\) where the approximation guarantee is \((1+\varepsilon )\) for any small \(\varepsilon \) and the running time is polynomial in the input parameters but exponential in k and \(1/\varepsilon \). We get a significant running time improvement within the SSAC framework at the cost of making a small number of same-cluster queries.
We obtain a \((1+ \varepsilon )\)-approximation algorithm for any small \(\varepsilon \) with running time that is polynomial in the input parameters and also in k and \(1/\varepsilon \). We also give non-trivial upper and lower bounds on the number of same-cluster queries, the lower bound being based on the Exponential Time Hypothesis (ETH). Note that the existence of an efficient algorithm for \(\mathsf {MinDisAgree}[k]\) in the SSAC setting exhibits the power of same-cluster queries, since such a polynomial-time algorithm (polynomial even in k and \(1/\varepsilon \)) is not possible in the classical (non-query) setting due to our conditional lower bounds. Our conditional lower bound is particularly interesting as it not only establishes a lower bound on the number of same-cluster queries in the SSAC framework but also establishes a conditional lower bound on the running time of any \((1+\varepsilon )\)-approximation algorithm for \(\mathsf {MinDisAgree}[k]\).
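
To make the objects in the abstract concrete, the following is a minimal Python sketch of the \(\mathsf {MinDisAgree}\) objective and of a same-cluster oracle. It is an illustration only and is not the algorithm of this paper: the function names (`disagreements`, `make_same_cluster_oracle`, `recover_clusters`), the toy labels, and the choice to simulate the oracle from a known target clustering are all assumptions made for the example. The recovery routine is the naive strategy of comparing each vertex with one representative per discovered cluster, which uses far more queries than the bounds discussed above.

```python
def disagreements(labels, clustering):
    """Count pairs whose +/- label is contradicted by the clustering.

    labels: dict mapping a vertex pair (u, v) to +1 (similar) or -1 (dissimilar).
    clustering: dict mapping each vertex to a cluster id.
    """
    cost = 0
    for (u, v), sign in labels.items():
        same = clustering[u] == clustering[v]
        # A "+" pair that is split, or a "-" pair that is merged, is a disagreement.
        if (sign > 0) != same:
            cost += 1
    return cost


def make_same_cluster_oracle(target_clustering):
    """Simulate a same-cluster oracle from an assumed target clustering.

    Returns the oracle and a counter recording how many queries were made.
    """
    counter = {"queries": 0}

    def oracle(u, v):
        counter["queries"] += 1
        return target_clustering[u] == target_clustering[v]

    return oracle, counter


def recover_clusters(vertices, oracle):
    """Naive baseline: compare each vertex with one representative per cluster."""
    representatives = []
    assignment = {}
    for v in vertices:
        for cluster_id, rep in enumerate(representatives):
            if oracle(v, rep):
                assignment[v] = cluster_id
                break
        else:
            # No existing cluster matched, so v starts a new one.
            assignment[v] = len(representatives)
            representatives.append(v)
    return assignment


# Hypothetical toy instance on four vertices with one inconsistent label (b, d).
labels = {
    ("a", "b"): +1, ("c", "d"): +1, ("b", "d"): +1,
    ("a", "c"): -1, ("a", "d"): -1, ("b", "c"): -1,
}
target = {"a": 0, "b": 0, "c": 1, "d": 1}

oracle, counter = make_same_cluster_oracle(target)
recovered = recover_clusters(["a", "b", "c", "d"], oracle)
print(recovered, disagreements(labels, recovered), counter["queries"])
```

With k clusters, this baseline makes at most k queries per vertex, so its query count grows with the number of vertices. It is shown only to fix the interface of the oracle and the cost function.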

Nir Ailon acknowledges the generous support of ISF grant number 2021408.

Anup Bhattacharya acknowledges the support of TCS fellowship at IIT Delhi.

Ragesh Jaiswal acknowledges the support of ISF-UGC India-Israel Grant 2014.

Notes

1. Every clause in an \(\mathsf {E}3\)-\(\mathsf {SAT}\) formula has exactly 3 literals.
2. Readers familiar with [12] will realise that the statement of the theorem is slightly different from the statement of the similar theorem (Theorem 13) in [12]. More specifically, the claim is about the function call with \(\varepsilon /4\) as a parameter rather than \(\varepsilon \). This is done to allow the recursive call in step (9) to be made with the same value of the precision parameter as the initial call. This does not change the approximation analysis but is crucial for our running time analysis.

References

1. Ailon, N., Bhattacharya, A., Jaiswal, R., Kumar, A.: Approximate clustering with same-cluster queries (2017). CoRR, abs/1704.01862. To appear in ITCS 2018
2. Angelidakis, H., Makarychev, K., Makarychev, Y.: Algorithms for stable and perturbation-resilient problems. In: STOC, pp. 438–451 (2017)
3. Ashtiani, H., Kushagra, S., Ben-David, S.: Clustering with same-cluster queries. In: NIPS, pp. 3216–3224 (2016)
4. Awasthi, P., Balcan, M.-F., Voevodski, K.: Local algorithms for interactive clustering. In: ICML, pp. 550–558 (2014)
5. Balcan, M.-F., Blum, A.: Clustering with interactive feedback. In: Freund, Y., Györfi, L., Turán, G., Zeugmann, T. (eds.) ALT 2008. LNCS (LNAI), vol. 5254, pp. 316–328. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-87987-9_27
6. Balcan, M.-F., Blum, A., Gupta, A.: Clustering under approximation stability. J. ACM (JACM) 60(2), 8 (2013)
7. Balcan, M.-F., Liang, Y.: Clustering under perturbation resilience. SIAM J. Comput. 45(1), 102–155 (2016)
8. Bansal, N., Blum, A., Chawla, S.: Correlation clustering. Mach. Learn. 56(1–3), 89–113 (2004)
9. Charikar, M., Guruswami, V., Wirth, A.: Clustering with qualitative information. J. Comput. Syst. Sci. 71(3), 360–383 (2005)
10. Dinur, I.: The PCP theorem by gap amplification. J. ACM 54(3), 12 (2007)
11. Fomin, F.V., Kratsch, S., Pilipczuk, M., Pilipczuk, M., Villanger, Y.: Tight bounds for parameterized complexity of cluster editing with a small number of clusters. J. Comput. Syst. Sci. 80(7), 1430–1447 (2014)
12. Giotis, I., Guruswami, V.: Correlation clustering with a fixed number of clusters. In: SODA, pp. 1167–1176 (2006)
13. Impagliazzo, R., Paturi, R.: On the complexity of k-SAT. J. Comput. Syst. Sci. 62(2), 367–375 (2001)
14. Impagliazzo, R., Paturi, R., Zane, F.: Which problems have strongly exponential complexity? J. Comput. Syst. Sci. 63(4), 512–530 (2001)
15. Makarychev, K., Makarychev, Y., Vijayaraghavan, A.: Correlation clustering with noisy partial information. In: COLT, pp. 1321–1342 (2015)
16. Manurangsi, P.: Almost-polynomial ratio ETH-hardness of approximating densest \(k\)-subgraph. CoRR, abs/1611.05991 (2016)
17. Mathieu, C., Schudy, W.: Correlation clustering with noisy input. In: ACM-SIAM Symposium on Discrete Algorithms, pp. 712–728 (2010)
18. Mazumdar, A., Saha, B.: Query complexity of clustering with side information. arXiv preprint arXiv:1706.07719 (2017)
19. Voevodski, K., Balcan, M.-F., Röglin, H., Teng, S.-H., Xia, Y.: Efficient clustering with limited distance information. In: Conference on Uncertainty in Artificial Intelligence, pp. 632–640 (2010)

Author information

Authors and Affiliations

Technion, Haifa, Israel
Nir Ailon
Department of Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi, India
Anup Bhattacharya & Ragesh Jaiswal

Corresponding author

Correspondence to Anup Bhattacharya.

Editor information

Editors and Affiliations

Stony Brook University, Stony Brook, New York, USA
Michael A. Bender
Rutgers University, New Brunswick, New Jersey, USA
Martín Farach-Colton
Pace University, New York, New York, USA
Miguel A. Mosteiro

About this paper

Cite this paper

Ailon, N., Bhattacharya, A., Jaiswal, R. (2018). Approximate Correlation Clustering Using Same-Cluster Queries. In: Bender, M., Farach-Colton, M., Mosteiro, M. (eds) LATIN 2018: Theoretical Informatics. LATIN 2018. Lecture Notes in Computer Science, vol 10807. Springer, Cham. https://doi.org/10.1007/978-3-319-77404-6_2

DOI: https://doi.org/10.1007/978-3-319-77404-6_2
Published: 13 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77403-9
Online ISBN: 978-3-319-77404-6
eBook Packages: Computer Science, Computer Science (R0)
