Abstract
We consider the problem of spectral clustering with partial supervision in the form of must-link and cannot-link constraints. Such pairwise constraints are common in problems like coreference resolution in natural language processing. The approach developed in this paper is to learn a new representation space for the data together with a distance in this new space. The representation space is obtained through a constraint-driven linear transformation of a spectral embedding of the data. Constraints are expressed with a Gaussian function that locally reweights the similarities in the projected space. A global, non-convex optimization objective is then derived, and the model is learned via gradient descent techniques. Our algorithm is evaluated on standard datasets and compared with state-of-the-art algorithms such as [14,18,31]. Results on these datasets, as well as on the CoNLL-2012 coreference resolution shared task dataset, show that our algorithm significantly outperforms related approaches and is also much more scalable.
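The pipeline outlined in the abstract (spectral embedding, constraint-driven linear map, Gaussian similarity, gradient descent) can be illustrated with a minimal NumPy sketch. The abstract does not state the objective, so the squared-error loss below is an assumption chosen for illustration, not the authors' formulation. All function names, the dense Gaussian affinity graph, the row normalization (in the style of [20]), and the parameters `sigma`, `lr`, and `n_iter` are likewise hypothetical.

```python
import numpy as np


def spectral_embedding(X, n_components, sigma=1.0):
    """Embed rows of X with the bottom eigenvectors of a normalized Laplacian."""
    # Dense Gaussian affinity graph (an assumption; any similarity graph works).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}.
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    V = vecs[:, :n_components]
    # Scale each embedded point to unit length.
    return V / np.linalg.norm(V, axis=1, keepdims=True)


def constraint_loss_and_grad(W, V, must_link, cannot_link):
    """Assumed loss: Gaussian similarity should be 1 on must-link pairs
    and 0 on cannot-link pairs, measured after projecting with W."""
    loss = 0.0
    grad = np.zeros_like(W)
    for pairs, target in ((must_link, 1.0), (cannot_link, 0.0)):
        for i, j in pairs:
            d = V[i] - V[j]
            z = d @ W                    # difference in the projected space
            g = np.exp(-0.5 * (z @ z))   # Gaussian similarity in (0, 1]
            loss += (g - target) ** 2
            # Chain rule: dg/dW = -g * outer(d, z).
            grad += 2.0 * (g - target) * (-g) * np.outer(d, z)
    return loss, grad


def fit_transform(V, must_link, cannot_link, lr=0.1, n_iter=200):
    """Plain gradient descent on the non-convex loss, starting from identity."""
    W = np.eye(V.shape[1])
    for _ in range(n_iter):
        _, grad = constraint_loss_and_grad(W, V, must_link, cannot_link)
        W -= lr * grad
    return W
```

Under these assumptions, a final partition would be obtained by running a standard clustering method such as k-means on the rows of `V @ W`, where `W` is the matrix returned by `fit_transform`.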
This work was supported by the French National Research Agency (ANR), project Lampada ANR-09-EMER-007.
References
1. Arthur, D., Vassilvitskii, S.: K-means++: The advantages of careful seeding. In: Proc. of SODA, pp. 1027–1035 (2007)
2. Bagga, A., Baldwin, B.: Algorithms for scoring coreference chains. In: Proc. of TREC, vol. 1, pp. 563–566 (1998)
3. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15, 1373–1396 (2002)
4. Bengtson, E., Roth, D.: Understanding the value of features for coreference resolution. In: Proc. of EMNLP, pp. 294–303 (2008)
5. De Bie, T., Suykens, J.A.K., De Moor, B.: Learning from General Label Constraints. In: Fred, A., Caelli, T.M., Duin, R.P.W., Campilho, A.C., de Ridder, D. (eds.) SSPR&SPR 2004. LNCS, vol. 3138, pp. 671–679. Springer, Heidelberg (2004)
6. Cai, J., Strube, M.: End-to-end coreference resolution via hypergraph partitioning. In: Proc. of COLING, pp. 143–151 (2010)
7. Davidson, I., Ravi, S.S.: Clustering with constraints: Feasibility issues and the k-means algorithm. In: Proc. of SIAM Data Mining Conference (2005)
8. Dhillon, I.S., Guan, Y., Kulis, B.: A unified view of kernel k-means, spectral clustering and graph cuts. Technical report, UTCS (2004)
9. Finley, T., Joachims, T.: Supervised clustering with support vector machines. In: Proc. of ICML, pp. 217–224. ACM Press (2005)
10. Guattery, S., Miller, G.L.: On the quality of spectral separators. SIAM Journal on Matrix Analysis and Applications 19(3), 701–719 (1998)
11. Hein, M., Setzer, S.: Beyond spectral clustering - tight relaxations of balanced graph cuts. In: Proc. of NIPS, pp. 2366–2374 (2011)
12. Hubert, L., Arabie, P.: Comparing partitions. Journal of Classification 2, 193–218 (1985)
13. Jordan, M., Bach, F.: Learning spectral clustering. In: Proc. of NIPS (2004)
14. Kamvar, S.D., Klein, D., Manning, C.D.: Spectral learning. In: Proc. of IJCAI, pp. 561–566 (2003)
15. Klein, D.J., Randic, M.: Resistance distance. Journal of Mathematical Chemistry 12, 81–95 (1993)
16. Kulis, B., Basu, S., Dhillon, I.S., Mooney, R.J.: Semi-supervised graph clustering: a kernel approach. In: Proc. of ICML, pp. 457–464 (2005)
17. Lee, H., Peirsman, Y., Chang, A., Chambers, N., Surdeanu, M., Jurafsky, D.: Stanford’s multi-pass sieve coreference resolution system at the CoNLL-2011 shared task. In: Proc. of CoNLL: Shared Task, pp. 28–34 (2011)
18. Li, Z., Liu, J., Tang, X.: Constrained clustering via spectral regularization. In: Proc. of CVPR, pp. 421–428 (2009)
19. Lu, Z., Carreira-Perpinan, M.A.: Constrained spectral clustering through affinity propagation. In: Proc. of CVPR (2008)
20. Luo, X.: On coreference resolution performance metrics. In: Proc. of HLT-EMNLP, pp. 25–32 (2005)
21. Luxburg, U.: A tutorial on spectral clustering. Statistics and Computing 17(4), 395–416 (2007)
22. Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: Analysis and an algorithm. In: Proc. of NIPS, pp. 849–856. MIT Press (2001)
23. Ng, V., Cardie, C.: Improving machine learning approaches to coreference resolution. In: Proc. of ACL, pp. 104–111 (2002)
24. Nicolae, C., Nicolae, G.: BestCut: A graph algorithm for coreference resolution. In: Proc. of EMNLP, pp. 275–283 (2006)
25. Pradhan, S., Moschitti, A., Xue, N., Uryupina, O., Zhang, Y.: CoNLL-2012 shared task: Modeling multilingual unrestricted coreference in OntoNotes. In: Joint Conference on EMNLP and CoNLL-Shared Task, pp. 1–40 (2012)
26. Rangapuram, S.S., Hein, M.: Constrained 1-spectral clustering. In: Proc. of AISTATS, pp. 1143–1151 (2012)
27. Saerens, M., Fouss, F., Yen, L., Dupont, P.E.: The principal components analysis of a graph, and its relationships to spectral clustering. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) ECML 2004. LNCS (LNAI), vol. 3201, pp. 371–383. Springer, Heidelberg (2004)
28. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8), 888–905 (2000)
29. Vilain, M., Burger, J., Aberdeen, J., Connolly, D., Hirschman, L.: A model-theoretic coreference scoring scheme. In: Proc. of the Conference on Message Understanding, pp. 45–52 (1995)
30. Wagstaff, K., Cardie, C., Rogers, S., Schroedl, S.: Constrained K-means Clustering with Background Knowledge. In: Proc. of ICML, pp. 577–584 (2001)
31. Wang, X., Davidson, I.: Flexible constrained spectral clustering. In: Proc. of KDD, pp. 563–572 (2010)
32. Wang, X., Qian, B., Davidson, I.: On constrained spectral clustering and its applications. In: Data Mining and Knowledge Discovery (2012)
33. Xing, E.P., Ng, A.Y., Jordan, M.I., Russell, S.: Distance Metric Learning, with Application to Clustering with Side-information. In: Proc. of NIPS (2002)
34. Yu, S.X., Shi, J.: Grouping with Bias. In: Proc. of NIPS (2001)
35. Zhu, X., Ghahramani, Z., Lafferty, J.: Semi-supervised learning using Gaussian fields and harmonic functions. In: Proc. of ICML, p. 912 (2003)
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chatel, D., Denis, P., Tommasi, M. (2014). Fast Gaussian Pairwise Constrained Spectral Clustering. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2014. Lecture Notes in Computer Science, vol 8724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44848-9_16
DOI: https://doi.org/10.1007/978-3-662-44848-9_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44847-2
Online ISBN: 978-3-662-44848-9
eBook Packages: Computer Science (R0)