Data Mining and Knowledge Discovery, Volume 31, Issue 2, pp 350–370

Unsupervised group matching with application to cross-lingual topic matching without alignment information

  • Tomoharu Iwata
  • Motonobu Kanagawa
  • Tsutomu Hirao
  • Kenji Fukumizu


We propose a method for unsupervised group matching: the task of finding correspondences between groups across different domains without cross-domain similarity measurements or paired data. For example, the proposed method can match topic categories in different languages without alignment information. The method treats each group as a probability distribution, which lets it handle the uncertainty that arises from limited data and incorporate higher-order information about the groups. Groups are matched by maximizing the dependence between distributions, measured with the Hilbert-Schmidt independence criterion (HSIC). Because kernel embedding maps distributions into a reproducing kernel Hilbert space, the dependence can be computed without density estimation. Experiments on synthetic and real data sets, including an application to cross-lingual topic matching, demonstrate the effectiveness of the proposed method.
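To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian kernel both for embedding the points and for comparing the embedded groups, uses the biased empirical HSIC estimator, and searches all permutations by brute force, which is only feasible for a handful of groups; the abstract does not say how the maximization is actually carried out. The names `rbf`, `group_gram`, `hsic`, `match_groups` and the parameters `gamma` and `tau` are hypothetical.

```python
import itertools
import numpy as np

def rbf(a, b, gamma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def group_gram(groups, gamma=1.0, tau=1.0):
    """Gram matrix between groups, each treated as an empirical distribution.

    The inner product of two kernel mean embeddings is the average of the
    point-level kernel over all cross pairs, so no density is estimated.
    """
    g = len(groups)
    inner = np.empty((g, g))
    for i in range(g):
        for j in range(g):
            inner[i, j] = rbf(groups[i], groups[j], gamma).mean()
    d = np.diag(inner)
    sq_dist = d[:, None] + d[None, :] - 2.0 * inner  # squared RKHS distances
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * tau**2))

def hsic(k, l):
    """Biased empirical HSIC from two Gram matrices of equal size."""
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(k @ h @ l @ h) / (n - 1) ** 2

def match_groups(groups_x, groups_y, gamma=1.0, tau=1.0):
    """Return p such that group i in X is matched to group p[i] in Y."""
    k = group_gram(groups_x, gamma, tau)
    l = group_gram(groups_y, gamma, tau)
    # Assumption: exhaustive search, practical only for very few groups.
    best = max(itertools.permutations(range(len(groups_y))),
               key=lambda p: hsic(k, l[np.ix_(p, p)]))
    return list(best)

# Toy data: domain Y holds rotated copies of the X groups in shuffled order.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [0.5, 0.0], [1.5, 0.0]])
groups_x = [c + 0.1 * rng.standard_normal((30, 2)) for c in centers]
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
order = [2, 0, 1]  # Y group j is a rotated copy of X group order[j]
groups_y = [groups_x[i] @ rot.T for i in order]
print(match_groups(groups_x, groups_y))  # expected: [1, 2, 0]
```

No cross-domain similarity is computed anywhere in this sketch: each domain contributes only its own group-level Gram matrix, and the matching is the relabeling of one matrix that makes the two most dependent. The toy example is deliberately easy, since the Y groups are exact rotations of the X groups; real data would need kernel parameters chosen with care.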


  • Unsupervised object matching
  • Kernel embedding of distributions
  • Multilingual corpus analysis



Copyright information

© The Author(s) 2016

Authors and Affiliations

  • Tomoharu Iwata (1)
  • Motonobu Kanagawa (2)
  • Tsutomu Hirao (1)
  • Kenji Fukumizu (2)

  1. NTT Communication Science Laboratories, Kyoto, Japan
  2. The Institute of Statistical Mathematics, Tokyo, Japan
