A Kernel Between Unordered Sets of Data: The Gaussian Mixture Approach

  • Siwei Lyu
Conference paper. Part of the Lecture Notes in Computer Science book series (LNCS, volume 3720).


In this paper, we present a new kernel for unordered sets of data of the same type. It works by first fitting each set with a Gaussian mixture, then evaluating an efficient kernel on the two fitted Gaussian mixtures. Furthermore, we show that this kernel can be extended to sets embedded in a feature space implicitly defined by another kernel, where Gaussian mixtures are fitted with the kernelized EM algorithm [6], and the kernel for Gaussian mixtures is modified to use the outputs of the kernelized EM. All computation depends on the data only through their inner products, i.e., evaluations of the base kernel. The kernel is computable in closed form, and its ability to work in a feature space improves its flexibility and applicability. Its performance is evaluated in experiments on both synthesized and real data.
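The closed-form evaluation described above can be illustrated, for the plain input-space case (no kernelized EM), with the expected-likelihood probability product kernel between two fitted mixtures, which reduces to a sum of Gaussian integrals with a known closed form. This is a minimal sketch, not the paper's exact construction: the function and variable names are illustrative, and scikit-learn's `GaussianMixture` stands in for the EM fitting step.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture


def gmm_expected_likelihood_kernel(gmm_p, gmm_q):
    """Closed-form expected-likelihood kernel K(p, q) = ∫ p(x) q(x) dx
    between two Gaussian mixtures, using the Gaussian identity
    ∫ N(x; m, S) N(x; n, L) dx = N(m; n, S + L)."""
    k = 0.0
    for w_i, m_i, S_i in zip(gmm_p.weights_, gmm_p.means_, gmm_p.covariances_):
        for v_j, n_j, L_j in zip(gmm_q.weights_, gmm_q.means_, gmm_q.covariances_):
            # Each pair of components contributes a single Gaussian density value.
            k += w_i * v_j * multivariate_normal.pdf(m_i, mean=n_j, cov=S_i + L_j)
    return k


# Two unordered point sets, each summarized by a 2-component mixture.
rng = np.random.default_rng(0)
set_a = rng.normal(0.0, 1.0, size=(200, 2))
set_b = rng.normal(0.5, 1.0, size=(200, 2))
gmm_a = GaussianMixture(n_components=2, random_state=0).fit(set_a)
gmm_b = GaussianMixture(n_components=2, random_state=0).fit(set_b)
print(gmm_expected_likelihood_kernel(gmm_a, gmm_b))
```

Because the kernel depends on the sets only through the fitted mixture parameters, its cost is quadratic in the number of mixture components rather than in the sizes of the sets.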




References

  1. Ben-Hur, A., Horn, D., Siegelmann, H.T., Vapnik, V.: Support vector clustering. Journal of Machine Learning Research 2(2), 125–137 (2002)
  2. Bilmes, J.: A gentle tutorial on the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report ICSI-TR-97-021, UC Berkeley (1997)
  3. Campbell, S.L., Meyer Jr., C.D.: Generalized Inverses of Linear Transformations. Dover, New York (1991)
  4. Chang, C., Lin, C.: LIBSVM: a library for support vector machines (2001). Software available at
  5. Hein, M., Bousquet, O.: Hilbertian metrics and positive definite kernels on probability measures. Technical report, MPI for Biological Cybernetics (2004)
  6. Lee, J., Wang, J., Zhang, C.: Kernel trick embedded Gaussian mixture model. In: Gavaldà, R., Jantke, K.P., Takimoto, E. (eds.) ALT 2003. LNCS (LNAI), vol. 2842, pp. 159–174. Springer, Heidelberg (2003)
  7. Jaakkola, T., Haussler, D.: Exploiting generative models in discriminative classifiers. In: Advances in Neural Information Processing Systems (NIPS) (1999)
  8. Jebara, T.: Images as bags of pixels. In: International Conference on Computer Vision (ICCV) (2003)
  9. Jebara, T., Kondor, R., Howard, A.: Probability product kernels. Journal of Machine Learning Research 5 (2004)
  10. Kondor, R., Jebara, T.: A kernel between sets of vectors. In: International Conference on Machine Learning (ICML) (2003)
  11. Kondor, R., Lafferty, J.: Diffusion kernels on graphs and other discrete input spaces. In: International Conference on Machine Learning (ICML) (2002)
  12. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
  13. Lyu, S.: Kernel between sets: the Gaussian mixture approach. Technical Report TR2005-214, Computer Science Department, Dartmouth College (2005)
  14. Mika, S., Rätsch, G., Weston, J., Schölkopf, B., Müller, K.-R.: Fisher discriminant analysis with kernels. In: IEEE Conference on Neural Networks for Signal Processing (1999)
  15. Mika, S., Schölkopf, B., Smola, A., Müller, K.-R., Scholz, M., Rätsch, G.: Kernel PCA and de-noising in feature spaces. In: Advances in Neural Information Processing Systems 11, Cambridge, MA, pp. 536–542 (1999)
  16. Parzen, E.: On estimation of a probability density function and mode. Annals of Mathematical Statistics 33, 1065–1076 (1962)
  17. Roweis, S.: Gaussian Identities. Department of Computer Science, U. Toronto (2001). Manuscript available at
  18. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)
  19. Sugar, C.A., James, G.M.: Finding the number of clusters in a dataset: An information-theoretic approach. Journal of the American Statistical Association 98(463), 750–763 (2003)
  20. Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
  21. Vishwanathan, S., Smola, A.: Fast kernels for string and tree matching. In: Advances in Neural Information Processing Systems (NIPS) (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Siwei Lyu — Department of Computer Science, Dartmouth College, Hanover, USA
