Machine Learning

Volume 99, Issue 2, pp 169–187

Adaptive Euclidean maps for histograms: generalized Aitchison embeddings


Abstract

Learning distances specifically designed to compare histograms in the probability simplex has recently attracted the attention of the machine learning community. Learning such distances is important because most machine learning problems involve bags of features rather than simple vectors. Ample empirical evidence suggests that the Euclidean distance in general, and Mahalanobis metric learning in particular, may not be suitable for quantifying distances between points in the simplex. We address this problem by generalizing a family of embeddings proposed by Aitchison (J R Stat Soc 44:139–177, 1982) to map the probability simplex onto a suitable Euclidean space. We provide algorithms that estimate the parameters of such maps by building on previous metric learning approaches. The criterion we study is not convex, and we consider both alternating optimization schemes and accelerated gradient descent approaches. These algorithms lead to representations that outperform alternative approaches to comparing histograms in a variety of contexts.
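To make the idea of an Aitchison embedding concrete, the sketch below implements the classical centered log-ratio (clr) transform from Aitchison (1982), which maps a histogram in the probability simplex to a point in Euclidean space. This is the fixed, unlearned member of the family; the paper's contribution is a generalized, parameterized version whose parameters are estimated by metric learning. The function name, smoothing constant `eps`, and example histograms are illustrative choices, not taken from the paper.

```python
import numpy as np

def clr(h, eps=1e-12):
    """Centered log-ratio (clr) transform: maps a histogram in the
    probability simplex to Euclidean space via log-ratios against
    the geometric mean of its entries."""
    h = np.asarray(h, dtype=float) + eps   # smooth zero bins before taking logs
    h = h / h.sum()                        # renormalize onto the simplex
    log_h = np.log(h)
    return log_h - log_h.mean()            # subtract the log geometric mean

# Two histograms (e.g., normalized bags of features)
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

# The Euclidean distance between clr images is the Aitchison distance
d = np.linalg.norm(clr(p) - clr(q))
```

A useful sanity check is that clr images always sum to zero, so the embedding lands in a hyperplane of the ambient Euclidean space; the generalized embeddings studied in the paper replace this fixed map with one whose parameters are fit to labeled data.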

Keywords

Metric learning for histograms · Aitchison geometry · Probability simplex · Embeddings

References

  1. Aitchison, J. (1982). The statistical analysis of compositional data. Journal of the Royal Statistical Society, 44, 139–177.
  2. Aitchison, J. (1986). The statistical analysis of compositional data. London: Chapman and Hall.
  3. Aitchison, J. (2003). A concise guide to compositional data analysis. In CDA workshop.
  4. Aitchison, J., & Lauder, I. J. (1985). Kernel density estimation for compositional data. Applied Statistics, 34, 129–137.
  5. Aitchison, J., & Shen, S. M. (1980). Logistic-normal distributions: Some properties and uses. Biometrika, 67, 261–272.
  6. Aizawa, A. (2003). An information-theoretic perspective of tf-idf measures. Information Processing and Management, 39(1), 45–65.
  7. Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval (Vol. 463). New York: ACM Press.
  8. Bay, H., Tuytelaars, T., & Van Gool, L. (2006). SURF: Speeded up robust features. In European conference on computer vision (pp. 404–417).
  9. Blei, D., & Lafferty, J. (2006). Correlated topic models. In B. Schölkopf, J. C. Platt & T. Hoffman (Eds.), Advances in Neural Information Processing Systems (pp. 147–154). Vancouver, Canada: MIT Press.
  10. Blei, D., & Lafferty, J. (2009). Topic models. In A. Srivastava & M. Sahami (Eds.), Text mining: Classification, clustering, and applications. Boca Raton, FL: Chapman & Hall, CRC Press.
  11. Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.
  12. Burge, C., Campbell, A. M., & Karlin, S. (1992). Over- and under-representation of short oligonucleotides in DNA sequences. Proceedings of the National Academy of Sciences, 89(4), 1358–1362.
  13. Campbell, W. M., & Richardson, F. S. (2007). Discriminative keyword selection using support vector machines. In J. C. Platt, D. Koller, Y. Singer & S. T. Roweis (Eds.), Advances in Neural Information Processing Systems. Vancouver, Canada: Curran Associates, Inc.
  14. Campbell, W. M., Campbell, J. P., Reynolds, D. A., Jones, D. A., & Leek, T. R. (2003). Phonetic speaker recognition with support vector machines. In S. Thrun, L. K. Saul & B. Schölkopf (Eds.), Advances in Neural Information Processing Systems. Vancouver, Canada: MIT Press.
  15. Cuturi, M., & Avis, D. (2014). Ground metric learning. Journal of Machine Learning Research, 15(1), 533–564.
  16. Cuturi, M., & Avis, D. (2011). Ground metric learning. arXiv preprint arXiv:1110.2306.
  17. Davis, J. V., Kulis, B., Jain, P., Sra, S., & Dhillon, I. S. (2007). Information-theoretic metric learning. In International conference on machine learning (pp. 209–216).
  18. Doddington, G. (2001). Speaker recognition based on idiolectal differences between speakers. In P. Dalsgaard, B. Lindberg, H. Benner, & Z.-H. Tan (Eds.), Eurospeech (pp. 2521–2524). Aalborg, Denmark: Center for Personkommunikation, Aalborg University.
  19. Egozcue, J. J., Pawlowsky-Glahn, V., Mateu-Figueras, G., & Barceló-Vidal, C. (2003). Isometric logratio transformations for compositional data analysis. Mathematical Geology, 35(3), 279–300.
  20. Erhan, S., Marzolf, T., & Cohen, L. (1980). Amino-acid neighborhood relationships in proteins. Breakdown of amino-acid sequences into overlapping doublets, triplets and quadruplets. International Journal of Bio-Medical Computing, 11(1), 67–75.
  21. Globerson, A., & Roweis, S. T. (2005). Metric learning by collapsing classes. In Y. Weiss, B. Schölkopf & J. C. Platt (Eds.), Advances in Neural Information Processing Systems (pp. 451–458). Vancouver, Canada: MIT Press.
  22. Goldberger, J., Roweis, S. T., Hinton, G. E., & Salakhutdinov, R. (2004). Neighbourhood components analysis. In L. K. Saul, Y. Weiss & L. Bottou (Eds.), Advances in Neural Information Processing Systems. Vancouver, Canada: MIT Press.
  23. Joachims, T. (2002). Learning to classify text using support vector machines: Methods, theory and algorithms. Berlin: Springer.
  24. Julesz, B. (1981). Textons, the elements of texture perception, and their interactions. Nature, 290(5802), 91–97.
  25. Kedem, D., Tyree, S., Weinberger, K. Q., Sha, F., & Lanckriet, G. (2012). Nonlinear metric learning. In F. Pereira, C. J. C. Burges, L. Bottou & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems (pp. 2582–2590). Nevada: Curran Associates, Inc.
  26. Kwok, J. T., & Tsang, I. W. (2003). Learning with idealized kernels. In International conference on machine learning (pp. 400–407).
  27. Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. Computer Vision and Pattern Recognition, 2, 2169–2178.
  28. Le, T., & Cuturi, M. (2013). Generalized Aitchison embeddings for histograms. In Asian conference on machine learning (pp. 293–308).
  29. Le, T., Kang, Y., Sugimoto, A., Tran, S., & Nguyen, T. (2011). Hierarchical spatial matching kernel for image categorization. In International conference on image analysis and recognition (pp. 141–151).
  30. Leslie, C. S., Eskin, E., & Noble, W. S. (2002). The spectrum kernel: A string kernel for SVM protein classification. In Pacific symposium on biocomputing (Vol. 7, pp. 566–575).
  31. Lewis, D. D., Yang, Y., Rose, T. G., & Li, F. (2004). RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5, 361–397.
  32. Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
  33. Madsen, R. E., Kauchak, D., & Elkan, C. (2005). Modeling word burstiness using the Dirichlet distribution. In International conference on machine learning (pp. 545–552).
  34. Nesterov, Y. (1983). A method of solving a convex programming problem with convergence rate O(1/k²). In Soviet Mathematics Doklady (Vol. 27, pp. 372–376).
  35. Nesterov, Y. (2004). Introductory lectures on convex optimization: A basic course. Berlin: Springer.
  36. O’Donoghue, B., & Candès, E. (2013). Adaptive restart for accelerated gradient schemes. Foundations of Computational Mathematics, 13, 1–18.
  37. Perronnin, F., Sánchez, J., & Liu, Y. (2010). Large-scale image categorization with explicit data embedding. In Computer vision and pattern recognition (pp. 2297–2304). San Francisco, CA: Curran Associates, Inc.
  38. Rennie, J. D., Shih, L., Teevan, J., & Karger, D. (2003). Tackling the poor assumptions of naive Bayes text classifiers. In International conference on machine learning (Vol. 3, pp. 616–623).
  39. Russell, B. C., Torralba, A., Murphy, K. P., & Freeman, W. T. (2008). LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision, 77(1–3), 157–173.
  40. Salton, G. (1989). Automatic text processing: The transformation, analysis, and retrieval of information by computer. Reading: Addison-Wesley.
  41. Salton, G., & McGill, M. J. (1983). Introduction to modern information retrieval. New York: McGraw-Hill.
  42. Schultz, M., & Joachims, T. (2003). Learning a distance metric from relative comparisons. In S. Thrun, L. K. Saul & B. Schölkopf (Eds.), Advances in Neural Information Processing Systems (Vol. 16, p. 41). Vancouver, Canada: MIT Press.
  43. Shalev-Shwartz, S., Singer, Y., & Ng, A. Y. (2004). Online and batch learning of pseudo-metrics. In International conference on machine learning (p. 94).
  44. Sivic, J., & Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos. In International conference on computer vision.
  45. Torresani, L., & Lee, K. (2006). Large margin component analysis. In Advances in Neural Information Processing Systems (pp. 1385–1392).
  46. Vedaldi, A., & Zisserman, A. (2012). Efficient additive kernels via explicit feature maps. IEEE Pattern Analysis and Machine Intelligence, 34(3), 480–492.
  47. Weinberger, K. Q., & Saul, L. K. (2008). Fast solvers and efficient implementations for distance metric learning. In International conference on machine learning (pp. 1160–1167).
  48. Weinberger, K. Q., & Saul, L. K. (2009). Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10, 207–244.
  49. Weinberger, K. Q., Blitzer, J., & Saul, L. (2006). Distance metric learning for large margin nearest neighbor classification. In Advances in Neural Information Processing Systems (pp. 1473–1480).
  50. Xing, E. P., Ng, A. Y., Jordan, M. I., & Russell, S. J. (2002). Distance metric learning with application to clustering with side-information. In Advances in Neural Information Processing Systems (pp. 1473–1480).

Copyright information

© The Author(s) 2014

Authors and Affiliations

  1. Graduate School of Informatics, Kyoto University, Kyoto, Japan
