Density Ratio Estimation: A New Versatile Tool for Machine Learning

  • Masashi Sugiyama
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5828)


A new general framework of statistical data processing based on the ratio of probability densities has recently been proposed and has attracted a great deal of attention in the machine learning and data mining communities [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17]. This density ratio framework covers various statistical data processing tasks such as non-stationarity adaptation [18,1,2,4,13], outlier detection [19,20,21,6], and conditional density estimation [22,23,24,15]. Furthermore, mutual information, which plays a central role in information theory [25], can also be estimated via density ratio estimation. Since mutual information is a measure of statistical independence between random variables [26,27,28], density ratio estimation can also be used for variable selection [29,7,11], dimensionality reduction [30,16], and independent component analysis [31,12].
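
To make the central idea concrete, the following is a minimal sketch of direct density ratio estimation, loosely in the spirit of the least-squares approach cited as [17]. It is an illustration under stated assumptions, not the author's implementation: the function names, the Gaussian kernel model, the kernel width `sigma`, the regularization strength `lam`, and the toy data are all hypothetical choices made here, and in practice the tuning parameters would be selected by a procedure such as cross-validation.

```python
import numpy as np


def gaussian_kernel(x, centers, sigma):
    """Gaussian kernel values between each row of x and each row of centers."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def fit_density_ratio(x_nu, x_de, sigma=1.0, lam=0.1, n_centers=100, seed=0):
    """Fit r(x) ~ p_nu(x) / p_de(x) by regularized least squares.

    The ratio is modeled as a non-negative combination of Gaussian kernels
    centered at a random subset of the numerator samples. All default
    values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_centers = min(n_centers, len(x_nu))
    centers = x_nu[rng.choice(len(x_nu), size=n_centers, replace=False)]

    phi_de = gaussian_kernel(x_de, centers, sigma)  # shape (n_de, n_centers)
    phi_nu = gaussian_kernel(x_nu, centers, sigma)  # shape (n_nu, n_centers)

    # Empirical second moment under the denominator density and
    # empirical mean under the numerator density.
    H = phi_de.T @ phi_de / len(x_de)
    h = phi_nu.mean(axis=0)

    # Ridge-regularized solution, then clip negative coefficients to zero
    # so that the estimated ratio is non-negative everywhere.
    alpha = np.linalg.solve(H + lam * np.eye(n_centers), h)
    alpha = np.maximum(alpha, 0.0)

    return lambda x: gaussian_kernel(x, centers, sigma) @ alpha


# Toy data (assumption): numerator N(0, 1), denominator N(0, 2^2).
rng = np.random.default_rng(0)
x_nu = rng.normal(0.0, 1.0, size=(500, 1))
x_de = rng.normal(0.0, 2.0, size=(500, 1))

ratio = fit_density_ratio(x_nu, x_de)
print(ratio(np.array([[0.0], [4.0]])))
```

The point of the sketch is that the ratio is fitted directly from the two samples, without estimating either density separately. The estimated values can then serve as importance weights for non-stationarity adaptation or as inlier scores for outlier detection, where a small ratio marks a point that is unlikely under the numerator distribution.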


Keywords: Mutual Information; Density Ratio; Outlier Detection; Independent Component Analysis; Neural Information Processing System




  1. Zadrozny, B.: Learning and evaluating classifiers under sample selection bias. In: Proceedings of the Twenty-First International Conference on Machine Learning, pp. 903–910. ACM Press, New York (2004)
  2. Sugiyama, M., Müller, K.R.: Input-dependent estimation of generalization error under covariate shift. Statistics & Decisions 23(4), 249–279 (2005)
  3. Huang, J., Smola, A., Gretton, A., Borgwardt, K.M., Schölkopf, B.: Correcting sample selection bias by unlabeled data. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems 19, pp. 601–608. MIT Press, Cambridge (2007)
  4. Sugiyama, M., Krauledat, M., Müller, K.R.: Covariate shift adaptation by importance weighted cross validation. Journal of Machine Learning Research 8, 985–1005 (2007)
  5. Bickel, S., Brückner, M., Scheffer, T.: Discriminative learning for differing training and test distributions. In: Proceedings of the 24th International Conference on Machine Learning, pp. 81–88 (2007)
  6. Hido, S., Tsuboi, Y., Kashima, H., Sugiyama, M., Kanamori, T.: Inlier-based outlier detection via direct density ratio estimation. In: Giannotti, F., Gunopulos, D., Turini, F., Zaniolo, C., Ramakrishnan, N., Wu, X. (eds.) Proceedings of IEEE International Conference on Data Mining (ICDM 2008), Pisa, Italy, December 15–19, pp. 223–232 (2008)
  7. Suzuki, T., Sugiyama, M., Sese, J., Kanamori, T.: Approximating mutual information by maximum likelihood density ratio estimation. In: Saeys, Y., Liu, H., Inza, I., Wehenkel, L., de Peer, Y.V. (eds.) JMLR Workshop and Conference Proceedings. New Challenges for Feature Selection in Data Mining and Knowledge Discovery, vol. 4, pp. 5–20 (2008)
  8. Sugiyama, M., Nakajima, S., Kashima, H., von Bünau, P., Kawanabe, M.: Direct importance estimation with model selection and its application to covariate shift adaptation. In: Platt, J.C., Koller, D., Singer, Y., Roweis, S. (eds.) Advances in Neural Information Processing Systems 20, pp. 1433–1440. MIT Press, Cambridge (2008)
  9. Sugiyama, M., Suzuki, T., Nakajima, S., Kashima, H., von Bünau, P., Kawanabe, M.: Direct importance estimation for covariate shift adaptation. Annals of the Institute of Statistical Mathematics 60(4), 699–746 (2008)
  10. Kanamori, T., Hido, S., Sugiyama, M.: Efficient direct density ratio estimation for non-stationarity adaptation and outlier detection. In: Koller, D., Schuurmans, D., Bengio, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems 21, pp. 809–816. MIT Press, Cambridge (2009)
  11. Suzuki, T., Sugiyama, M., Kanamori, T., Sese, J.: Mutual information estimation reveals global associations between stimuli and biological processes. BMC Bioinformatics 10(1), S52 (2009)
  12. Suzuki, T., Sugiyama, M.: Estimating squared-loss mutual information for independent component analysis. In: Adali, T., Jutten, C., Romano, J.M.T., Barros, A.K. (eds.) Independent Component Analysis and Signal Separation. LNCS, vol. 5441, pp. 130–137. Springer, Berlin (2009)
  13. Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A., Lawrence, N. (eds.): Dataset Shift in Machine Learning. MIT Press, Cambridge (2009)
  14. Tsuboi, Y., Kashima, H., Hido, S., Bickel, S., Sugiyama, M.: Direct density ratio estimation for large-scale covariate shift adaptation. Journal of Information Processing 17, 138–155 (2009)
  15. Sugiyama, M., Takeuchi, I., Suzuki, T., Kanamori, T., Hachiya, H.: Least-squares conditional density estimation. Technical Report TR09-0004, Department of Computer Science, Tokyo Institute of Technology (February 2009)
  16. Suzuki, T., Sugiyama, M.: Sufficient dimension reduction via squared-loss mutual information estimation. Technical Report TR09-0005, Department of Computer Science, Tokyo Institute of Technology (February 2009)
  17. Kanamori, T., Hido, S., Sugiyama, M.: A least-squares approach to direct importance estimation. Journal of Machine Learning Research (to appear, 2009)
  18. Shimodaira, H.: Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference 90(2), 227–244 (2000)
  19. Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: Identifying density-based local outliers. In: Chen, W., Naughton, J.F., Bernstein, P.A. (eds.) Proceedings of the ACM SIGMOD International Conference on Management of Data (2000)
  20. Schölkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J., Williamson, R.C.: Estimating the support of a high-dimensional distribution. Neural Computation 13(7), 1443–1471 (2001)
  21. Hodge, V., Austin, J.: A survey of outlier detection methodologies. Artificial Intelligence Review 22(2), 85–126 (2004)
  22. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
  23. Takeuchi, I., Le, Q.V., Sears, T.D., Smola, A.J.: Nonparametric quantile estimation. Journal of Machine Learning Research 7, 1231–1264 (2006)
  24. Takeuchi, I., Nomura, K., Kanamori, T.: Nonparametric conditional density estimation using piecewise-linear solution path of kernel quantile regression. Neural Computation 21(2), 533–559 (2009)
  25. Cover, T.M., Thomas, J.A.: Elements of Information Theory. John Wiley & Sons, Inc., New York (1991)
  26. Kraskov, A., Stögbauer, H., Grassberger, P.: Estimating mutual information. Physical Review E 69(6), 066138 (2004)
  27. Hulle, M.M.V.: Edgeworth approximation of multivariate differential entropy. Neural Computation 17(9), 1903–1910 (2005)
  28. Suzuki, T., Sugiyama, M., Tanaka, T.: Mutual information approximation via maximum likelihood estimation of density ratio. In: Proceedings of 2009 IEEE International Symposium on Information Theory (ISIT 2009), Seoul, Korea, June 28–July 3, pp. 463–467 (2009)
  29. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. Journal of Machine Learning Research 3, 1157–1182 (2003)
  30. Song, L., Smola, A., Gretton, A., Borgwardt, K.M., Bedo, J.: Supervised feature selection via dependence estimation. In: Proceedings of the 24th International Conference on Machine Learning, pp. 823–830. ACM, New York (2007)
  31. Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. Wiley, New York (2001)
  32. Qin, J.: Inferences for case-control and semiparametric two-sample density ratio models. Biometrika 85(3), 619–639 (1998)
  33. Cheng, K.F., Chu, C.K.: Semiparametric density estimation under a two-sample density ratio model. Bernoulli 10(4), 583–604 (2004)
  34. Fishman, G.S.: Monte Carlo: Concepts, Algorithms, and Applications. Springer, Berlin (1996)
  35. Sugiyama, M., Kawanabe, M., Chui, P.L.: Dimensionality reduction for density ratio estimation in high-dimensional spaces. Neural Networks (to appear)
  36. Sugiyama, M., Kanamori, T., Suzuki, T., Hido, S., Sese, J., Takeuchi, I., Wang, L.: A density-ratio framework for statistical data processing. IPSJ Transactions on Computer Vision and Applications (to appear, 2009)
  37. Li, Y., Koike, Y., Sugiyama, M.: A framework of adaptive brain computer interfaces. In: Proceedings of the 2nd International Conference on BioMedical Engineering and Informatics (BMEI 2009), Tianjin, China, October 17–19 (to appear, 2009)
  38. Hachiya, H., Akiyama, T., Sugiyama, M., Peters, J.: Adaptive importance sampling with automatic model selection in value function approximation. In: Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (AAAI 2008), Chicago, Illinois, USA, pp. 1351–1356. The AAAI Press, Menlo Park (2008)
  39. Hachiya, H., Akiyama, T., Sugiyama, M., Peters, J.: Adaptive importance sampling for value function approximation in off-policy reinforcement learning. Neural Networks (to appear, 2009)
  40. Akiyama, T., Hachiya, H., Sugiyama, M.: Active policy iteration: Efficient exploration through active learning for value function approximation in reinforcement learning. In: Proceedings of the Twenty-first International Joint Conference on Artificial Intelligence (IJCAI 2009), Pasadena, California, USA, July 11–17 (to appear, 2009)
  41. Hachiya, H., Peters, J., Sugiyama, M.: Efficient sample reuse in EM-based policy search. In: Machine Learning and Knowledge Discovery in Databases. LNCS. Springer, Berlin (to appear, 2009)
  42. Yamada, M., Sugiyama, M., Matsui, T.: Covariate shift adaptation for semi-supervised speaker identification. In: Proceedings of 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2009), Taipei, Taiwan, April 19–24, pp. 1661–1664 (2009)
  43. Yamada, M., Sugiyama, M., Matsui, T.: Semi-supervised speaker identification under covariate shift. Signal Processing (to appear, 2009)
  44. Takimoto, M., Matsugu, M., Sugiyama, M.: Visual inspection of precision instruments by least-squares outlier detection. In: Proceedings of The Fourth International Workshop on Data-Mining and Statistical Science (DMSS 2009), Kyoto, Japan, July 7–8, pp. 22–26 (2009)
  45. Bickel, S., Bogojeska, J., Lengauer, T., Scheffer, T.: Multi-task learning for HIV therapy screening. In: McCallum, A., Roweis, S. (eds.) Proceedings of 25th Annual International Conference on Machine Learning (ICML 2008), Helsinki, Finland, July 5–9, pp. 56–63. Omnipress (2008)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Masashi Sugiyama (1)
  1. Department of Computer Science, Tokyo Institute of Technology
