Prior Shift Using the Ratio Estimator

  • Conference paper
Bayesian Inference and Maximum Entropy Methods in Science and Engineering (maxent 2017)

Abstract

Several machine learning applications use classifiers to quantify the prevalence of positive class labels in a target dataset, a task known as quantification. For instance, a naive way of determining what proportion of people like a given product when no labeled reviews are available is to (i) train a classifier on Google Shopping reviews to predict whether a user likes a product given their review, and then (ii) apply this classifier to Facebook/Google+ posts about that product. It is well known that this two-step approach, named Classify and Count, fails because of dataset shift, and several improvements have recently been proposed under an assumption named prior shift. Unfortunately, these methods explore the relationship between the covariates and the response only via classifiers. Moreover, the literature lacks the theoretical foundation needed to improve these techniques. We propose a new family of estimators, named the Ratio Estimator, which is able to explore the relationship between the covariates and the response using any function \( g: \mathscr{X} \rightarrow \mathbb{R} \), not only classifiers. We show that for some choices of g, our estimator matches standard estimators used in the literature. We also explore alternative ways of constructing functions g that lead to estimators with good performance, and compare them using real datasets. Finally, we provide a theoretical analysis of the method.
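The contrast between Classify and Count and a ratio-type estimator can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the common moment-matching form that follows from prior shift, where the target mean of g(X) equals θ·E[g(X)|Y=1] + (1−θ)·E[g(X)|Y=0], and solves for the prevalence θ. The function names, the synthetic Gaussian data, and the choice g = tanh are all hypothetical choices made for illustration.

```python
import numpy as np


def classify_and_count(scores, threshold=0.0):
    """Naive prevalence estimate: fraction of target points classified positive."""
    return float(np.mean(scores > threshold))


def ratio_estimator(g_pos, g_neg, g_target):
    """Moment-matching prevalence estimate under prior shift (assumed form).

    g_pos, g_neg: values of g on labeled training points with Y=1 and Y=0.
    g_target:     values of g on unlabeled target points.
    """
    mu1, mu0 = np.mean(g_pos), np.mean(g_neg)
    denom = mu1 - mu0
    if np.isclose(denom, 0.0):
        raise ValueError("g does not separate the two classes on average")
    theta = (np.mean(g_target) - mu0) / denom
    # Clip to [0, 1] because sampling noise can push the raw ratio outside.
    return float(np.clip(theta, 0.0, 1.0))


rng = np.random.default_rng(0)


def sample(n, prevalence):
    """Synthetic data: X | Y=1 ~ N(1, 1), X | Y=0 ~ N(-1, 1) (hypothetical)."""
    y = rng.random(n) < prevalence
    x = rng.normal(np.where(y, 1.0, -1.0), 1.0)
    return x, y


# Training prevalence 0.5, target prevalence 0.8; p(x | y) is the same in both.
x_train, y_train = sample(20_000, 0.5)
x_target, _ = sample(20_000, 0.8)

g = np.tanh  # any real-valued function of the covariates, not only a classifier

theta_cc = classify_and_count(x_target, threshold=0.0)
theta_ratio = ratio_estimator(g(x_train[y_train]), g(x_train[~y_train]), g(x_target))

print(f"Classify and Count: {theta_cc:.3f}")
print(f"Ratio estimator:    {theta_ratio:.3f}")
```

In this synthetic setup, Classify and Count is pulled away from the true target prevalence of 0.8 toward the training prevalence, because the classifier's errors are not corrected, while the ratio-type estimate stays close to 0.8. If g is instead a hard 0/1 classifier output, the same formula reduces to the familiar adjusted-count correction.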

Acknowledgements

This work was partially supported by FAPESP grant 2017/03363-8 and CAPES.

Author information

Corresponding author

Correspondence to Afonso Vaz.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Vaz, A., Izbicki, R., Stern, R.B. (2018). Prior Shift Using the Ratio Estimator. In: Polpo, A., Stern, J., Louzada, F., Izbicki, R., Takada, H. (eds) Bayesian Inference and Maximum Entropy Methods in Science and Engineering. maxent 2017. Springer Proceedings in Mathematics & Statistics, vol 239. Springer, Cham. https://doi.org/10.1007/978-3-319-91143-4_3
