Prior Shift Using the Ratio Estimator

Vaz, Afonso; Izbicki, Rafael; Stern, Rafael Bassi

doi:10.1007/978-3-319-91143-4_3

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 239)

Included in the following conference series:

International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering

Abstract

Several machine learning applications use classifiers to quantify the prevalence of positive class labels in a target dataset, a task named quantification. For instance, a naive way of determining what proportion of people like a given product when no labeled reviews are available is to (i) train a classifier on Google Shopping reviews to predict whether a user likes a product given its review, and then (ii) apply this classifier to Facebook/Google+ posts about that product. It is well known that this two-step approach, named Classify and Count, fails because of dataset shift, and several improvements have recently been proposed under an assumption named prior shift. Unfortunately, these methods only explore the relationship between the covariates and the response via classifiers. Moreover, the literature lacks the theoretical foundation needed to improve these techniques. We propose a new family of estimators, named the Ratio Estimator, which is able to explore the relationship between the covariates and the response using any function \( g: \mathscr{X} \rightarrow \mathbb{R} \) and not only classifiers. We show that for some choices of g, our estimator matches standard estimators used in the literature. We also explore alternative ways of constructing functions g that lead to estimators with good performance, and compare them using real datasets. Finally, we provide a theoretical analysis of the method.
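The abstract does not spell out the estimator, so the following is only a minimal sketch of one moment-matching form that is consistent with the description above, not the authors' implementation. It relies on the standard prior-shift identity: if the class-conditional distribution of the covariates is the same in the labeled and target data, then the target mean of g(X) equals θ·E[g(X)|Y=1] + (1−θ)·E[g(X)|Y=0], which can be solved for the prevalence θ. The Gaussian simulation, the choice g(x) = x, the function names, and the clipping to [0, 1] are all assumptions made for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def classify_and_count(g_target, threshold=0.0):
    # Share of target points that a threshold classifier labels positive.
    return mean([1.0 if v > threshold else 0.0 for v in g_target])


def ratio_estimate(g_pos, g_neg, g_target):
    # Solve mean_target(g) = theta * mean_pos(g) + (1 - theta) * mean_neg(g)
    # for theta, using sample means in place of expectations.
    m1, m0, mt = mean(g_pos), mean(g_neg), mean(g_target)
    if m1 == m0:
        raise ValueError("g must have different class-conditional means")
    theta = (mt - m0) / (m1 - m0)
    # Clipping to [0, 1] is an illustrative choice, not taken from the paper.
    return min(1.0, max(0.0, theta))


# Hypothetical simulation: X | Y=1 ~ N(1, 1) and X | Y=0 ~ N(-1, 1).
rng = random.Random(0)


def draw(label):
    return rng.gauss(1.0 if label else -1.0, 1.0)


def g(x):
    # Any real-valued function of the covariates; the identity is used here.
    return x


# Labeled sample with balanced classes; target sample with 20% positives.
labeled_pos = [g(draw(True)) for _ in range(5000)]
labeled_neg = [g(draw(False)) for _ in range(5000)]
true_theta = 0.2
target = [g(draw(rng.random() < true_theta)) for _ in range(20000)]

cc_hat = classify_and_count(target)
theta_hat = ratio_estimate(labeled_pos, labeled_neg, target)
print(f"true prevalence:    {true_theta:.3f}")
print(f"classify and count: {cc_hat:.3f}")
print(f"ratio estimate:     {theta_hat:.3f}")
```

In this toy setting, Classify and Count overstates the prevalence because the classifier's false positives on the large negative class outweigh its false negatives, while the moment-matching estimate stays close to the true value.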

Acknowledgements

This work was partially supported by FAPESP grant 2017/03363-8 and CAPES.

Author information

Authors and Affiliations

Federal University of São Carlos, Rod. Washington Luís km 235, 310, São Carlos, SP, Brazil
Afonso Vaz, Rafael Izbicki & Rafael Bassi Stern

Corresponding author

Correspondence to Afonso Vaz.

Editor information

Editors and Affiliations

Department of Statistics, Federal University of São Carlos, São Carlos, São Paulo, Brazil
Adriano Polpo
Applied Mathematics, University of São Paulo, São Paulo, São Paulo, Brazil
Julio Stern
Institute of Mathematical Sciences and Computing, University of São Paulo, São Paulo, São Paulo, Brazil
Francisco Louzada
Department of Statistics, Federal University of São Carlos, São Carlos, São Paulo, Brazil
Rafael Izbicki
Itaú Asset Management, Banco Itaú-Unibanco, São Paulo, São Paulo, Brazil
Hellinton Takada

Cite this paper

Vaz, A., Izbicki, R., Stern, R.B. (2018). Prior Shift Using the Ratio Estimator. In: Polpo, A., Stern, J., Louzada, F., Izbicki, R., Takada, H. (eds) Bayesian Inference and Maximum Entropy Methods in Science and Engineering. maxent 2017. Springer Proceedings in Mathematics & Statistics, vol 239. Springer, Cham. https://doi.org/10.1007/978-3-319-91143-4_3

DOI: https://doi.org/10.1007/978-3-319-91143-4_3
Published: 13 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91142-7
Online ISBN: 978-3-319-91143-4
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)