Abstract
A quantification learning task estimates the class ratios, or class distribution, of a given test set. Quantification learning is useful for a variety of application domains such as commerce, public health, and politics. For instance, it is desirable to automatically estimate the proportion of satisfied customers, across different aspects, from product reviews in order to improve customer relationships. We formulate the quantification learning problem as a maximum likelihood problem and propose the first end-to-end Deep Quantification Network (DQN) framework. DQN jointly learns quantification feature representations and directly predicts the class distribution. Compared to classification-based quantification methods, DQN avoids three separate steps: classification of individual instances, calculation of the predicted class ratios, and class ratio adjustment to account for classification errors. We evaluated DQN on four public datasets, ranging from movie and product reviews to multi-class news. We compared DQN against six existing quantification methods and conducted a sensitivity analysis of DQN performance. Compared to the best existing method in our study, (1) DQN reduces Mean Absolute Error (MAE) by about 35%, and (2) DQN uses around 40% fewer training samples to achieve a comparable MAE.
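To make the contrast concrete, the three-step classification-based pipeline that DQN avoids can be sketched as follows. This is an illustrative sketch of the standard Classify & Count baseline and its binary adjusted-count correction, together with the MAE metric used for evaluation; the function names (`classify_and_count`, `adjusted_count`, `mae`) are ours and are not the paper's DQN implementation.

```python
import numpy as np

def classify_and_count(pred_labels, n_classes):
    """Step 1-2: classify each instance, then report predicted label frequencies."""
    counts = np.bincount(pred_labels, minlength=n_classes)
    return counts / counts.sum()

def adjusted_count(cc_positive_ratio, tpr, fpr):
    """Step 3 (binary case): adjust the naive positive-class ratio using the
    classifier's true/false positive rates estimated on held-out data,
    then clip the corrected estimate into [0, 1]."""
    p = (cc_positive_ratio - fpr) / (tpr - fpr)
    return float(np.clip(p, 0.0, 1.0))

def mae(true_dist, est_dist):
    """Mean Absolute Error between the true and estimated class distributions."""
    return float(np.abs(np.asarray(true_dist) - np.asarray(est_dist)).mean())

# Example: 5 test instances, 2 classes.
est = classify_and_count(np.array([0, 0, 1, 1, 1]), n_classes=2)  # -> [0.4, 0.6]
adj = adjusted_count(est[1], tpr=0.8, fpr=0.2)  # corrected positive ratio
```

Errors in step 1 propagate into the counts and must be patched in step 3; an end-to-end model like DQN instead predicts the class distribution directly from the test set.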
This work is supported in part by NSF SBE Grant No. 1729775.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Qi, L., Khaleel, M., Tavanapong, W., Sukul, A., Peterson, D. (2021). A Framework for Deep Quantification Learning. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science, vol. 12457. Springer, Cham. https://doi.org/10.1007/978-3-030-67658-2_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67657-5
Online ISBN: 978-3-030-67658-2