
Machine Learning, Volume 107, Issue 6, pp 1013–1037

Wallenius Bayes

  • Enric Junqué de Fortuny
  • David Martens
  • Foster Provost

Abstract

This paper introduces a new event model appropriate for classifying (binary) data generated by a “destructive choice” process, such as certain human behavior. In such a process, making a choice removes that choice from future consideration yet does not influence the relative probability of the other choices in the choice set. The proposed Wallenius event model is based on a somewhat forgotten non-central hypergeometric distribution introduced by Wallenius (Biased sampling: the non-central hypergeometric probability distribution. Ph.D. thesis, Stanford University, 1963). We discuss its relationship with models of how human choice behavior is generated, highlighting a key (simple) mathematical property. Against this background, we describe specifically why traditional multivariate Bernoulli naive Bayes and multinomial naive Bayes are each suboptimal for such data. We then present an implementation of naive Bayes based on the Wallenius event model, and show experimentally that for data whose features we would expect to be generated by destructive choice behavior, Wallenius Bayes indeed outperforms the traditional versions of naive Bayes for prediction based on these features. Furthermore, it is competitive with non-naive methods (in particular, support-vector machines). In contrast, Wallenius Bayes underperforms when the data-generating process is not based on destructive choice.
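The destructive-choice process described above is exactly weighted sampling without replacement, which is the process underlying Wallenius's non-central hypergeometric distribution. The following minimal sketch (not code from the paper; the item weights are hypothetical) simulates it: each draw selects an item with probability proportional to its weight among the items still available, and the chosen item is then removed from the choice set.

```python
import random

def destructive_choice(weights, n_draws, seed=None):
    """Simulate a destructive-choice process: each draw picks item i
    with probability proportional to weights[i] among the items still
    available; a chosen item is removed from future consideration."""
    rng = random.Random(seed)
    available = dict(enumerate(weights))  # index -> weight of remaining items
    chosen = []
    for _ in range(n_draws):
        total = sum(available.values())
        r = rng.random() * total  # uniform point on the remaining total weight
        for i, w in available.items():
            r -= w
            if r <= 0:
                chosen.append(i)
                del available[i]  # the choice is "destroyed"
                break
    return chosen

# Example: three items with unequal (hypothetical) weights, two draws.
picks = destructive_choice([5.0, 1.0, 1.0], n_draws=2, seed=42)
print(picks)  # two distinct item indices, the heavier item favored early
```

Note that removing a chosen item does not change the *relative* weights of the remaining items, which is the defining property of the process the abstract describes.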

Keywords

Naive Bayes · Wallenius distribution · Destructive choice

Notes

Acknowledgements

Thank you very much to Michal Kosinski, David Stillwell and Thore Graepel for sharing the Facebook Likes data set. Thanks to our reviewers for helpful feedback. David thanks the Flemish Research Council (FWO) for financial support (Grant G.0827.12N). Foster thanks NEC and Andre Meyer for Faculty Fellowships. We thank the Moore and Sloan Foundations for their generous support of the Moore-Sloan Data Science Environment at NYU.

References

  1. Asuncion, A., Newman, D. J. (2007). UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html.
  2. Bourdieu, P. (1984). Distinction: A social critique of the judgement of taste. Harvard University Press. http://books.google.com/books/about/Distinction.html?id=nVaS6gS9Jz4C&pgis=1.
  3. Chesson, J. (1976). A non-central multivariate hypergeometric distribution arising from biased sampling with application to selective predation. Journal of Applied Probability. http://www.jstor.org/stable/10.2307/3212535.
  4. Domingos, P., & Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning Special Issue on Learning with Probabilistic Representations, 29(2–3), 103–130. http://link.springer.com/article/10.1023/A:1007413511361.
  5. Etter, J.-F., Le Houezec, J., & Perneger, T. (2003). A self-administered questionnaire to measure dependence on cigarettes: The cigarette dependence scale. Neuropsychopharmacology: Official Publication of the American College of Neuropsychopharmacology, 28(2), 359–370. https://doi.org/10.1038/sj.npp.1300030.
  6. Fantino, E., & Navarick, D. (1975). Recent developments in choice. In G. H. Bower (Ed.), Psychology of learning & motivation (p. 304). Academic Press. http://books.google.com/books?hl=en&lr=&id=o5LScJ9ecGUC&pgis=1.
  7. Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–874. http://linkinghub.elsevier.com/retrieve/pii/S016786550500303X
  8. Flach, P., & Lachiche, N. (2000). Decomposing probability distributions on structured individuals. In Work-in-progress reports of the 10th international conference on inductive logic programming (pp. 96–106). http://www.cs.bris.ac.uk/Publications/pub_master.jsp?id=1000485.
  9. Flach, P., & Lachiche, N. (2004). Naive Bayesian classification of structured data. Machine Learning, 57(3), 233–269. https://doi.org/10.1023/B:MACH.0000039778.69032.ab.
  10. Fog, A. (2008). Calculation methods for Wallenius’ noncentral hypergeometric distribution. Communications in Statistics—Simulation and Computation, 37(2), 258–273. https://doi.org/10.1080/03610910701790269.
  11. Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267–272. https://doi.org/10.1901/jeab.1961.4-267.
  12. Junqué de Fortuny, E., Martens, D., & Provost, F. (2013). Predictive modeling with big data: Is bigger really better? Big Data, 1(4), 215–226. https://doi.org/10.1089/big.2013.0037.
  13. Kahaner, D., Moler, C., Nash, S. (1989). Numerical methods and software. Prentice Hall. ftp://ftp.math.utah.edu/pub/errata/kahaner.errata.
  14. Kant, I. (1790). The critique of judgement (Part one, the critique of aesthetic judgement). BiblioLife. http://www.amazon.com/The-Critique-Judgement-Part-Aesthetic/dp/1420926942.
  15. Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences of the United States of America, 110(15), 5802–5805. https://doi.org/10.1073/pnas.1218772110.
  16. Langley, P., Iba, W., & Thompson, K. (1992). An analysis of Bayesian classifiers. In Proceedings of the tenth national conference on artificial intelligence (pp. 223–228). https://pdfs.semanticscholar.org/1925/bacaa10b4ec83a0509132091bb79243b41b6.pdf.
  17. Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. Dover Publications. http://www.amazon.com/Individual-Choice-Behavior-Theoretical-Mathematics/dp/0486441369.
  18. McCallum, A., Nigam, K. (1998). A comparison of event models for Naive Bayes text classification. In AAAI workshop on learning for text categorization. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.65.9324&rep=rep1&type=pdf.
  19. Ng, A., Jordan, M. (2002). On discriminative vs. generative classifiers: A comparison of logistic regression and Naive Bayes. In Advances in neural information processing systems. https://papers.nips.cc/paper/2020-on-discriminative-vs-generative-classifiers-a-comparison-of-logistic-regression-and-naive-bayes.pdf.
  20. Sapolsky, R., & Bonetta, L. (1997). The trouble with testosterone: And other essays on the biology of the human predicament. http://www.nature.com/nm/wilma/v3n8.870469132.html.
  21. Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34(4), 273. https://doi.org/10.1037/h0070288.
  22. Wallenius, K. (1963). Biased sampling: The non-central hypergeometric probability distribution. Ph.D. thesis, Stanford University.
  23. Ziegler, C., & McNee, S. (2005). Improving recommendation lists through topic diversification. In Proceedings of the 14th international conference on World Wide Web (pp. 22–32). http://dl.acm.org/citation.cfm?id=1060754.

Copyright information

© The Author(s) 2018

Authors and Affiliations

  • Enric Junqué de Fortuny, NYU Shanghai, Shanghai, China
  • David Martens, Faculty of Applied Economics, University of Antwerp, Antwerp, Belgium
  • Foster Provost, Information, Operations and Management Sciences, Stern School of Business, New York University, New York City, USA
