Learning monotone nonlinear models using the Choquet integral
 Ali Fallah Tehrani,
 Weiwei Cheng,
 Krzysztof Dembczyński,
 Eyke Hüllermeier
Abstract
The learning of predictive models that guarantee monotonicity in the input variables has received increasing attention in machine learning in recent years. As a rule, ensuring monotonicity becomes harder as the flexibility, or nonlinearity, of a model increases. In this paper, we advocate the so-called Choquet integral as a tool for learning monotone nonlinear models. While widely used as a flexible aggregation operator in different fields, such as multiple criteria decision making, the Choquet integral has so far received much less attention in machine learning. Apart from combining monotonicity and flexibility in a mathematically sound and elegant manner, the Choquet integral has additional features that make it attractive from a machine learning point of view. Notably, it offers measures for quantifying the importance of individual predictor variables and the interaction between groups of variables. Analyzing the Choquet integral from a classification perspective, we provide upper and lower bounds on its VC dimension. Moreover, as a methodological contribution, we propose a generalization of logistic regression. The basic idea of our approach, referred to as choquistic regression, is to replace the linear function of predictor variables, which is commonly used in logistic regression to model the log odds of the positive class, by the Choquet integral. First experimental results are promising and suggest that the combination of monotonicity and flexibility offered by the Choquet integral facilitates strong performance in practical applications.
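To make the central idea concrete, the following is a minimal sketch (not the authors' implementation) of the discrete Choquet integral and of a logistic link applied to it. It assumes that predictor values are already normalized to [0, 1] and that the fuzzy measure is given explicitly as a dictionary over subsets of variable indices. The names `choquet_integral`, `choquistic_probability`, `mu_example`, and the parameters `gamma` (scaling) and `beta` (threshold) are illustrative assumptions; the abstract does not specify how the model is parameterized or fitted.

```python
import math


def choquet_integral(x, mu):
    """Discrete Choquet integral of x with respect to a fuzzy measure mu.

    x  : sequence of values, assumed normalized to [0, 1]
    mu : dict mapping frozenset of variable indices -> measure value,
         assumed to satisfy mu[frozenset()] == 0, mu[all indices] == 1,
         and monotonicity under set inclusion.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])  # indices, ascending by value
    total, previous = 0.0, 0.0
    for k, i in enumerate(order):
        upper = frozenset(order[k:])  # variables whose value is at least x[i]
        total += (x[i] - previous) * mu[upper]
        previous = x[i]
    return total


def choquistic_probability(x, mu, gamma=1.0, beta=0.5):
    """Probability of the positive class under a logistic link.

    The linear predictor of ordinary logistic regression is replaced by the
    Choquet integral; gamma and beta are hypothetical parameters chosen
    here for illustration only.
    """
    utility = choquet_integral(x, mu)
    return 1.0 / (1.0 + math.exp(-gamma * (utility - beta)))


# Hypothetical non-additive measure on two variables: the pair is worth
# more than the sum of its parts (0.3 + 0.5 < 1.0), i.e. positive interaction.
mu_example = {
    frozenset(): 0.0,
    frozenset({0}): 0.3,
    frozenset({1}): 0.5,
    frozenset({0, 1}): 1.0,
}

if __name__ == "__main__":
    print(choquet_integral([0.2, 0.8], mu_example))
    print(choquistic_probability([0.2, 0.8], mu_example, gamma=5.0))
```

Because the measure is monotone under set inclusion, increasing any input can never decrease the integral, which is what yields a monotone model. When the measure is additive, the integral reduces to an ordinary weighted mean, so standard logistic regression on normalized inputs appears as a special case of this sketch.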
 Journal

Machine Learning
Volume 89, Issue 1–2, pp 183–211
 Cover Date
 2012-10-01
 DOI
 10.1007/s10994-012-5318-3
 Print ISSN
 0885-6125
 Online ISSN
 1573-0565
 Publisher
 Springer US
 Keywords

 Choquet integral
 Monotone learning
 Nonlinear models
 Choquistic regression
 Classification
 VC dimension
 Authors

 Ali Fallah Tehrani (1)
 Weiwei Cheng (1)
 Krzysztof Dembczyński (2)
 Eyke Hüllermeier (1)
 Author Affiliations

 1. Department of Mathematics and Computer Science, Marburg University, Marburg, Germany
 2. Institute of Computing Science, Poznań University of Technology, Poznań, Poland