A new column generation algorithm for Logical Analysis of Data
We present a new column generation algorithm for determining a classifier in the two-class LAD (Logical Analysis of Data) model. Unlike existing algorithms, which seek a classifier that simultaneously maximizes the margin of correctly classified observations and minimizes the violations of incorrectly classified observations, we fix the margin at a difficult-to-achieve target and minimize a convex piecewise linear function of the violations of incorrectly classified observations. Moreover, a part of the training set, called the control set, is reserved to select, among all feasible classifiers found by the algorithm, the one with the highest performance on that set. One advantage of the proposed algorithm is that it requires essentially no calibration. Computational results are presented that show the effectiveness of this approach.
Keywords: Logical analysis of data, Column generation, Classification
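The two ideas in the abstract, penalizing margin violations with a convex piecewise linear function and choosing among candidate classifiers on a held-out control set, can be illustrated with a minimal sketch. It is not the paper's algorithm: the column generation step and the underlying linear program are omitted, and every name, the signed coverage encoding, the margin value, and the breakpoints in `PIECES` are hypothetical choices made only for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical penalty: max of affine pieces (slope, intercept), hence convex.
# Here: 0 for no violation, slope 1 up to v = 1, slope 3 beyond.
PIECES: List[Tuple[float, float]] = [(0.0, 0.0), (1.0, 0.0), (3.0, -2.0)]


def discriminant(weights: Sequence[float], row: Sequence[int]) -> float:
    """Weighted pattern vote for one observation.

    Assumed encoding: row[j] is +1 if positive pattern j covers the
    observation, -1 if negative pattern j covers it, 0 otherwise.
    """
    return sum(w * c for w, c in zip(weights, row))


def violation(score: float, label: int, margin: float) -> float:
    """Amount by which an observation falls short of the fixed margin."""
    return max(0.0, margin - label * score)


def convex_pwl_penalty(v: float, pieces: Sequence[Tuple[float, float]] = PIECES) -> float:
    """Convex piecewise linear penalty, evaluated as a max of affine pieces."""
    return max(slope * v + intercept for slope, intercept in pieces)


def total_penalty(weights, coverage, labels, margin: float) -> float:
    """Objective value of a weight vector on a training set."""
    return sum(
        convex_pwl_penalty(violation(discriminant(weights, row), y, margin))
        for row, y in zip(coverage, labels)
    )


def accuracy(weights, coverage, labels) -> float:
    """Fraction of observations whose predicted sign matches the label."""
    hits = 0
    for row, y in zip(coverage, labels):
        predicted = 1 if discriminant(weights, row) > 0 else -1
        hits += predicted == y
    return hits / len(labels)


def select_on_control(candidates, control_coverage, control_labels):
    """Pick the candidate classifier with the highest control-set accuracy."""
    return max(candidates, key=lambda w: accuracy(w, control_coverage, control_labels))


# Toy control set with two patterns (one positive, one negative).
control_coverage = [[1, 0], [0, -1], [1, -1]]
control_labels = [1, -1, -1]
candidates = [[1.0, 0.5], [0.5, 1.0]]
best = select_on_control(candidates, control_coverage, control_labels)
```

In this toy example the second candidate classifies all three control observations correctly, so it is the one selected. Because the penalty is a maximum of affine pieces, it could in principle be modeled with linear constraints inside a linear program, which is what makes a convex piecewise linear objective convenient in a column generation setting.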