Optimal Subset Selection for Classification through SAT Encodings

  • Fabrizio Angiulli
  • Stefano Basta
Conference paper
Part of the IFIP – The International Federation for Information Processing book series (IFIPAICT, volume 276)


In this work we propose a method for computing a minimum-size training-set-consistent subset for the Nearest Neighbor rule (also known as the CNN problem) via SAT encodings. We introduce the SAT–CNN algorithm, which encodes the CNN problem as a sequence of SAT problems and thereby solves it exactly, provided that enough computational resources are available. A comparison of SAT–CNN with well-known greedy methods shows that SAT–CNN returns better solutions. The proposed approach can be extended to several hard subset selection problems in classification.
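The abstract does not spell out the encoding, so the following is only a minimal sketch of how a "sequence of SAT problems" for the CNN problem might look, not the authors' SAT–CNN algorithm. It assumes one Boolean variable per training point (true when the point is selected), squared Euclidean distance, ties resolved conservatively (a same-class point must be strictly closer), a naive at-most-k cardinality encoding, and a toy DPLL solver standing in for a real SAT solver. All function names are hypothetical.

```python
from itertools import combinations

def consistency_clauses(points, labels):
    """Clauses forcing the selected subset to classify every point correctly (1-NN)."""
    n = len(points)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    clauses = [[i + 1 for i in range(n)]]  # the subset must be non-empty
    for p in range(n):
        for q in range(n):
            if labels[q] == labels[p]:
                continue
            # If enemy q is selected, a same-class point strictly closer to p must be too.
            closer = [r + 1 for r in range(n)
                      if labels[r] == labels[p]
                      and dist(points[p], points[r]) < dist(points[p], points[q])]
            clauses.append([-(q + 1)] + closer)
    return clauses

def at_most_k(n, k):
    """Naive cardinality encoding (assumption): forbid every set of k+1 selected points."""
    return [[-v for v in combo] for combo in combinations(range(1, n + 1), k + 1)]

def dpll(clauses, assignment=()):
    """Toy DPLL solver; returns a tuple of assigned literals, or None if unsatisfiable."""
    simplified = []
    for clause in clauses:
        if any(lit in assignment for lit in clause):
            continue  # clause already satisfied
        rest = [lit for lit in clause if -lit not in assignment]
        if not rest:
            return None  # clause falsified
        simplified.append(rest)
    if not simplified:
        return assignment
    # Prefer a unit clause if there is one, otherwise branch on the first clause.
    chosen = next((c for c in simplified if len(c) == 1), simplified[0])
    lit = chosen[0]
    result = dpll(simplified, assignment + (lit,))
    if result is None and len(chosen) > 1:
        result = dpll(simplified, assignment + (-lit,))
    return result

def min_consistent_subset(points, labels):
    """Solve SAT instances for k = 1, 2, ... until one is satisfiable."""
    n = len(points)
    base = consistency_clauses(points, labels)
    for k in range(1, n + 1):
        model = dpll(base + at_most_k(n, k))
        if model is not None:
            # Unassigned variables are treated as "not selected".
            return sorted(lit - 1 for lit in model if lit > 0)
    return None

# Two well-separated classes on a line: one representative per class suffices.
points = [(0,), (1,), (2,), (10,), (11,)]
labels = [0, 0, 0, 1, 1]
subset = min_consistent_subset(points, labels)
print(subset, [labels[i] for i in subset])
```

The first satisfiable instance in the sequence gives a minimum-size consistent subset, because every smaller bound was shown unsatisfiable. The naive cardinality encoding grows combinatorially and is only reasonable for tiny inputs; a practical implementation would use a compact cardinality encoding and an off-the-shelf solver.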



Copyright information

© International Federation for Information Processing 2008

Authors and Affiliations

  • Fabrizio Angiulli (1)
  • Stefano Basta (2)

  1. DEIS, Università della Calabria, Rende (CS), Italy
  2. ICAR-CNR, Rende (CS), Italy
