On SAT instance classes and a method for reliable performance experiments with SAT solvers

  • Franc Brglez
  • Xiao Yu Li
  • Matthias F. Stallmann


A recent series of experiments with a group of state-of-the-art SAT solvers and several well-defined classes of problem instances reports statistically significant performance variability for the solvers. A systematic analysis of the observed performance data, all openly archived on the Web, reveals distributions which we classify into three broad categories: (1) readily characterized with a simple χ2-test, (2) requiring more in-depth analysis by a statistician, and (3) incomplete, due to the time-out limit being reached by specific solvers. The first category includes two well-known distributions: normal and exponential. We use simple first-order criteria to decide the second category and label its distributions as near-normal, near-exponential, and heavy-tail; we expect that good models for some if not most of these may be found with parameters that fit generalized gamma, Weibull, or Pareto distributions. Our experiments show that most SAT solvers exhibit either a normal or an exponential distribution of execution time (runtime) on many equivalence classes of problem instances. This finding suggests that the basic mathematical framework for these experiments may well be the same as the one used to test the reliability or lifetime of hardware components such as lightbulbs, A/C units, etc. A batch of N replicated hardware components corresponds to an equivalence class of N problem instances in SAT, a controlled operating environment A corresponds to a SAT solver A, and the survival function \(\mathcal{R}^A(x)\) (where x represents the lifetime) is the complement of the solvability function \(\mathcal{S}^A(x) = 1 - \mathcal{R}^A(x)\), where x may represent runtime, implications, backtracks, etc. As demonstrated in the paper, a set of unrelated benchmarks or randomly generated SAT instances available today cannot measure the performance of SAT solvers reliably: there is no control over their ‘hardness’.
However, equivalence class instances as defined in this paper are, in effect, replicated instances of a specific reference instance. The proposed method not only provides a common platform for the systematic study and reliable improvement of deterministic and stochastic SAT solvers alike, but also supports the introduction and validation of new problem instance classes.
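The reliability analogy above lends itself to a short illustration. The following Python sketch is ours, not from the paper, and its function names are hypothetical: it computes the empirical survival function \(\mathcal{R}(x)\) over a batch of runtimes (so that \(\mathcal{S}(x) = 1 - \mathcal{R}(x)\) is the solvability function) and a simple χ2 goodness-of-fit statistic for an exponential runtime model, using bins chosen to be equiprobable under the fitted distribution.

```python
import math
import random

def empirical_survival(runtimes, x):
    """R(x): fraction of instances whose runtime exceeds x.
    The solvability function is S(x) = 1 - R(x)."""
    return sum(1 for t in runtimes if t > x) / len(runtimes)

def chi2_exponential(runtimes, n_bins=10):
    """Chi-square goodness-of-fit statistic for an exponential model
    whose rate is estimated from the sample mean. Bin edges sit at
    exponential quantiles, so each bin's expected count is n / n_bins."""
    n = len(runtimes)
    mean = sum(runtimes) / n
    # Inverse CDF of the fitted exponential: F^-1(p) = -mean * ln(1 - p)
    edges = [-mean * math.log(1 - k / n_bins) for k in range(1, n_bins)]
    counts = [0] * n_bins
    for t in runtimes:
        counts[sum(1 for e in edges if t > e)] += 1
    expected = n / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)

# Synthetic stand-in for one solver's runtimes on an equivalence class.
random.seed(0)
sample = [random.expovariate(1.0) for _ in range(1000)]
print("R(1.0) =", round(empirical_survival(sample, 1.0), 2))
print("chi2   =", round(chi2_exponential(sample), 1))
```

With n_bins bins and one parameter estimated from the data, the statistic has n_bins - 2 degrees of freedom, and comparing it against the corresponding χ2 critical value decides whether the exponential model is plausible for the observed runtimes.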


Keywords: satisfiability, conjunctive normal form, equivalence classes, experimental design, exponential and heavy-tail distributions, reliability function




Copyright information

© Springer 2004

Authors and Affiliations

  • Franc Brglez (1)
  • Xiao Yu Li (1)
  • Matthias F. Stallmann (1)
  1. Department of Computer Science, NC State University, Raleigh, USA
