Abstract
An important problem in supervised machine learning is designing systems that are interpretable by humans. In domains such as law, medicine, and finance that deal with human lives, delegating decisions to a black-box machine-learning model carries significant operational risk and often legal implications; interpretable classifiers are therefore required. Building on ideas from Boolean compressed sensing, we propose a rule-based classifier which explicitly balances accuracy against interpretability in a principled optimization formulation. We represent the problem of learning conjunctive clauses or disjunctive clauses as an adaptation of a classical problem from statistics, Boolean group testing, and apply a novel linear programming (LP) relaxation to find solutions. We derive theoretical results for recovering sparse rules which parallel the conditions for exact recovery of sparse signals in the compressed sensing literature. This is an exciting development in interpretable learning, where most prior work has focused on heuristic solutions. We also consider a more general class of rule-based classifiers, checklists and scorecards, learned using ideas from threshold group testing. We show competitive classification accuracy using the proposed approach on real-world data sets.
Notes
1. Other approaches to approximately solve group testing include greedy methods and loopy belief propagation; see references in [34].
2. Instead of using LP, one can find solutions greedily, as is done in the SCM, which gives a log(m) approximation. The same guarantee holds for LP with randomized rounding; empirically, LP tends to find sparser solutions. (A sketch of this rounding scheme follows this list.)
3. Surprisingly, for many practical datasets the LP formulation obtains integral solutions, or requires only a small number of branch-and-bound steps.
4. In general it will contain the features and their complements as columns. However, with enough data, one of the two choices will be removed by zero-row elimination beforehand.
5. Here, the subscript “z” stands for zero and “o” stands for one.
6. We use IBM SPSS Modeler 14.1 and Matlab R2009a with default settings.
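As a concrete illustration of note 2, here is a minimal sketch of the LP relaxation followed by randomized rounding, assuming binary matrices A_pos (positive samples × candidate features, the rows \(\mathcal{P}\)) and A_neg (negative samples, the rows \(\mathcal{Z}\)); the function and variable names are illustrative rather than taken from the chapter.

```python
import numpy as np
from scipy.optimize import linprog

def lp_round_rule(A_pos, A_neg, lam=1.0, seed=0):
    """Sketch: LP relaxation of sparse rule learning, then randomized rounding.

    A rule (an OR of selected features) should cover every positive row:
    A_pos @ w + xi >= 1; errors on covered negative rows enter through the
    column sums of A_neg folded into the objective, as in the appendices.
    """
    p, n = A_pos.shape
    # Variables x = [w (n entries), xi (p entries)].
    c = np.concatenate([lam + A_neg.sum(axis=0), np.ones(p)])
    # A_pos @ w + xi >= 1  rewritten as  -A_pos @ w - xi <= -1.
    A_ub = np.hstack([-A_pos, -np.eye(p)])
    b_ub = -np.ones(p)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, 1)] * (n + p), method="highs")
    w_frac = res.x[:n]
    # Randomized rounding: include feature j with probability w_j,
    # repeated O(log p) times and unioned -- the standard set-cover
    # rounding behind the log(m) guarantee mentioned in note 2.
    rng = np.random.default_rng(seed)
    w = np.zeros(n, dtype=bool)
    for _ in range(int(np.ceil(np.log(p + 1))) + 1):
        w |= rng.random(n) < w_frac
    return w_frac, w
```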
References
Adams, S.T., Leveson, S.H.: Clinical prediction rules. Br. Med. J. 344, d8312 (2012)
Atia, G.K., Saligrama, V.: Boolean compressed sensing and noisy group testing. IEEE Trans. Inf. Theory 58 (3), 1880–1901 (2012)
Bertsimas, D., Chang, A., Rudin, C.: An integer optimization approach to associative classification. In: Advances in Neural Information Processing Systems 25, pp. 269–277 (2012)
Blum, A., Kalai, A., Langford, J.: Beating the hold-out: bounds for k-fold and progressive cross-validation. In: Proceedings of the Conference on Computational Learning Theory, Santa Cruz, CA, pp. 203–208 (1999)
Boros, E., Hammer, P.L., Ibaraki, T., Kogan, A., Mayoraz, E., Muchnik, I.: An implementation of logical analysis of data. IEEE Trans. Knowl. Data Eng. 12 (2), 292–306 (2000)
Candès, E.J., Wakin, M.B.: An introduction to compressive sampling. IEEE Signal Process. Mag. 25 (2), 21–30 (2008)
Chen, H.B., Fu, H.L.: Nonadaptive algorithms for threshold group testing. Discret. Appl. Math. 157, 1581–1585 (2009)
Cheraghchi, M., Hormati, A., Karbasi, A., Vetterli, M.: Compressed sensing with probabilistic measurements: a group testing solution. In: Proceedings of the Annual Allerton Conference on Communication Control and Computing, Allerton, IL, pp. 30–35 (2009)
Clark, P., Niblett, T.: The CN2 induction algorithm. Mach. Learn. 3 (4), 261–283 (1989)
Cohen, W.W.: Fast effective rule induction. In: Proceedings of the International Conference on Machine Learning, Tahoe City, CA, pp. 115–123 (1995)
Dai, L., Pelckmans, K.: An ellipsoid based, two-stage screening test for BPDN. In: Proceedings of the European Signal Processing Conference, Bucharest, Romania, pp. 654–658 (2012)
Dash, S., Malioutov, D.M., Varshney, K.R.: Screening for learning classification rules via Boolean compressed sensing. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Florence, Italy, pp. 3360–3364 (2014)
Dash, S., Malioutov, D.M., Varshney, K.R.: Learning interpretable classification rules using sequential row sampling. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Brisbane, Australia (2015)
Dembczyński, K., Kotłowski, W., Słowiński, R.: ENDER: a statistical framework for boosting decision rules. Data Min. Knowl. Disc. 21 (1), 52–90 (2010)
Donoho, D.L., Elad, M.: Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization. Proc. Natl. Acad. Sci. 100 (5), 2197–2202 (2003)
Du, D.Z., Hwang, F.K.: Pooling Designs and Nonadaptive Group Testing: Important Tools for DNA Sequencing. World Scientific, Singapore (2006)
Dyachkov, A.G., Rykov, V.V.: A survey of superimposed code theory. Prob. Control. Inf. 12 (4), 229–242 (1983)
Dyachkov, A.G., Vilenkin, P.A., Macula, A.J., Torney, D.C.: Families of finite sets in which no intersection of l sets is covered by the union of s others. J. Combin. Theory 99, 195–218 (2002)
Eckstein, J., Goldberg, N.: An improved branch-and-bound method for maximum monomial agreement. INFORMS J. Comput. 24 (2), 328–341 (2012)
El Ghaoui, L., Viallon, V., Rabbani, T.: Safe feature elimination in sparse supervised learning. Pac. J. Optim. 8 (4), 667–698 (2012)
Emad, A., Milenkovic, O.: Semiquantitative group testing. IEEE Trans. Inf. Theory 60 (8), 4614–4636 (2014)
Frank, A., Asuncion, A.: UCI machine learning repository. http://archive.ics.uci.edu/ml (2010)
Friedman, J.H., Popescu, B.E.: Predictive learning via rule ensembles. Ann. Appl. Stat. 2 (3), 916–954 (2008)
Fry, C.: Closing the gap between analytics and action. INFORMS Analytics Mag. 4 (6), 4–5 (2011)
Gage, B.F., Waterman, A.D., Shannon, W., Boechler, M., Rich, M.W., Radford, M.J.: Validation of clinical classification schemes for predicting stroke. J. Am. Med. Assoc. 285 (22), 2864–2870 (2001)
Gawande, A.: The Checklist Manifesto: How To Get Things Right. Metropolitan Books, New York (2009)
Gilbert, A.C., Iwen, M.A., Strauss, M.J.: Group testing and sparse signal recovery. In: Conference Record - Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, pp. 1059–1063 (2008)
Jawanpuria, P., Nath, J.S., Ramakrishnan, G.: Efficient rule ensemble learning using hierarchical kernels. In: Proceedings of the International Conference on Machine Learning, Bellevue, WA, pp. 161–168 (2011)
John, G.H., Langley, P.: Static versus dynamic sampling for data mining. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Portland, OR, pp. 367–370 (1996)
Kautz, W., Singleton, R.: Nonrandom binary superimposed codes. IEEE Trans. Inf. Theory 10 (4), 363–377 (1964)
Letham, B., Rudin, C., McCormick, T.H., Madigan, D.: Building interpretable classifiers with rules using Bayesian analysis. Tech. Rep. 609, Department of Statistics, University of Washington (2012)
Liu, J., Li, M.: Finding cancer biomarkers from mass spectrometry data by decision lists. J. Comput. Biol. 12 (7), 971–979 (2005)
Liu, J., Zhao, Z., Wang, J., Ye, J.: Safe screening with variational inequalities and its application to lasso. In: Proceedings of the International Conference on Machine Learning, Beijing, China, pp. 289–297 (2014)
Malioutov, D., Malyutov, M.: Boolean compressed sensing: LP relaxation for group testing. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Kyoto, Japan, pp. 3305–3308 (2012)
Malioutov, D.M., Varshney, K.R.: Exact rule learning via Boolean compressed sensing. In: Proceedings of the International Conference on Machine Learning, Atlanta, GA, pp. 765–773 (2013)
Malioutov, D.M., Sanghavi, S.R., Willsky, A.S.: Sequential compressed sensing. IEEE J. Spec. Top. Signal Proc. 4 (2), 435–444 (2010)
Malyutov, M.: The separating property of random matrices. Math. Notes 23 (1), 84–91 (1978)
Malyutov, M.: Search for sparse active inputs: a review. In: Aydinian, H., Cicalese, F., Deppe, C. (eds.) Information Theory, Combinatorics, and Search Theory: In Memory of Rudolf Ahlswede, pp. 609–647. Springer, Berlin/Germany (2013)
Marchand, M., Shawe-Taylor, J.: The set covering machine. J. Mach. Learn. Res. 3, 723–746 (2002)
Maron, O., Moore, A.W.: Hoeffding races: accelerating model selection search for classification and function approximation. Adv. Neural Inf. Proces. Syst. 6, 59–66 (1993)
Mazumdar, A.: On almost disjunct matrices for group testing. In: Proceedings of the International Symposium on Algorithms and Computation, Taipei, Taiwan, pp. 649–658 (2012)
Provost, F., Jensen, D., Oates, T.: Efficient progressive sampling. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, pp. 23–32 (1999)
Quinlan, J.R.: Simplifying decision trees. Int. J. Man Mach. Stud. 27 (3), 221–234 (1987)
Rivest, R.L.: Learning decision lists. Mach. Learn. 2 (3), 229–246 (1987)
Rückert, U., Kramer, S.: Margin-based first-order rule learning. Mach. Learn. 70 (2–3), 189–206 (2008)
Sejdinovic, D., Johnson, O.: Note on noisy group testing: asymptotic bounds and belief propagation reconstruction. In: Proceedings of the Annual Allerton Conference on Communication Control and Computing, Allerton, IL, pp. 998–1003 (2010)
Stinson, D.R., Wei, R.: Generalized cover-free families. Discret. Math. 279, 463–477 (2004)
Ustun, B., Rudin, C.: Methods and models for interpretable linear classification. Available at http://arxiv.org/pdf/1405.4047 (2014)
Wagstaff, K.L.: Machine learning that matters. In: Proceedings of the International Conference on Machine Learning, Edinburgh, United Kingdom, pp. 529–536 (2012)
Wang, F., Rudin, C.: Falling rule lists. Available at http://arxiv.org/pdf/1411.5899 (2014)
Wang, J., Zhou, J., Wonka, P., Ye, J.: Lasso screening rules via dual polytope projection. Adv. Neural Inf. Proces. Syst. 26, 1070–1078 (2013)
Wang, Y., Xiang, Z.J., Ramadge, P.J.: Lasso screening with a small regularization parameter. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, Canada, pp. 3342–3346 (2013)
Wang, Y., Xiang, Z.J., Ramadge, P.J.: Tradeoffs in improved screening of lasso problems. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, Canada, pp. 3297–3301 (2013)
Wang, T., Rudin, C., Doshi, F., Liu, Y., Klampfl, E., MacNeille, P.: Bayesian or’s of and’s for interpretable classification with application to context aware recommender systems. Available at http://arxiv.org/abs/1504.07614 (2015)
Wu, H., Ramadge, P.J.: The 2-codeword screening test for lasso problems. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, Canada, pp. 3307–3311 (2013)
Xiang, Z.J., Ramadge, P.J.: Fast lasso screening tests based on correlations. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Kyoto, Japan, pp. 2137–2140 (2012)
Xiang, Z.J., Xu, H., Ramadge, P.J.: Learning sparse representations of high dimensional data on large scale dictionaries. Adv. Neural Inf. Proces. Syst. 24, 900–908 (2011)
Acknowledgements
The authors thank Vijay S. Iyengar, Benjamin Letham, Cynthia Rudin, Viswanath Nagarajan, Karthikeyan Natesan Ramamurthy, Mikhail Malyutov, and Venkatesh Saligrama for valuable discussions.
Appendices
Appendix 1: Dual Linear Program
We now derive the dual LP, which we use in Sect. 5. We start by reformulating the LP in (10), i.e., we consider an LP with the same set of optimal solutions as the one in (10). First note that the upper bounds of 1 on the variables \(\xi_{i}\) are redundant. Let \((\bar{\mathbf{w}},\bar{\boldsymbol{\xi}})\) be a feasible solution of (10) without the upper bound constraints such that \(\bar{\xi}_{i} > 1\) for some \(i \in \mathcal{P}\). Reducing \(\bar{\xi}_{i}\) to 1 yields a feasible solution (as \(\mathbf{a}_{i}\bar{\mathbf{w}} + \bar{\xi}_{i} \geq 1\), the only inequality \(\xi_{i}\) participates in besides the bound constraints, is still satisfied). The new feasible solution has a lower objective function value than before, as \(\xi_{i}\) has a positive coefficient in the objective function (which is to be minimized). One can similarly argue that in every optimal solution of (10) without the upper bound constraints, we have \(w_{j} \leq 1\) for \(j = 1,\ldots,n\). Finally, observe that we can substitute \(\xi_{i}\) for \(i \in \mathcal{Z}\) in the objective function by \(\mathbf{a}_{i}\mathbf{w}\) because of the constraints \(\mathbf{a}_{i}\mathbf{w} = \xi_{i}\) for \(i \in \mathcal{Z}\). We thus get the following LP equivalent to (10):
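\[
\min_{\mathbf{w},\,\boldsymbol{\xi}_{\mathcal{P}}}\ \sum_{j=1}^{n}\big(\lambda + \|\mathbf{a}_{\mathcal{Z}}^{j}\|_{1}\big)\,w_{j} + \sum_{i\in\mathcal{P}}\xi_{i}
\quad\text{s.t.}\quad
\mathbf{A}_{\mathcal{P}}\mathbf{w} + \boldsymbol{\xi}_{\mathcal{P}} \geq \mathbf{1},\quad \mathbf{w} \geq \mathbf{0},\quad \boldsymbol{\xi}_{\mathcal{P}} \geq \mathbf{0}, \qquad (19)
\]
with \(p = \vert\mathcal{P}\vert\); this statement of (19) is written in a form consistent with the derivation above rather than quoted verbatim.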
The optimal solutions and optimal objective values are the same as in (10).
Writing \(\mathbf{A}_{\mathcal{P}}\mathbf{w} + \boldsymbol{\xi}_{\mathcal{P}}\) as \(\mathbf{A}_{\mathcal{P}}\mathbf{w} + \mathbf{I}\boldsymbol{\xi}_{\mathcal{P}}\), where \(\mathbf{I}\) is the \(p \times p\) identity matrix, writing \(\|\mathbf{a}_{\mathcal{Z}}^{j}\|_{1}\) as \(\mathbf{1}^{T}\mathbf{a}_{\mathcal{Z}}^{j}\), and letting \(\boldsymbol{\mu}\) be a row vector of \(p\) dual variables, one can see that the dual is:
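\[
\max_{\boldsymbol{\mu}}\ \sum_{i=1}^{p}\mu_{i}
\quad\text{s.t.}\quad
\boldsymbol{\mu}\,\mathbf{A}_{\mathcal{P}} \leq \lambda\mathbf{1}_{n}^{T} + \mathbf{1}^{T}\mathbf{A}_{\mathcal{Z}},\quad \mathbf{0} \leq \boldsymbol{\mu} \leq \mathbf{1}, \qquad (20)
\]
where the column bound \(\lambda\mathbf{1}_{n}^{T} + \mathbf{1}^{T}\mathbf{A}_{\mathcal{Z}}\) is the one referenced in Appendix 3; this statement of (20) is likewise written in a form consistent with the surrounding derivation.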
Suppose \(\boldsymbol{\bar{\mu }}\) is a feasible solution to (20). Then clearly \(\sum _{i=1}^{p}\bar{\mu }_{i}\) yields a lower bound on the optimal solution value of (19).
Appendix 2: Derivation of Screening Tests
Let \(\mathcal{S}(j)\) stand for the support of \(\mathbf{a}_{\mathcal{P}}^{j}\). Furthermore, let \(\mathcal{N}(j)\) stand for the support of \(\mathbf{1} - \mathbf{a}_{\mathcal{P}}^{j}\), i.e., the set of indices from \(\mathcal{P}\) such that the corresponding components of \(\mathbf{a}_{\mathcal{P}}^{j}\) are zero.
Now consider the situation where we fix \(w_{1}\) (say) to 1. Let \(\mathbf{A}'\) stand for the submatrix of \(\mathbf{A}\) consisting of the last \(n-1\) columns, and let \(\mathbf{w}'\) stand for the vector of variables \(w_{2},\ldots,w_{n}\). Then the constraints \(\mathbf{A}_{\mathcal{P}}\mathbf{w} + \boldsymbol{\xi}_{\mathcal{P}} \geq \mathbf{1}\) in (19) become \(\mathbf{A}'_{\mathcal{P}}\mathbf{w}' + \boldsymbol{\xi}_{\mathcal{P}} \geq \mathbf{1} - \mathbf{a}_{\mathcal{P}}^{1}\). Therefore, for all \(i \in \mathcal{S}(1)\), the corresponding constraint becomes \((\mathbf{A}'_{\mathcal{P}})_{i}\mathbf{w}' + \xi_{i} \geq 0\), which is redundant since \(\mathbf{A}'_{\mathcal{P}} \geq 0\) and \(\mathbf{w}', \xi_{i} \geq 0\). The only remaining non-redundant constraints correspond to the indices in \(\mathcal{N}(1)\). Then the value of (19) with \(w_{1}\) set to 1 becomes
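\[
\lambda + \mathbf{1}^{T}\mathbf{a}_{\mathcal{Z}}^{1} \;+\; \min\Big\{\sum_{j=2}^{n}\big(\lambda + \mathbf{1}^{T}\mathbf{a}_{\mathcal{Z}}^{j}\big)w_{j} + \sum_{i\in\mathcal{N}(1)}\xi_{i} \;:\; (\mathbf{A}'_{\mathcal{P}})_{\mathcal{N}(1)}\mathbf{w}' + \boldsymbol{\xi}_{\mathcal{N}(1)} \geq \mathbf{1},\ \mathbf{w}' \geq \mathbf{0},\ \boldsymbol{\xi} \geq \mathbf{0}\Big\}, \qquad (21)
\]
where the minimization is the LP we refer to as (21), written here in a form consistent with the derivation above.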
This LP clearly has the same form as the LP in (19). Furthermore, given any feasible solution \(\boldsymbol{\bar{\mu }}\) of (20), \(\boldsymbol{\bar{\mu }}_{\mathcal{N}(1)}\) defines a feasible dual solution of (21) as
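\[
\sum_{i\in\mathcal{N}(1)}\bar{\mu}_{i}\,(\mathbf{A}'_{\mathcal{P}})_{ij} \;\leq\; \sum_{i\in\mathcal{P}}\bar{\mu}_{i}\,(\mathbf{A}_{\mathcal{P}})_{ij} \;\leq\; \lambda + \mathbf{1}^{T}\mathbf{a}_{\mathcal{Z}}^{j} \qquad \text{for } j = 2,\ldots,n,
\]
using \(\bar{\boldsymbol{\mu}} \geq \mathbf{0}\) and the column constraints of (20); the bounds \(\mathbf{0} \leq \bar{\boldsymbol{\mu}}_{\mathcal{N}(1)} \leq \mathbf{1}\) carry over directly.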
Therefore \(\sum_{i\in\mathcal{N}(1)}\bar{\mu}_{i}\) is a lower bound on the optimal solution value of the LP in (21), and therefore
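\[
\lambda + \mathbf{1}^{T}\mathbf{a}_{\mathcal{Z}}^{1} + \sum_{i\in\mathcal{N}(1)}\bar{\mu}_{i} \qquad (22)
\]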
is a lower bound on the optimal solution value of (19) with \(w_{1}\) set to 1. In particular, if \((\bar{\mathbf{w}},\bar{\boldsymbol{\xi}})\) is a feasible integral solution to (19) with objective function value \(\sum_{j=1}^{n}(\lambda + \mathbf{1}^{T}\mathbf{a}_{\mathcal{Z}}^{j})\bar{w}_{j} + \sum_{i=1}^{p}\bar{\xi}_{i}\), and if (22) is greater than this value, then no optimal integral solution of (19) can have \(w_{1} = 1\). Therefore \(w_{1} = 0\) in any optimal solution, and we can simply drop the column corresponding to \(w_{1}\) from the LP.
In order to use the screening results in this section we need to obtain a feasible primal and a feasible dual solution. Some useful heuristics to obtain such a pair are described in [12].
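As an illustration, here is a minimal sketch of this screening test applied to every column rather than only \(w_{1}\), assuming a binary positive-sample matrix A_pos (rows \(\mathcal{P}\)), a negative-sample matrix A_neg (rows \(\mathcal{Z}\)), a feasible dual vector mu_bar for (20), and the objective value ub of a feasible integral solution of (19); all names are illustrative.

```python
import numpy as np

def screen_columns(A_pos, A_neg, mu_bar, ub, lam=1.0):
    """Drop column j (fix w_j = 0) when the dual-based lower bound on the
    LP value with w_j = 1, the quantity (22), exceeds the value ub of a
    known feasible integral solution."""
    col_neg = A_neg.sum(axis=0)            # 1^T a_Z^j for each column j
    keep = []
    for j in range(A_pos.shape[1]):
        in_N_j = A_pos[:, j] == 0          # indices in N(j): positives not covered by j
        bound = lam + col_neg[j] + mu_bar[in_N_j].sum()   # quantity (22) for column j
        if bound <= ub:
            keep.append(j)                 # cannot rule out w_j = 1; keep the column
    return keep
```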
Appendix 3: Extending the Dual Solution for Row-Sampling
Suppose that \(\hat{\boldsymbol{\mu }}^{p}\) is the optimal dual solution to the small LP in Sect. 5.3. Note that the number of variables in the dual for the large LP increases from p to \(\bar{p}\) and the bound on the second constraint grows from \(\lambda \mathbf{1}_{n} + \mathbf{1}^{T}\mathbf{A}_{\mathcal{Z}}\) to \(\lambda \mathbf{1}_{n} + \mathbf{1}^{T}\bar{\mathbf{A}}_{\mathcal{Z}}\).
We use a greedy heuristic to extend \(\hat{\boldsymbol{\mu}}^{p}\) to a feasible dual solution \(\bar{\boldsymbol{\mu}}_{\bar{p}}\) of the large LP. We set \(\bar{\mu}_{j} = \hat{\mu}_{j}\) for \(j = 1,\ldots,p\). We extend the remaining entries \(\bar{\mu}_{j}\) for \(j = p+1,\ldots,\bar{p}\) by setting a subset of them to 1 while satisfying the dual feasibility constraint. In other words, the extension of \(\boldsymbol{\bar{\mu}}\) corresponds to a subset \(\mathcal{R}\) of the row indices \(\{p+1,\ldots,\bar{p}\}\) of \(\bar{\mathbf{A}}_{\mathcal{P}}\) such that \(\hat{\boldsymbol{\mu}}_{p}^{T}\mathbf{A}_{\mathcal{P}} + \sum_{i\in\mathcal{R}}(\bar{\mathbf{A}}_{\mathcal{P}})_{i} \leq \mathbf{1}^{T}\bar{\mathbf{A}}_{\mathcal{Z}}\). Having \(\boldsymbol{\bar{\mu}}^{T}\bar{\mathbf{A}}_{\mathcal{P}} \leq \mathbf{1}^{T}\bar{\mathbf{A}}_{\mathcal{Z}}\) with \(\boldsymbol{\bar{\mu}}\) extended by a binary vector implies that \(\boldsymbol{\bar{\mu}}\) is feasible for (20). We initialize \(\mathcal{R}\) to ∅ and then simply go through the unseen rows of \(\bar{\mathbf{A}}_{\mathcal{P}}\) in some fixed order (increasing from \(p+1\) to \(\bar{p}\)), and for a row \(k\), if
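\[
\hat{\boldsymbol{\mu}}_{p}^{T}\mathbf{A}_{\mathcal{P}} + \sum_{i\in\mathcal{R}}(\bar{\mathbf{A}}_{\mathcal{P}})_{i} + (\bar{\mathbf{A}}_{\mathcal{P}})_{k} \;\leq\; \mathbf{1}^{T}\bar{\mathbf{A}}_{\mathcal{Z}},
\]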
we set \(\mathcal{R}\) to \(\mathcal{R}\cup \{ k\}\). The heuristic (we call it H1) needs only a single pass through the matrix \(\bar{\mathbf{A}}_{\mathcal{P}}\), and is thus very fast.
This heuristic, however, does not use the optimal solution \(\hat{\mathbf{w}}^{m}\) in any way. Suppose \(\hat{\mathbf{w}}^{m}\) were an optimal solution of the large LP. Then complementary slackness would imply that if \((\bar{\mathbf{A}}_{\mathcal{P}})_{i}\hat{\mathbf{w}}^{m} > 1\), then in any optimal dual solution \(\boldsymbol{\mu}\) we have \(\mu_{i} = 0\). Thus, assuming \(\hat{\mathbf{w}}^{m}\) is close to an optimal solution for the large LP, we modify heuristic H1 to obtain heuristic H2 by simply setting \(\bar{\mu}_{i} = 0\) whenever \((\bar{\mathbf{A}}_{\mathcal{P}})_{i}\hat{\mathbf{w}}^{m} > 1\), while keeping the remaining steps unchanged.
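The following is a minimal sketch of H1 and H2, assuming the newly sampled rows of \(\bar{\mathbf{A}}_{\mathcal{P}}\) are given as A_pos_new, the enlarged \(\bar{\mathbf{A}}_{\mathcal{Z}}\) as A_neg_big, and the vector \(\hat{\boldsymbol{\mu}}^{T}\mathbf{A}_{\mathcal{P}}\) as used; names are illustrative. The sketch applies the sufficient condition with right-hand side \(\mathbf{1}^{T}\bar{\mathbf{A}}_{\mathcal{Z}}\) described above, which drops the nonnegative \(\lambda\mathbf{1}_{n}\) slack.

```python
import numpy as np

def extend_dual(A_pos_new, A_neg_big, mu_hat, used, w_hat=None):
    """Greedily extend the small-LP dual mu_hat to the sampled rows.

    used:  mu_hat^T A_P for the small matrix (budget already consumed).
    w_hat: small-LP primal solution; if given, rows whose primal
           constraint is slack get mu = 0 (heuristic H2, motivated by
           complementary slackness); otherwise this is heuristic H1.
    """
    budget = A_neg_big.sum(axis=0)           # 1^T A_Z-bar, per column
    used = used.astype(float).copy()
    mu_ext = np.zeros(A_pos_new.shape[0])
    for k, row in enumerate(A_pos_new):      # single pass, fixed order
        if w_hat is not None and row @ w_hat > 1:
            continue                         # H2: set mu_k = 0
        if np.all(used + row <= budget):     # adding row k keeps dual feasibility
            used += row
            mu_ext[k] = 1.0
    return np.concatenate([mu_hat, mu_ext])
```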