# Data Mining with Algorithmic Transparency

## Abstract

In this paper, we investigate whether decision trees can be used to interpret a black-box classifier without knowledge of the learning algorithm or the training data. Decision trees are known for their transparency and high expressivity. However, they are also notorious for their instability and tendency to grow excessively large. We present a classifier reverse-engineering model that outputs a decision tree to interpret the black-box classifier. There are two major challenges. One is to build such a decision tree with controlled stability and size; the other is that probing of the black-box classifier is limited for security and economic reasons. Our model addresses both issues by simultaneously minimizing sampling cost and classifier complexity. We present empirical results on four real datasets and demonstrate that our reverse-engineering learning model can effectively approximate and simplify the black-box classifier.
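The core idea above can be illustrated with a minimal sketch: probe a black-box classifier under a fixed query budget, then fit a depth-limited decision tree to the probe labels and measure how faithfully the tree mimics the black box on fresh inputs. Everything here is an assumption for illustration: the linear `black_box` function, the query budget of 200, the greedy Gini-based stump splitting, and the `max_depth=3` size control are stand-ins, not the paper's actual model or optimization objective.

```python
import random

# Hypothetical black-box classifier: we may only query it, never inspect it.
# (Illustrative stand-in; the paper's black box is arbitrary.)
def black_box(x):
    return 1 if 0.6 * x[0] + 0.4 * x[1] > 0.5 else 0

def gini(labels):
    """Gini impurity of a binary label list."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(X, y):
    """Greedy axis-aligned split minimizing weighted Gini impurity."""
    best = None  # (impurity, feature, threshold)
    for f in range(len(X[0])):
        for t in sorted(set(x[f] for x in X)):
            left = [yi for x, yi in zip(X, y) if x[f] <= t]
            right = [yi for x, yi in zip(X, y) if x[f] > t]
            if not left or not right:
                continue
            imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or imp < best[0]:
                best = (imp, f, t)
    return best

def build_tree(X, y, depth, max_depth):
    """Recursive surrogate tree; max_depth caps the tree's size."""
    majority = int(sum(y) >= len(y) / 2)
    if depth == max_depth or gini(y) == 0.0:
        return majority  # leaf: predict the majority label
    split = best_split(X, y)
    if split is None:
        return majority
    _, f, t = split
    L = [(x, yi) for x, yi in zip(X, y) if x[f] <= t]
    R = [(x, yi) for x, yi in zip(X, y) if x[f] > t]
    return (f, t,
            build_tree([x for x, _ in L], [yi for _, yi in L], depth + 1, max_depth),
            build_tree([x for x, _ in R], [yi for _, yi in R], depth + 1, max_depth))

def predict(tree, x):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

random.seed(0)
budget = 200                              # limited probing budget
X = [[random.random(), random.random()] for _ in range(budget)]
y = [black_box(x) for x in X]             # labels obtained by querying the black box
surrogate = build_tree(X, y, 0, max_depth=3)

# Fidelity: how often the surrogate agrees with the black box on fresh probes.
probes = [[random.random(), random.random()] for _ in range(1000)]
fidelity = sum(predict(surrogate, x) == black_box(x) for x in probes) / len(probes)
print(round(fidelity, 2))
```

The tension the abstract describes shows up directly in the two knobs: raising `budget` improves fidelity but increases sampling cost, while raising `max_depth` improves fidelity but sacrifices the surrogate's simplicity and stability.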

## Notes

### Acknowledgement

The research reported herein was supported in part by NIH award 1R01HG006844, NSF awards CNS-1111529, CICI-1547324, and IIS-1633331 and ARO award W911NF-17-1-0356.
