Abstract
Quantization is one of the most widely applied Deep Neural Network (DNN) compression strategies when deploying a trained DNN model on an embedded system or a cell phone. This popularity stems from its simplicity and its adaptability to a wide range of applications and circumstances, in contrast to dedicated Artificial Intelligence (AI) accelerators and compilers, which are often designed only for certain specific hardware (e.g., the Google Coral Edge TPU). With the growing demand for quantization, ensuring the reliability of this strategy is becoming a critical challenge. Traditional testing methods, which gather more and more genuine data for better assessment, are often not practical because of the large size of the input space and the high similarity between the original DNN and its quantized counterpart. As a result, advanced assessment strategies have become of paramount importance. In this paper, we present DiverGet, a search-based testing framework for quantization assessment. DiverGet defines a space of metamorphic relations that simulate naturally-occurring distortions on the inputs. Then, it optimally explores these relations to reveal the disagreements among DNNs of different arithmetic precision. We evaluate the performance of DiverGet on state-of-the-art DNNs applied to hyperspectral remote sensing images. We chose remote sensing DNNs because they are increasingly deployed at the edge (e.g., on high-lift drones) in critical domains like climate change research and astronomy. Our results show that DiverGet successfully challenges the robustness of established quantization techniques against naturally-occurring shifted data, and outperforms its most recent competitor, DiffChaser, with a success rate that is (on average) four times higher.
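The core idea of the abstract can be illustrated with a small, self-contained sketch: quantize a model's weights to lower precision, apply a metamorphic input transformation (here, a brightness shift that preserves the input's semantic label), and search the transformation's parameter space for an input on which the two precision levels disagree. All names below (`quantize`, `brightness_shift`, `search_disagreement`) and the toy linear "model" are illustrative assumptions, not DiverGet's actual API or search algorithm (which uses more sophisticated metaheuristics over hyperspectral images).

```python
import random

# Toy stand-in: the "full-precision model" is a linear scorer over four
# input features; its "quantized" counterpart rounds the weights to a
# coarse grid, mimicking post-training quantization.
WEIGHTS = [0.61, -0.34, 0.12, 0.48]

def predict(weights, x):
    # Binary classification: class 1 if the weighted sum is positive.
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score > 0 else 0

def quantize(weights, step=0.5):
    # Snap each weight to the nearest multiple of `step`
    # (a coarser step means lower arithmetic precision).
    return [round(w / step) * step for w in weights]

def brightness_shift(x, delta):
    # One metamorphic relation: a uniform brightness offset that
    # simulates a naturally-occurring distortion and should not
    # change the semantic label of the input.
    return [xi + delta for xi in x]

def search_disagreement(x, trials=200, seed=0):
    # Random search over the metamorphic parameter, looking for a
    # mutated input on which the two precision levels disagree.
    rng = random.Random(seed)
    quantized = quantize(WEIGHTS)
    for _ in range(trials):
        delta = rng.uniform(-1.0, 1.0)
        mutant = brightness_shift(x, delta)
        if predict(WEIGHTS, mutant) != predict(quantized, mutant):
            return mutant  # disagreement-triggering input found
    return None

divergent = search_disagreement([0.2, 0.9, -0.4, 0.1])
```

Random search is used here only for brevity; the paper's framing of the problem as search-based testing is what matters, and any metaheuristic (e.g., a genetic algorithm or particle swarm optimization, both cited in the references) could drive the exploration instead.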
References
Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, Kudlur M, Levenberg J, Monga R, Moore S, Murray D G, Steiner B, Tucker P, Vasudevan V, Warden P, Wicke M, Yu Y, Zheng X (2016) Tensorflow: a system for large-scale machine learning. arXiv:1605.08695 [cs]
Agili H, Daniel S, Chokmani K (2014) Revue des méthodes de prétraitement des données d’imagerie hyperspectrale acquises depuis un drone. Geomatica 68(4):331–343. https://doi.org/10.5623/cig2014-407. http://espace.inrs.ca/id/eprint/4391/
Alzantot M, Sharma Y, Chakraborty S, Zhang H, Hsieh C J, Srivastava M (2019) Genattack: practical black-box attacks with gradient-free optimization. arXiv:1805.11090 [cs]
Biggio B, Roli F (2018) Wild patterns: ten years after the rise of adversarial machine learning half-day tutorial. In: 25th ACM conference on computer and communications security, CCS 2018. Association for Computing Machinery, pp 2154–2156
Bouzidi S (2019) Parallel and distributed implementation on spark of a spectral–spatial classifier for hyperspectral images. J Appl Remote Sens 13(3):034501
Braiek H B, Khomh F (2019) Deepevolution: a search-based testing approach for deep neural networks. arXiv:1909.02563 [cs, stat]
Braiek H B, Khomh F (2020) On testing machine learning programs. J Syst Softw 164:110542
Cover T M, Thomas J A (1991) Elements of information theory Wiley series in telecommunications. Wiley, New York
Deng J, Dong W, Socher R, Li L J, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2009.5206848. ISSN: 1063-6919, pp 248–255
Dua Y, Kumar V, Singh R S (2020) Comprehensive review of hyperspectral image compression algorithms. Optical Eng 59(9):090902
Eberhart R, Kennedy J (1995) A new optimizer using particle swarm theory. In: MHS’95. Proceedings of the sixth international symposium on micro machine and human science. https://doi.org/10.1109/MHS.1995.494215, pp 39–43
Gholami A, Kim S, Dong Z, Yao Z, Mahoney M W, Keutzer K (2021) A survey of quantization methods for efficient neural network inference. arXiv:2103.13630
Guo Y (2018) A survey on methods and theories of quantized neural networks. arXiv:1808.04752 [cs, stat]
He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. arXiv:1512.03385 [cs]
Hess M R, Kromrey J D (2004) Robust confidence intervals for effect sizes: a comparative study of Cohen's d and Cliff's delta under non-normality and heterogeneous variances. In: Annual meeting of the American Educational Research Association, Citeseer, vol 1
Hinton G, Deng L, Yu D, Dahl G E, Mohamed A-r, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Sainath T N, Kingsbury B (2012) Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Proc Mag 29(6):82–97. https://doi.org/10.1109/MSP.2012.2205597
Ho Y C, Pepyne D L (2002) Simple explanation of the no-free-lunch theorem and its implications. J Optim Theory Appl 115(3):549–570
Hu Q, Guo Y, Cordy M, Xie X, Ma W, Papadakis M, Traon Y L (2022) Characterizing and understanding the behavior of quantized models for reliable deployment. arXiv:2204.04220
Jaccard P (1912) The distribution of the flora in the alpine zone. 1. New Phytol 11(2):37–50. https://doi.org/10.1111/j.1469-8137.1912.tb05611.x. https://nph.onlinelibrary.wiley.com/doi/abs/10.1111/j.1469-8137.1912.tb05611.x
Joshi S K, Bansal J C (2020) Parameter tuning for meta-heuristics. Knowl-Based Syst 189:105094
Krishnamoorthi R (2018) Quantizing deep convolutional networks for efficient inference: a whitepaper. arXiv:1806.08342 [cs, stat]
Krizhevsky A, Nair V, Hinton G (2014) The cifar-10 dataset. http://www.cs.toronto.edu/~kriz/cifar.html
Kullback S (1987) Letter to the editor: the Kullback-Leibler distance. Am Stat 41(4):340–341
LeCun Y (1998) The mnist database of handwritten digits. http://yann.lecun.com/exdb/mnist/
Ma L, Juefei-Xu F, Zhang F, Sun J, Xue M, Li B, Chen C, Su T, Li L, Liu Y, Zhao J, Wang Y (2018) Deepgauge: multi-granularity testing criteria for deep learning systems. In: Proceedings of the 33rd ACM/IEEE international conference on automated software engineering. https://doi.org/10.1145/3238147.3238202. arXiv:1803.07519, pp 120–131
McMinn P (2011) Search-based software testing: past, present and future. In: 2011 IEEE Fourth international conference on software testing, verification and validation workshops. https://doi.org/10.1109/ICSTW.2011.100, pp 153–163
Mitchell M (2001) An introduction to genetic algorithms, 7th edn. Complex adaptive systems. MIT Press, Cambridge
Mosli R, Wright M, Yuan B, Pan Y (2019) They might NOT be giants: crafting black-Box adversarial examples with fewer queries using particle swarm optimization. arXiv:1909.07490
Odena A, Goodfellow I (2018) Tensorfuzz: debugging neural networks with coverage-guided fuzzing. arXiv:1807.10875
Pei K, Cao Y, Yang J, Jana S (2017) Deepxplore: automated whitebox testing of deep learning systems. In: Proceedings of the 26th symposium on operating systems principles. https://doi.org/10.1145/3132747.3132785. arXiv:1705.06640, pp 1–18
Roy S K, Krishna G, Dubey S R, Chaudhuri B B (2020) HybridSN: exploring 3d-2D CNN feature hierarchy for hyperspectral image classification. IEEE Geosci Remote Sens Lett 17(2):277–281. https://doi.org/10.1109/LGRS.2019.2918719. arXiv:1902.06701
Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X (2016) Improved techniques for training gans. In: Advances in neural information processing systems, vol 29
Shi Y Q, Sun H (2017) Image and video compression for multimedia engineering: fundamentals, algorithms and standards. CRC Press
Thomos N, Boulgouris N V, Strintzis M G (2005) Optimized transmission of jpeg2000 streams over wireless channels. IEEE Trans Image Process 15 (1):54–67
Tian Y, Pei K, Jana S, Ray B (2018) Deeptest: automated testing of deep-neural-network-driven autonomous cars. arXiv:1708.08559 [cs]
Van der Maaten L, Hinton G (2008) Visualizing data using t-sne. J Mach Learn Res 9(11)
Vargha A, Delaney H D (2000) A critique and improvement of the CL common language effect size statistics of McGraw and Wong. J Educ Behav Stat 25(2):101–132
Wilcoxon F (1945) Individual comparisons by ranking methods. Biometrics Bull 1(6):80. https://doi.org/10.2307/3001968. https://www.jstor.org/stable/10.2307/3001968?origin=crossref
Wu H, Judd P, Zhang X, Isaev M, Micikevicius P (2020) Integer quantization for deep learning inference: principles and empirical evaluation. arXiv:2004.09602 [cs, stat]
Xie X, Ma L, Juefei-Xu F, Chen H, Xue M, Li B, Liu Y, Zhao J, Yin J, See S (2018) Deephunter: hunting deep neural network defects via coverage-guided fuzzing. arXiv:1809.01266 [cs]
Xie X, Ma L, Wang H, Li Y, Liu Y, Li X (2019) Diffchaser: detecting disagreements for deep neural networks. In: IJCAI, pp 5772–5778
Yang G, Zheng N, Guo S (2007) Optimal wavelet filter design for remote sensing image compression. J Electron (China) 24(2):276–284
Young T, Hazarika D, Poria S, Cambria E (2018) Recent trends in deep learning based natural language processing. arXiv:1708.02709 [cs]
Yu J, Fu Y, Zheng Y, Wang Z, Ye X (2019) Test4deep: an effective white-box testing for deep neural networks. In: 2019 IEEE International conference on computational science and engineering (CSE) and IEEE international conference on embedded and ubiquitous computing (EUC). IEEE, pp 16–23
Zaatour R, Bouzidi S, Zagrouba E (2020) Unsupervised image-adapted local fisher discriminant analysis to reduce hyperspectral images without ground truth. IEEE Trans Geosci Remote Sens 58(11):7931–7941
Zhong Z, Li J, Luo Z, Chapman M (2018) Spectral–spatial residual network for hyperspectral image classification: a 3-D deep learning framework. IEEE Trans Geosci Remote Sens 56(2):847–858. https://doi.org/10.1109/TGRS.2017.2755542
Acknowledgements
We acknowledge the support from the following organizations and companies: Fonds de Recherche du Québec (FRQ), Natural Sciences and Engineering Research Council of Canada (NSERC), Canadian Institute for Advanced Research (CIFAR), and Huawei Canada. However, the findings and opinions expressed in this paper are those of the authors and do not necessarily represent or reflect those organizations/companies.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interest
All authors declare that they have no conflicts of interest.
Additional information
Communicated by: Andrea Stocco, Onn Shehory, Gunel Jahangirova, Vincenzo Riccio
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article belongs to the Topical Collection: Special Issue on Software Testing in the Machine Learning Era
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yahmed, A.H., Braiek, H.B., Khomh, F. et al. DiverGet: a Search-Based Software Testing approach for Deep Neural Network Quantization assessment. Empir Software Eng 27, 193 (2022). https://doi.org/10.1007/s10664-022-10202-w