Abstract
While optimising quality assurance has been an important research area for many years, we still see interesting new ideas in this area, such as incorporating psychological factors, detecting pseudo-tested code and detecting code with low fault risk.
1 Optimisation of Quality Assurance
Analytical quality assurance is the part of software quality assurance that analyses artefacts and processes to assess the level of quality and identify quality shortcomings. Analytical quality assurance is a major cost factor in any software development project. While real numbers are hard to find, any practitioner would agree that they spend a lot of time on testing and similar activities. At the same time, analytical quality assurance is indispensable to avoid quality problems of all kinds.
Hence, we have a set of activities in the development process that are expensive but also decisive for the quality, and thereby probably the success, of a software product. Therefore, there is a practical need to optimise which kinds of analytical quality assurance are employed, to what degree and with how much effort, to reach high quality at low cost.
In the following, we will first discuss how this problem could be framed from an economics point of view. Second, we will discuss the progress on evaluating and understanding the effectiveness and efficiency of various analytical quality assurance techniques. Third, we will widen our perspective by discussing psychological aspects of quality assurance, and, finally, we will discuss some concrete, current proposals to optimise software testing.
2 Quality Economics
The optimisation of quality assurance involves many disparate factors, such as which techniques are used, how much effort is spent on them and what kinds of faults are in the software, so it is difficult to find a common unit for all of them. Hence, various quality economics approaches have been proposed that use monetary units [1, 10].
We have proposed a quality economics model to estimate and evaluate costs and revenue from using analytical quality assurance mainly based on effort and difficulty functions of faults and quality assurance techniques [11,12,13]. We show in Fig. 1 just the part of the model describing the direct costs of applying an analytical quality assurance technique. Such an application usually comes with some kind of fixed setup costs, execution costs depending on how much effort we spend and removal costs depending on the probability that we detect a given fault.
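The structure of this direct-costs part can be sketched as a simple expected-value computation. The following is a minimal illustration only: the function and parameter names are hypothetical and the linear cost functions are an assumption, not the notation of the original model [11,12,13].

```python
def expected_direct_costs(setup_cost, execution_cost_per_hour, effort_hours,
                          faults, removal_cost, detection_probability):
    """Expected direct cost of applying one analytical QA technique.

    setup_cost:             fixed cost of setting up the technique
    execution_cost_per_hour, effort_hours: execution costs depend on effort
    faults:                 number of faults assumed present in the artefact
    detection_probability:  probability that a given fault is detected
    removal_cost:           cost of removing one detected fault
    """
    execution = execution_cost_per_hour * effort_hours
    expected_removals = faults * detection_probability
    return setup_cost + execution + expected_removals * removal_cost

# Example with made-up numbers: 500 setup, 80/h for 40 h of execution,
# 20 faults assumed, 60% detection probability, 150 per removal.
cost = expected_direct_costs(500, 80, 40, 20, 0.6, 150)
# 500 + 3200 + 12 * 150 = 5500
```

Even this toy version makes the trade-off visible: effort drives both the execution costs and, via the detection probability, the number of faults removed now rather than in the field.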
This needs to be complemented with future costs: the costs incurred by faults that are not detected by analytical quality assurance and lead to failures in the field. Those faults also need to be removed, and there are further effect costs such as compensations to the customers. We showed that especially the latter are an important part of the model, but they are extremely difficult to determine. A very simple fault can have catastrophic effects, while a very complicated, far-ranging fault might never lead to a failure. Overall, we found that the empirical basis is too thin to apply such models in practice and, because of the difficulty of collecting all the necessary data, might never become sufficient.
3 Effectiveness and Efficiency of Quality Assurance Techniques
Therefore, we are convinced that research effort is better devoted to the effectiveness (what share of the faults is found?) and efficiency (how many faults are found per unit of effort spent?) of particular quality assurance techniques, so that we can at least judge and compare them and thereby have a basis for deciding which techniques to apply, when and for what. There have been many studies evaluating these aspects for various quality assurance techniques. In particular, testing and inspections have been a focus of empirical research.
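The two notions can be made concrete as simple ratios. This is a deliberately trivial sketch; as discussed below, the exact operationalisation differs between studies, which is precisely what makes meta-analyses hard.

```python
def effectiveness(faults_found, faults_total):
    """Share of all faults in the artefact that the technique detected."""
    return faults_found / faults_total

def efficiency(faults_found, effort_hours):
    """Faults detected per unit of effort spent on the technique."""
    return faults_found / effort_hours

# Example: a review finds 12 of 30 known faults in 8 person-hours.
assert effectiveness(12, 30) == 0.4   # 40% of the faults found
assert efficiency(12, 8) == 1.5       # 1.5 faults per person-hour
```

A technique can score high on one ratio and low on the other: an exhaustive technique may be effective but inefficient, a cheap smoke test the reverse.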
We have also contributed to a better understanding of several quality assurance techniques. For model-based testing, we could show that a large share of the detected defects is already found during the modelling activity. The created test suite itself was not more effective than a test suite created by experts but was able to find other defects, often defects that require a particularly long interaction sequence [9]. For black-box integration testing, we could show that test suites with a higher data-flow-based coverage are able to detect more defects [2].
In a comparison of tests, reviews and automated static analysis [15], we found that automated static analysis finds a subset of the defect types found by reviews, but if it detected a specific type, it would detect it more thoroughly. Tests found completely different defect types than reviews and automated static analysis. In a study of automated static analysis alone [14], we found that none of the 72 analysed field defects would have been found by any of the used static analysis tools. For the particular static analysis technique of clone detection, we found 107 faults in five production systems by looking specifically at clones with differences between their instances, leading to the observation that every other clone with unintended differences between its instances constitutes a fault [3].
Together with the large and growing body of evidence from other researchers, this starts to give us a good understanding of the effectiveness and efficiency of analytical quality assurance. The main weaknesses I still see today are the infrequent replication of studies and the slightly different operationalisations of effectiveness and efficiency that make meta-analyses difficult.
4 Psychological Aspects
A further dimension that we believe to be of critical importance, but that has not been widely investigated, is the psychological factors of the software developers and testers applying analytical quality assurance. In particular, we studied the use of automated static analysis in relation to the personality of the developers [8] and the stress it causes for developers [7]. For example, we found that while people with a high level of agreeableness show a relatively structured strategy in dealing with static analysis findings by working with small changes in cycles, people with high neuroticism show a more chaotic approach by making larger changes and impulsive decisions: they change the file they were working on without finishing the work they had started. Such findings can in turn inform how to improve quality assurance techniques or the user interfaces of the corresponding tools.
5 Test Optimisation
Especially for the optimisation of tests, there is a lot of existing research on test case prioritisation and test suite minimisation. These techniques aim at executing only relevant test cases and at executing the test cases with a high probability of detecting a defect for a given change early. Yet, there is still room for improvement. We recently introduced two novel concepts to optimise test suites: (1) the detection of pseudo-tested code [4, 6] and (2) inverse defect prediction [5].
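As a minimal illustration of what test case prioritisation does, consider the classic "additional greedy" strategy based on coverage. This is a textbook baseline, not one of the specific techniques cited above, and the data structures are illustrative assumptions.

```python
def prioritise_additional_greedy(test_coverage):
    """Order test cases so that each next test adds the most not-yet-covered
    code units.

    test_coverage: dict mapping test name -> set of covered code units.
    Returns the prioritised list of test names.
    """
    remaining = dict(test_coverage)
    covered = set()
    order = []
    while remaining:
        # Pick the test adding the most new coverage (ties broken by name).
        best = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        order.append(best)
        covered |= remaining.pop(best)
    return order

tests = {
    "t1": {"a", "b"},
    "t2": {"b", "c", "d"},
    "t3": {"a"},
}
# t2 covers three new units first, then t1 adds "a"; t3 adds nothing new.
assert prioritise_additional_greedy(tests) == ["t2", "t1", "t3"]
```

Run early, such an ordering detects defects sooner on average; cutting the tail that adds no new coverage is the corresponding minimisation step.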
We define pseudo-tested code as code that is covered by some test case, but that the test case does not effectively test: the test case would not detect faults in that code. We detect pseudo-tested code by using an extreme mutation operator: we remove the whole implementation of a method or function and return a constant. In empirical analyses of 19 study objects, we found pseudo-tested code in all of them, with a median of 10% pseudo-tested methods. In a practical setting, detecting pseudo-tested code helps to correct misleading coverage values and to concentrate on test cases that effectively add coverage.
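The core idea can be sketched in a few lines. This is a toy illustration of the extreme mutation operator; the actual tooling works on Java methods, and all names below are hypothetical.

```python
def price_with_tax(net):
    """Production code under test."""
    return net * 1.19

def run_test_suite(fn):
    """Stand-in for re-running all tests that cover fn; True if suite passes."""
    try:
        assert fn(100) > 0   # a weak test: it only checks positivity
        return True
    except AssertionError:
        return False

def is_pseudo_tested(original, constant=1):
    """Apply the extreme mutation operator: replace the whole implementation
    with a constant return. If the suite still passes, the covered code is
    pseudo-tested. (The original is only replaced, not inspected, here.)"""
    mutant = lambda *args, **kwargs: constant
    return run_test_suite(mutant)

# The weak test cannot tell the extreme mutant from the original:
assert run_test_suite(price_with_tax) is True
assert is_pseudo_tested(price_with_tax) is True   # covered, but pseudo-tested
```

A test that asserted the concrete result (`fn(100) == 119.0`) would kill the mutant, and the method would no longer be flagged.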
We introduced inverse defect prediction to identify methods that are too trivial to be tested. For example, many getter and setter methods in Java contain only trivial functionality; in many cases, testing these methods is not time and effort well spent. We identified a set of metrics that we considered likely indicators for trivial methods and used them for association rule mining, yielding association rules that identify methods with a low fault risk. This effectively forms a theory of low-fault-risk methods. The predictor that uses these rules is what we call inverse defect prediction. It is effective in identifying methods with low fault risk: on average, only 0.3% of the methods classified as "low fault risk" are faulty. The identified methods are, on average, 5.7 times less likely to contain a fault than an arbitrary method in the analysed systems. Hence, we can either not test those methods at all or at least execute the corresponding test cases last.
6 Conclusions
Optimising analytical quality assurance has been an important practical problem, as well as a corresponding research area, for many years. We have learnt that precise economic models suffer from the lack of empirical data and, hence, have not led to much practical progress. Yet, many empirical studies have contributed to a better understanding of the effectiveness and efficiency of particular quality assurance methods. Furthermore, new research directions such as the influence of psychological factors, pseudo-tested code and the prediction of low-fault-risk code provide us with more theoretical understanding and already have practical impact in the form of new tools.
References
1. Boehm, B.W., Huang, L., Jain, A., Madachy, R.J.: The ROI of software dependability: the iDAVE model. IEEE Softw. 21(3), 54–61 (2004)
2. Hellhake, D., Schmid, T., Wagner, S.: Using data flow-based coverage criteria for black-box integration testing of distributed software systems. In: 12th IEEE Conference on Software Testing, Validation and Verification, ICST 2019, Xi'an, China, 22–27 April 2019, pp. 420–429. IEEE (2019)
3. Jürgens, E., Deissenboeck, F., Hummel, B., Wagner, S.: Do code clones matter? In: Proceedings of 31st International Conference on Software Engineering, ICSE 2009, Vancouver, Canada, 16–24 May 2009, pp. 485–495. IEEE (2009)
4. Niedermayr, R., Jürgens, E., Wagner, S.: Will my tests tell me if I break this code? In: Proceedings of the International Workshop on Continuous Software Evolution and Delivery, CSED@ICSE 2016, Austin, Texas, USA, 14–22 May 2016, pp. 23–29. ACM (2016)
5. Niedermayr, R., Röhm, T., Wagner, S.: Too trivial to test? An inverse view on defect prediction to identify methods with low fault risk. PeerJ Comput. Sci. 5, e187 (2019)
6. Niedermayr, R., Wagner, S.: Is the stack distance between test case and method correlated with test effectiveness? In: Ali, S., Garousi, V. (eds.) Proceedings of the Evaluation and Assessment on Software Engineering, EASE 2019, Copenhagen, Denmark, 15–17 April 2019, pp. 189–198. ACM (2019)
7. Ostberg, J., Wagner, S.: At ease with your warnings: the principles of the salutogenesis model applied to automatic static analysis. In: IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering, SANER 2016, Suita, Osaka, Japan, 14–18 March 2016, vol. 1, pp. 629–633. IEEE Computer Society (2016)
8. Ostberg, J., Wagner, S., Weilemann, E.: Does personality influence the usage of static analysis tools? An explorative experiment. In: Proceedings of the 9th International Workshop on Cooperative and Human Aspects of Software Engineering, CHASE@ICSE 2016, Austin, Texas, USA, 16 May 2016, pp. 75–81. ACM (2016)
9. Pretschner, A., Prenninger, W., Wagner, S., Kühnel, C., Baumgartner, M., Sostawa, B., Zölch, R., Stauner, T.: One evaluation of model-based testing and its automation. In: Roman, G., Griswold, W.G., Nuseibeh, B. (eds.) 27th International Conference on Software Engineering (ICSE 2005), St. Louis, Missouri, USA, 15–21 May 2005, pp. 392–401. ACM (2005)
10. Slaughter, S., Harter, D.E., Krishnan, M.S.: Evaluating the cost of software quality. Commun. ACM 41(8), 67–73 (1998)
11. Wagner, S.: A literature survey of the quality economics of defect-detection techniques. In: Travassos, G.H., Maldonado, J.C., Wohlin, C. (eds.) 2006 International Symposium on Empirical Software Engineering (ISESE 2006), Rio de Janeiro, Brazil, 21–22 September 2006, pp. 194–203. ACM (2006)
12. Wagner, S.: A model and sensitivity analysis of the quality economics of defect-detection techniques. In: Pollock, L.L., Pezzè, M. (eds.) Proceedings of the ACM/SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2006, Portland, Maine, USA, 17–20 July 2006, pp. 73–84. ACM (2006)
13. Wagner, S.: Cost optimisation of analytical software quality assurance. Ph.D. thesis, Technical University Munich, Germany (2007)
14. Wagner, S., Deissenboeck, F., Aichner, M., Wimmer, J., Schwalb, M.: An evaluation of two bug pattern tools for Java. In: First International Conference on Software Testing, Verification, and Validation, ICST 2008, Lillehammer, Norway, 9–11 April 2008, pp. 248–257. IEEE Computer Society (2008)
15. Wagner, S., Jürjens, J., Koller, C., Trischberger, P.: Comparing bug finding tools with reviews and tests. In: Khendek, F., Dssouli, R. (eds.) TestCom 2005. LNCS, vol. 3502, pp. 40–55. Springer, Heidelberg (2005). https://doi.org/10.1007/11430230_4
© 2020 Springer Nature Switzerland AG
Wagner, S. (2020). Optimising Analytical Software Quality Assurance. In: Winkler, D., Biffl, S., Mendez, D., Bergsmann, J. (eds) Software Quality: Quality Intelligence in Software and Systems Engineering. SWQD 2020. Lecture Notes in Business Information Processing, vol 371. Springer, Cham. https://doi.org/10.1007/978-3-030-35510-4_9