On the Role of Software Quality Management in Software Process Improvement
Software Process Improvement (SPI) programs have been implemented, inter alia, to improve the quality and speed of software development. SPI addresses many aspects ranging from individual developer skills to entire organizations. It comprises, for instance, the optimization of specific activities in the software lifecycle as well as the creation of organizational awareness and project culture. In the course of conducting a systematic mapping study on the state of the art in SPI from a general perspective, we observed Software Quality Management (SQM) to be of notable relevance in SPI programs. In this paper, we provide a detailed investigation of those papers from the overall systematic mapping study that were classified as addressing SPI in the context of SQM (including testing). From the main study’s result set, 92 papers were selected for an in-depth systematic review to study the contributions and to develop an initial picture of how these topics are addressed in SPI. Our findings show a fairly pragmatic contribution set in which different solutions are proposed, discussed, and evaluated. Among other things, our findings indicate a certain reluctance towards standard quality or (test) maturity models and a strong focus on custom review, testing, and documentation techniques, whereas a set of five selected improvement measures is almost equally addressed.
Keywords: Software process improvement · Software quality management · Software test · Systematic mapping study · Systematic literature review
1 Introduction

To organize software development, companies look to Software Process Improvement (SPI), which allows them to analyze and continuously improve their development approaches. In the course of conducting a systematic mapping study, SPI emerged as a diverse field: many SPI facets are studied, several hundred custom SPI approaches have been proposed, e.g., to address weaknesses of standard approaches like CMMI, SPI success factors are collected and analyzed, and new trends such as SPI employing agility as an improvement principle are addressed. SPI thereby aims at improving companies’ competitiveness and is considered important regardless of a company’s size.
Besides accelerated development procedures, the quality of the software products developed is another important criterion (cf. Bennett and Wennberg, who found bug-fixing costs increasing by orders of magnitude in later lifecycle phases). Therefore, improving the quality of software and determining its economic value, notably for small and very small companies, is of particular relevance. For those companies, emphasizing quality is crucial, as software testing is a strenuous and expensive process consuming up to 50% of the total development costs. Improving quality management and, in particular, the software test activities therefore provides a promising starting point for improving the software process and hence product quality.
Problem Statement and Objective. SPI programs have been implemented to improve product quality and the speed of software development, and they have shown impact. Software quality assurance techniques also play an important role in guaranteeing and improving quality. Yet, the role of software quality assurance and SQM in SPI programs has not been explicitly investigated so far. The objective of this research is therefore to analyze the literature in order to characterize the role of SQM in SPI.
Contribution. This paper provides an overview of the study population on SPI with a special focus on SQM and shows how these studies are evaluated. It presents the software quality assurance techniques and improvement measures addressed in SPI. Our findings indicate that SPI in the context of SQM focuses equally on software testing and on complementing (support) activities, including reviews and documentation techniques. Furthermore, our findings show a trend towards utilizing individual testing approaches rather than implementing or following standards.
Context: A Systematic Mapping Study on SPI. This study is grounded in a comprehensive systematic mapping study on the state of SPI, the findings of which were published previously (we refer to this as the main study). The outcomes of that study show SPI to be an actively researched topic that nonetheless lacks theories and models. Instead, the field of SPI is shaped by a constant rate of approximately 10–12 new SPI models per year. The observed trends were used to form topic clusters, one of which addresses Software Quality Management and Software Test. The study at hand investigates this particular cluster in more detail utilizing a systematic review (cf. Sect. 3).
Outline. The remainder of the paper is organized as follows: Sect. 2 discusses related work. In Sect. 3, we describe our research approach, before we present the results of our study in Sect. 4. We provide a discussion on the results in Sect. 5 and conclude the paper in Sect. 6.
2 Related Work
In (general) SPI, different topics are researched in secondary studies. For instance, Monteiro and Oliveira, Bayona-Oré, and Dybå study SPI success factors, while Helgesson et al. and von Wangenheim et al. review maturity models, and Hull et al. review different assessment models. These exemplary studies show that the SPI community has started the search for generalizable knowledge. Yet, the studies mentioned address more general SPI issues.
The study at hand is the first literature study explicitly dedicated to the role of Software Quality Management (SQM) and Test Process Improvement (TPI) in SPI. It is, however, related to other reviews and secondary studies in SPI, TPI, and the improvement of other analytical and constructive software quality aspects. For instance, regarding TPI, Afzal et al. provide a systematic review, which identified 18 approaches and their characteristics, and an industrial case study on two prominent approaches, i.e., TPI Next and TMMi. The authors found that many of the test process improvement approaches do not provide sufficient information, nor do the approaches include assessment instruments. A systematic review by Garcia et al. identified 23 test process models, many of them adapted from TMMi and TPI. Reviews and comparisons of TPI models are also covered by a number of industrial white papers (so-called “grey literature”, e.g., [21, 27]), which points to the practical relevance of this field. At the more general level of analytical verification and validation processes, Farooq and Dumke discuss research directions for the improvement of verification and validation processes. The authors identify research challenges concerning quantitative management, the improvement of existing approaches, approaches for emerging development environments, as well as the empirical investigation of success factors and tool selection. Regarding constructive software quality aspects, several systematic reviews (e.g., for software documentation) are available, but reviews discussing these quality aspects in relation to SPI are missing so far.
All these representatively selected studies address specific topics; yet, they do not contribute to a more general perspective on SPI in the context of SQM. The paper at hand thus fills a gap in the literature by collecting and analyzing publications that emphasize SPI in the SQM context and thereby also lays the foundation for directing future research in this field of SPI.
3 Research Design
This study is an in-depth analysis of a data subset identified in a systematic mapping study. In this section, we present the research design, including the research questions, data collection and analysis procedures, as well as considerations on the study’s validity. Our research approach for the present study follows the procedures applied in an earlier in-depth analysis of SPI in Global Software Engineering.
3.1 Research Questions
In the course of analyzing the selected papers on SQM, this study aims to answer the following research questions:
What is the study population on SPI with a special focus on SQM? This research question aims at capturing the field of SPI from the perspective of quality management and test. It also helps to position the sub-study relative to the main study.
Which software quality assurance techniques and improvement measures are addressed in SPI? Based on 58 new metadata attributes, this research question aims at determining the different quality assurance techniques and improvement measures addressed by SPI.
How are studies on SQM in SPI evaluated? This research question is concerned with determining the impact of the investigated studies and, in particular, the rigor and relevance of the result set.
3.2 Data Collection Procedures
Being a study on a data subset, this study did not require an explicit and self-contained data collection. Input data was obtained from the main study’s result set, which we refer to as the study’s raw data. The data of interest was selected by taking all publications from the raw data having the attributes “Quality Management” and/or “Test” set (Fig. 3), initially resulting in 96 publications. The resulting subset (to which we refer as the study data) was then copied to a separate spreadsheet. To improve the reliability of the data analysis, two external researchers joined the team. Finally, two researchers carried out the data selection and cleaning procedures and the initial data analysis, one researcher was concerned with the definition of the extended metadata set and the data classification and analysis, and the two remaining researchers took over quality assurance tasks.
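The selection step described above amounts to a simple attribute filter over the raw dataset. The following minimal sketch illustrates it; the record layout and attribute names are assumptions about the spreadsheet, not the study’s actual data format:

```python
# Hypothetical in-memory rendering of the main study's raw dataset.
# The boolean attributes mirror the classification scheme described
# above ("Quality Management", "Test"); the field names are assumed.
raw_data = [
    {"paper_id": 1, "quality_management": True,  "test": False},
    {"paper_id": 2, "quality_management": False, "test": True},
    {"paper_id": 3, "quality_management": True,  "test": True},
    {"paper_id": 4, "quality_management": False, "test": False},
]

# Select all publications having "Quality Management" and/or "Test" set.
study_data = [p for p in raw_data if p["quality_management"] or p["test"]]
print([p["paper_id"] for p in study_data])  # → [1, 2, 3]
```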
Having the study data available, an initial quality assurance was performed in the course of downloading all selected papers. This quality assurance led to the exclusion of four papers (reasons: misclassification, violation of language constraints). Those papers’ metadata was updated, such that they will be returned to the main study (Sect. 6). Eventually, 92 papers remained in the cleaned study dataset, which were then analyzed as described in Sect. 3.3.
3.3 Analysis Procedure
As a “preparatory” study with the purpose of getting the big picture, the main study was conducted as a systematic mapping study following the guidelines proposed by Petersen et al. The present study, however, aims to deliver more insight and detail and is thus also carried out using the systematic review instrument as described by Kitchenham and Charters. In particular, during the paper download and quality assurance, the initial metadata set (40 attributes, Fig. 3) was revisited and, if necessary, updated. Furthermore, by calling in an external researcher (an expert in quality management and testing), the set of metadata was substantially extended by 58 extra attributes in nine new metadata categories (see Fig. 4).
During the analysis, each paper was inspected by two researchers, who checked (and if necessary revised) the initial values of the metadata, provided an initial assignment of values to the new attributes, and developed a paper summary of 2–3 sentences. Finally, to evaluate the papers regarding their rigor and relevance, we applied the model proposed by Ivarsson and Gorschek to complete the picture. These steps were iteratively double-checked by a third researcher and finally independently checked by the two researchers concerned with (general) quality assurance. The analysis as such utilizes descriptive statistics (e.g., charts and tables), for which we mainly rely on bubble charts and heat maps.
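The rigor/relevance evaluation can be sketched as a simple aggregation per paper. The aspect names and scoring scales below reflect our reading of the Ivarsson–Gorschek model (rigor aspects scored 0, 0.5, or 1; relevance aspects scored 0 or 1) and are not taken from the study itself:

```python
# Sketch of a rigor/relevance aggregation in the style of the
# Ivarsson-Gorschek model. Aspect names and scales are assumptions.
RIGOR_ASPECTS = ("context", "study_design", "validity")
RELEVANCE_ASPECTS = ("subjects", "context", "scale", "research_method")

def rigor_score(ratings: dict) -> float:
    """Sum the three rigor aspects (each 0, 0.5, or 1; maximum 3)."""
    return sum(ratings[a] for a in RIGOR_ASPECTS)

def relevance_score(ratings: dict) -> int:
    """Sum the four relevance aspects (each 0 or 1; maximum 4)."""
    return sum(ratings[a] for a in RELEVANCE_ASPECTS)

# Example rating for a single (fictitious) paper.
paper = {
    "rigor": {"context": 1, "study_design": 0.5, "validity": 0.5},
    "relevance": {"subjects": 1, "context": 1, "scale": 0,
                  "research_method": 1},
}
print(rigor_score(paper["rigor"]), relevance_score(paper["relevance"]))  # → 2.0 3
```

Papers can then be ranked by the pair of scores, which is how the “highest rated” papers discussed in Sect. 5 would be identified under this reading.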
3.4 Validity Procedures
To improve the validity of the results, we applied the following measures: First, we called in two external researchers and formed two teams. Team 1 (3 persons) conducted the data analysis, while team 2 (2 persons) took over the quality assurance. Second, in the data analysis phase, team 1 re-applied the procedures of the main study, i.e., all papers were re-inspected to check the correct assignment and to complete the assignment of the 40 metadata attributes. Third, in the inspection, the assignment of the attributes (40 from the main study, 58 new and scoped) and the evaluation according to the rigor-relevance model were carried out using the systematic review instrument, based on the full text of the study-relevant papers.
4 Study Results
In this section, we present the results of the study. We start with an overview of the study population, before we present the results of the analyses structured according to the research questions in Sects. 4.1, 4.2 and 4.3. Section 5 presents an integrated discussion of the results obtained from the study.
4.1 RQ1: General Study Population
4.2 RQ2: Improvement Measures and Quality Assurance Techniques
To investigate which improvement measures and quality assurance techniques are addressed by the study dataset, we extended the main study’s metadata system and defined 58 new attributes for classifying the papers under study. We added “Quality Management and Testing” as a new dimension, and we refined this dimension into nine groups (Fig. 4). Due to space limitations, in the following, we provide the big picture in Fig. 4, but focus on the groups “Improvement Measures” and “Quality Assurance Techniques”. The big picture in Fig. 4 shows the groups test activity, non-functional testing, and level of testing to be well covered. Furthermore, the dataset provides rich information regarding the groups improvement measures and quality assurance techniques. However, especially regarding test maturity models (or “standardized” testing approaches in general), the dataset provides only little information, which confirms the previously observed trend of reluctance towards standardization, here also for quality management and testing (and as initially found in ).
Regarding the groups “Improvement Measures” and “Quality Assurance Techniques”, we see a fairly balanced distribution in the data, i.e., a variety of topics is equally researched. The only remarkable outlier is the attribute software infrastructure. Favorites regarding the improvement measures are the improvement of defect handling (50 mentions) and cost and time optimization (54 and 56 mentions, respectively). Regarding the quality assurance techniques, review (62 mentions) as well as testing and documentation (60 mentions each) are the most frequently mentioned. The subsequent sections provide further details for these two “favorite” groups.
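The mention frequencies reported above are simple tallies of attribute assignments across papers. A minimal sketch of such a tabulation, with hypothetical per-paper annotations, could look as follows:

```python
from collections import Counter

# Hypothetical per-paper attribute annotations from the study
# spreadsheet; the real dataset assigns 58 attributes to 92 papers.
papers = [
    ["review", "testing", "documentation"],
    ["testing", "defect handling"],
    ["review", "documentation", "cost"],
]

# Count how many papers mention each attribute; these counts feed the
# bar charts and heat maps used in the analysis.
mentions = Counter(attr for attrs in papers for attr in attrs)
print(mentions["review"], mentions["documentation"])  # → 2 2
```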
4.3 RQ3: Evaluation of Software Quality Management and Software Testing
Methods Applied. Figure 5 provides a heat map summarizing the study types applied to investigate the different topics. The overview shows that SPI in the context of SQM is a field with a strong practical research orientation. The majority of the papers assessed combine different research methods, with case study research being the most frequently used approach, often within a mixed-method design and also implementing multi-case or longitudinal study approaches (for term definitions, see Wohlin et al.). A remarkable insight is the absence of replication research. Secondary studies and research based on Grounded Theory are present in the study dataset, yet the action research approach prevails. Regarding the topic clusters, the data shows the cluster “Improvement Measures” to be fully covered, whereas in the cluster “Quality Assurance Technique” the topics software infrastructure, traceability, training, and other are only partially covered.
Table 1. Overview of the highest rated papers according to the rigor-relevance model in the categories Improvement Measure and Quality Assurance Technique.
5 Study Summary and Discussion
To provide an in-depth discussion, we ranked the highest rated papers regarding their coverage of improvement measures and quality assurance techniques (Figs. 6 and 7; both based on the classification according to the rigor-relevance model). Table 1 summarizes these papers for the two categories “Improvement Measure” and “Quality Assurance Technique”, of which we discuss only a subset in depth. In particular, we selected the papers [8, 14, 22, 29] as a sample from the study dataset, as we found those papers represented in both categories.
Elliot et al. document a methodology for implementing a software quality management system (SQMS). Table 1 shows the proposed method addressing quality management in general, thus covering a number of attributes (in particular documentation, guideline, and training; reviews and (general) testing were mentioned as concrete techniques to, inter alia, better address different quality criteria, especially in the “system use” section). Key factors for the successful implementation of the SQMS were staff training and treating users like customers, which was also required for a cultural change within the organization.
Harter et al. present a framework for assessing the economic value of SPI and quality over the software lifecycle. The effects to be measured are defined based on the number of defects (development quality: defects found prior to customer testing; conformance quality: defects found in customer testing prior to acceptance); similar measures are defined for development effort, cycle time, and support costs. The authors therefore mainly address the attributes defects, cost, and time to conclude the economic value of SPI (Table 1). Eventually, the authors found that higher quality is associated with reduced cycle times and development effort, that savings accrue due to reduced rework, and, moreover, that support activity savings outweigh development savings. Harter et al. conclude that future research efforts should focus on how SPI strategies affect support activities.
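The defect-based measures, as we read them from that framework, can be sketched as counts over a phased defect log; the record layout below is hypothetical:

```python
# Sketch of the defect-based quality measures described above:
# development quality counts defects found prior to customer testing,
# conformance quality counts defects found in customer testing prior
# to acceptance. Field names and values are assumed for illustration.
defect_log = [
    {"phase": "pre_customer_testing", "defects": 12},
    {"phase": "customer_testing", "defects": 3},
]

development_quality = sum(d["defects"] for d in defect_log
                          if d["phase"] == "pre_customer_testing")
conformance_quality = sum(d["defects"] for d in defect_log
                          if d["phase"] == "customer_testing")
print(development_quality, conformance_quality)  # → 12 3
```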
Kasoju et al. use evidence-based software engineering (EBSE) to help an organization improve its testing process (domain: automotive software). They conduct an in-depth investigation of automotive test processes using a mixed-method approach including case study research, systematic reviews, and value stream analysis/mapping. For eight analyzed projects, the authors collect information regarding the test approaches, the project/system kind and size, and the development approach used (Table 1; mainly the attributes cost, time, testing, verification). In interview sessions, among other things, the authors found interviewees stating the lack of a clear test process that can be applied to any project lifecycle. Only 3 out of the 8 studied projects follow a defined process, which points to the mainly individual and non-standardized process selection already found in . Moreover, the authors found that a basic testing strategy is actually defined, yet not implemented by most of the teams, which is also consistent with our previous findings from . Eventually, the authors identify strengths of automotive software testing, such as working in small agile teams, implementing agile (communication) practices, or different approaches like exploratory testing. However, the authors also mention that these findings depend on project/team size, i.e., teams of different sizes might opt for different solutions; e.g., comprehensive test case management tools are considered more valuable for larger teams. Nevertheless, the authors found process issues problematic for teams of any size (consistent with ), e.g., a lacking unified testing process, unawareness of the process, or different process-related constraints like available time windows. Finally, the authors identified seven wastes, which were mapped to the testing process to drive process improvement.
Li et al. describe how agile processes affect software quality, software defects, and defect fixing efficiency (Table 1; mainly the attributes defects, testing, time). A major finding is that neither a significant reduction of defect densities nor changes of defect profiles could be found after Scrum was adopted. Yet, due to the iterative development approach, the development was considered more efficient (e.g., fewer surprises, better control over quality, and better schedule adherence). On the downside, however, the authors also mention that Scrum puts more stress and time pressure on the developers (which could make them more reluctant to perform tasks relevant for later maintenance). In a nutshell, the authors conclude that the actual development approach is less important than iterative development and early testing (in their study, the authors showed that about half of the (critical) defects were identified and fixed early, thus reducing the risk of finding bugs late).
Summarizing the big picture obtained (Fig. 4) and the exemplarily selected papers (Table 1), we conclude: First, testing as such is not represented in the study data as strongly as expected. We argue that there is specialized (grey) literature on test process improvement (TPI) that is not properly linked to SPI, a phenomenon we already observed for GSE. In particular, so far, we did not find detailed data, e.g., regarding the actual impact of switching to an alternative test approach. On the other hand, we found indications of individual and project-specific test approach selection (even in highly regulated domains), which confirms a finding we made in . Second, so far, we found that improving quality focuses on reducing the number of defects. In [22, 29], the authors found a lack of unified (standardized) testing approaches, and that the actual development approach (agile or traditional) seemingly does not affect defect densities or defect profiles. Harter et al. suggest putting more effort into improving support activities. It therefore remains a question for future work whether an SPI program with a “broader” perspective is more beneficial than optimizing a “technical” test method.
Threats to Validity. In the following, we critically review our study regarding its threats to validity. As a literature study, this study suffers from the potential incompleteness of the search results and a general publication bias. Beyond this general threat, we have to discuss internal and external validity in particular. The internal validity could be biased by the personal ratings of the researchers. To address this risk, we continued and refined our study, which follows a proven procedure utilizing different tools and researcher triangulation to support dataset cleaning, study selection, and classification. The internal validity is also affected by the limited data collection; in particular, no new data was collected, and the data analyzed is derived from the main study that serves as an umbrella. Calling in extra researchers to analyze and/or confirm decisions therefore further increases the internal validity. The external validity is threatened by missing knowledge about the generalizability of the results. Furthermore, this study “inherits” several limitations regarding external validity by relying on the main study’s raw data only. Consequently, this study also inherits the main study’s scope and thus has certain limitations regarding generalizability. To increase the external validity, further independently conducted studies are required to confirm our findings.
6 Conclusion

The paper at hand provides an in-depth investigation of how software quality management (SQM) is treated in software process improvement (SPI). Based on a systematic mapping study, we selected all papers from the main study’s dataset that address the topics SQM and software testing. In total, we inspected 92 papers in this study.
Our findings indicate that SPI in the context of SQM focuses equally on software testing and on complementing (or support) activities, including reviews and documentation techniques. Furthermore, our findings show a trend in SPI towards utilizing individual testing approaches rather than implementing or following standards. A detailed discussion of four exemplarily selected papers reveals that the actual software process is less relevant than a smart arrangement of test activities (early testing) and an iterative implementation of the development process. Furthermore, Harter et al. suggest putting more effort into support activities rather than optimizing (isolated) technical tasks.
Limitations. Our study is limited by the context of the main study, yet it showed some overlap and similar trends with other independently conducted studies, such as [11, 26]. In total, only 92 papers were selected for analysis; therefore, this study cannot claim to deliver a generalizable set of conclusions. A major limitation is the use of a given dataset only, without an extra topic-specific literature search, which potentially limits the reliability of the data. An extension and a complementing search, however, are subject to future research.
Future Work. This paper provides the first analysis iteration of the 92 selected papers and thus barely scratches the surface. Future work therefore includes further detailed analyses of the study data. Furthermore, being a study on a data subset, in future iterations the analyzed data will be (re-)integrated with the main study’s data to improve the overall data quality and reliability.
- 4. Bennett, T., Wennberg, P.: Eliminating embedded software defects prior to integration test. CROSSTALK J. Defense Softw. Eng., pp. 13–18 (2005)
- 5. Bertolino, A., Marchetti, E.: A brief essay on software testing. In: Software Engineering: Development Process, 3rd edn., vol. 1, pp. 393–411 (2005)
- 10. Garcia, C., Dávila, A., Pessoa, M.: Test process models: systematic literature review. In: Mitasiunas, A., Rout, T., O’Connor, R.V., Dorling, A. (eds.) Software Process Improvement and Capability Determination, pp. 84–93. Springer, Heidelberg (2014)
- 11. Garousi, V., Felderer, M., Mäntylä, M.V.: The need for multivocal literature reviews in software engineering: complementing systematic literature reviews with grey literature. In: Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering, EASE 2016, pp. 26:1–26:6. ACM, New York (2016)
- 12. Camargo, K.G., Ferrari, F.C., Fabbri, S.C.P.F.: Identifying a subset of TMMi practices to establish a streamlined software testing process. In: Brazilian Symposium on Software Engineering, SBES, pp. 137–146. IEEE (2013)
- 14. Harter, D.E., Krishnan, M.S., Slaughter, S.A.: The life cycle effects of software process improvement: a longitudinal analysis. In: Proceedings of the International Conference on Information Systems, ICIS, Atlanta, GA, USA, pp. 346–351. Association for Information Systems (1998)
- 19. Humphrey, W.S.: Managing the Software Process. Addison-Wesley, Boston (1989)
- 21. Karthikeyan, S., Rao, S.: Adopting the right software test maturity assessment model. Technical report, Cognizant (2014)
- 23. Kitchenham, B., Charters, S.: Guidelines for performing systematic literature reviews in software engineering. Technical Report EBSE-2007-01, Keele University (2007)
- 24. Kuhrmann, M., Diebold, P., Münch, J.: Software process improvement: a systematic mapping study on the state of the art. PeerJ Comput. Sci. 2(1), 1–38 (2016)
- 25. Kuhrmann, M., Diebold, P., Münch, J., Tell, P.: How does software process improvement address global software engineering? In: International Conference on Global Software Engineering, ICGSE, pp. 89–98. IEEE (2016)
- 26. Kuhrmann, M., Fernández, D.M.: Systematic software development: a state of the practice report from Germany. In: International Conference on Global Software Engineering, ICGSE, pp. 51–60. IEEE (2015)
- 27. Kumar, P.: Test process improvement - evaluation of available models. Technical report, Maveric (2012)
- 29. Li, J., Moe, N.B., Dybå, T.: Transition from a plan-driven process to scrum: a longitudinal case study on software quality. In: Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2010, pp. 13:1–13:10. ACM, New York (2010)
- 30. McGarry, F., Burke, S., Decker, B.: Measuring the impacts individual process maturity attributes have on software products. In: Proceedings of the Fifth International Software Metrics Symposium, Metrics 1998, pp. 52–60. IEEE (1998)
- 32. Petersen, K., Feldt, R., Mujtaba, S., Mattson, M.: Systematic mapping studies in software engineering. In: International Conference on Evaluation and Assessment in Software Engineering, EASE, pp. 68–77. ACM (2008)
- 35. Sylemez, M., Tarhan, A.: Using process enactment data analysis to support orthogonal defect classification for software process improvement. In: International Conference on Software Process and Product Measurement, IWSM-MENSURA, pp. 120–125, October 2013
- 36. von Wangenheim, C.G., Hauck, J.C.R., Salviano, C.F., von Wangenheim, A.: Systematic literature review of software process capability/maturity models. In: International Conference on Software Process Improvement and Capability Determination, SPICE (2010)