The internal consistency and precedence of key process areas in the capability maturity model for software

Abstract

Evaluating the reliability of maturity level (ML) ratings is crucial for providing confidence in the results of software process assessments. This study investigates the dimensions underlying the maturity construct in the Capability Maturity Model (CMM) for Software (SW-CMM) and estimates the internal consistency of each dimension. The results suggest that SW-CMM maturity is a three-dimensional construct, with “Project Implementation” representing the ML 2 key process areas (KPAs), “Organization Implementation” representing the ML 3 KPAs, and “Quantitative Process Implementation” representing the KPAs at MLs 4 and 5. The internal consistency for each of the three dimensions as estimated by Cronbach’s alpha exceeds the recommended value of 0.9. Based on those results, this study builds and tests a theoretical model which posits that the achievement of lower ML KPAs sustains the implementation of higher ML KPAs. Results of path analysis using partial least squares (PLS) support the theoretical model and provide detailed understanding of the process improvement path. The analysis is based on 676 CMM-Based Appraisal for Internal Process Improvement (CBA IPI) assessments.

Notes

  1. Assessment data on the reports was kept in an SEI repository called the Process Appraisal Information System (PAIS). The PAIS has since been replaced. Appraisals now are reported using the SEI Appraisal System (SAS), http://www.sei.cmu.edu/appraisal-program/profile/report-faq.html.

  2. Assessors could intentionally or unintentionally create consistently-biased ratings that are not detected by internal consistency analyses. For example, they could misunderstand parts of the CMM or they could be under pressure to yield a certain level. Interrater agreement is known to partially alleviate this problem. Trochim (2001) describes how to reduce measurement error.

  3. Functional Area Representatives are practitioners who have technical responsibilities in various areas that support their organizations’ software development or maintenance projects (e.g., configuration management or quality assurance). Note that five organizations are missing.

  4. Chin, W.W. 2001. PLS-Graph User’s Guide, Version 3.0. http://www.pubinfo.vcu.edu/carma/Documents/OCT1405/PLSGRAPH3.0Manual.hubona.pdf.

  5. CMMI® is registered in the U.S. Patent and Trademark Office by Carnegie Mellon University.

References

  • Anderson JC, Gerbing DW (1991) Predicting the performance of measures in a confirmatory factor analysis with a pretest assessment of their substantive validities. J Appl Psychol 76(5):732–740

  • Bollinger T, McGowan C (1991) A critical look at software capability evaluation. IEEE Softw 8(4):25–41

  • Brodman JG, Johnson DI (1996) Return on investment from software process improvement as measured by U.S. industry. Crosstalk 9(4): 23–29. http://www.stsc.hill.af.mil/crosstalk/1996/04/index.html

  • Campbell DT, Fiske DW (1959) Convergent and discriminant validation by the multitrait–multimethod matrix. Psychol Bull 56(1):81–105

  • Carmines EG, Zeller RA (1979) Reliability and validity assessment. Sage University paper series on quantitative applications in social sciences. Sage, Newbury Park, CA

  • Chin WW (1998) Issues and opinion on structural equation modeling. MIS Quarterly 22(1):vii–xvi

  • Chin WW, Newsted PR (1999) Structural equation modeling analysis with small samples using partial least squares. In: Hoyle R (ed) Statistical strategies for small sample research. Sage, Thousand Oaks, CA, pp 307–341

  • Chin WW, Marcolin BL, Newsted P (2003) A partial least squares latent variable modeling approach for measuring interaction effects: results from a Monte Carlo simulation study and an electronic-mail emotion/adoption study. Inf Syst Res 14(2):189–217

  • Clark B (1997) The effect of software process maturity on software development effort. Ph.D. Thesis. University of Southern California, Los Angeles, CA

  • Coffman A, Thompson K (1997) Air force software process improvement report. Crosstalk 10(1):25–27. http://www.stsc.hill.af.mil/crosstalk/1997/01/index.html

  • Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Erlbaum, Hillsdale, NJ

  • Comrey A (1973) A first course in factor analysis. Academic, London

  • Cronbach L (1951) Coefficient alpha and the internal structure of tests. Psychometrika 16(3):297–334

  • Curtis B (1996) Factor structure of the CMM and other latent issues. Proceedings of the 1996 SEPG Conference, Atlantic City, NJ, USA

  • Dunaway D, Baker M (2001) Analysis of CMM-based appraisal for internal process improvement (CBA IPI) assessment feedback. Technical report CMU/SEI-2001-TR-021. Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA. http://www.sei.cmu.edu/publications/documents/01.reports/01tr021.html

  • El-Emam K (1998) The internal consistency of the ISO/IEC 15504 software process capability scale. Proceedings of the 5th International Symposium on Software Metrics, Los Alamitos, CA, USA, pp 72–81

  • El-Emam K, Goldenson D (1995) SPICE: an empiricist’s perspective. Proceedings of the Second IEEE International Software Engineering Standards Symposium, Los Alamitos, CA, USA, pp 84–97

  • El-Emam K, Goldenson D (2000) An empirical review of software process assessments. Adv Comput 5(3):319–423

  • El-Emam K, Madhavji N (1995) The reliability of measuring organizational maturity. Softw Process Improv Pract 1(1):3–25

  • El-Emam K, Simon J-M, Rousseau S, Jacquet E (1998) Cost implications of interrater agreement for software process assessment. Proceedings of the 5th International Symposium on Software Metrics, Los Alamitos, CA, USA, pp 38–51

  • Fayad M, Laitinen M (1997) Process assessment considered wasteful. Commun ACM 40(11):125–128

  • Fornell C, Larcker DF (1981) Evaluating structural equation models with unobservable variables and measurement errors. J Mark Res 18(1):39–50

  • Fusaro P, El-Emam K, Smith B (1998) The internal consistencies of the 1987 SEI maturity questionnaire and the SPICE capability dimension. Empirical Software Engineering 3(2):179–210

  • Gefen D, Straub D (2005) A practical guide to factorial validity using PLS-Graph: tutorial and annotated example. Communications of the Association for Information Systems 16:91–109

  • Goldenson D, El-Emam K (2000) What should you measure first? Lessons learned from the software CMM. Software Engineering Symposium, September 2000

  • Gray E, Smith W (1998) On the limitations of software process assessment and the recognition of a required re-orientation for global process improvement. Softw Qual J 7(1):21–34

  • Hattie J (1985) Methodology review: assessing unidimensionality of tests and items. Appl Psychol Meas 9:139–164

  • Herbsleb J, Zubrow D, Goldenson D, Hayes W, Paulk M (1997) Software quality and the capability maturity model. Commun ACM 40(6):30–40

  • Humphrey W, Curtis B (1991) Comments on ‘A Critical Look’. IEEE Softw 8(4):42–46

  • Igbaria M, Zinatelli N, Cragg P, Cavaye A (1997) Personal computing acceptance factors in small firms: a structural equation model. MIS Quarterly 21(3):279–302

  • ISO/IEC (1996) ISO/IEC PDTR 15504: Information technology—software process assessment, parts 1–9. ISO/IEC JTC1/SC7/WG10

  • Jöreskog KG, Sörbom D (1993) LISREL 8: structural equation modeling with the SIMPLIS command language. SSI Scientific Software International, Chicago

  • Jung H-W, Hunter R (2003) Evaluating the SPICE rating scale with regard to the internal consistency of capability measures. Softw Process Improv Pract 8(3):169–178

  • Jung H-W, Hunter R, Goldenson D, El-Emam K (2001) Findings from phase 2 of the SPICE trials. Softw Process Improv Pract 6(2):205–242

  • Kotrlik J, Williams H (2003) The incorporation of effect size in information technology, learning, and performance research. Inf Technol Learn Perform J 21(1):1–7

  • Krishnan MS, Kellner MI (1999) Measuring process consistency: implications for reducing software defects. IEEE Trans Softw Eng 25(6):800–815

  • Kuder GF, Richardson MW (1937) The theory of the estimation of test reliability. Psychometrika 2(3):151–160

  • Kwok WC, Sharp DJ (1998) A review of construct measurement issues in behavior accounting research. J Account Lit 17:137–174

  • Likert R, Roslow S (1934) The effects upon the reliability of attitude scales of using three, five, seven alternatives. Working paper, New York University, New York

  • Lissitz RW, Green SB (1975) Effects of the number of scale points on reliability: a Monte Carlo approach. J Appl Psychol 60(1):10–13

  • Marcoulides GA, Saunders C (2006) PLS: a silver bullet? MIS Quarterly 30(2):iii–x

  • McIver JP, Carmines GE (1981) Unidimensional scaling. Sage University paper series on quantitative applications in social sciences. Sage, Newbury Park, CA

  • Nunnally JC, Bernstein IH (1994) Psychometric theory, 3rd edn. McGraw-Hill, New York

  • Paulk M, Weber C, Curtis B, Chrissis MB (1994) The capability maturity model: guidelines for improving the software process. Addison-Wesley, New York

  • Saiedian H, Kuzara R (1995) SEI capability maturity models’ impact on contractors. Computer 28(1):16–26

  • SEI (2006) CMMI® for Development (CMMI-DEV), Version 1.2, Technical Report, CMU/SEI-2006-TR-008, Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA

  • Sharma S (1996) Applied multivariate techniques. Wiley, New York

  • Spector PF (1992) Summated rating scale construction: an introduction. Sage University paper series on quantitative applications in social sciences. Sage, Newbury Park, CA

  • Trochim WM (2001) The research methods knowledge base, 2nd edn. Atomic Dog Publishing, Cincinnati, OH

  • Van de Ven AH, Ferry DL (1980) Measuring and assessing organizations. Wiley, New York

  • Werts CE, Linn RL, Jöreskog KG (1974) Intraclass reliability estimates: testing structural assumptions. Educ Psychol Meas 34:25–33

  • Wold H (1982) Soft modeling: intermediate between traditional model building and data analysis. Mathematical Statistics 6:333–346

  • Zeller RA, Carmines EG (1980) Measurement in the social sciences: the link between theory and data. Cambridge University Press, Cambridge

Acknowledgements

The authors wish to acknowledge the assessors, sponsors, and others who participated in the assessments of the SW-CMM. This work would not have been possible without the information that they regularly provide to the SEI. Thanks to Mike Zuccher, Kenny Smith, and Xiaobo Zhou for their support in extracting the data on which the study is based. The authors would also like to thank Sheila Rosenthal for her expert support with the bibliography, and Lauren Heinz for helping improve the readability of the document. The authors also thank their SEI colleagues, Will Hayes, Mike Konrad, Keith Kost, Steve Masters, Jim McCurley, Mark Paulk, Mike Phillips, and Dave Zubrow. Thanks also to Khaled El-Emam, Robin Hunter, and Hyung-Min Park for their valuable comments on earlier drafts. Many thanks to the anonymous referees for their valuable comments and suggestions, which improved the presentation of the paper. This study was supported by a Korea University Grant (2006). This support is gratefully acknowledged.

Author information

Corresponding author

Correspondence to Ho-Won Jung.

Appendices

Appendix 1

1.1 Estimating Cronbach’s Alpha

In discussing the reliability of measurements, a set of items (indicators) is posited to reflect an underlying construct. In the SW-CMM, maturity, which is neither directly measurable nor observable, can be measured indirectly through the assessed values of the KPAs. In effect, the SW-CMM uses an 18-item (or KPA) instrument to measure the process maturity of organizations. If the necessary data were readily available, a 52-item (goal) instrument could also be used (the SW-CMM includes 52 goals, with 20, 17, 6, and 9 goals in MLs 2, 3, 4, and 5, respectively).

The type of scale used in most assessment instruments is a summative one (Spector 1992). A summative scale means that the individual ratings \(x_i\) for item \(i\) are summed to produce an overall rating score, i.e., \(y = \sum_{i=1}^{n} x_i\), where \(n\) denotes the number of items. One property of the covariance matrix for a summative rating is that the sum of all terms in the matrix equals the variance of the scale as a whole, i.e., \(\sigma^2_y = \sum_{(i,j)} \sigma_{ij}\), where \(i\) and \(j\) are the indices of \(x_i\) and \(x_j\); if \(i = j\), then \(\sigma_{ij} = \sigma^2_i\), the variance of the \(i\)th item's ratings (unique variation). The variability in a set of item scores is considered to consist of the following two components:

  • The error terms are the source of unique variation that each item possesses (i.e., \(\sum_i \sigma^2_i\)).

  • The signal component of variance (the covariance part), which is considered attributable to a common source, namely maturity, is the difference between the total variance \(\sigma^2_y\) and the unique variance \(\sum_i \sigma^2_i\) (i.e., \(\sigma^2_y - \sum_i \sigma^2_i\)).

The ratio of true to observed variance is \(\left(\sigma^2_y - \sum_i \sigma^2_i\right) / \sigma^2_y\). To express the ratio in relative terms, the number of elements in the covariance matrix of a summative rating must be considered. The total number of elements in the covariance matrix is \(n^2\), and the total number of communal elements is \(n^2 - n\). Thus, Cronbach's alpha becomes the following:

$$ \alpha = \frac{n}{n - 1}\left[ 1 - \frac{\sum_i \sigma^2_i}{\sigma^2_y} \right] \quad \text{or} \quad \alpha = \frac{n\,\overline{\rho}}{1 + \overline{\rho}\,(n - 1)}, $$

where \(\overline{\rho}\) is the mean inter-item correlation.
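
For readers who wish to reproduce the calculation, the following sketch implements both forms of the formula. It is illustrative only: the function names, the NumPy dependency, and the assumption that ratings arrive as an assessments-by-items matrix are ours rather than part of the study, and the correlation-based form (the standardized alpha) coincides with the variance-based form exactly only when the items have equal variances.

```python
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """Variance-based alpha for an (assessments x items) matrix of ratings."""
    n = ratings.shape[1]
    item_var = ratings.var(axis=0, ddof=1)        # unique variation of each item
    total_var = ratings.sum(axis=1).var(ddof=1)   # variance of the summative scale y
    return (n / (n - 1)) * (1.0 - item_var.sum() / total_var)

def standardized_alpha(ratings: np.ndarray) -> float:
    """Equivalent form based on the mean inter-item correlation (rho-bar)."""
    n = ratings.shape[1]
    corr = np.corrcoef(ratings, rowvar=False)
    rho_bar = corr[np.triu_indices(n, k=1)].mean()
    return (n * rho_bar) / (1.0 + rho_bar * (n - 1))
```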

Cronbach's alpha is a generalization of the Kuder–Richardson formula 20 (KR20) coefficient, which estimates the reliability of items scored dichotomously as zero or one (Kuder and Richardson 1937). The KR20 is computed as follows:

$$ \text{KR20} = \frac{n}{n - 1}\left[ 1 - \frac{\sum_i p_i\,(1 - p_i)}{\sigma^2_y} \right], $$

where \(n\) is the number of dichotomous items; \(p_i\) is the proportion responding “positively” (i.e., coded to 1) to item \(i\); and \(\sigma^2_y\) is the variance of the total composite. The KR20 has the same interpretation as Cronbach's alpha.
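
A minimal sketch of the KR20 computation under the same assumptions as above (hypothetical function name, NumPy, an assessments-by-items matrix of 0/1 scores); the population variance of the total score is used here so that it matches the \(p_i(1 - p_i)\) terms.

```python
import numpy as np

def kr20(scores: np.ndarray) -> float:
    """KR20 for an (assessments x items) matrix of dichotomous 0/1 scores."""
    n = scores.shape[1]
    p = scores.mean(axis=0)                      # proportion coded 1 for each item
    total_var = scores.sum(axis=1).var(ddof=0)   # variance of the total composite
    return (n / (n - 1)) * (1.0 - (p * (1.0 - p)).sum() / total_var)
```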

To illustrate the computation of Cronbach's alpha coefficient, consider the case of the 676 ratings at ML 2. The five KPAs at ML 2 are Requirements Management, Software Project Planning, Software Project Tracking and Oversight, Software Quality Assurance, and Software Configuration Management. The sample variance of the \(i\)th KPA, \(\sigma^2_i\), is computed as \(\sum_{j=1}^{N} \left(x_{ij} - \overline{x}_i\right)^2 / (N - 1)\), where the \(x_{ij}\) are the codified actual values of the ratings for each assessment \(j\) of the \(i\)th KPA (\(i = 1,\ldots,5\); \(j = 1,\ldots,N\)), \(\overline{x}_i\) is the mean over all assessments of the \(i\)th KPA, and \(N\) is the total number of assessments. For data set 1 in Table 2, the computed sample variance of each KPA is 0.066 (Requirements Management), 0.095 (Software Project Planning), 0.098 (Software Project Tracking and Oversight), 0.106 (Software Quality Assurance), and 0.093 (Software Configuration Management). The sum of the sample variances, \(\sum_{i=1}^{5} \sigma^2_i\), is 0.458, and the sample variance of the sum of the five KPAs, \(\sigma^2_y\), is 1.904. Cronbach's alpha coefficient is therefore 0.95, i.e., (5 / 4) × (1 − 0.458 / 1.904).
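
The arithmetic of this example can be checked directly from the variances reported above (a throwaway sketch; the numbers are simply copied from the text):

```python
item_variances = [0.066, 0.095, 0.098, 0.106, 0.093]  # the five ML 2 KPAs
total_variance = 1.904                                 # variance of the summed KPA ratings
n = len(item_variances)

alpha = (n / (n - 1)) * (1 - sum(item_variances) / total_variance)
print(round(alpha, 2))  # 0.95
```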

Appendix 2

2.1 Convergent and Discriminant Validities

Figure 7 is a redrawing of Fig. 6b that includes the component (factor) loadings in order to explain convergent and discriminant validities, where \(\lambda_{ij}\) is the component loading of an indicator; \(\varepsilon_{ij}\) and \(\delta_{ij}\) denote the measurement errors of indicators (KPAs) of endogenous and exogenous latent variables (constructs), respectively. The notations \(\xi\) and \(\eta\) denote exogenous and endogenous latent variables, respectively. In addition, \(\zeta\) denotes the error term of an endogenous latent variable.

Fig. 7 Full presentation of results for the theoretical model in Fig. 6b

PLS modeling, like other structural equation modeling (SEM) approaches, consists of two components. The first is the inner model (also termed inner relationships, structural model, or substantive theory). For example, the relationship between the latent constructs “Project Implementation” (ML 2) and “Organization Implementation” (ML 3) in Fig. 7 is part of the inner model. The second is the outer model (also referred to as the measurement model or outer relationships), which defines how each block of indicators relates to its latent variable.

Convergent (also referred to as composite reliability) and discriminant validities are components of a measurement concept known as construct validity. These two validities summarize how well indicators relate to the constructs (Gefen and Straub 2005). They can be evaluated within the PLS model.

Convergent validity is the extent to which multiple methods of measuring a construct provide the same results (Campbell and Fiske 1959). Each indicator can be viewed as a different method of measuring the same construct, so the extent of convergence of a set of indicators on a construct can be assessed by showing that each indicator correlates strongly with its assumed theoretical construct. Convergent validity in a given block of indicators is calculated as follows (Werts et al. 1974):

$$ \rho_c = \frac{\left( \sum_i \lambda_{ij} \right)^2}{\left( \sum_i \lambda_{ij} \right)^2 + \sum_i \operatorname{var}\!\left( \varepsilon_{ij} \right)} \quad \text{for construct } j, $$

where \(\varepsilon_{ij}\) is replaced with \(\delta_{ij}\) for blocks of indicators of exogenous latent variables. A low value of \(\rho_c\) indicates poor construct definition and/or multidimensionality (in which case defining more than one construct may be preferable). A recommended value of the coefficient is greater than 0.7 (“modest composite reliability”) (Nunnally and Bernstein 1994).
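
A short sketch of the \(\rho_c\) calculation for one block of indicators. The function name and inputs are ours; when no error variances are supplied, the helper assumes standardized indicators and uses the common convention \(\operatorname{var}(\varepsilon_{ij}) = 1 - \lambda^2_{ij}\).

```python
import numpy as np

def composite_reliability(loadings, error_variances=None):
    """rho_c for one block of indicators (Werts et al. 1974)."""
    lam = np.asarray(loadings, dtype=float)
    if error_variances is None:
        err = 1.0 - lam ** 2                    # convention for standardized indicators
    else:
        err = np.asarray(error_variances, dtype=float)
    return lam.sum() ** 2 / (lam.sum() ** 2 + err.sum())
```

For example, composite_reliability([0.9, 0.85, 0.88]) returns roughly 0.91, above the recommended threshold of 0.7.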

Discriminant validity of a set of constructs can be assessed after the convergent validity of the individual constructs has been established. Discriminant validity measures the extent to which a construct and its indicators differ from another construct and its indicators. Discriminant validity is shown when each indicator correlates weakly, or not significantly, with the other constructs in the same model (i.e., the variance in the indicator mostly reflects the variance attributable to its assumed latent variable and not to other latent variables) (Gefen and Straub 2005). The average variance extracted (AVE) is used to summarize discriminant validity (Fornell and Larcker 1981). It is computed as follows:

$$ \text{AVE} = \frac{\sum_i \lambda^2_{ij}}{\sum_i \lambda^2_{ij} + \sum_i \operatorname{var}\!\left( \varepsilon_{ij} \right)} \quad \text{for construct } j. $$

As a rule of thumb, discriminant validity is established when the square root of each construct's AVE is much larger (although there are no guidelines about how much larger) than the correlation of that construct with any of the other constructs in the model (Chin 1998). Fornell and Larcker (1981) suggested that AVE can also be interpreted as a measure of reliability for the latent variable component score, with a recommended value of 0.5.
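
The AVE and the Fornell–Larcker rule of thumb can be sketched in the same way (again with hypothetical names and the \(1 - \lambda^2_{ij}\) convention when error variances are omitted):

```python
import numpy as np

def average_variance_extracted(loadings, error_variances=None):
    """AVE for one block of indicators (Fornell and Larcker 1981)."""
    lam2 = np.asarray(loadings, dtype=float) ** 2
    err = (1.0 - lam2) if error_variances is None else np.asarray(error_variances, dtype=float)
    return lam2.sum() / (lam2.sum() + err.sum())

def fornell_larcker_ok(ave_per_construct, construct_correlations):
    """True for each construct whose sqrt(AVE) exceeds its correlations with all other constructs."""
    sqrt_ave = np.sqrt(np.asarray(ave_per_construct, dtype=float))
    corr = np.abs(np.asarray(construct_correlations, dtype=float))
    np.fill_diagonal(corr, 0.0)   # ignore self-correlations on the diagonal
    return sqrt_ave > corr.max(axis=1)
```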

Cite this article

Jung, HW., Goldenson, D.R. The internal consistency and precedence of key process areas in the capability maturity model for software. Empir Software Eng 13, 125–146 (2008). https://doi.org/10.1007/s10664-007-9049-1
