The internal consistency and precedence of key process areas in the capability maturity model for software

Abstract

Evaluating the reliability of maturity level (ML) ratings is crucial for providing confidence in the results of software process assessments. This study investigates the dimensions underlying the maturity construct in the Capability Maturity Model (CMM) for Software (SW-CMM) and estimates the internal consistency of each dimension. The results suggest that SW-CMM maturity is a three-dimensional construct, with “Project Implementation” representing the ML 2 key process areas (KPAs), “Organization Implementation” representing the ML 3 KPAs, and “Quantitative Process Implementation” representing the KPAs at MLs 4 and 5. The internal consistency for each of the three dimensions as estimated by Cronbach’s alpha exceeds the recommended value of 0.9. Based on those results, this study builds and tests a theoretical model which posits that the achievement of lower ML KPAs sustains the implementation of higher ML KPAs. Results of path analysis using partial least squares (PLS) support the theoretical model and provide detailed understanding of the process improvement path. The analysis is based on 676 CMM-Based Appraisal for Internal Process Improvement (CBA IPI) assessments.

Notes

  1. Assessment data on the reports was kept in an SEI repository called the Process Appraisal Information System (PAIS). The PAIS has since been replaced. Appraisals now are reported using the SEI Appraisal System (SAS), http://www.sei.cmu.edu/appraisal-program/profile/report-faq.html.

  2. Assessors could intentionally or unintentionally create consistently-biased ratings that are not detected by internal consistency analyses. For example, they could misunderstand parts of the CMM or they could be under pressure to yield a certain level. Interrater agreement is known to partially alleviate this problem. Trochim (2001) describes how to reduce measurement error.

  3. Functional Area Representatives are practitioners who have technical responsibilities in various areas that support their organizations’ software development or maintenance projects (e.g., configuration management or quality assurance). Note that five organizations are missing.

  4. Chin, W.W. 2001. PLS-Graph User’s Guide, Version 3.0. http://www.pubinfo.vcu.edu/carma/Documents/OCT1405/PLSGRAPH3.0Manual.hubona.pdf.

  5. CMMI® is registered in the U.S. Patent and Trademark Office by Carnegie Mellon University.

References

  • Anderson JC, Gerbing DW (1991) Predicting the performance of measures in a confirmatory factor analysis with a pretest assessment of their substantive validities. J Appl Psychol 76(5):732–740

  • Bollinger T, McGowan C (1991) A critical look at software capability evaluation. IEEE Softw 8(4):25–41

  • Brodman JG, Johnson DI (1996) Return on investment from software process improvement as measured by U.S. industry. Crosstalk 9(4): 23–29. http://www.stsc.hill.af.mil/crosstalk/1996/04/index.html

  • Campbell DT, Fiske DW (1959) Convergent and discriminant validation by the multitrait–multimethod matrix. Psychol Bull 56(1):81–105

  • Carmines EG, Zeller RA (1979) Reliability and validity assessment. Sage University paper series on quantitative applications in social sciences. Sage, Newbury Park, CA

  • Chin WW (1998) Issues and opinion on structural equation modeling. MIS Quarterly 22(1):vii–xvi

  • Chin WW, Newsted PR (1999) Structural equation modeling analysis with small samples using partial least squares. In: Hoyle R (ed) Statistical strategies for small sample research. Sage, Thousand Oaks, CA, pp 307–341

  • Chin WW, Marcolin BL, Newsted P (2003) A partial least squares latent variable modeling approach for measuring interaction effects: results from a Monte Carlo simulation study and an electronic-mail emotion/adoption study. Inf Syst Res 14(2):189–217

  • Clark B (1997) The effect of software process maturity on software development effort. Ph.D. Thesis. University of Southern California, Los Angeles, CA

  • Coffman A, Thompson K (1997) Air force software process improvement report. Crosstalk 10(1):25–27. http://www.stsc.hill.af.mil/crosstalk/1997/01/index.html

  • Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Erlbaum, Hillsdale, NJ

  • Comrey A (1973) A first course in factor analysis. Academic, London

  • Cronbach L (1951) Coefficient alpha and the internal structure of tests. Psychometrika 16(3):297–334

  • Curtis B (1996) Factor structure of the CMM and other latent issues. Proceedings of the 1996 SEPG Conference, Atlantic City, NJ, USA

  • Dunaway D, Baker M (2001) Analysis of CMM-based appraisal for internal process improvement (CBA IPI) assessment feedback. Technical report CMU/SEI-2001-TR-021. Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA. http://www.sei.cmu.edu/publications/documents/01.reports/01tr021.html

  • El-Emam K (1998) The internal consistency of the ISO/IEC 15504 software process capability scale. Proceedings of the 5th International Symposium on Software Metrics, Los Alamitos, CA, USA, pp 72–81

  • El-Emam K, Goldenson D (1995) SPICE: an empiricist’s perspective. Proceedings of the Second IEEE International Software Engineering Standards Symposium, Los Alamitos, CA, USA, pp 84–97

  • El-Emam K, Goldenson D (2000) An empirical review of software process assessments. Adv Comput 5(3):319–423

  • El-Emam K, Madhavji N (1995) The reliability of measuring organizational maturity. Softw Process Improv Pract 1(1):3–25

  • El-Emam K, Simon J-M, Rousseau S, Jacquet E (1998) Cost implications of interrater agreement for software process assessment. Proceedings of the 5th International Symposium on Software Metrics, Los Alamitos, CA, USA, pp 38–51

  • Fayad M, Laitinen M (1997) Process assessment considered wasteful. Commun ACM 40(11):125–128

  • Fornell C, Larcker DF (1981) Evaluating structural equation models with unobservable variables and measurement errors. J Mark Res 18(1):39–50

  • Fusaro P, El-Emam K, Smith B (1998) The internal consistencies of the 1987 SEI maturity questionnaire and the SPICE capability dimension. Empirical Software Engineering 3(2):179–210

  • Gefen D, Straub D (2005) A practical guide to factorial validity using PLS-Graph: tutorial and annotated example. Communications of the Association for Information Systems 16:91–109

  • Goldenson D, El-Emam K (2000) What should you measure first? Lessons learned from the software CMM. Software Engineering Symposium, September 2000

  • Gray E, Smith W (1998) On the limitations of software process assessment and the recognition of a required re-orientation for global process improvement. Softw Qual J 7(1):21–34

  • Hattie J (1985) Methodology review: assessing unidimensionality of tests and items. Appl Psychol Meas 9:139–164

  • Herbsleb J, Zubrow D, Goldenson D, Hayes W, Paulk M (1997) Software quality and the capability maturity model. Commun ACM 40(6):30–40

  • Humphrey W, Curtis B (1991) Comments on ‘A Critical Look’. IEEE Softw 8(4):42–46

  • Igbaria M, Zinatelli N, Cragg P, Cavaye A (1997) Personal computing acceptance factors in small firms: a structural equation model. MIS Quarterly 21(3):279–302

  • ISO/IEC (1996) ISO/IEC PDTR 15504: Information technology—software process assessment, parts 1–9. ISO/IEC JTC1/SC7/WG10

  • Jöreskog KG, Sörbom D (1993) LISREL 8: structural equation modeling with the SIMPLIS command language. SSI Scientific Software International, Chicago

  • Jung H-W, Hunter R (2003) Evaluating the SPICE rating scale with regard to the internal consistency of capability measures. Softw Process Improv Pract 8(3):169–178

  • Jung H-W, Hunter R, Goldenson D, El-Emam K (2001) Findings from phase 2 of the SPICE trials. Softw Process Improv Pract 6(2):205–242

  • Kotrlik J, Williams H (2003) The incorporation of effect size in information technology, learning, and performance research. Inf Technol Learn Perform J 21(1):1–7

  • Krishnan MS, Kellner MI (1999) Measuring process consistency: implications for reducing software defects. IEEE Trans Softw Eng 25(6):800–815

  • Kuder GF, Richardson MW (1937) The theory of the estimation of test reliability. Psychometrika 2(3):151–160

  • Kwok WC, Sharp DJ (1998) A review of construct measurement issues in behavior accounting research. J Account Lit 17:137–174

  • Likert R, Roslow S (1934) The effects upon the reliability of attitude scales of using three, five, seven alternatives. Working paper, New York University, New York

  • Lissitz RW, Green SB (1975) Effects of the number of scale points on reliability: a Monte Carlo approach. J Appl Psychol 60(1):10–13

  • Marcoulides GA, Saunders C (2006) PLS: a silver bullet? MIS Quarterly 30(2):iii–x

  • McIver JP, Carmines GE (1981) Unidimensional scaling. Sage University paper series on quantitative applications in social sciences. Sage, Newbury Park, CA

  • Nunnally JC, Bernstein IH (1994) Psychometric theory, 3rd edn. McGraw-Hill, New York

  • Paulk M, Weber C, Curtis B, Chrissis MB (1994) The capability maturity model: guidelines for improving the software process. Addison-Wesley, New York

  • Saiedian H, Kuzara R (1995) SEI capability maturity models’ impact on contractors. Computer 28(1):16–26

  • SEI (2006) CMMI® for Development (CMMI-DEV), Version 1.2, Technical Report, CMU/SEI-2006-TR-008, Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA

  • Sharma S (1996) Applied multivariate techniques. Wiley, New York

  • Spector PF (1992) Summated rating scale construction: an introduction. Sage University paper series on quantitative applications in social sciences. Sage, Newbury Park, CA

  • Trochim WM (2001) The research methods knowledge base, 2nd edn. Atomic Dog Publishing, Cincinnati, OH

  • Van de Ven AH, Ferry DL (1980) Measuring and assessing organizations. Wiley, New York

  • Werts CE, Linn RL, Jöreskog KG (1974) Intraclass reliability estimates: testing structural assumptions. Educ Psychol Meas 34:25–33

  • Wold H (1982) Soft modeling: intermediate between traditional model building and data analysis. Mathematical Statistics 6:333–346

  • Zeller RA, Carmines EG (1980) Measurement in the social sciences: the link between theory and data. Cambridge University Press, Cambridge

Acknowledgements

The authors wish to acknowledge the assessors, sponsors, and others who participated in the assessments of the SW-CMM. This work would not have been possible without the information that they regularly provide to the SEI. Thanks to Mike Zuccher, Kenny Smith, and Xiaobo Zhou for their support in extracting the data on which the study is based. The authors would also like to thank Sheila Rosenthal for her expert support with the bibliography, and Lauren Heinz for helping improve the readability of the document. The authors also thank their SEI colleagues, Will Hayes, Mike Konrad, Keith Kost, Steve Masters, Jim McCurley, Mark Paulk, Mike Phillips, and Dave Zubrow. Thanks also to Khaled El-Emam, Robin Hunter, and Hyung-Min Park for their valuable comments on earlier drafts. Many thanks to the anonymous referees for their valuable comments and suggestions, which improved the presentation of the paper. This study was supported by a Korea University Grant (2006). This support is gratefully acknowledged.

Author information

Corresponding author

Correspondence to Ho-Won Jung.

Appendices

Appendix 1

1.1 Estimating Cronbach’s Alpha

In discussing the reliability of measurements, a set of items (indicators) is posited to reflect an underlying construct. In the SW-CMM, maturity, which is neither directly measurable nor observable, can be measured indirectly through the assessed values of the KPAs. In effect, the SW-CMM uses an 18-item (or KPA) instrument to measure the process maturity of organizations. If the necessary data were readily available, a 52-item (goal) instrument could also be used (the SW-CMM includes 52 goals, with 20, 17, 6, and 9 goals in MLs 2, 3, 4, and 5, respectively).

The type of scale used in most assessment instruments is a summative one (Spector 1992). A summative scale means that the individual ratings \(x_i\) for item \(i\) are summed to produce an overall rating score, i.e., \(y = \sum_{i=1}^{n} x_i\), where \(n\) denotes the number of items. One property of the covariance matrix for a summative rating is that the sum of all terms in the matrix equals the variance of the scale as a whole, i.e., \(\sigma^2_y = \sum_{(i,j)} \sigma_{ij}\), where \(i\) and \(j\) are the indices of \(x_i\) and \(x_j\); if \(i = j\), then \(\sigma_{ij} = \sigma^2_i\), the variance of the \(i\)th item's ratings (unique variation). The variability in a set of item scores is considered to consist of the following two components:

  • The error terms are the source of unique variation that each item possesses (i.e., \(\sum_i \sigma^2_i\)).

  • The signal component of variance (the covariance part), which is considered attributable to a common source, namely maturity, is the difference between the total variance \(\sigma^2_y\) and the unique variance \(\sum_i \sigma^2_i\) (i.e., \(\sigma^2_y - \sum_i \sigma^2_i\)).

The ratio of true to observed variance is \(\left(\sigma^2_y - \sum_i \sigma^2_i\right) / \sigma^2_y\). To express the ratio in relative terms, the number of elements in the covariance matrix of a summative rating must be considered. The total number of elements in the covariance matrix is \(n^2\), and the total number of communal elements is \(n^2 - n\). Thus, Cronbach's alpha becomes the following:

$$ \alpha = \frac{n}{n - 1}\left[ 1 - \frac{\sum_i \sigma^2_i}{\sigma^2_y} \right] \quad \text{or} \quad \alpha = \frac{n\,\overline{\rho}}{1 + \overline{\rho}\,(n - 1)}, $$

where \(\overline{\rho}\) is the mean inter-item correlation.
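
For readers who wish to reproduce the calculation, the following sketch implements both forms of the formula. It is illustrative only: the function names, the NumPy dependency, and the assumption that ratings arrive as an assessments-by-items matrix are ours rather than part of the study, and the correlation-based form (the standardized alpha) coincides with the variance-based form exactly only when the items have equal variances.

```python
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """Variance-based alpha for an (assessments x items) matrix of ratings."""
    n = ratings.shape[1]
    item_var = ratings.var(axis=0, ddof=1)        # unique variation of each item
    total_var = ratings.sum(axis=1).var(ddof=1)   # variance of the summative scale y
    return (n / (n - 1)) * (1.0 - item_var.sum() / total_var)

def standardized_alpha(ratings: np.ndarray) -> float:
    """Equivalent form based on the mean inter-item correlation (rho-bar)."""
    n = ratings.shape[1]
    corr = np.corrcoef(ratings, rowvar=False)
    rho_bar = corr[np.triu_indices(n, k=1)].mean()
    return (n * rho_bar) / (1.0 + rho_bar * (n - 1))
```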

Cronbach's alpha is a generalization of the Kuder–Richardson formula 20 (KR20) coefficient, which estimates the reliability of items scored dichotomously as zero or one (Kuder and Richardson 1937). The KR20 is computed as follows:

$$ \text{KR20} = \frac{n}{n - 1}\left[ 1 - \frac{\sum_i p_i\,(1 - p_i)}{\sigma^2_y} \right], $$

where \(n\) is the number of dichotomous items; \(p_i\) is the proportion responding “positively” (i.e., coded to 1) to item \(i\); and \(\sigma^2_y\) is the variance of the total composite. The KR20 has the same interpretation as Cronbach's alpha.
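
A minimal sketch of the KR20 computation under the same assumptions as above (hypothetical function name, NumPy, an assessments-by-items matrix of 0/1 scores); the population variance of the total score is used here so that it matches the \(p_i(1 - p_i)\) terms.

```python
import numpy as np

def kr20(scores: np.ndarray) -> float:
    """KR20 for an (assessments x items) matrix of dichotomous 0/1 scores."""
    n = scores.shape[1]
    p = scores.mean(axis=0)                      # proportion coded 1 for each item
    total_var = scores.sum(axis=1).var(ddof=0)   # variance of the total composite
    return (n / (n - 1)) * (1.0 - (p * (1.0 - p)).sum() / total_var)
```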

To illustrate the computation of Cronbach's alpha coefficient, consider the case of the 676 ratings at ML 2. The five KPAs at ML 2 are Requirements Management, Software Project Planning, Software Project Tracking and Oversight, Software Quality Assurance, and Software Configuration Management. The sample variance of the \(i\)th KPA, \(\sigma^2_i\), is computed as \(\sum_{j=1}^{N} \left(x_{ij} - \overline{x}_i\right)^2 / (N - 1)\), where the \(x_{ij}\) are the codified actual values of the ratings for each assessment \(j\) of the \(i\)th KPA (\(i = 1,\ldots,5\); \(j = 1,\ldots,N\)), \(\overline{x}_i\) is the mean over all assessments of the \(i\)th KPA, and \(N\) is the total number of assessments. For data set 1 in Table 2, the computed sample variance of each KPA is 0.066 (Requirements Management), 0.095 (Software Project Planning), 0.098 (Software Project Tracking and Oversight), 0.106 (Software Quality Assurance), and 0.093 (Software Configuration Management). The sum of the sample variances, \(\sum_{i=1}^{5} \sigma^2_i\), is 0.458, and the sample variance of the sum of the five KPAs, \(\sigma^2_y\), is 1.904. Cronbach's alpha coefficient is therefore 0.95, i.e., (5 / 4) × (1 − 0.458 / 1.904).
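
The arithmetic of this example can be checked directly from the variances reported above (a throwaway sketch; the numbers are simply copied from the text):

```python
item_variances = [0.066, 0.095, 0.098, 0.106, 0.093]  # the five ML 2 KPAs
total_variance = 1.904                                 # variance of the summed KPA ratings
n = len(item_variances)

alpha = (n / (n - 1)) * (1 - sum(item_variances) / total_variance)
print(round(alpha, 2))  # 0.95
```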

Appendix 2

2.1 Convergent and Discriminant Validities

Figure 7 is a redrawing of Fig. 6b that includes the component (factor) loadings in order to explain convergent and discriminant validities, where \(\lambda_{ij}\) is the component loading of an indicator; \(\varepsilon_{ij}\) and \(\delta_{ij}\) denote the measurement errors of indicators (KPAs) of endogenous and exogenous latent variables (constructs), respectively. The notations \(\xi\) and \(\eta\) denote exogenous and endogenous latent variables, respectively. In addition, \(\zeta\) denotes the error term of an endogenous latent variable.

Fig. 7 Full presentation of results for the theoretical model in Fig. 6b

PLS modeling, like other structural equation modeling (SEM) approaches, consists of two components. The first is the inner model (also termed inner relationships, structural model, or substantive theory). For example, the relationship between the latent constructs “Project Implementation” (ML 2) and “Organization Implementation” (ML 3) in Fig. 7 is part of the inner model. The second is the outer model (also referred to as the measurement model or outer relationships), which defines how each block of indicators relates to its latent variable.

Convergent (also referred to as composite reliability) and discriminant validities are components of a measurement concept known as construct validity. These two validities summarize how well indicators relate to the constructs (Gefen and Straub 2005). They can be evaluated within the PLS model.

Convergent validity is the extent to which multiple methods of measuring a construct provide the same results (Campbell and Fiske 1959). Each indicator can be viewed as a different method of measuring the same construct, so the extent of convergence of a set of indicators on a construct can be assessed by showing that each indicator correlates strongly with its assumed theoretical construct. Convergent validity in a given block of indicators is calculated as follows (Werts et al. 1974):

$$ \rho_c = \frac{\left( \sum_i \lambda_{ij} \right)^2}{\left( \sum_i \lambda_{ij} \right)^2 + \sum_i \operatorname{var}\!\left( \varepsilon_{ij} \right)} \quad \text{for construct } j, $$

where \(\varepsilon_{ij}\) is replaced with \(\delta_{ij}\) for blocks of indicators of exogenous latent variables. A low value of \(\rho_c\) indicates poor construct definition and/or multidimensionality (in which case defining more than one construct may be preferable). A recommended value of the coefficient is greater than 0.7 (“modest composite reliability”) (Nunnally and Bernstein 1994).
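
A short sketch of the \(\rho_c\) calculation for one block of indicators. The function name and inputs are ours; when no error variances are supplied, the helper assumes standardized indicators and uses the common convention \(\operatorname{var}(\varepsilon_{ij}) = 1 - \lambda^2_{ij}\).

```python
import numpy as np

def composite_reliability(loadings, error_variances=None):
    """rho_c for one block of indicators (Werts et al. 1974)."""
    lam = np.asarray(loadings, dtype=float)
    if error_variances is None:
        err = 1.0 - lam ** 2                    # convention for standardized indicators
    else:
        err = np.asarray(error_variances, dtype=float)
    return lam.sum() ** 2 / (lam.sum() ** 2 + err.sum())
```

For example, composite_reliability([0.9, 0.85, 0.88]) returns roughly 0.91, above the recommended threshold of 0.7.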

Discriminant validity of a set of constructs can be assessed after the convergent validity of the individual constructs has been established. Discriminant validity measures the extent to which a construct and its indicators differ from another construct and its indicators. Discriminant validity is shown when each indicator correlates weakly, or not significantly, with the other constructs in the same model (i.e., the variance in the indicator mostly reflects the variance attributable to its assumed latent variable and not to other latent variables) (Gefen and Straub 2005). The average variance extracted (AVE) is used to summarize discriminant validity (Fornell and Larcker 1981). It is computed as follows:

$$ \text{AVE} = \frac{\sum_i \lambda^2_{ij}}{\sum_i \lambda^2_{ij} + \sum_i \operatorname{var}\!\left( \varepsilon_{ij} \right)} \quad \text{for construct } j. $$

As a rule of thumb, discriminant validity is established when the square root of each construct's AVE is much larger (although there are no guidelines about how much larger) than the correlation of that construct with any of the other constructs in the model (Chin 1998). Fornell and Larcker (1981) suggested that AVE can also be interpreted as a measure of reliability for the latent variable component score, with a recommended value of 0.5.
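
The AVE and the Fornell–Larcker rule of thumb can be sketched in the same way (again with hypothetical names and the \(1 - \lambda^2_{ij}\) convention when error variances are omitted):

```python
import numpy as np

def average_variance_extracted(loadings, error_variances=None):
    """AVE for one block of indicators (Fornell and Larcker 1981)."""
    lam2 = np.asarray(loadings, dtype=float) ** 2
    err = (1.0 - lam2) if error_variances is None else np.asarray(error_variances, dtype=float)
    return lam2.sum() / (lam2.sum() + err.sum())

def fornell_larcker_ok(ave_per_construct, construct_correlations):
    """True for each construct whose sqrt(AVE) exceeds its correlations with all other constructs."""
    sqrt_ave = np.sqrt(np.asarray(ave_per_construct, dtype=float))
    corr = np.abs(np.asarray(construct_correlations, dtype=float))
    np.fill_diagonal(corr, 0.0)   # ignore self-correlations on the diagonal
    return sqrt_ave > corr.max(axis=1)
```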

Cite this article

Jung, HW., Goldenson, D.R. The internal consistency and precedence of key process areas in the capability maturity model for software. Empir Software Eng 13, 125–146 (2008). https://doi.org/10.1007/s10664-007-9049-1
