Allen, M., and Yen, W. 1979. Introduction to Measurement Theory. Brooks/Cole Publishing Company.
Altman, D. 1991. Practical Statistics for Medical Research. Chapman and Hall.
Armitage, P., and Berry, G. 1994. Statistical Methods in Medical Research. Blackwell Science.
Bennett, E., Alpert, R., and Goldstein, A. 1954. Communications through limited response questioning. Public Opinion Quarterly 18: 303–308.
Bicego, A., Khurana, M., and Kuvaja, P. 1998. Bootstrap 3.0: Software process assessment methodology. Proceedings of SQM'98.
Briand, L., El Emam, K., Laitenberger, O., and Fussbroich, T. 1998. Using simulation to build inspection efficiency benchmarks for development projects. Proceedings of the International Conference on Software Engineering, pp. 340–349.
Briand, L., El Emam, K., and Wieczorek, I. 1998. A case study in productivity benchmarking: Methods and lessons learned. Proceedings of the 9th European Software Control and Metrics Conference. Shaker Publishing B.V., The Netherlands, pp. 4–14.
Camp, R. 1989. Benchmarking: The Search for Industry Best Practices that Lead to Superior Performance. ASQC Quality Press.
Cicchetti, D. 1972. A new measure of agreement between rank order variables. Proceedings of the American Psychological Association 7: 17–18.
Cohen, J. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20(1): 37–46.
Cohen, J. 1968. Weighted kappa: Nominal scale agreement with provision for scaled agreement or partial credit. Psychological Bulletin 70: 213–220.
Cohen, J. 1988. Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates.
El Emam, K. 1998. The internal consistency of the ISO/IEC 15504 software process capability scale. To appear in Proceedings of the 5th International Symposium on Software Metrics. IEEE CS Press.
El Emam, K., and Goldenson, D. R. 1995. SPICE: An empiricist's perspective. Proceedings of the Second IEEE International Software Engineering Standards Symposium, pp. 84–97.
El Emam, K., and Madhavji, N. H. 1995. The reliability of measuring organizational maturity. Software Process Improvement and Practice Journal 1(1): 3–25.
El Emam, K., Briand, L., and Smith, B. 1996. Assessor agreement in rating SPICE processes. Software Process Improvement and Practice Journal 2(4): 291–306.
El Emam, K., and Goldenson, D. R. 1996. An empirical evaluation of the prospective international SPICE standard. Software Process Improvement and Practice Journal 2(2): 123–148.
El Emam, K., Goldenson, D., Briand, L., and Marshall, P. 1996. Interrater agreement in SPICE-based assessments: Some preliminary results. Proceedings of the International Conference on the Software Process, pp. 149–156.
El Emam, K., Smith, B., and Fusaro, P. 1997. Modeling the reliability of SPICE based assessments. Proceedings of the Third IEEE International Software Engineering Standards Symposium, pp. 69–82.
El Emam, K., Drouin, J-N, and Melo, W. (eds.) 1998. SPICE: The Theory and Practice of Software Process Improvement and Capability Determination. IEEE CS Press.
El Emam, K., and Marshall, P. 1998. Interrater agreement in assessment ratings. El Emam, K., Drouin, J-N, and Melo, W. (eds.) SPICE: The Theory and Practice of Software Process Improvement and Capability Determination. IEEE CS Press.
El Emam, K., Simon, J-M, Rousseau, S., and Jacquet, E. 1998. Cost implications of interrater agreement for software process assessments. To appear in Proceedings of the 5th International Symposium on Software Metrics. IEEE CS Press.
El Emam, K., and Wieczorek, I. 1998. The repeatability of code defect classifications. To appear in Proceedings of the International Symposium on Software Reliability Engineering. IEEE CS Press.
Everitt, B. 1992. The Analysis of Contingency Tables. Chapman and Hall.
Fleiss, J. 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin 76(5): 378–382.
Fleiss, J. 1981. Statistical Methods for Rates and Proportions. John Wiley & Sons.
Fleiss, J., and Cohen, J. 1973. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement 33: 613–619.
Fleiss, J., Cohen, J., and Everitt, B. 1969. Large sample standard errors of kappa and weighted kappa. Psychological Bulletin 72(5): 323–327.
Fusaro, P., El Emam, K., and Smith, B. 1997a. Evaluating the interrater agreement of process capability ratings. Proceedings of the Fourth International Software Metrics Symposium, pp. 2–11.
Fusaro, P., El Emam, K., and Smith, B. 1997b. The internal consistencies of the 1987 SEI maturity questionnaire and the SPICE capability dimension. Empirical Software Engineering: An International Journal 3: 179–201, Kluwer Academic Publishers.
Gordis, L. 1996. Epidemiology. W. B. Saunders.
Hartmann, D. 1977. Considerations in the choice of interobserver reliability estimates. Journal of Applied Behavior Analysis 10(1): 103–116.
Henkel, E. 1976. Tests of Significance. Sage Publications.
Landis, J., and Koch, G. 1977. The measurement of observer agreement for categorical data. Biometrics 33: 159–174.
Lindsay, R., and Ehrenberg, A. 1993. The design of replicated studies. The American Statistician 47(3): 217–228.
Lyman, H. 1963. Test Scores and What They Mean. Prentice-Hall.
Maclennan, F., Ostrolenk, G., and Tobin, M. 1998. Introduction to the SPICE trials. El Emam, K., Drouin, J-N, and Melo, W. (eds.) SPICE: The Theory and Practice of Software Process Improvement and Capability Determination. IEEE CS Press.
Rout, T., and Simms, P. 1998. Introduction to the SPICE documents and architecture. El Emam, K., Drouin, J-N, and Melo, W. (eds.) SPICE: The Theory and Practice of Software Process Improvement and Capability Determination. IEEE CS Press.
Scott, W. 1955. Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly 19: 321–325.
Simon, J-M, El Emam, K., Rousseau, S., Jacquet, E., and Babey, F. 1997. The reliability of ISO/IEC PDTR 15504 assessments. Software Process Improvement and Practice Journal 3: 177–188.
Software Engineering Institute 1998. CMMI A Specification Version 1.1. Available at http://www.sei.cmu.edu/activities/cmm/cmmi/specs/aspec1.1.html, 23rd April.
Squires, B. 1990. Statistics in biomedical manuscripts: What editors want from authors and peer reviewers. Canadian Medical Association Journal 142(3): 213–214.
Suen, H., and Lee, P. 1985. The effects of the use of percentage agreement on behavioral observation reliabilities: A reassessment. Journal of Psychopathology and Behavioral Assessment 7(3): 221–234.
Umesh, U., Peterson, R., and Sauber, M. 1989. Interjudge agreement and the maximum value of kappa. Educational and Psychological Measurement 49: 835–850.
Woodman, I., and Hunter, R. 1996. Analysis of assessment data from phase 1 of the SPICE trials. IEEE TCSE Software Process Newsletter, No. 6, Spring 1996 (available at http://www-se.cs.mcgill.ca/process/spn.html).
Zeisel, H. 1955. The significance of insignificant differences. Public Opinion Quarterly 19: 319–321.
Zwick, R. 1988. Another look at interrater agreement. Psychological Bulletin 103(3): 374–378.