Abstract
Software quality models can predict which modules will have high risk, enabling developers to target enhancement activities to the most problematic modules. However, many find collection of the underlying software product and process metrics a daunting task.
Many software development organizations routinely use very large databases for project management, configuration management, and problem reporting, which record data on events during development. These large databases can be an unintrusive source of data for software quality modeling. However, when multiplied by the many releases of a legacy system or a broad product line, the amount of data can overwhelm manual analysis. The field of data mining is developing ways to find valuable bits of information in very large databases. This aptly describes our software quality modeling situation.
This paper presents a case study that applied data mining techniques to the configuration management and problem reporting databases of a very large legacy telecommunications software system in order to build software quality models. The case study illustrates how useful models can be built and applied without interfering with development.
Cite this article
Khoshgoftaar, T.M., Allen, E.B., Jones, W.D. et al. Data Mining of Software Development Databases. Software Quality Journal 9, 161–176 (2001). https://doi.org/10.1023/A:1013349419545