Data Mining of Software Development Databases

Software Quality Journal

Abstract

Software quality models can predict which modules will have high risk, enabling developers to target enhancement activities to the most problematic modules. However, many find collection of the underlying software product and process metrics a daunting task.

Many software development organizations routinely use very large databases for project management, configuration management, and problem reporting, which record data on events during development. These large databases can be an unintrusive source of data for software quality modeling. However, multiplied by many releases of a legacy system or a broad product line, the amount of data can overwhelm manual analysis. The field of data mining is developing ways to find valuable bits of information in very large databases. This aptly describes our software quality modeling situation.

This paper presents a case study that applied data mining techniques to the configuration management and problem reporting databases of a very large legacy telecommunications software system in order to build software quality models. The case study illustrates how useful models can be built and applied without interfering with development.


Khoshgoftaar, T.M., Allen, E.B., Jones, W.D. et al. Data Mining of Software Development Databases. Software Quality Journal 9, 161–176 (2001). https://doi.org/10.1023/A:1013349419545
