The Fallacy of Shotgun Correlations for Software Measures

  • Conference paper
Computing Science and Statistics

Abstract

Many software measures have been put forward on the simple basis of a high linear correlation coefficient with some measurable quantity. The linear correlation coefficient is an unreliable statistic for deciding whether an observed correlation indicates significant association. Several published software measure experiments collected upwards of 20 different measurements or had fourteen or fewer observations. With many measurements taken from small samples, the probability of “discovering” a “significant” correlation is high. We present a computer simulation experiment in which the correlation between sets of randomly generated numbers is calculated. We also look at randomly generated numbers in the ranges that would be expected in Halstead’s Software Science measures. Our results show that the average maximum linear correlation for randomly generated numbers is 0.70 or higher if the sample size is low compared to the number of variables. Alternative statistical approaches for obtaining meaningful, significant results are presented.
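The kind of simulation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' program: the uniform distribution on [0, 1), the number of trials, the seed, and all function names are assumptions made here. Each trial fills `n_vars` columns with `n_obs` independent random numbers, computes the linear correlation coefficient for every pair of columns, and records the largest absolute value; the result is the average of that maximum over all trials.

```python
import itertools
import math
import random


def pearson_r(xs, ys):
    """Linear (Pearson) correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def max_abs_correlation(n_obs, n_vars, rng):
    """Largest |r| over all pairs of independently generated columns."""
    # Assumption: values are uniform on [0, 1); the source only says
    # "randomly generated numbers".
    columns = [[rng.random() for _ in range(n_obs)] for _ in range(n_vars)]
    return max(
        abs(pearson_r(a, b)) for a, b in itertools.combinations(columns, 2)
    )


def average_max_correlation(n_obs, n_vars, trials=200, seed=0):
    """Average, over `trials` random data sets, of the maximum |r|."""
    rng = random.Random(seed)
    total = sum(max_abs_correlation(n_obs, n_vars, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Few observations and many variables versus many observations.
    print(average_max_correlation(n_obs=14, n_vars=20))
    print(average_max_correlation(n_obs=100, n_vars=20))
```

With 20 variables there are 190 distinct pairs, so even though the columns are unrelated by construction, a small sample gives the maximum many chances to be large. Comparing the two printed values shows how the average maximum shrinks as the number of observations grows.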

Bibliography

  1. Albrecht, Allan J., and John E. Gaffney, Jr. “Software Function, Source Lines of Code and Development Effort Prediction: A Software Science Validation.” IEEE Transactions on Software Engineering. Vol. SE-9 (1983) pp. 639–648.

  2. Baker, Albert L., James M. Bieman, David A. Gustafson, and Austin C. Melton. “Modeling and Measuring the Software Development Process.” Proc. of the Twentieth Hawaii International Conference on Systems Sciences. (January 1987) pp. 23–30.

  3. Basili, Victor R., and Richard W. Reiter, and Tsai-Yun Phillips. “Metric Analysis and Data Validation Across FORTRAN Projects.” IEEE Transactions on Software Engineering. Vol. SE-9 (1983) pp. 652–663.

  4. Card, David N., and William W. Agresti. “Measuring Software Design Complexity.” Journal of Systems and Software. Vol. 8 (1988) pp. 185–197.

  5. Courtney, Richard E. and David A. Gustafson. Preliminary Study of Shotgun Correlations in Software Measures. Tech Report TR-CS-90-9 Department of Computing and Information Sciences, Kansas State University, Manhattan, KS. (1990).

  6. Edwards, William R., Chi-Ming Chung, and Ming-Gaey Yang. “A Study of Data Flow and Testing-Specific Metrics.” Proc. of 11th Minnowbrook Workshop on Software Reliability. (1988).

  7. Halstead, M.H. Elements of Software Science. New York: North-Holland (Elsevier Computer Science Library), 1977.

  8. Hwang, Chern-Hwang. “An Empirical Investigation of Halstead’s Software Length Formula.” Masters Report; Kansas State University (1988).

  9. Kearney, Joseph K., Robert L. Sedlmeyer, William B. Thompson, Michael A. Gray, and Michael A. Adler. “Software Complexity Measurement.” Communications of the ACM. Vol. 29 (November 1986) pp. 1044–1050.

  10. Kitchenham, Barbara A. and N. R. Taylor. “Software Project Development Cost Estimation.” Journal of Systems and Software. Vol. 5 (1985) pp. 267–278.

  11. McCabe, T.J. “A Complexity Measure.” IEEE Transactions on Software Engineering. Vol. SE-2 (1976) pp. 308–320.

  12. van der Poel, Klaas G., and Stephen R. Schach. “A Software Metric for Cost Estimation and Efficiency Measurement in Data Processing System Development.” Journal of Systems and Software. Vol. 3 (1983) pp. 187–191.

  13. Press, Flannery, Teukolsky and Vetterling. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press (1988).

  14. Takahasi, Mueno and Yuji Kamayachi. “An Empirical Study of a Model for Program Error Prediction.” IEEE Transactions on Software Engineering. Vol. SE-15 (1989) pp. 82–86.

Copyright information

© 1992 Springer-Verlag New York, Inc.

About this paper

Cite this paper

Courtney, R.E., Gustafson, D.A. (1992). The Fallacy of Shotgun Correlations for Software Measures. In: Page, C., LePage, R. (eds) Computing Science and Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2856-1_45

  • DOI: https://doi.org/10.1007/978-1-4612-2856-1_45

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-0-387-97719-5

  • Online ISBN: 978-1-4612-2856-1

  • eBook Packages: Springer Book Archive
