On the proposal and evaluation of a benchmark-based threshold derivation method

Vale, Gustavo; Fernandes, Eduardo; Figueiredo, Eduardo

doi:10.1007/s11219-018-9405-y

Published: 01 May 2018

Volume 27, pages 275–306, (2019)
Software Quality Journal

Abstract

Software-intensive systems have been growing in both size and complexity. Consequently, developers need better support for measuring and controlling software quality. In this context, software metrics aim at quantifying different software quality aspects. However, the effectiveness of measurement depends on the definition of reliable metric thresholds, i.e., numbers that characterize a metric value as critical given a quality aspect. Without proper metric thresholds, developers may struggle, for instance, to identify problematic software components for correction. Based on a literature review, we have found several existing methods for deriving metric thresholds and observed their evolution. Such evolution motivated us to propose a new method that incorporates the best of the existing methods. In this paper, we propose a novel benchmark-based method for deriving metric thresholds. We assess our method, called Vale’s method, by using metric thresholds derived with it to compose detection strategies for two well-known code smells, namely god class and lazy class. For this purpose, we analyze three benchmarks composed of multiple software product lines. In addition, we demonstrate our method in practice by applying it to a benchmark composed of 103 Java open-source software systems. In the evaluation, we compare Vale’s method to two state-of-the-practice threshold derivation methods selected as a baseline, which are Lanza’s method and Alves’ method. Our results suggest that the proposed method provides more realistic and reliable thresholds, with better recall and precision in the code smell detection, when compared to both baseline methods.
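
The evaluation idea in the abstract (thresholds feed a detection strategy, whose output is scored by precision and recall) can be illustrated with a minimal Python sketch. This is not Vale's method and does not derive thresholds from a benchmark: the metric choice (LOC and WMC), the threshold values, the class names, and the reference list of smelly classes are all hypothetical and serve only to show how the two scores are computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassMetrics:
    name: str
    loc: int  # lines of code
    wmc: int  # weighted methods per class


# Hypothetical thresholds, chosen by hand for illustration only.
LOC_HIGH = 500
WMC_HIGH = 40


def is_god_class_candidate(m: ClassMetrics) -> bool:
    """Toy detection strategy: flag classes that exceed both thresholds."""
    return m.loc > LOC_HIGH and m.wmc > WMC_HIGH


def precision_recall(detected: set, reference: set) -> tuple:
    """Precision = TP / detected, recall = TP / reference (0.0 if empty)."""
    true_positives = len(detected & reference)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Made-up measurements and a made-up reference list of god classes.
classes = [
    ClassMetrics("OrderManager", loc=900, wmc=75),
    ClassMetrics("ReportBuilder", loc=620, wmc=45),
    ClassMetrics("Invoice", loc=120, wmc=9),
    ClassMetrics("LegacyGateway", loc=480, wmc=60),
]
reference = {"OrderManager", "LegacyGateway"}

detected = {c.name for c in classes if is_god_class_candidate(c)}
precision, recall = precision_recall(detected, reference)
print(sorted(detected), precision, recall)
```

In this toy data, one flagged class is a false positive and one reference class is missed because its LOC sits just under the threshold, which is the kind of sensitivity to threshold values that the abstract's comparison of derivation methods is concerned with.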

Notes

www.qualitascorpus.com

References

Abilio, R., Vale, G., Oliveira, J., Figueiredo, E., Costa, H. (2014). Code smell detection tool for compositional-based software product lines. In Proceedings of the 5th Brazilian Conference on Software: Theory and Practice (CBSoft), Tools Session (pp. 109–116).
Abilio, R., Padilha, J., Figueiredo, E., Costa, H. (2015). Detecting code smells in software product lines: An exploratory study. In Proceedings of the 12th International Conference on Information Technology: New Generations (ITNG) (pp. 433–438).
Abilio, R., Vale, G., Figueiredo, E., Costa, H. (2016). Metrics for feature-oriented programming. In Proceedings of 7th International Workshop on Emerging Trends in Software Metrics (WETSoM) (pp. 36–42).
Alves, T., Ypma, C., Visser, J. (2010). Deriving metric thresholds from benchmark data. In Proceedings of the 26th International Conference on Software Maintenance (ICSM) (pp. 1–10).
Apel, S., Kästner, C., Lengauer, C. (2009). FeatureHouse: language-independent, automated software composition. In Proceedings of the 31st International Conference on Software Engineering (ICSE) (pp. 221–231).
Batory, D., & O’Malley, S. (1992). The design and implementation of hierarchical software systems with reusable components. ACM Transactions on Software Engineering Methodology, 1(4), 335–398.
Brereton, P., Kitchenham, B., Budgen, D., Turner, M., & Khalil, M. (2007). Lessons from applying the systematic literature review process within the software engineering domain. Journal of Systems and Software (JSS), 80(4), 571–583.
Chidamber, S., & Kemerer, C. (1994). A metrics suite for object oriented design. IEEE Transactions on Software Engineering, 20(6), 476–493.
Coleman, D., Lowther, B., & Oman, P. (1995). The application of software maintainability models in industrial software systems. Journal of Systems and Software, 29(1), 3–16.
Concas, G., Marchesi, M., Pinna, S., & Serra, N. (2007). Power-laws in a large object-oriented software system. IEEE Transactions on Software Engineering, 33(10), 687–708.
Dumke, R., & Winkler, A. (1997). Managing the component-based software engineering with metrics. In Proceedings of the 5th International Symposium on Assessment of Software Tools and Technologies (SAST) (pp. 104–110).
Erni, K., & Lewerentz, C. (1996). Applying design-metrics to object-oriented frameworks. In Proceedings of the 3rd International Symposium on Software Metrics (METRICS) (pp. 64–72).
FeatureIDE. (2017). https://featureide.github.io/.
Fenton, N. (1991). Software metrics: a rigorous approach (pp. 28–37). London: Chapman-Hall.
Fernandes, E., Oliveira, J., Vale, G., Paiva, T., Figueiredo, E. (2016). A review-based comparative study of bad smell detection tools. In Proceedings of the 20th International Conference on Evaluation and assessment in software engineering (EASE). Limerick, 1–3 June 2016.
Fernandes, E., Vale, G., Sousa, L., Figueiredo, E., Garcia, A., Lee, J. (2017). No code anomaly is an island: anomaly agglomeration as sign of product line instabilities. In Proceedings of the 16th International Conference on Software Reuse (ICSR) (pp. 48–64).
Ferreira, K., Bigonha, M., Bigonha, R., Mendes, L., & Almeida, H. (2012). Identifying thresholds for object-oriented software metrics. Journal of Systems and Software, 85(2), 244–257.
Ferreira, G., Gaia, F., Figueiredo, E., & Maia, M. (2014). On the use of feature-oriented programming for evolving software product lines: a comparative study. Science Computer Programming, 93(1), 65–85.
Figueiredo, E., Cacho, N., Sant’Anna, C., Monteiro, M., Kulesza, U., Garcia, A., Soares, S., Ferrari, F., Khan, S., Castor Filho, F., Dantas, F. (2008). Evolving software product lines with aspects: an empirical study on design stability. In Proceedings of the 30th International Conference on Software Engineering (ICSE) (pp. 261–270). Leipzig: IEEE Computer Society.
Fowler, M. (1999). Refactoring: improving the design of existing code. Reading: Addison Wesley.
French, V. (1999). Establishing software metric thresholds. In Proceedings of the 4th International Workshop on Software Measurement (IWSM).
Gamma, E., Helm, R., Johnson, R., & Vlissides, J. (1995). Design patterns: elements of reusable object-oriented software. Reading: Addison-Wesley.
Heitlager, I., Kuipers, T., & Visser, J. (2007). A practical model for measuring maintainability. In Proceedings of the 6th International Conference on the Quality of Information and Communications Technology (QUATIC) (pp. 30–39).
Herbold, S., Grabowski, J., & Waack, S. (2011). Calculation and optimization of thresholds for sets of software metrics. Empirical Software Engineering, 16(6), 812–841.
Kitchenham, B., & Charters, S. (2007). Guidelines for performing systematic literature reviews in software engineering. EBSE Technical Report, Keele University.
LabSoft (2017). http://labsoft.dcc.ufmg.br/doku.php?id=%20about:spl_list.
Lanza, M., & Marinescu, R. (2006). Object-oriented metrics in practice: using software metrics to characterize, evaluate, and improve the design of object-oriented systems. Berlin Heidelberg: Springer-Verlag.
Lima, E., Resende, A., & Lethbridge, T. (2016). The uncomfortable discrepancies of software metric thresholds and reference values in literature. In Proceedings of the 6th International Conference on Software Engineering Advances (ICSEA) (pp. 1–9).
Lorenz, M., & Kidd, J. (1994). Object-oriented software metrics. New York: Englewood Cliffs.
Louridas, P., Spinellis, D., & Vlachos, V. (2008). Power laws in software. ACM Transactions on Software Engineering Methodology, 18(1), 1–26.
Marinescu, R. (2004). Detection strategies: metrics-based rules for detecting design flaws. In Proceedings of the 20th International Conference on Software Maintenance (ICSM) (pp. 350–359).
McCabe, T. (1976). A complexity measure. IEEE Transactions on Software Engineering, 2(4), 308–320.
Mori, A., Vale, G., Viggiato, M., Oliveira, J., Figueiredo, E., Cirilo, E., Jamshidi, P., Kästner, C. (2018). Evaluating domain-specific metric thresholds: an empirical study. International Conference on Technical Debt (TechDebt).
Munro, M. (2005). Product metrics for automatic identification of “bad smell” design problems in Java source-code. In Proceedings of the 11th International Software Metrics Symposium (METRICS) (pp. 1–9).
Nejmeh, B. (1988). NPATH: A measure of execution path complexity and its applications. Communications of the ACM, 31(2), 188–200.
Oliveira, P., Valente, M., Lima, F. (2014). Extracting relative thresholds for source code metrics. In Proceedings of the 18th International Conference on Software Maintenance and Reengineering (CSMR) (pp. 254–263).
Padilha, J., Pereira, J., Figueiredo, E., Almeida, J., Garcia, A., Sant’Anna, C. (2014). On the effectiveness of concern metrics to detect code smells: an empirical study. In Proceedings of the 26th International Conference on Advanced Information Systems Engineering (CAiSE).
Perkusich, M., Medeiros, A., Silva, L., Gorgônio, K., Almeida, H., Perkusich, A. (2015). A Bayesian network approach to assist on the interpretation of software metrics. In Proceedings of the 30th Symposium on Applied Computing (SAC) (pp. 1498–1503).
Riel, J. (1996). Object-oriented design heuristics. Boston: Addison-Wesley.
Schulze, S., Apel, S., Kästner, C. (2010). Code clones in feature-oriented software product lines. In Proceedings of the 9th International Conference on Generative Programming and Component Engineering (GPCE) (pp. 103–112).
Software Engineering Institute – SEI (2016). http://www.sei.cmu.edu/productlines/
Spinellis, D. (2008). A tale of four kernels. In Proceedings of the 30th International Conference on Software Engineering (ICSE) (pp. 381–390).
SPL2GO (2015). http://spl2go.cs.ovgu.de.
Tempero, E., Anslow, C., Dietrich, J., Han, T., Li, J., Lumpe, M., Melton, H., Noble, J. (2010). Qualitas Corpus: A curated collection of Java code for empirical studies. In Proceedings of the 17th Asia-Pacific Software Engineering Conference (APSEC) (pp. 336–345).
Vale, G., & Figueiredo, E. (2015). A method to derive metric thresholds for software product lines. In Proceedings of the 29th Brazilian Symposium on Software Engineering (SBES) (pp. 110–119).
Vale, G., Albuquerque, D., Figueiredo, E., Garcia, A. (2015). Defining metric thresholds for software product lines: a comparative study. In Proceedings of the 19th International Software Product Line Conference (SPLC) (pp. 176–185).
Vasa, R., Lumpe, M., Branch, P., Nierstrasz, O. (2009). Comparative analysis of evolving software systems using the Gini coefficient. In Proceedings of the 25th International Conference on Software Maintenance (ICSM) (pp. 179–188).
Veado, L., Vale, G., Fernandes, E., Figueiredo, E. (2016). TDTool: threshold derivation tool. In Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering (EASE), Tools Session (Article No. 24).

Acknowledgments

This work was partially supported by CAPES, CNPq (grants 424340/2016-0 and 290136/2015-6), and FAPEMIG (grant PPM-00651-17).

Author information

Authors and Affiliations

Department of Computer Science, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
Gustavo Vale, Eduardo Fernandes & Eduardo Figueiredo
Department of Computer Science, University of Passau, Passau, Germany
Gustavo Vale
Informatics Department, Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Rio de Janeiro, Brazil
Eduardo Fernandes

Corresponding author

Correspondence to Gustavo Vale.

About this article

Cite this article

Vale, G., Fernandes, E. & Figueiredo, E. On the proposal and evaluation of a benchmark-based threshold derivation method. Software Qual J 27, 275–306 (2019). https://doi.org/10.1007/s11219-018-9405-y

Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s11219-018-9405-y
