Revisiting the debate: Are code metrics useful for measuring maintenance effort?

Abstract

Evaluating and predicting software maintenance effort using source code metrics is one of the holy grails of software engineering. Unfortunately, previous research has provided contradictory evidence in this regard. The debate is still open: as a community we are not certain about the relationship between code metrics and maintenance impact. In this study we investigate whether source code metrics can indeed establish maintenance effort at the previously unexplored method level granularity. We consider \(\sim \)730K Java methods originating from 47 popular open source projects. After considering seven popular method level code metrics and using change proneness as a maintenance effort indicator, we demonstrate why past studies contradict one another while examining the same data. We also show that evaluation context is king. Therefore, future research should step away from trying to devise generic maintenance models and should develop models that account for the maintenance indicator being used and the size of the methods being analyzed. Ultimately, we show that future source code metrics can be applied reliably and that these metrics can provide insight into maintenance effort when they are applied in a judiciously context-sensitive manner.

Notes

  1. https://github.com/shaifulcse/codemetrics-with-context-replication

  2. https://verifysoft.com/en_maintainability.html (last accessed: December 28, 2021)

  3. https://docs.microsoft.com/en-us/visualstudio/code-quality/code-metrics-maintainability-index-range-and-meaning?view=vs-2022 (last accessed: December 28, 2021)

Author information

Corresponding author

Correspondence to Shaiful Chowdhury.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Communicated by: Mika Mäntylä

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chowdhury, S., Holmes, R., Zaidman, A. et al. Revisiting the debate: Are code metrics useful for measuring maintenance effort?. Empir Software Eng 27, 158 (2022). https://doi.org/10.1007/s10664-022-10193-8

  • DOI: https://doi.org/10.1007/s10664-022-10193-8

Keywords

  • Code metrics
  • Maintenance
  • McCabe
  • Code complexity