Detecting and refactoring code smells in spreadsheet formulas

Abstract

Spreadsheets are used extensively in business processes around the world and just like software, spreadsheets are changed throughout their lifetime causing understandability and maintainability issues. This paper adapts known code smells to spreadsheet formulas. To that end we present a list of metrics by which we can detect smelly formulas; a visualization technique to highlight these formulas in spreadsheets and a method to automatically suggest refactorings to resolve smells. We implemented the metrics, visualization and refactoring suggestions techniques in a prototype tool and evaluated our approach in three studies. Firstly, we analyze the EUSES spreadsheet corpus, to study the occurrence of the formula smells. Secondly, we analyze ten real life spreadsheets, and interview the spreadsheet owners about the identified smells. Finally, we generate refactoring suggestions for those ten spreadsheets and study the implications. The results of these evaluations indicate that formula smells are common, that they can reveal real errors and weaknesses in spreadsheet formulas and that in simple cases they can be refactored.

This is a preview of subscription content, log in to check access.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Notes

  1. 1.

    http://www.gemboxsoftware.com/spreadsheet/overview

References

  1. Abraham R, Erwig M (2005) How to communicate unit error messages in spreadsheets. In: Proceedings of WEUSE ’05, pp 1–5

  2. Abraham R, Erwig M (2006) Inferring templates from spreadsheets. In: Proceedings of ICSE ’06, pp 182–191

  3. Abraham R, Erwig M (2007) Ucheck: a spreadsheet type checker for end users. J Vis Lang Comput 18:71–95. doi:10.1016/j.jvlc.2006.06.001

    Article  Google Scholar 

  4. Alves TL, Ypma C, Visser J (2010) Deriving metric thresholds from benchmark data. In: Proceedings of ICSM ’10. IEEE Computer Society, pp 1–10

  5. Ayalew Y, Clermont M, Mittermeir RT (2000) Detecting errors in spreadsheets. In: Proceedings of EuSpRIG ’00, pp 51–62

  6. Badame S, Dig D (2012) Refactoring meets spreadsheet formulas. In: Proceedings of ICSM ’12, pp 399–409

  7. Bregar A (2004) Complexity metrics for spreadsheet models. In: Proceedings of EuSpRIG ’04, p 9

  8. Chambers C, Erwig M (2009) Automatic detection of dimension errors in spreadsheets. J Vis Lang Comput 20:269–283

    Article  Google Scholar 

  9. Cunha J, Fernandes JP, Mendes J, Hugo Pacheco JS (2012) Towards a catalog of spreadsheet smells. In: Proceeding of ICCSA’12. LNCS

  10. Cunha J, Fernandes JP, Peixoto C, Saraiva J (2012) A quality model for spreadsheets. In: Proceedings of QUATIC ’12, pp 231–236

  11. Cunha J, Saraiva J, Visser J (2009) Discovery-based edit assistance for spreadsheets. In: Proceedings of VL/HCC ’09, pp 233–237. IEEE

  12. Erwig M (2009) Software engineering for spreadsheets. IEEE Softw 26:25–30. doi:10.1109/MS.2009.140

    Article  Google Scholar 

  13. Fisher M, Rothermel G (2005) The EUSES spreadsheet corpus: a shared resource for supporting experimentation with spreadsheet dependability mechanisms. In: Proceedings of WEUSE ’05, pp 47–51

  14. Fowler M (1999) Refactoring: improving the design of existing code. Addison-Wesley Longman Publishing Co., Inc., Boston

  15. Hendry DG, Green TRG (1994) Creating, comprehending and explaining spreadsheets: a cognitive interpretation of what discretionary users think of the spreadsheet model. Int J Human-Computer Stud 40(6):1033–1065

    Article  Google Scholar 

  16. Hermans F, Pinzger M, van Deursen A (2010) Automatically extracting class diagrams from spreadsheets. In: Proceedings of ECOOP ’10, pp 52–75

  17. Hermans F, Pinzger M, van Deursen A (2011) Breviz: spreadsheet visualization and quality analysis. In: Proceedings of EuSpRIG ’11, pp 63–72

  18. Hermans F, Pinzger M, van Deursen A (2012a) Detecting and visualizing inter-worksheet smells in spreadsheets. In: Proceedings of ICSE ’12, pp 441–451

  19. Hermans F, Pinzger M, van Deursen A (2012b) Detecting code smells in spreadsheet formulas. In: Proceedings of ICSM ’12, pp 409–418

  20. Hermans F, Pinzger M, van Deursen A (2012c) Measuring spreadsheet formula understandability. In: Proceedings of EuSpRIG ’12

  21. Hodnigg K, Mittermeir R (2008) Metrics-based spreadsheet visualization: support for focused maintenance. In: Proceedings of EuSpRIG ’08, p 16

  22. Hole S, McPhee D, Lohfink A (2009) Mining spreadsheet complexity data to classify end user developers. In: Proceedings of ICDM ’09. CSREA Press, pp 573–579

  23. Kamiya T, Kusumoto S, Inoue K (2002) Ccfinder: a multilinguistic token-based code clone detection system for large scale source code. TSE 28(7):654–670

    Google Scholar 

  24. Lanza M, Marinescu R, Ducasse S (2005) Object-oriented metrics in practice. Springer-Verlag, New York, Inc., Secaucus

    Google Scholar 

  25. Marinescu R (2001) Detecting design flaws via metrics in object-oriented systems. In: Proceedings of TOOLS ’01. IEEE Computer Society, pp 173–182

  26. Moha N, Gueheneuc YG, Duchien L, Le Meur AF (2010) Decor: a method for the specification and detection of code and design smells. TSE 36(1):20–36

    Article  Google Scholar 

  27. Nardi B, Miller J (1990) The spreadsheet interface: a basis for end user programming. In: Proceeding of the IFIP conference on human-computer interaction (INTERACT). North-Holland, pp 977–983

  28. Olbrich S, Cruzes DS, Basili V, Zazworka N (2009) The evolution and impact of code smells: a case study of two open source systems. In: Proceedings of ESEM ’09, pp 390–400

  29. Panko R (2006) Facing the problem of spreadsheet errors. Decis Line 37(5):8–10

    Google Scholar 

  30. Panko RR (1998) What we know about spreadsheet errors. J End User Comput 10(2):15–21

    Article  Google Scholar 

  31. Powell S, Baker K, Lawson B (2009) Errors in operational spreadsheets: a review of the state of the art. In: Proceedings of HICCS ’09. IEEE Computer Society, pp 1–8

  32. Raffensperger J (2009) New guidelines for spreadsheets. Int J Bus Econ 2:141–154

    Google Scholar 

  33. Rajalingham K, Chadwick D, Knight B, Edwards D (2000) Quality control in spreadsheets: a software engineering-based approach to spreadsheet development. In: Proceedings HICSS ’00, pp 133–143

  34. Wettel R, Lanza M (2008) Visually localizing design problems with disharmony maps. In: Proceedings of SoftVis ’08, pp 155–164

  35. Wettel R, Lanza M, Robbes R (2011) Software systems as cities: a controlled experiment. In: Proceedings of ICSE ’11, pp 551–560

  36. Winston W (2001) Executive education opportunities. OR/MS Today 28(4)

Download references

Author information

Affiliations

Authors

Corresponding author

Correspondence to Felienne Hermans.

Additional information

Communicated by: Massimiliano Di Penta and Jonathan Maletic

Rights and permissions

Reprints and Permissions

About this article

Cite this article

Hermans, F., Pinzger, M. & van Deursen, A. Detecting and refactoring code smells in spreadsheet formulas. Empir Software Eng 20, 549–575 (2015). https://doi.org/10.1007/s10664-013-9296-2

Download citation

Keywords

  • Spreadsheets
  • Code smells
  • Refactoring