Empirical evaluation of the effects of experience on code quality and programmer productivity: an exploratory study

Abstract

Context

There is a widespread belief, in software engineering (SE) as in other branches of science, that experience helps professionals improve their performance. However, cases have been reported where experience not only has no positive influence but sometimes even degrades professional performance.

Aim

Determine whether years of experience influence programmer performance.

Method

We analysed 10 quasi-experiments conducted both in academia, with graduate and postgraduate students, and in industry, with professionals. The experimental task was to apply iterative test-last development (ITLD) to two experimental problems; we then measured external code quality and programmer productivity.

Results

Programming experience gained in industry does not appear to have any effect whatsoever on quality and productivity. Overall programming experience gained in academia, by contrast, does tend to have a positive influence on programmer performance. These two findings may be related to the fact that, unlike deliberate practice, routine practice does not appear to improve performance. Experience with productivity tools, such as testing frameworks and IDEs, also has positive effects.

Conclusion

Years of experience are a poor predictor of programmer performance. Academic background and specialized knowledge of task-related aspects appear to be rather good predictors.

Notes

  1. TEKES: Finnish Funding Agency for Technology and Innovation

  2. Not specified so as not to disclose T. Raty’s organization.

  3. Although we had 126 experimental subjects, 11 observations were lost during the analysis as two subjects failed to complete the experimental task, six failed to report their academic qualifications and four failed to report any experience. Consequently, we were only able to effectively process 115 cases.

Acknowledgments

We would like to acknowledge Dr. Hakan Erdogmus, who contributed to the design of one of the tasks used in the study (BSK) and the corresponding test cases. We also wish to acknowledge Mr. Timo Raty for his participation in the creation of the C++ code templates and for the training given in one of the quasi-experiments. Finally, we wish to acknowledge Mr. Adrian Santos for his support in collecting the subjects’ data.

Author information

Corresponding author

Correspondence to Oscar Dieste.

Additional information

Communicated by: Richard Paige, Jordi Cabot, and Neil Ernst

Appendices

Appendix 1: Description of the Independent Variables

Table 15 lists the 15 independent variables used in this research, giving for each a brief description, its type (nominal, ordinal or dummy) and its levels. Section 3 details the types and measurement of the variables.

Table 15 Independent variables

Appendix 2: Details of the Experiment

Specification for Mars Rover API with Slicing

Develop an API that moves a rover around a planet. The planet is represented as a grid with x and y coordinates. The rover is also facing in a direction. The direction can be north (N), south (S), west (W) or east (E). The input received by the rover is a string representing the commands it needs to execute.

The Planet

The planet on which the rover moves is represented as a square grid, with size (x, y).

Requirement: Define a planet of size (x, y).

Example: (100,100) creates a planet of size 100 × 100.

Landing

When the rover lands on the planet, it begins its journey at the start of the grid facing north.

Requirement: When the rover lands on the planet its position shall be (0,0) facing north.

Example: An empty command (i.e., “”) to the rover returns its landing status (0,0,N).

Turning

The rover turns right or left. It remains in the same cell of the grid. Its direction changes accordingly.

Requirement: Compute the position of the rover after turning left (command “l”) or right (command “r”).

Example: A rover at position (0,0,N) is at position (0,0,E) after executing command “r”. A rover at position (0,0,N) is at position (0,0,W) after executing command “l”.
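The four directions form a cycle, so turning can be modelled as stepping through them in clockwise order. The minimal sketch below assumes a hypothetical `Heading` enum whose constants are declared as N, E, S, W. It is one possible design and is not part of the specification.

```java
// Hypothetical helper: headings are declared in clockwise order (N, E, S, W),
// so a right turn is +1 and a left turn is -1, both modulo 4.
enum Heading {
    N, E, S, W;

    Heading right() {
        return values()[(ordinal() + 1) % 4];
    }

    Heading left() {
        // Adding 3 equals subtracting 1 modulo 4, without going negative.
        return values()[(ordinal() + 3) % 4];
    }

    // Applies a single turn command ('l' or 'r'); the rover's cell is unaffected.
    Heading turn(char command) {
        switch (command) {
            case 'r': return right();
            case 'l': return left();
            default: throw new IllegalArgumentException("not a turn command: " + command);
        }
    }
}
```

For example, `Heading.N.turn('r')` returns `E` and `Heading.N.turn('l')` returns `W`, as in the example above.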

Moving

The rover moves forward or backward one grid cell in the direction that it is facing. The rover’s direction does not change.

Requirement: Compute the position of the rover after moving forward (command “f”) or backward (command “b”) one grid cell.

Example: A rover at position (7,6,N) moves to (7,7,N) after executing an “f” command. A rover at position (5,8,E) moves to (4,8,E) after executing a “b” command.
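A move adds a unit offset that depends on the heading, and the offset is negated for “b”. The sketch below assumes a hypothetical immutable `Position` class in which north means increasing y and east means increasing x, which matches the examples above. Wrapping and obstacles belong to later requirements and are deliberately left out.

```java
// Hypothetical sketch: the rover's position and heading, without wrapping
// or obstacles (those are introduced by later requirements).
final class Position {
    final int x, y;
    final char heading; // one of 'N', 'E', 'S', 'W'

    Position(int x, int y, char heading) {
        this.x = x;
        this.y = y;
        this.heading = heading;
    }

    // 'f' moves one cell towards the heading, 'b' one cell away from it;
    // the heading itself never changes.
    Position move(char command) {
        int sign;
        if (command == 'f') sign = 1;
        else if (command == 'b') sign = -1;
        else throw new IllegalArgumentException("not a move command: " + command);
        int dx = 0, dy = 0;
        switch (heading) {
            case 'N': dy = 1; break;
            case 'S': dy = -1; break;
            case 'E': dx = 1; break;
            case 'W': dx = -1; break;
            default: throw new IllegalStateException("bad heading: " + heading);
        }
        return new Position(x + sign * dx, y + sign * dy, heading);
    }

    @Override
    public String toString() {
        return "(" + x + "," + y + "," + heading + ")";
    }
}
```

With this sketch, `new Position(7, 6, 'N').move('f')` prints as `(7,7,N)` and `new Position(5, 8, 'E').move('b')` prints as `(4,8,E)`.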

Moving and Turning Combined

The rover shall be able to execute arbitrary sequences of “f”, “b”, “l” and “r” commands.

Requirement: Compute the position of the rover after executing a series of commands.

Example: A rover at position (0,0,N) moves to position (2,2,E) after executing “ffrff”.

Wrapping

Since the planet is a sphere, the rover wraps around to the opposite edge once it moves over an edge.

Requirement: Compute the position of the rover moving over the edges. The rover shall spawn on the opposite side.

Example: A rover on a planet of size 100 × 100 that moves backward (command “b”) right after landing (remember that landing always takes place at position (0,0,N)) moves to position (0,99,N).
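Wrapping reduces to modular arithmetic on each coordinate. There is one subtlety in Java: the `%` operator keeps the sign of the dividend, so `-1 % 100` evaluates to `-1`. `Math.floorMod` instead returns a non-negative result for a positive divisor. The helper name `Wrap.wrap` is hypothetical.

```java
// Hypothetical helper: wraps a coordinate onto a planet dimension of the given size.
// Math.floorMod (Java 8+) returns a value in [0, size) for size > 0, whereas the
// % operator would return -1 for a step back from coordinate 0.
final class Wrap {
    private Wrap() {}

    static int wrap(int coordinate, int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        return Math.floorMod(coordinate, size);
    }
}
```

Moving backward right after landing on a 100 × 100 planet produces a raw y of -1. `Wrap.wrap(-1, 100)` maps it to 99, which gives (0,99,N).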

Positioning of Obstacles

Obstacles can be positioned on specific cells of the grid.

Requirement: Define the obstacles as a string (x1,y1) (x2,y2)… Place the obstacles on the grid.

Example: “(1,1) (4,5)” defines two obstacles, one at position (1,1) and another at position (4,5). Note that the planet grid must then be at least 6 × 6.
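The obstacle string can be parsed with a regular expression that matches each “(x,y)” pair. The sketch below is a hypothetical helper (`Obstacles.parse`). It accepts optional spaces inside the parentheses, as in the “(obs1X, obs1Y)” form used in the specification without slicing. It does not check that obstacles fall inside the grid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the obstacle format "(x1,y1) (x2,y2)...".
// Each obstacle is returned as an int[]{x, y}, in the order it appears.
final class Obstacles {
    private static final Pattern PAIR =
            Pattern.compile("\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\)");

    private Obstacles() {}

    static List<int[]> parse(String spec) {
        List<int[]> result = new ArrayList<>();
        Matcher m = PAIR.matcher(spec);
        while (m.find()) {
            result.add(new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))});
        }
        return result;
    }
}
```

`Obstacles.parse("(1,1) (4,5)")` returns two pairs, (1,1) and (4,5), in that order.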

Identifying a Single Obstacle

The rover might encounter (i.e., try to move into) an obstacle. When it does, it should report the obstacle and continue executing the remaining commands.

Requirement: Compute the position of a rover encountering an obstacle and report the obstacle. The same obstacle should be reported only once.

Example: A rover just landed (position (0,0,N)). There is one obstacle at planet coordinates (2,2). The rover executes “ffrfff” and reports (1,2,E) (2,2). Notice that the same obstacle is encountered twice but reported only once.

Identifying Multiple Obstacles

The rover might encounter multiple obstacles. When it does, it should report all of them once and in the order they were encountered.

Requirement: Compute the position of the rover encountering obstacles, and report the obstacles encountered in the order they are encountered. The same obstacle shall be reported only once.

Example: A rover just landed (position (0,0,N)). There are two obstacles at planet coordinates (2,2) and (2,1). The rover executes “ffrfffrflf” and reports (1,1,E) (2,2) (2,1). Notice that the first obstacle is encountered twice but reported only once.
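The two reporting rules are that each obstacle is reported once and in the order it was first encountered. An insertion-ordered set captures both rules directly. The minimal sketch below assumes a hypothetical `ObstacleReport` class that stores obstacles as “(x,y)” strings.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical collector for encountered obstacles. A LinkedHashSet ignores
// repeated insertions and iterates in first-insertion order, which matches
// "report each obstacle once, in the order encountered".
final class ObstacleReport {
    private final Set<String> encountered = new LinkedHashSet<>();

    void record(int x, int y) {
        encountered.add("(" + x + "," + y + ")");
    }

    // Appends the obstacles, space-separated, after the rover's status.
    String format(String status) {
        StringBuilder sb = new StringBuilder(status);
        for (String obstacle : encountered) {
            sb.append(' ').append(obstacle);
        }
        return sb.toString();
    }
}
```

In the example above, the rover hits (2,2) twice and then (2,1). `format("(1,1,E)")` therefore yields `(1,1,E) (2,2) (2,1)`.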

A Tour Around the Planet

The rover goes on a tour around the planet encountering several obstacles, and wrapping in both axes.

Requirement: Compute the position of a rover that executes a series of commands that result in moving along both axes in both directions, encountering several obstacles and wrapping from both edges of the planet.

Example: The rover lands on a 6 × 6 planet with obstacles at (2,2), (0,5) and (5,0). It executes the command “ffrfffrbbblllfrfrbbl” and returns (0,0,N) (2,2) (0,5) (5,0).
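Putting the requirements together gives the minimal end-to-end sketch below. The names `MarsRover`, its constructor `MarsRover(width, height, obstacleSpec)` and `execute(commands)` are hypothetical; the experiment's actual code templates are not reproduced in this paper. The sketch also makes these assumptions:

- the obstacle report is reset on every call to `execute`;
- a command that would move the rover onto an obstacle is skipped, as stated in the specification without slicing.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical end-to-end sketch of the Mars Rover API; names are illustrative.
final class MarsRover {
    private static final String HEADINGS = "NESW";  // clockwise order
    private static final int[] DX = {0, 1, 0, -1};  // offset per heading index
    private static final int[] DY = {1, 0, -1, 0};

    private final int width, height;
    private final Set<String> obstacles = new HashSet<>();
    private int x = 0, y = 0, heading = 0;          // lands at (0,0,N)

    MarsRover(int width, int height, String obstacleSpec) {
        this.width = width;
        this.height = height;
        Matcher m = Pattern.compile("\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\)").matcher(obstacleSpec);
        while (m.find()) {
            obstacles.add(key(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))));
        }
    }

    private static String key(int x, int y) {
        return "(" + x + "," + y + ")";
    }

    // Executes the commands and returns "(x,y,H)" followed by each obstacle
    // encountered, once and in the order in which it was first hit.
    String execute(String commands) {
        Set<String> encountered = new LinkedHashSet<>();
        for (char c : commands.toCharArray()) {
            switch (c) {
                case 'r': heading = (heading + 1) % 4; break;
                case 'l': heading = (heading + 3) % 4; break;
                case 'f':
                case 'b': {
                    int sign = (c == 'f') ? 1 : -1;
                    int nx = Math.floorMod(x + sign * DX[heading], width);
                    int ny = Math.floorMod(y + sign * DY[heading], height);
                    if (obstacles.contains(key(nx, ny))) {
                        encountered.add(key(nx, ny)); // blocked: stay put, skip command
                    } else {
                        x = nx;
                        y = ny;
                    }
                    break;
                }
                default: throw new IllegalArgumentException("unknown command: " + c);
            }
        }
        StringBuilder out = new StringBuilder("(" + x + "," + y + "," + HEADINGS.charAt(heading) + ")");
        for (String o : encountered) out.append(' ').append(o);
        return out.toString();
    }
}
```

Tracing the tour example through this sketch yields `(0,0,N) (2,2) (0,5) (5,0)`, which agrees with the expected result. The earlier examples in the specification give the expected results as well.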

Congratulations, you are done!

Specification for Mars Rover API without Slicing

The API manages a rover that moves on a planet (a square grid) of arbitrary size (x,y). The rover starts at position (0,0), facing north. The direction of movement can be N (north), S (south), E (east) or W (west).

The rover receives a string of commands: l (left), r (right), f (forward) and b (backward). l and r turn the rover counterclockwise and clockwise, respectively, but do not alter its position. f and b move the rover one cell forward or backward, respectively, along the direction it is facing; the direction itself does not change. When the rover moves over an edge of the planet, it spawns on the opposite side.

The planet (grid) may contain obstacles. Obstacles are defined as a list of coordinates “(obs1X, obs1Y) (obs2X, obs2Y)…”. When the rover finds an obstacle during a tour, it skips the current command (i.e., it does not move to the cell in which the obstacle is located) and continues executing the remaining commands.

Upon processing the string of commands, the rover returns its position and direction in the format “(posX, posY, facing)”. If obstacles are found, the output will be “(posX, posY, facing) (obs1X, obs1Y) (obs2X, obs2Y)…” The same obstacle shall be reported only once. Obstacles are reported in the order in which they are found.

Specification for Bowling Score Keeper with Slicing

The objective is to develop an application that can calculate the score of a single bowling game using TDD. There is no graphical user interface: you will only use objects and JUnit tests in this assignment, and you will not need a main method.

The application requirements are divided into a set of user stories, which serve as your to-do list. You should be able to incrementally develop a complete solution without an upfront understanding of all the game’s rules. For this exercise, don’t read ahead; handle the requirements one at a time in the stated order. Solve the problem using TDD, starting with the requirement for the first story. Remember to always lead with a test case, taking hints from the examples provided. Do not move on to the next story until you are done with the current one. A story is done when you are confident that your program correctly implements the functionality stipulated by the requirement for the story. This means that all of your test cases for that story and all of the test cases for the previous stories pass. You may need to tweak your solution as you progress towards more advanced stories.

Frame

Each turn of a bowling game is called a frame. 10 pins are arranged in each frame. The goal of the player is to knock down as many pins as possible in each frame. The player has two chances, or throws, to do so. The value of a throw is given by the number of pins knocked down in that throw.

Story: As the scorekeeper, I want to be able to record a frame as composed of two throws. The first and second throws should be distinguishable.

Example: [2, 4] is a frame with two throws, in which two pins were knocked down in the first throw and four pins were knocked down in the second.

Frame Score

An ordinary frame’s score is the sum of its throws.

Story: As the scorekeeper, I want to be able to compute the score of an ordinary frame after a player has rolled both throws.

Examples: The score of the frame [2, 6] is 8. The score of the frame [0, 9] is 9.

Game

A single game consists of 10 frames.

Story: As the scorekeeper, I want to define a game as a sequence of 10 frames.

Example: The sequence of frames [1, 5] [3, 6] [7, 2] [3, 6] [4, 4] [5, 3] [3, 3] [4, 5] [8, 1] [2, 6] represents a game. You may reuse this game from now on to represent and test different scenarios, modifying only a few frames each time.

Partial Game

When the player rolls a throw, the throw is automatically recorded in the correct frame.

Story: As the scorekeeper, when a player rolls throws, I want the game to keep track of the frames and figure out in which frame to place the next throw depending on the past throws. You think this is easy. Maybe for now. We’ll see.

Example: If the game currently consists of the frames [1, 5] [3, 6] [7, 2] [3, ?] and the player rolls a throw with a value of 4, the game becomes [1, 5] [3, 6] [7, 2] [3, 4]. Another roll with a value of 5 transforms the game to [1, 5] [3, 6] [7, 2] [3, 4] [5, ?].
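One way to decide where the next throw goes is to rebuild the frames from the sequence of throws rolled so far. A strike closes its frame at once. Any other throw either opens a new frame or completes the open one. The minimal sketch below assumes a hypothetical `FrameGrouper` helper that marks the missing second throw (the “?” above) with `null`. It does not validate pin counts or handle bonus throws.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: groups a flat sequence of throws into frames.
// A strike closes its frame immediately and is stored as [10, 0];
// a trailing single throw forms an incomplete frame [t, ?], stored as [t, null].
final class FrameGrouper {
    private FrameGrouper() {}

    static List<Integer[]> group(int... throwsSoFar) {
        List<Integer[]> frames = new ArrayList<>();
        Integer[] open = null; // frame still waiting for its second throw
        for (int t : throwsSoFar) {
            if (open == null) {
                if (t == 10) {
                    frames.add(new Integer[] {10, 0}); // strike: no second throw
                } else {
                    open = new Integer[] {t, null};
                    frames.add(open);
                }
            } else {
                open[1] = t; // completes the open frame
                open = null;
            }
        }
        return frames;
    }
}
```

With this sketch, `FrameGrouper.group(1, 5, 3, 6, 7, 2, 3, 4, 5)` produces the frames [1, 5] [3, 6] [7, 2] [3, 4] [5, null], which mirrors the example above.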

Game Score

The score of a bowling game is the sum of the individual scores of its frames.

Story: As the scorekeeper, I want to know a player’s current game score at all times.

Example: The score of the game [1, 5] [3, 6] [7, 2] [3, 6] [4, 4] [5, 3] [3, 3] [4, 5] [8, 1] [2, 6] is 81. Partial scores are possible for an incomplete game if the frame scores are known up to the last complete frame. The score of the game [1, 5] [3, 6] [7, ?] is 15. The frame [7, ?] is not yet complete.

Strike

A frame is called a strike if all 10 pins are knocked down in the first throw. In this case, there is no second throw. A strike frame can be written as [10, 0]. The score of a strike equals 10 plus the sum of the next two throws of the subsequent frame.

Story: As the scorekeeper, I want to be able to recognize a strike frame, compute its score after the next frame has been completed, and compute the game score.

Examples: Suppose [10, 0] and [3, 6] are consecutive frames. Then the first frame is a strike and its score equals 10 + 3 + 6 = 19. The game [10, 0] [3, 6] [7, 2] [3, 6] [4, 4] [5, 3] [3, 3] [4, 5] [8, 1] [2, 6] has a score of 94. The partial game [10, 0] [3, 6] has a score of 28.

Spare

A frame is called a spare when all 10 pins are knocked down in two throws. The score of a spare frame is 10 plus the value of the first throw from the subsequent frame.

Story: As the scorekeeper, I want to be able to recognize a spare frame, compute the score of a game containing a spare frame after the first throw of the next frame has been rolled, and compute the game’s score.

Examples: [1, 9], [4, 6], [7, 3] are all spares. If you have two frames [1, 9] and [3, 6] in a row, the spare frame’s score is 10 + 3 = 13. The game [1, 9] [3, 6] [7, 2] [3, 6] [4, 4] [5, 3] [3, 3] [4, 5] [8, 1] [2, 6] has a score of 88. The partial game [1, 9] [3, 6] has a score of 22.

Strike and Spare

A strike can be followed by a spare. The strike’s score is not affected when this happens.

Story: As the scorekeeper, I want to make sure that the score of a strike is computed right when it’s followed by a spare.

Examples: In the sequence [10, 0] [4, 6] [7, 2], a strike is followed by a spare. In this case, the score of the strike is 10 + 4 + 6 = 20, and the score of the spare is 4 + 6 + 7 = 17. The game [10, 0] [4, 6] [7, 2] [3, 6] [4, 4] [5, 3] [3, 3] [4, 5] [8, 1] [2, 6] has a score of 103.

Multiple Strikes

Two strikes in a row are possible. Take care when this happens, as you need the values of throws from the next two frames to compute the score of the first strike.

Story: As the scorekeeper, I want to make sure that I can record two consecutive strikes correctly in the game, and correctly compute the score of the first strike after the next two throws have been rolled.

Examples: In the sequence [10, 0] [10, 0] [7, 2], the score of the first strike is 10 + 10 + 7 = 27. The score of the second strike is 10 + 7 + 2 = 19. The game [10, 0] [10, 0] [7, 2] [3, 6] [4, 4] [5, 3] [3, 3] [4, 5] [8, 1] [2, 6] has a score of 112. The score of the partial game [10, 0] [10, 0] [7, ?] is 27 (we cannot compute the scores of the last two frames yet).
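The strike, spare and multiple-strike stories share one idea: a frame's bonus is read from the next actual throws, which may belong to the following one or two frames. The 0 in [10, 0] is only a placeholder, so a simple approach is to flatten the frames into real throws first.

The minimal sketch below assumes that frames are given as `int[]` arrays and that an incomplete frame such as [7, ?] is written as a one-element array. `FrameScores`, `frameScore` and `gameScore` are hypothetical names. Bonus throws after the tenth frame are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch for the strike and spare stories: a strike is written
// [10, 0], an incomplete frame [t] (one element). The strike's placeholder 0
// is not a real throw, so bonuses are read from a flattened list of throws.
final class FrameScores {
    private FrameScores() {}

    // Score of frame i, or null if the frame or the throws its bonus needs
    // have not been rolled yet.
    static Integer frameScore(List<int[]> frames, int i) {
        int[] frame = frames.get(i);
        boolean strike = frame[0] == 10;
        if (!strike && frame.length < 2) return null; // frame itself incomplete
        List<Integer> rolls = new ArrayList<>();
        int next = 0; // index in rolls of the first throw after frame i
        for (int f = 0; f < frames.size(); f++) {
            int[] fr = frames.get(f);
            rolls.add(fr[0]);
            if (fr[0] != 10 && fr.length > 1) rolls.add(fr[1]); // skip placeholder
            if (f == i) next = rolls.size();
        }
        boolean spare = !strike && frame[0] + frame[1] == 10;
        int bonusRolls = strike ? 2 : (spare ? 1 : 0);
        if (next + bonusRolls > rolls.size()) return null; // bonus still pending
        int score = strike ? 10 : frame[0] + frame[1];
        for (int k = 0; k < bonusRolls; k++) score += rolls.get(next + k);
        return score;
    }

    // Sums frame scores up to the first frame that cannot be scored yet.
    static int gameScore(List<int[]> frames) {
        int total = 0;
        for (int i = 0; i < frames.size(); i++) {
            Integer s = frameScore(frames, i);
            if (s == null) break;
            total += s;
        }
        return total;
    }
}
```

With this sketch, the partial game [10, 0] [10, 0] [7, ?] scores 27. The second strike and the open frame cannot be scored yet.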

Multiple Spares

Two spares in a row are possible. The score of the first spare is not affected when this happens.

Story: As the scorekeeper, I want to be able to compute the score of a game with two spares in a row, and the score of the first spare after the next spare has been completed.

Example: The game [8, 2] [5, 5] [7, 2] [3, 6] [4, 4] [5, 3] [3, 3] [4, 5] [8, 1] [2, 6] has a score of 98.

Spare as the Last Frame

When the last frame in a game is a spare, the player will be given a bonus throw. However, this bonus throw does not belong to a regular frame. It is only used to calculate the score of the last spare.

Story: As the scorekeeper, I hate it when the last frame is a spare: let the game please figure out that the next roll is a bonus throw and compute the score of the last frame and the whole game based on the value of that bonus throw.

Example: The last frame in the game [1, 5] [3, 6] [7, 2] [3, 6] [4, 4] [5, 3] [3, 3] [4, 5] [8, 1] [2, 8] is a spare. If the bonus throw is [7], the last frame has a score of 2 + 8 + 7 = 17. The game has a score of 90.

Strike as the Last Frame

When the last frame of the game is a strike, the player will be given two bonus throws. However, these two bonus throws do not belong to a regular frame. They are only used to calculate the score of the last strike frame.

Story: As the scorekeeper, I hate it even more when the last frame of a game is a strike: let the game please figure out that the next rolls are bonus throws and compute the score of the last frame and the whole game based on the value of those bonus throws.

Example: The last frame in the game [1, 5] [3, 6] [7, 2] [3, 6] [4, 4] [5, 3] [3, 3] [4, 5] [8, 1] [10, 0] is a strike. If the bonus throws are [7, 2], the last frame’s score is 10 + 7 + 2 = 19. The game score is 92.

Bonus is a Strike

No more bonus throws are granted when the last frame in the game is a spare and the bonus throw is a strike.

Story: As the scorekeeper, I hate it most when the last frame is spare and the bonus throw is a strike: please God, let the game figure this scenario out correctly.

Example: In the game [1, 5] [3, 6] [7, 2] [3, 6] [4, 4] [5, 3] [3, 3] [4, 5] [8, 1] [2, 8], the last frame is a spare. If the bonus throw is [10], the game score is 93.

Best Score

A perfect game consists of all strikes (a total of 12, including the bonus throws), and has a score of 300.

Story: As the scorekeeper, I love it when the game is just a sequence of strikes, including the bonus throws, because I know that the player then deserves a perfect score of 300.

Example: A perfect game looks like [10, 0] [10, 0] [10, 0] [10, 0] [10, 0] [10, 0] [10, 0] [10, 0] [10, 0] [10, 0] with bonus throws [10, 10]. Its score is 300.

Random Game

Story: As the scorekeeper, I want to make sure that the game [6, 3] [7, 1] [8, 2] [7, 2] [10, 0] [6, 2] [7, 3] [10, 0] [8, 0] [7, 3] with bonus throw [10] has a score of 135.

Congratulations, you are done!

Specification for Bowling Score Keeper without Slicing

[Figure: score card of a sample ten-frame bowling game]

The game consists of 10 frames as shown above. The player has two opportunities in each frame to knock down 10 pins. The score for the frame is the total number of pins knocked down, plus bonuses for strikes and spares.

A spare is when the player knocks down all 10 pins in two tries. The bonus for that frame is the number of pins knocked down by the next ball rolled. So, the score in frame 3 above is 10 (the total number knocked down) plus a bonus of 5 (the number of pins knocked down on the next roll).

A strike is when the player knocks down all 10 pins on his or her first try. The bonus for that frame is the value of the next two balls rolled.

A player who rolls a spare or strike in the tenth frame is allowed to roll extra balls to complete the frame. However, no more than three balls can be rolled in the tenth frame.
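These rules translate into a scorer that walks the rolls frame by frame. The sketch below assumes a hypothetical `BowlingGame` class with `roll(int)` and `score()` methods, in the style of the well-known bowling game kata; this specification does not mandate that interface. Unlike the sliced version, a strike is recorded as a single roll of 10, with no placeholder 0. The sketch also assumes a complete game, and it neither validates pin counts nor enforces the three-ball limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical roll-based scorer: the caller reports only the pins per roll;
// frames are reconstructed when scoring. Assumes a complete, valid game.
final class BowlingGame {
    private final List<Integer> rolls = new ArrayList<>();

    void roll(int pins) {
        rolls.add(pins);
    }

    // Scores exactly ten frames; extra balls in the tenth frame only ever
    // contribute as bonus values, never as frames of their own.
    int score() {
        int total = 0;
        int r = 0; // index of the first ball of the current frame
        for (int frame = 0; frame < 10; frame++) {
            if (rolls.get(r) == 10) {                            // strike
                total += 10 + rolls.get(r + 1) + rolls.get(r + 2);
                r += 1;
            } else if (rolls.get(r) + rolls.get(r + 1) == 10) {  // spare
                total += 10 + rolls.get(r + 2);
                r += 2;
            } else {                                             // open frame
                total += rolls.get(r) + rolls.get(r + 1);
                r += 2;
            }
        }
        return total;
    }
}
```

Rolling twelve strikes gives 300. The “random game” from the sliced specification (ending with the spare [7, 3] and the bonus ball 10) gives 135.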

Appendix 3: Industry Questionnaire

Appendix 4: Academic Questionnaire


Appendix 5: Breakdown of Experience

Programming Language Experience

Fig. 8

Breakdown of programming language experience

Overall Programming Language Experience

Fig. 9

Breakdown of Overall programming language experience

Appendix 6: Collinearity Conditions

Table 16 reports the results of the collinearity analysis for the model with 15 independent variables. The pattern shown in Table 16 suggests that the testing framework variable (UNIT_TESTING_FRAMEWORK2_ADAPTED) might be collinear, as both its variance inflation factor (VIF = 4.943) and its tolerance (T = 0.202) are close to the established bounds (VIF < 5 and T > 0.2). The collinearity statistics for the other variables are within these bounds, which is a sign that they are not collinear.

Table 16 Coefficients of the linear regression model with 15 independent variables
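The two statistics in Table 16 are two views of one quantity: regressing predictor j on all the other predictors gives R_j^2, the tolerance is T_j = 1 - R_j^2, and VIF_j = 1 / T_j. This agrees with the reported values, since 1 / 4.943 ≈ 0.202. The NumPy sketch below computes both for every column of a predictor matrix `X`; the function name is hypothetical, and SPSS's internal numerical procedure may differ even though the definitions are the same.

```python
import numpy as np


def vif_and_tolerance(X):
    """Return (VIF, tolerance) for each column of the predictor matrix X.

    Column j is regressed (with an intercept) on the other columns; with R_j^2
    its coefficient of determination, tolerance T_j = 1 - R_j^2 and
    VIF_j = 1 / T_j.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    tolerance = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        tolerance[j] = 1.0 - r2
    return 1.0 / tolerance, tolerance
```

Under the bounds used in the text, a predictor would be flagged when `vif >= 5` or, equivalently, `tolerance <= 0.2`.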

Table 17 shows the collinearity diagnostics of the model specified in Table 16. Note that dimension 16 has a very high condition index (CI = 86.918 > 30), which suggests that the level of collinearity is high. Comparing the proportions of variance explained by this dimension for each of the model's explanatory variables, we find that UNIT_TESTING_FRAMEWORK_ADAPTED and EXPERIMENT_PROGRAMMING_LANGUAGE have the highest proportions, with values of 0.90 and 0.46, respectively. One way of solving the collinearity problem is to remove the most collinear variable, which, in this case, is UNIT_TESTING_FRAMEWORK_ADAPTED.

Table 17 Collinearity diagnostics (1)
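Condition indices and variance proportions of this kind are usually computed from the design matrix including the constant column, with each column scaled to unit length but not centred (we assume this matches SPSS's "scaled and uncentered cross-products matrix"). The eigenvalues λ_k of the scaled cross-product matrix give CI_k = sqrt(λ_max / λ_k), and the variance of each coefficient is split across the dimensions to give the variance proportions. The NumPy sketch below implements that standard construction; it is not taken from the authors' SPSS scripts, and the function name is hypothetical.

```python
import numpy as np


def collinearity_diagnostics(X):
    """Eigenvalues, condition indices and variance proportions for X plus a constant.

    Returns (eigenvalues, condition_index, proportions), where proportions[k, j]
    is the share of the variance of coefficient j (0 = constant) associated with
    dimension k. Dimensions are ordered by decreasing eigenvalue.
    """
    X = np.asarray(X, dtype=float)
    Z = np.column_stack([np.ones(len(X)), X])  # prepend the constant column
    Z = Z / np.linalg.norm(Z, axis=0)          # unit-length columns, not centred
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    eigenvalues = s ** 2                       # eigenvalues of Z'Z
    condition_index = s.max() / s              # sqrt(lambda_max / lambda_k)
    phi = vt.T ** 2 / eigenvalues              # phi[j, k] = v_jk^2 / lambda_k
    proportions = (phi / phi.sum(axis=1, keepdims=True)).T
    return eigenvalues, condition_index, proportions
```

With 15 predictors plus the constant, this yields 16 dimensions, and because the eigenvalues are in decreasing order the largest condition index falls on the last one, as with dimension 16 in Table 17.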

Model 2

Table 18 reports the collinearity diagnostics of model 2, which has 14 variables: all the variables of the original model except UNIT_TESTING_FRAMEWORK_ADAPTED, which was eliminated on the grounds of collinearity.

Table 18 Collinearity diagnostics (2) with 14 variables

Note that dimension 15 still has a very high condition index (CI = 43 > 30), which indicates that collinearity is still a problem. Three variables are closely correlated: EXPERIMENT_PROGRAMMING_LANGUAGE, SITE and TRAINER. To deal with the collinearity problem, we opted to eliminate the variable with the highest proportion of variance explained, which in this case is EXPERIMENT_PROGRAMMING_LANGUAGE, with a proportion of variance explained of 0.40.

Model 3

Table 19 reports the collinearity diagnostics of model 3 with 13 variables, which is composed of all the variables of model 2 except the EXPERIMENT_PROGRAMMING_LANGUAGE variable.

Table 19 Collinearity diagnostics (3) with 13 variables

Note that dimension 14 still has a condition index greater than 30 (CI = 33.67 > 30), which suggests that there is a collinearity problem. Three variables are closely correlated: SITE, TRAINER and CS_DEGREE. According to the non-collinearity condition, we should eliminate the variable with the highest proportion of variance explained. However, given how the experiments were organized, we know that SITE (which refers to whether the experiment was conducted in academia or industry) is closely related to TRAINER: one of the trainers mostly trained subjects in industry and the other trained subjects in academia. Therefore, we eliminated the TRAINER variable and kept SITE, which is a more interesting variable for this research.

Model 4

Table 20 shows the collinearity diagnostics of model 4 with 12 variables, which is composed of all the variables of model 3 except the TRAINER variable. Model 4 is the model that we finally used in this research. Note that this model meets the collinearity conditions: a) the condition index of dimension 13 (CI = 29) is less than 30 and b) the proportions of variance explained are within the established bounds (less than 0.5).

Table 20 Collinearity diagnostics (4) with 12 variables
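Models 1 to 4 follow an iterative rule: while the largest condition index is at least 30, drop the predictor with the highest variance proportion on that dimension. The NumPy sketch below automates only this mechanical rule, with hypothetical names. The authors departed from it in model 3, dropping TRAINER on substantive grounds, and model 4 was also required to have variance proportions below 0.5, which the loop does not check; so the sketch would not necessarily reproduce model 4.

```python
import numpy as np


def _condition_diagnostics(X):
    """Condition indices and variance proportions of X plus a constant column."""
    Z = np.column_stack([np.ones(len(X)), X])
    Z = Z / np.linalg.norm(Z, axis=0)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    phi = vt.T ** 2 / s ** 2
    return s.max() / s, (phi / phi.sum(axis=1, keepdims=True)).T


def eliminate_collinear(X, names, ci_limit=30.0):
    """Drop predictors one at a time until every condition index is below ci_limit.

    At each step the predictor (never the constant) with the largest variance
    proportion on the worst dimension is removed. Returns (kept, dropped).
    """
    X = np.asarray(X, dtype=float)
    kept, dropped = list(names), []
    while X.shape[1] > 1:
        ci, prop = _condition_diagnostics(X)
        worst = int(np.argmax(ci))
        if ci[worst] < ci_limit:
            break
        j = int(np.argmax(prop[worst, 1:]))  # column 0 of prop is the constant
        dropped.append(kept.pop(j))
        X = np.delete(X, j, axis=1)
    return kept, dropped
```

Applied to two nearly identical predictors and a third unrelated one, the loop drops one of the near-duplicates and stops once the remaining condition indices fall below 30.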

Appendix 7: Multiple Linear Regression – Alternative Model

Quality

Table 21 shows the results of the multiple linear regression model with External Quality as the response variable. Note that experience is measured on a Likert scale in this case.

Table 21 Results of the MLR – Quality

Productivity

Table 22 shows the results of the multiple linear regression model with Productivity as the response variable. Note that experience is measured on a Likert scale in this case.

Table 22 MLR results – Productivity
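An MLR coefficient table typically reports, for each predictor, the least-squares estimate, its standard error and the t statistic. A minimal NumPy sketch of that computation follows, assuming the Likert-coded experience items enter as ordinary numeric columns of `X` (the exact coding is not given here); the function name is hypothetical.

```python
import numpy as np


def ols_table(X, y):
    """Least-squares fit with an intercept; returns (coef, std_err, t_value).

    Index 0 of each array is the intercept; index j is column j - 1 of X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    n, k = Z.shape
    sigma2 = (resid @ resid) / (n - k)  # residual variance estimate
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(Z.T @ Z)))
    return coef, std_err, coef / std_err
```

Treating a Likert item as numeric assumes equal spacing between its levels; coding it as a set of dummy variables is the usual alternative when that assumption is doubtful.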

Appendix 8: Residual Analysis by Experiment

Quality

Fig. 10

Residual by Experiment – QLTY

Table 23 Effect of the experiment on Quality

The results reported in Table 24 show that Levene's test of the model residuals grouped by the EXPERIMENT_CODE variable is significant (p-value = 0.006 < 0.05), which means that the residual variances are not homogeneous across experiments.

Table 24 Levene test for QLTY
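Levene's test takes each value's absolute deviation Z_ij from its group mean and compares the group means of Z with a one-way ANOVA F statistic, W = ((N - k) / (k - 1)) · Σ_i n_i (mean(Z_i) - mean(Z))^2 / Σ_i Σ_j (Z_ij - mean(Z_i))^2, referred to an F(k - 1, N - k) distribution. In Tables 24 and 26 the groups would be the model residuals for each EXPERIMENT_CODE. The pure-Python sketch below assumes the mean-centred variant (the text does not say which variant was used) and obtains the p-value from the regularized incomplete beta function; all names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _f_sf(w: float, d1: int, d2: int) -> float:
    """P(F > w) for F(d1, d2), via the regularized incomplete beta I_x(d2/2, d1/2)."""
    a, b, x = d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * w)
    if x >= 1.0:
        return 1.0
    if x <= 0.0:
        return 0.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def levene(*groups: Sequence[float]) -> Tuple[float, float]:
    """Mean-centred Levene test for equal variances; returns (W, p_value)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each value from its own group mean.
    z = []
    for g in groups:
        mean = sum(g) / len(g)
        z.append([abs(v - mean) for v in g])
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    d1, d2 = k - 1, n_total - k
    w = (d2 / d1) * between / within
    return w, _f_sf(w, d1, d2)
```

Where SciPy is available, `scipy.stats.levene(*groups, center='mean')` computes the same mean-centred statistic and p-value; its default, `center='median'`, is the Brown-Forsythe variant.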

Productivity

Fig. 11

Residual by Experiment – PROD

Table 25 Effect of the experiment on PRODUCTIVITY

The results reported in Table 26 show that Levene's test of the model residuals grouped by the EXPERIMENT_CODE variable is not significant (p-value = 0.155 > 0.05), so there is no evidence that the residual variances differ across experiments.

Table 26 Levene test for PROD

Appendix 9: SPSS Scripts

Filter


Original MLR Model


MLR Results for QLTY


MLR Results for PROD


Decision Trees for the QLTY


Decision Trees for the PROD


Appendix 10: Decision Trees CART (CRT)

QLTY

Figure 12 shows the decision tree for the QLTY response variable with different numbers of cases for the parent node (N) and the child node (n).

Fig. 12

CART decision tree for QLTY

Productivity

Figure 13 shows the decision tree for the PROD response variable with different numbers of cases for the parent node (N) and the child node (n).

Fig. 13

CART decision tree for PROD
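CART regression trees grow by choosing, at each node, the binary split that most reduces the within-node sum of squared deviations. The pure-Python sketch below searches a single numeric predictor for such a split. It assumes, which the text does not state, that the parent-node size N and child-node size n in Figs. 12 and 13 act as the minimum number of cases a node needs before it is split and the minimum number of cases allowed in each child. Names are hypothetical; a full tree would repeat this search over all predictors and then recursively within each child node.

```python
from typing import List, Optional, Sequence, Tuple


def _sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (least-squares impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: Sequence[float], y: Sequence[float],
               min_parent: int, min_child: int) -> Optional[Tuple[float, float]]:
    """Best split 'x <= threshold' for one predictor; returns (threshold, improvement).

    Returns None if the node has fewer than min_parent cases or no split leaves
    at least min_child cases in each child.
    """
    if min_child < 1:
        raise ValueError("min_child must be at least 1")
    if len(y) < min_parent:
        return None
    pairs = sorted(zip(x, y))
    parent_sse = _sse([v for _, v in pairs])
    best: Optional[Tuple[float, float]] = None
    for i in range(min_child, len(pairs) - min_child + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # tied predictor values cannot be separated
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        improvement = parent_sse - _sse(left) - _sse(right)
        if best is None or improvement > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2.0, improvement)
    return best
```

For example, `best_split([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5], min_parent=4, min_child=2)` splits at 3.5, removing the whole parent sum of squares (24.0); raising `min_parent` above 6 leaves the node unsplit.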

About this article

Cite this article

Dieste, O., Aranda, A.M., Uyaguari, F. et al. Empirical evaluation of the effects of experience on code quality and programmer productivity: an exploratory study. Empir Software Eng 22, 2457–2542 (2017). https://doi.org/10.1007/s10664-016-9471-3

Keywords

  • Experience
  • Industry
  • Academy
  • Programming
  • Iterative test-last development
  • External quality
  • Productivity
  • Performance