Empirical evaluation of the effects of experience on code quality and programmer productivity: an exploratory study

Dieste, Oscar; Aranda, Alejandrina M.; Uyaguari, Fernando; Turhan, Burak; Tosun, Ayse; Fucci, Davide; Oivo, Markku; Juristo, Natalia

doi:10.1007/s10664-016-9471-3

Published: 04 February 2017

Volume 22, pages 2457–2542, (2017)
Empirical Software Engineering

Oscar Dieste¹,
Alejandrina M. Aranda¹,
Fernando Uyaguari¹,
Burak Turhan²,
Ayse Tosun³,
Davide Fucci²,
Markku Oivo² &
Natalia Juristo¹,²

Abstract

Context

There is a widespread belief in both SE and other branches of science that experience helps professionals to improve their performance. However, cases have been reported where experience not only does not have a positive influence but sometimes even degrades the performance of professionals.

Aim

Determine whether years of experience influence programmer performance.

Method

We have analysed 10 quasi-experiments executed both in academia with graduate and postgraduate students and in industry with professionals. The experimental task was to apply ITLD on two experimental problems and then measure external code quality and programmer productivity.

Results

Programming experience gained in industry does not appear to have any effect whatsoever on quality and productivity. Overall programming experience gained in academia does tend to have a positive influence on programmer performance. These two findings may be related to the fact that, as opposed to deliberate practice, routine practice does not appear to lead to improved performance. Experience in the use of productivity tools, such as testing frameworks and IDEs, also has positive effects.

Conclusion

Years of experience are a poor predictor of programmer performance. Academic background and specialized knowledge of task-related aspects appear to be rather good predictors.

Notes

TEKES: Finnish Funding Agency for Technology and Innovation
Not specified so as not to disclose T. Raty’s organization.
Although we had 126 experimental subjects, 11 observations were lost during the analysis as two subjects failed to complete the experimental task, six failed to report their academic qualifications and four failed to report any experience. Consequently, we were only able to effectively process 115 cases.

Acknowledgments

We would like to acknowledge Dr. Hakan Erdogmus, who contributed to the design of one of the tasks used in the study (BSK) and the corresponding test cases. We also wish to acknowledge Mr. Timo Raty for his participation in the creation of the code templates for C++ and for the training given in one of the quasi-experiments, and Mr. Adrian Santos for his support in the collection of the subjects’ data.

Author information

Authors and Affiliations

Escuela Técnica Superior de Ingenieros en Informática, Universidad Politécnica de Madrid, Campus de Montegancedo, 28660, Boadilla del Monte, Spain
Oscar Dieste, Alejandrina M. Aranda, Fernando Uyaguari & Natalia Juristo
Department of Information Processing Science, University of Oulu, P. O. Box 3000, 90014, Oulu, Finland
Burak Turhan, Davide Fucci, Markku Oivo & Natalia Juristo
Faculty of Computer & Informatics, Istanbul Technical University, 34469, Maslak, Istanbul, Turkey
Ayse Tosun

Corresponding author

Correspondence to Oscar Dieste.

Additional information

Communicated by: Richard Paige, Jordi Cabot, and Neil Ernst

Appendices

Appendix 1: Description of the Independent Variables

Table 15 shows the 15 independent variables used in this research. The main aim of this appendix is to list each variable giving a brief description of the variable, its type (nominal, ordinal or dummy) and its respective levels. Section 3 details the types and measurement of variables.

Table 15 Independent variables

Appendix 2: Details of the Experiment

2.1 Specification for Mars Rover API with Slicing

Develop an API that moves a rover around a planet. The planet is represented as a grid with x and y coordinates. The rover is also facing in a direction. The direction can be north (N), south (S), west (W) or east (E). The input received by the rover is a string representing the commands it needs to execute.

2.1.1 The Planet

The planet on which the rover moves is represented as a square grid, with size (x, y).

Requirement: Define a planet of size (x, y).

Example: (100,100) creates a planet of size 100 × 100.

2.1.2 Landing

When the rover lands on the planet, it begins its journey at the start of the grid facing north.

Requirement: When the rover lands on the planet its position shall be (0,0) facing north.

Example: An empty command (i.e., “”) to the rover returns its landing status (0,0,N).

2.1.3 Turning

The rover turns right or left. It remains in the same cell of the grid. Its direction changes accordingly.

Requirement: Compute the position of the rover after turning left (command “l”) or right (command “r”).

Example: A rover at position (0,0,N) is at position (0,0,E) after executing command “r”. A rover at position (0,0,N) is at position (0,0,W) after executing command “l”.

2.1.4 Moving

The rover moves forward or backward one grid cell in the direction that it is facing. The rover’s direction does not change.

Requirement: Compute the position of the rover after moving forward (command “f”) or backward (command “b”) one grid cell.

Example: A rover at position (7,6,N) moves to (7,7,N) after executing a “f” command. A rover at position (5,8,E) moves to (4,8,E) after executing a “b” command.

2.1.5 Moving and Turning Combined

The rover shall be able to execute arbitrary sequences of “f”, “b”, “l” and “r” commands.

Requirement: Compute the position of the rover after executing a series of commands.

Example: A rover at position (0,0,N) moves to position (2,2,E) after executing “ffrff”.
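The turning and moving requirements up to this point could be sketched as follows. This is only one possible reading of the specification, written in Python for brevity; the names `DIRECTIONS`, `STEPS` and `execute` are illustrative assumptions, and wrapping and obstacles are deliberately left out.

```python
# Compass directions in clockwise order: turning right moves one position
# forward in this list, turning left moves one position back.
DIRECTIONS = ["N", "E", "S", "W"]

# Change in (x, y) produced by one forward move for each facing.
STEPS = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}


def execute(commands, x=0, y=0, facing="N"):
    """Apply a string of l/r/f/b commands and return (x, y, facing)."""
    for command in commands:
        if command in ("l", "r"):
            turn = 1 if command == "r" else -1
            facing = DIRECTIONS[(DIRECTIONS.index(facing) + turn) % 4]
        elif command in ("f", "b"):
            dx, dy = STEPS[facing]
            sign = 1 if command == "f" else -1  # backward reverses the step
            x, y = x + sign * dx, y + sign * dy
        else:
            raise ValueError(f"unknown command: {command!r}")
    return x, y, facing


print(execute("ffrff"))  # (2, 2, 'E')
```

Keeping the directions in a clockwise list means that both turns reduce to index arithmetic modulo 4, so no per-direction branching is needed.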

2.1.6 Wrapping

Since the planet is a sphere, the rover wraps around to the opposite edge when it moves over an edge.

Requirement: Compute the position of the rover moving over the edges. The rover shall spawn on the opposite side.

Example: A rover on a planet of size 100 × 100, which moves backward (command “b”) after landing (remember that landing always takes place at position (0,0,N)) moves to position (0,99,N).
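Wrapping can be expressed with modular arithmetic. The helper below is a minimal sketch under the assumption that valid coordinates run from 0 to size − 1 on each axis; the name `wrap` is hypothetical.

```python
def wrap(x, y, width, height):
    """Map a position that may lie outside the grid back onto the grid."""
    # Python's % operator returns a non-negative result for a positive
    # modulus, so -1 wraps to the last cell without a special case.
    return x % width, y % height


print(wrap(0, -1, 100, 100))  # (0, 99)
```

In languages where the remainder of a negative number is negative (Java and C++, for instance), the equivalent expression would need an extra adjustment such as `((x % width) + width) % width`.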

2.1.7 Positioning of Obstacles

Obstacles can be positioned on specific cells of the grid.

Requirement: Define the obstacles as a string (x1,y1) (x2,y2)… Place the obstacles on the grid.

Example: “(1,1) (4,5)” defines two obstacles, one at position (1,1) and another at position (4,5). Notice that the planet grid should be greater than or equal to 6 × 6.
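As an illustration, the obstacle string could be parsed with a regular expression. The function names `parse_obstacles` and `minimum_square_side` are assumptions made for this sketch, and the pattern only accepts non-negative integer coordinates.

```python
import re

# Matches one "(x,y)" pair, tolerating spaces around the comma.
OBSTACLE_PATTERN = re.compile(r"\((\d+)\s*,\s*(\d+)\)")


def parse_obstacles(text):
    """Turn a string such as "(1,1) (4,5)" into a list of (x, y) tuples."""
    return [(int(x), int(y)) for x, y in OBSTACLE_PATTERN.findall(text)]


def minimum_square_side(obstacles):
    """Smallest square grid side that can hold every obstacle."""
    # Coordinates start at 0, so a coordinate of 5 needs a side of 6.
    return max((max(x, y) for x, y in obstacles), default=-1) + 1


obstacles = parse_obstacles("(1,1) (4,5)")
print(obstacles)                       # [(1, 1), (4, 5)]
print(minimum_square_side(obstacles))  # 6
```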

2.1.8 Identifying a Single Obstacle

The rover might encounter (i.e., try to move into) an obstacle. When it does, it should report the obstacle and continue executing the remaining commands.

Requirement: Compute the position of a rover encountering an obstacle and report the obstacle. The same obstacle should be reported only once.

Example: A rover just landed (position (0,0,N)). There is one obstacle at planet coordinates (2,2). The rover executes “ffrfff” and reports (1,2,E) (2,2). Notice that the same obstacle is encountered twice but reported only once.

2.1.9 Identifying Multiple Obstacles

The rover might encounter multiple obstacles. When it does, it should report all of them once and in the order they were encountered.

Requirement: Compute the position of the rover encountering obstacles, and report the obstacles encountered in the order they are encountered. The same obstacle shall be reported only once.

Example: A rover just landed (position (0,0,N)). There are two obstacles at planet coordinates (2,2) and (2,1). The rover executes “ffrfffrflf” and reports (1,1,E) (2,2) (2,1). Notice that the first obstacle is encountered twice but reported only once.

2.1.10 A Tour Around the Planet

The rover goes on a tour around the planet encountering several obstacles, and wrapping in both axes.

Requirement: Compute the position of a rover that executes a series of commands that result in moving along both axes in both directions, encountering several obstacles and wrapping from both edges of the planet.

Example: The rover lands on a 6 × 6 planet with obstacles at (2,2), (0,5) and (5,0). It executes the command “ffrfffrbbblllfrfrbbl” and returns (0,0,N) (2,2) (0,5) (5,0).

Congratulations, you are done!

2.2 Specification for Mars Rover API without Slicing

The API manages a rover that moves on a planet (a square grid) of arbitrary size (x,y). The rover starts the movement at position (0,0). The direction of the movement can be N (north), S (south), E (east) and W (west). The rover is facing north at the start.

The rover receives a string of commands: l (left), r (right), f (forward) and b (backward). l and r change the rover’s direction counter- and clockwise, respectively, but do not alter its position. f and b move the rover 1 position on the grid in or away from the direction that it is facing, respectively. The direction in which the rover is facing does not change. When the rover moves over the edges of the planet, it spawns on the opposite side.

The planet (grid) may contain obstacles. Obstacles are defined as a list of coordinates “(obs1X, obs1Y) (obs2X, obs2Y)…”. When the rover finds an obstacle during a tour, it skips the current command (i.e., it does not move to the cell in which the obstacle is located) and continues to execute the remaining commands.

Upon processing the string of commands, the rover returns its position and direction in the format “(posX, posY, facing)”. If obstacles are found, the output will be “(posX, posY, facing) (obs1X, obs1Y) (obs2X, obs2Y)…” The same obstacle shall be reported only once. Obstacles are reported in the order in which they are found.
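The complete behaviour described above might be sketched as follows. This is a hedged illustration rather than a reference solution: the function name `run_rover` is hypothetical, obstacles are passed as an already parsed collection of (x, y) tuples, and the output is written without internal spaces, following the examples in the sliced specification.

```python
DIRECTIONS = "NESW"  # clockwise order
STEPS = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}


def run_rover(width, height, obstacles, commands):
    """Execute the commands from (0,0,N) and return the report string."""
    blocked = set(obstacles)
    x, y, facing = 0, 0, "N"
    found = []  # obstacles in the order they were first encountered
    for command in commands:
        if command == "r":
            facing = DIRECTIONS[(DIRECTIONS.index(facing) + 1) % 4]
        elif command == "l":
            facing = DIRECTIONS[(DIRECTIONS.index(facing) - 1) % 4]
        elif command in ("f", "b"):
            dx, dy = STEPS[facing]
            sign = 1 if command == "f" else -1
            # Wrap first, so obstacles on the far edge are detected too.
            target = ((x + sign * dx) % width, (y + sign * dy) % height)
            if target in blocked:
                if target not in found:
                    found.append(target)  # report each obstacle only once
            else:
                x, y = target
        else:
            raise ValueError(f"unknown command: {command!r}")
    parts = [f"({x},{y},{facing})"] + [f"({ox},{oy})" for ox, oy in found]
    return " ".join(parts)


print(run_rover(6, 6, [(2, 2), (0, 5), (5, 0)], "ffrfffrbbblllfrfrbbl"))
# (0,0,N) (2,2) (0,5) (5,0)
```

A list is used for `found` rather than a set because the specification requires obstacles to be reported in the order in which they are encountered.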

2.3 Specification for Bowling Score Keeper with Slicing

The objective is to develop an application that can calculate the score of a single bowling game using TDD. There is no graphical user interface. All that you will use in this assignment is the objects and JUnit testing. You will not need a main method.

The application requirements are divided into a set of user stories, which serves as your to-do list. You should be able to incrementally develop a complete solution without an upfront comprehension of all the game’s rules. For this exercise, don’t read ahead, and handle the requirements one at a time in the stated order. Solve the problem using TDD, starting with the requirement for the first story. Remember to always lead with a test case, taking hints from the examples provided. Do not move to the next story until you are done with the previous one. A story is done when you are confident that your program correctly implements the functionality stipulated by the requirement for the story. This means that all of your test cases for that story and all of the test cases for the previous stories pass. You may need to tweak your solution as you progress towards more advanced stories.

2.3.1 Frame

Each turn of a bowling game is called a frame. 10 pins are arranged in each frame. The goal of the player is to knock down as many pins as possible in each frame. The player has two chances, or throws, to do so. The value of a throw is given by the number of pins knocked down in that throw.

Story: As the scorekeeper, I want to be able to record a frame as composed of two throws. The first and second throws should be distinguishable.

Example: [2, 4] is a frame with two throws, in which two pins were knocked down in the first throw and four pins were knocked down in the second.

2.3.2 Frame Score

An ordinary frame’s score is the sum of its throws.

Story: As the scorekeeper, I want to be able to compute the score of an ordinary frame after a player has rolled both throws.

Examples: The score of the frame [2, 6] is 8. The score of the frame [0, 9] is 9.

2.3.3 Game

A single game consists of 10 frames.

Story: As the scorekeeper, I want to define a game as a sequence of 10 frames.

Example: The sequence of frames [1, 5] [3, 6] [7, 2] [3, 6] [4, 4] [5, 3] [3, 3] [4, 5] [8, 1] [2, 6] represents a game. You may reuse this game from now on to represent and test different scenarios, modifying only a few frames each time.

2.3.4 Partial Game

When the player rolls a throw, the throw is automatically recorded in the correct frame.

Story: As the scorekeeper, when a player rolls throws, I want the game to keep track of the frames and figure out in which frame to place the next throw depending on the past throws. You think this is easy. Maybe for now. We’ll see.

Example: If the game currently consists of the frames [1, 5] [3, 6] [7, 2] [3, ?] and the player rolls a throw with a value of 4, the game becomes [1, 5] [3, 6] [7, 2] [3, 4]. Another roll with a value of 5 transforms the game to [1, 5] [3, 6] [7, 2] [3, 4] [5, ?].
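The bookkeeping in this story could look like the sketch below. It assumes that a frame is stored as a two-element list whose second entry is `None` (the “?” in the example) until the second throw is rolled; the name `add_throw` is hypothetical, and strikes are not handled because they are only introduced in a later story.

```python
def add_throw(frames, pins):
    """Return a new game with `pins` recorded in the correct frame."""
    frames = [list(frame) for frame in frames]  # leave the input untouched
    if frames and frames[-1][1] is None:
        frames[-1][1] = pins          # complete the open frame
    else:
        frames.append([pins, None])   # start a new frame
    return frames


game = [[1, 5], [3, 6], [7, 2], [3, None]]
game = add_throw(game, 4)
print(game)  # [[1, 5], [3, 6], [7, 2], [3, 4]]
game = add_throw(game, 5)
print(game)  # [[1, 5], [3, 6], [7, 2], [3, 4], [5, None]]
```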

2.3.5 Game Score

The score of a bowling game is the sum of the individual scores of its frames.

Story: As the scorekeeper, I want to know a player’s current game score at all times.

Example: The score of the game [1, 5] [3, 6] [7, 2] [3, 6] [4, 4] [5, 3] [3, 3] [4, 5] [8, 1] [2, 6] is 81. Partial scores are possible for an incomplete game if the frame scores are known up to the last complete frame. The score of the game [1, 5] [3, 6] [7, ?] is 15. The frame [7, ?] is not yet complete.

2.3.6 Strike

A frame is called a strike if all 10 pins are knocked down in the first throw. In this case, there is no second throw. A strike frame can be written as [10, 0]. The score of a strike equals 10 plus the sum of the next two throws of the subsequent frame.

Story: As the scorekeeper, I want to be able to recognize a strike frame, compute its score after the next frame has been completed, and compute the game score.

Examples: Suppose [10, 0] and [3, 6] are consecutive frames. Then the first frame is a strike and its score equals 10 + 3 + 6 = 19. The game [10, 0] [3, 6] [7, 2] [3, 6] [4, 4] [5, 3] [3, 3] [4, 5] [8, 1] [2, 6] has a score of 94. The partial game [10, 0] [3, 6] has a score of 28.

2.3.7 Spare

A frame is called a spare when all 10 pins are knocked down in two throws. The score of a spare frame is 10 plus the value of the first throw from the subsequent frame.

Story: As the scorekeeper, I want to be able to recognize a spare frame, compute the score of a game containing a spare frame after the first throw of the next frame has been rolled, and compute the game’s score.

Examples: [1, 9], [4, 6], [7, 3] are all spares. If you have two frames [1, 9] and [3, 6] in a row, the spare frame’s score is 10 + 3 = 13. The game [1, 9] [3, 6] [7, 2] [3, 6] [4, 4] [5, 3] [3, 3] [4, 5] [8, 1] [2, 6] has a score of 88. The partial game [1, 9] [3, 6] has a score of 22.

2.3.8 Strike and Spare

A strike can be followed by a spare. The strike’s score is not affected when this happens.

Story: As the scorekeeper, I want to make sure that the score of a strike is computed right when it’s followed by a spare.

Examples: In the sequence [10, 0] [4, 6] [7, 2], a strike is followed by a spare. In this case, the score of the strike is 10 + 4 + 6 = 20, and the score of the spare is 4 + 6 + 7 = 17. The game [10, 0] [4, 6] [7, 2] [3, 6] [4, 4] [5, 3] [3, 3] [4, 5] [8, 1] [2, 6] has a score of 103.

2.3.9 Multiple Strikes

Two strikes in a row are possible. You must take care when this happens, as you need the values of throws from the next two frames to compute the score of the first strike.

Story: As the scorekeeper, I want to make sure that I can record two consecutive strikes correctly in the game, and correctly compute the score of the first strike after the next two throws have been rolled.

Examples: In the sequence [10, 0] [10, 0] [7, 2], the score of the first strike is 10 + 10 + 7 = 27. The score of the second strike is 10 + 7 + 2 = 19. The game [10, 0] [10, 0] [7, 2] [3, 6] [4, 4] [5, 3] [3, 3] [4, 5] [8, 1] [2, 6] has a score of 112. The score of the partial game [10, 0] [10, 0] [7, ?] is 27 (we cannot compute the scores of the last two frames yet).
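Partial scores with strikes and spares can be computed by looking ahead in the sequence of throws actually rolled. The sketch below is one possible approach, not the solution used in the study; it assumes that a strike is stored as [10, 0], that a missing second throw is stored as `None`, and that scoring stops at the first frame whose score cannot yet be determined. The name `partial_score` is hypothetical.

```python
def partial_score(frames):
    """Sum the scores of the leading frames that can already be computed."""
    # Flatten the frames into real throws, remembering where each frame
    # starts. The placeholder 0 of a strike frame [10, 0] is not a throw.
    throws, starts = [], []
    for first, second in frames:
        starts.append(len(throws))
        throws.append(first)
        if first != 10 and second is not None:
            throws.append(second)

    total = 0
    for (first, second), start in zip(frames, starts):
        if first == 10:
            needed = 3        # the strike itself plus the next two throws
        elif second is None:
            break             # frame not complete yet
        elif first + second == 10:
            needed = 3        # both throws of the spare plus the next one
        else:
            needed = 2        # ordinary frame
        window = throws[start:start + needed]
        if len(window) < needed:
            break             # bonus throws have not been rolled yet
        total += sum(window)
    return total


print(partial_score([[10, 0], [10, 0], [7, None]]))  # 27
```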

2.3.10 Multiple Spares

Two spares in a row are possible. The score of the first spare is not affected when this happens.

Story: As the scorekeeper, I want to be able to compute the score of a game with two spares in a row, and the score of the first spare after the next spare has been completed.

Example: The game [8, 2] [5, 5] [7, 2] [3, 6] [4, 4] [5, 3] [3, 3] [4, 5] [8, 1] [2, 6] has a score of 98.

2.3.11 Spare as the Last Frame

When the last frame in a game is a spare, the player will be given a bonus throw. However, this bonus throw does not belong to a regular frame. It is only used to calculate the score of the last spare.

Story: As the scorekeeper, I hate it when the last frame is a spare: let the game please figure out that the next roll is a bonus throw and compute the score of the last frame and the whole game based on the value of that bonus throw.

Example: The last frame in the game [1, 5] [3, 6] [7, 2] [3, 6] [4, 4] [5, 3] [3, 3] [4, 5] [8, 1] [2, 8] is a spare. If the bonus throw is [7], the last frame has a score of 2 + 8 + 7 = 17. The game has a score of 90.

2.3.12 Strike as the Last Frame

When the last frame of the game is a strike, the player will be given two bonus throws. However, these two bonus throws do not belong to a regular frame. They are only used to calculate the score of the last strike frame.

Story: As the scorekeeper, I hate it even more when the last frame of a game is a strike: let the game please figure out that the next rolls are bonus throws and compute the score of the last frame and the whole game based on the value of those bonus throws.

Example: The last frame in the game [1, 5] [3, 6] [7, 2] [3, 6] [4, 4] [5, 3] [3, 3] [4, 5] [8, 1] [10, 0] is a strike. If the bonus throws are [7, 2], the last frame’s score is 10 + 7 + 2 = 19. The game score is 92.

2.3.13 Bonus is a Strike

No more bonus throws are granted when the last frame in the game is a spare and the bonus throw is a strike.

Story: As the scorekeeper, I hate it most when the last frame is a spare and the bonus throw is a strike: please God, let the game figure this scenario out correctly.

Example: In the game [1, 5] [3, 6] [7, 2] [3, 6] [4, 4] [5, 3] [3, 3] [4, 5] [8, 1] [2, 8], the last frame is a spare. If the bonus throw is [10], the game score is 93.

2.3.14 Best Score

A perfect game consists of all strikes (a total of 12, including the bonus throws), and has a score of 300.

Story: As the scorekeeper, I love it when the game is just a sequence of strikes, including the bonus throws, because I know that the player then deserves a perfect score of 300.

Example: A perfect game looks like [10, 0] [10, 0] [10, 0] [10, 0] [10, 0] [10, 0] [10, 0] [10, 0] [10, 0] [10, 0] with bonus throws [10, 10]. Its score is 300.

2.3.15 Random Game

Story: As the scorekeeper, I want to make sure that the game [6, 3] [7, 1] [8, 2] [7, 2] [10, 0] [6, 2] [7, 3] [10, 0] [8, 0] [7, 3] [10] has a score of 135.

Congratulations, you are done!

2.4 Specification for Bowling Score Keeper without Slicing

The game consists of 10 frames as shown above. The player has two opportunities in each frame to knock down 10 pins. The score for the frame is the total number of pins knocked down, plus bonuses for strikes and spares.

A spare is when the player knocks down all 10 pins in two tries. The bonus for that frame is the number of pins knocked down by the next ball rolled. So, the score in frame 3 above is 10 (the total number knocked down), plus a bonus of 5 (the number of pins knocked down on the next roll).

A strike is when the player knocks down all 10 pins on his or her first try. The bonus for that frame is the value of the next two balls rolled.

A player who rolls a spare or strike in the tenth frame is allowed to roll the extra balls to complete the frame. However, no more than three balls can be rolled in the tenth frame.
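One possible reading of the complete specification is sketched below in Python. It is not the reference solution used in the experiments; the function name `score_game` and the representation of a game (ten `[first, second]` pairs plus a separate list of bonus throws, as in the sliced examples) are assumptions of this sketch.

```python
def score_game(frames, bonus=()):
    """Return the total score of a ten-frame game.

    `frames` is a list of ten [first, second] pairs (a strike is [10, 0]);
    `bonus` holds the throws granted after a spare or strike in the
    tenth frame. The sketch assumes the bonus throws are supplied
    whenever the tenth frame requires them.
    """
    if len(frames) != 10:
        raise ValueError("a complete game has exactly 10 frames")

    # Flatten the frames into the balls actually rolled:
    # a strike frame contributes a single ball.
    rolls = []
    for first, second in frames:
        rolls.append(first)
        if first != 10:
            rolls.append(second)
    rolls.extend(bonus)

    total = 0
    i = 0  # index of the first ball of the current frame
    for _ in range(10):
        if rolls[i] == 10:                    # strike: add the next two balls
            total += 10 + rolls[i + 1] + rolls[i + 2]
            i += 1
        elif rolls[i] + rolls[i + 1] == 10:   # spare: add the next ball
            total += 10 + rolls[i + 2]
            i += 2
        else:                                 # open frame
            total += rolls[i] + rolls[i + 1]
            i += 2
    return total


perfect = [[10, 0]] * 10
print(score_game(perfect, bonus=[10, 10]))  # 300

random_game = [[6, 3], [7, 1], [8, 2], [7, 2], [10, 0],
               [6, 2], [7, 3], [10, 0], [8, 0], [7, 3]]
print(score_game(random_game, bonus=[10]))  # 135
```

Flattening the frames into a sequence of rolls keeps the bonus look-ahead uniform: the tenth frame is scored by the same rule as any other frame, and the bonus throws are simply the balls that follow it.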

Appendix 3: Industry Questionnaire

Appendix 4: Academic Questionnaire

Appendix 5: Breakdown of Experience

5.1 Programming Language Experience

Figure 8

5.2 Overall Programming Language Experience

Figure 9

Appendix 6: Collinearity Conditions

Table 16 reports the results of the collinearity analysis for the model with 15 independent variables. The pattern shown in Table 16 suggests that the testing framework (UNIT_TESTING_FRAMEWORK_ADAPTED) might be collinear, as it has values close to the bounds established for the variance inflation factor (VIF = 4.943) and a low tolerance (T = 0.202). On the other hand, the collinearity statistics for the other variables are within the expected values (VIF < 5 and T > 0.2), which is a sign that they are not collinear.
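The two statistics quoted above are linked by the standard identity T = 1/VIF = 1 − R², where R² comes from regressing the variable in question on the remaining predictors. The short Python sketch below checks the reported pair of values against the stated bounds; the function names are illustrative and do not come from the SPSS scripts.

```python
def tolerance_from_vif(vif):
    """Tolerance is the reciprocal of the variance inflation factor."""
    return 1.0 / vif


def r_squared_from_vif(vif):
    """R^2 of the predictor regressed on all the other predictors."""
    return 1.0 - tolerance_from_vif(vif)


def within_bounds(vif, max_vif=5.0, min_tolerance=0.2):
    """Apply the bounds used in the text: VIF < 5 and T > 0.2."""
    return vif < max_vif and tolerance_from_vif(vif) > min_tolerance


vif = 4.943  # value reported for the testing framework variable
print(round(tolerance_from_vif(vif), 3))  # 0.202, matching the reported T
print(round(r_squared_from_vif(vif), 3))  # 0.798
print(within_bounds(vif))                 # True, but only by a narrow margin
```

The variable formally passes both bounds, which is consistent with the text describing it as a borderline case rather than a clear violation.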

Table 16 Coefficients of the linear regression model with 15 independent variables

Table 17 shows the collinearity diagnostics of the model specified in Table 16. Note that dimension 16 has a very high condition index (CI = 86.918 > 30), which suggests that the level of collinearity is high. Comparing the proportion of variance explained for each of the model explanatory variables, we find that the UNIT_TESTING_FRAMEWORK_ADAPTED and EXPERIMENT_PROGRAMMING_LANGUAGE variables have an extremely high proportion of variance explained with values of 0.90 and 0.46, respectively. One way of solving the collinearity problem is to remove the most collinear variable, which, in this case, is UNIT_TESTING_FRAMEWORK_ADAPTED.
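For readers unfamiliar with the diagnostic, the condition index of a dimension is conventionally defined as the square root of the ratio between the largest eigenvalue and the eigenvalue of that dimension. The sketch below applies that definition in Python. The eigenvalues are hypothetical, since the eigenvalues behind Table 17 are not reproduced here; only the threshold of 30 is taken from the text.

```python
import math


def condition_indices(eigenvalues):
    """Condition index per dimension: sqrt(largest eigenvalue / eigenvalue)."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / ev) for ev in eigenvalues]


def flagged_dimensions(eigenvalues, threshold=30.0):
    """1-based positions of the dimensions whose condition index exceeds the threshold."""
    indices = condition_indices(eigenvalues)
    return [k + 1 for k, ci in enumerate(indices) if ci > threshold]


# Hypothetical eigenvalues, for illustration only.
eigenvalues = [9.0, 1.0, 0.0025]
for k, ci in enumerate(condition_indices(eigenvalues), start=1):
    print(k, round(ci, 2))
print(flagged_dimensions(eigenvalues))
```

A near-zero eigenvalue signals a near-linear dependency among the predictors, which is why it produces a large condition index.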

Table 17 Collinearity diagnostics (1)

6.1 Model 2

Table 18 reports the collinearity diagnostics of model 2 with 14 variables, which is composed of all the variables of the original model, except the UNIT_TESTING_FRAMEWORK_ADAPTED variable that was eliminated on the grounds of collinearity.

Table 18 Collinearity diagnostics (2) with 14 variables

Note that dimension 15 still has a very high condition index (CI = 43 > 30), which implies that there is a problem of collinearity. There are three closely correlated variables: EXPERIMENT_PROGRAMMING_LANGUAGE, SITE and TRAINER. In order to deal with the collinearity problem, we have opted to eliminate the variable with the highest proportion of variance explained, which in this case is EXPERIMENT_PROGRAMMING_LANGUAGE with a proportion of variance explained of 0.40.

6.2 Model 3

Table 19 reports the collinearity diagnostics of model 3 with 13 variables, which is composed of all the variables of model 2 except the EXPERIMENT_PROGRAMMING_LANGUAGE variable.

Table 19 Collinearity diagnostics (3) with 13 variables

Note that dimension 14 still has a condition index greater than 30 (CI = 33.67 > 30), which suggests that there is a problem of collinearity. There are three closely correlated variables: SITE, TRAINER and CS_DEGREE. According to the non-collinearity condition, we should eliminate the variable with the highest proportion of variance explained. Bearing in mind the experimental data type, however, we know that SITE (which refers to whether the experiment was conducted in academia or industry) is closely related to TRAINER, as one of the trainers mostly trained subjects in industry and the other trained subjects in academia. Therefore, we eliminated the TRAINER variable and kept SITE, which is a more interesting variable for this research.

6.3 Model 4

Table 20 shows the collinearity diagnostics of model 4 with 12 variables, which is composed of all the variables of model 3 except the TRAINER variable. Model 4 is the model that we finally used in this research. Note that this model meets the collinearity conditions: a) the condition index of dimension 13 (CI = 29) is less than 30 and b) the proportions of variance explained are within the established bounds (less than 0.5).
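The sequence of models described in this appendix can be summarised as a simple stopping rule: keep removing one variable at a time until the largest condition index falls below 30. The Python sketch below replays that rule on the condition indices reported in the text. The data structure and the function name are assumptions of the sketch, and the choice of which variable to drop at each step remains a judgement call, as the discussion of SITE and TRAINER shows.

```python
CI_THRESHOLD = 30.0

# (label, number of variables, largest condition index reported in the text)
reported_models = [
    ("original model", 15, 86.918),
    ("model 2", 14, 43.0),
    ("model 3", 13, 33.67),
    ("model 4", 12, 29.0),
]


def first_acceptable(models, threshold=CI_THRESHOLD):
    """Return the label of the first model whose largest CI is below the threshold."""
    for label, _n_vars, largest_ci in models:
        if largest_ci < threshold:
            return label
    return None


for label, n_vars, largest_ci in reported_models:
    verdict = "collinear" if largest_ci >= CI_THRESHOLD else "acceptable"
    print(f"{label} ({n_vars} variables): CI = {largest_ci} -> {verdict}")

print(first_acceptable(reported_models))  # model 4
```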

Table 20 Collinearity diagnostics (4) with 12 variables

Appendix 7: Multiple Linear Regression – Alternative Model

7.1 Quality

Table 21 shows the results of the multiple regression model with respect to the influence of External Quality. Note that experience is measured on a Likert scale in this case.

Table 21 Results of the MLR - Quality

7.2 Productivity

Table 22 shows the results of the multiple regression model with respect to the influence of Productivity. Note that experience is measured on a Likert scale in this case.

Table 22 MLR results – Productivity

Appendix 8: Residual Analysis by Experiment

8.1 Quality

Table 23 Effect of the experiment on Quality

The results reported in Table 24 show that the Levene test on the model residuals, grouped by the EXPERIMENT_CODE variable, is significant (p-value = 0.006 < 0.05), which means that the residual variances are not homogeneous across experiments.
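As a reminder of what the Levene test measures, the sketch below computes its W statistic in plain Python, using absolute deviations from the group means (the classic form of the test; we assume, without confirmation from the scripts, that this is the variant applied here). The data are hypothetical. Obtaining the p-value additionally requires the F distribution with (k − 1, N − k) degrees of freedom, which is left out of this sketch.

```python
def levene_w(groups):
    """Levene's W statistic based on absolute deviations from group means.

    `groups` is a list of lists of residuals, one list per experiment.
    Raises ZeroDivisionError if all deviations within every group are equal.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of every observation from its own group mean.
    z = []
    for g in groups:
        mean = sum(g) / len(g)
        z.append([abs(x - mean) for x in g])

    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total

    # Between-group and within-group variation of the deviations.
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical residuals for two experiments.
same_spread = [[1, 2, 3], [4, 5, 6]]
different_spread = [[1, 2, 3], [0, 5, 10]]
print(levene_w(same_spread))       # 0.0: identical spread in both groups
print(levene_w(different_spread))  # larger W: the spreads differ
```

A large W relative to the F distribution leads to a small p-value and hence to rejecting the hypothesis of homogeneous variances.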

Table 24 Levene test for QLTY

8.2 Productivity

Table 25 Effect of the experiment on PRODUCTIVITY

The results reported in Table 26 show that the Levene test on the model residuals, grouped by the EXPERIMENT_CODE variable, is not significant (p-value = 0.155 > 0.05), which suggests that the residual variances are homogeneous across experiments.

Table 26 Levene test for PROD

Appendix 9: SPSS Scripts

9.1 Filter

9.2 Original MLR Model

9.3 MLR Results for QLTY

9.4 MLR Results for PROD

9.5 Decision Trees for the QLTY

9.6 Decision Trees for the PROD

Appendix 10: Decision Trees CART (CRT)

10.1 QLTY

Figure 12 shows the decision tree for the QLTY response variable with different numbers of cases for the parent node (N) and the child node (n).

10.2 Productivity

Figure 13 shows the decision tree for the PROD response variable with different numbers of cases for the parent node (N) and the child node (n).
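To illustrate how the minimum numbers of cases for the parent node (N) and the child node (n) constrain tree growth, the following Python sketch searches for the best split of a single numeric predictor in a regression tree, using the reduction in the sum of squared errors as the criterion. It is a simplified stand-in for the CRT procedure, not the SPSS implementation; the function name, its parameters and the data are hypothetical.

```python
def best_split(xs, ys, min_parent, min_child):
    """Return (threshold, gain) of the best split, or None if none is allowed.

    A node is split only if it holds at least `min_parent` cases and both
    resulting children hold at least `min_child` cases.
    """
    if min_child < 1:
        raise ValueError("min_child must be at least 1")
    n = len(ys)
    if n < min_parent:
        return None

    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    parent_sse = sse([y for _, y in pairs])

    best = None
    # i is the size of the left child; the right child gets n - i cases.
    for i in range(min_child, n - min_child + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = parent_sse - sse(left) - sse(right)
        if best is None or gain > best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, gain)
    return best


# Hypothetical data: the response jumps when the predictor exceeds 3.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 5, 5, 5]
print(best_split(xs, ys, min_parent=4, min_child=2))   # (3.5, 24.0)
print(best_split(xs, ys, min_parent=10, min_child=2))  # None: node too small
```

Larger values of N and n yield smaller, more conservative trees, which is why the same response variable can produce different trees under different settings.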

About this article

Cite this article

Dieste, O., Aranda, A.M., Uyaguari, F. et al. Empirical evaluation of the effects of experience on code quality and programmer productivity: an exploratory study. Empir Software Eng 22, 2457–2542 (2017). https://doi.org/10.1007/s10664-016-9471-3

Published: 04 February 2017
Issue Date: October 2017
DOI: https://doi.org/10.1007/s10664-016-9471-3
