Characterizing the transfer of program comprehension in onboarding: an information-push perspective

Empirical Software Engineering

A Correction to this article was published on 24 February 2021

This article has been updated

Abstract

Many software developers struggle to understand code written by others, leading to increased maintenance costs. Research on program comprehension to date has primarily focused on individual developers attempting to understand code. However, software developers also work together to share and transfer understanding of their code-bases. This is common during the onboarding process, when a new developer has joined a project or a company. The work reported here uses a Grounded Theory approach to explore the different types of information passed from experts to newcomers during onboarding, and the perceived value of these types. The theory is grounded in field-study data collected during twelve in-situ onboarding sessions, across eight organizations, with a design based on two pilot studies that were carried out in advance. The field-study data was supplemented and validated with interviews and questionnaires. The theory describes four views through which the experts represent their code to the newcomers, revealing several interesting aspects of expert-led program comprehension. In particular, it provides evidence that extends current thinking on the temporal aspect of code: experts discuss changes that have been made to the code-base, changes that are currently being made to the code-base (including temporary fixes) and changes intended for the code-base in the future. In addition, a rationale-based view of the code-base is emphasized in the findings, making explicit the system’s functional/non-functional requirements, and their impact on the system’s design. Newcomers perceived this information as highly valuable. Structural and Algorithmic views, which are already firmly established in the program comprehension literature, were also noted in these onboarding sessions.

Acknowledgments

This work is supported by Science Foundation Ireland grants 03/CE2/I303_1, 04/CE2/I303_1 and 10/CE/I1855 to Lero, the Irish Software Engineering Research Centre (www.lero.ie).

Author information

Corresponding author

Correspondence to Rebecca Yates.

Additional information

Communicated by: Emerson Murphy-Hill

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised due to a retrospective Open Access order.

Appendices

Appendix 1

(Figures d–h)

Appendix 2

The following questions comprise the set of standard questions used in the follow-up interviews of newcomers. Like the background questionnaire, the standard questions evolved slightly in response to the direction of the analysis, and the version presented here is the final version. The interviews took a semi-structured format, so additional questions were introduced in response to the participant’s answers and any unusual features of each session.

  1. Please give the overall purpose of the software that was being discussed in the session.
  2. Give an overview of what you discussed in that session.
  3. When did you first see the code?
  4. When did you start modifying the code?
  5. Please highlight some things you learned in the session that proved useful when you were modifying the code.
  6. With hindsight, what extra session content would have been useful?
  7. How else could the session have improved for you?
  8. If you had to explain this code to another developer joining the project, what would you do?
  9. [After explaining the concept of ‘the driver’] Do you think it is better for the expert or the newcomer to drive? Why?
  10. How does the relative experience of the newcomer and expert affect the session? Why?
  11. [At the end of the interview] Do you have any other comments about anything we’ve discussed?

Appendix 3

(Figures i–s)

About this article

Cite this article

Yates, R., Power, N. & Buckley, J. Characterizing the transfer of program comprehension in onboarding: an information-push perspective. Empir Software Eng 25, 940–995 (2020). https://doi.org/10.1007/s10664-019-09741-6

  • DOI: https://doi.org/10.1007/s10664-019-09741-6
