Tracing distributed collaborative development in apache software foundation projects

Empirical Software Engineering

Abstract

Developing and maintaining large software systems typically requires that developers collaborate on many tasks. During such collaborations, when multiple people work on the same chunk of code at the same time, they communicate with each other and employ safeguards in various ways. Recent studies have considered group co-development in OSS projects and found that it is an essential part of many projects. However, those studies were limited to groups of size two, i.e., pairs of developers. Here we go further and characterize co-development in larger groups. We develop an effective methodology for capturing distributed collaboration beyond groups of size two, based on synchronized commit activities among multiple developers, and apply it to data from 26 OSS projects from the Apache Software Foundation. We find that distributed collaboration is prevalent, but not as frequent as expected. We also find that developers behave differently in distributed collaborative groups than when programming alone; e.g., high developer focus on specific code packages is associated with lower team participation, while packages with higher ownership get less attention from groups than from individuals. Finally, we show that developers' productivity effort is more often lower while they co-develop in groups. To verify our results, we use both quantitative and qualitative methods, including a developer survey. We conclude that these methods and results can be used to understand the effects of the collaborative dynamic in OSS teams on the software engineering process. Our code, datasets, and survey are available at http://www.gharehyazie.com/supplementary/teamwork/.
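
The abstract describes capturing group collaboration from synchronized commit activity, but this excerpt does not give the exact detection rule. The following minimal Python sketch is illustrative only and rests on these assumptions:

- Each commit is reduced to a (developer, package, timestamp) triple.
- A candidate group is any set of at least `min_size` distinct developers who commit to the same package within a window of length `delta_t`.

The names `Commit`, `synchronized_groups`, `delta_t` and `min_size` are hypothetical and are not the authors' code. The default of 3 simply encodes "beyond groups of size two".

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import NamedTuple


class Commit(NamedTuple):
    developer: str
    package: str      # hypothetical: directory of the changed file (see note 2)
    time: datetime


def synchronized_groups(commits, delta_t: timedelta, min_size: int = 3):
    """Return the set of (package, developers) pairs such that at least
    `min_size` distinct developers committed to `package` within some
    window of length `delta_t`. Overlapping windows are not merged."""
    by_package = defaultdict(list)
    for c in commits:
        by_package[c.package].append(c)

    groups = set()
    for package, pkg_commits in by_package.items():
        pkg_commits.sort(key=lambda c: c.time)
        start = 0
        for end, commit in enumerate(pkg_commits):
            # Slide the window's left edge until it spans at most delta_t.
            while commit.time - pkg_commits[start].time > delta_t:
                start += 1
            devs = frozenset(c.developer for c in pkg_commits[start:end + 1])
            if len(devs) >= min_size:
                groups.add((package, devs))
    return groups


# Hypothetical commits for illustration.
t0 = datetime(2009, 4, 1)
commits = [
    Commit("alice", "pkg/a", t0),
    Commit("bob", "pkg/a", t0 + timedelta(days=1)),
    Commit("carol", "pkg/a", t0 + timedelta(days=2)),
    Commit("dave", "pkg/b", t0),
]
print(synchronized_groups(commits, delta_t=timedelta(days=3)))
```

The sliding window keeps each package's scan linear after sorting. Because overlapping windows are reported separately, a real analysis would probably need an extra step to merge them into distinct groups.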

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Notes

  1. Linus Torvalds runs the Linux project in a more centralized fashion, depending on his lieutenants for decisions regarding which new code filters up to him.

  2. Most of our studied projects are written in Java, where files within the same directory are considered to be in the same package. The three non-Java projects, “axis2_c”, “log4net”, and “log4php”, use the same file structure as their Java counterparts, “axis2_java” and “log4j”.
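
Under the directory-as-package convention in this note, mapping a changed file to its package reduces to taking the file's parent directory. The following is a minimal Python sketch. It assumes that paths are relative, POSIX-style paths as they appear in commit logs. The helper names `package_of` and `group_by_package` and the example paths are hypothetical.

```python
import posixpath
from collections import defaultdict


def package_of(path: str) -> str:
    """Map a file path to its package: the directory that contains it."""
    # Commit logs use forward slashes, so posixpath is used on every OS.
    return posixpath.dirname(path)


def group_by_package(paths):
    """Group file paths by package (directory)."""
    packages = defaultdict(set)
    for path in paths:
        packages[package_of(path)].add(path)
    return dict(packages)


# Hypothetical paths for illustration.
files = [
    "src/main/java/org/apache/log4j/Logger.java",
    "src/main/java/org/apache/log4j/Level.java",
    "src/main/java/org/apache/log4j/net/SocketAppender.java",
]
for pkg, members in sorted(group_by_package(files).items()):
    print(pkg, len(members))
# src/main/java/org/apache/log4j 2
# src/main/java/org/apache/log4j/net 1
```

Note that a subdirectory such as `net` counts as a separate package, just as a nested Java package would.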

  3. We also generated all the results for Tables 3–6 and 10 using Δt values of 2 and 5 days. While the results differed slightly, the overall trends in the tables remained consistent with those for our original choice.

  4. We speak of files rather than packages at this stage because commit datasets record files, so to randomize them we have to randomize at the file level. All results extracted from these randomized datasets are still based on package-level code proximity.
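
This note says that randomization happens on files while results stay at the package level. The exact null model is not described in this excerpt. The following Python sketch assumes one plausible scheme: shuffle file paths across all commits while keeping each commit's developer, timestamp and number of changed files. Package-level results are then computed from the shuffled paths. The commit schema (`developer`, `time`, `files`) and both function names are hypothetical.

```python
import posixpath
import random


def randomize_files(commits, seed=None):
    """Hypothetical null model: shuffle file paths across all commits while
    keeping each commit's developer, timestamp and number of changed files."""
    rng = random.Random(seed)
    pool = [f for c in commits for f in c["files"]]
    rng.shuffle(pool)
    shuffled, i = [], 0
    for c in commits:
        n = len(c["files"])
        # Copy the commit record so the input data is left untouched.
        shuffled.append({**c, "files": pool[i:i + n]})
        i += n
    return shuffled


def packages_touched(commit):
    """Package-level view of a (possibly randomized) commit."""
    return {posixpath.dirname(f) for f in commit["files"]}


# Hypothetical commits for illustration.
commits = [
    {"developer": "alice", "time": 1, "files": ["a/X.java", "a/Y.java"]},
    {"developer": "bob", "time": 2, "files": ["b/Z.java"]},
]
for c in randomize_files(commits, seed=42):
    print(c["developer"], sorted(packages_touched(c)))
```

Shuffling at the file level preserves the overall distribution of files. It still lets the package proximity between commits vary, which is what a package-level comparison against the real data needs.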

  5. We manually scanned a number of CoGs and, from the contents of their messages, were able to confirm that developers were truly coordinating their collaboration as predicted. This encouraged us to develop the automated, but necessarily more simplistic, large-scale analysis presented here.

  6. We search for the names of files within the packages subject to collaboration because, in technical discussions, file names occur naturally and more frequently than package names.
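
The automated check described in notes 5 and 6 amounts to looking for mentions of files from the collaborated-on packages in developers' messages. The paper's exact matching rule is not given here. The following Python sketch assumes two things:

- Messages are plain text.
- A "mention" is the file's base name appearing as a whole token, matched case-sensitively.

`files_mentioned` is a hypothetical helper name.

```python
import posixpath
import re


def files_mentioned(message: str, package_files) -> set:
    """Return the paths from `package_files` whose base name (for example
    'Logger.java') occurs as a whole token in `message` (case-sensitive)."""
    mentioned = set()
    for path in package_files:
        name = posixpath.basename(path)
        # Not preceded by a word character or '.', so 'MyLogger.java' does not
        # count as a mention of 'Logger.java', while 'src/Logger.java' does.
        # Not followed by a word character, so trailing punctuation is fine.
        pattern = r"(?<![\w.])" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, message):
            mentioned.add(path)
    return mentioned


msg = "I changed Logger.java and src/org/Appender.java, see MyLayout.java."
files = ["src/org/apache/log4j/Logger.java", "src/org/Appender.java", "src/org/Layout.java"]
print(sorted(files_mentioned(msg, files)))
# ['src/org/Appender.java', 'src/org/apache/log4j/Logger.java']
```

Matching whole tokens rather than plain substrings avoids counting a similarly named file as a mention. This is one reason such a large-scale analysis remains "necessarily more simplistic" than manual reading.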

  7. The word cloud was created using the “comparison.wordcloud” function in the “wordcloud” package in R.

References

  • Adams PJ, Capiluppi A, Boldyreff C (2009) Coordination and productivity issues in free software: The role of Brooks’ law. In: IEEE International Conference on Software Maintenance, 2009. ICSM 2009, pages 319–328. IEEE

  • Al-Ani B, Edwards HK (2008) A comparative empirical study of communication in distributed and collocated development teams. In: ICGSE IEEE International Conference on Global Software Engineering, 2008, pages 35–44. IEEE

  • Avritzer A, Paulish DJ (2010) A comparison of commonly used processes for multi-site software development. In: Collaborative Software Engineering, pages 285–302. Springer

  • Baruch Y (1999) Response rate in academic studies – a comparative analysis. Human Relations 52(4):421–438

  • Bird C, Gourley A, Devanbu P, Gertz M, Swaminathan A (2006) Mining email social networks. In: Proceedings of the 2006 international workshop on Mining software repositories, pages 137–143. ACM

  • Bird C, Nagappan N, Murphy B, Gall H, Devanbu P (2011) Don’t touch my code!: examining the effects of ownership on software quality. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering, pages 4–14. ACM

  • Bird C, Pattison D, D’Souza R, Filkov V, Devanbu P (2008) Latent social structure in open source projects. In: Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering, pages 24–35. ACM

  • Blüthgen N, Menzel F, Blüthgen N (2006) Measuring specialization in species interaction networks. BMC Ecology 6(1):9

  • Brooks Jr FP (1995) The Mythical Man-month (Anniversary Ed.). Addison-Wesley Longman Publishing Co., Inc., Boston, MA USA

  • Caglayan B, Bener AB, Miranskyy A (2013) Emergence of developer teams in the collaboration network. In: Cooperative and Human Aspects of Software Engineering (CHASE), 2013 6th International Workshop on, pages 33–40. IEEE

  • Carmel E (1999) Global software teams: collaborating across borders and time zones. Prentice Hall PTR

  • Cataldo M, Herbsleb JD (2013) Coordination breakdowns and their impact on development productivity and software failures. IEEE Trans Softw Eng 39(3):343–360

  • Child J (1972) Organizational structure, environment and performance: the role of strategic choice. Sociology 6(1):1–22

  • Cohen PR, Levesque HJ (1991) Teamwork. SRI International, Menlo Park

  • Crowston K, Li Q, Wei K, Eseryel UY, Howison J (2007) Self-organization of teams for free/libre open source software development. Inf Softw Technol 49(6):564–575

  • Dabbish L, Stuart C, Tsay J, Herbsleb J (2012) Social coding in GitHub: transparency and collaboration in an open software repository. In: Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work, pages 1277–1286. ACM

  • Damian D, Izquierdo L, Singer J, Kwan I (2007) Awareness in the wild: Why communication breakdowns occur. In: Global Software Engineering, 2007. ICGSE 2007. Second IEEE International Conference on, pages 81–90. IEEE

  • Di Penta M, Harman M, Antoniol G, Qureshi F (2007) The effect of communication overhead on software maintenance project staffing: a search-based approach. In: Software Maintenance, 2007. ICSM 2007. IEEE International Conference on, pages 315–324. IEEE

  • Dugatkin LA (1997) Cooperation among animals. Oxford Series in Ecology and Evolution

  • Foucault M, Falleri J-R, Blanc X (2014) Code ownership in open-source software. In: Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering, page 39. ACM

  • Gharehyazie M, Posnett D, Filkov V (2013) Social activities rival patch submission for prediction of developer initiation in OSS projects. In: Software Maintenance (ICSM), 2013 29th IEEE International Conference on, pages 340–349. IEEE

  • Gharehyazie M, Posnett D, Vasilescu B, Filkov V (2014) Developer initiation and social interactions in OSS: A case study of the Apache Software Foundation. Empir Softw Eng:1–36

  • Goeminne M, Claes M, Mens T (2013) A historical dataset for the GNOME ecosystem

  • Grechanik M, Jones JA, Orso A, van der Hoek A (2010) Bridging gaps between developers and testers in globally-distributed software development. In: Proceedings of the FSE/SDP workshop on Future of software engineering research, pages 149–154. ACM

  • Gutwin C, Penner R, Schneider K (2004) Group awareness in distributed software development. In: Proceedings of the 2004 ACM conference on Computer supported cooperative work, pages 72–81. ACM

  • Guzzi A, Bacchelli A, Lanza M, Pinzger M, Deursen AV (2013) Communication in open source software development mailing lists. In: MSR, pages 277–286. IEEE

  • Herbsleb JD (2007) Global software engineering: The future of socio-technical coordination. In: 2007 Future of Software Engineering, pages 188–198. IEEE Computer Society

  • Herbsleb J, Grinter RE (1999) Architectures, coordination, and distance: Conway’s law and beyond. IEEE Softw 16(5):63–70

  • Herbsleb J, Mockus A, Finholt TA, Grinter RE (2001) An empirical study of global software development: distance and speed. In: Proceedings of the 23rd international conference on software engineering, pages 81–90. IEEE Computer Society

  • Herbsleb JD, Moitra D (2001) Global software development. IEEE Softw 18(2):16–20

  • Hertel G, Niedner S, Herrmann S (2003) Motivation of software developers in open source projects: an internet-based survey of contributors to the linux kernel. Res Policy 32(7):1159–1177

  • Holmstrom H, Conchúir EÓ, Ågerfalk PJ, Fitzgerald B (2006) Global software development challenges: A case study on temporal, geographical and socio-cultural distance. In: Global Software Engineering, 2006. ICGSE’06. International Conference on, pages 3–11. IEEE

  • Jermakovics A, Sillitti A, Succi G (2011) Mining and visualizing developer networks from version control systems. In: Proceedings of the 4th International Workshop on Cooperative and Human Aspects of Software Engineering, pages 24–31. ACM

  • Kakimoto T, Kamei Y, Ohira M, Matsumoto K (2006) Social network analysis on communications for knowledge collaboration in OSS communities

  • Kampstra P et al (2008) Beanplot: A boxplot alternative for visual comparison of distributions. J Stat Softw 28(1):1–9

  • Katzenbach JR (1993) The wisdom of teams: Creating the high-performance organization. Harvard Business Press

  • Kuipers BS, De Witte MC (2005) Teamwork: a case study on development and performance. Int J Hum Resour Manag 16(2):185–201

  • Lanubile F, Ebert C, Prikladnicki R, Vizcaíno A (2010) Collaboration tools for global software engineering. IEEE Softw 2:52–55

  • Luther K, Caine K, Ziegler K, Bruckman A (2010) Why it works (when it works): Success factors in online creative collaboration. In: Proceedings of the 16th ACM international conference on Supporting group work, pages 1–10. ACM

  • Maalej W, Happel H-J (2009) From work to word: How do software developers describe their work? In: Mining Software Repositories, 2009. MSR’09. 6th IEEE International Working Conference on, pages 121–130. IEEE

  • Maalej W, Happel H-J (2010) Can development work describe itself? In: Mining Software Repositories (MSR), 2010 7th IEEE Working Conference on, pages 191–200. IEEE

  • Mistrík I, Grundy J, Van der Hoek A, Whitehead J (2010) Collaborative software engineering: challenges and prospects. In: Collaborative Software Engineering, pages 389–403. Springer

  • Mockus A (2009) Succession: Measuring transfer of code and developer productivity. In: Proceedings of the 31st International Conference on Software Engineering, pages 67–77. IEEE Computer Society

  • Mockus A (2010) Organizational volatility and its effects on software defects. In: Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering, pages 117–126. ACM

  • Moe NB, Dingsøyr T, Dybå T (2010) A teamwork model for understanding an agile team: A case study of a scrum project. Inf Softw Technol 52(5):480–491

  • Nagappan N, Murphy B, Basili V (2008) The influence of organizational structure on software quality: an empirical case study. In: Proceedings of the 30th international conference on Software engineering, pages 521–530. ACM

  • Nakakoji K, Yamada K, Giaccardi E (2005) Understanding the nature of collaboration in open-source software development. In: Software Engineering Conference, 2005. APSEC’05. 12th Asia-Pacific, 8 pp. IEEE

  • Nakakoji K, Ye Y, Yamamoto Y (2010) Supporting expertise communication in developer-centered collaborative software development environments. In: Collaborative Software Engineering, pages 219–236. Springer

  • Nguyen T, Wolf T, Damian D (2008) Global software development and delay: Does distance still matter? In: Global Software Engineering, 2008. ICGSE 2008. IEEE International Conference on, pages 45–54. IEEE

  • Nohria N, Eccles R (1994) Networks and organizations: structure, form, and action. Harvard Business School Press

  • Pagano D, Maalej W (2013) How do open source communities blog? Empir Softw Eng 18(6):1090–1124

  • Panichella S, Canfora G, Di Penta M, Oliveto R (2014) How the evolution of emerging collaborations relates to code changes: An empirical study. In: 22nd International Conference on Program Comprehension (ICPC). IEEE

  • Pinzger M, Gall H (2010) Dynamic analysis of communication and collaboration in OSS projects. In: Collaborative Software Engineering, pages 265–284. Springer

  • Posnett D, D’Souza R, Devanbu P, Filkov V (2013) Dual ecological measures of focus in software development. In: 35th International Conference on Software Engineering (ICSE), pages 452–461. IEEE

  • Rahman F, Devanbu P (2011) Ownership, experience and defects: a fine-grained study of authorship. In: Proceedings of the 33rd International Conference on Software Engineering, pages 491–500. ACM

  • Redmiles D, Van Der Hoek A, Al-Ani B, Hildenbrand T, Quirk S, Sarma A, Filho R, de Souza C, Trainer E (2007) Continuous coordination – a new paradigm to support globally distributed software development projects. Wirtschaftsinformatik 49(1):28

  • Roberts J, Hann I-H, Slaughter S (2006) Communication networks in an open source software project. In: Open Source Systems, pages 297–306. Springer

  • Salas EE, Fiore SM (2004) Team cognition: Understanding the factors that drive process and performance. American Psychological Association

  • Sarma A, Al-Ani B, Trainer E, Silva Filho RS, da Silva IA, Redmiles D, van der Hoek A (2010) Continuous coordination tools and their evaluation. In: Collaborative Software Engineering, pages 153–178. Springer

  • Sarma A, Herbsleb J, Van Der Hoek A (2008) Challenges in measuring, understanding, and achieving social-technical congruence. In: Proceedings of Socio-Technical Congruence Workshop, In Conjunction With the International Conference on Software Engineering

  • Scacchi W (2010) Collaboration practices and affordances in free/open source software development. In: Collaborative software engineering, pages 307–327. Springer

  • Serebrenik A, van den Brand M (2010) Theil index for aggregation of software metrics values. In: Software Maintenance (ICSM), 2010 IEEE International Conference on, pages 1–9. IEEE

  • Takhteyev Y, Hilts A (2010) Investigating the geography of open source software through GitHub

  • Vasilescu B, Serebrenik A, van den Brand M (2011) You can’t control the unfamiliar: A study on the relations between aggregation techniques for software metrics. In: Software Maintenance (ICSM), 2011 27th IEEE International Conference on, pages 313–322. IEEE

  • Whitehead J, Mistrík I, Grundy J, van der Hoek A (2010) Collaborative software engineering: concepts and techniques. In: Collaborative Software Engineering, pages 1–30. Springer

  • Wilson EO (1978) What is sociobiology? Society 15(6):10–14

  • Xuan Q, Devanbu P, Filkov V (2014) Converging work-talk patterns in online task-oriented communities. arXiv:1404.5708

  • Xuan Q, Fang H, Fu C, Filkov V (2015) Temporal motifs reveal collaboration patterns in online task-oriented networks. Phys Rev E 91(5):052813

  • Xuan Q, Filkov V (2013) Synchrony in social groups and its benefits. In: Handbook of Human Computation, pages 791–802. Springer

  • Xuan Q, Filkov V (2014) Building it together: synchronous development in OSS. In: Proceedings of the 34th International Conference on Software Engineering. ACM

  • Xuan Q, Gharehyazie M, Devanbu P, Filkov V (2012) Measuring the effect of social communications on individual working rhythms: A case study of open source software. In: Social Informatics (SocialInformatics), 2012 International Conference on, pages 78–85. IEEE

  • Xuan Q, Okano A, Devanbu P, Filkov V (2014) Focus-shifting patterns of oss developers and their congruence with call graphs. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 401–412. ACM

Acknowledgments

The authors would like to thank the members of our DECAL research group and Prof. Qi Xuan for valuable discussions about the ideas and technical details presented in this paper. We also thank Dr. Bogdan Vasilescu for his contributions to designing the survey and for his insightful comments and feedback on this work, and Mehrdad Afshari for his help in improving the paper. We are grateful to the anonymous reviewers, whose comments helped us improve this paper. Both authors gratefully acknowledge support from the Air Force Office of Scientific Research, award FA955-11-1-0246.

Author information

Corresponding author

Correspondence to Mohammad Gharehyazie.

Additional information

Communicated by: Filippo Lanubile

Appendices

Appendix A Developer Questionnaire

The questionnaire is sent to each individual by email. Each email starts with an introduction of the authors and our research, after which the recipient is asked to complete the form and submit it to us.

ASF Collaborative Development Questionnaire

* Required

How would you describe your involvement in this project? e.g., project founder, core developer, ...

How frequently did/do you work on this mentioned project? *

  • ◯     Daily

  • ◯     Once per 2-3 days

  • ◯     Once per week

  • ◯     Less than once per week

What are some typical tasks you carried out in this project? Please give a few examples. e.g., fixing bugs, implementing a new feature, ...

How do you choose which tasks to work on? Do you choose your own tasks? How do you prioritize which tasks to work on first?

How long did tasks you worked on typically take, from start to finish? * If you were part of a bigger task, please answer with the overall task in mind

  • ◯     1-2 days

  • ◯     3-5 days

  • ◯     A week

  • ◯     2 weeks

  • ◯     Other:

When does work by others influence you / your work directly? *

When it is in the same files you are touching at the time; the same packages; the whole project or something else

  • ◯     The file(s) I am working on

  • ◯     The package(s) I am working on

  • ◯     The whole project

  • ◯     Other:

Which of your tasks do you consider to be more collaborative than the others? e.g., bug fixes, adding new features, ...

How many people do collaborative tasks typically involve?

  • ◯     2

  • ◯     3

  • ◯     4

  • ◯     5

  • ◯     6

  • ◯     more

How do you coordinate your work with collaborators on the same task? What communication channels do you use? Do you discuss with them prior to task assignment, during task work, or after task completion?

How do you adjust your working style when collaborating as opposed to during solitary work, if at all? e.g., by committing less frequently, or by pushing smaller commits more frequently, ...

When is it beneficial and when is it detrimental to collaborate with others on the same task?

Please tell us how much you agree or disagree with the following sentences *

Table 12

Appendix B Verification of Data Mining Scripts

Our scripts are based on scripts developed by Bird et al., which we have slightly modified to fit our purposes. Both our scripts and theirs are available at http://www.gharehyazie.com/supplementary/teamwork/miningscripts/. As this data-gathering step is critical to the downstream analyses, we verified the accuracy of the scripts. To that end, we randomly selected three months (June 2008, April 2009, and February 2010) and three of our 26 projects (Abdera, Harmony, and Cayenne). We then manually went through all of the messages of the selected projects during the selected months. Overall, about 1200 messages were inspected during this process, as follows.

For each message, we observed the sender, subject, timestamp, thread ID, and body. This information was then compared to the corresponding entries in the projects’ mailing list archives available at http://mail-archives.apache.org/mod_mbox/. While almost everything was consistent with the original archives, two issues were discovered:

  1. The timestamps of messages stored in our database were off by a few hours compared to the archives. Upon further investigation, we traced the issue to the way we parsed the timezone information. This inconsistency does not affect our results, since it causes a discrepancy in message timestamps of at most one day, and our study is insensitive to differences at this time resolution.

  2. The last message of each month was not recorded in our database. This resulted in a difference of 12 messages per project per year between our database and the actual archives, i.e., a difference of 1%.
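
The timezone discrepancy in item 1 is the kind of error that arises when a mail `Date:` header is parsed without applying its UTC offset. The original parsing code is not shown here, so the following Python sketch is only an illustration using the standard library's `email.utils.parsedate_to_datetime`. It provides two helpers:

- `to_utc` normalizes a header to UTC.
- `offset_error_hours` measures how far off a timestamp would be if the offset were dropped.

Both helper names are hypothetical. If the discrepancy does come from a dropped offset, the error equals the offset itself. Real-world UTC offsets lie between -12:00 and +14:00, so such an error always stays under one day, consistent with the bound stated above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def to_utc(date_header: str) -> datetime:
    """Parse an RFC 2822 'Date:' header, apply its UTC offset, return UTC."""
    dt = parsedate_to_datetime(date_header)
    if dt.tzinfo is None:
        # parsedate_to_datetime returns a naive datetime for '-0000';
        # treat it as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def offset_error_hours(date_header: str) -> float:
    """Hours of error introduced if the header's offset were ignored,
    i.e. if the local wall-clock time were stored as if it were UTC."""
    local = parsedate_to_datetime(date_header)
    if local.tzinfo is None:
        return 0.0
    as_if_utc = local.replace(tzinfo=timezone.utc)
    return abs((as_if_utc - local).total_seconds()) / 3600


print(to_utc("Tue, 10 Jun 2008 09:30:00 -0700"))              # 2008-06-10 16:30:00+00:00
print(offset_error_hours("Tue, 10 Jun 2008 09:30:00 -0700"))  # 7.0
```

Normalizing every timestamp to UTC before storing or comparing it removes this class of mismatch.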

About this article

Cite this article

Gharehyazie, M., Filkov, V. Tracing distributed collaborative development in apache software foundation projects. Empir Software Eng 22, 1795–1830 (2017). https://doi.org/10.1007/s10664-016-9463-3

  • DOI: https://doi.org/10.1007/s10664-016-9463-3
