Abstract
A risk of using third-party libraries in a software application is that much-needed maintenance is carried out solely by the library maintainers. These libraries may rely on a core team of maintainers (which may be a single unpaid and overworked maintainer) to serve a massive client user base. On the other hand, open source libraries benefit from receiving contributions (in the form of External PRs) that help fix bugs and add new features. In this paper, we investigate the role that External PRs (contributions from outside the core team of maintainers) play in a library. Through a preliminary analysis, we find that External PRs are prevalent and just as likely to be accepted as maintainer PRs. We find that 26.75% of the External PRs submitted fix existing issues. Moreover, these fixes also carry labels such as breaking changes, urgent, and on-hold. Unlike Internal PRs, External PRs cover documentation changes (44 out of 384 PRs), while containing less refactoring (34 out of 384 PRs). On the other hand, External PRs also cover new features (380 out of 384 PRs) and bugs (120 out of 384 PRs). Our results lay the groundwork for understanding how maintainers decide which external contributions to select to evolve their libraries, and what role these contributions play in reducing the maintainers' workload.
Data Availability
Our scripts and tools are available on GitHub at https://github.com/NAIST-SE/External-PullRequest, and our generated dataset is available at https://zenodo.org/record/6366998#.Y9-KTnZBxXU.
Acknowledgements
This work is supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Numbers 20K19774 and 20H05706.
Ethics declarations
Raula Gaikovina Kula and Christoph Treude are members of the EMSE Editorial Board.
Additional information
Communicated by: Andrea De Lucia.
About this article
Cite this article
Maeprasart, V., Wattanakriengkrai, S., Kula, R.G. et al. Understanding the role of external pull requests in the NPM ecosystem. Empir Software Eng 28, 84 (2023). https://doi.org/10.1007/s10664-023-10315-w
DOI: https://doi.org/10.1007/s10664-023-10315-w