
Understanding the role of external pull requests in the NPM ecosystem

Empirical Software Engineering

Abstract

A risk of using third-party libraries in a software application is that much-needed maintenance is carried out solely by the library maintainers. These libraries may rely on a core team of maintainers (which might be a single unpaid and overworked maintainer) to serve a massive client user base. On the other hand, being open source brings the benefit of receiving contributions (in the form of External PRs) that help fix bugs and add new features. In this paper, we investigate the role that External PRs (contributions from outside the core team of maintainers) play in a library. Through a preliminary analysis, we find that External PRs are prevalent and just as likely to be accepted as maintainer PRs. We find that 26.75% of the External PRs submitted fix existing issues. Moreover, these fixes also carry labels such as breaking changes, urgent, and on-hold. In contrast to Internal PRs, External PRs cover documentation changes (44 out of 384 PRs) while containing less refactoring (34 out of 384 PRs). External PRs also cover new features (380 out of 384 PRs) and bugs (120 out of 384 PRs). Our results lay the groundwork for understanding how maintainers decide which external contributions to select to evolve their libraries, and what role those contributions play in reducing the workload.
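To put the counts reported above on a common scale, the following minimal sketch converts each one into a percentage of the 384 PRs. The variable and function names (`reported_counts`, `share`) are illustrative assumptions and are not taken from the authors' scripts. Each share is computed independently, and the sketch does not assume that the categories partition the sample.

```python
# Counts of External PRs per category, as reported in the abstract.
# The denominator (384 PRs) is also taken from the abstract.
SAMPLE_SIZE = 384

reported_counts = {
    "documentation": 44,
    "refactoring": 34,
    "new features": 380,
    "bugs": 120,
}


def share(count: int, total: int = SAMPLE_SIZE) -> float:
    """Return `count` as a percentage of `total`, rounded to two decimals."""
    return round(100 * count / total, 2)


# Each category is treated on its own; the shares are not expected to sum to 100.
shares = {label: share(count) for label, count in reported_counts.items()}

for label, pct in shares.items():
    print(f"{label}: {pct}% of {SAMPLE_SIZE} PRs")
```

Under these counts, documentation changes account for roughly 11% of the PRs and refactoring for roughly 9%.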

Data Availability

Our scripts and tools are available on GitHub at https://github.com/NAIST-SE/External-PullRequest, and our generated dataset is available at https://zenodo.org/record/6366998#.Y9-KTnZBxXU.

Notes

  1. https://github.com/dependabot

  2. https://docs.github.com/en/rest/reference/pulls#get-a-pull-request

  3. https://www.surveysystem.com/sscalc.htm

  4. https://github.com/neilernst/cliffsDelta

  5. Documentation at https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue

  6. https://pypi.org/project/regex/

  7. https://www.nltk.org/

  8. https://docs.scipy.org/doc/scipy/reference/stats.html

Acknowledgements

This work is supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Numbers 20K19774 and 20H05706.

Author information

Corresponding author

Correspondence to Vittunyuta Maeprasart.

Ethics declarations

Raula Gaikovina Kula and Christoph Treude are members of the EMSE Editorial Board.

Additional information

Communicated by: Andrea De Lucia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Maeprasart, V., Wattanakriengkrai, S., Kula, R.G. et al. Understanding the role of external pull requests in the NPM ecosystem. Empir Software Eng 28, 84 (2023). https://doi.org/10.1007/s10664-023-10315-w

  • DOI: https://doi.org/10.1007/s10664-023-10315-w
