Empirical Software Engineering, Volume 23, Issue 4, pp 1871–1894

Persistent code contribution: a ranking algorithm for code contribution in crowdsourced software

  • Michail Tsikerdekis


Measuring code contribution in crowdsourced software is essential for ranking contributors to a project or distributing revenue. Past studies have demonstrated that different code contribution measures vary in their ability to rank users accurately. This study proposes a new code contribution ranking algorithm, Persistent Code Contribution (PCC), that aims to be language independent and quality aware, and to balance the ranking of new and senior users. PCC tracks the number of characters contributed by a user and ranks each character by the number of subsequent revisions it survived. It also tracks lines that may have been moved between revisions and attributes character changes to the user who committed them to the repository. A ranking comparison with existing code contribution measures is performed to determine their similarities and differences, and quantitative as well as qualitative evidence is presented to validate the algorithm.
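The character-survival idea described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: it is an assumed simplification that diffs successive revisions with Python's `difflib`, attributes each character to the revision that introduced it, and scores only characters that survive to the latest revision by the number of subsequent revisions they outlived. The line-move tracking and quality awareness of the full PCC algorithm are omitted.

```python
import difflib

def persistence_scores(revisions):
    """Score each author's character contributions by revision survival.

    `revisions` is a list of (author, text) pairs, oldest first. Each
    surviving character is attributed to the author whose revision
    introduced it; its score is the number of later revisions it survived.
    """
    owners = []   # (author, birth_revision) per character of the current text
    prev = ""
    for rev_index, (author, text) in enumerate(revisions):
        if rev_index == 0:
            owners = [(author, 0)] * len(text)
            prev = text
            continue
        matcher = difflib.SequenceMatcher(None, prev, text)
        new_owners = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Characters carried over keep their original attribution.
                new_owners.extend(owners[i1:i2])
            else:
                # Inserted or replacing characters belong to this revision.
                new_owners.extend([(author, rev_index)] * (j2 - j1))
        owners = new_owners
        prev = text

    last = len(revisions) - 1
    scores = {}
    for author, born in owners:
        scores[author] = scores.get(author, 0) + (last - born)
    return scores
```

For example, across the three revisions `"hello world"` → `"hello brave world"` → `"hello brave new world"`, the 11 characters introduced in the first revision survive two later revisions, the 6 characters inserted in the second survive one, and the characters added last score zero.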


Keywords: Contribution · Crowdsource · Open-source · Code · Measure · Ranking



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. Department of Computer Science, Western Washington University, Bellingham, USA
