Empirical Software Engineering, Volume 23, Issue 4, pp 1871–1894

Persistent code contribution: a ranking algorithm for code contribution in crowdsourced software

  • Michail Tsikerdekis
Abstract

Measuring code contribution in crowdsourced software is essential for ranking contributors to a project or distributing revenue. Past studies have demonstrated that code contribution measures vary in their ability to rank users accurately. This study proposes a new code contribution ranking algorithm, Persistent Code Contribution (PCC), that aims to be language independent and quality aware, and to balance the ranking between new and senior users. PCC tracks the number of characters contributed by each user and ranks each character by the number of subsequent revisions it survived. It also tracks lines that may have been moved between revisions and attributes character changes to the user who committed them to the repository. A ranking comparison with existing code contribution measures is performed to determine the similarities and differences, and quantitative as well as qualitative evidence is presented to validate the algorithm.
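
The character-survival idea described in the abstract can be illustrated with a minimal Python sketch. It is not the PCC implementation from the article: the function name `persistence_scores`, the (author, text) input format, and the rule of one point per surviving character per revision are assumptions made for illustration. The sketch also omits the moved-line tracking that the abstract mentions, and it uses the standard library's `difflib.SequenceMatcher` for character-level matching.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def persistence_scores(revisions):
    """Score authors by how many later revisions their characters survive.

    `revisions` is a chronological list of (author, text) pairs for one file.
    This input format and the scoring rule are illustrative assumptions.
    """
    scores = defaultdict(int)
    prev_text = ""
    owners = []  # owners[i] is the author credited with prev_text[i]

    for author, text in revisions:
        scores[author] += 0  # ensure every author appears in the result
        matcher = SequenceMatcher(None, prev_text, text, autojunk=False)
        new_owners = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged characters survived one more revision:
                # credit their original authors and carry ownership over.
                for owner in owners[i1:i2]:
                    scores[owner] += 1
                    new_owners.append(owner)
            elif tag in ("insert", "replace"):
                # New characters belong to the author of this revision.
                new_owners.extend([author] * (j2 - j1))
            # "delete": removed characters stop earning points.
        prev_text, owners = text, new_owners

    return dict(scores)
```

For the revisions `("alice", "abc")`, `("bob", "abcdef")`, `("alice", "abdef")`, this sketch credits alice with 5 points and bob with 3: alice's three characters survive the second revision, and two of them survive the third together with bob's three. A newly added character earns nothing until a later revision keeps it, which is one simple way to reward code that persists rather than code that is merely added.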

Keywords

Contribution, Crowdsource, Open-source, Code, Measure, Ranking

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. Department of Computer Science, Western Washington University, Bellingham, USA
