Source code authorship attribution is the task of determining who wrote a computer program from its source code, usually when the author is unknown or disputed. Applications include software forensics, software copyright infringement cases, and plagiarism detection. Numerous methods of source code authorship attribution have been proposed and studied, but no easily accessible, user-friendly program for the task is known. Instead, researchers typically develop software ad hoc for their own studies and rarely make it publicly available. In this paper, we present a software tool called A Source Code Authorship Program (ASAP), which is suitable for both laypersons and experts. Authors can be attributed to individual documents one at a time, or complex authorship attribution experiments can be run on large datasets. The interface and implementation of ASAP are presented, and the tool is validated by using it to replicate previously published authorship attribution experiments.
I would like to extend a sincere word of thanks to the following current and former students for their software development contributions: Ethan Hill, Jacob Siegers, Justin Sassine, Conor Aberle, Joseph Sorgea, Anirudh Kambatla, Brian Rickard, and Michael Decker.
Cite this article
Tennyson, M.F. ASAP: A Source Code Authorship Program. Int J Softw Tools Technol Transfer 21, 471–484 (2019). https://doi.org/10.1007/s10009-019-00517-3
Keywords
- Authorship attribution
- Source code
- Software forensics
- Plagiarism detection
- Software copyright infringement
- Similarity search
- Information retrieval
- Machine learning