Source Code Author Identification Based on N-gram Author Profiles
Source code author identification deals with the task of identifying the most likely author of a computer program, given a set of predefined author candidates. This is usually based on the analysis of other program samples of undisputed authorship by the same programmer. There are several cases where the application of such a method could be of major benefit, such as authorship disputes, proof of authorship in court, and tracing the source of code left in a system after a cyber attack. We present a new approach, called the SCAP (Source Code Author Profiles) approach, which uses byte-level n-gram profiles to represent a source code author's style. Experiments on data sets in different programming languages (Java or C++) and of varying difficulty (6 to 30 candidate authors) demonstrate the effectiveness of the proposed approach. A comparison with a previous source code authorship identification study based on more complicated information shows that the SCAP approach is language independent and that n-gram author profiles are better able to capture the idiosyncrasies of source code authors. Moreover, the SCAP approach deals surprisingly well with cases where only a limited number of very short programs per programmer is available for training. It is also demonstrated that the effectiveness of the proposed model is not affected by the absence of comments in the source code, a condition usually met in cyber-crime cases.
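The idea of a byte-level n-gram author profile can be illustrated with a short Python sketch. The abstract does not specify the profile size, the n-gram length, or the similarity measure, so everything here is an assumption made for illustration: the function names (`ngram_profile`, `profile_overlap`, `most_likely_author`), the defaults `n=3` and `size=1500`, and the choice to score a candidate by the number of n-grams shared between the two profiles.

```python
from collections import Counter


def ngram_profile(source: bytes, n: int = 3, size: int = 1500) -> set:
    """Return the `size` most frequent byte-level n-grams found in `source`.

    `n` and `size` are illustrative defaults, not values taken from the abstract.
    """
    counts = Counter(source[i:i + n] for i in range(len(source) - n + 1))
    return {gram for gram, _ in counts.most_common(size)}


def profile_overlap(a: set, b: set) -> int:
    """Assumed similarity measure: the number of n-grams two profiles share."""
    return len(a & b)


def most_likely_author(training: dict, unknown: bytes,
                       n: int = 3, size: int = 1500) -> str:
    """Pick the candidate whose profile shares the most n-grams with `unknown`.

    `training` maps an author name to a list of that author's program
    samples (raw bytes); the samples are joined to build one profile.
    """
    target = ngram_profile(unknown, n, size)
    scores = {
        author: profile_overlap(ngram_profile(b"\n".join(samples), n, size), target)
        for author, samples in training.items()
    }
    return max(scores, key=scores.get)


# Toy data: two hypothetical authors with different spacing habits.
training = {
    "alice": [b"for(int i=0;i<n;i++){sum+=a[i];}"],
    "bob": [b"for ( int idx = 0 ; idx < n ; idx ++ ) { total += arr [ idx ] ; }"],
}
unknown = b"for(int j=0;j<m;j++){acc+=b[j];}"
print(most_likely_author(training, unknown))
```

Because the profile is built from raw bytes, the sketch needs no parser or language-specific features, which is in line with the language independence claimed above; layout habits such as spacing around operators end up in the profile alongside identifiers and keywords.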
- Book Title: Artificial Intelligence Applications and Innovations
- Book Subtitle: 3rd IFIP Conference on Artificial Intelligence Applications and Innovations (AIAI) 2006, June 7–9, 2006, Athens, Greece
- Pages: 508-515
- Series Title: IFIP International Federation for Information Processing
- Publisher: Springer US
- Copyright Holder: International Federation for Information Processing
- Editor Affiliations
- 1. University of the Aegean
- 2. ICCS/NTUA
- 3. University of Plymouth
- Author Affiliations
- 4. Laboratory of Information and Communication Systems Security, Department of Information and Communication Systems Engineering, University of the Aegean, Karlovasi, Samos, 83200, Greece