Abstract

R is a programming language and software environment for statistical computing and data analysis that is steadily gaining popularity among practitioners and scientists. In this paper we present a preliminary version of a system for detecting pairs of similar R code blocks within a given set of routines. The system is based on a suitable aggregation of the outputs of three different [0,1]-valued (fuzzy) proximity degree estimation algorithms. Its evaluation on empirical data indicates that the system may in the future be applied successfully in practice, e.g., to detect plagiarism among students' homework submissions or to analyze code recycling and code cloning in R's open source package repositories.
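The aggregation idea can be illustrated with a minimal sketch. The abstract does not name the three proximity algorithms or the aggregation function, so everything below is an assumption made for illustration only: the three measures (character-level ratio, token-sequence ratio, Jaccard index of token sets), the arithmetic mean as the aggregation function, the crude tokenizer, the 0.8 threshold, and all function names are hypothetical and are not the authors' method. Python is used here purely for convenience.

```python
import re
from difflib import SequenceMatcher


def tokens(code):
    # Crude, hypothetical tokenizer: identifiers, integers, single symbols.
    return re.findall(r"[A-Za-z_.][A-Za-z0-9_.]*|\d+|\S", code)


def proximity_chars(a, b):
    # Assumed measure 1: character-level similarity ratio in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


def proximity_tokens(a, b):
    # Assumed measure 2: similarity ratio of the token sequences in [0, 1].
    return SequenceMatcher(None, tokens(a), tokens(b)).ratio()


def proximity_vocab(a, b):
    # Assumed measure 3: Jaccard index of the token sets in [0, 1].
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def aggregate(degrees):
    # Assumed aggregation function: arithmetic mean of the degrees.
    return sum(degrees) / len(degrees)


def similarity(a, b):
    # Aggregated [0, 1]-valued proximity degree of two code blocks.
    return aggregate([
        proximity_chars(a, b),
        proximity_tokens(a, b),
        proximity_vocab(a, b),
    ])


def similar_pairs(blocks, threshold=0.8):
    # Return all name pairs whose aggregated degree reaches the threshold.
    names = sorted(blocks)
    return [
        (p, q)
        for i, p in enumerate(names)
        for q in names[i + 1:]
        if similarity(blocks[p], blocks[q]) >= threshold
    ]


if __name__ == "__main__":
    blocks = {
        "a": "f <- function(x) x + 1",
        "b": "f <- function(x) x + 1",
        "c": "print('hello world')",
    }
    print(similar_pairs(blocks))
```

Any aggregation function that maps [0,1]-valued inputs to a [0,1]-valued output could replace the mean in this sketch; the choice here is only the simplest one to read.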

Keywords

antiplagiarism detection, code cloning, fuzzy proximity relations, aggregation

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Maciej Bartoszuk (1)
  • Marek Gagolewski (2, 3)
  1. Interdisciplinary PhD Studies Program, Systems Research Institute, Polish Academy of Sciences, Poland
  2. Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
  3. Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
