Baer, N., & Zeidman, R. (2012). Measuring whitespace pattern sequence as an indication of plagiarism. Journal of Software Engineering and Applications, 5(4), 249–254.
Baxter, I. D., Yahin, A., Moura, L., Sant’Anna, M., & Bier, L. (1998). Clone detection using abstract syntax trees. In Proceedings of the international conference on software maintenance, ICSM ’98 (p. 368).
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Burrows, S., Tahaghoghi, S. M. M., & Zobel, J. (2007). Efficient plagiarism detection for large code repositories. Software: Practice and Experience, 37(2), 151–175.
Chae, D.-K., Ha, J., Kim, S.-W., Kang, B., & Im, E. G. (2013a). Software plagiarism detection: A graph-based approach. In Proceedings of the 22nd ACM international conference on information and knowledge management, CIKM ’13 (pp. 1577–1580).
Chae, D.-K., Kim, S.-W., Ha, J., Lee, S.-C., & Woo, G. (2013b). Software plagiarism detection via the static API call frequency birthmark. In Proceedings of the 28th annual ACM symposium on applied computing, SAC ’13 (pp. 1639–1643).
Charikar, M. S. (2002). Similarity estimation techniques from rounding algorithms. In Proceedings of the thirty-fourth annual ACM symposium on theory of computing, STOC ’02 (pp. 380–388). New York, NY, USA: ACM.
Cosma, G., & Joy, M. (2013). Evaluating the performance of LSA for source-code plagiarism detection. Informatica, 36(4), 409–424.
Faidhi, J. A. W., & Robinson, S. K. (1987). An empirical approach for detecting program similarity and plagiarism within a university programming environment. Computers and Education, 11(1), 11–19.
Fernández-Delgado, M., Cernadas, E., Barro, S., & Amorim, D. (2014). Do we need hundreds of classifiers to solve real world classification problems? Journal of Machine Learning Research, 15, 3133–3181.
Flores, E., Barrón-Cedeño, A., Rosso, P., & Moreno, L. (2011). Towards the detection of cross-language source code reuse. In Proceedings of the 16th international conference on applications of natural language to information systems, NLDB 2011 (pp. 250–253).
Flores, E., Barrón-Cedeño, A., Moreno, L., & Rosso, P. (2014a). Uncovering source code reuse in large-scale academic environments. Computer Applications in Engineering Education, 23, 383–390.
Flores, E., Rosso, P., Moreno, L., & Villatoro-Tello, E. (2014b). PAN@FIRE: Overview of SOCO track on the detection of source code re-use. In Working notes of the forum for information retrieval evaluation, FIRE 2014.
Flores, E., Rosso, P., Moreno, L., & Villatoro-Tello, E. (2014c). PAN@FIRE: Overview of SOCO track on the detection of source code re-use. In Proceedings of the forum for information retrieval evaluation, FIRE 2014.
Fox, E. A., Koushik, M. P., Shaw, J. A., Modlin, R., & Rao, D. (1992). Combining evidence from multiple searches. In Proceedings of the first Text REtrieval Conference, TREC 1992 (pp. 319–328), Gaithersburg, Maryland, November 4–6, 1992.
Grieve, J. (2007). Quantitative authorship attribution: An evaluation of techniques. Literary and Linguistic Computing, 22(3), 251–270.
Hiemstra, D. (2000). Using language models for information retrieval. Ph.D. thesis, CTIT, AE Enschede.
Jones, J. (2003). Abstract syntax tree implementation idioms. In Proceedings of PLoP ’03.
Kim, J., & Croft, W. B. (2012). A field relevance model for structured document retrieval. In Proceedings of the 34th European conference on IR research, ECIR 2012 (pp. 97–108).
Marinescu, D., Baicoianu, A., & Dimitriu, S. (2012). Software for plagiarism detection in computer source code. In Proceedings of the 7th international conference on virtual learning (Vol. 156, pp. 373–379).
Narayanan, S., & Simi, S. (2012). Source code plagiarism detection and performance analysis using fingerprint based distance measure method. In Proceedings of the 7th international conference on computer science and education, ICCSE ’12 (pp. 1065–1068).
Neamtiu, I., Foster, J. S., & Hicks, M. (2005). Understanding source code evolution using abstract syntax tree matching. Proceedings of the 2005 International Workshop on Mining Software Repositories, MSR ’05, 30(4), 1–5.
Ogilvie, P., & Callan, J. (2003). Combining document representations for known-item search. In Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval, SIGIR ’03 (pp. 143–150). New York, NY, USA: ACM.
Ponte, J. M. (1998). A language modeling approach to information retrieval. Ph.D. thesis, University of Massachusetts.
Potthast, M., Hagen, M., Beyer, A., Busse, M., Tippmann, M., Rosso, P., & Stein, B. (2014). Overview of the 6th international competition on plagiarism detection. In Working notes for CLEF 2014 conference (pp. 845–876).
Prechelt, L., Malpohl, G., & Philippsen, M. (2002). Finding plagiarisms among a set of programs with JPlag. Journal of Universal Computer Science, 8(11), 1016–1038.
Sanderson, M., & Zobel, J. (2005). Information retrieval system evaluation: Effort, sensitivity, and reliability. In Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval, SIGIR ’05 (pp. 162–169). New York, NY, USA.
Schleimer, S., Wilkerson, D. S., & Aiken, A. (2003). Winnowing: Local algorithms for document fingerprinting. In Proceedings of the 2003 ACM SIGMOD international conference on management of data (pp. 76–85). ACM.
Stein, B., Potthast, M., Rosso, P., Barrón-Cedeño, A., Stamatatos, E., & Koppel, M. (2011). Fourth international workshop on uncovering plagiarism, authorship, and social software misuse. In SIGIR Forum (Vol. 45, pp. 45–48).
Takaki, T., Fujii, A., & Ishikawa, T. (2004). Associative document retrieval by query subtopic analysis and its application to invalidity patent search. In Proceedings of the thirteenth ACM international conference on information and knowledge management, CIKM ’04 (pp. 399–405).
Xue, X., & Croft, W. B. (2009). Automatic query generation for patent search. In Proceedings of the 18th ACM conference on information and knowledge management, CIKM ’09 (pp. 2037–2040). New York, NY, USA: ACM.