An Improved Plagiarism Detection Method: Model and Sample

  • Jing Fang
  • Yuanyuan Zhang
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 236)


The cosine similarity measure is an efficient plagiarism detection algorithm for documents. However, it may be misled if the documents are not properly preprocessed. Furthermore, the weight of each word in a document should depend on the word's occurrence frequency across the whole digital library; otherwise, the cosine similarity measure may not be accurate enough. This paper aims to improve the accuracy of the similarity measure. It proposes a preprocessing method and a model that adjusts each word's weight according to its occurrence frequency. The paper also develops a sample that illustrates how to preprocess documents, adjust the word weights, and calculate the similarity. The sample shows that applying the proposed model yields better results.
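The pipeline the abstract describes (preprocess, weight words by library-wide frequency, compare with cosine similarity) can be sketched as follows. This is only a minimal illustration, not the authors' model: the abstract does not give the weighting formula or the preprocessing rules, so the sketch assumes a common smoothed inverse-document-frequency weight and a small hypothetical stop-word list. All function names are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual preprocessing rules are not stated.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def idf_weights(library_tokens):
    """Assumed weighting: smoothed IDF, so words common in the library weigh less."""
    n_docs = len(library_tokens)
    doc_freq = Counter()
    for tokens in library_tokens:
        doc_freq.update(set(tokens))  # count each word once per document
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def feature_vector(tokens, idf):
    """Sparse feature vector: term count scaled by the library-wide weight."""
    counts = Counter(tokens)
    # Words absent from the library fall back to a neutral weight of 1.0.
    return {w: c * idf.get(w, 1.0) for w, c in counts.items()}


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Usage: a toy library of three documents (invented text).
library = [
    "The method detects plagiarism in documents.",
    "Plagiarism in documents is detected by the method.",
    "Cosine similarity compares feature vectors.",
]
tokenized = [preprocess(doc) for doc in library]
idf = idf_weights(tokenized)
vectors = [feature_vector(tokens, idf) for tokens in tokenized]
score = cosine_similarity(vectors[0], vectors[1])
```

Storing vectors as dictionaries keeps the sketch short and avoids building a dense vocabulary-sized array; a real system over a large digital library would more likely use a sparse matrix library.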


Plagiarism detection · Feature vector · Cosine



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Modern Educational Technology Center, North China Institute of Science and Technology, Hebei, China
  2. Library, North China Institute of Science and Technology, Hebei, China
