Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 236)

Abstract

The cosine similarity measure is an efficient algorithm for detecting plagiarism in documents. However, it can be misled if the documents are not properly preprocessed. Furthermore, the weight of each word in a document should depend on how often that word occurs across the whole digital library; otherwise, the cosine similarity measure may not be accurate enough. This paper aims to improve the accuracy of the similarity measure. It proposes a preprocessing method and a model that adjusts each word's weight according to its occurrence frequency. The paper also works through a sample that illustrates how to preprocess documents, adjust word weights, and calculate the similarity. The sample shows that applying the proposed model gives a better result.
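The abstract does not spell out the paper's own preprocessing rules or its weight-adjustment model, so the following is only a minimal sketch of the general pipeline under stated assumptions. Preprocessing is assumed to be lowercasing plus stop-word removal. The library-frequency weight is a stand-in: standard smoothed inverse document frequency (IDF), not necessarily the authors' model. The names `STOP_WORDS`, `preprocess`, `idf_weights`, `weighted_vector` and `cosine` are hypothetical, and the documents being compared are assumed to be part of the library.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "or", "to", "in", "on", "for", "it"}


def preprocess(text, drop_stop_words=True):
    """Assumed preprocessing: lowercase, keep alphabetic tokens, optionally drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if drop_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def idf_weights(library):
    """Smoothed IDF over a preprocessed library (a list of token lists).

    Stand-in for a frequency-dependent weight: words found in many documents
    get a low weight, rare words a high one. log((1 + N) / (1 + df)) + 1 stays positive.
    """
    n = len(library)
    df = Counter()
    for doc in library:
        df.update(set(doc))  # count each word at most once per document
    return {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}


def weighted_vector(tokens, idf):
    """Term frequency times IDF; assumes every token occurs somewhere in the library."""
    tf = Counter(tokens)
    return {w: count * idf[w] for w, count in tf.items()}


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    texts = [
        "The cat sat on the mat.",
        "A cat sat on a mat!",
        "Stock markets fell sharply today.",
    ]

    # With preprocessing, the first two texts reduce to the same tokens: cat, sat, mat.
    library = [preprocess(t) for t in texts]
    idf = idf_weights(library)
    vecs = [weighted_vector(doc, idf) for doc in library]
    print(round(cosine(vecs[0], vecs[1]), 3))  # 1.0
    print(round(cosine(vecs[0], vecs[2]), 3))  # 0.0, no shared words

    # Without stop-word removal, "the" versus "a" pulls the same pair apart.
    raw = [preprocess(t, drop_stop_words=False) for t in texts]
    raw_idf = idf_weights(raw)
    raw_vecs = [weighted_vector(doc, raw_idf) for doc in raw]
    print(round(cosine(raw_vecs[0], raw_vecs[1]), 3))  # well below 1.0
```

The demo mirrors the two concerns raised in the abstract. Skipping preprocessing lets function words dominate the vectors. Weighting by library-wide frequency lowers the influence of words that appear in many documents.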

Author information

Correspondence to Jing Fang.

Copyright information

© 2013 Springer Science+Business Media New York

About this paper

Cite this paper

Fang, J., Zhang, Y. (2013). An Improved Plagiarism Detection Method: Model and Sample. In: Wong, W.E., Ma, T. (eds) Emerging Technologies for Information Systems, Computing, and Management. Lecture Notes in Electrical Engineering, vol 236. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7010-6_106

  • DOI: https://doi.org/10.1007/978-1-4614-7010-6_106

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-7009-0

  • Online ISBN: 978-1-4614-7010-6

  • eBook Packages: Engineering, Engineering (R0)
