Comparison of Overlap Detection Techniques

Monostori, Krisztián; Finkel, Raphael; Zaslavsky, Arkady; Hodász, Gábor; Pataki, Máté

doi:10.1007/3-540-46043-8_4

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2329)

Included in the following conference series:

International Conference on Computational Science

Abstract

Easy access to the World Wide Web has raised concerns about copyright issues and plagiarism. It is easy to copy someone else’s work and submit it as one’s own. Many systems have targeted this problem, and they use very similar approaches. This paper compares these approaches and suggests when each strategy is more applicable than the others. Some alternative approaches are proposed that perform better than previously presented methods. These previous methods share two common stages: chunking of documents and selection of representative chunks. We study both stages and propose alternatives that are better in terms of accuracy and space requirements. The applications of these methods are not limited to plagiarism detection; they may also target other copy-detection problems. We also propose a third stage to be applied in the comparison, which uses suffix trees and suffix vectors to identify the overlapping chunks.
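
As a rough illustration of the two shared stages named in the abstract (chunking and selection of representative chunks), the following Python sketch may help. It is not the authors' method: the overlapping word chunks, the MD5-based fingerprint, the modulus selection rule, and all function names (`chunk_words`, `fingerprint`, `select_representatives`, `overlap_score`) are assumptions chosen for illustration. The proposed third stage based on suffix trees and suffix vectors is not shown.

```python
import hashlib
import re


def chunk_words(text, size=5):
    """Stage 1 (assumed strategy): overlapping chunks of `size` consecutive words."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Short texts yield a single chunk, or nothing if there are no words.
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def fingerprint(chunk):
    """Map a chunk to an integer using the first 8 bytes of its MD5 digest.

    The hash function is an illustrative choice, not taken from the paper.
    """
    digest = hashlib.md5(chunk.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def select_representatives(chunks, modulus=4):
    """Stage 2 (assumed strategy): keep fingerprints divisible by `modulus`.

    A larger modulus keeps fewer chunks, trading accuracy for space.
    """
    return {fp for fp in map(fingerprint, chunks) if fp % modulus == 0}


def overlap_score(doc_a, doc_b, size=5, modulus=4):
    """Fraction of doc_a's selected fingerprints that also occur in doc_b."""
    a = select_representatives(chunk_words(doc_a, size), modulus)
    b = select_representatives(chunk_words(doc_b, size), modulus)
    return len(a & b) / len(a) if a else 0.0
```

With `modulus=1` every chunk is kept, so comparing a document with itself gives a score of 1.0; raising `modulus` shrinks the stored fingerprint set at the cost of possibly missing short overlaps.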

Author information

Authors and Affiliations

School of Computer Science and Software Engineering, Monash University, 900 Dandenong Rd, Caulfield East, VIC 3145, Australia
Krisztián Monostori & Arkady Zaslavsky
Computer Science, University of Kentucky, 773 Anderson Hall, Lexington, KY 40506-0046, USA
Raphael Finkel
Department of Automation and Applied Informatics, Budapest University of Technology and Economic Sciences, 1111 Budapest, Goldmann György tér 3. IV.em.433., Hungary
Gábor Hodász & Máté Pataki

Editor information

Editors and Affiliations

Faculty of Science, Section Computational Science, University of Amsterdam, Kruislaan 403, 1098 SJ, Amsterdam, The Netherlands
Peter M. A. Sloot & Alfons G. Hoekstra
Western Science Center, SHARCNET, University of Western Ontario, London, Ontario, Canada, N6A 5B7
C. J. Kenneth Tan
Computer Science Department Innovative Computing Laboratory, University of Tennessee, 1122 Volunteer Blvd, Knoxville, TN, 37996-3450, USA
Jack J. Dongarra

About this paper

Cite this paper

Monostori, K., Finkel, R., Zaslavsky, A., Hodász, G., Pataki, M. (2002). Comparison of Overlap Detection Techniques. In: Sloot, P.M.A., Hoekstra, A.G., Tan, C.J.K., Dongarra, J.J. (eds) Computational Science — ICCS 2002. ICCS 2002. Lecture Notes in Computer Science, vol 2329. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46043-8_4

DOI: https://doi.org/10.1007/3-540-46043-8_4
Published: 10 April 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43591-4
Online ISBN: 978-3-540-46043-5
eBook Packages: Springer Book Archive
