Kang N., Gelbukh A., Han S. (2006) PPChecker: Plagiarism Pattern Checker in Document Copy Detection. In: Sojka P., Kopeček I., Pala K. (eds) Text, Speech and Dialogue. TSD 2006. Lecture Notes in Computer Science, vol 4188. Springer, Berlin, Heidelberg
Nowadays, most documents are produced in digital format, in which they can be easily accessed and copied. Document copy detection is therefore an important tool for protecting authors' copyrights. We present PPChecker, a document copy detection system based on plagiarism pattern checking. PPChecker calculates the amount of data copied from the original document into the query document, based on linguistically motivated plagiarism patterns. Experiments performed on the CISI document collection show that PPChecker produces better decision information for document copy detection than existing systems.