A Detection of the Most Influential Documents
This work results from ongoing research on semantic compression and robust algorithms for plagiarism detection. The article briefly describes the Sentence Hashing Algorithm for Plagiarism Detection (SHAPD) and compares it with available alternatives that use frame structures for subsequence detection. The core of the publication is devoted to applying SHAPD to the task of discovering the most influential documents in a corpus. The experiments were carried out on multiple datasets that differ in structure and content, and the observations gathered during them are summarised in the article. The experiments allowed the authors to verify their initial hypothesis that the most important documents in a corpus can be singled out by capturing the citation relations among them.