Using Copy-Detection and Text Comparison Algorithms for Cross-Referencing Multiple Editions of Literary Works
This article describes a joint research project between Monash University and the University of Alicante, in which software originally developed for plagiarism and copy detection in academic work is successfully applied to the comparative analysis of different editions of literary works. The experiments were performed on Spanish texts from the Miguel de Cervantes Digital Library. The results have proved useful for literary and linguistic research, automating part of the tedious task of comparative text analysis. Several other interesting uses of the technique were also identified.
Keywords: Digital Library; Literary Work; Digital Watermark; Linguistic Research; Approximate String Match
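The abstract does not describe how the comparison software aligns two editions. As an illustration only, the following Python sketch uses the standard library's `difflib.SequenceMatcher` as a stand-in for a dedicated copy-detection engine (an assumption; this is not the authors' tool). It splits two editions into words, reports the passages they share verbatim, and lists the spans where they diverge. The helper names `common_passages` and `edition_differences` are hypothetical, and the sample sentences are illustrative, not data from the experiments.

```python
import difflib


def _word_matcher(edition_a: str, edition_b: str):
    """Split both editions into words and build a SequenceMatcher over them."""
    words_a = edition_a.split()
    words_b = edition_b.split()
    # autojunk=False: by default difflib ignores very frequent items when the
    # second sequence has 200+ elements, which would drop common Spanish
    # words such as "de" or "la" from the alignment of long texts.
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    return words_a, words_b, matcher


def common_passages(edition_a: str, edition_b: str, min_words: int = 3):
    """Return (word_offset_in_a, word_offset_in_b, text) for shared passages.

    Only runs of at least `min_words` identical words are reported, so
    isolated coincidences are ignored.
    """
    words_a, _, matcher = _word_matcher(edition_a, edition_b)
    passages = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_words:
            text = " ".join(words_a[block.a:block.a + block.size])
            passages.append((block.a, block.b, text))
    return passages


def edition_differences(edition_a: str, edition_b: str):
    """Return (operation, text_in_a, text_in_b) for every non-matching span."""
    words_a, words_b, matcher = _word_matcher(edition_a, edition_b)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert"
            changes.append((tag, " ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return changes


if __name__ == "__main__":
    # Two hypothetical renderings of the opening of Don Quixote that differ
    # only by one comma.
    a = "En un lugar de la Mancha de cuyo nombre no quiero acordarme"
    b = "En un lugar de la Mancha, de cuyo nombre no quiero acordarme"
    for passage in common_passages(a, b):
        print(passage)
    # (0, 0, 'En un lugar de la')
    # (6, 6, 'de cuyo nombre no quiero acordarme')
    print(edition_differences(a, b))
    # [('replace', 'Mancha', 'Mancha,')]
```

The word offsets let an editor jump to the corresponding passage in each edition. Note that difflib reports only exact word runs; a realistic comparison of historical editions would probably also need spelling, accent and punctuation normalization, or an approximate string matcher, to treat near-identical passages as the same text.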