Using Copy-Detection and Text Comparison Algorithms for Cross-Referencing Multiple Editions of Literary Works

  • Arkady Zaslavsky
  • Alejandro Bia
  • Krisztian Monostori
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2163)


This article describes joint research between Monash University and the University of Alicante, in which software originally designed for plagiarism and copy detection in academic work is successfully applied to the comparative analysis of different editions of literary works. The experiments were performed on Spanish texts from the Miguel de Cervantes Digital Library. The results proved useful for literary and linguistic research, automating part of the tedious task of comparative text analysis. Other useful applications were also identified.
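The abstract does not describe the matching algorithm itself. As a rough illustration of the general idea (aligning two editions to report the passages they share and the places where they differ), here is a minimal sketch that substitutes Python's standard `difflib.SequenceMatcher` working on word tokens. It is not the Monash copy-detection software used in the paper. The function names `shared_passages`, `variants` and `similarity`, the `min_words` threshold, and the sample variant sentence are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

import difflib
import re


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; \w is Unicode-aware, so accented Spanish letters are kept.
    return re.findall(r"\w+", text.lower())


def _matcher(a: str, b: str) -> tuple[difflib.SequenceMatcher, list[str], list[str]]:
    # Word-level alignment of two editions; autojunk=False so frequent words are not ignored.
    wa, wb = tokenize(a), tokenize(b)
    return difflib.SequenceMatcher(None, wa, wb, autojunk=False), wa, wb


def shared_passages(a: str, b: str, min_words: int = 3) -> list[str]:
    # Hypothetical helper: runs of at least `min_words` words found in both editions, in order.
    m, wa, _ = _matcher(a, b)
    return [
        " ".join(wa[blk.a : blk.a + blk.size])
        for blk in m.get_matching_blocks()
        if blk.size >= min_words
    ]


def variants(a: str, b: str) -> list[tuple[str, str, str]]:
    # Hypothetical helper: each difference as (operation, words in edition A, words in edition B).
    m, wa, wb = _matcher(a, b)
    return [
        (tag, " ".join(wa[i1:i2]), " ".join(wb[j1:j2]))
        for tag, i1, i2, j1, j2 in m.get_opcodes()
        if tag != "equal"
    ]


def similarity(a: str, b: str) -> float:
    # 2 * matched words / total words in both editions (SequenceMatcher.ratio).
    m, _, _ = _matcher(a, b)
    return m.ratio()


if __name__ == "__main__":
    # Edition B is an invented variant of the opening line of Don Quijote, for illustration only.
    edition_a = "En un lugar de la Mancha, de cuyo nombre no quiero acordarme"
    edition_b = "En un lugar de la Mancha de cuyo nombre no me quiero acordar"
    print(shared_passages(edition_a, edition_b))
    print(variants(edition_a, edition_b))
    print(similarity(edition_a, edition_b))
```

For the two sample lines, the sketch reports one long shared passage ("en un lugar de la mancha de cuyo nombre no"), an inserted "me", and "acordarme" replaced by "acordar". Tokenizing into words keeps the comparison insensitive to punctuation and case. Since `SequenceMatcher` is quadratic in the worst case, this approach suits pairwise comparison of individual chapters better than searching an entire digital library.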


Keywords (machine-generated): Digital Library, Literary Work, Digital Watermark, Linguistic Research, Approximate String Match





Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Arkady Zaslavsky, Monash University, Melbourne, Australia
  • Alejandro Bia, Miguel de Cervantes Digital Library, University of Alicante, Alicante, Spain
  • Krisztian Monostori, Monash University, Melbourne, Australia
