Identifying Quotations in Reference Works and Primary Materials

  • Andrea Ernst-Gerlach
  • Gregory Crane
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5173)


Identifying quotations from reference works in primary materials is an important capability for digital libraries. By adding corresponding citation links to the original text, we can help contextualize the source material. In this paper, we introduce an algorithm that identifies citations automatically, based on an analysis of the structure of quotations in three different reference works for Latin texts. An evaluation shows that this approach can find a large number of quotations with which no machine-actionable citations are associated. Additionally, the approach can be applied to quotations that have been altered in a range of ways from their source.
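The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of the general idea of locating a possibly altered quotation in a primary text. It is not the authors' method. The function names `normalize` and `find_quotation`, the `slack` and `threshold` parameters, and the j/i and v/u spelling normalization are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop punctuation, and merge the Latin spelling variants j/i and v/u.

    The spelling rule is an illustrative assumption, not taken from the paper.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t.replace("j", "i").replace("v", "u") for t in tokens]


def find_quotation(quote, source, slack=2, threshold=0.7):
    """Return (score, start, end) for the best-matching token window, or None.

    score is the fraction of quotation tokens matched in order inside the window;
    slack lets the window be slightly longer than the quotation so that words
    omitted from the quotation do not break the match.
    """
    q = normalize(quote)
    s = normalize(source)
    if not q or not s:
        return None
    best = None
    width = len(q) + slack
    for start in range(0, max(1, len(s) - len(q) + 1)):
        window = s[start:start + width]
        matcher = SequenceMatcher(None, q, window, autojunk=False)
        matched = sum(block.size for block in matcher.get_matching_blocks())
        score = matched / len(q)
        if best is None or score > best[0]:
            best = (score, start, start + len(window))
    return best if best is not None and best[0] >= threshold else None


source = ("Gallia est omnis divisa in partes tres, quarum unam incolunt "
          "Belgae, aliam Aquitani")
# The quotation omits "est" and spells "divisa" with a "u".
hit = find_quotation("Gallia omnis diuisa in partes tres", source)
miss = find_quotation("arma virumque cano", source)
```

In this sketch, a reference-work quotation that drops a word and uses a variant spelling can still be aligned with the opening of the source passage, while an unrelated quotation falls below the threshold and is rejected. A real system would also need indexing to avoid scanning every window of a large corpus.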


Keywords: citations, reference works




Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Andrea Ernst-Gerlach (1)
  • Gregory Crane (2)
  1. Department of Computational and Cognitive Sciences, University of Duisburg-Essen, Duisburg, Germany
  2. Perseus Digital Library, Tufts University, Medford, MA, USA
