Identifying Quotations in Reference Works and Primary Materials

Ernst-Gerlach, Andrea; Crane, Gregory

doi:10.1007/978-3-540-87599-4_9

Andrea Ernst-Gerlach¹ &
Gregory Crane²

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5173)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries


Abstract

Locating in primary materials the passages that reference works quote is an important feature for digital libraries. Adding the corresponding citation links to the original text helps contextualize the source material. In this paper we introduce an algorithm that identifies citations automatically, based on an analysis of the structure of quotations from three different reference works of Latin texts. An evaluation shows that this approach can find a large number of quotations that have no machine-actionable citations associated with them. The approach can also be applied to quotations that have been altered from their source in a range of ways.
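The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general task, not the authors' method. It swaps in a generic technique: a sliding window over word tokens scored with Python's standard `difflib.SequenceMatcher`, which tolerates small alterations such as an omitted word. The names `tokenize` and `find_quotation`, the `threshold` value, and the sample Latin strings are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and return its word tokens (letters only)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def find_quotation(quote, source, threshold=0.8):
    """Return (start, end, score) for the best-matching token window.

    start and end are token offsets into the tokenized source.
    Returns None if no window reaches the (hypothetical) threshold.
    """
    q = tokenize(quote)
    s = tokenize(source)
    if not q or not s:
        return None
    n = len(q)
    best = None
    # Slide a window of the quotation's length across the source tokens.
    for start in range(max(1, len(s) - n + 1)):
        window = s[start:start + n]
        score = SequenceMatcher(None, q, window).ratio()
        # Strict comparison keeps the earliest window on ties.
        if best is None or score > best[2]:
            best = (start, start + len(window), score)
    return best if best[2] >= threshold else None


if __name__ == "__main__":
    primary = ("Gallia est omnis divisa in partes tres, "
               "quarum unam incolunt Belgae, aliam Aquitani")
    # Quotation as a reference work might abbreviate it ("est" omitted).
    print(find_quotation("Gallia omnis divisa in partes tres", primary))
```

A hit from a function like this could then be turned into a citation link by mapping the token offsets back to a passage identifier in the primary text. The fixed window length is a simplification: heavily shortened or reordered quotations would need a more flexible alignment.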

This work was supported by a grant from the Mellon Foundation.



Author information

Authors and Affiliations

Department of Computational and Cognitive Sciences, University of Duisburg-Essen, Lotharstr. 65, 47048, Duisburg, Germany
Andrea Ernst-Gerlach
Perseus Digital Library, Tufts University, Eaton 124, Medford MA, 02155, USA
Gregory Crane


Editor information

Birte Christensen-Dalsgaard, Donatella Castelli, Bolette Ammitzbøll Jurik, Joan Lippincott


About this paper

Cite this paper

Ernst-Gerlach, A., Crane, G. (2008). Identifying Quotations in Reference Works and Primary Materials. In: Christensen-Dalsgaard, B., Castelli, D., Ammitzbøll Jurik, B., Lippincott, J. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2008. Lecture Notes in Computer Science, vol 5173. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87599-4_9


DOI: https://doi.org/10.1007/978-3-540-87599-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87598-7
Online ISBN: 978-3-540-87599-4
eBook Packages: Computer Science, Computer Science (R0)
