Abstract
Identifying quotations from reference works in primary materials is an important capability for digital libraries. Adding the corresponding citation links to the original text helps contextualize the source material. In this paper we introduce an algorithm that identifies citations automatically, based on an analysis of the structure of quotations from three different reference works of Latin texts. An evaluation shows that this approach finds a large number of quotations that have no machine-actionable citations associated with them. Additionally, the approach can be applied to quotations that have been altered in a range of ways from their source.
This work was supported by a grant from the Mellon Foundation.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ernst-Gerlach, A., Crane, G. (2008). Identifying Quotations in Reference Works and Primary Materials. In: Christensen-Dalsgaard, B., Castelli, D., Ammitzbøll Jurik, B., Lippincott, J. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2008. Lecture Notes in Computer Science, vol 5173. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87599-4_9
DOI: https://doi.org/10.1007/978-3-540-87599-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87598-7
Online ISBN: 978-3-540-87599-4
eBook Packages: Computer Science, Computer Science (R0)