Abstract
Citation matching is the problem of finding which citation occurs in a given textual corpus. Most existing work on citation matching deals with scientific literature. This paper presents methods for performing citation matching on Sanskrit texts. Two methods are considered: exact matching and approximate matching. Exact matching checks whether the citation occurs verbatim in the textual corpus. Approximate matching is a fuzzy string-matching method that computes a similarity score between the citation and each individual line of the corpus. The Smith-Waterman-Gotoh algorithm for local alignment, which is generally used in bioinformatics, is used here to calculate the similarity score, which measures the closeness between the text and the citation. The exact- and approximate-matching methods are evaluated and compared. The methods presented can be readily applied to corpora in other Indic languages such as Kannada and Tamil. The approximate-matching method can, in particular, be used in the compilation of critical editions and in plagiarism detection in literary works.
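The two methods described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`swg_score`, `similarity`, `match_citation`), the scoring parameters (match +1, mismatch -2, gap open -5, gap extend -1), the threshold, and the normalisation by the length of the shorter string are all assumptions chosen for illustration. Exact matching is a plain substring test; approximate matching uses a local alignment with affine gap penalties in the style of Smith-Waterman with Gotoh's refinement.

```python
def swg_score(a, b, match=1.0, mismatch=-2.0, gap_open=-5.0, gap_extend=-1.0):
    """Best local-alignment score of a and b with affine gap penalties.

    gap_open is the cost of the first character of a gap and gap_extend the
    cost of each further character (both assumed values, given as negatives).
    """
    neg_inf = float("-inf")
    best = 0.0
    h_prev = [0.0] * (len(b) + 1)      # alignment scores for the previous row
    f_prev = [neg_inf] * (len(b) + 1)  # best score ending in a vertical gap
    for i in range(1, len(a) + 1):
        h_cur = [0.0] * (len(b) + 1)
        f_cur = [neg_inf] * (len(b) + 1)
        e = neg_inf                    # best score ending in a horizontal gap
        for j in range(1, len(b) + 1):
            e = max(e + gap_extend, h_cur[j - 1] + gap_open)
            f_cur[j] = max(f_prev[j] + gap_extend, h_prev[j] + gap_open)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a score never drops below zero.
            h_cur[j] = max(0.0, h_prev[j - 1] + sub, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


def similarity(citation, line, match=1.0, **penalties):
    """Alignment score scaled to [0, 1] by the best score the shorter string allows."""
    shorter = min(len(citation), len(line))
    if shorter == 0:
        return 0.0
    return swg_score(citation, line, match=match, **penalties) / (match * shorter)


def match_citation(citation, corpus_lines, threshold=0.8):
    """Return (line_number, score, kind) for every line that matches the citation."""
    hits = []
    for line_no, line in enumerate(corpus_lines, start=1):
        if citation in line:           # exact matching: verbatim occurrence
            hits.append((line_no, 1.0, "exact"))
            continue
        score = similarity(citation, line)
        if score >= threshold:         # approximate matching: fuzzy score
            hits.append((line_no, score, "approximate"))
    return sorted(hits, key=lambda hit: -hit[1])


if __name__ == "__main__":
    # Illustrative data only: one romanised line and a citation without diacritics.
    corpus = ["dharmakṣetre kurukṣetre samavetā yuyutsavaḥ"]
    citation = "dharmaksetre kuruksetre samaveta yuyutsavah"
    for hit in match_citation(citation, corpus, threshold=0.7):
        print(hit)
```

In this sketch the threshold controls the trade-off between missed variants and false matches. A citation that differs from the corpus line only in a few characters fails the substring test but can still be reported as an approximate match.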
References
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. MIT Press/McGraw-Hill (2001)
Csernel, M., Patte, F.: Critical edition of Sanskrit texts. In: Huet, G., Kulkarni, A., Scharf, P. (eds.) Sanskrit Computational Linguistics 2007/2008. LNCS (LNAI), vol. 5402, pp. 358–379. Springer, Heidelberg (2009)
Gale, W.A., Church, K.W.: A program for aligning sentences in bilingual corpora. In: Meeting of the Association for Computational Linguistics, pp. 177–184 (1991)
Gotoh, O.: An improved algorithm for matching biological sequences. Journal of Molecular Biology 162(3), 705–708 (1982)
Lawrence, S., Bollacker, K., Giles, L.C.: Autonomous citation matching. In: Etzioni, O. (ed.) Proceedings of the Third International Conference on Autonomous Agents. ACM Press, New York (1999)
Mahoney, R.: Arbitrary lexicographic sorting: Sort UTF-8 encoded Romanised Sanskrit, http://www.indica-et-buddhica.org/sections/repositorium-preview/materials/software/sort-utf8-sanskrit
Mesquita, R.: Madhva’s Unknown Literary Sources: Some Observations. Aditya Prakashan, New Delhi (2000)
Mesquita, R.: Madhva’s Quotes from the Puranas and the Mahabharata: An Analytical Compilation of Untraceable Source-Quotations in Madhva’s Works along with Footnotes. Aditya Prakashan, New Delhi (January 2008)
Pasula, H., Marthi, B., Milch, B., Russell, S., Shpitser, I.: Identity uncertainty and citation matching (2002)
Rao, S., Sharma, B.N.K.: Madhva’s unknown sources: A review. Asiatische Studien/Études Asiatiques LVII 1, 181–194 (2003)
Robinson, P.: The one text and the many texts. Literary and Linguistic Computing 15(1), 5–14 (2000)
Sharma, B.N.K.: History of the Dvaita School of Vedanta and its Literature, 3rd edn. Motilal Banarsidass, Delhi (2000)
Smith, J.: The Mahabharata (2009), http://bombay.indology.info/mahabharata/statement.html
Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. Journal of Molecular Biology 147(1), 195–197 (1981)
Su, Z., Ahn, B.R., Eom, K.Y., Kang, M.K., Kim, J.P., Kim, M.K.: Plagiarism detection using the Levenshtein Distance and Smith-Waterman algorithm. In: International Conference on Innovative Computing, Information and Control. IEEE Computer Society, Los Alamitos (2008)
Chapman, S.: SimMetrics - open source Similarity Measure Library, http://www.dcs.shef.ac.uk/~sam/simmetrics.html
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Prasad, A.S., Rao, S. (2010). Citation Matching in Sanskrit Corpora Using Local Alignment. In: Jha, G.N. (ed.) Sanskrit Computational Linguistics. ISCLS 2010. Lecture Notes in Computer Science, vol 6465. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17528-2_9
DOI: https://doi.org/10.1007/978-3-642-17528-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17527-5
Online ISBN: 978-3-642-17528-2
eBook Packages: Computer Science, Computer Science (R0)