Citation Matching in Sanskrit Corpora Using Local Alignment

Prasad, Abhinandan S.; Rao, Shrisha

doi:10.1007/978-3-642-17528-2_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6465)

Included in the following conference series:

International Sanskrit Computational Linguistics Symposium

Abstract

Citation matching is the problem of determining whether, and where, a given citation occurs in a textual corpus. Most existing work on citation matching has dealt with scientific literature. The goal of this paper is to present methods for performing citation matching on Sanskrit texts. Two methods are presented: exact matching and approximate matching. Exact matching checks whether the citation occurs verbatim in the textual corpus. Approximate matching is a fuzzy string-matching method that computes a similarity score between the citation and each individual line of the corpus. This similarity score, a measure of how closely a line of text matches the citation, is calculated with the Smith-Waterman-Gotoh local-alignment algorithm, which is widely used in bioinformatics. The exact- and approximate-matching methods are evaluated and compared. The methods presented can easily be applied to corpora in other Indic languages, such as Kannada and Tamil. In particular, the approximate-matching method can be used in compiling critical editions and in detecting plagiarism in literary works.
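The abstract names the two methods without detailing them. As an illustration only, the Python sketch below implements exact matching as a substring test and approximate matching as a Smith-Waterman local alignment with Gotoh's affine gap penalties, applied to each corpus line in turn. Everything specific here is an assumption rather than the paper's method: the function names, the scoring parameters (`MATCH`, `MISMATCH`, `GAP_OPEN`, `GAP_EXTEND`), the Unicode NFC and whitespace normalization of romanized (IAST) text, the division of the raw score by the citation's best possible score, and the 0.8 threshold. The example lines are the opening half-verses of Bhagavad Gītā 1.1 and 2.47; the citation is a made-up variant with different word division.

```python
import unicodedata

# Hypothetical scoring parameters chosen for illustration; not taken from the paper.
MATCH = 1.0        # reward for aligning two identical characters
MISMATCH = -2.0    # penalty for aligning two different characters
GAP_OPEN = 2.0     # penalty for the first character of a gap
GAP_EXTEND = 0.5   # penalty for each further character of the same gap


def normalize(text: str) -> str:
    """NFC-normalize and collapse whitespace so that equal IAST strings compare equal."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def exact_match(citation: str, line: str) -> bool:
    """Exact matching: does the citation occur verbatim in the line?"""
    return normalize(citation) in normalize(line)


def swg_score(a: str, b: str) -> float:
    """Best local-alignment score of a and b: Smith-Waterman with Gotoh's affine gaps."""
    neg_inf = float("-inf")
    prev_h = [0.0] * (len(b) + 1)      # H, row i-1: best alignment ending at (i-1, j)
    prev_f = [neg_inf] * (len(b) + 1)  # F, row i-1: alignments ending with a gap in b
    best = 0.0
    for i in range(1, len(a) + 1):
        h = [0.0] * (len(b) + 1)
        f = [neg_inf] * (len(b) + 1)
        e = neg_inf  # E: alignments ending with a gap in a; depends only on the current row
        for j in range(1, len(b) + 1):
            e = max(h[j - 1] - GAP_OPEN, e - GAP_EXTEND)
            f[j] = max(prev_h[j] - GAP_OPEN, prev_f[j] - GAP_EXTEND)
            diag = prev_h[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            h[j] = max(0.0, diag, e, f[j])  # 0.0 lets a local alignment start anywhere
            best = max(best, h[j])
        prev_h, prev_f = h, f
    return best


def similarity(citation: str, line: str) -> float:
    """Approximate matching: alignment score divided by the citation's best possible score."""
    cit, text = normalize(citation), normalize(line)
    if not cit or not text:
        return 0.0
    return swg_score(cit, text) / (MATCH * len(cit))


def find_citation(citation: str, corpus_lines: list[str],
                  threshold: float = 0.8) -> list[tuple[int, float]]:
    """Return (1-based line number, score) for lines scoring >= threshold, best first."""
    hits = []
    for number, line in enumerate(corpus_lines, start=1):
        score = 1.0 if exact_match(citation, line) else similarity(citation, line)
        if score >= threshold:
            hits.append((number, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    corpus = [
        "dharmakṣetre kurukṣetre samavetā yuyutsavaḥ",
        "karmaṇy evādhikāras te mā phaleṣu kadācana",
    ]
    citation = "karmaṇy evādhikāras te mā phaleṣu kadā cana"  # variant word division
    print(exact_match(citation, corpus[1]))  # False: not a verbatim occurrence
    print(find_citation(citation, corpus))   # only line 2 passes the threshold
```

With these parameters the variant citation aligns with the second line at the cost of a single one-character gap, giving a score of 40 out of a possible 43 (about 0.93), while the unrelated line stays far below the threshold. Dividing by the citation's length rather than by the shorter string's length keeps a line that contains only a fragment of the citation from scoring 1.0. Each comparison takes time proportional to the product of the two string lengths.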

References

Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. MIT Press/McGraw-Hill (2001)
Csernel, M., Patte, F.: Critical edition of Sanskrit texts. In: Huet, G., Kulkarni, A., Scharf, P. (eds.) Sanskrit Computational Linguistics 2007/2008. LNCS (LNAI), vol. 5402, pp. 358–379. Springer, Heidelberg (2009)
Gale, W.A., Church, K.W.: A program for aligning sentences in bilingual corpora. In: Meeting of the Association for Computational Linguistics, pp. 177–184 (1991)
Gotoh, O.: An improved algorithm for matching biological sequences. Journal of Molecular Biology 162(3), 705–708 (1982)
Lawrence, S., Bollacker, K., Giles, L.C.: Autonomous citation matching. In: Etzioni, O. (ed.) Proceedings of the Third International Conference on Autonomous Agents. ACM Press, New York (1999)
Mahoney, R.: Arbitrary lexicographic sorting: Sort UTF-8 encoded Romanised Sanskrit, http://www.indica-et-buddhica.org/sections/repositorium-preview/materials/software/sort-utf8-sanskrit
Mesquita, R.: Madhva’s Unknown Literary Sources: Some Observations. Aditya Prakashan, New Delhi (2000)
Mesquita, R.: Madhva’s Quotes from the Puranas and the Mahabharata: An Analytical Compilation of Untraceable Source-Quotations in Madhva’s Works along with Footnotes. Aditya Prakashan, New Delhi (January 2008)
Pasula, H., Marthi, B., Milch, B., Russell, S., Shpitser, I.: Identity uncertainty and citation matching (2002)
Rao, S., Sharma, B.N.K.: Madhva’s unknown sources: A review. Asiatische Studien/Études Asiatiques LVII(1), 181–194 (2003)
Robinson, P.: The one text and the many texts. Literary and Linguistic Computing 15(1), 5–14 (2000)
Sharma, B.N.K.: History of the Dvaita School of Vedanta and its Literature, 3rd edn. Motilal Banarsidass, Delhi (2000)
Smith, J.: The Mahabharata (2009), http://bombay.indology.info/mahabharata/statement.html
Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. Journal of Molecular Biology 147(1), 195–197 (1981)
Su, Z., Ahn, B.R., Eom, K.Y., Kang, M.K., Kim, J.P., Kim, M.K.: Plagiarism detection using the Levenshtein Distance and Smith-Waterman algorithm. In: International Conference on Innovative Computing, Information and Control. IEEE Computer Society, Los Alamitos (2008)
Dudenskt, http://www.sanskritweb.net/koko/dudenskt.pdf
Chapman, S.: SimMetrics - open source Similarity Measure Library, http://www.dcs.shef.ac.uk/~sam/simmetrics.html

Author information

Authors and Affiliations

International Institute of Information Technology, Bangalore, India
Abhinandan S. Prasad & Shrisha Rao

Editor information

Editors and Affiliations

Special Center for Sanskrit Studies, Jawaharlal Nehru University, 110067, New Delhi, India
Girish Nath Jha

About this paper

Cite this paper

Prasad, A.S., Rao, S. (2010). Citation Matching in Sanskrit Corpora Using Local Alignment. In: Jha, G.N. (ed.) Sanskrit Computational Linguistics. ISCLS 2010. Lecture Notes in Computer Science, vol 6465. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17528-2_9

DOI: https://doi.org/10.1007/978-3-642-17528-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17527-5
Online ISBN: 978-3-642-17528-2
eBook Packages: Computer Science, Computer Science (R0)
