Citation Matching in Sanskrit Corpora Using Local Alignment

  • Conference paper
Sanskrit Computational Linguistics (ISCLS 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6465)

Abstract

Citation matching is the problem of finding which citations occur in a given textual corpus. Most existing work on citation matching targets scientific literature. This paper presents methods for performing citation matching on Sanskrit texts. Two methods are considered: exact matching and approximate matching. Exact matching checks whether the citation occurs verbatim in the textual corpus. Approximate matching is a fuzzy string-matching method that computes a similarity score between the citation and each individual line of the corpus. The score is calculated with the Smith-Waterman-Gotoh algorithm for local alignment, which is generally used in bioinformatics, and it measures the closeness between the text and the citation. The exact- and approximate-matching methods are evaluated and compared. The methods presented can be readily applied to corpora in other Indic languages such as Kannada and Tamil. The approximate-matching method is particularly useful for compiling critical editions and for detecting plagiarism in literary works.
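The two methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the scoring parameters (match, mismatch, gap_open, gap_extend), the normalisation by the shorter string, and the function names are all assumptions chosen for illustration, and the two sample corpus lines are only example data. The sketch scores a citation against each corpus line using local alignment with affine gap penalties, in the style of Smith-Waterman-Gotoh.

```python
def local_alignment_score(a, b, match=2.0, mismatch=-1.0,
                          gap_open=-2.0, gap_extend=-0.5):
    """Best local alignment score of a and b with affine gap penalties.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    All parameter values are illustrative assumptions.
    """
    neg_inf = float("-inf")
    m = len(b)
    prev_h = [0.0] * (m + 1)      # best score ending at (i-1, j)
    prev_f = [neg_inf] * (m + 1)  # ... ending in a vertical gap
    best = 0.0
    for i in range(1, len(a) + 1):
        cur_h = [0.0] * (m + 1)
        cur_f = [neg_inf] * (m + 1)
        e = neg_inf               # best score ending in a horizontal gap
        for j in range(1, m + 1):
            e = max(cur_h[j - 1] + gap_open, e + gap_extend)
            cur_f[j] = max(prev_h[j] + gap_open, prev_f[j] + gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The 0.0 floor is what makes the alignment local.
            cur_h[j] = max(0.0, prev_h[j - 1] + sub, e, cur_f[j])
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best


def similarity(citation, line, match=2.0):
    """Score normalised to [0, 1] by the best achievable score (an assumption)."""
    shortest = min(len(citation), len(line))
    if shortest == 0:
        return 0.0
    return local_alignment_score(citation, line, match=match) / (match * shortest)


def exact_match(citation, corpus_lines):
    """Exact matching: indices of lines that contain the citation verbatim."""
    return [i for i, line in enumerate(corpus_lines) if citation in line]


def best_match(citation, corpus_lines):
    """Approximate matching: (index, similarity) of the closest corpus line."""
    scores = [similarity(citation, line) for line in corpus_lines]
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, scores[index]


# Example data: two romanised lines and a citation typed without diacritics.
corpus = [
    "dharmakṣetre kurukṣetre samavetā yuyutsavaḥ",
    "māmakāḥ pāṇḍavāś caiva kim akurvata sañjaya",
]
citation = "dharmaksetre kuruksetre"
print(exact_match(citation, corpus))  # no verbatim occurrence
print(best_match(citation, corpus))   # closest line and its score
```

In this example the citation differs from the first corpus line only in its missing diacritics, so exact matching finds nothing while the alignment score still singles out that line. In practice a threshold on the similarity score would decide whether a line counts as a match; its value would have to be tuned on the corpus at hand.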

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Prasad, A.S., Rao, S. (2010). Citation Matching in Sanskrit Corpora Using Local Alignment. In: Jha, G.N. (eds) Sanskrit Computational Linguistics. ISCLS 2010. Lecture Notes in Computer Science, vol 6465. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17528-2_9

  • DOI: https://doi.org/10.1007/978-3-642-17528-2_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17527-5

  • Online ISBN: 978-3-642-17528-2

  • eBook Packages: Computer Science (R0)
