An N-Gram Based Method for Bengali Keyphrase Extraction

Sarkar, Kamal

doi:10.1007/978-3-642-19403-0_6

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 139)

Abstract

Keyphrases provide subject metadata that gives clues about the content of a document. In this paper, we present a new method for Bengali keyphrase extraction. The proposed method has several steps, including extraction of n-grams, identification of candidate keyphrases, and assignment of scores to the candidate keyphrases. Since Bengali is a highly inflectional language, we have developed a lightweight stemmer for stemming the candidate keyphrases. The proposed method has been tested on a collection of Bengali documents selected from a Bengali corpus downloadable from the TDIL website.
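The steps named in the abstract (n-gram extraction, candidate identification, stemming, and scoring) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not specify the filtering rule, the suffix list, or the scoring function. The sketch assumes pre-tokenized input, a hypothetical three-suffix list, a filter that rejects n-grams beginning or ending with a stopword, and raw frequency as the score. All function names are illustrative.

```python
from collections import Counter

# Illustrative Bengali suffixes only (an assumption); the paper's actual
# suffix list is not given in the abstract.
SUFFIXES = ("দের", "ের", "কে")


def light_stem(word, suffixes=SUFFIXES, min_stem_len=2):
    """Strip the longest matching suffix, keeping at least min_stem_len characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def extract_ngrams(tokens, max_n=3):
    """Yield every contiguous n-gram of length 1..max_n as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def candidate_keyphrases(tokens, stopwords, max_n=3):
    """Keep n-grams that neither start nor end with a stopword (assumed rule),
    then stem each word so inflected variants collapse together."""
    for gram in extract_ngrams(tokens, max_n):
        if gram[0] not in stopwords and gram[-1] not in stopwords:
            yield tuple(light_stem(word) for word in gram)


def score_candidates(tokens, stopwords, max_n=3):
    """Score candidates by raw frequency (assumed scoring), highest first."""
    counts = Counter(candidate_keyphrases(tokens, stopwords, max_n))
    return counts.most_common()


if __name__ == "__main__":
    tokens = ["দেশের", "ছাত্র", "ও", "দেশ", "ছাত্র"]
    for phrase, score in score_candidates(tokens, stopwords={"ও"}):
        print(" ".join(phrase), score)
```

Stemming before counting is the design choice that matters for an inflectional language: without it, "দেশের" and "দেশ" would be counted as different candidates. A practical system would likely also normalize the Unicode text before suffix matching and weight candidates by more than frequency.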

Author information

Authors and Affiliations

Computer Science & Engineering Department, Jadavpur University, Kolkata, 700 032, India
Kamal Sarkar

Editor information

Editors and Affiliations

Department of Computer Science, Punjabi University, Patiala, India
Chandan Singh, Gurpreet Singh Lehal, Jyotsna Sengupta, Dharam Veer Sharma & Vishal Goyal

About this paper

Cite this paper

Sarkar, K. (2011). An N-Gram Based Method for Bengali Keyphrase Extraction. In: Singh, C., Singh Lehal, G., Sengupta, J., Sharma, D.V., Goyal, V. (eds) Information Systems for Indian Languages. ICISIL 2011. Communications in Computer and Information Science, vol 139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19403-0_6

DOI: https://doi.org/10.1007/978-3-642-19403-0_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19402-3
Online ISBN: 978-3-642-19403-0
eBook Packages: Computer Science (R0)