Keyword Extraction from Hindi Documents Using Statistical Approach

  • Aditi Sharan
  • Sifatullah Siddiqi
  • Jagendra Singh
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 309)


The keywords of a document convey its important points without requiring the reader to go through the whole text. In this paper, we propose an unsupervised, domain-independent, and corpus-independent approach for automatic keyword extraction. The approach is general and can be applied to any language; here, we have tested it on Hindi. It combines the information contained in the frequency and the spatial distribution of a word in order to extract keywords from a document. Our work is especially significant because it has been implemented and tested on Hindi, which is a resource-poor and underrepresented language.
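To make the idea of combining frequency with spatial distribution concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract gives no formula, so the scoring rule is an assumption. The score used here is the standard deviation of the gaps between successive occurrences of a word, divided by the mean gap, and it is computed only for words that occur at least `min_freq` times. The names `tokenize`, `keyword_scores`, `top_keywords`, and `min_freq` are hypothetical. Tokens are split on whitespace rather than with a `\w` regular expression, because Devanagari vowel signs are combining marks that `\w` does not match in Python.

```python
import statistics
from collections import defaultdict

# Punctuation stripped from token edges; includes the Devanagari danda signs.
PUNCT = ".,;:!?\"'()[]{}-\u0964\u0965"


def tokenize(text):
    """Split on whitespace and strip edge punctuation (assumed tokenizer)."""
    tokens = (tok.strip(PUNCT) for tok in text.split())
    return [tok for tok in tokens if tok]


def keyword_scores(text, min_freq=3):
    """Map each sufficiently frequent word to std(gaps) / mean(gaps).

    Assumption: a word whose occurrences cluster together gets a high
    score, while an evenly spread word gets a score near zero.
    """
    positions = defaultdict(list)
    for index, token in enumerate(tokenize(text)):
        positions[token].append(index)

    scores = {}
    for word, pos in positions.items():
        if len(pos) < min_freq:  # frequency filter
            continue
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        scores[word] = statistics.pstdev(gaps) / statistics.mean(gaps)
    return scores


def top_keywords(text, k=5, min_freq=3):
    """Return up to k words ranked by descending spatial score."""
    scores = keyword_scores(text, min_freq)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Calling `top_keywords(document_text, k=10)` on a tokenizable Hindi document would return the ten words whose occurrences are most unevenly distributed. The frequency threshold matters because gap statistics for a word that appears only once or twice are not meaningful.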


Keyword extraction, Spatial distribution, Standard deviation, Frequency, Hindi



Copyright information

© Springer India 2015

Authors and Affiliations

  • Aditi Sharan
    • 1
  • Sifatullah Siddiqi
    • 1
  • Jagendra Singh
    • 1
  1. Jawaharlal Nehru University, New Delhi, India
