Keyword Extraction from Hindi Documents Using Statistical Approach
Keywords of a document give us an idea of its important points without our having to read the whole text. In this paper, we propose an unsupervised, domain-independent, and corpus-independent approach for automatic keyword extraction. The approach is general and can be applied to any language; however, we have tested it on Hindi. Our approach combines the information contained in the frequency and the spatial distribution of a word in order to extract keywords from a document. Our work is especially significant in that it has been implemented and tested on Hindi, which is a resource-poor and underrepresented language.
Keywords: keyword extraction, spatial distribution, standard deviation, frequency, Hindi
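The abstract does not give the scoring formula. The Python sketch below is therefore only an illustration of how frequency and spatial distribution can be combined into a keyword score. It borrows a known dispersion statistic: take the gaps between consecutive occurrences of a word, compute their standard deviation, and divide by the mean gap. Then divide the result by √(1 − p), where p is the word's relative frequency. For a word scattered at random this normalized value is close to 1, while clustered (topical) words score higher.

The following pieces are hypothetical choices for this sketch and are not the authors' method:
- the log-frequency weighting;
- the tokenizer;
- the `min_freq` cutoff;
- the names `tokenize`, `keyword_scores`, and `extract_keywords`.

```python
import math
import re
from collections import defaultdict

# Hypothetical tokenizer: split on whitespace and common punctuation, including
# the Devanagari danda (।) and double danda (॥). Splitting on separators, rather
# than matching \w+, avoids cutting words at vowel signs that \w may not match.
_SEPARATORS = re.compile(r"[\s।॥,.;:!?\"'()]+")


def tokenize(text):
    """Split text into word tokens, dropping empty strings."""
    return [tok for tok in _SEPARATORS.split(text) if tok]


def keyword_scores(tokens, min_freq=2):
    """Score words by frequency and spatial clustering (hypothetical combination)."""
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)

    total = len(tokens)
    scores = {}
    for word, pos in positions.items():
        n = len(pos)
        # At least two occurrences are needed to have a gap; a word that fills
        # the whole text (p == 1) has no meaningful spread.
        if n < max(2, min_freq) or n == total:
            continue
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        mean = sum(gaps) / len(gaps)
        std = math.sqrt(sum((g - mean) ** 2 for g in gaps) / len(gaps))
        # Relative spread of the gaps: about sqrt(1 - p) for a randomly
        # scattered word, larger when occurrences cluster together.
        sigma = std / mean
        p = n / total
        sigma_nor = sigma / math.sqrt(1 - p)
        # Hypothetical weighting: reward frequency only mildly via a log.
        scores[word] = sigma_nor * math.log(n)
    return scores


def extract_keywords(text, k=10, min_freq=2):
    """Return the k highest-scoring words, ties broken alphabetically."""
    scores = keyword_scores(tokenize(text), min_freq)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

In this sketch the spread term does most of the work. Evenly scattered words, a group that typically includes function words, keep a normalized spread near 1 or below. A word whose occurrences are perfectly evenly spaced scores 0. A practical system for Hindi might still remove a stop-word list before scoring.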