
Keyword Extraction from Hindi Documents Using Document Statistics and Fuzzy Modelling

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 672)

Abstract

In this paper, we put forward a novel unsupervised, domain-independent and corpus-independent approach for automatic keyword extraction. Our approach combines two document statistics, the frequency of a word and its spatial distribution within the document, in order to extract keywords. We extract keywords from Hindi documents using these document statistics and use fuzzy logic to combine them effectively for better results. Further, we use this information to frame fuzzy rules for keyword extraction. The main advantages of our approach are that it uses fuzzy membership for the variables instead of crisp thresholds, and that the boundaries of the fuzzy membership functions are set independently of any corpus. Our work is especially significant in that it has been implemented and tested on Hindi, which is a resource-poor and underrepresented language.
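As an illustration only, the following is a minimal sketch of how a frequency statistic and a spatial-distribution statistic might be combined with fuzzy rules. It is not the authors' method, and every specific choice here is an assumption: the spatial measure (the normalized standard deviation of the gaps between successive occurrences of a word, σ = std(d)/mean(d), the clustering measure of Ortuño et al.), the triangular LOW/MEDIUM/HIGH membership functions, the hypothetical rule base `RULES`, and the zero-order Sugeno weighted-average defuzzification. To mimic corpus-independent boundaries, each membership function is scaled to the minimum and maximum of its statistic within the document itself. The helper names (`word_positions`, `clustering_sigma`, `memberships`, `keyword_scores`) are hypothetical, and tokens are assumed to be already split (e.g. with `text.split()`).

```python
import math
from collections import defaultdict

# Hypothetical rule base (an assumption, not the paper's rules):
# (frequency term, clustering term, output score). None means "any value";
# outputs are zero-order Sugeno singletons in [0, 1].
RULES = [
    ("high", "high", 1.0),
    ("high", "medium", 0.6),
    ("medium", "high", 0.6),
    ("medium", "medium", 0.3),
    ("low", None, 0.0),
    (None, "low", 0.0),
]


def word_positions(tokens):
    """Map each word to the (increasing) indices where it occurs."""
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)
    return positions


def clustering_sigma(pos):
    """sigma = std(gaps) / mean(gaps) over successive occurrences.
    Roughly 1 for randomly scattered words, larger for clustered ones."""
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return math.sqrt(var) / mean


def memberships(x, lo, hi):
    """Triangular LOW/MEDIUM/HIGH degrees of x, with boundaries taken
    from the document's own range [lo, hi] rather than from a corpus."""
    if hi <= lo:
        return {"low": 0.0, "medium": 1.0, "high": 0.0}
    t = (x - lo) / (hi - lo)  # rescale to [0, 1]
    return {
        "low": max(0.0, 1.0 - 2.0 * t),
        "medium": max(0.0, 1.0 - abs(2.0 * t - 1.0)),
        "high": max(0.0, 2.0 * t - 1.0),
    }


def keyword_scores(tokens, min_count=3):
    """Fuzzy keyword score in [0, 1] for every word seen >= min_count times."""
    stats = {
        w: (len(p), clustering_sigma(p))
        for w, p in word_positions(tokens).items()
        if len(p) >= max(min_count, 2)  # sigma needs at least one gap
    }
    if not stats:
        return {}
    freqs = [f for f, _ in stats.values()]
    sigmas = [s for _, s in stats.values()]
    f_lo, f_hi = min(freqs), max(freqs)
    s_lo, s_hi = min(sigmas), max(sigmas)
    scores = {}
    for w, (f, s) in stats.items():
        mf = memberships(f, f_lo, f_hi)
        ms = memberships(s, s_lo, s_hi)
        num = den = 0.0
        for f_term, s_term, out in RULES:
            # AND = min; an unspecified (None) antecedent always matches.
            strength = min(mf[f_term] if f_term else 1.0,
                           ms[s_term] if s_term else 1.0)
            num += strength * out
            den += strength
        scores[w] = num / den if den else 0.0
    return scores
```

A ranked list could then be taken as `sorted(scores, key=scores.get, reverse=True)[:10]`. The intent of the sketch is that a frequent but evenly spread word (typically a function word such as का) is pulled toward 0 by the LOW-clustering rule, while a frequent word whose occurrences bunch together in one part of the text scores high. No crisp frequency cutoff is needed beyond the minimum count required to measure gaps.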

Author information

Corresponding author

Correspondence to Sifatullah Siddiqi.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Siddiqi, S., Sharan, A. (2018). Keyword Extraction from Hindi Documents Using Document Statistics and Fuzzy Modelling. In: Bhateja, V., Nguyen, B., Nguyen, N., Satapathy, S., Le, DN. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 672. Springer, Singapore. https://doi.org/10.1007/978-981-10-7512-4_35

  • DOI: https://doi.org/10.1007/978-981-10-7512-4_35

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-7511-7

  • Online ISBN: 978-981-10-7512-4

  • eBook Packages: Engineering, Engineering (R0)
