Abstract

Text documents in a large collection are usually classified on the basis of key terms that distinguish a particular set of documents from the rest of the collection. These key terms are generally identified using feature sets that may be statistical, rule-based, linguistic, or hybrid in nature. This paper develops a simple technique based on the Venn diagram to prioritize the standard features available in the literature, which in turn reduces the dimension of the feature set used for document classification.
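The abstract does not describe how the Venn diagram is used, so the following is only an illustrative sketch of one possible reading, not the authors' method. It assumes each feature selects a set of candidate terms and scores the feature by how much that set overlaps a set of known key terms (the Jaccard ratio of the shared Venn region to the combined region). The function name `rank_features`, the feature names, and all term sets are hypothetical.

```python
def rank_features(selected, gold):
    """Rank features by set overlap with the gold key terms.

    selected: dict mapping a feature name to the set of terms it selects
              (hypothetical input format, assumed for this sketch).
    gold:     set of known key terms.
    Returns (feature, score) pairs sorted from highest to lowest score.
    """
    scores = {}
    for name, terms in selected.items():
        union = terms | gold
        # Jaccard overlap: shared region divided by the combined region.
        scores[name] = len(terms & gold) / len(union) if union else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented purely for illustration.
gold = {"venn", "feature", "ranking", "keyphrase"}
selected = {
    "term_frequency": {"venn", "feature", "paper", "technique"},
    "first_position": {"venn", "feature", "ranking"},
    "noun_phrase": {"keyphrase", "document", "set", "pool", "term"},
}

ranking = rank_features(selected, gold)
for name, score in ranking:
    print(f"{name}: {score:.3f}")

# Keeping only the top-ranked features reduces the feature dimension.
top_features = [name for name, _ in ranking[:2]]
```

Under these assumptions, features whose selected terms overlap the key terms most are ranked first, and truncating the ranking gives a smaller feature set for classification.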

Acknowledgements

The authors are thankful to the Center for Microprocessor Applications for Training Education and Research (CMATER), C.S.E. Dept., JU, for providing infrastructural facilities during the course of this work. The work reported here has been partially funded by the Technical Education Quality Improvement Programme Phase–II (TEQIP-II), Jadavpur University, Kolkata, India.

Author information

Corresponding author

Correspondence to Neelotpal Chakraborty.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Chakraborty, N., Mukherjee, S., Naskar, A.R., Malakar, S., Sarkar, R., Nasipuri, M. (2017). Venn Diagram-Based Feature Ranking Technique for Key Term Extraction. In: Satapathy, S., Bhateja, V., Udgata, S., Pattnaik, P. (eds) Proceedings of the 5th International Conference on Frontiers in Intelligent Computing: Theory and Applications. Advances in Intelligent Systems and Computing, vol 515. Springer, Singapore. https://doi.org/10.1007/978-981-10-3153-3_33

  • DOI: https://doi.org/10.1007/978-981-10-3153-3_33

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3152-6

  • Online ISBN: 978-981-10-3153-3

  • eBook Packages: Engineering, Engineering (R0)
