Abstract
Text documents in a large collection are usually classified on the basis of key terms that distinguish a particular document set from the universal set. These key terms are generally identified using feature sets, which may be statistical, rule-based, linguistic, or hybrid in nature. This paper develops a simple technique based on the Venn diagram to prioritize the standard features available in the literature, which in turn reduces the dimension of the feature sets used for document classification.
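The abstract does not spell out the ranking procedure, so the following is only an illustrative sketch of how set overlaps (the regions of a Venn diagram) might be used to order features and drop redundant ones. It assumes each feature can be represented by the set of candidate terms it selects and that a gold set of key terms is available. The feature names, the example terms, and the greedy coverage criterion are all hypothetical and should not be read as the authors' method.

```python
from typing import Dict, List, Set


def rank_features_by_overlap(selected: Dict[str, Set[str]], gold: Set[str]) -> List[str]:
    """Greedily order features by how many not-yet-covered gold terms they select.

    ASSUMPTION: this coverage criterion is illustrative, not taken from the paper.
    """
    remaining = set(gold)
    unranked = dict(selected)
    ranking: List[str] = []
    while unranked:
        # Iterate names in sorted order so ties are broken deterministically.
        best = max(sorted(unranked), key=lambda name: len(unranked[name] & remaining))
        ranking.append(best)
        remaining -= unranked.pop(best)
    return ranking


def prune_redundant(ranking: List[str], selected: Dict[str, Set[str]], gold: Set[str]) -> List[str]:
    """Keep a feature only if it contributes a gold term no higher-ranked feature found."""
    covered: Set[str] = set()
    kept: List[str] = []
    for name in ranking:
        gain = (selected[name] & gold) - covered
        if gain:
            kept.append(name)
            covered |= gain
    return kept


# Hypothetical features and the terms each one selects from a toy document.
selected = {
    "term_frequency": {"venn", "feature", "document", "ranking"},
    "first_position": {"venn", "classification"},
    "noun_phrase": {"feature", "ranking", "corpus"},
}
# Hypothetical gold key terms for the same document.
gold = {"venn", "feature", "ranking", "classification"}

ranking = rank_features_by_overlap(selected, gold)
kept = prune_redundant(ranking, selected, gold)
```

In this toy example the third feature selects no gold term that the first two have not already found, so it falls entirely inside the overlap region and is pruned, which is one way the dimension of a feature set could shrink.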
Acknowledgements
The authors are thankful to the Center for Microprocessor Applications for Training Education and Research (CMATER) of the C.S.E. Dept., JU, for providing infrastructural facilities during the course of this work. This work has been partially funded by the Technical Education Quality Improvement Programme Phase–II (TEQIP-II), Jadavpur University, Kolkata, India.
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chakraborty, N., Mukherjee, S., Naskar, A.R., Malakar, S., Sarkar, R., Nasipuri, M. (2017). Venn Diagram-Based Feature Ranking Technique for Key Term Extraction. In: Satapathy, S., Bhateja, V., Udgata, S., Pattnaik, P. (eds) Proceedings of the 5th International Conference on Frontiers in Intelligent Computing: Theory and Applications. Advances in Intelligent Systems and Computing, vol 515. Springer, Singapore. https://doi.org/10.1007/978-981-10-3153-3_33
DOI: https://doi.org/10.1007/978-981-10-3153-3_33
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3152-6
Online ISBN: 978-981-10-3153-3
eBook Packages: Engineering, Engineering (R0)