Abstract
Modern web search engines rely heavily on information retrieval techniques, which limits how well the pages they return match a user's requirement. A categorization system that analyses page content and presents information precisely could be a good alternative for reaching the pages most relevant to a user's subject. Most text categorization research relies heavily on training datasets. These datasets are usually large, so most approximations and computations are time consuming, which makes the entire categorization system slow and inaccurate. The proposed method is novel and uses a number of features. This paper explores the effect of a word and of other values associated with it that express the features of that word in the document. The proposed features draw on tf-itf, the position of the word, and compactness. These features are combined and evaluated. The experimental results showed a significant improvement in the text categorization process.
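As a rough illustration of the kind of per-word features the abstract names, the following sketch computes three values for each word in a single document. It is not the authors' method: the abstract does not define tf-itf, position, or compactness, so the definitions here are assumptions. Plain normalized term frequency stands in for the frequency-based weight, "position" is taken to be the normalized index of the first occurrence, and "compactness" is taken to be the normalized distance between the first and last occurrence. The function name `word_features` is hypothetical.

```python
from collections import defaultdict


def word_features(tokens):
    """Return {word: (term_frequency, first_position, spread)} for one document.

    All three definitions are illustrative assumptions, not taken from the paper:
      term_frequency: occurrences of the word divided by document length
      first_position: index of the first occurrence divided by document length
      spread:         (last index - first index) divided by document length,
                      used here as a simple stand-in for compactness
    """
    n = len(tokens)
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        positions[token].append(index)

    features = {}
    for word, indices in positions.items():
        term_frequency = len(indices) / n
        first_position = indices[0] / n
        spread = (indices[-1] - indices[0]) / n
        features[word] = (term_frequency, first_position, spread)
    return features


if __name__ == "__main__":
    document = "the cat sat on the mat".split()
    for word, values in word_features(document).items():
        print(word, values)
```

Under these assumptions, a word that occurs only once has a spread of zero, while a word repeated near both the start and the end of a document has a spread close to one. Combining the features, as the abstract describes, could then be as simple as concatenating the three values into one vector per word, though the paper's actual combination scheme is not stated here.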
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Sohail, A., Kotha, C., Chavali, R.K., Meghana, K., Manne, S., Fatima, S. (2014). An Extensive Selection of Features as Combinations for Automatic Text Categorization. In: Satapathy, S., Udgata, S., Biswal, B. (eds) Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA) 2013. Advances in Intelligent Systems and Computing, vol 247. Springer, Cham. https://doi.org/10.1007/978-3-319-02931-3_42
DOI: https://doi.org/10.1007/978-3-319-02931-3_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02930-6
Online ISBN: 978-3-319-02931-3
eBook Packages: Engineering, Engineering (R0)