Abstract
Text clustering is an efficient analysis technique used in text mining to arrange a large collection of unorganized text documents into a set of coherent clusters, where similar documents are placed in the same cluster. In this paper, we propose a novel term weighting scheme, namely length feature weight (LFW), to improve text document clustering algorithms based on new factors. The proposed scheme assigns a favorable term weight according to information obtained from the document collection. It recognizes the terms that are particular to each cluster and enhances their weights based on the proposed factors at the document level. The β-hill climbing technique is used to validate the proposed scheme in text clustering. The proposed weighting scheme is compared with the existing weighting scheme (TF-IDF) to validate its results in this domain. Experiments are conducted on eight standard benchmark text datasets taken from the Laboratory of Computational Intelligence (LABIC). The results show that the proposed weighting scheme LFW outperforms the existing weighting scheme and improves the results of the text document clustering technique in terms of F-measure, precision, and recall.
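For orientation, the TF-IDF baseline and the evaluation measures named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: the function names `tf_idf` and `precision_recall_f`, the toy documents, and the particular TF-IDF variant (raw term frequency times the natural log of inverse document frequency) are all hypothetical choices. LFW itself is not reproduced here because its factors are not defined in the abstract.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: w(t, d) = tf(t, d) * ln(n / df(t)).
    The chapter may use a different normalization.
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights


def precision_recall_f(cluster, klass):
    """Precision, recall, and F-measure of one cluster against one class.

    Both arguments are sets of document ids.
    """
    overlap = len(cluster & klass)
    if overlap == 0:
        return 0.0, 0.0, 0.0
    p = overlap / len(cluster)
    r = overlap / len(klass)
    return p, r, 2 * p * r / (p + r)


# Toy corpus (hypothetical), already tokenized.
docs = [
    ["text", "clustering", "text"],
    ["term", "weighting", "clustering"],
    ["hill", "climbing", "search"],
]
weights = tf_idf(docs)
```

A term that occurs in every document receives weight zero under this variant, which is the usual motivation for schemes that add further factors on top of TF-IDF.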
Acknowledgements
The authors would like to thank the editors and reviewers for their helpful comments, and EAI COMPSE 2016.
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Abualigah, L.M., Khader, A.T., Hanandeh, E.S. (2018). A Novel Weighting Scheme Applied to Improve the Text Document Clustering Techniques. In: Zelinka, I., Vasant, P., Duy, V., Dao, T. (eds) Innovative Computing, Optimization and Its Applications. Studies in Computational Intelligence, vol 741. Springer, Cham. https://doi.org/10.1007/978-3-319-66984-7_18
DOI: https://doi.org/10.1007/978-3-319-66984-7_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66983-0
Online ISBN: 978-3-319-66984-7
eBook Packages: Engineering, Engineering (R0)