A Novel Weighting Scheme Applied to Improve the Text Document Clustering Techniques

  • Chapter
Innovative Computing, Optimization and Its Applications

Abstract

Text clustering is an efficient analysis technique used in text mining to arrange a large collection of unorganized text documents into a set of coherent clusters, so that similar documents fall in the same cluster. In this paper, we propose a novel term weighting scheme, namely length feature weight (LFW), to improve text document clustering algorithms based on new factors. The proposed scheme assigns a favorable term weight according to information obtained from the document collection. It identifies the terms that are particular to each cluster and enhances their weights based on the proposed factors at the document level. The β-hill climbing technique is used to validate the proposed scheme in text clustering. The proposed weighting scheme is compared with the existing weighting scheme (TF-IDF). Experiments are conducted on eight standard benchmark text datasets taken from the Laboratory of Computational Intelligence (LABIC). The results show that the proposed weighting scheme LFW outperforms the existing weighting scheme and improves the results of the text document clustering technique in terms of F-measure, precision, and recall.
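
For orientation, the TF-IDF baseline that LFW is compared against can be sketched as follows. The abstract does not give the exact formula used in the chapter, so this sketch assumes the common textbook variant, weight = tf × log(n / df), with naive whitespace tokenization. The function name `tf_idf` and the sample documents are hypothetical, and the LFW factors themselves are not reproduced here because the abstract does not define them.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: weight = tf * log(n_docs / df), where tf is the raw
    count of the term in the document and df is the number of documents
    containing the term. The chapter may use a different variant.
    """
    # Naive tokenization (lowercase + whitespace split), for illustration only.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical sample documents.
docs = [
    "text clustering groups similar text documents",
    "term weighting improves clustering",
    "hill climbing is a local search",
]
weights = tf_idf(docs)
```

Under this variant, a term that occurs in every document receives a weight of zero, while a term concentrated in few documents is weighted up in proportion to its frequency there.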

Acknowledgements

The authors would like to thank the editors and reviewers for their helpful comments, and EAI COMPSE 2016.

Author information

Corresponding author

Correspondence to Laith Mohammad Abualigah.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Abualigah, L.M., Khader, A.T., Hanandeh, E.S. (2018). A Novel Weighting Scheme Applied to Improve the Text Document Clustering Techniques. In: Zelinka, I., Vasant, P., Duy, V., Dao, T. (eds) Innovative Computing, Optimization and Its Applications. Studies in Computational Intelligence, vol 741. Springer, Cham. https://doi.org/10.1007/978-3-319-66984-7_18

  • DOI: https://doi.org/10.1007/978-3-319-66984-7_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66983-0

  • Online ISBN: 978-3-319-66984-7

  • eBook Packages: Engineering, Engineering (R0)
