Using K-Means Algorithm for Description Analysis of Text in RSS News Format

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1071)


This article applies several text mining techniques to the extraction of information from text. The implementation makes it possible to compare how each technique performs when analyzing the dataset, so that the most appropriate technique for processing this type of data can be recommended. In particular, the article implements the K-means algorithm to determine the location of news items described in RSS format, and it evaluates the resulting groupings through a descriptive analysis of the clusters.


RSS news’s format, Simple K-means, Bag of words, Stopwords, Text mining
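
The pipeline named in the abstract and keywords (RSS descriptions, stopword removal, bag of words, K-means, descriptive analysis of clusters) can be illustrated with a minimal sketch. It is not the authors' implementation: the feed contents, the stopword list, the function names, and the deterministic farthest-first seeding are all assumptions made here for illustration, and only the Python standard library is used.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical RSS 2.0 feed; the items are invented for illustration only.
RSS = """<rss version="2.0"><channel>
<item><title>a</title><description>Flooding closes roads in Barranquilla after heavy rain</description></item>
<item><title>b</title><description>Heavy rain and flooding reported across Barranquilla</description></item>
<item><title>c</title><description>Medellin metro opens a new station in the city</description></item>
<item><title>d</title><description>New metro line planned for the city of Medellin</description></item>
</channel></rss>"""

# Tiny illustrative stopword list (an assumption, not the paper's list).
STOPWORDS = {"a", "the", "in", "of", "and", "for", "after", "across"}

def descriptions(rss_text):
    """Return the <description> text of every <item> in the feed."""
    root = ET.fromstring(rss_text)
    return [item.findtext("description", default="") for item in root.iter("item")]

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def bag_of_words(docs):
    """Build a sorted vocabulary and one term-count vector per document."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in token_lists for w in toks})
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, max_iter=20):
    """Lloyd's K-means with farthest-first seeding (chosen here for repeatability)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def top_terms(vocab, centroid, n=4):
    """Describe a cluster by its n highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(zip(vocab, centroid), key=lambda p: (-p[1], p[0]))
    return [w for w, _ in ranked[:n]]

docs = descriptions(RSS)
vocab, vectors = bag_of_words(docs)
labels, centroids = kmeans(vectors, k=2)
for c, centroid in enumerate(centroids):
    print(c, top_terms(vocab, centroid))
```

In this toy feed the highest-weighted terms of each centroid include the place name shared by the cluster's items, which is one simple way a descriptive analysis of clusters could surface location. A real feed would need a fuller stopword list, and the number of clusters `k` would have to be chosen for the data.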



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Universidad de La Costa, CUC, Barranquilla, Colombia
  2. Universidad Pontificia Bolivariana, Medellín, Colombia
