News Article Classification Based on a Vector Representation Including Words’ Collocations

  • Michal Kompan
  • Mária Bieliková
Conference paper
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 101)


In this paper we propose including collocations in the preprocessing step of text mining, which we use for fast news article recommendation, and we report experiments based on real data from the biggest Slovak newspaper. The section of a news article can be predicted from several of the article's characteristics, such as its name, content, and keywords. We conducted experiments comparing several approaches and algorithms, including an expressive vector representation that takes into account the most popular word collocations obtained from the Slovak National Corpus.
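
The idea of a vector representation that includes collocations can be illustrated with a minimal sketch. It is not the authors' implementation: the collocation list, the function names, and the plain term-frequency weighting are all assumptions made for illustration, and the paper's collocations come from the Slovak National Corpus rather than a hand-written set. The sketch merges each known adjacent word pair into a single token before counting terms, so that a collocation becomes one dimension of the vector.

```python
from collections import Counter

# Hypothetical collocation list, for illustration only. The paper obtains
# its collocations from the Slovak National Corpus.
COLLOCATIONS = {("prime", "minister"), ("national", "bank")}

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase the text, split on whitespace, and strip edge punctuation."""
    stripped = (word.strip(PUNCTUATION) for word in text.lower().split())
    return [word for word in stripped if word]


def merge_collocations(tokens, collocations):
    """Replace each adjacent pair found in `collocations` with one joined token."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in collocations:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2  # skip both words of the collocation
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def to_vector(text, collocations=COLLOCATIONS):
    """Return a sparse term-frequency vector (term -> count) for the text."""
    return Counter(merge_collocations(tokenize(text), collocations))


vector = to_vector("The prime minister met the national bank governor.")
print(vector)
```

With this representation, "prime minister" contributes a single term instead of two unrelated ones. Such vectors could then be passed to any standard classifier to predict the article section.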


text pre-processing, news recommendation, news classification, vector representation





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Michal Kompan (1)
  • Mária Bieliková (1)
  1. Faculty of Informatics and Information Technologies, Slovak University of Technology, Bratislava, Slovakia
