Application of Web Search Results for Document Classification
In this chapter, we propose a method that applies Web search results to document classification in order to enlarge the training corpus. To build the query submitted to a Web search engine, the proposed method selects words according to the matching score between the words in the documents and the category. Experimental results show that Web queries built from higher-ranked words improve document classification performance, whereas Web queries built from lower-ranked words degrade it.
Keywords: Document classification, Web search results, Query generation
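The query generation step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which matching score is used, so the chi-square statistic below is an assumption, chosen only because it is a common word-category association measure. The function names `chi_square_scores` and `make_query` are hypothetical, and the sketch stops at producing the query string; it does not call any search engine.

```python
from collections import Counter


def chi_square_scores(docs, labels, category):
    """Score each word by a chi-square statistic against `category`.

    Assumption: chi-square stands in for the chapter's unspecified
    matching score between words and the category.
    """
    n = len(docs)
    in_cat = sum(1 for y in labels if y == category)
    df_all = Counter()  # number of documents containing the word
    df_cat = Counter()  # number of category documents containing the word
    for text, y in zip(docs, labels):
        for w in set(text.lower().split()):
            df_all[w] += 1
            if y == category:
                df_cat[w] += 1

    scores = {}
    for w, df in df_all.items():
        a = df_cat[w]        # in category, contains word
        b = df - a           # outside category, contains word
        c = in_cat - a       # in category, lacks word
        d = n - in_cat - b   # outside category, lacks word
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        chi2 = n * (a * d - c * b) ** 2 / denom if denom else 0.0
        # Keep only words positively associated with the category.
        scores[w] = chi2 if a * d > c * b else 0.0
    return scores


def make_query(scores, k=3):
    """Join the k highest-scoring words into a Web query string."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(w for w, s in ranked[:k] if s > 0)
```

Under these assumptions, taking the top of the ranking corresponds to the higher-ranked queries that the abstract reports as helpful; slicing from the bottom of `ranked` instead would give the lower-ranked queries that it reports as harmful.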
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (2012R1A1A3013405).