Application of Web Search Results for Document Classification

  • So-Young Park
  • Juno Chang
  • Taesuk Kihl
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 235)


In this chapter, we propose a method that applies Web search results to document classification in order to enlarge the training corpus. The proposed method generates the query submitted to a Web search engine from the matching score between the words in the documents and the category. Experimental results show that Web queries built from higher-ranked words improve document classification performance, while Web queries built from lower-ranked words degrade it.
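The query-generation idea can be illustrated with a minimal sketch. The abstract does not define the matching score, so the score below is an assumption: the fraction of documents containing a word that belong to the category. The function names `rank_words` and `build_query`, the tie-breaking rule, and the toy documents are all hypothetical and are not taken from the chapter.

```python
from collections import Counter, defaultdict


def rank_words(docs):
    """Rank words per category by an assumed matching score.

    docs: list of (tokens, category) pairs.
    Assumed score: df(word, category) / df(word), where df is document frequency.
    This is an illustrative stand-in, not the chapter's actual score.
    """
    word_df = Counter()                 # documents containing each word
    cat_word_df = defaultdict(Counter)  # same count, split by category
    for tokens, category in docs:
        for word in set(tokens):        # count each word once per document
            word_df[word] += 1
            cat_word_df[category][word] += 1

    ranked = {}
    for category, counts in cat_word_df.items():
        scored = [(w, n / word_df[w], n) for w, n in counts.items()]
        # Sort by score, then by in-category frequency, then alphabetically.
        scored.sort(key=lambda t: (-t[1], -t[2], t[0]))
        ranked[category] = [(w, s) for w, s, _ in scored]
    return ranked


def build_query(ranked, category, start=0, size=2):
    """Join `size` words starting at rank `start` into a Web query string."""
    words = [w for w, _ in ranked[category][start:start + size]]
    return " ".join(words)


# Toy labeled documents (hypothetical data).
docs = [
    (["goal", "match", "team"], "sports"),
    (["team", "goal", "coach"], "sports"),
    (["vote", "party", "team"], "politics"),
    (["vote", "election"], "politics"),
]

ranked = rank_words(docs)
high_query = build_query(ranked, "sports", start=0)  # higher-ranked words
low_query = build_query(ranked, "sports", start=2)   # lower-ranked words
```

Under these assumptions, `high_query` would be submitted to a search engine and the returned snippets added to the training data for the category, while `low_query` shows the kind of weaker query that the abstract reports as harmful.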


Document classification · Web search results · Query generation



This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (2012R1A1A3013405).



Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  1. SangMyung University, Jongno-gu, Seoul, Korea
