Enhancing Concept Based Modeling Approach for Blog Classification

  • Ramesh Kumar Ayyasamy
  • Saadat M. Alhashmi
  • Siew Eu-Gene
  • Bashar Tahayna
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 123)

Abstract

Blogs are user-generated content that discusses a wide range of topics. Over the past 10 years, social web content has grown at a fast pace, and research projects have been finding ways to channel this information using text classification techniques. Existing classification techniques follow only Boolean (or crisp) logic. This paper extends our previous work with a framework in which fuzzy clustering is optimized with fuzzy similarity to perform blog classification. Wikipedia, a knowledge base widely accepted by the research community, was used for feature selection and classification. Our experimental results show that the proposed framework significantly improves precision and recall in classifying blogs.
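
The contrast between crisp and fuzzy assignment can be illustrated with a minimal sketch. This is not the framework evaluated in the paper, whose details are not given in the abstract; it only applies the standard fuzzy c-means membership rule to one item. The distances, the function names, and the fuzzifier value m = 2 are assumptions chosen for illustration.

```python
def fuzzy_memberships(distances, m=2.0):
    """Fuzzy c-means membership of one item in each cluster,
    computed from its distances to the cluster centres."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    # An item lying exactly on one or more centres belongs fully to them.
    on_centre = [d == 0 for d in distances]
    if any(on_centre):
        share = 1.0 / sum(on_centre)
        return [share if hit else 0.0 for hit in on_centre]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


def crisp_assignment(distances):
    """Boolean (crisp) assignment: full membership in the nearest cluster only."""
    nearest = min(range(len(distances)), key=distances.__getitem__)
    return [1.0 if i == nearest else 0.0 for i in range(len(distances))]


# Hypothetical distances from one blog post to three category centres.
distances = [1.0, 2.0, 4.0]
print(crisp_assignment(distances))   # one category gets everything
print(fuzzy_memberships(distances))  # graded memberships that sum to 1
```

With crisp logic the post belongs to a single category, while the fuzzy memberships let it belong partly to several, which is the property the abstract appeals to for blogs that span topics.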

Keywords

Blog classification, Wikipedia, Fuzzy clustering, Fuzzy similarity

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Ramesh Kumar Ayyasamy (1)
  • Saadat M. Alhashmi (1)
  • Siew Eu-Gene (2)
  • Bashar Tahayna (1)
  1. School of Information Technology, Monash University, Malaysia
  2. School of Business, Monash University, Malaysia
