Constructing Blog Entry Classifiers Using Blog-Level Topic Labels

  • Conference paper
Information Retrieval Technology (AIRS 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6458)

Abstract

Identifying a blogger’s interests is usually treated as the problem of classifying the sequence of his or her blog entries. Constructing a blog entry classifier requires, as training data, a fairly large set of blog entries that have each been manually assigned a class label. In contrast, a set of blog sites with class labels is easy to obtain. In this paper, we present a method for constructing a blog entry classifier using only a set of blog sites with class labels. Our method is based on the Naive Bayes classifier coupled with the EM algorithm.
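
The abstract does not spell out the model, so the following is only a generic illustration of how a multinomial Naive Bayes classifier can be trained with EM when labels exist at the blog level but not at the entry level. It is not the authors' formulation. The function names, the toy data, the smoothing parameter `alpha`, and above all the initialisation (every entry starts by inheriting its blog's label) are assumptions made for this sketch.

```python
import math

def m_step(docs, resp, classes, vocab, alpha):
    """Re-estimate log priors and log word probabilities from soft entry labels."""
    log_prior, log_word = {}, {}
    for c in classes:
        mass = sum(r[c] for r in resp)  # expected number of entries in class c
        log_prior[c] = math.log((alpha + mass) / (alpha * len(classes) + len(docs)))
        weighted = {w: alpha for w in vocab}  # additive (Laplace-style) smoothing
        for toks, r in zip(docs, resp):
            for w in toks:
                weighted[w] += r[c]  # each token counts with the entry's weight
        total = sum(weighted.values())
        log_word[c] = {w: math.log(v / total) for w, v in weighted.items()}
    return log_prior, log_word

def posterior(tokens, classes, log_prior, log_word):
    """E-step for one entry: P(class | tokens); unseen words are skipped."""
    scores = {c: log_prior[c] + sum(log_word[c][w] for w in tokens if w in log_word[c])
              for c in classes}
    top = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return {c: math.exp(s - top) / z for c, s in scores.items()}

def train_nb_em(entries, blog_labels, classes, n_iter=10, alpha=1.0):
    """entries: list of (blog_id, tokens); blog_labels: dict blog_id -> class."""
    docs = [toks for _, toks in entries]
    vocab = sorted({w for toks in docs for w in toks})
    # Assumed initialisation: each entry takes its blog's label with probability 1.
    resp = [{c: 1.0 if blog_labels[b] == c else 0.0 for c in classes}
            for b, _ in entries]
    for _ in range(n_iter):
        log_prior, log_word = m_step(docs, resp, classes, vocab, alpha)
        resp = [posterior(toks, classes, log_prior, log_word) for toks in docs]
    return log_prior, log_word

# Hypothetical toy data: two labeled blogs with two entries each.
classes = ["sports", "cooking"]
blog_labels = {"b1": "sports", "b2": "cooking"}
entries = [
    ("b1", ["goal", "match", "team"]),
    ("b1", ["team", "score", "goal"]),
    ("b2", ["recipe", "oven", "flour"]),
    ("b2", ["flour", "sugar", "recipe"]),
]

log_prior, log_word = train_nb_em(entries, blog_labels, classes)
probs = posterior(["goal", "team"], classes, log_prior, log_word)
print(max(probs, key=probs.get))
```

In this sketch the blog-level label only seeds the first M-step; after that, the E-step is free to move an entry away from its blog's topic. A method that keeps the blog label as a constraint throughout training would differ from this.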

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hagiwara, K., Takamura, H., Okumura, M. (2010). Constructing Blog Entry Classifiers Using Blog-Level Topic Labels. In: Cheng, PJ., Kan, MY., Lam, W., Nakov, P. (eds) Information Retrieval Technology. AIRS 2010. Lecture Notes in Computer Science, vol 6458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17187-1_35

  • DOI: https://doi.org/10.1007/978-3-642-17187-1_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17186-4

  • Online ISBN: 978-3-642-17187-1

  • eBook Packages: Computer Science, Computer Science (R0)
