Towards Knowledge Extraction from Weblogs and Rule-Based Semantic Querying

  • Xi Bai
  • Jigui Sun
  • Haiyan Che
  • Jin Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4824)

Abstract

Weblogs (blogs) have become a very popular medium for exchanging information, opinions and experiences. However, because new blog pages are published constantly, finding helpful information in them is tedious and time-consuming. This paper proposes a system for extracting the knowledge hidden in Chinese blog pages. Before extraction, blog pages are clustered into categories. Then, for each category, knowledge is extracted based on domain ontologies. Using restrained natural language processing, users can query the knowledge base (KB), and helpful knowledge is returned based on reasoning about the individuals. KEROB, a prototype of our system, is designed and implemented to fulfill the above functions. The experimental results indicate the superiority of our system.

Keywords

Singular Value Decomposition; Regular Expression; Domain Ontology; Knowledge Extraction; Information Block
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Xi Bai (1, 2)
  • Jigui Sun (1, 2)
  • Haiyan Che (1, 2)
  • Jin Wang (3)

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Changchun 130012, China
  3. Institute of Network and Information Security, Shandong University, Jinan 250100, China