Abstract
Weblogs (blogs) have become a very popular medium for exchanging information, opinions, and experiences. However, because new blog pages are published constantly, finding useful information in them is tedious and time-consuming. This paper proposes a system for extracting the knowledge hidden in Chinese-language blog pages. Before extraction, blog pages are clustered into categories. Then, for each category, knowledge is extracted based on domain ontologies. Using restricted natural language processing, users can query the knowledge base (KB), and the relevant knowledge is returned by reasoning about the individuals. KEROB, a prototype of our system, is designed and implemented to provide these functions. The experimental results indicate the superiority of our system.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bai, X., Sun, J., Che, H., Wang, J. (2007). Towards Knowledge Extraction from Weblogs and Rule-Based Semantic Querying. In: Paschke, A., Biletskiy, Y. (eds) Advances in Rule Interchange and Applications. RuleML 2007. Lecture Notes in Computer Science, vol 4824. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75975-1_21
DOI: https://doi.org/10.1007/978-3-540-75975-1_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75974-4
Online ISBN: 978-3-540-75975-1
eBook Packages: Computer Science, Computer Science (R0)