Abstract
This paper defines an information measure, associated with a topic or semantics, for a keyword-based corpus. First, a topic-based corpus is obtained. Then a latent semantic vector space model of the corpus is established. The information measure of each keyword is defined through this vector space model, so that the amount of topic information contained in any document can be calculated. Finally, a membership degree is introduced to measure the degree to which a document belongs to the topic; by setting a threshold on this measure, it is determined whether or not a document belongs to the topic. Experiments show that the defined information measure overcomes the limitations of word-match search and genuinely achieves the goal of semantic-match search.
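The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption made for illustration: the toy vocabulary and counts, the rank `k`, the choice of "cosine similarity to the keyword centroid" as the keyword information measure, the token-averaged membership degree, and the threshold value. Only the overall sequence (topic corpus, latent semantic space, keyword measure, document membership, threshold) follows the abstract.

```python
import numpy as np

def latent_term_vectors(term_doc, k):
    """Rank-k latent semantic space: one row per keyword."""
    u, s, _vt = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k] * s[:k]

def keyword_information(term_vecs):
    """ASSUMPTION (not the paper's definition): a keyword's information is its
    cosine similarity to the centroid of all keyword vectors, clipped to [0, 1]."""
    centroid = term_vecs.mean(axis=0)
    norms = np.linalg.norm(term_vecs, axis=1) * np.linalg.norm(centroid)
    cos = term_vecs @ centroid / np.where(norms == 0, 1.0, norms)
    return np.clip(cos, 0.0, 1.0)

def membership_degree(tokens, vocab, info):
    """ASSUMPTION: topic information summed over the document's tokens,
    divided by document length so the result lies in [0, 1]."""
    if not tokens:
        return 0.0
    index = {word: i for i, word in enumerate(vocab)}
    total = sum(info[index[t]] for t in tokens if t in index)
    return float(total / len(tokens))

# Hypothetical topic corpus: rows are keywords, columns are documents.
vocab = ["semantic", "latent", "index", "vector"]
term_doc = np.array([
    [2.0, 1.0, 1.0],
    [1.0, 2.0, 1.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
])

info = keyword_information(latent_term_vectors(term_doc, k=2))

on_topic = membership_degree(
    ["latent", "semantic", "index", "vector", "model"], vocab, info)
off_topic = membership_degree(["tax", "checkout", "purchase"], vocab, info)

threshold = 0.5  # hypothetical value; the abstract does not state one
belongs = on_topic >= threshold
```

Under these assumptions, a document that shares no keyword with the topic vocabulary receives a membership degree of exactly 0, while a document that uses topic keywords receives a positive degree that is then compared against the threshold. A different keyword measure could be substituted in `keyword_information` without changing the rest of the sketch.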
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jiang, K., Li, L., Xu, B. (2011). Research on Information Measurement at Semantic Level. In: Gong, Z., Luo, X., Chen, J., Lei, J., Wang, F.L. (eds) Web Information Systems and Mining. WISM 2011. Lecture Notes in Computer Science, vol 6988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23982-3_40
DOI: https://doi.org/10.1007/978-3-642-23982-3_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23981-6
Online ISBN: 978-3-642-23982-3
eBook Packages: Computer Science, Computer Science (R0)