Measuring Peculiarity of Text Using Relation between Words on the Web
We define the peculiarity of text as a metric of information credibility: the higher the peculiarity, the lower the credibility. We extract a theme word and characteristic words from a text and check, using the Web, whether a subject-description relation holds between the theme word and each characteristic word. Peculiarity is then defined from the ratio of characteristic words that stand in a subject-description relation to the theme word. We evaluate the extent to which peculiarity can be used to judge credibility by classifying texts from Wikipedia and Uncyclopedia according to their peculiarity.
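The ratio-based definition can be illustrated with a minimal sketch. It assumes that peculiarity is one minus the fraction of characteristic words related to the theme word, so that fewer supported relations yield a higher peculiarity; this is only one reading consistent with "higher peculiarity means lower credibility", not necessarily the exact formula used in the paper. The Web-based relation test is not reproduced: `has_relation` is a hypothetical pluggable function, and `relation_ratio`, `peculiarity`, `KNOWN`, and `toy_relation` are illustrative names with toy data.

```python
from typing import Callable, Iterable

# Hypothetical relation test: returns True when (subject, description)
# stand in a subject-description relation. The paper derives this from
# the Web; here it is an injectable function.
RelationTest = Callable[[str, str], bool]


def relation_ratio(theme: str, characteristic_words: Iterable[str],
                   has_relation: RelationTest) -> float:
    """Fraction of characteristic words related to the theme word."""
    words = list(characteristic_words)
    if not words:
        raise ValueError("at least one characteristic word is required")
    related = sum(1 for w in words if has_relation(theme, w))
    return related / len(words)


def peculiarity(theme: str, characteristic_words: Iterable[str],
                has_relation: RelationTest) -> float:
    """Assumed definition: 1 - relation ratio (higher = less credible)."""
    return 1.0 - relation_ratio(theme, characteristic_words, has_relation)


# Toy, hand-made relation set standing in for Web evidence (hypothetical).
KNOWN = {("penguin", "bird"), ("penguin", "antarctic"), ("penguin", "swim")}


def toy_relation(subject: str, description: str) -> bool:
    return (subject, description) in KNOWN


print(peculiarity("penguin", ["bird", "swim", "fly", "antarctic"], toy_relation))  # 0.25
```

Three of the four characteristic words are related to the theme word, so the ratio is 0.75 and the assumed peculiarity is 0.25. Keeping the relation test as a parameter separates the ratio computation from however the relation is actually detected.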
Keywords: Lower Credibility, Device Product, Characteristic Word, Information Credibility