Comparing Similarity of HTML Structures and Affiliate IDs in Splog Analysis

  • Taichi Katayama
  • Akihito Morijiri
  • Soichi Ishii
  • Takehito Utsuro
  • Yasuhide Kawada
  • Tomohiro Fukuhara
Conference paper

DOI: 10.1007/978-3-642-20244-5_36

Volume 6637 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Katayama T., Morijiri A., Ishii S., Utsuro T., Kawada Y., Fukuhara T. (2011) Comparing Similarity of HTML Structures and Affiliate IDs in Splog Analysis. In: Xu J., Yu G., Zhou S., Unland R. (eds) Database Systems for Advanced Applications. DASFAA 2011. Lecture Notes in Computer Science, vol 6637. Springer, Berlin, Heidelberg

Abstract

Spam blogs or splogs are blogs hosting spam posts, created using machine generated or hijacked content for the sole purpose of hosting advertisements or raising the number of in-links of target sites. Among those splogs, this paper focuses on detecting a group of splogs which are estimated to be created by an identical spammer. In this paper, we compare two clues: namely, similarity of HTML structures of splogs and affiliate IDs automatically extracted from splogs. We first show that the similarity of HTML structures of splogs is quite effective in splog detection, as well as in identifying spammers. We then show that the identity of affiliate IDs extracted from splogs can identify spammers much more directly than similarity of HTML structures, although it is not easy to achieve high coverage in extracting affiliate IDs. Finally, we show that the coverage of the intersection of the two clues, similarity of HTML structures and affiliate IDs, is relatively low, and it is necessary to apply them in a complementary strategy.
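
The two clues compared in the abstract can be illustrated with a minimal sketch. This is not the authors' method, which the abstract does not specify. It assumes that HTML structure is approximated by the sequence of opening tags, that similarity is a `difflib` sequence ratio, and that affiliate IDs appear as URL query parameters. The function names and the parameter names in `AFFILIATE_PARAMS` are hypothetical.

```python
import re
from difflib import SequenceMatcher
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse

# Hypothetical query-parameter names that might carry an affiliate ID.
AFFILIATE_PARAMS = ("aid", "affiliate_id", "tag")


class TagCollector(HTMLParser):
    """Records the name of every opening tag in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    """Return the list of opening-tag names found in an HTML string."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def structure_similarity(html_a, html_b):
    """Similarity in [0, 1] between the tag sequences of two pages."""
    return SequenceMatcher(None, tag_sequence(html_a), tag_sequence(html_b)).ratio()


def affiliate_ids(html):
    """Collect values of the assumed affiliate parameters from href URLs."""
    ids = set()
    for url in re.findall(r'href="([^"]+)"', html):
        query = parse_qs(urlparse(url).query)
        for name in AFFILIATE_PARAMS:
            ids.update(query.get(name, []))
    return ids


page_a = '<html><body><div><a href="http://shop.example/item?aid=spam01&x=1">buy</a></div></body></html>'
page_b = '<html><body><div><a href="http://shop.example/other?aid=spam01">buy now</a></div></body></html>'
page_c = '<html><body><p>hello</p></body></html>'

# Clue 1: pages built from the same template share a tag sequence.
same_template = structure_similarity(page_a, page_b)
different_template = structure_similarity(page_a, page_c)

# Clue 2: a shared affiliate ID links two pages to one account.
shared_ids = affiliate_ids(page_a) & affiliate_ids(page_b)
```

In this sketch the two clues are computed independently, so a pair of pages can be linked by either one when the other is unavailable, which mirrors the complementary use the abstract argues for.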

Keywords

spam blog detection, HTML structures, affiliate IDs

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Taichi Katayama (1)
  • Akihito Morijiri (1)
  • Soichi Ishii (2)
  • Takehito Utsuro (1)
  • Yasuhide Kawada (3)
  • Tomohiro Fukuhara (4)
  1. University of Tsukuba, Tsukuba, Japan
  2. Tokyo Denki University, Tokyo, Japan
  3. Navix Co., Ltd., Tokyo, Japan
  4. National Institute of Advanced Industrial Science and Technology, Tokyo, Japan