Specific-Purpose Web Searches on the Basis of Structure and Contents

  • Mineichi Kudo
  • Atsuyoshi Nakamura
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3847)


We introduce methods for two specific-purpose Web searches. One searches for Web communities related to given keywords; the other searches for texts that have a certain relation to given keywords. Both methods draw on the structure and the contents of the WWW. The Web community search uses the global structure of the WWW, that is, the Web graph composed of Web pages and the hyperlinks between them, to discover communities, and uses content information to label the communities it finds. The related text search uses the local structure of the WWW, that is, the DOM-tree structure of each page, to extract candidate texts, and uses content information to filter out wrongly extracted ones. We report the latest results on these Web search methods.
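The two-stage idea behind the related text search (structure to propose candidates, content to filter them) can be illustrated with a minimal sketch. This is not the authors' algorithm: the names `related_texts`, `levels_up`, and `min_words`, the fixed number of ancestor levels, and the word-count filter are all assumptions made for illustration, and the tree builder assumes well-nested HTML without unclosed void tags.

```python
from html.parser import HTMLParser


class Node:
    """A minimal DOM-like node: an element (tag set) or a text node (text set)."""

    def __init__(self, tag=None, text=None, parent=None):
        self.tag, self.text, self.parent = tag, text, parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree (assumes well-nested HTML, no unclosed void tags)."""

    def __init__(self):
        super().__init__()
        self.root = Node(tag="#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag=tag, parent=self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(Node(text=data.strip(), parent=self.current))


def text_nodes(node):
    """Yield all text nodes below `node` in document order."""
    for child in node.children:
        if child.text is not None:
            yield child
        else:
            yield from text_nodes(child)


def related_texts(html, keyword, levels_up=2, min_words=4):
    """Hypothetical helper: candidate texts near a keyword node, then a content filter."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    results = []
    for kw_node in text_nodes(builder.root):
        if keyword.lower() not in kw_node.text.lower():
            continue
        # Structural step (assumed rule): climb a fixed number of levels from the keyword node.
        scope = kw_node
        for _ in range(levels_up):
            if scope.parent is not None:
                scope = scope.parent
        for cand in text_nodes(scope):
            # Content step (assumed rule): keep only texts with enough words.
            long_enough = len(cand.text.split()) >= min_words
            if cand is not kw_node and long_enough and cand.text not in results:
                results.append(cand.text)
    return results


page = (
    "<html><body>"
    "<div><h2>Acme Phone</h2><p>The battery lasts two full days.</p><p>Buy now</p></div>"
    "<div><h2>Other item</h2><p>This text is about something else.</p></div>"
    "</body></html>"
)
print(related_texts(page, "Acme Phone"))
```

In this toy page, only text nodes inside the same `div` as the keyword node become candidates, and the short "Buy now" text is dropped by the content filter. A real system would presumably use a learned or more linguistically informed filter in place of the word count.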


Keywords (added by machine, not by the authors): Document Object Model, Candidate Text, Text Node, Keyword Node, Natural Language Processing Approach





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Mineichi Kudo (1)
  • Atsuyoshi Nakamura (1)
  1. Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan
