Sibling Page Search by Page Examples

  • Hiroaki Ohshima
  • Satoshi Oyama
  • Katsumi Tanaka
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4312)


We propose methods for searching for Web pages that are “semantically” regarded as “siblings” of given example pages. That is, our approach aims to find pages that share a theme with the given sample pages but differ from them in content. We call this “sibling page search”. The proposed methods differ from conventional content-based similarity search for Web pages: our approach recommends Web pages whose “conceptual” classification category is the same as that of the given sample pages, but whose content differs from theirs. In this sense, our approach is expected to support a user’s opportunistic search, that is, a search in which the user’s interests and intentions are not fixed. The proposed methods compute “common” and “unique” feature vectors from the given sample pages and compare these feature vectors with each retrieved page. We evaluated our sibling page search methods on test sets built from page collections in the Open Directory Project (ODP).
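The abstract does not state the exact formulas, so the following is only a minimal sketch of the general idea under loudly stated assumptions. Pages are whitespace-tokenized term-frequency vectors. The “common” vector is taken as the element-wise minimum over terms that appear in every sample. Each “unique” vector is a sample minus that common part. A candidate page scores high when it is close (by cosine similarity) to the common vector and far from every unique vector, with a hypothetical weight `alpha`. All function names and these definitions are illustrative assumptions, not the authors’ method.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Term-frequency vector from lowercase whitespace tokens (assumption: no stemming or stopword removal)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def common_vector(samples: list[Counter]) -> Counter:
    """Hypothetical 'common' vector: minimum count of each term shared by all (non-empty list of) samples."""
    shared = set.intersection(*(set(s) for s in samples))
    return Counter({t: min(s[t] for s in samples) for t in shared})


def unique_vectors(samples: list[Counter], common: Counter) -> list[Counter]:
    """Hypothetical 'unique' vectors: each sample minus the common part.

    Counter subtraction keeps only positive counts, so shared terms drop out.
    """
    return [s - common for s in samples]


def sibling_score(page: Counter, common: Counter, uniques: list[Counter], alpha: float = 1.0) -> float:
    """Reward similarity to the shared theme; penalize the closest overlap with any sample's own content."""
    penalty = max((cosine(page, u) for u in uniques), default=0.0)
    return cosine(page, common) - alpha * penalty
```

With two samples about jazz festivals in different cities, a page about a jazz festival in a third city keeps the shared theme without repeating either sample’s specific content, so it outranks a near-copy of one sample and an off-topic page. Subtracting the maximum (rather than the mean) overlap with the unique vectors is one design choice among several; it penalizes a candidate that closely duplicates any single sample.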







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Hiroaki Ohshima (1)
  • Satoshi Oyama (1)
  • Katsumi Tanaka (1)
  1. Department of Social Informatics, Graduate School of Informatics, Kyoto University, Kyoto, Japan
