Link Proximity Analysis - Clustering Websites by Examining Link Proximity

  • Bela Gipp
  • Adriana Taylor
  • Jöran Beel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6273)


This research-in-progress paper presents a new approach called Link Proximity Analysis (LPA) for identifying related web pages based on link analysis. In contrast to current techniques, which ignore intra-page link analysis, the approach put forth here examines the positions of links relative to each other within a web page. It exploits the observation that a clear correlation between the proximity of links to each other and the subject-relatedness of the linked websites can be observed on nearly every web page. By statistically analyzing this relationship and measuring the number of sentences, paragraphs, etc. between two links, related websites can be identified automatically, as a first study has shown.
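The abstract describes LPA only at a high level. The following is a minimal sketch of how link proximity could be turned into a relatedness score. It rests on several assumptions that do not come from the paper. Each page is assumed to be already parsed into paragraphs, sentences, and the link URLs they contain (HTML parsing is omitted). Instead of counting the exact number of sentences or paragraphs between two links, the sketch uses three coarse levels: same sentence, same paragraph, or merely the same page. The weights 1.0, 0.5, and 0.25 and the function names `lpa_scores`, `proximity_weight`, and `related` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical proximity weights (assumption, not values from the LPA paper):
# the smaller the shared structural unit, the larger the weight.
WEIGHTS = {"sentence": 1.0, "paragraph": 0.5, "page": 0.25}


def link_positions(page):
    """Yield (url, paragraph_idx, sentence_idx) for every link on one page.

    A page is assumed to be pre-parsed into paragraphs -> sentences -> URLs.
    Sentence indices are counted across the whole page.
    """
    sentence_idx = 0
    for paragraph_idx, paragraph in enumerate(page):
        for sentence in paragraph:
            for url in sentence:
                yield url, paragraph_idx, sentence_idx
            sentence_idx += 1


def proximity_weight(a, b):
    """Weight two link positions by the smallest unit they share."""
    _, para_a, sent_a = a
    _, para_b, sent_b = b
    if sent_a == sent_b:
        return WEIGHTS["sentence"]
    if para_a == para_b:
        return WEIGHTS["paragraph"]
    return WEIGHTS["page"]


def lpa_scores(pages):
    """Sum pairwise proximity weights over all link occurrences on all pages."""
    scores = defaultdict(float)
    for page in pages:
        positions = list(link_positions(page))
        for a, b in combinations(positions, 2):
            if a[0] == b[0]:
                continue  # skip a URL paired with another occurrence of itself
            key = tuple(sorted((a[0], b[0])))  # order-independent pair key
            scores[key] += proximity_weight(a, b)
    return dict(scores)


def related(url, scores, top_n=3):
    """Rank the URLs most strongly co-located with `url`, highest score first."""
    candidates = [
        (v if u == url else u, s) for (u, v), s in scores.items() if url in (u, v)
    ]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:top_n]


if __name__ == "__main__":
    pages = [
        # Page 1: paragraph 0 has two sentences, paragraph 1 has one.
        [[["a.com", "b.com"], ["c.com"]], [["d.com"]]],
        # Page 2: a single sentence linking three sites.
        [[["a.com", "b.com", "d.com"]]],
    ]
    print(related("a.com", lpa_scores(pages)))
    # prints [('b.com', 2.0), ('d.com', 1.25), ('c.com', 0.5)]
```

Because scores are summed over many pages, links that repeatedly appear close together accumulate higher scores than links that only share a page. A practical system would likely also need to normalize for pages with very many links and to use finer distance measures, which this sketch does not attempt.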


Keywords: Web page, Website clustering, Network Analysis, Link Analysis, Citation Proximity Analysis





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Bela Gipp (1, 2)
  • Adriana Taylor (1)
  • Jöran Beel (1, 2)
  1. UC Berkeley, Berkeley, USA
  2. Computer Science/ITI/VLBA-Lab, Otto-von-Guericke University, Magdeburg, Germany
