
A Similarity Reinforcement Algorithm for Heterogeneous Web Pages

  • Conference paper
Web Technologies Research and Development - APWeb 2005 (APWeb 2005)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 3399))


Abstract

Many machine learning and data mining algorithms rely crucially on similarity metrics. However, most early approaches, such as the Vector Space Model and Latent Semantic Indexing, use only a single type of relationship to measure the similarity of data objects. In this paper, we first use an Intra- and Inter-Type Relationship Matrix (IITRM) to represent a set of heterogeneous data objects and their inter-relationships. We then propose a novel similarity-calculating algorithm over the IITRM that iteratively integrates information from heterogeneous sources, which helps detect latent relationships among heterogeneous data objects. The algorithm is based on the intuition that intra-type relationships should affect inter-type relationships, and vice versa. Experimental results on the MSN logs dataset show that our algorithm outperforms the traditional cosine similarity.
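The abstract does not state the update rule, so the following is only a minimal sketch of the general idea of mutual similarity reinforcement, not the authors' algorithm. It assumes two object types (queries and pages, as in a click log), a hypothetical binary click matrix `link`, and an assumed blending weight `alpha`. Similarity within one type is repeatedly recomputed from the current similarity of the linked objects of the other type, starting from plain cosine similarity.

```python
import math


def cosine_matrix(rows):
    """Pairwise cosine similarity between row vectors (single-relationship baseline)."""
    norms = [math.sqrt(sum(x * x for x in r)) for r in rows]
    n = len(rows)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(rows[i], rows[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def row_normalize(m):
    """Scale each row to sum to 1; all-zero rows are left unchanged."""
    out = []
    for row in m:
        total = sum(row)
        out.append([x / total for x in row] if total > 0 else list(row))
    return out


def propagate(link, other_sim):
    """Similarity induced on one type by the similarity of its linked objects.

    Assumed rule: average the other type's similarity over each pair's links.
    """
    w = row_normalize(link)
    s = matmul(matmul(w, other_sim), transpose(w))
    for i in range(len(s)):
        s[i][i] = 1.0  # every object is fully similar to itself
    return s


def blend(base, induced, alpha):
    """Mix the original similarity with the similarity induced via the other type."""
    return [[(1 - alpha) * b + alpha * p for b, p in zip(rb, rp)]
            for rb, rp in zip(base, induced)]


def reinforce(link, alpha=0.5, iterations=10):
    """Iteratively let each type's similarity reinforce the other's (illustrative only)."""
    base_q = cosine_matrix(link)             # query-query similarity from clicks
    base_p = cosine_matrix(transpose(link))  # page-page similarity from clicks
    sim_q, sim_p = base_q, base_p
    for _ in range(iterations):
        new_q = blend(base_q, propagate(link, sim_p), alpha)
        new_p = blend(base_p, propagate(transpose(link), sim_q), alpha)
        sim_q, sim_p = new_q, new_p
    return sim_q, sim_p


if __name__ == "__main__":
    # Hypothetical click log: rows are queries, columns are pages.
    link = [[1, 0],
            [1, 1],
            [0, 1]]
    sim_q, _ = reinforce(link)
    print("cosine(q0, q2)     =", cosine_matrix(link)[0][2])
    print("reinforced(q0, q2) =", sim_q[0][2])
```

In this toy log the first and last queries share no clicked page, so their cosine similarity is zero, yet their reinforced similarity is positive because the pages they clicked are themselves related through the middle query. This is the kind of latent relationship the abstract refers to.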




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, N. et al. (2005). A Similarity Reinforcement Algorithm for Heterogeneous Web Pages. In: Zhang, Y., Tanaka, K., Yu, J.X., Wang, S., Li, M. (eds) Web Technologies Research and Development - APWeb 2005. APWeb 2005. Lecture Notes in Computer Science, vol 3399. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31849-1_13


  • DOI: https://doi.org/10.1007/978-3-540-31849-1_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25207-8

  • Online ISBN: 978-3-540-31849-1

  • eBook Packages: Computer Science, Computer Science (R0)
