Abstract
Many machine learning and data mining algorithms rely crucially on similarity metrics. However, most early approaches, such as the Vector Space Model and Latent Semantic Indexing, use only a single relationship to measure the similarity of data objects. In this paper, we first use an Intra- and Inter-Type Relationship Matrix (IITRM) to represent a set of heterogeneous data objects and their inter-relationships. We then propose a novel similarity-calculating algorithm over the IITRM. The algorithm integrates information from heterogeneous sources through iterative computation, which helps detect latent relationships among heterogeneous data objects. It is based on the intuition that intra-type relationships should affect inter-type relationships, and vice versa. Experimental results on the MSN logs dataset show that our algorithm outperforms traditional cosine similarity.
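The abstract does not give the update rule, so the following is only a minimal sketch of the general idea of iterative similarity reinforcement between two object types, not the paper's algorithm. It assumes a single inter-type relationship matrix `rel` (for example, hypothetical query-to-page click counts), a hypothetical blending weight `alpha`, and an assumed update in which the similarity of two objects of one type is mixed with the similarity of the objects of the other type that they relate to. All function names are illustrative.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def cosine_rows(m):
    """Cosine similarity between every pair of rows (the single-relationship baseline)."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in m]
    n = len(m)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] and norms[j]:
                dot = sum(a * b for a, b in zip(m[i], m[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim


def row_normalize(m):
    """Scale each row to sum to 1 (all-zero rows stay zero)."""
    out = []
    for row in m:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def blend(base, propagated, alpha):
    """Mix baseline and propagated similarity; self-similarity is fixed at 1."""
    n = len(base)
    return [[1.0 if i == j else (1 - alpha) * base[i][j] + alpha * propagated[i][j]
             for j in range(n)] for i in range(n)]


def reinforce(rel, alpha=0.5, iterations=5):
    """Assumed update rule: each type's similarity is reinforced by the other's."""
    rel_t = transpose(rel)
    w_a, w_b = row_normalize(rel), row_normalize(rel_t)
    base_a, base_b = cosine_rows(rel), cosine_rows(rel_t)
    sim_a, sim_b = base_a, base_b
    for _ in range(iterations):
        # Propagate similarity across the inter-type links, in both directions.
        prop_a = matmul(matmul(w_a, sim_b), transpose(w_a))
        prop_b = matmul(matmul(w_b, sim_a), transpose(w_b))
        sim_a = blend(base_a, prop_a, alpha)
        sim_b = blend(base_b, prop_b, alpha)
    return sim_a, sim_b


# Hypothetical data: 3 queries (rows) and 2 pages (columns).
# Queries 0 and 1 share no page; query 2 links to both pages.
rel = [[1, 0], [0, 1], [1, 1]]
sim_queries, sim_pages = reinforce(rel)
```

In this toy input, queries 0 and 1 have a cosine similarity of zero because they share no page, yet after reinforcement their similarity becomes positive because the two pages are themselves related through query 2. This is the kind of latent relationship the abstract refers to, shown here under the assumed update rule only.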
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, N. et al. (2005). A Similarity Reinforcement Algorithm for Heterogeneous Web Pages. In: Zhang, Y., Tanaka, K., Yu, J.X., Wang, S., Li, M. (eds) Web Technologies Research and Development - APWeb 2005. APWeb 2005. Lecture Notes in Computer Science, vol 3399. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31849-1_13
DOI: https://doi.org/10.1007/978-3-540-31849-1_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25207-8
Online ISBN: 978-3-540-31849-1
eBook Packages: Computer Science, Computer Science (R0)