Abstract
Previous work in hypertext classification has produced two principal approaches for incorporating information about the graph properties of the Web into the training of a classifier. The first approach uses the complete text of the neighboring pages, whereas the second uses only their class labels. In this paper, we argue that both approaches are unsatisfactory: the first brings in too much irrelevant information, while the second is too coarse because it abstracts the entire page into a single class label. We argue that one needs to focus on the relevant parts of predecessor pages, namely the region in the neighborhood of the origin of an incoming link. To this end, we investigate different ways of extracting such features and compare several techniques for using them in a text classifier.
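To make the idea of link-local features concrete, the following is a minimal sketch, not the feature extraction used in the paper. It assumes that the "neighborhood of the origin of an incoming link" can be approximated by the anchor text plus a fixed window of words before and after it on the predecessor page. The names `LinkContextParser`, `link_local_features`, and the `window` parameter are hypothetical and introduced only for illustration.

```python
import re
from html.parser import HTMLParser


class LinkContextParser(HTMLParser):
    """Collects the words of a page and marks which ones are anchor text
    of links pointing to a given target URL (exact href match, an assumption)."""

    def __init__(self, target_url):
        super().__init__()
        self.target_url = target_url
        self.tokens = []          # all words of the page, in document order
        self.anchor_spans = []    # (start, end) token indices of matching links
        self._open_start = None   # token index where the open matching link began

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("href") == self.target_url:
            self._open_start = len(self.tokens)

    def handle_endtag(self, tag):
        if tag == "a" and self._open_start is not None:
            self.anchor_spans.append((self._open_start, len(self.tokens)))
            self._open_start = None

    def handle_data(self, data):
        # Lowercased word tokens; punctuation is dropped.
        self.tokens.extend(re.findall(r"\w+", data.lower()))


def link_local_features(html, target_url, window=3):
    """Return, for each link to target_url on a predecessor page, the anchor
    words and up to `window` words on either side of the link."""
    parser = LinkContextParser(target_url)
    parser.feed(html)
    parser.close()
    features = []
    for start, end in parser.anchor_spans:
        features.append({
            "anchor": parser.tokens[start:end],
            "before": parser.tokens[max(0, start - window):start],
            "after": parser.tokens[end:end + window],
        })
    return features


page = ('<p>See the home page of <a href="http://example.org/prof">Jane Doe</a>, '
        'who teaches machine learning here.</p>')
for feature in link_local_features(page, "http://example.org/prof"):
    print(feature)
```

Under these assumptions, the words collected per incoming link could be fed to a text classifier in place of either the full text of the predecessor page or its class label alone.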
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Utard, H., Fürnkranz, J. (2006). Link-Local Features for Hypertext Classification. In: Ackermann, M., et al. Semantics, Web and Mining. EWMF 2005, KDO 2005. Lecture Notes in Computer Science, vol 4289. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11908678_4
DOI: https://doi.org/10.1007/11908678_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-47697-9
Online ISBN: 978-3-540-47698-6
eBook Packages: Computer Science (R0)