Link-Local Features for Hypertext Classification

Utard, Hervé; Fürnkranz, Johannes

doi:10.1007/11908678_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4289)

Abstract

Previous work in hypertext classification has resulted in two principal approaches for incorporating information about the graph properties of the Web into the training of a classifier. The first approach uses the complete text of the neighboring pages, whereas the second approach uses only their class labels. In this paper, we argue that both approaches are unsatisfactory: the first one brings in too much irrelevant information, while the second approach is too coarse by abstracting the entire page into a single class label. We argue that one needs to focus on relevant parts of predecessor pages, namely on the region in the neighborhood of the origin of an incoming link. To this end, we will investigate different ways for extracting such features, and compare several different techniques for using them in a text classifier.
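
The idea of restricting attention to the region around the origin of an incoming link can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of anchor text plus a fixed window of words before and after the link, the function name `link_local_features`, the `window` parameter, and the example URL are all assumptions made for illustration. It uses only Python's standard `html.parser` module and does no filtering of script or style content.

```python
from html.parser import HTMLParser


class LinkContextParser(HTMLParser):
    """Collects the words of a page and the word spans covered by links to one target URL."""

    def __init__(self, target_url):
        super().__init__()
        self.target_url = target_url
        self.words = []          # all text words of the page, in document order
        self.spans = []          # (start, end) word indices of matching anchors
        self._open_start = None  # start index of the matching anchor currently open

    def handle_starttag(self, tag, attrs):
        # Only anchors whose href equals the target URL exactly are tracked.
        if tag == "a" and dict(attrs).get("href") == self.target_url:
            self._open_start = len(self.words)

    def handle_endtag(self, tag):
        if tag == "a" and self._open_start is not None:
            self.spans.append((self._open_start, len(self.words)))
            self._open_start = None

    def handle_data(self, data):
        # Naive tokenisation: lowercase and split on whitespace.
        self.words.extend(data.lower().split())


def link_local_features(html, target_url, window=3):
    """Return anchor words and up to `window` words on each side, per matching link.

    Hypothetical helper: the feature layout is an assumption, not taken from the paper.
    """
    parser = LinkContextParser(target_url)
    parser.feed(html)
    parser.close()
    features = []
    for start, end in parser.spans:
        features.append({
            "anchor": parser.words[start:end],
            "before": parser.words[max(0, start - window):start],
            "after": parser.words[end:end + window],
        })
    return features


# Example predecessor page (made up) that links to the page to be classified.
page = (
    '<p>See the home page of <a href="http://example.org/prof">Jane Doe</a> '
    "who teaches machine learning here.</p>"
)
features = link_local_features(page, "http://example.org/prof", window=3)
print(features)
```

In a setup like this, the words collected from all predecessor pages could be fed to a text classifier in place of either the full predecessor text or a single class label, which are the two extremes the abstract argues against.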

Author information

Authors and Affiliations

Knowledge Engineering Group, TU Darmstadt, Hochschulstraße 10, D-64289, Darmstadt, Germany
Hervé Utard & Johannes Fürnkranz

Editor information

Editors and Affiliations

Department of Natural Language Processing, Institute for Computer Science, University of Leipzig
Markus Ackermann
Department of Computer Science, K.U. Leuven, B-3001, Heverlee, Belgium
Bettina Berendt
Dept. of Knowledge Technologies, Jozef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Marko Grobelnik
Knowledge & Data Engineering Group, University of Kassel, Wilhelmshöher Allee 73, D-34121, Kassel, Germany
Andreas Hotho
Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Dunja Mladenič
Dipartimento di Informatica, Università di Bari, Via E. Orabona, 4, 70125, Bari, Italia
Giovanni Semeraro
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany
Myra Spiliopoulou
Research Center L3S, Appelstr. 9a, D-30167, Hannover, Germany
Gerd Stumme
Dept. Information and Knowledge Engineering, University of Economics, Prague, Winston Churchill Sq. 4, 130 67 Praha 3, Prague, Czech Republic
Vojtěch Svátek
Human Computer Studies Lab, University of Amsterdam, Kruislaan 419, 1089, Amsterdam, VA, The Netherlands
Maarten van Someren

About this paper

Cite this paper

Utard, H., Fürnkranz, J. (2006). Link-Local Features for Hypertext Classification. In: Ackermann, M., et al. Semantics, Web and Mining. EWMF 2005, KDO 2005. Lecture Notes in Computer Science, vol 4289. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11908678_4

DOI: https://doi.org/10.1007/11908678_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-47697-9
Online ISBN: 978-3-540-47698-6
eBook Packages: Computer Science, Computer Science (R0)