Link-Local Features for Hypertext Classification

  • Conference paper
Semantics, Web and Mining (EWMF 2005, KDO 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4289)

Abstract

Previous work in hypertext classification has resulted in two principal approaches for incorporating information about the graph properties of the Web into the training of a classifier. The first approach uses the complete text of the neighboring pages, whereas the second approach uses only their class labels. In this paper, we argue that both approaches are unsatisfactory: the first one brings in too much irrelevant information, while the second approach is too coarse by abstracting the entire page into a single class label. We argue that one needs to focus on relevant parts of predecessor pages, namely on the region in the neighborhood of the origin of an incoming link. To this end, we will investigate different ways for extracting such features, and compare several different techniques for using them in a text classifier.
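
As a rough illustration of what a feature taken from "the region in the neighborhood of the origin of an incoming link" might look like, the sketch below collects the anchor text of each link to a target page together with a few words on either side of it. This is not the authors' implementation: the class `LinkContextParser`, the function `link_local_features`, the fixed word window, exact `href` matching, and the example URL are all assumptions made for illustration, using only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class LinkContextParser(HTMLParser):
    """Collect the words of a page and record where matching anchors occur.

    Hypothetical helper: it only matches links whose href is exactly
    equal to target_url (no URL normalization is attempted).
    """

    def __init__(self, target_url):
        super().__init__()
        self.target_url = target_url
        self.tokens = []            # all words of the page, in document order
        self.anchor_spans = []      # (start, end) token indices of matching anchors
        self._anchor_start = None   # set while inside a matching <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("href") == self.target_url:
            self._anchor_start = len(self.tokens)

    def handle_endtag(self, tag):
        if tag == "a" and self._anchor_start is not None:
            self.anchor_spans.append((self._anchor_start, len(self.tokens)))
            self._anchor_start = None

    def handle_data(self, data):
        # Naive whitespace tokenization, lowercased.
        self.tokens.extend(data.lower().split())


def link_local_features(html, target_url, window=3):
    """Return anchor text plus `window` words before and after each link."""
    parser = LinkContextParser(target_url)
    parser.feed(html)
    parser.close()
    features = []
    for start, end in parser.anchor_spans:
        features.append({
            "anchor": parser.tokens[start:end],
            "before": parser.tokens[max(0, start - window):start],
            "after": parser.tokens[end:end + window],
        })
    return features


# A predecessor page that links to the page we want to classify.
predecessor = (
    '<p>See the course page of '
    '<a href="http://example.org/cs101">Intro to Programming</a> '
    'taught by Prof. Smith this fall.</p>'
)
features = link_local_features(predecessor, "http://example.org/cs101")
```

The resulting word lists could then be fed to a text classifier in place of either the full predecessor text or a single class label. The window size and the choice of which regions to include are exactly the kind of design decisions the abstract says the paper compares.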

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Utard, H., Fürnkranz, J. (2006). Link-Local Features for Hypertext Classification. In: Ackermann, M., et al. Semantics, Web and Mining. EWMF 2005, KDO 2005. Lecture Notes in Computer Science, vol 4289. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11908678_4

  • DOI: https://doi.org/10.1007/11908678_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-47697-9

  • Online ISBN: 978-3-540-47698-6

  • eBook Packages: Computer Science, Computer Science (R0)
