Combining statistical and relational methods for learning in hypertext domains

Part of the Lecture Notes in Computer Science book series (LNAI, volume 1446)

Abstract

We present a new approach to learning hypertext classifiers that combines a statistical text-learning method with a relational rule learner. This approach is well suited to learning in hypertext domains because its statistical component allows it to characterize text in terms of word frequencies, whereas its relational component is able to describe how neighboring documents are related to each other by the hyperlinks that connect them. We evaluate our approach by applying it to tasks that involve (i) learning definitions for classes of pages, (ii) learning definitions for particular relations that exist between pairs of pages, and (iii) locating a particular class of information in the internal structure of pages. Our experiments demonstrate that this new approach is able to learn more accurate classifiers than either of its constituent methods alone.
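
The idea of pairing a word-frequency model with a hyperlink-based rule can be illustrated with a small sketch. This is not the authors' algorithm: the toy pages, the link set, the training texts, and the names `train_naive_bayes`, `nb_predicate`, and `links_to_research_page` are all assumptions made up for illustration, and the rule is written by hand rather than learned by a relational rule learner. The statistical part is a simple multinomial Naive Bayes test on a page's words; the relational part asks whether a page links to a neighbor that passes that test.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels):
    """Collect per-class word counts for a multinomial Naive Bayes model."""
    vocab = {w for d in docs for w in d.split()}
    counts = {True: Counter(), False: Counter()}
    n_docs = Counter(labels)
    for doc, label in zip(docs, labels):
        counts[label].update(doc.split())
    return vocab, counts, n_docs

def nb_predicate(model, text):
    """Statistical predicate: True if the positive class scores strictly higher."""
    vocab, counts, n_docs = model
    total = sum(n_docs.values())
    scores = {}
    for label in (True, False):
        # Laplace smoothing: add one to every word count.
        denom = sum(counts[label].values()) + len(vocab)
        score = math.log(n_docs[label] / total)
        for w in text.split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((counts[label][w] + 1) / denom)
        scores[label] = score
    return scores[True] > scores[False]

# Hypothetical training texts: True = "research-like", False = anything else.
model = train_naive_bayes(
    ["research publications advisor", "research students lab",
     "course lecture homework", "news events calendar"],
    [True, True, False, False],
)

# Hypothetical toy web: page id -> text, plus (source, target) hyperlinks.
pages = {
    "a": "course syllabus lecture homework",
    "b": "research publications students advisor",
    "c": "welcome to my home page",
    "d": "department news and events",
}
links = {("c", "b"), ("d", "a")}

def links_to_research_page(page):
    """Hand-written relational rule: page P qualifies if link(P, Q) holds
    and the statistical predicate is true of Q's text."""
    return any(src == page and nb_predicate(model, pages[dst])
               for src, dst in links)

matches = [p for p in sorted(pages) if links_to_research_page(p)]
print(matches)
```

On this toy data the words of page `c` carry no signal on their own, so the statistical predicate alone rejects it, yet the rule selects it because it links to page `b`, whose text looks research-like. That is the kind of evidence from neighboring documents that the abstract attributes to the relational component.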

Keywords

  • Inductive Logic Programming
  • Statistical Predicate
  • Anchor Text
  • Background Relation
  • Accurate Classifier

These keywords were added by machine and not by the authors.

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Slattery, S., Craven, M. (1998). Combining statistical and relational methods for learning in hypertext domains. In: Page, D. (ed.) Inductive Logic Programming. ILP 1998. Lecture Notes in Computer Science, vol 1446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0027309

  • DOI: https://doi.org/10.1007/BFb0027309

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64738-6

  • Online ISBN: 978-3-540-69059-7

  • eBook Packages: Springer Book Archive