A Web Classification Framework Based on XSLT

  • Conference paper
Advanced Web and Network Technologies, and Applications (APWeb 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3842)

Abstract

Data on the web is gradually moving from HTML to XML/XSLT, driven by software and hardware requirements such as interoperability, data sharing between different applications and platforms, and devices with varying capabilities, such as cell phones and PDAs. This shift introduces new challenges for web page and web site classification. HTML is used to present content, XML represents content hierarchically, and XSLT transforms XML documents into other formats such as HTML and WML. Both HTML-based and XML-based classification have certain drawbacks when used to classify a web page. In this paper we propose a new classification method based on XSLT that combines the advantages of HTML and XML classification. We also introduce a web classification framework that uses XSLT classification. Finally, we show that, with a Naïve Bayes classifier, XSLT classification outperforms both HTML and XML classification.
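The abstract does not specify which features the XSLT classifier extracts or how its Naïve Bayes model is configured, so the following is only a hypothetical sketch of the general idea, not the authors' method. It assumes a multinomial Naïve Bayes with Laplace smoothing, and a feature set that combines XML element names (structure), XML text (content), and the literal output elements and text of the XSLT stylesheet (presentation). The names `xslt_features` and `NaiveBayes`, the `tag:`/`out:` prefixes, and the toy documents are all illustrative assumptions.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

XSL_NS = "{http://www.w3.org/1999/XSL/Transform}"
TOKEN_RE = re.compile(r"[a-z]+")

def tokens(text):
    # Lower-cased alphabetic word tokens; digits and punctuation are dropped.
    return TOKEN_RE.findall(text.lower()) if text else []

def xslt_features(xml_src, xslt_src):
    """Hypothetical feature set: XML element names and text from the data,
    plus literal (non-xsl:) output elements and their text from the stylesheet."""
    feats = Counter()
    for el in ET.fromstring(xml_src).iter():
        feats["tag:" + el.tag] += 1          # structure of the XML data
        feats.update(tokens(el.text))        # content of the XML data
    for el in ET.fromstring(xslt_src).iter():
        if not el.tag.startswith(XSL_NS):
            feats["out:" + el.tag.lower()] += 1  # presentation markup, e.g. <h1>
        feats.update(tokens(el.text))            # literal text such as headings
    return feats

class NaiveBayes:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        for feats, y in zip(docs, labels):
            self.feat_counts[y].update(feats)
        self.vocab = {f for c in self.feat_counts.values() for f in c}
        self.totals = {y: sum(c.values()) for y, c in self.feat_counts.items()}
        return self

    def log_posterior(self, feats, y):
        # log P(y) + sum over features of count * log P(feature | y)
        n = sum(self.class_counts.values())
        score = math.log(self.class_counts[y] / n)
        denom = self.totals[y] + self.alpha * len(self.vocab)
        for f, k in feats.items():
            if f in self.vocab:  # features never seen in training are ignored
                score += k * math.log((self.feat_counts[y][f] + self.alpha) / denom)
        return score

    def predict(self, feats):
        return max(self.class_counts, key=lambda y: self.log_posterior(feats, y))

def stylesheet(heading):
    # Toy XSLT that wraps the XML data in an HTML page with a literal heading.
    return (
        '<xsl:stylesheet version="1.0" '
        'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
        '<xsl:template match="/"><html><body>'
        f'<h1>{heading}</h1><xsl:value-of select="."/>'
        '</body></html></xsl:template></xsl:stylesheet>'
    )

BOOK_XSLT = stylesheet("Book catalog")
WEATHER_XSLT = stylesheet("Weather forecast")
BOOK_XML = "<book><title>Learning XML</title><author>Ray</author><price>30</price></book>"
WEATHER_XML = ("<forecast><city>Ankara</city><temperature>12</temperature>"
               "<sky>cloudy</sky></forecast>")

model = NaiveBayes().fit(
    [xslt_features(BOOK_XML, BOOK_XSLT), xslt_features(WEATHER_XML, WEATHER_XSLT)],
    ["books", "weather"],
)
pred_book = model.predict(xslt_features(
    "<book><title>XSLT basics</title><author>Kay</author></book>", BOOK_XSLT))
pred_weather = model.predict(xslt_features(
    "<forecast><city>Izmir</city><sky>sunny</sky></forecast>", WEATHER_XSLT))
print(pred_book, pred_weather)  # -> books weather
```

The point of the sketch is the feature union: an HTML-only classifier would see just the `out:` and heading features, and an XML-only classifier just the `tag:` and data-text features, whereas the stylesheet-plus-data pair exposes both.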

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kurt, A., Tozal, E. (2006). A Web Classification Framework Based on XSLT. In: Shen, H.T., Li, J., Li, M., Ni, J., Wang, W. (eds) Advanced Web and Network Technologies, and Applications. APWeb 2006. Lecture Notes in Computer Science, vol 3842. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610496_10

  • DOI: https://doi.org/10.1007/11610496_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31158-4

  • Online ISBN: 978-3-540-32435-5

  • eBook Packages: Computer Science, Computer Science (R0)