A Web Classification Framework Based on XSLT

Kurt, Atakan; Tozal, Engin

doi:10.1007/11610496_10

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3842)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

Data on the web is gradually changing format from HTML to XML/XSLT, driven by software and hardware requirements such as interoperability and data sharing between different applications and platforms, and by devices with varying capabilities such as cell phones and PDAs. This gradual change introduces new challenges in web page and web site classification. HTML is used for presentation of content. XML represents content in a hierarchical manner. XSLT is used to transform XML documents into different formats such as HTML and WML. Both HTML classification and XML classification have certain drawbacks when used to classify a web page. In this paper we propose a new classification method based on XSLT that combines the advantages of HTML and XML classification. We also introduce a web classification framework utilizing XSLT classification. Finally, we show that, using a Naïve Bayes classifier, XSLT classification outperforms both HTML and XML classification.
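As a rough illustration of the kind of classifier the abstract names, the sketch below implements a multinomial Naïve Bayes model with add-one (Laplace) smoothing over token lists. The abstract does not say which event model, smoothing, or features the authors use, so all of these are assumptions: the function names `train_naive_bayes` and `classify`, the toy tokens, and the labels are hypothetical. The tokens stand in for whatever terms a page yields, for example words taken from the output of an XSLT transformation.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Collect per-class document and token counts from (tokens, label) pairs."""
    class_counts = Counter()             # number of training documents per class
    token_counts = defaultdict(Counter)  # token frequencies per class
    vocab = set()                        # all tokens seen during training
    for tokens, label in docs:
        class_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts,
            "token_counts": token_counts,
            "vocab": vocab}


def classify(model, tokens):
    """Return the class with the highest log posterior for the given tokens."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["token_counts"][label]
        total_tokens = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            if tok not in model["vocab"]:
                continue  # tokens never seen in training carry no evidence
            # Add-one smoothing avoids zero probabilities for unseen pairs.
            score += math.log((counts[tok] + 1) / (total_tokens + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: tokens that might be extracted from transformed pages.
training = [
    (["price", "cart", "buy"], "shop"),
    (["buy", "discount", "cart"], "shop"),
    (["score", "match", "team"], "sports"),
    (["team", "league", "goal"], "sports"),
]
model = train_naive_bayes(training)
predicted = classify(model, ["cart", "buy"])
```

Working in log space keeps the product of many small probabilities from underflowing. Comparing HTML, XML, and XSLT classification, as the abstract describes, would amount to changing how the token lists are produced while keeping a classifier of this kind fixed.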

Author information

Authors and Affiliations

Computer Eng. Dept., Fatih University, Istanbul, Turkey
Atakan Kurt & Engin Tozal

Editor information

Editors and Affiliations

School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia
Heng Tao Shen
School of Computer Science and Technology, Heilongjiang University, P.O. Box, 150080, Harbin, China
Jinbao Li
Department of Computer Science and Engineering, Shanghai Jiao Tong University, 80 Dongcuan Road, 200240, Shanghai, China
Minglu Li
Department of Computer Science, College of Liberal Arts and Science, University of Iowa, 52242, Iowa City, IA, USA
Jun Ni
UNC Chapel Hill
Wei Wang

About this paper

Cite this paper

Kurt, A., Tozal, E. (2006). A Web Classification Framework Based on XSLT. In: Shen, H.T., Li, J., Li, M., Ni, J., Wang, W. (eds) Advanced Web and Network Technologies, and Applications. APWeb 2006. Lecture Notes in Computer Science, vol 3842. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610496_10

DOI: https://doi.org/10.1007/11610496_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31158-4
Online ISBN: 978-3-540-32435-5
eBook Packages: Computer Science, Computer Science (R0)