Abstract
This paper proposes an automated method for document classification using an ontology, which expresses the terminology information and vocabulary contained in Web documents as a hierarchical structure. Ontology-based document classification involves determining the document features that represent the Web documents most accurately, and then classifying the documents into the most appropriate categories by analyzing their contents against at least two pre-defined categories per given document feature. In this paper, Web documents are classified in real time, not with experimental data or a learning process, but by similarity calculations between the terminology information extracted from Web texts and the ontology categories. This results in more accurate document classification, since the meanings and relationships unique to each document are determined.
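The similarity-based classification described in the abstract can be illustrated with a minimal sketch. It assumes cosine similarity over raw term counts, which the abstract does not specify, and it uses a hypothetical toy ontology; the category names, term lists, and function names (`category_terms`, `extract_terms`, `cosine`, `classify`) are illustrative and not taken from the paper.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical toy ontology (not from the paper):
# category -> (parent category or None, terms attached to the category).
ONTOLOGY = {
    "Economy": (None, ["market", "economy"]),
    "Finance": ("Economy", ["stock", "bank", "interest", "investment"]),
    "Sports": (None, ["game", "team"]),
    "Soccer": ("Sports", ["goal", "striker", "league"]),
}


def category_terms(name):
    """Return a category's own terms plus those inherited from its ancestors."""
    terms = []
    while name is not None:
        parent, own = ONTOLOGY[name]
        terms.extend(own)
        name = parent
    return terms


def extract_terms(text):
    """Lowercase the text and count alphabetic tokens (a crude term extractor)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(text):
    """Score the text against every category; return (best category, all scores).

    The best category is None when no ontology term occurs in the text.
    No training data or learning step is involved, only the similarity scores.
    """
    doc = extract_terms(text)
    scores = {c: cosine(doc, Counter(category_terms(c))) for c in ONTOLOGY}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


if __name__ == "__main__":
    label, scores = classify("The striker scored a late goal and the team won the league game")
    print(label, scores)
```

Because a child category inherits its ancestors' terms in this sketch, a document that matches both general and specific vocabulary scores higher on the more specific category. That is one simple way to exploit the hierarchy, chosen here for illustration only.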
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Song, MH., Lim, SY., Park, SB., Kang, DJ., Lee, SJ. (2005). An Automatic Approach to Classify Web Documents Using a Domain Ontology. In: Pal, S.K., Bandyopadhyay, S., Biswas, S. (eds) Pattern Recognition and Machine Intelligence. PReMI 2005. Lecture Notes in Computer Science, vol 3776. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11590316_107
DOI: https://doi.org/10.1007/11590316_107
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30506-4
Online ISBN: 978-3-540-32420-1
eBook Packages: Computer Science, Computer Science (R0)