An Automatic Approach to Classify Web Documents Using a Domain Ontology

Song, Mu-Hee; Lim, Soo-Yeon; Park, Seong-Bae; Kang, Dong-Jin; Lee, Sang-Jo

doi:10.1007/11590316_107


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3776)

Included in the following conference series:

International Conference on Pattern Recognition and Machine Intelligence


Abstract

This paper proposes an automated method for classifying Web documents using an ontology, which represents the terminology and vocabulary contained in Web documents as a hierarchical structure. Ontology-based document classification involves identifying the document features that best represent a Web document and, after analyzing its content, assigning it to the most appropriate of at least two predefined categories. In this paper, Web documents are classified in real time, without relying on experimental training data or a learning process, by computing the similarity between the terminology extracted from Web texts and the ontology categories. Because the meanings and relationships specific to each document are taken into account, this yields more accurate document classification.
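The abstract does not state which similarity measure or ontology format the authors use. As a rough illustration of classifying a document by its similarity to ontology categories, with no training step, the sketch below assumes cosine similarity over raw term counts and a hypothetical three-category toy ontology in which a parent category also inherits the terms of its children. The names `ONTOLOGY`, `category_terms`, `extract_terms`, `cosine`, and `classify` are illustrative only and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical toy domain ontology (assumption, not from the paper):
# each category lists its own terms and its child categories.
ONTOLOGY = {
    "computer": {"terms": ["computer", "software", "hardware"],
                 "children": ["network", "database"]},
    "network": {"terms": ["network", "protocol", "router", "packet"],
                "children": []},
    "database": {"terms": ["database", "query", "index", "transaction"],
                 "children": []},
}


def category_terms(ontology, name):
    """Collect a category's own terms plus those of all its descendants."""
    node = ontology[name]
    terms = Counter(node["terms"])
    for child in node["children"]:
        terms.update(category_terms(ontology, child))
    return terms


def extract_terms(text):
    """Count lower-cased word tokens; a stand-in for real term extraction."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def classify(text, ontology):
    """Score the document against every category; return the best one.

    No training data is used: the score is purely the similarity between
    the document's terms and each category's (inherited) term set.
    """
    doc = extract_terms(text)
    scores = {name: cosine(doc, category_terms(ontology, name))
              for name in ontology}
    best = max(scores, key=scores.get)
    return best, scores
```

For example, `classify("The router drops a packet when the network protocol times out.", ONTOLOGY)` picks `"network"`: the broader `"computer"` category matches the same four terms, but its larger inherited term set lowers its cosine score. If no category shares any term with the document, every score is 0.0 and `max` simply returns the first category, so a real system would likely need a threshold or an "unclassified" outcome.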

Similar content being viewed by others

Improving Document Classification Effectiveness Using Knowledge Exploited by Ontologies

Automatic Document Classification Based on J.S. Mill’s Ideas

A General Framework for Text Document Classification Using SEMCON and ACVSR



Author information

Authors and Affiliations

Dept. of Computer Engineering, Information Technology Services, Kyungpook National University, Daegu, Korea
Mu-Hee Song, Soo-Yeon Lim, Seong-Bae Park, Dong-Jin Kang & Sang-Jo Lee


Editor information

Editors and Affiliations

Center for Soft Computing Research, Machine Intelligence Unit, Indian Statistical Institute, India
Sankar K. Pal
Machine Intelligence Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata 700108, India
Sanghamitra Bandyopadhyay
Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India
Sambhunath Biswas


About this paper

Cite this paper

Song, MH., Lim, SY., Park, SB., Kang, DJ., Lee, SJ. (2005). An Automatic Approach to Classify Web Documents Using a Domain Ontology. In: Pal, S.K., Bandyopadhyay, S., Biswas, S. (eds) Pattern Recognition and Machine Intelligence. PReMI 2005. Lecture Notes in Computer Science, vol 3776. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11590316_107


DOI: https://doi.org/10.1007/11590316_107
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30506-4
Online ISBN: 978-3-540-32420-1
eBook Packages: Computer Science, Computer Science (R0)


Societies and partnerships

The International Association for Pattern Recognition
