Rearranging Classified Items in Hierarchies Using Categorization Uncertainty
Classification into hierarchical structures is a problem of increasing importance, given, for example, the growing use of ontologies and keyword hierarchies in many web-based information systems. It is therefore not surprising that it is a field of ongoing research. Here, we propose an approach that utilizes hierarchy information in the classification process. In contrast to other methods, the hierarchy information is used independently of the classifier rather than being integrated into it directly. This enables the use of arbitrary standard classification methods. Furthermore, we discuss how hierarchical classification in general, and our setting in particular, can be evaluated appropriately. We present our algorithm and evaluate it on two datasets of web pages using Naïve Bayes and SVM as baseline classifiers.
Keywords: Support Vector Machine, Prediction Probability, Class Hierarchy, Child Class, Wrong Prediction
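The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of applying hierarchy information after classification, independently of the classifier. It assumes that a flat classifier (for example Naïve Bayes, or an SVM with probability outputs) yields one probability per class, and it moves an uncertain prediction up to the deepest hierarchy node whose accumulated probability reaches a threshold. The function names `ancestors` and `rearrange`, the `threshold` parameter, and the toy hierarchy are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Optional


def ancestors(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the path from `node` up to the root, starting with `node`."""
    path = [node]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path


def rearrange(probs: Dict[str, float],
              parent: Dict[str, Optional[str]],
              threshold: float = 0.6) -> str:
    """Pick the deepest node whose accumulated probability >= threshold.

    Assumption (not from the paper): the probability of a class also
    counts as evidence for each of its ancestors in the hierarchy.
    """
    # Propagate every class probability to the class and its ancestors.
    mass: Dict[str, float] = {}
    for cls, p in probs.items():
        for node in ancestors(cls, parent):
            mass[node] = mass.get(node, 0.0) + p

    confident = [n for n, m in mass.items() if m >= threshold]
    if not confident:
        # Fall back to the classifier's own top prediction.
        return max(probs, key=probs.get)

    # Prefer deeper (more specific) nodes; break ties by accumulated mass.
    return max(confident, key=lambda n: (len(ancestors(n, parent)), mass[n]))


# Toy hierarchy: root -> {science -> {physics, biology}, sports -> {soccer}}
parent = {
    "root": None,
    "science": "root", "sports": "root",
    "physics": "science", "biology": "science",
    "soccer": "sports",
}

# The classifier is torn between two sibling classes: back off to the parent.
uncertain = {"physics": 0.45, "biology": 0.40, "soccer": 0.15}
print(rearrange(uncertain, parent))  # science

# The classifier is confident: keep the specific class.
confident_case = {"physics": 0.80, "biology": 0.10, "soccer": 0.10}
print(rearrange(confident_case, parent))  # physics
```

Because the step only consumes class probabilities, any standard classifier that produces such estimates could be plugged in without modification; the choice of threshold is a free parameter in this sketch.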