Abstract
The structured link vector model (SLVM) is a recently proposed representation for XML documents that extends the conventional vector space model (VSM) by incorporating document structure. In this paper, we describe a classification approach for XML documents based on SLVM and the support vector machine (SVM), as applied in the INEX 2007 Document Mining Challenge. Experimental results on the challenge's data set show that it outperforms all other approaches on the XML document classification task at the challenge.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, J., Zhang, F. (2008). XML Document Classification Using Extended VSM. In: Fuhr, N., Kamps, J., Lalmas, M., Trotman, A. (eds) Focused Access to XML Documents. INEX 2007. Lecture Notes in Computer Science, vol 4862. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85902-4_21
DOI: https://doi.org/10.1007/978-3-540-85902-4_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85901-7
Online ISBN: 978-3-540-85902-4
eBook Packages: Computer Science, Computer Science (R0)