Applications of Semidefinite Programming in XML Document Classification

  • Zhonghang Xia
  • Guangming Xing
  • Houduo Qi
  • Qi Li

Extensible Markup Language (XML) has become a standard format for data representation over the Internet. An XML document usually organizes a set of textual data according to a predefined logical structure. Storing documents with similar structures together has been shown to reduce fragmentation and improve query efficiency. Unlike a flat text document, an XML document has no vectorial representation, which most existing classification algorithms require.
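As a small illustration of why an XML document lacks an obvious vectorial representation, the sketch below parses two documents and describes each by its set of root-to-element tag paths. The helper names `tag_paths` and `path_similarity`, the sample documents, and the use of Jaccard similarity are all assumptions made for this example; they are not the method developed in the chapter. The point is only that structure is naturally compared pairwise rather than through a fixed-length feature vector.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-element tag paths in an XML string."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        # Iterating over an Element yields its direct children.
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def path_similarity(xml_a, xml_b):
    """Jaccard similarity between the tag-path sets of two documents."""
    paths_a, paths_b = tag_paths(xml_a), tag_paths(xml_b)
    return len(paths_a & paths_b) / len(paths_a | paths_b)


# Hypothetical sample documents with the same root but different structure.
doc_a = "<book><title>T</title><author><name>N</name></author></book>"
doc_b = "<book><title>T</title><chapter><title>C</title></chapter></book>"

if __name__ == "__main__":
    print(sorted(tag_paths(doc_a)))
    print(path_similarity(doc_a, doc_b))
```

The two documents share only the paths `book` and `book/title`, so any comparison has to work on the trees themselves. A matrix of such pairwise scores is the kind of object a kernel-based classifier could consume, provided it satisfies the conditions a kernel matrix must meet.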


Keywords: Newton method, kernel method, edit distance, kernel matrix, Bayesian network model



Copyright information

© Springer-Verlag London Limited 2008

Authors and Affiliations

  • Zhonghang Xia (1)
  • Guangming Xing (1)
  • Houduo Qi (2)
  • Qi Li (1)
  1. Department of Computer Science, Western Kentucky University, Bowling Green
  2. Department of Mathematics, University of Southampton, Highfield, Southampton, UK
