Boosting for graph classification with universum

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Recent years have witnessed extensive studies of graph classification, driven by the rapid increase in applications involving structural data and complex relationships. All existing graph classification methods require training graphs to be relevant to (or belong to) the target class, and cannot integrate graphs irrelevant to the class of interest into the learning process. In this paper, we study a new universum graph classification framework that leverages additional “non-example” graphs to improve graph classification accuracy. We argue that although universum graphs do not belong to the target class, they may contain meaningful structural patterns that enrich the feature space for graph representation and classification. To support universum graph classification, we propose a mathematical programming algorithm, ugBoost, which integrates discriminative subgraph selection and margin maximization into a unified framework to fully exploit the universum. Because exploring informative subgraphs in a universum setting requires searching a large space, we derive an upper bound on the discriminative score of each subgraph and employ a branch-and-bound scheme to prune the search space. Using the explored subgraphs, our graph classification model simultaneously maximizes the margin between positive and negative graphs and minimizes the loss on the universum graphs. Subgraph exploration and model learning are integrated and performed iteratively, so that each benefits the other. Experimental results and comparisons on real-world datasets demonstrate the performance of our algorithm.
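The branch-and-bound idea mentioned above can be illustrated with a small sketch. It is not ugBoost's actual subgraph miner: as a hypothetical simplification, each graph is a set of labeled edges and a pattern is contained in a graph when it is a subset of that set (a real miner such as gSpan enumerates connected subgraphs and tests subgraph isomorphism). It also assumes a 0/1 subgraph indicator and one weight u_i per graph, for example α_i y_i for labeled graphs and p_j − β_j for universum graphs, mirroring the dual constraint in the Appendix. Because a super-pattern can only occur in graphs that already contain its parent, the sum of the positive weights in the parent's cover bounds the score of every descendant, so a branch whose bound cannot beat the best score found so far is pruned.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical simplification: a "graph" is a set of labeled edges, and a
# pattern g is contained in a graph G when g is a subset of G.
Pattern = FrozenSet[str]


def branch_and_bound(graphs: Sequence[FrozenSet[str]],
                     weights: Sequence[float],
                     min_support: int = 1) -> Tuple[Pattern, float]:
    """Find the non-empty pattern g maximizing sum_i weights[i] * [g in graphs[i]]."""
    items = sorted(set().union(*graphs))
    best_pattern: Pattern = frozenset()
    best_score = float("-inf")

    def search(pattern: Pattern, start: int, cover: List[int]) -> None:
        nonlocal best_pattern, best_score
        score = sum(weights[i] for i in cover)
        if pattern and score > best_score:
            best_pattern, best_score = pattern, score
        # Any super-pattern occurs only in graphs of the current cover, so its
        # score is at most the sum of the positive weights in that cover.
        bound = sum(max(weights[i], 0.0) for i in cover)
        if bound <= best_score:
            return  # no descendant can beat the incumbent: prune this branch
        for k in range(start, len(items)):
            child_cover = [i for i in cover if items[k] in graphs[i]]
            if len(child_cover) >= min_support:
                search(pattern | {items[k]}, k + 1, child_cover)

    search(frozenset(), 0, list(range(len(graphs))))
    return best_pattern, best_score


# Toy data: four "graphs" as edge-label sets and illustrative weights.
graphs = [frozenset({"C-C", "C-O"}), frozenset({"C-C", "C-N"}),
          frozenset({"C-N", "C-O"}), frozenset({"C-C", "C-N", "C-O"})]
weights = [1.0, 1.0, -1.0, -0.5]
print(branch_and_bound(graphs, weights))  # (frozenset({'C-C'}), 1.5)
```

The function name, the edge-label strings, and the weights are illustrative only; the point is that the bound depends solely on the current cover, so it can be evaluated before any child pattern is generated.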

Notes

  1. We use bold-faced letters (\(\varvec{w}\), \(\varvec{\xi }\)) to denote vectors, and normal letters such as \(w_i\) and \(\xi _i\) to denote scalar values.

  2. The derivation from Eq. (7) to Eq. (8) is given in the Appendix.

  3. We can obtain the dual solutions of Eq. (8) immediately after solving Eq. (7) using the CVX package, available from http://cvxr.com/cvx/.

  4. http://cvxr.com/cvx/.

  5. Available at http://www.epa.gov/ncct/dsstox/sdf_epafhm.html.

  6. http://arnetminer.org/citation.

Author information

Correspondence to Jia Wu.

Appendix

Lagrangian Dual of Eq. (7). The Lagrangian function of Eq. (7) can be written as:

$$\begin{aligned} L(\varvec{\xi },\varvec{\psi },\varvec{\eta }, \varvec{w})= & {} \Vert \mathbf {w}\Vert + C_l\sum \limits _{i=1}^{l} \xi _i + C_u\sum \limits _{j=l+1}^n{(\psi _j+\eta _j)}\nonumber \\&-\, \sum _{i=1}^{l}\alpha _i\{y_i \sum \limits _{k=1}^{m}w_k\cdot \hbar _{g_k}(G_i) + \xi _i -1\} \nonumber \\&+\, \sum _{j=l+1}^{n}\beta _j\{\sum \limits _{k=1}^{m}w_k \cdot \hbar _{g_k}(G_j)-\varepsilon - \psi _j\} \nonumber \\&-\, \sum _{j=l+1}^{n}p_j\{\sum \limits _{k=1}^{m}w_k \cdot \hbar _{g_k}(G_j)+\varepsilon + \eta _j\} \nonumber \\&-\,\varvec{r}^T\varvec{w}-\varvec{s}^T\varvec{\xi }-\varvec{q}^T\varvec{\psi }-\varvec{z}^T\varvec{\eta } \end{aligned}$$
(14)

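where the Lagrange multipliers satisfy \(\alpha _i \ge 0\) and \(s_i \ge 0\) for \(i=1,\dots ,l\); \(\beta _j \ge 0\), \(p_j \ge 0\), \(q_j \ge 0\) and \(z_j \ge 0\) for \(j=l+1,\dots ,n\); and \(r_k \ge 0\) for \(k=1,\dots ,m\).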
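Reading the primal constraints off the multiplier terms of Eq. (14), namely \(y_i f(G_i) + \xi _i \ge 1\), \(f(G_j) \le \varepsilon + \psi _j\) and \(f(G_j) \ge -\varepsilon - \eta _j\) with non-negative slacks and \(f(G)=\sum _k w_k \hbar _{g_k}(G)\), the optimal slacks for a fixed \(\varvec{w}\) are a hinge loss on labeled graphs and an \(\varepsilon \)-insensitive loss on universum graphs. The Python sketch below evaluates the primal objective under that reading. The function names and the dense 0/1 feature-matrix layout are assumptions for illustration, and \(\Vert \mathbf {w}\Vert \) is taken as the \(\ell _1\) norm of a non-negative \(\varvec{w}\).

```python
from typing import List, Sequence


def scores(w: Sequence[float], H: Sequence[Sequence[float]]) -> List[float]:
    """f(G_i) = sum_k w_k * h_{g_k}(G_i); row i of H holds G_i's subgraph features."""
    return [sum(wk * hik for wk, hik in zip(w, row)) for row in H]


def primal_objective(w: Sequence[float], H_lab: Sequence[Sequence[float]],
                     y: Sequence[int], H_uni: Sequence[Sequence[float]],
                     C_l: float, C_u: float, eps: float) -> float:
    """Primal objective at w, with every slack set to its smallest feasible value."""
    if any(wk < 0 for wk in w):
        raise ValueError("w must be non-negative")
    f_lab = scores(w, H_lab)
    f_uni = scores(w, H_uni)
    xi = [max(0.0, 1.0 - yi * fi) for yi, fi in zip(y, f_lab)]  # hinge loss, labeled graphs
    psi = [max(0.0, fj - eps) for fj in f_uni]   # universum score above +eps
    eta = [max(0.0, -fj - eps) for fj in f_uni]  # universum score below -eps
    # With w >= 0, the l1 norm ||w|| is simply sum(w).
    return sum(w) + C_l * sum(xi) + C_u * (sum(psi) + sum(eta))


w = [0.5, 0.25]
H_lab = [[1, 0], [0, 1], [1, 1]]  # 3 labeled graphs, 2 subgraph features
y = [1, -1, 1]
H_uni = [[1, 0], [0, 0]]          # 2 universum graphs
print(primal_objective(w, H_lab, y, H_uni, C_l=1.0, C_u=0.5, eps=0.1))  # ≈ 2.95
```

Since \(\psi _j + \eta _j = \max (0, |f(G_j)| - \varepsilon )\), universum graphs are only penalized when their score leaves the band \([-\varepsilon , \varepsilon ]\), which is what pushes them toward the decision boundary.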

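At the optimum, the first derivatives of the Lagrangian with respect to the primal variables (\(\varvec{\xi },\varvec{w}\), \(\varvec{\psi }\) and \(\varvec{\eta }\)) must vanish. Here \(\Vert \mathbf {w}\Vert \) is the \(\ell _1\) norm and \(\varvec{w}\ge 0\) (the constraint associated with \(\varvec{r}\)), so \(\Vert \mathbf {w}\Vert =\sum _{k=1}^{m} w_k\) and its derivative with respect to each \(w_k\) is 1: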

$$\begin{aligned} \frac{\partial L}{\partial \xi _i}= & {} C_l - \alpha _i -s_i = 0 ~~\Rightarrow 0 \le \alpha _i \le C_l\\ \frac{\partial L}{\partial \psi _j}= & {} C_u - \beta _j -q_j = 0 ~~\Rightarrow 0 \le \beta _j \le C_u\\ \frac{\partial L}{\partial \eta _j}= & {} C_u - p_j -z_j = 0 ~~\Rightarrow 0 \le p_j \le C_u\\ \frac{\partial L}{\partial w_k}= & {} 1 - \sum \limits _{i=1}^l\alpha _i y_i \hbar _{g_k}(G_i) + \sum \limits _{j=l+1}^n \beta _j \hbar _{g_k}(G_j) \\&-\, \sum \limits _{j=l+1}^n p_j \hbar _{g_k}(G_j) -r_k = 0 \\\Rightarrow & {} \sum \limits _{i=1}^l\alpha _i y_i \hbar _{g_k}(G_i) + \sum \limits _{j=l+1}^n (p_j-\beta _j) \hbar _{g_k}(G_j) \le 1, \quad \forall k \end{aligned}$$
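The last condition is one constraint per candidate subgraph \(g_k\). In LPBoost-style column generation, on which gBoost-type graph boosting builds, a subgraph whose constraint is violated under the current multipliers is a candidate to add to the working feature set, and the subgraph search and the linear program are alternated until no constraint is violated. The sketch below is a hypothetical helper over an explicit 0/1 feature matrix (rather than a pattern search): it checks the box constraints and returns the most violated subgraph index, if any. The names and data layout are assumptions, not the paper's implementation.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def dual_gains(alpha: Sequence[float], y: Sequence[int], H_lab: Matrix,
               beta: Sequence[float], p: Sequence[float], H_uni: Matrix) -> List[float]:
    """Left-hand side of the per-subgraph dual constraint, for every feature k."""
    m = len(H_lab[0]) if H_lab else len(H_uni[0])
    gains = []
    for k in range(m):
        labeled = sum(a * yi * row[k] for a, yi, row in zip(alpha, y, H_lab))
        universum = sum((pj - bj) * row[k] for pj, bj, row in zip(p, beta, H_uni))
        gains.append(labeled + universum)
    return gains


def most_violated_subgraph(alpha: Sequence[float], y: Sequence[int], H_lab: Matrix,
                           beta: Sequence[float], p: Sequence[float], H_uni: Matrix,
                           C_l: float, C_u: float,
                           tol: float = 1e-9) -> Tuple[Optional[int], float]:
    """Return (k, gain) of the most violated constraint, or (None, max gain)."""
    in_box = (all(0 <= a <= C_l for a in alpha)
              and all(0 <= b <= C_u for b in beta)
              and all(0 <= pj <= C_u for pj in p))
    if not in_box:
        raise ValueError("multipliers violate the box constraints")
    gains = dual_gains(alpha, y, H_lab, beta, p, H_uni)
    k = max(range(len(gains)), key=gains.__getitem__)
    return (k, gains[k]) if gains[k] > 1.0 + tol else (None, gains[k])


y = [1, -1, 1]
H_lab = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]  # 3 labeled graphs x 3 subgraphs
H_uni = [[1, 0, 0], [0, 1, 1]]             # 2 universum graphs x 3 subgraphs
beta, p = [0.2, 0.0], [0.0, 0.3]
print(most_violated_subgraph([0.9, 0.4, 0.5], y, H_lab, beta, p, H_uni, 1.0, 0.5))
print(most_violated_subgraph([0.6, 0.4, 0.5], y, H_lab, beta, p, H_uni, 1.0, 0.5))
```

With the first set of multipliers, subgraph 0 has a gain of about 1.2 and would be added; with the second, every gain stays below 1, so the current feature set already satisfies the dual constraints.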

Substituting the stationarity conditions above into Eq. (14) eliminates the primal variables and yields the dual problem in Eq. (8).
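Carrying out the substitution explicitly (a reconstruction from Eq. (14) alone, so the notation may differ from Eq. (8)): every term that multiplies a primal variable vanishes, leaving only \(\sum _i \alpha _i\) from the \(-\alpha _i\{\cdots -1\}\) terms and \(-\varepsilon \sum _j(\beta _j + p_j)\) from the two universum terms, which gives

$$\begin{aligned} \max _{\varvec{\alpha },\varvec{\beta },\varvec{p}}\quad & \sum _{i=1}^{l}\alpha _i - \varepsilon \sum _{j=l+1}^{n}(\beta _j + p_j)\\ \text {s.t.}\quad & \sum _{i=1}^{l}\alpha _i y_i \hbar _{g_k}(G_i) + \sum _{j=l+1}^{n}(p_j-\beta _j)\hbar _{g_k}(G_j) \le 1, \quad k=1,\dots ,m,\\ & 0 \le \alpha _i \le C_l, \quad 0 \le \beta _j \le C_u, \quad 0 \le p_j \le C_u. \end{aligned}$$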

About this article

Cite this article

Pan, S., Wu, J., Zhu, X. et al. Boosting for graph classification with universum. Knowl Inf Syst 50, 53–77 (2017). https://doi.org/10.1007/s10115-016-0934-z

  • DOI: https://doi.org/10.1007/s10115-016-0934-z
