A robust one-class transfer learning method with uncertain data

  • Regular Paper
  • Published in Knowledge and Information Systems

Abstract

One-class classification aims to construct a distinctive classifier from examples of a single class. Most existing one-class classification methods rest on two assumptions: (1) a large number of training examples are available for learning the classifier; (2) the training examples are collected explicitly and therefore contain no uncertain information. In real-world applications, however, these assumptions are not always satisfied. In this paper, we propose a novel approach called uncertain one-class transfer learning with support vector machine (UOCT-SVM), which constructs an accurate classifier on the target task by transferring knowledge from multiple source tasks whose data may contain uncertain information. In UOCT-SVM, an optimization function is formulated, based on one-class SVM, to handle uncertain data and transfer learning, and an iterative framework is proposed to solve it. Extensive experiments show that, compared with state-of-the-art one-class classification methods, UOCT-SVM mitigates the effect of uncertain data on the decision boundary and transfers knowledge from the source tasks to help build an accurate classifier on the target task.


Notes

  1. In the experiments, we initialize \(\Delta \overline{\mathbf{x }}_{1i}=0\) and \(\Delta \overline{\mathbf{x }}_{2j}=0\).

  2. Available at http://www.daviddlewis.com/resources/testcollections/.

  3. Available at http://people.csail.mit.edu/jrennie/20Newsgroups/.

  4. Available at http://archive.ics.uci.edu/ml/datasets/Mushroom.

  5. Available at http://archive.ics.uci.edu/ml/datasets/ISOLET.

  6. Available at http://dis.ijs.si/confidence/dataset.html.

References

  1. Schölkopf B, Williamson RC, Smola A, Shawe-Taylor J (1999) Support vector method for novelty detection. In: Proceedings of neural information processing systems 1999, pp 582–588

  2. Manevitz LM, Yousef M (2002) One-class SVMs for document classification. J Mach Learn Res 2:139–154

  3. Ma J, Perkins S (2003) Time-series novelty detection using one-class support vector machines. In: Proceedings of international joint conference on neural networks 2003, pp 1741–1745

  4. Li J, Su L, Cheng C (2011) Finding pre-images via evolution strategies. Appl Soft Comput 11(6):4183–4194

  5. Takruri M, Rajasegarar S, Challa S, Leckie C, Palaniswami M (2011) Spatio-temporal modelling-based drift-aware wireless sensor networks. Wirel Sens Syst 1(2):110–122

  6. Muñoz-Marí J, Bovolo F, Gómez-Chova L, Bruzzone L, Camps-Valls G (2010) Semisupervised one-class support vector machines for classification of remote sensing data. IEEE Trans Geosci Remote Sens 48(8):3188–3197

  7. Yu H, Han J, Chang KCC (2004) PEBL: web page classification without negative examples. IEEE Trans Knowl Data Eng 16(1):70–81

  8. Fung GPC, Yu JX, Lu H, Yu PS (2006) Text classification without negative examples revisit. IEEE Trans Knowl Data Eng 18:6–20

  9. Liu B, Xiao Y, Cao L, Yu PS (2011) One-class-based uncertain data stream learning. In: Proceedings of SIAM international conference on data mining 2011, pp 992–1003

  10. Pan SJ, Tsang IW, Kwok JT, Yang Q (2011) Domain adaptation via transfer component analysis. IEEE Trans Neural Netw 22(2):199–210

  11. Aggarwal CC, Yu PS (2009) A survey of uncertain data algorithms and applications. IEEE Trans Knowl Data Eng 21(5):609–623

  12. Kriegel HP, Pfeifle M (2005) Hierarchical density based clustering of uncertain data. In: Proceedings of international conference on data engineering 2005, pp 689–692

  13. Ngai W, Kao B, Chui C, Cheng R, Chau M, Yip KY (2006) Efficient clustering of uncertain data. In: Proceedings of international conference on data mining 2006, pp 436–445

  14. Aggarwal CC (2007) On density based transforms for uncertain data mining. In: Proceedings of international conference on data engineering 2007, pp 866–875

  15. Bi J, Zhang T (2004) Support vector classification with input data uncertainty. In: Proceedings of neural information processing systems, 2004

  16. Gao C, Wang J (2010) Direct mining of discriminative patterns for classifying uncertain data. In: Proceedings of ACM SIGKDD conference on knowledge discovery and data mining 2010, pp 861–870

  17. Tsang S, Kao B, Yip KY, Ho WS, Lee SD (2011) Decision trees for uncertain data. IEEE Trans Knowl Data Eng 23(1):64–78

  18. Murthy R, Ikeda R, Widom J (2011) Making aggregation work in uncertain and probabilistic databases. IEEE Trans Knowl Data Eng 22(8):1261–1273

  19. Yuen SM, Tao Y, Xiao X, Pei J, Zhang D (2010) Superseding nearest neighbor search on uncertain spatial databases. IEEE Trans Knowl Data Eng 22(7):1041–1055

  20. Sun L, Cheng R, Cheung DW, Cheng J (2010) Mining uncertain data with probabilistic guarantees. In: Proceedings of the ACM SIGKDD conference on knowledge discovery and data mining 2010, pp 273–282

  21. Dai W, Xue G, Yang Q, Yu Y (2007) Transferring naive Bayes classifiers for text classification. In: Proceedings of the AAAI conference on artificial intelligence 2007, pp 540–545

  22. Jiang J, Zhai C (2007) Instance weighting for domain adaptation in NLP. In: Proceedings of the association for computational linguistics 2007, pp 264–271

  23. Liao X, Xue Y, Carin L (2005) Logistic regression with an auxiliary data source. In: Proceedings of the international conference on machine learning 2005, pp 505–512

  24. Huang J, Smola A, Gretton A, Borgwardt KM, Schölkopf B (2007) Correcting sample selection bias by unlabeled data. In: Proceedings of the neural information processing systems 2007, pp 601–608

  25. Zheng VW, Yang Q, Xiang W, Shen D (2008) Transferring localization models over time. In: Proceedings of the AAAI conference on artificial intelligence 2008, pp 1421–1426

  26. Pan SJ, Shen D, Yang Q, Kwok JT (2008) Transferring localization models across space. In: Proceedings of the AAAI conference on artificial Intelligence 2008, pp 1383–1388

  27. Raykar VC, Krishnapuram B, Bi J, Dundar M, Rao RB (2008) Bayesian multiple instance learning: automatic feature selection and inductive transfer. In: Proceedings of the international conference on machine learning 2008, pp 808–815

  28. Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359

  29. Dai W, Yang Q, Xue G, Yu Y (2007) Boosting for transfer learning. In: Proceedings of the international conference on machine learning 2007, pp 193–200

  30. Raina R, Battle A, Lee H, Packer B, Ng AY (2007) Self-taught learning: transfer learning from unlabeled data. In: Proceedings of the international conference on machine learning 2007, pp 759–766

  31. Dai W, Xue G, Yang Q, Yu Y (2007) Co-clustering based classification for out-of-domain documents. In: Proceedings of the ACM SIGKDD conference on knowledge discovery and data mining 2007, pp 432–444

  32. Ando RK, Zhang T (2005) A high-performance semi-supervised learning method for text chunking. In: Proceedings of the association for computational linguistics 2005, pp 1–9

  33. Lawrence ND, Platt JC (2004) Learning to learn with the informative vector machine. In: Proceedings of the international conference on machine learning 2004, pp 432–444

  34. Schwaighofer A, Tresp V, Yu K (2005) Learning Gaussian process kernels via hierarchical Bayes. In: Proceedings of the neural information processing systems 2005, pp 1209–1216

  35. Gao J, Fan W, Jiang J, Han J (2008) Knowledge transfer via multiple model local structure mapping. In: Proceedings of the ACM SIGKDD conference on knowledge discovery and data mining 2008, pp 283–291

  36. Mihalkova L, Huynh T, Mooney RJ (2007) Mapping and revising Markov logic networks for transfer learning. In: Proceedings of the AAAI conference on artificial intelligence 2007, pp 608–614

  37. Mihalkova L, Mooney RJ (2008) Transfer learning by mapping with minimal target data. In: Proceedings of workshop transfer learning for complex tasks with AAAI, 2008

  38. Davis J, Domingos P (2008) Deep transfer via second-order Markov logic. In: Proceedings of workshop transfer learning for complex tasks with AAAI, 2008

  39. Bonilla EV, Agakov F, Williams C (2007) Kernel multi-task learning using task-specific features. In: Proceedings of the international conference on artificial intelligence and statistics 2007, pp 43–50

  40. Yu K, Tresp V, Schwaighofer A (2005) Learning Gaussian processes from multiple tasks. In: Proceedings of the international conference on machine learning 2005, pp 1012–1019

  41. Bakker B, Heskes T (2003) Task clustering and gating for Bayesian multitask learning. J Mach Learn Res 4:83–99

  42. Huffel SV, Vandewalle J (1991) The total least squares problem: computational aspects and analysis. Frontiers in applied mathematics. SIAM Press, Philadelphia

  43. Vapnik V (1998) Statistical learning theory. Wiley, New York

  44. Wang F, Zhao B, Zhang CS (2010) Linear time maximum margin clustering. IEEE Trans Neural Netw 21(2):319–332

  45. Chen J, Liu X (2014) Transfer learning with one-class data. Pattern Recognit Lett 37(1):32–40

  46. Schölkopf B, Herbrich R, Smola AJ, Williamson RC (2001) A generalized representer theorem. In: Proceedings of the annual conference on learning theory 2001, pp 416–426

  47. Chang C-C, Lin C-J (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):1–27

  48. William J, Shaw M (1986) On the foundation of evaluation. J Am Soc Inf Sci 37(5):346–348

  49. Tax DMJ, Duin RPW (2004) Support vector data description. Mach Learn 54(1):45–66

  50. Cao B, Pan J, Zhang Y, Yeung DY, Yang Q (2010) Adaptive transfer learning. In: Proceedings of the AAAI conference on artificial intelligence, 2010

  51. Aggarwal CC, Yu PS (2008) A framework for clustering uncertain data streams. In: Proceedings of the international conference on data engineering 2008, pp 150–159

  52. Cole R, Fanty MA (1990) Spoken letter recognition. In: Proceedings of the workshop on speech and natural language 1990, pp 385–390

  53. Yin J, Yang Q, Pan JJ (2008) Sensor-based abnormal human-activity detection. IEEE Trans Knowl Data Eng 20(8):1082–1090

  54. Tsang IW, Kwok JT, Cheung PM (2005) Core vector machines: fast SVM training on very large data sets. J Mach Learn Res 6:363–392

  55. Dong JX, Devroye L, Suen CY (2005) Fast SVM training algorithm with decomposition on very large data sets. IEEE Trans Pattern Anal Mach Intell 27(4):603–618

  56. Tresp V (2000) A Bayesian committee machine. Neural Comput 12(11):2719–2741

  57. Shalev-Shwartz S, Singer Y, Srebro N (2007) Pegasos: primal estimated sub-gradient solver for SVM. In: Proceedings of the international conference on machine learning 2007, pp 807–814

  58. Kivinen J, Smola AJ, Williamson RC (2004) Online learning with kernels. IEEE Trans Signal Process 52(8):1–12

  59. Dragomir SS (2003) A survey on Cauchy–Bunyakovsky–Schwarz type discrete inequalities. J Inequal Pure Appl Math 4(3):1–142

Acknowledgments

This work is supported by the Natural Science Foundation of China (61070033, 61203280, 61202270), Guangdong Natural Science Funds for Distinguished Young Scholar (S2013050014133), Natural Science Foundation of Guangdong Province (9251009001000005, S2012040007078), Specialized Research Fund for the Doctoral Program of Higher Education (20124420120004), Science and Technology Plan Project of Guangzhou City (12C42111607, 201200000031, 2012J5100054), Science and Technology Plan Project of Panyu District, Guangzhou (2012-Z-03-67), Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, GDUT Overseas Outstanding Doctoral Fund (405120095), US NSF through Grants IIS-0905215, CNS-1115234, IIS-0914934, DBI-0960443, and OISE-1129076, US Department of Army through Grant W911NF-12-1-0066, Google Mobile 2014 Program and KAU grant.

Author information

Correspondence to Bo Liu.

Appendix

1.1 Proof for Theorem 1

Let \(\alpha _{1i} \ge 0,\alpha _{2j} \ge 0,\beta _{1i} \ge 0\) and \(\beta _{2j} \ge 0\) be Lagrange multipliers. The Lagrangian of problem (8) is

$$\begin{aligned} L&= ||\mathbf w _0||^2 + C_1 ||\mathbf v _1||^2 + C_2 ||\mathbf v _2||^2 - \rho _1 - \rho _2 + C_1 \sum _{i=1}^{|S_1|} \xi _{1i} + C_2 \sum _{j=1}^{|S_2|} \xi _{2j} \nonumber \\&\quad +\,\sum _{i=1}^{|S_1|} \alpha _{1i} [ \rho _1 -\xi _{1i} - (\mathbf w _0+\mathbf v _1)^T \overline{\mathbf{x }}_{1i} ] +\sum _{j=1}^{|S_2|} \alpha _{2j} [ \rho _2 -\xi _{2j} - (\mathbf w _0+\mathbf v _2)^T \overline{\mathbf{x }}_{2j} ] \nonumber \\&\quad -\,\sum _{i=1}^{|S_1|} \beta _{1i} \xi _{1i} - \sum _{j=1}^{|S_2|} \beta _{2j} \xi _{2j} \end{aligned}$$
(37)

where \(\overline{\mathbf{x }}_{1i} = \mathbf x _{1i} + \Delta \overline{\mathbf{x }}_{1i}\) and \(\overline{\mathbf{x }}_{2j} = \mathbf x _{2j} + \Delta \overline{\mathbf{x }}_{2j}\).

Setting the derivatives of the Lagrangian (37) with respect to \(\mathbf w _0,\mathbf v _1,\mathbf v _2,\rho _1,\rho _2,\xi _{1i}\) and \(\xi _{2j}\) to zero yields the following equations.

$$\begin{aligned} \frac{\partial L}{\partial \mathbf w _0}&= 2\mathbf w _0 - \sum _{i=1}^{|S_1|} \alpha _{1i} \overline{\mathbf{x }}_{1i} - \sum _{j=1}^{|S_2|} \alpha _{2j} \overline{\mathbf{x }}_{2j}=0,\end{aligned}$$
(38)
$$\begin{aligned} \frac{\partial L}{\partial \mathbf v _1}&= 2 C_1 \mathbf v _1 - \sum _{i=1}^{|S_1|} \alpha _{1i} \overline{\mathbf{x }}_{1i}=0, \end{aligned}$$
(39)
$$\begin{aligned} \frac{\partial L}{\partial \mathbf v _2}&= 2 C_2 \mathbf v _2- \sum _{j=1}^{|S_2|} \alpha _{2j} \overline{\mathbf{x }}_{2j}=0,\end{aligned}$$
(40)
$$\begin{aligned} \frac{\partial L}{\partial \rho _1}&= -1+\sum _{i=1}^{|S_1|} \alpha _{1i}=0, \end{aligned}$$
(41)
$$\begin{aligned} \frac{\partial L}{\partial \rho _2}&= -1+\sum _{j=1}^{|S_2|} \alpha _{2j}=0, \end{aligned}$$
(42)
$$\begin{aligned} \frac{\partial L}{\partial \xi _{1i}}&= C_1 - \alpha _{1i} - \beta _{1i} =0, \ \ i=1, \ldots , |S_1| \end{aligned}$$
(43)
$$\begin{aligned} \frac{\partial L}{\partial \xi _{2j}}&= C_2 - \alpha _{2j} - \beta _{2j} =0, \ \ j=1, \ldots , |S_2| \end{aligned}$$
(44)

From Eqs. (38)–(44), it is easy to deduce that

$$\begin{aligned}&\mathbf w _0 = \frac{1}{2} \left( \sum _{i=1}^{|S_1|}\alpha _{1i} \overline{\mathbf{x }}_{1i}+ \sum _{j=1}^{|S_2|} \alpha _{2j} \overline{\mathbf{x }}_{2j}\right) , \end{aligned}$$
(45)
$$\begin{aligned}&\mathbf v _1 = \frac{1}{ 2 C_1}\sum _{i=1}^{|S_1|} \alpha _{1i}\overline{\mathbf{x }}_{1i}, \end{aligned}$$
(46)
$$\begin{aligned}&\mathbf v _2 = \frac{1}{ 2 C_2}\sum _{j=1}^{|S_2|} \alpha _{2j}\overline{\mathbf{x }}_{2j}, \end{aligned}$$
(47)
$$\begin{aligned}&\sum _{i=1}^{|S_1|} \alpha _{1i} = 1, \end{aligned}$$
(48)
$$\begin{aligned}&\sum _{j=1}^{|S_2|} \alpha _{2j} = 1,\end{aligned}$$
(49)
$$\begin{aligned}&C_1 = \alpha _{1i} + \beta _{1i}, \ \ i=1, \ldots , |S_1|\end{aligned}$$
(50)
$$\begin{aligned}&C_2 =\alpha _{2j} + \beta _{2j}, \ \ \ j=1, \ldots , |S_2| \end{aligned}$$
(51)

Since \(\beta _{1i} \ge 0\) and \(\beta _{2j} \ge 0\), (50) and (51) imply

$$\begin{aligned}&0 \le \alpha _{1i} \le C_1, \ \ i=1, \ldots , |S_1| \end{aligned}$$
(52)
$$\begin{aligned}&0 \le \alpha _{2j} \le C_2, \ \ j=1, \ldots , |S_2| \end{aligned}$$
(53)

By substituting (45)–(47) into the Lagrangian (37) and imposing the constraints (48)–(53), the dual form of problem (8) can be written as

$$\begin{aligned} \max&-\frac{1}{2} \sum _{i=1}^{|S_1|} \sum _{j=1}^{|S_2|} \alpha _{1i} \overline{\mathbf{x }}_{1i}^T \overline{\mathbf{x }}_{2j} \alpha _{2j} - \frac{C_1+1}{4C_1} \sum _{h=1}^{|S_1|} \sum _{g=1}^{|S_1|} \alpha _{1h} \overline{\mathbf{x }}_{1h}^T \overline{\mathbf{x }}_{1g} \alpha _{1g} \\&- \frac{C_2+1}{4C_2} \sum _{p=1}^{|S_2|} \sum _{k=1}^{|S_2|} \alpha _{2p} \overline{\mathbf{x }}_{2p}^T \overline{\mathbf{x }}_{2k} \alpha _{2k}\\ \mathrm{s.t.}&\sum _{i=1}^{|S_1|} \alpha _{1i} =1, \ \ \ 0 \le \alpha _{1i} \le C_1, \ \ i=1, \ldots , |S_1| \\&\sum _{j=1}^{|S_2|} \alpha _{2j} =1, \ \ \ 0 \le \alpha _{2j} \le C_2, \ \ j=1, \ldots , |S_2| \end{aligned}$$

\(\square \)

1.2 Proof for Theorem 2

In Theorem 2, we fix \(\mathbf w _0,\mathbf v _1,\mathbf v _2,\rho _1\) and \(\rho _2\) to be \(\overline{\mathbf{w }}_0,\overline{\mathbf{v }}_1,\overline{\mathbf{v }}_2,\overline{\rho }_1\) and \(\overline{\rho }_2\), respectively, and attempt to minimize the value of the objective function (7) by optimizing \(\Delta \mathbf x _{1i}\) and \(\Delta \mathbf x _{2j}\). From (7), the objective function’s value is determined by \(\sum _{t=1}^{2} \sum _{i=1}^{|S_t|} \xi _{ti}\) since \(\mathbf w _0,\mathbf v _1,\mathbf v _2,\rho _1\) and \(\rho _2\) are fixed. Hence, we need to optimize \(\Delta \mathbf x _{1i}\) and \(\Delta \mathbf x _{2j}\) to minimize \(\sum _{t=1}^{2} \sum _{ i=1}^{|S_t|} \xi _{ti}\).

Each training example \(\mathbf x _{ti}\) (\(i=1, \ldots , |S_t|, t=1, 2\)) is associated with an error term \(\xi _{ti}\) and the minimization of \(\sum _{t=1}^{2} \sum _{ i=1}^{|S_t|} \xi _{ti}\) can be decomposed into subproblems of minimizing each error term \(\xi _{ti}\):

$$\begin{aligned} \xi _{ti}&= \max \left\{ 0, \rho _t - (\overline{\mathbf{w }}_0 + \overline{\mathbf{v }}_t)^T (\mathbf x _{ti} + \Delta \mathbf x _{ti}) \right\} \nonumber \\&= \max \left\{ 0, \rho _t - (\overline{\mathbf{w }}_0 + \overline{\mathbf{v }}_t)^T\mathbf x _{ti} - (\overline{\mathbf{w }}_0 + \overline{\mathbf{v }}_t)^T \Delta \mathbf x _{ti} \right\} \end{aligned}$$
(54)

From Eq. (54), \(\xi _{ti}\) is minimized by maximizing \((\overline{\mathbf{w }}_0 + \overline{\mathbf{v }}_t)^T \Delta \mathbf x _{ti}\). By the Cauchy–Schwarz inequality [59],

$$\begin{aligned} - ||\overline{\mathbf{w }}_0 + \overline{\mathbf{v }}_t|| \cdot ||\Delta \mathbf x _{ti}|| \le (\overline{\mathbf{w }}_0 + \overline{\mathbf{v }}_t)^T \Delta \mathbf x _{ti} \le ||\overline{\mathbf{w }}_0 + \overline{\mathbf{v }}_t|| \cdot ||\Delta \mathbf x _{ti}|| \end{aligned}$$
(55)

The upper bound in (55) is attained if and only if \(\Delta \mathbf x _{ti} = c (\overline{\mathbf{w }}_0 + \overline{\mathbf{v }}_t)\) for some constant \(c \ge 0\). Since \(||\Delta \mathbf x _{ti}||\) is bounded by \(\delta _{ti}\), the optimal value of \(\Delta \mathbf x _{ti}\) is

$$\begin{aligned} \Delta \mathbf x _{ti}= \delta _{ti} \frac{\overline{\mathbf{w }}_0 + \overline{\mathbf{v }}_t}{||\overline{\mathbf{w }}_0 + \overline{\mathbf{v }}_t||}, \ \ i=1, \ldots , |S_t|, \ \ t=1, 2. \end{aligned}$$
(56)

\(\square \)

1.3 Proof for Theorem 3

We fix \(\overline{\mathbf{w }}_0,\overline{\mathbf{v }}_1,\overline{\mathbf{v }}_2,\overline{\rho }_1\) and \(\overline{\rho }_2\), and focus on minimizing each \(\xi =\max \{ 0, \overline{\rho }_t - \frac{1}{2} \sum _{h=1}^{2} \sum _{j=1}^{|S_h|} \alpha _{hj} K(\mathbf x _{hj}+\Delta \overline{\mathbf{x }}_{hj}, \mathbf x + \Delta \mathbf x ) - \frac{1}{2 C_t}\sum _{j=1}^{|S_t|} \alpha _{tj} K(\mathbf x _{tj}+\Delta \overline{\mathbf{x }}_{tj}, \mathbf x + \Delta \mathbf x ) \}\) (\(\mathbf x \in S_t, t=1, 2\)) over \(\Delta \mathbf x \). According to the first-order Taylor expansion of \(K(\cdot )\) in Eq. (21), it is easy to deduce

$$\begin{aligned}&\frac{1}{2} \sum _{h=1}^{2} \sum _{j=1}^{|S_h|} \alpha _{hj} K(\mathbf x _{hj}+\Delta \overline{\mathbf{x }}_{hj}, \mathbf x + \Delta \mathbf x ) + \frac{1}{2 C_t}\sum _{j=1}^{|S_t|} \alpha _{tj} K(\mathbf x _{tj}+\Delta \overline{\mathbf{x }}_{tj}, \mathbf x + \Delta \mathbf x ) \nonumber \\&\quad = \frac{1}{2} \sum _{h=1}^{2} \sum _{j=1}^{|S_h|} \alpha _{hj} K(\mathbf x _{hj}+\Delta \overline{\mathbf{x }}_{hj}, \mathbf x ) + \Delta \mathbf x ^T \frac{1}{2} \sum _{h=1}^{2} \sum _{j=1}^{|S_h|} \alpha _{hj} K'(\mathbf x _{hj}+\Delta \overline{\mathbf{x }}_{hj}, \mathbf x ) \nonumber \\&\qquad + \frac{1}{2 C_t}\sum _{j=1}^{|S_t|} \alpha _{tj} K(\mathbf x _{tj}+\Delta \overline{\mathbf{x }}_{tj}, \mathbf x ) + \Delta \mathbf x ^T \frac{1}{2 C_t}\sum _{j=1}^{|S_t|} \alpha _{tj} K'(\mathbf x _{tj}+\Delta \overline{\mathbf{x }}_{tj}, \mathbf x ) \nonumber \\&\quad = \frac{1}{2} \sum _{h=1}^{2} \sum _{j=1}^{|S_h|} \alpha _{hj} K(\mathbf x _{hj}+\Delta \overline{\mathbf{x }}_{hj}, \mathbf x ) + \frac{1}{2 C_t}\sum _{j=1}^{|S_t|} \alpha _{tj} K(\mathbf x _{tj}+\Delta \overline{\mathbf{x }}_{tj}, \mathbf x ) \nonumber \\&\qquad + \Delta \mathbf x ^T \left[ \frac{1}{2} \sum _{h=1}^{2} \sum _{j=1}^{|S_h|} \alpha _{hj} K'(\mathbf x _{hj}+\Delta \overline{\mathbf{x }}_{hj}, \mathbf x ) + \frac{1}{2 C_t}\sum _{j=1}^{|S_t|} \alpha _{tj} K'(\mathbf x _{tj}+\Delta \overline{\mathbf{x }}_{tj}, \mathbf x ) \right] \end{aligned}$$
(57)

As in the proof of Theorem 2, applying the Cauchy–Schwarz inequality gives the optimal value of \(\Delta \mathbf x _{ti}\):

$$\begin{aligned} \Delta \mathbf{x}_{ti}= \delta _{ti } \frac{\mathbf{u}_{ti}}{||\mathbf{u}_{ti}||}, \ \ t=1, 2 \end{aligned}$$

where

$$\begin{aligned} \mathbf u _{ti}= \frac{1}{2} \sum _{h=1}^{2} \sum _{j=1}^{|S_h|} \alpha _{hj} K'(\mathbf x _{hj}+\Delta \overline{\mathbf{x }}_{hj}, \mathbf x ) + \frac{1}{2 C_t} \sum _{j=1}^{|S_t|} \alpha _{tj} K'(\mathbf x _{tj}+\Delta \overline{\mathbf{x }}_{tj}, \mathbf x ). \end{aligned}$$

\(\square \)

1.4 Proof for Theorem 4

Let \(\alpha _{ti}\ge 0\) and \(\beta _{ti} \ge 0\) be Lagrange multipliers. The Lagrangian of problem (29) is

$$\begin{aligned} L&= ||\mathbf w ||^2 - \boldsymbol{\rho }^T \mathbf e + \sum _{t=1}^{K} C_t \sum _{i=1}^{|S_t|} \xi _{ti} \nonumber \\&\quad + \sum _{t=1}^{K} \sum _{i=1}^{|S_t|} \alpha _{ti} (\boldsymbol{\rho }^T \mathbf e _t - \xi _{ti} - \mathbf w ^T \overline{\mathbf{z }}(\mathbf x _{ti},t)) - \sum _{t=1}^{K} \sum _{i=1}^{|S_t|} \beta _{ti} \xi _{ti} \end{aligned}$$
(58)

Setting the derivatives of the Lagrangian (58) with respect to \(\mathbf w ,\boldsymbol{\rho }\) and \(\xi _{ti}\) to zero leads to

$$\begin{aligned}&\frac{\partial L}{\partial \mathbf w }= 2 \mathbf w - \sum _{t=1}^{K} \sum _{i=1}^{|S_t|} \alpha _{ti} \overline{\mathbf{z }}(\mathbf x _{ti},t) =0, \end{aligned}$$
(59)
$$\begin{aligned}&\frac{\partial L}{\partial \boldsymbol{\rho }}= - \mathbf e + \sum _{t=1}^{K} \sum _{i=1}^{|S_t|} \alpha _{ti} \mathbf e _t =0, \end{aligned}$$
(60)
$$\begin{aligned}&\frac{\partial L}{\partial \xi _{ti}}= C_t - \alpha _{ti} - \beta _{ti} =0. \end{aligned}$$
(61)

According to Eqs. (59)–(61), we can obtain

$$\begin{aligned}&\mathbf w = \frac{1}{2} \sum _{t=1}^{K} \sum _{i=1}^{|S_t|} \alpha _{ti} \overline{\mathbf{z }}(\mathbf x _{ti},t), \end{aligned}$$
(62)
$$\begin{aligned}&\sum _{t=1}^{K} \sum _{i=1}^{|S_t|} \alpha _{ti} \mathbf e _t =\mathbf e , \end{aligned}$$
(63)
$$\begin{aligned}&0 \le \alpha _{ti} \le C_{t}. \end{aligned}$$
(64)

By substituting (62)–(64) into problem (29), the dual form can be given as

$$\begin{aligned} \max&- \frac{1}{4} \sum _{t=1}^{K} \sum _{h=1}^{K} \sum _{i=1}^{|S_t|} \sum _{j=1}^{|S_h|} \alpha _{ti} \overline{\mathbf{z }}(\mathbf x _{ti},t)^T \overline{\mathbf{z }}(\mathbf x _{hj},h) \alpha _{hj}, \nonumber \\ \mathrm{s.t.}&\sum _{t=1}^{K} \sum _{i=1}^{|S_t|} \alpha _{ti} \mathbf e _t =\mathbf e , \nonumber \\&0 \le \alpha _{ti} \le C_{t}, \ \ i=1, \ldots , |S_t|, \ \ t=1, \ldots , K. \end{aligned}$$
(65)

\(\square \)

1.5 Proof for Theorem 6

We fix \(\mathbf w ^{\phi }\) and \(\boldsymbol{\rho }\) to be \(\overline{\mathbf{w }}^{\phi }\) and \(\overline{\boldsymbol{\rho }}\), respectively, and minimize each \(\xi _{hj} =\max \{ 0, \overline{\boldsymbol{\rho }}^T \mathbf e _h - (\mathbf w ^{\phi })^T \phi (\overline{\mathbf{z }}(\mathbf x _{hj}, h)) \}\) (\(\mathbf x _{hj} \in S_h, h=1, \ldots , K\)) over \(\Delta \mathbf x _{hj}\). Since \(\overline{\boldsymbol{\rho }}^T \mathbf e _h\) is known, we minimize \(\xi _{hj} \) by maximizing \((\mathbf w ^{\phi })^T \phi (\overline{\mathbf{z }}(\mathbf x _{hj}, h))\). Replacing \(\overline{\mathbf{z }}(\mathbf x _{hj}, h)\) with \(\phi (\overline{\mathbf{z }}(\mathbf x _{hj}, h))\) in Eq. (31) leads to

$$\begin{aligned} \mathbf w ^{\phi } = \frac{1}{2} \sum _{t=1}^{K} \sum _{i=1}^{|S_t|} \alpha _{ti} \phi (\overline{\mathbf{z }}(\mathbf x _{ti},t)) \end{aligned}$$
(66)

By employing the first-order Taylor expansion of \(K(\cdot )\) in Eq. (21) and substituting Eq. (66) into \((\mathbf w ^{\phi })^T \phi (\overline{\mathbf{z }}(\mathbf x _{hj}, h))\), we obtain

$$\begin{aligned}&(\mathbf w ^{\phi })^T \phi (\overline{\mathbf{z }}(\mathbf x _{hj}, h)) \nonumber \\&\quad = \frac{1}{2} \sum _{t=1}^{K} \sum _{i=1}^{|S_t|} \alpha _{ti} \phi (\overline{\mathbf{z }}(\mathbf x _{ti},t))^T \phi (\overline{\mathbf{z }}(\mathbf x _{hj}, h)) \nonumber \\&\quad = \frac{1}{2} \sum _{t=1}^{K} \sum _{i=1}^{|S_t|} \alpha _{ti} K(\mathbf z (\mathbf x _{ti},t)+ \Delta \overline{\mathbf{z }}(\mathbf x _{ti},t), \mathbf z (\mathbf x _{hj}, h)+\Delta \mathbf z (\mathbf x _{hj}, h)) \nonumber \\&\quad = \frac{1}{2} \sum _{t=1}^{K} \sum _{i=1}^{|S_t|} \alpha _{ti} K(\mathbf z (\mathbf x _{ti},t)+ \Delta \overline{\mathbf{z }}(\mathbf x _{ti},t), \mathbf z (\mathbf x _{hj},h)) \nonumber \\&\qquad + \frac{1}{2} \Delta \mathbf z (\mathbf x _{hj},h)^T \sum _{t=1}^{K} \sum _{i=1}^{|S_t|} \alpha _{ti} K'(\mathbf z (\mathbf x _{ti},t)+ \Delta \overline{\mathbf{z }}(\mathbf x _{ti},t), \mathbf z (\mathbf x _{hj},h)) \end{aligned}$$
(67)

By utilizing the Cauchy–Schwarz inequality, the optimal value of \(\Delta \mathbf z (\mathbf x _{hj}, h)\) is

$$\begin{aligned} \Delta \mathbf z (\mathbf x _{hj},h) = \delta _{hj} \frac{\widetilde{\mathbf{u }}_{hj}}{||\widetilde{\mathbf{u }}_{hj}||}, \ \ j=1, \ldots , |S_h|, \ \ h=1, \ldots , K \end{aligned}$$
(68)

where

$$\begin{aligned} \widetilde{\mathbf{u }}_{hj}= \sum _{t=1}^{K} \sum _{i=1}^{|S_t|} \alpha _{ti} K'\left( \mathbf z (\mathbf x _{ti},t)+ \Delta \overline{\mathbf{z }}(\mathbf x _{ti},t), \mathbf z (\mathbf x _{hj},h)\right) . \end{aligned}$$

\(\square \)

About this article

Cite this article

Xiao, Y., Liu, B., Yu, P.S. et al. A robust one-class transfer learning method with uncertain data. Knowl Inf Syst 44, 407–438 (2015). https://doi.org/10.1007/s10115-014-0765-8
