Transfer learning by centroid pivoted mapping in noisy environment

Journal of Intelligent Information Systems

Abstract

Transfer learning is a widely studied learning paradigm, originally proposed to reuse informative knowledge from related domains when supervised information is scarce in the target domain but plentiful in multiple source domains. One of the challenging issues in transfer learning is how to handle the distribution differences between the source domains and the target domain. Most studies in this field implicitly assume that the data distributions of the source domains and the target domain are similar in a well-designed feature space. However, it is often the case that the label assignments of the source domains and the target domain differ significantly. Consequently, even if the distribution difference between a source domain and the target domain is reduced, knowledge from multiple source domains does not transfer well to the target domain unless the label information is carefully considered. In addition, noisy data often arise in real-world applications and inevitably cause side effects during knowledge transfer, so handling noise in the transfer learning setting is itself a challenging problem. Motivated by these observations, we propose a framework for transfer learning that is robust against noise and that explicitly accounts for the differences in data distributions and label assignments among multiple source domains and the target domain. Experimental results on one synthetic data set, three UCI data sets, and one real-world text data set at different noise levels demonstrate the effectiveness of our method.

Notes

  1. http://archive.ics.uci.edu/ml/

  2. http://people.csail.mit.edu/jrennie/20Newsgroups/

References

  • Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P. (1998). Automatic subspace clustering of high dimensional data for data mining applications. In SIGMOD conference (pp. 94–105).

  • Ankerst, M., Breunig, M.M., Kriegel, H., Sander, J. (1999). OPTICS: ordering points to identify the clustering structure. SIGMOD Record, 28(2), 49–60.

  • Argyriou, A., Evgeniou, T., Pontil, M. (2006). Multi-task feature learning. In NIPS (pp. 41–48).

  • Blitzer, J., McDonald, R., Pereira, F. (2006). Domain adaptation with structural correspondence learning. In EMNLP (pp. 120–128).

  • Brodley, C.E., & Friedl, M.A. (1999). Identifying mislabeled training data. Journal of Artificial Intelligence Research (JAIR), 11, 131–167.

  • Cormen, T.H., Stein, C., Rivest, R.L., Leiserson, C.E. (2001). Introduction to algorithms, section 26.2, “The Floyd–Warshall algorithm” (2nd ed., pp. 558–565). McGraw-Hill Higher Education.

  • Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21–27.

  • Dai, W., Yang, Q., Xue, G.R., Yu, Y. (2007). Boosting for transfer learning. In ICML (pp. 193–200).

  • Ester, M., Kriegel, H., Sander, J., Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD (pp. 226–231).

  • Fellegi, I.P., & Holt, D. (1976). A systematic approach to automatic edit and imputation. Journal of the American Statistical Association, 71(353), 17–35.

  • Ferri, F.J., Albert, J.V., Vidal, E. (1999). Considerations about sample-size sensitivity of a family of edited nearest-neighbor rules. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 29(5), 667–672.

  • Frommberger, L. (2007). Generalization and transfer learning in noise-affected robot navigation tasks. In EPIA workshops (pp. 508–519).

  • Gutstein, S., Fuentes, O., Freudenthal, E. (2008). The utility of knowledge transfer for noisy data. In FLAIRS conference (pp. 59–64).

  • Han, J., & Kamber, M. (2000). Data mining: Concepts and techniques (2nd ed.). Morgan Kaufmann.

  • Hickey, R.J. (1996). Noise modelling and evaluating learning from examples. Artificial Intelligence, 82(1–2), 157–179.

  • Hinneburg, A., & Keim, D.A. (1998). An efficient approach to clustering in large multimedia databases with noise. In KDD (pp. 58–65).

  • Huang, J., Smola, A.J., Gretton, A., Borgwardt, K.M., Schölkopf, B. (2006). Correcting sample selection bias by unlabeled data. In NIPS (pp. 601–608).

  • Bhattacharya, I., Godbole, S., Joshi, S., Verma, A. (2009). Cross-guided clustering: transfer of relevant supervision across domains for improved clustering. In ICDM (pp. 41–50).

  • Joachims, T. (1999). Transductive inference for text classification using support vector machines. In ICML (pp. 200–209).

  • Lee, J.A., & Verleysen, M. (2007). Nonlinear dimensionality reduction. Springer Science.

  • Ling, X., Xue, G.R., Dai, W., Jiang, Y., Yang, Q., Yu, Y. (2008). Can Chinese web pages be classified with English data source? In WWW (pp. 969–978).

  • Liu, Q., Liao, X., Carin, H.L., Stack, J.R., Carin, L. (2009). Semisupervised multitask learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 1074–1086.

  • Liu, Q., Xu, Q., Zheng, V.W., Xue, H., Cao, Z., Yang, Q. (2010). Multi-task learning for cross-platform siRNA efficacy prediction: an in-silico study. BMC Bioinformatics, 11, 181.

  • Manning, C.D., & Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.

  • Pan, S.J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22, 1345–1359.

  • Parrish, N., & Gupta, M.R. (2011). Bayesian transfer learning for noisy channels. In IEEE statistical signal processing workshop (pp. 269–272).

  • Quinlan, J.R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.

  • Rubin, D.B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

  • Rückert, U., & Kramer, S. (2008). Kernel-based inductive transfer. In ECML/PKDD (pp. 220–233).

  • Schaffer, C. (1992). Sparse data and the effect of overfitting avoidance in decision tree induction. In AAAI (pp. 147–152).

  • Schaffer, C. (1993). Overfitting avoidance as bias. Machine Learning, 10, 153–178.

  • Schiffman, S.S., Reynolds, M.L., Young, F.W. (1981). Introduction to multidimensional scaling: Theory, methods, and applications. New York: Erlbaum Associates.

  • Schlimmer, J.C., & Granger, R.H. (1986). Incremental learning from noisy data. Machine Learning, 1(3), 317–354.

  • Schwaighofer, A., Tresp, V., Yu, K. (2004). Learning Gaussian process kernels via hierarchical Bayes. In NIPS (pp. 1209–1216).

  • Shao, H., Tong, B., Suzuki, E. (2011). Compact coding for hyperplane classifiers in heterogeneous environment. In ECML/PKDD (3) (pp. 207–222).

  • Shi, X., Fan, W., Ren, J. (2008). Actively transfer domain knowledge. In ECML/PKDD (pp. 342–357).

  • Shi, X., Fan, W., Yang, Q., Ren, J. (2009a). Relaxed transfer of different classes via spectral partition. In ECML/PKDD (pp. 366–381).

  • Shi, Y., Lan, Z., Liu, W., Bi, W. (2009b). Extending semi-supervised learning methods for inductive transfer learning. In ICDM (pp. 483–492).

  • Tenenbaum, J.B., de Silva, V., Langford, J.C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2319–2323.

  • Teng, C.M. (1999). Correcting noisy data. In ICML (pp. 239–248).

  • Vapnik, V.N. (1995). The nature of statistical learning theory. New York, NY: Springer.

  • Yamazaki, K., Kawanabe, M., Watanabe, S., Sugiyama, M., Müller, K. (2007). Asymptotic Bayesian generalization error when training and test distributions are different. In ICML (pp. 1079–1086).

  • Zheng, V.W., Pan, S.J., Yang, Q., Pan, J.J. (2008). Transferring multi-device localization models using latent multi-task learning. In AAAI (pp. 1427–1432).

  • Zhu, X. (2005). Semi-supervised learning literature survey. Technical report, Computer Sciences, University of Wisconsin-Madison.

  • Zhu, X., & Wu, X. (2005). Cost-constrained data acquisition for intelligent data preparation. IEEE Transactions on Knowledge and Data Engineering, 17(11), 1542–1556.

  • Zhu, X., Wu, X., Chen, Q. (2003). Eliminating class noise in large datasets. In ICML (pp. 920–927).

  • Zhu, X., Wu, X., Yang, Y. (2004). Error detection and impact-sensitive instance ranking in noisy datasets. In AAAI (pp. 378–384).

Author information

Correspondence to Thach Nguyen Huy.

Appendix

A.1 DBSCAN algorithm

The DBSCAN algorithm was proposed by Ester et al. (1996). A summary of the DBSCAN algorithm is shown in Algorithm 4.
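For reference, the following is a minimal sketch of the density-based clustering idea behind DBSCAN, not the paper's exact Algorithm 4; the parameter names eps (neighborhood radius) and min_pts (core-point density threshold) follow common convention.

```python
import numpy as np

def dbscan(X, eps=0.5, min_pts=5):
    """Minimal DBSCAN sketch (after Ester et al. 1996).

    X: (n, d) array of points. Returns labels: -1 = noise, 0..k-1 = cluster ids.
    """
    n = len(X)
    labels = np.full(n, -1)            # everything starts as noise
    visited = np.zeros(n, dtype=bool)
    cluster = 0

    def region_query(i):
        # Indices of all points within eps of point i (including i itself).
        return np.where(np.linalg.norm(X - X[i], axis=1) <= eps)[0]

    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        seeds = list(region_query(i))
        if len(seeds) < min_pts:
            continue                   # i stays noise (may be claimed as a border point later)
        j = 0
        while j < len(seeds):          # expand the cluster outward from its core points
            p = seeds[j]
            if not visited[p]:
                visited[p] = True
                p_neighbors = region_query(p)
                if len(p_neighbors) >= min_pts:
                    seeds.extend(p_neighbors)
            if labels[p] == -1:        # claim border/noise points for this cluster
                labels[p] = cluster
            j += 1
        cluster += 1
    return labels
```

Because DBSCAN leaves points in low-density regions labeled as noise rather than forcing them into a cluster, it is a natural building block for the noisy setting studied in this paper.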

A.2 Multidimensional scaling (MDS) algorithm

Multidimensional scaling is a statistical technique for visualizing the dissimilarities in data (Schiffman et al. 1981). We summarize the MDS algorithm in Algorithm 6. Given a matrix of pairwise distances, MDS computes coordinates for the data: the squared distance matrix is double-centered and eigendecomposed, and the top d eigenvectors are selected to represent the coordinates in a new d-dimensional Euclidean space.
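A minimal sketch of classical (Torgerson) MDS along these lines is shown below. It assumes Algorithm 6 follows the standard double-centering and eigendecomposition route described above; it is not the paper's exact pseudocode.

```python
import numpy as np

def classical_mds(D, d=2):
    """Embed an (n, n) matrix of pairwise distances D into d dimensions."""
    n = D.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J, with J = I - 11^T / n.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # Eigendecompose B and keep the top-d eigenpairs.
    eigvals, eigvecs = np.linalg.eigh(B)          # eigh returns ascending order
    top = np.argsort(eigvals)[::-1][:d]
    scale = np.sqrt(np.maximum(eigvals[top], 0))  # clamp tiny negative eigenvalues
    return eigvecs[:, top] * scale                # (n, d) coordinates

# Usage: coordinates recovered from Euclidean distances match the original
# points up to rotation, reflection, and translation.
X = np.random.rand(10, 2)
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = classical_mds(D, d=2)
```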

A.3 Results in table format

This section presents the experimental results in table format; the tables provide more detailed information than the figures.

Table 1 Experimental results of the synthetic data set
Table 2 Experimental results of mushroom data set
Table 3 Experimental results of kr vs kp data set
Table 4 Experimental results of splice data set
Table 5 Experimental results of rec vs talk data set
Table 6 Experimental results of rec vs sci data set
Table 7 Experimental results of sci vs talk data set

Cite this article

Huy, T.N., Tong, B., Shao, H. et al. Transfer learning by centroid pivoted mapping in noisy environment. J Intell Inf Syst 41, 39–60 (2013). https://doi.org/10.1007/s10844-012-0226-3
