Abstract

Machine learning is a subset of artificial intelligence. This chapter first presents a machine learning tree and then focuses on matrix algebra methods in machine learning, including single-objective optimization, feature selection, principal component analysis, and canonical correlation analysis, together with supervised, unsupervised, semi-supervised, and active learning. More importantly, it highlights selected topics and advances in machine learning: graph machine learning, reinforcement learning, Q-learning, and transfer learning.


References

  1. Acar, E., Camtepe, S.A., Krishnamoorthy, M., Yener, B.: Modeling and multiway analysis of chatroom tensors. In: Proceedings of the IEEE International Conference on Intelligence and Security Informatics, pp. 256–268. Springer, Berlin (2005)

    Chapter  Google Scholar 

  2. Acar, E., Aykut-Bingo, C., Bingo, H., Bro, R., Yener, B.: Multiway analysis of epilepsy tensors. Bioinformatics 23, i10–i18 (2007)

    Article  Google Scholar 

  3. Agrawal, R., Imielinski, T., Swami, A.N.: Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 207–216 (1992)

    Google Scholar 

  4. Ali, M.M., Khompatraporn, C., Zabinsky, Z.B.: A numerical evaluation of several stochastic algorithms on selected continuous global optimization on test problems. J. Global Optim. 31, 635–672 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  5. Aliu, O.G., Imran, A., Imran, M.A., Evans, B.: A survey of self organisation in future cellular networks. IEEE Commun. Surveys Tutorials. 15(1), 336–361 (2013)

    Article  Google Scholar 

  6. Anderberg, M.R.: Cluster Analysis for Application. Academic, New York (1973)

    MATH  Google Scholar 

  7. Anderson, T.W.: An Introduction to Multivariate Statistical Analysis, 2nd edn. Wiley, New York (1984)

    MATH  Google Scholar 

  8. Ando, R.K., Zhang, T.: A framework for learning predictive structures from multiple tasks and unlabeled data. J. Mach. Learn. Res. 6, 1817–1853 (2005)

    MathSciNet  MATH  Google Scholar 

  9. Angluin D.: Queries and concept learning. Mach. Learn. 2(4), 319–342 (1988)

    MathSciNet  Google Scholar 

  10. Arnold, A., Nallapati, R., Cohen, W.W.: A comparative study of methods for transductive transfer learning. In: Proceedings of the Seventh IEEE International Conference on Data Mining Workshops, pp. 77–82 (2007)

    Google Scholar 

  11. Atlas, L., Cohn, D., Ladner, R., El-Sharkawi, M.A., Marks II, R.J.: Training connectionist networks with queries and selective sampling. In: Advances in Neural Information Processing Systems 2, Morgan Kaufmann, pp. 566–573 (1990)

    Google Scholar 

  12. Auslender, A.: Optimisation Méthodes Numériques. Masson, Paris (1976)

    MATH  Google Scholar 

  13. Bach, F.R., Jordan, M.I.: Kernel independent component analysis. J. Mach. Learn. Res. 3, 1–48 (2002)

    MathSciNet  MATH  Google Scholar 

  14. Bagheri, M., Nurmanova, V., Abedinia, O., Naderi, M.S.: Enhancing power quality in microgrids with a new online control Strategy for DSTATCOM using reinforcement learning algorithm. IEEE Access 6, 38986–38996 (2018)

    Article  Google Scholar 

  15. Bandyopdhyay, S., Maulik, U.: An evolutionary technique based on K-means algorithm for optimal clustering in \(\mathbb {R}^N\). Inform. Sci. 146(1–4), 221–237 (2002)

    Google Scholar 

  16. Bartlett, P.L.: The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Trans. Inf. Theory. 44(2), 525–536 (1998)

    Article  MathSciNet  MATH  Google Scholar 

  17. Baum, L.E., Eagon, J.A.: An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology. Bull. Amer. Math. Soc. 73(3), 360 (1967)

    Article  MathSciNet  MATH  Google Scholar 

  18. Behbood, V., Lu, J., Zhang, G.: Fuzzy bridged refinement domain adaptation: long-term bank failure prediction. Int. J. Comput Intell. Appl. 12(1), Art. no. 1350003 (2013)

    Article  Google Scholar 

  19. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15(6), 1373–1396 (2003)

    Article  MATH  Google Scholar 

  20. Belkin, M., Niyogi, P., Sindhwani, V.: Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 7, 2399–2434 (2006)

    MathSciNet  MATH  Google Scholar 

  21. Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)

    MATH  Google Scholar 

  22. Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)

    Article  Google Scholar 

  23. Bersini, H., Dorigo, M., Langerman, S.: Results of the first international contest on evolutionary optimization. In: Proceedings of IEEE International Conference on Evolutionary Computation, Nagoya, pp. 611–615 (1996)

    Google Scholar 

  24. Bertsekas, D.P.: Dynamic Programming and Optimal Sequence of States of the Markov Decision Process. Control, vol. 11. Athena Scientific, Nashua (1995)

    Google Scholar 

  25. Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Nashua (1999)

    Google Scholar 

  26. Beyer, H.G., Schwefel, H.P.: Evolution strategies: a comprehensive introduction. J. Nat. Comput. 1(1), 3–52 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  27. Blitzer, J., McDonald, R., Pereira, F.: Domain adaptation with structural correspondence learning. In: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pp. 120–128 (2006)

    Google Scholar 

  28. Blitzer, J., Dredze, M., Pereira, F.: Biographies, Bollywood, Boom-Boxes and Blenders: Domain adaptation for sentiment classification. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pp. 432–439 (2007)

    Google Scholar 

  29. Blum, A., Chawla, S.: Learning from labeled and unlabeled data using graph mincuts. In: Proceedings of the 18th International Conference on Machine Learning (2001)

    Google Scholar 

  30. Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Proceedings of the 11th Annual Conference on Computational Learning Theorem (COLT 98), pp. 92–100 (1998)

    Google Scholar 

  31. Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)

    Article  MathSciNet  MATH  Google Scholar 

  32. Bouneffouf, D.: Exponentiated gradient exploration for active learning. Computers 5(1), 1–12 (2016)

    Article  Google Scholar 

  33. Bouneffouf, D., Laroche, R., Urvoy, T., Fèraud, R., Allesiardo, R.: Contextual bandit for active learning: Active Thompson sampling. In: Proceedings of the 21st International Conference on Neural Information Processing, ICONIP (2014)

    Google Scholar 

  34. Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Analy. Mach. Intell. 26(9), 1124–1137 (2004)

    Article  MATH  Google Scholar 

  35. Breiman, L.: Better subset selection using the nonnegative garrote. Technometrics 37, 738–754 (1995)

    Article  MATH  Google Scholar 

  36. Bro, R.: PARAFAC: tutorial and applications. Chemome. Intell. Lab. Syst. 38, 149–171 (1997)

    Article  Google Scholar 

  37. Bu, F.: A high-order clustering algorithm based on dropout deep learning for heterogeneous data in Cyber-Physical-Social systems. IEEE Access 6, 11687–11693 (2018)

    Article  Google Scholar 

  38. Buczak, A.L., Guven, E.: A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Commun. Surv. Tut. 18(2), 1153–1176 (2016)

    Article  Google Scholar 

  39. Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Disc. 2, 121–167 (1998)

    Article  Google Scholar 

  40. Burr, S.: Active Learning Literature Survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison, Retrieved 2014-11-18 (2010)

    Google Scholar 

  41. Cai, D., Zhang, C., He, S.: Unsupervised feature selection for multi-cluster data. In: Proceedings of the 16th ACM SIGKDD, July 25–28, Washington, pp. 333–342 (2010)

    Google Scholar 

  42. Campbell, C., Cristianini, N., Smola, A.: Query learning with large margin classifiers. In: Proceedings of the International Conference on Machine Learning (ICML) (2000)

    Google Scholar 

  43. Candès, E.J., Wakin, M.B., Boyd, S.P.: Enhancing sparsity by reweighted 1 minimization. J. Fourier Analy. Appl. 14(5–6), 877–905 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  44. Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58(3), 1–37 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  45. Caruana, R.A.: Multitask learning. Mach. Learn. 28, 41–75 (1997)

    Article  Google Scholar 

  46. Chandrasekaran, V., Sanghavi, S., Parrilo, P.A., Wilisky, A.S.: Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim. 21(2), 572–596 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  47. Chang, C.I., Du, Q.: Estimation of number of spectrally distinct signal sources in hyperspectral imagery. IEEE Trans. Geosci. Remote Sens. 42(3), 608–619 (2004)

    Article  Google Scholar 

  48. Chattopadhyay, R., Sun, Q., Fan, W., Davidson, I., Panchanathan, S., Ye, J.: Multisource domain adaptation and its application to early detection of fatigue. ACM Trans. Knowl. Discov. From Data 6(4), 1–26 (2012)

    Article  Google Scholar 

  49. Chen, T., Amari, S., Lin, Q.: A unified algorithm for principal and minor components extraction. Neural Netw. 11, 385–390 (1998)

    Article  Google Scholar 

  50. Chen, Y., Lasko, T.A., Mei, Q., Denny, J.C, Xu, H.: A study of active learning methods for named entity recognition in clinical text. J. Biomed. Inform. 58, 11–18 (2015)

    Article  Google Scholar 

  51. Chernoff, H.: Sequential analysis and optimal design. In: CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 8. SIAM, Philadelphia (1972)

    Google Scholar 

  52. Choromanska, A., Jebara, T., Kim, H., Mohan, M., Monteleoni, C.: Fast spectral clustering via the Nyström method. In: International Conference on Algorithmic Learning Theory ALT 2013, pp. 367–381 (2013)

    MATH  Google Scholar 

  53. Chung, F.R.K.: Spectral graph theory. In: CBMS Regional Conference Series, vol.92. Conference Board of the Mathematical Sciences, Washington (1997)

    Google Scholar 

  54. Chung, C.J., Reynolds, R.G.: CAEP: An evolution-based tool for real-valued function optimization using cultural algorithms. Int. J. Artif. Intell. Tool 7(3), 239–291 (1998)

    Article  Google Scholar 

  55. Ciresan, D.C., Meier, U., Schmidhuber, J.: Transfer learning for Latin and Chinese characters with deep neural networks. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN), Brisbane, pp. 1–6 (2012)

    Google Scholar 

  56. Coates, A., Ng, A.Y.: Learning feature representations with K-means. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade, 2nd edn., pp. 561–580. Springer, Berlin (2012)

    Chapter  Google Scholar 

  57. Cohen, W.W.: Fast effective rule induction. In: Proceedings of the 12th International Conference on International Conference on Machine Learning, Lake Tahoe, pp. 115–123 (1995)

    Chapter  Google Scholar 

  58. Cohn, D.: Active learning. In: Sammut, C., Webb, G.I. (eds.) Encyclopedia of Machine Learning, pp. 10–14 (2011)

    Google Scholar 

  59. Cohn, D., Ghahramani, Z., Jordan, M.I.: Active learning with statistical models. J. Artific. Intell. Res. 4, 129–145 (1996)

    Article  MATH  Google Scholar 

  60. Comon, P., Golub, G., Lim, L.H., Mourrain, B.: Symmetric tensors and symmetric tensor rank. SIAM J. Matrix Anal. Appl. 30(3), 1254–1279 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  61. Corana, A., Marchesi, M., Martini, C., Ridella, S.: Minimizing multimodal functions of continuous variables with simulated annealing algorithms. ACM Trans. Math. Softw. 13(3), 262–280 (1987)

    Article  MathSciNet  MATH  Google Scholar 

  62. Correa, N.M., Adali, T., Li, Y.Q., Calhoun, V.D.: Canonical correlation analysis for data fusion and group inferences. IEEE Signal Proc. Mag. 27(4), 39–50 (2010)

    Article  Google Scholar 

  63. Cortes, C., Mohri, M.: On transductive regression. In: Proceedings of the Neural Information Processing Systems (NIPS), pp. 305–312 (2006)

    Google Scholar 

  64. Crammer, K., Singer, Y.: On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2, 265–292 (2001)

    MATH  Google Scholar 

  65. Cristianini, N., Shawe-Taylor, J., Elisseeff, A., Kandola, J.S.: On kernel-target alignment. In: NIPS’01 Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, pp. 367–373 (2001)

    Google Scholar 

  66. Dai, W., Yang, Q., Xue, G.R., Yu, Y.: Boosting for transfer learning. In: Proceedings of the 24th International Conference on Machine Learning, pp. 193–200 (2007)

    Google Scholar 

  67. Dai, W., Xue, G., Yang, Q., Yu, Y.: Transferring naive Bayes classifiers for text classification. In: Proc. 22nd Association for the Advancement of Artificial Intelligence (AAAI) Conference on Artificial Intelligence, pp. 540–545 (2007)

    Google Scholar 

  68. Dai, W., Jin, O., Xue, G.-R., Yang, Q., Yu, Y.: EigenTransfer: A unified framework for transfer learning. In: Proceedings of the the 26th International Conference on Machine Learning, Montreal, pp. 193–200 (2009)

    Google Scholar 

  69. Dash, M., Liu, H.: Feature selection for classification. Intell. Data Anal. 1(1–4), 131–156 (1997)

    Article  Google Scholar 

  70. Daumé III, H.: Frustratingly easy domain adaptation. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pp. 256–263 (2007)

    Google Scholar 

  71. Davis, J.V., Kulis, B., Jain, P., Sra, S., Dhillon, I.S.: Information-theoretic metric learning. In: Proceedings of the international Conference on Machine Learning, pp. 209–216 (2007)

    Google Scholar 

  72. Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, vol. 27, pp. 1646–1654 (2014)

    Google Scholar 

  73. Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, pp. 3837–3845 (2016)

    Google Scholar 

  74. Deng, Z., Choi, K., Jiang, Y.: Generalized hidden-mapping ridge regression, knowledge-leveraged inductive transfer learning for neural networks, fuzzy systems and kernel method. IEEE Trans. Cybern. 44(12), 2585–2599 (2014)

    Article  Google Scholar 

  75. Dhillon, I.S., Modha, D.M.: Concept decompositions for large sparse text data using clustering. Mach. Learn. 42(1), 143–175 (2001)

    Article  MATH  Google Scholar 

  76. Dong, X., Thanou, D., Frossard, P., Vandergheynst, P.: Learning Laplacian matrix in smooth graph signal representations. IEEE Trans. Sign. Proc. 64(23), 6160–6173 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  77. Donoho, D.L., Johnstone, I.: Adapting to unknown smoothness via wavelet shrinkage. J. Amer. Statist. Assoc. 90, 1200–1224 (1995)

    Article  MathSciNet  MATH  Google Scholar 

  78. Dorigo, M., Gambardella, L.M.: Ant colony system: A cooperative learning approach to the traveling salesman problem. IEEE Trans. Evol. Comput. 1(1), 53–66 (1997)

    Article  Google Scholar 

  79. Douglas, S.C., Kung, S.-Y., Amari, S.: A self-stabilized minor subspace rule. IEEE Sign. Proc. Lett. 5(12), 328–330 (1998)

    Article  Google Scholar 

  80. Downie, J.S.: A window into music information retrieval research. Acoust. Sci. Technol. 29(4), 247–255 (2008)

    Article  Google Scholar 

  81. Du, Q., Faber, V., Gunzburger, M.: Centroidal Voronoi tessellations: applications and algorithms. SIAM Rev. 41, 637–676 (1999)

    Article  MathSciNet  MATH  Google Scholar 

  82. Duan, L., Tsang, I.W., Xu, D., Maybank, S.J.: Domain transfer SVM for video concept detection. In: Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1375–1381 (2009)

    Google Scholar 

  83. Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, New York (1973)

    MATH  Google Scholar 

  84. Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Statist. 32, 407–499 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  85. El-Attar, R.A., Vidyasagar, M., Dutta, S.R.K.: An algorithm for II-norm minimization with application to nonlinear II-approximation. SIAM J. Numer. Anal. 16(1), 70–86 (1979)

    Article  MathSciNet  MATH  Google Scholar 

  86. Estienne, F., Matthijs, N., Massart, D.L., Ricoux, P., Leibovici, D.: Multi-way modeling of high-dimensionality electroencephalographic data. Chemometr. Intell. Lab. Syst. 58(1), 59–72 (2001)

    Article  Google Scholar 

  87. Fan, J., Han, F., Liu, H.: Challenges of big data analysis. Nat. Sci. Rev. 1(2), 293–314 (2014)

    Article  Google Scholar 

  88. Farhadi, A., Forsyth, D., White, R.: Transfer learning in sign language. In: Proceedings of the IEEE 2007 Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2007)

    Google Scholar 

  89. Farmer, J., Packard, N., Perelson, A.: The immune system, adaptation and machine learning. Phys. D: Nonlinear Phenom. 2, 187–204 (1986)

    Article  MathSciNet  Google Scholar 

  90. Fedorov, V.V.: Theory of Optimal Experiments. (Trans. by Studden, W.J., Klimko, E.M.). Academic, New York (1972)

    Google Scholar 

  91. Fercoq, O., Richtárk, P.: Accelerated, parallel, and proximal coordinate descent. SIAM J. Optim. 25(4), 1997–2023 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  92. Figueiredo, M.A.T., Nowak, R.D., Wright, S.J.: Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signa. Proc. 1(4), 586–597 (2007)

    Article  Google Scholar 

  93. Finkel, J.R., Manning, C.D.: Hierarchical Bayesian domain adaptation. In: Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics, Los Angeles, pp. 602–610 (2009)

    Google Scholar 

  94. Fisher, R.A.: The statistical utilization of multiple measurements. Ann. Eugenic. 8, 376–386 (1938)

    Article  MATH  Google Scholar 

  95. Ford, L., Fulkerson, D.: Flows in Networks. Princeton University Press, Princeton (1962)

    MATH  Google Scholar 

  96. Freund, Y.: Boosting a weak learning algorithm by majority. Inform. Comput. 12(2), 256–285 (1995)

    Article  MathSciNet  MATH  Google Scholar 

  97. Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997)

    Article  MathSciNet  MATH  Google Scholar 

  98. Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian network classifiers. Mach. Learn. 29, 131–163 (1997)

    Article  MATH  Google Scholar 

  99. Friedman, J., Hastie, T., Höeling, H., Tibshirani, R.: Pathwise coordinate optimization. Ann. Appl. Stat. 1(2), 302–332 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  100. Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting. Ann. Stat. 28(2), 337–407 (2000)

    Article  MathSciNet  MATH  Google Scholar 

  101. Fu, W.J.: Penalized regressions: the bridge versus the Lasso. J. Comput. Graph. Stat. 7(3), 397–416 (1998)

    MathSciNet  Google Scholar 

  102. Fuchs, J.J.: Multipath time-delay detection and estimation. IEEE Trans. Signal Process. 47(1), 237–243 (1999)

    Article  Google Scholar 

  103. Furey, T.S., Cristianini, N., Duffy, N., Bednarski, D.W., Schummer, M., Haussler, D.: Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16(10), 906–914 (2000)

    Article  Google Scholar 

  104. Ge, Z., Song, Z., Ding, S.X., Huang, B.: Data mining and analytics in the process industry: the role of machine learning. IEEE Access 5, 20590–20616 (2017)

    Article  Google Scholar 

  105. Geladi, P., Kowalski, B.R.: Partial least squares regression: a tutorial. Anal. Chim. Acta 186, l–17 (1986)

    Google Scholar 

  106. George, A.P., Powell, W.B.: Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Mach. Learn. 65(1), 167–198 (2006)

    Article  Google Scholar 

  107. Goldberg, D.E., Holland, J.H.: Genetic algorithms and machine learning. Mach. Learn. 3(2), 95–99 (1988)

    Article  Google Scholar 

  108. Golub, G.H., Zha, H.: The canonical correlations of matrix pairs and their numerical computation. In: Linear Algebra for Signal Processing, pp. 27–49. Springer, Berlin (1995)

    Chapter  Google Scholar 

  109. Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lander, E.S.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999)

    Article  Google Scholar 

  110. Grandvalet, Y., Bengio, Y.: Semi-supervised learning by entropy minimization. In: Advances in Neural Information Processing Systems, vol. 17, pp. 529–536 (2005)

    Google Scholar 

  111. Guo, W., Kotsia, I., Ioannis, P.: Tensor learning for regression. IEEE Trans. Image Process. 21(2), 816–827 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  112. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)

    MATH  Google Scholar 

  113. Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene selection for cancer classification using support vector machines. Mach. Learn. 46, 389–422 (2002)

    Article  MATH  Google Scholar 

  114. Handl, J., Knowles, J., Kell, D.B.: Computational cluster validation in post-genomic data analysis. Bioinformatics 21(15), 3201–3212 (2005)

    Article  Google Scholar 

  115. Hardoon, D.R., Szedmak, S., Shawe-Taylor, J.: Canonical correlation analysis: an overview with application to learning methods. Neural Comput. 16(12), 2639–2664 (2004)

    Article  MATH  Google Scholar 

  116. Hesterberg, T., Choi, N.H., Meier, L., Fraley, C.: Least angle and 1 penalized regression: a review. Stat. Surv. 2, 61–93 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  117. Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimates for non-orthogonal problems. Technometrics 12, 55–67 (1970)

    Article  MATH  Google Scholar 

  118. Hoerl, A.E., Kennard, R.W.: Ridge regression: applications to nonorthogonal problems. Technometrics 12, 69–82 (1970)

    Article  MATH  Google Scholar 

  119. Hoi, S.C.H., Jin, R., Lyu, M.R.: Batch mode active learning with applications to text categorization and image retrieval. IEEE Trans. Knowl. Data Eng. 21(9), 1233–1247 (2009)

    Article  Google Scholar 

  120. Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989)

    Article  MATH  Google Scholar 

  121. Höskuldsson, A.: PLS regression methods. J. Chemometr. 2, 211–228 (1988)

    Article  Google Scholar 

  122. Hotelling, H.: Relations between two sets of variants. Biometrika 28(3/4), 321–377 (1936)

    Article  MATH  Google Scholar 

  123. Huang, G.-B., Zhu, Q.-Y., Siew, C.-K.: Extreme learning machine: theory and applications. Neurocomputing 70(1–3), 489–501 (2006)

    Article  Google Scholar 

  124. Huang, G.-B., Zhou, H., Ding, X., Zhang, R.: Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. B Cybern. 42(2), 513–529 (2012)

    Article  Google Scholar 

  125. Hunter, D.R., Lange, K.: A tutorial on MM algorithms. Amer. Statist. 58, 30–37 (2004)

    Article  MathSciNet  Google Scholar 

  126. Jain, K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall, Englewood Cliffs (1988)

    MATH  Google Scholar 

  127. Jain, A.K.: Data clustering: 50 years beyond K-means. Pattern Recogn. Lett. 31, 651–666 (2010)

    Article  Google Scholar 

  128. Jain, A.K., Duin, R.P.W., Mao, J.: Statistical pattern recognition: a review. IEEE Trans. Pattern Anal. Mach. Intell. 22(1), 4–37 (2000)

    Article  Google Scholar 

  129. Jamil, M., Yang, X.-S.: A literature survey of benchmark functions for global optimization problems. Int. J. Math. Modell. Numer. Optim. 4(2), 150–194 (2013)

    MATH  Google Scholar 

  130. Jensen, F.V.: Bayesian Networks and Decision Graphs. Springer, New York (2001)

    Book  MATH  Google Scholar 

  131. Joachims, T.: Transductive inference for text classification using support vector machines. In: Proceedings of the 16th International Conference on Machine Learning, pp. 200–209 (1999)

    Google Scholar 

  132. Johnson, S.C.: Hierarchical clustering schemes. Psycioietrika 32(3), 241–254 (1967)

    Article  MATH  Google Scholar 

  133. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in Neural Information Processing Systems, vol. 26, pp. 315–323 (2013)

    Google Scholar 

  134. Jolliffe, I.: Principal Component Analysis. Springer, New York (1986)

    Book  MATH  Google Scholar 

  135. Jonesb, S., Shaoa, L., Dub, K.: Active learning for human action retrieval using query pool selection. Neurocomputing 124, 89–96 (2014)

    Article  Google Scholar 

  136. Jouffe, L.: Fuzzy inference system learning by reinforcement methods. IEEE Trans. Syst. Man Cybern. Part C 28(3), 338–355 (1998)

    Article  Google Scholar 

  137. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: a survey. J. Artif. Intell. Res. 4, 237–285 (1996)

    Article  Google Scholar 

  138. Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artif. Intell. 101(1), 99–134 (1998)

    Article  MathSciNet  MATH  Google Scholar 

  139. Kan, M., Wu, J., Shan, S., Chen, X.: Domain adaptation for face recognition: targetize source domain bridged by common subspace. Int. J. Comput. Vis. 109(1–2), 94–109 (2014)

    Article  MATH  Google Scholar 

  140. Kearns, M., Valiant, L.: Crytographic limitations on learning Boolean formulae and finite automata. In: Proceedings of the Twenty-first Annual ACM Symposium on Theory of Computing, pp. 433–444 (1989); See J. ACM 41(1), 67–95 (1994)

    Google Scholar 

  141. Kearns, M.J., Vazirani, U.V.: An Introduction to Computational Learning Theory. MIT Press, Cambridge (1994)

    Book  Google Scholar 

  142. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks (ICNN), vol. IV, pp. 1942–1948 (1995)

    Google Scholar 

  143. Kiers, H.A.L.: Towards a standardized notation and terminology in multiway analysis. J. Chemometr. 14, 105–122 (2000)

    Article  MathSciNet  Google Scholar 

  144. Kimura, A., Kameoka, H., Sugiyama, M., Nakano, T., Maeda, E., Sakano, H., Ishiguro, K.: SemiCCA: Efficient semi-supervised learning of canonical correlations. Inform. Media Technol. 8(2), 311–318 (2013)

    Google Scholar 

  145. Klaine, P.V., Imran, M.A., Souza, R.D., Onireti, O.: A survey of machine learning techniques applied to self-organizing cellular networks. IEEE Commun. Surv. Tut. 19(4), 2392–2431 (2017)

    Article  Google Scholar 

  146. Kloft, M., Brefeld, U., Sonnenburg, S., and Zien, A.: p-norm multiple kernel learning. J. Mach. Learn. Res. 12, 953–997 (2011)

    Google Scholar 

  147. Kober, J., Bangell, J., Peters, J.: Reinforcement learning in robotics: a survey. Int. J. Robustics Res. 32(11), 1238–1274 (2013)

    Article  Google Scholar 

  148. Kocer, B., Arslan, A.: Genetic transfer learning. Expert Syst. Appl. 37(10), 6997–7002 (2010)

    Article  Google Scholar 

  149. Kolda, T.G.: Multilinear operators for higher-order decompositions. Sandia Report SAND2006-2081, California (2006)

    Google Scholar 

  150. Kolda, T.G., Bader, B.W., Kenny, J.P.: Higher-order web link analysis using multilinear algebra. In: Proceedings of the 5th IEEE International Conference on Data Mining, pp. 242–249 (2005)

    Google Scholar 

  151. Konečný J., Liu, J., Richtárik, P., Takáč, M.: Mini-batch semi-stochastic gradient descent in the proximal setting. IEEE J. Sel. Top. Signa. Process. 10(2), 242–255 (2016)

    Article  Google Scholar 

  152. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge (1992)

    MATH  Google Scholar 

  153. Kulis, B., Saenko, K., Darrell, T.: What you saw is not what you get: domain adaptation using asymmetric kernel transforms. In: Proceedings of the IEEE 2011 Conference on Computer Vision and Pattern Recognition, pp. 1785–1292 (2011)

    Google Scholar 

  154. Lathauwer, L.D., Moor, B.D., Vandewalle, J.: A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 21, 1253–1278 (2000)

    Article  MathSciNet  MATH  Google Scholar 

  155. Lathauwer, L.D., Nion, D.: Decompositions of a higher-order tensor in block terms—part III: alternating least squares algorithms. SIAM J. Matrix Anal. Appl. 30(3), 1067–1083 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  156. Le Roux, N., Schmidt, M., Bach, F.R.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: Advances in Neural Information Processing Systems, vol. 25, pp. 2663–2671 (2012)

    Google Scholar 

  157. Letexier, D., Bourennane, S., Blanc-Talon, J.: Nonorthogonal tensor matricization for hyperspectral image filtering. IEEE Geosci. Remote Sensing. Lett. 5(1), 3–7 (2008)

    Article  Google Scholar 

  158. Levie, R., Monti, F., Bresson, X., Bronstein, M.M.: CayleyNets: Graph convolutional neural networks with complex rational spectral filters (2018). Available at: https://arXiv:1705.07664v2

    Google Scholar 

  159. Lewis, D., Gale, W.: A sequential algorithm for training text classifiers. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3–12. ACM/Springer, New York/Berlin (1994)

    Chapter  Google Scholar 

  160. Li, X., Guo, Y.: Adaptive active learning for image classification. In: Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2013)

    Google Scholar 

  161. Li, F., Pan, S.J., Jin, O., Yang, Q., Zhu, X.: Cross-domain co-extraction of sentiment and topic lexicons. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, vol. 1, pp. 410–419 (2012)

    Google Scholar 

  162. Li, W., Duan, L., Xu, D., Tsang, I.: Learning with augmented features for supervised and semi-supervised heterogeneous domain adaptation. IEEE Trans. Pattern Anal. Mach. Intell. 36(6), 1134–1148 (2014)

    Article  Google Scholar 

  163. Lin, L.: Self-improving reactive agents based on reinforcement learning, planning and teaching. Mach. Learn. 8, 293–321 (1992)

    Google Scholar 

  164. Lin, Z., Chen, M., Ma, Y.: The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. Technical Report UILU-ENG-09-2215 (2009)

    Google Scholar 

  165. Ling, X., Xue, G.-R., Dai, W., Jiang, Y., Yang, Q., Yu, Y.: Can Chinese Web pages be classified with English data source? In: Proceedings of the 17th International Conference on World Wide Web, pp. 969–978 (2008)

    Google Scholar 

  166. Liu, J., Wright, S.J., Re, C., Bittorf, V., Sridhar, S.: An asynchronous parallel stochastic coordinate descent algorithm. J. Mach. Learn. Res. 16, 285–322 (2015)

    MathSciNet  MATH  Google Scholar 

  167. Lu, J., Behbood, V., Hao, P., Zuo, H., Xue, S., Zhang, G.: Transfer learning using computational intelligence: a survey. Knowl. Based Syst. 80, 14–23 (2015)

    Article  Google Scholar 

  168. Luis, R., Sucar, L.E., Morales, E.F.: Inductive transfer for learning Bayesian networks. Mach. Learn. 79(1–2), 227–255 (2010)

    Article  MathSciNet  Google Scholar 

  169. Luo, F.L., Unbehauen, R., Cichocki, A.: A minor component analysis algorithm. Neural Netw. 10(2), 291–297 (1997)

    Article  Google Scholar 

  170. Ma, Y., Luo, G., Zeng, X., Chen, A.: Transfer learning for cross-company software defect prediction. Inform. Softw. Technol. 54(3), 248–256 (2012)

    Article  Google Scholar 

  171. Ma, Y., Gong, W., Mao, F.: Transfer learning used to analyze the dynamic evolution of the dust aerosol. J. Quant. Spectrosc. Radiat. Transf. 153, 119–130 (2015)

    Article  Google Scholar 

  172. Mahalanobis, P.C.: On the generalised distance in statistics. Proc. Natl. Inst. Sci. India 2(1), 49–55 (1936)

    MathSciNet  MATH  Google Scholar 

  173. Maier, M., von Luxburg, U., Hein, M.: How the result of graph clustering methods depends on the construction of the graph. ESAIM: Probab. Stat. 17, 370–418 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  174. Masci, J., Meier, U., Ciresan, D., Schmidhuber, J.: Stacked convolutional auto-encoders for hierarchical feature extraction. In: Proceedings of the 21st International Conference on Artificial Neural Networks, Part I, Espoo, pp. 52–59 (2011)

    Google Scholar 

  175. Massy, W.F.: Principal components regression in exploratory statistical research. J. Am. Stat. Assoc. 60(309), 234–256 (1965)

    Article  Google Scholar 

  176. McCallum, A., Nigam, K.: Employing EM and pool-based active learning for text classification. In: ICML ’98: Proceedings of the Fifteenth International Conference on Machine Learning, pp. 359–367 (1998)

    Google Scholar 

  177. Michalski, R.: A theory and methodology of inductive learning. Mach. Learn. 1, 83–134 (1983)

    Google Scholar 

  178. Miller, G.A., Nicely, P.E.: An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27, 338–352 (1955)

    Article  Google Scholar 

  179. Mishra, S.K.: Global optimization by differential evolution and particle swarm methods: Evaluation on some benchmark functions. Munich Research Papers in Economics (2006). Available at: https://mpra.ub.uni-muenchen.de/1005/

  180. Mishra, S.K.: Performance of differential evolution and particle swarm methods on some relatively harder multi-modal benchmark functions (2006). Available at: https://mpra.ub.uni-muenchen.de/449/

  181. Mitchell, T.M.: Machine Learning, vol. 45. McGraw Hill, Burr Ridge (1997)

    MATH  Google Scholar 

  182. Mitra, P., Murthy, C.A., Pal, S.K.: Unsupervised feature selection using feature similarity. IEEE Trans. Pattern Anal. Mach. Intell. 24(3), 301–312 (2002)

    Article  Google Scholar 

  183. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015)

    Article  Google Scholar 

  184. Mohar, B.: Some applications of Laplace eigenvalues of graphs. In: Hahn, G., Sabidussi, G. (eds.) Graph Symmetry: Algebraic Methods and Applications. NATO Science Series C, vol. 497, pp. 225–275. Kluwer, Dordrecht (1997)

    Chapter  Google Scholar 

  185. Moulton, C.M., Roberts, S.A., Calamai, P.H.: Hierarchical clustering of multiobjective optimization results to inform land-use decision making. URISA J. 21(2), 25–38 (2009)

    Google Scholar 

  186. Murthy, C.A., Chowdhury, N.: In search of optimal clusters using genetic algorithms. Pattern Recog. Lett. 17, 825–832 (1996)

    Article  Google Scholar 

  187. Narayanan, H., Belkin, M., Niyogi, P.: On the relation between low density separation, spectral clustering and graph cuts. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems, vol. 19, pp. 1025–1032. MIT Press, Cambridge (2007)

    Google Scholar 

  188. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  189. Ng, V., Cardie, C.: Weakly supervised natural language learning without redundant views. In: Proceedings of the Human Language Technology/Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), Main Papers, pp. 94–101 (2003)

    Google Scholar 

  190. Ng, A., Jordan, M., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: Dietterich, T., Becker, S., Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems, vol. 14, pp. 849–856. MIT Press, Cambridge (2002)

    Google Scholar 

  191. Nguyen, H.D.: An introduction to majorization-minimization algorithms for machine learning and statistical estimation. WIREs Data Min. Knowl. Discovery 7(2), e1198 (2017)

    Article  MathSciNet  Google Scholar 

  192. Niculescu-Mizil, A., Caruana, R.: Inductive transfer for Bayesian network structure learning. In: Proceedings of the 11th International Conference on Artificial Intelligence and Statistics (AISTATS), San Juan (2007)

    Google Scholar 

  193. Nigam, K., Ghani, R.: Analyzing the effectiveness and applicability of co-training. In: Proceedings of the International Conference on Information and Knowledge Management (CIKM), pp. 86–93 (2000)

    Google Scholar 

  194. Oja, E., Karhunen, J.: On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix. J. Math. Anal. Appl. 106, 69–84 (1985)

    Article  MathSciNet  MATH  Google Scholar 

  195. Ogoe, H.A., Visweswaran, S., Lu, X., Gopalakrishnan, V.: Knowledge transfer via classification rules using functional mapping for integrative modeling of gene expression data. BMC Bioinform. 16, 1–15 (2015)

    Article  Google Scholar 

  196. Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1717–1724 (2014)

    Google Scholar 

  197. Ortega, J.M., Rheinboldt, W.C.: Iterative Solutions of Nonlinear Equations in Several Variables, pp. 253–255. Academic, New York (1970)

    Chapter  MATH  Google Scholar 

  198. Owen, A.B.: A robust hybrid of lasso and ridge regression. Prediction and Discovery (Contemp. Math.), 443, 59–71 (2007)

    Google Scholar 

  199. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010)

    Article  Google Scholar 

  200. Pan, S.J., Kwok, J.T., Yang, Q., Pan, J.J.: Adaptive localization in a dynamic WiFi environment through multi-view learning. In: Proceedings of the 22nd Association for the Advancement of Artificial Intelligence (AAAI) Conference on Artificial Intelligence, pp. 1108–1113 (2007)

    Google Scholar 

  201. Pan, S.J., Kwok, J.T., Yang, Q.: Transfer learning via dimensionality reduction. In: Proceedings of the 23rd National Conference on Artificial Intelligence, vol. 2, pp. 677–682 (2008)

    Google Scholar 

  202. Pan, S.J., Shen, D., Yang, Q., Kwok, J.T.: Transferring localization models across space. In: Proceedings of the 23rd Association for the Advancement of Artificial Intelligence (AAAI) Conference on Artificial Intelligence, pp. 1383–1388 (2008)

    Google Scholar 

  203. Pan, S.J., Tsang, I.W., Kwok, J.T., Yang, Q.: Domain adaptation via transfer component analysis. IEEE Trans. Neural Netw. 22(2), 199–210 (2011)

    Article  Google Scholar 

  204. Parra, L., Spence, C., Sajda, P., Ziehe, A., Muller, K.: Unmixing hyperspectral data. In: Advances in Neural Information Processing Systems, vol. 12, pp. 942–948. MIT Press, Cambridge (2000)

    Google Scholar 

  205. Patel, V.M., Gopalan, R., Li, R., Chellappa, R.: Visual domain adaptation: a survey of recent advances. IEEE Signal Process. Mag. 32(3), 53–69 (2015)

    Article  Google Scholar 

  206. Polikar, R.: Ensemble based systems in decision making. IEEE Circ. Syst. Mag. 6(3), 21–45 (2006)

    Article  Google Scholar 

  207. Prettenhofer, P., Stein, B.: Cross-language text classification using structural correspondence learning. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 1118–1127 (2010)

    Google Scholar 

  208. Price, W.L.: A controlled random search procedure for global optimisation. Comput. J. 20(4), 367–370 (1977)

    Article  MATH  Google Scholar 

  209. Rahnamayan, S., Tizhoosh, H.R., Salama, M.M.A.: Opposition-based differential evolution. IEEE Trans. Evol. Comput. 12(1), 64–79 (2008)

    Article  Google Scholar 

  210. Raina, R., Battle, A., Lee, H., Packer, B., Ng, A.Y.: Self-taught learning: Transfer learning from unlabeled data. In: Proceedings of the 24th International Conference on Machine Learning, Corvallis, pp. 759–766 (2007)

    Google Scholar 

  211. Rajagopal, A.N., Subramanian, R., Ricci, E., Vieriu, R.L., Lanz, O., Ramakrishnan, K.R., Sebe, N.: Exploring transfer learning approaches for head pose classification from multi-view surveillance images. Int. J. Comput. Vis. 109(1–2), 146–167 (2014)

    Article  Google Scholar 

  212. Richtárik, P., Takáč, M.: Parallel coordinate descent methods for big data optimization. Math. Program. Ser. A 156, 433–484 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  213. Rivlin, T.J.: An Introduction to the Approximation of Functions. Courier Dover Publications, New York (1969)

    Google Scholar 

  214. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)

    Article  MathSciNet  MATH  Google Scholar 

  215. Rosipal, R., Krämer, N.: Overview and recent advances in partial least squares. In: Proceedings of the Workshop on Subspace, Latent Structure and Feature Selection (SLSFS) 2005, pp. 34–51 (2006)

    Article  Google Scholar 

  216. Roweis, S., Saul, L.: Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), 2323–2326 (2000)

    Article  Google Scholar 

  217. Roy, D.M., Kaelbling, L.P.: Efficient Bayesian task-level transfer learning. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, pp. 2599–2604 (2007)

    Google Scholar 

  218. Rummery, G.A., Niranjan, M.: On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University (1994)

    Google Scholar 

  219. Saenko, K., Kulis, B., Fritz, M., Darrell, T.: Adapting visual category models to new domains. In: Proceedings of the European Conference on Computer Vision, vol. 6314, pp. 213–226 (2010)

    Google Scholar 

  220. Schaal, S.: Is imitation learning the route to humanoid robots? Trends Cogn. Sci. 3(6), 233–242 (1999)

    Article  Google Scholar 

  221. Schapire, R.E.: The strength of weak learnability. Mach. Learn. 5, 197–227 (1990)

    Google Scholar 

  222. Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Technical Report, INRIA, hal-0086005 (2013). See also Math. Program. 162, 83–112 (2017)

    Google Scholar 

  223. Schwefel, H.P.: Numerical Optimization of Computer Models. Wiley, Hoboken (1981)

    MATH  Google Scholar 

  224. Settles, B., Craven, M., Friedland, L.: Active learning with real annotation costs. In: Proceedings of the NIPS Workshop on Cost-Sensitive Learning, pp. 1–10 (2008)

    Google Scholar 

  225. Settles, B., Craven, M., Ray, S.: Multiple-instance active learning. In: Advances in Neural Information Processing Systems (NIPS), vol. 20, pp. 1289–1296. MIT Press, Cambridge (2008)

    Google Scholar 

  226. Seung, H.S., Opper, M., Sompolinsky, H.: Query by committee. In: Proceedings of the ACM Workshop on Computational Learning Theory, pp. 287–294 (1992)

    Google Scholar 

  227. Shell, J., Coupland, S.: Towards fuzzy transfer learning for intelligent environments. Ambient. Intell. Lect. Notes Comput. Sci. 7683, 145–160 (2012)

    Article  Google Scholar 

  228. Shell, J., Coupland, S.: Fuzzy transfer learning: Methodology and application. Inform. Sci. 293, 59–79 (2015)

    Article  Google Scholar 

  229. Shen, H., Tan, Y., Lu, J., Wu, Q., Qiu, Q.: Achieving autonomous power management using reinforcement learning. ACM Trans. Des. Autom. Electron. Syst. 18(2), 24:1–24:32 (2013)

    Article  Google Scholar 

  230. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)

    Article  Google Scholar 

  231. Shuman, D.I., Vandergheynst, P., Frossard, P.: Chebyshev polynomial approximation for distributed signal processing. In: Proceedings of the International Conference on Distributed Computing in Sensor Systems, Barcelona, pp. 1–8 (2011)

    Google Scholar 

  232. Shuman, D.I., Narang, S.K., Frossard, P., Ortega, A., Vandergheynst, P.: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag. 30(3), 83–98 (2013)

    Article  Google Scholar 

  233. Silver, D.L., Mercer, R.E.: The parallel transfer of task knowledge using dynamic learning rates based on a measure of relatedness. In: Thrun, S., Pratt, L.Y. (eds.) Learning to Learn, pp. 213–233. Kluwer Academic, Boston (1997)

    Google Scholar 

  234. Sindhwani, V., Niyogi, P., Belkin, M.: Beyond the point cloud: From transductive to semi-supervised learning. In: Proceedings of the 22nd International Conference on Machine Learning (ICML), pp. 824–831. ACM, New York (2005)

    Google Scholar 

  235. Smola, A.J., Kondor, R.: Kernels and regularization on graphs. In: Learning Theory and Kernel Machines, pp. 144–158. Springer, Berlin (2003)

    Chapter  MATH  Google Scholar 

  236. Song, J., Babu, P., Palomar, D.P.: Optimization methods for designing sequences with low autocorrelation sidelobes. IEEE Trans. Signal Process. 63(15), 3998–4009 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  237. Sriperumbudur, B.K., Torres, D.A., Lanckriet, G.R.G.: A majorization-minimization approach to the sparse generalized eigenvalue problem. Mach. Learn. 85, 3–39 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  238. Sun, J., Zeng, H., Liu, H., Lu, Y., Chen, Z.: CubeSVD: a novel approach to personalized web search. In: Proceedings of the 14th International Conference on World Wide Web, pp. 652–662 (2005)

    Google Scholar 

  239. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning Series. MIT Press, Cambridge (1998)

    MATH  Google Scholar 

  240. Tang, K., Li, X., Suganthan, P.N., Yang, Z., Weise, T.: Benchmark functions for the CEC’2010 special session and competition on large-scale global optimization. Technical Report, 2009. Available at: https://www.researchgate.net/publication/228932005

    Google Scholar 

  241. Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: LINE: Large-scale information network embedding. In: Proceedings of the International World Wide Web Conference Committee (IW3C2), Florence, pp. 1067–1077 (2015)

    Google Scholar 

  242. Tao, D., Li, X., Wu, X., Hu, W., Maybank, S.J.: Supervised tensor learning. Knowl. Inform. Syst. 13, 1–42 (2007)

    Article  Google Scholar 

  243. Thrun, S., Pratt, L. (eds.): Learning to Learn. Kluwer Academic, Dordrecht (1998)

    MATH  Google Scholar 

  244. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B 58, 267–288 (1996)

    MathSciNet  MATH  Google Scholar 

  245. Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a data set via the gap statistic. J. Roy. Statist. Soc. B 63(2), 411–423 (2001)

    Article  MathSciNet  MATH  Google Scholar 

  246. Tikhonov, A.: Solution of incorrectly formulated problems and the regularization method. Soviet Math. Dokl. 4, 1035–1038 (1963)

    MATH  Google Scholar 

  247. Tikhonov, A.N., Arsenin, V.Y.: Solutions of Ill-Posed Problems. Wiley, New York (1977)

    MATH  Google Scholar 

  248. Tokic, M., Palm, G.: Value-difference based exploration: Adaptive control between epsilon-greedy and softmax. In: KI 2011: Advances in Artificial Intelligence, pp. 335–346 (2011)

    Chapter  Google Scholar 

  249. Tommasi, T., Orabona, F., Caputo, B.: Safety in numbers: learning categories from few examples with multi model knowledge transfer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2010, pp. 3081–3088 (2010)

    Google Scholar 

  250. Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. J. Mach. Learn. Res. 3, 45–66 (2001)

    MATH  Google Scholar 

  251. Tou, J.T., Gonzalez, R.C.: Pattern Recognition Principles. Addison-Wesley, London (1974)

    MATH  Google Scholar 

  252. Tsitsiklis, J.N.: Asynchronous stochastic approximation and Q-Learning. Mach. Learn. 16, 185–202 (1994)

    MATH  Google Scholar 

  253. Uurtio, V., Monteiro, J.M., Kandola, J., Shawe-Taylor, J., Fernandez-Reyes, D., Rousu, J.: A tutorial on canonical correlation methods. ACM Comput. Surv. 50(95), 14–38 (2017)

    Google Scholar 

  254. Valiant, L.G.: A theory of the learnable. Commun. ACM 27, 1134–1142 (1984)

    Article  MATH  Google Scholar 

  255. van Hasselt, H.: Double Q-learning. In: Proceedings of the Advances in Neural Information Processing Systems (NIPS), pp. 2613–2621 (2010)

    Google Scholar 

  256. Vasilescu, M.A.O., Terzopoulos, D.: Multilinear analysis of image ensembles: TensorFaces. In: Proceedings of the European Conference on Computer Vision, Copenhagen, pp. 447–460 (2002)

    Google Scholar 

  257. von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)

    Article  MathSciNet  Google Scholar 

  258. Wang, H., Ahuja, N.: Compact representation of multidimensional data using tensor rank-one decomposition. In: Proceedings of the International Conference on Pattern Recognition, vol. 1, pp. 44–47 (2004)

    Google Scholar 

  259. Wang, X., Qian, B., Davidson, I.: On constrained spectral clustering and its applications. Data Min. Knowl. Disc. 28, 1–30 (2014)

    Article  MathSciNet  MATH  Google Scholar 

  260. Wang, L., Hua, X., Yuan, B., Lu, J.: Active learning via query synthesis and nearest neighbour search. Neurocomputing 147, 426–434 (2015)

    Article  Google Scholar 

  261. Wang, D., Cui, P., Zhu, W.: Structural deep network embedding. In: Proceedings of the 22nd International Conference on Knowledge Discovery and Data Mining, pp. 1225–1234. ACM, New York (2016)

    Google Scholar 

  262. Watanabe, S.: Pattern Recognition: Human and Mechanical. Wiley, New York (1985)

    Google Scholar 

  263. Watkins, C.J.C.H.: Learning from delayed rewards. PhD Thesis, University of Cambridge, England (1989)

    Google Scholar 

  264. Watkins, C.J.C.H., Dayan, P.: Q-learning. Mach. Learn. 8, 279–292 (1992)

    MATH  Google Scholar 

  265. Weenink, D.: Canonical correlation analysis. IFA Proc. 25, 81–99 (2003)

    Google Scholar 

  266. Wei, X.-Y., Yang, Z.-Q.: Coached active learning for interactive video search. In: Proceedings of the ACM International Conference on Multimedia, pp. 443–452 (2011)

    Google Scholar 

  267. Wei, X., Cao, B., Yu, P.S.: Nonlinear joint unsupervised feature selection. In: Proceedings of the 2016 SIAM International Conference on Data Mining, pp. 414–422 (2016)

    Google Scholar 

  268. Weiss, K., Khoshgoftaar, T.M., Wang, D.D.: A survey of transfer learning. J. Big Data 3(9), 1–40 (2016)

    Google Scholar 

  269. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8, 229–256 (1992)

    MATH  Google Scholar 

  270. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 3rd edn. Morgan Kaufmann, San Mateo (2011)

    MATH  Google Scholar 

  271. Witten, D.M., Tibshirani, R., Hastie, T.: A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10(3), 515–534 (2009)

    Article  Google Scholar 

  272. Wright, J., Ganesh, A., Rao, S., Peng, Y., Ma, Y.: Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 87, pp. 20:3–20:56 (2009)

    Google Scholar 

  273. Wright, J., Yang, A.Y., Ganesh, A., Sastry, S., Ma, Y.: Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 210–227 (2009)

    Article  Google Scholar 

  274. Wold, H.: Path models with latent variables: The NIPALS approach. In: Blalock, H.M., et al. (eds.) Quantitative Sociology: International Perspectives on Mathematical and Statistical Model Building, pp. 307–357. Academic, Cambridge (1975)

    Chapter  Google Scholar 

  275. Wold, S., Sjöström, M., Eriksson, L.: PLS-regression: a basic tool of chemometrics. Chemom. Intell. Lab. Syst. 58(2), 109–130 (2001)

    Article  Google Scholar 

  276. Wooldridge, M.J., Jennings, N.R.: Intelligent agents: theory and practice. Knowl. Eng. Rev. 10(2), 115–152 (1995)

    Article  Google Scholar 

  277. Wu, P., Dietterich, T.G.: Improving SVM accuracy by training on auxiliary data sources. In: Proceedings of the Twenty-First International Conference on Machine Learning, pp. 871–878 (2004)

    Google Scholar 

  278. Wu, T.T., Lange, K.: The MM alternative to EM. Statist. Sci. 25(4), 492–505 (2010)

    Article  MathSciNet  MATH  Google Scholar 

  279. Wu, Z., Leahy, R.: An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 15(11), 1101–1113 (1993)

    Article  Google Scholar 

  280. Xia, R., Zong, C., Hu, X., Cambria, E.: Feature ensemble plus sample selection: domain adaptation for sentiment classification. IEEE Intell. Syst. 28(3), 10–18 (2013)

    Article  Google Scholar 

  281. Xu, L., Krzyzak, A., Suen, C.Y.: Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Trans. Syst. Man Cybern. 22, 418–435 (1992)

    Article  Google Scholar 

  282. Xu, L., Oja, E., Suen, C.: Modified Hebbian learning for curve and surface fitting. Neural Netw. 5, 441–457 (1992)

    Article  Google Scholar 

  283. Xu, H., Caramanis, C., Mannor, S.: Robust regression and Lasso. IEEE Trans. Inform. Theory 56(7), 3561–3574 (2010)

    Article  MathSciNet  MATH  Google Scholar 

  284. Xue, B., Zhang, M., Browne, W.N., Yao, X.: A survey on evolutionary computation approaches to feature selection. IEEE Trans. Evol. Comput. 20(4), 606–626 (2016)

    Article  Google Scholar 

  285. Yamauchi, K.: Covariate shift and incremental learning. In: Advances in Neuro-Information Processing, pp. 1154–1162. Springer, Berlin (2009)

    Chapter  Google Scholar 

  286. Yan, S., Wang, H.: Semi-supervised learning by sparse representation. In: Proceedings of the SIAM International Conference on Data Mining, Philadelphia, pp. 792–801 (2009)

    Google Scholar 

  287. Yang, B.: Projection approximation subspace tracking. IEEE Trans. Signal Process. 43, 95–107 (1995)

    Article  Google Scholar 

  288. Yen, T.-J.: A majorization-minimization approach to variable selection using spike and slab priors. Ann. Stat. 39(3), 1748–1775 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  289. Yin, J., Yang, Q., Ni, L.M.: Adaptive temporal radio maps for indoor location estimation. In: Proceedings of the Third IEEE International Conference on Pervasive Computing and Communications (2005)

    Google Scholar 

  290. Yu, K., Zhang, T., Gong, Y.: Nonlinear learning using local coordinate coding. In: Advances in Neural Information Processing Systems, vol. 22, pp. 2223–2231 (2009)

    Google Scholar 

  291. Yu, H., Sun, C., Yang, W., Yang, X., Zuo, X.: AL-ELM: One uncertainty-based active learning algorithm using extreme learning machine. Neurocomputing 166(20), 140–150 (2015)

    Article  Google Scholar 

  292. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. Roy. Stat. Soc. Ser. B 68, 49–67 (2006)

    Article  MathSciNet  MATH  Google Scholar 

  293. Yuan, G.-X., Ho, C.-H., Lin, C.-J.: Recent advances of large-scale linear classification. Proc. IEEE 100(9), 2584–2603 (2012)

    Article  Google Scholar 

  294. Zhang, X.D.: Matrix Analysis and Applications. Cambridge University Press, Cambridge (2017)

    Book  MATH  Google Scholar 

  295. Zhang, Z., Coutinho, E., Deng, J., Schuller, B.: Cooperative learning and its application to emotion recognition from speech. IEEE Trans. Audio Speech Lang. Process. 23(1), 115–126 (2015)

    Google Scholar 

  296. Zhang, Z., Pan, Z., Kochenderfer, M.J.: Weighted Double Q-learning. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), pp. 3455–3461 (2017)

    Google Scholar 

  297. Zheng, V.W., Pan, S.J., Yang, Q., Pan, J.J.: Transferring multi-device localization models using latent multi-task learning. In: Proceedings of the 23rd Association for the Advancement of Artificial Intelligence (AAAI) Conference on Artificial Intelligence, pp. 1427–1432 (2008)

    Google Scholar 

  298. Zheng, V.W., Yang, Q., Xiang, W., Shen, D.: Transferring localization models over time. In: Proceedings of the 23rd Association for the Advancement of Artificial Intelligence (AAAI) Conference on Artificial Intelligence, pp. 1421–1426 (2008)

    Google Scholar 

  299. Zhou, Y., Goldman, S.: Democratic co-learning. In: Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pp. 594–602 (2004)

    Google Scholar 

  300. Zhou, Z.-H., Li, M.: Tri-training: exploiting unlabeled data using three classifiers. IEEE Trans. Knowl. Data Eng. 17, 1529–1541 (2005)

    Article  Google Scholar 

  301. Zhou, D., Schölkopf, B.: A regularization framework for learning from graph data. In: Proceedings of the ICML Workshop on Statistical Relational Learning, pp. 132–137 (2004)

    Google Scholar 

  302. Zhou, J., Chen, J., Ye, J.: Multi-task learning: Theory, algorithms, and applications (2012). Available at: https://archive.siam.org/meetings/sdm12/zhou_-chen_-ye.pdf

  303. Zhou, Z.-H., Zhan, D.-C., Yang, Q.: Semi-supervised learning with very few labeled training examples. In: Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence (AAAI-07) (2007)

    Google Scholar 

  304. Zhu, X.: Semi-Supervised Learning Literature Survey. Computer Sciences TR 1530, University of Wisconsin, Madison (2005)

    Google Scholar 

  305. Zhu, X., Goldberg, A.B.: Introduction to Semi-Supervised Learning. In: Brachman, R.J., Dietterich, T. (eds.) Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool, San Rafael (2009)

    Google Scholar 

  306. Zhu, X., Ghahramani, Z., Lafferty, J.: Semi-supervised learning using Gaussian fields and harmonic functions. In: Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), Washington (2003)

    Google Scholar 

  307. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat. Soc. B 67(2), 301–320 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  308. Zou, H., Hastie, T., Tibshirani, R.: Sparse principal component analysis. J. Comput. Graph. Stat. 15(2), 265–286 (2006)

    Article  MathSciNet  Google Scholar 

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Zhang, XD. (2020). Machine Learning. In: A Matrix Algebra Approach to Artificial Intelligence. Springer, Singapore. https://doi.org/10.1007/978-981-15-2770-8_6

  • DOI: https://doi.org/10.1007/978-981-15-2770-8_6

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-2769-2

  • Online ISBN: 978-981-15-2770-8

  • eBook Packages: Computer Science, Computer Science (R0)
