Abstract
Many domain adaptation methods focus on stationary environments and assume that the target-domain samples are available before learning begins. However, in many real-world applications non-stationary data arrive sequentially. This study develops an unsupervised heterogeneous domain adaptation approach for non-stationary scenarios in which data streams continually feed the learning model. The approach employs a fuzzy-based model trained on a different but related domain. A neighborhood-based weight assignment then fine-tunes the attraction and repulsion between neighbors based on prior knowledge about their domains and the similarity between their class labels. To avoid unnecessary adaptation for every target-domain chunk, domain adaptation is triggered only when concept drift is detected; when no drift is detected, the existing parameters are reused for feature adaptation. In this way, the model gradually adjusts to the evolving data and incorporates the characteristics of the new domain. Finally, the source domain is updated with the drifting data and their predicted labels. The proposed method offers several advantages, including avoiding excessive alignment, reducing the cost of domain adaptation, and gradually reducing the dependency on the source domain. To evaluate its performance, experiments were conducted on several tasks extracted from two benchmark datasets under different types of concept drift. The experimental results demonstrate that the proposed model significantly improves classification accuracy while reducing computational time.
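The drift-triggered control flow described in the abstract can be sketched as a simple loop. This is a minimal sketch under stated assumptions: the drift test, the adaptation step and the classifier are hypothetical plug-in callables (`detect_drift`, `fit_adaptation`, `classify`), not the paper's fuzzy model or neighborhood weighting, and comparing each chunk with the previous one is an assumed choice. Only the order of operations (adapt on drift, otherwise reuse parameters, then fold drifting chunks into the source domain) follows the text.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

Chunk = List[Sequence[float]]


def run_stream(
    chunks: List[Chunk],
    source_x: Chunk,
    source_y: List[int],
    detect_drift: Callable[[Optional[Chunk], Chunk], bool],
    fit_adaptation: Callable[[Chunk, List[int], Chunk], Any],
    classify: Callable[[Any, Chunk, List[int], Chunk], List[int]],
) -> Tuple[List[List[int]], int, int]:
    """Drift-triggered adaptation loop (illustrative sketch, not the paper's algorithm).

    Hypothetical plug-ins:
      detect_drift(previous_chunk, chunk) -> bool
      fit_adaptation(source_x, source_y, chunk) -> adaptation parameters
      classify(params, source_x, source_y, chunk) -> predicted labels
    Returns the per-chunk predictions, the number of adaptations and the final source size.
    """
    source_x, source_y = list(source_x), list(source_y)  # copies; caller's data stay untouched
    params: Any = None
    previous: Optional[Chunk] = None
    predictions: List[List[int]] = []
    n_adaptations = 0
    for chunk in chunks:
        # The first chunk always triggers adaptation because no parameters exist yet.
        drifted = params is None or detect_drift(previous, chunk)
        if drifted:
            # Re-learn the feature adaptation only when the stream has changed.
            params = fit_adaptation(source_x, source_y, chunk)
            n_adaptations += 1
        # Without drift, the stored parameters are reused as-is.
        labels = classify(params, source_x, source_y, chunk)
        if drifted:
            # Fold the drifting chunk and its predicted labels into the source domain.
            source_x.extend(chunk)
            source_y.extend(labels)
        predictions.append(labels)
        previous = chunk
    return predictions, n_adaptations, len(source_x)
```

Only chunks that trigger drift cost a call to `fit_adaptation` and enlarge the source set; chunks without drift reuse the stored parameters, which is where the reduction in adaptation cost claimed above would come from.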
Data availability
All data supporting the findings of this study are available within the paper and its Supplementary Information.
References
Ruiz Sánchez, E.: Learning rules in data stream mining: algorithms and applications. Universidad de Granada (2021)
Lobo, J.L., Del Ser, J., Bifet, A., Kasabov, N.: Spiking neural networks and online learning: an overview and perspectives. Neural Netw. 121, 88–100 (2020)
Wen, Y.-M., Liu, S.: Semi-supervised classification of data streams by BIRCH ensemble and local structure mapping. J. Comput. Sci. Technol. 35, 295–304 (2020)
Yu, H., Liu, W., Lu, J., Wen, Y., Luo, X., Zhang, G.: Detecting group concept drift from multiple data streams. Pattern Recogn. 134, 109113 (2023)
Yu, H., Zhang, Q., Liu, T., Lu, J., Wen, Y., Zhang, G.: Meta-ADD: a meta-learning based pre-trained model for concept drift active detection. Inform. Sci. 608, 996–1009 (2022)
Yu, H., Liu, T., Lu, J., Zhang, G.: Automatic learning to detect concept drift. arXiv preprint arXiv:2105.01419 (2021)
Maciąg, P.S., Kryszkiewicz, M., Bembenik, R., Lobo, J.L., Del Ser, J.: Unsupervised anomaly detection in stream data with online evolving spiking neural networks. Neural Netw. 139, 118–139 (2021)
Wiwatcharakoses, C., Berrar, D.: SOINN+, a self-organizing incremental neural network for unsupervised learning from noisy data streams. Expert Syst. Appl. 143, 113069 (2020)
de Mello, R.F., Vaz, Y., Grossi, C.H., Bifet, A.: On learning guarantees to unsupervised concept drift detection on data streams. Expert Syst. Appl. 117, 90–102 (2019)
Pan, S.J., Tsang, I.W., Kwok, J.T., Yang, Q.: Domain adaptation via transfer component analysis. IEEE Trans. Neural Netw. 22, 199–210 (2010)
Ding, Y., Jia, M., Zhuang, J., Cao, Y., Zhao, X., Lee, C.-G.: Deep imbalanced domain adaptation for transfer learning fault diagnosis of bearings under multiple working conditions. Reliab. Eng. Syst. Saf. 230, 108890 (2023)
Hong, Y., Chern, W.-C., Nguyen, T.V., Cai, H., Kim, H.: Semi-supervised domain adaptation for segmentation models on different monitoring settings. Autom. Constr. 149, 104773 (2023)
Tian, Q., Ma, C., Zhang, F.-Y., Peng, S., Xue, H.: Source-free unsupervised domain adaptation with sample transport learning. J. Comput. Sci. Technol. 36, 606–616 (2021)
Xu, Z., Pang, S., Zhang, T., Luo, X.-P., Liu, J., Tang, Y.-T., Yu, X., Xue, L.: Cross project defect prediction via balanced distribution adaptation based transfer learning. J. Comput. Sci. Technol. 34, 1039–1062 (2019)
Zhao, P., Hoi, S.C., Wang, J., Li, B.: Online transfer learning. Artif. Intell. 216, 76–102 (2014)
Chandra, S., Haque, A., Khan, L., Aggarwal, C.: An adaptive framework for multistream classification. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 1181–1190 (2016)
Liu, F., Lu, J., Zhang, G.: Unsupervised heterogeneous domain adaptation via shared fuzzy equivalence relations. IEEE Trans. Fuzzy Syst. 26, 3555–3568 (2018)
Omran, T.M., Sharef, B.T., Grosan, C., Li, Y.: Transfer learning and sentiment analysis of Bahraini dialects sequential text data using multilingual deep learning approach. Data Knowl. Eng. 143, 102106 (2023)
Farahani, A., Voghoei, S., Rasheed, K., Arabnia, H.R.: A brief review of domain adaptation. In: Advances in Data Science and Information Engineering, pp. 877–894 (2021)
Chen, D., Zhu, H., Yang, S.: UC-SFDA: source-free domain adaptation via uncertainty prediction and evidence-based contrastive learning. Knowl.-Based Syst., article 110728 (2023)
Khan, S., Asim, M., Khan, S., Musyafa, A., Wu, Q.: Unsupervised domain adaptation using fuzzy rules and stochastic hierarchical convolutional neural networks. Comput. Electr. Eng. 105, 108547 (2023)
Li, H., He, F., Pan, Y.: Multi-objective dynamic distribution adaptation with instance reweighting for transfer feature learning. Knowl.-Based Syst. 263, 110303 (2023)
Du, H., He, L., Liu, P., Hao, X.: Inter-domain fusion and intra-domain style normalization network for unsupervised domain adaptive person re-identification. Digit. Signal Process. 133, 103848 (2023)
Liu, X., Prince, J.L., Xing, F., Zhuo, J., Reese, T., Stone, M., El Fakhri, G., Woo, J.: Attentive continuous generative self-training for unsupervised domain adaptive medical image translation. Med. Image Anal. 88, 102851 (2023)
Feitosa Neto, A., Canuto, A.M.P.: EOCD: an ensemble optimization approach for concept drift applications. Inform. Sci. 561, 81–100 (2021)
Chen, J., Lécué, F., Pan, J.Z., Deng, S., Chen, H.: Knowledge graph embeddings for dealing with concept drift in machine learning. J. Web Semant. 67, 100625 (2021)
Zheng, X., Li, P., Hu, X., Yu, K.: Semi-supervised classification on data streams with recurring concept drift and concept evolution. Knowl.-Based Syst. 215, 106749 (2021)
Page, E.S.: Continuous inspection schemes. Biometrika 41, 100–115 (1954)
Baidari, I., Honnikoll, N.: Bhattacharyya distance based concept drift detection method for evolving data stream. Expert Syst. Appl. 183, 115303 (2021)
Gama, J., Medas, P., Castillo, G., Rodrigues, P.: Learning with drift detection. In: Brazilian Symposium on Artificial Intelligence, pp. 286–295. Springer (2004)
Baena-García, M., del Campo-Ávila, J., Fidalgo, R., Bifet, A., Gavaldà, R., Morales-Bueno, R.: Early drift detection method. In: Fourth International Workshop on Knowledge Discovery from Data Streams, pp. 77–86 (2006)
Barros, R.S.M., Cabral, D.R.L., Gonçalves, P.M., Jr., Santos, S.G.T.C.: RDDM: reactive drift detection method. Expert Syst. Appl. 90, 344–355 (2017)
Ross, G.J., Adams, N.M., Tasoulis, D.K., Hand, D.J.: Exponentially weighted moving average charts for detecting concept drift. Pattern Recogn. Lett. 33, 191–198 (2012)
Frías-Blanco, I., Campo-Ávila, J.D., Ramos-Jiménez, G., Morales-Bueno, R., Ortiz-Díaz, A., Caballero-Mota, Y.: Online and non-parametric drift detection methods based on Hoeffding’s bounds. IEEE Trans. Knowl. Data Eng. 27, 810–823 (2015)
Bifet, A., Gavaldà, R.: Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM International Conference on Data Mining, pp. 443–448. SIAM (2007)
Frias-Blanco, I., del Campo-Ávila, J., Ramos-Jimenez, G., Morales-Bueno, R., Ortiz-Diaz, A., Caballero-Mota, Y.: Online and non-parametric drift detection methods based on Hoeffding’s bounds. IEEE Trans. Knowl. Data Eng. 27, 810–823 (2014)
Pears, R., Sakthithasan, S., Koh, Y.S.: Detecting concept change in dynamic data streams. Mach. Learn. 97, 259–293 (2014)
Pesaranghader, A., Viktor, H., Paquet, E.: Reservoir of diverse adaptive learners and stacking fast hoeffding drift detection methods for evolving data streams. Mach. Learn. 107, 1711–1743 (2018)
Goel, K., Batra, S.: Dynamically adaptive and diverse dual ensemble learning approach for handling concept drift in data streams. Comput. Intell. 38, 463–505 (2022)
Liao, G., Zhang, P., Yin, H., Deng, X., Li, Y., Zhou, H., Zhao, D.: A novel semi-supervised classification approach for evolving data streams. Expert Syst. Appl. 215, 119273 (2023)
Li, Y., Wang, Y., Liu, Q., Bi, C., Jiang, X., Sun, S.: Incremental semi-supervised learning on streaming data. Pattern Recogn. 88, 383–396 (2019)
Ren, S., Liao, B., Zhu, W., Li, K.: Knowledge-maximized ensemble algorithm for different types of concept drift. Inform. Sci. 430–431, 261–281 (2018)
Mohawesh, R., Tran, S., Ollington, R., Xu, S.: Analysis of concept drift in fake reviews detection. Expert Syst. Appl. 169, 114318 (2021)
Altendeitering, M., Dübler, S.: Scalable detection of concept drift: a learning technique based on support vector machines. Proc. Manuf. 51, 400–407 (2020)
Widmer, G., Kubat, M.: Learning in the presence of concept drift and hidden contexts. Mach. Learn. 23, 69–101 (1996)
Wang, Y.-Y., Gu, J.-M., Wang, C., Chen, S.-C., Xue, H.: Discrimination-aware domain adversarial neural network. J. Comput. Sci. Technol. 35, 259–267 (2020)
Li, Y., Xu, J.-J., Zhao, P.-P., Fang, J.-H., Chen, W., Zhao, L.: ATLRec: an attentional adversarial transfer learning network for cross-domain recommendation. J. Comput. Sci. Technol. 35, 794–808 (2020)
Guo, H., Zhang, S., Wang, W.: Selective ensemble-based online adaptive deep neural networks for streaming data with concept drift. Neural Netw. 142, 437–456 (2021)
Ashfahani, A., Pratama, M.: Autonomous deep learning: continual learning approach for dynamic environments. In: SIAM, pp. 666–674
Hammami, Z., Mouelhi, W., Ben Said, L.: On-line self-adaptive framework for tailoring a neural-agent learning model addressing dynamic real-time scheduling problems. J. Manuf. Syst. 45, 97–108 (2017)
Mirza, B., Lin, Z.: Meta-cognitive online sequential extreme learning machine for imbalanced and concept-drifting data classification. Neural Netw. 80, 79–94 (2016)
Yu, H., Webb, G.I.: Adaptive online extreme learning machine by regulating forgetting factor by concept drift map. Neurocomputing 343, 141–153 (2019)
Sun, L., Ji, Y., Zhu, M., Gu, F., Dai, F., Li, K.: A new predictive method supporting streaming data with hybrid recurring concept drifts in process industry. Comput. Ind. Eng. 161, 107625 (2021)
Kuncheva, L.I.: Change detection in streaming multivariate data using likelihood detectors. IEEE Trans. Knowl. Data Eng. 25, 1175–1180 (2011)
Yamada, M., Kimura, A., Naya, F., Sawada, H.: Change-point detection with feature selection in high-dimensional time-series data. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, pp. 1827–1833. AAAI Press, Beijing, China (2013)
Alippi, C., Boracchi, G., Carrera, D., Roveri, M.: Change detection in multivariate datastreams: likelihood and detectability loss. arXiv preprint arXiv:1510.04850 (2015)
Liu, S., Yamada, M., Collier, N., Sugiyama, M.: Change-point detection in time-series data by relative density-ratio estimation. Neural Netw. 43, 72–83 (2013)
Hushchyn, M., Ustyuzhanin, A.: Generalization of change-point detection in time series data based on direct density ratio estimation. J. Comput. Sci. 53, 101385 (2021)
Sethi, T.S., Kantardzic, M.: On the reliable detection of concept drift from streaming unlabeled data. Expert Syst. Appl. 82, 77–99 (2017)
Hamidzadeh, J., Rezaeenik, E., Moradi, M.: Predicting users’ preferences by fuzzy rough set quarter-sphere support vector machine. Appl. Soft Comput. 112, 107740 (2021)
Li, H., Zhang, N., Zhu, J., Wang, Y., Cao, H.: Probabilistic frequent itemset mining over uncertain data streams. Expert Syst. Appl. 112, 274–287 (2018)
Liu, Z., Loo, C.K., Pasupa, K., Seera, M.: Meta-cognitive recurrent kernel online sequential extreme learning machine with kernel adaptive filter for concept drift handling. Eng. Appl. Artif. Intell. 88, 103327 (2020)
Hamidzadeh, J., Moradi, M.: Incremental one-class classifier based on convex–concave hull. Pattern Anal. Appl. 23, 1523–1549 (2020)
Paudel, R., Eberle, W.: An approach for concept drift detection in a graph stream using discriminative subgraphs. ACM Trans. Knowl. Discov. Data (TKDD) 14, 1–25 (2020)
Amen, B., Faiz, S., Do, T.-T.: Big data directed acyclic graph model for real-time COVID-19 twitter stream detection. Pattern Recogn. 123, 108404 (2022)
Moulton, R.H., Viktor, H.L., Japkowicz, N., Gama, J.: Clustering in the presence of concept drift. In: Springer, pp. 339–355
Ding, G., Wang, Y., Li, C., Sun, H., Li, C., Wang, L., Yin, H., Huang, T.: HSCFC: high-dimensional streaming data clustering algorithm based on feedback control system. Futur. Gener. Comput. Syst. 146, 156–165 (2023)
Yuan, Y., Wang, Z., Wang, W.: Unsupervised concept drift detection based on multi-scale slide windows. Ad Hoc Netw. 111, 102325 (2021)
Wankhade, K.K., Jondhale, K.C., Dongre, S.S.: A clustering and ensemble based classifier for data stream classification. Appl. Soft Comput. 102, 107076 (2021)
Zhao, K., Jiang, H., Wu, Z., Lu, T.: A novel transfer learning fault diagnosis method based on manifold embedded distribution alignment with a little labeled data. J. Intell. Manuf. 33, 151–165 (2022)
Wang, J., Feng, W., Chen, Y., Yu, H., Huang, M., Yu, P.S.: Visual domain adaptation with manifold embedded distribution alignment. In: Proceedings of the 26th ACM International Conference on Multimedia, pp. 402–410 (2018)
Ghifary, M., Balduzzi, D., Kleijn, W.B., Zhang, M.: Scatter component analysis: a unified framework for domain adaptation and domain generalization. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1414–1430 (2016)
Moradi, M., Hamidzadeh, J.: A domain adaptation method by incorporating belief function in twin quarter-sphere SVM. Knowl. Inform. Syst. 65, 3125–3163 (2023)
Morsing, L.H., Sheikh-Omar, O.A., Iosifidis, A.: Supervised domain adaptation using graph embedding. In: 25th International Conference on Pattern Recognition (ICPR), pp. 7841–7847. IEEE (2020)
Sun, J., Wang, Z., Wang, W., Li, H., Sun, F.: Domain adaptation with geometrical preservation and distribution alignment. Neurocomputing 454, 152–167 (2021)
Yan, Y., Wu, Q., Tan, M., Ng, M.K., Min, H., Tsang, I.W.: Online heterogeneous transfer by hedge ensemble of offline and online decisions. IEEE Trans. Neural Netw. Learn. Syst. 29, 3252–3263 (2017)
Wu, Q., Zhou, X., Yan, Y., Wu, H., Min, H.: Online transfer learning by leveraging multiple source domains. Knowl. Inform. Syst. 52, 687–707 (2017)
Wu, Q., Wu, H., Zhou, X., Tan, M., Xu, Y., Yan, Y., Hao, T.: Online transfer learning with multiple homogeneous or heterogeneous sources. IEEE Trans. Knowl. Data Eng. 29, 1494–1507 (2017)
Liu, F., Zhang, G., Lu, J.: Heterogeneous domain adaptation: An unsupervised approach. IEEE Trans. Neural Netw. Learn. Syst. 31, 5588–5602 (2020)
Samat, A., Persello, C., Gamba, P., Liu, S., Abuduwaili, J., Li, E.: Supervised and semi-supervised multi-view canonical correlation analysis ensemble for heterogeneous domain adaptation in remote sensing image classification. Remote Sens. 9, 337 (2017)
Huang, J., Gretton, A., Borgwardt, K., Schölkopf, B., Smola, A.: Correcting sample selection bias by unlabeled data. Adv. Neural. Inform. Process. Syst. 19, 601–608 (2006)
Haque, A., Wang, Z., Chandra, S., Dong, B., Khan, L., Hamlen, K.W.: Fusion: an online method for multistream classification. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 919–928 (2017)
Pratama, M., de Carvalho, M., Xie, R., Lughofer, E., Lu, J.: ATL: autonomous knowledge transfer from many streaming processes. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 269–278 (2019)
Ye, Y., Pan, T., Meng, Q., Li, J., Shen, H.T.: Online unsupervised domain adaptation via reducing inter- and intra-domain discrepancies. IEEE Trans. Neural Netw. Learn. Syst. (2022)
Lin, J.: Divergence measures based on the Shannon entropy. IEEE Trans. Inform. Theory 37, 145–151 (1991)
Lapin, M., Hein, M., Schiele, B.: Learning using privileged information: SVM+ and weighted SVM. Neural Netw. 53, 95–108 (2014)
Pan, S.J., Tsang, I.W., Kwok, J.T., Yang, Q.: Domain adaptation via transfer component analysis. IEEE Trans. Neural Netw. 22, 199–210 (2011)
Read, J.: Concept-drifting data streams are time series; the case for continuous adaptation. arXiv preprint arXiv:1810.02266 (2018)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Arora, S., Liang, Y., Ma, T.: A simple but tough-to-beat baseline for sentence embeddings. In: International Conference on Learning Representations (2017)
Sheskin, D.J.: Handbook of Parametric and Nonparametric Statistical Procedures. CRC Press (2003)
Acknowledgements
Not applicable.
Funding
Not applicable.
Author information
Contributions
MM was involved in conceptualization, methodology, validation, investigation, writing (review and editing), and analysis and interpretation of results. MR contributed to conceptualization, methodology, supervision, review, and investigation. AS contributed to conceptualization, supervision, and investigation. All authors reviewed the results and approved the final version of the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Ethics approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix I
This section provides details on computing the metric \(\mathcal{D}\) introduced in [17]. Consider a fuzzy feature vector \({\overline{A}}_{i} \in F({\mathbb{R}}^{n})\) built from the feature values \(({a}_{i1}, {a}_{i2}, \dots, {a}_{in})\). For each component \({\overline{a}}_{ij} \in F(\mathbb{R})\), the membership function is computed by Eq. (I-1).
where \({a}_{ij}\) is the \(j\)th feature value of the \(i\)th sample and \({\rho}_{i}\) is the hesitation degree of the \(i\)th sample under a triangular membership function. Using Eq. (I-2), the membership \({\mu}_{ij}({\varvec{x}} \mid {\overline{A}}_{i})\) of a point \({\varvec{x}} = ({x}_{1}, {x}_{2}, \dots, {x}_{n})\) is obtained.
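As a rough illustration of Eqs. (I-1) and (I-2), the sketch below assumes a common form: a symmetric triangular membership centred at \(a_{ij}\) whose half-width is the hesitation degree \(\rho_i\), with the per-feature memberships combined by the minimum t-norm. Both choices, and the names `triangular_membership` and `vector_membership`, are assumptions for illustration rather than the exact equations used in the paper or in [17].

```python
from typing import Sequence


def triangular_membership(x: float, center: float, spread: float) -> float:
    """Assumed form of Eq. (I-1): triangle peaking at `center`, zero beyond +/- `spread`."""
    if spread <= 0:
        # Degenerate case: a crisp value is fully compatible only with itself.
        return 1.0 if x == center else 0.0
    return max(0.0, 1.0 - abs(x - center) / spread)


def vector_membership(x: Sequence[float], a: Sequence[float], rho: float) -> float:
    """Assumed form of Eq. (I-2): minimum of the per-feature memberships of x in the fuzzy vector built from a."""
    if len(x) != len(a):
        raise ValueError("x and a must have the same dimension")
    return min(triangular_membership(xj, aj, rho) for xj, aj in zip(x, a))
```

For example, with \(\rho_i = 0.5\), a point that matches \(a_{i1}\) exactly but lies 0.25 away from \(a_{i2}\) has membership 0.5 in a two-feature vector.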
To define the fuzzy relation between the two heterogeneous domains (source and target), the distance between fuzzy vectors is measured with the following metric.
where \(\lambda \) is the membership value, and \({\mathcal{D}}_{\lambda }\left(u\text{,}v\right)\) indicates the distance between points \(u\) and \(v\) in \({\mathbb{R}}^{n}\) with the given \(\lambda \). \(\Omega \left(\lambda \right)\) is computed by Eq. (I-4).
where \(d(v, u)\) is the \({l}_{1}\) distance between the two \(n\)-dimensional vectors \(u\) and \(v\). The supremum operator (sup) in Eq. (I-3) takes the longest distance between the fuzzy vector of one domain and the fuzzy set of the other domain. Eq. (I-3) can be rewritten as Eq. (I-5).
Using Eqs. (I-1) and (I-2), Eq. (I-5) is de-fuzzified as follows.
Eq. (I-6) cannot be used directly to compute the fuzzy relation because it does not satisfy two properties required of a fuzzy relation: (1) symmetry, i.e. \(\mathcal{D}({\overline{A}}_{i}, {\overline{A}}_{j}) = \mathcal{D}({\overline{A}}_{j}, {\overline{A}}_{i})\) for all \({\overline{A}}_{i}, {\overline{A}}_{j}\); and (2) reflexivity, i.e. \(\mathcal{D}({\overline{A}}_{i}, {\overline{A}}_{i}) = 1\) for all \({\overline{A}}_{i}\). Therefore, the following function is employed.
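As a hedged illustration of the ingredients of Eqs. (I-3) to (I-6), the sketch below assumes symmetric triangular memberships with half-width \(\rho\) combined by the minimum t-norm. Under that assumption the \(\lambda\)-cut of a fuzzy vector is an axis-aligned box, and the largest \(l_1\) distance between points of two such boxes has a closed form. The second function shows one hypothetical way to turn a distance matrix into a relation that is symmetric and reflexive: average \(d(i,j)\) and \(d(j,i)\), apply \(\exp(-d/\text{scale})\), and fix the diagonal to 1. It is not the function employed in the paper; the names and the `scale` parameter are illustrative.

```python
import math
from typing import List, Sequence


def lambda_cut_sup_l1(a: Sequence[float], rho_a: float,
                      b: Sequence[float], rho_b: float, lam: float) -> float:
    """Largest l1 distance between points of the lambda-cuts of two triangular fuzzy vectors.

    Assumes coordinate j of each lambda-cut is [a_j - (1 - lam) * rho, a_j + (1 - lam) * rho].
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(a) != len(b):
        raise ValueError("a and b must have the same dimension")
    widen = (1.0 - lam) * (rho_a + rho_b)
    # Per coordinate, the farthest pair of interval endpoints is |a_j - b_j| plus both half-widths.
    return sum(abs(aj - bj) + widen for aj, bj in zip(a, b))


def fuzzy_relation(dist: List[List[float]], scale: float = 1.0) -> List[List[float]]:
    """Hypothetical mapping of a (possibly asymmetric) distance matrix to a
    symmetric, reflexive relation with values in (0, 1]."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    n = len(dist)
    rel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                rel[i][j] = 1.0  # reflexivity imposed explicitly
            else:
                sym = 0.5 * (dist[i][j] + dist[j][i])  # symmetrize before mapping
                rel[i][j] = math.exp(-sym / scale)
    return rel
```

Under these assumptions the raw supremum distance between a fuzzy vector and itself is positive whenever \(\lambda < 1\) and \(\rho > 0\), which illustrates why reflexivity has to be imposed explicitly rather than read off the distance.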
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Moradi, M., Rahmanimanesh, M. & Shahzadi, A. Learning from streaming data with unsupervised heterogeneous domain adaptation. Int J Data Sci Anal (2023). https://doi.org/10.1007/s41060-023-00463-z
DOI: https://doi.org/10.1007/s41060-023-00463-z