1 Introduction

In recent years, rapid advances in artificial intelligence (AI) have paved the way for applying deep learning techniques in various fields [1]; one prominent research field where deep learning has gained traction is multiobject classification for decision-making [2]. An important strategy of artificial intelligence for studying big data is data fusion, the joint analysis of multiple interrelated datasets that provide complementary views of the same phenomenon [6]. Data fusion systems are now widely used in areas such as sensor networks, robotics, video and image processing, and intelligent system design [7,8,9,10]. Recent statistics on digital information worldwide estimate that 80–90% of the data generated by digitized industrial services is unstructured [10]. Data fusion has therefore become a wide-ranging subject, and many terminologies are used interchangeably. Multidata fusion is the process of combining disparate data streams to generate information in a form that is more understandable or usable [3]. It combines multisensor data fusion technology [4] with multimodal data fusion (MMDF) [5], the process of combining disparate data streams of different dimensionality, resolution, type, etc., to generate information in a more understandable or usable form.

The main challenge addressed in this paper is modality- and context-based fusion: interpreting diverse data fusion for classifying objects and improving decisions by unifying multiple targets into one objective for each system. Modality/context-based fusion must resolve the conflicting nature of data, including uncertainty, ambiguity, and imbalanced interrelated data [6]. Moreover, there is no established way to resolve the conflicting nature of data such as images, text, audio, and video from multitarget sensors for object classification across diverse context systems, given heterogeneous data, imbalanced data, unstructured data, conflicting data, different representations, and varying numbers of modalities [7].

The difficulty of this research lies in finding commonalities between smart systems, which requires common elements in the connectivity of smart devices despite mismatched input types, input targets, and input relationships across different intelligent systems. The lack of a single intelligent-system dataset capable of covering a large number of inputs for testing leads to the need to test multiple intelligent systems.

Most of the recent literature addresses context-based fusion or modality fusion for specific known contexts [9, 10]. Furthermore, no existing fusion framework can analyze offline stream data to derive hidden relationships between different modality types and diverse numbers of modalities. The fusion problem is considered one of the most researched aspects of multimodal learning [11, 12].

Open research problems in modality/context-based fusion include the following:

  • Standardization: building a generalized context-aware middleware is hard because of the variety of contexts and systems involved in constructing a generic, domain-focused middleware solution [13].

  • Increased autonomy: although context-aware middleware architectures minimize the need for human intervention when serving personalized applications, human intervention is still necessary and plays a significant role in realizing context awareness [14].

  • Lack of testing: most of the middleware architectures surveyed are still at the conceptual stage [14].

  • Lack of accurate data: because problems originate from different sources, a context-aware system often cannot build a computational model that represents the knowledge of a real-world domain [15].

Fusion applications that aim to support context representation and fusion, when formally incorporated into a context-aware system, remain open research [16]. Existing approaches fuse multimodal features in a single way, which is not enough to elicit complementary data and therefore limits performance [17].

This paper presents a new adaptive and late multimodal fusion framework that relies on a multifusion learning model to solve modality/context-based fusion challenges and improve multiobject classification and decision-making. It creates a fully automated, selective deep neural network and constructs an adaptive fusion model for all modalities based on the input type. The proposed framework automatically constructs a deep neural network based on the Dempster–Shafer and concatenation strategies to obtain a larger number of features for interpreting unstructured multimodality types using late fusion. The framework is implemented in five layers: a software-defined fusion layer, a preprocessing layer, a dynamic classification layer, an adaptive fusion layer, and an evaluation layer. It formalizes the modality/context-based problem as an adaptive multifusion framework operating at the late fusion level.

2 Literature review

The related work on the modality fusion problem and the context-aware fusion problem has been discussed by many researchers. For example, the authors in [18] presented an early fusion model applied to time-series text modality data in the stock market. Although its experimental results reached 87.7%, its limitation was redundant data in the fusion. In [19], the authors presented another early fusion model applied to bimodal audio/visual data for human action recognition. Although its experimental results reached 86%, its limitation was the difficulty of fusing multiple modality types. The authors in [20] presented a late image fusion model for CIFAR-10 that can automate fusion in one context. Its object classification accuracy reached 89–94%, but the limitation is that it is not suitable for multiple contexts.

The authors in [21] presented a late fusion model applied to ECG signals. It checked the quality of cardiology ECGs, and its object classification accuracy was 61 and 87%. Its limitation is that it cannot exploit a larger number of characteristics to improve quality. The authors in [22] presented a late fusion model designed for the image modality in medical imaging. Although its main objective, medical image classification, achieved 88%, it needs a larger number of features and applicability in multiple contexts.

The authors in [23] presented a hybrid fusion model for daily historical water-level text data in Vietnam. Its essential objective was water-level prediction, and its accuracy reached 91–93%. Its limitations were redundant data and difficulty in handling multiple model types. The authors in [24] presented a hybrid fusion model designed for two modalities in one context (images and text). The experimental results reached 99.57% on two modalities only in one context, and the limitations were low robustness and a restricted context with specific conditions.

The authors in [25] investigated how to extract common features from the vast amount of multisensor data using data preparation and mining techniques. A deep self-attention network was proposed to handle aero-engine multisensor data containing degradation information at different scales and then accurately predict the corresponding remaining useful life (RUL) of the aero-engine. First, multiscale kernels with a self-attention mechanism were developed to selectively extract multisensor features at different scales. The authors in [26] presented a depth estimation algorithm based on convolutional neural networks (CNNs). First, a single-image super-resolution algorithm was adopted to spatially super-resolve the sub-aperture images (SAIs). Second, to adapt to texture complexity, the SAIs were partitioned into two regions, a simple texture region and a complex texture region, based on texture analysis of the central SAI. Third, the epipolar plane images (EPIs) in the horizontal, vertical, 45-degree diagonal, and 135-degree diagonal directions were extracted for both the complex and simple texture regions, and the corresponding EPIs were fed into the specified network branches. Finally, a fusion module was designed to generate a depth map. Experimental results show that the quality of the depth maps estimated by the proposed method was better than the state-of-the-art methods in terms of both objective and subjective quality.

The authors in [27] addressed recognizing the epistemic emotions in learner-generated reviews in massive open online courses (MOOCs), which could help provide adaptive guidance and interventions for learners. Epistemic emotion identification is a fine-grained identification task involving multiple categories of emotions that arise during the learning process. Previous studies considered only the emotional or the semantic information within the review texts, which led to insufficient feature representation. In addition, some categories of epistemic emotions are ambiguously distributed in the feature space, making them difficult to recognize. The emotion-semantic-aware dual contrastive learning (ES-DCL) approach was presented to tackle these issues. To learn adequate feature representations, implicit semantic features and human-interpretable emotional features were extracted separately from two different views to form complementary emotional-semantic features. The proposed ES-DCL was compared with 11 other baseline models on four disciplinary MOOC review datasets.

Adaptive control differs from dynamic control in two respects. The difference appears in the flexibility to adapt to diverse requirements in system behavior with respect to the adopted rules [28]. Adaptivity is considered a type of adaptive dynamic programming that reaches the optimal solution for a system iteratively [29], whereas dynamic control programming only adjusts the parameters of a system with respect to changes over time. The authors in [30] noted that software-defined networking (SDN) and network function virtualization (NFV) are recognized as the most promising technologies for flexibly allocating resources to network services. A service function chain (SFC), which can deploy virtualized network functions (VNFs) and chain them with the associated flow allocation, can be used to represent each network service owing to the introduction of SDN/NFV technology. That work presented a deep learning approach in which a multitask regression layer over graph neural networks was first introduced to predict the long-term resource requirements of each VNF instance. According to the simulation findings, the proposed model showed at least a 6.2% improvement in prediction accuracy over standard prediction models, and the proposed SFC deployment strategy delivered better performance in terms of acceptance ratio and revenue than existing static deployment algorithms.

On the other hand, fusion models face a big challenge in extracting relationships across multiple contexts because each context has specific roles, parameters, and objectives [31,32,33]. The authors in [34] analyzed the relationship between human activities and the properties (amplitude and phase) of Wi-Fi CSI signals on different receiving antennas and identified the signal properties that change markedly in response to human movement. The variation in the signal among different antennas showed different sensitivities to human activities, directly affecting recognition performance. Hence, to recognize human activities more efficiently, the study proposed an adaptive antenna elimination algorithm that automatically discards the non-sensitive antennas and keeps the sensitive antennas for different human activities. The experimental results revealed that even when using easy-to-implement, non-deep machine learning such as random forest, the recognition framework based on the proposed adaptive antenna elimination algorithm achieved a superior classification accuracy of 99.84% (line-of-sight) on the StanWiFi dataset and 97.65% (line-of-sight)/93.33% (non-line-of-sight) on another widely used multienvironment dataset at a fraction of the time cost, illustrating the robustness of the proposed algorithm. Table 1 presents a summary of the state of the art for the modality fusion problem and the context-aware fusion problem.

Table 1 A summary of the state-of-the-art comparative analysis

3 Background and basics

3.1 Background of data fusion

Data fusion is the process of combining information from heterogeneous sources into a single composite picture of the relevant process, such that the composite picture is generally more accurate and complete than that derived from any single source alone [35, 36]. It often implies the concatenation of datasets that exhibit enormous diversity in terms of information, size, and behavior [37, 38]. Data fusion is based on the abstraction level that is used to simplify reality [39]. Data fusion systems are of three types: cooperative, competitive, and complementary. Abstraction focuses only on the data and processes relevant to the application being built. Modality-based fusion is defined by the interpretation of multimodal input data, which is categorized into four modality types [40]. It can be classified into two classes: same data types, such as images only or text only, and different data types, such as image-video or text-image.

Context/aware-based fusion refers to the ability of a framework to fuse data relative to its specific context via time streams, with the dynamic flexibility to study the data behavior of each context [41].

The multimodal dataset types in multiple contexts are classified into four classes, text, audio, image, and video, as shown in Table 2. Data fusion techniques have three fusion strategy levels, early fusion, late fusion, and hybrid fusion, as shown in Fig. 1 [14, 42, 43]. Multidata fusion combines different data streams to produce information in a more understandable or usable format. It arises from the combination of multisensor data fusion technology and applicable multimodal data fusion (MMDF).

Table 2 Modality data types
Fig. 1
figure 1

Data fusion strategies levels

Multidata fusion must contend with the conflicting nature of data: ambiguity, imbalanced data, uncertain data, and data redundancy [44]. Ambiguity represents uncertainty but is also an essential subject of discourse for those interested in the interpretation of languages, and it serves communicative purposes both in human–human communication and in human–machine interaction [25]. There are four types of ambiguity, arising from phonetics, lexicon, syntax, and semantics; even punctuation and sound can be causes of uncertainty.

On this basis, linguists partition ambiguity into distinct types such as phonetic, lexical, syntactic, and pragmatic ambiguity. Prior work presents four steps to deal with ambiguity: interpreting the context and determining the objective, selecting the correct data sources and suitable techniques, acknowledging the uncertainty and ambiguity, and then evaluating and iterating over the data in each context. Uncertainty that cannot be resolved by probability alone constitutes the true uncertainty of statistical information. Uncertainty is the quantitative estimation of the error present in information; all measurements contain some uncertainty produced by systematic and/or random errors. Recognizing the uncertainty of information is a critical component of reporting the results of scientific analysis [45]. Uncertainty refers to situations in which there is a lack of complete information or knowledge about a particular aspect, leading to ambiguity and unpredictability. Prior reviews of uncertainty distinguish deductive and inductive reasoning. Uncertainty is essentially a lack of information for formulating a decision. Uncertainty challenges are handled by many algorithms, including Bayesian probability, Markov models, Dempster–Shafer theory, and fuzzy theory.

Imbalanced data are a common problem in machine learning, in which one class contains a substantially higher number of observations than the other. This may lead to biased models and poor performance on the minority class. Imbalanced data have a strong effect on model performance [46]. Prior research on techniques for handling imbalanced data includes data augmentation, resampling (over-sampling, under-sampling), the synthetic minority over-sampling technique (SMOTE), ensemble techniques (bagging, boosting), and cost-sensitive learning (evaluation); a small sketch of two of these remedies is given after this paragraph. Data redundancy occurs when the same piece of information exists in several sources, whereas data inconsistency occurs when the same information exists in several formats in numerous tables. Data redundancy arises when multiple copies of the same data are stored in more than one place at a time.
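As an illustration of two of the remedies listed above, the sketch below shows random over-sampling of the minority class and inverse-frequency class weights for cost-sensitive learning; the array shapes, seed, and helper names are illustrative rather than part of the proposed framework.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows until all classes have equal counts."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [], []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        if n < target:
            extra = rng.choice(idx, size=target - n, replace=True)
            idx = np.concatenate([idx, extra])
        X_parts.append(X[idx])
        y_parts.append(y[idx])
    return np.concatenate(X_parts), np.concatenate(y_parts)

def class_weights(y):
    """Inverse-frequency weights for cost-sensitive training."""
    classes, counts = np.unique(y, return_counts=True)
    return {int(c): len(y) / (len(classes) * n) for c, n in zip(classes, counts)}

# Example: 90 samples of class 0 and 10 samples of class 1.
X = np.random.randn(100, 4)
y = np.array([0] * 90 + [1] * 10)
X_bal, y_bal = random_oversample(X, y)
print(np.bincount(y_bal))   # [90 90]
print(class_weights(y))     # {0: ~0.56, 1: ~5.0}
```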

Lack of organization is one of the essential challenges of unstructured data: it lacks inherent organization [47]. Unlike structured data, which is typically organized in databases or spreadsheets, unstructured data lacks predefined categories or labels, making it hard to classify and process. Prior work presents four solutions to avoid data redundancy: creating a master table, normalization, deleting repeated or unused data, or designing a suitable database for integration.

Early fusion, or feature fusion [42], combines all modalities at the feature level and requires a single learning phase; concatenation-based fusion is one of the best-known early fusion techniques. Decision fusion, or late fusion [14], refers to combining the results of different models after building each model independently; it combines the predictions of multiple classifiers to obtain a single classification result per record. Hybrid fusion [43] performs inference twice, which leads to high complexity.
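The toy sketch below contrasts the two main strategies: early fusion concatenates the per-modality features before a single stand-in classifier, while late fusion combines the decisions of separate per-modality classifiers. The weights are arbitrary and only illustrate the data flow, not the framework's models.

```python
import numpy as np

# Toy per-modality features for one sample (e.g., an image and an audio embedding).
f_image = np.array([0.2, 0.9, 0.4])
f_audio = np.array([0.7, 0.1])

def toy_classifier(features, weights, bias=0.0):
    """Stand-in classifier returning a class-1 probability."""
    z = features @ weights + bias
    return 1.0 / (1.0 + np.exp(-z))

# Early (feature-level) fusion: concatenate features, use ONE model.
early_features = np.concatenate([f_image, f_audio])
p_early = toy_classifier(early_features, np.array([0.5, -0.2, 0.8, 0.3, -0.6]))

# Late (decision-level) fusion: one model per modality, combine their outputs.
p_image = toy_classifier(f_image, np.array([0.5, -0.2, 0.8]))
p_audio = toy_classifier(f_audio, np.array([0.3, -0.6]))
p_late = (p_image + p_audio) / 2   # e.g., averaging or majority voting

print(f"early fusion score: {p_early:.3f}, late fusion score: {p_late:.3f}")
```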

3.2 Background of data fusion techniques

Data fusion techniques can fuse extracted data via multiple intelligent devices or sensors and relative metadata from databases to reach enhanced accuracy results [48]. The important fusion techniques are the central limit theorem (CLT), Kalman filter (KF), Bayesian networks (BN), Dempster–Shafer theory (DST), and deep learning (DL) algorithms, as described in brief in the following subsections.

  • Central limit theorem (CLT) provides an understanding of aggregated random variables [49]: it describes the population's random variables in terms of mean, variance, and standard deviation. The mean of the sampling distribution equals the population mean, and its standard deviation is expressed in terms of the population standard deviation.

  • Kalman filter (KF) is an estimation algorithm for estimating the state of a discrete-time controlled process described by a linear stochastic equation. The KF fuses all available information [50] (see the sketch after this list).

  • Bayesian networks produce data fusion measurements; they are a common method applied for multisensor data fusion in static environments [51]. Their probability distributions provide a convenient treatment of suspect data under additive Gaussian noise; however, when other noise influences a multisensor data fusion system, this approach cannot always recover and preserve the original data. The Kalman filter (KF) relies on a purely mathematical approach to problem solving and analysis. The central idea of data fusion is fusing data on the basis of their uncertainties.

  • Dempster–Shafer theory builds on Bayesian theory, which underlies the canonical approach to statistical inference problems; Dempster–Shafer decision theory is a generalization of Bayesian theory [52, 53]. It eases the evaluation of the distribution of propositions and of unions of propositions. Dempster–Shafer is very powerful in systems that recognize the total mutual context facts of the same type in "the frame of discernment θ".

  • Deep learning algorithms require interconnection; for example, the constructed network grows unreasonably fast as the size of the input grows [54].
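As referenced in the Kalman filter item above, the following is a minimal one-dimensional sketch of how a KF fuses noisy scalar measurements with its running prediction; the noise variances are made-up values, not parameters from this work.

```python
import numpy as np

def kalman_1d(measurements, process_var=1e-3, meas_var=0.25):
    """Fuse a stream of noisy scalar measurements into a state estimate."""
    x, p = 0.0, 1.0                  # initial state estimate and its variance
    estimates = []
    for z in measurements:
        # Predict: state unchanged, uncertainty grows by the process noise.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + meas_var)       # gain in [0, 1]
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return np.array(estimates)

# Noisy readings of a true value of 5.0; the estimate converges toward 5.0.
noisy = 5.0 + np.random.default_rng(0).normal(0, 0.5, size=20)
print(kalman_1d(noisy)[-1])
```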

Table 3 shows a comparative study between data fusion techniques based on Strengths, Weaknesses, Opportunities, and Threats (SWOT) analysis.

Table 3 A comparative analysis between data fusion techniques based on SWOT analysis

3.3 Deep learning techniques

A neural network comprises diverse layers connected to each other, modeled on the structure and operation of the human brain. It learns from huge volumes of data and uses complex computations to train the network. Deep learning is a kind of machine learning that achieves superior results through deep preprocessing and feature extraction to improve the learned model. Deep learning is a machine learning strategy that teaches computers to do what comes naturally to humans: learning by example [55].

  • Convolutional neural network (CNN) is designed around the convolutional layer, which is considered the core building block of a CNN. It introduces multiple parameters, including a group of learnable kernel filters. Each filter is convolved across the width and height of the input volume.

  • Artificial neural networks (ANNs) are biologically inspired computational systems. Among the different types of ANNs, this work focuses on multilayer perceptrons (MLPs) with backpropagation learning algorithms. MLPs, the ANNs most commonly used for a wide variety of problems, are based on a supervised method and contain three layers: input, hidden, and output.

  • Recurrent neural network (RNN) is a type of neural network in which the output from the previous step is fed as input to the current step. In conventional neural networks, all inputs and outputs are independent of each other; RNNs address this with the help of a hidden layer. The most important feature of an RNN is its hidden state, which remembers some information about a sequence. This state is also referred to as the memory state because it remembers the previous input to the network. The RNN uses the same parameters for each input because it performs the same task on all inputs or hidden layers to produce the output, which reduces the number of parameters compared with other neural networks.

  • Long short-term memory (LSTM) is a kind of deep, sequential neural network that allows information to persist. It is a special type of recurrent neural network that can deal with the vanishing gradient problem faced by RNNs [56].

  • Transfer learning is a machine learning strategy in which a model developed for one task is reused as the starting point for a model on a second task [57]. It is a popular approach in deep learning, where pretrained models are used as the starting point for computer vision and natural language processing tasks, given the vast computing and time resources required to develop neural network models for these problems and the large jumps in capability they provide on related problems.

  • AlexNet is a convolutional neural network that is 8 layers deep. A pretrained version of the network is trained on more than a million images from the ImageNet database. The trained network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, the network has learned rich feature representations for a wide range of images. The network has an image input size of 227-by-227 [58].

  • GoogLeNet is a convolutional neural network that is 22 layers deep. A pretrained version of the network is trained on the ImageNet dataset. The network trained on ImageNet classifies images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. Reinforcement learning (RL) is a suite of strategies that allows machine learning systems to take decisions sequentially. RL extracts numerous factors and features, and it relies on unknown input and unknown output [58].

  • Attention learning model [59] addresses the bottleneck problem of a fixed-length encoding vector, which is useful but limits access to the data. It is powerful for sequence-to-sequence models and can compute alignment scores, attention weights, and attention context vectors.

3.4 Dempster–Shafer theory

Dempster–Shafer theory (DST) is a theory of evidence that has its roots in the work of Dempster and Shafer and is expressed through the basic Eq. (1). Whereas conventional probability theory is restricted to assigning probabilities to mutually exclusive single events, DST extends this to sets of events in a finite discrete space [60]. DST also gives a more flexible and precise approach to dealing with uncertain data without depending on additional assumptions about the events within an evidential set. By leveraging the special features of this theory, AI systems can better navigate uncertain scenarios, exploiting the potential of different evidentiary types and successfully managing conflicts. Therefore, Dempster–Shafer theory is a capable tool for building AI systems that can handle complex uncertain scenarios. Bayes' theorem is based on the classical notion of probability, whereas Dempster–Shafer theory is a later attempt to permit a broader interpretation of what uncertainty is about [61]. It eases the evaluation of the distribution of propositions and of unions of propositions. Dempster–Shafer is very powerful in systems that recognize the total mutual context facts of the same type in the frame of discernment θ, as expressed by the Dempster–Shafer Eq. (1). Here Θ is not an angle; it denotes the frame of discernment, whose power set determines the number of propositions over which masses (probabilities) are computed.

$$ [\,{\text{Belief}}_{i}(A),\ {\text{Plausibility}}_{i}(A)\,], $$
(1)

The interpretation of this example is "user-A", "user-B", "either user-A or user-B", or "neither user-A nor user-B, it must be somebody else". Each sensor, sensor Si, for instance, contributes its observation by specifying its beliefs over Θ. This function is known as the "probability mass function" of sensor Si, denoted by mi. So, with respect to sensor Si's observation, the probability that "the detected person is user A" is specified by a "confidence interval," as illustrated in Eqs. (2)–(3).

$$ {\text{Bel}}_{i}(A) = \sum_{E_{k} \subseteq A} m_{i}(E_{k}) $$
(2)
$$ {\text{Pl}}_{i}(A) = 1 - {\text{Bel}}_{i}(\bar{A}) = \sum_{E_{k} \cap A \ne \emptyset} m_{i}(E_{k}) $$
(3)

The lower bound of the confidence interval is the belief, obtained as in Eq. (2) by summing over all evidence \(E_{k}\) that supports the given proposition "user A". The plausibility is the upper bound of the confidence interval, computed as in Eq. (3) from all evidence that does not contradict the given proposition. The mass functions of two sensors are combined with Dempster's rule in Eq. (4).

$$ \left( m_{i} \oplus m_{j} \right)(A) = \frac{\sum_{E_{k} \cap E_{k'} = A} m_{i}\left( E_{k} \right) m_{j}\left( E_{k'} \right)}{1 - \sum_{E_{k} \cap E_{k'} = \emptyset } m_{i}\left( E_{k} \right) m_{j}\left( E_{k'} \right)} $$
(4)
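A small sketch of Eqs. (1)–(4) for the two-user example, with mass functions defined over subsets of the frame Θ = {A, B}; the numeric masses are illustrative, not values from the framework.

```python
from itertools import combinations

THETA = frozenset({"A", "B"})

def powerset(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def belief(m, A):
    """Bel(A): total mass committed to nonempty subsets of A (Eq. 2)."""
    return sum(v for E, v in m.items() if E and E <= A)

def plausibility(m, A):
    """Pl(A): total mass not contradicting A (Eq. 3)."""
    return sum(v for E, v in m.items() if E & A)

def combine(m1, m2):
    """Dempster's rule of combination (Eq. 4) for two mass functions."""
    raw = {S: 0.0 for S in powerset(THETA)}
    conflict = 0.0
    for E1, v1 in m1.items():
        for E2, v2 in m2.items():
            inter = E1 & E2
            if inter:
                raw[inter] += v1 * v2
            else:
                conflict += v1 * v2
    return {S: v / (1.0 - conflict) for S, v in raw.items() if v > 0}

# Two sensors' masses over {A}, {B}, and Theta (ignorance).
m1 = {frozenset({"A"}): 0.6, frozenset({"B"}): 0.1, THETA: 0.3}
m2 = {frozenset({"A"}): 0.5, frozenset({"B"}): 0.2, THETA: 0.3}
m12 = combine(m1, m2)
A = frozenset({"A"})
print(belief(m12, A), plausibility(m12, A))   # confidence interval [Bel, Pl]
```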

3.5 Particle swarm optimizer

The particle swarm optimizer (PSO) algorithm is an elegant way of solving difficult problems by imitating how animals work together. PSO employs many small agents that move around to discover the best answer; each agent remembers its own best solution and the best solution from its neighbors [62]. This helps them work together and find the best answer faster. The process of finding ideal values for the particular parameters of a given system that fulfill all design requirements at the lowest possible cost is referred to as optimization, and optimization problems can be found in all areas of science. Particle swarm optimization is a capable meta-heuristic optimization algorithm inspired by swarm behavior observed in nature. PSO is a simulation of a simplified social system; the first aim of the PSO algorithm was to graphically mimic the elegant but unpredictable choreography of a bird flock. Each particle has an associated position, velocity, and fitness value. PSO creates many particles (i) that form a population of (N) particles. Each particle has two properties, position and velocity, and keeps track of its own best location and the global best. The PSO algorithm uses Eqs. (5) and (6), where p refers to position, v refers to velocity, and bestglobal refers to the best optimal point over all computed data.

$$ P_{i}^{t + 1} = P_{i}^{t} + V_{i}^{t + 1} $$
(5)
$$ V_{i}^{t + 1} = wV_{i}^{t} + c_{1} r_{1} \left( P_{{\text{best}}\left( i \right)}^{t} - P_{i}^{t} \right) + c_{2} r_{2} \left( P_{{\text{bestglobal}}}^{t} - P_{i}^{t} \right) $$
(6)

Particle swarm optimization has the fundamental advantage of having few parameters to tune. PSO obtains the best solution through the particles' interaction. The drawbacks of the PSO algorithm are that it easily falls into a local optimum in high-dimensional spaces and has a low convergence rate in the iterative process. The computational complexity of PSO becomes significant when it is applied to solve high-dimensional and complex problems [63].
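A compact sketch of the PSO update rules in Eqs. (5) and (6), applied to a toy objective; the inertia and acceleration coefficients are common textbook values rather than the settings used in this work.

```python
import numpy as np

def pso(objective, dim=2, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5, 5, (n_particles, dim))     # particle positions P_i
    vel = np.zeros((n_particles, dim))               # particle velocities V_i
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()         # best global position
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Eq. (6): velocity update; Eq. (5): position update.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

# Minimize the sphere function; the optimum is at the origin.
best_x, best_f = pso(lambda x: float(np.sum(x ** 2)))
print(best_x, best_f)
```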

4 The proposed adaptive and late multimodal fusion framework

4.1 Framework architecture

The architecture of the adaptive and late multimodal fusion framework, with contextual representation based on evidential deep learning and Dempster–Shafer, relies on creating a multifusion learning model for classifying objects by solving modality/context-based fusion challenges and improving decision-making. It is designed on the basis of the proven, improved mathematical fusion obtained by fully automated control of the combination of deep neural networks with Dempster–Shafer. The architecture is constructed to interpret unstructured, supervised multimodality types and to improve object classification accuracy. It unifies multiple unstructured topologies into one topology of feature matrices with a feature-reduction level related to every object in the datasets. Figure 2 shows the general architecture of the adaptive and late fusion framework, which is designed on the basis of two fusion levels.

Fig. 2
figure 2

The general architecture of the adaptive multifusion framework

The adaptive multifusion framework is designed on the basis of two fusion levels. The first is model-based fusion and the second is feature-based fusion. Moreover, it is implemented in five layers: a software-defined fusion layer, a preprocessing layer, a dynamic classification layer, an adaptive fusion layer, and an evaluation layer. This section discusses these two levels and their layers; the detailed algorithmic steps of all layers are presented in Appendix A.

4.1.1 Fusion level (1): Model fusion level

The first fusion level, model-based fusion, interprets multiple topologies with different modalities and diverse characteristics. This level extracts new correlations between the modality dataset inputs based on weight, priority, reduction level, and the extracted relationship. It consists of two layers, the software-defined fusion layer and the preprocessing layer. It evaluates the mathematically proven weight and priority dynamically, counts the modality data types, and measures the modality dataset size and the number of modality data items of each type.

4.1.1.1 Layer (1): Software-defined fusion layer

The software-defined fusion layer is a controller for creating the proposed correlation between multiple dataset inputs. It is constructed over five dimensions: modality data type, modality data number, modality dataset size, the weight of feature-relationship interpretation, and the relationship weights defining the priority of each modality. Software-defined fusion extends the software-defined terminology, which refers to a software controller or the management of an application programming interface (API), as in a software-defined network. This research presents proven original equations for controlling the inputs as follows:

  • Multimodality adaptation for multiple modality inputs refers to the dynamic number, type, and size of interrelated input data. It interprets the inference of four modality data types (image, text, audio, and video) and deals with multiple input numbers as defined by Eq. (7).

    $$ I\left( n \right) = \mathop \sum \limits_{i = 1}^{N} \mathop \sum \limits_{j = 1}^{x} Dt_{xN} $$
    (7)
  • Multimodality relationships, weight and type, refer to the weight value over all modality data numbers and data sizes. It interprets the inferred relationships between modalities. The weight factor of each dataset is computed from the relationships between that dataset and its neighboring datasets, as defined by Eq. (8).

    $$ I\left( w \right) = \frac{{\mathop \sum \nolimits_{1}^{n} Dt_{x1N1} }}{{\mathop \sum \nolimits_{x}^{N} Ds_{xN} }} $$
    (8)

    The default weight factor is computed for each modality input dataset as the division of the current dataset size by the biggest dataset size. The main questions are how to compute the weight of each modality dataset and how it impacts the relationships of the extracted features. Because there is no previous information about the conditions of object classification, the main goal of the extracted weight is to count the size of each dataset and to quantify the relationship among all dataset sizes.

  • Multimodality priority refers to the importance of a modality with respect to all datasets, based on the relationship between each dataset and the smallest dataset. This priority relies on the subtraction of the smallest modality dataset size from each modality dataset size, divided by the summation of the total sizes of all input modalities, as given in Eq. (9).

    $$ P(f) = \frac{\sum_{1}^{n} Dt_{x_{c}N_{c}} - \sum_{1}^{n} Dt_{x_{l}N_{L}}}{\sum_{x}^{N} Ds_{xN}} $$
    (9)

    This interprets, for example, that each patient has one X-ray, i.e., a 1–1 relationship. The same problem shows that the disease classification accuracy for patients differs according to the types and number of modalities interpreted. The modality priority is then applied in Eq. (10) by multiplying it with the temporary accuracy (TempAcc) of the default suitable classification results.

    $$ T\left( {P\left( {{\text{DTxn}}} \right)} \right) = P * {\text{TempAcc}} \left( {{\text{DTxn}}} \right) $$
    (10)
  • Context adaptation for diverse domains refers to the reduction-level filter that adapts to the domain in diverse contexts. It improves object classification by reducing uncertainty across multiple features. The computed reduction level of domain adaptation is a suggested filter to improve the object classification of offline supervised learning, creating a model for improving decision-making, as shown in Eq. (11).

    $$ f\left( {\text{RL}} \right) = \sum_{Dt = 1}^{n} {\text{Rweight}} + {\text{Mpriority}} $$
    (11)

    This explores the one-to-many relationship. The importance of the computed reduction level, which is the sum of the weight and priority of each modality dataset, is reflected in the changed weights of the neural networks, as given in Eq. (12). The affected neural network, through the multiplication of the reduction level with the inputs and the addition of the biases of each neuron, improves the features of the data output.

    $$ W = \sum_{{\text{DTxn}}} \left( {\text{Rl}} \times {\text{inputs}} \right) + {\text{bias}} $$
    (12)

    This equation constitutes the proven mathematical control for different topologies, with the proposed correlation extracted across multimodality in multicontext (unknown context) settings, as detailed in Appendix B. The difficulty of implementing this layer lies in managing the unstructured multimodality types and characteristics with an unknown multicontext.

  • Improvement of ambiguity, uncertainty, imbalance, and redundancy strategies: the developed strategies address the challenges of data ambiguity, uncertainty, imbalance, and redundancy; the following discusses the methods developed within the framework to mitigate these issues. A newly presented ambiguity strategy can interpret negative meanings by identifying and validating meaningful features of one word (e.g., dislike, less), two-word phrases (e.g., not good work), or three-word phrases (e.g., not work efficiently) and converting them into a numerical flag (0, 1) that characterizes and distinguishes them compared with existing approaches. The experimental results demonstrate an improvement in the classification rate when these new ambiguity classes are considered.

This research addresses ambiguity in text preprocessing so that the important features of multimodal ambiguity are recognized dynamically in the classification learning model, adapting to multiple contexts; a small sketch of the negation-flag idea follows this paragraph. Reducing the uncertainty across multiple inputs creates a fusion learning model with a reduced feature vector using the improved Dempster–Shafer theory with two fusion techniques; this improves the accuracy of object classification with dynamic belief and evidence that reduce the uncertainty of the probabilities. The strategy for handling imbalanced data addresses the case of multiple inputs whose modality numbers cannot easily be extracted and counted. Prior research requires duplicating the dataset or applying a classifier to the data for checking (such as nearest neighbor or bagging with random forest), which takes a long time. There is a limit on the interval of relative data to fuse, from [1 to 1/10] of the data; the relative limit cannot fuse data smaller than the related data. The importance of the weighted value is that it captures more detail of the interrelated data to improve the complementary data. Redundant data are handled by creating a bridge controller table that converts a many-to-many relationship into two one-to-many relationships with respect to features and time conditions.
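A minimal sketch of the negation-flag idea described above; the phrase list and function name are illustrative and not the framework's exact lexicon.

```python
import re

# Illustrative lexicon of negative cues of one to three words.
NEGATIVE_PHRASES = [
    "not work efficiently", "not good work",   # three- and two-word phrases
    "dislike", "less",                          # single words
]

def negation_flag(text: str) -> int:
    """Return 1 if the text contains a negative cue, else 0."""
    lowered = re.sub(r"\s+", " ", text.lower())
    return int(any(phrase in lowered for phrase in NEGATIVE_PHRASES))

print(negation_flag("The device does not work efficiently"))  # 1
print(negation_flag("The scan quality is acceptable"))         # 0
```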

The proposed strategy can fill the gap between real dataset tables at a lower grain. It is designed for text-sheet datasets and image datasets, and it can also reconcile different time points with common features across the data. The strategy follows these steps: deleting concurrent duplicates, creating near-time bridge controller tables, creating frequency tables to delete unrelated data, and reducing the data against the tuned data. Identifying the complex relationships between the various modality data types by extracting the relationships via the behavioral matrix can be used as an analysis tool, improving the design of the adaptation rules by helping to understand the relationships between the evolving behavior of the users and the interaction between the available modalities. It can enhance data fusion models based on the presented multifusion learning model, which can be more efficient in determining the extraction conditions and enhancing object classification accuracy across different modalities and systems. The automated extraction conditions are counted and extracted based on the number of features. The output is highly dependent on the input datasets when tracing the modality type and the dynamic number of feature vectors. A sketch of the weight, priority, and reduction-level computation follows.
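Under the readings of Eqs. (8), (9), and (11) given above (weight relative to the largest input, priority relative to the smallest, reduction level as their sum), and assuming dataset size in samples as the quantity behind Dt/Ds, a minimal sketch of the software-defined controller's bookkeeping could look as follows; the names and rounding are illustrative, and the paper's exact normalization may differ.

```python
def modality_controller(dataset_sizes):
    """dataset_sizes: dict mapping modality name -> number of samples."""
    total = sum(dataset_sizes.values())
    largest = max(dataset_sizes.values())
    smallest = min(dataset_sizes.values())
    control = {}
    for name, size in dataset_sizes.items():
        weight = size / largest                 # Eq. (8): relative to the biggest input
        priority = (size - smallest) / total    # Eq. (9): relative to the smallest input
        reduction = weight + priority           # Eq. (11): reduction-level filter
        control[name] = {"weight": round(weight, 3),
                         "priority": round(priority, 3),
                         "reduction_level": round(reduction, 3)}
    return control

# Example: three modality inputs of different sizes.
print(modality_controller({"xray_images": 7000, "text_records": 7000, "audio": 1000}))
```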

4.1.1.2 Layer (2): Preprocessing layer

The preprocessing layer tunes the data of the different modality dataset types (images, videos, audio, or text), which can affect the weight and priority measurements. Any change in the preprocessing of any data will change the extracted correlation measurements based on the parameters measured by the software-defined layer. Each modality type has its own preprocessing based on normalization, cleaning, and augmentation. The importance of this layer is automated preprocessing for the different data topologies while interpreting heterogeneous, unstructured modality input types.

The preprocessing layer is designed for the different data topology layouts (image, text, audio, and video). In addition, it selects the suitable deep learning technique to work in parallel in the next layer. Two types of preprocessing are applied depending on dataset size and tuning. The default automated preprocessing for the four data topology layouts is as follows: image preprocessing does not require increasing the number of images; text preprocessing performs normalization and fills missing data with null; audio preprocessing converts signals to spectrograms; and video preprocessing splits the video into image frames, computes the number of frames, and sorts the frames.

The proposed framework then infers a more dynamic training configuration, providing more options for the multiple topologies. Image: increase the number of images with augmentation, adding or removing noisy data, rotation, scaling, reflection, and cropping. Text: cleaning, normalization, and applying the trade-off between filling missing data with null, removing missing data, removing outliers, and determining and replacing the data fill. Audio: converting to spectrograms, adding noisy data, and augmenting data. Video: splitting the video into image frames, computing the number of frames, sorting the frames, and determining the time scale of the video; augmentation is not required, but the number of video frames must be limited with respect to time, which can be normalized or zero-centered. The preprocessed data change the measurement of the proposed correlation through the computed weight and priority. This layer works with default properties to prepare the modality dataset inputs for the suitable deep learning techniques. The difficulty of implementing this layer lies in managing the unstructured multimodality types and characteristics with an unknown multicontext. The outputs of this layer are the processed modality datasets. A minimal per-modality dispatch sketch follows.
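A minimal per-modality dispatch sketch of the default preprocessing described above; each branch is a simple stand-in for the corresponding routine (normalization, null filling, spectrogram conversion, frame splitting), not the framework's actual implementation.

```python
import numpy as np

def preprocess(modality, data):
    """Route each modality input to a default preprocessing routine (stand-ins)."""
    if modality == "image":
        # Normalize pixel values to [0, 1]; augmentation would be added here.
        return np.asarray(data, dtype=float) / 255.0
    if modality == "text":
        # Normalize case/whitespace and fill missing entries with a null token.
        return [(t.strip().lower() if t else "<null>") for t in data]
    if modality == "audio":
        # Stand-in for spectrogram conversion: magnitude spectrum of the signal.
        signal = np.asarray(data, dtype=float)
        return np.abs(np.fft.rfft(signal))
    if modality == "video":
        # Split into frames (here: the first axis) and keep their order.
        return list(np.asarray(data))
    raise ValueError(f"unsupported modality: {modality}")

print(preprocess("text", ["  COUGH Detected ", None]))   # ['cough detected', '<null>']
```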

4.1.2 Fusion Level (2): Feature fusion level

Fusion level two, the feature-based fusion level, is designed on the basis of feature-level fusion. This level relies on deep learning to convert different topologies into one topology of reduced matrices. It aims to improve the adaptive fusion with the improved Dempster–Shafer technique using a larger set of filtered features and to improve the accuracy evaluation. This level extracts the proposed classification learning model between the modality dataset inputs into unified matrix topologies with reduced features for all objects in the datasets. It consists of three layers: the dynamic classification layer, the adaptive fusion layer, and the evaluation layer. It evaluates the accuracy optimization using the particle swarm optimizer to reach the best optimal accuracy point.

4.1.2.1 Layer (3): Dynamic classification layer

The third layer is an automated layer for improving multiobject classification and object detection by selecting a suitable neural network according to the input data types. The dynamic classification layer matches the appropriate neural networks to the modality types and numbers of the input data, which are image, video, audio, and text. The dynamic deep learning layer contributes in part to converting all modalities of the different input topologies into one topology of converted matrices with reduced features. In addition, it uses the sigmoid function to extract learned feature vectors from the various topologies, outputting numbers between zero and one that describe how much of each component should be let through. It prepares the feature and object vectors as the input of the adaptive fusion layer. The dynamic deep learning layer performs feature extraction and data reduction, converting multiple topologies into one topology of feature matrices. Feature extraction is the operation of transforming the modality input datasets into numerical data features that can be processed while preserving the information in the original dataset. The main goal of feature extraction in the deep learning layer is to achieve better results than applying machine learning directly to a single modality, by learning from multiple modalities instead. The dynamic deep learning layer relies on automated feature extraction and uses specialized deep learning techniques to extract features automatically from text, images, audio, or video without human intervention. This can be very powerful for producing higher results and more detailed features of the data objects, and for moving more quickly from raw data to machine learning algorithms. Previously, this was done through specialized feature detection, feature extraction, and feature matching algorithms. Nowadays, deep learning is very popular in image and video analysis and is known for its ability to take raw image data as input, skipping the handcrafted feature extraction step. Regardless of the approach, computer vision applications such as image registration, object detection and classification, and content-based image retrieval require an effective representation of image features, either implicitly through the first layers of the deep network or by explicitly applying some of the longstanding image feature extraction techniques.

4.1.2.2 Layer (4): Adaptive fusion layer

The adaptive fusion layer is designed on the basis of the improved mathematical Dempster–Shafer fusion, with dynamic probabilities and parameters counted from each dataset, and concatenation fusion working in parallel. This layer consists of two classifiers, (1) the dynamic CNN evidence classifier and (2) the CNN concatenation fusion classifier, working in parallel to improve the classification accuracy.

First, the evidential Dempster–Shafer neural network creates a classifier that automatically computes the belief and evidence, which form the output of the classifier. It creates a belief vector and an evidence vector for each object. This classifier uses convolutional and pooling layers to first extract high-dimensional features from the input modality datasets. The features are then transformed into mass features and summed in vectors.

The first classifier, the dynamic CNN evidence classifier, handles the modality dataset types using the following CNN structure (a code sketch of this structure follows the list):

  • Input layer: the image input layer specifies the size of the image, in this case 28 × 28 × 1. These numbers correspond to the height, width, and channel size.

  • The digital data here are grayscale images, so the channel size (color channel) is 1. For color images, the channel size is 3, corresponding to the RGB values. For a convolutional layer with a default stride of 1, "same" padding ensures that the spatial output size is the same as the input size.

  • Batch normalization layers normalize the activations and gradients propagated through the network, making training the network a simpler optimization problem. Use batch normalization layers between convolutional and nonlinear layers, such as ReLU layers, to speed up network training and reduce network initialization sensitivity.

  • ReLU layer, the batch normalization layer is followed by a nonlinear activation function. The most common activation function is the rectified linear unit (ReLU).

  • Max pooling layer, convolutional layers (with activation functions) are sometimes followed by a down sampling operation that reduces the spatial dimension of the feature map and removes redundant spatial information. Down sampling allows you to increase the number of filters in deeper convolutional layers without increasing the amount of computation required for each layer. The max pooling layer returns the maximum value of the input's rectangular regions, specified by the first argument, pool size.

  • Fully connected layer: the convolution and down-sampling layers are followed by one or more fully connected layers. As the name suggests, a fully connected layer is a layer in which neurons connect to all neurons in the previous layer. This layer combines all the features learned by previous layers on the image to identify larger patterns. The final fully connected layer combines the features to classify the images; therefore, its Output Size parameter equals the number of classes in the target data. In this example, the output size is 10. A key feature of Dempster–Shafer theory is its handling of ignorance, such that the masses over all events accumulate to 1; ignorance is reduced by adding more and more evidence. Combination rules are used to combine different types of evidence. The main advantage of Dempster–Shafer is that adding more information reduces the period of uncertainty, so DST reaches a much lower level of ignorance. A diagnostic hierarchy can be represented with it, and people faced with such problems have the freedom to reason about the evidence. The main limitation is the high computational effort, since this research deals with 2^n sets.

The second classifier, the CNN concatenation fusion classifier, is designed on the basis of a convolutional neural network with learned data to obtain a large number of features. The concatenated neural network classifies the characteristics estimated for the multimodal data collected by multiple sources/sensors in offline mode and produces the classifier output. Concatenation fusion has two main characteristics: it yields a greater number of features, and it does not require learning the data before concatenating the features from the various datasets. Its major advantages are that more information is added and the uncertainty interval is reduced.

The filter classifier filters the classification by subtracting the redundant feature classes detected in the first vector from the classes detected in the second vector, which is equivalent to the initialization of the data fusion algorithm used; the output is the fused vector.

The reduction level is interpreted through the weights of the various relationships between parameters (Pweight) and the priority values between the input modality types (Mpriority), as proven in Proof #4. The relationships concern the parameters and the relationships among them. The reduction level is based on similar or different vectors. Data reduction relies on the parameters and their relationships or conditions among themselves and on the priority between the modality data inputs.

  • SoftMax Layer, the SoftMax activation function normalizes the output of the fully connected layer. The output of the SoftMax layer consists of positive numbers that sum to 1, which can then be used by the classification layer as the classification probability.

  • Classification Layer, the last layer is the classification layer. This layer uses the probability returned by the SoftMax activation function for each input to assign the input to one of the mutually exclusive classes and calculate the loss.

  • The deep neural network works as follows: (a) the weighted sum of the inputs is calculated; (b) the bias is added; (c) the result is fed to an activation function; (d) a specific neuron is activated.

  • The improvement of Dempster–Shafer lies in the automated neural networks and in obtaining a larger number of features. The improved Dempster–Shafer aims to support multilabel classification. The adaptive fusion approach makes two fusions on two different levels, high and low: it runs the Dempster–Shafer and concatenation fusions in parallel, then extracts the unimportant features and reduces them.
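A minimal sketch of the layer sequence listed above (28 × 28 × 1 input, convolution/batch-normalization/ReLU/max-pooling stages, a fully connected layer with 10 outputs, and SoftMax), written in PyTorch for illustration only; the filter counts are assumptions, and the framework itself is implemented in MATLAB.

```python
import torch
import torch.nn as nn

# Two conv/BN/ReLU/max-pool stages followed by a 10-class fully connected head.
cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # convolution, "same" padding
    nn.BatchNorm2d(8),                           # batch normalization
    nn.ReLU(),                                   # nonlinear activation
    nn.MaxPool2d(kernel_size=2, stride=2),       # down-sampling to 14x14
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),       # down-sampling to 7x7
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),                   # fully connected layer, 10 classes
    nn.Softmax(dim=1),                           # class probabilities summing to 1
)

x = torch.randn(4, 1, 28, 28)                    # a batch of four grayscale images
print(cnn(x).shape)                               # torch.Size([4, 10])
```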

The contribution of this layer is the improvement of the computation and extracted features of the Dempster–Shafer fusion by concatenation fusion, improving the fused multiobject classification from diverse multimodality data in a multicontext or unknown context. The adaptive fusion layer draws the full picture of the models' classification, providing a unified target for the multiple sensory data classifications in various smart environment systems. The present work executes majority voting over the diverse CNN-based pretrained models with an adaptive fusion technique. Thanks to the simplicity and expressiveness of the DS formalism, the outputs of an evidential classifier provide more information than ordinary classifiers (e.g., a neural network with a sigmoid feature-extraction layer) that transform an input feature vector into a probability distribution or any other distribution. A sigmoid function supports multilabel object classification. The importance of this layer lies in converting the different vectors from the various matrices of topology layouts. The difficulty of implementing this layer lies in managing the unstructured multimodality types and characteristics with an unknown multicontext. The output is a reduced, featured, filtered vector.

The output of the implemented Dempster–Shafer neural network is a numerical vector holding two temporary vectors for each object, called belief and evidence. The output of Dempster–Shafer is the tuned fused features. The proposed solution is constructed by building the classification of fused objects using a dynamic neural network with a computed number of parameters based on the data perspective in diverse contexts. The output of the adaptive fusion layer is tuned, classified numerical feature vectors with two temporary belief and evidence vectors. A sketch of the parallel fusion and filtering step follows.
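A simplified sketch of the parallel fusion and the filter classifier described above: the evidential branch is approximated by combining two classifiers' SoftMax outputs as singleton mass functions with Dempster's rule, and the filter step drops classes already detected by the first vector. Both are simplifications of the layer's actual computation; the class names and scores are illustrative.

```python
import numpy as np

def ds_combine_softmax(p1, p2):
    """Treat two classifiers' SoftMax outputs as singleton mass functions and
    combine them with Dempster's rule (all mass on singleton classes)."""
    joint = p1 * p2                       # agreement on each singleton class
    conflict = 1.0 - joint.sum()          # mass assigned to conflicting pairs
    return joint / (1.0 - conflict)

def filter_redundant(classes_a, classes_b):
    """Filter classifier: drop classes already detected by the first vector."""
    return [c for c in classes_b if c not in classes_a]

# SoftMax outputs of the evidential branch and the concatenation branch.
p_evidential = np.array([0.70, 0.20, 0.10])
p_concat     = np.array([0.60, 0.30, 0.10])
print(ds_combine_softmax(p_evidential, p_concat))   # sharpened fused distribution

# Detected class labels from the two branches, before and after filtering.
print(filter_redundant(["tank", "truck"], ["truck", "helicopter"]))  # ['helicopter']
```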

4.1.2.3 Layer (5) Evaluation layer

The evaluation layer has two parts: (1) evaluating the training accuracy and (2) optimizing the results for multiple smart systems. It improves accuracy results to 96–98% in various contexts. The experiments are applied to various multimodal inputs for diverse contexts that share the common factors of smart systems, such as smart military and smart health. The layer measures the accuracy and optimization results in multiple smart context systems. It splits the data into two types, training data and testing data, and measures accuracy, precision, recall, and F1 [64, 65]. The training applies particle swarm optimization to improve the accuracy evaluation, changing the hyperparameters 30 times to reach the best point. The importance of this layer is the training of the data to reach the best-accuracy solution point. The difficulty of implementing this layer lies in managing the unstructured multimodality types and characteristics with an unknown multicontext. The contribution of this layer is achieving the best accuracy result by changing hyperparameters to reach the best optimized point. The output is a featured, filtered vector. The multiclass deep learning model is designed so that the neural network can make multiclass predictions; it computes the confidence in the SoftMax output.

4.2 The inputs and outputs of adaptive multifusion framework

The architecture includes five layers of the adaptive multifusion framework based on two fusion levels, which are described in Table 4.

Table 4 An adaptive multifusion framework based on two fusion levels (inputs/output)

5 Datasets characteristics for multimodality on multicontext

The modality data types are described in Table 5. The description of the modality datasets in multiple contexts is limited to tracing from 1 to 16 modality inputs, as shown in the following experiments. This section presents a comparative accuracy analysis between the proposed adaptive fusion model using deep learning and Dempster–Shafer fusion and the concatenation fusion model. The experiments are designed to be generic and to adapt to multimodality in multiple contexts, interpreting the data perspective of each dataset based on the target of complementary data, whether interrelated data such as patients' data and metadata, or the complementary fusion of the same objects in diverse datasets, for example weapons datasets. This research classifies multimodality datasets by interpreting the modality data types and numbers without known conditions or a known context, although all experimental datasets satisfy the data criteria. The adaptivity to multiple contexts is shown to be applicable to the diverse experimental datasets: smart military with three inputs of the same modality, smart health with two inputs of different modalities, smart dietary health with three modality inputs, and smart agriculture with four modality inputs.

Table 5 A description of experimental datasets for multimodality datasets in multicontext

5.1 Dataset (1): Smart military data sets

Dataset 1 is extracted from three sources [66,67,68]. This dataset aims to classify military objects from three inputs of the same modality type. The dataset comprises 30,000 images across the intersective spectrum, visual-insensitive spectrum, and RGB images; samples are shown in Fig. 3. These data are balanced, as described in Table 6.

Fig. 3

Samples of smart military dataset in diverse spectrums

Table 6 Smart military dataset description of size and modality data type

5.2 Dataset (2): smart agriculture dataset

It is extracted from sixteen sources in [69]. This dataset aims to classify leaf-disease objects from inputs of the same modality type. The dataset comprises 2,282,829,720 augmented RGB images of leaf diseases; samples are shown in Fig. 4. These data are imbalanced, as described in Table 7.

Fig. 4

Samples of smart agriculture dataset in diverse spectrums

Table 7 Smart agriculture dataset description

5.3 Dataset (3) Smart health COVID-19 data sets

Dataset 3 is extracted from two sources in [70, 71]. This dataset aims to classify COVID-19-infected subjects from two inputs of different modality types. The dataset comprises 7,000 text records and 1,000 audio recordings drawn from COVID-19 patient records and cardio cough-audio datasets; samples are shown in Fig. 5. These data are imbalanced, as described in Table 8.

Fig. 5

Samples of smart health COVID-19 dataset with different bimodalities types

Table 8 Smart COVID-19 health dataset description of size and modality data type

5.4 Dataset (4) Smart dietary health

Dataset 4 is extracted from two sources in [72]. This dataset aims to classify dietary objects from inputs of different modality types. The dataset comprises 6265 text records from a smart watch, 3657 text records from a mobile sensor, and 2586 images; samples are shown in Fig. 6. These data are imbalanced, as described in Table 9.

Fig. 6

Samples of smart dietary health dataset (4) with different tri-modality types

Table 9 Smart dietary dataset description of size and modality data type

6 Experimental results, analysis and discussion

The adaptive multifusion framework is implemented with a graphical user interface (GUI) to make it easier and simpler to use, and the framework is reusable across multiple information systems sharing mutual properties of multiple sources. The proposed solution is demonstrated by an implementation in MATLAB R2022b that constructs the Adaptive Smart Environment Multimodal System (ASEMMS) [73].

The accuracy evaluation measurement computes the classification accuracy of the various classification models. Precision and recall are useful measures of prediction success when the classes are imbalanced [74, 75], as summarized in Table 10, and the F1-measure is defined as the harmonic mean of precision and recall. Particle swarm optimization (PSO) is a computational optimization method inspired by social behavior [76, 77].

Table 10 The accuracy measurements

The precision and recall measures are defined in Eqs. (13) and (14):

$$ \text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} $$
(13)
$$ \text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} $$
(14)

These quantities are combined in the F1 score, defined in Eq. (15):

$$ \text{F1-measure} = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}} $$
(15)
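As a small worked illustration of Eqs. (13)–(15), the helper below computes the three measures from confusion counts; the counts and the function name `precision_recall_f1` are placeholders for illustration, not values from the reported experiments.

```python
def precision_recall_f1(tp, fp, fn):
    """Eqs. (13)-(15): precision, recall and F1 from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * recall * precision / (recall + precision)) if (recall + precision) else 0.0
    return precision, recall, f1

# Hypothetical confusion counts for one class of one experiment.
p, r, f1 = precision_recall_f1(tp=95, fp=3, fn=2)
print(f"precision={p:.3f}, recall={r:.3f}, F1={f1:.3f}")
```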

6.1 Experiments and results analysis

6.1.1 Experiment (1) Smart military: an experiment for same-modalities fusion

The experimental results include the tracing of the changing hyperparameters against the classification accuracy, as shown in Tables 11 and 12. The variables reported in Table 12 are defined as follows:

1. Iter: the iteration number, indicating which evaluation of the optimization process is being recorded.
2. Eval result: the evaluation result of each iteration, which can be a performance measure or another metric.
3. Objective: the value of the objective function being optimized.
4. Objective runtime: the run time of the objective-function evaluation for the current iteration.
5. BestSoFar (observed): the best observed result reached so far during the optimization process.
6. BestSoFar (estim.): an estimate of the best reachable result, based on extrapolation or estimation techniques.
7. Section depth: the depth explored by the optimization algorithm.
8. Initial learn rate: the learning rate used by the gradient-based learning algorithm.
9. Momentum: a parameter of optimization algorithms such as stochastic gradient descent that determines the contribution of past gradients to the current update.
10. L2 regularization: a penalty term tuned in the optimization experiment that helps avoid overfitting in machine learning models.

The best point achieved is shown in Table 13, together with the fused output numerical vector. A minimal sketch of the particle swarm search loop that produces such a trace is given after Table 13.

Table 11 A Comparative analysis of results between accuracy before optimization and after optimization
Table 12 First experiment tracing analysis of optimization of Adaptive fusion-based particle swarm optimizer
Table 13 The first experiment best fit optimizer record results
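As a rough illustration only, the following Python sketch runs a small particle swarm search over the three hyperparameters listed above (initial learn rate, momentum, L2 regularization) and prints a trace analogous to the Iter, Eval result, and BestSoFar (observed) columns of Table 12. The objective function, search ranges, swarm size, and PSO coefficients are all assumptions made for the illustration; in the framework, the objective evaluation is the training and validation of the fusion network in MATLAB.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed search space: [initial learn rate, momentum, L2 regularization].
lo = np.array([1e-4, 0.5, 1e-6])
hi = np.array([1e-1, 0.99, 1e-2])

def objective(x):
    """Stand-in objective (e.g., validation error); the real framework
    would train and evaluate the fusion network here."""
    lr, mom, l2 = x
    return (np.log10(lr) + 2.5) ** 2 + (mom - 0.9) ** 2 + 50 * l2

n_particles, n_iter = 6, 30          # 30 hyperparameter updates, as in the text
pos = lo + (hi - lo) * rng.random((n_particles, 3))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([objective(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for it in range(1, n_iter + 1):
    r1, r2 = rng.random((2, n_particles, 3))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    vals = np.array([objective(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()
    # Trace columns analogous to Table 12: Iter | Eval result | BestSoFar (observed)
    print(f"Iter {it:2d}  eval={vals.min():.5f}  best-so-far={pbest_val.min():.5f}")
```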

The adaptive multifusion framework on the smart military context achieves an accuracy of 98.8% on the same tri-modalities, as shown in Fig. 7.

Fig. 7

The adaptive multifusion framework experiment on the smart military context achieves an accuracy of 98.8% for fusion via dataset (1) of the same tri-modalities

Table 13 introduces the tracing of the optimization analysis of adaptive fusion, in which the particle swarm optimizer automatically searches for the best optimal points. There are four best points, of which the best achieves an estimated objective function value of 0.045845.

6.1.2 Experiment (2) Smart agriculture: an experiment for same modalities fusion

The experiment presents a comparative analysis of accuracy before and after optimization, as shown in Table 14.

Table 14 A comparative analysis of results between accuracy before optimization and after optimization

The experimental results include the tracing of the changing hyperparameters against the classification accuracy, as shown in Table 15; the best point achieved is shown in Table 16, together with the fused output numerical vector. The adaptive multifusion framework experiment on the smart agriculture context achieves an accuracy of 98.5% on the same multimodalities, as shown in Fig. 8.

Table 15 Second experiment tracing analysis of optimization of the hybrid Adaptive fusion model, deep learning-based Dempster–Shafer fusion model, and deep learning-based concatenation fusion model
Table 16 The second experiment best-fit optimizer record results
Fig. 8

The adaptive multifusion framework experiment on the smart agriculture context achieves an accuracy of 98.5% for fusion via dataset (2) of the same multimodalities

6.1.3 Experiment (3) Smart COVID-19 Health with different modalities

The experimental results include the tracing of the changing hyperparameters against the classification accuracy, the best point, and the fused output numerical vector, as shown in Table 17.

Table 17 A comparative analysis of results between accuracy before optimization and after optimization of experiment three

The adaptive multifusion framework experiment on the smart COVID-19 health context achieves an accuracy of 97.6% for fusion via dataset (3), described in Sect. 5, of the different multimodalities, as shown in Fig. 9.

Fig. 9

The adaptive multifusion framework experiment on the smart COVID-19 health context achieves an accuracy of 97.6% for fusion via different multimodalities

6.1.4 Experiment (4) Smart dietary health with different modalities

The experimental results include the tracing of the changing hyperparameters against the classification accuracy, the best point, and the fused output numerical vector, as shown in Table 18.

Table 18 A comparative analysis of results between accuracy before optimization and after optimization for experiment 4

The adaptive multifusion framework experiment on the smart dietary health context achieves an accuracy of 95.9% for fusion via dataset (4), described in Sect. 5, of the different multimodalities, as shown in Fig. 10.

Fig. 10

The adaptive multifusion framework experiment on the smart dietary health context achieves an accuracy of 95.9% for fusion via dataset (4) of the different multimodalities

6.2 Comparative analysis and discussion

The first comparative analysis compares the proposed adaptive multifusion model with two prior fusion models [20] and [24], as shown in Table 19. The comparison considers the multimodal fusion models' properties: modality data type, modality number, data fusion level, interpreted context, experimental dataset, and weaknesses.

Table 19 A comparative analysis between proposed adaptive fusion model and previous models

Table 20 compares the accuracy before and after optimization across the four experiments for the three models, optimized with the Bayesian optimizer and the particle swarm optimizer.

Table 20 A comparative analysis of accuracy before and after optimization for four experiments for the three models with optimization with Bayesian optimizer and particle swarm optimizer

The second comparative analysis compares the proposed adaptive framework with three multimodal frameworks [78,79,80], as shown in Table 21. The comparison considers the multimodal frameworks' properties: modality data type and modality number, data fusion level, interpreted context, experimental dataset, and weaknesses. The proposed adaptive framework solves several drawbacks of [78,79,80]: the three previous frameworks cannot interpret multimodal input in diverse contexts to improve object classification. The advantages of the proposed framework are the dynamic interpretation of multimodality types and numbers (based on the data perspective rather than the context perspective), the ability to excavate the relationships between multimodalities, the automatic handling of both the same and various multimodalities (text, audio, image, and video), and high classification accuracy for single- and multiobject classification. In addition, the proposed adaptive framework addresses the redundant fused data problem and the high-level abstract data problem, which is based on low-level features, by removing redundancy from the fused vector data. It is designed on deep neural network models that combine Dempster–Shafer fusion with concatenation fusion, and it offers a development implementation and user interface at the cost of a highly complex implementation.

Table 21 A comparative analysis between proposed adaptive fusion framework and previous fusion frameworks

The proposed adaptive multifusion framework can be applied to data or big data, whether of the same or different types, acquired via smart sensors or intelligent devices, provided the data meet the criteria presented in the Preliminaries. Two types of datasets satisfy the proposed criteria: (1) data of the same type combined from different sources to classify objects, and (2) multiple-object data of different types or characteristics combined to achieve a unified classification of objects that are often interrelated.

The experimental results in Fig. 11 represent the average of the total experimental classification of multimodalities and present the behavior of the classification accuracy across the experiments for the various modality inputs.

Fig. 11

The behavior analysis of accuracy classification results and classification fusion results in many experiments for various modality inputs

The experimental results in Fig. 12 present the behavior of the fusion techniques, namely the adaptive multimodal fusion model, the Dempster–Shafer model, and the concatenation fusion model, across the experiments for multimodal inputs in multicontext.

Fig. 12

The behavior analysis of fusion techniques in many experiments for multimodal inputs in multicontext

The experimental results in Fig. 13 illustrate the comparative accuracy analysis between the proposed fusion model of the adaptive multifusion framework, the concatenation fusion model, and the Dempster–Shafer fusion model, based on the four experimental results across diverse modality data types and numbers. The average accuracy of the proposed multimodal fusion framework is 97.45%, with a reduced feature level for the multiclass fused multifusion learning model; this makes it better than the concatenation fusion model by 28.5% and better than the Dempster–Shafer fusion model by 7.075%. The average accuracy of the concatenation fusion model is 68.925% with a large number of features, and the average accuracy of the Dempster–Shafer fusion model is 90.375% with limited features.
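For clarity, the reported margins are simply the differences of these average accuracies (the first difference is rounded to one decimal place in the text):

$$ 97.45\% - 68.925\% = 28.525\% \approx 28.5\%, \qquad 97.45\% - 90.375\% = 7.075\% $$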

Fig. 13

A comparative analysis of accuracy results between the proposed fusion model of the adaptive multifusion framework, the concatenation fusion model, and the Dempster–Shafer fusion model based on four experimental results in diverse modality data types and numbers

A direct numerical accuracy comparison with related works is not applicable, because those works are tested on different datasets and do not satisfy the adaptivity condition of multicontext: to our knowledge, no previous research uses multiple datasets in diverse contexts to test generic adaptivity on multicontext. Because of the modality condition, the way information is extracted and the way the features of diverse objects are fused with respect to multiple modality types add significant value. Therefore, the comparative analysis with previous frameworks is kept general rather than an in-depth numerical comparison.

7 Conclusion and future works

This paper presents the adaptive multimodal fusion framework as a solution for the modality/context-based problem, which is divided into two fusion problems: modality-based fusion and context-based fusion. The main challenge of modality/context-based fusion lies in the conflicting nature of the data and the complexity of fusing sensory data. Modality-based fusion is interpreted as fusing multiple data sources with the same data type and fusing multiple diverse modality types from the same source in various smart systems. Context awareness is described as interpreting the context to extract relationships, features, conditions, and data modality types. Two types of datasets satisfy the proposed criteria: (1) similar types of data fused from various sources for object classification and (2) complementary multitarget data of diverse types or characteristics fused to achieve the unified classification target of objects that are often interrelated. The framework creates a multifusion learning model that plays a vital role in fusing complementary heterogeneous data into a reliable and robust classification model in multiple contexts. The main strengths of the adaptive fusion framework are improving multiobject classification with automatically reduced features, resolving ambiguity and inconsistency in the fused data, and increasing certainty while reducing data redundancy by improving the balance of the data. The adaptive multimodal fusion framework outperforms the concatenation fusion model by 28.5% and the Dempster–Shafer fusion model by 7.075%; the average accuracy of the concatenation fusion model is 68.925% with many features, and the average accuracy of the Dempster–Shafer fusion model is 90.375% with limited features. The limitation of the presented adaptive fusion framework is the difficulty of scaling beyond 16 sensory datasets and to larger data volumes, which would require higher performance.

Future work will pay more attention to deeper analysis of particular fusion techniques, whether motivated by new proposals, attempts at different research directions, or simple curiosity. One significant research direction is fusion for materials data science: developments in industrial materials science have produced large amounts of materials data that vary in format and semantics and are extracted from multiple sources. Materials data integration and fusion provide a unified framework for representation, processing, storage, and mining, which helps accomplish tasks such as materials data clarification, materials extraction, fabrication parameter setting, and materials knowledge extraction.