1 Introduction

The number of autism spectrum disorder (ASD) diagnoses has greatly increased according to World Health Organization (WHO) statistics [1]. ASD is often detected through examinations that use Magnetic Resonance Imaging (MRI), Electroencephalogram (EEG), and demographic data [2, 3]. Over the last few years, more effort has been directed toward a new model for the diagnosis and triage of autistic patients that combines medical and sociodemographic features [4, 5]. This holistic model has earned recognition and has led to the creation of smart methods that facilitate the labeling and detection of autism patients, thus improving the triage process [6].

Triage is vital for determining the priority level of individuals with ASD, assigning them to urgent, moderate, or minor categories. This process clearly benefits autistic individuals, who receive early diagnosis and fast-tracked medical services [6]. It is increasingly evident that context-based triage has become necessary in the age of evolving technology and real-time applications. A triage system provides a way of prioritizing and locating problems across any domain, including telehealth systems. In other words, triage ensures the optimal performance and efficiency of real-time health applications [7].

However, ASD triage applications operating as part of real-time systems are still in their infancy, most importantly with respect to patient privacy and uninterrupted treatment during crisis periods such as pandemics or disasters [5, 8]. Triage applications are mostly based on artificial intelligence (AI) and machine learning (ML) algorithms. The effectiveness of these models depends strongly on how the data are structured, how the data behave, and how the data are processed [9]. Dataset preprocessing covers a range of stages, such as imputation, normalization, and balancing of datasets. Feature engineering is also very important, as it involves the creation of new features from the original numeric attributes [10]. ML models built on the fusion of these features promise to improve performance evaluation indices. However, AI applications operating in real time are prone to a variety of security threats, such as adversarial attacks on ML models. Such attacks could influence the data classification of ASD triage applications by manipulating input data and consequently altering the behavior of the ML model. Hence, such interference may decrease the precision of the triage process or even result in the complete failure of the system [11]. As a result, ASD triage models should demonstrate robustness and reliability; robustness has recently become more prominent and is considered a vital property of AI systems [8].

An ML model is considered robust when it produces accurate outputs with the same metrics and performance when data inputs or conditions change suddenly and significantly due to unintended circumstances [11]. Another scenario is adversarial attacks, which are specifically engineered to deceive ML models into making incorrect predictions through the injection of false data, the so-called "adversarial example" [12]. An adversarial example is an inaccurate input introduced into the model, which requires the model to be robust enough to handle it. Attack methods can be divided into two broad categories: white box and black box [13]. White-box attacks occur when attackers gain full access to the model architecture and the model itself. In contrast, black-box attacks are carried out by attackers who do not have access to the internals of the model. In this study, white-box adversarial attacks are predominantly examined. With these attacks, adversaries have access to the entire structure and parameters of the model they want to exploit; thus, they can design malicious inputs that generate undesirable outputs [14]. Through white-box attacks, we aim to evaluate the robustness of our ASD triage system against threats from attackers who are able to extract the underlying model.

Furthermore, this area still poses one of the major challenges for scientists, and the development of early ASD triage models is currently among the priorities, with various studies aiming to apply ML models to create new diagnostic tools or triage mechanisms. These studies, however, differ in emphasis: some are centered on the dataset, while others concentrate on the ML models involved in their proposed frameworks. For example, one study used a sociodemographic ASD dataset and did not employ a triage method [1]. Instead, it illustrated a number of ML approaches, such as support vector machine (SVM), decision tree (DT), AdaBoost, random forest (RF), Naïve Bayes (NB), neural network (NN), logistic regression (LR), and multilayer perceptron (MLP), for building a prediction model for ASD diagnosis, paying more attention to sociodemographic features. In contrast, the study in [15] suggested a hybrid model for ASD diagnosis that integrates both medical and sociodemographic ASD datasets and uses multicriteria decision-making (MCDM) during development. The authors of [16] employed a dataset of family characteristics of patients with ASD to train models such as SVM, DT, and AdaBoost, thereby developing a diagnostic model for ASD. Notably, some studies, such as [17, 18], incorporated diverse datasets and ML models, often within non-real-time contexts. However, the surge in real-time technology applications and the looming threat of adversarial attacks on online systems necessitate a recalibration of ASD model development strategies. As exemplified by [19, 20], a shift is underway toward incorporating these considerations, whether by merging traditional datasets with adversarial samples or by introducing innovative methods that leverage neural network (NN) or graph neural network (GNN) approaches to train ASD datasets.

The adaptability of ML models for real-time applications and the selection of the most resilient model for ASD triage are pressing concerns in the research community [9]. Consequently, multiple-ML models have emerged, each demonstrating distinct performance evaluation metric values. The imperative lies in enhancing and adapting ML models across diverse scenarios using a spectrum of classifiers. These models encompass a wide array of ML algorithms, including k-nearest neighbors (kNN), DT, stochastic gradient descent (SGD), support vector machines (SVMs), RF, NB, NN, and LR. Among these, certain models are particularly desirable as foundational learning tools due to their ease of implementation and computational efficiency [21,22,23]. These models undergo meticulous training and testing, employing various preprocessing approaches and encompassing both normal test and adversarial attack scenarios. Undoubtedly, this challenge aligns with the realm of machine learning theory, offering fertile ground for innovative solutions that promise valuable insights and resolutions.

The adaptability and selection of the most robust ML models for healthcare applications for ASD triage or other health fields are crucial [24]. The construction of ML models involves numerous circumstances and procedures, resulting in several choices [9]. Consequently, choosing the best model becomes an essential need. However, evaluating and benchmarking ML models present a complex decision-making challenge due to the necessary trustworthiness requirements, including (1) model robustness—performance on diverse datasets or under varying conditions [11]; (2) model generalization—performance on unseen data; and (3) model flexibility—adaptability to different types of data and problem domains [4, 25]. To address these challenges, specific problems can be described as follows:


1st Issue: Impact of Preprocessing Approaches: Prioritizing verified datasets through precise preprocessing is crucial during the training and testing of ML models [26]. Preprocessing, which involves pivotal approaches such as standard and feature fusion preprocessing, serves as the foundation for ML model development [9]. Despite the comprehensive perspective different preprocessing approaches provide, a central question emerges from the hypothesis: "Do distinct preprocessing approaches influence the development of the optimal ML model?"


2nd Issue: Multiperspective Decision Matrix: Enhancing and adapting ML models across diverse scenarios using a spectrum of classifiers is imperative. These models encompass a wide array of ML algorithms, undergo meticulous training and testing, employ various preprocessing approaches, and include two perspectives: normal test (nonadversarial) and adversarial attack scenarios [27]. Therefore, evaluating and benchmarking various ML models considering both perspectives in a unified decision matrix simultaneously raises another critical issue.


3rd Issue: Analysis of Performance Metrics and Decision Criteria: The evaluation process for ML models involves a meticulous examination of various criteria or metrics, including classification accuracy (CA), precision, recall, and computational efficiency [24]. These criteria enable comparisons between models to identify strengths and weaknesses [4]. However, four subissues faced in the evaluation and benchmarking of these performance metrics are as follows:

  • Criteria Importance: Assigning appropriate weights to each metric, reflecting their relative importance, is pivotal. A notable challenge arises from simultaneously considering both benefit and cost criteria. Benefit criteria advocate higher values as desirable, such as CA, while cost criteria favor lower values, such as training and testing time [28].

  • Criteria Trade-off: A trade-off involves sacrificing some of one criterion to gain more of another criterion [29]. Achieving one objective or criterion may necessitate sacrificing another. In the context of ML model evaluation and benchmarking, there might be a trade-off between model complexity and interpretability. A more complex model may provide better CA but could be harder to interpret.

  • Criteria Conflict: Conflicts arise when there is a direct clash or disagreement between different criteria [30]. Optimizing one criterion may negatively impact another, and there is no easy compromise. For example, conflict might occur when trying to balance the need for model CA and the requirement for model simplicity. Improving CA (reducing error) may conflict with the goal of keeping the model simple for easier understanding.

  • Data Variation: The varied data values for the metric criteria make the selection decision a more complex task [31]. For example, training and testing time are measured in seconds, CA is measured in percentages, and log loss is measured between 0 and 1. Additionally, some evaluation criteria tend to be subjective, involving high or low values.

Undoubtedly, these issues align with the realm of machine learning theory, while others fall under complex multicriteria decision problems in the evaluation and benchmarking process, where fuzzy MCDM has emerged as a vital and indispensable tool for selecting the most trustworthy ML model for healthcare applications.

The utilization of fuzzy MCDM in evaluating and benchmarking ML models provides a new research direction that contributes to a reliable and trustworthy selection process. Explaining why this combination was chosen, and its advantages in the context of ASD triage, provides valuable context for healthcare sectors. Although model selection based on confusion matrix metrics can provide valuable insights into the performance of ML models, it may not fully address the complexities and uncertainties inherent in real-world scenarios [32]. The need for the development of a decision-making framework can be justified as follows:

  1. ML models include numerous specifications and considerations that are not limited to simple CA. The training time, precision, specificity, and ability to support different performance metrics are some of these specifications [24].

  2. A confusion matrix alone may not reveal the dilemmas and conflicts inherent in the different evaluation metrics. Such cases include, for instance, high-sensitivity models with low specificity, which may cause misdiagnoses and false positives/negatives [33].

  3. An expert system with a decision-making framework provides the facilities for integrating user preferences and knowledge into the system. The model can thus be validated to align with the goals and objectives of ASD triage stakeholders, such as clinicians and patients [9].

  4. Fuzzy decision-making can cope with imprecision and uncertainty by incorporating them into the evaluation and selection process, resulting in a better mechanism for real-world ASD triage scenarios where the data provided can be incomplete or noisy [34].

  5. A decision-making framework enables multiple criteria, such as those from a confusion matrix, to be considered together, yielding a more holistic approach to assessing ML models. Therefore, we incorporate a holistic approach to fulfill all criteria for efficient ASD triage [15].

In addition, it is of paramount importance to give preference to credible datasets when training and testing ML models [35], especially for maintaining the smooth operation of ASD triage in real time [28]. The first step in building a model for real-time ASD triage that is resilient to changes in data is to stress the use of verified datasets as the basis for training and testing ML models [32]. The validation and verification of these datasets are mandatory steps, which eventually lead to the use of feature fusion preprocessing as a foundation for ML model production [36]. In addition, triage is applied to help prioritize patients based on their cases [9].

MCDM helps determine the efficiency of ML models and optimally select the most suitable model to be applied [37, 38]. FDOSM, for instance, is well suited to identifying difficulties in intricate problems [39]. FDOSM is applied to the problematic set of assessment metrics to distinguish the most promising ML algorithms [24]. This approach is built on the concept of the ideal or most desirable solution, takes preferences and inconsistencies into account, reduces the number of required comparisons, provides fair and implicit comparisons, and involves fewer complex arithmetic operations [26]. Furthermore, FDOSM successfully addresses the normalization and weighting issues that are typical of MCDM procedures [40]. Another factor that stands out is that FDOSM is capable of dealing with vague and fuzzy data, which makes it a very reliable tool in circumstances where such data are present. FDOSM aids in the selection of the most efficient ML model by deploying a general evaluation process that takes into account several factors and the overall performance of the ML model. Moreover, the method can be extended with 2-tuple linguistic Fermatean fuzzy sets [41, 42], which makes it more potent in solving numerous difficult decisions. The 2-tuple linguistic Fermatean method is an extension of advanced fuzzy set theory [43] and provides multiple benefits for the fuzzy-based FDOSM method. It can improve decision-making by providing a more exact representation of linguistic variables with both membership and nonmembership values, thus reducing uncertainty and increasing precision. The superiority of this technique is that it is capable of handling intricate circumstances with many criteria, managing conflicts, and guaranteeing consistency in decision-making [44, 45]. It is noteworthy that FDOSM addresses imprecise data, increases interpretability, and allows decision-makers to better understand the outcomes. Thus, the 2-tuple linguistic Fermatean extension empowers FDOSM to exercise critical and systematic decision-making without deviations.

In addition, our goal is to introduce the 2-tuple linguistic Fermatean fuzzy decision by opinion score method (2TLFFDOSM). This new paradigm incorporates the effectiveness of FDOSM, the 2-tuple linguistic method, and Fermatean fuzzy sets to solve issues of evaluation and benchmarking, particularly conflicts and trade-offs [41]. The main contributions of this work include the following:

  1. Production of a novel ASD-triaged fused dataset based on Algorithm 1.

  2. Development of multiple-ML models based on the ASD-triaged fused dataset that address both normal and adversarial scenarios, founded on Algorithm 2.

  3. Development of a new decision matrix for multiple-ML models that considers both normal test and adversarial attack scenarios.

  4. Formulation of the 2TLFFDOSM method to evaluate, benchmark, and select the most robust ML model.

2 Framework

The proposed development framework is structured around three essential phases, as shown in Fig. 1. The initial phase, detailed in Sect. 2.1, focuses on the identification and preprocessing of an authentic ASD triage dataset. The subsequent phase, discussed in Sect. 2.2, delves into the development of multiple machine learning models (referred to as multiple-ML) tailored for normal test and adversarial attack scenarios. The third and final phase involves the development of an evaluation and benchmarking methodology for the multiple-ML models intended for real-time ASD patient triage. Scrutinizing the performance of the various ML models on the ASD triage dataset must consider both normal and adversarial test examples. This comparative analysis is fundamental for ensuring that the adopted models deliver effective and reliable triage applications in the real world. In addition, it addresses trends that are important from practical, ethical, and security points of view.

Fig. 1 The developed framework for evaluating and benchmarking robust ML models

2.1 Phase 1: ASD Triage Dataset Identification and Preprocessing

This stage incorporates two major steps: first, the description of the ASD triage dataset and, second, the implementation of the PCA feature fusion technique. The chosen method is applied to the raw ASD triage dataset using the newly developed Algorithm 1. This fusion phase yields a merged dataset, which is used for comparing the effectiveness of the multiple-ML models that learn from both normal and adversarial inputs. The detailed data preprocessing procedures covered in this phase, including the data description, are explained below. Figure 2 shows the flow chart of the preprocessing in this phase.

Fig. 2 Preprocessing stages for the ASD triage dataset

2.1.1 Dataset Identification

The data for this study were sourced from a prior research project [6]. This dataset comprises authentic information, encompassing 538 patients who received autism diagnoses from specialized psychologist experts. It also includes 19 distinct features comprising medical and sociodemographic data. These patients were categorized into one of three triage labels: urgent (70 patients), moderate (432 patients), or minor (36 patients). The assignment of these labels was conducted through an intelligent triage method known as processes for triaging autism patients (PTAP) (as detailed in [6, 9]). This method involved the collaboration of four psychologist experts, AI specialists, and fuzzy decision-making experts.

To ensure data quality, the ASD triage dataset underwent a rigorous cleaning process, including the imputation of missing values. Additionally, the data were transformed into a numerical format for analysis. Notably, the initial dataset exhibited class imbalance among the three triage labels, prompting efforts to balance the classes. A visualization of the dataset points is presented in Fig. 3 using a 3D scatter plot that incorporates the 19 features, including medical and sociodemographic features.

Fig. 3 3D scatter plot of the numerical features of medical and sociodemographic data

Consequently, each triage label was adjusted to contain 432 samples, resulting in a total of 1296 samples in the ASD triage dataset (as discussed in [29]). For more comprehensive details regarding the dataset's description and feature definitions, readers are encouraged to refer to a previous study [6, 49].
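
The balancing procedure itself is detailed in [29]; as a minimal sketch only, assuming simple random oversampling of the minority triage classes to the majority-class size of 432, the step could look as follows (the column name `triage_label` and the input frame are hypothetical):

```python
import pandas as pd
from sklearn.utils import resample

def balance_triage_classes(df, label_col="triage_label", target_size=432, seed=0):
    """Oversample each triage class (urgent, moderate, minor) to 432 samples,
    yielding the 3 x 432 = 1296-sample balanced dataset described above."""
    balanced = [
        resample(group, replace=True, n_samples=target_size, random_state=seed)
        for _, group in df.groupby(label_col)
    ]
    return pd.concat(balanced).reset_index(drop=True)

# balanced_df = balance_triage_classes(cleaned_asd_df)   # hypothetical cleaned input frame
```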

2.1.2 Feature Fusion Process

The quality of the raw data is subsequently enhanced by feature fusion. Feature fusion involves the consolidation of multiple features or variables into a single feature set, retaining the most meaningful information in the original features [46]. This is done with principal component analysis (PCA), which reduces the dimensionality of a dataset while maintaining as much variance as possible; it achieves this by transforming the original features into a new set of mutually uncorrelated variables, called principal components, that capture the largest patterns or variations among the features [47].

The purpose of employing PCA for feature fusion with the 19 autism-related features in this stage is to address several critical issues. These issues include dimensionality reduction, model interpretability, and generalizability improvement. As a group, these advantages tend to improve the ability of multiple-ML models to learn from the fused data source; consequently, the resulting ASD patient triage will be more accurate. As explained in Algorithm 1, the steps of the PCA algorithm involve processing a balanced ASD triage dataset \(X\) with \(n\) samples and \(p\) features.


Algorithm 1

Algorithm 1 standardizes the data, calculates the covariance matrix, identifies the principal components, and projects the data into a reduced-dimensional space. PCA is effectively performed on the balanced ASD triage dataset to achieve feature fusion while retaining essential information.
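
As a minimal sketch of these steps (not the authors' exact Algorithm 1), scikit-learn's StandardScaler and PCA reproduce the standardization, covariance-based decomposition, and projection described above; the variable names are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def pca_feature_fusion(X, n_components=12):
    """Standardize the 19 balanced ASD triage features and project them onto
    the leading principal components, as outlined for Algorithm 1."""
    X_std = StandardScaler().fit_transform(X)       # zero mean, unit variance per feature
    pca = PCA(n_components=n_components)            # eigendecomposition of the covariance matrix
    X_fused = pca.fit_transform(X_std)              # projection into the reduced PC space
    return X_fused, pca.explained_variance_ratio_

# X_fused, var_ratio = pca_feature_fusion(X_balanced)   # hypothetical balanced feature matrix
# print(var_ratio.cumsum()[-1])                         # cumulative explained variance of the PCs
```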

2.2 Phase 2: Development of Multiple-ML Models

In this phase, we developed multiple-ML models based on the fused ASD triage dataset. The approach began with the development of models for normal test examples, followed by the extension of our efforts to address adversarial test examples. Eight ML algorithms, including logistic regression, DT, NN, SGD, RF, kNN, NB, and SVM, were rigorously tested using nine performance metrics: training time, test time, area under the curve (AUC), CA, F1, precision, recall, log loss, and specificity. Considering both normal and adversarial test example scenarios allowed us to approach model development from multiple angles. This not only deepened our understanding of their strengths and limitations but also provided valuable insights into real-time triage applications. By encompassing a broader spectrum of potential scenarios, multiple MLs are better prepared to perform effectively and robustly in real-world situations, thus enhancing their applicability in ASD patient triage. The process for developing these multiple-ML models is depicted in Fig. 4.

Fig. 4 The development process for multiple supervised ML models for normal and adversarial attack examples

Figure 4 shows the development process of eight ML models in the context of normal tests and adversarial attack examples. For the normal test examples, the experiment was applied to the fused ASD dataset to measure the classifiers' performance metrics without an adversarial perspective. Algorithm 2 measures the classifiers' performance with adversarial attack examples. The equations for the performance metrics are shown below, while Table 1 details the parameter settings used for each classifier algorithm.

$$AUC = \left( \frac{TP}{TP + FN} + \frac{TN}{TN + FP} \right),$$
(1)
$$CA = \frac{TP + TN}{TP + FP + FN + TN},$$
(2)
$$F1\ score = \frac{2 \cdot TP}{2 \cdot TP + FP + FN},$$
(3)
$$Recall = \frac{TP}{TP + FN},$$
(4)
$$Precision = \frac{TP}{TP + FP},$$
(5)
$$Specificity = \frac{TN}{TN + FP},$$
(6)
$$LogLoss = -\frac{1}{n}\sum_{i=1}^{n}\left[ y_{i} \cdot \log_{e}\left(\hat{y}_{i}\right) + \left(1 - y_{i}\right) \cdot \log_{e}\left(1 - \hat{y}_{i}\right) \right],$$
(7)

where \(TP\): true positives, \(TN\): true negatives, \(FP\): false positives, \(FN\): false negatives, \({y}_{i}\): actual class label, \({\widehat{\text{y}}}_{\text{i}}\): predicted probability per label, and \(n\): number of instances.

Table 1 Parameter settings for eight supervised ML classifiers

Optimizing the parameters of a classifier is of paramount importance when aiming to enhance its performance on the fused ASD triage dataset. Additionally, this process can yield valuable insights that can contribute to the optimization of algorithm parameters in future applications [48]. It is well acknowledged that default parameter settings for multiple-ML classifiers often result in suboptimal model performance [49, 50]. Consequently, fine-tuning these parameters enables precise control over the training process, leading to improved performance for multiple-ML models [51]. The parameters incorporated within each model were meticulously adjusted to suit the training of the fused ASD triage dataset. The parameter settings in Table 1 were obtained through an initial analysis of the dataset size and the types of features present to promote high compatibility with the multiple-ML models; these parameters were also kept flexible to enable continued adjustment of model performance as the nature and size of the dataset evolve. The following subsections describe the ML models developed from both perspectives.

2.2.1 ML Models Based on Normal Test Examples

The models for normal test examples are developed by applying the eight previously defined ML algorithms to the fused ASD triage dataset. The nine performance metrics of the resulting prediction models, without adversarial attack examples, are then measured. This effort focuses on understanding the performance quality achieved by the models under a normal scenario.
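
A condensed sketch of this normal-scenario experiment is given below, assuming scikit-learn implementations of the eight algorithms with illustrative (default-style) hyperparameters rather than the study's Table 1 settings; specificity is macro-averaged from the confusion matrix because scikit-learn has no built-in specificity scorer:

```python
import time
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn import metrics

# Illustrative instantiations only; the study's hyperparameters are those listed in Table 1.
models = {
    "kNN": KNeighborsClassifier(),
    "DT": DecisionTreeClassifier(),
    "SVM": SVC(probability=True),
    "SGD": SGDClassifier(loss="log_loss"),
    "RF": RandomForestClassifier(),
    "NN": MLPClassifier(max_iter=500),
    "NB": GaussianNB(),
    "LR": LogisticRegression(max_iter=1000),
}

def evaluate(model, X_tr, X_te, y_tr, y_te):
    """Fit one classifier and return the nine performance metrics (Eqs. 1-7 plus times)."""
    t0 = time.perf_counter(); model.fit(X_tr, y_tr); train_t = time.perf_counter() - t0
    t0 = time.perf_counter(); y_pred = model.predict(X_te); test_t = time.perf_counter() - t0
    y_prob = model.predict_proba(X_te)
    cm = metrics.confusion_matrix(y_te, y_pred)
    # Macro-averaged specificity TN / (TN + FP), computed per class from the confusion matrix.
    spec = np.mean([(cm.sum() - cm[i, :].sum() - cm[:, i].sum() + cm[i, i])
                    / (cm.sum() - cm[i, :].sum()) for i in range(cm.shape[0])])
    return {"train_time": train_t, "test_time": test_t,
            "AUC": metrics.roc_auc_score(y_te, y_prob, multi_class="ovr"),
            "CA": metrics.accuracy_score(y_te, y_pred),
            "F1": metrics.f1_score(y_te, y_pred, average="weighted"),
            "Precision": metrics.precision_score(y_te, y_pred, average="weighted"),
            "Recall": metrics.recall_score(y_te, y_pred, average="weighted"),
            "LogLoss": metrics.log_loss(y_te, y_prob),
            "Specificity": spec}

# X_tr, X_te, y_tr, y_te = train_test_split(X_fused, y, test_size=0.2, stratify=y, random_state=0)
# normal_results = {name: evaluate(m, X_tr, X_te, y_tr, y_te) for name, m in models.items()}
```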

2.2.2 ML Models Based on Adversarial Attack Examples

Adversarial training is recognized as an effective strategy for fortifying ML models against adversarial samples, thereby enhancing their robustness [52, 53]. Various methodologies have been proposed to construct adversarial examples with the goal of altering predictions while minimizing the dissimilarity between the original instance and the adversarial variant. In the context of this study, ML models were trained using the fused ASD triage dataset. Adversarial attacks were generated employing the fast gradient sign method [54], as delineated in Eq. (8). Subsequently, the Adversarial Robustness Toolbox (ART) classifier, described in Eq. (9), was employed for training purposes. By harnessing the Python-based ML security library ART, scholars and developers are equipped to assess and fortify ML models and applications against adversarial risks such as evasion, poisoning, extraction, and inference [55]. Figure 5 illustrates the workflow of the model development process, which involves the integration of adversarial attack examples into the dataset.

$$x_{adv} = x + \epsilon \cdot \mathrm{sign}\left(\nabla_{x} J\left(\theta, x, y_{true}\right)\right),$$
(8)

where \(x_{adv}\) is the adversarial example and \(x\) is the original input example.

Fig. 5 ML models developed for adversarial attack examples

ϵ is a small scalar value that controls the magnitude of the perturbation applied to the original input, determining how much the adversarial example deviates from the original example.

sign: returns the sign of its input.

\(\nabla_{x} J\left(\theta, x, y_{true}\right)\) is the gradient of the loss function with respect to the input \(x\), evaluated at \(\theta\) and \(y_{true}\), where \(\theta\) denotes the model parameters and \(y_{true}\) is the true label.

$$A\left(x\right) = \underset{y}{\arg\max}\left(\sum_{i=1}^{k} \omega_{i} \cdot \log_{10}\left(f_{i}\left(x\right)\right)\right),$$
(9)

where \(A(x)\) is the ART classifier output for input \(x\), \(k\) is the number of classes, \(f_{i}(x)\) is the probability of the input \(x\) belonging to class \(i\), and \(\omega_{i}\) is the weight for class \(i\).

In the context of defensive distillation [55], a distinct approach was undertaken. The model depicted in Fig. 5 was trained to exhibit a smoother decision boundary, particularly in the directions that potential attackers are prone to exploit. This strategic choice presents adversaries with a formidable challenge when attempting to identify adjustments in input data that lead to misclassification. Notably, this model's training process differed from that without adversarial attacks, as it leveraged "soft" probability outputs derived from the primary model rather than relying on the "hard" (0/1) true labels from the original training data. This distinctive technique has demonstrated efficacy in fortifying against initial iterations of adversarial attacks. Moreover, the same set of eight ML algorithms underwent training using the fast gradient method for adversarial test examples. The learning procedure followed Algorithm 2, as outlined below:


Algorithm 2
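
A hedged sketch of the adversarial portion of this procedure, using ART's FastGradientMethod wrapper around a fitted scikit-learn estimator, is given below. It assumes the train/test split from the earlier sketch, uses LR for illustration because gradient-based FGSM requires a differentiable classifier (Algorithm 2 covers the full set of models), and the ε value is an assumption rather than the study's setting:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from art.estimators.classification import SklearnClassifier
from art.attacks.evasion import FastGradientMethod

# Wrap a fitted scikit-learn model so ART can compute the loss gradient of Eq. (8).
lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
art_clf = SklearnClassifier(model=lr, clip_values=(X_tr.min(), X_tr.max()))

# FGSM: x_adv = x + eps * sign(grad_x J(theta, x, y_true)); eps controls perturbation size.
attack = FastGradientMethod(estimator=art_clf, eps=0.1)
X_te_adv = attack.generate(x=X_te)

# Adversarial training: refit on the original data augmented with adversarial examples,
# then re-measure the nine metrics on the adversarial test examples (Table 5 scenario).
X_aug = np.vstack([X_tr, attack.generate(x=X_tr)])
y_aug = np.concatenate([y_tr, y_tr])
# adv_metrics = evaluate(LogisticRegression(max_iter=1000), X_aug, X_te_adv, y_aug, y_te)
```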

2.3 Phase 3: Evaluation and Benchmarking Methodology

In this phase, we delve into the decision-making methodology for evaluating and benchmarking the multiple-ML models developed based on the fused ASD triage dataset. The first section centers on the creation of the DM, while the second section provides an in-depth exploration of the 2TLFFDOSM. This method aids in selecting a robust ML model within the contexts of normal and adversarial test example learning scenarios.

2.3.1 Development of the DM

First, the malign influence of adversarial attacks on the decision-making process may result in incorrect choices. Second, ML models differ in complexity, which complicates the task of objectively comparing their performance. Therefore, the critical element of our evaluation and benchmarking approach is the construction of the DM. The DM includes two components: criteria and alternatives. In total, there are 18 criteria reflecting the evaluation metrics of two perspectives: test examples without adversarial attacks and test examples with adversarial attacks. The alternatives of the DM are the eight ML models. Table 2 outlines the process of constructing the DM.

Table 2 Development of DM

The developed DM serves as a pivotal tool in addressing the challenges associated with evaluating and benchmarking multiple-ML models. It offers several key advantages, providing a comprehensive view of the effectiveness of various multiple-ML models across diverse scenarios. However, it is important to note that the DM's full potential is realized when integrated with an appropriate MCDM method. The formulation process of the 2TLFFDOSM enhances the evaluation process, considering the relative importance of criteria, resolving inherent conflicts, and ultimately aiding in the selection of the robust ML model. The combination of the DM and the 2TLFFDOSM methodology provides a powerful and nuanced approach to evaluating multiple-ML models and selecting robust models, ensuring informed decisions in the context of ASD triage applications.
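
For illustration only, assuming the per-model metric dictionaries produced by the earlier evaluation sketch (`normal_results` and `adversarial_results`, keyed by model name), the 8 × 18 DM can be assembled with pandas roughly as follows:

```python
import pandas as pd

# Rows: the eight ML model alternatives (A1-A8); columns: nine metrics per perspective,
# giving the 18 criteria C1-C18 of the decision matrix.
dm_normal = pd.DataFrame(normal_results).T.add_prefix("normal_")
dm_adv = pd.DataFrame(adversarial_results).T.add_prefix("adv_")
decision_matrix = pd.concat([dm_normal, dm_adv], axis=1)
```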

2.3.2 Formulation of 2TLFFDOSM

FDOSM, a robust mathematical model, offers a compelling solution to the intricate challenges associated with selecting the most robust ML model for ASD triage applications [56]. Structured into two principal stages, FDOSM begins with the input unit, where it leverages a DM as a pivotal starting point for evaluating robust models. The DM encapsulates crucial evaluation criteria and alternative ML models for consideration. Transitioning to the data transformation unit, FDOSM performs a sophisticated transformation, converting the DM into an Opinion DM. This transformation process is further enhanced through the application of the Likert scale, resulting in the creation of a fuzzy opinion matrix. The culmination of the FDOSM process involves direct aggregation, a strategic technique used to ascertain the ultimate ranking of the alternatives, representing the eight ML models under scrutiny. Figure 6 visually represents the intricate stages involved in FDOSM.

Fig. 6 FDOSM stages

2.4 Data Transformation Unit

To translate the DM into an opinion matrix, the data transformation unit consists of two fundamental phases [57].


Step 1: Based on the following factors, the optimum solution (robust ML model) is chosen as the best option:

$$A^{*} = \left\{ \left[ \left( \max_{i} v_{ij} \mid j \in J \right), \left( \min_{i} v_{ij} \mid j \in J \right), \left( Op_{ij} \in I.J \right) \mid i = 1, 2, 3, \ldots, m \right] \right\}$$
(10)

The 'max' in the FDOSM formulation for choosing the robust ML model denotes the optimum value for the benefit criteria, denoting the highest possible acceptable value. The 'min' term, on the other hand, denotes the lowest permissible value and provides the best answer for the cost criterion. When the ideal intermediate value is located between the minimum and maximum values, the word "\({{{O}}{{p}}}_{{{i}}{{j}}}\)" refers to the critical value. The decision maker must determine this significant value based on the unique context and demands of the evaluation criteria.

The determination of a crucial value for the assessment criteria is of utmost relevance when choosing a reliable ML model. For instance, in the DM, the remaining criteria are classed as advantages, while criteria such as C1 (training time), C2 (testing time), and C7 (logloss) are labeled as costs. Establishing a critical value empowers the decision maker to conduct a comprehensive evaluation, taking into account both the benefits and costs associated with each subject. This meticulous assessment process ultimately guides the selection of the most adversarially robust ML model.
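
A minimal sketch of this step, assuming the pandas `decision_matrix` assembled in Sect. 2.3.1 with cost criteria identifiable from their column names, is:

```python
# Benefit criteria take their maximum as the ideal value; cost criteria
# (training time, testing time, log loss) take their minimum, per Eq. (10).
cost_criteria = [c for c in decision_matrix.columns
                 if c.lower().endswith(("train_time", "test_time", "logloss"))]

ideal_solution = decision_matrix.max()
ideal_solution[cost_criteria] = decision_matrix[cost_criteria].min()
```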


Step 2: Following the identification of the ideal solution, the next phase involves a comparative analysis where this ideal solution is contrasted with alternative values within the same criterion. This assessment is carried out by experts in the field of machine learning and employs a five-tier language system to categorize the extent of variance between the values. These linguistic terms encompass negligible deviation, minimal difference, moderate discrepancy, substantial variation, and considerable shift. This process can be represented mathematically using the following equation:

$$Op_{lang} = \left\{ \left( \tilde{v}_{ij} \otimes v_{ij} \mid j \in J \right) \mid i = 1, 2, 3, \ldots, m \right\},$$
(11)

Here, the symbol ⊗ denotes the comparison procedure discussed earlier. The linguistic term opinion matrix, which has the following definition, is the output of the data transformation unit:

$$Op_{lang} = \begin{array}{c} A_{1} \\ \vdots \\ A_{m} \end{array}\left[ \begin{array}{ccc} op_{11} & \cdots & op_{1n} \\ \vdots & \ddots & \vdots \\ op_{m1} & \cdots & op_{mn} \end{array} \right]$$
(12)

After the formulation of the opinion matrix, the subsequent phase involves its conversion into fuzzy numbers through the application of appropriate fuzzy membership functions. This transformation serves to quantify the linguistic terms present in the opinion matrix, rendering them fuzzy numbers. This conversion process enhances precision and provides a quantitative representation of the expert's assessments.


Data-Processing Unit The opinion matrix [58], which contains the expert's evaluations and comparisons of the options within each criterion, is the output of the data transformation unit. The third phase involves data processing, which is divided into various parts to identify the reliable ML model. The following is a description of these data processing steps:


Step 1: The opinion decision matrix obtained from the data transformation unit is fuzzified during this initial stage. This step's main goal is to convert the opinion matrix's linguistic terms into 2-tuple linguistic Fermatean fuzzy sets (2TLFFSs). The 2TLFFSs are capable of handling situations where language terms are applied to specific facts. The use of membership and nonmembership grades in the form of 2TL words is required in the development of a 2TLFFS. Because of the inherent uncertainty and slow transition between different linguistic words, 2TLFFSs define the degrees of membership and nonmembership associated with each linguistic term. The definition of 2TLFFSs is given below.

Definition 1

[41]: An FFS on a nonempty set X is given by

$${\tilde{\mathcal{F}}} = \left\{ {x,\left( {M_{{{\tilde{\mathcal{F}}}}} \left( x \right),N_{{{\tilde{\mathcal{F}}}}} \left( x \right)} \right){|}x \in X} \right\},$$

where \({M}_{\widetilde{\mathcal{F}}}\left(x\right):X\to \left[\text{0,1}\right]\) and \({N}_{\widetilde{\mathcal{F}}}\left(x\right):X\to \left[\text{0,1}\right]\) are the membership and nonmembership grades of an element \(x\) in \(\widetilde{\mathcal{F}}\), respectively, under the constraint.

$$0 \le \left( {M_{{{\tilde{\mathcal{F}}}}} \left( x \right)} \right)^{3} + \left( {N_{{{\tilde{\mathcal{F}}}}} \left( x \right)} \right)^{3} \le 1,\;{\text{for}}\;{\text{all}}\,x \in X.$$

Definition 2

[41]. Each term in a linguistic term set (LTS) \(\mathcal{S}=\left\{S_{0}, S_{1},\dots ,S_{\text{\rm K}}\right\}\), where \(\text{\rm K}\) is an even number, represents a potential value of the linguistic variable, e.g., \(\mathcal{S}=\left\{S_{0}=extremely, S_{1}=moderately, S_{2}=not\ at\ all\right\}\).

Definition 3

[59]. Suppose that the result of aggregating the indices of some linguistic terms in \(\mathcal{S}\) is a noninteger value \(\rho \in \left[0,\mathbf{\rm K}\right]\), \(\rho\) can be represented by the 2-tuple \(\left({S}_{k},\mathcalligra{k}\right), {S}_{k}\in \mathcal{S} \text{and} \mathcalligra{k}\in [-\text{0.5, \, 0.5})\), where \({S}_{k}\) is a linguistic term, and \(\mathcalligra{k}\) is the symbolic translation to the nearest index \(k\) in \(\mathcal{S}\).

Definition 4

[59]. Given an LTS \(\mathcal{S}=\left\{{S}_{0,}{S}_{1},\dots ,{S}_{\text{\rm K}}\right\}\) and \(\rho \in \left[0,\text{\rm K}\right]\), the following mapping is used to obtain the 2-tuple equivalent to \(\rho\):

$$\Delta :\left[0,\boldsymbol{ }\text{\rm K}\right]\to \mathcal{S}\times [-\text{0.5, \, 0.5})$$
$$\Delta \left(\rho \right)=\left( {S}_{k},\mathcalligra{k}\right), \text{with} \left\{\begin{array}{c}{S}_{k}, k=round \left(\rho \right), \\ \mathcalligra{k}=\rho -k, \mathcalligra{k}\in \left[-\text{0.5, \, 0.5}\right).\end{array}\right.$$

The inverse mapping \({\Delta }^{-1}\) transforms a 2-tuple to \(\rho\):

$${\Delta }^{-1}:\mathcal{S}\times [-\text{0.5, \, 0.5})\to \left[0,\text{\rm K}\right]$$
$${\Delta }^{-1}\left( {S}_{k},\mathcalligra{k}\right)=\mathcalligra{k}+k=\rho .$$
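
As a small illustrative sketch of this mapping (taking K = 6, matching the LTS used later in Table 3):

```python
K = 6  # index of the last term in the LTS S = {S_0, ..., S_K}

def delta(rho):
    """Definition 4: map rho in [0, K] to its 2-tuple (term index, symbolic translation)."""
    k = int(round(rho))
    return k, rho - k            # translation lies in [-0.5, 0.5)

def delta_inv(k, translation):
    """Inverse mapping: recover rho in [0, K] from a 2-tuple."""
    return k + translation

# delta(4.3) -> (4, 0.3); delta_inv(4, 0.3) -> 4.3
```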

Definition 5

[41]. A 2TLFFS is a Fermatean fuzzy set in which the membership and nonmembership grades are represented by the 2-tuple \(\left({S}_{\mathcalligra{m}},\mu \right)\) and \(\left({S}_{\mathcalligra{n}},\nu \right)\), respectively, where \({S}_{\mathcalligra{m}},{S}_{\mathcalligra{n}}\in \mathcal{S}=\left\{{S}_{0,}{S}_{1},\dots ,{S}_{\text{\rm K}}\right\}\) and \(\mu ,\nu \in [-\text{0.5, \, 0.5})\), written as.

$$\widetilde{\mathbb{F}}=\left\{\langle x,\left({S}_{\mathcalligra{m}}(x),\mu (x)\right),\left({S}_{\mathcalligra{n}}\left(x\right),\nu (x)\right)\rangle |x\in X\right\}.$$
(13)

For simplicity, a 2TLFFS can be written in the form \(\langle \left({S}_{\mathcalligra{m}},\mu \right)\left({S}_{\mathcalligra{n}},\nu \right)\rangle\).

The 2TLFFSs of all linguistic terms are shown in Table 3 based on the LTS \(\mathcal{S}=\left\{{S}_{0,}{S}_{1},\dots ,{S}_{6}\right\}\).

Table 3 Conversion of opinion linguistic terms into 2TLFFSs

A rigorous approach is used to choose values for linguistic concepts and their related 2TLFFSs to make it easier to express subjective judgments using fuzzy logic. The inherent ambiguity and imprecision in human perception and interpretation are successfully captured by these 2TLFFSs, which help to produce seamless transitions between linguistic concepts.


Step 2: The fuzzy opinion decision matrix is subjected to direct aggregation, employing an aggregation operator such as the arithmetic mean. For a set of 2TLFFSs \(\left\{{\widetilde{\mathbb{F}}}_{1},{\widetilde{\mathbb{F}}}_{2},\dots ,{\widetilde{\mathbb{F}}}_{\text{n}}\right\},\), this aggregation procedure (the weighting averaging operator) can be executed using the following equation, where \({{\upomega}}=\left[{\omega }_{1},{\omega }_{2},\dots ,{\omega }_{\text{n}}\right]\) is the vector of weights that satisfy \({\omega }_{i}\in \left[\text{0,1}\right]\) and \(\sum_{i=1}^{\text{n}}{\omega }_{i}=1\) [41].

$$2TLFFSWA\left({\widetilde{\mathbb{F}}}_{1},{\widetilde{\mathbb{F}}}_{2},\dots ,{\widetilde{\mathbb{F}}}_{\text{n}}\right)=\left\{\Delta \left(\text{\rm K} \sqrt[3]{1-\prod_{i=1}^{\text{n}}{\left(1-{\left(\frac{{\Delta }^{-1}\left( {S}_{{m}_{i}},{\mu }_{i}\right)}{\text{\rm K}}\right)}^{3}\right)}^{{\omega }_{i}}}\right),\Delta \left(\text{\rm K}\prod_{i=1}^{\text{n}}{\left(\frac{{\Delta }^{-1}\left( {S}_{{\mathcalligra{n}}_{i}},{\nu }_{i}\right)}{\text{\rm K}}\right)}^{{\omega }_{i}}\right)\right\}.$$
(14)

Step 3: The defuzzification process can be implemented through the following equation [60]:

$$\mathfrak{S}\left(\widetilde{\mathbb{F}}\right)=\Delta \left\{\frac{\text{\rm K}}{2}\left(1+{\left(\frac{{\Delta }^{-1}\left( {S}_{\mathcalligra{m}},\mu \right)}{\text{\rm K}}\right)}^{3}-{\left(\frac{{\Delta }^{-1}\left({S}_{\mathcalligra{n}},\nu \right)}{\text{\rm K}}\right)}^{3}\right)\right\},{\Delta }^{-1}\left(\mathfrak{S}\left(\widetilde{\mathbb{F}}\right)\right)\in \left[0,\boldsymbol{ }\text{\rm K}\right].$$
(15)

It is worth noting that the best-ranking order corresponds to the highest score.
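
A compact sketch of the aggregation (Eq. 14) and defuzzification (Eq. 15) steps, reusing the Δ/Δ⁻¹ mapping of Definition 4 with K = 6 and with equal criterion weights as an assumption, could look like this:

```python
import numpy as np

K = 6  # LTS S = {S_0, ..., S_6}

def delta(rho):                      # Definition 4
    k = int(round(rho))
    return k, rho - k

def delta_inv(k, translation):       # inverse mapping of Definition 4
    return k + translation

def tlffs_weighted_average(mem_tuples, non_tuples, weights):
    """2TLFFSWA operator of Eq. (14) over membership/nonmembership 2-tuples."""
    w = np.asarray(weights, dtype=float)
    m = np.array([delta_inv(*t) / K for t in mem_tuples])
    n = np.array([delta_inv(*t) / K for t in non_tuples])
    mem = K * (1.0 - np.prod((1.0 - m ** 3) ** w)) ** (1.0 / 3.0)
    non = K * np.prod(n ** w)
    return delta(mem), delta(non)

def crisp_score(mem_tuple, non_tuple):
    """Defuzzified score of Eq. (15); the alternative with the highest score ranks first."""
    m = delta_inv(*mem_tuple) / K
    n = delta_inv(*non_tuple) / K
    return (K / 2.0) * (1.0 + m ** 3 - n ** 3)

# Example for one alternative with equal weights over the 18 criteria (hypothetical opinions):
# weights = np.full(18, 1 / 18)
# fused_mem, fused_non = tlffs_weighted_average(mem_opinions, non_opinions, weights)
# print(crisp_score(fused_mem, fused_non))
```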


External Group Aggregation: This method involves aggregating the fuzzy opinion matrices of the individual decision-makers, each of which has been processed independently using the instructions provided in the processing unit [61]. The final group decision is then created by combining the individual results using the arithmetic mean aggregation approach. With the help of the group's experts, this method makes it easier to produce the final ranking. This thorough assessment aids in evaluating reliable machine learning models for ASD triage applications, facilitating a thorough understanding of their efficacy.
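
A brief sketch of this external aggregation step, assuming one crisp score vector per expert for the eight alternatives (hypothetical arrays), is:

```python
import numpy as np

# Crisp 2TLFFDOSM scores for A1..A8, computed independently from each expert's opinion matrix.
expert_scores = np.array([scores_expert_1, scores_expert_2, scores_expert_3])

group_scores = expert_scores.mean(axis=0)     # arithmetic-mean group aggregation
order = np.argsort(-group_scores) + 1         # alternative numbers ordered from best to worst
```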

3 Results and Discussion

This section provides a comprehensive examination of the outcomes achieved in each of the phases, with the presentation structured as follows:

3.1 Fusion Results for the ASD Triage Dataset

The outcomes of the data fusion process employing PCA, as outlined in Algorithm 1, on the ASD triage dataset yield 12 principal component (PC) attributes. As previously mentioned, these PC attributes are derived from the initial set of 19 autism-related features subjected to PCA. The results of the new ASD-triaged fused dataset with 12 corresponding PCs are shown in the supplementary file. To the best of the author's knowledge, and in accordance with the previous systematic review paper presented in [9], this is the first ASD dataset in the literature constructed based on medical and sociodemographic features. To provide a visual representation of these results, Fig. 7 illustrates the plotted outcomes of each PC. Additionally, comprehensive results of the newly fused ASD triage dataset, comprising 1296 samples, are provided as a supplementary file. It is important to emphasize that this fused ASD dataset offers a multitude of advantages to researchers and scholars. Furthermore, a 3D line plot of PCA features is shown in Fig. 8, while the variance results stemming from the PCA fusion preprocessing are depicted in Fig. 9.

Fig. 7 Plotting of the 12 PC attribute results obtained from Algorithm 1

Fig. 8 A 3D line plot of PCA features generated by Algorithm 1

Fig. 9 Average results of PCA variance analysis

The analysis of variance results obtained from the PCA fusion preprocessing, as depicted in Figs. 7 and 8, provides valuable insights into the distribution of information among the 12 PCs. The proportion of variance explained by each PC is a critical indicator of its contribution to the overall ASD dataset.

In Fig. 9, we observe that the variance proportions vary across the 12 PCs. Specifically, the individual variance proportions range from 0.039 for some PCs, indicating a relatively lower contribution to the dataset's overall variability, to higher values for other PCs. It is noteworthy that while certain PCs may have lower variance proportions, they still capture important patterns or variations present in the original dataset. Therefore, they should not be dismissed as insignificant. Instead, they collectively contribute to the comprehensive understanding of the data. The cumulative variance, represented by the cumulative value of 0.796, demonstrates the combined explanatory power of all 12 PCs. In essence, this cumulative variance signifies the extent to which these PCs collectively account for the dataset's total variability. In our case, a cumulative variance of 0.796 implies that the 12 PCs together capture approximately 79.6% of the overall variance present in the ASD triage dataset.

This level of variance coverage is highly meaningful, as it implies that the majority of the dataset's inherent variability has been retained, even after dimensionality reduction through PCA. In these contexts, the variance results obtained from Algorithm 1 emphasize the effectiveness of PCA in preserving essential dataset characteristics. Researchers can leverage this fused dataset to develop and evaluate robust machine learning models for ASD triage applications, as it strikes a balance between dimensionality reduction and information retention.

3.2 Multiple-ML Model Results

The results of the performance evaluation metrics for the developed multiple-ML models applied to the fused ASD triage dataset are presented in Table 4 and Table 5. These tables provide a comprehensive overview of how these models perform under different scenarios, specifically concerning normal test examples (Table 4) and adversarial attack examples (Table 5).

Table 4 Performance metrics of ML models under normal test examples
Table 5 Performance metrics of ML models under adversarial attack examples

In the analysis of the performance metrics presented in both Tables 4 and 5, several critical considerations arise, primarily concerning model selection, trade-offs, and conflicts:

  • Selection Considerations: The central objective of this evaluation is to pinpoint the most robust ML model for real-time ASD patient triage applications. To accomplish this, decision-makers must identify models that excel in specific criteria or metrics. For instance, in Table 4, the SVM model has exceptionally high AUC, CA, and precision values, positioning it as a strong contender for selection, especially when prioritizing these metrics. Similarly, in Table 5, the NN model exhibits noteworthy performance in terms of the AUC, CA, and precision, making it a viable choice, particularly in adversarial attack scenarios.

  • Trade-off Analysis: The process of selecting a robust ML model necessitates navigating trade-offs among diverse evaluation criteria. A single model rarely excels in all aspects simultaneously. For instance, the kNN model in Table 4 achieves a high specificity score but lags behind in CA and precision. Here, conducting a trade-off analysis becomes indispensable in identifying models that strike a balance between various performance metrics.

  • Resolving Conflicts: Conflicts among criteria cannot be settled by inspecting Tables 4 and 5 in isolation; a model that leads on CA or AUC may lag on training time or log loss, and optimizing one criterion can degrade another. These conflicts, together with the trade-offs above, are deferred to the DM and 2TLFFDOSM methodology described in Sect. 2.3, which weighs all 18 criteria simultaneously.

  • Robustness Assessment: It is necessary to evaluate the behavior of ML models under adversarial attacks, as shown in Table 5. This is where the robustness of the models is ultimately tested. The trade-offs observed under adversarial conditions can differ from those of normal test examples. For example, the RF model stands out with respect to CA, AUC, and precision, offering a reliable option in a noisy environment. Performance assessment under such extreme conditions is an important factor for understanding the efficiency of a model intended for real-world ASD triage applications.

The decision to adopt a robust ML model for ASD triage services is premised on the metrics in Tables 4 and 5. Consequently, the review and benchmarking process should use the DM and 2TLFFDOSM to assess the trade-offs and conflicts while choosing a robust ML model. Finally, the selection process should consider both normal and adversarial testing so that the positive aspects of both scenarios are maintained and a balanced trade-off is achieved.

3.3 Evaluation and Benchmarking Results

The results of this section reflect the framework developed in the third phase, representing the most critical aspect of showcasing the robust ML model. The evaluation and benchmarking results are constructed based on two crucial components: the developed DM and 2TLFFDOSM. However, these two components cannot fully elucidate the outcomes without considering the results of the multiple-ML models presented in the previous section. First, it is essential to showcase the outcomes of the decision-makers, represented by the three experts, through the opinion matrix results derived from Eq. 12, as demonstrated in Table 6. Table 6 contains three opinion matrices, each corresponding to an expert, illustrating their comparisons and evaluations.

Table 6 Opinion matrices representing the evaluations of three experts

There is a noticeable variation among the three opinion matrices presented in Table 6, leading to different rankings for each expert. The individual rankings obtained by applying the mathematical model of 2TLFFDOSM to these matrices and incorporating the developed DM are presented in Table 7.

Table 7 Rankings from individual 2TLFFDOSM assessments by three experts

Definition 6

[59]. To compare 2-tuple linguistic information \(\left({S}_{k1},{\mathcalligra{k}}_{1}\right)\) and \(\left({S}_{k2},{\mathcalligra{k}}_{2}\right)\), the following rules are applied:

  • \(\text{if} \, k1<k2, \text{then} \left( {S}_{k1},{\mathcalligra{k}}_{1}\right)<\left( {S}_{k2},{\mathcalligra{k}}_{2}\right)\)

  • \(\text{if} \, k1=k2, \text{then}\)

  • \(\left({S}_{k1 },{\mathcalligra{k}}_{1}\right)=\left( {S}_{k2},{\mathcalligra{k}}_{2}\right), \text{if} \, {\mathcalligra{k}}_{1} = {\mathcalligra{k}}_{2},\)

  • \(\left({S}_{k1} ,{ \mathcalligra{k}}_{1}\right) < \left( {S}_{k2},{ \mathcalligra{k}}_{2}\right), \text{if} \, {\mathcalligra{k}}_{1} < {\mathcalligra{k}}_{2},\)

  • \(\left({S}_{k1 },{ \mathcalligra{k}}_{1}\right) > \left( {S}_{k2},{ \mathcalligra{k}}_{2}\right), \text{if} \, { \mathcalligra{k}}_{1} > {\mathcalligra{k}}_{2}.\)

Table 7 displays the individual ranking results provided by three experts for the ML models (A1 to A8) based on the opinion matrices. While some agreements in rankings exist for certain alternatives, differences are notable for others, underscoring the subjectivity involved in decision-making and evaluation processes. Figure 10 visualizes the ranking orders of the eight ML models, revealing that obtaining a unique rank is challenging for the three experts. Therefore, external group aggregation is essential for obtaining a unique rank and determining a robust ML model, as shown in Table 8.

Fig. 10 Variance in individual rankings among three experts

Table 8 2TLFFDOSM ranking results based on external group aggregation

Table 8 presents the results of the 2TLFFDOSM ranking based on external group aggregation for the eight ML models. Each ML model is associated with fuzzy scores and 2-tuple scores, which are then synthesized into a crisp score to facilitate ranking. The rankings are as follows: A8 (LR) ranks first with a 2TLFFDOSM score of 1.3370, followed by A3 (SVM) with a score of 1.3162, A5 (RF) with a score of 1.2930, A6 (NN) with a score of 1.2575, A7 (NB) with a score of 1.1763, A2 (DT) with a score of 1.0519, A1 (kNN) with a score of 1.0267, and A4 (SGD) with a score of 0.9805. These rankings provide valuable insights into the suitability of ML models for real-time ASD patient triage based on the developed evaluation methodology, encompassing both scenarios: normal test examples and adversarial attack examples.

To delve deeper into the discussion of these results, it is essential to revisit the performance metrics of the ML models presented in the previous section (Tables 4 and 5). The top-ranking model based on 2TLFFDOSM is A8 (LR). For the normal test examples, LR obtained 0.9720, 0.0870, 0.9906, 0.9367, 0.9360, 0.9369, 0.9367, 0.1728, and 0.9684 for the criteria C1 = train time, C2 = test time, C3 = AUC, C4 = CA, C5 = F1, C6 = precision, C7 = recall, C8 = log loss, and C9 = specificity, respectively. For the adversarial attack examples, the LR results for the same sequence of criteria were 1.2720, 0.3010, 0.9206, 0.8767, 0.8760, 0.8769, 0.8757, 0.0828, and 0.8284. These detailed metrics shed light on various aspects of LR performance, which contributes to its top-ranking position in the 2TLFFDOSM evaluation.

Given the strong performance of the LR model and its architectural features, the model appears to be a serious option for real-time ASD patient triage applications. LR exhibits high accuracy and a short test time on normal test examples, which is desirable in cases where fast predictions are needed. Furthermore, LR has a high AUC of 0.9906, meaning that it accurately discriminates among the three triage classes: urgent, moderate, and minor. LR also offers a competitive CA, a balanced F1 score, and strong precision and recall scores that keep false negatives and false positives to a minimum. The model performs very well in terms of log loss, a likelihood-based estimate, and provides robust specificity for distinguishing the correct classes across the three triage levels. LR's linear classification structure is well suited to the characteristics of the dataset.

Thus, PCA-based fusion of the ASD triage dataset significantly contributed to improving the performance of the LR model. With the 12 PCs generated by Algorithm 1, the PCA dimensionality reduction process retained the vital input information. Reducing the number of inputs also reduced the computational complexity and the risk of overfitting. As a result, in the comparison of the ML models for the triage of ASD patients, the LR model outperformed the others. Moreover, the quality of the LR model's performance and its resistance to adversarial attacks were also boosted by the PCA preprocessing. The synergy between LR's classification capabilities and PCA-based data fusion highlights the potential of this approach for real-world ASD triage applications.

Based on these distinct results, the selection process is complete and the trade-off issue has been effectively addressed. The conflicts and other issues discussed earlier have been resolved through the developed framework, which combines PCA for data fusion, multiple ML models, and a fuzzy decision-making methodology for identifying robust ML models. The comprehensive evaluation and benchmarking methodology presented in this study has paved the way for the identification of LR as the optimal model for real-time ASD patient triage.

4 Comparison with the State of the Art

In this section, the proposed framework is compared comprehensively with the literature using checklist benchmarking, a comparison methodology frequently used in recent studies. This approach compares studies against a set of important checklist points, presented as factors, to emphasize the novelty of the presented work. The definitions of these checklist points are provided below, and Table 9 illustrates how the proposed framework contributes to the existing body of literature based on the results obtained.

  • 1st Normal/Adversarial Perspectives: This point indicates whether a study considers both normal test examples and adversarial attack examples during the development of machine learning models for triaging autistic patients.

  • 2nd Fusion Improvements: This point underscores the significance of feature fusion in enhancing the development of machine learning models. It reflects the approach taken to preprocess the ASD dataset; studies meeting this checklist point should address feature fusion during the preprocessing stage.

  • 3rd Development of the MCDM Selection Method: This point highlights the distinction between utilizing existing methods and developing new methods for selecting the best machine learning model for ASD triage. Model selection is a fundamental challenge addressed in the present study, which advocates the use of an appropriate MCDM method to address this issue effectively. Consequently, studies that develop novel MCDM methods are considered to satisfy this comparison point.

  • 4th Decision Matrix Development: Building upon the context of "Selection Method Development," this point pertains to the creation of new decision matrices or enhancements to this crucial component of the decision-making process. It emphasizes the significance of innovating or improving decision matrices, which play a pivotal role in the study's methodology.

  • 5th Medical and Sociodemographic Features: The incorporation of both medical and sociodemographic features has demonstrated its impact on the detection, diagnosis, and triage of ASD patients. Consequently, this point underscores the integration aspects of both types of features when developing the study framework for assessing autistic patients.

  • 6th ML Criteria Issues: This point addresses the resolution of the aforementioned issues encountered during the evaluation and benchmarking of ML models. These issues encompass aspects such as the significance of criteria, trade-offs, and conflicts.

Table 9 Comparison perspectives and points in the benchmarks and proposed framework

A comparison of the proposed framework with the literature highlights significant differences among the benchmark studies. The total score represents how well each study and the proposed framework address the comparison points. The proposed framework scores 100%, whereas the benchmarks score between 20 and 70%. Among the benchmarks, Benchmark#1 stands out as the most relevant, particularly in its focus on the development of ML models under normal scenarios, the evaluation of these models through decision matrix development, and the integration of medical and sociodemographic features. However, none of the benchmark studies, including Benchmark#1, considered adversarial attack examples or feature fusion when developing their ML models, and none of them developed new MCDM methods. This analysis underscores the unique contributions of the proposed framework and the gaps in the benchmark studies, emphasizing the need for a more comprehensive approach in future research to address these critical aspects.

5 Conclusion

The development of real-time triage applications for ASD is an early-stage yet critical venture given the importance of autism in the health sector. Advancing ASD triage solutions requires studies in which experiment-based theories, well-structured frameworks, and well-established methodologies are developed. For the first time, this study undertook a thorough exploration of ASD triage that addressed the identified limitations of previous studies and delivered effective solutions. The study integrated machine learning with fuzzy decision-making, which was crucial to accomplishing our objectives. The multiphase development process culminated in the formulation of the 2TLFFDOSM extension of the FDOSM mathematical model, a notable achievement in itself. These phases successfully identified a robust model based on the fused ASD dataset, which is rare in the current literature. Notably, our study presents the first fused ASD triage dataset produced through two PCA algorithms, providing researchers with a valuable resource for further investigation and research development.

Moreover, the developed DM, in conjunction with 2TLFFDOSM, efficiently addressed key challenges such as model selection, criterion importance, trade-offs, and conflicts. Additionally, our multi-ML models exhibited promising performance across various metrics, encompassing both normal test examples and adversarial attack examples. Although the overall performance results favor the normal test examples, determining the robust ML model requires considering both scenarios simultaneously. Our study emphasized this critical aspect, which can serve as a model for researchers in other fields who wish to follow a similar development sequence to validate their experiments and make informed decisions regarding robust ML models. However, the developed ML models were limited to default parameter settings in this study, and tuning these parameters could yield better results. The explainability of the robust model, which could be provided through methods such as LIME or SHAP, is not presented and is another limitation of this study.
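As a pointer for that future direction, the sketch below illustrates how SHAP could be applied post hoc to a logistic-regression classifier. It assumes the shap package is available and uses a binary toy problem for brevity (the triage task itself has three classes); it is not part of the study's reported experiments.

```python
import numpy as np
import shap  # assumes the shap package is installed
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder binary data standing in for the fused triage features.
X, y = make_classification(n_samples=600, n_features=12, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Linear SHAP values: each entry is one feature's contribution to the model
# output for one test sample.
explainer = shap.LinearExplainer(model, X_tr)
shap_values = explainer.shap_values(X_te)

# Global importance: mean absolute SHAP value per feature.
importance = np.abs(shap_values).mean(axis=0)
for i in np.argsort(importance)[::-1][:3]:
    print(f"feature {i}: mean |SHAP| = {importance[i]:.3f}")
```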

Future research could explore ASD triage based on genetic contributions, a burgeoning area of interest. The development of a dynamic DM in tandem with a suitable fuzzy MCDM method could pave the way for the first triage method based on genetic analysis, further enhancing our understanding and capabilities in ASD diagnosis and treatment. Finally, we plan to include more explicit numerical examples to further demonstrate the effectiveness of the proposed framework in handling complicated cases in ASD triage. The significance of the models' parameter settings could also receive more attention in future work: each model has different parameter settings, and these can be optimized with techniques such as genetic or swarm intelligence algorithms.
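To make this direction concrete, the sketch below shows a minimal genetic-style search over LR's regularization strength C, using selection and mutation only. The population size, mutation scale, and number of generations are arbitrary assumptions, and the synthetic data stand in for the fused ASD triage dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder 3-class data; in practice the fused triage features would be used.
X, y = make_classification(n_samples=800, n_features=12, n_informative=8,
                           n_classes=3, random_state=0)
rng = np.random.default_rng(0)

def fitness(log_c):
    """Cross-validated accuracy of LR with C = 10 ** log_c."""
    model = LogisticRegression(C=10.0 ** log_c, max_iter=1000)
    return cross_val_score(model, X, y, cv=3).mean()

population = rng.uniform(-3, 3, size=8)            # candidate log10(C) values
for generation in range(5):
    scores = np.array([fitness(c) for c in population])
    parents = population[np.argsort(scores)[-4:]]  # keep the 4 fittest candidates
    children = parents + rng.normal(0, 0.3, size=parents.shape)  # mutate parents
    population = np.concatenate([parents, children])

best = population[np.argmax([fitness(c) for c in population])]
print(f"best C found: {10.0 ** best:.3f}")
```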