1 Introduction

Deep learning has been studied since the 1940s and has recently attracted attention with advances in the technology and hardware for artificial neural networks [1]. Artificial neural networks that can handle large amounts of data and solve complex problems are also in demand. Deep neural networks (DNNs), which increase the number of hidden layers in artificial neural networks, have emerged to meet this requirement [2].

Convolutional neural networks (CNNs) are a representative type of DNN and are used in various fields, such as image classification [3, 4], face recognition [5], and video processing [6], as well as in safety–critical systems [7,8,9], such as those of autonomous vehicles, in which robustness is very important [10,11,12].

Some CNN models can classify images with higher accuracy than humans on specific datasets, but problems remain to be solved before they can operate in the real world. In particular, the models should be tolerant to untrained data [13, 38]. However, a common assumption in deep learning is that the training dataset contains all types of data that exist in real-world operating environments [14, 15]. This assumption is easily violated in practice, and when untrained data are input into a CNN, they are classified as one of the trained classes with high confidence [16].

In real-world operating environments, the performance of CNNs can be degraded by data of untrained types, which limits their operability [17]. Misclassification, particularly in safety–critical systems based on artificial intelligence, can lead to catastrophic consequences [18]. For example, autonomous vehicles can misclassify untrained elements in environments such as deserts or countryside where lanes and traffic signals do not exist, countries with different traffic signal systems, and roads with newly added signs. This can have a detrimental effect on human life, property, etc. Thus, studies have been conducted to solve these problems [19,20,21].

Previous studies use information from the data or the model to identify data of types that were not used to train DNNs. Data-based studies analyze the characteristics of the data using the Euclidean distance [22] or extreme value theory [23]. Although data-based studies can identify data of untrained types at a lower cost than model-based studies, their accuracy is low. Model-based studies identify data of new untrained types by adding prototypes [16, 24] or by adding or modifying extra structures or modules in the model [13, 40]. Model-based studies increase the computational cost and require more resources owing to the increased number of parameters in the model, which can affect model performance.

In the field of neuroscience, studies have been conducted to characterize the representation of brain regions [25, 26, 39]. Those studies have identified brain regions that perceive stimuli by providing input signals through sensory organs and characterizing the information recognized by specific brain regions. The studies conducted representational similarity analysis between the input signals and the brain regions and showed that if the input signals are similar, the brain regions that perceive stimuli are similar. The structure of the DNN is inspired by the human brain and works similarly to the human brain system [27,28,29]. Therefore, it is expected that a relation exists between the type of data input to the DNN and the specific region of the DNN. This study defines concepts and methods and conducts experiments to answer the following research questions.

  • RQ1 (Characteristic): Does a set of neurons that play a major role in each class exist? Can a set of neurons identify class characteristics?

  • RQ2 (Similarity): Can the similarity between neuron sets be used as a criterion to identify data from trained and untrained classes?

Inspired by neuroscience research, this study identifies the class neuron cluster (CNC), a set of neurons that are activated based on the class of input data, and analyzes the relation between the class of input data and the CNC. Based on the analysis results, we confirm whether the activated neurons are similar if the characteristics of the classes are similar. Furthermore, we identify data types that the model has not trained on using the CNCs.

First, we measure the criteria for identifying the data of classes that the model has not trained on, using the sets of neurons activated by the training and validation data. Next, based on these criteria, we identify the data from the untrained classes.

The method proposed in this study uses only activated neuron information without adding or modifying the model structure; this has the advantages of low-computational cost and no impact on the classification performance of the model.

The main contributions of this study are as follows:

  • We propose a CNC, which is a set of neurons used to identify a specific class. We show that the CNC recognizes class characteristics by conducting a similarity analysis between CNCs for the CIFAR-10 and STL-10 datasets and ResNet models.

  • We propose a new model-based method for identifying untrained class data using the CNC similarity.

  • We conduct experiments on public datasets (MNIST, CIFAR-10, and STL-10) to demonstrate the feasibility and effectiveness of the untrained class data identification method.

Section 2 describes some related work on the identification of the data of classes that the model has not trained on. Section 3 details the definition of the CNC, a set of neurons that are activated based on the class of input data, the identification of the CNC, and the analysis of the similarities between CNCs. Section 4 describes how to identify untrained class data. Section 5 presents the datasets, models, and configurations used in the experiments for identifying untrained class data. Section 6 presents and analyzes the experimental results. Section 7 provides the conclusions and the scope for future work. All variables and acronyms used in this paper are listed in Appendix 1.

2 Related work

The purpose of this study is to identify untrained class data. Therefore, it differs from studies that identify modified data or outliers of known classes based on generative adversarial networks (GANs) and adversarial attack methods. Model training speed, performance improvement, and pruning for weight reduction are not addressed in this study.

Studies on the identification of untrained class data can be classified into two categories: data-based and model-based studies. Among data-based studies, Mendes et al. [22] proposed an open-set nearest neighbor method that identifies untrained class data based on the Euclidean distance, depending on whether the label of the training sample closest to the input matches the label of the second-closest training sample. The method identifies trained and untrained class data with a high accuracy of approximately 80% or more in experiments on simple datasets; however, its accuracy on complex datasets, such as Caltech-256, falls below approximately 50%.

Zhang and Patel [23] proposed a sparse representation-based open-set recognition method that identifies untrained class data by analyzing the difference between the input data and the training data using extreme value theory. The method identifies trained and untrained class data with a high accuracy of approximately 90% or more in experiments on simple datasets; however, its accuracy on complex datasets, including Caltech-256, is approximately 68–80%.

Similar to [23], Yu et al. [43] proposed an open-set fault diagnosis method that identifies untrained class data by analyzing the data with extreme value theory. On industrial datasets, the proposed method increased the identification accuracy by approximately 10%.

Among model-based studies, Yang et al. [16] proposed generalized convolutional prototype learning, which trains CNN models with a prototype loss by adding prototype parameters. This method increases the classification accuracy by reducing intra-class distances and increasing inter-class distances based on the Euclidean distance. In addition, it improves the robustness of the CNN by adding a prototype for each new class and training the CNN to classify it. The method increases the accuracy by approximately 0.2% in the experiment on the MNIST dataset and by approximately 0.5% in the experiment on the CIFAR-10 dataset.

Similar to Yang et al. [16], Wang et al. [24] increased the classification accuracy by reducing intra-class distances and increasing inter-class distances using a CNN-based prototype ensemble technique. They identified untrained class data by calculating the novelty of each class using an open-world nearest mean classifier. The method identifies untrained class data with an accuracy of approximately 80% in the experiment on the simple Fashion-MNIST dataset and approximately 60% in the experiment on the complex CINIC dataset.

Gao et al. [41] proposed Convolutional open-world multi-task image Stream classifier with Intrinsic Similarity Metrics (CSIM), which identifies untrained class data based on similarity metrics. The method identified untrained class data with approximately 70% and less than 50% accuracy in experiments on MNIST and CIFAR-10 datasets, respectively.

Prototype-based methods, such as those of Yang et al. [16] and Wang et al. [24], require the addition of prototypes for the untrained class as parameters to the model, and an increase in the number of parameters increases the computational cost and required resources. These methods, including Gao et al. [41], require regular updating of model parameters to enhance the model’s untrained class data identification performance.

Zhou et al. [44] identified untrained class data using the learning-to-classify-with-incremental-new-class method based on entropy and probability. When data are input, if the combined entropy and probability score is greater than a threshold, the data are classified as an untrained class; the accuracy is approximately 2% higher than that of the comparative methods.

Ma et al. [45] proposed a method for identifying untrained class data in a generative adversarial network when the discriminator's score is greater than a threshold. The proposed method's accuracy was approximately 5% higher than that of the comparative methods.

Yoshihashi et al. [13] proposed classification-reconstruction learning for open-set recognition, a method for probabilistically identifying untrained class data using deep hierarchical reconstruction nets, designed around an OpenMax classifier, a modification of softmax. The method improves the F1-score by approximately 0.6 in the experiments on the modified ImageNet and LSUN datasets.

Dudi and Rajesh [40] proposed the shark smell-based whale optimization algorithm (SS-WOA), which identifies untrained class data based on the activation function values and classification costs of the CNN model. The method derives a threshold from the activation function values when data are input and identifies the input data as untrained class data when the classification cost is less than the threshold. The method improves the classification accuracy by approximately 0.57–4% compared with other classification models in the experiment on plant-leaf datasets. The methods proposed in [13] and [40] increase the computational cost because separate modules (an extra layer and algorithm) are added to the CNN model.

The untrained class data identification method proposed in this study differs from data-based methods in that it uses model information. In addition, it does not modify the model, such as a loss function or classifier, or add new modules to identify untrained class data; it uses only the activation information of the neurons in the trained model. Thus, because the method is model-independent, it has the advantages of low computational cost and no influence on the classification performance of the model, unlike previous model-based methods.

3 Class neuron cluster

Here, we define the CNC as a set of neurons that are activated by the input data of a particular type (class). Then, we identify the CNC for each class of input data and analyze the relation between the class of input data and the CNC.

3.1 Definition

Neurons recognize the characteristics of the input data and transmit the recognized information to the next layer [30]. The CNC is a set of neurons that recognize the characteristics of the data of a particular class, and these neurons are activated when the data of a particular class are input. Before defining the CNC, we define a function that determines whether a neuron is activated as in Definition 1.

Definition 1

Neuron activation function.

  • A set of neurons: \(N = \left\{ {n_{1} , n_{2} , \ldots , n_{l} } \right\}\)

  • A set of classes: \(S = \left\{ {c_{1} , c_{2} , \ldots , c_{m} } \right\}\)

  • A set of data in class \(c\): \(D_{c} = \left\{ {d_{1} , d_{2} , \ldots , d_{n} } \right\}\)

For input data \(d\), if the activation value of the neuron exceeds 0, neuron \(n\) is determined to be activated by data \(d\). For activation function \(f\), the function used to determine the activation of the neuron is expressed in Eq. (1).

$${\text{active}}\left( {n,\,d} \right) = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {{\text{if}}~f\left( {n,\,d} \right) > 0} \hfill \\ {0,} \hfill & {{\text{else}}} \hfill \\ \end{array} } \right.$$
(1)

The CNC is a set of neurons that are activated more than a certain rate by the data of a particular class. Therefore, the neuron activation ratio is defined as in Definition 2 based on the neuron activation function in Definition 1.

Definition 2

Neuron activation ratio.

The activation ratio of neuron \(n\) for dataset \(D_{c}\) of a particular class \(c\) is the ratio of data that activates the neuron among the total data of the dataset, and it is expressed in Eq. (2).

$${\text{active}}\_{\text{ratio}}\left( {n, \,D_{c} } \right) = \frac{{{\Sigma }_{{d \in D_{c} }} {\text{active}}\left( {n, \,d} \right)}}{{\left| {D_{c} } \right|}}$$
(2)

Based on Definitions 1 and 2, Definition 3 defines the CNC, a set of neurons that are activated at more than a certain rate when the data of a particular class are input.

Definition 3

Class neuron cluster.

The \({\text{CNC}}_{c}\) of a particular class \(c\) is a set of neurons whose neuron activation ratio is greater than or equal to the threshold for dataset \(D_{c}\) and is expressed as Eq. (3).

$${\text{CNC}}_{c} = \{ n \in N|{\text{active}}\_{\text{ratio}}\left( {n, \,D_{c} } \right) \ge {\text{threshold}}\}$$
(3)

For example, as depicted in Fig. 1, when the number of data of class \(c\) is 100 and the threshold is set at 0.5, neurons activated more than 50 times are identified as the CNC of class \(c\).

Fig. 1 Example of a CNC
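To make Definitions 1–3 concrete, the following is a minimal NumPy sketch of CNC identification, assuming the neuron activation values have already been collected into a (num_data, num_neurons) array; the array and function names are illustrative, not from the original implementation.

```python
import numpy as np

def identify_cnc(activations: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Identify the CNC of one class from pre-computed activation values.

    `activations[i, j]` is assumed to hold f(n_j, d_i), the activation value
    of neuron n_j for input d_i of a single class c.
    """
    # Eq. (1): a neuron is active for an input if its activation value > 0.
    active = activations > 0
    # Eq. (2): fraction of the class data that activates each neuron.
    active_ratio = active.mean(axis=0)
    # Eq. (3): the CNC is the set of neuron indices whose ratio >= threshold.
    return np.flatnonzero(active_ratio >= threshold)
```

Matching the example in Fig. 1, with 100 samples of class \(c\) and `threshold=0.5`, a neuron enters the CNC only if it is activated by at least 50 of the samples.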

3.2 Class neuron cluster similarity analysis

To verify that the previously defined CNC recognizes the characteristics of the input data, we identify CNCs for each class of input data and analyze the similarities between CNCs.

3.2.1 Class neuron cluster similarity analysis approach

A CNC is a set of neurons used to recognize the characteristics of data. Therefore, when the CNC is measured for each class, the CNCs of classes with similar characteristics are expected to have many activated neurons in common. Conversely, the CNCs of classes with only slightly similar characteristics are expected to have fewer commonly activated neurons. To confirm this, we analyze the similarity between CNCs. Figure 2 shows an overview of the CNC similarity analysis.

Fig. 2 Overview of the CNC similarity analysis

Let \(M_{S}\) denote the model trained on the training data \(D_{S}^{{{\text{tr}}}}\) composed of \(m\) classes (\(S = \left\{ {c_{1} , c_{2} , \ldots , c_{m} } \right\}\)). We identify the CNC for each class by inputting \(D_{S}^{{{\text{tr}}}}\) into \(M_{S}\). Then, we measure the similarities between CNCs and analyze them by representing them as a similarity matrix. The method for measuring the similarity between CNCs is described in Definition 4.

Definition 4

Similarity measurement.

The similarity between two clusters, \({\text{CNC}}_{{c_{i} }}\) and \({\text{CNC}}_{{c_{j} }}\), is measured as expressed in Eq. (4).

$${\text{CNC}}_{{c_{i} }} \oplus {\text{CNC}}_{{c_{j} }} = \frac{{\left| {{\text{CNC}}_{{c_{i} }} \cap {\text{CNC}}_{{c_{j} }} } \right|}}{{\left| {{\text{CNC}}_{{c_{i} }} \cup {\text{CNC}}_{{c_{j} }} } \right|}}$$
(4)

As described in Definition 4, the similarity between two CNCs is the proportion of commonly activated neurons (i.e., the Jaccard index). We measure all pairwise similarities between the identified CNCs and express them as a similarity matrix.
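As a sketch, Eq. (4) is the Jaccard index over sets of neuron indices. Assuming CNCs are represented as the index arrays returned by the `identify_cnc` sketch above, the similarity and the full matrix can be computed as follows (names are illustrative):

```python
import numpy as np

def cnc_similarity(cnc_i: np.ndarray, cnc_j: np.ndarray) -> float:
    """Eq. (4): Jaccard similarity between two CNCs (sets of neuron indices)."""
    a, b = set(cnc_i.tolist()), set(cnc_j.tolist())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_matrix(cncs: list) -> np.ndarray:
    """Pairwise similarities between the CNCs of all m classes (Figs. 3 and 4)."""
    m = len(cncs)
    sim = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            sim[i, j] = cnc_similarity(cncs[i], cncs[j])
    return sim
```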

The CNC similarity analysis is conducted on the CIFAR-10 [31] and STL-10 [32] datasets with ResNet models. CIFAR-10 and STL-10 each comprise 10 classes (six living-being and four object classes); CIFAR-10 has 5,000 training data per class, and STL-10 has 500.

For the CIFAR-10 dataset, we analyze the similarity over the 24,576 neurons in the last six layers of the ResNet-20 model; for the STL-10 dataset, over the 368,640 neurons in the last 10 layers of the ResNet-32 model. The reason for targeting the latter layers is explained in Sect. 5.

For the analysis, we identify CNCs by inputting 500 training data for each class into the model, measure all similarities between the identified CNCs, and express them as a similarity matrix.

3.2.2 Class neuron cluster similarity analysis results

When the threshold of the neuron activation ratio is set at 0.5, the similarity matrices representing the results of the CNC similarity analysis for the 10 classes of CIFAR-10 and STL-10 are presented in Figs. 3 and 4, respectively.

Fig. 3 Similarity matrix between the CNCs of CIFAR-10

Fig. 4 Similarity matrix between the CNCs of STL-10

The analysis results on the CIFAR-10 dataset show that the similarities between living-being–living-being pairs and between object–object pairs are higher than the similarities between living-being–object pairs. In the living-being–living-being pairs, the cat–dog pair has the highest similarity, and in the object–object pairs, the ship–airplane pair has the highest similarity. The pair with the lowest similarity is the horse–ship pair, which is a living-being–object pair.

The analysis results on the STL-10 dataset, similar to those on the CIFAR-10 dataset, show that the similarities between living-being–living-being pairs and object–object pairs are higher than those between living-being–object pairs. In the living-being–living-being pairs, the dog–monkey pair has the highest similarity, and in the object–object pairs, the ship–airplane pair has the highest similarity. The pair with the lowest similarity is the horse–airplane pair, which is a living-being–object pair.

The similarity analysis results indicate that the characteristics of classes with high similarity between CNCs are more similar than those with low similarity, and the CNC is a set of neurons that recognizes the data characteristics of a particular class.

As described in Definition 3, CNC identification depends on the threshold. When the threshold is small, even neurons activated by only a small fraction of the data are included, and the CNC is identified as coarse-grained. A coarse-grained CNC can flexibly identify data of various shapes; however, the data may be mistaken for a different class. Conversely, when the threshold is large, the CNC is identified as fine-grained, which may fail to recognize data of various shapes and result in poor flexibility. We conducted a similarity analysis based on the thresholds on the CIFAR-10 dataset and ResNet-20 model, and the analysis results are described in Appendix 2.

4 Untrained class data identification approach

The identification of untrained class data consists of two steps: measuring the model identifiability and identifying the untrained class data. Figure 5 shows a flowchart of the untrained class data identification approach.

Fig. 5 Flowchart of the untrained class data identification approach

The first step measures the model identifiability based on the training and validation data. The second step measures the class max similarity based on the unknown class data and training data and identifies the untrained class data by comparing this similarity with the model identifiability. Each step is described in detail in the following subsections.

4.1 Measure of model identifiability

Model identifiability is a criterion for determining whether arbitrary data input to the model belong to classes that the model was not trained on; it is measured based on class identifiability. Figure 6 shows an overview of the model identifiability measurement.

Fig. 6 Overview of the model identifiability measurement

We denote the training data excluding the data of class \(c_{h}\) as \(D_{{\overline{{c_{h} }} }}^{{{\text{tr}}}}\), the validation data excluding the data of \(c_{h}\) as \(D_{{\overline{{c_{h} }} }}^{{{\text{va}}}}\), and the model trained on the training data \(D_{{\overline{{c_{h} }} }}^{{{\text{tr}}}}\) as \(M_{{\overline{{c_{h} }} }}\). First, we input the training data \(D_{{\overline{{c_{h} }} }}^{{{\text{tr}}}}\) and validation data \(D_{{\overline{{c_{h} }} }}^{{{\text{va}}}}\) into model \(M_{{\overline{{c_{h} }} }}\) and identify the CNCs for the training data and validation data, respectively. The validation data are selected from the original training data and do not overlap with the training data used for CNC identification.

Next, we measure the similarity between the CNCs of the training and validation data for each class. This similarity is used to identify each class. Class identifiability is a class-level criterion for determining whether arbitrary data input to the model belong to a class that the model was trained on. Class identifiability is described in Definition 5.

Definition 5

Class identifiability.

The identifiability of a particular class \(c_{k}\) is the similarity between the \({\text{CNC}}_{{c_{k} }}^{{{\text{tr}}}}\) identified by inputting the training data, and the \({\text{CNC}}_{{c_{k} }}^{{{\text{va}}}}\) identified by inputting the validation data, as expressed in Eq. (5).

$${\text{identifiability}}_{{c_{k} }} = {\text{CNC}}_{{c_{k} }}^{{{\text{tr}}}} \oplus {\text{CNC}}_{{c_{k} }}^{{{\text{va}}}}$$
(5)

Model identifiability is a criterion for determining whether the input data are the data of the trained class at the model level. As expressed in Definition 6, it is measured as the minimum value among the class identifiabilities.

Definition 6

Model identifiability.

The identifiability of model \(M_{s}\) trained on \(S\) composed of \(m\) classes is the minimum value among the identifiabilities of \(m\) classes, as expressed in Eq. (6).

$${\text{identifiability}}_{{M_{S} }} = {\text{min}}\left( {\left\{ {{\text{identifiability}}_{{c_{i} }} } \right\}_{{i = 1}}^{m} } \right)$$
(6)
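Under the same assumptions as the earlier sketches (CNCs as neuron-index arrays and `cnc_similarity` implementing Eq. (4)), Definitions 5 and 6 reduce to a few lines; the function names are illustrative.

```python
def class_identifiability(cnc_tr, cnc_va) -> float:
    """Eq. (5): similarity between a class's training- and validation-data CNCs."""
    return cnc_similarity(cnc_tr, cnc_va)

def model_identifiability(cncs_tr, cncs_va) -> float:
    """Eq. (6): minimum class identifiability over the m trained classes."""
    return min(class_identifiability(tr, va)
               for tr, va in zip(cncs_tr, cncs_va))
```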

4.2 Identification of untrained class data

Untrained class data are identified by measuring the similarity between the CNC identified based on the input data and the CNC identified based on the training data and comparing the similarity with the model identifiability. Figure 7 shows an overview of untrained class data identification.

Fig. 7 Overview of untrained class data identification

Because a small number of data may not be sufficient for identifying neuron clusters, we augment the unknown class data \(D_{{c_{h} }}^{{{\text{un}}}}\) with rotation, zoom, brightness, and shift techniques [42]. The augmented data are then used for \({\text{CNC}}_{{c_{h} }}^{{{\text{un}}}}\) identification. The unknown class data are selected from the test data.
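The exact augmentation parameters are not reported in this paper; the sketch below shows one plausible setup using the Keras `ImageDataGenerator` API, where the ranges and the `factor` parameter are assumptions for illustration only.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative ranges; [42] and the experiments may use different values.
augmenter = ImageDataGenerator(
    rotation_range=15,            # rotation
    zoom_range=0.1,               # zoom
    brightness_range=(0.8, 1.2),  # brightness
    width_shift_range=0.1,        # horizontal shift
    height_shift_range=0.1,       # vertical shift
)

def augment(unknown_data: np.ndarray, factor: int = 10) -> np.ndarray:
    """Expand a small unknown-class sample before CNC identification."""
    flow = augmenter.flow(unknown_data, batch_size=len(unknown_data), shuffle=False)
    return np.concatenate([next(flow) for _ in range(factor)], axis=0)
```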

We measure the class max similarity to determine whether the unknown class data belong to an untrained class. The class max similarity is the maximum of the similarities between \({\text{CNC}}_{{c_{h} }}^{{{\text{un}}}}\) and the CNCs based on the training data, as defined in Definition 7.

Definition 7

Class max similarity.

For the \({\text{CNC}}_{{c_{h} }}^{{{\text{un}}}}\) based on class \(c_{h}\), the max similarity of \(c_{h}\) is the maximum of the similarities measured against the CNCs of the \(m\) trained classes and is expressed in Eq. (7).

$${\text{similarity}}_{{c_{h} }}^{{{\text{max}}}} = {\text{max}}\left( {\left\{ {{\text{CNC}}_{{c_{i} }}^{{{\text{tr}}}} \oplus {\text{CNC}}_{{c_{h} }}^{{{\text{un}}}} } \right\}_{i = 1}^{m} } \right)$$
(7)

The untrained class data are identified by comparing the model identifiability measured in the previous step to the unknown class maximum similarity. The method for identifying the untrained class data is described in Definition 8.

Definition 8

Untrained class data identification.

If the unknown class max similarity, \({\text{similarity}}_{{c_{h} }}^{{{\text{max}}}}\) based on class \(c_{h}\), is less than the model identifiability, \({\text{identifiability}}_{M}\), \(c_{h}\) is identified as an untrained class, as expressed in Eq. (8).

$$c_{h} = \left\{ {\begin{array}{*{20}l} {{\text{untrained~class}},~} \hfill & {{\text{if}}~{\text{similarity}}_{{c_{h} }}^{{{\text{max}}}} < {\text{identifiability}}_{M} } \hfill \\ {{\text{trained~class}},} \hfill & {{\text{else}}} \hfill \\ \end{array} } \right.$$
(8)

If the unknown class max similarity is less than the model identifiability, the characteristics of the data of the unknown class are considered to be dissimilar to the characteristics of the data of all classes that the model is trained on; therefore, the data of the unknown class are identified as untrained class data.
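Combining Eqs. (7) and (8), the decision rule is a short function given the previous sketches; again, the names are illustrative rather than from the original implementation.

```python
def is_untrained_class(cnc_un, cncs_tr, model_identifiability_value: float) -> bool:
    """Eqs. (7) and (8): the unknown class is flagged as untrained when its
    max similarity to every trained-class CNC is below the model identifiability."""
    max_similarity = max(cnc_similarity(cnc_tr, cnc_un) for cnc_tr in cncs_tr)
    return max_similarity < model_identifiability_value
```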

To verify the performance of the untrained class data identification method, we define the class identification accuracy based on the accuracy used in existing classification tests [37]. The class identification accuracy is measured from the identification results obtained by comparing the model identifiability and the class max similarity. It measures how well untrained class data are identified as untrained and trained class data as trained. The class identification accuracy is described in Definition 9.

Definition 9

Class identification accuracy.

The class identification accuracy is the ratio of correctly classified trained and untrained class data to the total data, as expressed in Eq. (9).

$${\text{Class Identification Accuracy}} = \frac{{\text{TP + TN}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}}$$
(9)
  • TP: Identify untrained class data as an untrained class

  • TN: Identify trained class data as a trained class

  • FP: Identify trained class data as an untrained class

  • FN: Identify untrained class data as a trained class

To analyze the performance of the untrained class data identification method in detail, we measure the sensitivity, specificity, false-positive rate, and false-negative rate. Each performance metric is as follows.

$${\text{Sensitivity}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}}$$
(10)
$${\text{Specificity}} = \frac{{{\text{TN}}}}{{{\text{TN}} + {\text{FP}}}}$$
(11)
$${\text{FNR}} \left( {\text{False negative rate}} \right) = \frac{{{\text{FN}}}}{{{\text{TP}} + {\text{FN}}}}$$
(12)
$${\text{FPR}} \left( {\text{False positive rate}} \right) = \frac{{{\text{FP}}}}{{{\text{TN}} + {\text{FP}}}}$$
(13)
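For reference, the metrics in Eqs. (9)–(13) can be computed directly from the four counts; the helper below is a straightforward sketch, assuming no denominator is zero.

```python
def identification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Eqs. (9)-(13): class identification accuracy and related metrics."""
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),  # Eq. (9)
        "sensitivity": tp / (tp + fn),                   # Eq. (10)
        "specificity": tn / (tn + fp),                   # Eq. (11)
        "fnr":         fn / (tp + fn),                   # Eq. (12)
        "fpr":         fp / (tn + fp),                   # Eq. (13)
    }
```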

5 Experimental design and settings

Here, we describe the experimental environments and methods used to measure model identifiability and identify the untrained class data.

5.1 Design for measuring model identifiability

Untrained class data are identified on the MNIST [33], CIFAR-10 [31], and STL-10 [32] datasets using ResNet [3], a CNN model. Table 1 presents the experimental datasets, models, and the number of neurons to be measured.

Table 1 Experimental datasets and models

MNIST is a 28 × 28 pixel image dataset comprising handwritten digits from 0 to 9, and CIFAR-10 is a 32 × 32 pixel image dataset comprising six living-being classes (bird, cat, dog, deer, frog, and horse) and four object classes (airplane, automobile, ship, and truck). Similar to CIFAR-10, STL-10 is a 96 × 96 pixel image dataset comprising six living-being classes (bird, cat, dog, deer, horse, and monkey) and four object classes (airplane, car, ship, and truck).

The MNIST and CIFAR-10 datasets, which have a small image size, employ the ResNet-20 model, whereas the STL-10 dataset, which has a relatively large image size, employs the ResNet-32 model.

CNNs store more abstracted information as the layers deepen [34,35,36]. Neurons in the front layers identify lines and corners, whereas neurons in the rear layers identify objects based on the information from neurons in the front layers. Therefore, identifying the CNC, which is a set of neurons that recognize the data characteristics of a particular class, on the rear layers is advantageous. We conduct experiments on neurons in layers corresponding to the last 31% of the activation layers of the model. We experiment on 24,576 neurons in six layers with a size of 8 × 8 × 64 for the ResNet-20 model, and 368,640 neurons in 10 layers with a size of 24 × 24 × 64 for the ResNet-32 model.
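The extraction code is not published with this paper; the sketch below shows one way to collect the rear-layer activations with tf.keras, assuming the ResNet implementation exposes its ReLUs as `Activation` layers (the function names are illustrative).

```python
import numpy as np
import tensorflow as tf

def rear_activation_model(model: tf.keras.Model,
                          fraction: float = 0.31) -> tf.keras.Model:
    """Expose the outputs of roughly the last 31% of activation layers."""
    act_layers = [layer for layer in model.layers
                  if isinstance(layer, tf.keras.layers.Activation)]
    k = max(1, round(len(act_layers) * fraction))
    return tf.keras.Model(inputs=model.input,
                          outputs=[layer.output for layer in act_layers[-k:]])

def collect_activations(act_model: tf.keras.Model,
                        data: np.ndarray) -> np.ndarray:
    """Flatten the per-layer feature maps into one (num_data, num_neurons)
    array, e.g., six layers of 8 x 8 x 64 = 24,576 neurons for ResNet-20."""
    outputs = act_model.predict(data)
    if not isinstance(outputs, list):
        outputs = [outputs]
    return np.concatenate([o.reshape(len(data), -1) for o in outputs], axis=1)
```

The resulting array can be fed directly to the `identify_cnc` sketch from Sect. 3.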

The numbers of training and validation data used for CNC identification for each dataset are listed in Table 2. We use 500 training data for each class in the MNIST and CIFAR-10 datasets and 300 training data for each class in the STL-10 dataset. The image size of STL-10 is three times larger than those of MNIST and CIFAR-10; hence, the STL-10 dataset requires more experiment time than the MNIST and CIFAR-10 datasets, and a smaller number of training data are used. On the MNIST and CIFAR-10 datasets, there are 500 validation data for each class, and on the STL-10 dataset, there are 200 validation data for each class.

Table 2 Number of training/validation data used for CNC identification

Based on the CNCs identified using the training and validation data, we measure the model identifiability. To identify untrained class data, we train a model \(M_{{\overline{{c_{x} }} }}\) on all classes in the dataset except one specific class \(c_{x}\). Because each class in the dataset is excluded once, if \(m\) classes are present, a total of \(m\) models are created (\(M_{{\overline{{c_{1} }} }} ,{ }M_{{\overline{{c_{2} }} }} ,{ } \ldots ,{ }M_{{\overline{{c_{m} }} }}\)).

5.2 Design for identification of untrained class data

We identify untrained class data from the unknown class data, which consist of data from nine trained classes and one untrained class. The model should correctly identify the trained class data as trained and the untrained class data as untrained. The number of unknown class data is listed in Table 3.

Table 3 Number of unknown class data

The control parameters affecting this experiment are the number of data used for CNC identification and the activation rate threshold, which is the CNC identification criterion. Therefore, we conduct experiments by changing the values of the parameters. Each parameter is described as follows.

  • The number of data used for CNC identification.

  • The identification of a CNC, which is a set of neurons that recognize the characteristics of data, is unreliable with only one data point. Therefore, we compare the results based on the number of data used for CNC identification.

  • The models used in the experiment are trained on only nine out of 10 classes, matching the unknown class data configuration. We conducted experiments on 10 models by excluding each class once and repeated them five times with different numbers of unknown class data used for CNC identification, thereby performing 50 experiments for each dataset.

  • Activation rate threshold.

  • As described in Sect. 3, if the threshold is changed, the CNC is identified differently, and the result of untrained class data identification may differ. Therefore, we conduct the experiment by changing the threshold used for CNC identification to 0.3, 0.5, and 0.7. The experimental results mainly describe the case in which the threshold is set at 0.5; the results for thresholds of 0.3 and 0.7 are described in detail in the Appendices.

5.3 Configurations

All experiments were conducted on a server with a Windows 10 Pro OS, AMD Ryzen 7 2700X CPU, 64.0 GB RAM, and an NVIDIA GeForce RTX 2080 Ti GPU. The MNIST and CIFAR-10 datasets are obtained from Keras v2.6.0, and the STL-10 dataset and ResNet model are available online.

6 Experimental results

Here, we describe the results of the measurement of model identifiability and identification of untrained class data based on model identifiability.

6.1 Results of the model identifiability measurement

To measure the model identifiability, we identified CNCs by using the training and validation data of the MNIST, CIFAR-10, and STL-10 datasets. Figure 8 presents a box plot depicting the CNC size distribution identified by inputting the training data into 10 models (\(M_{{\overline{{{\text{airplane}}}} }}\), \(M_{{\overline{{{\text{automobile}}}} }}\), \(M_{{\overline{{{\text{bird}}}} }}\), \(M_{{\overline{{{\text{cat}}}} }}\), \(M_{{\overline{{{\text{deer}}}} }}\), \(M_{{\overline{{{\text{dog}}}} }}\), \(M_{{\overline{{{\text{frog}}}} }}\), \(M_{{\overline{{{\text{horse}}}} }}\), \(M_{{\overline{{{\text{ship}}}} }}\), \(M_{{\overline{{{\text{truck}}}} }}\)) trained with nine classes, with the exception of one class from the training data of the CIFAR-10 dataset.

Fig. 8 Box plot for CNC size distribution based on the training data of CIFAR-10

The CNC size distribution for each class is measured using the nine models trained on that class. For example, the CNC size distribution of the airplane class comprises the CNC sizes of the airplane class measured from the models (\(M_{{\overline{{{\text{automobile}}}} }}\), \(M_{{\overline{{{\text{bird}}}} }}\), \(M_{{\overline{{{\text{cat}}}} }}\), \(M_{{\overline{{{\text{deer}}}} }}\), \(M_{{\overline{{{\text{dog}}}} }}\), \(M_{{\overline{{{\text{frog}}}} }}\), \(M_{{\overline{{{\text{horse}}}} }}\), \(M_{{\overline{{{\text{ship}}}} }}\), \(M_{{\overline{{{\text{truck}}}} }}\)) trained on the airplane data.

The CNCs are identified in sizes of approximately 3,700–4,800, accounting for approximately 15.1–19.5% of the total number of 24,576 neurons. The smallest CNC has a size of 3,785: the bird-class CNC (\({\text{CNC}}_{{{\text{bird}}}}^{{{\text{tr}}}}\)) of the model without the frog-class (\(M_{{\overline{{{\text{frog}}}} }}\)), and the largest CNC has a size of 4,727: the truck-class CNC (\({\text{CNC}}_{{{\text{truck}}}}^{{{\text{tr}}}}\)) of the model without the automobile-class (\(M_{{\overline{{{\text{automobile}}}} }}\)). Figure 9 displays a box plot depicting the CNC size distribution identified by inputting the validation data of the CIFAR-10 dataset.

Fig. 9 Box plot for CNC size distribution based on the validation data of CIFAR-10

The CNC size distribution based on the validation data is similar to that of the training data. The CNCs are identified in sizes of approximately 3,700–4,800. The smallest and largest CNCs are the bird-class CNC (\({\text{CNC}}_{{{\text{bird}}}}^{{{\text{va}}}}\)) of the model without the frog class (\(M_{{\overline{{{\text{frog}}}} }}\)) and the truck-class CNC (\({\text{CNC}}_{{{\text{truck}}}}^{{{\text{va}}}}\)) of the model without the automobile class (\(M_{{\overline{{{\text{automobile}}}} }}\)), the same as for the training data.

The CNC sizes of the living-being classes (bird, cat, deer, dog, frog, and horse) are found to be smaller than the CNC sizes of the object classes (airplane, automobile, ship, and truck). This is because the data from living-being classes with complex characteristics activate various neurons. As a result, the number of neurons with a neuron activation rate greater than the threshold becomes small. For example, the CNC size of a bird with many curves and various shapes is identified to be smaller than that of a truck with many straight lines and simple shapes. The minimum, average, and maximum sizes of the CNCs for the training and validation data for each dataset are listed in Table 4.

Table 4 CNC min/average/max size of training and validation data

Similar to the CIFAR-10 dataset, the MNIST and STL-10 datasets have similar size distributions of CNCs for the training and validation data. The CNC size of MNIST is approximately 1,600–2,500, which is approximately 6.5–10.2% of the total 24,576 neurons, and the CNC size of STL-10 is approximately 98,000–135,000, which is approximately 26.6–36.6% of the total 368,640 neurons.

The more complex the dataset, the more neurons are identified as clusters. The CNC of the STL-10 dataset with a large image size (96 × 96 pixel) is the largest, and the CNC of the MNIST dataset with a small image size (28 × 28 pixel) is the smallest.

Conversely, in the case of the MNIST dataset, more than 90% of neurons are not used as clusters. In the CIFAR-10 dataset, approximately 80% or more, and in the STL-10 dataset, approximately 64% or more neurons are not used as clusters.

When the threshold is set at 0.3, the size of the CNC is larger than when the threshold is set at 0.5. This is because the threshold is small; therefore, even neurons activated by a smaller fraction of the data are included in the cluster. When the threshold is set at 0.7, the size of the CNC is the smallest. This is because only highly activated neurons are included in the cluster as the threshold increases. The min/average/max CNC sizes per dataset for thresholds of 0.3 and 0.7 are described in Appendix 3.

The model identifiabilities measured using CNCs for the training and validation data for each dataset are listed in Table 5. Ten models are created for each dataset excluding one class. The average model identifiabilities of MNIST, CIFAR-10, and STL-10 are 0.656, 0.721, and 0.646, respectively.

Table 5 Model identifiability for each dataset

The identifiability measured for each model is used as a criterion for identifying the untrained class data. For example, for the MNIST dataset, the model without the zero class (\(M_{{\overline{{{\text{zero}}}} }}\)) identifies the unknown class data as untrained class data if the maximum similarity between the training data and the unknown class data is less than 0.670, which is the model identifiability.

Model identifiability decreases as the threshold increases. This is because the larger the threshold, the smaller the CNC; thus, fewer neurons overlap between the CNCs of the training and validation data. The model identifiability for each dataset when the thresholds are set at 0.3 and 0.7 is described in Appendix 4.

6.2 Results of identification of untrained class data

To identify the untrained class data, we measured the class identification accuracy by inputting unknown class data into the model trained on part of the entire training data. The MNIST, CIFAR-10, and STL-10 datasets all consist of 10 classes, and we divided the unknown class data into nine trained class datasets and one untrained class dataset. Figure 10 displays the class identification accuracy of the dataset based on the number of unknown class data.

Fig. 10 Class identification accuracy for the datasets based on the number of unknown class data

The relatively simple MNIST and CIFAR-10 datasets have higher class identification accuracy than the complex STL-10 dataset. The class identification accuracy in experiments with 10 data on the STL-10 dataset is 85%, and when more than 20 data are used, the accuracy is greater than 90% on all datasets. The more data used for CNC identification, the more accurately the CNC captures the class characteristics, and the higher the class identification accuracy. Figure 11 displays the identification accuracy for each class based on the number of unknown class data in the MNIST dataset.

Fig. 11 Class identification accuracy on MNIST based on the number of unknown class data

In the experiment on the MNIST dataset, the ‘five’ class has the lowest identification accuracy when the number of unknown class data is ten, and the ‘one,’ ‘seven,’ and ‘nine’ classes have less than 90% accuracy when the number of data is 50 or less. We measured the sensitivity, specificity, FNR, and FPR to analyze the performance of the class data identification in detail. Table 6 lists the performance metrics on the MNIST dataset based on the number of unknown class data.

Table 6 Performance metrics on MNIST based on the number of unknown class data

The sensitivity, which is the ratio of correctly classified untrained class data, is 100%; therefore, all untrained class data are correctly classified. Conversely, the specificity, which is the proportion of correctly classified trained class data, is greater than 91%, and some of the trained class data are incorrectly classified. Consequently, the FPR, which is the proportion of incorrectly classified trained class data, is less than 9%.

When the threshold used for CNC identification is set at 0.3, most of the untrained class data are incorrectly classified as trained class data. Conversely, when the threshold is set at 0.7, trained class data are incorrectly classified as untrained class data. The results of measuring the performance metrics on the MNIST dataset for thresholds of 0.3 and 0.7 are described in Appendix 5. Figure 12 presents the identification accuracy for each class based on the number of unknown class data in the CIFAR-10 dataset.

Fig. 12 Class identification accuracy on CIFAR-10 based on the number of unknown class data

In the experiment on the CIFAR-10 dataset, the ‘automobile’ class has the lowest identification accuracy when the number of unknown class data is ten; in all other cases, the accuracy is greater than 90%. Table 7 lists the other performance metrics on the CIFAR-10 dataset based on the number of unknown class data.

Table 7 Performance metrics on CIFAR-10 based on the number of unknown class data

Because some untrained class data are incorrectly classified, the sensitivity is 60–80%, with the lowest value occurring when the number of unknown class data is 30. In contrast, the majority of the trained class data are correctly classified, with a specificity greater than 92%. Overall, the larger the amount of data, the better the performance.

Similar to the MNIST dataset, in the experiment on the CIFAR-10 dataset, when the threshold used for CNC identification is set at 0.3, most of the untrained class data are incorrectly classified. In contrast, some trained class data are misclassified when the threshold is set at 0.7. The results of measuring the performance metrics on the CIFAR-10 dataset based on the threshold are described in Appendix 5. Figure 13 presents the identification accuracy for each class based on the number of unknown class data in the STL-10 dataset.

Fig. 13 Class identification accuracy on STL-10 based on the number of unknown class data

In the experiment on the STL-10 dataset, the identification accuracy of the ‘bird’ class is 0 when the number of unknown class data is ten and 60% when the number of unknown class data is twenty. In all other cases, the accuracy is greater than 90%. Table 8 lists the other performance metrics on the STL-10 dataset based on the number of unknown class data.

Table 8 Performance metrics on STL-10 based on the number of unknown class data

Approximately half of the untrained class data are incorrectly classified, with a sensitivity of 50–60%. This is because untrained class data are identified as class data with similar characteristics. For example, a model trained on a dog but not a cat would classify a cat as a dog. The majority of the trained class data are correctly classified, with the specificity greater than 87%. The results of measuring the performance metrics on the STL-10 dataset based on the threshold are described in Appendix 5.

To examine the feasibility and effectiveness of CNC, we compare its performance with those of CSIM [41] and CPE [24], state-of-the-art CNN-based untrained class data identification methods. The comparisons measure the FNR and FPR on the MNIST and CIFAR-10 datasets. Tables 9 and 10 list the performance metrics of the methods for each dataset.

Table 9 Performance metric of methods on MNIST
Table 10 Performance metric of methods on CIFAR-10

The FNR of CNC is mostly lower than that of the other methods, but its standard error is greater. This is because the untrained class data identification performance varies from class to class. CNC identifies all untrained class data on the MNIST dataset, but on the CIFAR-10 dataset, the cat and dog classes are identified as different classes.

On the CIFAR-10 dataset, the FPR of CNC is lower than that of the other methods when the number of unknown class data is greater than 20; in other cases, however, it is higher.

CSIM and CPE identify the class of data when 300 data are input and store data predicted to be untrained class data in the buffer. Then, when the amount of data accumulated in the buffer reaches 1,000, the model trains on the data in the buffer. Therefore, the more data that are accumulated in the model, the better the model's performance.

CNC identifies untrained data using only the model's information without training the model. The performance of CNC depends on the amount of data used to identify neuron clusters. While CSIM and CPE require 300 data for class identification and 1,000 data for model training, CNC has comparable or better performance in identifying untrained class data with only 40 or more data.

Factors that greatly affect class identification accuracy are the number of data, data characteristics, and cluster identification threshold. If sufficient data are used to identify the CNC that recognizes the data characteristics, the class identification accuracy can be increased to 100%.

The class identification accuracy decreases as the data’s characteristics become more complex. The STL-10 dataset, which has more complex characteristics than the MNIST and CIFAR-10 datasets, has lower class identification accuracy when the same amount of data is used.

A low threshold does not accurately identify untrained class data, resulting in a low sensitivity. A high threshold does not accurately identify the trained class data, resulting in a low specificity. Although the threshold can be set differently depending on the model or target dataset, based on the experimental results of this study, we recommend setting it to 0.5.

Classification of untrained class data is beyond the model's capability; the model cannot classify them, and the performance of the model decreases as the number of untrained class data increases. For example, if 50 out of 100 data are untrained class data, the accuracy of the model cannot exceed 50%. Therefore, untrained class data must be identified so that they are not input into the model.

7 Conclusion

In this study, we proposed a method for identifying the data of classes that a DNN has not been trained on. We identified the CNC, a set of neurons that are activated based on the class of the input data, and confirmed that the CNC is a set of neurons that recognize the characteristics of the data by conducting a similarity analysis among the CNCs. In addition, experiments on ResNet models and public datasets showed the feasibility and effectiveness of the untrained class data identification method based on CNC similarity. In future research, we plan to use CNCs to study methods for identifying trained class data that are generated by GANs or mutated by adversarial attacks.