1 Introduction

Although knowledge constitutes our area of interest and the cognitive world, it has no unified and clear definition [2], which means that knowledge carries uncertainty. Uncertainty, including randomness, vagueness, inconsistency, fuzziness, and incompleteness, exists in almost every system and model [3,4,5], and KBs are no exception. Uncertainty is a key ingredient in decisions and a fundamental part of modelling [6]; it is therefore an important research topic in many real-world applications, such as decision making [7], recommendation systems [8], Dempster-Shafer evidence theory [9], graph data [10], social networks [11, 12], multi-objective optimization problems [13], and risk analysis during the outbreak of COVID-19 [14,15,16,17].

In machine learning tasks, data is an indispensable resource for any model, yet every machine learning model exhibits uncertainty when predicting unobserved data. For KBs, when the existing knowledge in a KB is used for inference and decision-making, the uncertainty of the KB affects the prediction results of downstream natural language understanding tasks. An important source of this uncertainty is the existence of soft concepts, which are imprecise. For instance, in the phrase “large area”, the word large lacks a strict quantitative standard.

Therefore, how to measure the uncertainty of a system plays a vital role in machine learning, data analysis, artificial intelligence applications, and cognitive science [6]. The current mainstream approach is to use rough set theory (RST) [18] to measure the uncertainty of KBs [1, 19]. RST, as a powerful tool for measuring the uncertainty of KBs, has attracted increasing attention from artificial intelligence practitioners in areas such as decision making [20, 21], computer-aided diagnosis [22], attribute reduction [23], decision analysis [24, 25], and predicting COVID-19 cases [26]. Measuring the uncertainty of KBs with RST has significant advantages. For instance, RST uses the existing knowledge in a KB to approximately characterize the unknown knowledge (i.e., the target concept) to be explored. The upper and lower approximations in RST describe the uncertainty of KBs well [18], and RST can be combined with information theory to connect knowledge uncertainty with information entropy [27]. In addition, RST is closely related to fuzzy mathematics, which measures the uncertainty of knowledge by describing its fuzziness [7, 28].

1.1 Motivation

Based on RST, a series of methods for measuring the uncertainty of KBs have been proposed: measurement based on the combination of information entropy and rough sets [29]; rough entropy theory [30]; and measurement based on the combination of knowledge granulation and rough sets [31, 32]. In recent work especially, many scholars have focused on knowledge-structure-based methods [33] for measuring the uncertainty of knowledge bases [1, 19], obtaining many exciting conclusions through extensive experiments. Although the use of RST to measure the uncertainty of KBs has made great progress, we find that many issues remain unsolved.

1. Conclusions are often based on verification over a limited number of datasets and lack a solid, comprehensive theoretical guarantee. For example, an exciting experimental conclusion in [1] about measures of uncertainty for KBs has recently attracted great research interest. In [1], the authors select three datasets and conduct numerical experiments on them to verify the superiority of using the knowledge amount to measure the uncertainty of KBs. However, these successful conclusions lack a precise mathematical expression and interpretability.

2. The classification of the instances of a knowledge base depends heavily on its attributes. An important prerequisite for using RST to measure the uncertainty of a KB is that the KB can be divided by equivalence relations. Unfortunately, constrained by certain real task scenarios, some KBs cannot meet this condition. Some special datasets, such as ProBase [34], do not contain a large number of instance attributes. Therefore, in ProBase, it is difficult to classify instances based on their attributes, which requires us to transfer the ideas of RST to ProBase for analogous research.

To address the first issue, we employ RST as the theoretical basis to analyze the differences between the methods used to measure uncertainty in KBs. Specifically, (1) in terms of theoretical analysis, we compare in detail the mathematical principles of measuring the uncertainty of KBs with the knowledge granulation, knowledge entropy, rough entropy, and knowledge amount of a knowledge structure (four measurement functions in total). We find that these four measurement functions can be unified into an elementary function λ(⋅) (i.e., (12)); the four measurement functions correspond to four different inputs of λ(⋅). On this basis, we theoretically prove that the conclusion in [1] is universal and interpretable, further improving the theory of measures of uncertainty for KBs. (2) In terms of experimental evaluation, we conduct experiments on 18 public datasets from different fields. The experimental results fully verify our theoretical conclusions.

To address the second issue, we transfer the method of using RST to measure the uncertainty of KBs to the study of the uncertainty of ProBase. (1) In terms of theoretical analysis, we explore the theoretical feasibility of using RST to measure the uncertainty of ProBase. From the view of RST, equivalence relations determine partitions on the set \(\mathcal {W}\) and thereby yield equivalence classes under different equivalence relations. Inspired by this, we regard an equivalence relation in a KB as a hypernym (or concept) in ProBase, and use hypernyms (or concepts) to divide instances into equivalence classes. To this end, we provide a strategy for inducing datasets from ProBase such that the instances in the induced datasets can be divided by their concepts. (2) In terms of experimental evaluation, to verify the above ideas, we induce three datasets from ProBase with this strategy and perform experimental verification on them. The experimental results fully verify our theoretical conclusions.

1.2 Contribution

In brief, the contributions in this paper are summarized as follows:

1. We rigorously explain why knowledge amount (KAM) performs much better for measuring the uncertainty of KBs, proving from a mathematical point of view an empirical conclusion previously established only through experiments.

2. We prove that the measurement methods based on knowledge granulation, knowledge entropy, rough entropy, and knowledge amount can be integrated into a unified measurement function for measuring the uncertainty of KBs. We provide a formal representation of the unified measurement framework and an exhaustive comparative analysis.

3. We propose an efficient strategy that induces a new dataset from ProBase whose instances can be rigorously partitioned based on their concepts. This expands the usage scenarios of the measurement function, so that it remains valid for datasets that do not have enough attributes.

1.3 Paper organization

In Section 2, we briefly review previous studies related to this work. In Section 3, we review some definitions related to RST and KBs and summarize the notation used in our work. In Section 4, we summarize the calculation methods and properties of the four measurement functions used to measure the uncertainty of KBs. In Section 5, we review the dispersion analysis of the numerical experiments in [1]. In Section 6, we conduct a detailed theoretical analysis of the different measurement functions and provide our main conclusions (i.e., Theorems 1, 2, 3, and 4); specifically, we unify the four popular measurement functions into a single measurement function. In Section 7, we first provide the definition of the concept structure of ProBase (see Definition 13), and then provide an effective strategy to induce KBs from ProBase such that instances in the induced KBs can be classified by their concepts. In Section 8, we verify our theoretical analysis via extensive experiments; specifically, we conduct experiments on 18 public datasets and on three datasets induced from ProBase with our proposed strategy. The last section summarizes our work.

2 Related work

In recent years, research on KBs has become an important topic in both industry and academia. Many researchers have made exceptional contributions to this field, and a series of important results have been achieved, especially in theoretical research on KBs. These conclusions have far-reaching significance for establishing a computable and measurable framework for KBs. In particular, the uncertainty measurement of KBs based on knowledge structure has received wide attention.

Knowledge structure

Qian et al. [35] describe the differences between various knowledge structures in KBs based on the concept of knowledge distance. Li et al. [33] propose the definitions of the lattice, mapping, soft characterizations, and group of knowledge structures. In the study of the relationship between different KBs, Li et al. [36] regard KBs as special relation information systems; by introducing homomorphisms, they prove that KBs are invariant under homomorphisms. Subsequently, based on the homomorphism relation between KBs, Qin et al. [37] propose the concept of communication between KBs and obtain a series of invariant characterizations under homomorphisms. It is worth noting that all of the above works involve RST, which provides a strong theoretical basis for our work. In addition, some scholars describe knowledge structures by other means, such as fuzzy skill maps [38] and knowledge space theory [39].

Measurement method

The uncertainty of KBs is usually calculated by entropy (e.g., information entropy) [40]. Some scholars have shown increased interest in combining entropy theory and rough set theory to measure the uncertainty of a system, and many classic mathematical tools have been proposed. For example, Düntsch and Gediga [29] study measuring the uncertainty of rough sets with information entropy; Beaubouef et al. [30] propose a new concept called rough entropy; Liang et al. [27] establish the relationship between rough entropy and information entropy. In the study of knowledge granulation, Wierman [31] focuses on using knowledge granulation to measure the uncertainty of rough sets; Yao [41] employs the concept of granularity measure when studying probabilistic approaches to rough sets; Shah et al. [32] propose several measures using soft rough covering set theory and apply it to multi-criteria decision making. Qin et al. [42] use rough set theory to analyze knowledge structures in a tolerance knowledge base. Kobren et al. [43] provide a framework that uses user feedback to construct and maintain a knowledge base under identity uncertainty. Guo and Xu [7] provide a novel entropy-independent measurement function to capture the features of intuitionistic fuzzy sets.

3 Preliminaries

In this section, the key mathematical notations and their descriptions are listed in Table 1, and some basic definitions are reviewed.

Table 1 Key Notations and Descriptions

Definition 1 ([1] Binary relation R on \(\mathcal {W}\))

Let \(w_{i}\textbf{R}w_{j}\) denote the binary relation between \(w_{i}\) and \(w_{j}\) on \(\mathcal {W}\), where \(w_{i}\) is the predecessor of \(w_{j}\) and \(w_{j}\) is the successor of \(w_{i}\). If \((w_{i}, w_{j})\in \textbf {R}\subseteq \mathcal {W} \times \mathcal {W}\), then we write \(w_{i}\textbf{R}w_{j}\).

For any \(\left (w_{i}, w_{j}\right )\), the binary relation R can be represented by a 0-1 square matrix as follows,

$$ \text{Matrix}(\textbf{R})= \left[\begin{array}{ccc} \textbf{R}\left( w_{1}, w_{1}\right) & {\cdots} & \textbf{R}\left( w_{1}, w_{m}\right) \\ {\vdots} & {\ddots} & {\vdots} \\ \textbf{R}\left( w_{m}, w_{1}\right) & {\cdots} & \textbf{R}\left( w_{m}, w_{m}\right) \end{array}\right]_{m\times m} $$

where \(\textbf {R}\left (w_{i}, w_{j}\right )=1\), if \(\left (w_{i}, w_{j}\right ) \in \textbf {R}\), otherwise, \(\textbf {R}\left (w_{i}, w_{j}\right )=0\).
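To make the matrix representation concrete, the following minimal Python sketch (the relation pairs and the helper name relation_matrix are ours, for illustration only) builds Matrix(R) from a set of index pairs:

```python
import numpy as np

def relation_matrix(pairs, m):
    """Build the 0-1 matrix of a binary relation R on W = {w_1, ..., w_m}.
    `pairs` lists the 1-based index pairs (i, j) with (w_i, w_j) in R."""
    M = np.zeros((m, m), dtype=int)
    for i, j in pairs:
        M[i - 1, j - 1] = 1  # R(w_i, w_j) = 1 iff (w_i, w_j) in R
    return M

# A hypothetical relation on W = {w_1, w_2, w_3}:
# R = {(w_1, w_1), (w_1, w_2), (w_2, w_1)}
print(relation_matrix([(1, 1), (1, 2), (2, 1)], m=3))
```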

Definition 2 ([1, 44] Equivalence relation on \(\mathcal {W}\))

If R satisfies the following three properties, then we call R an equivalence relation on \(\mathcal {W}\):

1. reflexive: \(w\textbf{R}w\) holds for any \(w \in \mathcal {W}\);

2. symmetric: \(w\textbf{R}v\) implies \(v\textbf{R}w\) for any \(w, v \in \mathcal {W}\);

3. transitive: \(w\textbf{R}v\) and \(v\textbf{R}u\) imply \(w\textbf{R}u\) for any \(w, u, v \in \mathcal {W}.\)

Since \(\mathcal {W}\) can be partitioned by an equivalence relation \(\textbf{R}_{i}\), the following definition of the equivalence class is obtained.

Definition 3 ([44] Equivalence class on \(\mathcal {W}\))

Let \(\textbf{R}_{i}\) be an equivalence relation on \(\mathcal {W}\). We call

$$ [w]_{\textbf{R}_{i}}=\{v \in \mathcal{W}~|~w \textbf{R}_{i} v\}, $$
(1)

is the equivalence class including w, and

$$ \mathcal{W} / \textbf{R}_{i}=\left\{[w]_{\textbf{R}_{i}}~|~w \in \mathcal{W}\right\} $$
(2)

is the family of all \([w]_{\textbf {R}_{i}}\).

Definition 4 ([18] Knowledge base)

\([\mathcal {W}, \mathcal {R}]\) is called a KB if and only if \(\mathcal {R}\in 2^{\mathcal {R}[\mathcal {W}]}\), where \(\mathcal {R}[\mathcal {W}]\) denotes the set of all equivalence relations on \(\mathcal {W}\).

Definition 5 ([44] Equivalence relationship between KBs)

Given two KBs \([\mathcal {W}, \mathcal {Q}]\) and \([\mathcal {W}, \mathcal {O}]\), their equivalence (i.e., \([\mathcal {W}, \mathcal {Q}] \triangleq [\mathcal {W}, \mathcal {O}]\)) is defined by

$$ [\mathcal{W}, \mathcal{Q}] \triangleq [\mathcal{W}, \mathcal{O}] \Longleftrightarrow \mathcal{W} / \mathcal{Q} \triangleq \mathcal{W} / \mathcal{O}.$$

Definition 6 ([1] Knowledge structure of \([\mathcal {W}, \mathcal {R}] \))

If the finite set \(\mathcal {W}=\{w_{i}\}_{k}\) can be divided by the relations \(\mathcal {R}=\{\textbf {R}_{1}, \textbf {R}_{2}, \ldots , \textbf {R}_{n}\}\), then we call the vector

$$ \text{CSV}(\mathcal{R})=\left\langle \left[w_{1}\right]_{\mathcal{R}},\left[w_{2}\right]_{\mathcal{R}}, \ldots,\left[w_{k}\right]_{\mathcal{R}}\right\rangle $$
(3)

the knowledge structure of \([\mathcal {W},\mathcal {R}]\).

Definition 7 (Indiscernibility relation over \(\mathcal {P}\))

If \(\emptyset \neq \mathcal {P}\subseteq \mathcal {R}\), then we call \(\bigcap \mathcal {P}\) the indiscernibility relation over \(\mathcal {P}\), denoted by \(ind({\mathcal {P}})\).

In other words, let F be a finite set, and let \(f_{a}\) and \(f_{b}\) be two entities in F. Then \(f_{a}\) and \(f_{b}\) satisfy the indiscernibility relation over \(\mathcal {P}\) if and only if they have the same value on all elements of \(\mathcal {P}\). For example, a red Porsche and a red Tesla satisfy the indiscernibility relation on the attribute color.

Example 1

Given a collection \(\mathcal {W}=\{w_{1}, w_{2},\cdots , w_{8}\}\) that contains 8 candies, suppose these candies have different colors (e.g., red, blue, yellow), shapes (e.g., square, round, triangular), and flavors (e.g., lemony, sweet). These candies can therefore be divided according to color, shape, and taste. Statistical information about \(\mathcal {W}\) is summarized in Table 2.

Table 2 Candies are divided according to color, shape and taste

As shown in Table 2, we can define three equivalence relations, namely R1 (color), R2 (shape), and R3 (taste). Through these three equivalence relations, the following three partitions into equivalence classes are obtained, i.e.,

$$ \begin{array}{llll} &\mathcal{W}/R_{1}=\{\{w_{1}, w_{3}, w_{7}\}, \{w_{2}, w_{4}\}, \{w_{5}, w_{6}, w_{8}\}\}, \\ &\mathcal{W}/R_{2}=\{\{w_{1}, w_{5}\}, \{w_{2}, w_{6}\}, \{w_{3}, w_{4}, w_{7}, w_{8}\}\}, \\ &\mathcal{W}/R_{3}=\{\{w_{2}, w_{7}, w_{8}\}, \{w_{1}, w_{3}, w_{4}, w_{5}, w_{6}\}\}. \end{array} $$

Apparently, according to Definition 4, [\(\mathcal {W}, \{R_{1}, R_{2}, R_{3}\}\)] is a KB. According to Definition 7, w1 and w3 satisfy the indiscernibility relation on the color red, and w1 and w5 satisfy the indiscernibility relation on the shape square.
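As a concrete companion to Example 1, the following Python sketch derives the equivalence classes by grouping candies on a single attribute. Since Table 2 is not reproduced here, the concrete attribute values are assumptions chosen to be consistent with \(\mathcal {W}/R_{1}\) and \(\mathcal {W}/R_{2}\) above:

```python
from collections import defaultdict

def partition(objects, attribute):
    """Group objects into equivalence classes by the value of one attribute."""
    classes = defaultdict(set)
    for name, attrs in objects.items():
        classes[attrs[attribute]].add(name)
    return list(classes.values())

# Assumed attribute values, consistent with W/R1 and W/R2 of Example 1.
candies = {
    "w1": {"color": "red",    "shape": "square"},
    "w2": {"color": "blue",   "shape": "round"},
    "w3": {"color": "red",    "shape": "triangular"},
    "w4": {"color": "blue",   "shape": "triangular"},
    "w5": {"color": "yellow", "shape": "square"},
    "w6": {"color": "yellow", "shape": "round"},
    "w7": {"color": "red",    "shape": "triangular"},
    "w8": {"color": "yellow", "shape": "triangular"},
}

print(partition(candies, "color"))  # [{w1, w3, w7}, {w2, w4}, {w5, w6, w8}]
print(partition(candies, "shape"))  # [{w1, w5}, {w2, w6}, {w3, w4, w7, w8}]
```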

4 Four uncertainty measurement functions for KBs

In this section, we introduce the categories, core ideas, and formalizations of the four measurement functions. It is worth noting that, for a finite set \(\mathcal {W}\), we can divide \(\mathcal {W}\) by its equivalence relations \(\mathcal {R}\) (guided by rough set theory) to obtain the knowledge base \([\mathcal {W}, \mathcal {R}]\). Then, according to Definition 6, we obtain the knowledge structure \(\text {CSV}(\mathcal {R})\) of \([\mathcal {W}, \mathcal {R}]\). Based on \(\text {CSV}(\mathcal {R})\), we can use the knowledge granulation, knowledge entropy, rough entropy, and knowledge amount of \(\text {CSV}(\mathcal {R})\) to construct the corresponding measure sets (the principles of measure set construction, with an example, are provided in Section 8). Finally, the coefficient of variation (denoted \(C_{v}(\mathcal {W})\) in (11), a common objective statistical indicator of the dispersion of a dataset) of each constructed measure set is calculated to measure the uncertainty of the KB \([\mathcal {W}, \mathcal {R}]\).

4.1 Categories of four measurement functions

In this paper, we focus on four currently popular measurement functions for measuring the uncertainty of knowledge bases. Specifically, these methods include:

1. Granularity-based measures (i.e., the knowledge granulation of \(\text {CSV}(\mathcal {R})\) in Definition 8).

2. Entropy-based measures (i.e., the knowledge entropy of \(\text {CSV}(\mathcal {R})\) in Definition 9 and the rough entropy of \(\text {CSV}(\mathcal {R})\) in Definition 10).

3. Knowledge-amount-based measures (i.e., the knowledge amount of \(\text {CSV}(\mathcal {R})\) in Definition 11).

4.2 The core idea of four measurement functions

1. The core idea of granularity-based measures: the granulation of knowledge in a KB is quantified mainly by counting the number of elements in the equivalence relations \(\textbf {R}\in \mathcal {R}\). Specifically, given a KB \([\mathcal {W}, \mathcal {R}]\) with \(\mathcal {R}\in 2^{\mathcal {R}[\mathcal {W}]} \), the granulation of \([\mathcal {W}, \mathcal {R}]\) can be formalized as a mapping from \(2^{\mathcal {R}[\mathcal {W}]}\) to \((0, +\infty ]\).

2. The core idea of entropy-based measures: in classical thermodynamics, entropy is a measurable physical property that reveals the disorder of a system (the higher the entropy, the higher the disorder). In information theory, entropy (e.g., Shannon entropy) is used to measure the uncertainty of a system. Similarly, a large number of studies apply the concept of entropy to measure the uncertainty of KBs.

3. The core idea of knowledge-amount-based measures: these measures are a variation of the entropy-based measures described above that introduces a probability measure (e.g., the probability of \(W_{i}\) in the universe \(\mathcal {W}\)). This makes it possible to measure both the uncertainty and the fuzziness of a KB.

4.3 Formalization of four measurement functions

Definition 8 ([1] Knowledge granulation of \(\text {CSV}(\mathcal {R})\))

For a knowledge base \([\mathcal {W}, \mathcal {R}]\), the knowledge granulation of \(\text {CSV}(\mathcal {R})\) is quantified as:

$$ \text{KGR}(\mathcal{R})=\frac{1}{k^{2}} {\sum}_{i=1}^{m}\left|W_{i}\right|^{2} = \frac{1}{k^{2}}{\sum}_{i=1}^{k}\left|[w_{i}]_{\mathcal{R}}\right|, $$
(4)

where \(\mathcal {W}/\mathcal {R}= \{W_{i}\}_{m}\), \(W_{i}=\{w_{i}\}_{n_{i}}\) (i.e., |Wi| = ni), \({\sum }_{i=1}^{m}(n_{i})={\sum }_{i=1}^{m}|W_{i}|=|\mathcal {W}|=k\). \(\mathcal {R}\) is the set of equivalence relations.

Definition 9 ([1] Knowledge entropy of \(\text {CSV}(\mathcal {R})\))

For a knowledge base \([\mathcal {W}, \mathcal {R}]\), the knowledge entropy of \(\text {CSV}(\mathcal {R})\) is quantified as:

$$ \text{KEN}(\mathcal{R})=-{\sum}_{i=1}^{m} \frac{\left|W_{i}\right|}{k} \log_{2} \frac{\left|W_{i}\right|}{k} = -{\sum}_{i=1}^{k} \frac{1}{k} \log_{2} \frac{\left|\left[w_{i}\right]_{\mathcal{R}}\right|}{k} $$
(5)

where \(\mathcal {W}/\mathcal {R}= \{W_{i}\}_{m}\), \(W_{i}=\{w_{i}\}_{n_{i}}\) (i.e., |Wi| = ni), \({\sum }_{i=1}^{m}(n_{i})={\sum }_{i=1}^{m}|W_{i}|=|\mathcal {W}|=k\). \(\mathcal {R}\) is the set of equivalence relations.

Definition 10 ([1] Rough entropy of \(\text {CSV}(\mathcal {R})\))

For a knowledge base \([\mathcal {W}, \mathcal {R}]\), the rough entropy of \(\text {CSV}(\mathcal {R})\) is quantified as:

$$ \text{REN}(\mathcal{R})=-{\sum}_{i=1}^{m} \frac{\left|W_{i}\right|}{k} \log_{2} \frac{1}{\left|W_{i}\right|} = -{\sum}_{i=1}^{k} \frac{1}{k} \log_{2} \frac{1}{\left|\left[w_{i}\right]_{\mathcal{R}}\right|} $$
(6)

where \(\mathcal {W}/\mathcal {R}= \{W_{i}\}_{m}\), \(W_{i}=\{w_{i}\}_{n_{i}}\) (i.e., |Wi| = ni), \({\sum }_{i=1}^{m}(n_{i})={\sum }_{i=1}^{m}|W_{i}|=|\mathcal {W}|=k\). \(\mathcal {R}\) is the set of equivalence relations.

Definition 11 ([1] Knowledge amount of \(\text {CSV}(\mathcal {R})\))

For a knowledge base \([\mathcal {W}, \mathcal {R}]\), the knowledge amount of \(\text {CSV}(\mathcal {R})\) is quantified as:

$$ \begin{array}{@{}rcl@{}} \text{KAM}(\mathcal{R})&=&{\sum}_{i=1}^{m} \frac{1}{k^{2}}\left|W_{i}\right|\left|\mathcal{W}-W_{i}\right| \\&=&{\sum}_{i=1}^{k} \frac{1}{k}\left( 1-\frac{\left|\left[w_{i}\right]_{\mathcal{R}}\right|}{k}\right), \end{array} $$
(7)

where \(\mathcal {W}/\mathcal {R}= \{W_{i}\}_{m}\), \(W_{i}=\{w_{i}\}_{n_{i}}\) (i.e., |Wi| = ni), \({\sum }_{i=1}^{m}(n_{i})={\sum }_{i=1}^{m}|W_{i}|=|\mathcal {W}|=k\). \(\mathcal {R}\) is the set of equivalence relations.
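All four definitions reduce to simple functions of the block sizes \(|W_{1}|, \ldots, |W_{m}|\) of the partition \(\mathcal {W}/\mathcal {R}\). A minimal Python sketch (the function name measures is ours):

```python
import math

def measures(block_sizes):
    """KGR, KEN, REN, KAM (Definitions 8-11) for one partition W/R,
    given the sizes |W_1|, ..., |W_m| of its equivalence classes."""
    k = sum(block_sizes)                                       # |W| = k
    kgr = sum(n * n for n in block_sizes) / k**2               # (4)
    ken = -sum(n / k * math.log2(n / k) for n in block_sizes)  # (5)
    ren = sum(n / k * math.log2(n) for n in block_sizes)       # (6)
    kam = sum(n * (k - n) for n in block_sizes) / k**2         # (7)
    return {"KGR": kgr, "KEN": ken, "REN": ren, "KAM": kam}

# W/R1 from Example 1 has block sizes 3, 2, 3 (k = 8).
print(measures([3, 2, 3]))
```

Note that \(\text {KEN}(\mathcal {R}) + \text {REN}(\mathcal {R}) = \log_{2} k\) for any partition, which the sketch can be used to confirm.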

4.4 The main properties of \(\text {KGR}(\mathcal {R})\), \(\text {KEN}(\mathcal {R})\), \(\text {REN}(\mathcal {R})\), and \(\text {KAM}(\mathcal {R})\)

Lemma 1 ([1] Boundedness)

Suppose that \([\mathcal {W}, \mathcal {R}]\) is a KB and \(|\mathcal {W}|=k\), then

$$ \begin{array}{llll} \frac{1}{k} \leq &~\text{KGR}(\mathcal{R})\leq 1,\\ 0 \leq &~\text{REN}(\mathcal{R})\leq \log_{2} k,\\ 0 \leq &~\text{KAM}(\mathcal{R})\leq \frac{k-1}{k},\\ 0 \leq &~\text{KEN}(\mathcal{R})\leq \log_{2} k. \end{array} $$
(8)

Inequalities in (8) reveal the boundedness of \(\text {KGR}(\mathcal {R})\), \(\text {KEN}(\mathcal {R})\), \(\text {REN}(\mathcal {R})\), and \(\text {KAM}(\mathcal {R})\) on \(\mathcal {W}\).

Lemma 2 ([1] Monotonicity)

Let \([\mathcal {W}, \mathcal {O}]\), \([\mathcal {W}, \mathcal {Q}]\) be two KBs. If \(\text {CSV}(\mathcal {O}) \prec \text {CSV}(\mathcal {Q})\) (i.e., \(\text {IDE}\left (\text {CSV}(\mathcal {O}) / \text {CSV}(\mathcal {Q})\right ) = 1\)), then

$$ \begin{array}{llll} \text{KGR}(\mathcal{O}) &< \text{KGR}(\mathcal{Q}),\\ \text{REN}(\mathcal{O}) &< \text{REN}(\mathcal{Q}),\\ \text{KAM}(\mathcal{O}) &> \text{KAM}(\mathcal{Q}),\\ \text{KEN}(\mathcal{O}) &> \text{KEN}(\mathcal{Q}). \end{array} $$
(9)

For rigorous proofs of Lemmas 1 and 2, the reader is referred to [1].

5 Dispersion analysis

In this section, we first review the conclusions of the numerical experiments of [1]. The authors construct four measure sets (the principles of measure set construction, with an example, are provided in Section 8) on three datasets (Nursery, Solar Flare, and Tic-Tac-Toe Endgame in Table 3). Then, they compare the performance of the four measurement functions (i.e., Definitions 8-11) by dispersion analysis. In their numerical experiments, they use the coefficient of variation to compare the performance differences between the four measurement functions. The experimental results are shown in Table 3.

Table 3 \(C_{v}\)-values of the measure sets \(\textbf{M}_{\text{KGR}}(\mathcal{W})\), \(\textbf{M}_{\text{REN}}(\mathcal{W})\), \(\textbf{M}_{\text{KEN}}(\mathcal{W})\), and \(\textbf{M}_{\text{KAM}}(\mathcal{W})\)

According to Table 3, it is easy to see that the results may imply an interesting conclusion, i.e.,

$$ \begin{array}{@{}rcl@{}} C_{v}(\textbf{M}_{\text{KGR}}(\mathcal{W}))&>& C_{v}(\textbf{M}_{\text{REN}}(\mathcal{W})) > C_{v}(\textbf{M}_{\text{KEN}}(\mathcal{W})) \\&>& C_{v}(\textbf{M}_{\text{KAM}}(\mathcal{W})). \end{array} $$
(10)

Inequality (10) shows that KAM has much better performance (i.e., the smallest coefficient of variation) on the three datasets. The conclusion of Inequality (10) and Table 3 may reflect a kind of regularity, which naturally leads to the following questions:

1. Does the conclusion of (10) apply to most datasets?

2. Does (10) reveal a general law?

3. What is the mathematical principle behind (10)?

This motivates us to gain deeper insight into the different measurement functions. In the next section, we answer these three questions.

6 Theoretical analysis of measurement functions

In this section, we answer the above three questions. We provide a unified framework to prove Inequality (10) and theoretically show that it holds generally for most KBs. These conclusions provide a rigorous theoretical basis for measuring the uncertainty of KBs. Before giving the conclusions, we review the mathematical tools and notation used in our proofs. Specifically, for a given finite set \(\mathcal {W}=\{w_{i}\}_{n}\), we use \(\sigma (\mathcal {W})\) and \(C_{v}(\mathcal {W})\) to denote the standard deviation and coefficient of variation of \(\mathcal {W}\), respectively, i.e.,

$$ \bar{w} = \frac{1}{n} {\sum}_{i=1}^{n} w_{i}, \sigma(\mathcal{W}) = \sqrt{\!\frac{1}{n} {\sum}_{i=1}^{n}\left( w_{i} - \bar{w}\right)^{2}}, C_{v}(\mathcal{W}) = \frac{\sigma(\mathcal{W})}{\bar{w}}. $$
(11)
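For instance, the measure set \(\textbf {M}_{\text {KGR}}(\mathcal {W})\) of Example 1 and its coefficient of variation per (11) can be computed as follows (a sketch; the block sizes are read off from the partitions in Example 1):

```python
import math

def coefficient_of_variation(values):
    """C_v = sigma / mean, with the population standard deviation of (11)."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sigma / mean

# One KGR value per equivalence relation of Example 1 (k = 8).
partitions = {"R1": [3, 2, 3], "R2": [2, 2, 4], "R3": [3, 5]}
m_kgr = [sum(n * n for n in p) / 8**2 for p in partitions.values()]
print(coefficient_of_variation(m_kgr))
```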

Next, we provide our core theorems, Theorems 1, 2, 3, and 4. These conclusions give a rigorous theoretical proof of the experimental conclusion in [1], thereby answering the questions raised in Section 5.

Theorem 1

Suppose that \([\mathcal {W},\mathcal {R}_{n}]\) is a KB, and let \(\textbf {M}(\mathcal {W})\) be a measure set on \(\mathcal {W}\), where \(\mathcal {W}= \{w_{i}\}_{k}\) can be divided by the relations \(\mathcal {R}_{n}=\{\textbf {R}_{j}\}_{n}\). Then \(C_{v}(\textbf {M}_{\text {KGR}}(\mathcal {W}))\) can be equivalently described by the measurement function λ(x), where

$$ \begin{array}{@{}rcl@{}} \lambda(\cdot) &=&\frac{\sqrt{n \cdot {\sum}_{j=1}^{n}\left( {\sum}_{i=1}^{k}(\cdot)-\frac{1}{n} {\sum}_{j=1}^{n} {\sum}_{i=1}^{k}(\cdot)\right)^{2}}}{{\sum}_{j=1}^{n} {\sum}_{i=1}^{k}(\cdot)},\\ x &=& |[w_{i}]_{\mathcal{R}_{j}}|\in \mathbb{Z}^{+}. \end{array} $$
(12)

Proof

Suppose that \([\mathcal {W},\mathcal {R}_{n}]\) is a KB, and let \(\textbf {M}_{\text {KGR}}(\mathcal {W})\) be the measure set on \(\mathcal {W}\) based on knowledge granulation, i.e.,

$$ \begin{array}{@{}rcl@{}} \textbf{M}_{\text{KGR}}(\mathcal{W}) &=& \{\text{KGR}(\mathcal{R}_{1}), \text{KGR}(\mathcal{R}_{2}) , ..., \text{KGR}(\mathcal{R}_{n})\} \\&=& \{\text{KGR}(\mathcal{R}_{j})\}_{n}. \end{array} $$
(13)

According to (11), we obtain the following, i.e.,

$$ \begin{array}{@{}rcl@{}} \overline{\text{KGR}(\mathcal{R})} &=& \frac{1}{n} {\sum}_{j=1}^{n} \text{KGR}(\mathcal{R}_{j}), \\ \sigma(\textbf{M}_{\text{KGR}}(\mathcal{W})) &=& \sqrt{\frac{1}{n} {\sum}_{j=1}^{n}\left( \text{KGR}(\mathcal{R}_{j})-\overline{\text{KGR}(\mathcal{R})}\right)^{2}}, \\ C_{v}(\textbf{M}_{\text{KGR}}(\mathcal{W})) &=& \frac{\sigma(\textbf{M}_{\text{KGR}}(\mathcal{W}))}{\overline{\text{KGR}(\mathcal{R})}}. \end{array} $$
(14)

According to (4), for the set \(\mathcal {W}=\{w_{i}\}_{k}\) (i.e., \(|\mathcal {W}| = k\)), it follows that,

$$ \text{KGR}(\mathcal{R}_{j}) = \frac{1}{k^{2}}{\sum}_{i=1}^{k}\left|[w_{i}]_{\mathcal{R}_{j}}\right|, $$
(15)

and

$$ \begin{array}{@{}rcl@{}} \overline{\text{KGR}(\mathcal{R})}&=& \frac{1}{n} {\sum}_{j=1}^{n} \text{KGR}(\mathcal{R}_{j})\\&=&\frac{1}{nk^{2}}{\sum}_{j=1}^{n}{\sum}_{i=1}^{k}\left|[w_{i}]_{\mathcal{R}_{j}}\right|. \end{array} $$
(16)

Further, we obtain

$$ \begin{array}{@{}rcl@{}} &&\sigma(\textbf{M}_{\text{KGR}}(\mathcal{W})) \\&=& \sqrt{\frac{1}{n} {\sum}_{j=1}^{n}\left( \text{KGR}(\mathcal{R}_{j})-\overline{\text{KGR}(\mathcal{R})}\right)^{2}}\\ &=&\sqrt{\frac{1}{n}{\sum}_{j=1}^{n}\left( \frac{1}{k^{2}}{\sum}_{i=1}^{k}\left|[w_{i}]_{\mathcal{R}_{j}}\right|-\frac{1}{nk^{2}}{\sum}_{j=1}^{n}{\sum}_{i=1}^{k}\left|[w_{i}]_{\mathcal{R}_{j}}\right|\right)^{2}} \end{array} $$
(17)

and

$$ \begin{array}{llll} &\quad C_{v}(\textbf{M}_{\text{KGR}}(\mathcal{W}))\\ &= \frac{\sigma(\textbf{M}_{\text{KGR}}(\mathcal{W}))}{\overline{\text{KGR}(\mathcal{R})}} \\ &= \frac{\sqrt{\frac{1}{n} {\sum}_{j=1}^{n}\left( \text{KGR}(\mathcal{R}_{j})-\overline{\text{KGR}(\mathcal{R})}\right)^{2}}}{\frac{1}{nk^{2}}{\sum}_{j=1}^{n}{\sum}_{i=1}^{k}\left|[w_{i}]_{\mathcal{R}_{j}}\right|}\\ &=\frac{\sqrt{\frac{1}{n}{\sum}_{j=1}^{n}\left( \frac{1}{k^{2}}{\sum}_{i=1}^{k}\left|[w_{i}]_{\mathcal{R}_{j}}\right|-\frac{1}{nk^{2}}{\sum}_{j=1}^{n}{\sum}_{i=1}^{k}\left|[w_{i}]_{\mathcal{R}_{j}}\right|\right)^{2}}}{\frac{1}{nk^{2}}{\sum}_{j=1}^{n}{\sum}_{i=1}^{k}\left|[w_{i}]_{\mathcal{R}_{j}}\right|}\\ &=\frac{\sqrt{n \cdot {\sum}_{j=1}^{n}\left( {\sum}_{i=1}^{k}|[w_{i}]_{\mathcal{R}_{j}}|-\frac{1}{n} {\sum}_{j=1}^{n} {\sum}_{i=1}^{k}|[w_{i}]_{\mathcal{R}_{j}}|\right)^{2}}}{{\sum}_{j=1}^{n} {\sum}_{i=1}^{k}|[w_{i}]_{\mathcal{R}_{j}}|}. \end{array} $$
(18)

By (18), we establish the mapping relationship between \(C_{v}(\textbf {M}_{\text {KGR}}(\mathcal {W}))\) and \(|[w_{i}]_{\mathcal {R}_{j}}|\), i.e.,

$$ \lambda (|[w_{i}]_{\mathcal{R}_{j}}|) = C_{v}(\textbf{M}_{\text{KGR}}(\mathcal{W})) \triangleq C_{v}(\{\text{KGR}(\mathcal{R}_{j})\}_{n}), $$
(19)

where λ(⋅) satisfies (12). The proof is completed. □

Theorem 2

Suppose that \([\mathcal {W},\mathcal {R}_{n}]\) is a KB, and let \(\textbf {M}(\mathcal {W})\) be a measure set on \(\mathcal {W}\), where \(\mathcal {W}= \{w_{i}\}_{k}\) can be divided by the relations \(\mathcal {R}_{n}=\{\textbf {R}_{j}\}_{n}\). Then \(C_{v}(\textbf {M}_{\text {REN}}(\mathcal {W}))\) can be equivalently described by the measurement function λ(x) (i.e., (12)).

Proof

Suppose that \([\mathcal {W},\mathcal {R}_{n}]\) is a KB, and let \(\textbf {M}_{\text {REN}}(\mathcal {W})\) be the measure set on \(\mathcal {W}\) based on rough entropy, i.e.,

$$ \begin{array}{@{}rcl@{}} \textbf{M}_{\text{REN}}(\mathcal{W}) &=& \{\text{REN}(\mathcal{R}_{1}), \text{REN}(\mathcal{R}_{2}) , ..., \text{REN}(\mathcal{R}_{n})\} \\&=& \{\text{REN}(\mathcal{R}_{j})\}_{n}. \end{array} $$
(20)

According to (11), then we obtain the following, i.e.,

$$ \begin{array}{@{}rcl@{}} \overline{\text{REN}(\mathcal{R})} &=& \frac{1}{n} {\sum}_{j=1}^{n} \text{REN}(\mathcal{R}_{j}), \\ \sigma(\textbf{M}_{\text{REN}}(\mathcal{W})) &=& \sqrt{\frac{1}{n} {\sum}_{j=1}^{n}\left( \text{REN}(\mathcal{R}_{j})-\overline{\text{REN}(\mathcal{R})}\right)^{2}}, \\ C_{v}(\textbf{M}_{\text{REN}}(\mathcal{W})) &=& \frac{\sigma(\textbf{M}_{\text{REN}}(\mathcal{W}))}{\overline{\text{REN}(\mathcal{R})}}. \end{array} $$
(21)

According to (6), for the set \(\mathcal {W}=\{w_{i}\}_{k}\) (i.e., \(|\mathcal {W}| = k\)), it follows that,

$$ \text{REN}(\mathcal{R}_{j}) = -{\sum}_{i=1}^{k} \frac{1}{k} \log_{2} \frac{1}{\left|\left[w_{i}\right]_{\mathcal{R}_{j}}\right|} = {\sum}_{i=1}^{k} \frac{1}{k} \log_{2} \left|\left[w_{i}\right]_{\mathcal{R}_{j}}\right|, $$
(22)

and

$$ \overline{\text{REN}(\mathcal{R})}= \frac{1}{n} {\sum}_{j=1}^{n} \text{REN}(\mathcal{R}_{j})=\frac{1}{nk}{\sum}_{j=1}^{n}{\sum}_{i=1}^{k}\log_{2}\left|[w_{i}]_{\mathcal{R}_{j}}\right|. $$
(23)

Further, we obtain

$$ \begin{array}{@{}rcl@{}} &&\!\sigma(\textbf{M}_{\text{REN}}(\mathcal{W}))\\ &\!=&\! \sqrt{\frac{1}{n} {\sum}_{j=1}^{n}\left( \text{REN}(\mathcal{R}_{j})-\overline{\text{REN}(\mathcal{R})}\right)^{2}}\\ &\!=&\! \sqrt{\frac{1}{n} {\sum}_{j=1}^{n}\left( {\sum}_{i=1}^{k} \frac{1}{k} \log_{2} \left|\left[w_{i}\right]_{\mathcal{R}_{j}}\right| - \frac{1}{nk}{\sum}_{j=1}^{n}{\sum}_{i=1}^{k}\log_{2}\left|[w_{i}]_{\mathcal{R}_{j}}\right|\right)^{2}} \end{array} $$
(24)

and

$$ \begin{array}{llll} &C_{v}(\textbf{M}_{\text{REN}}(\mathcal{W}))\\ &= \frac{\sigma(\textbf{M}_{\text{REN}}(\mathcal{W}))}{\overline{\text{REN}(\mathcal{R})}} \\ &= \frac{\sqrt{\frac{1}{n} {\sum}_{j=1}^{n}\left( \text{REN}(\mathcal{R}_{j})-\overline{\text{REN}(\mathcal{R})}\right)^{2}}}{\frac{1}{nk}{\sum}_{j=1}^{n}{\sum}_{i=1}^{k}\log_{2}\left|[w_{i}]_{\mathcal{R}_{j}}\right|}\\ &=\frac{\sqrt{\frac{1}{n} {\sum}_{j=1}^{n}\left( {\sum}_{i=1}^{k} \frac{1}{k} \log_{2} \left|\left[w_{i}\right]_{\mathcal{R}_{j}}\right|-\frac{1}{nk}{\sum}_{j=1}^{n}{\sum}_{i=1}^{k}\log_{2}\left|[w_{i}]_{\mathcal{R}_{j}}\right|\right)^{2}}}{\frac{1}{nk}{\sum}_{j=1}^{n}{\sum}_{i=1}^{k}\log_{2}\left|[w_{i}]_{\mathcal{R}_{j}}\right|}\\ &=\frac{\sqrt{n \cdot {\sum}_{j=1}^{n}\left( {\sum}_{i=1}^{k}\log_{2}|[w_{i}]_{\mathcal{R}_{j}}|-\frac{1}{n} {\sum}_{j=1}^{n} {\sum}_{i=1}^{k}\log_{2}|[w_{i}]_{\mathcal{R}_{j}}|\right)^{2}}}{{\sum}_{j=1}^{n} {\sum}_{i=1}^{k}\log_{2}|[w_{i}]_{\mathcal{R}_{j}}|}. \end{array} $$
(25)

By (25), we establish the mapping relationship between \(C_{v}(\textbf {M}_{\text {REN}}(\mathcal {W}))\) and \(\log _{2}|[w_{i}]_{\mathcal {R}_{j}}|\), i.e.,

$$ \lambda (\log_{2}|[w_{i}]_{\mathcal{R}_{j}}|) = C_{v}(\textbf{M}_{\text{REN}}(\mathcal{W})) \triangleq C_{v}(\{\text{REN}(\mathcal{R}_{j})\}_{n}), $$
(26)

where λ(⋅) satisfies (12). The proof is completed. □

Theorem 3

Suppose that \([\mathcal {W},\mathcal {R}_{n}]\) is a KB, and let \(\textbf {M}(\mathcal {W})\) be a measure set on \(\mathcal {W}\), where \(\mathcal {W}= \{w_{i}\}_{k}\) can be divided by the relations \(\mathcal {R}_{n}=\{\textbf {R}_{j}\}_{n}\). Then \(C_{v}(\textbf {M}_{\text {KEN}}(\mathcal {W}))\) can be equivalently described by the measurement function λ(x) (i.e., (12)).

Proof

Suppose that \([\mathcal {W},\mathcal {R}_{n}]\) is a KB, and let \(\textbf {M}_{\text {KEN}}(\mathcal {W})\) be the measure set on \(\mathcal {W}\) based on knowledge entropy, i.e.,

$$ \begin{array}{@{}rcl@{}} \textbf{M}_{\text{KEN}}(\mathcal{W}) &=& \{\text{KEN}(\mathcal{R}_{1}), \text{KEN}(\mathcal{R}_{2}) , ..., \text{KEN}(\mathcal{R}_{n})\} \\&=& \{\text{KEN}(\mathcal{R}_{j})\}_{n}. \end{array} $$
(27)

According to (11), then we obtain the following, i.e.,

$$ \begin{array}{@{}rcl@{}} \overline{\text{KEN}(\mathcal{R})} &=& \frac{1}{n} {\sum}_{j=1}^{n} \text{KEN}(\mathcal{R}_{j}), \\ \sigma(\textbf{M}_{\text{KEN}}(\mathcal{W})) &=& \sqrt{\frac{1}{n} {\sum}_{j=1}^{n}\left( \text{KEN}(\mathcal{R}_{j})-\overline{\text{KEN}(\mathcal{R})}\right)^{2}}, \\ C_{v}(\textbf{M}_{\text{KEN}}(\mathcal{W})) &=& \frac{\sigma(\textbf{M}_{\text{KEN}}(\mathcal{W}))}{\overline{\text{KEN}(\mathcal{R})}} \end{array} $$
(28)

According to (5), for the set \(\mathcal {W}=\{w_{i}\}_{k}\) (i.e., \(|\mathcal {W}| = k\)), it follows that,

$$ \text{KEN}(\mathcal{R}_{j}) = -{\sum}_{i=1}^{k} \frac{1}{k} \log_{2} \frac{\left|\left[w_{i}\right]_{\mathcal{R}_{j}}\right|}{k} = \frac{1}{k} {\sum}_{i=1}^{k} \log_{2} \frac{k}{\left|\left[w_{i}\right]_{\mathcal{R}_{j}}\right|} $$
(29)

and

$$ \overline{\text{KEN}(\mathcal{R})} = \frac{1}{n} {\sum}_{j=1}^{n} \text{KEN}(\mathcal{R}_{j})=\frac{1}{nk}{\sum}_{j=1}^{n}{\sum}_{i=1}^{k}\log_{2}\frac{k}{\left|[w_{i}]_{\mathcal{R}_{j}}\right|}. $$
(30)

Further, we can obtain

$$ \begin{array}{@{}rcl@{}} &&\sigma(\textbf{M}_{\text{KEN}}(\mathcal{W})) \\&=& \sqrt{\frac{1}{n} {\sum}_{j=1}^{n}\left( \text{KEN}(\mathcal{R}_{j})-\overline{\text{KEN}(\mathcal{R})}\right)^{2}}\\ &=& \sqrt{\frac{1}{n} {\sum}_{j=1}^{n}\left( {\sum}_{i=1}^{k} \frac{1}{k} \log_{2}\frac{k}{\left|[w_{i}]_{\mathcal{R}_{j}}\right|} - \frac{1}{nk}{\sum}_{j=1}^{n}{\sum}_{i=1}^{k}\log_{2}\frac{k}{\left|[w_{i}]_{\mathcal{R}_{j}}\right|}\right)^{2}} \end{array} $$
(31)

and

$$ \begin{array}{llll} &C_{v}(\textbf{M}_{\text{KEN}}(\mathcal{W}))\\ &= \frac{\sigma(\textbf{M}_{\text{KEN}}(\mathcal{W}))}{\overline{\text{KEN}(\mathcal{R})}} \\ &= \frac{\sqrt{\frac{1}{n} {\sum}_{j=1}^{n}\left( \text{KEN}(\mathcal{R}_{j})-\overline{\text{KEN}(\mathcal{R})}\right)^{2}}}{\frac{1}{nk}{\sum}_{j=1}^{n}{\sum}_{i=1}^{k}\log_{2}\frac{k}{\left|[w_{i}]_{\mathcal{R}_{j}}\right|}}\\ &=\frac{\sqrt{\frac{1}{n} {\sum}_{j=1}^{n}\left( {\sum}_{i=1}^{k} \frac{1}{k} \log_{2}\frac{k}{\left|[w_{i}]_{\mathcal{R}_{j}}\right|} -\frac{1}{nk}{\sum}_{j=1}^{n}{\sum}_{i=1}^{k}\log_{2}\frac{k}{\left|[w_{i}]_{\mathcal{R}_{j}}\right|}\right)^{2}}}{\frac{1}{nk}{\sum}_{j=1}^{n}{\sum}_{i=1}^{k}\log_{2}\frac{k}{\left|[w_{i}]_{\mathcal{R}_{j}}\right|}}\\ &=\frac{\sqrt{n \cdot {\sum}_{j=1}^{n}\left( {\sum}_{i=1}^{k}\log_{2}\frac{k}{\left|[w_{i}]_{\mathcal{R}_{j}}\right|}-\frac{1}{n} {\sum}_{j=1}^{n} {\sum}_{i=1}^{k}\log_{2}\frac{k}{\left|[w_{i}]_{\mathcal{R}_{j}}\right|}\right)^{2}}}{{\sum}_{j=1}^{n} {\sum}_{i=1}^{k}\log_{2}\frac{k}{\left|[w_{i}]_{\mathcal{R}_{j}}\right|}}. \end{array} $$
(32)

According to (32), we establish the mapping relationship between \(C_{v}(\textbf {M}_{\text {KEN}}(\mathcal {W}))\) and \(\log _{2}\frac {k}{\left |[w_{i}]_{\mathcal {R}_{j}}\right |}\), i.e.,

$$ \begin{array}{@{}rcl@{}} \lambda \left( \log_{2}\left( \frac{k}{\left|[w_{i}]_{\mathcal{R}_{j}}\right|}\right)\right) &=& C_{v}(\textbf{M}_{\text{KEN}}(\mathcal{W})) \\&\triangleq& C_{v}(\{\text{KEN}(\mathcal{R}_{j})\}_{n}), \end{array} $$
(33)

where λ(⋅) satisfies (12). The proof is completed. □

Theorem 4

Suppose that \([\mathcal {W},\mathcal {R}_{n}]\) is a KB, and let \(\textbf {M}(\mathcal {W})\) be a measure set on \(\mathcal {W}\), where \(\mathcal {W}= \{w_{i}\}_{k}\) can be divided by the relations \(\mathcal {R}_{n}=\{\textbf {R}_{j}\}_{n}\). Then \(C_{v}(\textbf {M}_{\text {KAM}}(\mathcal {W}))\) can be equivalently described by the measurement function λ(x) (i.e., (12)).

Proof

Suppose that \([\mathcal {W},\mathcal {R}_{n}]\) is a KB, and let \(\textbf {M}_{\text {KAM}}(\mathcal {W})\) be the measure set on \(\mathcal {W}\) based on knowledge amount, i.e.,

$$ \begin{array}{@{}rcl@{}} \textbf{M}_{\text{KAM}}(\mathcal{W}) &=& \{\text{KAM}(\mathcal{R}_{1}), \text{KAM}(\mathcal{R}_{2}) , ..., \text{KAM}(\mathcal{R}_{n})\} \\&=& \{\text{KAM}(\mathcal{R}_{j})\}_{n}. \end{array} $$
(34)

According to (11), then we obtain the following, i.e.,

$$ \begin{array}{@{}rcl@{}} \overline{\text{KAM}(\mathcal{R})} &=& \frac{1}{n} {\sum}_{j=1}^{n} \text{KAM}(\mathcal{R}_{j}), \\ \sigma(\textbf{M}_{\text{KAM}}(\mathcal{W})) &=& \sqrt{\frac{1}{n} {\sum}_{j=1}^{n}\left( \text{KAM}(\mathcal{R}_{j})-\overline{\text{KAM}(\mathcal{R})}\right)^{2}}, \\ C_{v}(\textbf{M}_{\text{KAM}}(\mathcal{W})) &=& \frac{\sigma(\textbf{M}_{\text{KAM}}(\mathcal{W}))}{\overline{\text{KAM}(\mathcal{R})}}. \end{array} $$
(35)

According to (7), for the set \(\mathcal {W}=\{w_{i}\}_{k}\) (i.e., \(|\mathcal {W}| = k\)), it follows that,

$$ \text{KAM}(\mathcal{R}_{j}) = {\sum}_{i=1}^{k} \frac{1}{k}\left( 1-\frac{\left|\left[w_{i}\right]_{\mathcal{R}_{j}}\right|}{k}\right), $$
(36)

and

$$ \overline{\text{KAM}(\mathcal{R})} = \frac{1}{n} {\sum}_{j=1}^{n} \text{KAM}(\mathcal{R}_{j}) = \frac{1}{nk}{\sum}_{j=1}^{n}{\sum}_{i=1}^{k}\!\left( \!1 - \frac{\left|\left[w_{i}\right]_{\mathcal{R}_{j}}\right|}{k}\!\right)\!. $$
(37)

Further, we obtain that,

$$ \begin{array}{@{}rcl@{}} &&\!\sigma(\textbf{M}_{\text{KAM}}(\mathcal{W})) \\&\!=&\! \sqrt{\frac{1}{n} {\sum}_{j=1}^{n}\left( \text{KAM}(\mathcal{R}_{j})-\overline{\text{KAM}(\mathcal{R})}\right)^{2}}\\ &\!=&\!\sqrt{\!\frac{1}{n}{\sum}_{j=1}^{n}\left( {\sum}_{i=1}^{k} \frac{1}{k}\left( \!1 - \frac{\left|\left[w_{i}\right]_{\mathcal{R}_{j}}\right|}{k}\!\right) - \frac{1}{nk}{\sum}_{j=1}^{n}{\sum}_{i=1}^{k}\left( \!1 - \frac{\left|\left[w_{i}\right]_{\mathcal{R}_{j}}\right|}{k}\!\right)\!\right)^{2}} \end{array} $$
(38)

and

$$ \begin{array}{llll} &C_{v}(\textbf{M}_{\text{KAM}}(\mathcal{W})) \\ &= \frac{\sigma(\textbf{M}_{\text{KAM}}(\mathcal{W}))}{\overline{\text{KAM}(\mathcal{R})}} \\ &= \frac{\sqrt{\frac{1}{n} {\sum}_{j=1}^{n}\left( \text{KAM}(\mathcal{R}_{j})-\overline{\text{KAM}(\mathcal{R})}\right)^{2}}}{\frac{1}{nk}{\sum}_{j=1}^{n}{\sum}_{i=1}^{k}\left( 1-\frac{\left|\left[w_{i}\right]_{\mathcal{R}_{j}}\right|}{k}\right)}\\ &=\frac{\sqrt{\frac{1}{n}{\sum}_{j=1}^{n}\left( {\sum}_{i=1}^{k} \frac{1}{k}\left( 1-\frac{\left|\left[w_{i}\right]_{\mathcal{R}_{j}}\right|}{k}\right)- \frac{1}{nk}{\sum}_{j=1}^{n}{\sum}_{i=1}^{k}\left( 1-\frac{\left|\left[w_{i}\right]_{\mathcal{R}_{j}}\right|}{k}\right)\right)^{2}}}{\frac{1}{nk}{\sum}_{j=1}^{n}{\sum}_{i=1}^{k}\left( 1-\frac{\left|\left[w_{i}\right]_{\mathcal{R}_{j}}\right|}{k}\right)}\\ &=\frac{\sqrt{n \cdot {\sum}_{j=1}^{n}\left( {\sum}_{i=1}^{k}\left( 1-\frac{\left|\left[w_{i}\right]_{\mathcal{R}_{j}}\right|}{k}\right)-\frac{1}{n} {\sum}_{j=1}^{n} {\sum}_{i=1}^{k}\left( 1-\frac{\left|\left[w_{i}\right]_{\mathcal{R}_{j}}\right|}{k}\right)\right)^{2}}}{{\sum}_{j=1}^{n} {\sum}_{i=1}^{k}\left( 1-\frac{\left|\left[w_{i}\right]_{\mathcal{R}_{j}}\right|}{k}\right)}. \end{array} $$
(39)

Therefore, we establish the mapping relationship between \(C_{v}(\textbf {M}_{\text {KAM}}(\mathcal {W}))\) and \(\left (1-\frac {\left |\left [w_{i}\right ]_{\mathcal {R}_{j}}\right |}{k}\right )\), i.e.,

$$ \begin{array}{@{}rcl@{}} \lambda \left( 1-\frac{\left|\left[w_{i}\right]_{\mathcal{R}_{j}}\right|}{k}\right) &=& C_{v}(\textbf{M}_{\text{KAM}}(\mathcal{W})) \\&\triangleq& C_{v}(\{\text{KAM}(\mathcal{R}_{j})\}_{n}), \end{array} $$
(40)

where λ(⋅) satisfies (12). The proof is completed. □
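Theorems 1-4 can also be checked numerically: for each measurement function, the coefficient of variation of its measure set coincides with λ(⋅) applied to the corresponding transform of the class sizes \(|[w_{i}]_{\mathcal {R}_{j}}|\). A sketch using the partitions of Example 1 (all helper names are ours):

```python
import math

def element_sizes(block_sizes):
    """Per-element class sizes |[w_i]_R|: a block of size n contributes n copies of n."""
    return [n for n in block_sizes for _ in range(n)]

def cv(values):
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n) / mean

def lam(partitions, transform):
    """lambda(.) of (12): C_v of the per-relation sums of transformed |[w_i]_{R_j}|."""
    return cv([sum(transform(x) for x in element_sizes(p)) for p in partitions])

k = 8
partitions = [[3, 2, 3], [2, 2, 4], [3, 5]]  # Example 1, |W| = k = 8

# C_v of each measure set, computed directly from Definitions 8-11 ...
measure_sets = {
    "KGR": [sum(n * n for n in p) / k**2 for p in partitions],
    "REN": [sum(n / k * math.log2(n) for n in p) for p in partitions],
    "KEN": [sum(n / k * math.log2(k / n) for n in p) for p in partitions],
    "KAM": [sum(n * (k - n) for n in p) / k**2 for p in partitions],
}
# ... and the matching input of lambda(.) from Theorems 1-4.
inputs = {
    "KGR": lambda x: x,                 # Theorem 1
    "REN": lambda x: math.log2(x),      # Theorem 2
    "KEN": lambda x: math.log2(k / x),  # Theorem 3
    "KAM": lambda x: 1 - x / k,         # Theorem 4
}
for name in measure_sets:
    assert abs(cv(measure_sets[name]) - lam(partitions, inputs[name])) < 1e-9
```

The assertions pass because the constant factors \(1/k^{2}\) and \(1/k\) in Definitions 8-11 cancel in the coefficient of variation.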

6.1 The relation between λ(⋅) and \(\text {KGR}(\mathcal {R})\), \(\text {REN}(\mathcal {R})\), \(\text {KEN}(\mathcal {R})\) and \(\text {KAM}(\mathcal {R})\)

According to Theorems 1-4, we summarize the intrinsic properties of the function λ(⋅). Specifically, we capture the following three important pieces of information:

1. Universality: the measurement function λ(⋅) establishes an internal relationship with Cv(⋅) (e.g., (19)). In the final mathematical expression, we find that the set \(\mathcal {W}\) does not affect (12). In other words, (12) applies to any finite set (it only requires that \(\mathcal {W}\) can be divided according to some relations \(\mathcal {R}\)), which means that λ(⋅) is universal.

2. One-to-one correspondence between the four measurement functions and the inputs of λ(⋅): for example, \(\log _{2}|[w_{i}]_{\mathcal {R}_{j}}|\) corresponds to \(\text {REN}(\mathcal {R}_{n})\), and \(1-\frac {\left |\left [w_{i}\right ]_{\mathcal {R}_{j}}\right |}{k}\) corresponds to \(\text {KAM}(\mathcal {R}_{n})\). Therefore, λ(⋅) achieves a formal unification of the four measurement functions.

3. Monotonicity: λ(⋅) can uniformly describe the four measurement tools in a two-dimensional plane. Since \(\left |\left [w_{i}\right ]_{\mathcal {R}_{j}}\right |\in \mathbb {Z}^{+}\), the inputs \(|[w_{i}]_{\mathcal {R}_{j}}|\), \(\log _{2}|[w_{i}]_{\mathcal {R}_{j}}|\), \(\log _{2}\frac {k}{\left |[w_{i}]_{\mathcal {R}_{j}}\right |}\), and \(1-\frac {\left |\left [w_{i}\right ]_{\mathcal {R}_{j}}\right |}{k}\) can be described by the elementary functions x, \(\log _{2}(x)\), \(\log _{2}(\frac {k}{x})\), and \(1-\frac {x}{k}\) in a two-dimensional plane, where \(x > 0\) and \(k\in \mathbb {Z}^{+}\).

Equivalent representation

According to λ(⋅) and Cv(⋅), we use λ(⋅) to describe Cv(⋅) equivalently. In addition, according to (12), the differences between \(C_{v}(\textbf {M}_{\text {KGR}}(\mathcal {W}))\), \(C_{v}(\textbf {M}_{\text {REN}}(\mathcal {W}))\), \(C_{v}(\textbf {M}_{\text {KEN}}(\mathcal {W}))\), and \(C_{v}(\textbf {M}_{\text {KAM}}(\mathcal {W}))\) depend entirely on their different inputs \(|[w_{i}]_{\mathcal {R}_{j}}|\), \(\log _{2}|[w_{i}]_{\mathcal {R}_{j}}|\), \(\log _{2}\frac {k}{\left |[w_{i}]_{\mathcal {R}_{j}}\right |}\), and \(1-\frac {\left |\left [w_{i}\right ]_{\mathcal {R}_{j}}\right |}{k}\). Therefore, the differences between the four mathematical tools for measuring the uncertainty of \([\mathcal {W}, \mathcal {R}]\) can be represented by x, \(\log _{2}(x)\), \(\log _{2}(\frac {k}{x})\), and \(1-\frac {x}{k}\).

Interval range

Observably, considering the monotonicity of each function, Inequality (10) always holds in the interval [α,β], where α satisfies \(\alpha = x_{1}=\sqrt {k}\) (i.e., \(\log _{2}(x_{1}) = \log _{2}(\frac {k}{x_{1}})\)), and β satisfies β = x2 = 2k or x2 = k (i.e., \(1-\frac {x_{2}}{k} = \log _{2}(\frac {k}{x_{2}}) \)). Consequently, we obtain an initial range \([\sqrt {k}, 2k],k\in \mathbb {Z}^{+}.\) However, \(1-\frac {x_{2}}{k} = 1-\frac {2k}{k} = -1,\) which contradicts \(1-\frac {\left |\left [w_{i}\right ]_{\mathcal {R}_{j}}\right |}{k} \ge 0\) (because \(\left |\left [w_{i}\right ]_{\mathcal {R}_{j}}\right |\le k\)). Hence the value of β must satisfy \(1-\frac {x_{2}}{k} \ge 0,\) i.e., \(\beta = x^{\prime }_{2} = k.\) Therefore, we obtain the following:

Corollary 1

If

$$ \left|\left[w_{i}\right]_{\textbf{R}_{j}}\right|\in [\alpha, \beta] = [x_{1}, x^{\prime}_{2} ] = [\sqrt{k}, k] \supseteq [\lceil \sqrt{k} \rceil, k], $$
(41)

where \(\lceil \cdot \rceil\) is the ceiling function, i.e., \(\lceil k \rceil ={\min \limits } \{n\in \mathbb {Z}~|~k \leqslant n\}\) (e.g., ⌈2.4⌉ = 3); since \(\left|\left[w_{i}\right]_{\textbf{R}_{j}}\right|\) is an integer, the two intervals contain the same admissible values. Then,

$$ \begin{array}{@{}rcl@{}} C_{v}(\textbf{M}_{\text{KGR}}(\mathcal{W}))&>& C_{v}(\textbf{M}_{\text{REN}}(\mathcal{W})) > C_{v}(\textbf{M}_{\text{KEN}}(\mathcal{W})) \\&>& C_{v}(\textbf{M}_{\text{KAM}}(\mathcal{W})) \end{array} $$

For an intuitive experience, we provide two visualizations of the four elementary functions x, \(\log _{2}(x)\), \(\log _{2}(\frac {k}{x})\), and \(1-\frac {x}{k}\) under different values of k. From Fig. 1 (k = 16) and Fig. 2 (k = 25), we can clearly see the differences between the four measurement functions.

Fig. 1 A visualization of the different evaluation functions x, \(\log _{2}(x)\), \(\log _{2}(\frac {k}{x})\), and \(1-\frac {x}{k}\) at k = 16

Fig. 2 A visualization of the different evaluation functions x, \(\log _{2}(x)\), \(\log _{2}(\frac {k}{x})\), and \(1-\frac {x}{k}\) at k = 25
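Since Figs. 1 and 2 are rendered graphically, the comparison they convey can also be reproduced numerically; a small sketch for k = 16:

```python
import math

# Tabulate the four elementary inputs of lambda(.) for k = 16 (cf. Fig. 1).
# On [sqrt(k), k] the ordering x > log2(x) > log2(k/x) > 1 - x/k behind (10)
# is visible; below sqrt(k) = 4 it breaks down.
k = 16
print("    x  log2(x)  log2(k/x)   1-x/k")
for x in [1, 2, 4, 8, 12, 16]:
    print("%5.1f  %7.3f  %9.3f  %6.3f"
          % (x, math.log2(x), math.log2(k / x), 1 - x / k))
```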

Note

We provide two visual examples to aid understanding of the unified representation of the four measurement functions, which correspond to the four different inputs of the unified metric function λ(⋅). Above, we provided an explicit interval within which Inequality (10) holds strictly. However, as shown in Figs. 1 and 2, the magnitude relations of the four measurement functions are not unique if \(\left |\left [w_{i}\right ]_{\textbf {R}_{j}}\right |\in (0, \sqrt {k})\). In summary, we conclude the following:

1. When \(\left |\left [w_{i}\right ]_{\textbf {R}_{j}}\right |\in [\sqrt {k}, k]\), Inequality (10) holds strictly. In other words, KAM(\(\mathcal {R}\)) performs much better for measuring the uncertainty of KBs.

2. When \(\left |\left [w_{i}\right ]_{\textbf {R}_{j}}\right |\in (0, \sqrt {k})\), the four measurement functions do not show a regular ordering, although KAM(\(\mathcal {R}\)) almost always shows better performance. Note that since k is the number of samples in the dataset, the interval \(\left |\left [w_{i}\right ]_{\textbf {R}_{j}}\right |\in (k, +\infty )\) does not occur in practice, so we do not discuss it.

Comparison analysis

λ(⋅) formally unifies \(\text {KGR}(\mathcal {R})\), \(\text {REN}(\mathcal {R})\), \(\text {KEN}(\mathcal {R})\), and \(\text {KAM}(\mathcal {R})\). Next, we visualize the similarities and differences between λ(⋅) and \(\text {KGR}(\mathcal {R})\), \(\text {REN}(\mathcal {R})\), \(\text {KEN}(\mathcal {R})\), and \(\text {KAM}(\mathcal {R})\) in Figs. 3 and 4.

Fig. 3 Comparison of the measure values of the four measurement functions

Fig. 4 Comparison of the outputs of λ(⋅) corresponding to the four different inputs

It is worth noting that λ(⋅) is not a new measurement function; it serves as a unified equivalent form of \(\text {KGR}(\mathcal {R})\), \(\text {REN}(\mathcal {R})\), \(\text {KEN}(\mathcal {R})\), and \(\text {KAM}(\mathcal {R})\). Therefore, the following analysis does not compare performance but focuses on the differences between λ(⋅) and each measurement function in terms of principle and interpretability. Specifically, as shown in Figs. 3 and 4, we summarize the comparison between λ(⋅) and \(\text {KGR}(\mathcal {R})\), \(\text {REN}(\mathcal {R})\), \(\text {KEN}(\mathcal {R})\), and \(\text {KAM}(\mathcal {R})\) as follows:

1. Measurement principle: in studies of measures of uncertainty for knowledge bases, \(\text {KGR}(\mathcal {R})\), \(\text {REN}(\mathcal {R})\), \(\text {KEN}(\mathcal {R})\), and \(\text {KAM}(\mathcal {R})\) focus only on outputting specific numerical results (e.g., coefficients of variation). In other words, the comparison of their performance is limited to the magnitudes of the statistical values they compute. Unfortunately, such a comparison at the level of results alone cannot reflect why the four measurement functions differ. For example, without considering the potential association between the four functions, one can observe that the “pink” value is (almost always) greater than the “blue” value (as shown on the left in Fig. 3), but not the reason why.

2. Interpretability: as shown in Fig. 4, λ(⋅) integrates the four measurement functions into a unified measurement framework, where different inputs correspond to different outputs. In Theorem 1, we proved that λ(⋅) has the following form, i.e.,

    $$ \begin{array}{@{}rcl@{}} \lambda(\cdot) &=&\frac{\sqrt{n \cdot {\sum}_{j=1}^{n}\left( {\sum}_{i=1}^{k}(\cdot)-\frac{1}{n} {\sum}_{j=1}^{n} {\sum}_{i=1}^{k}(\cdot)\right)^{2}}}{{\sum}_{j=1}^{n} {\sum}_{i=1}^{k}(\cdot)}, \\x &=& |[w_{i}]_{\mathcal{R}_{j}}|\in \mathbb{Z}^{+}. \end{array} $$

    Obviously, for determined x, n, and k (which can be determined from the knowledge base), λ(⋅) involves only changes in values and therefore does not change the monotonicity of its input. This property allows comparisons between different outputs of λ(⋅) to be translated into comparisons of their corresponding inputs, i.e., x, \(\log _{2}(x)\), \(\log _{2}(\frac {k}{x})\), and \(1-\frac {x}{k}\). Fortunately, each of these four inputs corresponds to a primitive elementary function, and they can be compared directly (as shown in Figs. 1 and 2). Thus, although λ(⋅) is not a new measurement function, as a unified framework for \(\text {KGR}(\mathcal {R})\), \(\text {REN}(\mathcal {R})\), \(\text {KEN}(\mathcal {R})\), and \(\text {KAM}(\mathcal {R})\), it explains the differences in their measure values by comparing x, \(\log _{2}(x)\), \(\log _{2}(\frac {k}{x})\), and \(1-\frac {x}{k}\).

Limitations

In RST, knowledge reflects the ability to classify objects [45]. Specifically, in a KB, the set of entities of interest in a certain field can be regarded as a finite set (or universe) \(\mathcal {W}\), and any subset \(\mathcal {C}\subseteq \mathcal {W}\) is called a category (or concept) in \(\mathcal {W}\), which contains many entities. A concept family, which contains many concepts, is called abstract knowledge about \(\mathcal {W}\), and a KB over \(\mathcal {W}\) is equivalent to a family of classifications over \(\mathcal {W}\). Objects in a KB can be divided according to their attributes. For example, given a set \(\mathcal {W}\) of candies with different colors (e.g., white, yellow, red) and shapes (e.g., round, square, triangle), these candies can be described by attributes such as color and shape (e.g., red round candies or yellow triangle candies). Hence, we obtain two equivalence relations (or attributes), i.e., \(\mathcal {R}=\{\textbf {R}_{1}, \textbf {R}_{2}\}=\{\texttt {color},~\texttt {shape}\}\). From these equivalence relations, the corresponding equivalence classes can be obtained: the elements of \(\mathcal {W}\) are divided and recombined according to the equivalence relations, e.g., candies are divided by color. In other words, this classification machinery presupposes that attribute information is available for the objects, which is the limitation addressed in the next section.

7 Measures of uncertainty for KBs without attribute information

In the previous section, we analyzed the performance of different measurement functions in measuring the uncertainty of KBs. A limitation of previous research is that the division of instances in a KB often depends only on their attributes. However, the types of knowledge bases have changed with the needs of real applications, and some knowledge bases do not contain instance attributes or lack sufficient attribute relations to classify the instances (e.g., ProBase). In this section, we first provide the definition of the concept structure of ProBase (see Definition 13), and then provide an effective strategy to induce KBs from ProBase such that the instances in the induced KBs can be classified by their concepts.

7.1 Inducing KBs from ProBase: intuition

According to Definition 4, for simplicity of description, we use \([\mathcal {T}, {\mathscr{H}}]\) to represent a KB induced from ProBase. In fact, all KBs are induced from ProBase by the same strategy; hence, in the rest of this paper, we unify all such knowledge bases as \([\mathcal {T}, {\mathscr{H}}]\) for theoretical analysis. More precisely, \(\mathcal {T}\) is a set containing a large number of instances, which refer to nodes that no longer have hyponyms in ProBase, and \({\mathscr{H}}\) is the family of hypernym (or concept) sets of the instances. Therefore, in this paper, we do not strictly distinguish between InstanceOf and SubClass; in most downstream tasks, the two can be unified as the isA relationship.

Definition 12 ([34] ProBase)

ProBase is a probabilistic taxonomy that contains hundreds of millions of instances, concepts, and isA relationships. An isA relationship can be an InstanceOf relation between a concept and an instance (e.g., (Snoopy, isA, dog)) or a SubClass relation between a pair of concepts (e.g., (fruit, isA, botany)).

Classifications

We first use a simple example to illustrate the intuition that the instances in ProBase can be classified according to their concepts.

Example 2

Given a finite set \(\mathcal {T}_{1} = \{\text {dhole, tiger, lion, wolf}\}\), if \(\mathcal {T}_{1}\) is divided by the equivalence relation \(\textbf{H}_{a} = \{\text{carnivore}\}\), the equivalence class of \(\mathcal {T}_{1}\) forms a single set, i.e., \(\mathcal {T}_{1} = \mathcal {C},\) where

$$\mathcal{C} = \mathcal{T}_{1} / \textbf{H}_{a} = [\text{dhole, tiger, lion, wolf}~]_{\textbf{H}_{a} = \text{carnivore}}.$$

If \(\mathcal {T}_{1}\) is divided by the equivalence relation

$$\textbf{H}_{b} =\{\text{beast division}\}=\{\textbf{H}_{1}, \textbf{H}_{2}\}=\{\text{canidae}, \text{felidae}\}.$$

Then \(\mathcal {T}_{1}\) can be divided into \(\mathcal {C} =\mathcal {T}_{1} / \textbf {H}_{b}= \{\mathcal {C}_{1},~\mathcal {C}_{2}\},\) where

$$\mathcal{C}_{1} = \{\text{dhole,~wolf}\}_{\textbf{H}_{1} = \text{canidae}}, \text{~and~} \mathcal{C}_{2} = \{\text{tiger, lion}\}_{\textbf{H}_{2} = \text{felidae}}.$$

As can be seen from Example 2, \(\mathcal {T}_{1}\) can be divided by \(\textbf {H}_{b}\in {\mathscr{H}}\) to obtain \(\mathcal {C}_{1}\) and \(\mathcal {C}_{2}\).

For ProBase, the dimension of \(\mathcal {T} = \{\mathcal {C}_{i}\}_{m}\) can be determined by \({\mathscr{H}}\); hence, \(\mathcal {T} = \{\mathcal {C}_{i}\}_{m}\) can be regarded as a vector in a vector space. Note that if \([\mathcal {T}, {\mathscr{H}}]\) is a KB induced from ProBase, where \(\mathcal {T}\) is the set of instances and \({\mathscr{H}}\) is the family consisting of the sets of hypernyms (i.e., concepts) of the instances, then the choice of concepts is constrained: the instances in \(\mathcal {T}\) can be divided by \({\mathscr{H}}\). Therefore, in this paper, we regard an equivalence relation (i.e., attribute) in a KB as a concept (i.e., hypernym) in ProBase. Li et al. [33] define the vector \(\mathcal {T} = \{\mathcal {C}_{i}\}_{m}\) as the knowledge structure of a KB. Similarly, we provide the definition of the concept structure of \([\mathcal {T}, {\mathscr{H}}]\) as follows:

Definition 13 (Concept structure of \([\mathcal {T}, {\mathscr{H}}] \))

Suppose \([\mathcal {T}, {\mathscr{H}}]\) is a KB induced from ProBase. If the finite set \(\mathcal {T}=\{t_{i}\}_{k}\) can be divided by the relations \({\mathscr{H}}=\{\textbf {H}_{1}, \textbf {H}_{2}, \ldots , \textbf {H}_{i}\}\), then we call the vector

$$ \text{CSV}(\mathcal{H})=\left\langle \left[t_{1}\right]_{\mathcal{H}},\left[t_{2}\right]_{\mathcal{H}}, \ldots,\left[t_{k}\right]_{\mathcal{H}}\right\rangle $$
(42)

is the concept structure of \([\mathcal {T}, {\mathscr{H}}]\).

In Example 2, let \(t_{1} = \text {tiger}\), \(t_{2} = \text {lion}\), and \(\textbf {H}_{2} = \{\text {felidae}\}\); then \(\left [t_{1}\right ]_{\textbf {H}_{b}} \triangleq \left [t_{2}\right ]_{\textbf {H}_{b}}\), which means that tiger and lion are equivalent under the relation \(\textbf {H}_{2}\). Similarly, \(\mathcal {C}_{1}\) and \(\mathcal {C}_{2}\) are the equivalence classes of \(\mathcal {T}_{1}\) under the relation \(\textbf {H}_{b}\).
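As an illustration of Definition 13, the following sketch computes the concept structure vector \(\text {CSV}({\mathscr{H}})\) for the toy universe of Example 2. The dict-based hypernym map and the function names are illustrative assumptions, not an interface of the real ProBase:

```python
# Toy hypernym map for Example 2 (illustrative, hand-built).
hypernym = {"dhole": "canidae", "wolf": "canidae",
            "tiger": "felidae", "lion": "felidae"}

def equivalence_class(t, instances, hypernym):
    """[t]_H: all instances that share t's hypernym."""
    return tuple(s for s in instances if hypernym[s] == hypernym[t])

def concept_structure(instances, hypernym):
    """CSV(H) = <[t_1]_H, ..., [t_k]_H>, one equivalence class per instance."""
    return tuple(equivalence_class(t, instances, hypernym) for t in instances)

T1 = ["dhole", "tiger", "lion", "wolf"]
print(concept_structure(T1, hypernym))
# (('dhole', 'wolf'), ('tiger', 'lion'), ('tiger', 'lion'), ('dhole', 'wolf'))
```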

7.2 Inducing KBs from ProBase: strategy

Strategy

It is worth noting that, in ProBase, most instances belong to many hypernyms; in other words, two or more different concepts may share identical instances (e.g., the hypernyms of apple can be company, fruit, etc.). Therefore, intuitively, ProBase can divide instances based on different levels of hypernyms to obtain multiple KBs. The specific division strategy is as follows (a minimal code sketch is given after the list):

  1. 1.

    Select an instance \(t_{i}\in \mathcal {T}\) that has at least three hypernym hierarchies (denoted as \(h^{j}(t_{i}, q), i\in |\mathcal {T}|,j, q\in \mathbb {Z}^{+},j_{max}\geqslant 3\)), i.e.,

    $$ t_{i}\longrightarrow {h^{1}_{k}}(t_{i}, q)\longrightarrow h^{2}(t_{i}, q) \longrightarrow h^{3}(t_{i}, q)\longrightarrow \cdots, $$
    (43)

    where \(x\longrightarrow y\) means that x is a hyponym of y. For example,

    $$ \texttt{corn} \longrightarrow \texttt{crop}\longrightarrow \texttt{plant}\longrightarrow\cdots $$
    (44)
  2. 2.

    Repeat the above selection, and finally obtain all \({{h^{1}_{k}}(t_{i}, 1)}\) satisfying (45), i.e.,

    $$ t_{i}\longrightarrow \left\{ \begin{array}{cc} {h^{1}_{1}}(t_{i}, 1) \\ {h^{1}_{2}}(t_{i}, 1) \\ {\vdots} \\ {h^{1}_{k}}(t_{i}, 1)\\ \end{array} \right\} \longrightarrow h^{2}(t_{i}, 1) \longrightarrow \cdots $$
    (45)

    For example,

    $$ \begin{array}{llll} &\!\!\!\!\texttt{corn} \!\longrightarrow\! \texttt{crop}\longrightarrow \texttt{plant}\longrightarrow\cdots,\\ &\!\!\!\!\texttt{corn} \!\longrightarrow\! \texttt{Monocotyledoneae}\longrightarrow \texttt{plant}\longrightarrow\cdots,\\ &\!\!\!\!\texttt{corn} \!\longrightarrow\! \texttt{herbaceous plants}\longrightarrow \texttt{plant}\longrightarrow\cdots. \end{array} $$
    (46)
  3. 3.

    Collect all the instances in each \({h^{1}_{k}}(t_{i}, 1)\) to form the set \(T_{1}\).

  4. 4.

    Repeat the above selection strategy; similarly, collect all the instances in each \(h^{1}_{k^{\prime }}(t_{i}, 2)\) to form the set \(T_{2}\).

    For example,

    $$ \texttt{corn} \longrightarrow \begin{cases} &\texttt{food}\longrightarrow \texttt{Foods Association}\longrightarrow\cdots,\\ &\vdots\\ &\texttt{coarse food grain}\longrightarrow \texttt{Foods Association}\longrightarrow\cdots.\\ \end{cases} $$
    (47)
  5. 5.

    The search terminates once \(t_{i}\) no longer satisfies (45). The finally acquired dataset

    $$ \begin{array}{llll} &{}[\mathcal{T}, \mathcal{H}],\\ &{}\mathcal{T}=\{T_{1}, T_{2},\ldots,T_{q}\},\\ &{}\mathcal{H}=\{h^{2}(t_{i}, 1), h^{2}(t_{i}, 2),\ldots, h^{2}(t_{i}, q)\},\\ &{}\text{s.t.~} \begin{cases} &\!\!\!T_{i}\cap T_{j,j\neq i}=\emptyset,\\ &\!\!\!hypo(h^{2}(t_{i}, q_{i}))\cap hypo(h^{2}(t_{i}, q_{j,j\neq i}))\neq \emptyset. \end{cases} \end{array} $$
    (48)

    can be viewed as a sub-dataset induced from ProBase based on the instance \(t_{i}\). The constraint \(T_{i}\cap T_{j,j\neq i}=\emptyset \) ensures that the same instance is strictly divided according to its hypernyms; for example, a candy cannot be both red and blue. The constraint \(hypo(h^{2}(t_{i},q_{i})) \cap hypo(h^{2}(t_{i},q_{j,j\neq i}))\neq \emptyset \) ensures the presence of shared instances under any combination of the \(hypo(h^{2}(t_{i},q_{i}))\), \(q_{i} \in \{1,2,\ldots ,q\}\).
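The following sketch illustrates steps 1–5 on a toy child-to-parents taxonomy. All data and names (parents, children, induce_kb) are illustrative assumptions rather than the real ProBase interface; a full implementation would additionally filter shared instances so that the disjointness constraint \(T_{i}\cap T_{j,j\neq i}=\emptyset \) in (48) holds:

```python
from collections import defaultdict

# Toy taxonomy stored as a child -> direct hypernyms (isA) map; illustrative.
parents = {
    "corn":  ["crop", "Monocotyledoneae", "herbaceous plants",
              "food", "coarse food grain"],
    "crop": ["plant"], "Monocotyledoneae": ["plant"],
    "herbaceous plants": ["plant"],
    "food": ["Foods Association"], "coarse food grain": ["Foods Association"],
    "wheat": ["crop"], "rice": ["coarse food grain"],
}

children = defaultdict(list)  # inverse map: hypernym -> direct hyponyms
for child, hypernyms in parents.items():
    for h in hypernyms:
        children[h].append(child)

def induce_kb(t):
    """Group t's level-1 hypernyms h^1_k(t, q) by their level-2 hypernym
    h^2(t, q), then collect every instance below each group to form T_q."""
    groups = defaultdict(list)          # h^2 -> [h^1_1, h^1_2, ...]
    for h1 in parents.get(t, []):
        for h2 in parents.get(h1, []):
            groups[h2].append(h1)
    return {h2: sorted({x for h1 in h1s for x in children[h1]})
            for h2, h1s in groups.items()}  # {h^2(t, q): T_q}

print(induce_kb("corn"))
# {'plant': ['corn', 'wheat'], 'Foods Association': ['corn', 'rice']}
```

Here each level-2 hypernym plays the role of an attribute and its level-1 hypernyms play the role of attribute values; note that corn itself falls under both induced groups, so the filtering required by (48) is not a vacuous step.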

Rationality analysis

The strategy is not unique. Similarly, we can also select a concept (the concept must have enough hypernym and hyponym hierarchies) that conforms to the selection strategy of (45); we do not repeat the details here. Obviously, multiple KBs can be induced from ProBase based on the above strategy, and the instances in these KBs can be divided according to the selected concepts. By comparison, in \([\mathcal {T}, {\mathscr{H}}]\), \(h^{2}(t_{i},q)\) plays the role of an attribute, and \({h^{1}_{k}}(t_{i}, 1)\) represents an attribute value. Therefore, based on the above strategy and analysis, we theoretically provide a strategy for inducing a KB from ProBase such that the instances in the induced KB can be strictly classified by their selected concepts. Our results indicate that λ(⋅) provides valuable insights for integrating the four measurement functions into a unified framework for measuring the uncertainty of KBs.

8 Experiments

8.1 KBs with attribute information

Comparison of four measurement functions

We conduct experiments on the datasets in Table 4 with the aim of comparing the performance of the four measurement functions KGR(⋅), REN(⋅), KEN(⋅), and KAM(⋅) across different knowledge bases.

Table 4 Data sets from UCI, where “#X” represents the number of “X”

Construction of the measure sets

Specifically, for a KB \([\mathcal {W}, \mathcal {R}]\), we denote \(R_{i}=ind(\{f_{i}\in \mathcal {R}\})\), where \(ind(\cdot )\) stands for the indiscernibility relation, e.g., \(ind(\mathcal {R})=\bigcap _{f_{i}\in \mathcal {R}}ind(\{f_{i}\})\). Let \(\mathcal {R}_{j}=\{R_{1}, R_{2},\ldots ,R_{j}\}\) denote the set consisting of the first \(j\) relations \(R_{i}\) (e.g., \(\mathcal {R}_{3}=\{R_{1}, R_{2}, R_{3}\}\)). Obviously, \([\mathcal {W}, \mathcal {R}_{j}]\) is a knowledge base induced from \([\mathcal {W}, \mathcal {R}]\). Therefore, we obtain four measure sets on \(\mathcal {W}\) as follows (a minimal code sketch of this construction follows (49)):

$$ \begin{array}{llll} &\!\!\!\textbf{M}_{\text{KGR}}(\mathcal{W}) = \{\text{KGR}(\mathcal{R}_{1}), \text{KGR}(\mathcal{R}_{2}), \ldots, \text{KGR}(\mathcal{R}_{j})\},\\ &\!\!\!\textbf{M}_{\text{REN}}(\mathcal{W}) = \{\text{REN}(\mathcal{R}_{1}), \text{REN}(\mathcal{R}_{2}), \ldots, \text{REN}(\mathcal{R}_{j})\},\\ &\!\!\!\textbf{M}_{\text{KEN}}(\mathcal{W}) = \{\text{KEN}(\mathcal{R}_{1}), \text{KEN}(\mathcal{R}_{2}), \ldots, \text{KEN}(\mathcal{R}_{j})\},\\ &\!\!\!\textbf{M}_{\text{KAM}}(\mathcal{W}) = \{\text{KAM}(\mathcal{R}_{1}), \text{KAM}(\mathcal{R}_{2}), \ldots, \text{KAM}(\mathcal{R}_{j})\}. \end{array} $$
(49)
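The sketch below builds the nested relation sets \(\mathcal {R}_{j}\), evaluates a measurement function on each to form a measure set, and computes the coefficient of variation \(C_{v}\) reported later. The measurement functions themselves are given by (4)–(7) and are not reproduced here; toy_granulation is only an assumed stand-in for one of them, and the small table is illustrative rather than a UCI dataset:

```python
import math

# Each row lists the values of attributes f1, f2, f3 for one object in W.
table = [
    ("red", "round", "small"), ("red", "square", "small"),
    ("yellow", "round", "large"), ("yellow", "round", "small"),
]

def partition(table, attrs):
    """Equivalence classes of ind(R_j): objects agreeing on all attrs."""
    classes = {}
    for idx, row in enumerate(table):
        classes.setdefault(tuple(row[a] for a in attrs), []).append(idx)
    return list(classes.values())

def measure_set(table, n_attrs, measure_fn):
    """M = {measure_fn(R_1), ..., measure_fn(R_j)} over nested R_j."""
    return [measure_fn(partition(table, range(j + 1))) for j in range(n_attrs)]

def coeff_of_variation(values):
    """C_v = standard deviation / mean of a measure set."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return std / mean

def toy_granulation(classes):
    """Assumed stand-in for one of (4)-(7): sum of |C_i|^2 / |W|^2."""
    n = sum(len(c) for c in classes)
    return sum(len(c) ** 2 for c in classes) / n ** 2

M = measure_set(table, 3, toy_granulation)
print(M)                      # [0.5, 0.375, 0.25]
print(coeff_of_variation(M))  # ~0.272
```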

Example 3

For example, “Lymphography” in Table 4 can be viewed as an information system \([\mathcal {T}, \mathcal {F}]\) with \(|\mathcal {T}|=148\), \(|\mathcal {F}|=18\). We can obtain four measure sets on “Lymphography” as follows:

$$ \begin{array}{llll} &\!\!\!\textbf{M}_{\text{KGR}}(\mathcal{W}) = \{\text{KGR}(\mathcal{R}_{1}), \text{KGR}(\mathcal{R}_{2}), ..., \text{KGR}(\mathcal{R}_{18})\},\\ &\!\!\!\textbf{M}_{\text{REN}}(\mathcal{W}) = \{\text{REN}(\mathcal{R}_{1}), \text{REN}(\mathcal{R}_{2}), ..., \text{REN}(\mathcal{R}_{18})\},\\ &\!\!\!\textbf{M}_{\text{KEN}}(\mathcal{W}) = \{\text{KEN}(\mathcal{R}_{1}), \text{KEN}(\mathcal{R}_{2}), ..., \text{KEN}(\mathcal{R}_{18})\},\\ &\!\!\!\textbf{M}_{\text{KAM}}(\mathcal{W}) = \{\text{KAM}(\mathcal{R}_{1}), \text{KAM}(\mathcal{R}_{2}), ..., \text{KAM}(\mathcal{R}_{18})\}, \end{array} $$
(50)

and the values of \(\text {KGR}(\mathcal {R}_{j})\), \(\text {REN}(\mathcal {R}_{j})\), \(\text {KEN}(\mathcal {R}_{j})\) and \(\text {KAM}(\mathcal {R}_{j})\) are calculated by (4)–(7).

8.2 Experimental results and analysis on multi-domain datasets

Experimental results

The experimental results are shown in Table 5 and Fig. 5.

Table 5 Coefficient of variation values of measure sets \(\textbf {M}_{\text {KGR}}(\mathcal {W}_{i})\), \(\textbf {M}_{\text {REN}}(\mathcal {W}_{i})\), \(\textbf {M}_{\text {KEN}}(\mathcal {W}_{i})\), and \(\textbf {M}_{\text {KAM}}(\mathcal {W}_{i})\)
Fig. 5 Coefficient of variation values of four measure sets on datasets (a)–(r)

Analysis

From the results, we conclude that:

  1. 1.

    Consistency of results: We select datasets from different domains, containing different numbers of instances and attributes, to validate our theoretical analysis. Specifically, all 18 datasets, covering 6 domains (i.e., game, life science, social science, computer, physical, and other), consistently demonstrate our theoretical analysis, i.e.,

    $$ \begin{array}{@{}rcl@{}} C_{v}(\textbf{M}_{\text{KGR}}(\mathcal{W}))&>& C_{v}(\textbf{M}_{\text{REN}}(\mathcal{W})) > C_{v}(\textbf{M}_{\text{KEN}}(\mathcal{W})) \\&>& C_{v}(\textbf{M}_{\text{KAM}}(\mathcal{W})). \end{array} $$
  2. 2.

    Metric performance: For the datasets of different domains, the value of \(C_{v}(\textbf {M}_{\text {KGR}}(\mathcal {W}))\) fluctuates the most, and it has the worst performance for measuring the uncertainty of KBs. By contrast, the value of \(C_{v}(\textbf {M}_{\text {KAM}}(\mathcal {W}))\) has good stability, and it has the best performance for measuring the uncertainty of KBs.

  3. 3.

    Comparison of \(C_{v}(\textbf {M}_{\text {REN}}(\mathcal {W}))\) and \(C_{v}(\textbf {M}_{\text {KEN}}(\mathcal {W}))\): As shown in Fig. 5, the gap between \(C_{v}(\textbf {M}_{\text {REN}}(\mathcal {W}))\) and \(C_{v}(\textbf {M}_{\text {KEN}}(\mathcal {W}))\) is not significant on most of the datasets, which is consistent with our analysis of the measurement functions \(\text {REN}(\mathcal {R})\) and \(\text {KEN}(\mathcal {R})\) in the previous section. For example, as shown in Figs. 1 and 2, when the value of x is in the interval \([\sqrt {k}, k]\), the gap between \(C_{v}(\textbf {M}_{\text {REN}}(\mathcal {W}))\) and \(C_{v}(\textbf {M}_{\text {KEN}}(\mathcal {W}))\) is not too significant in most cases.

  4. 4.

    Comparison of \(C_{v}(\textbf {M}_{\text {KGR}}(\mathcal {W}))\) and \(C_{v}(\textbf {M}_{\text {KAM}}(\mathcal {W}))\): In contrast to the above conclusion, the gap between \(C_{v}(\textbf {M}_{\text {KGR}}(\mathcal {W}))\) and \(C_{v}(\textbf {M}_{\text {KAM}}(\mathcal {W}))\) demonstrates a significant difference on almost all datasets, which is consistent with our analysis of the measurement functions \(\text {KGR}(\mathcal {R})\) and \(\text {KAM}(\mathcal {R})\) in the previous section. For example, as shown in Figs. 1 and 2, when the value of x is in the interval \([\sqrt {k}, k]\), the gap between \(C_{v}(\textbf {M}_{\text {KGR}}(\mathcal {W}))\) and \(C_{v}(\textbf {M}_{\text {KAM}}(\mathcal {W}))\) increases as x increases.

8.3 KBs induced by ProBase

In this section, we induce several KBs from ProBase based on the above strategy and perform uncertainty measurement on the induced KBs. Specifically, we induce three KBs of different sizes (denoted as \(D_{1}\), \(D_{2}\), and \(D_{3}\)); the specific information of \(D_{1}\) (induced from the concept fruit), \(D_{2}\) (induced from the concept corn, containing 123 instances), and \(D_{3}\) (induced from the concept corn, containing 1290 instances) is shown in Table 6. The measure sets on \(D_{1}\), \(D_{2}\), and \(D_{3}\) are constructed in the same way as in (49) for the general datasets.

Table 6 Statistical information of D1, D2 and D3

8.4 Experimental results and analysis on ProBase

Experimental results

The experimental results are shown in Table 7 and Fig. 6.

Table 7 Coefficient of variation values of measure sets \(\textbf {M}_{\text {KGR}}(D_{i})\), \(\textbf {M}_{\text {REN}}(D_{i})\), \(\textbf {M}_{\text {KEN}}(D_{i})\), and \(\textbf {M}_{\text {KAM}}(D_{i})\) on dataset \(D_{i}\), \(i= 1,2,3\)
Fig. 6 Coefficient of variation values of four measure sets on datasets \(D_{1}\), \(D_{2}\) and \(D_{3}\)

Analysis

From the results, we conclude that:

  1. 1.

    In datasets \(D_{1}\) and \(D_{3}\), the results show the following relationship, i.e.,

    $$ \begin{array}{@{}rcl@{}} C_{v}(\textbf{M}_{\text{KGR}}(D_{i}))&>& C_{v}(\textbf{M}_{\text{KEN}}(D_{i}))>C_{v}(\textbf{M}_{\text{REN}}(D_{i})) \\&>& C_{v}(\textbf{M}_{\text{KAM}}(D_{i})). \end{array} $$
    (51)

    This result is in line with our analytical conclusions. As shown in Figs. 1 and 2, we find that, in the interval \((0, \sqrt {k})\), there will be a situation where

    $$ \begin{array}{@{}rcl@{}} C_{v}(\textbf{M}_{\text{KEN}}(\mathcal{W})) &>& C_{v}(\textbf{M}_{\text{REN}}(\mathcal{W})), \text{if~} \left|\left[w_{i}\right]_{\textbf{R}_{j}}\right|\\&&\in [0, \sqrt{k}],~w_{i}\in \mathcal{W}. \end{array} $$
    (52)

    This fully validates the rigor of our theoretical analysis. Moreover, this conclusion also reveals that \(\text {KEN}(\mathcal {W})\) and \(\text {REN}(\mathcal {W})\) are greatly affected by the parameter k.

  2. 2.

    In dataset \(D_{2}\), the results reveal the following relationship, i.e.,

    $$ \begin{array}{@{}rcl@{}} C_{v}(\textbf{M}_{\text{KGR}}(\mathcal{W}))&>&C_{v}(\textbf{M}_{\text{REN}}(\mathcal{W}))> C_{v}(\textbf{M}_{\text{KEN}}(\mathcal{W}))\\&>& C_{v}(\textbf{M}_{\text{KAM}}(\mathcal{W})). \end{array} $$

    This further verifies that \(C_{v}(\textbf {M}_{\text {REN}}(\mathcal {W}))\) has stable and excellent performance in measuring the uncertainty of the KB.

  3. 3.

    Consistent with the experimental conclusions on the public datasets, \(\text {KGR}(\mathcal {W})\) has the worst performance in measuring the uncertainty of KBs, while \(\text {KAM}(\mathcal {W})\) maintains the best performance in measuring the uncertainty of KBs.

9 Case study

In this section, we provide a small-scale case to visually demonstrate how to use rough set theory and the induction strategy (i.e., Section 7.2) to induce a measurable knowledge base (denoted as \(D_{4}\)) from ProBase. Dataset \(D_{4}\) contains 19 concepts about fruit and their corresponding hypernyms in ProBase (the hypernyms are selected according to the induction strategy in Section 7.2). The statistical information of \(D_{4}\) is summarized in Table 8.

Table 8 Statistical information of D4

Further, as in the above experiments, we construct measure sets on \(D_{4}\) and calculate the coefficient of variation of each measure set; the results are shown in Fig. 7.

Fig. 7 Coefficient of variation values of measure sets on dataset \(D_{4}\)

Obviously, the experimental results based on dataset \(D_{4}\) are consistent with the previous theoretical analysis and experimental conclusions. That is, \(\text {KGR}(D_{4})\) has the worst performance in measuring the uncertainty of KBs, while \(\text {KAM}(D_{4})\) maintains the best performance. In particular, the case study also captures the situation where \(C_{v}(\textbf {M}_{\text {KEN}}(D_{4}))\) is greater than \(C_{v}(\textbf {M}_{\text {REN}}(D_{4}))\).

10 Discussion

In this section, we hope to bring some guidance and insight to the study of knowledge base uncertainty through the results of the theoretical analysis in this paper. According to Table 5 and Fig. 5, we observe that \(C_{v}(\textbf {M}_{\text {KGR}}(\mathcal {W}))\), \(C_{v}(\textbf {M}_{\text {REN}}(\mathcal {W}))\), \(C_{v}(\textbf {M}_{\text {KEN}}(\mathcal {W}))\), and \(C_{v}(\textbf {M}_{\text {KAM}}(\mathcal {W}))\) conform to the theoretical analysis of this paper on all 18 public datasets, i.e.,

$$ \begin{array}{@{}rcl@{}} C_{v}(\textbf{M}_{\text{KGR}}(\mathcal{W}))&>& C_{v}(\textbf{M}_{\text{REN}}(\mathcal{W})) > C_{v}(\textbf{M}_{\text{KEN}}(\mathcal{W})) \\&>& C_{v}(\textbf{M}_{\text{KAM}}(\mathcal{W})). \end{array} $$

However, a more detailed analysis reveals that there are significant differences between the different measurement functions (e.g., on the dataset “Letter Recognition”, \(C_{v}(\textbf {M}_{\text {KAM}}(\mathcal {W}))\) is 0.0380, but \(C_{v}(\textbf {M}_{\text {KGR}}(\mathcal {W}))\) can reach 3.1032). Therefore, a single conclusion based on a single measurement function is not sufficient. Based on the theoretical analysis and experimental validation in this paper, we advocate that the uncertainty of a knowledge base should be evaluated by combining the four measurement functions. For example, for the datasets “Solar Flare” and “Letter Recognition”, although they differ only slightly in \(C_{v}(\textbf {M}_{\text {KAM}}(\mathcal {W}))\) (\(C_{v}(\textbf {M}_{\text {KAM}}(\mathcal {W}_{15}))=0.0380\), \(C_{v}(\textbf {M}_{\text {KAM}}(\mathcal {W}_{16}))=0.0204\)), they differ significantly in \(C_{v}(\textbf {M}_{\text {KGR}}(\mathcal {W}))\) and \(C_{v}(\textbf {M}_{\text {REN}}(\mathcal {W}))\). Therefore, comprehensively considering these measurement functions may be a more reasonable approach.

The rapid development of deep neural networks (DNNs) in recent years has reached almost every field of AI; meanwhile, many researchers have begun to think deeply about the reliability of prediction results based on neural networks. There is already evidence that uncertainty (e.g., data uncertainty and model uncertainty) imposes many limitations on DNNs, such as the lack of transparency of a DNN’s inference framework [46]. In the previous sections, we focus on measures of uncertainty for knowledge bases, aiming to provide a rigorous theoretical analysis for the existing conclusions (e.g., uncovering the reasons for performance differences between measurement functions). We hope these results will provide insights into understanding the essence of uncertainty (e.g., uncertainty quantification [47]) for knowledge bases.

11 Conclusion and further work

The work of this paper is inspired by the experimental conclusions of [1]. In [1], the authors verify the superiority of measuring the uncertainty of KBs based on the knowledge amount through experiments on three datasets. Although this conclusion lacks rigorous theoretical analysis, it encourages us to study why the knowledge-amount-based measurement function has the best performance in measuring the uncertainty of the knowledge base. Therefore, this paper provides deeper insights into the uncertainty measurement of the knowledge base.

In this paper, we review four popular measurement functions for measuring the uncertainty of KBs. At the theoretical level, we integrate the four measurement functions into a unified new measurement function, which provides valuable insights for measuring the uncertainty of KBs. At the experimental level, the results on the 18 public datasets are consistent with our theoretical conclusions, which fully demonstrates the correctness of our theoretical analysis. In addition, some special datasets (e.g., ProBase) contain a large amount of structured knowledge but do not have enough attributes to classify their instances, which makes the above measurement functions inapplicable. To solve this issue, we propose an effective strategy that induces sub-datasets from ProBase such that all the instances in a sub-dataset can be divided according to their concepts. Comparative experimental results justify the effectiveness of the strategy and its consistency with the theoretical conclusions.

Further work

The knowledge base, as an indispensable carrier for the development of today’s artificial intelligence technology, provides far-reaching resources for smart devices. With the growth of downstream real-world tasks and the diversification of application scenarios, various types of knowledge bases have appeared one after another, and their knowledge structures have become more and more complicated. Therefore, how to measure the uncertainty of these knowledge bases is important future work.

In addition, the timeliness, accuracy, and redundancy of a knowledge base are also important indicators for evaluating it. Whether a complete theoretical analysis of these measurement indicators can be established is another of our future efforts.