A Bidirectional Subsethood Based Fuzzy Measure for Aggregation of Interval-Valued Data

Kabir, Shaily; Wagner, Christian

doi:10.1007/978-3-030-50143-3_48


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1238)

Included in the following conference series:

International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems


Abstract

Recent advances in the literature have leveraged the fuzzy integral (FI), a powerful multi-source aggregation operator, where a fuzzy measure (FM) is used to capture the worth of all combinations of subsets of sources. While in most applications, the FM is defined either by experts or numerically derived through optimization, these approaches are only viable if additional information on the sources is available. When such information is unavailable, as is commonly the case when sources are unknown a priori (e.g., in crowdsourcing), prior work has proposed the extraction of valuable insight (captured within FMs) directly from the evidence or input data by analyzing properties such as specificity or agreement amongst sources. Here, existing agreement-based FMs use established measures of similarity such as Jaccard and Dice to estimate the source agreement. Recently, a new similarity measure based on bidirectional subsethood was put forward to compare evidence, minimizing limitations such as aliasing (where different inputs result in the same similarity output) present in traditional similarity measures. In this paper, we build on this new similarity measure to develop a new instance of the agreement-based FM for interval-valued data. The proposed FM is purposely designed to support aggregation, and unlike previous agreement FMs, it degrades gracefully to an average operator for cases where no overlap between sources exists. We validate that it respects all requirements of a FM and explore its impact when used in conjunction with the Choquet FI for data fusion as part of both synthetic and real-world datasets, showing empirically that it generates robust and qualitatively superior outputs for the cases considered.



1 Introduction

Data aggregation from multiple sources has become prevalent in many applications, including sensor fusion [8] and crowdsourcing [20]. In such aggregation contexts, the fuzzy integral (FI), which is defined with respect to a fuzzy measure (FM) [9], is often used to capture the importance of information arising from different combinations of sources. Generally, FMs are defined by experts or generated through algorithms, such as the Sugeno $\lambda $-measure [19] and the Decomposable measure [7], which leverage the ‘worth’ of the singletons (individual sources), also known as the densities. Another approach to generating FMs is optimization, in which an FM is tuned with respect to the behaviour of an aggregation function such as the FI and training data [2, 4]. If training data or information on the densities is limited or missing, specifying an FM is a challenging task, even though such a situation arises often, for example in aggregating crowdsourced data. To deal with this, Wagner and Anderson [21] first extracted FMs directly from the input data (the evidence) by analyzing and extracting key properties such as agreement and specificity. Later, Havens et al. [11, 12] introduced further data-driven FMs which refined the established agreement FM, in particular by leveraging a generic similarity measure (SM) to extract the property of ‘agreement’ amongst evidence from combinations of sources. This paper focuses on a recently introduced SM, the bidirectional subsethood based SM [14, 15], which has been shown to address a number of limitations of common existing SMs such as Jaccard [13] and Dice [6], and explores the impact of its use in conjunction with agreement-based FMs.

So far, three agreement-based FMs have been proposed: the FM of Agreement (AG) [21], the FM of Generalized Accord (GenA) [12], and the Additive Measure of Agreement (AA) [11]. The AG FM captures the source agreement by using the intersection operation, which considers only the overlap amongst multi-source data without tracking changes in their cardinality/size. This limitation of the intersection operation causes the AG FM to generate the same agreement, and thus the same worth, for very different subsets of sources. Figure 1 shows such a situation for interval-valued data with the AG FM. On the other hand, the use of the Jaccard or Dice SM with the GenA and AA FMs to estimate the source agreement makes the resulting FM susceptible to the limitations of these measures, in particular aliasing, i.e., returning the same similarity for very different sets of intervals [14, 15]. Figure 2 presents such a case where the GenA and AA FMs produce identical agreement values, and thus identical worth, for different sets.

Given this context, this paper focuses on developing a new instance of an agreement FM to avoid the limitations of the existing ones. The proposed FM leverages the bidirectional subsethood based SM [14, 15] to minimize aliasing in the inter-source agreement and worth calculation. The proposed FM is designed following the concept of the GenA FM [12], and considers both cases where sources are overlapping (some agreement) or non-overlapping (no agreement). When sources are non-overlapping, the proposed FM in combination with the FI gracefully degrades to an average operator, whereas existing agreement FMs are not designed to deal with such cases. Beyond developing this FM, this paper also demonstrates its behaviour against the existing agreement FMs in aggregating interval datasets when used in combination with the Choquet FI (CFI) [5].

The paper is structured as follows: Sect. 2 reviews FMs and FIs along with a brief discussion of subsethood and the bidirectional subsethood based SM [14, 15]. Section 3 discusses existing agreement FMs. Section 4 develops a new instance of the agreement-based FM exploiting the bidirectional subsethood based SM. Section 5 demonstrates the behaviour of the proposed FM against the existing agreement FMs in aggregating interval-valued datasets when used with an FI for both synthetic and real-world datasets. Finally, Sect. 6 concludes the paper with suggestions and future work (Table 1).

Table 1. Acronyms and notation


2 Background

This section initially reviews FMs and FIs and then provides a short discussion on subsethood and the new bidirectional subsethood based SM [14, 15].

2.1 Fuzzy Measures

FMs are hierarchical weighting structures (lattices) that capture the worth of all subsets in a set of sources, including that of the singletons, also referred to as the densities. Mathematically, an FM g defined on a finite set of sources $X=\{x_{1},...,x_{n}\}$ is a function $g:2^X\rightarrow [0,1]$ satisfying the properties [9]:

(P1) $g(\emptyset )=0$ and $g(X)=1$ (Boundedness)
(P2) If $a \subseteq b \subseteq X$ then $g(a) \le g(b) \le g(X)$ (Monotonicity)

Here, g(a) is the worth of a subset a of X. Property (P1) states that the worth of the empty set ($\emptyset $) is 0 and the worth of the universal set (X) is 1. We note that the worth of the universal set is not always required to be 1, but this convention is adopted here. Property (P2) expresses the monotonicity of g: if a is a subset of b ($a\subseteq b$), the worth of a is smaller than or equal to the worth of b. There is a third property for continuous FMs, which is not applicable to discrete FMs as used in this paper and in most practical applications.
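As an illustration of the two properties, the following minimal Python sketch checks boundedness and monotonicity for a discrete FM stored as a dictionary. The dictionary representation, the function names, and the uniform three-source measure are assumptions made for this example only; they are not part of the paper.

```python
from itertools import combinations

def powerset(sources):
    """Yield every subset of `sources` as a frozenset."""
    items = list(sources)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_fuzzy_measure(g, sources, tol=1e-12):
    """Check (P1) boundedness and (P2) monotonicity for a discrete FM.

    `g` maps frozensets of sources to their worth (hypothetical layout).
    """
    universe = frozenset(sources)
    if abs(g[frozenset()]) > tol or abs(g[universe] - 1.0) > tol:
        return False  # (P1) violated
    subsets = list(powerset(sources))
    for a in subsets:
        for b in subsets:
            if a <= b and g[a] > g[b] + tol:
                return False  # (P2) violated
    return True

# Hypothetical three-source measure: uniform and additive, g(A) = |A| / 3.
sources = ["x1", "x2", "x3"]
g_uniform = {s: len(s) / 3 for s in powerset(sources)}
print(is_fuzzy_measure(g_uniform, sources))  # True
```

The check enumerates all pairs of subsets, so it is only practical for a small number of sources.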

In practice, FMs are defined in various ways: they may be specified by experts, or derived by algorithms or optimization based on existing data in conjunction with an aggregation function such as the FI; for more details, please see [11, 22]. This paper focuses only on algorithmically derived FMs leveraging the evidence data arising from multiple sources. Section 3 reviews such FMs that are derived from the concept of source agreement.

2.2 Fuzzy Integrals

FIs have been efficiently used as powerful non-linear aggregation operators in evidence fusion [3, 9]. They aggregate multi-source data (evidence) by combining it with the worth information of all subsets of sources (captured by an FM). Two well-known FIs are the Sugeno FI (SFI) [19] and the Choquet FI (CFI) [5]. In practice, discrete SFI and CFI are commonly used [17] and in this paper, we focus on the discrete CFI as it is most popular for evidence aggregation.

Let $h:X\rightarrow [0,\infty )$ be a real-valued function that presents the evidence from a source. The discrete CFI is defined as

$$\begin{aligned} \int _{CFI} h \circ g = CFI_g(h)=\sum _{i=1}^n h(x_{\pi (i)})[g(A_i )- g(A_{i-1})], \end{aligned}$$

(1)

where $\pi $ is a permutation of X arranged like $h(x_{\pi (1)})\ge h(x_{\pi (2)})\ge $ ... $\ge h(x_{\pi (n)})$. $A_i=\{x_{\pi (1)}, x_{\pi (2)},..., x_{\pi (i)}\}$ is a subset of sources. g is the FM where $g(A_i)$ is the worth of the subset $A_i$ with $g(A_0)=0$.
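The sum in (1) can be sketched in a few lines of Python. This is a minimal illustration that assumes the FM is stored as a dictionary keyed by frozensets of source labels; the two-source measure and evidence values below are hypothetical.

```python
def choquet_integral(h, g):
    """Discrete Choquet integral of numeric evidence `h` w.r.t. the FM `g`.

    h: dict mapping source -> non-negative number.
    g: dict mapping frozenset of sources -> worth in [0, 1].
    """
    # Sort sources so that h(x_pi(1)) >= h(x_pi(2)) >= ... >= h(x_pi(n)).
    order = sorted(h, key=h.get, reverse=True)
    total = 0.0
    prev_worth = 0.0  # g(A_0) = 0
    members = []
    for source in order:
        members.append(source)
        worth = g[frozenset(members)]  # g(A_i)
        total += h[source] * (worth - prev_worth)
        prev_worth = worth
    return total

# Hypothetical two-source example.
g = {frozenset({"a"}): 0.3, frozenset({"b"}): 0.6, frozenset({"a", "b"}): 1.0}
h = {"a": 4.0, "b": 2.0}
print(choquet_integral(h, g))  # 4*0.3 + 2*(1 - 0.3), approximately 2.6
```

Only the worths along the chain $A_1 \subset A_2 \subset ... \subset A_n$ induced by the sorted evidence are looked up, so a single aggregation touches n of the $2^n$ lattice entries.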

In most cases, the multi-source data h is provided in a numeric form. However, in some applications h is better represented by interval-valued or fuzzy set-valued data. Considering this, FIs have been generalized for non-numeric evidence [1, 10, 16]. Let $\overline{h}:X\rightarrow I(\mathbb {R})$ be a set of interval-valued data where $I(\mathbb {R})$ is the set of all closed intervals over the real numbers and $\overline{h}_i=\overline{h}(x_i)=[h_i^-,h_i^+]$ be the ith interval (where $h_i^-$ and $h_i^+$ are the left and right endpoints respectively). Following the notation in [12], the CFI on $\overline{h}$ is defined as

$$\begin{aligned} \int _{CFI} \overline{h} \circ g=CFI_g(\overline{h})=[CFI_g(h^- ),CFI_g(h^+)], \end{aligned}$$

(2)

where the output $CFI_g(\overline{h})$ is itself interval-valued [7]. In other words, the CFI for interval-valued data is computed by applying the CFI for the numeric case of the left and right interval endpoints separately. Please see [11, 12, 21] for more detail about the interval aggregation using the FM and the CFI.

2.3 Subsethood

The subsethood between two sets a and b is a relation, indicating the degree to which a is a subset of b [18]. It is defined as

$$\begin{aligned} S_h\left( a,b\right) =\frac{\left| a\cap b\right| }{\left| a\right| }, \end{aligned}$$

(3)

where $\left| a\cap b\right| $ is the cardinality of the intersection of a and b, and $\left| a\right| $ is the cardinality of a. It is always bounded on the interval [0, 1], where 1 means that a is a subset of b ($a\subseteq b$) and 0 means that a and b are disjoint ($a\cap b=\emptyset $).

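Similarly, the degree of subsethood of two intervals $\overline{a}$ and $\overline{b}$ can be defined as

$$\begin{aligned} S_h\left( \overline{a},\overline{b}\right) =\frac{\left| \overline{a}\cap \overline{b}\right| }{\left| \overline{a}\right| }, \end{aligned}$$

(4)

where $\left| \overline{a}\cap \overline{b}\right| $ is the size of the intersection between $\overline{a}$ and $\overline{b}$ and $\left| \overline{a}\right| \ne 0$.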
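For closed intervals, the size of the intersection is simply the length of the overlap, so the interval subsethood can be sketched as follows. The tuple representation `(lo, hi)` and the function names are assumptions of this example.

```python
def overlap_length(a, b):
    """Length of the intersection of closed intervals a=(lo, hi) and b=(lo, hi)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def subsethood(a, b):
    """Degree to which interval a is a subset of interval b: |a ∩ b| / |a|."""
    width = a[1] - a[0]
    if width <= 0:
        raise ValueError("subsethood is undefined for a zero-width interval")
    return overlap_length(a, b) / width

a, b = (0.0, 4.0), (1.0, 3.0)
print(subsethood(a, b))  # 0.5: half of a lies inside b
print(subsethood(b, a))  # 1.0: b is entirely contained in a
```

The two calls return different values for the same pair, which shows that subsethood is not symmetric in its arguments.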

2.4 Bidirectional Subsethood Based Similarity Measure

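A new SM was introduced in [14, 15] which uses the reciprocal subsethoods of intervals to capture their similarity. For two intervals $\overline{a}$ and $\overline{b}$, this measure is

$$\begin{aligned} S_{S_h}\left( \overline{a},\overline{b}\right) =\bigstar \left( S_h\left( \overline{a},\overline{b}\right) ,S_h\left( \overline{b},\overline{a}\right) \right) , \end{aligned}$$

(5)

where $\bigstar $ is a t-norm. We can rewrite (5) using the definition of $S_h$ in (4) as

$$\begin{aligned} S_{S_h}\left( \overline{a},\overline{b}\right) =\bigstar \left( \frac{\left| \overline{a}\cap \overline{b}\right| }{\left| \overline{a}\right| },\frac{\left| \overline{b}\cap \overline{a}\right| }{\left| \overline{b}\right| }\right) . \end{aligned}$$

(6)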
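A small Python sketch can make the contrast with the Jaccard SM concrete. It assumes the minimum t-norm and computes the pairwise measure as the t-norm of the two reciprocal subsethoods, as described above. The intervals are hypothetical and chosen so that Jaccard returns the same value for two different configurations; this is a single illustrative case, not a general proof.

```python
def overlap_length(a, b):
    """Length of the intersection of two closed intervals (lo, hi)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def width(a):
    return a[1] - a[0]

def jaccard(a, b):
    """Jaccard similarity of two intervals: |a ∩ b| / |a ∪ b|."""
    inter = overlap_length(a, b)
    return inter / (width(a) + width(b) - inter)

def bidirectional_similarity(a, b, t_norm=min):
    """t-norm of the two reciprocal subsethoods of intervals a and b."""
    inter = overlap_length(a, b)
    return t_norm(inter / width(a), inter / width(b))

a = (0.0, 6.0)
b_inside = (0.0, 3.0)   # contained in a
b_shifted = (2.0, 8.0)  # same width as a, partially overlapping

print(jaccard(a, b_inside), jaccard(a, b_shifted))  # 0.5 0.5
print(bidirectional_similarity(a, b_inside))        # 0.5
print(bidirectional_similarity(a, b_shifted))       # about 0.667
```

Here Jaccard cannot tell the contained interval from the shifted one, whereas the bidirectional measure assigns them different similarities.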

3 Existing Agreement Fuzzy Measures

Here, we briefly recapture the AG [21], GenA [12], and AA [11] FMs with respect to a set of intervals $\overline{h}=\{\overline{h}_1,\overline{h}_2,...,\overline{h}_n\}$ arising from n individual sources.

3.1 Fuzzy Measure of Agreement

Wagner and Anderson [21] proposed the AG FM by extracting it from the interval-valued data with no prior knowledge about sources. The AG FM is defined as

where $\overline{A}_i=\{\overline{h}_{\pi (1)},\overline{h}_{\pi (2)}...,\overline{h}_{\pi (i)}\}$ is the permuted set of intervals with $\overline{A}_0=\emptyset $, $z_i=\frac{i}{n}$ and |.| refers to the cardinality/size of the interval. Here, $\overline{U}_K(\overline{A}_i)$ unites the intersections of the K-tuples in $\overline{A}_i \subseteq \overline{h}$ as defined in (8) [11, 12].

$$\begin{aligned}&\overline{U}_K(\overline{A}_i)\quad = \bigcup _{k_1=1}^{i-K+1}\bigcup _{k_2=k_1+1}^{i-K+2}...\bigcup _{k_K=k_{K-1}+1}^{i}(\overline{h}_{\pi (k_1)} \cap \overline{h}_{\pi (k_2)}\cap ...\cap \overline{h}_{\pi (k_K)}) \end{aligned}$$

(8)

Further, the $\tilde{g}^{AG}(\overline{A}_i)$ is normalized by $\tilde{g}^{AG}(\overline{h})$ to satisfy the property of the FM, i.e., $g^{AG}(\overline{A}_i) = \frac{\tilde{g}^{AG}(\overline{A}_i)}{\tilde{g}^{AG}(\overline{h})}.$
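The quantity $|\overline{U}_K(\overline{A}_i)|$ from (8) can be sketched as the total length covered by the intersections of all K-tuples. The code below covers only this building block, not the full AG FM; the helper names and the three example intervals are hypothetical.

```python
from itertools import combinations

def intersect(intervals):
    """Intersection of closed intervals, or None if it is empty."""
    lo = max(iv[0] for iv in intervals)
    hi = min(iv[1] for iv in intervals)
    return (lo, hi) if lo <= hi else None

def union_length(intervals):
    """Total length covered by a collection of intervals (overlaps counted once)."""
    total, current_hi = 0.0, None
    for lo, hi in sorted(intervals):
        if current_hi is None or lo > current_hi:
            total += hi - lo       # a new, separate piece starts
            current_hi = hi
        elif hi > current_hi:
            total += hi - current_hi  # extend the current piece
            current_hi = hi
    return total

def u_k_length(intervals, k):
    """Size of the union of the intersections of all K-tuples, as in (8)."""
    pieces = []
    for combo in combinations(intervals, k):
        piece = intersect(combo)
        if piece is not None:
            pieces.append(piece)
    return union_length(pieces)

example = [(0.0, 4.0), (2.0, 6.0), (3.0, 8.0)]
for k in (1, 2, 3):
    print(k, u_k_length(example, k))  # 1 8.0, then 2 4.0, then 3 1.0
```

For K = 1 the result is the length of the union of all intervals, and for K = i it is the length of their common intersection.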

3.2 Additive Measure of Agreement

Havens et al. [11] proposed the AA FM in order to alleviate the asymmetry issue of agreement FMs. This FM utilizes the SMs for determining the source agreement. The AA FM is expressed in (9).

$$\begin{aligned} \tilde{g}^{AA}(\overline{A}_i) = \tilde{g}^{AA}(\overline{A}_{i-1}) + \sum _{\begin{array}{c} j=1 \\ j\ne i \end{array}} ^{n}S^p(\overline{h}_j,\overline{h}_{\pi (i)}), i=[n], p \ge 0 \end{aligned}$$

(9)

where p is a tuning parameter and S is the SM. Further, $\tilde{g}^{AA}(\overline{A}_i)$ is normalized by $\tilde{g}^{AA}(\overline{A}_n)$ as $g^{AA}(\overline{A}_i) = \frac{\tilde{g}^{AA}(\overline{A}_i)}{\tilde{g}^{AA}(\overline{A}_n)}.$
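The recursion in (9) can be sketched as a running sum along the permuted chain of sources. The sketch below assumes the Jaccard SM for intervals and reads the inner sum as running over all sources other than $\overline{h}_{\pi (i)}$; both choices, as well as the function names and example data, are assumptions of this illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two closed intervals (lo, hi)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union

def additive_agreement_measure(intervals, order, p=1.0, sim=jaccard):
    """Normalized AA worths g(A_1), ..., g(A_n) along the chain given by `order`.

    intervals: list of (lo, hi); order: source indices pi(1), ..., pi(n).
    """
    n = len(intervals)
    raw, running = [], 0.0
    for idx in order:
        # Agreement of source pi(i) with every other source, raised to p.
        running += sum(sim(intervals[j], intervals[idx]) ** p
                       for j in range(n) if j != idx)
        raw.append(running)
    if raw[-1] == 0.0:
        raise ValueError("no overlap between any sources: normalization undefined")
    return [value / raw[-1] for value in raw]

example = [(0.0, 4.0), (2.0, 6.0), (3.0, 8.0)]
print(additive_agreement_measure(example, order=[0, 1, 2]))
```

The explicit error for the all-disjoint case reflects that the normalizing constant is zero there, so this sketch cannot produce a measure when no sources overlap.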

3.3 Fuzzy Measure of Generalized Accord

Havens et al. [12] proposed the GenA FM leveraging a generic SM to estimate the agreement (accord) of subsets of sources. The GenA FM is defined as

where $\overline{A}_i=\{\overline{h}_{\pi (1)},\overline{h}_{\pi (2)}....,\overline{h}_{\pi (i)}\}$ is the permuted set of intervals with $\overline{A}_0=\emptyset $, and $S_K(\overline{A}_i)$ is defined in (11).

$$\begin{aligned} S_K(\overline{A}_i)= \left( {\begin{array}{c}n\\ K\end{array}}\right) ^{-1}\sum _{k_1=1}^{i-K+1}\sum _{k_2=k_1+1}^{i-K+2}... \sum _{k_K=k_{K-1}+1}^{i} S(\{\overline{h}_{\pi (k_1)}, \overline{h}_{\pi (k_2)},...,\overline{h}_{\pi (k_K)}\}) \end{aligned}$$

(11)

Here, $\left( {\begin{array}{c}n\\ K\end{array}}\right) $ is the number of possible K-tuples in $\overline{h}$ and S is the SM. The quantity $S_K(\overline{A}_i)$ is the sum of similarities of the K-tuples in $\overline{A}_i \subseteq \overline{h}$, weighted by $\left( {\begin{array}{c}n\\ K\end{array}}\right) ^{-1}$. Further, the constant $\alpha _{\overline{h}}$ is defined in (12) so that $g^{GenA}(\overline{h})=1$.

$$\begin{aligned} \alpha _{\overline{h}} = \left( \sum _{K=2}^n S_K(\bar{A}_n)\right) ^{-1} \end{aligned}$$

(12)

In [11, 12], the GenA and AA FMs are explored with respect to popular SMs (within (11) and (9)). As detailed in [14, 15], however, the Jaccard and Dice SMs are liable to aliasing, which causes the GenA and AA FMs to generate the same worth for very different subsets of sources and in turn affects the quality of the overall aggregation. To avoid this, in the next section we leverage the recently introduced bidirectional subsethood based SM (which minimizes aliasing) to design a new instance of the GenA FM.

4 A New Instance of the Agreement Fuzzy Measure Based on Bidirectional Subsethood

Here, we develop a new instance of agreement FM following the concept of the GenA FM and exploit the new bidirectional subsethood based SM for computing the source agreement. As the new SM minimizes aliasing, it helps the proposed FM avoid generating the same agreement and worth for different subsets of sources. This section first defines the subsethood for a set of intervals. Then, the new SM at (5) is revisited to enable it to compute similarity for a set of intervals. Finally, the new instance of agreement FM involving the new SM is introduced.

4.1 Defining Subsethood for a Set of Intervals

The subsethood of an interval $\overline{h}_r$ with respect to a set of intervals $\overline{A}_{i}\subseteq \overline{h}$ is defined as the mean of its subsethood in each interval $\overline{h}_t$ in $\overline{A}_{i}$. It is expressed as

$$\begin{aligned} S_h(\overline{h}_r,\overline{A}_{i}) = \frac{1}{|\overline{A}_{i}|}\sum \limits _{\overline{h}_t\in \overline{A}_{i}}S_h(\overline{h}_r,\overline{h}_t)= \frac{1}{|\overline{A}_{i}|}\sum \limits _{\overline{h}_t\in \overline{A}_{i}}\frac{|\overline{h}_r\cap \overline{h}_t|}{|\overline{h}_r|}, \end{aligned}$$

(13)

where $S_h(\overline{h}_r,\overline{A}_{i})\in [0,1]$, with $S_h(\overline{h}_r,\overline{A}_{i})=1$ when $\overline{h}_r \subseteq \overline{h}_t$ for all $\overline{h}_t\in \overline{A}_{i}$, and $S_h(\overline{h}_r,\overline{A}_{i})=0$ when $|\overline{h}_r \cap \overline{h}_t|=0$ for all $\overline{h}_t\in \overline{A}_{i}$.

4.2 Defining Bidirectional Subsethood Based Similarity Measure for a Set of Intervals

The bidirectional subsethood based SM $S_{S_h}$ for the set of intervals $\overline{h}$ is the t-norm ($\bigstar $) of the reciprocal subsethoods of its members, i.e.,

$$\begin{aligned} \begin{aligned} S_{S_h}\left( \overline{h}\right)&= \bigstar \left( S_h(\overline{h}_1,\{\overline{h}_2,...,\overline{h}_n\}),...,S_h(\overline{h}_n,\{\overline{h}_1,...,\overline{h}_{n-1}\})\right) \\&=\bigstar \left( S_h(\overline{h}_1,\overline{h}\backslash \overline{h}_1),...,S_h(\overline{h}_n,\overline{h}\backslash \overline{h}_n)\right) \end{aligned} \end{aligned}$$

(14)

where $\overline{h}\backslash \overline{h}_i$ is the nonempty subset of intervals excluding $\overline{h}_i$, $i\in \{1,...,n\}$. In this paper, we use the minimum t-norm ($\bigstar $) as it is the most common in practice.
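Equations (13) and (14) can be sketched together: each interval's mean subsethood in the remaining intervals is computed first, and the minimum t-norm then combines these degrees. The function names and the three example intervals are hypothetical.

```python
def overlap_length(a, b):
    """Length of the intersection of two closed intervals (lo, hi)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def subsethood_to_set(h_r, others):
    """Eq. (13): mean subsethood of interval h_r in each interval of `others`."""
    width = h_r[1] - h_r[0]
    return sum(overlap_length(h_r, h_t) / width for h_t in others) / len(others)

def set_similarity(intervals, t_norm=min):
    """Eq. (14): t-norm of each interval's subsethood in the remaining ones."""
    degrees = []
    for i, h_i in enumerate(intervals):
        rest = intervals[:i] + intervals[i + 1:]
        degrees.append(subsethood_to_set(h_i, rest))
    return t_norm(degrees)

example = [(0.0, 4.0), (2.0, 6.0), (3.0, 8.0)]
print(set_similarity(example))  # 0.375, the minimum of 0.375, 0.625 and 0.4
```

With the minimum t-norm, the similarity of the whole set is driven by the interval that is least contained in the others.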

4.3 Bidirectional Subsethood Based Agreement Fuzzy Measure

Consider again the set of n intervals, $\overline{h}$. For any nonempty subset $\overline{A}_i\subseteq \overline{h}$, $1\le i\le n$, the new FM $\tilde{g}^{AS_h}$ using the new SM (14) is defined as follows (it is later normalized to a proper FM, $g^{AS_h}$):

$$\begin{aligned} \tilde{g}^{AS_h}(\overline{A}_0)&= 0,\end{aligned}$$

(15a)

$$\begin{aligned} \tilde{g}^{AS_h}(\overline{A}_1)&= \left( {\begin{array}{c}n\\ 1\end{array}}\right) ^{-1}\times \sum _{k_1=1}^{1}S_{S_h}\left( \overline{h}_{k_1},\overline{h}_{k_1}\right) =\frac{1}{n},\end{aligned}$$

(15b)

$$\begin{aligned} \tilde{g}^{AS_h}(\overline{A}_i)&=i \times \tilde{g}^{AS_h}(\overline{A}_1)+\left( {\begin{array}{c}n\\ 2\end{array}}\right) ^{-1}\sum _{k_1=1}^{i-1}\sum _{k_2=k_1+1}^{i}S_{S_h}\left( \overline{h}_{k_1},\overline{h}_{k_2}\right) +...\\ {}&+\left( {\begin{array}{c}n\\ i\end{array}}\right) ^{-1}S_{S_h}\left( \overline{h}_1,...,\overline{h}_{i}\right) ,\nonumber \end{aligned}$$

(15c)

where $\overline{A}_0=\emptyset $, $\overline{A}_1$ is a singleton subset, and $\overline{A}_i$ is a non-singleton subset with i sources, $1<i\le n$. $\left( {\begin{array}{c}n\\ K\end{array}}\right) $ is the total number of K-tuples in the set $\overline{h}$, where $1\le K\le n$. (15a) is the worth of $\overline{A}_0$, which is always 0. (15b) is the worth of $\overline{A}_1$, which is the similarity of an interval with itself (equal to 1), weighted by $\left( {\begin{array}{c}n\\ 1\end{array}}\right) ^{-1}$. (15c) is the worth of $\overline{A}_i$, which is the sum of the similarities of all K-tuples in $\overline{A}_i$, $1\le K\le i$, weighted by $\left( {\begin{array}{c}n\\ K\end{array}}\right) ^{-1}$.

Remark 1

(15b) captures the worth of singleton subsets ($\overline{A}_1$) which is, $\tilde{g}^{AS_h}(\overline{A}_1)$ = $\frac{1}{n}$, where $n=|\overline{h}|$. For a non-singleton subset consisting of all disagreeing sources, the inclusion of the worth of the singleton subsets in (15c) enables it to generate the worth information for this set.

Following [11, 12], (15c) is rewritten as follows,

$$\begin{aligned} \tilde{g}^{AS_h}(\overline{A}_i) = \frac{i}{n}+\sum _{K=2}^i\left[ \left( {\begin{array}{c}n\\ K\end{array}}\right) ^{-1} Z_K(\overline{A}_i)\right] , \text { }i\ge 1, \end{aligned}$$

(16)

where the first part of (16) is the sum of the worths of all singletons in $\overline{A}_i$. The second part is the sum of the similarities of all K-tuples in $\overline{A}_i$ ($K\ge 2$), weighted by $\left( {\begin{array}{c}n\\ K\end{array}}\right) ^{-1}$. $Z_K(\overline{A}_i)$ captures the cumulative similarity of all K-tuples in $\overline{A}_i$ ($K\ge 2$) using (14) and is defined in (17).

$$\begin{aligned} Z_K\left( \overline{A}_i\right) =\sum \nolimits _{{k_1=1}}^{{i-K+1}}\sum \nolimits _{{\begin{array}{c} k_2=\atop k_1+1 \end{array}}}^{{i-K+2}}...\sum \nolimits _{{\begin{array}{c} k_K=\atop k_{K-1}+1 \end{array}}}^{{i}} \bigstar \left( S_h(\overline{h}_{k_1 },\overline{A}_i\backslash \overline{h}_{k_1}),..., S_h(\overline{h}_{k_K},\overline{A}_i\backslash \overline{h}_{k_K})\right) \end{aligned}$$

(17)

Finally, $\tilde{g}^{AS_h}(\overline{A}_i)$ is normalized by $\tilde{g}^{AS_h}(\overline{h})$ in (18) so that $g^{AS_h}(\overline{A}_i)\le 1$ and $g^{AS_h}(\overline{h})=1$, which maintains the bounded property of the FM.

$$\begin{aligned} g^{AS_h}(\overline{A}_i) = \frac{\tilde{g}^{AS_h}(\overline{A}_i)}{\tilde{g}^{AS_h}(\overline{h})},\text { } 1\le i\le n. \end{aligned}$$

(18)
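The construction in (15a)-(15c) and the normalization in (18) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the minimum t-norm and evaluates the similarity of each K-tuple by applying (14) to that tuple, following the form of (15c). The function names and the disjoint example intervals are hypothetical.

```python
from itertools import combinations
from math import comb

def overlap_length(a, b):
    """Length of the intersection of two closed intervals (lo, hi)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def set_similarity(intervals):
    """Eq. (14) with the minimum t-norm, applied to one tuple of intervals."""
    degrees = []
    for i, h_i in enumerate(intervals):
        rest = intervals[:i] + intervals[i + 1:]
        width = h_i[1] - h_i[0]
        shared = sum(overlap_length(h_i, h_t) for h_t in rest)
        degrees.append(shared / (width * len(rest)))
    return min(degrees)

def raw_worth(subset, n):
    """Unnormalized worth: i/n plus the weighted similarities of all K-tuples, K >= 2."""
    i = len(subset)
    worth = i / n
    for k in range(2, i + 1):
        z_k = sum(set_similarity(combo) for combo in combinations(subset, k))
        worth += z_k / comb(n, k)
    return worth

def agreement_measure(intervals):
    """Normalized worth g(A) for every subset A of source indices, as in (18)."""
    n = len(intervals)
    total = raw_worth(tuple(intervals), n)
    g = {frozenset(): 0.0}
    for r in range(1, n + 1):
        for idx in combinations(range(n), r):
            subset = tuple(intervals[j] for j in idx)
            g[frozenset(idx)] = raw_worth(subset, n) / total
    return g

# Four mutually disjoint (hypothetical) intervals: no agreement at all.
disjoint = [(0.0, 1.0), (2.0, 3.0), (4.0, 5.0), (6.0, 7.0)]
g = agreement_measure(disjoint)
print(g[frozenset({0})], g[frozenset({0, 1})], g[frozenset({0, 1, 2})])  # 0.25 0.5 0.75
```

When no intervals overlap, every similarity term vanishes and the worth of a subset reduces to its cardinality divided by n, which is the additive measure under which the CFI acts as an arithmetic mean. As a further hedged check, if the three intervals of Example 2 are taken to be [3, 10], [0, 3] and [1, 6] (inferred from the endpoints quoted there, since Fig. 3 is not reproduced here), this sketch yields worths of about 0.22 for the first singleton and about 0.54 for the first and third sources together.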

Example 1 below demonstrates that, unlike the $g^{GenA}$ and $g^{AA}$ FMs, the new agreement FM $g^{AS_h}$ avoids generating the same agreement and worth for different sets of sources. In addition, Example 2 presents the interval aggregation using the $g^{AS_h}$ FM and the CFI.

Example 1: Consider two interval-valued datasets, $\overline{h}$ and $\overline{r}$, as shown in Fig. 3. Their corresponding FM lattices using the $g^{AS_h}$, $g^{AG}$, $g^{GenA}$, and $g^{AA}$ FMs are also shown in Fig. 3 (we skip showing the FM values for $\emptyset $ and $\overline{h}$). Due to aliasing of the Jaccard SM, both the $g^{GenA}$ and $g^{AA}$ FMs generate the same FM lattices for these sets, whereas the $g^{AS_h}$ and $g^{AG}$ FMs generate distinct FM lattices.

Example 2: Consider the interval-valued dataset $\overline{r}$ in Fig. 3(b) and its corresponding $g^{AS_h}$ FM lattice in Fig. 3(d). Using (1), the aggregation of the left interval endpoints is $CFI_g(h^-)=3\times [g^{AS_h}(\{1\})-g^{AS_h}(\emptyset )]+1\times [g^{AS_h}(\{1,3\})-g^{AS_h}(\{1\})]+0\times [g^{AS_h}(\{1,2,3\})-g^{AS_h}(\{1,3\})]=3\times [0.22-0]+1\times [0.54-0.22]+0\times [1-0.54]=0.98$. Similarly, the aggregation of the right interval endpoints is $CFI_g(h^+)=10\times [0.22-0]+6\times [0.54-0.22]+3\times [1-0.54]=5.5$. Finally, using (2), the interval aggregation is $CFI_g(\overline{h})=[CFI_g(h^-),CFI_g(h^+)]=[0.98,5.5]$.
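The arithmetic of Example 2 can be reproduced with a short sketch of the interval CFI in (2). The lattice values 0.22, 0.54 and 1 are those quoted in the example; the three intervals are inferred from the endpoints used there and should be treated as an assumption, since Fig. 3 is not reproduced here. Function names are hypothetical.

```python
def choquet(values, g):
    """Numeric discrete CFI of `values` (source -> number) w.r.t. the FM `g`."""
    order = sorted(values, key=values.get, reverse=True)
    total, prev, members = 0.0, 0.0, []
    for source in order:
        members.append(source)
        worth = g[frozenset(members)]
        total += values[source] * (worth - prev)
        prev = worth
    return total

def interval_choquet(intervals, g):
    """Eq. (2): apply the numeric CFI to left and right endpoints separately."""
    left = {s: iv[0] for s, iv in intervals.items()}
    right = {s: iv[1] for s, iv in intervals.items()}
    return choquet(left, g), choquet(right, g)

# Worths quoted in Example 2 along the chain {1}, {1,3}, {1,2,3}.
g = {frozenset({1}): 0.22, frozenset({1, 3}): 0.54, frozenset({1, 2, 3}): 1.0}
# Intervals inferred from the endpoints in Example 2 (assumption).
r = {1: (3.0, 10.0), 2: (0.0, 3.0), 3: (1.0, 6.0)}

lo, hi = interval_choquet(r, g)
print(round(lo, 2), round(hi, 2))  # 0.98 5.5
```

Both endpoint aggregations walk the same chain of subsets here because the left and right endpoints happen to induce the same ordering of sources; in general the two orderings can differ, and each would then use its own chain of lattice entries.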

5 Demonstration

This section demonstrates the behaviour of the new FM against the AG, GenA, and AA FMs for two synthetic datasets and a real-world example. For convenience, the new instance of agreement FM is denoted as $AS_h$ and the CFI is used throughout. Further, the Jaccard SM is used for the GenA and AA FMs, and AVG represents the arithmetic mean of the left and right endpoints of the intervals respectively. In all experiments, we follow the assumption that no worth information of sources is available (e.g. as in crowdsourcing). If there was such information, it could be captured and a meta-measure could be created (see [21]).

5.1 Demonstration with Synthetic Dataset-1

Figure 4 shows four examples of synthetic datasets together with aggregated results based on the CFI using the $AS_h$, AG, GenA, and AA FMs.

(1) The interval-valued set-I shown in Fig. 4(a) consists of three smaller intervals $\overline{h}_4$, $\overline{h}_5$, and $\overline{h}_6$ that agree completely and three larger intervals $\overline{h}_1$, $\overline{h}_2$ and $\overline{h}_3$ agreeing to a certain degree. The aggregation results (Fig. 4(a)) show that the AG FM gives importance only to the subset of larger intervals, whereas the GenA and AA FMs are influenced by the subset of smaller intervals as they agree totally. However, the $AS_h$ FM not only gives more importance to the subset of smaller intervals having a complete agreement, but also considers other subsets, $\{\overline{h}_1,\overline{h}_3\}$ and $\{\overline{h}_2,\overline{h}_3\}$ with agreement to a certain degree.

(2) For the interval-valued set-II shown in Fig. 4(b), there are three intervals $\overline{h}_1$, $\overline{h}_2$ and $\overline{h}_3$ having higher agreement than three other intervals $\overline{h}_4$, $\overline{h}_5$ and $\overline{h}_6$. Here, the AG FM is greatly influenced by the subset $\{\overline{h}_1,\overline{h}_2,\overline{h}_3\}$, whereas the GenA, AA, and $AS_h$ FMs show more balanced aggregation by considering the two subsets ($\{\overline{h}_1,\overline{h}_2,\overline{h}_3\}$ and $\{\overline{h}_4,\overline{h}_5,\overline{h}_6\}$) when used with the CFI.

(3) The interval-valued set-III shown in Fig. 4(c) includes three intervals that agree with each other completely and three others that wholly disagree. Here, the AG, GenA and AA FMs are completely influenced by the subset of agreeing intervals, i.e., $\{\overline{h}_4,\overline{h}_5,\overline{h}_6\}$. Like the other FMs, the $AS_h$ FM shows the influence of the subset $\{\overline{h}_4,\overline{h}_5,\overline{h}_6\}$; at the same time, it also considers the disagreeing singletons $\overline{h}_1$, $\overline{h}_2$ and $\overline{h}_3$.

(4) The interval-valued set-IV shown in Fig. 4(d) consists of five intervals that are all completely non-overlapping. In this situation, the AG, GenA, and AA FMs are not designed to generate the worth information for the subsets of sources and hence do not provide an aggregation when combined with the CFI. In contrast, the $AS_h$ FM, by its construction, assigns worth to all singletons, which is later normalized by $\tilde{g}^{AS_h}(\overline{h})$. Even though there is no agreement amongst the sources regarding their intervals, the $AS_h$ FM can still estimate the worth of other subsets by utilizing the worth of the singletons. Table 2 shows the normalized worth of all subsets of intervals for set-IV (Fig. 4(d)) using the $AS_h$ FM, where all intervals are in complete disagreement. Intuitively, when there is no overlap between intervals and all intervals are unique, all sources should be treated with equal worth and the aggregation should be equal to the average. In Fig. 4(d), only the $AS_h$ FM with the CFI generates the aggregation results accordingly (i.e., performs like an average operator).

Table 2. The normalized worth of subsets of intervals using the $AS_h$ FM ($g^{AS_h}$)


5.2 Demonstration with Synthetic Dataset-2

Here, we investigate how the FMs in combination with the CFI behave in producing the aggregation result when the overlap between intervals is gradually decreased. Five different sets of two intervals $\overline{h}_1$ and $\overline{h}_2$ are considered in Fig. 5(a) with $100\%$, $75\%$, $50\%$, $25\%$, and $0\%$ overlap respectively. Note that $\overline{h}_1$ is set to [0, 1] in all five sets, while $\overline{h}_2$ is altered depending on the $\%$ of overlap. Figure 5(b) shows that all FMs (used with the CFI) aggregate the intervals equally (i.e., to [0, 1]) when $100\%$ overlap exists. However, despite the decreasing overlap, the AG and GenA FMs continue to show the same aggregation (i.e., [0, 1]), whereas the AA and $AS_h$ FMs follow the overlap degradation and aggregate the intervals accordingly. Finally, when the intervals are in complete disagreement (i.e., $0\%$ overlap), the $AS_h$ FM with the CFI performs like an average operator, whereas the other FMs do not support aggregation.

5.3 A Real-World Example

This experiment uses the outcome of different ageing methods (Pubic Symphysis (PS), Auricular Surface (AS), Ectocranial Suture-Vault (ESV), and Ectocranial Suture-Lateral Anterior (ESLA)) to estimate the age-at-death of an individual skeleton [3], which is useful for forensic and biological anthropologists. Each method provides an estimated age range for the individual skeleton. Since the worth information of the ageing methods is unknown, our aim here is to fuse their estimated age ranges directly to get a combined view of the skeletal age-at-death. In this aggregation experiment, the more intuitive aggregation outcome is likely to be a narrow age range capturing the actual age-at-death. Figure 6 presents the estimated age range of each ageing method for three individual skeletons together with their true chronological age-at-death. Figure 6 also shows the aggregation results for all agreement FMs when used with the CFI. The results reveal that the $g^{AS_h}$ FM specifies the age range more narrowly (while also capturing the true chronological age-at-death) compared to the other agreement-based FMs. While this is only one example and not an extensive study, it demonstrates the potential robustness of the aggregation outcome of the proposed agreement FM.

6 Conclusions

As the agreement calculation of agreement FMs is affected by the limitations of popular SMs, this paper has developed a new instance of an evidence-driven agreement FM for interval-valued datasets, building on the structure of the GenA FM and leveraging a recently introduced SM [14, 15] to better capture the inter-source agreement and estimate worth. Further, the proposed FM is designed to deal with cases where no agreement exists amongst the evidence arising from sources. Here, in combination with the CFI, it gracefully degrades to an average operator, whereas existing agreement FMs are not designed to deal with such instances. The behaviour of this FM has been compared with existing agreement FMs by aggregating both synthetic and real interval-valued data in combination with the CFI, showing that it provides robust and qualitatively superior outcomes in agreement-based data aggregation for the cases considered. In future, we will experiment with this new instance of the agreement FM in combination with the FI for aggregating fuzzy set-valued data. In addition, we will extend this FM to address the asymmetry issue noted in [11].

References

1. Anderson, D.T., Havens, T.C., Wagner, C., Keller, J.M., Anderson, M.F., Wescott, D.J.: Sugeno fuzzy integral generalizations for sub-normal fuzzy set-valued inputs. In: Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 1–8 (2012)
2. Anderson, D.T., Keller, J.M., Havens, T.C.: Learning fuzzy-valued fuzzy measures for the fuzzy-valued Sugeno fuzzy integral. In: Hüllermeier, E., Kruse, R., Hoffmann, F. (eds.) IPMU 2010. LNCS (LNAI), vol. 6178, pp. 502–511. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14049-5_52
3. Anderson, M.F., Anderson, D.T., Wescott, D.J.: Estimation of adult skeletal age-at-death using the Sugeno fuzzy integral. Am. J. Phys. Anthropol. 142(1), 30–41 (2010)
4. Beliakov, G.: Construction of aggregation functions from data using linear programming. Fuzzy Sets Syst. 160(1), 65–75 (2009)
5. Choquet, G.: Theory of capacities. Ann. de l’Institut Fourier 5, 131–295 (1954)
6. Dice, L.R.: Measures of the amount of ecologic association between species. Ecology 26(3), 297–302 (1945)
7. Dubois, D., Prade, H.: Fuzzy Sets and Systems: Theory and Applications. Academic Press, New York (1980)
8. Feng, Y., Chen, Y., Wang, M.: Multi-sensor data fusion based on fuzzy integral in AR system. In: Pan, Z., Cheok, A., Haller, M., Lau, R.W.H., Saito, H., Liang, R. (eds.) ICAT 2006. LNCS, vol. 4282, pp. 155–162. Springer, Heidelberg (2006). https://doi.org/10.1007/11941354_17
9. Grabisch, M., Murofushi, T., Sugeno, M.: Fuzzy Measures and Integrals: Theory and Applications, vol. 40. Physica Verlag, Heidelberg (2000)
10. Havens, T.C., Anderson, D.T., Keller, J.M.: A fuzzy Choquet integral with an interval type-2 fuzzy number-valued integrand. In: Proceedings of the IEEE International Conference on Fuzzy Systems, Barcelona, Spain, pp. 1–8 (2010)
11. Havens, T.C., Anderson, D.T., Wagner, C.: Data-informed fuzzy measures for fuzzy integration of intervals and fuzzy numbers. IEEE Trans. Fuzzy Syst. 23(5), 1861–1875 (2015)
12. Havens, T.C., Anderson, D.T., Wagner, C., Deilamsalehy, H., Wonnacott, D.: Fuzzy integrals of crowd-sourced intervals using a measure of generalized accord. In: Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 1–8 (2013)
13. Jaccard, P.: Nouvelles recherches sur la distribution florale. Bull. de la Société Vaudoise des Sci. Nat. 44, 223–270 (1908)
14. Kabir, S., Wagner, C., Havens, T.C., Anderson, D.T.: A similarity measure based on bidirectional subsethood for intervals. IEEE Trans. Fuzzy Syst. https://doi.org/10.1109/TFUZZ.2019.2945249
15. Kabir, S., Wagner, C., Havens, T.C., Anderson, D.T., Aickelin, U.: Novel similarity measure for interval-valued data based on overlapping ratio. In: Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 1–6 (2017)
16. Meyer, P., Roubens, M.: On the use of the Choquet integral with fuzzy numbers in multiple criteria decision support. Fuzzy Sets Syst. 157(7), 927–938 (2006)
17. Murofushi, T., Sugeno, M.: An interpretation of fuzzy measures and the Choquet integral as an integral with respect to a fuzzy measure. Fuzzy Sets Syst. 29(2), 201–227 (1989)
18. Nguyen, H.T., Kreinovich, V.: Computing degrees of subsethood and similarity for interval-valued fuzzy sets: fast algorithms. In: Proceedings of the 9th International Conference on Intelligent Technologies, pp. 47–55 (2008)
19. Sugeno, M.: Theory of fuzzy integrals and its applications. Tokyo Institute of Technology (1974)
20. Quoc Viet Hung, N., Tam, N.T., Tran, L.N., Aberer, K.: An evaluation of aggregation techniques in crowdsourcing. In: Lin, X., Manolopoulos, Y., Srivastava, D., Huang, G. (eds.) WISE 2013. LNCS, vol. 8181, pp. 1–15. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41154-0_1
21. Wagner, C., Anderson, D.T.: Extracting meta-measures from data for fuzzy aggregation of crowd sourced information. In: Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 1–8 (2012)
22. Wagner, C., Havens, T.C., Anderson, D.T.: The arithmetic recursive average as an instance of the recursive weighted power mean. In: Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 1–6 (2017)

Author information

Authors and Affiliations

School of Computer Science, University of Nottingham, Nottingham, NG8 1BB, UK
Shaily Kabir & Christian Wagner

Corresponding author

Correspondence to Shaily Kabir.

Editor information

Editors and Affiliations

LIP6-Sorbonne University, Paris, France
Marie-Jeanne Lesot
IDMEC, IST, Universidade de Lisboa, Lisbon, Portugal
Susana Vieira
University of Alberta, Edmonton, AB, Canada
Marek Z. Reformat
INESC, IST, Universidade de Lisboa, Lisbon, Portugal
João Paulo Carvalho
Eindhoven University of Technology, Eindhoven, The Netherlands
Anna Wilbik
CNRS-Sorbonne University, Paris, France
Bernadette Bouchon-Meunier
Iona College, New Rochelle, NY, USA
Ronald R. Yager

About this paper

Cite this paper

Kabir, S., Wagner, C. (2020). A Bidirectional Subsethood Based Fuzzy Measure for Aggregation of Interval-Valued Data. In: Lesot, MJ., et al. Information Processing and Management of Uncertainty in Knowledge-Based Systems. IPMU 2020. Communications in Computer and Information Science, vol 1238. Springer, Cham. https://doi.org/10.1007/978-3-030-50143-3_48

DOI: https://doi.org/10.1007/978-3-030-50143-3_48
Published: 05 June 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-50142-6
Online ISBN: 978-3-030-50143-3
eBook Packages: Computer Science, Computer Science (R0)
