Introduction and preliminaries

The rough set theory of Pawlak [1] is a mathematical tool for analyzing uncertain data. Pawlak classified data into a set of classes, each consisting of objects [2]. This classification depends on an equivalence relation, so all objects in the same class have the same importance; however, this does not hold for all uncertain data in real life. The notion of membership was studied from the viewpoint of topological spaces in [3] and [4] via a general binary relation. This membership extends the original rough membership function [5]. Allam et al. [6] and Salama [7] gave a new approach to basic rough set concepts and obtained some topological structures from rough sets. In [8], Yao et al. discussed the issue of overlapping classes in rough sets and introduced a membership of an object to a class. The approximations are used to calculate the accuracy as in [9]. Several studies, such as chemical topology [10], have focused on the stability of structural bonds in proteins. Similarity on rough sets is another such study, concerned with the protein energy of unfolding. Similarity based on rough sets was discussed in [11] and [12]. In 2017, Elshenawy et al. [13] studied the similarity of data via a weighted membership. Walczak and Massart [14] classified the data by choosing ranges of values and used lower and upper approximations to calculate the accuracy; in this case, the intersection of all reducts gives the core of attributes. The topology on X generated by R is a knowledge base for X, and an indication of symptoms for a fixed disease can be seen through the topology [15]. Different notions of a membership function based on rough sets were introduced and studied in [16], [17], and [18]. A QSAR [19] constructs a mathematical model that relates the biological activity of a set of chemical compounds to a set of their structural descriptors. The main purpose of this paper is to study reduction based on similarity. We give a comparison between reduction by similarity and some other types of reduction through different examples. An application to the QSAR of AAs is studied. We introduce an algorithm for reduction via a membership function and illustrate it graphically. Some properties of relations and membership functions are investigated. A new method is introduced to study the correlation between attributes and a decision through a similarity relation.

Definition 1

[1] The IS, or approximation space, is a system (U, A, f), where U is a finite universe of objects and A is a set of attributes (features or variables). Each \(a\in A\) defines an information function \(f_{a}: U\rightarrow V_{a}\), where \(V_{a}\) is the set of all values of a, called the domain of the attribute a.

Definition 2

[5] For every \(B\subseteq A\), the indiscernibility relation on B, denoted by Ind(B), is defined as follows: two objects \(x_{i}, x_{j}\in U\) are indiscernible by the set of attributes B if \(b(x_{i})= b(x_{j})\) for every \(b\in B\). Ind(B) partitions U into the smallest indiscernible groups of objects, and each equivalence class of Ind(B) is called an elementary set in B. [xi] denotes the equivalence class of the object xi under the relation Ind(B).

Definition 3

[5] For every \(X\subseteq U\), the rough membership function of an object \(x_{i}\in U\) with respect to X is given by \(\mu _{X}(x_{i})= \frac {|[x_{i}]\cap X|}{|[x_{i}]|}\), where |[xi]| is the cardinality of the class [xi], in other words, the number of objects contained in [xi].
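To make this definition concrete, the following Python sketch (our own illustration; the function names and the toy attribute values are assumptions that only mimic the spirit of Table 1) computes the Ind(B)-classes and the rough membership function.

```python
from fractions import Fraction

def ind_classes(objects, values, B):
    """Partition the objects into Ind(B)-classes: two objects share a class
    when they agree on every attribute b in B."""
    classes = {}
    for x in objects:
        key = tuple(values[x][b] for b in B)
        classes.setdefault(key, set()).add(x)
    return list(classes.values())

def rough_membership(x, X, classes):
    """mu_X(x) = |[x] intersect X| / |[x]|, where [x] is the class containing x."""
    block = next(c for c in classes if x in c)
    return Fraction(len(block & X), len(block))

# Toy data in the spirit of Table 1 (illustrative values only).
objects = ["a", "b", "c", "d", "e"]
values = {"a": {"p": 1, "q": 0}, "b": {"p": 2, "q": 1}, "c": {"p": 2, "q": 1},
          "d": {"p": 3, "q": 0}, "e": {"p": 4, "q": 2}}
classes = ind_classes(objects, values, ["p", "q"])
print(rough_membership("a", {"a", "d", "e"}, classes))  # 1
print(rough_membership("b", {"a", "d", "e"}, classes))  # 0
```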

The IS is represented in Table 1 with a set of attributes A={p,q,r,s} and a set of objects U={a,b,c,d,e}. We study the effect of the attributes on the decision, in other words, the correlation between the attributes and the decision, in the following approaches.

Table 1 Information system

Original Pawlak method

In Table 1, we compare the attributes of the objects. No two objects have the same attribute values. So, we have the classification of objects C={{a},{b},{c},{d},{e}} and two decision sets D1={a,d,e} and D2={b,c}. The Pawlak membership is given by \(\mu _{X}(x) = \frac {|[x]\cap X |}{|[x]|}\), for any object x and \(X\subseteq U\). Therefore:

$$\mu_{D_{1}}(x)= \left\{\begin{array}{r@{\quad:\quad}l} 1 & x \in \{a, d, e\}\\ 0& x \in \{b, c\}\end{array} \right.,$$
$$\mu_{D_{2}}(x)= \left\{\begin{array}{r@{\quad:\quad}l} 1 & x \in \{b, c\}\\ 0& x \in \{a, d, e\}\end{array} \right.$$

This means that any object correlates with the decision only by 0 or 1.

Pawlak method with coding

The IS in Table 1 can be coded by choosing intervals as in Table 2. The coded IS is given in Table 3. The classification of objects in Table 3 is C={{a},{b,c},{d},{e}}. Objects correlate with the decisions D1 and D2 by:

$$\mu_{D_{1}}(x)= \left\{\begin{array}{r@{\quad:\quad}l} 1 & x \in \{a, d, e\}\\ 0& x \in \{b, c\}\end{array} \right.,$$
$$\mu_{D_{2}}(x)= \left\{\begin{array}{r@{\quad:\quad}l} \frac{1}{2} & x \in \{b, c\}\\ 0& x \in \{a, d, e\}\end{array} \right.$$
Table 2 Objects with attribute coding
Table 3 Coded information system

This means that any object correlates with the decision only by 0, \(\frac {1}{2}\), or 1.

Similarity without degree

The similarity matrix between objects in Table 3 is given by Table 4. A binary relation R on U is defined by xRy if and only if \(\mu (x, y)\leq \frac {1}{4}\). Then, the class of objects is C={U,{a,b,d},{a,c,d},{a,d,e}}. The correlation of objects with respect to decisions D1 and D2 is given by \(\mu _{D_{1}}(a)= \frac {|U\cap \{a, b, d\}\cap \{a, c, d\}\cap \{a, d, e\}|}{| D_{1}|}= \frac {|\{a, d\}|}{| D_{1}|}= \frac {2}{3}\), \(\mu _{D_{1}}(b)= 1\), \(\mu _{D_{1}}(c)= 1\), \(\mu _{D_{1}}(d)= \frac {2}{3}\), \(\mu _{D_{1}}(e)= 1\). Also, \(\mu _{D_{2}}(a)= 1\), \(\mu _{D_{2}}(b)= \frac {3}{2}\), \(\mu _{D_{2}}(c)= \frac {3}{2}\), \(\mu _{D_{2}}(d)= 1\) and \(\mu _{D_{2}}(e)= \frac {3}{2}\).

Table 4 Similarity matrix

Uncertain QSAR information system

The basic form of Pawlak's model depends on an equivalence relation, and this restricts the circle of application for objects and decision rules, because when the Pawlak rules are applied to some ISs, two different objects need not have equal attribute values. For such cases, as in the following problem, we give a method of solution that depends on a similarity relation.

Problem A modeling of the energy of unfolding of a protein (tryptophan synthase, an alpha unit of the bacteriophage T4 lysozyme), where 19 coded amino acids (AAs) were each introduced into position 49 [20]. The AAs are described in terms of seven attributes: a1= PIE and a2= PIF (two measures of the side chain lipophilicity), a3= DGR =ΔG of transfer from the protein interior to water, a4= SAC = surface area, a5= MR = molecular refractivity, a6= LAM = the side chain polarity, and a7= Vol = molecular volume. In [14], the authors used the form of Pawlak [1] to make decision rules. The IS of the quantitative attributes {a1,a2,a3,a4,a5,a6,a7} and a decision attribute {d} is represented in Table 5. The condition attributes are coded into four qualitative terms (very low, low, high, and very high), whereas the decision attribute is coded into three qualitative terms (low, medium, and high). The qualitative terms of all attributes are coded by integer numbers. The problem is to show that some objects that were coded with a medium energy of unfolding in [14] should, with respect to an attribute, belong to the high energy of unfolding.

Table 5 QSAR information system

Algorithm

Step 1: Construct a similarity matrix \(M_{a}= [d_{ij}]\) for each attribute a; each of these is a 19×19 matrix (one matrix for each of the 7 attributes), where i∈{1,2,3,⋯,19} denotes the row of the matrix and j∈{1,2,3,⋯,19} denotes the column of the matrix.

Step 2: Calculate the similarity degree of attributes through a definite relation, say R, on our IS, which will be denoted by QSAR IS=(U,A), where U={x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19} and A={a1,a2,a3,a4,a5,a6,a7} is the set of attributes. So, xiRxj if and only if xi is similar to xj. The degree of similarity, or similarity measure, between xi and xj is given by \(\text {deg}_{a}(x_{i}, x_{j})= d_{ij}= 1- \frac {|a(x_{i})- a(x_{j})|}{|a_{\text {max}}-a_{\text {min}}|}\), where amax and amin denote the maximum and minimum values of the attribute a, respectively. It is clear that R is reflexive and symmetric.
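As a sketch of Steps 1 and 2, the following Python fragment (our own illustration; the function name and the sample values are hypothetical, while the real values of a1 are those listed in Table 5) builds the similarity matrix of one attribute.

```python
def similarity_matrix(attr_values):
    """Steps 1-2: d_ij = 1 - |a(x_i) - a(x_j)| / |a_max - a_min|,
    rounded to one decimal place as in the matrices shown below."""
    a_max, a_min = max(attr_values), min(attr_values)
    span = abs(a_max - a_min)
    n = len(attr_values)
    return [[round(1 - abs(attr_values[i] - attr_values[j]) / span, 1)
             for j in range(n)]
            for i in range(n)]

# Hypothetical values of one attribute for five objects; for the real a1
# of the 19 AAs, a1(max) = 1.85 and a1(min) = -0.77 as used below.
a = [-0.77, 0.00, 0.23, 1.85, 0.51]
M_a = similarity_matrix(a)
for row in M_a:
    print(row)   # a reflexive and symmetric matrix with entries in [0, 1]
```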

Step 3: Classify the data deduced from Step 2 via xiR={xj(dij):xiRxj}. The value d will be chosen by an expert. Then, there are two cases: dij>d and dij≤d.

Step 4: Define a membership function of every object from the similarity classes of Step 3 as follows: for an arbitrary class \(x_{k}R= C_{k}\in U/R\) and for y∈Ck, the membership function of y with respect to any subset X of U is \(\mu ^{C_{k}}_{X}(y)= \frac {\sum \limits _{x_{j}\in C_{k}\cap X} d_{kj}}{\sum \limits _{x_{j}\in C_{k}}d_{kj}}\), where k∈{1,2,⋯,19}. If \(C_{k}\cap X= \emptyset\), then \(\mu ^{C_{k}}_{X}(y)= 0\).
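A possible rendering of Steps 3 and 4 is sketched below (our own Python illustration; the strict inequality in the threshold and the variable names are assumptions chosen to reproduce the classes listed later for d = 0.7).

```python
def similarity_classes(M, d):
    """Step 3: C_k = x_k R = { x_j(d_kj) : d_kj > d }, stored as a dict
    mapping the object index j to the degree d_kj."""
    n = len(M)
    return [{j: M[k][j] for j in range(n) if M[k][j] > d} for k in range(n)]

def class_membership(C_k, X):
    """Step 4: sum of the degrees of the members of C_k lying in X,
    divided by the sum of all degrees in C_k (0 if C_k and X are disjoint)."""
    inter = sum(deg for j, deg in C_k.items() if j in X)
    total = sum(C_k.values())
    return inter / total if inter > 0 else 0.0

# Example, reusing M_a from the previous sketch:
# classes = similarity_classes(M_a, d=0.7)
# X = {1, 3}                       # 0-based indices of some chosen objects
# print(class_membership(classes[0], X))
```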

Step 5: Present the minimum, maximum, and average weighted membership functions of every y by:

$$\mu^{w_{\text{min}}}_{X}(y)= \text{min}_{C_{k}\ni y}\left(\mu^{C_{k}}_{X}(y)\right)$$
$$\mu^{w_{\text{max}}}_{X}(y)= \text{max}_{C_{k}\ni y}\left(\mu^{C_{k}}_{X}(y)\right)$$
$$\mu^{w_{\text{avg}}}_{X}(y)= \text{avg}_{C_{k}\ni y}\left(\mu^{C_{k}}_{X}(y)\right)$$
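Step 5 then amounts to scanning the classes that contain a given object, for instance (a sketch that reuses class_membership from above):

```python
def weighted_memberships(y, classes, X):
    """Step 5: minimum, maximum, and average of mu^{C_k}_X(y)
    over all classes C_k that contain the object y."""
    vals = [class_membership(C_k, X) for C_k in classes if y in C_k]
    return min(vals), max(vals), sum(vals) / len(vals)
```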

Step 6: Choose a set X⊆U. Evaluate the rough membership μ of each object z by determining \(A= \bigcap \limits _{z\in C_{k}} C_{k}\), the intersection of all classes containing z, and calculating \(\mu = \frac {|A\cap X|}{|A|}\) for every object.

Step 7: Determine a maximum rough membership μ from the last column of each classification.
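Steps 6 and 7 can be sketched as follows (again our own Python rendering; how Step 7 reads the maximum from the last column of each classification is our interpretation and is kept as a comment only).

```python
def intersection_membership(z, classes, X):
    """Step 6: A = intersection of all classes C_k containing z,
    and mu = |A intersect X| / |A|."""
    containing = [set(C_k) for C_k in classes if z in C_k]
    A = set.intersection(*containing)
    return len(A & X) / len(A)

# One reading of Step 7: for each object, report the largest of the weighted
# memberships of Step 5 together with the value computed above.
```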

Now, we apply the algorithm to the problem of QSAR.

  1.

    Similarity matrix for the attribute a1. Since a1(max)=1.85 and a1(min)=−0.77, then a1(max)−a1(min)=2.62, and we have the matrix \(M_{a_{1}}\):

    $$ M_{a_{1}}= \left[{\begin{array}{ccccccccccccccccccc} 1&0.7&0.7&1.0&0.9 & 0.7 & 0.9 &1.0 &0.9 &0.6 &0.6 &0.7 & 0.5& 0.9 &0.9 &1.0 &0.4 &0.7 &0.8\\ 0.7 & 1& 0.9 & 0.6& 0.9 & 1.0 & 0.8 &0.8 &0.4 &0.3 &0.9 &0.9 & 0.5& 0.2 &0.7 &0.8 &0.7 &0.5 &0.5 \\ 0.7 & 0.9& 1 & 0.6& 0.8 & 0.9 & 0.8 &0.7 &0.3 &0.3 &0.9 &0.4 & 0.2& 0.6 &0.8 &0.7 &0.1 &0.4 &0.5 \\ 1.0 & 0.6& 0.6 & 1& 0.8 & 0.6 & 0.8 &0.9 &0.7 &0.7 &0.5 &0.8 & 0.6& 0.9 &0.8 &0.9 &0.5 &0.8 &0.8 \\ 0.9 & 0.9& 0.8 & 0.8& 1 & 0.8 & 0.9 &0.9 &0.5 &0.5 &0.7 &0.6 & 0.4& 0.8 &0.9 &0.9 &0.2 &0.6 &0.4 \\ 0.7 & 1.0& 0.9 & 0.6& 0.8 & 1 & 0.8 &0.7 &0.3 &0.3 &0.9 &0.5 & 0.2& 0.7 &0.8 &0.7 &0.1 &0.5 &0.5 \\ 0.9 & 0.8& 0.8 & 0.8& 0.9 & 0.8 & 1 &0.9 &0.5 &0.5 &0.7 &0.7 & 0.4& 0.8 &1.0 &0.9 &0.3 &0.7 &0.7 \\ 1.0 & 0.8& 0.7 & 0.9& 0.9 & 0.7 & 0.9 &1 &0.6 &0.6 &0.6 &0.7 & 0.5& 0.9 &0.9 &1.0 &0.3 &0.7 &0.8 \\ 0.9 & 0.4& 0.3 & 0.7& 0.5 & 0.3 & 0.5 &0.6 &1 &0.9 &0.2 &0.9 & 0.9& 0.7 &0.5 &0.6 &0.7 &0.9 &0.8 \\ 0.6 & 0.3& 0.3 & 0.7& 0.5 & 0.3 & 0.5 &0.6 &0.9 &1 &0.2 &0.8 & 0.9& 0.7 &0.5 &0.6 &0.8 &0.8 &0.8 \\ 0.6 & 0.9& 0.9 & 0.5& 0.7 & 0.9 & 0.7 &0.6 &0.2 &0.2 &1 &0.4 & 0.1& 0.6 &0.4 &0.6 & 0.0 &0.4 &0.4 \\ 0.7 & 0.9& 0.4 & 0.8& 0.6 & 0.5 & 0.7 &0.7 &0.9 &0.8 &0.7 &1 & 0.7& 0.8 &0.7 &0.7 &0.6 &1.0 &0.9 \\ 0.5 & 0.5& 0.2 & 0.6& 0.4 & 0.2 & 0.4 &0.5 &0.9 &0.9 &0.1 &0.7 & 1& 0.5 &0.4 &0.5 &0.9 &0.7 &0.7 \\ 0.9 & 0.2& 0.6 & 0.9& 0.8 & 0.7 & 0.8 &0.9 &0.7 &0.7 &0.6 &0.8 & 0.5& 1 &0.8 &0.9 &0.4 &0.8 &0.9 \\ 0.9 & 0.7& 0.8 & 0.8& 0.9 & 0.8 & 1.0 &0.9 &0.5 &0.5 &0.7 &0.7 & 0.4& 0.8 &1 &0.9 &0.3 &0.7 &0.7 \\ 1.0 & 0.8& 0.7 & 0.9& 0.9 & 0.7 & 0.9 &1.0 &0.6 &0.6 &0.6 &0.7 & 0.5& 0.9 &0.9 &1 &0.3 &0.7 &0.8 \\ 0.4 & 0.7& 0.1 & 0.5& 0.2 & 0.1 & 0.3 &0.3 &0.7 &0.8 &0.0 &0.6 & 0.9& 0.4 &0.3 &0.3 &1 &0.6 &0.6 \\ 0.7 & 0.5& 0.4 & 0.8& 0.6 & 0.5 & 0.7 &0.7 &0.9 &0.8 &0.4 &1.0 & 0.7& 0.8 &0.7 &0.7 &0.6 &1 &0.9 \\ 0.8 & 0.5& 0.5 & 0.8& 0.7 & 0.5 & 0.7 &0.8 &0.8 &0.8 &0.4 &0.9 & 0.7& 0.9 &0.7 &0.8 &0.6 &0.9 &1 \end{array}} \right] $$
  2.

    Taking d=0.7, we have the similarity classes:

    $$\begin{aligned} C_{1}&= x_{1}R= \{x_{1}(1), x_{4}(1.0), x_{5}(0.9), x_{7}(0.9), x_{8}(1.0), x_{9}(0.9), x_{14}(0.9), x_{15}(0.9), x_{16}(1.0), x_{19}(0.8)\};\\ C_{2}&= x_{2}R= \{ x_{2}(1), x_{3}(0.9), x_{5}(0.9), x_{6}(1.0), x_{7}(0.8), x_{8}(0.8), x_{11}(0.9), x_{12}(0.9), x_{16}(0.8)\};\\ C_{3}&= x_{3}R= \{ x_{2}(0.9), x_{3}(1), x_{5}(0.8), x_{6}(0.9), x_{7}(0.8), x_{11}(0.9), x_{15}(0.8) \};\\ C_{4}&= x_{4}R= \{ x_{1}(1.0), x_{4}(1), x_{5}(0.8), x_{7}(0.8), x_{8}(0.9), x_{12}(0.8), x_{14}(0.9), x_{15}(0.8), x_{16}(0.9), x_{18}(0.8), x_{19}(0.8)\};\\ C_{5}&= x_{5}R= \{ x_{1}(0.9), x_{2}(0.9), x_{3}(0.8), x_{4}(0.8), x_{5}(1), x_{6}(0.8), x_{7}(0.9), x_{8}(0.9),x_{14}(0.8), x_{15}(0.9), x_{16}(0.9) \};\\ C_{6}&= x_{6}R= \{ x_{2}(1.0), x_{3}(0.9), x_{5}(0.8), x_{6}(1), x_{7}(0.8), x_{11}(0.9), x_{15}(0.8) \};\\ C_{7}&= x_{7}R= \{ x_{1}(0.9), x_{2}(0.8), x_{3}(0.8), x_{4}(0.8), x_{5}(0.9), x_{6}(0.8), x_{7}(1), x_{8}(0.9),x_{14}(0.8), x_{15}(1.0), x_{16}(0.9) \};\\ C_{8}&= x_{8}R= \{ x_{1}(1.0), x_{2}(0.8), x_{4}(0.9), x_{5}(0.9), x_{7}(0.9), x_{8}(1), x_{14}(0.9), x_{15}(0.9),x_{16}(1.0), x_{19}(0.8) \};\\ C_{9}&= x_{9}R= \{ x_{1}(0.9), x_{9}(1), x_{10}(0.9), x_{12}(0.9), x_{13}(0.9), x_{18}(0.9), x_{19}(0.8) \};\\ C_{10}&= x_{10}R= \{ x_{9}(0.9), x_{10}(1), x_{12}(0.8), x_{13}(0.9), x_{17}(0.8), x_{18}(0.8), x_{19}(0.8) \};\\ C_{11}&= x_{11}R= \{ x_{2}(0.9), x_{3}(0.9), x_{6}(0.9), x_{11}(1) \};\\ C_{12}&= x_{12}R= \{ x_{2}(0.9), x_{4}(0.8), x_{9}(0.9), x_{10}(0.8), x_{12}(1), x_{14}(0.8), x_{18}(1.0), x_{19}(0.9)\};\\ C_{13}&= x_{13}R= \{ x_{9}(0.9), x_{10}(0.9), x_{13}(1), x_{17}(0.9)\};\\ C_{14}&= x_{14}R= \{ x_{1}(0.9), x_{4}(0.9), x_{5}(0.8), x_{7}(0.8), x_{8}(0.9), x_{12}(0.8), x_{14}(1), x_{15}(0.8),x_{16}(0.9), x_{18}(0.8), x_{19}(0.9)\};\\ C_{15}&= x_{15}R= \{ x_{1}(0.9), x_{3}(0.8), x_{4}(0.8), x_{5}(0.9), x_{6}(0.8), x_{7}(1.0), x_{8}(0.9), x_{14}(0.8),x_{15}(1), x_{16}(0.9) \};\\ C_{16}&= x_{16}R= \{ x_{1}(1.0), x_{2}(0.8), x_{4}(0.9), x_{5}(0.9), x_{7}(0.9), x_{8}(1.0), x_{14}(0.9), x_{15}(0.9),x_{16}(1), x_{19}(0.8) \};\\ C_{17}&= x_{17}R= \{ x_{10}(0.8), x_{13}(0.9), x_{17}(1)\};\\ C_{18}&= x_{18}R= \{ x_{4}(0.8), x_{9}(0.9), x_{10}(0.8), x_{12}(1.0), x_{14}(0.8), x_{18}(1), x_{19}(0.9)\};\\ C_{19}&= x_{19}R= \{ x_{1}(0.8), x_{4}(0.8), x_{8}(0.8), x_{9}(0.8), x_{10}(0.8), x_{12}(0.9), x_{14}(0.9), x_{16}(0.8), x_{18}(0.9), x_{19}(1) \}. \end{aligned} $$
  3.

    We choose various sets and compare the membership classifications through the following two cases.

    Case 1: Let X={x9,x10,x12}. Then, we have:

    $$\begin{array}{ccccc} \mu^{C_{1}}_{X}(x_{1})= 0.10 &\mu^{C_{2}}_{X}(x_{2})= 0.11 &\mu^{C_{2}}_{X}(x_{3})= 0.11 &\mu^{C_{1}}_{X}(x_{4})= 0.10 &\mu^{C_{1}}_{X}(x_{5})= 0.10\\ \mu^{C_{4}}_{X}(x_{1})= 0.08 &\mu^{C_{3}}_{X}(x_{2})= 0.0 &\mu^{C_{3}}_{X}(x_{3})= 0.0 &\mu^{C_{4}}_{X}(x_{4})= 0.08 &\mu^{C_{2}}_{X}(x_{5})= 0.11\\ \mu^{C_{5}}_{X}(x_{1})= 0.0 &\mu^{C_{5}}_{X}(x_{2})= 0.0 &\mu^{C_{5}}_{X}(x_{3})= 0.0 &\mu^{C_{5}}_{X}(x_{4})= 0.0 &\mu^{C_{3}}_{X}(x_{5})= 0.0\\ \mu^{C_{7}}_{X}(x_{1})= 0.0 &\mu^{C_{6}}_{X}(x_{2})= 0.0 &\mu^{C_{6}}_{X}(x_{3})= 0.0 &\mu^{C_{7}}_{X}(x_{4})= 0.0 &\mu^{C_{4}}_{X}(x_{5})= 0.08\\ \mu^{C_{8}}_{X}(x_{1})= 0.0 &\mu^{C_{7}}_{X}(x_{2})= 0.0 &\mu^{C_{7}}_{X}(x_{3})= 0.0 &\mu^{C_{8}}_{X}(x_{4})= 0.0 &\mu^{C_{5}}_{X}(x_{5})= 0.0\\ \mu^{C_{9}}_{X}(x_{1})= 0.44 &\mu^{C_{8}}_{X}(x_{2})= 0.0 &\mu^{C_{11}}_{X}(x_{3})= 0.0 &\mu^{C_{12}}_{X}(x_{4})= 0.13 &\mu^{C_{6}}_{X}(x_{5})= 0.38\\ \mu^{C_{14}}_{X}(x_{1})= 0.08 &\mu^{C_{11}}_{X}(x_{2})= 0.0 &\mu^{C_{15}}_{X}(x_{3})= 0.0 &\mu^{C_{14}}_{X}(x_{4})= 0.08 &\mu^{C_{7}}_{X}(x_{5})= 0.0 \\ \mu^{C_{15}}_{X}(x_{1})= 0.0 &\mu^{C_{12}}_{X}(x_{2})= 0.38 &&\mu^{C_{15}}_{X}(x_{4})= 0.0 &\mu^{C_{8}}_{X}(x_{5})= 0.0\\ \mu^{C_{16}}_{X}(x_{1})= 0.0 &\mu^{C_{16}}_{X}(x_{2})= 0.0 &&\mu^{C_{16}}_{X}(x_{4})= 0.0 &\mu^{C_{14}}_{X}(x_{5})= 0.08\\ \mu^{C_{19}}_{X}(x_{1})= 0.30 &&&\mu^{C_{18}}_{X}(x_{4})= 0.44 &\mu^{C_{15}}_{X}(x_{5})= 0.0\\ &&& \mu^{C_{19}}_{X}(x_{4})= 0.30 & \mu^{C_{16}}_{X}(x_{5})= 0.0 \end{array}$$

    and

    $$\begin{array}{ccccc} \mu^{C_{2}}_{X}(x_{6})= 0.11 &\mu^{C_{1}}_{X}(x_{7})= 0.10 &\mu^{C_{1}}_{X}(x_{8})= 0.10 &\mu^{C_{1}}_{X}(x_{9})= 0.10 &\mu^{C_{9}}_{X}(x_{10})= 0.44\\ \mu^{C_{3}}_{X}(x_{6})= 0.0 &\mu^{C_{2}}_{X}(x_{7})= 0.11 &\mu^{C_{2}}_{X}(x_{8})= 0.11 &\mu^{C_{9}}_{X}(x_{9})= 0.44 &\mu^{C_{10}}_{X}(x_{10})= 0.45\\ \mu^{C_{5}}_{X}(x_{6})= 0.0 &\mu^{C_{3}}_{X}(x_{7})= 0.0 &\mu^{C_{4}}_{X}(x_{8})= 0.08 &\mu^{C_{10}}_{X}(x_{9})= 0.45 &\mu^{C_{12}}_{X}(x_{10})= 0.38\\ \mu^{C_{6}}_{X}(x_{6})= 0.0 &\mu^{C_{4}}_{X}(x_{7})= 0.08 &\mu^{C_{5}}_{X}(x_{8})= 0.0 &\mu^{C_{12}}_{X}(x_{9})= 0.38 &\mu^{C_{13}}_{X}(x_{10})= 0.49\\ \mu^{C_{7}}_{X}(x_{6})= 0.0 & \mu^{C_{5}}_{X}(x_{7})= 0.0 &\mu^{C_{7}}_{X}(x_{8})= 0.0 &\mu^{C_{13}}_{X}(x_{9})= 0.49 &\mu^{C_{17}}_{X}(x_{10})= 0.30\\ \mu^{C_{11}}_{X}(x_{6})= 0.0 &\mu^{C_{6}}_{X}(x_{7})= 0.0 &\mu^{C_{8}}_{X}(x_{8})= 0.0 & \mu^{C_{18}}_{X}(x_{9})= 0.44 &\mu^{C_{18}}_{X}(x_{10})= 0.44\\ \mu^{C_{15}}_{X}(x_{6})= 0.0 & \mu^{C_{7}}_{X}(x_{7})= 0.0 &\mu^{C_{14}}_{X}(x_{8})= 0.08 & \mu^{C_{19}}_{X}(x_{9})= 0.30 & \mu^{C_{19}}_{X}(x_{10})= 0.30\\ &\mu^{C_{8}}_{X}(x_{7})= 0.0 &\mu^{C_{15}}_{X}(x_{8})= 0.0 &&\\ &\mu^{C_{14}}_{X}(x_{7})= 0.08 &\mu^{C_{16}}_{X}(x_{8})= 0.0 &&\\ &\mu^{C_{15}}_{X}(x_{7})= 0.0 &\mu^{C_{19}}_{X}(x_{8})= 0.30 && \\ & \mu^{C_{16}}_{X}(x_{7})= 0.0 &&& \end{array}$$

    and

    $$\begin{array}{cccccc} \mu^{C_{2}}_{X}(x_{11})= 0.11 &\mu^{C_{2}}_{X}(x_{12})= 0.11 &\mu^{C_{9}}_{X}(x_{13})= 0.44 &\mu^{C_{1}}_{X}(x_{14})= 0.10 &\mu^{C_{1}}_{X}(x_{15})= 0.10\\ \mu^{C_{3}}_{X}(x_{11})= 0.0 &\mu^{C_{4}}_{X}(x_{12})= 0.08 & \mu^{C_{10}}_{X}(x_{13})= 0.45 &\mu^{C_{4}}_{X}(x_{14})= 0.08 &\mu^{C_{3}}_{X}(x_{15})= 0.0\\ \mu^{C_{6}}_{X}(x_{11})= 0.0 &\mu^{C_{9}}_{X}(x_{12})= 0.44 & \mu^{C_{13}}_{X}(x_{13})= 0.49 &\mu^{C_{5}}_{X}(x_{14})= 0.0 &\mu^{C_{4}}_{X}(x_{15})= 0.08\\ \mu^{C_{11}}_{X}(x_{11})= 0.0 &\mu^{C_{10}}_{X}(x_{12})= 0.45 & \mu^{C_{17}}_{X}(x_{13})= 0.30 &\mu^{C_{7}}_{X}(x_{14})= 0.0 &\mu^{C_{5}}_{X}(x_{15})= 0.0\\ &\mu^{C_{12}}_{X}(x_{12})= 0.38 &&\mu^{C_{8}}_{X}(x_{14})= 0.0 &\mu^{C_{6}}_{X}(x_{15})= 0.0\\ &\mu^{C_{14}}_{X}(x_{12})= 0.08 &&\mu^{C_{12}}_{X}(x_{14})= 0.38 & \mu^{C_{7}}_{X}(x_{15})= 0.0 \\ &\mu^{C_{18}}_{X}(x_{12})= 0.44 &&\mu^{C_{14}}_{X}(x_{14})= 0.08 & \mu^{C_{8}}_{X}(x_{15})= 0.0\\ &\mu^{C_{19}}_{X}(x_{12})= 0.30 &&\mu^{C_{15}}_{X}(x_{14})= 0.0 & \mu^{C_{14}}_{X}(x_{15})= 0.08\\ &&&\mu^{C_{16}}_{X}(x_{14})= 0.0 & \mu^{C_{15}}_{X}(x_{15})= 0.0\\ &&&\mu^{C_{18}}_{X}(x_{14})= 0.44 & \mu^{C_{16}}_{X}(x_{15})= 0.0\\ &&&\mu^{C_{19}}_{X}(x_{14})= 0.30 &\end{array}$$

    and

    $$\begin{array}{cccccc} \mu^{C_{1}}_{X}(x_{16})= 0.10 &\mu^{C_{10}}_{X}(x_{17})= 0.45 &\mu^{C_{4}}_{X}(x_{18})= 0.08 &\mu^{C_{1}}_{X}(x_{19})= 0.10\\ \mu^{C_{2}}_{X}(x_{16})= 0.11 &\mu^{C_{13}}_{X}(x_{17})= 0.49 &\mu^{C_{9}}_{X}(x_{18})= 0.44 & \mu^{C_{4}}_{X}(x_{19})= 0.08\\ \mu^{C_{4}}_{X}(x_{16})= 0.08 &\mu^{C_{17}}_{X}(x_{17})= 0.30 &\mu^{C_{10}}_{X}(x_{18})= 0.45 &\mu^{C_{8}}_{X}(x_{19})= 0.0\\ \mu^{C_{5}}_{X}(x_{16})= 0.0 &&\mu^{C_{12}}_{X}(x_{18})= 0.38 & \mu^{C_{9}}_{X}(x_{19})= 0.44\\ \mu^{C_{7}}_{X}(x_{16})= 0.0 &&\mu^{C_{14}}_{X}(x_{18})= 0.08 &\mu^{C_{10}}_{X}(x_{19})= 0.45\\ \mu^{C_{8}}_{X}(x_{16})= 0.0 &&\mu^{C_{18}}_{X}(x_{18})= 0.44 & \mu^{C_{12}}_{X}(x_{19})= 0.38\\ \mu^{C_{14}}_{X}(x_{16})= 0.08 &&\mu^{C_{19}}_{X}(x_{18})= 0.30 & \mu^{C_{14}}_{X}(x_{19})= 0.08\\ \mu^{C_{15}}_{X}(x_{16})= 0.0 &&&\mu^{C_{16}}_{X}(x_{19})= 0.0\\ \mu^{C_{16}}_{X}(x_{16})= 0.0 &&&\mu^{C_{18}}_{X}(x_{19})= 0.44\\ \mu^{C_{19}}_{X}(x_{16})= 0.30 &&&\mu^{C_{19}}_{X}(x_{19})= 0.30\\ &&&\end{array}$$

    Now, we evaluate the rough membership for each object, which gives a preferable value for the data. Each object belongs to more than one class, so it has three memberships: minimum, maximum, and average. This is shown in Table 6 and Fig. 1.

    Fig. 1 Rough membership for X={x9,x10,x12}

    Table 6 Weighted rough membership for X

    In [14], the decision rules are coded into three qualitative terms: low, medium, and high. X is the set of objects for which the protein has a high energy of unfolding.

    Case 2: Let Y={x4,x8,x13,x17,x19}. Then, we have:

    $$\begin{array}{ccccc} \mu^{C_{1}}_{Y}(x_{1})= 0.30 &\mu^{C_{2}}_{Y}(x_{2})= 0.10 &\mu^{C_{2}}_{Y}(x_{3})= 0.10 &\mu^{C_{1}}_{Y}(x_{4})= 0.30 &\mu^{C_{1}}_{Y}(x_{5})= 0.30\\ \mu^{C_{4}}_{Y}(x_{1})= 0.28 &\mu^{C_{3}}_{Y}(x_{2})= 0.0 &\mu^{C_{3}}_{Y}(x_{3})= 0.0 &\mu^{C_{4}}_{Y}(x_{4})= 0.28 &\mu^{C_{2}}_{Y}(x_{5})= 0.10\\ \mu^{C_{5}}_{Y}(x_{1})= 0.18 &\mu^{C_{5}}_{Y}(x_{2})= 0.18 &\mu^{C_{5}}_{Y}(x_{3})= 0.18 &\mu^{C_{5}}_{Y}(x_{4})= 0.18 &\mu^{C_{3}}_{Y}(x_{5})= 0.0\\ \mu^{C_{7}}_{Y}(x_{1})= 0.18 &\mu^{C_{6}}_{Y}(x_{2})= 0.0 &\mu^{C_{6}}_{Y}(x_{3})= 0.0 &\mu^{C_{7}}_{Y}(x_{4})= 0.18 &\mu^{C_{4}}_{Y}(x_{5})= 0.28\\ \mu^{C_{8}}_{Y}(x_{1})= 0.30 &\mu^{C_{7}}_{Y}(x_{2})= 0.18 &\mu^{C_{7}}_{Y}(x_{3})= 0.18 &\mu^{C_{8}}_{Y}(x_{4})= 0.30 &\mu^{C_{5}}_{Y}(x_{5})= 0.18\\ \mu^{C_{9}}_{Y}(x_{1})= 0.27 &\mu^{C_{8}}_{Y}(x_{2})= 0.30 &\mu^{C_{11}}_{Y}(x_{3})= 0.0 &\mu^{C_{12}}_{Y}(x_{4})= 0.24 &\mu^{C_{6}}_{Y}(x_{5})= 0.0\\ \mu^{C_{14}}_{Y}(x_{1})= 0.28 &\mu^{C_{11}}_{Y}(x_{2})= 0.0 &\mu^{C_{15}}_{Y}(x_{3})= 0.19 &\mu^{C_{14}}_{Y}(x_{4})= 0.28 &\mu^{C_{7}}_{Y}(x_{5})= 0.18 \\ \mu^{C_{15}}_{Y}(x_{1})= 0.19 &\mu^{C_{12}}_{Y}(x_{2})= 0.24 &&\mu^{C_{15}}_{Y}(x_{4})= 0.19 &\mu^{C_{8}}_{Y}(x_{5})= 0.30\\ \mu^{C_{16}}_{Y}(x_{1})= 0.30 &\mu^{C_{16}}_{Y}(x_{2})= 0.30 &&\mu^{C_{16}}_{Y}(x_{4})= 0.30 &\mu^{C_{14}}_{Y}(x_{5})= 0.28\\ \mu^{C_{19}}_{Y}(x_{1})= 0.31 &&&\mu^{C_{18}}_{Y}(x_{4})= 0.27 &\mu^{C_{15}}_{Y}(x_{5})= 0.19\\ &&& \mu^{C_{19}}_{Y}(x_{4})= 0.31 & \mu^{C_{16}}_{Y}(x_{5})= 0.30\end{array}$$

    and

    $$\begin{array}{cccccc} \mu^{C_{2}}_{Y}(x_{6})= 0.10 &\mu^{C_{1}}_{Y}(x_{7})= 0.30 &\mu^{C_{1}}_{Y}(x_{8})= 0.30 &\mu^{C_{1}}_{Y}(x_{9})= 0.30 &\mu^{C_{9}}_{Y}(x_{10})= 0.27\\ \mu^{C_{3}}_{Y}(x_{6})= 0.0 &\mu^{C_{2}}_{Y}(x_{7})= 0.10 &\mu^{C_{2}}_{Y}(x_{8})= 0.10 &\mu^{C_{9}}_{Y}(x_{9})= 0.27 &\mu^{C_{10}}_{Y}(x_{10})= 0.42\\ \mu^{C_{5}}_{Y}(x_{6})= 0.18 &\mu^{C_{3}}_{Y}(x_{7})= 0.0 &\mu^{C_{4}}_{Y}(x_{8})= 0.28 &\mu^{C_{10}}_{Y}(x_{9})= 0.42 &\mu^{C_{12}}_{Y}(x_{10})= 0.24\\ \mu^{C_{6}}_{Y}(x_{6})= 0.0 &\mu^{C_{4}}_{Y}(x_{7})= 0.28 &\mu^{C_{5}}_{Y}(x_{8})= 0.18 &\mu^{C_{12}}_{Y}(x_{9})= 0.24 &\mu^{C_{13}}_{Y}(x_{10})= 0.51\\ \mu^{C_{7}}_{Y}(x_{6})= 0.18 & \mu^{C_{5}}_{Y}(x_{7})= 0.18 &\mu^{C_{7}}_{Y}(x_{8})= 0.18 &\mu^{C_{13}}_{Y}(x_{9})= 0.51 &\mu^{C_{17}}_{Y}(x_{10})= 0.70\\ \mu^{C_{11}}_{Y}(x_{6})= 0.0 &\mu^{C_{6}}_{Y}(x_{7})= 0.0 &\mu^{C_{8}}_{Y}(x_{8})= 0.30 & \mu^{C_{18}}_{Y}(x_{9})= 0.27 &\mu^{C_{18}}_{Y}(x_{10})= 0.27\\ \mu^{C_{15}}_{Y}(x_{6})= 0.19 & \mu^{C_{7}}_{Y}(x_{7})= 0.18 &\mu^{C_{14}}_{Y}(x_{8})= 0.28 & \mu^{C_{19}}_{Y}(x_{9})= 0.31 & \mu^{C_{19}}_{Y}(x_{10})= 0.31\\ &\mu^{C_{8}}_{Y}(x_{7})= 0.30 &\mu^{C_{15}}_{Y}(x_{8})= 0.19 &&\\ &\mu^{C_{14}}_{Y}(x_{7})= 0.28 &\mu^{C_{16}}_{Y}(x_{8})= 0.30 &&\\ &\mu^{C_{15}}_{Y}(x_{7})= 0.19 &\mu^{C_{19}}_{Y}(x_{8})= 0.31 && \\ & \mu^{C_{16}}_{Y}(x_{7})= 0.30 &&&\end{array}$$

    and

    $$\begin{array}{cccccc} \mu^{C_{2}}_{Y}(x_{11})= 0.10 &\mu^{C_{2}}_{Y}(x_{12})= 0.10 &\mu^{C_{9}}_{Y}(x_{13})= 0.27 &\mu^{C_{1}}_{Y}(x_{14})= 0.30 &\mu^{C_{1}}_{Y}(x_{15})= 0.30\\ \mu^{C_{3}}_{Y}(x_{11})= 0.0 &\mu^{C_{4}}_{Y}(x_{12})= 0.28 & \mu^{C_{10}}_{Y}(x_{13})= 0.42 &\mu^{C_{4}}_{Y}(x_{14})= 0.28 &\mu^{C_{3}}_{Y}(x_{15})= 0.0\\ \mu^{C_{6}}_{Y}(x_{11})= 0.0 &\mu^{C_{9}}_{Y}(x_{12})= 0.27 & \mu^{C_{13}}_{Y}(x_{13})= 0.51 &\mu^{C_{5}}_{Y}(x_{14})= 0.18 &\mu^{C_{4}}_{Y}(x_{15})= 0.28\\ \mu^{C_{11}}_{Y}(x_{11})= 0.0 &\mu^{C_{10}}_{Y}(x_{12})= 0.42 & \mu^{C_{17}}_{Y}(x_{13})= 0.70 &\mu^{C_{7}}_{Y}(x_{14})= 0.18 &\mu^{C_{5}}_{Y}(x_{15})= 0.18\\ &\mu^{C_{12}}_{Y}(x_{12})= 0.24 &&\mu^{C_{8}}_{Y}(x_{14})= 0.30 &\mu^{C_{6}}_{Y}(x_{15})= 0.0\\ &\mu^{C_{14}}_{Y}(x_{12})= 0.28 &&\mu^{C_{12}}_{Y}(x_{14})= 0.24 & \mu^{C_{7}}_{Y}(x_{15})= 0.18 \\ &\mu^{C_{18}}_{Y}(x_{12})= 0.27 &&\mu^{C_{14}}_{Y}(x_{14})= 0.28 & \mu^{C_{8}}_{Y}(x_{15})= 0.30\\ &\mu^{C_{19}}_{Y}(x_{12})= 0.31 &&\mu^{C_{15}}_{Y}(x_{14})= 0.19 & \mu^{C_{14}}_{Y}(x_{15})= 0.28\\ &&&\mu^{C_{16}}_{Y}(x_{14})= 0.30 & \mu^{C_{15}}_{Y}(x_{15})= 0.19\\ &&&\mu^{C_{18}}_{Y}(x_{14})= 0.27 & \mu^{C_{16}}_{Y}(x_{15})= 0.30\\ &&&\mu^{C_{19}}_{Y}(x_{14})= 0.31 &\end{array}$$

    and

    $$\begin{array}{cccccc} \mu^{C_{1}}_{Y}(x_{16})= 0.30 &\mu^{C_{10}}_{Y}(x_{17})= 0.42 &\mu^{C_{4}}_{Y}(x_{18})= 0.28 &\mu^{C_{1}}_{Y}(x_{19})= 0.30\\ \mu^{C_{2}}_{Y}(x_{16})= 0.10 &\mu^{C_{13}}_{Y}(x_{17})= 0.51 &\mu^{C_{9}}_{Y}(x_{18})= 0.27 & \mu^{C_{4}}_{Y}(x_{19})= 0.28\\ \mu^{C_{4}}_{Y}(x_{16})= 0.28 &\mu^{C_{17}}_{Y}(x_{17})= 0.70 &\mu^{C_{10}}_{Y}(x_{18})= 0.42 &\mu^{C_{8}}_{Y}(x_{19})= 0.30\\ \mu^{C_{5}}_{Y}(x_{16})= 0.18 &&\mu^{C_{12}}_{Y}(x_{18})= 0.24 & \mu^{C_{9}}_{Y}(x_{19})= 0.27\\ \mu^{C_{7}}_{Y}(x_{16})= 0.18 &&\mu^{C_{14}}_{Y}(x_{18})= 0.28 &\mu^{C_{10}}_{Y}(x_{19})= 0.42\\ \mu^{C_{8}}_{Y}(x_{16})= 0.30 &&\mu^{C_{18}}_{Y}(x_{18})= 0.27 & \mu^{C_{12}}_{Y}(x_{19})= 0.24\\ \mu^{C_{14}}_{Y}(x_{16})= 0.28 &&\mu^{C_{19}}_{Y}(x_{18})= 0.31 & \mu^{C_{14}}_{Y}(x_{19})= 0.28\\ \mu^{C_{15}}_{Y}(x_{16})= 0.19 &&&\mu^{C_{16}}_{Y}(x_{19})= 0.30\\ \mu^{C_{16}}_{Y}(x_{16})= 0.30 &&&\mu^{C_{18}}_{Y}(x_{19})= 0.27\\ \mu^{C_{19}}_{Y}(x_{16})= 0.31 &&&\mu^{C_{19}}_{Y}(x_{19})= 0.31\\ &&&\end{array}$$

    Now, we evaluate the rough membership for each object, which gives a preferable value for the data. This value is evaluated via the minimum, maximum, average, and weighted membership. This is shown in Table 7 and Fig. 2.

    Fig. 2 Rough membership for Y={x4,x8,x13,x17,x19}

    Y is the set of objects for which the protein has a medium energy of unfolding. From Table 7, one can see that the objects x9,x10,x12 have a rough membership value of 1; this means that the protein has a high energy of unfolding. The object x19 had a medium energy of unfolding in the Pawlak reduction [14], while from our procedure, x19 has a high energy of unfolding. In the same manner, we can take a set Z of objects for which the protein has a low energy of unfolding. Through the correlation between objects and the decision, there is at least one object in Z which has a medium or high energy of unfolding. Therefore, the significance of each attribute and the degrees of membership are more precise than in Pawlak's reduction. We can evaluate the quantitative attributes {a2,a3,a4,a5,a6,a7} analogously.

    Table 7 Weighted rough membership for Y

Some properties on a similarity relation

The given algorithm depends on a similarity matrix, a general binary relation, and a rough membership. So, we study some properties of these notions.

Proposition 1

If R1 and R2 are two different relations on the same IS, then the degree of similarity is the same.

Proof

A similarity measure between xi and xj is given by \(\text {deg}_{a}(x_{i}, x_{j})= 1- \frac {|a(x_{i})- a(x_{j})|}{|a_{\text {max}}-a_{\text {min}}|}\), which depends only on the attribute values. Since each of R1 and R2 depends on d, we have \(|x_{i} R_{1}|= \sum \limits _{x_{k}\in C_{k}}d_{ij}= |x_{i} R_{2}|\). Therefore, the degree of similarity is the same. □

Proposition 2

If R1R2, then \(\text {deg}_{R_{1}}(x_{i}, x_{j}) \leq \text {deg}_{R_{2}}(x_{i}, x_{j})\), where deg is the degree of similarity relation.

Proof

Since R1R2, then \(|x_{i} R_{1}|= \sum \limits _{x_{k}\in C_{k}}d_{ij}\leq \sum \limits _{x_{k}\in C_{k}} d^{'}_{ij}= |x_{i} R_{2}|\), where dij and \(d^{'}_{ij}\) are the similarity degrees with respect to R1 and R2, respectively. □

Proposition 3

Let (U,A) be an IS, where U is the set of objects and A is the set of attributes. Then, \(\mu ^{C_{k}}_{X}(x)= \mu ^{C_{k}}_{X}(y)\) for every two different objects \(x, y\in C_{k}\), every class Ck, and every nonempty set X⊆U.

Proof

Directly from \(\mu ^{C_{k}}_{X}(x)= \frac {\sum \limits _{x\in C_{k}\cap X} d_{ij}}{\sum \limits _{x\in C_{k}}d_{ij}}= \mu ^{C_{k}}_{X}(y)\), since the value of the right-hand side depends only on Ck and X, not on the particular object. □

Remark 1

For an attribute a∈A in an IS (U,A), if \(\mu ^{C_{k}}_{X}(x)= \mu ^{C_{m}}_{X}(x)\), k≠m, it is not necessarily true that Ck=Cm. This is obvious in the problem of our study.

Conclusion and discussion

Chemical data sets have been analyzed using similarity relations. The results are more precise in comparison with the original rough set theory. The description of objects is different from that of the study in [14]. For example, the decision concerning the element x19 in our study is a high energy of unfolding, whereas in [14] the energy of unfolding was medium. This opens the way for applying similarity models to ISs, which gives a discrete structure and coincides with the classical case. The model of QSAR similarity reduction can be applied to a finite set of objects. The approach used here can be applied to any IS with quantitative or qualitative data. Consequently, it is very significant in decision-making [21–24]. The introduced techniques are very useful in applications because they open a way for more topological applications from real-life problems.