1 Introduction

The aim of our project is to explore the principle of case-based reasoning (CBR) in developing the Cholera Easy Detection System (CEDS). CBR is a technique presented by Riesbeck and Schank [1] and Kolodner [2] that is based on human behavior. Aamodt and Plaza [3] introduced a basic model for developing CBR applications. CBR deals with problems by remembering similar former cases stored in a case base and comparing the present case with them [4], much as a human brain works. Generally, the life cycle of CBR is described by four phases: retrieve the most similar cases to compare with the new problem, reuse those cases to attempt a solution of the problem, revise the proposed solution if necessary, and retain the solution as part of a new case. CBR can therefore also be regarded as a knowledge-based learning approach, since the system memorizes each new case as it occurs.
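The four-phase cycle, conventionally named retrieve, reuse, revise, and retain, can be illustrated with a minimal Python sketch. It is not the CEDS implementation: the `Case` and `CaseBase` classes, the `solve` function, the sum-of-absolute-differences distance, and the symptom symbol `S1` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Case:
    problem: Dict[str, float]  # symptom symbol -> observed value (hypothetical encoding)
    solution: str


@dataclass
class CaseBase:
    cases: List[Case] = field(default_factory=list)

    def retrieve(self, problem: Dict[str, float]) -> Case:
        """Retrieve: return the stored case closest to the query.

        The distance (sum of absolute differences) is an assumption of this sketch.
        """
        def distance(case: Case) -> float:
            keys = set(case.problem) | set(problem)
            return sum(abs(case.problem.get(k, 0.0) - problem.get(k, 0.0)) for k in keys)

        return min(self.cases, key=distance)

    def retain(self, case: Case) -> None:
        """Retain: store the solved problem as a new case."""
        self.cases.append(case)


def solve(
    case_base: CaseBase,
    problem: Dict[str, float],
    revise: Optional[Callable[[Dict[str, float], str], str]] = None,
) -> str:
    nearest = case_base.retrieve(problem)                       # retrieve
    proposed = nearest.solution                                 # reuse
    final = revise(problem, proposed) if revise else proposed   # revise (optional)
    case_base.retain(Case(dict(problem), final))                # retain
    return final


# Hypothetical two-case base with placeholder labels.
base = CaseBase([Case({"S1": 1.0}, "negative"), Case({"S1": 3.0}, "positive")])
result = solve(base, {"S1": 2.8})
```

After the call, the case base holds three cases, which is the sense in which a CBR system learns from each new problem.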

In this work, we propose a model, CEDS, that detects cholera through certain symptoms felt by the patient. In our proposed approach, the symptoms are classified as primary and secondary symptoms based on their intensity. The system diagnoses the problem in two stages. In the first stage, the system uses a cardinality approach, and in the second stage, it uses a similarity searching approach.

Cardinality approach: In this approach, the system checks the number of primary symptoms related to the disease that the patient experiences in order to generate an initial solution.

Similarity searching: In this approach, the system determines the similarity of former cases to the new case and selects the most similar case based on the mathematical calculations explained in Sect. 3.

Our proposed system assists medical experts in taking immediate action by returning an automated solution on the basis of some input parameters. This work is an empirical study of CBR in cholera detection. In future work, we would like to explore the proposed system with more real-time cases.

2 Literature Study

CBR has played a significant role in modern technology and has been widely used in medicine [5], aircraft conflict resolution [6], military decision support systems [7], and several other fields.

The working principle of CBR depends on two main parts [3]. The first part is retrieval, in which the most similar historical cases are retrieved from the case base to find the similarity between the old cases and the new one. Depending on the size of the case base, different algorithms are deployed, such as nearest neighbor matching and ID3 [8]. The second part is adaptation, which solves the problem by learning from experience: the solutions of similar past cases are updated, if required, to fit the new case. Since adaptation is very complex and domain specific, no generalized adaptation methods have been developed so far.

In [9], a knowledge-based viral fever detection system (VFDS) based on CBR has been developed. The system uses viral fever cases and symptoms to detect the type of viral fever a patient has and the necessary actions he or she has to take. A patient may report several symptoms, each with a severity level (low, medium, or high). From the severity of each symptom in a patient and the weight of each symptom with respect to a disease, the CBR system calculates a weighted mean value for each disease, by which VFDS detects the fever type that has the closest similarity with the existing cases in the case base.

The paper [10] discusses a knowledge-based system that uses rule-based and case-based methods together to achieve the diagnosis. Rule-based systems deal with problems using well-defined knowledge bases, which limits the flexibility of such systems. CBR is used to extend the features of rule-based systems by comparing former cases with new cases to improve the performance of the system. The results of this research show that the system helps pathologists make accurate and timely diagnoses. The system also helps reduce diagnostic errors, which are a prominent cause of medical errors. The input consists of specific characteristics corresponding to the disease, and the output is the probable solution for that disease. The system updates its case base according to the differences in characteristics between the old and new cases.

The research in [11] presents a different AI approach for solving complex medical diagnosis problems in the determination of cancer and claims that conventional human diagnosis is significantly worse than the neural diagnostic system. The paper describes a new system for the detection of heart disease based on a feed-forward neural network architecture and a genetic algorithm. Hybridization is applied to train the neural network using genetic algorithms, which are inspired by nature, are applied to problem solving, notably function optimization, and have proved effective in searching large, complex cases. Whenever the doctor submits data, the system identifies it by comparison with the trained data and generates a list of probable diseases that the patient may have. The experimental results show that the proposed approach is more suitable than back propagation, which is generally used to train neural networks.

A Swine Flu Medical Recommender (SFMR) is developed in [12] with the help of CBR to assist medical officers in detecting swine flu. A case, comprising several symptoms, each of which is assigned a weight, is submitted as a query to the system, which compares it with the past cases in its case base using a similarity algorithm that retrieves the most similar cases. The retrieved cases are used to produce the probable solution to the problem, possibly after some modification.

The paper presented in [13] introduces an approach to implementing knowledge within myCBR 3, which focuses on vocabulary and similarity measure development and on the system architecture. The vocabulary defines the range of permissible values for attributes. For numeric values, this is usually the value range (minimum, maximum), while for symbolic values, it can be a list of values, with similarity measures defining the relationship between attribute values. Similarity measures can be calculated using Hamming distance for numeric values or reference tables for symbolic values. The myCBR 3 Workbench provides powerful GUIs for representing similarity knowledge. This technique is used in knowledge modeling, information retrieval, and case base management. Global similarity is described after the measurement of local similarity based on the available attributes, which provide the definition of the vocabulary and the conceptual idea.

A CBR system for complex medical diagnosis is discussed in [14] in the domain of premenstrual syndrome (PMS) detection, which is difficult because PMS falls under both gynecology and psychiatry. To address this issue, the paper proposes a CBR-based expert system that uses the k-nearest neighbor (KNN) algorithm to search for the k most similar cases based on the Euclidean distance measure. The novelty of the system lies in the design of a flexible autoset tolerance (T), which serves as a threshold to extract cases whose similarities are greater than the assigned value of T.

The architecture of the system is shown in Fig. 1. Each part of the system is essentially an individual problem solver. When the super user submits a case in the form of symptoms, the system searches its existing database, called the case base, for the most similar cases using an inference engine, which processes the stored cases to generate a probable solution that helps the super user identify the problem. Only the super user can modify the case base if required.

Fig. 1 Architecture of CEDS

2.1 Symptoms, Weight, and Symbol

Each case consists of several symptoms. Each symptom has a weight, which denotes the relative importance of that symptom among the symptoms exhibited in the disease, and an observed value (low, normal, or high), which expresses the intensity of that symptom as experienced by the patient. The symptoms are classified as primary symptoms (PS) and secondary symptoms (SS) based on a list prepared by medical experts from statistical evidence of the most common symptoms. We set the weight of each symptom, shown in Table 1, on the basis of relative importance values assigned by the medical experts, which in turn depend on statistical evidence.

Table 1 Symptoms, weight, and symbols
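One possible way to represent symptoms, weights, and observed values in code is sketched below. The symbols `PS1`, `PS2`, and `SS1`, the weight values, and the numeric encoding of low, normal, and high as 1, 2, and 3 are placeholders assumed for illustration; they are not the entries of Table 1.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict


class Intensity(IntEnum):
    """Observed value of a symptom; the 1/2/3 encoding is an assumption."""
    LOW = 1
    NORMAL = 2
    HIGH = 3


@dataclass(frozen=True)
class Symptom:
    symbol: str    # placeholder symbol such as "PS1" or "SS1"
    weight: float  # relative importance (placeholder value, not from Table 1)
    primary: bool  # True for a primary symptom, False for a secondary one


# Hypothetical symptom catalogue standing in for Table 1.
SYMPTOMS: Dict[str, Symptom] = {
    "PS1": Symptom("PS1", 0.9, True),
    "PS2": Symptom("PS2", 0.8, True),
    "SS1": Symptom("SS1", 0.4, False),
}


def make_case(observations: Dict[str, int]) -> Dict[str, Intensity]:
    """Build a case as a mapping from symptom symbol to observed intensity."""
    unknown = set(observations) - set(SYMPTOMS)
    if unknown:
        raise ValueError(f"unknown symptom symbols: {sorted(unknown)}")
    return {symbol: Intensity(value) for symbol, value in observations.items()}


case = make_case({"PS1": 3, "SS1": 1})
```

Keeping the weight with the symptom rather than with the case reflects the description above, where weights are fixed by experts and only the observed values vary from patient to patient.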

3 Proposed Methodologies

The proposed system, CEDS, is developed for identifying cholera from the study of several symptoms corresponding to that disease, using NetBeans 8.0 and its database tools. Initially, a medical expert submits a well-defined query in the form of symptoms partially collected from the patient. The system then performs a two-stage search, based on cardinality and similarity, respectively, to analyze the data.

In the cardinality approach, the system deals with the common primary symptoms of the disease that the patient is suffering from, as listed in Table 1. On processing the data, CEDS checks whether the symptoms experienced by the patient match the maximum number of possible common symptoms in order to predict the initial solution, as shown in Fig. 2.

Fig. 2 Symptoms chart with primary symptoms

If the desired initial solution is generated, the system proceeds to the next verification step to refine the initial solution (Table 2).

Table 2 Cardinality approach
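The cardinality stage can be sketched as a count of the reported primary symptoms. The symptom symbols and the decision rule (at least `min_matches` primary symptoms present) are assumptions made for illustration, since the exact rule of Table 2 is not reproduced here.

```python
from typing import FrozenSet, Iterable, Tuple

# Placeholder primary-symptom symbols; the real list is the one in Table 1.
PRIMARY_SYMPTOMS: FrozenSet[str] = frozenset({"PS1", "PS2", "PS3", "PS4"})


def cardinality_check(
    reported: Iterable[str],
    primary: FrozenSet[str] = PRIMARY_SYMPTOMS,
    min_matches: int = 3,  # hypothetical cut-off, not taken from the paper
) -> Tuple[int, bool]:
    """Count reported primary symptoms and return (count, initial suspicion)."""
    matched = primary & set(reported)
    return len(matched), len(matched) >= min_matches


# A patient reporting three primary symptoms and one secondary symptom.
count, suspected = cardinality_check({"PS1", "PS3", "PS4", "SS2"})
```

Secondary symptoms such as `SS2` are ignored at this stage and only enter the similarity search that follows.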

In the similarity searching approach, the secondary symptoms are first fed to the system by the medical officer according to his or her experience and knowledge, as in Fig. 3. On processing the data, the system retrieves similar cases from the case database. If \( C_{\text{new}} \) is the case submitted to the system, the system returns the k most similar cases with respect to the threshold value \( \delta_{t} \) from its existing case base C, as shown in Fig. 4, using the k-nearest neighbor approach with the similarity measure in Eq. (1). Here, we consider k = 3.

$$ {\text{Sim}}\left( {C_{j} ,C_{\text{new}} } \right) = \frac{{\sum\nolimits_{i} {\left( {O_{j}^{i} - O_{\text{new}}^{i} } \right) \times W_{i} } }}{{\sum\nolimits_{i} {W_{i} } }} $$
(1)

where \( O_{j}^{i} \) is the observed value O of the ith symptom of case j, \( O_{\text{new}}^{i} \) is the observed value of the same symptom in the new case, and \( W_{i} \) is the weight of that symptom. The system then selects the three most similar cases by Eq. (1), say \( C_{r} \), \( C_{s} \), and \( C_{t} \), with respect to \( C_{\text{new}} \). Let \( C^{\prime} \) be the set of most similar cases; then \( C^{\prime} = \left\{ {C_{p} \mid p \in \left\{ {r,s,t} \right\},\;C_{p} \in C} \right\} \), where C is the set of cases in the case base.
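The retrieval step can be sketched as follows. Two choices here are assumptions rather than statements of the paper's method: the per-symptom difference is taken in absolute value (as in Eq. (4)) so that positive and negative differences do not cancel, and a smaller score is treated as more similar. The optional `delta_t` cut-off, the case identifiers, and all numeric values are hypothetical.

```python
from typing import Dict, List, Optional

Case = Dict[str, float]  # symptom symbol -> observed value


def weighted_difference(case: Case, new_case: Case, weights: Dict[str, float]) -> float:
    """Weighted mean of per-symptom absolute differences (variant of Eq. (1))."""
    total_weight = sum(weights.values())
    return sum(abs(case[s] - new_case[s]) * w for s, w in weights.items()) / total_weight


def retrieve_k(
    case_base: Dict[str, Case],
    new_case: Case,
    weights: Dict[str, float],
    k: int = 3,
    delta_t: Optional[float] = None,  # assumed to be an upper bound on the score
) -> List[str]:
    """Return the ids of the k cases with the smallest weighted difference."""
    scored = [(weighted_difference(c, new_case, weights), cid) for cid, c in case_base.items()]
    if delta_t is not None:
        scored = [(score, cid) for score, cid in scored if score <= delta_t]
    scored.sort()  # ties are broken by case id
    return [cid for _, cid in scored[:k]]


# Hypothetical weights and cases over two secondary symptoms.
weights = {"SS1": 2.0, "SS2": 1.0}
case_base = {
    "C1": {"SS1": 1, "SS2": 1},
    "C2": {"SS1": 3, "SS2": 3},
    "C3": {"SS1": 2, "SS2": 3},
    "C4": {"SS1": 3, "SS2": 1},
}
new_case = {"SS1": 3, "SS2": 2}
nearest = retrieve_k(case_base, new_case, weights, k=3)
```

For this toy data, `C2` and `C4` differ from the new case only in the lower-weighted symptom, so they rank ahead of `C3` and `C1`.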

Fig. 3 Symptoms chart with secondary symptoms

Fig. 4 Partial case base of similar three cases

  • Calculation of Deviation (DV):

After retrieving the similar cases, the system compares each of them with the new case \( C_{\text{new}} \) to compute the deviation DV of the individual case from the new case, as shown in Eqs. (2)-(6).

$$ \overline{{C_{p} }} = \frac{{\mathop \sum \nolimits_{i = 1}^{{{\text{no}} .\;{\text{of}}\;{\text{symptoms}}}} f\left( {C_{p} ;C_{\text{new}} } \right) \times W_{i} }}{{\sum {W_{i} } }} $$
(2)
$$ \overline{{\dot{C}_{p} }} = \frac{{\mathop \sum \nolimits_{i = 1}^{{{\text{no}} .\;{\text{of}}\;{\text{symptoms}}}} f^{2} \left( {C_{p} ;C_{\text{new}} } \right) \times W_{i} }}{{\sum {W_{i} } }} $$
(3)

where

$$ f\left( {C_{p} ;C_{\text{new}} } \right) = \left| {O_{p}^{i} - O_{\text{new}}^{i} } \right| $$
(4)

where f is the absolute difference of two observed values

$$ f^{2} \left( {C_{p} ;C_{\text{new}} } \right) = \left| {O_{p}^{i} - O_{\text{new}}^{i} } \right|^{2} $$
(5)
$$ {\text{DV}}^{2} \left( {C_{p} } \right) = \left( {\overline{{\dot{C}_{p} }} } \right) - X^{2} \quad {\text{where}}\;X = \overline{{C_{p} }} $$
(6)
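Equations (2)-(6) amount to a weighted variance of the per-symptom absolute differences, with DV as its square root. A minimal sketch follows; the function name `deviation`, the symptom symbols, and the sample values are assumptions made for illustration.

```python
import math
from typing import Dict

Case = Dict[str, float]  # symptom symbol -> observed value


def deviation(case: Case, new_case: Case, weights: Dict[str, float]) -> float:
    """Deviation DV of a retrieved case from the new case, following Eqs. (2)-(6)."""
    total_weight = sum(weights.values())
    # Eq. (4): absolute difference of the observed values for each symptom.
    diffs = {s: abs(case[s] - new_case[s]) for s in weights}
    # Eq. (2): weighted mean of the differences.
    mean = sum(diffs[s] * w for s, w in weights.items()) / total_weight
    # Eqs. (3) and (5): weighted mean of the squared differences.
    mean_sq = sum(diffs[s] ** 2 * w for s, w in weights.items()) / total_weight
    # Eq. (6): DV^2 = mean of squares minus squared mean.
    # Clamping at zero guards against tiny negative values from rounding.
    variance = max(mean_sq - mean ** 2, 0.0)
    return math.sqrt(variance)


# Hypothetical example with two equally weighted symptoms.
weights = {"S1": 1.0, "S2": 1.0}
dv = deviation({"S1": 3, "S2": 1}, {"S1": 1, "S2": 1}, weights)
```

In this example the differences are 2 and 0, so the weighted mean is 1, the weighted mean of squares is 2, and DV is 1. Note that DV measures how unevenly the differences are spread across symptoms, so a case that differs from the new case by the same amount on every symptom also has DV equal to zero.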
  • Best case selection:

The system returns the best case \( C_{\text{best}} \) by computing the best fitness value with Eq. (7).

$$ {\text{fitness}}\left( {C_{p} } \right) = \mathop {\hbox{min} }\limits_{{p \in \{ r,s,t\} }} \left\{ {{\text{DV}}\left( {C_{p} } \right) } \right\} $$
(7)

If the best case \( C_{\text{best}} \ge \tau \), then the patient has a high chance of Cholera. Here, \( \tau \) is the threshold value.
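The selection in Eq. (7) can be sketched as picking the case with the smallest deviation among the three retrieved cases. The case identifiers and DV values below are hypothetical, and the comparison with the threshold \( \tau \) is left out because the sketch does not assume which quantity is compared against it.

```python
from typing import Dict, Tuple


def select_best_case(deviations: Dict[str, float]) -> Tuple[str, float]:
    """Return (case id, DV) for the case with the minimum deviation, as in Eq. (7)."""
    if not deviations:
        raise ValueError("no candidate cases to choose from")
    best_id = min(deviations, key=deviations.get)
    return best_id, deviations[best_id]


# Hypothetical DV values for the three retrieved cases.
best_id, best_dv = select_best_case({"Cr": 0.42, "Cs": 0.17, "Ct": 0.35})
```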

4 Result and Analysis

Among the three most similar past cases, C24 is the best match for the new case C12, as determined by the fitness value of each case computed with Eq. (7) (Fig. 5).

Fig. 5 Report of probable solution with case no.

The deviations of the three cases with respect to the new case are represented by three different colors in Fig. 6. The vertical axis of the graph shows the deviation (DV) for each case, while the horizontal axis represents the case numbers. This chart helps the medical expert take immediate action for diagnosis.

Fig. 6 Comparison of DV of three most similar cases w.r.t. new case

5 Flow Chart

See Fig. 7.

Fig. 7 Flowchart of CEDS

6 Conclusion

In this research, a simple CBR system to detect cholera has been developed in order to show how CBR can help doctors make a prompt decision. However, CBR rests on the assumption that similar problems have similar solutions and that a new problem can be solved by utilizing similar past problems or by modifying solutions generated by the system. A complex CBR case base is therefore required before the system can match a user's condition. In the future, this system can be extended to work in a distributed environment so that it can operate on real-time cases. We also wish to develop our system to work on a stronger platform with real-time data.