Detection and explanation of anomalies in healthcare data

The growth of databases in the healthcare domain opens multiple doors for machine learning and artificial intelligence technology. Many medical devices are available in the medical field; however, medical errors remain a severe challenge. Various algorithms have been developed to identify and address medical errors, for example by detecting anomalous readings or anomalous health conditions of a patient. However, they fail to answer why those entries are considered anomalies. This research gap leads to the outlying aspect mining problem, which aims to discover the set of features (a.k.a. subspace) in which a given data point is dramatically different from others. In this paper, we present a framework that detects anomalies in healthcare data effectively and efficiently and then explains why they are considered anomalies by detecting their outlying aspects. First, we review four anomaly detection techniques and four outlying aspect mining algorithms. Then, we evaluate the performance of the anomaly detection techniques and choose the best anomaly detection algorithm. Next, we use the top-k anomalies it detects as queries and identify their outlying aspects. Lastly, we evaluate performance on 16 real-world healthcare datasets. The experimental results show that the latest isolation-based outlying aspect mining measure, SiNNE, performs best on this task.


Introduction
Despite improvements in healthcare instruments, medical errors remain a severe challenge [1]. Applying machine learning (ML) and artificial intelligence (AI) algorithms in the healthcare industry helps improve patients' health more efficiently. According to [2], around 86% of healthcare companies use machine learning and artificial intelligence algorithms. These algorithms help in many ways, such as medical image diagnosis [3,4], disease detection/classification [5][6][7], medical data analysis [8], medical data classification [9,10], drug discovery [8], robot surgery [8], and detection of anomalous readings [11]. Recently, researchers have been interested in detecting abnormal activity in the healthcare industry.
An anomaly or outlier is defined as a data instance that does not conform to the remainder of the data set. In the healthcare domain, an anomaly refers to an unusual health condition or activity of a patient [12,13]. A vast number of applications have been developed to detect anomalies in medical data [14][15][16][17]. However, as far as we know, no study has investigated why these points are considered anomalies, i.e., on which set of features a data point is dramatically different from others. The problem of finding such an explanation is known as outlying aspect mining (a.k.a. outlier explanation, outlier interpretation, or outlying subspace detection). Outlying aspect mining aims to identify the set of features in which the given point (or a given anomaly) is most inconsistent with the rest of the data.
In many healthcare applications, a medical officer wants to know the most outlying aspects of a specific patient compared to other patients. For example, suppose you are a doctor treating the patients recorded in the Pima Indian diabetes data set. While treating a particular patient, you want to know in which aspects this patient differs from others. For 'Patient A', the most outlying aspect is the combination of the highest number of pregnancies and a low diabetes pedigree function (see Fig. 1), compared to other subspaces.
Another example is a medical insurance analyst who wants to know in which aspects a given insurance claim is most unusual. These applications differ from anomaly detection. Instead of searching the whole data set for anomalies, outlying aspect mining is specifically interested in a given data instance. The goal is to find the outlying aspects in which that data instance stands out. Such a data instance is called a query q.
These interesting applications of outlying aspect mining in the medical domain motivated us to write this paper. In this paper, we first introduce four anomaly detection techniques and four outlying aspect mining methods. Later, we evaluate their performance on 16 healthcare datasets. To the best of our knowledge, this is the first time these algorithms have been applied to healthcare data. Our results verify their performance on anomaly detection and outlying aspect mining tasks and show that isolation-based algorithms deliver promising performance, i.e., iForest performs well in anomaly detection and SiNNE performs well in the outlying aspect mining task.
The rest of the paper is organized as follows. Section 2 summarizes the principle and working mechanism of four outlying aspect mining algorithms and four anomaly detection algorithms. Next, the experimental setup and results are summarized in Sects. 3 and 4, respectively. Finally, we conclude the paper in Sect. 5.

Existing methods
Before describing different outlying aspect mining algorithms, we first provide the problem formulation.

Basic notations and definitions
Definition 1 (Problem definition) Given a set of n instances X (|X| = n) in d-dimensional space, a data point q ∈ X is called an anomaly iff
• q dramatically differs from others in the full feature space,
and a subspace S is called an outlying aspect of q iff
• the outlyingness of q in subspace S is higher than in other subspaces, and there is no other subspace with the same or higher outlyingness.
Outlying aspect mining algorithms require a scoring measure to compute the outlyingness of the query in a subspace and a search method to search for the most outlying subspace. In the rest of this section, we review different scoring measures only. For the search part, we use the Beam [18] search method because it is the latest search method and is used in several studies [18][19][20][21][22][23].
The flowchart of the complete process is presented in Fig. 2.

Existing anomaly detection scoring measures

LOF
The core idea of density-based anomaly detection is that the density of an anomalous object is significantly different from that of a normal instance. The first local density-based approach, the Local Outlier Factor (LOF), was introduced by [24] and is a widely used local outlier detection approach. For any data object, the LOF score is the ratio of the average local density of its k-nearest neighbors to its own local density [25]. The LOF score of data object q is defined as follows:

$$ LOF(q) = \frac{\sum_{x \in N_k(q)} lrd(x)}{|N_k(q)| \times lrd(q)} $$

where $lrd(q) = \frac{|N_k(q)|}{\sum_{x \in N_k(q)} \max(dist_k(x, X), dist(q, x))}$, N_k(q) is the set of k-nearest neighbours of q, dist(q, x) is the distance between q and x, and dist_k(q, X) is the distance between q and its k-th nearest neighbour in X. The LOF score represents the sparseness of the data object. Data objects with higher LOF values are considered anomalies.
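As an illustration of the LOF definition, the following is a minimal NumPy sketch rather than the implementation evaluated in this paper (the experiments use PyOD). It assumes Euclidean distance, exactly k neighbours per point (distance ties are broken arbitrarily), and no duplicate points; the function name `lof_scores` and the toy data are hypothetical.

```python
import numpy as np

def lof_scores(X, k=2):
    """Return the LOF score of every row of X (higher = more anomalous)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances; a point is never its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    # Indices of the k nearest neighbours and the k-distance of each point.
    knn = np.argsort(dist, axis=1)[:, :k]
    k_dist = dist[np.arange(n), knn[:, -1]]
    # Reachability distance from q to neighbour x: max(dist_k(x), dist(q, x)).
    reach = np.maximum(k_dist[knn], dist[np.arange(n)[:, None], knn])
    # Local reachability density: k divided by the summed reachability distances.
    lrd = k / reach.sum(axis=1)
    # LOF: mean lrd of the neighbours divided by the point's own lrd.
    return lrd[knn].mean(axis=1) / lrd

# Toy data (assumption): four points on a unit square plus one distant point.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [8.0, 8.0]])
scores = lof_scores(X, k=2)
```

In this toy example the four square corners have identical local densities, so their LOF is 1, while the distant point receives a much larger score.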

iForest
Liu et al. [26] presented a framework called Isolation Forest, or iForest, which isolates each data point by axis-parallel partitioning of the attribute space. To the best of our knowledge, iForest is the first technique that uses an isolation mechanism to detect anomalies. iForest builds an ensemble of trees called isolation trees (iTrees). Each iTree is built using a subsample selected randomly without replacement from the data set. At each node, a random split is performed at a randomly selected point of the attribute space. Partitioning terminates once every node contains only one data object or the tree reaches its height limit. The anomaly score for q ∈ R^d based on iForest is defined as:

$$ iForest(q) = \frac{1}{t}\sum_{i=1}^{t} l_i(q) $$

where l_i(q) is the path length of q in tree T_i and t is the number of iTrees.
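The isolation mechanism can be sketched in a few lines. The code below is a simplified illustration, not the PyOD implementation used in the experiments: it only follows the branch containing q instead of building full trees, it omits the path-length adjustment that the original iForest applies to unresolved leaves, and the parameter values (`t`, `psi`, `max_depth`) and the toy data are assumptions.

```python
import numpy as np

def path_length(q, X, rng, depth=0, max_depth=10):
    """Count the random axis-parallel splits needed to isolate q within X."""
    if len(X) <= 1 or depth >= max_depth:
        return depth
    dim = rng.integers(X.shape[1])            # random attribute
    lo, hi = X[:, dim].min(), X[:, dim].max()
    if lo == hi:                              # simplification: stop if no split is possible
        return depth
    split = rng.uniform(lo, hi)               # random split point on that attribute
    # Keep only the points on the same side of the split as q.
    same_side = X[:, dim] < split if q[dim] < split else X[:, dim] >= split
    return path_length(q, X[same_side], rng, depth + 1, max_depth)

def iforest_score(q, X, t=100, psi=16, seed=0):
    """Average path length of q over t subsamples of size psi (shorter = more anomalous)."""
    rng = np.random.default_rng(seed)
    lengths = []
    for _ in range(t):
        sample = X[rng.choice(len(X), size=min(psi, len(X)), replace=False)]
        lengths.append(path_length(q, sample, rng))
    return float(np.mean(lengths))

# Toy data (assumption): a Gaussian cluster, one far query, and one central query.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
far_query = np.array([10.0, 10.0])
central_query = X.mean(axis=0)
far_len = iforest_score(far_query, X)
central_len = iforest_score(central_query, X)
```

The far query is isolated after fewer splits than the central one, which is the intuition that anomalies are few and easy to isolate.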

Sp
Rather than searching for the k-nearest neighbors in the whole data set, [27] employs a scoring measure based on the nearest neighbor (k = 1) in a random sub-sample (S ⊂ D). The Sp score of data object q is defined as follows:

$$ Sp(q) = \min_{x \in S} dist(q, x) $$

where dist(q, x) is the distance between q and x.

Fig. 2 The flowchart

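In [27], the authors show that Sp performs better than the state-of-the-art anomaly detector LOF and runs faster than LOF.

iNNE
Bandaragoda et al. [28] proposed iNNE, which stands for isolation using Nearest Neighbor Ensemble. The core idea behind iNNE is that an anomaly is far away from its nearest neighbor, and the inverse is true for a regular object. The iNNE implementation is influenced by iForest and LOF. The critical difference between iNNE and iForest is that iForest builds a tree from subspaces while iNNE builds hyperspheres using all dimensions. The isolation score of q is defined as follows:

$$ I(q) = \begin{cases} 1 - \dfrac{\tau(\eta_{cnn(q)})}{\tau(cnn(q))} & \text{if } q \in \bigcup_{c \in S} B(c) \\ 1 & \text{otherwise} \end{cases} $$

where $cnn(q) = \arg\min_{c \in S} \{\tau(c) : q \in B(c)\}$, B(c) is the hypersphere centred at c with radius τ(c) = dist(c, η_c), and η_c is the nearest neighbour of c in the sub-sample S. The anomaly score for data object q is defined as:

$$ iNNE(q) = \frac{1}{t}\sum_{i=1}^{t} I_i(q) $$

where I_i(q) is the isolation score based on the i-th of t sub-samples.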
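Both nearest-neighbour-based detectors can be sketched compactly. The code below is an illustrative sketch under stated assumptions, not the PyOD code used in the experiments: it uses Euclidean distance, assumes the sub-samples contain no duplicate points, and takes the iNNE isolation score to be one minus the ratio between the radius of the neighbouring hypersphere and the radius of the smallest hypersphere covering q, as described by Bandaragoda et al. [28]. The function names and parameter values are hypothetical.

```python
import numpy as np

def sp_score(q, X, psi=8, seed=0):
    """Sp: distance from q to its nearest neighbour in one random sub-sample."""
    rng = np.random.default_rng(seed)
    S = X[rng.choice(len(X), size=psi, replace=False)]
    return float(np.linalg.norm(S - q, axis=1).min())

def inne_score(q, X, t=50, psi=8, seed=0):
    """iNNE: mean isolation score of q over t hypersphere models."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(t):
        S = X[rng.choice(len(X), size=psi, replace=False)]
        d = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)
        nn = d.argmin(axis=1)                  # nearest neighbour of each centre
        radius = d[np.arange(psi), nn]         # hypersphere radius of each centre
        covering = np.flatnonzero(np.linalg.norm(S - q, axis=1) <= radius)
        if covering.size == 0:
            scores.append(1.0)                 # q lies outside every hypersphere
        else:
            c = covering[radius[covering].argmin()]   # smallest covering hypersphere
            scores.append(1.0 - radius[nn[c]] / radius[c])
    return float(np.mean(scores))

# Toy data (assumption): a Gaussian cluster, one far query, and one central query.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
far_query = np.array([100.0, 100.0])
central_query = X.mean(axis=0)
sp_far, sp_central = sp_score(far_query, X), sp_score(central_query, X)
inne_far, inne_central = inne_score(far_query, X), inne_score(central_query, X)
```

A query that no hypersphere covers receives the maximum iNNE score of 1, whereas a query inside the cluster is covered by at least some models and scores lower.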

Outlying aspect mining algorithms

OAMiner
Duan et al. [29] introduced Outlying Aspect Miner (OAMiner in short), which uses a Kernel Density Estimation (KDE) [30] based scoring measure to compute the outlyingness of query q in subspace S:

$$ f_S(q) = \frac{1}{n (2\pi)^{m/2} \prod_{i \in S} h_i} \sum_{x \in O} \exp\left( -\sum_{i \in S} \frac{(q.i - x.i)^2}{2 h_i^2} \right) $$

where f_S(q) is the kernel density estimate of q in subspace S, m is the dimensionality of subspace S (|S| = m), h_i is the kernel bandwidth in dimension i, and q.i is the value of q in dimension i.
Duan et al. [29] stated that density is biased towards high-dimensional subspaces, since density tends to decrease as the dimension increases. Thus, to remove the effect of dimensionality bias, they proposed using the query's density rank as a measure of outlyingness. To find the most outlying subspace of the query, the density of all data points needs to be computed in each subspace, and the subspace in which the query has the best rank is selected as its outlying aspect. OAMiner systematically enumerates all possible subspaces using the set enumeration tree approach [31], which is widely used by the data mining research community, and traverses the tree in a depth-first manner [32]. OAMiner uses an anti-monotonicity property to prune subspaces: given a data set O, a query object q, and a subspace S, if rank(f_S(q)) = 1, then every superset of S cannot be a minimal subspace and thus can be pruned.
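The density-rank idea can be illustrated with a short sketch. It is not the OAMiner implementation: it uses a Gaussian product kernel, a rule-of-thumb bandwidth of 1.06 σ n^(-1/5) per dimension (an assumption), and ranks the query by counting points with strictly lower density. The names `kde_density` and `density_rank` are hypothetical.

```python
import numpy as np

def kde_density(X, subspace):
    """Gaussian product-kernel density of every row of X in the given subspace."""
    Z = np.asarray(X, dtype=float)[:, list(subspace)]
    n, m = Z.shape
    # Assumption: rule-of-thumb bandwidth per dimension (requires non-zero spread).
    h = 1.06 * Z.std(axis=0) * n ** (-1 / 5)
    diff = (Z[:, None, :] - Z[None, :, :]) / h
    kernel = np.exp(-0.5 * (diff ** 2).sum(axis=2))
    return kernel.sum(axis=1) / (n * (2 * np.pi) ** (m / 2) * np.prod(h))

def density_rank(X, q_index, subspace):
    """Rank of the query's density in the subspace (1 = lowest density = most outlying)."""
    dens = kde_density(X, subspace)
    return int((dens < dens[q_index]).sum()) + 1

# Toy data (assumption): the query is ordinary in feature 0 but extreme in feature 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
X[0] = [0.0, 10.0]
rank_f0 = density_rank(X, 0, (0,))
rank_f1 = density_rank(X, 0, (1,))
```

Because ranks are comparable across subspaces of different dimensionality, the subspace with the smallest rank (here feature 1) would be reported as the outlying aspect.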

Beam
Vinh et al. [18] capture the concept of dimensionality unbiasedness and further investigate dimensionally unbiased scoring functions. Dimensionality unbiasedness is an essential property for outlyingness measures because the query object is compared across subspaces with different numbers of dimensions. They proposed two novel outlyingness scoring metrics: (1) the density Z-score and (2) the isolation Path score (iPath in short). Their work showed that the proposed Z-score and iPath are dimensionally unbiased.
Therein, the density Z-score is defined as follows:

$$ Z(f_S(q)) = \frac{f_S(q) - \mu_{f_S}}{\sigma_{f_S}} $$

where µ_{f_S} and σ_{f_S} are the mean and standard deviation of the density of all data instances in subspace S, respectively.
The iPath score is motivated by the isolation Forest (iForest) anomaly detection approach [26]. The intuition behind iForest is that anomalies are few and susceptible to isolation. iForest constructs t trees, where each tree is built from a randomly selected sub-sample of size ψ (ψ ≪ n), which is then divided using axis-parallel random splits. Since the main focus in the outlying aspect mining context is on the path length of the query, the authors ignore the other parts of the tree. In outlying aspect mining, the intuition behind the iPath score is that, in the most outlying subspace, a given query is easier to isolate than the rest of the data.
The iPath score of query q w.r.t. sub-samples of size ψ of the data is computed as

$$ iPath_S(q) = \frac{1}{t}\sum_{i=1}^{t} l_i^S(q) $$

where l_i^S(q) is the path length of q in the i-th tree and subspace S.
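The effect of Z-score normalization can be seen in a small worked example. The sketch below is illustrative only; the density values are made up, and the second set is simply the first set scaled down by a factor of 100 to mimic how raw densities shrink in higher-dimensional subspaces.

```python
import numpy as np

def density_z_score(densities, q_index):
    """Z-score of the query's density relative to all densities in one subspace."""
    densities = np.asarray(densities, dtype=float)
    return float((densities[q_index] - densities.mean()) / densities.std())

# Hypothetical densities of four points in a low- and a high-dimensional subspace.
# The query (index 3) has the lowest density in both cases.
dens_low_dim = [0.40, 0.38, 0.42, 0.10]
dens_high_dim = [0.0040, 0.0038, 0.0042, 0.0010]
z_low = density_z_score(dens_low_dim, 3)
z_high = density_z_score(dens_high_dim, 3)
```

Although the raw densities differ by two orders of magnitude, the two Z-scores coincide, so the query is judged equally outlying in both subspaces; a more negative Z-score indicates a more outlying query.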
Vinh et al. [18] were the first to coin the term dimensionality unbiasedness.
Definition 2 (Dimensionality unbiased [18]) A dimensionality unbiased outlyingness measure (OM) is a measure of which the baseline value, i.e., the average value for any data sample O = {o_1, o_2, ..., o_n} drawn from a uniform distribution, is a quantity independent of the dimension of the subspace S, i.e.,

$$ E[OM(q \mid S) \mid q \in O] = \frac{1}{n}\sum_{q \in O} OM(q \mid S) = \text{const. w.r.t. } |S| $$

In [18, Theorem 3], it is proven that rank transformation and Z-score normalization result in a constant average value for any data distribution. Furthermore, it is worth noting that the Z-score scoring function not only has a normalized mean but also has a variance that is constant with respect to the number of dimensions.
The overall beam search process is divided into three stages. In the first stage, all 1-D subspaces are inspected to identify trivial outlying features. In the second stage, an exhaustive search is performed on all possible 2-dimensional subspaces. In the third stage, the beam search is carried out at each level l, where the algorithm keeps only the top W subspaces (W is called the beam width). The total number of subspaces considered by the beam algorithm is in the order of O(d^2 + W d_max), where d_max is the maximum dimension of a subspace and W is the beam width.

sGrid
Wells and Ting [23] introduced a simple grid-based density estimator called sGrid, a smoothed variant of a grid-based density estimator [30]. Let O be a collection of n data objects in d-dimensional space, and let x.S be the projection of a data object x ∈ O in subspace S. The sGrid density of a point q is computed as the number of points that fall in the bin that covers q and its surrounding neighbors.
Their work showed that sGrid has advantages over the existing kernel density estimator in outlying aspect mining. By replacing KDE with the sGrid density estimator, OAMiner [29] and Beam [18] run two orders of magnitude faster than their original implementations. However, sGrid is not a dimensionally unbiased measure and therefore requires Z-score normalization, which in turn makes sGrid-based scoring computationally expensive.
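The grid-based density idea can be sketched as follows. This is an illustrative simplification and not the authors' Java implementation of sGrid: it assumes equal-width bins per dimension, treats "surrounding neighbors" as all bins whose index differs by at most one in every dimension, and returns an unnormalised count. The function name and the `bins` parameter are hypothetical.

```python
import numpy as np

def sgrid_density(X, q, subspace, bins=8):
    """Count the points in the bin covering q and in its adjacent bins (subspace only)."""
    Z = np.asarray(X, dtype=float)[:, list(subspace)]
    zq = np.asarray(q, dtype=float)[list(subspace)]
    lo, hi = Z.min(axis=0), Z.max(axis=0)
    # Equal-width bins; guard against a zero range in any dimension.
    width = np.where(hi > lo, (hi - lo) / bins, 1.0)

    def cell(p):
        # Integer bin index per dimension, clipped to the valid range.
        return np.clip(((p - lo) / width).astype(int), 0, bins - 1)

    near = (np.abs(cell(Z) - cell(zq)) <= 1).all(axis=1)
    return int(near.sum())

# Toy data (assumption): a tight cluster of four points and one distant point.
X = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0], [0.1, 0.2], [10.0, 10.0]])
dense = sgrid_density(X, X[0], (0, 1))
sparse = sgrid_density(X, X[4], (0, 1))
```

Only integer bin indices are compared, so no pairwise distances are needed, which is consistent with the speed advantage over KDE reported above.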

SiNNE
Very recently, [21] proposed the Simple Isolation score using Nearest Neighbor Ensemble (SiNNE in short) measure, which is derived from the Isolation using Nearest Neighbor Ensembles (iNNE in short) method for anomaly detection [28]. SiNNE constructs an ensemble of t models (M_1, M_2, ..., M_t). Each model M_i is built from a sub-sample D_i and has ψ hyperspheres, where the radius of the hypersphere centred at a (a ∈ D_i) is the Euclidean distance between a and its nearest neighbor in D_i.
The outlying score of q in model M_i is I(q | M_i) = 0 if q falls in any of the hyperspheres and 1 otherwise. The final outlying score of q using t models is:

$$ SiNNE(q) = \frac{1}{t}\sum_{i=1}^{t} I(q \mid M_i) $$

In their work, they argue that Z-score normalization is biased towards subspaces having high density variance, and that the definition of dimensionality unbiasedness needs to be revised. Furthermore, SiNNE is computationally faster than density-based and distance-based measures.
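A minimal sketch of the SiNNE score in a given subspace is shown below. It follows the description above (ψ hyperspheres per model, radius equal to the nearest-neighbour distance within the sub-sample, score 1 when the query is not covered) but is not the authors' Java implementation; the parameter values, the function name, and the toy data are assumptions.

```python
import numpy as np

def sinne_score(q, X, subspace, t=50, psi=8, seed=0):
    """Fraction of the t models in which q falls outside every hypersphere."""
    rng = np.random.default_rng(seed)
    Z = np.asarray(X, dtype=float)[:, list(subspace)]
    zq = np.asarray(q, dtype=float)[list(subspace)]
    outside = 0
    for _ in range(t):
        D = Z[rng.choice(len(Z), size=psi, replace=False)]   # sub-sample D_i
        d = np.linalg.norm(D[:, None, :] - D[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)
        radius = d.min(axis=1)            # distance of each centre to its nearest neighbour
        covered = (np.linalg.norm(D - zq, axis=1) <= radius).any()
        outside += 0 if covered else 1
    return outside / t

# Toy data (assumption): the query is ordinary in feature 0 and extreme in feature 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
q = np.array([0.0, 20.0])
score_f0 = sinne_score(q, X, (0,))
score_f1 = sinne_score(q, X, (1,))
```

The score is bounded in [0, 1] in every subspace, so scores from subspaces of different dimensionality can be compared without further normalization.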

Datasets
In this study, we used 16 publicly available benchmarking medical datasets for anomaly detection: BreastW and Pima are from [33]; Annthyroid, Cardiotocography, Heart disease, Hepatitis, WDBC, and WPBC are from [34]; and Arrhythmia, Lympho, Mammography, Musk, Thyroid, Vertebral, WBC, and Yeast are from [35]. The summary of each data set is provided in Table 1.

Algorithm implementation and parameters
We used the PyOD [36] Python library to implement the anomaly detection algorithms. For the OAM algorithms, we used the Java implementations of sGrid and SiNNE made available by the authors of [23] and [21], respectively. We implemented RBeam and Beam in Java using WEKA [37].
We used the default parameters of each algorithm as suggested in respective papers unless specified otherwise.

Evaluation measure
We used the area under the ROC curve (AUC) [39] and precision at n (P@n) [40] as measures of effectiveness for the anomaly ranking produced by an anomaly detector. A high AUC indicates better detection accuracy, whereas a low AUC indicates low detection accuracy. Samariya and Ma [20] proposed a new kernel mean embedding-based evaluation measure for the outlying aspect mining domain. The intuition behind this evaluation measure is that, in the most outlying aspects, a query q is far from the distribution of the data in those aspects.
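For concreteness, the two ranking measures can be computed as in the sketch below. It is an illustration, not the evaluation code used in this paper: AUC is computed as the fraction of anomaly/normal pairs that are ranked correctly (ties count half), and P@n assumes that n defaults to the number of true anomalies. The scores and labels are made-up toy values.

```python
import numpy as np

def auc(scores, labels):
    """Probability that a random anomaly is scored above a random normal instance."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def precision_at_n(scores, labels, n=None):
    """Fraction of true anomalies among the n highest-scored instances."""
    labels = np.asarray(labels, dtype=bool)
    if n is None:
        n = int(labels.sum())   # assumption: n = number of true anomalies
    top = np.argsort(-np.asarray(scores, dtype=float))[:n]
    return float(labels[top].mean())

# Toy example (assumption): five instances, two of which are true anomalies.
scores = [0.9, 0.8, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0]
auc_value = auc(scores, labels)              # 5 of 6 pairs ranked correctly
p_at_n = precision_at_n(scores, labels)      # 1 of the top 2 is a true anomaly
```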

Definition 3 The quality of the discovered aspect (or subspace) S for a query q is computed as

$$ \frac{1}{n}\sum_{x \in X} K_S(q, x) \qquad (1) $$

where K_S(q, x) is a kernel similarity between q and x in subspace S.
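A kernel-similarity-based quality value of this kind can be sketched as follows. This is an assumption-laden illustration rather than the evaluation code of [20]: it uses a Gaussian kernel with a hypothetical bandwidth parameter `gamma` and averages the similarity between the query and every data instance in the subspace.

```python
import numpy as np

def subspace_quality(q, X, subspace, gamma=1.0):
    """Mean Gaussian-kernel similarity between q and all instances in the subspace."""
    Z = np.asarray(X, dtype=float)[:, list(subspace)]
    zq = np.asarray(q, dtype=float)[list(subspace)]
    sq_dist = ((Z - zq) ** 2).sum(axis=1)
    return float(np.exp(-gamma * sq_dist).mean())

# Toy data (assumption): the query is ordinary in feature 0 and extreme in feature 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
q = np.array([0.0, 10.0])
quality_f0 = subspace_quality(q, X, (0,))
quality_f1 = subspace_quality(q, X, (1,))
```

Under this sketch, a smaller value means the query is farther from the data distribution in that subspace, so feature 1 would be judged the better explanation for this query.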
All experiments were conducted on a machine with an Intel 8-core i9 CPU and 16 GB main memory, running macOS Big Sur version 11.1. We ran each job on multiple single CPU threads using GNU parallel [42].

Empirical evaluation
In this section, we present the results of four anomaly detection methods (LOF, iForest, Sp, and iNNE) and four outlying scoring measures (Kernel Density Rank (RBeam), Density Z-score (Beam), sGrid Z-score (sGBeam), and SiNNE (SiBeam)) using Beam search on medical datasets. All experiments were run for 1 h; unfinished tasks were killed and are reported as '‡'.

Experiment-1: Performance of anomaly detection algorithms
In this sub-section, we present the results of four anomaly detection techniques (LOF, iForest, Sp, and iNNE) in terms of AUC.
The AUC comparison of LOF, iForest, Sp, and iNNE is presented in Table 2 (cf. columns 2 to 5 of Table 2). It is interesting to note that no single anomaly detection algorithm performs best on every dataset. However, iForest is the best-performing measure, with the best AUC on 10 datasets. In the last row of Table 2, the average AUC of each anomaly detection method shows that iForest produced the best AUC while Sp had a significantly lower AUC. LOF and iNNE produce comparable results.
The total runtime, which includes pre-processing, model building, ranking n instances, and computing AUC, is presented in Table 2 (cf. columns 6 to 9 of Table 2). Overall, Sp is the fastest measure, while iForest and iNNE take almost the same time.

Experiment-2: Performance of outlying aspect mining algorithms
We first use the iForest anomaly detection method on each data set to detect the top k = 10 anomalies, which are then used as queries. Each scoring measure identifies outlying aspects for each anomaly (query). We measure the quality of the discovered subspaces using Eq. 1.
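The two-step procedure (detect the top-k anomalies, then search for each query's outlying aspect) can be sketched end to end. The code below is a simplified illustration and not the Java/WEKA implementation used in the experiments: the detector is a stand-in based on nearest-neighbour distance rather than iForest, the scoring function is a toy Z-score of nearest-neighbour distances rather than any of the four measures compared here, and the beam search keeps the `width` best subspaces per level. All function names and parameter values are hypothetical.

```python
from itertools import combinations
import numpy as np

def nn_distances(Z):
    """Distance from every row of Z to its nearest other row."""
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return d.min(axis=1)

def make_score(X, q_index):
    """Toy outlyingness: Z-score of the query's nearest-neighbour distance."""
    def score(subspace):
        nn = nn_distances(X[:, list(subspace)])
        return float((nn[q_index] - nn.mean()) / nn.std())
    return score

def beam_search(score, d, max_dim=3, width=5):
    """Return the highest-scoring subspace found (higher score = more outlying)."""
    scored = {(i,): score((i,)) for i in range(d)}               # stage 1: all 1-D
    level = {s: score(s) for s in combinations(range(d), 2)}     # stage 2: all 2-D
    scored.update(level)
    for _ in range(3, max_dim + 1):                              # stage 3: beam levels
        beam = sorted(level, key=level.get, reverse=True)[:width]
        level = {}
        for s in beam:
            for i in range(d):
                cand = tuple(sorted(s + (i,)))
                if i not in s and cand not in level:
                    level[cand] = score(cand)
        if not level:
            break
        scored.update(level)
    return max(scored, key=scored.get)

def explain_top_k(X, k=1, max_dim=3, width=5):
    """Detect the top-k anomalies, then report one outlying aspect per query."""
    anomaly = nn_distances(X)                 # stand-in detector (assumption)
    queries = np.argsort(-anomaly)[:k]
    return {int(q): beam_search(make_score(X, q), X.shape[1], max_dim, width)
            for q in queries}

# Toy data (assumption): instance 7 is extreme in feature 2 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[7, 2] = 10.0
result = explain_top_k(X, k=1)
```

In this toy run, instance 7 is selected as the query and the reported subspace contains feature 2, the only feature in which it was made extreme.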
Tables 3, 4, 5 and 6 show the subspaces found by the four scoring measures and the quality of the discovered subspaces on 16 real-world medical datasets. RBeam and Beam cannot finish on annthyroid and musk within an hour; thus, these results are reported as '‡'.
Out of 160 queries, SiBeam detects a better subspace for 116 queries, and sGBeam detects a better subspace for only 23 queries. RBeam detects better subspaces for 40 out of 140 queries, and Beam for only 6 queries. Overall, SiBeam is the best-performing measure. RBeam is slow; however, it performs better than the Z-score-based measures. As mentioned in [20,21], Z-score-based measures are biased towards subspaces having high variance. Thus, both Z-score-based measures perform worst in this comparison.
Next, we visually present the subspaces discovered by the different scoring measures for three queries from each data set. Note that each one-dimensional subspace is plotted using a histogram with 10 equal-width bins.

Table 8 Visualization of discovered subspaces by RBeam, Beam, sGBeam, and SiBeam in the arrhythmia data set
By visually comparing the subspaces discovered by each measure, out of 48 queries (3 from each data set), SiBeam and sGBeam detect better subspaces for 39 and 18 queries, respectively. In contrast, RBeam and Beam detect better subspaces for 29 and 11 out of 42 queries, respectively. Overall, visual inspection suggests that SiBeam performs best or comparably to RBeam, Beam, and sGBeam.

Conclusion
This paper shows an interesting application of OAM in the healthcare domain. We first introduced four anomaly detection algorithms and four outlying aspect mining algorithms. Then, we presented a framework that not only detects anomalies but also explains why a given query is an anomaly by providing the set of features in which it is most outlying compared to others. Our evaluation on 16 medical datasets shows that iForest is the best-performing anomaly detection measure. Furthermore, our experiment on the task of anomaly explanation (outlying aspect mining) shows that the recently developed isolation-based outlying scoring measure SiNNE outperforms other state-of-the-art outlying aspect mining scoring measures. In the medical domain, it is essential to have a fast algorithm; thus, kernel density or Z-score-based scoring measures are not suitable when the data set is huge.

Fig. 1 Outlying aspects of Patient A on different features. The square point represents Patient A

Table 9 Visualization of discovered subspaces by RBeam, Beam, sGBeam, and SiBeam in the breastw data set
Table 10 Visualization of discovered subspaces by RBeam, Beam, sGBeam, and SiBeam in the cardiotocography data set
Table 11 Visualization of discovered subspaces by RBeam, Beam, sGBeam, and SiBeam in the diabetes data set
Table 13 Visualization of discovered subspaces by RBeam, Beam, sGBeam, and SiBeam in the hepatitis data set
Table 14 Visualization of discovered subspaces by RBeam, Beam, sGBeam, and SiBeam in the lympho data set
Table 15 Visualization of discovered subspaces by RBeam, Beam, sGBeam, and SiBeam in the mammography data set
Table 17 Visualization of discovered subspaces by RBeam, Beam, sGBeam, and SiBeam in the pima data set
Table 18 Visualization of discovered subspaces by RBeam, Beam, sGBeam, and SiBeam in the thyroid data set
Table 19 Visualization of discovered subspaces by RBeam, Beam, sGBeam, and SiBeam in the vertebral data set
Table 20 Visualization of discovered subspaces by RBeam, Beam, sGBeam, and SiBeam in the wbc data set

Samariya et al. Health Information Science and Systems (2023) 11:20

Table 7 Visualization of discovered subspaces by RBeam, Beam, sGBeam, and SiBeam in the annthyroid data set