Anytime algorithm for frequent pattern outlier detection
Abstract
Outlier detection consists in detecting anomalous observations from data. During the past decade, outlier detection methods have been proposed using the concept of frequent patterns. Basically, such methods require mining all frequent patterns in order to compute the outlier factor of each transaction. Despite recent progress in the pattern mining field, this approach remains too expensive to provide results within a short response time of only a few seconds. In this paper, we provide the first anytime method for calculating the frequent pattern outlier factor (FPOF). This method, which can be interrupted at any time by the end-user, accurately approximates the FPOF by mining a sample of patterns. It also computes the maximum error on the estimated FPOF to help the user stop the process at the right time. Experiments show the interest of this method for very large datasets where exhaustive mining fails to provide good approximate solutions. The accuracy of our anytime approximate method outperforms the baseline approach for the same budget in number of patterns.
Keywords: Pattern mining · Outlier detection · Pattern sampling

1 Introduction
Outlier detection consists in detecting anomalous observations from data [17]. The outlier detection problem has important applications, such as the detection of credit card fraud or network intrusions. During the past decade, outlier detection methods were proposed for categorical data [2, 7, 9, 18, 20, 27, 28]. The general principle is to build a model that reflects the majority of the dataset and to judge as outliers all data observations that deviate from this model. Some of these approaches use the concept of frequent patterns [18, 20, 27] for building the model. Their key idea is to consider the number of frequent patterns supported by each data observation. A data observation is unlikely to be an outlier if it supports many frequent patterns, since frequent patterns correspond to the “common features” of the dataset. Frequent pattern outlier detection methods first extract all frequent itemsets from the data and then assign an outlier score to each data observation based on the frequent itemsets it contains. These outlier detection methods follow the schema of pattern-based two-step methods.
1. Threshold issue: The completeness of the first step requires adjusting thresholds, which is recognized as being very difficult. Typically, if the minimal support threshold is too low, the extraction becomes unfeasible; if it is too high, some essential patterns are missed.
2. Accuracy issue: Completeness leads to huge pattern volumes without guaranteeing that no important pattern is missed. For a smaller budget (in time or number of patterns), we claim that non-exhaustive methods can produce collections of patterns better adapted to the task of the second step. Interestingly, a non-exhaustive method can even guarantee a certain quality on the second step.
3. Runtime issue: The exhaustive mining of all patterns requires exploring the search space in a fashion that extracts either very general patterns first (breadth-first search) or patterns very similar to each other (depth-first search). To obtain patterns regularly covering the search space, it is necessary to wait for the end of the extraction step before starting the model construction. As this first step is very time-consuming, it prevents the user from having an immediate answer.

Accuracy The result of our sampling-based anytime algorithm converges to the exact FPOF when time tends to infinity. In particular, the Kendall’s tau, which evaluates the similarity between the rankings induced by the approximate and exact FPOF, increases rapidly and smoothly with the pattern budget.

Certainty The estimated error stemming from Bennett’s inequality is relatively close to the true error. The end-user therefore has an objective interestingness measure to help stop the algorithm at the right time.

Stability Even if the proposed algorithm is non-deterministic, the variability (evaluated by the standard deviations of the accuracy measures) decreases with sample size and time. This means that multiple executions give approximately the same answer.
2 Related work
2.1 Pattern-based outlier detection
Outlier detection methods are primarily based on the construction of a model that describes the majority of data observations. A new data observation is then considered abnormal when it strongly deviates from this model. In this paper, we mainly focus on the outlier detection methods dedicated to categorical data. A broader view of outlier detection is provided by surveys including [17]. Different frameworks are dedicated to categorical data for the construction of the model, including the Minimum Description Length framework [2], the probability framework (using Hidden Markov Models (HMM) [7], joint probabilities [9] or a random walk on attributes [28]) and the pattern-based framework [18, 20, 27]. Pattern-based methods benefit from the progress in pattern mining made over the past two decades. The key idea is that, as frequent patterns reflect the distribution of the dataset, they form a representative model of the dataset. Such methods remain efficient for high-dimensional spaces, unlike other methods dedicated to categorical data.
The first pattern-based approach [18] introduced the FPOF, which exploits the complete collection of frequent itemsets (while [27] uses an opposite approach by considering non-frequent itemsets). More recently, [20] replaced the collection of frequent itemsets by the condensed representation of Non-Derivable Itemsets (NDI), which is more compact and less expensive to mine. We go further by showing that the FPOF proposed in [18] can be approximated efficiently by extracting a small sample of patterns.
This paper builds on the FPOF, which remains a popular outlier detection factor despite its known limits. Unlike other methods, it does not exploit the data structure that is often used to improve the detection of abnormal data: an organization as attribute-value pairs [2, 9, 28] or, more originally, sequentiality [7]. Moreover, recent experiments [28] have shown that the FPOF is not well-suited for identifying abnormal data when data are noisy or attributes have very different distributions. Finally, the main flaw of the FPOF, already discussed in the introduction, is its computational cost. In addition, it is necessary to wait until the end of the execution to know which observations are outliers. Note that even non-pattern-based outlier detection methods that are polynomial in the dataset size suffer from the same drawbacks. By offering an anytime algorithm, our proposal gives a first result within a short response time and, given enough time, converges to a result as good as the original FPOF method would give.
2.2 Pattern sampling
Previous methods for pattern-based outlier detection enumerate exhaustively all patterns satisfying a given selection predicate, called a constraint [24] (e.g., minimal frequency). As mentioned in the introduction, it is recognized that constraint-based pattern mining leads to threshold and runtime issues which are sometimes a severe bottleneck. Recently, there has been a resurgence of non-exhaustive methods in pattern mining [12] through pattern sampling [6, 8]. Pattern sampling aims at accessing the pattern space \({\mathcal L} \) by an efficient sampling procedure simulating a distribution \(\pi : {\mathcal L} \rightarrow [0,1]\) that is defined with respect to some interestingness measure m: \(\pi (.) = m(.) / Z\), where Z is a normalizing constant (the formal framework and algorithms are detailed in [6]). In this way, the user has fast and direct access to the entire pattern language with no parameter (except possibly the sample size). Pattern sampling has been introduced to facilitate interactive data exploration [31]. Like constraint-based pattern mining, the pattern sampling problem has been declined for different languages such as itemsets [6] and graphs [15], and for different interestingness measures including support [6, 15], area [6, 26], discriminative measures [6, 15] and utility measures [6, 25, 26].
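The two-step random procedure of [6] for drawing itemsets in proportion to their support can be sketched as follows. This is a minimal Python illustration under our own assumptions (dataset layout and function names are ours, not the authors' implementation):

```python
import random

def sample_pattern(db, rng):
    """Draw one itemset with probability proportional to its support,
    following the two-step procedure of Boley et al. [6]:
    (1) pick a transaction t with probability proportional to 2^|t|,
    (2) return a uniformly chosen subset of t."""
    weights = [2 ** len(t) for t in db]
    t = rng.choices(db, weights=weights, k=1)[0]
    # A uniform subset of t: keep each item independently with probability 1/2.
    return frozenset(i for i in t if rng.random() < 0.5)

# Toy dataset D from Table 1: three transactions AB and one transaction C.
db = [frozenset("AB"), frozenset("AB"), frozenset("AB"), frozenset("C")]
rng = random.Random(0)
draws = [sample_pattern(db, rng) for _ in range(20000)]
# AB (support 3) should be drawn roughly three times as often as C (support 1).
```

Summing \(P(t) \cdot 2^{-|t|}\) over the transactions t containing an itemset X yields a probability proportional to \(supp(X)\), which is why no explicit support computation is needed.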
To the best of our knowledge, there are only two proposals benefiting from pattern sampling to instantly build pattern-based global models: representative sets of patterns [8] and tiling [26]. In this paper, we investigate the use of pattern sampling for assigning an outlier score to each transaction (a kind of model). But we go further by refining this model over time to finally tend to the exact model. With a lower (pattern or time) budget than that of an exhaustive method, we obtain a higher quality with a bounded error.
2.3 Anytime algorithms for pattern mining
Introduced in the field of real-time system design [4, 32], anytime algorithms have more recently been used in the field of data mining [3, 5, 11, 16, 23], and more specifically for pattern mining [30]. Most of the time, anytime algorithms have been used to build global models (e.g., classifiers, clusterings or rankings) when the computation time required to obtain a first model is very long. One approach to building global models using anytime algorithms is to enumerate the set of all possible solutions and to keep, at any time, the best solution found so far, i.e., the best global model. For example, using depth-first-search-based algorithms, this approach has been used to build Bayesian networks [23], or to extract groups with maximum coverage from spatiotemporal data of mobile users [30]. Another approach is to first compute a rough solution and then refine it over time. For example, this approach is used in [5] to build an anytime density-based clustering algorithm and in [16] to provide high-quality subspace clusterings of data streams. This approach is also used in this paper to extract outliers. Indeed, using pattern sampling, our algorithm refines the FPOF of transactions over time.
To the best of our knowledge, only the work in [3] addresses the problem of outlier detection using anytime algorithms. In [3], the authors propose an anytime algorithm to determine within any period of time whether an object in a data stream is anomalous or not. The more time is available, the more reliable the predictions are. Compared to this work, we do not propose an algorithm to detect outliers in data streams, but in very large datasets. However, we have the same property, meaning that the accuracy of our predictions (a transaction is an outlier or not) increases with time. Finally, to the best of our knowledge, only the work in [30] uses anytime algorithms for pattern mining. Nevertheless, it solves a very different problem from ours, i.e., finding groups of users with maximum coverage in the context of spatiotemporal data mining.
3 Frequent-pattern-based outlier detection
3.1 Basic definitions
Table 1: Three toy datasets with slight variations

Trans. | Items
\({\mathcal D} \)
\(t_1\) | A B
\(t_2\) | A B
\(t_3\) | A B
\(t_4\) | C
\({\mathcal D} '\)
\(t_1\) | A B
\(t_2\) | A B
\(t_3\) | A B
\(t_4\) | C
\(t_5\) | \(\mathbf{A}\) \(\mathbf{B}\)
\({\mathcal D} ''\)
\(t_1\) | A B \(\mathbf{D}\)
\(t_2\) | A B \(\mathbf{D}\)
\(t_3\) | A B \(\mathbf{D}\)
\(t_4\) | C
3.2 Frequent pattern outlier factor
Intuitively, a transaction is more representative when it contains many patterns that are very frequent within the dataset. In contrast, an outlier contains only a few patterns, and these patterns are not very frequent. The FPOF [18] formalizes this intuition:
Definition 1
The range of \({ fpof}\) is [0, 1], where 1 means that the transaction is the most representative transaction of the dataset, while a value near 0 means that the transaction is an outlier. Other normalizations (denominator) are possible, such as \({Supp}({{\mathcal L}},{{\mathcal D}}) \) or \(\sum _{t \in {\mathcal D}} {Supp}({2^t},{{\mathcal D}}) \). Whatever the normalization method, two transactions remain ordered in the same way (so it does not affect the Kendall’s tau that we use to evaluate our method). Under a certain Markov model, the score \({ fpof} (t, {\mathcal D})\) is also the proportion of time that an analyst would dedicate to studying the transaction t considering the collection of frequent itemsets [13].
In the first dataset provided by Table 1, \(t_1\) is covered by \(\emptyset \) (\(supp (\emptyset ,{\mathcal D}) = 1\)) and by A, B and AB, whose supports equal 0.75 (\({Supp}({\{\emptyset , A, B, AB\}},{{\mathcal D}}) = 3.25\)), while \(t_4\) is only covered by \(\emptyset \) and C (\({Supp}({\{\emptyset , C\}},{{\mathcal D}}) = 1.25\)). Consequently, \({ fpof} (t_1, {\mathcal D}) = 3.25 / 3.25\) and \({ fpof} (t_4, {\mathcal D}) = 1.25 / 3.25\). In this example, \(t_4\) appears to be an outlier. It is easy to see that increasing the frequency of the patterns covering the first transactions (e.g., dataset \({\mathcal D} '\)) decreases the FPOF of \(t_4\). Similarly, increasing the number of patterns covering the first transactions also decreases the FPOF of \(t_4\) (e.g., dataset \({\mathcal D} ''\)).
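The computation in this example can be replayed with a small exhaustive sketch (illustrative Python; the helper names are ours, and the normalization by the best-covered transaction follows the discussion above):

```python
from itertools import combinations

def supp(pattern, db):
    """Relative support of an itemset; the empty set has support 1."""
    return sum(1 for t in db if pattern <= t) / len(db)

def fpof(t, db):
    """Exhaustive FPOF: summed support of every subset of t, normalised
    by the best-covered transaction of the dataset."""
    def total(u):
        return sum(supp(frozenset(c), db)
                   for n in range(len(u) + 1)
                   for c in combinations(u, n))
    return total(t) / max(total(u) for u in db)

db = [frozenset("AB"), frozenset("AB"), frozenset("AB"), frozenset("C")]
fpof(frozenset("AB"), db)  # Supp = 1 + 3 * 0.75 = 3.25, so 3.25/3.25 = 1.0
fpof(frozenset("C"), db)   # Supp = 1 + 0.25 = 1.25, so 1.25/3.25 ≈ 0.38
```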
4 Problem formulation
4.1 Exact FPOF computation problem
Given a dataset \({\mathcal D}\), the outlier detection problem consists in computing the FPOF for each transaction \(t \in {\mathcal D} \). In practice, this exact calculation of the FPOF was performed by mining all patterns appearing at least once in the dataset (i.e., with \(\sigma = 1 / |{\mathcal D}| \)) [18]. Of course, this expensive task is not possible for very large datasets. Recently, it has been demonstrated that the FPOF can be reformulated in order to calculate the exact FPOF in polynomial time [14].
Time comparison of two exact methods for calculating FPOF
\({\mathcal D} \)  Exhaustive time (s)  Non-enumerative time (s)

chess  439.5  1.1 
connect  748.5  577.7 
mushroom  0.4  5.9 
pumsb  Time out  1,970.5 
retail  8.7  5,969.9 
sick  0.8  0.5 
Table 2 reports the running time required for calculating the exact FPOF using the classical exhaustive method [18] and the non-enumerative method [14] (respectively, the 2nd and the 3rd columns), based on the experimental setting described in Sect. 6. Note that the exact exhaustive method (used as baseline) benefits from LCM, which is one of the most recognized frequent itemset mining algorithms. The non-enumerative method is effective and rivals the exact exhaustive one. Its main advantage is to calculate the exact FPOF on datasets where the exact exhaustive method fails (e.g., pumsb, where the execution was aborted after 5 h).
However, even with a polynomial method, Table 2 shows that the exact calculation remains timeconsuming. It is clear that the exact FPOF calculation cannot be guaranteed in a short response time. Thus, it makes sense to propose approximate algorithms for the FPOF computation.
4.2 Approximate FPOF computation problem
Let us focus on a classical approach used in the literature to approximate the FPOF. Instead of using the complete collection of patterns, the FPOF is usually approximated with a collection of frequent patterns, i.e., with a higher minimal support threshold:
Definition 2
4.3 Anytime FPOF computation problem
Figure 1 shows that the Kendall’s tau varies significantly depending on the dataset for the same minimal support threshold. It means that this threshold is not easy to fix to obtain a good compromise between efficiency and quality. This clearly hinders user interactivity. Therefore, it seems interesting to rephrase the approximate FPOF problem by opting for an anytime perspective. In this context, the method informs the user with feedback on the maximum error of the current approximate FPOF. Then, the user chooses the right time to stop the method.

\(|{ fpof} (t, {\mathcal D}) - \widetilde{{ fpof} _k}(t, {\mathcal D})| \le \epsilon _k\) for each transaction \(t \in {\mathcal D} \), with confidence \(1 - \delta \) and

\(\epsilon _{k + 1} \le \epsilon _k\) where \(\lim _{k \rightarrow + \infty }\epsilon _k = 0\).
5 Anytime sampling method
This section addresses the above problem by using pattern sampling. First, we propose a method for approximating FPOF from a pattern sample drawn according to frequency. Then we show how to estimate the error of this approximation. Finally, we detail our samplingbased anytime algorithm.
5.1 Pattern sampling for FPOF
In Sect. 4, we showed that using only the most frequent patterns is insufficient to accurately approximate the FPOF. The most frequent patterns do not measure the singularity of each transaction, which also relies on more specific patterns (whose frequency varies from small to average). Conversely, not considering frequent patterns at all would also be a mistake because they contribute significantly to the FPOF. A reasonable approach is to select patterns randomly with a probability proportional to their weight in the calculation of the FPOF. Typically, in the dataset \({\mathcal D}\) of Table 1, the itemset AB is 3 times more important than the itemset C in the calculation of the FPOF due to their frequencies.
In recent years, pattern sampling techniques have been proposed to randomly draw patterns in proportion to their frequency [6]. Such approaches are ideal for bringing us a well-adapted collection of patterns. Of course, the non-trivial task of approximating the FPOF from this collection remains. This is what the following definition provides:
Definition 3
It is important to note that the number of occurrences \(|{{\mathcal S}_{k} ({\mathcal D})}\,\triangleright \,{2^t}|\) is used here instead of \({Supp}({2^t},{{\mathcal D}}) \) as done in Definition 1. As the sampling technique already takes the frequency into account when it draws patterns, it is not necessary to involve the support here. Indeed, the draw is with replacement for the correct approximation of the FPOF (without this replacement, the most frequent patterns would be disadvantaged). It follows that the same pattern can have multiple occurrences within the sample \({\mathcal S}_{k} ({\mathcal D})\).
For the same sample size k and the same transaction t, different values of the k-sampling FPOF can be obtained depending on \({\mathcal S}_{k} ({\mathcal D})\). But the larger the sample size k, the smaller the difference between the values stemming from two samples. Furthermore, the greater the sample size k, the better the approximation:
Property 1
(Convergence) Given a dataset \({\mathcal D}\), the k-sampling FPOF converges to the FPOF for each transaction \(t \in {\mathcal D} \).
Proof
\({\mathcal S}_{k} ({\mathcal D}) \sim supp ({\mathcal L}, {\mathcal D})\) means that there exists a constant \(\alpha > 0\) such that \(\forall X \in {\mathcal L} \), \(\lim _{k \rightarrow \infty } |{{\mathcal S}_{k} ({\mathcal D})}\,\triangleright \,{\{X\}}| / k = \alpha \, supp (X, {\mathcal D})\). Then, for each transaction t, we obtain that: \(\lim _{k \rightarrow \infty } |{{\mathcal S}_{k} ({\mathcal D})}\,\triangleright \,{2^t}| / k = \alpha \sum _{X \in 2^t} supp (X, {\mathcal D}) = \alpha \, {Supp}({2^t},{{\mathcal D}}) \). By injecting this result into Definition 3, we conclude that Property 1 holds. \(\square \)
Beyond convergence, the interest of this approach is its speed of convergence, far superior to that of the \(\sigma \)-exhaustive frequent pattern outlier factor, as shown in the experimental study (see Sect. 6). This speed comes with good efficiency due to the reasonable complexity of pattern sampling:
Property 2
(Complexity) A k-sampling FPOF of all transactions can be calculated in time \(O(k \times |{\mathcal I}| \times |{\mathcal D}|)\).
Proof
Pattern sampling according to frequency is performed in time \(O(|{\mathcal I}| \times |{\mathcal D}| + k (|{\mathcal I}| + \ln |{\mathcal D}|))\) [6], and the FPOF calculation for all transactions consists in finding the transactions containing each sampled pattern. Thus, it is calculated in time \(O(k \times |{\mathcal I}| \times |{\mathcal D}|)\). \(\square \)
Given a number of patterns k (the allocated pattern budget), a k-sampling FPOF is therefore effective for calculating an accurate approximation. The next section goes further by ensuring the certainty of this approximation.
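Putting the pieces together, the k-sampling FPOF can be sketched as follows. This is an illustrative Python reading of the section, under our own assumptions: the sampler is the two-step procedure of [6], the function names are ours, and the normalisation by the best-covered transaction mirrors Definition 1.

```python
import random

def sample_pattern(db, rng):
    # Two-step draw proportional to support [6]:
    # pick t with probability proportional to 2^|t|, then a uniform subset.
    t = rng.choices(db, weights=[2 ** len(u) for u in db], k=1)[0]
    return frozenset(i for i in t if rng.random() < 0.5)

def sampling_fpof(db, k, seed=0):
    """k-sampling FPOF: count how many of the k sampled patterns each
    transaction contains, then normalise by the best-covered transaction."""
    rng = random.Random(seed)
    sample = [sample_pattern(db, rng) for _ in range(k)]
    counts = [sum(1 for p in sample if p <= t) for t in db]
    best = max(counts)
    return [c / best for c in counts]

db = [frozenset("AB"), frozenset("AB"), frozenset("AB"), frozenset("C")]
scores = sampling_fpof(db, k=20000)
# scores[3] (transaction t4) should approach the exact 1.25/3.25 ≈ 0.38.
```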
5.2 Bounding the error
This section shows how to provide feedback to help the user decide when to stop the algorithm. The idea is to draw a sample and to bound the maximum error of the FPOF using a statistical result known as Bennett’s inequality. This maximum error is provided to the end-user given an initial confidence. If he/she judges that the quality is sufficiently good, he/she interrupts the algorithm, which returns an approximate FPOF based on the current sample. Otherwise, the sampling FPOF is refined by increasing the sample size, and so on.
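As an illustration of how such a bound shrinks, the sketch below inverts Bennett's inequality numerically. This is not the paper's Property 3 (whose exact form is not reproduced here); we plug in the worst-case variance 0.25 of a [0, 1]-bounded draw and a 95% confidence purely for illustration:

```python
import math

def bennett_tail(k, var, b, eps):
    """Bennett's inequality: bound on P(|empirical mean - mu| >= eps)
    for k i.i.d. draws bounded by b with variance var."""
    u = b * eps / var
    h = (1 + u) * math.log(1 + u) - u
    return 2 * math.exp(-k * var / b ** 2 * h)

def max_error(k, var=0.25, b=1.0, delta=0.05):
    """Smallest eps whose tail bound falls below delta, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if bennett_tail(k, var, b, mid) <= delta:
            hi = mid
        else:
            lo = mid
    return hi

# The reported maximum error shrinks as the pattern budget k grows:
# max_error(100) > max_error(1000) > max_error(10000)
```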
Property 3
Proof
Property 4
Proof
This property is a direct corollary of Property 3. For each pair of transactions t and \(t'\), we are sure that the ranking of the approximate method is correct when the lower bound of one transaction is higher than the upper bound of the other. Property 3 provides these bounds. \(\square \)
Property 4 enables us to bound the true Kendall’s tau of our approach. Unfortunately, it is not possible to estimate similar bounds for evaluation metrics that rely on the ground truth, because this ground truth is obviously not known in advance by the approximate approach. For instance, it is impossible to estimate the false alarm rate or the detection rate, as these measures require knowing the true outliers. An outlier threshold \(\alpha \) is used to define these true outliers in the experimental section (see Sect. 6.2).
Properties 3 and 4 provide bounds which are used in the algorithm of the next section.
5.3 Anytime algorithm
Algorithm 1 returns, at any time, an approximate FPOF of all transactions of the dataset \({\mathcal D}\), guaranteeing a bounded error with confidence \(1 - \delta \). Basically, the main loop (lines 2–9) is iterated until the user interrupts the process (line 9). Lines 4–7 calculate the maximal error \(\tilde{\epsilon }\) using Property 3, and line 8 prints the current approximated bounds described in the previous section as feedback to help the user. When the user interrupts the process, line 10 returns the k-sampling FPOF with the current sample \({\mathcal S}_{}\). Otherwise, one more pattern is drawn (line 3), and so on.
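The structure of the algorithm can be sketched as follows (illustrative Python under our assumptions: a feedback hook stands in for the error and bound computations of lines 4–8, and a fixed budget stands in for the user interrupt):

```python
import random

def anytime_fpof(db, budget, feedback=None, seed=0):
    """One pattern is drawn per iteration of the main loop; the
    per-transaction counts are updated incrementally, and a feedback
    hook reports progress so the user can decide when to stop."""
    rng = random.Random(seed)
    weights = [2 ** len(t) for t in db]  # two-step sampling weights [6]
    counts = [0] * len(db)
    for k in range(1, budget + 1):
        # Draw one more pattern proportionally to its support.
        t = rng.choices(db, weights=weights, k=1)[0]
        pattern = frozenset(i for i in t if rng.random() < 0.5)
        for j, u in enumerate(db):
            if pattern <= u:
                counts[j] += 1
        if feedback is not None:
            feedback(k, counts)  # stands in for the error-bound report
    best = max(counts) or 1
    return [c / best for c in counts]  # current k-sampling FPOF

db = [frozenset("AB"), frozenset("AB"), frozenset("AB"), frozenset("C")]
scores = anytime_fpof(db, budget=20000)
```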
As desired in Sect. 4.3, Algorithm 1 approximates the FPOF of all transactions for a pattern budget k:
Property 5

\(|{ fpof} (t,{\mathcal D}) - { fpof} _{k} (t, {\mathcal D})| \le \epsilon _k\) for each transaction \(t \in {\mathcal D} \), with a confidence of \(1 - \delta \) and

\(\lim _{k \rightarrow + \infty } \epsilon _k = 0\).
Proof
This property is a direct corollary of Property 3. The proposed bounds justify the above definition of the error \(\epsilon _k\) and ensure that \(|{ fpof} (t,{\mathcal D}) - { fpof} _{k} (t, {\mathcal D})| \le \epsilon _k\) for \(t \in {\mathcal D} \) with confidence \(1 - \delta \). Furthermore, as the bounds are refined when the budget k increases, we obtain \(\lim _{k \rightarrow + \infty } \epsilon _k = 0\). \(\square \)
The next section also provides experiments showing that \(\epsilon _{k + 1} \le \epsilon _{k}\), even though it is not possible to formally prove this result because the empirical variance may increase.
6 Experimental study
The goal of this paper is not to define a new outlier detection factor, but to improve the computation of the FPOF, which is well established. For this reason, we do not provide new experiments showing the interest and the limits of the FPOF for detecting outliers, as this aspect is already detailed in the literature (see related work in Sect. 2). The experiments exclusively focus on the quality of the approximate FPOF provided by our sampling-based anytime algorithm in comparison with the exact FPOF used as reference. The exact FPOF is computed by the polynomial method described in Sect. 4.1.
Performance issue of pattern sampling
\({\mathcal D} \)  \(|{\mathcal D}|\)  \(|{\mathcal I}|\)  Avg. number of patterns per second

chess  3196  75  29.0k 
connect  67,557  129  1.2k 
hepatic  155  45  219.3k 
german  1000  76  78.6k 
mushroom  8124  119  17.9k 
pumsb  49,096  7117  1.7k 
retail  88,162  16,470  1.5k 
sick  2800  58  29.4k 
6.1 Anytime approximation vs the state-of-the-art approximation

Baseline: This method relies on the \(\sigma \)-exhaustive FPOF (see Definition 2), where \(\sigma \) is defined so as to consider the set of top-k frequent patterns.

Sampling-based method: This method draws k patterns according to frequency and then approximates the exact FPOF based on the formula of Definition 3.
Accuracy To assess the speed of convergence, we consider the increase of the Kendall’s tau and the decrease of the true error. As expected, the two approximate methods converge to the exact FPOF when the pattern budget increases, but the convergence of the sampling-based anytime method is smoother and faster. Indeed, while the FPOF error of the baseline may increase when considering more patterns (see german or mushroom in Fig. 3, for instance), for the sampling-based method the higher the pattern budget k, the better the approximation.
In certain datasets, when the pattern budget is small, the baseline is more effective, especially with respect to the Kendall’s tau (e.g., hepatic or german). As it considers the most frequent patterns first (in particular, items), it tends to cover the entire dataset more rapidly. It would be appropriate to propose a hybrid method where items are considered before using sampling.
Certainty Only the sampling-based method provides guarantees on the approximate FPOF computed at any time, helping the end-user to interrupt the algorithm and analyze the result. In Fig. 3, we observe that the lower bound of the Kendall’s tau is quite pessimistic (i.e., it is always much lower than the true Kendall’s tau). Similarly, the true average error per transaction of the approximate method is lower than the estimated one (see Fig. 3). This difference results from the Bennett’s inequality, which makes no assumption about the distribution. It is also interesting to note that for 2 datasets (i.e., chess and connect), the average error per transaction of the baseline is always above the estimated error. It means that using the most frequent itemsets is a worse strategy than random uniform sampling. Conversely, the sampling strategy based on frequency yields better results and, in addition, offers guarantees on the certainty of the approximation.
Stability To measure the stability of the sampling-based anytime method, we consider the confidence intervals of the Kendall’s tau and the average error per transaction in Figs. 2 and 3. Of course, the smaller the confidence interval, the better the result. Although the sampling-based method is not deterministic, the obtained results are really stable. For certain datasets (e.g., german or mushroom), the instability increases in a first phase and then gradually dwindles in a second phase. The first phase is the progressive coverage of all transactions by at least one pattern, which brings instability (the approximate FPOF goes from 0, no approximation, to 1, first rough approximation). In the second phase, the newly drawn patterns refine the preliminary approximations.
6.2 ROC analysis of anytime approximation
Confusion matrix of the four possible outcomes of a prediction

  | Predicted outliers \({ fpof} _{k} (t,{\mathcal D}) \le \beta \) | Predicted normal \({ fpof} _{k} (t,{\mathcal D}) > \beta \)
Outliers \({ fpof} (t,{\mathcal D}) \le \alpha \) | True positive (TP) | False negative (FN)
Normal \({ fpof} (t,{\mathcal D}) > \alpha \) | False positive (FP) | True negative (TN)
For a budget of 10k patterns, Fig. 4 reports the receiver operating characteristic (ROC) curves of the sampling-based method obtained by varying the minimal FPOF threshold \(\beta \) for different ground truths \(\alpha \in \{0.1, 0.2, 0.3\}\). Note that there is no outlier for \(\alpha = 0.1\) in german.
Whatever the choice of the threshold \(\alpha \) that determines the true outliers, the sampling-based approximation works well overall. The method tends to quickly isolate outliers (i.e., the detection rate increases very quickly when the false alarm rate is low). It even isolates the outliers better when they are very few (i.e., with the lowest value of \(\alpha \), here 0.1).
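The ROC construction of this section can be sketched as follows (illustrative Python; the sweep over \(\beta \) and the \(\alpha \)-based ground truth follow the confusion matrix above, but the function itself is our own toy version):

```python
def roc_points(exact_fpof, approx_fpof, alpha):
    """Sweep the decision threshold beta over the approximate scores and
    return (false alarm rate, detection rate) points; the ground truth
    is 'exact FPOF <= alpha'."""
    outlier = [s <= alpha for s in exact_fpof]
    n_out = sum(outlier) or 1
    n_norm = (len(outlier) - sum(outlier)) or 1
    points = []
    for beta in sorted(set(approx_fpof)) + [1.1]:
        tp = sum(1 for o, s in zip(outlier, approx_fpof) if o and s <= beta)
        fp = sum(1 for o, s in zip(outlier, approx_fpof) if not o and s <= beta)
        points.append((fp / n_norm, tp / n_out))
    return points

# With a perfect approximation, the curve jumps straight to (0, 1).
exact = [1.0, 1.0, 1.0, 0.38]
pts = roc_points(exact, exact, alpha=0.4)
```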
7 Conclusion and discussion
We revisited the FPOF calculation under an anytime constraint by benefiting from recent advances in pattern sampling. Our approximate method using a sampling technique outperforms the exhaustive method based on the most frequent patterns. It also provides additional guarantees on the result, with a maximum bound on the error using Bennett’s inequality. The experiments have shown the interest of this approach in terms of accuracy (fast and smooth convergence to the exact FPOF), certainty (reasonable estimated error) and stability (good reproducibility of approximations) compared to the usual exhaustive approach where the most frequent patterns are mined.
Despite the challenge of the anytime constraint, our proposal combines the proven power of pattern-based methods with a guarantee on the quality of results thanks to sampling techniques. Of course, there is still room for improvement; in particular, the approach could take frequent items into account to have a more reliable approximation at the very beginning. But, as the FPOF has disadvantages, it would be interesting to apply this approach to other outlier detection methods dedicated to categorical data. For pattern-based methods, a similar design based on sampling according to frequency can be exploited. For other methods, it is much less natural to determine which space should be sampled to achieve an approximation. However, we also think our sampling-based anytime approach can be generalized to other measures involving patterns (e.g., the CPCQ index [22]) or pattern-based models (e.g., CBA [21]). We would also like to adapt this approach to integrate user feedback. In the case of the FPOF, this consists in showing the transactions considered the most probable outliers to the user at the very beginning of the process. By confirming or not that the shown transactions are outliers, the sampling process could focus its effort on other, less known transactions.
Acknowledgments
This work has been partially supported by the Prefute project, PEPS 2016, CNRS.
References
1. Agrawal, R., Srikant, R., et al.: Fast algorithms for mining association rules. In: International Conference on Very Large Data Bases, vol. 1215, pp. 487–499 (1994)
2. Akoglu, L., Tong, H., Vreeken, J., Faloutsos, C.: Fast and reliable anomaly detection in categorical data. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 415–424. ACM (2012)
3. Assent, I., Kranen, P., Baldauf, C., Seidl, T.: AnyOut: anytime outlier detection on streaming data. In: Proceedings of the 17th International Conference on Database Systems for Advanced Applications, Part I, DASFAA '12, pp. 228–242. Springer, Berlin (2012)
4. Boddy, M., Dean, T.L.: Deliberation scheduling for problem solving in time-constrained environments. Artif. Intell. 67(2), 245–285 (1994)
5. Böhm, C., Feng, J., He, X., Mai, S.T.: Efficient anytime density-based clustering. In: Proceedings of the 13th SIAM International Conference on Data Mining, May 2–4, 2013, Austin, Texas, USA, pp. 112–120. SIAM (2013)
6. Boley, M., Lucchese, C., Paurat, D., Gärtner, T.: Direct local pattern sampling by efficient two-step random procedures. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 582–590 (2011)
7. Cao, L., Ou, Y., Yu, P.S., Wei, G.: Detecting abnormal coupled sequences and sequence changes in group-based manipulative trading behaviors. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 85–94. ACM (2010)
8. Chaoji, V., Hasan, M.A., Salem, S., Besson, J., Zaki, M.J.: ORIGAMI: a novel and effective approach for mining representative orthogonal graph patterns. Stat. Anal. Data Min. 1(2), 67–84 (2008)
9. Das, K., Schneider, J.: Detecting anomalous records in categorical datasets. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 220–229. ACM (2007)
10. Durand, N., Crémilleux, B.: ECCLAT: a new approach of clusters discovery in categorical data. In: Bramer, M., Preece, A., Coenen, F. (eds.) Research and Development in Intelligent Systems XIX, pp. 177–190. Springer, London (2003)
11. Esmeir, S., Markovitch, S.: Anytime learning of anycost classifiers. Mach. Learn. 82(3), 445–473 (2011)
12. Giacometti, A., Li, D.H., Marcel, P., Soulet, A.: 20 years of pattern mining: a bibliometric survey. ACM SIGKDD Explor. Newsl. 15(1), 41–50 (2014)
13. Giacometti, A., Li, D.H., Soulet, A.: Balancing the analysis of frequent patterns. In: Tseng, V.S., Ho, T.B., Zhou, Z.H., Chen, A.L.P., Kao, H.Y. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 53–64. Springer (2014)
14. Giacometti, A., Soulet, A.: Frequent pattern outlier detection without exhaustive mining. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 196–207. Springer (2016)
15. Hasan, M.A., Zaki, M.J.: Output space sampling for graph patterns. PVLDB 2(1), 730–741 (2009)
16. Hassani, M., Kranen, P., Saini, R., Seidl, T.: Subspace anytime stream clustering. In: Proceedings of the 26th International Conference on Scientific and Statistical Database Management, SSDBM '14, pp. 37:1–37:4, New York, NY, USA. ACM (2014)
17. Hawkins, D.M.: Identification of Outliers, vol. 11. Springer, Netherlands (1980)
18. He, Z., Xu, X., Huang, Z.J., Deng, S.: FP-outlier: frequent pattern based outlier detection. Comput. Sci. Inf. Syst. 2(1), 103–118 (2005)
19. Knobbe, A., Crémilleux, B., Fürnkranz, J., Scholz, M.: From local patterns to global models: the LeGo approach to data mining. In: From Local Patterns to Global Models: Proceedings of the ECML PKDD 2008 Workshop, pp. 1–16 (2008)
20. Koufakou, A., Secretan, J., Georgiopoulos, M.: Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data. Knowl. Inf. Syst. 29(3), 697–725 (2011)
21. Liu, B., Hsu, W., Ma, Y.: Integrating classification and association rule mining. In: International Conference on Knowledge Discovery and Data Mining (1998)
22. Liu, Q., Dong, G.: CPCQ: contrast pattern based clustering quality index for categorical data. Pattern Recogn. 45(4), 1739–1748 (2012)
23. Malone, B., Yuan, C.: A depth-first branch and bound algorithm for learning optimal Bayesian networks. In: Revised Selected Papers of the Third International Workshop on Graph Structures for Knowledge Representation and Reasoning, vol. 8323, pp. 111–122. Springer, New York (2014)
24. Mannila, H., Toivonen, H.: Levelwise search and borders of theories in knowledge discovery. Data Min. Knowl. Discov. 1(3), 241–258 (1997)
25. Moens, S., Boley, M.: Instant exceptional model mining using weighted controlled pattern sampling. In: IDA, pp. 203–214 (2014)
26. Moens, S., Boley, M., Goethals, B.: Providing concise database covers instantly by recursive tile sampling. In: International Conference on Discovery Science, pp. 216–227. Springer (2014)
27. Otey, M.E., Ghoting, A., Parthasarathy, S.: Fast distributed outlier detection in mixed-attribute data sets. Data Min. Knowl. Discov. 12(2), 203–228 (2006)
28. Pang, G., Cao, L., Chen, L.: Outlier detection in complex categorical data by modelling the feature value couplings. In: Proceedings of the 25th International Joint Conference on Artificial Intelligence, pp. 1902–1908 (2016)
29. Provost, F., Fawcett, T.: Robust classification for imprecise environments. Mach. Learn. 42(3), 203–231 (2001)
30. Vadlamudi, S.G., Chakrabarti, P.P., Sarkar, S.: Anytime algorithms for mining groups with maximum coverage. In: Proceedings of the Tenth Australasian Data Mining Conference, AusDM '12, vol. 134, pp. 209–219. Australian Computer Society (2012)
31. van Leeuwen, M.: Interactive data exploration using pattern mining. In: Holzinger, A., Jurisica, I. (eds.) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics, pp. 169–182. Springer, Berlin, Heidelberg (2014)
32. Zilberstein, S., Russell, S.: Optimal composition of real-time systems. Artif. Intell. 82(1), 181–213 (1996)