Abstract
Emerging patterns (EPs) are useful knowledge patterns with many applications. In recent studies on bio-medical profiling data, we have successfully used such patterns to solve difficult cancer diagnosis problems and produced higher classification accuracy when compared to alternative methods. However, the discovery of EPs is a challenging and computationally expensive problem.
In this paper, we study how to incrementally modify and maintain the concise boundary descriptions of the space of all emerging patterns when small changes occur to the data. Because EP spaces are convex, maintaining the bounds guarantees that no desired patterns are lost. We introduce algorithms to handle four types of changes: insertion of new data, deletion of old data, addition of new attributes, and deletion of old attributes. We compare these incremental algorithms, on six benchmark data sets, against an efficient algorithm that recomputes the boundaries from scratch. The results show that the incremental algorithms are much faster than the From-Scratch method, often with tremendous speed-up rates.
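To illustrate why bounds are enough to describe a convex pattern space, here is a minimal sketch. It is not the paper's algorithm: it assumes a border is a pair of itemset collections (a left bound of most general patterns and a right bound of most specific patterns), and the names `in_space` and `enumerate_space`, as well as the toy items, are hypothetical. A pattern belongs to the space when it is a superset of some left-bound itemset and a subset of some right-bound itemset.

```python
from itertools import combinations

def in_space(pattern, left, right):
    """Membership test against a border <left, right> (assumed representation).

    The pattern is in the space if it contains some left-bound itemset
    and is contained in some right-bound itemset.
    """
    p = frozenset(pattern)
    return any(l <= p for l in left) and any(p <= r for r in right)

def enumerate_space(left, right):
    """List every itemset covered by the border.

    Exponential in the size of the right-bound itemsets, so this is
    only meant for tiny illustrative examples.
    """
    found = set()
    for r in right:
        items = sorted(r)
        for k in range(len(items) + 1):
            for combo in combinations(items, k):
                if in_space(combo, left, right):
                    found.add(frozenset(combo))
    return found

# Hypothetical toy border over items a, b, c, d.
left = [frozenset({"a"}), frozenset({"b", "c"})]
right = [frozenset({"a", "b", "c"}), frozenset({"a", "d"})]

space = enumerate_space(left, right)
print(sorted(sorted(s) for s in space))
```

Under this representation, four itemsets in the two bounds stand for the whole space, so an incremental method only needs to update the bounds rather than every pattern between them.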
Cite this article
Li, J., Manoukian, T., Dong, G. et al. Incremental Maintenance on the Border of the Space of Emerging Patterns. Data Min Knowl Disc 9, 89–116 (2004). https://doi.org/10.1023/B:DAMI.0000026901.85057.58
DOI: https://doi.org/10.1023/B:DAMI.0000026901.85057.58