Abstract
Most current algorithms for mining frequent patterns assume that two object subdescriptions are similar only when they are equal, but many real-world problems evaluate similarity in other ways. Recently, three algorithms (ObjectMiner, STreeDC-Miner and STreeNDC-Miner) have been proposed for mining frequent patterns with similarity functions other than equality. To search for frequent patterns, ObjectMiner and STreeDC-Miner rely on a pruning property, the Downward Closure property, which the similarity function must satisfy. STreeNDC-Miner was proposed for similarity functions that do not satisfy this property; however, it explores all subsets of features to find frequent patterns, which can be very expensive. In this work, we propose a frequent similar pattern mining algorithm for similarity functions that do not satisfy the Downward Closure property; it is faster than STreeNDC-Miner and loses fewer frequent similar patterns than ObjectMiner and STreeDC-Miner. We also compare, in a supervised classification context, the quality of the set of frequent similar patterns computed by our algorithm with the quality of the sets computed by the other algorithms.
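To make the Downward Closure property concrete, the following minimal sketch checks it on a toy dataset. It is an illustration only, not any of the algorithms named above. The dataset, the `relaxed` similarity function (at least 60% of the features match), the frequency definition (fraction of objects similar to a subdescription) and all function names are assumptions introduced for this example.

```python
from itertools import combinations

# Hypothetical toy data: three objects described by three categorical features.
FEATURES = ("A", "B", "C")
OBJECTS = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 2},
    {"A": 1, "B": 1, "C": 3},
]


def equality(x, y, features):
    """Classical case: subdescriptions are similar only if they are equal."""
    return all(x[f] == y[f] for f in features)


def relaxed(x, y, features, threshold=0.6):
    """Assumed similarity: at least `threshold` of the features must match."""
    matches = sum(x[f] == y[f] for f in features)
    return matches / len(features) >= threshold


def frequency(pattern, features, objects, sim):
    """Fraction of objects whose subdescription over `features` is similar
    to that of `pattern` (an assumed, simplified frequency definition)."""
    return sum(sim(pattern, o, features) for o in objects) / len(objects)


def nonempty_subsets(features, proper=False):
    """Yield non-empty subsets of `features`; skip the full set if `proper`."""
    max_size = len(features) - 1 if proper else len(features)
    for size in range(1, max_size + 1):
        yield from combinations(features, size)


def downward_closure_holds(objects, features, sim, min_freq):
    """True if every sub-pattern of every frequent pattern is also frequent."""
    for pattern in objects:
        for subset in nonempty_subsets(features):
            if frequency(pattern, subset, objects, sim) < min_freq:
                continue  # not frequent, so it imposes no requirement
            for sub in nonempty_subsets(subset, proper=True):
                if frequency(pattern, sub, objects, sim) < min_freq:
                    return False
    return True


if __name__ == "__main__":
    print(downward_closure_holds(OBJECTS, FEATURES, equality, 0.5))
    print(downward_closure_holds(OBJECTS, FEATURES, relaxed, 0.5))
```

In this toy setting the first object is similar to all three objects over {A, B, C} under `relaxed` (two of three features always match), yet over {C} alone it is similar only to itself. A frequent pattern therefore has an infrequent sub-pattern, so pruning based on Downward Closure would discard candidates that are in fact frequent.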
Cite this article
Rodríguez-González, A.Y., Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A. et al. RP-Miner: a relaxed prune algorithm for frequent similar pattern mining. Knowl Inf Syst 27, 451–471 (2011). https://doi.org/10.1007/s10115-010-0309-9
DOI: https://doi.org/10.1007/s10115-010-0309-9