Abstract
The main drawbacks of sequential pattern mining have been its lack of focus on user expectations and the high number of discovered patterns. However, the commonly accepted solution, the use of constraints, reduces the mining process to verifying which of the specified patterns are frequent, rather than discovering unknown and unexpected patterns.
In this paper, we propose a new methodology for mining sequential patterns that keeps the focus on user expectations without compromising the discovery of unknown patterns. The methodology is based on constraint relaxations, which are used to filter accepted patterns during the mining process. We propose a hierarchy of relaxations, applied to constraints expressed as context-free languages, that classifies the existing relaxations (the previously proposed legal, valid and naïve) and introduces several new classes. The new classes range from the approx and non-accepted relaxations to compositions of different types of relaxations, such as the approx-legal or the non-prefix-valid relaxations. Finally, we present a case study that shows the results achieved by applying this methodology to the analysis of the curricular sequences of computer science students.
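As a rough illustration of how a relaxation widens the set of patterns a miner would keep, the sketch below checks candidate sequences against a toy constraint. It rests on several assumptions that are not taken from the paper: the constraint is a regular language (a simplification of the context-free case) encoded as a hypothetical hand-written automaton, each sequence element is a single item rather than an itemset, and the naïve, legal and valid checks follow the usual automaton-based reading of those terms. The approx relaxation and the composed relaxations are not covered.

```python
from typing import Dict, Optional, Sequence, Set, Tuple

# Hypothetical toy constraint (assumption): the regular language a b* c,
# written as a deterministic automaton with states 0, 1, 2.
START: int = 0
ACCEPTING: Set[int] = {2}
STATES: Set[int] = {0, 1, 2}
DELTA: Dict[Tuple[int, str], int] = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}
ALPHABET: Set[str] = {symbol for (_, symbol) in DELTA}


def run(state: int, seq: Sequence[str]) -> Optional[int]:
    """Follow seq from state; return the final state, or None if stuck."""
    current: Optional[int] = state
    for item in seq:
        current = DELTA.get((current, item))
        if current is None:
            return None
    return current


def accepted(seq: Sequence[str]) -> bool:
    """Full constraint: seq leads from the start state to an accepting state."""
    return run(START, seq) in ACCEPTING


def valid(seq: Sequence[str]) -> bool:
    """Valid relaxation: seq leads from some state to an accepting state."""
    return any(run(q, seq) in ACCEPTING for q in STATES)


def legal(seq: Sequence[str]) -> bool:
    """Legal relaxation: seq follows some path of transitions from some state."""
    return any(run(q, seq) is not None for q in STATES)


def naive(seq: Sequence[str]) -> bool:
    """Naive relaxation: every item of seq occurs somewhere in the constraint."""
    return all(item in ALPHABET for item in seq)


def non_accepted(seq: Sequence[str]) -> bool:
    """Non-accepted relaxation: keep exactly what the constraint rejects."""
    return not accepted(seq)


if __name__ == "__main__":
    candidates = [["a", "b", "c"], ["b", "c"], ["a", "b"], ["c", "a"], ["a", "d"]]
    for check in (accepted, valid, legal, naive, non_accepted):
        kept = [c for c in candidates if check(c)]
        print(f"{check.__name__:>12}: {kept}")
```

In this toy setting each weaker check keeps everything the stronger one keeps (accepted, then valid, then legal, then naïve), while the non-accepted check keeps exactly the complement of the accepted set, which is where patterns the user did not anticipate can surface.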
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Antunes, C., Oliveira, A.L. (2005). Constraint Relaxations for Discovering Unknown Sequential Patterns. In: Goethals, B., Siebes, A. (eds) Knowledge Discovery in Inductive Databases. KDID 2004. Lecture Notes in Computer Science, vol 3377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31841-5_2
DOI: https://doi.org/10.1007/978-3-540-31841-5_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25082-1
Online ISBN: 978-3-540-31841-5
eBook Packages: Computer Science (R0)