Noise Sensitivity of an Information Granules Filtering Procedure by Genetic Optimization for Inexact Sequential Pattern Mining

Maiorino, Enrico; Possemato, Francesca; Modugno, Valerio; Rizzi, Antonello

doi:10.1007/978-3-319-26393-9_9

Part of the book series: Studies in Computational Intelligence (SCI, volume 620)

Included in the following conference series:

International Joint Conference on Computational Intelligence

Abstract

One of the central challenges in Data Mining and Knowledge Discovery is the development of effective tools for finding regularities in data. To highlight and extract interesting knowledge from the data at hand, a key problem is frequent pattern mining, i.e., discovering frequent substructures hidden in the available data. In many application fields, data are represented and stored as sequences, over time or space, of generic objects. Because of noise and uncertainty in the data, the search for frequent subsequences must employ approximate matching techniques, such as edit distances. A common procedure for identifying recurrent patterns in noisy data is based on clustering algorithms that rely on some edit distance between subsequences. However, this plain approach can produce many spurious patterns, because the same pattern can match at several close positions in the same sequence excerpt. In this paper, we present a method to overcome this drawback by applying an optimization-based filter step that identifies the most descriptive patterns among those found by the clustering process and returns more compact and easily interpretable clusters. We evaluate the mining system's performance on synthetic data in two separate cases, corresponding to two different (simulated) sources of noise. In both cases, our method performs well in retrieving the original patterns with acceptable information loss.
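
The spurious-match problem described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the unit-cost Levenshtein distance, the fixed-length sliding window, and the function names `edit_distance` and `approximate_matches` are all assumptions made for illustration. It shows how a tolerance on the edit distance makes a single embedded pattern match at several neighbouring positions.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs (an assumed choice of edit distance)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def approximate_matches(sequence, pattern, max_dist):
    """Start positions of windows of len(pattern) within max_dist of pattern.

    Hypothetical helper: a fixed-length sliding window is assumed here
    only to keep the example short.
    """
    n = len(pattern)
    return [s for s in range(len(sequence) - n + 1)
            if edit_distance(sequence[s:s + n], pattern) <= max_dist]


sequence = "xxABCDEFxx"  # one occurrence of the pattern, starting at position 2
exact = approximate_matches(sequence, "ABCDEF", 0)
tolerant = approximate_matches(sequence, "ABCDEF", 2)
print(exact)     # [2]
print(tolerant)  # [1, 2, 3]
```

With a tolerance of 2, the windows shifted by one position on either side of the true occurrence also match, so a clustering step fed with all matching subsequences would receive three near-duplicate excerpts for one real pattern. This is the kind of redundancy a subsequent filtering step would have to resolve.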

Notes

1. The subscript “e” stands for “extraction”, as in extraction step.

Author information

Authors and Affiliations

Department of Information Engineering, Electronics and Telecommunications (DIET), SAPIENZA University of Rome, Via Eudossiana 18, 00184, Rome, Italy
Enrico Maiorino, Francesca Possemato & Antonello Rizzi
Dipartimento di Ingegneria Informatica, Automatica e Gestionale (DIAG), SAPIENZA University of Rome, Via Ariosto 25, 00185, Rome, Italy
Valerio Modugno

Corresponding author

Correspondence to Enrico Maiorino.

Editor information

Editors and Affiliations

Ingeniería Informática, Escuela Técnica Superior de, Granada, Spain
Juan Julian Merelo
aSEEB-ISR-IST, Technical University of Lisbon (IST), Lisbon, Portugal
Agostinho Rosa
Facultad de Informática, University of Murcia, Murcia, Spain
José M. Cadenas
University of Coimbra, Coimbra, Portugal
António Dourado
Images, Signals and Intelligence, University PARIS-EST Créteil (UPEC), Créteil, France
Kurosh Madani
Instituto Politécnico de Setúbal (IPS), Setúbal, Portugal
Joaquim Filipe

About this paper

Cite this paper

Maiorino, E., Possemato, F., Modugno, V., Rizzi, A. (2016). Noise Sensitivity of an Information Granules Filtering Procedure by Genetic Optimization for Inexact Sequential Pattern Mining. In: Merelo, J.J., Rosa, A., Cadenas, J.M., Dourado, A., Madani, K., Filipe, J. (eds) Computational Intelligence. IJCCI 2014. Studies in Computational Intelligence, vol 620. Springer, Cham. https://doi.org/10.1007/978-3-319-26393-9_9

DOI: https://doi.org/10.1007/978-3-319-26393-9_9
Published: 25 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26391-5
Online ISBN: 978-3-319-26393-9
eBook Packages: Engineering, Engineering (R0)
