Section-Wise Similarities for Clustering and Outlier Detection of Subjective Sequential Data

Siordia, Oscar S.; de Diego, Isaac Martín; Conde, Cristina; Cabello, Enrique

doi:10.1007/978-3-642-24471-1_5

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7005)

Included in the following conference series:

International Workshop on Similarity-Based Pattern Recognition

Abstract

In this paper, a novel methodology for the representation and similarity measurement of sequential data is presented. First, a linear segmentation algorithm based on feature points is proposed. Then, two similarity measures are defined from the differences between the behavior and the mean level of the sequential data. These similarities are used for clustering and outlier detection on subjective sequential data generated by a group of traffic safety experts evaluating driving risk. Finally, a novel dissimilarity measure for outlier detection of paired sequential data is proposed. The experimental results show that both similarities contain complementary and relevant information about the dataset. The methodology proves useful for finding patterns in subjective data related to the behavior and the level of the data.
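
The abstract does not give the formulas, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that feature points are the endpoints plus the local extrema of a series, that two equal-length series are compared over the sections delimited by the union of their feature points, that "behavior" means the sign of the trend in a section, and that "level" means the section mean. The names feature_points, section_similarities and value_range are hypothetical.

```python
from statistics import mean


def feature_points(y):
    """Endpoints plus indices where the slope changes sign (assumed definition)."""
    pts = [0]
    for i in range(1, len(y) - 1):
        if (y[i] - y[i - 1]) * (y[i + 1] - y[i]) < 0:
            pts.append(i)
    pts.append(len(y) - 1)
    return pts


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def section_similarities(a, b, value_range):
    """Return (behavior, level) similarities in [0, 1] for two equal-length series.

    Hypothetical measures, not taken from the paper:
    behavior = length-weighted share of sections where both trends have the same sign;
    level    = 1 minus the length-weighted gap between section means,
               scaled by value_range (the width of the rating scale).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("series must have the same length (at least 2)")
    # Sections are delimited by the union of both sets of feature points.
    cuts = sorted(set(feature_points(a)) | set(feature_points(b)))
    total = cuts[-1] - cuts[0]
    behavior = 0.0
    level = 0.0
    for start, end in zip(cuts, cuts[1:]):
        weight = (end - start) / total
        if sign(a[end] - a[start]) == sign(b[end] - b[start]):
            behavior += weight
        gap = abs(mean(a[start:end + 1]) - mean(b[start:end + 1])) / value_range
        level += weight * (1.0 - min(gap, 1.0))
    return behavior, level


if __name__ == "__main__":
    rising_then_falling = [0, 1, 2, 1, 0]
    shifted_copy = [10, 11, 12, 11, 10]
    mirrored = [2, 1, 0, 1, 2]
    print(section_similarities(rising_then_falling, shifted_copy, 100))
    print(section_similarities(rising_then_falling, mirrored, 100))
```

In this toy setting, a vertically shifted copy agrees fully in behavior but loses some level similarity, while a mirrored series has the same section means but the opposite behavior. That is one way two such measures could carry complementary information, as the abstract reports.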

Author information

Authors and Affiliations

Face Recognition and Artificial Vision Group, Universidad Rey Juan Carlos, C. Tulipán, S/N, 28934, Móstoles, España
Oscar S. Siordia, Isaac Martín de Diego, Cristina Conde & Enrique Cabello

Editor information

Editors and Affiliations

DAIS, Università Ca’ Foscari, Via Torino 155, 30172, Venice, Italy
Marcello Pelillo
The University of York, YO1 5DD, Heslington, York, UK
Edwin R. Hancock

Cite this paper

Siordia, O.S., de Diego, I.M., Conde, C., Cabello, E. (2011). Section-Wise Similarities for Clustering and Outlier Detection of Subjective Sequential Data. In: Pelillo, M., Hancock, E.R. (eds) Similarity-Based Pattern Recognition. SIMBAD 2011. Lecture Notes in Computer Science, vol 7005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24471-1_5

DOI: https://doi.org/10.1007/978-3-642-24471-1_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24470-4
Online ISBN: 978-3-642-24471-1
eBook Packages: Computer Science, Computer Science (R0)
