Section-Wise Similarities for Clustering and Outlier Detection of Subjective Sequential Data

  • Conference paper
Similarity-Based Pattern Recognition (SIMBAD 2011)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7005)

Abstract

In this paper, a novel methodology for the representation and similarity measurement of sequential data is presented. First, a linear segmentation algorithm based on feature points is proposed. Then, two similarity measures are defined from the differences between the behavior and the mean level of the sequential data. These similarities are used for clustering and outlier detection on subjective sequential data generated by a group of traffic safety experts evaluating driving risk. Finally, a novel dissimilarity measure for outlier detection of paired sequential data is proposed. The experimental results show that the two similarities contain complementary and relevant information about the dataset. The methodology proves useful for finding patterns in subjective data related to the behavior and the level of the data.
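The abstract does not give the paper's actual definitions, so the following is only a minimal sketch of the general idea under stated assumptions: feature points are approximated by endpoints plus local extrema, sections are delimited by the union of both sequences' feature points, the "behavior" similarity is taken as the share of the time axis on which both sequences move in the same direction, and the "level" similarity as one minus the length-weighted mean absolute difference of section means. All function names and the `value_range` and `tol` parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def feature_points(y, tol=0.0):
    """Endpoints plus local extrema (an assumed stand-in for 'feature points')."""
    y = np.asarray(y, dtype=float)
    d = np.diff(y)
    idx = [0]
    for i in range(1, len(y) - 1):
        # A slope sign change larger than tol marks a local extremum.
        if d[i - 1] * d[i] < 0 and abs(d[i - 1] - d[i]) > tol:
            idx.append(i)
    idx.append(len(y) - 1)
    return idx


def section_stats(y, breaks):
    """Slope and mean level of y on each section [breaks[k], breaks[k+1]]."""
    y = np.asarray(y, dtype=float)
    slopes, means = [], []
    for a, b in zip(breaks[:-1], breaks[1:]):
        slopes.append((y[b] - y[a]) / (b - a))
        means.append(y[a:b + 1].mean())
    return np.array(slopes), np.array(means)


def section_wise_similarities(x, y, value_range=1.0):
    """Return (behavior, level) similarities for two equal-length sequences.

    Both definitions are illustrative assumptions, not the paper's formulas.
    """
    # Sections are delimited by the feature points of either sequence.
    breaks = sorted(set(feature_points(x)) | set(feature_points(y)))
    sx, mx = section_stats(x, breaks)
    sy, my = section_stats(y, breaks)
    # Weight each section by its share of the time axis.
    w = np.diff(breaks) / (breaks[-1] - breaks[0])
    # Behavior: share of the time axis where both slopes have the same sign.
    behavior = float(np.sum(w * (np.sign(sx) == np.sign(sy))))
    # Level: 1 minus the weighted mean absolute difference of section means.
    level = float(1.0 - np.sum(w * np.abs(mx - my)) / value_range)
    return behavior, level


x = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
shifted = x + 0.5                                # same shape, higher level
mirrored = np.array([2.0, 1.0, 0.0, 1.0, 2.0])   # same section means, opposite shape

print(section_wise_similarities(x, shifted))
print(section_wise_similarities(x, mirrored))
```

In this toy example the shifted copy agrees fully in behavior but only partly in level (1.0 and 0.5), while the mirrored copy agrees fully in level but not at all in behavior (0.0 and 1.0), which illustrates how two such measures can carry complementary information.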




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Siordia, O.S., de Diego, I.M., Conde, C., Cabello, E. (2011). Section-Wise Similarities for Clustering and Outlier Detection of Subjective Sequential Data. In: Pelillo, M., Hancock, E.R. (eds) Similarity-Based Pattern Recognition. SIMBAD 2011. Lecture Notes in Computer Science, vol 7005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24471-1_5

  • DOI: https://doi.org/10.1007/978-3-642-24471-1_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24470-4

  • Online ISBN: 978-3-642-24471-1

  • eBook Packages: Computer Science, Computer Science (R0)
