Extracting and Rendering Representative Sequences

Gabadinho, Alexis; Ritschard, Gilbert; Studer, Matthias; Müller, Nicolas S.

doi:10.1007/978-3-642-19032-2_7

Part of the book series: Communications in Computer and Information Science (CCIS, volume 128)

Included in the following conference series:

International Joint Conference on Knowledge Discovery, Knowledge Engineering, and Knowledge Management

Abstract

This paper is concerned with the summarization of a set of categorical sequences. More specifically, the problem studied is the determination of the smallest possible number of representative sequences that ensure a given coverage of the whole set, i.e., that together have a given percentage of the sequences in their neighborhood. The proposed heuristic for extracting the representative subset takes as its main arguments a pairwise distance matrix, a representativeness criterion, and a distance threshold under which two sequences are considered redundant or, equivalently, in the neighborhood of each other. It first builds a list of candidates using a representativeness score and then eliminates redundancy. We also propose a visualization tool for rendering the results and quality measures for evaluating them. The proposed tools have been implemented in our TraMineR R package for mining and visualizing sequence data, and we demonstrate their efficiency on a real-world example from the social sciences. The methods are nonetheless by no means limited to social science data and should prove useful in many other domains.
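
The two-stage heuristic summarized above (a candidate list ranked by a representativeness score, followed by redundancy elimination until a coverage target is met) can be sketched as follows. This is a minimal Python/NumPy illustration under stated assumptions, not the TraMineR implementation: it assumes neighborhood density (the number of sequences closer than the threshold) as the representativeness score, reuses the same threshold for the redundancy check, and stops as soon as the requested coverage is reached. The function name `extract_representatives` and its parameters are hypothetical, and the toy distance matrix stands in for real sequence dissimilarities.

```python
import numpy as np


def extract_representatives(dist, threshold, target_coverage=0.25):
    """Hypothetical sketch of a greedy representative-sequence heuristic.

    dist: (n, n) symmetric matrix of pairwise sequence distances.
    threshold: two sequences with distance < threshold are neighbors
        (and therefore redundant with respect to each other).
    target_coverage: fraction of sequences that must lie in the
        neighborhood of at least one representative.
    Returns (list of representative indices, achieved coverage).
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    neighbors = dist < threshold  # boolean neighborhood matrix

    # Assumed representativeness score: neighborhood density.
    density = neighbors.sum(axis=1)
    # Candidate list: decreasing density; a stable sort keeps ties in index order.
    candidates = np.argsort(-density, kind="stable")

    reps = []
    covered = np.zeros(n, dtype=bool)
    for c in candidates:
        # Stop once the requested share of sequences is covered.
        if covered.mean() >= target_coverage:
            break
        # Redundancy elimination: skip c if it is a neighbor of a kept representative.
        if any(neighbors[c, r] for r in reps):
            continue
        reps.append(int(c))
        covered |= neighbors[c]  # mark c's neighborhood as covered
    return reps, float(covered.mean())


# Toy example: distances between points on a line stand in for
# sequence dissimilarities (illustrative data only).
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 20.0])
dist = np.abs(points[:, None] - points[None, :])
reps, cov = extract_representatives(dist, threshold=1.5, target_coverage=0.8)
print(reps, round(cov, 3))  # prints [1, 3] 0.833
```

Ranking by density favors sequences that sit in crowded regions of the distance space, so the first representatives each cover many sequences; the redundancy check then keeps later picks from falling inside neighborhoods that are already represented. Other scores (for example, centrality based on summed distances) could be swapped in without changing the selection loop.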

This work is part of the Swiss National Science Foundation research project FN-122230 “Mining event histories: Towards new insights on personal Swiss life courses”.

Author information

Authors and Affiliations

Department of Econometrics and Laboratory of Demography, University of Geneva, 40, bd du Pont-d’Arve, CH-1211, Geneva, Switzerland
Alexis Gabadinho, Gilbert Ritschard, Matthias Studer & Nicolas S. Müller

Editor information

Editors and Affiliations

IST - Technical University of Lisbon, Av. Rovisco Pais, 1, 1049-001, Lisbon, Portugal
Ana Fred
Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands
Jan L. G. Dietz
Informatics Research Centre, Henley Business School, University of Reading, RG6 6UD, Reading, UK
Kecheng Liu
Department of Systems and Informatics, Polytechnic Institute of Setúbal – INSTICC, Rua do Vale de Chaves - Estefanilha, 2910-761, Setúbal, Portugal
Joaquim Filipe

About this paper

Cite this paper

Gabadinho, A., Ritschard, G., Studer, M., Müller, N.S. (2011). Extracting and Rendering Representative Sequences. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2009. Communications in Computer and Information Science, vol 128. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19032-2_7

DOI: https://doi.org/10.1007/978-3-642-19032-2_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19031-5
Online ISBN: 978-3-642-19032-2
eBook Packages: Computer Science (R0)