Extracting Knowledge from Life Courses: Clustering and Visualization
This article presents some of the facilities offered by our TraMineR R package for clustering and visualizing sequence data. First, we discuss our implementation of the optimal matching algorithm for evaluating the distance between two sequences and its use for generating a distance matrix for the whole sequence data set. Once such a matrix is obtained, it can serve as input for a cluster analysis, which can be done straightforwardly with any method available in the R statistical environment. We then present three kinds of plots for visualizing the characteristics of the obtained clusters: an aggregated plot depicting the average sequential behavior of cluster members; a sequence index plot that shows the diversity inside clusters; and an original frequency plot that highlights the frequencies of the n most frequent sequences. TraMineR was designed for analysing sequences representing life courses, and our presentation is illustrated on such a real-world data set. The material presented should also be of interest for other kinds of sequential data, such as DNA sequences or web logs.
Keywords: Optimal Match, Frequent Sequence, Levenshtein Distance, Sociological Method, Swiss Household Panel
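To make the distance step in the abstract concrete, the following is a minimal Python sketch of an optimal matching distance and the resulting pairwise distance matrix. It is not TraMineR's implementation. The function names `om_distance` and `distance_matrix`, the constant costs (indel cost 1, substitution cost 2), and the toy state codes "S" and "E" are assumptions chosen for illustration only.

```python
from typing import Callable, List, Sequence


def om_distance(
    a: Sequence[str],
    b: Sequence[str],
    indel: float = 1.0,
    sub_cost: Callable[[str, str], float] = lambda x, y: 0.0 if x == y else 2.0,
) -> float:
    """Minimal total cost of turning sequence a into sequence b using
    insertions, deletions and substitutions (dynamic programming)."""
    m = len(b)
    # prev[j] = cost of turning the empty prefix of a into b[:j] (j insertions)
    prev = [j * indel for j in range(m + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * indel] + [0.0] * m  # a[:i] -> empty prefix: i deletions
        for j in range(1, m + 1):
            curr[j] = min(
                prev[j] + indel,                              # delete a[i-1]
                curr[j - 1] + indel,                          # insert b[j-1]
                prev[j - 1] + sub_cost(a[i - 1], b[j - 1]),   # substitute or match
            )
        prev = curr
    return prev[m]


def distance_matrix(seqs: Sequence[Sequence[str]]) -> List[List[float]]:
    """Symmetric matrix of pairwise optimal matching distances."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = om_distance(seqs[i], seqs[j])
    return dist


# Toy life-course sequences: "S" = in school, "E" = employed (hypothetical codes)
sequences = ["SSEE", "SEEE", "SSSS"]
matrix = distance_matrix(sequences)
```

A matrix of this kind is what a clustering routine would take as input. With the assumed costs, a substitution costs the same as one deletion plus one insertion, so the result depends only on the length of the longest common subsequence; other cost settings weight the operations differently.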