International Conference on Knowledge Engineering and the Semantic Web

Knowledge Engineering and Semantic Web, pp 225-239

Pattern Mining and Machine Learning for Demographic Sequences

  • Dmitry I. Ignatov
  • Ekaterina Mitrofanova
  • Anna Muratova
  • Danil Gizdatullin
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 518)

Abstract

In this paper, we present the results of our first studies applying pattern mining and machine learning techniques to the analysis of demographic sequences in Russia, based on data for 11 generations from 1930 to 1984. The main goal is not prediction or the data mining methods themselves, but rather the extraction of interesting patterns and knowledge acquisition from substantial demographic datasets. We use decision trees to predict demographic events and emergent patterns to search for significant and potentially useful sequences.
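As an illustration of the emergent-pattern idea mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes the growth-rate definition commonly used for emerging patterns (the ratio of a pattern's support in one class to its support in another) and treats a pattern as a subsequence with gaps allowed. The function names, event labels, and the toy sequences for the two groups are hypothetical and are not taken from the paper's data.

```python
from typing import List, Sequence


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """Return True if the events of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `event in it` advances the iterator, so later events must appear after earlier ones.
    return all(event in it for event in pattern)


def support(pattern: Sequence[str], dataset: List[Sequence[str]]) -> float:
    """Fraction of sequences in `dataset` that contain `pattern` as a subsequence."""
    if not dataset:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in dataset) / len(dataset)


def growth_rate(pattern: Sequence[str],
                target: List[Sequence[str]],
                background: List[Sequence[str]]) -> float:
    """Support in `target` divided by support in `background`.

    By convention: 0 if the pattern is absent from both classes,
    infinity if it occurs only in `target`.
    """
    supp_t = support(pattern, target)
    supp_b = support(pattern, background)
    if supp_b == 0.0:
        return float("inf") if supp_t > 0.0 else 0.0
    return supp_t / supp_b


# Hypothetical toy data: each inner list is one person's ordered life events.
group_a = [
    ["work", "separation", "marriage", "child"],
    ["education", "work", "marriage"],
    ["work", "marriage", "child"],
    ["education", "separation", "work"],
]
group_b = [
    ["education", "marriage", "child", "work"],
    ["marriage", "child"],
    ["education", "work", "marriage", "child"],
    ["work", "child", "marriage"],
]

pattern = ["work", "marriage"]
print(support(pattern, group_a), support(pattern, group_b))
print(growth_rate(pattern, group_a, group_b))
```

A pattern whose growth rate exceeds a chosen threshold would be reported as characteristic of the first group. The threshold and any minimum-support cut-off are tuning choices that this sketch leaves open.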

Keywords

Demographic sequences; Sequence mining; Emergent patterns; Emergent sequences; Decision trees; Machine learning

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Dmitry I. Ignatov (1)
  • Ekaterina Mitrofanova (1)
  • Anna Muratova (1)
  • Danil Gizdatullin (1)

  1. National Research University Higher School of Economics, Moscow, Russia