Data Mining and Knowledge Discovery, Volume 31, Issue 5, pp 1359–1390

MixedTrails: Bayesian hypothesis comparison on heterogeneous sequential data

  • Martin Becker
  • Florian Lemmerich
  • Philipp Singer
  • Markus Strohmaier
  • Andreas Hotho
Part of the following topical collections:
  1. Journal Track of ECML PKDD 2017


Sequential traces of user data are frequently observed online and offline, e.g., as sequences of visited websites or as sequences of locations captured by GPS. However, understanding the factors that explain the production of sequence data is a challenging task, especially since the data generation is often not homogeneous. For example, navigation behavior might change in different phases of browsing a website, or movement behavior may vary between groups of users. In this work, we tackle this task and propose MixedTrails, a Bayesian approach for comparing the plausibility of hypotheses regarding the generative processes of heterogeneous sequence data. Each hypothesis is derived from existing literature, theory, or intuition and represents a belief about transition probabilities between a set of states that can vary between groups of observed transitions. For example, when trying to understand human movement in a city and given some data, a hypothesis assuming tourists to be more likely to move towards points of interest than locals can be shown to be more plausible than a hypothesis assuming the opposite. Our approach incorporates such hypotheses as Bayesian priors in a generative mixed transition Markov chain model and compares their plausibility using Bayes factors. We discuss analytical and approximate inference methods for calculating the marginal likelihoods for Bayes factors, give guidance on interpreting the results, and illustrate our approach with several experiments on synthetic and empirical data from Wikipedia and Flickr. Thus, this work enables a novel kind of analysis for studying sequential data in many application areas.
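To make the idea of comparing hypotheses via Bayes factors more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified setting in which every transition belongs to exactly one known group, each row of a group's transition matrix has an independent Dirichlet prior, and the marginal likelihood therefore factorizes into per-group, per-row Dirichlet-multinomial terms. The helper `elicit_prior`, the concentration parameter `kappa`, and the toy counts are hypothetical and chosen for illustration only; the paper's own prior elicitation and its inference for probabilistic group assignments are not reproduced here.

```python
from math import lgamma


def log_marginal_likelihood(counts, alpha):
    """Log marginal likelihood of transition counts under row-wise Dirichlet priors.

    counts[i][j] is the number of observed transitions from state i to state j,
    alpha[i][j] is the matching Dirichlet pseudo-count (must be > 0).
    """
    total = 0.0
    for n_row, a_row in zip(counts, alpha):
        # Normalizing constant of the prior for this row.
        total += lgamma(sum(a_row)) - sum(lgamma(a) for a in a_row)
        # Normalizing constant of the posterior for this row.
        total += sum(lgamma(n + a) for n, a in zip(n_row, a_row))
        total -= lgamma(sum(n_row) + sum(a_row))
    return total


def elicit_prior(hypothesis, kappa):
    """Hypothetical helper: turn a row-stochastic belief matrix into pseudo-counts.

    Assumption: a flat base prior of 1 plus kappa pseudo-observations
    distributed according to the hypothesis.
    """
    return [[1.0 + kappa * p for p in row] for row in hypothesis]


def grouped_log_evidence(group_counts, group_hypotheses, kappa):
    """Sum per-group log evidence; valid only for known, disjoint groups."""
    return sum(
        log_marginal_likelihood(group_counts[g], elicit_prior(group_hypotheses[g], kappa))
        for g in group_counts
    )


# Toy data (made up): two groups over two states.
group_counts = {
    "tourists": [[8, 2], [3, 7]],
    "locals": [[2, 8], [6, 4]],
}
stay = [[0.8, 0.2], [0.3, 0.7]]
switch = [[0.2, 0.8], [0.7, 0.3]]

# Hypothesis A: tourists tend to stay, locals tend to switch; B is the opposite.
hyp_a = {"tourists": stay, "locals": switch}
hyp_b = {"tourists": switch, "locals": stay}

log_bf = grouped_log_evidence(group_counts, hyp_a, kappa=10) - grouped_log_evidence(
    group_counts, hyp_b, kappa=10
)
print(f"log Bayes factor (A vs. B): {log_bf:.3f}")
```

A positive log Bayes factor favors hypothesis A over hypothesis B for the given data. Working in log space with `lgamma` avoids overflow for realistic transition counts.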


Heterogeneous sequence data · Markov chain · Model comparison · Bayes factor · HypTrails · MixedTrails · MTMC



This work was partially funded by the BMBF project Kallimachos and the DFG German Science Fund research projects PoSTs II and p2map.



Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. Data Mining and Information Retrieval Group, University of Würzburg, Würzburg, Germany
  2. GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany
