A Topic Model for Traffic Speed Data Analysis

  • Tomonari Masada
  • Atsuhiro Takasu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8482)


We propose a probabilistic model for traffic speed data. Our model inherits two key features from latent Dirichlet allocation (LDA). First, unlike stock market data, for example, traffic speed data often contain missing observations because sensors or networks fail unexpectedly. We therefore regard speed data not as a time series but as an unordered multiset, in the same way that LDA regards a document not as a sequence but as a bag of words. This also lets us analyze co-occurrence patterns of speed data regardless of their positions along the time axis. Second, we regard the set of speed data gathered from one sensor over one day as a document and model it not with a single distribution but with a mixture of distributions, as in LDA. Each such distribution is called a topic in LDA; we call it a patch to remove the text-mining connotation, and we name our model Patchy. This approach lets us model speed co-occurrence patterns effectively. Because speed data are non-negative real values, we use Gamma distributions in place of the multinomial distributions of LDA. Owing to these two features, Patchy can reveal the context dependency of traffic speed data. For example, a speed of 60 mph observed on a Sunday can be assigned to a different patch from a speed of 60 mph observed on a Wednesday. We evaluate this context dependency through a binary classification task in which test data are classified as either weekday data or non-weekday data. We use real traffic speed data provided by New York City and compare Patchy with a baseline method that applies a simpler data model.
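The generative idea described in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the priors, the parameterization, or the inference procedure, so the Dirichlet prior on patch proportions, the (shape, scale) form of the Gamma patches, and all numeric values and function names (`sample_document`, `patch_posterior`) are hypothetical. The sketch draws one sensor-day "document" as an unordered multiset of speeds and then shows how the same 60 mph reading can favor different patches in documents with different patch proportions.

```python
import math
import random


def gamma_logpdf(x, shape, scale):
    """Log density of a Gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def sample_document(n_obs, alpha, patches, rng):
    """Draw one unordered 'document' of speeds for a single sensor-day.

    Assumed generative process (LDA-like, not taken from the paper):
    theta ~ Dirichlet(alpha); for each observation, pick a patch from
    theta and draw a speed from that patch's Gamma distribution.
    """
    # Dirichlet sample obtained by normalizing independent Gamma draws.
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    theta = [v / total for v in g]
    speeds = []
    for _ in range(n_obs):
        k = rng.choices(range(len(patches)), weights=theta)[0]
        shape, scale = patches[k]
        speeds.append(rng.gammavariate(shape, scale))
    return theta, speeds


def patch_posterior(x, theta, patches):
    """P(patch | speed x, document proportions theta) via Bayes' rule."""
    logw = [math.log(t) + gamma_logpdf(x, a, s)
            for t, (a, s) in zip(theta, patches)]
    m = max(logw)  # subtract the maximum for numerical stability
    w = [math.exp(v - m) for v in logw]
    z = sum(w)
    return [v / z for v in w]


# Hypothetical patches as (shape, scale); their means are 55 and 62 mph.
PATCHES = [(100.0, 0.55), (100.0, 0.62)]

if __name__ == "__main__":
    rng = random.Random(0)
    theta, speeds = sample_document(5, [1.0, 1.0], PATCHES, rng)
    print("patch proportions:", theta)
    print("speeds (order carries no meaning):", sorted(speeds))

    # The same 60 mph reading under two hypothetical documents.
    print("weekday-like doc:", patch_posterior(60.0, [0.9, 0.1], PATCHES))
    print("Sunday-like doc: ", patch_posterior(60.0, [0.1, 0.9], PATCHES))
```

With these assumed values, both patches assign similar likelihood to 60 mph, so the document-level proportions decide the assignment: the weekday-like document favors the first patch and the Sunday-like document favors the second. This is the kind of context dependency that a single per-sensor distribution cannot express.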


Keywords: Gamma Distribution, Near Neighbor, Topic Model, Latent Dirichlet Allocation, Baseline Method





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Tomonari Masada (1)
  • Atsuhiro Takasu (2)
  1. Nagasaki University, Nagasaki, Japan
  2. National Institute of Informatics, Chiyoda-ku, Japan
