A Topic Model for Traffic Speed Data Analysis
We propose a probabilistic model for traffic speed data. Our model inherits two key features from latent Dirichlet allocation (LDA). First, unlike, e.g., stock market data, traffic speed data often contain gaps caused by unexpected sensor or network failures. We therefore regard speed data not as a time series but as an unordered multiset, in the same way that LDA regards a document not as a sequence but as a bag of words. This also lets us analyze co-occurrence patterns of speed values regardless of their positions along the time axis. Second, we regard the daily set of speed data gathered from a single sensor as a document and model it not with a single distribution but with a mixture of distributions, as in LDA. While each such distribution is called a topic in LDA, we call it a patch to avoid text-mining connotations, and we name our model Patchy. This approach lets us model speed co-occurrence patterns effectively. Because speed values are non-negative real numbers, we use Gamma distributions in place of LDA's multinomial distributions. Thanks to these two features, Patchy can reveal the context dependency of traffic speed data. For example, a speed of 60 mph observed on a Sunday can be assigned to a different patch from a speed of 60 mph observed on a Wednesday. We evaluate this context dependency through a binary classification task in which test data are classified as either weekday data or non-weekday data. We use real traffic speed data provided by New York City and compare Patchy with a baseline method that applies a simpler data model.
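The context dependency described in the abstract can be illustrated with a minimal sketch. It is not the authors' inference procedure: it only assumes that each sensor-day document has its own mixing proportions over patches and that each patch is a Gamma distribution, and it computes the posterior probability of each patch for a single speed value. The two patches, their shape and rate parameters, and the two sets of mixing proportions (`PATCHES`, `THETA_WEEKDAY`, `THETA_SUNDAY`) are hypothetical values chosen for illustration, not estimates from the paper.

```python
import math


def gamma_pdf(x, shape, rate):
    """Density of a Gamma(shape, rate) distribution at x > 0."""
    log_p = (shape * math.log(rate) - math.lgamma(shape)
             + (shape - 1.0) * math.log(x) - rate * x)
    return math.exp(log_p)


def patch_responsibilities(x, theta, patches):
    """Posterior p(patch | x, theta), proportional to theta_k * Gamma(x; shape_k, rate_k)."""
    weights = [t * gamma_pdf(x, shape, rate)
               for t, (shape, rate) in zip(theta, patches)]
    total = sum(weights)
    return [w / total for w in weights]


def most_likely_patch(x, theta, patches):
    """Index of the patch with the highest posterior probability for x."""
    resp = patch_responsibilities(x, theta, patches)
    return max(range(len(resp)), key=resp.__getitem__)


# Hypothetical patches as (shape, rate); the mean of each is shape / rate.
PATCHES = [(110.0, 2.0),   # "slower" patch, mean 55 mph
           (125.0, 2.0)]   # "faster" patch, mean 62.5 mph

# Hypothetical per-document mixing proportions over the two patches.
THETA_WEEKDAY = [0.9, 0.1]
THETA_SUNDAY = [0.1, 0.9]

if __name__ == "__main__":
    for label, theta in [("weekday", THETA_WEEKDAY), ("Sunday", THETA_SUNDAY)]:
        resp = patch_responsibilities(60.0, theta, PATCHES)
        print(label, most_likely_patch(60.0, theta, PATCHES), resp)
```

Under these assumed parameters, the same 60 mph observation is assigned to different patches depending on the document's mixing proportions, which is the kind of context dependency the abstract refers to. In the full model the proportions and Gamma parameters would be inferred from data rather than fixed by hand.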
Keywords: Gamma Distribution; Near Neighbor; Topic Model; Latent Dirichlet Allocation; Baseline Method