Machine Learning, Volume 93, Issue 1, pp 115–139

Spatio-temporal random fields: compressible representation and distributed estimation



Modern sensing technology enables enhanced monitoring of dynamic activities in business, traffic, homes, and many other domains. The growing volume of sensor measurements, however, poses a challenge for efficient data analysis. This is especially true when the sensed targets interact: in such cases we need learning models that can capture the relations among sensors, ideally without collecting or exchanging all of the data. Generative graphical models, namely Markov random fields (MRFs), fit this purpose: they can represent complex spatial and temporal relations among sensors and produce interpretable answers in terms of probabilities. Their main drawback is the cost of inference and of storing and optimizing a very large number of parameters, which is not uncommon when applying them to real-world problems.
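The abstract does not spell out the model, but the MRF idea it refers to can be illustrated with a minimal sketch: a pairwise random field over a few discrete sensor states, where edge potentials encode spatial/temporal relations and the normalized scores yield the probabilistic answers mentioned above. The graph, states, and potential values below are illustrative assumptions, not the paper's actual model.

```python
import itertools
import math

# Toy pairwise Markov random field over discrete sensor states.
nodes = ["s1", "s2", "s3"]             # e.g. three sensors
states = [0, 1]                        # binary sensor state
edges = [("s1", "s2"), ("s2", "s3")]   # chain of spatial/temporal neighbours

# Log-potentials (parameters): one weight per edge and state pair.
# Here neighbours are simply encouraged to agree.
theta = {(e, a, b): (0.8 if a == b else -0.8)
         for e in edges for a in states for b in states}

def log_score(assign):
    """Unnormalised log-probability of a full state assignment."""
    return sum(theta[(e, assign[e[0]], assign[e[1]])] for e in edges)

# Brute-force partition function (exponential in the node count; fine for a toy).
Z = sum(math.exp(log_score(dict(zip(nodes, xs))))
        for xs in itertools.product(states, repeat=len(nodes)))

def prob(assign):
    """Normalised probability of an assignment."""
    return math.exp(log_score(assign)) / Z

p_agree = prob({"s1": 0, "s2": 0, "s3": 0})
p_mixed = prob({"s1": 0, "s2": 1, "s3": 0})
print(p_agree > p_mixed)  # True: agreeing neighbours are more probable
```

Real applications replace the brute-force sum over states with approximate inference; the point here is only how potentials over a graph turn into interpretable probabilities.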

In this paper, we investigate how to make discrete probabilistic graphical models practical for predicting sensor states in a spatio-temporal setting. A set of new ideas retains the advantages of such models while achieving scalability. We first introduce a novel representation of the model parameters, which enables us to compress parameter storage by removing uninformative parameters in a systematic way. For fitting the parameters via maximum likelihood estimation, we provide a separable optimization algorithm whose updates can be performed independently and in parallel at each graph node. We show that the prediction quality of the suggested method is comparable to that of standard MRFs and of a spatio-temporal k-nearest-neighbour method, while using far fewer computational resources.
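The paper's specific compression scheme is not detailed in this abstract, but the general idea of removing uninformative parameters can be sketched with a standard technique: soft-thresholding (the proximal step of l1 regularization), after which zero-valued parameters need not be stored at all. The threshold and parameter values below are illustrative assumptions.

```python
# Illustrative sketch of compressible parameter storage, assuming an
# l1-style shrinkage step; this is a common approach, not necessarily
# the exact scheme used in the paper.

def soft_threshold(theta, lam):
    """Shrink each parameter towards zero; magnitudes below lam become 0."""
    return [max(abs(t) - lam, 0.0) * (1.0 if t > 0 else -1.0) for t in theta]

dense = [0.90, -0.03, 0.02, -1.10, 0.01, 0.45, -0.02, 0.00]
sparse = soft_threshold(dense, lam=0.05)

# Store only the informative (nonzero) parameters as index/value pairs.
compressed = {i: v for i, v in enumerate(sparse) if v != 0.0}
print(len(compressed), "of", len(dense), "parameters kept")
```

Because the small parameters are dropped rather than merely rounded, both the storage cost and the per-node gradient computations shrink with the number of surviving parameters, which is what makes a distributed, per-node optimization attractive at scale.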


Keywords: Regularization · Graphical models · Spatio-temporal



Copyright information

© The Author(s) 2013

Authors and Affiliations

  • Nico Piatkowski (1)
  • Sangkyun Lee (1)
  • Katharina Morik (1)

  1. TU Dortmund University, Dortmund, Germany
