Machine Learning, Volume 75, Issue 2, pp 217–243

Bayesian learning of graphical vector autoregressions with unequal lag-lengths



Graphical modelling strategies have recently been recognized as a versatile tool for analyzing multivariate stochastic processes. Vector autoregressive processes can be structurally represented by mixed graphs having both directed and undirected edges between the variables representing process components. To allow for more expressive vector autoregressive structures, we consider models with separate time dynamics for each directed edge and non-decomposable graph topologies for the undirected part of the mixed graph.
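To make the idea of separate time dynamics for each directed edge concrete, the following is a minimal sketch, not the authors' implementation. It simulates a vector autoregression in which each directed edge j -> i carries its own lag length. The function name `simulate_var_unequal_lags`, the coefficient rule `coef_scale / lag`, and the identity innovation covariance (that is, an empty undirected part) are all assumptions chosen only for illustration.

```python
import numpy as np


def simulate_var_unequal_lags(lag_lengths, coef_scale=0.2, n_steps=200, seed=0):
    """Simulate a VAR whose directed edge j -> i has its own lag length.

    lag_lengths[i, j] is the number of lags of variable j that enter the
    equation for variable i; 0 means there is no directed edge j -> i.
    All names and the coefficient rule are hypothetical.
    """
    rng = np.random.default_rng(seed)
    lag_lengths = np.asarray(lag_lengths, dtype=int)
    k = lag_lengths.shape[0]
    p = int(lag_lengths.max())

    # coefs[lag - 1, i, j] is the effect of x_j at time t - lag on x_i at time t.
    coefs = np.zeros((p, k, k))
    for lag in range(1, p + 1):
        active = lag_lengths >= lag  # edges that still have dynamics at this lag
        coefs[lag - 1][active] = coef_scale / lag

    x = np.zeros((n_steps + p, k))
    for t in range(p, n_steps + p):
        # Assumption: independent unit-variance innovations (no undirected edges).
        noise = rng.standard_normal(k)
        x[t] = sum(coefs[lag - 1] @ x[t - lag] for lag in range(1, p + 1)) + noise
    return coefs, x[p:]


# Example structure: row i lists how many lags of each variable drive variable i.
example_lags = [[2, 0, 1],
                [1, 1, 0],
                [0, 3, 1]]
coefs, series = simulate_var_unequal_lags(example_lags)
```

In this sketch an unstructured VAR of order 3 would fill all 27 coefficient slots, whereas the structure above leaves most of them at exactly zero.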

In contrast to static graphical models, the number of possible mixed graphs is extremely large even for small systems, and consequently, standard Bayesian computation based on Markov chain Monte Carlo is not a feasible alternative for model learning in practice. To obtain a numerically efficient approach we utilize a recent Bayesian information theoretic criterion for model learning, which has attractive properties when the potential model complexity is large relative to the size of the observed data set. The performance of our method is illustrated by analyzing both simulated and real data sets. Our simulation experiments demonstrate the gains in predictive accuracy which can be obtained by considering structural learning of vector autoregressive processes instead of unstructured models. The analysis of the real data also shows that the understanding of the dynamics of a multivariate process can be improved significantly by considering more flexible model classes.
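The growth of the model space can be illustrated with a rough count. The sketch below rests on a simplified counting convention that is an assumption here, not the paper's definition: every ordered pair of variables (including a variable with itself) independently receives a lag length between 0 and `max_lag`, and every unordered pair of distinct variables either has an undirected edge or not. The function name `count_mixed_graphs` is hypothetical.

```python
def count_mixed_graphs(k, max_lag):
    """Count candidate structures under the simplified convention above.

    k        number of process components (variables)
    max_lag  largest lag length allowed on any directed edge
    """
    # One lag length in {0, ..., max_lag} for each of the k * k ordered pairs.
    directed = (max_lag + 1) ** (k * k)
    # Presence or absence of an undirected edge for each unordered pair.
    undirected = 2 ** (k * (k - 1) // 2)
    return directed * undirected


for k in (2, 3, 5):
    print(k, count_mixed_graphs(k, max_lag=4))
```

Under this convention the count grows exponentially in the square of the number of variables, which is consistent with the motivation for avoiding exhaustive or sampling-based search over structures.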


Keywords: Bayesian analysis, Granger-causality, Graphical models, Statistical learning, Vector autoregression, Markov chain Monte Carlo, Greedy optimization



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. University of Helsinki, Helsinki, Finland
  2. Åbo Akademi University, Turku, Finland
