Learning Bi-clustered Vector Autoregressive Models
Abstract
Vector Auto-regressive (VAR) models are useful for analyzing temporal dependencies among multivariate time series, dependencies known as Granger causality. Existing methods for learning sparse VAR models lead directly to causal networks among the variables of interest. Another useful type of analysis comes from clustering methods, which summarize multiple time series by putting them into groups. We develop a methodology that integrates both types of analysis, motivated by the intuition that Granger causal relations in real-world time series may exhibit some clustering structure, in which case the two should be estimated together. Our methodology combines sparse learning with a nonparametric bi-clustered prior over the VAR model, and conducts full Bayesian inference via blocked Gibbs sampling. Experiments on simulated and real data demonstrate improvements in both model estimation and clustering quality over standard alternatives. In particular, on a T-cell activation gene expression time series dataset, our method yields biologically more meaningful clusters than those produced by other methods.
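To make the setting concrete, the following is a minimal sketch of a first-order VAR model whose coefficient matrix has block (cluster) structure, together with a plain least-squares fit and a thresholded Granger causal graph. It is not the bi-clustered Bayesian method described above: there is no nonparametric prior and no Gibbs sampling, and the function names, the example matrix `A_true`, the noise level, and the threshold `0.15` are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_var1(A, n_steps, noise_std=0.1, seed=0):
    """Simulate x_t = A x_{t-1} + noise for a stable coefficient matrix A."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    x = np.zeros((n_steps, d))
    for t in range(1, n_steps):
        x[t] = A @ x[t - 1] + noise_std * rng.standard_normal(d)
    return x


def fit_var1_ols(x):
    """Ordinary least-squares estimate of A in x_t = A x_{t-1} + noise."""
    past, future = x[:-1], x[1:]
    # Solve past @ B ~= future in the least-squares sense; A_hat is B transposed.
    B, *_ = np.linalg.lstsq(past, future, rcond=None)
    return B.T


def granger_graph(A_hat, threshold):
    """Entry (i, j) is True when series j is estimated to Granger-cause series i."""
    return np.abs(A_hat) > threshold


# Illustrative (assumed) structure: series 0-1 form one group, series 2-3 another.
# The first group drives the second, but not the reverse.
A_true = np.array([
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.0],
    [0.3, 0.3, 0.4, 0.0],
    [0.3, 0.3, 0.0, 0.4],
])

x = simulate_var1(A_true, n_steps=5000)
A_hat = fit_var1_ols(x)
graph = granger_graph(A_hat, threshold=0.15)  # threshold is an arbitrary choice
```

In this toy example the rows and columns of the coefficient matrix fall into groups with a shared sparsity pattern, which is the kind of clustering structure in Granger causal relations that motivates estimating the network and the clusters jointly.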
Keywords
time-series analysis; vector auto-regressive models; bi-clustering; Bayesian non-parametrics; gene expression analysis