Abstract
Vector autoregressive (VAR) models are useful for analyzing temporal dependencies among multivariate time series, a form of dependence known as Granger causality. Existing methods learn sparse VAR models, which lead directly to causal networks among the variables of interest. Another useful type of analysis comes from clustering methods, which summarize multiple time series by putting them into groups. We develop a methodology that integrates both types of analysis, motivated by the intuition that Granger causal relations in real-world time series may exhibit some clustering structure, in which case the two should be estimated together. Our methodology combines sparse learning with a nonparametric bi-clustered prior over the VAR model and conducts full Bayesian inference via blocked Gibbs sampling. Experiments on simulated and real data demonstrate improvements in both model estimation and clustering quality over standard alternatives. In particular, on a T-cell activation gene expression time series dataset, our method yields biologically more meaningful clusters than those found by other methods.
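As a rough illustration of the sparse VAR idea mentioned in the abstract, the sketch below simulates a first-order VAR whose transition matrix has a block (cluster-like) pattern and recovers a Granger causal network with an L1-penalized least-squares fit. It is not the paper's method: it uses a plain lasso solved by proximal gradient descent (ISTA) rather than the bi-clustered prior and blocked Gibbs sampler, and every name and setting in it (`simulate_var1`, `sparse_var1`, the six variables, the two groups, `lam`, the 0.1 edge threshold) is an assumption chosen for illustration.

```python
import numpy as np

def simulate_var1(A, T, rng, noise_std=1.0):
    """Draw T steps of x_t = A x_{t-1} + e_t with Gaussian noise e_t."""
    n = A.shape[0]
    X = np.zeros((T, n))
    for t in range(1, T):
        X[t] = A @ X[t - 1] + noise_std * rng.standard_normal(n)
    return X

def soft_threshold(Z, tau):
    """Proximal operator of tau * ||Z||_1 (elementwise shrinkage)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def sparse_var1(X, lam, n_iter=500):
    """Lasso estimate of a VAR(1) transition matrix via ISTA.

    Minimizes (1 / 2T) * ||present - past @ B||_F^2 + lam * ||B||_1
    over B = A^T and returns A.
    """
    past, present = X[:-1], X[1:]
    T = past.shape[0]
    G = past.T @ past / T        # Gram matrix of the lagged values
    C = past.T @ present / T     # cross-moments between lagged and current values
    step = 1.0 / np.linalg.eigvalsh(G).max()  # 1 / Lipschitz constant of the gradient
    B = np.zeros_like(G)
    for _ in range(n_iter):
        B = soft_threshold(B - step * (G @ B - C), step * lam)
    return B.T

# Hypothetical ground truth: variables 0-2 form one group, 3-5 another,
# and every member of the first group drives every member of the second.
A_true = 0.4 * np.eye(6)
A_true[3:, :3] = 0.25

rng = np.random.default_rng(0)
X = simulate_var1(A_true, T=4000, rng=rng)
A_hat = sparse_var1(X, lam=0.02)

# Entry (i, j) is True when series j is estimated to Granger-cause series i.
edges = np.abs(A_hat) > 0.1
print(edges.astype(int))
```

In this toy setting the nonzero pattern of `A_true` is constant within each pair of groups, which is the kind of structure a bi-clustered prior is meant to exploit. The plain lasso above treats every coefficient independently and ignores that structure.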
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huang, TK., Schneider, J. (2012). Learning Bi-clustered Vector Autoregressive Models. In: Flach, P.A., De Bie, T., Cristianini, N. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2012. Lecture Notes in Computer Science, vol 7524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33486-3_47
DOI: https://doi.org/10.1007/978-3-642-33486-3_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33485-6
Online ISBN: 978-3-642-33486-3
eBook Packages: Computer Science, Computer Science (R0)