International Journal of Computer Vision, Volume 105, Issue 3, pp 246–268

An Improved Hierarchical Dirichlet Process-Hidden Markov Model and Its Application to Trajectory Modeling and Retrieval

  • Weiming Hu
  • Guodong Tian
  • Xi Li
  • Stephen Maybank

Abstract

In this paper, we propose a hierarchical Bayesian model, an improved hierarchical Dirichlet process-hidden Markov model (iHDP-HMM), for visual document analysis. The iHDP-HMM is capable of clustering visual documents and capturing the temporal correlations between the visual words within a visual document while identifying the number of document clusters and the number of visual topics adaptively. A Bayesian inference mechanism for the iHDP-HMM is developed to carry out likelihood evaluation, topic estimation, and cluster membership prediction. We apply the iHDP-HMM to simultaneously cluster motion trajectories and discover latent topics for trajectory words, based on the proposed method for constructing the trajectory word codebook. Then, an iHDP-HMM-based probabilistic trajectory retrieval framework is developed. The experimental results verify the clustering accuracy of the iHDP-HMM and trajectory retrieval accuracy of the proposed framework.
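The abstract notes that the number of clusters is identified adaptively, which is a general property of Dirichlet process priors. As a minimal illustration of that property only, and not of the authors' iHDP-HMM or its inference procedure, the sketch below draws cluster labels from a Chinese restaurant process, the sequential sampling view of a Dirichlet process. The function name `crp_assignments` and the parameters `n_items`, `alpha`, and `seed` are assumptions introduced here for illustration.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels for n_items from a Chinese restaurant process.

    alpha is the concentration parameter: larger values make new clusters
    more likely. This is an illustrative sketch, not the paper's model.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items already assigned to cluster k
    labels = []
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n_items=200, alpha=1.0, seed=1)
    print("clusters used:", len(counts))
    print("cluster sizes:", counts)
```

The number of clusters is not fixed in advance: it is whatever the sampling process opens for the given data size and `alpha`, which is the sense in which such priors let the cluster count adapt to the data.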

Keywords

Hierarchical Dirichlet process; Hidden Markov model; Trajectory analysis; Video retrieval

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Weiming Hu (1)
  • Guodong Tian (1)
  • Xi Li (1, corresponding author)
  • Stephen Maybank (2)
  1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  2. Department of Computer Science and Information Systems, Birkbeck College, London, UK
