Abstract
In recent years, mining knowledge from asynchronous event sequences with Hawkes processes has attracted sustained attention, and neural-network-based Hawkes processes, especially those built on recurrent neural networks (RNNs), have become one of the most actively researched topics. However, these models inherit some intrinsic shortcomings of RNNs, such as vanishing and exploding gradients and difficulty in capturing long-term dependencies. Meanwhile, the Transformer, built on self-attention, has achieved great success in sequential modeling tasks such as text processing and speech recognition. Although the Transformer Hawkes process (THP) yields a substantial performance improvement, it does not exploit the temporal information in asynchronous events effectively: in these sequences, the instants at which events occur are as important as the event types, yet the conventional THP simply converts timestamps into positional encodings and adds them to the Transformer input. With this in mind, we propose a new Transformer-based Hawkes process model, the temporal attention augmented Transformer Hawkes process (TAA-THP), which modifies the conventional dot-product attention and introduces temporal encoding directly into the attention structure. We conduct extensive experiments on a wide range of synthetic and real-life datasets to validate the performance of the proposed TAA-THP model, and we achieve significant improvements over existing baseline models on several measures, including log-likelihood on the test set and prediction accuracy for event types and occurrence times. In addition, through ablation studies, we demonstrate the merit of the additional temporal attention by comparing the model's performance with and without it.
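The core modification can be sketched in a few lines. The NumPy fragment below is a minimal illustration, not the authors' exact formulation: the temporal projection Wt, the sinusoidal encoding of raw timestamps, and the additive combination of content and temporal scores are all illustrative assumptions, standing in for the idea that timestamps influence the attention weights directly rather than only being summed into the input embeddings.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax; masked (-inf) entries become exactly 0
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(X, T, Wq, Wk, Wv, Wt):
    """Scaled dot-product attention augmented with a temporal term (sketch).

    X  : (L, d) event-type embeddings for a sequence of L events
    T  : (L, d) temporal encodings of the event timestamps
    Wq, Wk, Wv, Wt : (d, d) projection matrices (Wt is hypothetical here)

    The temporal score Q @ (T Wt)^T is added to the usual content score
    Q @ K^T before the softmax, so occurrence times shape the attention
    weights directly instead of only entering via the summed inputs.
    """
    d = X.shape[-1]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    Kt = T @ Wt                                    # temporal "keys"
    scores = (Q @ K.T + Q @ Kt.T) / np.sqrt(d)     # content + temporal terms
    mask = np.triu(np.ones((len(X), len(X)), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)       # causal mask: no future events
    return softmax(scores) @ V

# toy usage with a sinusoidal encoding of raw timestamps (one common choice)
rng = np.random.default_rng(0)
L, d = 5, 8
X = rng.normal(size=(L, d))            # event-type embeddings
t = np.sort(rng.uniform(0, 10, L))     # event occurrence times
i = np.arange(d)
T = np.where(i % 2 == 0,
             np.sin(t[:, None] / 10000 ** (i / d)),
             np.cos(t[:, None] / 10000 ** (i / d)))
Wq, Wk, Wv, Wt = (rng.normal(size=(d, d)) for _ in range(4))
out = temporal_attention(X, T, Wq, Wk, Wv, Wt)    # (L, d) attended features
```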
Funding
This work was supported by the Science Foundation of China University of Petroleum, Beijing (No. 2462020YXZZ023). We thank Hong-Yuan Mei and Si-Miao Zuo for their generous help during our research; their assistance greatly improved this work.
Ethics declarations
Conflict of interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that we have no professional or other personal interest of any nature in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled "Temporal Attention Augmented Transformer Hawkes Process."
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zhang, L.-N., Liu, J.-W., Song, Z.-Y., et al. Temporal attention augmented transformer Hawkes process. Neural Comput & Applic 34, 3795–3809 (2022). https://doi.org/10.1007/s00521-021-06641-z