Temporal attention augmented transformer Hawkes process

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

In recent years, mining knowledge from asynchronous event sequences with Hawkes processes has attracted continued attention, and Hawkes processes based on neural networks, especially the recurrent neural network (RNN), have become one of the most actively researched directions. However, these models inherit some shortcomings of RNNs, such as vanishing and exploding gradients and difficulty capturing long-term dependencies. Meanwhile, the Transformer, built on self-attention, has achieved great success in sequential modeling tasks such as text processing and speech recognition. Although the Transformer Hawkes process (THP) has delivered large performance gains, it does not effectively exploit the temporal information in asynchronous events: in such sequences the instants at which events occur are as important as the event types, yet conventional THPs simply convert temporal information into positional encodings and add them to the Transformer input. With this in mind, we propose a new Transformer-based Hawkes process model, the temporal attention augmented Transformer Hawkes process (TAA-THP), which modifies the traditional dot-product attention structure and introduces the temporal encoding directly into the attention mechanism. We conduct extensive experiments on a wide range of synthetic and real-life datasets to validate the performance of the proposed TAA-THP model, and we achieve significant improvements over existing baseline models on several measures, including log-likelihood on the test dataset and prediction accuracy of event types and occurrence times. In addition, ablation studies demonstrate the merit of the additional temporal attention by comparing the model's performance with and without it.
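To make the idea concrete, the following is a minimal sketch, not the authors' implementation, of how scaled dot-product attention can be augmented with a separate temporal term, in the spirit described above. All names here (sinusoidal_temporal_encoding, W_q, W_k, W_v, W_t) are illustrative assumptions, and d_model is assumed to be even.

```python
# Illustrative sketch (assumption, not the TAA-THP code): attention scores that
# combine the usual content term q·k with a term driven by temporal encodings.
import math
import torch
import torch.nn as nn


def sinusoidal_temporal_encoding(timestamps: torch.Tensor, d_model: int) -> torch.Tensor:
    """Encode real-valued event times of shape (batch, seq_len) into
    (batch, seq_len, d_model) with sinusoids, analogous to positional
    encoding but driven by the timestamps themselves."""
    i = torch.arange(0, d_model, 2, dtype=torch.float32, device=timestamps.device)
    div = torch.exp(-math.log(10000.0) * i / d_model)       # (d_model/2,)
    angles = timestamps.unsqueeze(-1) * div                 # (B, L, d_model/2)
    enc = torch.zeros(*timestamps.shape, d_model, device=timestamps.device)
    enc[..., 0::2] = torch.sin(angles)
    enc[..., 1::2] = torch.cos(angles)
    return enc


class TemporalAugmentedAttention(nn.Module):
    """Single-head attention whose scores are the sum of a content-based
    term and a temporal term computed from the event timestamps."""

    def __init__(self, d_model: int):
        super().__init__()
        self.d_model = d_model
        self.W_q = nn.Linear(d_model, d_model, bias=False)
        self.W_k = nn.Linear(d_model, d_model, bias=False)
        self.W_v = nn.Linear(d_model, d_model, bias=False)
        self.W_t = nn.Linear(d_model, d_model, bias=False)  # projects temporal encodings

    def forward(self, x: torch.Tensor, timestamps: torch.Tensor,
                mask: torch.Tensor = None) -> torch.Tensor:
        q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
        t = self.W_t(sinusoidal_temporal_encoding(timestamps, self.d_model))
        # Content scores plus temporal scores, jointly scaled before softmax.
        scores = (q @ k.transpose(-2, -1) + q @ t.transpose(-2, -1)) / math.sqrt(self.d_model)
        if mask is not None:
            # e.g. a causal mask (True = blocked) so events attend only to the past.
            scores = scores.masked_fill(mask, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v
```

A multi-head version, residual connections, and the intensity-function decoder used by THP-style models would sit on top of such a block; the point of the sketch is only that event timestamps enter the attention scores themselves rather than being added to the input embeddings.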

Funding

This work was supported by the Science Foundation of China University of Petroleum, Beijing (No. 2462020YXZZ023). We thank Hong-Yuan Mei and Si-Miao Zuo for their generous help during our research; their assistance greatly improved this work.

Author information

Corresponding author

Correspondence to Jian-wei Liu.

Ethics declarations

Conflict of interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work. There is no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled "Temporal Attention Augmented Transformer Hawkes Process."

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, L.-N., Liu, J.-W., Song, Z.-Y. et al. Temporal attention augmented transformer Hawkes process. Neural Computing and Applications 34, 3795–3809 (2022). https://doi.org/10.1007/s00521-021-06641-z
