Automatic detection of trends in time-stamped sequences: an evolutionary approach

Araujo, Lourdes; Merelo, Juan Julián

doi:10.1007/s00500-008-0395-8

Original Paper
Published: 14 January 2009

Volume 14, pages 211–227, (2010)

Soft Computing

Lourdes Araujo¹ &
Juan Julián Merelo²


Abstract

This paper presents an evolutionary algorithm for modeling the arrival dates in time-stamped data sequences such as newscasts, e-mails, IRC conversations, scientific journal articles or weblog postings. These models are applied to the detection of buzz (i.e. terms that occur with a higher-than-normal frequency) in such sequences, a problem that has attracted considerable interest in the online world as the number of periodic content producers has grown. For this reason we have tested our system on online sequences of this kind, though it is equally valid for other types of event sequences. The algorithm assigns frequencies (number of events per time unit) to time intervals so as to produce an optimal fit to the data. The optimization is a trade-off between fitting the data accurately and avoiding too many frequency changes, which overcomes the noise inherent in these sequences. This process has traditionally been performed with dynamic programming algorithms, which are limited by their memory and efficiency requirements. This limitation can be a problem for long sequences, and suggests applying alternative search methods that accept some degree of uncertainty in exchange for tractability, such as the evolutionary algorithm proposed in this paper. This algorithm reaches the same solution quality as the classical dynamic programming algorithms, but in a shorter time. We also test different cost functions and propose a new one that yields better fits on real-world data than the one originally proposed by Kleinberg. Finally, several distributions of states for the finite state automata are tested, with the result that a uniform distribution produces much better fits than the geometric distribution also proposed by Kleinberg. We also present a variant of the evolutionary algorithm that quickly fits a sequence extended with new data by taking advantage of the fit obtained for the original subsequence.
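The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the algorithm or the new cost function of the paper, which the abstract does not specify. It assumes a Kleinberg-style cost (exponential fit term plus a penalty for moving to a higher-frequency state) with geometrically spaced rates, and a generic mutation-only evolutionary loop. The names state_rates, cost and evolve, and all parameter values (scale, gamma, pop_size, generations), are hypothetical choices made for illustration.

```python
import math
import random


def state_rates(gaps, n_states, scale=2.0):
    """Geometric rates (assumed): mean event rate times scale**i for state i."""
    base = len(gaps) / sum(gaps)
    return [base * scale ** i for i in range(n_states)]


def cost(states, gaps, rates, gamma=1.0):
    """Fit cost plus transition cost of one state sequence (lower is better)."""
    n = len(gaps)
    total = 0.0
    prev = 0  # assumption: the automaton starts in the lowest-frequency state
    for q, x in zip(states, gaps):
        if q > prev:
            # Moving up is penalised in proportion to the number of levels.
            total += (q - prev) * gamma * math.log(n)
        # Negative log-likelihood of gap x under an exponential with rate rates[q].
        total += rates[q] * x - math.log(rates[q])
        prev = q
    return total


def evolve(gaps, n_states=3, pop_size=30, generations=200, seed=0):
    """Mutation-only evolutionary search over state sequences (a sketch)."""
    rng = random.Random(seed)
    rates = state_rates(gaps, n_states)

    def fitness(individual):
        return cost(individual, gaps, rates)

    # Start from the "no burst" solution: every gap in state 0.
    population = [[0] * len(gaps) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]  # elitist truncation selection
        children = []
        for parent in survivors:
            child = parent[:]
            # Mutation: assign one random state to a random contiguous interval.
            start = rng.randrange(len(child))
            end = rng.randrange(start, len(child))
            state = rng.randrange(n_states)
            for k in range(start, end + 1):
                child[k] = state
            children.append(child)
        population = survivors + children
    return min(population, key=fitness)


if __name__ == "__main__":
    # Synthetic gaps between events: slow, then a burst, then slow again.
    demo = [10.0] * 10 + [1.0] * 10 + [10.0] * 10
    print(evolve(demo))
```

Because the best individuals always survive, the cost of the returned sequence never exceeds that of the all-zero starting sequence. How close it comes to the dynamic programming optimum depends on the assumed parameters.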

Notes

The number of generations required to reach the optimal fit is about ten times the one required with the strategy finally adopted.
The number of generations required to reach the optimal fit is about twenty times the one required with the strategy finally adopted.
http://www.blogalia.com.
terrorism.
terrorist attack.

References

Araujo L (2004) Symbiosis of evolutionary techniques and statistical natural language processing. IEEE Trans Evol Comput 8(1):14–27
Araujo L, Merelo JJ (2006) Automatic detection of trends in dynamical text: an evolutionary approach. http://www.citebase.org/abstract?id=oai:arXiV.org:cs/0601047
Araujo L, Cuesta JA, Merelo JJ (2006) Genetic algorithm for burst detection and activity tracking in event streams. In: Runarsson TP, Beyer HG, Burke E, Guervós JJM, Bullinaria LDWA, Rowe J, Yao X (eds) Proceedings PPSN IX, no. 4193. Lecture notes in computer science, LNCS. Springer, Berlin, pp 453–462
Bingham E, Kabán A, Girolami M (2003) Topic identification in dynamical text by complexity pursuit. Neural Process Lett 17(1):69–83
Charikar M, Chen K, Farach-Colton M (2002) Finding frequent items in data streams. In: Proceedings of the 29th International Colloquium on Automata, Languages, and Programming, 2002. http://citeseer.ist.psu.edu/charikar02finding.html
Eiben AE, Smith JE (2003) Introduction to evolutionary computing. Springer, Berlin
Elwalid AI, Mitra D (1993) Effective bandwidth of general Markovian traffic sources and admission control of high speed networks. IEEE/ACM Trans Netw 1(3):329–343
Forney GD (1973) The Viterbi algorithm. Proc IEEE 61(3):268–278
Galvão RK, Becerra VM, Abou-Seada M (2004) Ratio selection for classification models. Data Min Knowl Discov 8(2):151–170. doi:10.1023/B:DAMI.0000015913.38787.b3
Girolami M, Kabán A (2004) Simplicial mixtures of Markov chains: distributed modelling of dynamic user profiles. In: Thrun S, Saul L, Schölkopf B (eds) Advances in neural information processing systems 16. MIT Press, Cambridge
Goldberg DE (1989) Genetic Algorithms in search, optimization and machine learning. Addison Wesley, Reading
Gollapudi S, Sivakumar D (2004) Framework and algorithms for trend analysis in massive temporal data sets. In: CIKM’04: Proceedings of the thirteenth ACM international conference on Information and knowledge management. ACM Press, New York, pp 168–177. doi:10.1145/1031171.1031208
Gruhl D, Guha R, Kumar R, Novak J, Tomkins A (2005) The predictive power of online chatter. In: KDD’05: Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM Press, New York, pp 78–87. doi:10.1145/1081870.1081883. http://portal.acm.org/citation.cfm?id=1081883
Hsu WH, Welge M, Redman T, Clutter D (2002) High-performance commercial data mining: a multistrategy machine learning application. Data Min Knowl Discov 6(4):361–391
Ihler A, Hutchins J, Smyth P (2006) Adaptive event detection with time-varying Poisson processes. In: KDD’06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM Press, New York, pp 207–216. doi:10.1145/1150402.1150428
Kleinberg JM (2003) Bursty and hierarchical structure in streams. Data Min Knowl Discov 7(4):373–397
Kleinberg J (2006) Temporal dynamics of on-line information streams. In: Garofalakis M, Gehrke J, Rastogi R (eds) Data stream management: processing high-speed data streams. Springer, Berlin. http://www.cs.cornell.edu/home/kleinber/stream-survey04.pdf
Kumar R, Novak J, Raghavan P, Tomkins A (2004) Structure and evolution of blogspace. Commun ACM 47(12):35–39. doi:10.1145/1035134.1035162
Michalewicz Z, Fogel DB (2004) How to solve it: modern heuristics, 2nd edn. Revised and extended edn. Springer, Berlin. ISBN:3-540-22494-7
Muthukrishnan S (2003) Data streams: algorithms and applications. In: SODA’03: Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, p 413. Extended version available at http://infolab.usc.edu/csci599/Fall2003/Data thms
Rabiner LR (1990) A tutorial on hidden Markov models and selected applications in speech recognition. In: Readings in speech recognition. Morgan Kaufmann Publishers Inc., Menlo Park, pp 267–296
Wang X, McCallum A (2006) Topics over time: a non-Markov continuous-time model of topical trends. In: KDD ’06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM Press, New York, pp 424–433. doi:10.1145/1150402.1150450
Yi J (2005) Detecting buzz from time-sequenced document streams. In: e-Technology, e-Commerce and e-Service, 2005. EEE ’05. Proceedings. The 2005 IEEE International Conference on, pp 347–352. http://ieeexplore.ieee.org/iel5/9634/30444/01402320.pdf

Acknowledgments

This work has been supported by the Spanish MICYT projects TIN2007-68083-C02-01 and TIN2007-67581-C02-01, the Junta de Andalucia CICE project P06-TIC-02025 and the Granada University PIUGR 9/11/06 project. We are also very grateful to the anonymous reviewers, who greatly contributed to the improvement of this paper and suggested new lines of research.

Author information

Authors and Affiliations

Departamento de Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia, Madrid, Spain
Lourdes Araujo
Departamento de Arquitectura y Tecnología de Computadores, Universidad de Granada, Granada, Spain
Juan Julián Merelo

Corresponding author

Correspondence to Lourdes Araujo.

About this article

Cite this article

Araujo, L., Merelo, J.J. Automatic detection of trends in time-stamped sequences: an evolutionary approach. Soft Comput 14, 211–227 (2010). https://doi.org/10.1007/s00500-008-0395-8

Published: 14 January 2009
Issue Date: February 2010
DOI: https://doi.org/10.1007/s00500-008-0395-8
