Finding anomalous periodic time series

Rebbapragada, Umaa; Protopapas, Pavlos; Brodley, Carla E.; Alcock, Charles

doi:10.1007/s10994-008-5093-3

An application to catalogs of periodic variable stars

Published: 31 December 2008

Machine Learning, Volume 74, pages 281–313 (2009)

Abstract

Catalogs of periodic variable stars contain large numbers of periodic light-curves (photometric time series data from the astrophysics domain). Separating anomalous objects from well-known classes is an important step towards the discovery of new classes of astronomical objects. Most anomaly detection methods for time series data assume either a single continuous time series or a set of time series whose periods are aligned. Light-curve data preclude the use of these methods because the periods of any given pair of light-curves may be out of sync. One may use an existing anomaly detection method if, prior to similarity calculation, one performs the costly act of aligning the two light-curves, an operation that scales poorly to massive data sets. This paper presents PCAD, an unsupervised anomaly detection method for large sets of unsynchronized periodic time series data that outputs a ranked list of both global and local anomalies. PCAD calculates the anomaly score of each light-curve in relation to a set of centroids produced by a modified k-means clustering algorithm. The method scales to large data sets through the use of sampling. We validate our method on both light-curve data and other time series data sets. We demonstrate its effectiveness at finding known anomalies, and discuss the effect of sample size and number of centroids on our results. We compare our method to naive solutions and existing time series anomaly detection methods for unphased data, and show that PCAD’s reported anomalies are comparable to or better than those of all other methods. Finally, astrophysicists on our team have verified that PCAD finds true anomalies that might be indicative of novel astrophysical phenomena.
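The approach summarized in the abstract (centroids from a k-means variant fit on a sample, then a score for each curve relative to those centroids) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how k-means is modified or how the score is computed, so the sketch assumes that every curve has already been folded onto a common grid of equal length, that two curves are compared at their best circular shift, and that the score is the distance to the nearest centroid. All function names (`best_shift_distance`, `fit_centroids`, `anomaly_scores`, `rank_anomalies`) are hypothetical.

```python
import numpy as np


def best_shift_distance(x, c):
    """Return (distance, shift): the smallest Euclidean distance between c and
    any circular shift of x, and the shift s for which np.roll(x, -s) attains it."""
    # Circular cross-correlation via FFT: corr[s] = sum_t x[(t + s) % n] * c[t].
    corr = np.fft.irfft(np.fft.rfft(x) * np.conj(np.fft.rfft(c)), n=len(x))
    s = int(np.argmax(corr))
    # ||roll(x, -s) - c||^2 = ||x||^2 + ||c||^2 - 2 * corr[s]
    d2 = np.dot(x, x) + np.dot(c, c) - 2.0 * corr[s]
    return float(np.sqrt(max(d2, 0.0))), s


def fit_centroids(X, k, n_iter=10, seed=0):
    """Assumed k-means variant: each curve is re-phased to its nearest
    centroid, and each centroid becomes the mean of its aligned members."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        members = [[] for _ in range(k)]
        for x in X:
            fits = [best_shift_distance(x, c) for c in centroids]
            j = min(range(k), key=lambda i: fits[i][0])
            members[j].append(np.roll(x, -fits[j][1]))
        for j in range(k):
            if members[j]:  # an empty cluster keeps its previous centroid
                centroids[j] = np.mean(members[j], axis=0)
    return centroids


def anomaly_scores(X, centroids):
    """Assumed score: shift-invariant distance to the nearest centroid."""
    return np.array(
        [min(best_shift_distance(x, c)[0] for c in centroids) for x in X]
    )


def rank_anomalies(X, k, sample_size, seed=0):
    """Fit centroids on a random sample, then score and rank every curve."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    scores = anomaly_scores(X, fit_centroids(X[idx], k, seed=seed))
    return np.argsort(-scores), scores  # most anomalous first
```

Calling `rank_anomalies(X, k=5, sample_size=1000)` on an array `X` of shape `(n_curves, n_points)` returns curve indices ordered from most to least anomalous together with the raw scores. One caveat of this sketch: a rare curve that falls in the sample and seeds its own centroid would receive a low score, and the paper's handling of such cases is not reproduced here.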

Author information

Authors and Affiliations

Department of Computer Science, Tufts University, 161 College Ave., Medford, MA, 02155, USA
Umaa Rebbapragada & Carla E. Brodley
Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA, 02138, USA
Pavlos Protopapas & Charles Alcock
Initiative in Innovative Computing, Harvard University, 60 Oxford Street, Cambridge, MA, 02138, USA
Pavlos Protopapas

Corresponding author

Correspondence to Umaa Rebbapragada.

Additional information

Editor: Tom Fawcett.

About this article

Cite this article

Rebbapragada, U., Protopapas, P., Brodley, C.E. et al. Finding anomalous periodic time series. Mach Learn 74, 281–313 (2009). https://doi.org/10.1007/s10994-008-5093-3

Received: 20 March 2007
Revised: 17 November 2008
Accepted: 21 November 2008
Published: 31 December 2008
Issue Date: March 2009
DOI: https://doi.org/10.1007/s10994-008-5093-3
