Abstract
Catalogs of periodic variable stars contain large numbers of periodic light-curves (photometric time series data from the astrophysics domain). Separating anomalous objects from well-known classes is an important step towards the discovery of new classes of astronomical objects. Most anomaly detection methods for time series data assume either a single continuous time series or a set of time series whose periods are aligned. Light-curve data precludes the use of these methods because the periods of any given pair of light-curves may be out of sync. One may use an existing anomaly detection method if, prior to similarity calculation, one performs the costly act of aligning two light-curves, an operation that scales poorly to massive data sets. This paper presents PCAD, an unsupervised anomaly detection method for large sets of unsynchronized periodic time series data that outputs a ranked list of both global and local anomalies. It calculates its anomaly score for each light-curve in relation to a set of centroids produced by a modified k-means clustering algorithm. Our method is able to scale to large data sets through the use of sampling. We validate our method on both light-curve data and other time series data sets. We demonstrate its effectiveness at finding known anomalies, and discuss the effect of sample size and number of centroids on our results. We compare our method to naive solutions and existing time series anomaly detection methods for unphased data, and show that PCAD’s reported anomalies are comparable to or better than all other methods. Finally, astrophysicists on our team have verified that PCAD finds true anomalies that might be indicative of novel astrophysical phenomena.
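To make the idea of scoring unsynchronized curves against centroids concrete, the following is a minimal sketch and not PCAD's actual implementation. It assumes every curve has been folded and resampled to the same number of points, it uses a brute-force search over circular shifts as the alignment step, and it takes the distance to the nearest centroid as the score. The function names (`phase_invariant_distance`, `anomaly_score`, `rank_anomalies`) are hypothetical, and the centroids are assumed to be supplied by some clustering step that is not shown.

```python
import math


def shifted_distance(a, b, shift):
    """Euclidean distance between a and b after rotating b left by `shift` samples."""
    n = len(a)
    return math.sqrt(sum((a[i] - b[(i + shift) % n]) ** 2 for i in range(n)))


def phase_invariant_distance(a, b):
    """Smallest Euclidean distance between a and any circular shift of b.

    Assumption: both series cover exactly one period with the same sampling.
    """
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return min(shifted_distance(a, b, s) for s in range(len(a)))


def anomaly_score(curve, centroids):
    """Distance from `curve` to its closest centroid; larger means more unusual."""
    return min(phase_invariant_distance(curve, c) for c in centroids)


def rank_anomalies(curves, centroids):
    """Return indices of `curves`, ordered from most to least anomalous."""
    scores = [anomaly_score(c, centroids) for c in curves]
    return sorted(range(len(curves)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    n = 16
    sine = [math.sin(2 * math.pi * i / n) for i in range(n)]
    shifted_sine = sine[5:] + sine[:5]  # same shape, different phase
    square = [1.0 if i < n // 2 else -1.0 for i in range(n)]  # different shape
    # The square wave is ranked ahead of the phase-shifted sine.
    print(rank_anomalies([shifted_sine, square], centroids=[sine]))
```

The brute-force shift search costs O(n²) per pair of curves, which illustrates why pairwise alignment scales poorly; the same minimum can be obtained in O(n log n) from a circular cross-correlation computed with an FFT.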
Additional information
Editor: Tom Fawcett.
Cite this article
Rebbapragada, U., Protopapas, P., Brodley, C.E. et al. Finding anomalous periodic time series. Mach Learn 74, 281–313 (2009). https://doi.org/10.1007/s10994-008-5093-3
DOI: https://doi.org/10.1007/s10994-008-5093-3