Abstract
Time series reconstruction of astronomical catalogues is an important part of data archiving and the basis for time-domain astronomical analysis. As the fields of view and sampling frequencies of time-domain telescopes increase, the volume of data to be processed grows rapidly, and optimising the space and time efficiency of reconstruction has become an active research problem. To improve space efficiency, we propose a time series data compression algorithm based on the negative database and dynamic programming and, building on it, design a multi-level storage and query architecture for hot and non-hot data that greatly reduces storage space while preserving query efficiency. To improve time efficiency, we propose a spatio-temporal data partitioning and layout algorithm suited to distributed architectures; its nested round-robin strategy balances load across different spatial locations, temporal locations, and query time ranges, which helps ensure the execution efficiency of the distributed system. Experimental results show that the proposed optimisations keep the system at a low load skewness of about 4% and save about 83% of storage space.
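The abstract does not spell out the nested round-robin layout, so the following is only a minimal sketch of one way such a strategy could work, not the authors' algorithm. It assumes data are cut into blocks indexed by a spatial index and a time index, that spatial blocks are dealt to nodes round-robin, and that the starting node is rotated by one for each time slice. The names `assign_node` and `load_skew`, and the skew definition (maximum node load over mean node load, minus one), are hypothetical choices made for illustration.

```python
from collections import Counter


def assign_node(spatial_idx: int, time_idx: int, n_nodes: int) -> int:
    """Hypothetical nested round-robin placement.

    Inner round-robin: consecutive spatial blocks go to consecutive nodes.
    Outer round-robin: the starting node shifts by one per time slice.
    """
    return (spatial_idx + time_idx) % n_nodes


def load_skew(spatial_range, time_range, n_nodes: int) -> float:
    """Skew of a range query, assumed here to be max load / mean load - 1."""
    loads = Counter({node: 0 for node in range(n_nodes)})
    for s in spatial_range:
        for t in time_range:
            loads[assign_node(s, t, n_nodes)] += 1
    total = sum(loads.values())
    if total == 0:
        return 0.0  # an empty query touches no node
    mean = total / n_nodes
    return max(loads.values()) / mean - 1.0


if __name__ == "__main__":
    # One sky region followed over time, one epoch across the sky,
    # and a mixed space-time window, all on 4 nodes.
    print(load_skew(range(3, 4), range(0, 4), 4))
    print(load_skew(range(0, 4), range(5, 6), 4))
    print(load_skew(range(0, 8), range(0, 4), 4))
```

Under these assumptions, a query that follows one spatial block through time and a query that sweeps one time slice across space both spread evenly over the nodes, which is the kind of behaviour the abstract attributes to the nested strategy.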
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Funding
This work was supported by the National Key Research and Development Program of China (grant 2022YFF0711500) and by the National Natural Science Foundation of China (grants 11803022 and 12273077).
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhao, Q., Sun, L., Zhang, M. et al. Storage optimisation and distributed architecture for time series reconstruction of massive astronomical catalogues. Exp Astron 56, 821–845 (2023). https://doi.org/10.1007/s10686-023-09913-9
DOI: https://doi.org/10.1007/s10686-023-09913-9