Storage optimisation and distributed architecture for time series reconstruction of massive astronomical catalogues

  • Research
  • Published in Experimental Astronomy

Abstract

Time series reconstruction of astronomical catalogues is an important part of data archiving and underpins analysis in time-domain astronomy. As the fields of view and sampling frequencies of time-domain telescopes increase, the volume of data to be processed keeps growing, and optimising the space and time efficiency of this process has become an active research problem. To improve space efficiency, we propose a time series data compression algorithm based on the negative database and dynamic programming, and on this basis we design a multi-level storage and query architecture that separates hot from non-hot data, substantially reducing storage space while preserving query efficiency. To improve time efficiency, we propose a spatio-temporal data partitioning and layout algorithm for distributed architectures; its nested round-robin strategy balances load across different spatial locations, different temporal locations and different temporal query ranges, which keeps the distributed system executing efficiently. Experimental results show that the proposed algorithms keep load skewness at about 4% and save about 83% of storage space.
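The abstract does not spell out the compression algorithm. The following is a minimal sketch under stated assumptions, not the authors' method: we assume a source's time series can be summarised as a 0/1 presence vector over observation epochs, that the "negative database" idea means storing the absent epochs whenever they are fewer than the present ones, and that dynamic programming chooses where to cut the vector into segments, each paying a fixed (hypothetical) header cost `SEGMENT_HEADER_COST`. The names `encode_presence` and `decode_presence` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical per-segment overhead (e.g. start, end, mode flag),
# measured in the same units as one stored epoch index.
SEGMENT_HEADER_COST = 3

Segment = Tuple[int, int, str, List[int]]


def encode_presence(bits: List[int], header_cost: int = SEGMENT_HEADER_COST) -> Tuple[int, List[Segment]]:
    """Split a 0/1 presence vector into segments; each segment stores either
    the present epochs ('pos') or the absent epochs ('neg'), whichever is
    smaller. Segment boundaries are chosen by O(n^2) dynamic programming."""
    n = len(bits)
    prefix = [0] * (n + 1)  # prefix[i] = number of 1s in bits[:i]
    for i, b in enumerate(bits):
        prefix[i + 1] = prefix[i] + b

    best = [float("inf")] * (n + 1)  # best[j] = minimal cost to encode bits[:j]
    best[0] = 0
    cut = [0] * (n + 1)  # cut[j] = start of the last segment in the optimum for bits[:j]
    for j in range(1, n + 1):
        for i in range(j):
            ones = prefix[j] - prefix[i]
            zeros = (j - i) - ones
            cost = best[i] + header_cost + min(ones, zeros)
            if cost < best[j]:
                best[j] = cost
                cut[j] = i

    # Walk the cut points backwards to rebuild the chosen segments.
    segments: List[Segment] = []
    j = n
    while j > 0:
        i = cut[j]
        ones = prefix[j] - prefix[i]
        zeros = (j - i) - ones
        if ones <= zeros:
            segments.append((i, j, "pos", [k for k in range(i, j) if bits[k]]))
        else:
            segments.append((i, j, "neg", [k for k in range(i, j) if not bits[k]]))
        j = i
    segments.reverse()
    return int(best[n]), segments


def decode_presence(segments: List[Segment], n: int) -> List[int]:
    """Rebuild the presence vector from positive/negative segments."""
    bits = [0] * n
    for start, end, mode, idx in segments:
        if mode == "neg":
            # Negative segment: everything is present except the listed epochs.
            for k in range(start, end):
                bits[k] = 1
            for k in idx:
                bits[k] = 0
        else:
            for k in idx:
                bits[k] = 1
    return bits
```

For example, a source detected in the first 20 of 40 epochs and missing from the last 20 costs 6 units as two empty segments (one negative, one positive), against 23 units as a single segment. The quadratic DP is kept for clarity; a production encoder would likely bound segment lengths.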
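The hot/non-hot storage split can be pictured as a two-tier store. This sketch is hypothetical: the class name `TieredSeriesStore`, the promotion rule (promote after `hot_threshold` queries) and the use of Python's standard `zlib` as the cold-tier codec are all assumptions standing in for the paper's own compression and tiering policy.

```python
import json
import zlib
from collections import Counter
from typing import Dict, List


class TieredSeriesStore:
    """Hypothetical two-tier store: frequently queried ('hot') series are kept
    decoded in memory; all other ('non-hot') series stay zlib-compressed."""

    def __init__(self, hot_threshold: int = 3):
        self.hot_threshold = hot_threshold
        self.hot: Dict[str, List[float]] = {}
        self.cold: Dict[str, bytes] = {}
        self.hits: Counter = Counter()

    def put(self, source_id: str, series: List[float]) -> None:
        # New series start in the compressed (non-hot) tier.
        self.cold[source_id] = zlib.compress(json.dumps(series).encode("utf-8"))

    def get(self, source_id: str) -> List[float]:
        self.hits[source_id] += 1
        if source_id in self.hot:
            return self.hot[source_id]
        series = json.loads(zlib.decompress(self.cold[source_id]).decode("utf-8"))
        # Promote to the hot tier once the series has been queried often enough.
        if self.hits[source_id] >= self.hot_threshold:
            self.hot[source_id] = series
            del self.cold[source_id]
        return series
```

The trade-off is the one the abstract names: the cold tier saves space at the cost of a decompression on every read, so only series that are queried repeatedly pay for uncompressed storage.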
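The nested round-robin layout and the load-skewness figure can be illustrated with a small model. Everything here is an assumption rather than the paper's definition: data are split into spatial cells (for instance, sky pixels) times time slices, the layout shifts the node assignment by one for each successive time slice, and skewness is taken as (max load − mean load) / mean load. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Layout = Dict[Tuple[int, int], int]  # (spatial cell, time slice) -> node


def nested_round_robin(num_spatial: int, num_temporal: int, num_nodes: int) -> Layout:
    """Outer loop over time slices, inner loop over spatial cells; each time
    slice starts one node later than the previous one, so neighbours in
    space and neighbours in time land on different nodes."""
    layout: Layout = {}
    for t in range(num_temporal):
        for s in range(num_spatial):
            layout[(s, t)] = (s + t) % num_nodes
    return layout


def query_loads(layout: Layout, cells: Iterable[int], times: Iterable[int], num_nodes: int) -> List[int]:
    """Number of (cell, slice) blocks each node must read for a query."""
    times = list(times)
    counts = Counter(layout[(s, t)] for s in cells for t in times)
    return [counts.get(n, 0) for n in range(num_nodes)]


def load_skew(loads: List[int]) -> float:
    """Assumed skew metric: (max - mean) / mean; requires a non-zero mean."""
    mean = sum(loads) / len(loads)
    return (max(loads) - mean) / mean


if __name__ == "__main__":
    nodes = 4
    rr = nested_round_robin(8, 8, nodes)
    # Naive alternative: every time slice of a cell stays on the same node.
    naive = {(s, t): s % nodes for s in range(8) for t in range(8)}
    # Light curve of one cell over 8 time slices.
    print(query_loads(rr, [3], range(8), nodes))     # [2, 2, 2, 2]
    print(query_loads(naive, [3], range(8), nodes))  # [0, 0, 0, 8]
    print(load_skew(query_loads(naive, [3], range(8), nodes)))  # 3.0
```

Under the naive layout, reconstructing one source's time series hits a single node, whereas the shifted layout spreads the same query evenly; the same shift also spreads single-epoch queries over many cells.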

Data availability

The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.

Funding

This work was supported by the National Key Research and Development Program of China (grant 2022YFF0711500) and by the National Natural Science Foundation of China (grants 11803022 and 12273077).

Author information

Corresponding author

Correspondence to Qing Zhao.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhao, Q., Sun, L., Zhang, M. et al. Storage optimisation and distributed architecture for time series reconstruction of massive astronomical catalogues. Exp Astron 56, 821–845 (2023). https://doi.org/10.1007/s10686-023-09913-9

  • DOI: https://doi.org/10.1007/s10686-023-09913-9
