GpDL: A Spatially Aggregated Data Layout for Long-Term Astronomical Observation Archive

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11335)

  • The original version of this chapter was revised: The grant numbers of the Joint Research Fund in Astronomy were incorrect in the acknowledgement on p. 536. The correction to this chapter is available at https://doi.org/10.1007/978-3-030-05054-2_49

Abstract

Many important astronomical results are built on historical observation data, so long-term astronomical observation archives are of great significance for astronomical research. At the observation site, data from different sky areas captured during a consecutive time period are stored on one disk, so the original data layout is temporally aggregated and spatially scattered. After an observation cycle, the data are backed up into the long-term archive, from which astronomers request data. However, the original layout does not match the spatial locality of these requests: a typical request targets a specific sky area over a time period. As a result, an archive that keeps the original layout consumes a lot of energy and shortens disk life, so a reorganized, spatially aggregated data layout is indispensable. The challenge is to aggregate observation data from nearby sky areas onto one disk while keeping disk capacity utilization high. In this paper, we propose GpDL, a spatially aggregated data layout for long-term astronomical observation archives based on HEALPix and graph partitioning. GpDL is generated from the known distribution of the original data layout before the observation data are backed up into the archive. GpDL saves substantial archive resources while keeping disk capacity utilization as high as 91%. In simulation experiments, compared with TaDL (the original temporally aggregated data layout) and AmrDL (another spatially aggregated data layout based on the idea of Adaptive Mesh Refinement), GpDL effectively reduces the number of open disks and the energy cost for the same requests.
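
To make the spatial-locality argument concrete, the toy simulation below counts how many disks must be opened to serve a request for one sky block over several nights under two layouts. Everything in it is a hypothetical assumption: the archive sizes (`NIGHTS`, `BLOCKS`, `FILES_PER_DISK`), uniform file sizes, and the function names. The spatial layout simply sorts files by block ID before filling disks. This is a simplified stand-in that keeps each block's observations together, not GpDL's HEALPix-plus-graph-partition method, and it does not by itself place nearby sky areas on the same disk.

```python
# Hypothetical toy archive: each file is (sky_block, night), all the same size.
NIGHTS = 8
BLOCKS = 6
FILES_PER_DISK = 6  # hypothetical disk capacity, measured in files


def fill_disks(ordered_files):
    """Assign files to disks sequentially, filling each disk before the next."""
    return {f: i // FILES_PER_DISK for i, f in enumerate(ordered_files)}


def temporal_layout(files):
    """TaDL-like: files are written in observation order (night first)."""
    return fill_disks(sorted(files, key=lambda f: (f[1], f[0])))


def spatial_layout(files):
    """Simplified spatial layout: files of the same block are stored contiguously."""
    return fill_disks(sorted(files, key=lambda f: (f[0], f[1])))


def open_disks(layout, block, first_night, last_night):
    """Number of distinct disks that must be spun up to serve the request."""
    return len({disk for (b, n), disk in layout.items()
                if b == block and first_night <= n <= last_night})


files = [(b, n) for n in range(NIGHTS) for b in range(BLOCKS)]
print(open_disks(temporal_layout(files), block=2, first_night=0, last_night=5))  # 6
print(open_disks(spatial_layout(files), block=2, first_night=0, last_night=5))   # 2
```

With these toy parameters both layouts fill every disk completely, so the difference in open disks comes only from how a single sky area's files are spread across disks.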

Change history

  • 24 April 2019

    The original version of chapter 18, starting on p. 238, was revised: the name of the second author was deleted, so the author list should read Wenwen Liu, Gang Wang, and Xiaoguang Liu instead of Wenwen Liu, Rebecca J. Stones, Gang Wang, and Xiaoguang Liu. The original version of chapter 40, starting on p. 524, was revised because the grant numbers of the Joint Research Fund in Astronomy in the acknowledgement on p. 536 were incorrect. Both chapters have been corrected.

Notes

  1.

    HEALPix is an acronym for Hierarchical Equal Area isoLatitude Pixelation of a sphere. HEALPix divides the surface of a sphere into blocks of equal area, each with a unique BlockID. The sphere is divided hierarchically into curvilinear quadrilaterals [3], and resolution increases by dividing each block into four smaller equal-area blocks. Each resolution corresponds to an NSIDE value, and the lowest resolution is \(NSIDE=1\). Fig. 2 shows, clockwise from upper left to lower left, \(NSIDE=1,2,4,8\), which divide the sphere into 12, 48, 192, and 768 blocks, respectively; in general there are \(12 \times NSIDE^2\) blocks.
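
    As a rough illustration of the counts above, the sketch below computes the number of blocks at a given NSIDE and lists the four sub-blocks of a block one level down. The child-ID rule (4p to 4p+3) assumes HEALPix NESTED ordering, where NSIDE is a power of two; whether GpDL's BlockIDs use NESTED or RING ordering is an assumption here, not something this note states. The function names `npix` and `children_nested` are hypothetical; the healpy library's `nside2npix` computes the same block count.

```python
def npix(nside: int) -> int:
    """Number of equal-area HEALPix blocks at resolution NSIDE: 12 * NSIDE**2."""
    if nside < 1 or nside & (nside - 1):
        raise ValueError("NSIDE must be a positive power of two in the hierarchical scheme")
    return 12 * nside * nside


def children_nested(block_id: int) -> list:
    """The four sub-blocks of block_id at the next resolution (assumes NESTED ordering)."""
    return [4 * block_id + k for k in range(4)]


print([npix(n) for n in (1, 2, 4, 8)])  # [12, 48, 192, 768]
print(children_nested(5))               # [20, 21, 22, 23]
```

    Because each block splits into exactly four children, the last block at NSIDE maps to the last block at 2·NSIDE, so the hierarchy covers every block at every level.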

  2.

    See Fig. 5, taking block 102 as an example. The distance from block 102 to each of its neighboring blocks is set to 1, e.g., the distances of (102, 99), (102, 97), ... are 1. The distance from block 102 to the neighbors of its neighbors is set to 2, e.g., the distances of (102, 32), (102, 34), ... are 2. The distance to the neighbors of neighbors of neighbors is set to 3, and so on.
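
    The hop distance described above is what a breadth-first search over the block adjacency graph computes. In this minimal sketch the adjacency dictionary is hypothetical: its edges are invented only to reproduce the distances quoted for block 102 and are not the real HEALPix neighbors shown in Fig. 5. The function name `hop_distances` is likewise illustrative.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Breadth-first search: distance = number of neighbor hops from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        block = queue.popleft()
        for nb in adjacency.get(block, ()):
            if nb not in dist:  # first visit is always along a shortest path
                dist[nb] = dist[block] + 1
                queue.append(nb)
    return dist


# Hypothetical adjacency, NOT the real HEALPix neighbors of Fig. 5.
adjacency = {
    102: [99, 97],
    99: [102, 32],
    97: [102, 34],
    32: [99],
    34: [97],
}
print(hop_distances(adjacency, 102))  # {102: 0, 99: 1, 97: 1, 32: 2, 34: 2}
```

    BFS visits blocks in rings of increasing distance, which matches the "neighbors, then neighbors of neighbors" definition without enumerating each ring by hand.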

References

  1. Cui, X., Yuan, X., Gong, X.: Antarctic Schmidt telescopes (AST3) for Dome A. In: Ground-Based and Airborne Telescopes II, vol. 7012, p. 70122D. International Society for Optics and Photonics (2008)

  2. Gong, Z., et al.: Multi-level layout optimization for efficient spatio-temporal queries on ISABELA-compressed data. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS), pp. 873–884. IEEE (2012)

  3. Gorski, K.M., et al.: HEALPix: a framework for high-resolution discretization and fast analysis of data distributed on the sphere. Astrophys. J. 622(2), 759 (2005)

  4. Graham, M.J., Djorgovski, S.G., Mahabal, A., Donalek, C., Drake, A., Longo, G.: Data challenges of time domain astronomy. Distrib. Parallel Databases 30(5–6), 371–384 (2012)

  5. He, Y.Q., Sun, S.X.: A data layout and access control strategies of the video storage server based disk array. In: 2008 International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIHMSP 2008, pp. 433–437. IEEE (2008)

  6. Hong, Z., et al.: AQUAdex: a highly efficient indexing and retrieving method for astronomical big data of time series images. In: Wang, G., Zomaya, A., Perez, G.M., Li, K. (eds.) ICA3PP 2015. LNCS, vol. 9529, pp. 92–105. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-27122-4_7

  7. Hoque, I., Gupta, I.: Disk layout techniques for online social network data. IEEE Internet Comput. 16(3), 24–36 (2012)

  8. Huang, D., Zhang, X., Shi, W., Zheng, M., Jiang, S., Qin, F.: LiU: hiding disk access latency for HPC applications with a new SSD-enabled data layout. In: 2013 IEEE 21st International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 111–120. IEEE (2013)

  9. Huang, H., Hung, W., Shin, K.G.: FS2: dynamic data replication in free disk space for improving disk performance and energy consumption. In: ACM SIGOPS Operating Systems Review, vol. 39, pp. 263–276. ACM (2005)

  10. Karypis, G., Kumar, V.: Multilevel k-way partitioning scheme for irregular graphs. J. Parallel Distrib. Comput. 48(1), 96–129 (1998)

  11. Nan, R.: Five hundred meter aperture spherical radio telescope (FAST). Sci. China Ser. G 49(2), 129–148 (2006)

  12. Rubin, S., Bodík, R., Chilimbi, T.: An efficient profile-analysis framework for data-layout optimizations. In: ACM SIGPLAN Notices, vol. 37, pp. 140–153. ACM (2002)

  13. Son, S.W., Chen, G., Kandemir, M.: Disk layout optimization for reducing energy consumption. In: Proceedings of the 19th Annual International Conference on Supercomputing, pp. 274–283. ACM (2005)

  14. Szalay, A.S., Kunszt, P.Z., Thakar, A., Gray, J., Slutz, D., Brunner, R.J.: Designing and mining multi-terabyte astronomy archives: the Sloan Digital Sky Survey. ACM SIGMOD Rec. 29(2), 451–462 (2000)

  15. Vogt, S.S., et al.: APF - the Lick Observatory Automated Planet Finder. Publ. Astron. Soc. Pac. 126(938), 359 (2014)

  16. Xiao, L., Yu-An, T.: TPL: a data layout method for reducing rotational latency of modern hard disk drive. In: 2009 WRI World Congress on Computer Science and Information Engineering, vol. 7, pp. 336–340. IEEE (2009)

  17. Yan, J., et al.: Optimized data layout for spatio-temporal data in time domain astronomy. In: Ibrahim, S., Choo, K.-K.R., Yan, Z., Pedrycz, W. (eds.) ICA3PP 2017. LNCS, vol. 10393, pp. 431–440. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65482-9_30

Acknowledgments

This work is supported by the Joint Research Fund in Astronomy (U1531111, U1731243, U1731125) under cooperative agreement between the National Natural Science Foundation of China (NSFC) and Chinese Academy of Sciences (CAS), and by the National Natural Science Foundation of China (11573019, 61602336).

Author information

Correspondence to Ce Yu or Shanjiang Tang.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, Z. et al. (2018). GpDL: A Spatially Aggregated Data Layout for Long-Term Astronomical Observation Archive. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol 11335. Springer, Cham. https://doi.org/10.1007/978-3-030-05054-2_40

  • DOI: https://doi.org/10.1007/978-3-030-05054-2_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05053-5

  • Online ISBN: 978-3-030-05054-2

  • eBook Packages: Computer Science, Computer Science (R0)
