Abstract
Many important astronomical results are built on historical observation data, so long-term astronomical observation archives are of great significance for astronomical research. At the observation site, data from different sky areas captured in a consecutive time period are stored on one disk, so the original data layout is temporally aggregated and spatially scattered. After an observation cycle, the data are backed up into a long-term astronomical observation archive, from which astronomers request data. However, the original data layout does not match the spatial locality of requests, i.e., a request focuses on a specific sky area during a time period. In this situation, an archive that adopts the original data layout consumes a lot of energy and shortens disk life. A reorganized, spatially aggregated data layout is therefore indispensable for the archive. The challenge is to aggregate observation data from nearby sky areas onto one disk while keeping disk capacity utilization high. In this paper, we propose GpDL, a spatially aggregated data layout based on HEALPix and graph partitioning for long-term astronomical observation archives. GpDL is generated from the original data layout, whose distribution is known, before the observation data are backed up into the archive. GpDL saves a lot of resources for the archive while keeping up to 91% disk capacity utilization. In simulation experiments, compared with TaDL (the original temporally aggregated data layout) and AmrDL (another spatially aggregated data layout, based on the idea of Adaptive Mesh Refinement), GpDL effectively reduces the number of open disks and the energy cost for the same requests.
Change history
24 April 2019
The original version of chapter 18 starting on p. 238 was revised. The name of the second author has been deleted. Instead of Wenwen Liu, Rebecca J. Stones, Gang Wang, and Xiaoguang Liu it should be read as Wenwen Liu, Gang Wang, and Xiaoguang Liu. The original Chapter was corrected. The original version of chapter 40 starting on p. 524 was revised. The grant numbers of the Joint Research Fund in Astronomy were incorrect in the acknowledgement on p. 536. The original chapter was corrected.
Notes
- 1.
HEALPix is an acronym for Hierarchical Equal Area isoLatitude Pixelation of a sphere. HEALPix divides the surface of a sphere into many blocks of equal surface area, and each block has a unique BlockID. The sphere is divided hierarchically into curvilinear quadrangles [3]. The resolution increases when each block is divided into four smaller equal-area blocks. Different resolutions correspond to different NSIDE values (see Fig. 2). The lowest resolution is \(NSIDE=1\). When \(NSIDE=1,2,4,8\) (clockwise from upper-left to bottom-left in the figure), the sphere surface is divided into 12, 48, 192, and 768 blocks, respectively.
- 2.
See Fig. 5. Take block 102 as an example. The distance from 102 to each of its neighboring blocks is set to 1, e.g., the distances of (102, 99), (102, 97), ... are set to 1. The distance from 102 to the neighbors of its neighbors is set to 2, e.g., the distances of (102, 32), (102, 34), ... are set to 2. The distance from 102 to the neighbors of those blocks is set to 3, and so on.
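The two notes above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `healpix_block_count` and `hop_distances` are hypothetical, and the adjacency table is a toy example that only reuses the block IDs mentioned in Note 2; the real neighbor relations come from HEALPix and Fig. 5. The sketch assumes that NSIDE is a power of two, which follows from each block being split into four, and that the distance in Note 2 is the number of neighbor hops, computed here with breadth-first search.

```python
from collections import deque


def healpix_block_count(nside: int) -> int:
    """Number of equal-area blocks at a given NSIDE (12 * NSIDE^2)."""
    # Assumption: NSIDE is a positive power of two, since each block
    # is split into four when the resolution increases.
    if nside < 1 or nside & (nside - 1):
        raise ValueError("NSIDE must be a positive power of two")
    return 12 * nside * nside


def hop_distances(neighbors, start):
    """Breadth-first search: distance = number of neighbor hops from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        block = queue.popleft()
        for nxt in neighbors.get(block, ()):
            if nxt not in dist:
                dist[nxt] = dist[block] + 1
                queue.append(nxt)
    return dist


# Toy adjacency (hypothetical): NOT the real HEALPix neighbor table.
# Only the block IDs are borrowed from Note 2 for illustration.
toy_neighbors = {
    102: [99, 97],
    99: [102, 32],
    97: [102, 34],
    32: [99],
    34: [97],
}

print([healpix_block_count(n) for n in (1, 2, 4, 8)])  # [12, 48, 192, 768]
print(hop_distances(toy_neighbors, 102))
```

With this toy table, blocks 99 and 97 are at distance 1 from block 102, and blocks 32 and 34 are at distance 2, which mirrors the pattern described in Note 2.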
References
Cui, X., Yuan, X., Gong, X.: Antarctic Schmidt telescopes (AST3) for Dome A. In: Ground-Based and Airborne Telescopes II, vol. 7012, p. 70122D. International Society for Optics and Photonics (2008)
Gong, Z., et al.: Multi-level layout optimization for efficient spatio-temporal queries on ISABELA-compressed data. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS), pp. 873–884. IEEE (2012)
Gorski, K.M., et al.: HEALPix: a framework for high-resolution discretization and fast analysis of data distributed on the sphere. Astrophys. J. 622(2), 759 (2005)
Graham, M.J., Djorgovski, S.G., Mahabal, A., Donalek, C., Drake, A., Longo, G.: Data challenges of time domain astronomy. Distrib. Parallel Databases 30(5–6), 371–384 (2012)
He, Y.Q., Sun, S.X.: A data layout and access control strategies of the video storage server based disk array. In: 2008 International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIHMSP 2008, pp. 433–437. IEEE (2008)
Hong, Z., et al.: AQUAdex: a highly efficient indexing and retrieving method for astronomical big data of time series images. In: Wang, G., Zomaya, A., Perez, G.M., Li, K. (eds.) ICA3PP 2015. LNCS, vol. 9529, pp. 92–105. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-27122-4_7
Hoque, I., Gupta, I.: Disk layout techniques for online social network data. IEEE Internet Comput. 16(3), 24–36 (2012)
Huang, D., Zhang, X., Shi, W., Zheng, M., Jiang, S., Qin, F.: LiU: hiding disk access latency for HPC applications with a new SSD-enabled data layout. In: 2013 IEEE 21st International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 111–120. IEEE (2013)
Huang, H., Hung, W., Shin, K.G.: FS2: dynamic data replication in free disk space for improving disk performance and energy consumption. In: ACM SIGOPS Operating Systems Review, vol. 39, pp. 263–276. ACM (2005)
Karypis, G., Kumar, V.: Multilevel k-way partitioning scheme for irregular graphs. J. Parallel Distrib. Comput. 48(1), 96–129 (1998)
Nan, R.: Five hundred meter aperture spherical radio telescope (FAST). Sci. China Ser. G 49(2), 129–148 (2006)
Rubin, S., Bodík, R., Chilimbi, T.: An efficient profile-analysis framework for data-layout optimizations. In: ACM SIGPLAN Notices, vol. 37, pp. 140–153. ACM (2002)
Son, S.W., Chen, G., Kandemir, M.: Disk layout optimization for reducing energy consumption. In: Proceedings of the 19th Annual International Conference on Supercomputing, pp. 274–283. ACM (2005)
Szalay, A.S., Kunszt, P.Z., Thakar, A., Gray, J., Slutz, D., Brunner, R.J.: Designing and mining multi-terabyte astronomy archives: the Sloan digital sky survey. ACM SIGMOD Rec. 29(2), 451–462 (2000)
Vogt, S.S., et al.: APF-the Lick Observatory automated planet finder. Publ. Astron. Soc. Pac. 126(938), 359 (2014)
Xiao, L., Yu-An, T.: TPL: a data layout method for reducing rotational latency of modern hard disk drive. In: 2009 WRI World Congress on Computer Science and Information Engineering, vol. 7, pp. 336–340. IEEE (2009)
Yan, J., et al.: Optimized data layout for spatio-temporal data in time domain astronomy. In: Ibrahim, S., Choo, K.-K.R., Yan, Z., Pedrycz, W. (eds.) ICA3PP 2017. LNCS, vol. 10393, pp. 431–440. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65482-9_30
Acknowledgments
This work is supported by the Joint Research Fund in Astronomy (U1531111, U1731243, U1731125) under a cooperative agreement between the National Natural Science Foundation of China (NSFC) and the Chinese Academy of Sciences (CAS), and by the National Natural Science Foundation of China (11573019, 61602336).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, Z. et al. (2018). GpDL: A Spatially Aggregated Data Layout for Long-Term Astronomical Observation Archive. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol 11335. Springer, Cham. https://doi.org/10.1007/978-3-030-05054-2_40
DOI: https://doi.org/10.1007/978-3-030-05054-2_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05053-5
Online ISBN: 978-3-030-05054-2
eBook Packages: Computer Science, Computer Science (R0)