Science China Information Sciences, Volume 56, Issue 5, pp 1–11

AIP: a tool for flexible and transparent data management

  • GuangYan Zhang
  • JianPing Qiu
  • JiWu Shu
  • WeiMin Zheng
Research Paper


Existing data management tools have limitations, such as being restricted to specific file systems or lacking transparency to applications. In this paper, we present a new data management tool called AIP. Because AIP is implemented via the standard data management API (DMAPI), it supports multiple file systems and makes data management operations transparent to applications. First, AIP provides centralized, policy-based data management for controlling the placement of files in different storage tiers. Second, AIP uses differentiated collection of file states, together with a caching mechanism for those states, to improve the execution efficiency of data management policies. Third, AIP provides a resource arbitration mechanism for controlling the rate at which data management operations are initiated. Results from representative experiments demonstrate that AIP delivers high performance, introduces low management overhead, and scales well.
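
As a rough illustration of the first and third points, the sketch below shows what policy-based tier placement with a cap on initiated operations could look like. It is not AIP's implementation: the abstract does not describe the policy language or the arbitration algorithm, so `FileState`, `Rule`, `plan_migrations`, the tier names, and the `max_ops` cap are all hypothetical names and simplifications chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class FileState:
    """Hypothetical per-file attributes that a policy might inspect."""
    path: str
    size_bytes: int
    days_since_access: int
    tier: str  # current storage tier, e.g. "fast" or "slow" (assumed names)


@dataclass(frozen=True)
class Rule:
    """One policy rule: if `matches` holds, the file belongs in `target_tier`."""
    name: str
    matches: Callable[[FileState], bool]
    target_tier: str


def plan_migrations(
    files: List[FileState], rules: List[Rule], max_ops: int
) -> List[Tuple[str, str, str]]:
    """Return at most `max_ops` moves as (path, from_tier, to_tier).

    The first matching rule wins, and files already in their target tier
    are skipped. `max_ops` is a crude stand-in for rate control, not the
    paper's resource arbitration mechanism.
    """
    plan: List[Tuple[str, str, str]] = []
    for f in files:
        if len(plan) >= max_ops:
            break
        for rule in rules:
            if rule.matches(f):
                if f.tier != rule.target_tier:
                    plan.append((f.path, f.tier, rule.target_tier))
                break
    return plan


# Assumed example policy: demote files idle for over 90 days,
# promote files touched within the last week.
RULES = [
    Rule("demote-cold", lambda f: f.days_since_access > 90, "slow"),
    Rule("promote-hot", lambda f: f.days_since_access <= 7, "fast"),
]

FILES = [
    FileState("/data/a.log", 4096, 200, "fast"),
    FileState("/data/b.db", 1 << 20, 1, "slow"),
    FileState("/data/c.txt", 512, 30, "fast"),
    FileState("/data/d.bin", 2048, 365, "fast"),
]

PLAN = plan_migrations(FILES, RULES, max_ops=2)
for move in PLAN:
    print(move)
```

With `max_ops=2`, only the first two eligible files are scheduled; raising the cap lets the remaining cold file (`/data/d.bin`) be planned as well. Keeping the policy rules separate from the planner mirrors the centralized, policy-driven control the abstract describes, though the real tool acts through DMAPI rather than on in-memory records.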


data management, DMAPI, management policy, differentiated collection, resource arbitration





Copyright information

© Science China Press and Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • GuangYan Zhang (1, 2)
  • JianPing Qiu (1)
  • JiWu Shu (1, 2)
  • WeiMin Zheng (1, 2)
  1. Department of Computer Science and Technology, Tsinghua University, Beijing, China
  2. Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China
