Efficient Operational Management of Enterprise File Server with File Size Distribution Model

  • Toshiko Matsumoto
  • Takashi Onoyama
  • Norihisa Komoda
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 247)


Toward efficient operational management of enterprise file servers, we propose a method for estimating the relationship between the number of files and their cumulative size when files are taken in descending order of size. The method is based on a model of the file size distribution, which we build as a weighted sum of multiple log-normal distributions based on Akaike's information criterion (AIC). File size data from technical and non-technical divisions of a company show that the model fits the observed distribution well and that the estimated relationship can be used for cost-effective operational management of file servers.
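The two quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names and sample file sizes are hypothetical, and it fits only a single log-normal component by maximum likelihood, whereas the chapter's model is a weighted sum of several components. How AIC enters the chapter's procedure is not detailed here, so the AIC function simply shows the standard definition, 2k minus twice the log-likelihood.

```python
import math


def cumulative_size_curve(sizes):
    """Cumulative byte totals when files are taken largest first.

    Entry k-1 is the total size of the k largest files.
    """
    totals = []
    running = 0
    for size in sorted(sizes, reverse=True):
        running += size
        totals.append(running)
    return totals


def lognormal_aic(sizes):
    """AIC of one log-normal distribution fitted by maximum likelihood.

    Assumes all sizes are positive and not all identical.
    The model has k = 2 free parameters (mu and sigma^2 of ln(size)).
    """
    logs = [math.log(s) for s in sizes]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((x - mu) ** 2 for x in logs) / n  # MLE variance (divides by n)
    # Log-likelihood at the MLE: normal term on ln(size) minus the Jacobian sum.
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1) - sum(logs)
    return 2 * 2 - 2 * loglik


# Hypothetical file sizes in bytes, for illustration only.
sizes = [4096, 1_000_000, 250_000, 12_000, 80_000_000, 3_500]
curve = cumulative_size_curve(sizes)
share_top2 = curve[1] / curve[-1]  # fraction of bytes held by the 2 largest files
aic_single = lognormal_aic(sizes)
```

In this made-up sample the two largest files hold almost all of the bytes, which is the kind of skew that makes the descending-order curve relevant to tiered storage decisions. Comparing `lognormal_aic` against the AIC of richer candidate models would be one way to choose among them, under the assumptions stated above.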


Akaike’s information criterion; Enterprise file server; File size; Log-normal distribution; Operational management; Tiered storage



Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  • Toshiko Matsumoto (1)
  • Takashi Onoyama (1)
  • Norihisa Komoda (2)

  1. Hitachi Solutions, Ltd., Tokyo, Japan
  2. Osaka University, Osaka, Japan
