The Journal of Supercomputing, Volume 69, Issue 1, pp 139–160

Characterizing and modeling cloud applications/jobs on a Google data center

Article

DOI: 10.1007/s11227-014-1131-z

Cite this article as:
Di, S., Kondo, D. & Cappello, F. J Supercomput (2014) 69: 139. doi:10.1007/s11227-014-1131-z

Abstract

In this paper, we characterize and model Google applications and jobs, based on a 1-month trace from a large-scale Google data center. We make four contributions: (1) we compute statistics on task events and resource utilization for Google applications, across various resource types and execution types; (2) we classify applications via a K-means clustering algorithm with an optimized number of sets, based on task events and resource usage; (3) we study the correlation between Google application properties and running features (e.g., job priority and scheduling class); (4) finally, we build a model that can simulate Google jobs/tasks and dynamic events in accordance with the Google trace. Experiments show that tasks simulated with our model exhibit features closely matching those in the Google trace. More than 95 % of tasks have simulation errors below 20 %, confirming the high accuracy of our simulation model.
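
The accuracy figure above can be read as the share of tasks whose simulated value falls within 20 % of the traced value. The following is a minimal Python sketch of that calculation, assuming the simulation error is a relative error, |simulated - observed| / |observed|; the abstract does not define the metric. The function name `fraction_within_error` and the sample values are hypothetical and are not taken from the trace.

```python
def fraction_within_error(observed, simulated, threshold=0.20):
    """Return the share of tasks whose relative simulation error is below threshold.

    Assumption: error = |simulated - observed| / |observed| (not defined in the abstract).
    """
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same length")
    if not observed:
        raise ValueError("at least one task is required")

    within = 0
    for obs, sim in zip(observed, simulated):
        if obs == 0:
            # Relative error is undefined for a zero baseline;
            # count the task only if the simulated value is also zero.
            if sim == 0:
                within += 1
            continue
        rel_err = abs(sim - obs) / abs(obs)
        if rel_err < threshold:
            within += 1
    return within / len(observed)


# Hypothetical task lengths in seconds (illustrative values only).
observed = [100.0, 250.0, 40.0, 900.0]
simulated = [110.0, 240.0, 55.0, 905.0]
share = fraction_within_error(observed, simulated)
```

Under this reading, the reported result corresponds to `fraction_within_error(...)` returning a value above 0.95 for the quantity being simulated.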

Keywords

Google data center, Cloud task, Characterization and analysis, Large-scale system trace

Copyright information

© Argonne National Laboratory; DE-AC02-06CH11357  2014

Authors and Affiliations

  1. INRIA, Paris, France
  2. Argonne National Laboratory, Lemont, USA