Architecting Dependable Systems VII

Volume 6420 of the series Lecture Notes in Computer Science, pp. 201-226

ASDF: An Automated, Online Framework for Diagnosing Performance Problems

  • Keith Bare, Carnegie Mellon University
  • Soila P. Kavulya, Carnegie Mellon University
  • Jiaqi Tan, DSO National Laboratories
  • Xinghao Pan, DSO National Laboratories
  • Eugene Marinelli, Carnegie Mellon University
  • Michael Kasick, Carnegie Mellon University
  • Rajeev Gandhi, Carnegie Mellon University
  • Priya Narasimhan, Carnegie Mellon University

Performance problems account for a significant percentage of documented failures in large-scale distributed systems, such as Hadoop. Localizing the source of these performance problems can be frustrating due to the overwhelming amount of monitoring information available. We automate problem localization using ASDF, an online diagnostic framework that transparently monitors and analyzes different time-varying data sources (e.g., OS performance counters, Hadoop logs) and narrows down performance problems to a specific node or a set of nodes. ASDF’s flexible architecture allows system administrators to easily customize data sources and analysis modules for their unique operating environments. We demonstrate the effectiveness of ASDF’s diagnostics on documented performance problems in Hadoop; our results indicate that ASDF incurs an average monitoring overhead of 0.38% of CPU time and achieves a balanced accuracy of 80% at localizing problems to the culprit node.
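For readers unfamiliar with the metric, balanced accuracy is commonly defined as the mean of the true-positive rate and the true-negative rate. The abstract does not spell out the exact formula used, so the sketch below assumes that common definition. The function name and the node counts are hypothetical and are not taken from the paper's experiments.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of true-positive rate and true-negative rate.

    Assumes the common definition of balanced accuracy; the abstract
    does not state which formula the authors used.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one faulty and one fault-free sample")
    tpr = tp / (tp + fn)  # fraction of faulty nodes correctly flagged
    tnr = tn / (tn + fp)  # fraction of healthy nodes correctly left unflagged
    return (tpr + tnr) / 2


# Hypothetical counts (not from the paper): 2 faulty nodes, 16 healthy ones.
tp, fn, tn, fp = 1, 1, 15, 1
plain = (tp + tn) / (tp + fn + tn + fp)
balanced = balanced_accuracy(tp, fn, tn, fp)
print(f"plain accuracy:    {plain:.3f}")
print(f"balanced accuracy: {balanced:.3f}")
```

Because faulty nodes are usually a small minority in a cluster, plain accuracy can look high even when half the culprits are missed. Balanced accuracy weights the two classes equally, which is one reason it suits problem localization.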