Abstract: Performance logs contain rich information about a system's state. Large-scale web service infrastructures deployed in the cloud are notoriously difficult to troubleshoot, and performance bugs are especially hard to track down. Detecting, isolating, and diagnosing fine-grained performance anomalies requires integrating system performance measures across space and time. To do this at scale, we present our megatables approach, which automatically interprets performance log data and outputs millibottleneck predictions along with supporting visualizations. We evaluate our method on three illustrative scenarios and assess its predictive ability. We also evaluate its ability to extract meaningful information from many log samples drawn from the wild.
Authors: Joshua Kimball, Rodrigo Alves Lima and Calton Pu (Georgia Institute of Technology, USA)
Email: jmkimball@gatech.edu, ral@gatech.edu, calton@cc.gatech.edu