Abstract
Storage systems are becoming increasingly complex in order to meet HPC and Big Data requirements. This complexity calls for in-depth evaluations to ensure that no issues exist in any layer of the system. However, current performance evaluations focus on high-level metrics for the sake of simplicity. This makes it impossible to catch potential I/O issues in the lower layers of the Linux I/O stack. In this paper, we introduce IOscope, a tracer for uncovering the I/O patterns of storage systems' workloads. It performs filtering-based profiling over fine-grained criteria inside the Linux kernel. Because it relies on the extended Berkeley Packet Filter (eBPF) technology, IOscope has near-zero overhead and its behaviour inside the kernel is verified. We demonstrate IOscope's ability to discover pattern-related issues through a performance study of MongoDB and Cassandra. Results show that clustered MongoDB suffers from a noisy I/O pattern regardless of the storage medium used (HDDs or SSDs). Hence, IOscope enables a better troubleshooting process and contributes to an in-depth understanding of I/O performance.
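To make the idea of filtering-based I/O pattern characterization more concrete, the following is a minimal Python sketch under stated assumptions. IOscope itself filters events inside the kernel with eBPF; this sketch instead assumes an already-collected user-space trace of file accesses. The record layout (`IOEvent`), the helpers `filter_events`, `sequential_ratio` and `classify`, and the 0.8 threshold are all hypothetical illustrations, not IOscope's implementation. The sketch keeps only the accesses matching fine-grained criteria (process and file), then measures how many consecutive accesses are contiguous.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class IOEvent:
    """One traced file access (hypothetical record layout, not IOscope's)."""
    pid: int      # ID of the process that issued the access
    inode: int    # identifier of the accessed file
    offset: int   # starting byte offset of the access
    size: int     # number of bytes accessed


def filter_events(events: Iterable[IOEvent],
                  pid: Optional[int] = None,
                  inode: Optional[int] = None) -> List[IOEvent]:
    """Keep only events matching every given criterion (None disables it)."""
    return [e for e in events
            if (pid is None or e.pid == pid)
            and (inode is None or e.inode == inode)]


def sequential_ratio(events: List[IOEvent]) -> float:
    """Fraction of consecutive accesses that start where the previous one ended."""
    if len(events) < 2:
        return 1.0  # zero or one access: treated as trivially sequential
    contiguous = sum(1 for prev, cur in zip(events, events[1:])
                     if cur.offset == prev.offset + prev.size)
    return contiguous / (len(events) - 1)


def classify(events: List[IOEvent], threshold: float = 0.8) -> str:
    """Label an access stream; the 0.8 threshold is an arbitrary assumption."""
    return "sequential" if sequential_ratio(events) >= threshold else "random"


if __name__ == "__main__":
    # Two processes whose accesses to different files are interleaved in time.
    trace = [
        IOEvent(pid=1, inode=10, offset=0, size=4096),
        IOEvent(pid=2, inode=20, offset=81920, size=4096),
        IOEvent(pid=1, inode=10, offset=4096, size=4096),
        IOEvent(pid=2, inode=20, offset=0, size=4096),
        IOEvent(pid=1, inode=10, offset=8192, size=4096),
        IOEvent(pid=2, inode=20, offset=40960, size=4096),
    ]
    print(classify(filter_events(trace, pid=1, inode=10)))  # prints: sequential
    print(classify(filter_events(trace, pid=2, inode=20)))  # prints: random
```

Classifying the unfiltered `trace` returns `"random"`, because the two interleaved streams break each other's contiguity. Filtering by fine-grained criteria first is what recovers the sequential stream of process 1.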
Notes
- 1.
- 2. A major version of MongoDB (v3.6) was released while this paper was being written. It suffers from the same performance issue discussed in Sect. 3.2, despite its optimized throughput.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Saif, A., Nussbaum, L., Song, YQ. (2018). IOscope: A Flexible I/O Tracer for Workloads' I/O Pattern Characterization. In: Yokota, R., Weiland, M., Shalf, J., Alam, S. (eds) High Performance Computing. ISC High Performance 2018. Lecture Notes in Computer Science, vol 11203. Springer, Cham. https://doi.org/10.1007/978-3-030-02465-9_7
DOI: https://doi.org/10.1007/978-3-030-02465-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02464-2
Online ISBN: 978-3-030-02465-9
eBook Packages: Computer Science, Computer Science (R0)