A Novel String Representation and Kernel Function for the Comparison of I/O Access Patterns
Parallel I/O access patterns act as fingerprints of a parallel program. To extract meaningful information from these patterns, they have to be represented appropriately. Because strings can be compared easily with kernel methods, this paper proposes a conversion to a weighted string representation, together with a novel string kernel function called the Kast Spectrum Kernel. The similarity matrices obtained by applying this kernel to a set of examples from a real application were analyzed using Kernel Principal Component Analysis (Kernel PCA) and Hierarchical Clustering. The evaluation showed that 2 out of 4 I/O access pattern groups were completely identified, while the other 2 formed a single cluster because of the intrinsic similarity of their members. The proposed strategy is a promising candidate for other similarity problems involving tree-like structured data.
Keywords: Kernel functions; Kast spectrum kernel; I/O access pattern comparison; Kernel PCA
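The pipeline summarized in the abstract (strings, a string kernel, a similarity matrix, then clustering) can be illustrated with a minimal sketch. The abstract does not define the weighted string representation or the Kast Spectrum Kernel, so this sketch substitutes the plain k-spectrum kernel of Leslie et al., which simply counts shared substrings of length k. The toy strings, the function names, the cosine normalisation, and the naive average-linkage clustering are all assumptions made for illustration, and the Kernel PCA step is left out to keep the example dependency-free.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kmer_counts(s, k):
    """Count every contiguous substring of length k in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(a, b, k=2):
    """Plain k-spectrum kernel (NOT the Kast Spectrum Kernel):
    inner product of the two k-mer count vectors."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * cb[m] for m, n in ca.items())


def similarity_matrix(strings, k=2):
    """Cosine-normalised kernel matrix, so every diagonal entry is 1.
    Assumes each string has at least k characters (non-zero self-kernel)."""
    raw = [[spectrum_kernel(a, b, k) for b in strings] for a in strings]
    size = len(strings)
    return [[raw[i][j] / sqrt(raw[i][i] * raw[j][j]) for j in range(size)]
            for i in range(size)]


def cluster(sim, n_clusters):
    """Naive average-linkage agglomerative clustering on distance 1 - sim."""
    clusters = [[i] for i in range(len(sim))]

    def distance(c1, c2):
        total = sum(1.0 - sim[i][j] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters; x < y, so deleting y is safe.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: distance(clusters[p[0]], clusters[p[1]]))
        clusters[x] += clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical toy strings standing in for encoded access patterns.
patterns = ["rwrwrwrw", "rwrwrwrr", "ssssoooo", "sssooooo"]
sim = similarity_matrix(patterns, k=2)
groups = cluster(sim, n_clusters=2)
print(groups)
```

With these toy inputs, the two strings built from `r` and `w` share no 2-mers with the two built from `s` and `o`, so their similarity is zero and they end up in separate groups. A real evaluation would replace the toy strings with the paper's weighted representation and the plain kernel with the Kast Spectrum Kernel.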
Raul Torres would like to acknowledge the financial support from the Colombian Administrative Department of Science, Technology and Innovation (Colciencias), as well as the mathematical advice received from Ruslan Krenzler.