A Novel String Representation and Kernel Function for the Comparison of I/O Access Patterns

  • Raul Torres
  • Julian Kunkel
  • Manuel F. Dolz
  • Thomas Ludwig
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10421)


Parallel I/O access patterns act as fingerprints of a parallel program. To extract meaningful information from these patterns, they have to be represented appropriately. Because strings can be compared easily with kernel methods, this paper proposes a conversion to a weighted string representation, together with a novel string kernel function called the Kast Spectrum Kernel. The similarity matrices obtained by applying this kernel to a set of examples from a real application were analyzed using Kernel Principal Component Analysis (Kernel PCA) and Hierarchical Clustering. The evaluation showed that 2 out of 4 I/O access pattern groups were completely identified, while the other 2 formed a single cluster due to the intrinsic similarity of their members. The proposed strategy is a promising candidate for other similarity problems involving tree-like structured data.
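To make the idea of comparing strings through a kernel concrete, here is a minimal sketch of a plain k-spectrum kernel, which counts shared substrings of length k. It is not the Kast Spectrum Kernel, whose definition the abstract does not give, and it ignores the weights of the weighted string representation. The one-letter operation encoding and all function names are assumptions made only for illustration.

```python
from collections import Counter
from math import sqrt


def spectrum(s: str, k: int) -> Counter:
    """Count every contiguous substring of length k in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(a: str, b: str, k: int) -> int:
    """Inner product of the k-mer count vectors of a and b."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    return sum(count * sb[kmer] for kmer, count in sa.items())


def normalized_kernel(a: str, b: str, k: int) -> float:
    """Cosine-normalised kernel value; 0.0 if either string has no k-mers."""
    denom = sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    return spectrum_kernel(a, b, k) / denom if denom else 0.0


def similarity_matrix(strings, k):
    """Pairwise normalised kernel values as a list of rows."""
    return [[normalized_kernel(a, b, k) for b in strings] for a in strings]


# Hypothetical encoding (an assumption, not the paper's representation):
# o = open, r = read, w = write, c = close.
patterns = ["orrrrc", "orrrc", "owwwwc", "owwwc"]
matrix = similarity_matrix(patterns, k=2)
```

In this toy input the two read-dominated strings score close to 1 against each other and 0 against the write-dominated ones, because they share no substrings of length 2. A matrix of this kind is the sort of input that Kernel PCA or hierarchical clustering would consume.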


Keywords: Kernel functions, Kast spectrum kernel, I/O access pattern comparison, Kernel PCA



Raul Torres would like to acknowledge the financial support from the Colombian Administrative Department of Science, Technology and Innovation (Colciencias) as well as the mathematical advice received from Ruslan Krenzler.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Scientific Computing Research Group, Universität Hamburg, Hamburg, Germany
