A Novel String Representation and Kernel Function for the Comparison of I/O Access Patterns

  • Conference paper
Parallel Computing Technologies (PaCT 2017)

Abstract

Parallel I/O access patterns act as fingerprints of a parallel program. To extract meaningful information from these patterns, they must be represented appropriately. Because string objects can be compared easily using kernel methods, this paper proposes a conversion to a weighted string representation, together with a novel string kernel function called the Kast Spectrum Kernel. The similarity matrices obtained by applying this kernel to a set of examples from a real application were analyzed using Kernel Principal Component Analysis (Kernel PCA) and hierarchical clustering. The evaluation showed that two of the four I/O access pattern groups were completely identified, while the other two formed a single cluster because of the intrinsic similarity of their members. The proposed strategy is a promising candidate for other similarity problems involving tree-like structured data.
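To make the idea of comparing strings with a kernel concrete, the following is a minimal sketch of the plain k-spectrum kernel (Leslie et al.), which counts shared substrings of length k. It is not the Kast Spectrum Kernel or the weighted string representation proposed in the paper, neither of which is specified in this abstract. The function names and the toy strings (with `w`, `r`, `s` standing for write, read and seek) are assumptions made purely for illustration.

```python
from collections import Counter
from math import sqrt


def spectrum(s, k):
    """Count every contiguous substring of length k in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Plain k-spectrum kernel: inner product of the k-mer count vectors."""
    ca, cb = spectrum(a, k), spectrum(b, k)
    # A Counter returns 0 for k-mers that do not occur in b.
    return sum(n * cb[kmer] for kmer, n in ca.items())


def similarity_matrix(strings, k=3):
    """Cosine-normalised kernel matrix; entries lie in [0, 1]."""
    raw = [[spectrum_kernel(a, b, k) for b in strings] for a in strings]
    n = len(strings)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            norm = sqrt(raw[i][i] * raw[j][j])
            sim[i][j] = raw[i][j] / norm if norm else 0.0
    return sim


if __name__ == "__main__":
    # Hypothetical toy encoding of access patterns, not the paper's format.
    patterns = ["wwwwsrrrr", "wwwsrrr", "srsrsrsr"]
    for row in similarity_matrix(patterns, k=2):
        print(["%.2f" % v for v in row])
```

A matrix of this kind is the sort of input that Kernel PCA or hierarchical clustering would then consume; the normalisation keeps long and short strings comparable, which is a design choice of this sketch rather than something stated in the abstract.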



Acknowledgements

Raul Torres would like to acknowledge the financial support of the Colombian Administrative Department of Science, Technology and Innovation (Colciencias), as well as the mathematical advice received from Ruslan Krenzler.

Corresponding author

Correspondence to Raul Torres.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Torres, R., Kunkel, J., Dolz, M.F., Ludwig, T. (2017). A Novel String Representation and Kernel Function for the Comparison of I/O Access Patterns. In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2017. Lecture Notes in Computer Science, vol 10421. Springer, Cham. https://doi.org/10.1007/978-3-319-62932-2_48

  • DOI: https://doi.org/10.1007/978-3-319-62932-2_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62931-5

  • Online ISBN: 978-3-319-62932-2

  • eBook Packages: Computer Science, Computer Science (R0)
