
SOHAC: Efficient Storage of Tick Data That Supports Search and Analysis

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 7377)

Abstract

Storing tick data is challenging because two criteria must be met simultaneously: the storage structure should allow fast query execution, and the data should not occupy too much space on disk or in main memory. In this paper, we present a clustering-based solution and introduce a new clustering algorithm designed to support the storage of tick data. We evaluate our algorithm on publicly available real-world datasets as well as on real-world tick data from the financial domain, provided by one of the world's most renowned investment banks. In our experiments, we compare our approach, SOHAC, against a large collection of conventional hierarchical clustering algorithms from the literature. The experiments show that our algorithm substantially outperforms the examined clustering algorithms on the tick data storage problem, both in terms of statistical significance and practical relevance.
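The abstract does not describe SOHAC itself, so the following is only a minimal sketch of the general idea of clustering-based storage, using a conventional single-linkage hierarchical clustering of the kind the paper compares against. The `(price, volume)` tick layout, the distance threshold, the function names, and the definition of the compression ratio (original records per stored representative, ignoring the cost of the labels) are all assumptions made for illustration, not details taken from the paper.

```python
from itertools import combinations
from math import dist


def single_linkage(points, threshold):
    """Naive agglomerative single-linkage clustering (baseline, NOT SOHAC).

    Repeatedly merges the two closest clusters until the closest pair is
    farther apart than `threshold` (a hypothetical stopping rule).
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: cluster distance is the minimum pairwise distance.
            d = min(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


def compress(points, threshold):
    """Store one centroid per cluster plus a cluster label per record."""
    clusters = single_linkage(points, threshold)
    dim = len(points[0])
    centroids = []
    labels = [0] * len(points)
    for cid, members in enumerate(clusters):
        centroid = tuple(sum(points[i][k] for i in members) / len(members)
                         for k in range(dim))
        centroids.append(centroid)
        for i in members:
            labels[i] = cid
    return centroids, labels


def compression_ratio(points, centroids):
    """Assumed definition: original records per stored representative."""
    return len(points) / len(centroids)


# Hypothetical ticks as (price, volume) pairs.
ticks = [(100.0, 10), (100.1, 10), (100.05, 11), (250.0, 500), (250.2, 498)]
centroids, labels = compress(ticks, threshold=3.0)
ratio = compression_ratio(ticks, centroids)
```

In this sketch a query could first be run against the few centroids and then refined on the members of the matching clusters, which is one way a clustering-based layout can trade space against query speed. The approximation is lossy; whether and how the paper bounds that loss is not stated in the abstract.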

Keywords

  • Cluster Algorithm
  • Stock Market
  • Compression Ratio
  • Single Linkage
  • Cosine Similarity

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nagy, G.I., Buza, K. (2012). SOHAC: Efficient Storage of Tick Data That Supports Search and Analysis. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2012. Lecture Notes in Computer Science, vol 7377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31488-9_4


  • DOI: https://doi.org/10.1007/978-3-642-31488-9_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31487-2

  • Online ISBN: 978-3-642-31488-9

  • eBook Packages: Computer Science, Computer Science (R0)