Synthetic History for Exchange Traded Funds

  • Aistis Raudys
  • Lukas Sirvydis
  • Karol Lisovskij
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 117)


To profit from trading one must forecast future prices, and any forecasting model must first be verified against past data, so a short trading history is a problem. We showed, both theoretically and experimentally, that the history of some financial assets can be reconstructed quite accurately. Specifically, we forecasted the past price movements of exchange traded funds (ETFs). The problem is acute in practice: a number of very liquid ETFs can be traded with minimal slippage, but their available history is too short for systematic traders to test their trading models. To forecast historical ETF prices we used stocks with a longer available history. In some cases we created multiple model instances with varying numbers of stocks; as soon as the history of a stock became unavailable, we selected a different model. We compared this and eight other methods using a set of US ETFs ranging from S&P 500 to uranium. The experimental study showed expectation maximisation with covariance matrix normalisation to be the best method for this task.
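The model-switching idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes ordinary least squares on daily returns, NaN as the marker for missing history, and hypothetical names (`fit_ols`, `backfill_etf_returns`). For each date where the ETF is missing, it uses a model fitted on exactly the stocks that have data on that date, so the model changes whenever a stock's history runs out.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y on X with an intercept; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def backfill_etf_returns(stock_returns, etf_returns):
    """Fill missing ETF returns from stock returns (illustrative sketch).

    stock_returns: (T, N) array, NaN where a stock has no history.
    etf_returns:   (T,) array, NaN where the ETF has no history.
    Returns a copy of etf_returns with missing values filled where a
    model could be fitted; other missing values stay NaN.
    """
    filled = etf_returns.copy()
    etf_known = ~np.isnan(etf_returns)
    models = {}  # available stock indices -> fitted coefficients (or None)

    for t in np.flatnonzero(~etf_known):
        available = ~np.isnan(stock_returns[t])
        if not available.any():
            continue  # no stock data on this date, nothing to regress on
        key = tuple(np.flatnonzero(available))
        cols = list(key)

        if key not in models:
            # Train only on dates where the ETF and all chosen stocks exist.
            rows = etf_known & ~np.isnan(stock_returns[:, cols]).any(axis=1)
            if rows.sum() <= len(cols) + 1:
                models[key] = None  # too few observations to fit
            else:
                models[key] = fit_ols(stock_returns[rows][:, cols],
                                      etf_returns[rows])

        coef = models[key]
        if coef is not None:
            filled[t] = coef[0] + stock_returns[t, cols] @ coef[1:]
    return filled
```

Calling `backfill_etf_returns(stocks, etf)` with an ETF series whose early rows are NaN returns the same series with those rows estimated. A synthetic price history could then be obtained by compounding the filled returns backwards from the first real price, which is a further assumption and not something the abstract specifies.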


synthetic history; artificial history; time series; regression; expectation maximisation; imputation; missing data





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Aistis Raudys (1)
  • Lukas Sirvydis (1)
  • Karol Lisovskij (1)
  1. Faculty of Mathematics and Informatics, Department of Informatics, Vilnius University, Vilnius, Lithuania
