Exploratory Data Analysis

Ruppert, David; Matteson, David S.

doi:10.1007/978-1-4939-2614-5_4

Part of the book series: Springer Texts in Statistics (STS)

Abstract

This book is about the statistical analysis of financial markets data such as equity prices, foreign exchange rates, and interest rates. These quantities vary randomly, thereby causing financial risk as well as the opportunity for profit. Figures 4.1, 4.2, and 4.3 show, respectively, time series plots of daily log returns on the S&P 500 index, daily changes in the Deutsche Mark (DM) to U.S. dollar exchange rate, and changes in the monthly risk-free return, which is 1/12th of the annual risk-free interest rate. A time series is a sequence of observations of some quantity or quantities, e.g., equity prices, taken over time, and a time series plot is a plot of a time series in chronological order. Figure 4.1 was produced by the following code:
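
The code referred to in the abstract is not part of this preview. As a stand-in, the following is a minimal Python sketch (not the authors' code) of how a series of log returns, r_t = log(P_t / P_{t-1}), might be computed from prices in chronological order. The function name `log_returns` and the price values are hypothetical and are not S&P 500 data.

```python
import math

def log_returns(prices):
    """Return the log returns r_t = log(P_t / P_{t-1}) of a price series."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be positive")
    # Pair each price with its successor and take the log of the ratio.
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

# Hypothetical closing prices in chronological order (illustrative only).
prices = [100.0, 101.0, 99.5, 100.2, 102.0]
returns = log_returns(prices)

# Log returns are additive: their sum is the log return over the whole period.
total = math.log(prices[-1] / prices[0])
assert abs(sum(returns) - total) < 1e-12

# A time series plot would show these values against t in chronological order.
for t, r in enumerate(returns, start=1):
    print(f"t={t}: {r:+.5f}")
```

Plotting `returns` against the time index with any plotting library would give a time series plot of the kind described above.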

Notes

1. See Appendix A.2.1 for definitions of CDF, PDF, and other terms in probability theory.
2. “Standard” means having expectation 0 and variance 1.
3. See Sect. 5.16 for more discussion of robust estimation and the precise definition of MAD.
4. Somewhat confusingly, the bottom 10% of the data is also called the first decile, and similarly for the second 10%, and so forth. Thus, the first decile could refer to the 10th percentile of the data or to all of the data at or below this percentile. In like fashion, the bottom 20% of the sample is called the first quintile, and the second to fifth quintiles are defined analogously.
5. See Appendix A.9.4 for an introduction to the lognormal distribution and the definition of the log-standard deviation.
6. However, t-distributions have been generalized in at least two different ways to the so-called skewed-t-distributions, which need not be symmetric. See Sect. 5.7.
7. See Sect. 5.14.
8. See Chap. 19 for a discussion on how tail weight can greatly affect risk measures such as VaR and expected shortfall.
9. The factor 1.5 is the default value of the range parameter and can be changed.
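
Several of the notes above mention quantities that are easy to compute directly: deciles and quintiles (note 4), MAD (note 3), and the factor 1.5 (note 9), which presumably sets the whisker length in a boxplot. The sketch below illustrates them in Python under stated assumptions. The scaling constant 1.4826 for MAD is a common convention, not necessarily the definition given in Sect. 5.16. The quantile interpolation rule is Python's "inclusive" method and may differ from what the chapter's software uses. The sample values and function names are hypothetical.

```python
import statistics

def mad(data, scale=1.4826):
    """Median absolute deviation about the median.

    The default scale (an assumed convention) makes MAD estimate the
    standard deviation when the data are normally distributed.
    """
    center = statistics.median(data)
    return scale * statistics.median([abs(x - center) for x in data])

def whisker_fences(data, factor=1.5):
    """Return (lower, upper) fences: quartiles extended by factor * IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr

# Hypothetical sample with one large and one small value.
sample = [-2.1, -0.4, -0.1, 0.0, 0.2, 0.3, 0.5, 0.9, 1.1, 7.5]

# Nine cut points split the data into ten groups; four cut points into five.
deciles = statistics.quantiles(sample, n=10, method="inclusive")
quintiles = statistics.quantiles(sample, n=5, method="inclusive")

# "First decile" as a set: all observations at or below the 10th percentile.
first_decile_group = [x for x in sample if x <= deciles[0]]

low, high = whisker_fences(sample)
outliers = [x for x in sample if x < low or x > high]

print("10th percentile:", deciles[0])
print("first decile group:", first_decile_group)
print("20th percentile:", quintiles[0])
print("MAD:", mad(sample))
print("fences:", (low, high), "outside fences:", outliers)
```

Changing `factor` widens or narrows the fences, which is the sense in which the default of 1.5 "can be changed".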

Author information

Authors and Affiliations

Department of Statistical Science and School of ORIE, Cornell University, Ithaca, NY, USA
David Ruppert
Department of Statistical Science and Department of Social Statistics, Cornell University, Ithaca, NY, USA
David S. Matteson

About this chapter

Cite this chapter

Ruppert, D., Matteson, D.S. (2015). Exploratory Data Analysis. In: Statistics and Data Analysis for Financial Engineering. Springer Texts in Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2614-5_4

DOI: https://doi.org/10.1007/978-1-4939-2614-5_4
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-2613-8
Online ISBN: 978-1-4939-2614-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
