Abstract
Investors are often said to be driven by emotions, and studies in sentiment analysis claim that there is a causal relationship between negative affect in text and prices in financial markets. The text collections used in these studies vary in size and source, with little justification of their design criteria. This is a classic data engineering problem, which requires specification of the data sources and design of the data repositories and retrieval facilities. In this paper, we explore the statistical properties of negative affect expressed in various textual corpora, differing in specification, size and provenance. The question we ask is whether there are any stylized facts of negative affect that are universal across all texts. We report two main findings: (1) the frequency distribution of negative terms is generally stable across different corpus sizes, and (2) negative terms account for a relatively small proportion of the total terms in the corpus.
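The second finding concerns the share of negative terms among all terms in a corpus. A minimal sketch of how such a share could be computed is given here, assuming a simple lowercase regex tokenizer and a small, purely illustrative word list; `NEGATIVE_TERMS`, `tokenize` and `negative_share` are hypothetical names, and the lexicon and tooling actually used in the study are not reproduced.

```python
import re
from collections import Counter

# Illustrative word list only (an assumption); the study's lexicon is not given here.
NEGATIVE_TERMS = {"loss", "crisis", "decline", "risk", "drop"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def negative_share(text, lexicon=NEGATIVE_TERMS):
    """Return the fraction of tokens in `text` that appear in `lexicon`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    negative = sum(counts[term] for term in lexicon)
    return negative / len(tokens)


sample = "Shares fell after the crisis, and the loss widened as risk grew."
share = negative_share(sample)
```

Computing the same share on samples of increasing size (for example 128, 256, 512 documents) would be one way to check whether it stays stable as the corpus grows.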
Notes
1. Returns denote the relative change in price of an asset between two different time periods.
2. Typically, the first 100 words in a ranked frequency list are responsible for 50% of the total words in the text collection [7].
3. The Shapiro-Wilk statistic for the AofM and COCA corpora at 128 documents was not significant at \(p < 0.001\).
4. Namely the terms ‘down’, ‘loss’, ‘close’, ‘drop’, ‘lose’, ‘against’, ‘crisis’, ‘decline’, ‘problem’, ‘concern’, ‘cut’, ‘risk’, ‘decline’, ‘inflation’ and ‘drop’. These words were selected based on their common occurrence across all 6 corpora.
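Two of the quantities mentioned in the notes can be illustrated with a short sketch: the return as a relative price change, and the share of all tokens covered by the top-ranked words in a frequency list. The function names `simple_return` and `top_k_coverage` are hypothetical, and the simple (non-logarithmic) return is an assumption based on the wording "relative change in price".

```python
from collections import Counter


def simple_return(p_prev, p_curr):
    """Relative change in price between two periods (simple return, assumed)."""
    return (p_curr - p_prev) / p_prev


def top_k_coverage(tokens, k=100):
    """Fraction of all tokens accounted for by the k most frequent word types."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(k))
    return top / total


# A price moving from 100 to 105 gives a return of 5 percent.
r = simple_return(100.0, 105.0)

# In this toy list the single most frequent word covers half of the tokens.
coverage = top_k_coverage(["a", "a", "b", "c"], k=1)
```

Applied to a tokenized corpus with `k=100`, `top_k_coverage` gives the figure that can be compared against the 50% rule of thumb.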
References
Cont, R.: Empirical properties of asset returns: stylized facts and statistical issues. Quant. Finance 1, 223–236 (2001)
Taylor, S.J.: Asset Price Dynamics, Volatility, and Prediction. Princeton University Press, Princeton (2011)
Shiller, R.J., Perron, P.: Testing the random walk hypothesis: power versus frequency of observation. Econ. Lett. 18(4), 381–386 (1985)
Tetlock, P.C.: Giving content to investor sentiment: the role of media in the stock market. J. Finance 62(3), 1139–1168 (2007)
Garcia, D.: Sentiment during recessions. J. Finance 68(3), 1267–1300 (2013). doi:10.1111/jofi.12027
Antweiler, W., Frank, M.Z.: Is all that talk just noise? The information content of internet stock message boards. J. Finance 59(3), 1259–1294 (2004)
Ahmad, K.: Being in text and text in being: notes on representative texts. In: Anderman, G., Rogers, M. (eds.) Incorporating Corpora, pp. 60–91. Multilingual Matters, Clevedon (2008)
Loughran, T., McDonald, B.: The use of word lists in textual analysis. J. Behav. Finance 16(1), 1–11 (2015)
Davies, M.: The Corpus of Contemporary American English: 450 million words, 1990-present (2008)
British National Corpus. Oxford University, Humanities Computing Unit, New York (2000)
Kelly, S.: Signs of irrational exuberance: an investigation into the role of news and sentiment in finance. Ph.D. thesis, Trinity College, University of Dublin (2015)
Zhao, Z., Ahmad, K.: Qualitative and quantitative sentiment proxies: interaction between markets. In: Jackowski, K., Burduk, R., Walkowiak, K., Woźniak, M., Yin, H. (eds.) IDEAL 2015. LNCS, vol. 9375, pp. 466–474. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24834-9_54
Zhao, Z., Ahmad, K.: A computational account of investor behaviour in Chinese and US market. Int. J. Econ. Behav. Organ. 3(6), 78–84 (2015)
Stone, P.J., Dunphy, D.C., Smith, M.S., Ogilvie, D.M., with associates: The General Inquirer: A Computer Approach to Content Analysis. The MIT Press, Cambridge (1966)
Esuli, A., Sebastiani, F.: SentiWordNet: a publicly available lexical resource for opinion mining. In: Proceedings of LREC, vol. 6, pp. 417–422. Citeseer (2006)
Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retrieval 2(1–2), 1–135 (2008)
Loughran, T., McDonald, B.: When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. J. Finance 66(1), 35–65 (2011)
Cook, J.A., Ahmad, K.: Behaviour and markets: the interaction between sentiment analysis and ethical values? In: Jackowski, K., Burduk, R., Walkowiak, K., Woźniak, M., Yin, H. (eds.) IDEAL 2015. LNCS, vol. 9375, pp. 551–558. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24834-9_64
Kelly, S., Ahmad, K.: The impact of news media and affect in financial markets. In: Jackowski, K., Burduk, R., Walkowiak, K., Woźniak, M., Yin, H. (eds.) IDEAL 2015. LNCS, vol. 9375, pp. 535–540. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24834-9_62
Acknowledgments
J.A. Cook—This research is supported by Science Foundation Ireland through the CNGL Programme (Grant 07/CE/I1142) in the ADAPT Center (www.adaptcentre.ie) at Trinity College, University of Dublin. Z. Zhao—The research leading to these results has also received funding from the EU FP7 Slandail project under grant agreement no. 607691. In this study we used the text analysis system Rocksteady, developed as part of the Faireachain project for monitoring, evaluating and predicting behaviour of markets and communities (2009–2011). Support for Rocksteady’s development was provided by Trinity College, University of Dublin and Enterprise Ireland (Grant IP-2009-0595).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Cook, J.A., Zhao, Z., Ahmad, K. (2016). Stylized Facts of Linguistic Corpora: Exploring the Lexical Properties of Affect in News. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2016. IDEAL 2016. Lecture Notes in Computer Science(), vol 9937. Springer, Cham. https://doi.org/10.1007/978-3-319-46257-8_53
DOI: https://doi.org/10.1007/978-3-319-46257-8_53
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46256-1
Online ISBN: 978-3-319-46257-8
eBook Packages: Computer Science (R0)