Stylized Facts of Linguistic Corpora: Exploring the Lexical Properties of Affect in News

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2016 (IDEAL 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9937)

Abstract

Investors are often said to be driven by emotions, and studies in sentiment analysis claim that there is a causal relationship between negative affect in text and prices in financial markets. The text collections used in these studies vary in size and source, and their design criteria are seldom justified. This is a classic data engineering problem, one that requires specifying the data sources and designing the data repositories and retrieval facilities. In this paper, we explore the statistical properties of negative affect expressed in various textual corpora that differ in specification, size and provenance. We ask whether there are any stylized facts of negative affect that hold universally across all texts. We report two main findings: (1) the frequency distribution of negative terms is generally stable across different corpus sizes, and (2) negative terms account for a relatively small proportion of the total terms in a corpus.
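Both findings rest on simple counts: the share of corpus tokens that belong to a negative-term lexicon, and how much of a corpus its top-ranked words cover (the ranked-frequency coverage mentioned in note 2). The following is a minimal Python sketch of both measures under stated assumptions: the lexicon `NEGATIVE_TERMS` is hypothetical and holds only the distinct terms listed in note 4, not the lexicon used in the paper; tokenization is a plain lower-case regular expression; and the function names are illustrative. It is not the Rocksteady system the authors used. The sample sentences are invented and deliberately dense in negative terms, so their figures are not comparable to the paper's results.

```python
import re
from collections import Counter

# Hypothetical lexicon for illustration only: the distinct terms listed in
# note 4, not the full negative-term dictionary used in the paper.
NEGATIVE_TERMS = {"down", "loss", "close", "drop", "lose", "against", "crisis",
                  "decline", "problem", "concern", "cut", "risk", "inflation"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def negative_share(tokens, lexicon=NEGATIVE_TERMS):
    """Fraction of all tokens that belong to the negative lexicon."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in lexicon) / len(tokens)


def top_k_coverage(tokens, k):
    """Fraction of all tokens accounted for by the k most frequent word types."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(c for _, c in counts.most_common(k)) / len(tokens)


if __name__ == "__main__":
    # Invented sample documents (hypothetical data).
    docs = [
        "Shares fell as the crisis deepened and investors feared a further decline.",
        "The index closed down after a loss in bank stocks.",
        "Analysts said the risk of inflation remains a concern for the market.",
    ]
    # Grow the corpus one document at a time and report size, negative share
    # and top-5 coverage, to see whether the measures settle as size increases.
    corpus = []
    for n, doc in enumerate(docs, start=1):
        corpus.extend(tokenize(doc))
        print(n, len(corpus),
              round(negative_share(corpus), 3),
              round(top_k_coverage(corpus, 5), 3))
```

Running the same loop over progressively larger real document sets is one way to probe finding (1), namely whether the negative-term share stays roughly constant as the corpus grows.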

Notes

  1.

    Returns denote the relative change in price of an asset between two different time periods.

  2.

    Typically, the first 100 words in a ranked frequency list are responsible for 50% of the total words in the text collection [7].

  3.

    The Shapiro-Wilk statistic for the AofM and COCA corpora at 128 documents was not significant at \(p < 0.001\).

  4.

    Namely, the terms ‘down’, ‘loss’, ‘close’, ‘drop’, ‘lose’, ‘against’, ‘crisis’, ‘decline’, ‘problem’, ‘concern’, ‘cut’, ‘risk’ and ‘inflation’. These words were selected because they occur commonly across all 6 corpora.
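Note 1 defines returns as the relative change in an asset's price between two time periods. The following is a minimal Python sketch of that definition, assuming a plain list of prices ordered in time; the prices are hypothetical. The log-return variant is a common alternative in finance and is an addition here, not something the note specifies.

```python
import math


def simple_returns(prices):
    """Relative price change between consecutive periods: (P_t - P_{t-1}) / P_{t-1}."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def log_returns(prices):
    """Log returns ln(P_t / P_{t-1}); an assumed alternative, not defined in note 1."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


if __name__ == "__main__":
    prices = [100.0, 102.0, 99.96]  # hypothetical closing prices
    print(simple_returns(prices))
    print(log_returns(prices))
```

A series of n prices yields n - 1 returns, since each return compares a period with the one before it.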

References

  1. Cont, R.: Empirical properties of asset returns: stylized facts and statistical issues. Quant. Finance 1, 223–236 (2001)

  2. Taylor, S.J.: Asset Price Dynamics, Volatility, and Prediction. Princeton University Press, Princeton (2011)

  3. Shiller, R.J., Perron, P.: Testing the random walk hypothesis: power versus frequency of observation. Econ. Lett. 18(4), 381–386 (1985)

  4. Tetlock, P.C.: Giving content to investor sentiment: the role of media in the stock market. J. Finance 62(3), 1139–1168 (2007)

  5. Garcia, D.: Sentiment during recessions. J. Finance 68(3), 1267–1300 (2013). doi:10.1111/jofi.12027

  6. Antweiler, W., Frank, M.Z.: Is all that talk just noise? The information content of internet stock message boards. J. Finance 59(3), 1259–1294 (2004)

  7. Ahmad, K.: Being in text and text in being: notes on representative texts. In: Anderman, G., Rogers, M. (eds.) Incorporating Corpora, pp. 60–91. Multilingual Matters, Clevedon (2008)

  8. Loughran, T., McDonald, B.: The use of word lists in textual analysis. J. Behav. Finance 16(1), 1–11 (2015)

  9. Davies, M.: The Corpus of Contemporary American English: 450 million words, 1990–present (2008)

  10. British National Corpus. Oxford University, Humanities Computing Unit, New York (2000)

  11. Kelly, S.: Signs of irrational exuberance: an investigation into the role of news and sentiment in finance. Ph.D. thesis, Trinity College, University of Dublin (2015)

  12. Zhao, Z., Ahmad, K.: Qualitative and quantitative sentiment proxies: interaction between markets. In: Jackowski, K., Burduk, R., Walkowiak, K., Woźniak, M., Yin, H. (eds.) IDEAL 2015. LNCS, vol. 9375, pp. 466–474. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24834-9_54

  13. Zhao, Z., Ahmad, K.: A computational account of investor behaviour in Chinese and US market. Int. J. Econ. Behav. Organ. 3(6), 78–84 (2015)

  14. Stone, P.J., Dunphy, D.C., Smith, M.S., Ogilvie, D.M., with associates: The General Inquirer: A Computer Approach to Content Analysis. The MIT Press, Cambridge (1966)

  15. Esuli, A., Sebastiani, F.: SentiWordNet: a publicly available lexical resource for opinion mining. In: Proceedings of LREC, vol. 6, pp. 417–422. Citeseer (2006)

  16. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retrieval 2(1–2), 1–135 (2008)

  17. Loughran, T., McDonald, B.: When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. J. Finance 66(1), 35–65 (2011)

  18. Cook, J.A., Ahmad, K.: Behaviour and markets: the interaction between sentiment analysis and ethical values? In: Jackowski, K., Burduk, R., Walkowiak, K., Woźniak, M., Yin, H. (eds.) IDEAL 2015. LNCS, vol. 9375, pp. 551–558. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24834-9_64

  19. Kelly, S., Ahmad, K.: The impact of news media and affect in financial markets. In: Jackowski, K., Burduk, R., Walkowiak, K., Woźniak, M., Yin, H. (eds.) IDEAL 2015. LNCS, vol. 9375, pp. 535–540. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24834-9_62

Acknowledgments

J.A. Cook—This research is supported by Science Foundation Ireland through the CNGL Programme (Grant 07/CE/I1142) in the ADAPT Center (www.adaptcentre.ie) at Trinity College, University of Dublin. Z. Zhao—The research leading to these results has also received funding from the EU FP7 Slandail project under grant agreement no. 607691. In this study we used the text analysis system Rocksteady, developed as part of the Faireachain project for monitoring, evaluating and predicting behaviour of markets and communities (2009–2011). Support for Rocksteady’s development was provided by Trinity College, University of Dublin and Enterprise Ireland (Grant IP-2009-0595).

Author information

Corresponding author

Correspondence to Zeyan Zhao.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Cook, J.A., Zhao, Z., Ahmad, K. (2016). Stylized Facts of Linguistic Corpora: Exploring the Lexical Properties of Affect in News. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2016. IDEAL 2016. Lecture Notes in Computer Science, vol 9937. Springer, Cham. https://doi.org/10.1007/978-3-319-46257-8_53

  • DOI: https://doi.org/10.1007/978-3-319-46257-8_53

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46256-1

  • Online ISBN: 978-3-319-46257-8

  • eBook Packages: Computer Science, Computer Science (R0)
