Abstract
Computer-aided text analysis has attracted considerable attention in recent years. Applied to different types of business communication such as earnings announcements, analyst reports, or IPO prospectuses, it has been used to extract relevant information for financial market participants. A large number of studies employ dictionary-based approaches that rely on specific word lists. Since these lists have been compiled predominantly for the English language, the respective analyses have focused on English business texts. To extend the application of content analysis to other languages, we create a German dictionary designed to measure the textual sentiment of business communication. Our dictionary is based on the English dictionary by Loughran and McDonald (2011), which is commonly used for examining finance- and accounting-specific texts. We discuss the set-up of our dictionary and extensively test its quality. We further compare our dictionary to German general-language dictionaries and to a machine-learning procedure, and we provide evidence for its ability to capture market-relevant textual sentiment in German business communication.
Notes
The BPW dictionary is publicly available at www.uni-giessen.de/BPW.
Table 12 in the online appendix documents this trend by summarizing the most influential content analyses in the area since 2014. The results of these studies are described further in the next sections. The online appendix is available at www.uni-giessen.de/bpw or upon request.
For a comprehensive overview of relevant studies, see also Eickhoff and Neuss (2017).
See http://www.liwc.net.
We use the most recent version of the LM dictionary as of March 2015. The original version of the LM dictionary used in Loughran and McDonald (2011) included 2337 negative, 353 positive, and 258 uncertainty words. Documentation of the updates to the LM dictionary can be found at https://www3.nd.edu/~mcdonald/Word_Lists.html.
By LIWC, we mean the German version by Wolf et al. (2008) of the originally English Linguistic Inquiry and Word Count.
See Table 15 in the online appendix.
For generating all translations, we use the widely used Langenscheidt German-English dictionary (Merz 2012) as well as online dictionaries available at http://www.dict.cc and http://www.linguee.de.
For more information, see http://www.dgap.de/.
We also run our analysis for a sub-sample of companies which explicitly report the use of external professional translation, proofreading, or text-production services. Our results from this sub-sample do not materially differ from those for the whole sample. The results are provided in Panel C of Table 14 in the online appendix.
For more information, see http://www3.nd.edu/~mcdonald/Word_Lists.html.
We provide the stop word list at www.uni-giessen.de/bpw.
Note that removing sparse terms provides only a minuscule advantage in computing time under the dictionary approach. It is mainly helpful when the document-term matrix used as input to a statistical algorithm such as Maximum Entropy, LSA, or LDA would otherwise become too large to compute. We thank an anonymous referee for pointing this out.
We also conduct our main analysis on the quarterly and annual reports without using pruning or applying stop-word filtering. The corresponding results do not change materially and can be found in Panel B of Table 14 in the online appendix.
All text-processing steps were conducted with the Rapidminer software. For more information, please see https://rapidminer.com/.
See Table 13 in the online appendix for a complete list of all word combinations we control for. For the same reason, we counted the terms “IMPAIRMENT LOSS” and “IMPAIRMENT LOSSES” as one negative word. We also conduct our analysis on the quarterly and annual reports without making any exceptions to the word independence assumption. The corresponding results can be found in Panel A of Table 14 in the online appendix.
Loughran and McDonald (2011) give an account of the most frequently employed negative words in their sample of U.S. annual reports (10-K). The results for our sample of English quarterly and annual reports are quite comparable: All but one of the ten most common negative English words in our texts appear among the 30 most common negative words in Loughran and McDonald (2011). Furthermore, the distribution of the most common words within both samples appears to be quite comparable. While the ten most common negative words in our sample account for 35.0% of the negative word count, the ten most common negative words in Loughran and McDonald (2011) account for 33.8%.
Summary statistics are presented in Table 16 in the online appendix.
Note that the LIWC uses stems which account for several inflections.
Note that the original HARVARD dictionary does not include inflections but word stems. For our analyses in Table 7, we use a modified version of the HARVARD dictionary’s negative word list by Loughran and McDonald (2011) that uses inflections. For the positive word list of the HARVARD dictionary, we are not aware of a modified version using inflections and thus confine our comparison in Panel B to the negative word list.
Note that some studies, for example Li (2010), combine negative and uncertain language as “negative” language. However, we believe that uncertain language is conceptually different from negative language and restrict our analyses in this section to positive and negative language.
We also conduct fivefold, 10-fold, and 25-fold cross-validation tests. The average percentages of correctly classified observations are 64.4%, 64.9%, and 64.5%, respectively.
The results are provided in Table 17 in the online appendix.
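The notes above mention three mechanics of the dictionary approach: stop-word filtering, counting a two-word term such as “IMPAIRMENT LOSS” as a single negative word, and the share of the negative word count attributable to the most common negative words. The following Python sketch illustrates these ideas under simplifying assumptions. It is not the authors' Rapidminer workflow. The word list, the stop words, and the function names `tokenize`, `count_negative`, and `top_share` are hypothetical, and the sketch assumes that stop words only affect the word total, which the notes do not specify.

```python
import re
from collections import Counter

# Hypothetical mini word list for illustration; not taken from the BPW or LM dictionaries.
NEGATIVE_TERMS = {
    "impairment", "loss", "losses", "decline",
    "impairment loss", "impairment losses",  # two-word terms, each counted as one word
}
# Hypothetical stop words; the authors publish their own list separately.
STOP_WORDS = {"the", "a", "to", "of", "and"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens (German umlauts and ß included)."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def count_negative(text, terms=NEGATIVE_TERMS, stop_words=STOP_WORDS):
    """Return (negative term counts, number of non-stop-word tokens).

    Two-word terms are matched before single words, so "impairment loss"
    is counted once instead of once per component word.
    """
    tokens = tokenize(text)
    counts = Counter()
    i = 0
    while i < len(tokens):
        bigram = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and bigram in terms:
            counts[bigram] += 1
            i += 2  # consume both tokens of the phrase
        else:
            if tokens[i] in terms:
                counts[tokens[i]] += 1
            i += 1
    # Assumption: stop words are excluded only from the word total.
    content_words = sum(1 for t in tokens if t not in stop_words)
    return counts, content_words


def top_share(counts, n=10):
    """Share of the total negative count contributed by the n most common terms."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(n)) / total


counts, content_words = count_negative(
    "The impairment loss led to a decline. Another loss followed."
)
negative_share = sum(counts.values()) / content_words
```

In this toy sentence the phrase is counted once, so three negative terms are found among seven non-stop-word tokens. Matching phrases on the unfiltered token sequence avoids creating artificial adjacencies that could arise if stop words were removed first.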
References
Allee KD, DeAngelis MD (2015) The structure of voluntary disclosure narratives: evidence from tone dispersion. J Account Res 53:241–274. https://doi.org/10.1111/1475-679X.12072
Ammann M, Schaub N (2016) Social interaction and investing: evidence from an online social trading network. Working Paper
Antons D, Breidbach CF (2018) Big data, big insights? Advancing service innovation and design with machine learning. J Serv Res 21:17–39. https://doi.org/10.1177/1094670517738373
Antons D, Kleer R, Salge TO (2016) Mapping the topic landscape of JPIM, 1984-2013: in search of hidden structures and development trajectories. J Prod Innov Manag 33:726–749. https://doi.org/10.1111/jpim.12300
Antweiler W, Frank MZ (2004) Is all that talk just noise? The information content of internet stock message boards. J Finance 59:1259–1294. https://doi.org/10.1111/j.1540-6261.2004.00662.x
Arslan-Ayaydin Ö, Boudt K, Thewissen J (2015) Managers set the tone: equity incentives and the tone of earnings press releases. J Bank Finance 72:132–147. https://doi.org/10.1016/j.jbankfin.2015.10.007
Baker M, Wurgler J (2006) Investor sentiment and the cross-section of stock returns. J Finance 61:1645–1680. https://doi.org/10.1111/j.1540-6261.2006.00885.x
Bannier CE, Pauls T, Walter A (2017) CEO-Speeches and stock returns. Working Paper. https://doi.org/10.2139/ssrn.2869785
Bao Y, Datta A (2014) Simultaneously discovering and quantifying risk types from textual risk disclosures. Manage Sci 60:1371–1391. https://doi.org/10.1287/mnsc.2014.1930
Blair C, Cole SR (2002) Two-sided equivalence testing of the difference between two means. J Mod Appl Stat Methods 1:139–142. https://doi.org/10.22237/jmasm/1020255540
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
Boudt K, Thewissen J (2018) Jockeying for position in CEO letters: impression management and sentiment analytics. Financ Manage. https://doi.org/10.1111/fima.12219
Boukus E, Rosenberg JV (2006) The information content of FOMC minutes. Working Paper. https://doi.org/10.2139/ssrn.922312
Brooke J, Tofiloski M, Taboada M (2009) Cross-linguistic sentiment analysis: from English to Spanish. Proc Int Conf RANLP 2009:50–54
Buehlmaier MMM, Whited TM (2018) Are financial constraints priced? Evidence from textual analysis. Rev Financ Stud 31:2693–2728. https://doi.org/10.1093/rfs/hhy007
Caton S, Hall M, Weinhardt C (2015) How do politicians use Facebook? An applied social observatory. Big Data Soc 2:1–18. https://doi.org/10.1177/2053951715612822
Caumanns J (1999) A fast and simple stemming algorithm for German words. http://www.inf.fu-berlin.de/inst/pubs/tr-b-99-16.abstract.html. Accessed 13 Jan 2018
Cicon JE, Ferris SP, Kammel AJ, Noronha G (2012) European corporate governance: a thematic analysis of national codes of governance. Eur Financ Manag 18:620–648. https://doi.org/10.1111/j.1468-036X.2010.00542.x
Das SR, Chen MY (2007) Yahoo! for Amazon: sentiment extraction from small talk on the web. Manage Sci 53:1375–1388. https://doi.org/10.1287/mnsc.1070.0704
Davis AK, Tama-Sweet I (2012) Managers’ use of language across alternative disclosure outlets: earnings press releases versus MD&A. Contemp Account Res 29:804–837. https://doi.org/10.1111/j.1911-3846.2011.01125.x
Davis AK, Piger JM, Sedor LM (2012) Beyond the numbers: measuring the information content of earnings press release language. Contemp Account Res 29:845–868. https://doi.org/10.1111/j.1911-3846.2011.01130.x
Davis AK, Ge W, Matsumoto D, Zhang JL (2015) The effect of manager-specific optimism on the tone of earnings conference calls. Rev Acc Stud 20:639–673. https://doi.org/10.1007/s11142-014-9309-4
Debortoli S, Müller O, Junglas I, Vom Brocke J (2016) Text mining for information systems researchers: an annotated topic modeling tutorial. CAIS 39:110–135. https://doi.org/10.17705/1CAIS.03907
Doran JS, Peterson DR, Price SM (2012) Earnings conference call content and stock price: the case of REITs. J Real Estate Finance Econ 45:402–434. https://doi.org/10.1007/s11146-010-9266-z
Eickhoff M, Muntermann J (2016a) How to conquer information overload? Supporting financial decisions by identifying relevant conference call topics. In: PACIS 2016 Proceedings
Eickhoff M, Muntermann J (2016b) They talk but what do they listen to? Analyzing financial analysts’ information processing using latent Dirichlet allocation. In: PACIS 2016 Proceedings
Eickhoff M, Neuss N (2017) Topic modelling methodology: Its use in information systems and other managerial disciplines. In: Proceedings of the 25th European conference on information systems, Guimarães, Portugal:1327–1347
Engelberg J (2008) Costly information processing: Evidence from earnings announcements. Working Paper. https://doi.org/10.2139/ssrn.1107998
Feldman R, Govindaraj S, Livnat J, Segal B (2008) The incremental information content of tone change in management discussion and analysis. Working Paper
Ferris SP, Hao Q, Liao M-Y (2013) The effect of issuer conservatism on IPO pricing and performance. Rev Finance 17:993–1027. https://doi.org/10.1093/rof/rfs018
Feuerriegel S, Ratku A, Neumann D (2016) Analysis of how underlying topics in financial news affect stock prices using Latent Dirichlet Allocation. 49th Hawaii International Conference on System Sciences (HICSS 2016):1072–1081. https://doi.org/10.1109/hicss.2016.137
Frazier KB, Ingram RW, Tennyson BM (1984) A methodology for the analysis of narrative accounting disclosures. J Account Res 22:318–331. https://doi.org/10.2307/2490713
Gamache DL, McNamara G, Mannor MJ, Johnson RE (2015) Motivated to acquire? The impact of CEO regulatory focus on firm acquisitions. Acad Manag J 58:1261–1282. https://doi.org/10.5465/amj.2013.0377
García D (2013) Sentiment during recessions. J Finance 68:1267–1300. https://doi.org/10.1111/jofi.12027
Giorgi S, Weber K (2015) Marks of distinction: framing and audience appreciation in the context of investment advice. Adm Sci Q 60:333–367. https://doi.org/10.1177/0001839215571125
González M, Guzmán A, Téllez D, Trujillo M-A (2016) What do you say and how do you say it: information disclosure in Latin American firms. Working Paper. https://doi.org/10.2139/ssrn.2929833
Griffin PA (2003) Got information? Investor response to form 10-K and form 10-Q EDGAR filings. Rev Acc Stud 8:433–460. https://doi.org/10.1023/A:1027351630866
Grimmer J, Stewart BM (2013) Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Polit Anal 21:267–297. https://doi.org/10.1093/pan/mps028
Haselmayer M, Jenny M (2016) Sentiment analysis of political communication. Combining a dictionary approach with crowdcoding. Qual Quant 51:2623–2646. https://doi.org/10.1007/s11135-016-0412-4
Hawkins JA (2015) A comparative typology of English and German: Unifying the contrasts, 1st edn., Croom Helm, London, 1986. Routledge library editions: English Language, vol 10. Routledge, London
Henry E (2008) Are investors influenced by how earnings press releases are written? J Bus Commun 45:363–407. https://doi.org/10.1177/0021943608319388
Henry E, Leone AJ (2016) Measuring qualitative information in capital markets research: comparison of alternative methodologies to measure disclosure tone. Account Rev 91:153–178. https://doi.org/10.2308/accr-51161
Heston SL, Sinha NR (2016) News versus sentiment: predicting stock returns from news stories. FEDS 2016:1–35. https://doi.org/10.17016/feds.2016.048
Hillert A, Jacobs H, Müller S (2014) Media makes momentum. Rev Financ Stud 27:3467–3501. https://doi.org/10.1093/rfs/hhu061
Hillert A, Niessen-Ruenzi A, Ruenzi S (2016) Mutual fund shareholder letters: flows, performance, and managerial behavior. Working Paper. https://doi.org/10.2139/ssrn.2524610
Huang X, Teoh SH, Zhang Y (2014a) Tone management. Account Rev 89:1083–1113. https://doi.org/10.2308/accr-50684
Huang AH, Zang AY, Zheng R (2014b) Evidence on the information content of text in analyst reports. Account Rev 89:2151–2180. https://doi.org/10.2308/accr-50833
Huang AH, Lehavy R, Zang AY, Zheng R (2017) Analyst information discovery and interpretation roles: a topic modeling approach. Manage Sci 64:2833–2855. https://doi.org/10.1287/mnsc.2017.2751
Iliev R, Sagi E, Dehghani M (2015) Automated text analysis in psychology: methods, applications, and future developments. Lang Cogn 7:265–290. https://doi.org/10.1017/langcog.2014.30
Jacobi C, Kleinen-von Königslöw K, Ruigrok N (2016) Political news in online and print newspapers. Dig J 4:723–742. https://doi.org/10.1080/21670811.2015.1087810
Jandl J-O, Feuerriegel S, Neumann D (2014) Long- and short-term impact of news messages on house prices: a comparative study of Spain and the United States. Thirty fifth international conference on information systems (Auckland), pp 1–18
Jegadeesh N, Wu D (2013) Word power: a new approach for content analysis. J Financ Econ 110:712–729. https://doi.org/10.1016/j.jfineco.2013.08.018
Kaji N, Kitsuregawa M (2007) Building lexicon for sentiment analysis from massive collection of HTML documents. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning, pp 1075–1083
Kaplan S, Vakili K (2015) The double-edged sword of recombination in breakthrough innovation. Strateg Manag J 36:1435–1457. https://doi.org/10.1002/smj.2294
Kearney C, Liu S (2014) Textual sentiment in finance: a survey of methods and models. Int Rev Financ Anal 33:171–185. https://doi.org/10.1016/j.irfa.2014.02.006
Kirchhoff KR, Piwinger M (2009) Praxishandbuch investor relations. Gabler, Wiesbaden. https://doi.org/10.1007/978-3-8349-8810-2
König E, Gast V (2012) Understanding English-German contrasts, vol 29, 3rd edn. Grundlagen der Anglistik und Amerikanistik. Schmidt, Berlin
Larcker DF, Zakolyukina AA (2012) Detecting deceptive discussions in conference calls. J Account Res 50:495–540. https://doi.org/10.1111/j.1475-679X.2012.00450.x
Lee H, Kang P (2017) Identifying core topics in technology and innovation management studies: a topic model approach. J Technol Transf. https://doi.org/10.1007/s10961-017-9561-4
Lee S, Song J, Kim Y (2010) An empirical comparison of four text mining methods. In: 43rd Hawaii international conference on system sciences (HICSS), 2010; Honolulu, Hawaii, 5-8 Jan., pp 1–10. https://doi.org/10.1109/hicss.2010.48
Li F (2010) The information content of forward-looking statements in corporate filings—a naïve bayesian machine learning approach. J Account Res 48:1049–1102. https://doi.org/10.1111/j.1475-679X.2010.00382.x
Loughran T, McDonald B (2011) When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. J Finance 66:35–65. https://doi.org/10.1111/j.1540-6261.2010.01625.x
Loughran T, McDonald B (2015) The use of word lists in textual analysis. J Behav Finance 16:1–11. https://doi.org/10.1080/15427560.2015.1000335
Loughran T, McDonald B (2016) Textual analysis in accounting and finance: a survey. J Account Res 54:1187–1230. https://doi.org/10.1111/1475-679X.12123
Manning CD, Schütze H (1999) Foundations of statistical natural language processing. MIT Press, Cambridge
Mengelkamp A, Hobert S, Schumann M (2015) Corporate credit risk analysis utilizing textual user generated content—a Twitter based feasibility study. Working Paper
Mengelkamp A, Schumann M, Wolf S (2016) Data driven creation of sentiment dictionaries for corporate credit risk analysis. In: Proceedings of the 22nd Americas conference on information systems (AMCIS), pp 1–8
Merz L (2012) Langenscheidt Routledge Fachwörterbuch kompakt Wirtschaft Englisch: Englisch-deutsch; deutsch-englisch = Langenscheidt Routledge dictionary of business concise edition English, 4th edn. Langenscheidt Fachwörterbücher, Langenscheidt, Berlin
Molina-González MD, Martínez-Cámara E, Martín-Valdivia M-T, Perea-Ortega JM (2013) Semantic orientation for polarity classification in Spanish reviews. Expert Syst Appl 40:7250–7257. https://doi.org/10.1016/j.eswa.2013.06.076
Pennebaker JW, Boyd RL, Jordan K, Blackburn K (2015) The development and psychometric properties of LIWC2015. https://doi.org/10.15781/t29g6z
Porter MF (1980) An algorithm for suffix stripping. Program 14:130–137. https://doi.org/10.1108/eb046814
Price SM, Doran JS, Peterson DR, Bliss BA (2012) Earnings conference calls and stock returns: the incremental informativeness of textual tone. J Bank Finance 36:992–1011. https://doi.org/10.1016/j.jbankfin.2011.10.013
Ramírez-Esparza N, Pennebaker JW, García FA, Suriá Martínez R (2007) La psicología del uso de las palabras: un programa de computadora que analiza textos en español. Revista Mexicana de Psicología 24:85–99
Remus R, Quasthoff U, Heyer G (2010) SentiWS—a publicly available German-language resource for sentiment analysis. LREC. 2010
Renault T (2017) Intraday online investor sentiment and return patterns in the U.S. stock market. J Bank Finance 84:25–40. https://doi.org/10.1016/j.jbankfin.2017.07.002
Rushdi-Saleh M, Martín-Valdivia MT, Ureña-López LA, Perea-Ortega JM (2011) OCA: opinion corpus for Arabic. J Am Soc Inform Sci Technol 62:2045–2054. https://doi.org/10.1002/asi.21598
Shrout PE, Fleiss JL (1979) Intraclass correlations: uses in assessing rater reliability. Psychol Bull 86:420–428. https://doi.org/10.1037/0033-2909.86.2.420
Sinha NR (2016) Underreaction to news in the US stock market. Q J Finance 6:1–46. https://doi.org/10.1142/S2010139216500051
Stone PJ, Dunphy DC, Smith MS (1966) The general inquirer: a computer approach to content analysis. MIT Press, Cambridge
Tan S, Zhang J (2008) An empirical study of sentiment analysis for Chinese documents. Expert Syst Appl 34:2622–2629. https://doi.org/10.1016/j.eswa.2007.05.028
Tetlock PC (2007) Giving content to investor sentiment. The role of media in the stock market. J Finance 62:1139–1168. https://doi.org/10.1111/j.1540-6261.2007.01232.x
Tetlock PC, Saar-Tsechansky M, Macskassy S (2008) More than words. Quantifying language to measure firms’ fundamentals. J Finance 63:1437–1467. https://doi.org/10.1111/j.1540-6261.2008.01362.x
Tirunillai S, Tellis GJ (2014) Mining marketing meaning from online chatter: strategic brand analysis of big data using latent Dirichlet allocation. J Mark Res 51:463–479. https://doi.org/10.1509/jmr.12.0106
Twedt B, Rees L (2012) Reading between the lines: an empirical examination of qualitative attributes of financial analysts’ reports. J Account Public Policy 31:1–21. https://doi.org/10.1016/j.jaccpubpol.2011.10.010
Wan X (2008) Using bilingual knowledge and ensemble techniques for unsupervised Chinese sentiment analysis. In: Proceedings of the 2008 conference on empirical methods in natural language processing (Honolulu, Hawaii, 2008), pp 553–561
Wang X, Bendle NT, Mai F, Cotte J (2015) The journal of consumer research at 40: a historical analysis. J Consum Res 42:5–18. https://doi.org/10.1093/jcr/ucv009
Wolf M, Horn AB, Mehl MR, Haug S, Pennebaker JW, Kordy H (2008) Computergestützte quantitative Textanalyse. Diagnostica 54:85–98. https://doi.org/10.1026/0012-1924.54.2.85
Zehe A, Becker M, Hettinger L, Hotho A, Reger I (2016) Prediction of happy endings in German novels based on sentiment information. In: Proceedings of the workshop on interactions between data mining and natural language processing 2016, pp 9–16
Zijlstra H, van Meerveld T, van Middendorp H (2004) De Nederlandse versie van de ‘Linguistic Inquiry and Word Count’ (LIWC). Gedrag Gezondheid 32:271–281
Cite this article
Bannier, C., Pauls, T. & Walter, A. Content analysis of business communication: introducing a German dictionary. J Bus Econ 89, 79–123 (2019). https://doi.org/10.1007/s11573-018-0914-8
DOI: https://doi.org/10.1007/s11573-018-0914-8