Content analysis of business communication: introducing a German dictionary

  • Original Paper
  • Journal of Business Economics

Abstract

Computer-aided text analysis has attracted considerable attention in recent years. Applied to different types of business communication such as earnings announcements, analyst reports, or IPO prospectuses, it has been used to extract information relevant to financial market participants. Many studies employ dictionary-based approaches that rely on specific word lists. Since these lists have been compiled predominantly for the English language, the respective analyses have focused on English business texts. To extend content analysis to other languages, we create a German dictionary designed to measure the textual sentiment of business communication. Our dictionary is based on the English dictionary by Loughran and McDonald (J Finance 66:35–65. https://doi.org/10.1111/j.1540-6261.2010.01625.x, 2011), which is commonly used for examining finance- and accounting-specific texts. We discuss the set-up of our dictionary and test its quality extensively. We further compare our dictionary to German general-language dictionaries and to a machine-learning procedure, and provide evidence of its ability to capture market-relevant textual sentiment in German business communication.
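
To illustrate what a dictionary-based approach involves, the following is a minimal sketch in Python, not the procedure used in the paper. The word lists `NEGATIVE` and `POSITIVE` are hypothetical placeholders and are not taken from the BPW or LM dictionaries; the tokenizer and the choice of reporting sentiment as a share of all words are likewise assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical mini word lists for illustration only;
# they are NOT entries copied from the BPW dictionary.
NEGATIVE = {"verlust", "verluste", "rückgang", "risiko"}
POSITIVE = {"gewinn", "gewinne", "wachstum", "erfolg"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens (umlauts and ß included)."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def sentiment_shares(text):
    """Return the share of negative and positive dictionary words among all tokens."""
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        return {"negative": 0.0, "positive": 0.0, "total": 0}
    counts = Counter(tokens)
    negative_hits = sum(counts[word] for word in NEGATIVE)
    positive_hits = sum(counts[word] for word in POSITIVE)
    return {
        "negative": negative_hits / total,
        "positive": positive_hits / total,
        "total": total,
    }


if __name__ == "__main__":
    print(sentiment_shares("Der Gewinn stieg, aber das Risiko bleibt."))
```

Because the lists contain inflected forms rather than stems, each inflection has to be listed explicitly for it to be counted.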

Notes

  1. The BPW dictionary is publicly available at www.uni-giessen.de/BPW.

  2. Table 12 in the online appendix documents this trend by summarizing the most influential content analyses in the area since 2014. The results of these studies are described further in the next sections. The online appendix is available at www.uni-giessen.de/bpw or upon request.

  3. For more information on LSA and LDA approaches, see Loughran and McDonald (2016) or Eickhoff and Neuss (2017).

  4. For a comprehensive overview of relevant studies, see also Eickhoff and Neuss (2017).

  5. See http://www.wjh.harvard.edu/~inquirer/.

  6. See http://www.dictionsoftware.com/.

  7. See http://www.liwc.net.

  8. We use the most recent version of the LM dictionary as of March 2015. The original version of the LM dictionary used in Loughran and McDonald (2011) included 2337 negative, 353 positive, and 258 uncertainty words. Documentation of the updates to the LM dictionary can be found at https://www3.nd.edu/~mcdonald/Word_Lists.html.

  9. For a comprehensive overview of studies using the mentioned general and domain-specific dictionaries, see Kearney and Liu (2014) and Loughran and McDonald (2016).

  10. By LIWC we mean the German version by Wolf et al. (2008) of the originally English Linguistic Inquiry and Word Count.

  11. See Table 15 in the online appendix.

  12. To generate all translations, we use the widely used Langenscheidt German-English dictionary (Merz 2012) as well as the online dictionaries available at http://www.dict.cc and http://www.linguee.de.

  13. For more information, see http://www.dgap.de/.

  14. We also run our analysis for a sub-sample of companies which explicitly report the use of external professional translation, proofreading, or text-production services. Our results from this sub-sample do not materially differ from those for the whole sample. The results are provided in Panel C of Table 14 in the online appendix.

  15. For more information, see http://www3.nd.edu/~mcdonald/Word_Lists.html.

  16. We provide the stop word list at www.uni-giessen.de/bpw.

  17. Note that removing sparse terms provides only a minuscule advantage in computing time under the dictionary approach. It is more likely to help when the document-term matrix serves as input to a statistical algorithm such as Maximum Entropy, LSA, or LDA and would otherwise become too large to compute. We thank an anonymous referee for pointing this out.

  18. We also conduct our main analysis on the quarterly and annual reports without pruning or stop-word filtering. The corresponding results do not change materially and can be found in Panel B of Table 14 in the online appendix.

  19. All text-processing steps were conducted with the RapidMiner software. For more information, please see https://rapidminer.com/.

  20. See Table 13 in the online appendix for a complete list of all word combinations we control for. For the same reason, we counted the terms “IMPAIRMENT LOSS” and “IMPAIRMENT LOSSES” as one negative word. We also conduct our analysis on the quarterly and annual reports without making any exception from the word independence assumption. The corresponding results can be found in Panel A of Table 14 in the online appendix.

  21. Loughran and McDonald (2011) give an account of the most frequently employed negative words in their sample of U.S. annual reports (10-K). The results for our sample of English quarterly and annual reports are quite comparable: all but one of the ten most common negative English words in our texts appear among the 30 most common negative words in Loughran and McDonald (2011). Furthermore, the distribution of the most common words is similar in both samples. While the ten most common negative words in our sample account for 35.0% of the negative word count, the ten most common negative words in Loughran and McDonald (2011) account for 33.8%.

  22. Summary statistics are presented in Table 16 in the online appendix.

  23. Note that the LIWC uses stems which account for several inflections.

  24. Note that the original HARVARD dictionary does not include inflections but word stems. For our analyses in Table 7, we use a modified version of the HARVARD dictionary’s negative wordlist by Loughran and McDonald (2011) that uses inflections. For the positive wordlist of the HARVARD dictionary, we are not aware of a modified version using inflections and thus confine our comparison in Panel B to the negative wordlist.

  25. Note that some studies, for example Li (2010), combine negative and uncertain language as “negative” language. However, we believe that uncertain language is conceptually different from negative language and restrict our analyses in this section to positive and negative language.

  26. We also conduct 5-fold, 10-fold, and 25-fold cross-validation tests. The average percentages of correctly classified observations are 64.4%, 64.9%, and 64.5%, respectively.

  27. The results are provided in Table 17 in the online appendix.
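
Notes 16 to 20 mention stop-word filtering and counting certain word combinations, such as “IMPAIRMENT LOSS”, as a single negative word. The following is a minimal Python sketch of how such preprocessing could look; it is not the RapidMiner workflow used in the paper. The stop word list, the phrase table, and the negative word list are hypothetical placeholders, and merging phrases before removing stop words is an assumption made for the example.

```python
import re

# Illustrative placeholders only; the actual stop word list and the word
# combinations controlled for are documented by the authors, not reproduced here.
STOP_WORDS = {"the", "a", "of", "and"}
PHRASES = {
    ("impairment", "loss"): "impairment_loss",
    ("impairment", "losses"): "impairment_loss",
}
NEGATIVE = {"impairment_loss", "impairment", "loss", "losses", "decline"}


def preprocess(text):
    """Tokenize, merge known two-word phrases into one token, then drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in PHRASES:
            # Both words collapse into a single token, so they count once.
            merged.append(PHRASES[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return [token for token in merged if token not in STOP_WORDS]


def count_negative(text):
    """Count negative dictionary hits after preprocessing."""
    return sum(1 for token in preprocess(text) if token in NEGATIVE)


if __name__ == "__main__":
    print(count_negative("The impairment loss and a decline of sales"))
```

Without the phrase table, “impairment” and “loss” would each be counted separately, so the example sentence would yield three negative hits instead of two.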

Corresponding author

Correspondence to Christina Bannier.

Cite this article

Bannier, C., Pauls, T. & Walter, A. Content analysis of business communication: introducing a German dictionary. J Bus Econ 89, 79–123 (2019). https://doi.org/10.1007/s11573-018-0914-8

  • DOI: https://doi.org/10.1007/s11573-018-0914-8