Digital agendas in the insurance industry: the importance of comprehensive approaches


With growing awareness of the innovation potential of digital technology, insurance companies have increasingly adopted digital agendas in their business activities. This paper studies the relationship between the expression of a digital agenda in annual reports and the business performance of 41 publicly traded European insurance companies over the period from 2007 to 2017. Our findings show a positive relationship, which is particularly strong where companies take a comprehensive approach, addressing digital technology both in internal activities within their own organisation and in external activities with customers and business partners.

Fig. 1


  1. To calculate Tobin’s Q, we restrict the data set to publicly traded insurance companies and consider companies that disclosed their full annual reports in English for the respective years.

  2. Several approaches were tested and evaluated; in our case, the most suitable results were provided by preprocessing with Ghostscript, plain text extraction via Xpdf, and quantitative text analysis using the programming language R (alternatives considered include, amongst others, PDFBox, RapidMiner, and Tika).

  3. Note that we explicitly do not use the common word stem “digit” at this point since it could be misleading.

  4. This is done by Porter’s word stemming algorithm via SnowballC (see Bouchet-Valat 2014).

  5. The word stems were derived from a content analysis of a subset of the complete data set. To avoid individual bias, keywords were chosen by different researchers and then discussed comprehensively until a common agreement on the most suitable items was reached.

  6. By focusing on the European market, we refrain from dealing with market specifics.

  7. For \( d_{i,t}^{binary} \), the effect seems to be positive as well, but could not be statistically confirmed.

  8. We also calculated \( d_{i,t}^{c50,ei,binary} \), \( d_{i,t}^{c50,e,binary} \), \( d_{i,t}^{c50,i,binary} \), \( d_{i,t}^{c100,ei,binary} \), \( d_{i,t}^{c100,e,binary} \), and \( d_{i,t}^{c100,i,binary} \).

  9. In addition, we use, amongst others, k = “c20,e,binary”, “c50,e,binary”, “c20,i,binary”, “c50,i,binary”, “c50,ei,binary”, and “binary” in the robustness analysis.

  10. List of English stop words retrieved from


This paper has been granted the 2018 Shin Research Excellence Award—a partnership between The Geneva Association and the International Insurance Society—for its academic quality and relevance by the decision of a panel of judges comprising both business and academic specialists.



A.1: Treatment-effects model

The treatment-effects model is given by the following two regression equations that are simultaneously estimated via maximum likelihood. We assume \( d_{i,t}^{k} = d_{i,t}^{c20,ei,binary} \) in our base case (see footnote 9). The regression equation (“Q Equation”) is given by

$$ Q_{i,t} = x_{i,t} \beta + d_{i,t}^{k} \delta + \varepsilon_{i,t} $$

and the selection equation (“Digital Equation”) is defined as

$$ {}^{*}d_{i,t}^{k} = z_{i,t} \gamma + u_{i,t} , $$


$$ d_{i,t}^{k} = \begin{cases} 1 & \text{if } {}^{*}d_{i,t}^{k} > 0 \\ 0 & \text{otherwise} \end{cases} $$

with error terms \( \varepsilon_{i,t} \) and \( u_{i,t} \) that are assumed to be normally distributed with a mean vector of zero, variances of \( \sigma_{\varepsilon } \) and 1, and a covariance of \( \rho \) (see, e.g. Maddala 1983; Guo and Fraser 2010; Hoyt and Liebenberg 2011; Bohnert et al. 2018).
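To illustrate why the two equations are estimated jointly, the following minimal sketch simulates data from a model of this form. It is not the estimation procedure used in the paper. The scalar covariates `x` and `z`, all parameter values, and the function names are hypothetical, and `rho` is treated here as the correlation between the two error terms, which is an assumption of this sketch.

```python
import math
import random


def simulate_treatment_effects(n, beta, delta, gamma, sigma, rho, seed=0):
    """Draw n observations (q, x, z, d) from a two-equation treatment-effects model.

    Assumption of this sketch: x and z are single standard-normal covariates,
    sigma is the standard deviation of the outcome error, and rho is the
    correlation between the outcome error and the selection error.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # hypothetical covariate of the Q equation
        z = rng.gauss(0.0, 1.0)  # hypothetical covariate of the Digital equation
        u = rng.gauss(0.0, 1.0)  # selection error with variance 1
        # Outcome error, built so that it is correlated with u.
        eps = sigma * (rho * u + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
        d_latent = z * gamma + u          # latent index of the Digital equation
        d = 1 if d_latent > 0 else 0      # observed binary indicator
        q = x * beta + d * delta + eps    # Q equation
        rows.append((q, x, z, d))
    return rows


def naive_gap(rows):
    """Difference in mean q between observations with d = 1 and d = 0."""
    treated = [q for q, _, _, d in rows if d == 1]
    control = [q for q, _, _, d in rows if d == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


rows = simulate_treatment_effects(
    n=20000, beta=1.0, delta=0.5, gamma=1.0, sigma=1.0, rho=0.8, seed=1
)
print(naive_gap(rows))  # compare with the true delta of 0.5
```

When the error terms are correlated, the simple comparison of group means does not recover \( \delta \), because the indicator is itself related to the outcome error. With `rho=0.0` the same comparison should land close to `delta`.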

A.2: Pseudocode

Repeat for each PDF document, i.e. annual report

  • Rename and assign unique identifier (ID) to each PDF document (ID.pdf)

  • Prepare document for text extraction, i.e. remove access restrictions (via Ghostscript)

  • Extract plain text (via Xpdf) and create a text document with identical ID as the PDF document (ID.txt)
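The preparation steps above could be scripted roughly as follows. This is a sketch under stated assumptions: the source does not give the exact command lines, so the Ghostscript options shown (rewriting the file through the `pdfwrite` device) and the intermediate file name `ID_unlocked.pdf` are illustrative choices, and whether such a rewrite lifts access restrictions depends on the Ghostscript version and the document. `pdftotext` is the text extraction tool shipped with Xpdf.

```python
import subprocess
from pathlib import Path


def build_commands(doc_id, workdir="."):
    """Return the two command lines for one annual report (ID.pdf -> ID.txt)."""
    base = Path(workdir)
    src = base / f"{doc_id}.pdf"
    unlocked = base / f"{doc_id}_unlocked.pdf"  # hypothetical intermediate file
    txt = base / f"{doc_id}.txt"
    # Assumed Ghostscript call: rewrite the PDF through the pdfwrite device.
    gs_cmd = [
        "gs", "-q", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=pdfwrite", f"-sOutputFile={unlocked}", str(src),
    ]
    # Xpdf's pdftotext takes the input PDF followed by the output text file.
    xpdf_cmd = ["pdftotext", str(unlocked), str(txt)]
    return gs_cmd, xpdf_cmd


def extract_text(doc_id, workdir="."):
    """Run both steps; raises CalledProcessError if either tool fails."""
    for cmd in build_commands(doc_id, workdir):
        subprocess.run(cmd, check=True)
```

Separating `build_commands` from `extract_text` keeps the command construction inspectable without Ghostscript or Xpdf being installed.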

Repeat for each plain text document (for all files ID.txt)

  • Create corresponding new (empty) text document for concordances (ID_conc.txt)

  • Translate all characters to lower case characters

  • Wrap all characters that are not alphabetic characters or digits with one space character before and after

  • Replace (multiple) white space characters (including tabs) and newlines with one space character

  • Identify occurrences of words containing the strings “digita” or “digiti”

  • Repeat for each occurrence

    • Extract the keyword and a fixed number of words (strings delimited by space characters) before and after it, i.e. extract a concordance of a given length, e.g. 20 words before and 20 words after the keyword in the case of c20

    • Add concordance line to ID_conc.txt

Repeat for each concordance document (for all files ID_conc.txt)

  • Remove all characters besides alphabetic characters

  • Remove common words from common.list (see footnote 10)

  • Transform all words into word stems (via Porter’s word stemming algorithm)

  • Set variable \( d_{i,t}^{c20,e,binary} \) to 1 if at least one of the following word stems can be found: “channel”, “client”, “custom”, “distribut”, “market”, “online”, “product”, “sale”, “service”; set it to 0 otherwise

  • Set variable \( d_{i,t}^{c20,i,binary} \) to 1 if at least one of the following word stems can be found: “board”, “employe”, “group”, “manag”, “model”; set it to 0 otherwise

  • Set variable \( d_{i,t}^{c20,ei,binary} \) to 1 if \( d_{i,t}^{c20,e,binary} = 1 \) and \( d_{i,t}^{c20,i,binary} = 1 \); set it to 0 otherwise
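The text-processing part of the pseudocode can be sketched in a few lines. The analysis in the paper was carried out in R; the Python version below is only an illustration, and the function names, the in-memory handling of concordance lines (instead of ID_conc.txt files), and the default identity `stem` function are assumptions of this sketch. A real run would pass in a Porter stemmer and the stop word list, neither of which is reproduced here.

```python
import re

KEYWORD_PARTS = ("digita", "digiti")
EXTERNAL_STEMS = {"channel", "client", "custom", "distribut", "market",
                  "online", "product", "sale", "service"}
INTERNAL_STEMS = {"board", "employe", "group", "manag", "model"}


def normalise(text):
    """Lower-case, pad non-alphanumeric characters, collapse white space."""
    text = text.lower()
    # Wrap every character that is neither a letter nor a digit in spaces.
    text = re.sub(r"([^\w\s]|_)", r" \1 ", text)
    return re.sub(r"\s+", " ", text).strip()


def concordances(text, window=20):
    """Return one concordance line per keyword occurrence (c20 by default)."""
    words = normalise(text).split(" ")
    lines = []
    for pos, word in enumerate(words):
        if any(part in word for part in KEYWORD_PARTS):
            start = max(0, pos - window)
            lines.append(" ".join(words[start:pos + window + 1]))
    return lines


def classify(concordance_lines, stop_words=frozenset(), stem=lambda w: w):
    """Compute the external, internal, and combined binary indicators.

    stem defaults to the identity function as a placeholder for a Porter
    stemmer; stop_words stands in for the contents of common.list.
    """
    stems = set()
    for line in concordance_lines:
        # Keep alphabetic characters only (spaces are kept as separators).
        cleaned = "".join(ch for ch in line if ch.isalpha() or ch == " ")
        for word in cleaned.split():
            if word not in stop_words:
                stems.add(stem(word))
    e = int(bool(stems & EXTERNAL_STEMS))
    i = int(bool(stems & INTERNAL_STEMS))
    return {"e": e, "i": i, "ei": int(e == 1 and i == 1)}


sample = ("Our digital strategy targets every online channel, "
          "and the board reviews the digitisation roadmap.")
print(classify(concordances(sample, window=20)))
```

The `window` argument corresponds to the concordance length, so the c50 and c100 variants mentioned in the footnotes would use `window=50` and `window=100`.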

Bohnert, A., Fritzsche, A. & Gregor, S. Digital agendas in the insurance industry: the importance of comprehensive approaches. Geneva Pap Risk Insur Issues Pract 44, 1–19 (2019).

