Introduction

Innovation in corporate organizations is a driver of competitiveness and productivity growth, which in turn have a positive impact on both firms’ value and reputation. In recent decades, we have witnessed fundamental changes in innovators’ workplaces, which have had a significant influence on both business collaboration practices and the management of (bulky) information [83, 99, 104]. On the one hand, the figure of the innovator is changing: although innovation has for years been perceived in a predominantly masculine context, strong evidence of gender differences in the decision-making processes that shape firms’ innovation strategies has been reported in the literature [30]. On the other hand, the increasingly intensive use of web-based technologies is changing the way firms communicate the activities related to their business mission and objectives. The interdependence of these activities has opened debates about who actually drives the innovation process, and about the effect that the people driving innovation have on how it is articulated and communicated to potential stakeholders.

In this context, it is important to understand whether the diversity of the people behind innovation is an asset for companies, and whether this asset has an impact on the ways creativity and innovation are communicated. This task is challenging since, as we shall see in what follows, the human aspect of firms’ contribution to innovation is hard to assess, probably due to the invisibility of people in innovation.

In this paper, we study whether the gender composition of corporate Boards of Directors is related to how information about innovation is articulated on the company’s website, and to how it is communicated to potential customers and other stakeholders. Corporate boards are becoming increasingly engaged with their company’s innovation journey and can provide important oversight and support along the way [12, 88, 103]. Our research design presupposes a relation between corporate website content and the board’s innovation strategy. This presupposition does not assume fine-grained control by the board over the content of company websites. We do, however, assume that the communication practices and the content of company websites are aligned with the key messages and decisions of the board with respect to the corporate innovation strategy.

This paper is thus an attempt to address the ongoing debate about the relationship between leadership and innovation, by examining:

  • whether the presence of women as part of the leadership of complex organizations may have an influence on the communication of innovation-related topics;

  • the extent to which human attributes (in our case, the gender attribute) may be used to derive useful information about company innovation communication practices.

Following the ideas of [107], we want to quantify the ability of businesses to articulate and communicate the innovative aspects of their products, services and processes. Any innovation assessment procedure should be devised with respect to the type of innovation that one wants to take into account. In our study, we focus on product, process and service innovation. This focus allows us to relate our results to previous studies that have investigated the relationship between the Board’s gender diversity and product innovation [84]. In particular, gender diversity may lead to a broader range of ideas, which could trigger innovation, since a larger and more varied set of ideas may increase the likelihood that the company introduces new products or new services [78, 89]. Although gender aspects have already been studied with respect to product innovation, results are still far from conclusive: for example, [46] bring evidence of significant associations between Board diversity and product innovation, but also state that in some cases gender diversity revealed a negative impact on product innovation. In our contribution, we go beyond traditional product innovation metrics by focusing on the degree to which companies articulate the innovative aspects of their new products, processes and services.

After addressing the gender aspects of innovation, we aim to use these insights to help model the extent to which businesses articulate the innovativeness of their products and services, based on the straightforward assumption that a company’s online communication practices have a potential impact on customers’ and other stakeholders’ perceptions of its innovativeness, sustainability, and commitment [21, 26]. Starting from the consideration by [66], who stress the importance of considering how innovativeness is perceived as a whole, we implement a metric that measures the extent to which a company articulates the innovativeness of its products, processes and services. This metric (referred to as articulation of innovation) measures the frequency of use of keywords associated with innovation, and it is related to the company’s communication practices, based on its claims about the innovative aspects of its products, processes or services. In order to determine the values of this metric, we have used a web search tool that measures the frequency of a predefined regular expression on the websites of firms from specific business sectors. Note that this procedure was devised by [105], in a contribution aimed at expressing the articulation of innovation by using different types of co-creation activities as predictors [36, 37]. In our work, we extend these approaches by adding the gender metric to the predictor set.

In a nutshell, in our paper, we aim:

  • To examine the relation between the articulation of innovation and the presence of women in companies’ Boards of Directors. To this end, we will perform a correlation analysis between the gender metric and the articulation of innovation, and we will interpret the results to test the hypothesis that the specific expertise and knowledge brought by women could be associated with a broader range of online claims about the innovativeness of the company’s products and services and, thus, with a better articulation of the firm’s innovativeness [69].

  • To predict the articulation of innovation by adding the gender component to the predictor set.

Since some authors state that female Board representation is positively associated with performance only for firms in which innovation and creativity play a particularly important role [27], the purpose of our exploration is to understand whether the relationship between gender and articulation of innovation differs between innovation-driven and more traditional businesses: we first investigate firms for which innovation and creativity play a particularly important role, to determine whether there is a correlation between gender and innovation articulation; we then compare our findings with those for firms that are representative of a whole economy. This will show whether or not there is evidence that female Board representation is positively associated with innovation performance, and that this association is tightly intertwined with the most innovative and creative firms.

Our paper is organized as follows: “Motivation” illustrates the motivations that led to the present work; “Co-creation, gender aspects and innovation” outlines the ongoing trends of the innovation and gender related literature; “Our data” describes the data we have used in our experiments; “Correlation analysis” performs a first correlation analysis of the variables taken into account. Then, our Neural Network approach is outlined in “Neural network approach”, and is compared to a linear regression approach in “Comparison with linear regression”, before concluding and outlining future directions of research in “Conclusions and future works”.

Motivation

The motivation for our study is twofold. First, the relationship between the gender composition of the Board of Directors and the degree of articulation of the innovative aspects of companies’ new products and services has not been studied in detail before: no previous study has suggested a specific tool and model that could be used to examine the practical aspects of companies’ online communications, and in this sense our study addresses a gap in the existing innovation management literature. Second, there is no preliminary assumption about the nature of the above relationship, and this lack of prior knowledge justifies the adoption of a generic approach like neural networks in parallel with traditional linear regression methods, which presuppose a simple linear relationship. In this sense, a key feature of our study is the generic nature of the exploration of the relationship between gender and the articulation of innovation: we will show that the neural network approach appears to be better at detecting the potential role of gender in the articulation of innovation on companies’ websites, when used jointly with other indicators. We want to stress that the only contribution in this regard has been made by [35], who concluded that “it is not possible to detect a universal relationship between innovation perception and gender diversity on the Board of Directors, since traditional business show a low magnitude of such correlation and, conversely, an higher positive relationship is found on businesses that are referred to as innovative”: in both cases, [35] found an increasing relationship between the two aggregates over time (data have been collected over the years 2013 and 2019), and in the current contribution we will check whether this trend is confirmed over new observations.

The novelty of the study is the focus on the relationship between the presence of women in the Boards of Directors and the articulation of innovation on companies’ websites: to the best of our knowledge, this relationship has not been studied in depth so far, and the results of our study are expected to provide preliminary insights that could inform and support executive management decisions related to the online communication practices of their companies, as well as to explore ways of enhancing them by ensuring a stronger engagement with the female representatives of their Boards of Directors. More importantly, the study provides a basis for future studies that could address the above relation in more explicit ways. Furthermore, our analysis can be used by policy makers interested in the relation between innovation management, the communication of companies’ innovativeness and gender balance. Accordingly, we can assess the potential impact of gender balance on firms’ performance and innovation management: our data are important for understanding whether gender policies (promoted by the country in which the firm is based) are successfully implemented, as well as for comparing the gender policies promoted in different countries, in order to set up an experimental comparison over time and across countries. In the second part of our work, we perform a computational prediction task in order to assess whether gender aspects may be useful to predict the online articulation of innovation by businesses from different sectors.

Our work falls within the strand of recent research highlighting the impact of gender on corporate innovation [47, 115, 117]. However, our approach differs in several respects, mainly in the way we measure the innovativeness of a business. Instead of using traditional measures, such as patent counts and patent citations (which capture only a partial aspect of corporate innovation), we go further by using the value co-creation dimension, understood as the result of all those practices that challenge the traditional ways of innovation management [38, 105].

Gender quota regulations for company boards are nowadays widely debated and, although their implementation has helped to boost the promotion of gender diversity policies in today’s labor market, they remain a very controversial subject. While the political arguments for gender quotas are intrinsically motivated by principles of fairness and equality, this paper examines to what extent gender diversity at the board level affects innovation, and thereby takes a more in-depth look that yields relevant policy guidelines and practical implications.

Co-creation, gender aspects and innovation

In this section, we provide a literature review of the main concepts upon which our study is based: co-creation and gender, and their implications in the innovation debate. As stated in “Introduction”, some contributions assess the relationship between co-creation and articulation of innovation, and we want to understand whether the gender component can be used to improve this assessment task.

Value co-creation is becoming more and more important in marketing and innovation. It can be defined as a marketing paradigm able to satisfy the needs of heterogeneous groups of customers [109], through the involvement of end users and other stakeholders in the creation of the final product or service [90]: it fosters a transformation of the role of the customer, who becomes an active participant in the creation of value [95]. In this paradigm, the business and the customer build a dialogue that should be interpreted as a process of co-learning to achieve a goal shared by both parties [13]. For this reason, the involvement of customers in the design of the product or service has to be seen as an interactive process, often undertaken unconsciously by the customer [64, 90]. Customers’ preferences are fundamental elements that shape the value co-creation process [108]: in order to satisfy these preferences, the business has to provide its customers with information, knowledge, skills and resources to be used during the process; in addition, the business must be able to influence the value co-creation process in order to enable customers to make the most efficient use of these resources [87].

Due to the advancement in Information and Communication Technology (ICT), more and more efficient technological tools allow individuals to receive news and information in real time and at a global level. As a consequence, people become more and more aware of their needs to be satisfied by a product or a service [38]. Furthermore, it should be noted that the interaction between the business and the customer during the value co-creation process can be guaranteed by a channel of communication between them, and this channel often consists of an online platform [101]. The development of new technologies has improved the efficiency of these online platforms, making the collaboration between businesses and customers easier and faster.

In this contribution, we adopt the definition of co-creation introduced by [36,37,38,39], in which co-creation is identified as a single concept. Note that in some contributions [44, 97] the concept of co-creation has been partitioned into value co-creation and co-production, where the former refers to the customer’s involvement in the phase of use and consumption of the product or service, and the latter refers to the involvement in the phase of design and production.

The adoption of value co-creation practices leads to benefits for both the customer and the supplier. Customers can get the product or service that actually meets their preferences [116], and they feel actively involved in the production process, which stimulates trust and loyalty towards the supplier [90]. From the supplier’s perspective, value co-creation clearly highlights customer preferences and provides the opportunity to create products and services able to meet their interests [109]. In addition, this learning may lead to the creation of new products (and services), giving the business a competitive advantage and a greater degree of innovation [116]. In this regard, several authors [65, 82, 94] have stated that the involvement of customers in value co-creation activities has a positive influence on the results of innovation; in particular, this activity is able to reduce innovation costs and time-to-market, and to increase the quality of the new product or service and the business’ development skills. For this reason, the development of new platforms that allow collaboration in the creation of value is considered increasingly important, and it is placed among the fundamental pillars of a business strategic plan [82, 96].

Given the increasing importance of co-creation activities [109] and that a possible positive relationship between the concepts of value co-creation and innovation has been found [36,37,38,39], one of the main purposes of this study is to investigate this relationship and contribute to its prediction and assessment.

Over the last decades, the innovation debate has paid increasing attention to the gender aspect, especially in relation to top management [52]: leading academics and policy makers have started to investigate the diverse implications of the presence of women in (and, more generally, the gender diversity of) corporate Boards of Directors [72]. According to [32], the presence of women on the Board of Directors tends to diversify perspectives, experiences, working styles, knowledge, and expertise with respect to their male counterparts [51, 56], motivating increasing attention towards the role of women, especially in innovative (and creative) businesses [27, 76].

While the sex of a person is a feature defined by biology, gender refers to a social construction: it is a cultural aspect [70] and it is assimilated through interaction with other people. Therefore, individuals learn how to behave according to their gender, and they also learn which attitudes to avoid in order not to be inappropriate [114]. As a consequence, there is a high probability that people expect women to behave according to their “femininity”, and men according to their “masculinity” [112]. There are thus stereotypes of the role attributed to both genders [43], and these can be classified as 1) descriptive gender stereotypes, which refer to what a man or a woman is like, and 2) prescriptive gender stereotypes, which refer to the socially required behavior of a woman and a man. The elements just described can shape the behavior of individuals and influence their decisions: men and women are partly influenced by what society considers normal according to “masculinity” or “femininity”. In this regard, a relevant example is given by the segregation of the labor market, i.e., the lack of males or females in a certain sector or profession [3].

The behavior of individuals is also affected by the context in which they find themselves, because some contexts seem to require specific attitudes and roles. The reaction of people who find themselves in this circumstance is usually to adapt to the context and behave as it requires, assuming traits that recall masculinity or femininity [24]. A relevant example of such a context is the entrepreneurial world: from a psychological and competence point of view, it requires leadership skills [80, 118], which are often identified with typically masculine traits such as aggressiveness, risk-taking and autonomy [4]. By contrast, femininity tends to be associated with a warm, calm and communal attitude and therefore, if a woman intends to undertake an entrepreneurial activity, she is expected to assume the aforementioned masculine traits [2]. Furthermore, according to the Global Report 2020/2021 [22] compiled by the Global Entrepreneurship Monitor (GEM), significantly more men than women are involved in entrepreneurial activities. In particular, the percentage of male entrepreneurs exceeds that of female entrepreneurs in almost the entire world in the Total Early-stage Entrepreneurial Activity (TEA), which accounts for all new entrepreneurial activities undertaken during the year; the only exceptions are six countries (Kazakhstan, Indonesia, Oman, Saudi Arabia, Togo and Angola).

In the context of entrepreneurship, there are significant differences between the two genders in innovation-related issues [85], due to the fact that the innovation concept is characterized by a risk-taking component (women are more risk-averse) and by the implementation of new technologies. This last aspect may be explained by the different fields of study chosen by men and women: it has been shown [15, 23, 57] that the presence of women in STEM curricula (Science, Technology, Engineering, and Mathematics) is significantly lower than that of men. As a result, more men acquire training and familiarity with technology and are therefore more likely to adopt innovative management practices that require the application of new technologies.

We emphasize that the definition of innovation is gender-neutral, but it has been argued that the way innovation is operationalized and measured is strongly gendered with masculine connotations [81, 92]: a strong association is found between innovation (and technology) and masculinity [111], and women are seen as less innovative than men, leading to a perception of women’s underperformance [31, 75] in innovation practices [20, 74, 81].

More recently, theories asserting the masculine character of innovation have been partly discarded: there is a growing focus on the role of women in innovation, particularly in companies that are strongly innovation-driven [76]. In particular, a positive influence of the presence of women on Boards on innovation results has been demonstrated [84]. The presence of women on Boards leads to greater heterogeneity in the group and, consequently, to a greater diversity of perspectives, experiences, work styles, knowledge and skills [32]. With regard to skills and knowledge, it is found that women are more adept at recognizing customer behaviors and expectations; this leads to the identification of innovative products, services and processes that more closely reflect customer needs [58, 107]. Furthermore, because women on Boards usually have a better understanding of customer behaviors and expectations, their presence is well suited to widening the range of ideas and perspectives for innovating products and identifying market opportunities [51, 107]. In a nutshell, as conjectured by [86], the specific expertise and knowledge brought by women may contribute to broadening the range of new products and services, and one may argue that in some cases a correlation exists between the presence of women on Boards and innovation [107]. Nevertheless, studies focused on identifying the contribution of the presence of women (or gender diversity) to firm innovation are still limited, probably due to the invisibility of people in innovation [106], as opposed to the limelight nature of entrepreneurship [7].

In our contribution, also in view of the particular focus by policy makers (i.e., European institutions) on assessing and closing the gender gap in the economic and financial sector [77], we investigate the relationship between the gender aspect and companies’ articulation of innovation, which is closely related to customers’ perception: the latter is a key concern of innovation, since it is precisely through the articulation of innovation that companies influence their customers’ perception of innovation. In this regard, several studies in the literature propose a qualitative assessment approach, e.g. by performing surveys [68], which are also used to assess the perception of innovation barriers amongst firms [16, 53]. A substantial amount of literature exists on the customer perception of innovation [67]; the effects of innovation perception have been explored in different sectors, e.g. banking [41], education [60] and hospitality [59]. A unified framework for consumer perceived innovativeness, embedding both qualitative and quantitative aspects, has been proposed by [71], and a further abstraction is proposed by [66], who define perceived firm innovativeness as a consumer-centric view of innovation, based on the customer’s assessment of the business’ capability to endure, whose indicator is the capability of creating and commercializing novel, creative, and impactful ideas and solutions. Note that the procedure we have used to assess the articulation of innovation on company websites is gender-neutral, to avoid masculine connotations in the way it is measured.

In what follows, we outline the research ideas developed to investigate the relationship between co-creation and innovation, which will be used as the main building blocks of our approach. In particular, we present the origin and development of the metrics used to measure the two aspects (i.e., innovation and co-creation) and the approaches and tools used to assess their relationship. In order to assess these two aspects, one approach suggests computing the frequency of keywords related to the concepts of co-creation and innovation within the websites of a large sample of businesses. The approach based on using the frequency and co-occurrence of words to examine the relation between specific organizational activities was introduced by [45], who analyzed the sequence of competitive actions of a sample of firms to define their business strategies. The same method was then applied to innovation-related topics by [50] to analyze the business models of innovative businesses, by defining a set of keywords related to innovation and by counting their occurrences (normalized according to the size of the websites): thanks to this approach, the authors were able to classify businesses based on the degree and the type of innovation communicated on their websites.

The same approach was applied to analyze value co-creation by [6], who introduced a set of keywords (detailed in Table 1) related to value co-creation in order to classify the different co-creation activities practiced by a large sample of businesses.

Table 1 Co-creation keywords in [5]. Keywords are represented as regular expressions in which concepts are identified by plain text and connected by logical connectors (\(\vee\) denotes OR and \(\wedge\) indicates AND)

A further development was provided by [105], who look for the occurrence of keywords related to co-creation and innovation on businesses’ websites, and implement a linear regression to investigate the relationship between the two aspects, in order to assess the influence of the practice of co-creation activities on the degree of innovation communicated by businesses on their websites. This idea was further investigated by [38], who introduced a Neural Network approach to investigate the relationship between value co-creation and innovation among a large sample of businesses, exploiting the ability of neural networks to capture different kinds of relationships amongst variables. In this latter paper, the authors tested the hypothesis that businesses with a higher degree of involvement in co-creation activities have a greater number of possibilities, occasions and contexts in which to apply innovation in their products, services and processes [44, 96], and a positive association between these concepts was found. Later contributions led to comparable results on different sets of data [36, 37, 73], in addition to implementing Self-Organizing Maps to classify businesses according to their degree of involvement in co-creation and innovation activities.

Last, [39] examined the relationship between the degree of involvement in co-creation activities by businesses, the degree of articulation of their service value attributes and their innovativeness, by using different methods: Principal Component Analysis was used to identify the components of co-creation activities; neural networks and correlation analysis were used to analyze the aforementioned relationship; K-means cluster analysis (Footnote 1) and the Self-Organizing Maps (SOMs) approach made it possible to classify the businesses according to their degree of involvement in different co-creation activities, the articulation of their service value attributes and their innovativeness. The results of this contribution show the presence of a statistically significant relationship between the degree of involvement in co-creation activities by businesses, the degree of articulation of their service value attributes and their innovativeness.

All the above studies and findings suggest that there may exist a relationship between the articulation of innovation and co-creation activities. In this contribution, we attempt to deepen this line of research by trying to understand whether adding the gender component may shed more light on whether or not this relationship holds.

Our data

In the past decades, there has been a significant growth in the amount of unstructured data: the processing of texts is becoming more and more complex and time-consuming, so the adoption of text mining techniques for the analysis of textual data has proved very useful in simplifying and improving the activity of researchers [11] and other stakeholders. Text mining refers to techniques capable of analyzing an unstructured text, processing its information, and creating structured content [55]. Text mining has been used in the field of innovation research [40, 83], and its use over the Internet (Web scraping) can be useful to detect specific business information: according to [26], businesses’ online communication of innovation-related content influences customers’ and other stakeholders’ perceptions of innovation. Starting from this assertion, we want to quantify the amount of online content about innovation, which from now on we will call articulation of innovation. Articulation of innovation will be used in what follows as a metric to evaluate the degree of innovation of a business. Instead of using a pre-defined dictionary based on text mining procedures [11], we use the frequency of the companies’ online comments about their own new products, processes and services to assess their articulation of innovation: we do not aim to apply an automated content analysis, but rather to assess the degree of the articulation, i.e. to what extent companies’ news announcements on their websites concern the innovative aspects of their products, services and processes. The meaning of this measure is twofold: first, it can be seen as an expression of the firm’s perception of its own innovativeness; second, it represents the communication of the innovative image the company wants to offer to customers and other stakeholders.

This metric was introduced by [105], and it is computed by detecting any online statement containing the combination of the words new and product, or the words new and service, or new and process, etc. (Footnote 2)
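The word combinations just described can be illustrated with a minimal sketch. It is not the expression used by [105], which is not reproduced here: the term list, the treatment of a "statement" as a sentence, the plural handling, and the function names `is_innovation_statement` and `count_innovation_statements` are all assumptions made for illustration.

```python
import re

# Hypothetical term list: the text names "product", "service" and "process"
# and then says "etc.", so the full list is not known.
INNOVATION_TERMS = ("product", "service", "process")


def is_innovation_statement(sentence: str) -> bool:
    """Return True if the sentence contains 'new' AND at least one term."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    has_new = "new" in words
    # Accept simple plural forms (products, services, processes).
    has_term = any(
        term in words or term + "s" in words or term + "es" in words
        for term in INNOVATION_TERMS
    )
    return has_new and has_term


def count_innovation_statements(text: str) -> int:
    """Count sentences that qualify; a statement is assumed to be a sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return sum(is_innovation_statement(s) for s in sentences)


sample = "We launched a new product. Our services are old. A new process was adopted."
print(count_innovation_statements(sample))
```

Matching on whole words means that a form such as "renewed" does not count as an occurrence of "new"; whether the original expression behaves in the same way is not stated in the text.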

Other research studies have focused on the relationship between gender and innovation by applying more traditional and tangible metrics (number of new products, new services, new processes, new patents, etc.), such as the ones described in the Oslo Manual (Footnote 3). The present study does not use these metrics: our innovation metric provides a quantitative evaluation of how often firms articulate the innovative aspects of their market offers on their websites. It has the advantage of emphasizing the ability of a firm to differentiate itself by articulating the innovative aspects of its products and services [36], and it accounts for a company’s claims of innovativeness about its own products and services.

Value co-creation refers to the degree of customers’ involvement in the creation of a business’ new products and services. According to the related literature outlined in “Co-creation, gender aspects and innovation”, this degree of customer co-participation has been computed by searching for co-creation-related keywords (detailed in Table 2) on businesses’ websites.

Table 2 The co-creation keywords used in our approach

In order to assess the articulation of innovation and the co-creation components, we have implemented a Web Scraper for Regular Expressions, which is specifically tailored to our data needs: it allows the user to calculate the frequency of a predefined set of regular expressions (provided as input by the user) related to the innovation and co-creation activities of a business. For each website, the scraper identifies all available webpages belonging to the web domain; for each page, the source code is translated into text, and the conditions imposed by the user are translated into regular expressions: the occurrences of these regular expressions are computed for each page, aggregated for the whole website, and rescaled with respect to the number of pages.
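The counting and rescaling steps described above can be sketched as follows. This is only an illustration under stated assumptions: the patterns below (the word new followed by product, service or process within the same sentence) are hypothetical and not the expressions used by the actual scraper, the function names are invented for this example, and page discovery and the conversion of page source into text are assumed to have been done already.

```python
import re

# Hypothetical patterns (assumption): "new" followed by the target word
# within the same sentence. The real scraper's expressions may differ.
INNOVATION_PATTERNS = [
    re.compile(r"\bnew\b[^.!?]*\bproducts?\b", re.IGNORECASE),
    re.compile(r"\bnew\b[^.!?]*\bservices?\b", re.IGNORECASE),
    re.compile(r"\bnew\b[^.!?]*\bprocess(?:es)?\b", re.IGNORECASE),
]


def count_matches(page_text, patterns):
    """Total number of non-overlapping matches of all patterns in one page."""
    return sum(len(p.findall(page_text)) for p in patterns)


def website_frequency(pages, patterns):
    """Aggregate the per-page counts and rescale by the number of pages."""
    if not pages:
        return 0.0
    total = sum(count_matches(text, patterns) for text in pages)
    return total / len(pages)


# Two already-extracted page texts of a fictional website.
pages = [
    "We launched a new product today. Our new service is live.",
    "Contact us for support.",
]
freq = website_frequency(pages, INNOVATION_PATTERNS)
```

Dividing by the number of pages keeps large websites from scoring higher merely because they publish more pages.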

In order to carry out our analysis, we have collected data about three different business attributes: the aforementioned articulation of innovation, the co-creation components, and the gender component. As for the gender component, [1, 48] introduced the idea of using gender (i.e., sex) as a variable, often to analyse differences and similarities between men and women. Starting from this, we have decided to compute the percentage of female directors on the companies’ Boards, which reflects either the extent of female directors’ appointments or Board homogeneity/heterogeneity (with values, respectively, equal to 0 or 1), as suggested by [72]. By using this metric (referred to as gender metric), we want to test the assertion in [33, 100], according to which many firms portrayed as leaders in innovation activity are managed by teams that include both men and women. Moreover, such an approach appears to be in agreement with the concepts stated by [25], which remarks that research should focus less on gender differences and similarities, and pay more attention to understanding how gender is embedded in processes, for instance by counting the women and men involved in innovation processes.

We have computed the articulation of innovation, the co-creation components, and the gender component with respect to the year 2021 over five different sets of data:

  • 287 Open Source (OS) businesses associated with the Eclipse OS Foundation which embrace those businesses that are more apt to adopt open innovation and co-creation practices. OS companies have been commonly considered as a unique example of open innovation and creativity [42, 91, 110]. This set of data will be referred to as Eclipse;

  • the other four sets of data consist of businesses listed on the NASDAQ (98 businesses), FTSE100 (95 businesses), DAX30 (30 businesses), and CAC40 (40 businesses), which refer to stock markets of different countries (respectively, USA, UK, Germany, and France) and are considered to be representative of the country’s overall economic and financial condition (details can be found at https://docs.google.com/spreadsheets/d/1gaZepe_m_kqoQLtBM3dBpiBvpBppIAZj/edit?usp=sharing&ouid=100079799523990491093&rtpof=true&sd=true).

Table 3 shows the main statistics about the gender metric which is defined, as said, by the ratio between the number of female Board members and the total number of Board members along with the articulation of innovation over the examined sets of data for the year 2021. The following tables also contain the statistics and results for the overall set of data containing aggregated data from the five instances pooled together, that will be referred to as Overall.

Table 3 Main statistics over the year 2021 of: the ratio of women to total members of the Board; the articulation of innovation

Nevertheless, we want to emphasize that by applying the above-described gender metric we overcome the criticism that the innovation-related measurements used in the literature are unreliable because they are gender-biased [81]. Please notice that our analysis is not meant to define a model to explain innovation, which is why we are not building a model by identifying the most suitable control variables: the purpose of our experimental study is to verify whether:

  • the relation between gender and innovation is different between innovation-related and more traditional businesses;

  • the gender metric can better explain the innovation metric.

We bring to the reader’s notice that here we are not addressing the role that time plays in the innovation process or in making claims on company websites, for example by examining the time lag between a woman joining the Board and her fully realized ability to influence the business innovation claims [17]. This latter aspect would require a separate investigation by means of a dedicated script able to autonomously detect changes in the articulation of innovation and in the gender component, which goes beyond the scope of our current implementation and goals.

Looking at Fig. 2, we can draw some insights from the graphical representation of the relationship between innovation and the percentage of women on the Board of Directors. For all sets of data, we plotted the percentage of women on the Board of Directors on the x-axis, while the frequencies of the articulation of innovation are shown on the y-axis. At first glance, we do not observe a clear correlation between the variables, and the same conclusion also holds by looking at Fig. 1, where the articulation of innovation frequencies appear on the y-axis, and the number of men on the Board of Directors is displayed on the x-axis.

Fig. 1

Scatter plot of the articulation of innovation (referred to as I on the y-axis) vs number of men in the Board of Directors (referred to as G1 on the x-axis) in the five sets of data taken into account. The point size is proportional to the number of women in the Board of Directors

Fig. 2

Scatter plot of the articulation of innovation (referred to as I on the y-axis) vs percentage of women in the Board of Directors (referred to as G2_tot on the x-axis) in the five sets of data taken into account

Correlation analysis

In this section, we outline a correlation analysis between the gender component and the articulation of innovation with respect to the investigated sets of data. The purpose of this analysis is to understand whether the gender ratio on the Board of Directors and businesses’ innovation activity are related to each other. Moreover, we want to investigate whether there are differences between businesses that are known to be innovative and technology-driven (i.e., Eclipse) and more traditional ones. We start by remarking that the average ratio of women on the Board of Directors is highest in the Eclipse set of data: this seems to confirm what has been observed in the literature on gender and innovation, i.e., a higher presence of women involved in the decision-making process in innovation-related businesses than in traditional ones. In addition, we report that \(71\%\) of the businesses in the Eclipse instance have an absolute majority of women on the Board and that the second highest share of businesses with an absolute majority of women is shown by CAC (\(15\%\)), while the other sets of data show small values, down to \(0\%\) in DAX. We also remark that the maximum female presence on the Board is found in the Eclipse set of data (\(100\%\)). The other sets of data also contain businesses in which there is an absolute majority of women, except for the businesses listed on the German stock market, where the maximum female presence is \(38\%\). Last, we want to stress that the ratio of female to total Board members shows a minimum equal to 0 on all sets of data, and that the percentage of businesses without women on the Board is between \(1\%\) and \(2\%\) in all sets of data but DAX, which shows a higher value. These results seem to confirm that on a large scale entrepreneurship is still largely gendered [63], but, compared to the past, there is a greater presence of women both in businesses where innovation is more intense and in more traditional businesses [35].

In what follows we draw some comments on the articulation of innovation metric. By looking at Table 3, the highest average is found in Eclipse, and this seems to confirm what was expected, that is, these businesses are exactly those known to be more technology-driven and innovative. The same holds for the maximum. When analysing the minimum instead, it is possible to observe that all sets of data show the value zero: in all sets of data there are therefore businesses that seem to have no interest in communicating their innovation to potential customers and stakeholders through their website. In particular, we report that in all sets of data the percentage of businesses that have zero frequency related to the articulation of innovation is about \(50\%\), except for DAX, which shows \(20\%\). In Table 4 we report, respectively, a Pearson, a rank-based and a Mutual Information statistical dependence analysis aimed at determining the relationships between the two variables. In particular, Mutual Information (MI) dates back to the beginnings of information theory [102]. It was initially applied to calculate the capacity of communication systems, and thereafter it served as a fundamental tool of investigation for many other research areas, such as mathematics, computer science and economics [8, 9]. MI has been successfully adopted in feature-selection problems to assess both the relevancy of a subset of features in predicting the target variable and the redundancy with respect to other variables. Although the correlation coefficient is a long-standing measure of the strength of statistical dependence, MI has advantages over it: it is able to detect non-linear relationships between features and it is model neutral.

The results show very low Pearson correlation coefficients, close to zero (except for FTSE), with p-values significantly greater than 0.05, suggesting that there is no linear correlation between the two variables over all sets of data. Similar remarks can be drawn by looking at the rank-based correlation analysis.
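To make the difference between the three dependence measures concrete, the following sketch computes them on a toy example in which the relationship is purely non-linear. It is not the code behind Table 4: the text does not state which MI estimator was used, so the equal-width binning (with a hypothetical `bins` parameter) is an assumption, p-values are not computed, and all function names are chosen for this illustration.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson linear correlation coefficient (inputs must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Rank-based correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mutual_information(x, y, bins=4):
    """Plug-in MI estimate in nats after equal-width binning (assumption)."""
    def discretize(v):
        lo, hi = min(v), max(v)
        width = (hi - lo) / bins or 1.0
        return [min(int((a - lo) / width), bins - 1) for a in v]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


# Toy data: y depends on x quadratically, so the linear correlation
# vanishes while the mutual information does not.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]
```

On this toy data the Pearson coefficient is zero while the MI estimate is strictly positive, which is the kind of situation in which MI is informative and a correlation coefficient is not.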

As a last takeaway, we draw a comparison with the results about the pairwise mutual information between the articulation of innovation and the gender component over the years 2013 and 2019 reported in [35]. Results therein presented show an increase of the mutual information over the investigated years, and this trend appears to hold true for 2021 as well. In particular, Eclipse shows the highest Mutual Information value (0.53 in 2021, while it was 0.32 in 2013 and 0.34 in 2019), while the highest incremental change over years is observed for FTSE which goes from 0.05 in 2013 to 0.41 in 2021.

Table 4 Pearson, Rank Based and Mutual Information (MI) dependence coefficients between the gender component and the articulation of innovation

We conclude this analysis by remarking that the gender component and the articulation of innovation do not show any linear or rank-based correlation. On the other hand, the Mutual Information between the two variables is significant (\(\ge 0.5\)) over three sets of data out of five: since the Mutual Information quantifies the reduction of uncertainty about a given random variable (in our case, the articulation of innovation) when another variable (in our case, the ratio of women to total Board members) is known, this could be a good signal that the gender component may be used to predict the articulation of innovation. The next sections investigate whether this holds true.

Neural network approach

Artificial Neural Networks (ANN) are high-level algorithms inspired by the behavior of the brain, of which they represent a simplification [49]: they have been used to perform a wide range of computational tasks in many fields in which there is no assumption about the relations amongst input and output variables. They are composed of elementary units (i.e., neurons), which are connected through weighted edges (i.e., synapses) in a given topology.

Synapses are associated with a value (weight), which determines the magnitude of transmission. A neuron receives inputs from the neurons that are connected to it, which are aggregated as the sum of the inputs weighted by the synapses’ values; then an activation value is computed by applying a given activation function to this weighted sum. This value is sent, through synapses, to the connected neurons. We can distinguish three main categories of neurons:

  • Input neurons, whose activations represent the inputs of the task to be performed by the network;

  • Output neurons, whose activations represent the output of the network;

  • Hidden neurons, the remaining neurons, so called because they are not visible from the external environment.

As for the topologies, the most common are the layered one (in which neurons are subdivided in layers) and the completely connected one (in which all neurons are connected with each other). Each neuron in the layered topology is connected to all neurons in adjacent layers, whilst there are no connections between neurons in the same layer: in this topology the information flow of the network is unidirectional, and the resulting network is referred to as a feed-forward architecture.
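The computation performed by a layered feed-forward network can be illustrated with a minimal sketch. The sigmoid activation function and the bias terms are assumptions made for this example (the text does not specify them), and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic activation function (an assumption; any activation could be used)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Each neuron computes the weighted sum of its inputs, then its activation.

    weights[j] holds the synapse values entering neuron j of the layer.
    """
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Unidirectional flow: input neurons -> hidden neurons -> output neurons."""
    hidden = layer_forward(inputs, hidden_w, hidden_b)
    return layer_forward(hidden, output_w, output_b)
```

For instance, `forward([0.2, 0.7], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [[0.0, 0.0]], [0.0])` evaluates a network with two inputs, two hidden neurons and one output neuron whose synapses are all set to zero.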

To use the neural networks, the user needs first to set the values of synapses: this is done by the learning algorithm, which can be seen as the ability of the network to modify its behavior in order to obtain the right output given some input through the modification of synaptic connections (weights). This happens by modifying these weights until a stopping criterion is met (usually, when the difference between the desired and the actual output produced by the network is smaller than a given threshold).

Several algorithms have been proposed to train neural networks, e.g., genetic algorithms [79], simulated annealing [62], and backpropagation [113]. This last algorithm is still the most used, and it is based on the back-propagation of the error from the output units to the input ones: in a first phase, the activation of the input units is propagated forward through the activation functions; then, the weights of the synapses are modified through gradient descent.
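The two phases of backpropagation can be sketched for a network with one hidden layer and a single output neuron. This is an illustrative sketch rather than the implementation used in the experiments: the sigmoid activation, the bias terms, the squared-error loss, the per-sample updates, the weight initialisation range and the fixed number of epochs used as stopping criterion are all assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, n_hidden, learning_rate=0.1, epochs=2000, seed=0):
    """Train a one-hidden-layer network; samples is a list of (inputs, target)."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_h = [0.0] * n_hidden
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_o = 0.0

    def forward(x):
        # Phase 1: propagate the input activations through the network.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w_h, b_h)]
        out = sigmoid(sum(w * hj for w, hj in zip(w_o, h)) + b_o)
        return h, out

    for _ in range(epochs):
        for x, target in samples:
            h, out = forward(x)
            # Phase 2: propagate the error backwards (squared-error loss).
            delta_o = (out - target) * out * (1.0 - out)
            delta_h = [delta_o * w_o[j] * h[j] * (1.0 - h[j])
                       for j in range(n_hidden)]
            # Gradient descent step on every synapse and bias.
            for j in range(n_hidden):
                w_o[j] -= learning_rate * delta_o * h[j]
                b_h[j] -= learning_rate * delta_h[j]
                for i in range(n_in):
                    w_h[j][i] -= learning_rate * delta_h[j] * x[i]
            b_o -= learning_rate * delta_o

    return lambda x: forward(x)[1]
```

A call such as `predict = train([([0.0], 0.1), ([1.0], 0.9)], n_hidden=2)` returns a function mapping an input vector to the network output. A threshold on the error could replace the fixed number of epochs as stopping criterion.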

We have performed three different sets of experiments (models):

  1. A model to identify a relationship between the articulation of innovation and the gender component only, in which the input is given by the gender metric and the output by the articulation of innovation. In what follows this model is referred to as GE model;

  2. A model in which the input set is composed of all co-creation components and the output is the articulation of innovation. This model is referred to as CO model;

  3. A model in which the input set is composed of all the co-creation components plus the ratio of female members to the cardinality of the Board of Directors of the business taken into account: this model combines the co-creation components with the gender aspect, and is referred to as CO-GE model.

In what follows, we will outline the main features of our neural network approach: “Data pre-processing” will outline the data pre-processing operations; “Training and test set” will detail how we have partitioned data into training and test sets; “Network topology” will detail the network topology, while the learning parameters and the algorithm’s performances will be outlined in “Learning parameters” and “Algorithm performances”, respectively.

Data pre-processing

The analysis of the data at hand represents an important phase in the experimental settings for a neural network approach: this operation aims to explore data features, to detect possible anomalies, and to preserve as much information as possible, avoiding at the same time over-fitting shortcomings. In our approach, we have used the pre-processing operations defined by [10, 29], which we outline in what follows:

Removal and replacement

The issue of missing or wrong values is a key point in real-world applications, and all neural network approaches resort to procedures that take this aspect into account. We have used the approach introduced by [10], which suggests removing the indicators containing more than \(30\%\) of missing or wrong values. None of the sets of data introduced in “Our data” reaches this portion of missing or wrong data, hence we are using the whole set of variables in the experimental phase. In detail, no wrong values have been detected in our sets of data; as for missing data, for some businesses it was not possible to collect all data referring to the gender components, so we have some missing data referring to the number of men and women on the Board of Directors: in this last case, we have followed the guidelines indicated by [10, 29] and replaced the missing values with the variable’s average over all businesses.
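The removal and replacement rules can be sketched as follows, assuming (for illustration only) that each indicator is stored as a list of values in which `None` marks a missing observation; the function name, the `max_missing` parameter name and the data layout are hypothetical.

```python
def clean_indicators(table, max_missing=0.30):
    """Drop sparse indicators and impute the remaining gaps with the mean.

    table maps an indicator name to its list of values over all businesses,
    with None marking a missing value.
    """
    cleaned = {}
    for name, values in table.items():
        missing = sum(v is None for v in values)
        if missing / len(values) > max_missing:
            continue  # removal: more than 30% of the values are missing
        observed = [v for v in values if v is not None]
        mean = sum(observed) / len(observed)
        # replacement: fill each gap with the indicator's average
        cleaned[name] = [mean if v is None else v for v in values]
    return cleaned


# Fictional example: "women" has 25% missing values and is kept,
# "sparse" has 75% missing values and is removed.
table = {"women": [2, None, 4, 3], "sparse": [None, None, 1, None]}
cleaned = clean_indicators(table)
```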

Normalization

When applying neural networks, a widely used rule of thumb suggests performing data normalization in order to feed the neural network with data belonging to the same range. Many mathematical formulations have been suggested to this aim [61], and we are using the logarithmic transformation used by [36, 37, 39], which is defined as follows: let \(x_i\) be the value before normalisation of input x for business i, and \(\overline{x}_i\) be its normalised value. The relation between normalised and pre-normalised data can be defined as follows:

$$\begin{aligned} \overline{x}_{i} = \log _{u} \left( x_{i} + 1 \right) , \end{aligned}$$
(1)

where \(u = x_{\max } + 1\), in order to have \(\overline{x}_{i} \in [0,1]\). We want to remark that the original formulation proposed by [10] was the following:

$$\begin{aligned} \overline{x}_i = \log _{u} \left( |\min (0, x_{min}) | + x_i + 1 \right) , \end{aligned}$$
(2)

and this was due to the fact that the authors tackled a problem arising from the use of different sets of variables, in which there were observations whose values were negative. All values belonging to our sets of data represent weighted occurrences, and they cannot be negative by definition, hence we do not need to consider negative values, which would hinder the application of a logarithmic transformation. Tables 5, 6 and 7 show the main statistics of the data at hand after the pre-processing operations.
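As a small illustration of Eq. 1, the sketch below normalises a list of non-negative values with a logarithm of base \(u = x_{\max} + 1\). The function name is hypothetical, and the handling of an all-zero variable (returned unchanged, since the base would be 1) is an assumption that the text does not discuss.

```python
import math


def log_normalize(values):
    """Map non-negative values to [0, 1] with log base u = max(values) + 1."""
    x_max = max(values)
    if x_max == 0:
        # Assumption: an all-zero variable stays at zero (log base 1 is undefined).
        return [0.0 for _ in values]
    u = x_max + 1
    return [math.log(x + 1, u) for x in values]


# With x_max = 15 the base is 16, so 0 maps to 0, 3 maps to
# log_16(4) = 0.5, and 15 maps to 1.
normalized = log_normalize([0, 3, 15])
```

The transformation compresses large occurrence counts while keeping zero at zero, which suits heavily skewed frequency data.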

Table 5 Main statistics of the response and explanatory variables after pre-processing operations, for the sets of data Eclipse and Nasdaq. Variables \(C1\) to \(C24\) refer to co-creation components; variable I denotes the articulation of innovation; variable G denotes the ratio of women to total members of the Board
Table 6 Main statistics of the response and explanatory variables after pre-processing operations, for the sets of data FTSE and DAX. Variables \(C1\) to \(C24\) refer to co-creation components; variable I denotes the articulation of innovation; variable G denotes the ratio of women to total members of the Board
Table 7 Main statistics of the response and explanatory variables after pre-processing operations, for the sets of data CAC and Overall. Variables \(C1\) to \(C24\) refer to co-creation components; variable I denotes the articulation of innovation; variable G denotes the ratio of women to total members of the Board

Training and test set

As for the experimental framework, in all experiments we have sampled two disjoint sets of observations out of the total number of businesses belonging to a set of data: the training set (used to estimate the networks’ parameters) and the test set (used for assessing their performances). Businesses have been randomly allocated to these two sets so that the training set consists of 75% of the total businesses, and the test set of the remaining 25%. This sampling has been repeated 30 times, each time leading to a different definition of training and test set, which was needed to compute a robust error estimate over the different train-test partitions. The procedure has been performed over the five different sets of data at hand.
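The repeated random subsampling described above can be sketched as follows. The function name, the seed and the rounding of the training-set size are assumptions made for this illustration; the sketch works on business indices rather than on the actual records.

```python
import random


def repeated_splits(n_items, n_repeats=30, train_fraction=0.75, seed=0):
    """Return n_repeats random (train_indices, test_indices) partitions."""
    rng = random.Random(seed)
    n_train = round(train_fraction * n_items)
    splits = []
    for _ in range(n_repeats):
        indices = list(range(n_items))
        rng.shuffle(indices)
        # The two sets are disjoint and together cover all businesses.
        splits.append((indices[:n_train], indices[n_train:]))
    return splits


# Example with 40 businesses (the size of the CAC40 set of data).
splits = repeated_splits(40)
```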

Network topology

As for the network topology, we have used a feed-forward architecture trained by backpropagation: in this topology, the most important parameters to be set are the number of hidden neurons and the number of hidden layers. The literature on the topic has introduced many approaches to address these issues, ranging from recognised rules of thumb to adaptive procedures that change the topology of the neural network over time. In this direction, we have performed preliminary experiments by using the adaptive procedure proposed by [28], in which hidden neurons are added to the network until no improvements in the performance of the system are detected. This approach requires a computational time that is about 10 to 50 times higher than a basic feed-forward one, and a comparison of the results based on a Wilcoxon signed rank test (at a 0.05 significance level) did not allow us to reject the null hypothesis of no difference between the two approaches. Hence, we have resorted to well-established rules of thumb in order to set the number of hidden neurons, as reported by [34], and we have used one hidden layer, setting the number of hidden neurons equal to the number of parameters (variables). The reason for choosing one layer only is the small cardinality of some sets of data at hand, which could hinder the learning algorithm’s ability to learn the higher number of parameters associated with the synapses. We want to stress that two-layer neural networks are formally recognised as universal function approximators [54], but the comparison with the adaptive procedure performed in the preliminary experiments suggests that one hidden layer is enough for our purposes.

Learning parameters

As for the neural network learning parameter, the literature on the topic reports that the typical values of the learning rate for a neural network with standardized inputs (or inputs mapped to the [0, 1] interval, which is our case) have to be smaller than 1 and greater than \(10^{-6}\) [18]; furthermore, there is no way of determining the learning rate a priori [98]. For these reasons, we have resorted to a parameter tuning procedure to set the learning rate: F-Race [19], by whose execution we obtained a learning parameter equal to 0.12 for the CO model and 0.10 for the CO-GE model. Please notice that for both models the momentum value as determined by the F-Race procedure was close to 0, hence we have not introduced any momentum parameter into our analysis.

Algorithm performances

As for the neural network learning performances, the basic rule consists of applying the supervised learning technique on the training set, while, in order to avoid overfitting, the goodness of the neural network has to be assessed by computing an error metric over the validation set, which can also be used to set the termination criterion for the algorithm. Several error measures can be used, such as the mean absolute error (MAE), the mean absolute percentage error (MAPE), the root mean square error (RMSE), etc. In our experiments, we have used the mean square error (MSE), defined as:

$$\begin{aligned} {\frac{1}{n} \sum _{i=1}^{n} (e_i - a_i)^2}, \end{aligned}$$
(3)

where n is the number of observations, e is the expected output and a is the actual neural network output.
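Eq. 3 translates directly into a few lines of code. The function name below is chosen for this illustration only.

```python
def mean_squared_error(expected, actual):
    """Mean of the squared differences between expected and actual outputs."""
    if len(expected) != len(actual) or not expected:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((e - a) ** 2 for e, a in zip(expected, actual)) / len(expected)


# Two observations, each missed by 0.5: (0.25 + 0.25) / 2 = 0.25.
error = mean_squared_error([1.0, 0.0], [0.5, 0.5])
```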

Neural networks have been implemented in Python. Experiments have been run on a cluster with AMD Opteron 2216 dual core CPUs running at 2.4 GHz with 2x1 MB L2 cache and 4 GB of RAM under Cluster Rocks distribution built on top of CentOS 5.3 Linux. We have run the neural network on all the obtained train-test partitions. For each partition, the algorithm has been run 100 times, and the best run (with respect to the MSE) has been recorded. In order to test the neural network performances and the robustness of the approach, we report in Table 8 the statistics of the best runs obtained over the 30 partitions and with respect to the GE, CO, and CO-GE models.
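The evaluation protocol (best of 100 runs per partition, then summary statistics over the partitions) can be sketched as below. The callable `run_once`, which stands for one training-and-evaluation run returning an MSE, is hypothetical, and the particular statistics returned (mean, standard deviation, minimum, maximum) are an assumption about what a summary table would contain.

```python
import statistics


def best_run_statistics(partitions, run_once, n_runs=100):
    """Keep the best (lowest) MSE per partition, then summarise over partitions.

    run_once(partition, run_index) is a hypothetical callable returning the
    MSE of one training run; at least two partitions are required.
    """
    best = [min(run_once(p, r) for r in range(n_runs)) for p in partitions]
    return {
        "mean": statistics.mean(best),
        "std": statistics.stdev(best),
        "min": min(best),
        "max": max(best),
    }


# Deterministic stand-in for a training run, used only to show the call shape.
def fake_run(partition, run_index):
    return partition + 1.0 / (run_index + 1)


summary = best_run_statistics([0.0, 1.0, 2.0], fake_run)
```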

Table 8 Neural Network’s overall errors (MSE) on the GE, CO-GE, and CO models. We report the statistics on the best runs obtained on the 30 different partitions of overall error

By looking at the results, we can say that the neural network is not able to generalize the GE model, hence we conclude that a general relationship between the gender component alone and the articulation of innovation cannot be explained. On the contrary, good results were obtained from the application of the neural network to the CO and the CO-GE models: the average of the total error on the single test sets varies between 0.035 and 0.120 for the CO-GE model and between 0.041 and 0.171 for the CO model; furthermore, for the Overall set of data the average of the total error lies between the minimum and maximum obtained on the single sets of data, but with a lower standard deviation, due to the better performances obtained on bigger sets of data. As for the single sets of data, the lowest MSE (on both models) is found for the FTSE set of data and the highest error is found for the CAC set of data, which may be due to the low number of observations. The standard deviation of both models is comparable, meaning that the better performances on the CO-GE model do not come at the cost of a higher variability of the results. In all cases, we can say that the introduction of the gender component improves the ability of the neural approach to predict the articulation of innovation.

Comparison with linear regression

From the observation of Table 8, it is possible to detect the strength and the generalization ability of the neural network: results have been validated with respect to the test set, hence on different data than those used for training the network. We want to stress the importance of this conclusion by performing a linear regression on the same sets of data, in order to test the over-fitting of the resulting model. We have performed linear regressions for the three models, but we report only the results of the CO and CO-GE models, since the GE model does not lead to satisfactory performances. Please notice that in the literature there are contributions that introduce linear regression to predict the articulation of innovation, in which the predictor set is given by the same co-creation components used in our analysis (see “Co-creation, gender aspects and innovation”): we want to stress that these contributions also introduce a pre-processing phase in which Principal Component Analysis [37] is used to reduce the input space. In our contribution instead, we do not apply such data reduction techniques, since they may lead to biased or misleading conclusions. We used all the indicators shown in Table 2 to form the predictor set, resulting in a procedure that is highly reliable and that does not require the intervention of an external user.

Please notice that we are not interested in assessing the goodness of the predictors, so that we do not report their estimates along with the corresponding p-values, and we just report the following performance measures:

  • The coefficient of determination \(R^2\), along with its variants: the Adjusted \(R^2\), that is used to adjust this performance measure to take into account the number of predictors (i.e., it decreases when the cardinality of predictors does not improve the goodness of the fit by a sufficient amount), and the Predicted \(R^2\), that takes into account a sampling procedure to assess overfitting (i.e., lower values are an indicator of overfit);

  • The p-value of the F-test, that represents the probability to obtain an F-statistic value greater than the F-value of the model, under the null hypothesis that the regression is not significant [14];

  • The Akaike Information Criterion (AIC), which measures the goodness of a statistical model by assessing the relative amount of information loss (lower values identify better models with regard to the specific set of data).
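Some of the performance measures listed above can be computed from the observed and fitted values alone, as in the following sketch. The function names are hypothetical; the AIC is written in the least-squares form \(n \ln (RSS/n) + 2k\), which holds up to an additive constant under Gaussian errors and is an assumption about the variant in use. The Predicted \(R^2\) is omitted because it requires refitting the model on resampled data.

```python
import math


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalise R^2 for p predictors (intercept excluded) given n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def aic_least_squares(y, y_hat, k):
    """AIC up to an additive constant: n * ln(RSS / n) + 2 * k parameters."""
    n = len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return n * math.log(rss / n) + 2 * k


# Fictional observed and fitted values.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(y, y_hat)
r2_adj = adjusted_r_squared(r2, n=len(y), p=1)
```

With the same residuals, a model with more parameters receives a higher (worse) AIC, which is how the criterion discourages adding predictors that do not improve the fit.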

We have performed two linear-regression analyses based on two different sets of experiments: first, we have performed the analysis on the whole set of data, without partitioning into training and test sets; then, we have adopted the same subsampling procedure to identify thirty different partitions of training and test sets as defined in “Training and test set”, and then we have computed the aggregated performance indicators.

As for the experiments performed without partitioning the data into training and test set, we obtain the results listed in Table 9. We can notice that adding the gender component (to the predictor set composed of the co-creation indicators) does not always lead to an improvement in the goodness of the fit when the regression performance is assessed by the \(R^2\). On the contrary, it worsens the performances over three (NASDAQ, DAX, CAC) out of five sets of data. Anyhow, the regression is always significant, as confirmed by the p-value of the F-test (values smaller than 0.025 indicate that the regression is significant, and they are highlighted in bold in the table).

Once more, we could conclude that adding the gender component not only does not lead to a significant adjuvant effect, but also worsens the regression performance in the case of the DAX set of data. Anyhow, the goodness of the fit assessed by the \(R^2\) seems to indicate that linear regression is appropriate, with the exception of the sets of data DAX and CAC for the CO-GE model, and of the set of data Eclipse with respect to the CO model. We also notice that adding the gender component improves the linear regression performance over the Eclipse set of data.

Table 9 Estimates of the linear models CO and CO-GE: estimated parameters, p-values and significance statistics of the linear models. Columns highlighted in bold mean that the corresponding regression is significant according to the p-value of the F-test

Although this result may sound promising, this is a situation affected by over-fitting, as witnessed by the low values of the predicted \(R^2\) over the five sets of data. In order to demonstrate this, we have introduced the same sampling procedure defined in “Training and test set” and computed the MSE by using Eq. 3, as for the neural networks. Results are reported in Table 10. We can see that linear regression achieves better performances over the training set with respect to four sets of data out of five for both CO-GE and CO models, whilst when dealing with the test set, neural networks show the best performances over the CO-GE model for all sets of data, and on three out of five of them with respect to the CO model. In this case, we see that the addition of the gender component plays a significant role when comparing these two different approaches.

Table 10 Linear regression errors (MSE) on both CO and CO-GE models. We report the statistics on the best runs obtained on the 30 different partitions of training-validation set

In a nutshell, the introduction of the gender component into the predictor set leads to better neural network performances on the validation set in three sets of data out of five (Eclipse, NASDAQ, FTSE); on the remaining two (DAX, CAC), its introduction worsens the neural network performances. Overall, we can state that the introduction of the gender component may help to improve the performances of a neural network approach. As for the linear models, the introduction of the gender component may help to improve the performances, but it is highly affected by over-fitting on small-sized instances, which makes the approach unreliable. On bigger instances (i.e., the Overall set of data) instead, the linear regression analysis shows performances that are better than the neural network approach, suggesting that the two approaches have to be used jointly.

Conclusions and future works

In this contribution, we have proposed a framework to predict the articulation of innovation, and to understand whether the gender component of businesses (coming from different scenarios) can be used to this end. Since innovation activities are managed by Boards of Directors that include both men and women, we have tried to understand whether the gender component can be embedded in (and whether it can be used to explain) the innovation processes, such as, in our case, the articulation of innovation. Our results confirm previous findings [35]: they reveal that it is not possible to detect a universal relationship between innovation perception and gender diversity on the Board of Directors, and they demonstrate that neural networks may be used to approximate this relationship. The increasing relationship between articulation of innovation and gender component over time is also confirmed by more recent observations.

Our findings corroborate the idea that female Board representation is positively associated with more innovative and creative firms, but we have seen that this statement can also be applied to some more traditional businesses. We have introduced a novel support tool based on a neural network approach that is able to grasp the relationship between gender and communication of innovation in both innovation-related and more traditional companies. To this end, we have taken into account other features related to the innovation attitude of a business (such as value co-creation), since we have shown that it is not possible to grasp such a relationship when the gender component is used as a stand-alone predictor. This suggests that female Board representation is positively associated with innovation performance only when it is integrated into the business culture and vision, and our approach has shown that it is computationally pointless to advocate a direct relation between these two aggregates taken in isolation. At the current stage, we can assert that the gender component may be useful to predict the innovation articulation of a business under several scenarios.

Our findings seem to suggest that the relation between gender and innovation is strong in innovation-driven businesses, but this does not mean that it does not also hold for more traditional businesses. Our approach offers satisfactory performance in this regard on traditional sets of data, but it fails on two benchmarks. This could be due to some features of the benchmarks themselves that lead to over-fitting, and further research has to be devoted to investigating this aspect. For the same reason, we can state that the gender metric might be helpful in predicting the articulation of innovation, but this statement has to be read in the light of what was previously remarked, namely that the gender metric cannot be used alone.

It has been argued that women are either absent or made absent in innovation processes, even if they work in an R&D role and are responsible for innovation activities [93]. In this contribution, we have shown the opposite, by devising an approach based on the ideas from [25]: we focussed on how gender is embedded in business processes, by counting the women and men involved in innovation processes, and we have defined a metric able to quantify the level of openness of a company to innovation.

Our contribution sheds light on the ongoing discussion about the conditions under which gender diversity is an asset and a stimulus for creativity and innovation: we have investigated businesses with different gender compositions and different attitudes to creativity and innovation. In a way, this is a first step in the definition of gender-neutral concepts to be used when examining innovation and related issues, in order to develop methods to examine what people do, rather than how they talk about it. At the current stage, it seems that the presence of women is related to the business's innovation claims. We cannot identify a causal link, but this indicates a direction for future research on what makes women more influential in the Boards, and on the extent to which women's ideas about innovation are considered and implemented in companies (also comparing these findings with those relative to their male counterparts). Furthermore, we want to examine the extent to which women and men intervene in the innovation discourses: for this goal, we have to define different keywords and regular expressions to be associated with innovation and gender analysis.

Further research will also be devoted to replicating this approach in other geographical contexts: the first step will consist in analysing emerging markets, in which gender and innovation appear to be more intertwined. The tools we have introduced are designed to work in English, hence the first attempts will be concerned with English-speaking countries. A further analysis will be carried out after having defined the new keywords to be used in other languages.

Furthermore, we want to extend our approach by considering the role that time plays in the innovation process and in making claims on companies' websites. In this direction, it would be interesting to investigate the effects induced by time lags between the moment when a woman joins the Board of Directors and the time she can exert an influence on the firm's innovativeness. Such an analysis would need high-frequency input data to continuously monitor changes in the Boards of Directors and in the occurrences of regular expressions on the firms' websites. It will also be valuable for future studies to include more traditional ways of measuring innovation (number of new products, number of new product features, number of new services, number of patents, etc.) as covariates in models similar to the one presented here. This analysis was left out of the scope of the present study since it requires a completely different data collection approach.