1 Introduction

Arguing for a renewed interpretation of local resources, the present paper proposes a novel set of social network analysis, firm-level, and communication-based algorithms for mining the web to identify emerging entrepreneurial projects. In a constructivist approach to social science, knowledge is seen as socially constructed (Berger and Luckmann 1969), and the cognitive processes linked to entrepreneurial activities feature prominently in theories of opportunity recognition (Kirzner 1979; Beattie 1999). From this perspective, entrepreneurship can be considered a socially constructed phenomenon, which is reflected by the emergence of opportunities. These opportunities arise as individuals make sense of information and their actions and retrospectively ‘discover’ and ‘recognize’ business ideas (Gartner et al. 2003). Thus, entrepreneurship takes place in an ‘enacted’ environment (Weick 1995).

Compared to several other qualitative research methodologies, discourse analysis is more strongly based on the social constructivist paradigm (Phillips and Hardy 2002). It requires some contextualization (Cicourel 1981; Fairclough 1995): texts are the material manifestation of discourse, but a discourse exists beyond the individual texts that compose it (Chalaby 1996; Phillips and Hardy 2002). Moreover, a discourse cannot be identified on the basis of a single text; rather, discourse emerges from the interactions among different social groups, their ‘texts’, and the context in which these interactions are embedded. In the case of entrepreneurship research, the context is both proximate and distal, indicating the systemic (economic) and substantive (political and cultural) embeddedness of entrepreneurship (Johannisson et al. 2002), which is reflected in the overall institutional setting, norms, and values, and in the entrepreneur’s political and social environment. Linking this to notions of context as discussed in discourse analysis, the proximate context and the distal context reflect the entrepreneur’s micro- and macro-environments, respectively (Achtenhagen and Welter 2007).

Based on selected social theories and semantic social network analysis (as a specific type of discourse analysis), we draw on the highly interconnected world of social networking platforms (Instagram [Footnote 1]) to conduct an empirical exploration of a localized entrepreneurial project. We select a localized group of nodes (entrepreneurs) and visualize their Instagram community. Network centrality measures contribute to explaining the role of specific nodes/concepts/business ideas within the discourse.

The paper is structured as follows: Sect. 2 discusses the notion of entrepreneurship, Sect. 3 defines semantic social network analysis (SNA), Sect. 4 provides a synthesis of the empirical survey, Sect. 5 presents the main evidence, and Sect. 6 concludes the paper.

2 The Localized Entrepreneur and Social Media

Contemporary regional policy is increasingly interested in encouraging latent localized innovation potential in the (intentional and/or effective) entrepreneurial projects carried out by local actors (Foray 2015). We are interested in the conceptual categories that describe the basic notion of ‘entrepreneurship’ in this literature.

A pragmatic application of the notion of entrepreneurship inspired by development economics considers its development as a self-discovery process [Footnote 2] (Hausmann and Rodrik 2003). The entrepreneurial discovery is conceived as economic experimentation with new ideas, which emanate largely from scientific and technological inventions. This chimes with the cognitive theory of the firm and its specific focus on entrepreneurship as associated with different types of learning (Nooteboom 2009). Thus, entrepreneurship can be seen as a form (individual and/or collective) of dynamic capability. It consists of the ability to find and develop external partners, which are at a distance [Footnote 3], and the intellectual and behavioral capability to collaborate across this distance. It takes account of both competence and governance issues (Nooteboom 2009).

The new opportunities arising from participation in open-source communities and social networking platforms are contributing to a more complex notion of the entrepreneur. We are interested in the effects of textual meta-data matrices within communication on social networks, in particular Instagram, and in how these linguistic signs (verbal language fragments on a strictly visual social platform) can carry meaning for research, especially from the perspective of the relational dimension of the focal network.

The approach to language introduced by Austin (1962), with his definition of the performative utterance, suggests that language should no longer be considered a descriptive tool related to a state of affairs but should be understood as a “performative” act that creates the real. Evoking categories of thought through language creates meaning, in this case through the hashtags, comments, and descriptive textual meta-data attached to the visual products posted by the users of the social platform. For each user, this means building an individual identity, an individual biography. When shared, these individual biographies, mediated by social media, evolve into a value: a more complex system that transcends the individual dimension of the single user and is shared by multiple users [Footnote 4].

The novelty here is that, in the utterance, the individual is performing an action of which the very act of uttering the sentence is an essential component. We propose to introduce the performative utterance into the empirical assessment of a corporate network. This allows us to analyze what is imprinted in the statements of identity discourse generated through the hashtags on the Instagram profiles of community actors, and what it means when placed in the relational intentionality typical of network dynamics. Our interest is in identifying the self-representation produced by the meta-data and in investigating which meta-data are most commonly shared by the actors in the network. The analysis is conducted in two phases to examine the universes of identity, values, and interests that characterize the network and its actors. This research extends the notion of entrepreneurship as the ability to find and develop outside partners at sufficient cognitive distance. Additional relational competencies are needed, particularly the ability to compete in global knowledge networks.

3 Semantic Social Network Analysis

Grounded in the field of communication science, semantic network analysis (Popping 2000) can be considered an alternative to content analysis (CA) (Krippendorff 2004). Since CA is used to analyze the content of media messages, it tends to determine the value of one or more variables based on the message content. In other words, it infers relevant aspects of what a message (newspaper article, forum posting, personal e-mail, etc.) means in its context, and the communication research question determines both the relevance and the correct context.

Rather than directly coding the messages to address the research question, semantic network analysis first represents the content of the messages as a network of objects. This network representation is queried to address the research question.

Despite the wide use of the technique, extracting the network of relations from the text can be more difficult than categorizing text fragments, and there are no standards for defining patterns on these networks (van Atteveldt 2008).

New social media such as Facebook, Twitter, Instagram, and so on, are considered direct and indirect big relational data sets. Within these virtual places, huge amounts of content (photo and/or video posts and blogs) are shared socially at diverse levels with different motivations such as socializing, co-designing, etc. This kind of social sharing is considered semantic due to the nature of the shared objects. The strength of this kind of on-line semantic sharing lies in the network structure and in its power of viral transmission of the messages/content [Footnote 5].

The technique is increasingly trans-disciplinary: it has implications at three levels, each corresponding to a scientific field that is directly involved.

First, ‘computational linguistics’ has benefited from drastic increases in computer storage and processing power in recent decades, leading to the development of multiple linguistic tools and techniques. Second, there is a need to alleviate the problems of combining, sharing, and querying these semantic networks, which requires a focus on ‘knowledge representation’. This refers to the formal representation of the background knowledge used to aggregate the textual objects with the abstract concepts in a research question. Third, a distinction can be drawn between manual and automatic extraction of complex and abstract concepts from these data sets [Footnote 6]. Frequent applications of automatic extraction are found in marketing trend analysis and political science. At this level, the basic research question concerns measuring a concept’s relative importance in the relevant information sphere (web, blog, on-line forum). If the concept (e.g. a hashtag) is a node in a network of links (e.g. hashtag sharing), then analysis of the network structure can reveal the relative importance of that concept. Thus, semantic SNA is an extension of the SNA method (Wasserman and Faust 1994). Concepts with high betweenness centrality (BC), the most widely used SNA indicator in semantic analysis, become gatekeepers between different domains [Footnote 7].
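The gatekeeper role of high-betweenness concepts can be illustrated with a minimal sketch. It assumes an unweighted, undirected concept network and uses Brandes' algorithm, which is one standard way to compute betweenness and is not necessarily the procedure used in this study. The hashtag names and links are hypothetical toy data, and the function name `betweenness` is introduced here only for illustration.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness (Brandes' algorithm) for an unweighted,
    undirected graph given as {node: set_of_neighbours}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical toy network: two concept clusters joined by a single link.
edges = [
    ("architecture", "design"), ("design", "interior"),
    ("architecture", "interior"), ("design", "making"),
    ("making", "arduino"), ("making", "3dprinting"),
    ("arduino", "3dprinting"),
]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

scores = betweenness(adj)
for tag, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{tag:12s} {score:.1f}")
```

In this toy network only the two concepts on the link between the clusters obtain a non-zero score, which is the sense in which such concepts act as gatekeepers between domains.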

4 Exploring an Entrepreneurial Project (Ep) Through Semantic SNA: A Synthesis of the Empirical Survey

The empirical analysis proceeds in two steps. The first is an interpretative firm-level case study to identify a localized Ep [Footnote 8].

The evidence includes the multi-relational external networks, corresponding to a specific learning investment.

Starting from these networks (A ≡ 6 nodes and 8 links; B ≡ 5 nodes and 8 links), we can parse the corresponding Instagram communities (research step 2).

The research dataset consists of the hashtags [Footnote 9] posted in the previous two years by all members (6 + 5) of the networks related to the case study.

The research on the Instagram database involved several stages. Content search was enabled by the tool Iconosquare (http://iconosquare.com/), which, giving information only on the user’s Instagram account, provides more objective research content, since it is free of the local and temporal constraints that limit searches performed by users approaching a company directly.

In a subsequent step, data collection consisted of gathering information on the companies and compiling it into database records using the “trans-coding” language (Manovich 2001) Python, with a script that extracts data from Instagram through the API protocol [Footnote 10].

Before moving on to the phase of data mining, we performed some cognitive ergonomics operations aimed at avoiding redundant or insignificant data as follows:

  1. We deleted from the dataset the “auto-tags”, i.e. all those hashtags with which the issuer of the media content tags itself. This initial screening is necessary to avoid imbalanced data with respect to individuals who issue more content, and especially because the self-tag cannot be construed as relational potential, given its self-referentiality;

  2. We deleted from the dataset all the “omnibus” hashtags, i.e. all those metadata which act as a description or interpretation of the photographic content that accompanies them, performing a “channel function”, i.e. referring to Instagram and its practices and operations. This category includes the meta-data “#instagram” “#instantmood” “#instantgood” “#igers” “#photosofthedayh” “#picoftheday” “#vso” “#vsocam” “#tagsforlikes” etc. This second filter was necessary to avoid distorting the results of the sampling with metadata that are not related to the specific subject analyzed but are present in all or most Instagram content: hashtags shared around the world and used as a tool to obtain more feedback and a higher rate of engagement.

    The dataset was split into two parts, coinciding with the hashtags of the actor-network active first in the exploration phase (network A, with 6 detected users) and then in the exploitation phase (network B, with 5 detected users), within the learning cycle of the case study. To reduce redundancy and increase substantive accuracy, we aggregated the tags by both lexical proximity (“Naples” and “Napoli”, “Milan” and “Milano”, “graphic” and “grafica”) and conceptual proximity (combinations of words that are synonymous or belong to the same genus, e.g. “Arduino” and “Arduinolab”). We obtained a dataset of 734 records - 274 for the exploration phase, and 470 for the exploitation phase.

    Data visualization was achieved through a dual network visualization in which the shared hashtags (marked with a circle) serve as bonds between network nodes (the actors in the network, identified by a square) (see Fig. 1).

    Fig. 1. Comparison between Phase 1 (above) and Phase 2 (below) of the importance on the Instagram platform of the “shared hashtags” for each phase. Squares are Instagram users, circles are hashtags, and node size denotes betweenness.
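The two screening steps and the tag aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in the study: the posts are hypothetical toy data, the self-tag is assumed to be a hashtag equal to the issuer's own handle, and the names `OMNIBUS`, `CANONICAL`, and `clean` are introduced only for this example. The omnibus tags and the aggregation pairs are taken from the examples given in the text.

```python
from collections import defaultdict

# Hypothetical toy posts as (user, hashtags); not the study's data.
posts = [
    ("user_a", ["#design", "#Napoli", "#igers", "#user_a"]),
    ("user_b", ["#design", "#naples", "#instagram"]),
    ("user_c", ["#architecture", "#grafica", "#arduinolab"]),
    ("user_a", ["#graphic", "#arduino"]),
]

# A few of the "omnibus" hashtags named in the text.
OMNIBUS = {"instagram", "igers", "picoftheday", "tagsforlikes"}

# Lexical and conceptual aggregation pairs named in the text.
CANONICAL = {
    "napoli": "naples",
    "milano": "milan",
    "grafica": "graphic",
    "arduinolab": "arduino",
}

def clean(user, tags):
    """Apply both filters, then map each tag to its canonical form."""
    kept = set()
    for tag in tags:
        t = tag.lstrip("#").lower()
        if t == user.lower():   # step 1: drop self-tags (assumed form)
            continue
        if t in OMNIBUS:        # step 2: drop omnibus hashtags
            continue
        kept.add(CANONICAL.get(t, t))
    return kept

# User -> set of cleaned hashtags (one side of the two-mode network).
user_tags = defaultdict(set)
for user, tags in posts:
    user_tags[user] |= clean(user, tags)

# Hashtag -> set of users; a hashtag is a bond when at least two users share it.
tag_users = defaultdict(set)
for user, tags in user_tags.items():
    for t in tags:
        tag_users[t].add(user)

shared = {t: sorted(u) for t, u in tag_users.items() if len(u) > 1}
print(shared)
```

The resulting `shared` mapping corresponds to the circles in the dual visualization: each shared hashtag links the users (squares) who posted it.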

Finally, in order to implement the data analysis, the search for appropriate centrality measures (i.e. betweenness centrality of vertices in complete bipartite graphs) could be completed (cf. Unnithan et al. 2014).
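As a worked illustration of the measure named above, the betweenness of a vertex in a complete bipartite graph follows directly from the standard definition. The derivation below assumes unnormalized betweenness counted over unordered vertex pairs; the normalization adopted by Unnithan et al. (2014) may differ. The symbols m, n, U, and W are introduced only for this sketch.

```latex
% Complete bipartite graph K_{m,n} with parts U (|U| = m) and W (|W| = n).
% For u in U, only pairs {w, w'} within W can route through u: each such
% pair has exactly m shortest paths of length 2, one through each vertex
% of U. Pairs within U route through W, and cross pairs are adjacent.
C_B(u) = \sum_{\{w,w'\} \subseteq W} \frac{\sigma_{ww'}(u)}{\sigma_{ww'}}
       = \binom{n}{2} \cdot \frac{1}{m}
       = \frac{n(n-1)}{2m}
       \quad (u \in U),
\qquad
C_B(w) = \frac{m(m-1)}{2n} \quad (w \in W).
```

A user-hashtag network observed in practice is generally not complete, so this closed form serves only as a reference case; in the general case the scores have to be computed on the observed links.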

5 Main Evidences

The results of the first research step (case study) consist of the description of the Fondazione Plart Ep. This Ep relates to the foundation’s dynamic ability to attract the largest number of experts in the field of polymeric materials, to its strategies of external consulting agreements, and to the indirect involvement of artists and experts in exhibition activities. The involvement of this expertise (and thus of structured inter-organizational links) at different degrees of cognitive distance allows a larger training supply (e.g. in-depth teaching and laboratory activities replicable at home and in school for younger people).

The results of the second research step emerge from the semantic network analysis of a selected group of Instagram users formed by Fondazione Plart’s external network nodes. A synthetic view of the main evidence is given in Fig. 1, where the network visualizations are scaled according to the betweenness centrality values [Footnote 11]. This provides an understanding of which occurrences are the most shared by the users of the network in each phase.

When interpreting the data, it became clear that, as the learning cycle evolves, the network concerned evolves equally sharply, at least as regards its narration on social media. In the first phase, the network aggregates around semantic fields related to the architecture and design disciplines: all members of the network share the #architecture and #design hashtags. In the second phase, the network also aggregates on a geographical basis and extends its epistemic domain to the art world and market, tourism, and new creative forms such as the universe of ‘making’.

6 Conclusion

This research has demonstrated that selecting a group of nodes that are connected by a learning logic based on cognitive distance allows social media (Instagram) to be used as a source of information for research in entrepreneurship.

This study contributes to our understanding of complex social networks by studying the modular structures of networks. Detecting the network modules, or communities, is becoming a critical issue and there is much discussion on the quality of the partition process. Our research testifies to the importance of aligning the research question and its theoretical background (finding a localized entrepreneurship project as a self-discovery process), with the research method (explorative), and thus, fixing the algorithm for mining the web (selecting Instagram users through a firm-level case study). To model the formation of a community (in our conceptual background) we used a learning-based entrepreneurial process, which treats each node as a player in a heuristic of invention (a dynamic cycle alternating the phases of exploration and exploitation).