Modern text mining and content analysis methods can support political science research by enabling large-scale modelling of opinions and their diffusion across a plethora of sources. This is particularly interesting for investigating norm acceptance, a field where large-scale studies have so far hardly been conducted due to the high costs of manual evaluation and interpretation and the difficulty of integrating multiple data sources. Traditionally, one model has been computed for one specific data source; for example, there are studies analyzing opinions on Twitter and how they are propagated through the Twitter network. An especially challenging issue is the need to deal simultaneously with data from various domains, as outlined in the previous section, and to perform cross-domain fusion instead of integrating potentially many individual models to get a holistic picture.
A crucial step in a (semi-)automatic analysis of norm acceptance is, thus, the fusion of data across domains at different processing levels. It is not enough to collect the different data sources in one data silo; the sources must be integrated at varying stages of the natural language processing pipeline. The whole process can be broken down into a sequence of processing steps and their associated computer science fields as follows:
Finding and selecting sources and statements from a broad range of text types: information retrieval & data integration
Knowledge extraction, opinion mining, and sentiment analysis: text mining
Narratives and argument mining: computational linguistics
Information diffusion: social network analysis
Visualization and interpretation: human–computer interaction
Fig. 2 shows a coarse-grained architectural overview of the proposed system. The goal is to automate this pipeline as much as possible, while allowing the political scientist to intervene and guide the process based on expert knowledge.
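The five processing steps and the expert-in-the-loop idea can be illustrated with a minimal sketch. It is not the proposed system: the names `Corpus`, `make_stage`, `STAGES`, `run_pipeline`, and `expert_hook` are hypothetical, and each stage is a placeholder that only records that it ran.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Corpus:
    """State handed from stage to stage (hypothetical structure)."""
    documents: list[str]
    log: list[str] = field(default_factory=list)  # names of stages that ran


Stage = Callable[[Corpus], Corpus]
# Called after every stage so that an expert can inspect or adjust the state.
ExpertHook = Callable[[str, Corpus], Corpus]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage; a real stage would transform the corpus."""
    def run(corpus: Corpus) -> Corpus:
        corpus.log.append(name)
        return corpus
    return run


# Stage names follow the five steps listed in the text.
STAGES: list[tuple[str, Stage]] = [
    (name, make_stage(name))
    for name in (
        "retrieval_and_integration",
        "knowledge_extraction_and_opinion_mining",
        "narratives_and_argument_mining",
        "information_diffusion",
        "visualization_and_interpretation",
    )
]


def run_pipeline(
    corpus: Corpus,
    stages: list[tuple[str, Stage]],
    expert_hook: Optional[ExpertHook] = None,
) -> Corpus:
    """Run all stages in order, giving the expert a chance to step in."""
    for name, stage in stages:
        corpus = stage(corpus)
        if expert_hook is not None:
            corpus = expert_hook(name, corpus)
    return corpus
```

Passing an `expert_hook` that, for instance, discards irrelevant documents after the retrieval stage is one way to model the guidance by the political scientist; the fully automatic case simply omits the hook.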
Retrieval and integration
The stance of an individual or a group on norms manifests itself in various outlets: policy reports, contracts, political agreements, laws, and news, but also in more informal ways, such as social media, speeches, and (transcribed) interviews. Automatically extracting relevant statements from these different data sources is very challenging. Further, sources need to be assessed with regard to trustworthiness and authenticity. The attribution of statements to a specific group or entity is also often difficult. Especially in the political domain, underlying norms are not always visible in concrete statements or in advocated solutions and decisions. Automatic detection therefore requires a shared understanding of social and political backgrounds, e.g. in the form of a knowledge graph or other background models. In this first step, the focus is on selecting the statements and specific sources for further processing and analysis. Interwoven with this step is the integration of many heterogeneous data types and information from the political domain into a unified representation. The identification of actors and the attribution of statements need to happen at this stage as well. Because individual actors usually communicate through more than one medium, actors must be identified and disambiguated across media to establish a profile for each of them.
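As a rough illustration of the actor disambiguation described above, the following sketch groups statements from different media under one canonical actor. It assumes a hand-maintained alias table as a stand-in for a knowledge graph; `Statement`, `normalize`, and `build_profiles` are hypothetical names, and real disambiguation would need far more than string normalization.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    source: str  # medium, e.g. "twitter" or "speech"
    author: str  # name exactly as it appears in the source
    text: str


def normalize(name: str) -> str:
    """Lower-case a name and strip handle syntax such as '@' and '_'."""
    cleaned = name.lower().replace("@", "").replace("_", " ")
    return " ".join(cleaned.split())


def build_profiles(
    statements: list[Statement],
    aliases: dict[str, str],
) -> dict[str, list[Statement]]:
    """Group statements by canonical actor.

    `aliases` maps a normalized surface form to a canonical actor name
    (an assumption; it stands in for a background knowledge source).
    """
    profiles: dict[str, list[Statement]] = defaultdict(list)
    for statement in statements:
        key = normalize(statement.author)
        profiles[aliases.get(key, key)].append(statement)
    return dict(profiles)
```

With an alias entry such as `{"j. doe": "jane doe"}`, statements signed "@Jane_Doe" on social media and "J. Doe" in a transcript would end up in the same profile.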
To populate profiles of political actors (individuals, parties, media companies, decision makers, etc.), opinions and stances need to be extracted. Opinion mining and sentiment analysis are well-researched areas when it comes to, e.g., product reviews. In the political domain, opinion mining is much more difficult, for two reasons. Firstly, political opinions are typically very complex: they are not binary (good or bad) but far more nuanced. Further, stances on political issues are not independent of each other. Norms are related to and dependent on each other, so one norm cannot be negated without a logical error unless the stance towards the dependent norms changes as well. Secondly, political actors are often inclined to obfuscate their true stance or at least keep it vague. Opinion mining in the political domain is therefore much more challenging and requires a deeper understanding of the utterances of political actors.
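The dependency between norms can be made concrete with a small consistency check. The sketch assumes that stances are graded scores in [-1, 1] rather than binary labels, and that dependencies are given as pairs "supporting the first norm implies supporting the second"; both the encoding and the name `find_conflicts` are illustrative assumptions, not part of the proposed system.

```python
def find_conflicts(
    stances: dict[str, float],
    implies: list[tuple[str, str]],
    margin: float = 0.1,
) -> list[tuple[str, str]]:
    """Return dependency pairs on which an actor's stances contradict.

    stances: norm -> score in [-1, 1]; negative means rejection.
    implies: (premise, consequence) pairs between norms.
    margin:  scores within [-margin, margin] count as vague and never
             trigger a conflict (reflects deliberately unclear positions).
    """
    conflicts = []
    for premise, consequence in implies:
        if premise not in stances or consequence not in stances:
            continue  # no extracted stance for one side; nothing to check
        if stances[premise] > margin and stances[consequence] < -margin:
            conflicts.append((premise, consequence))
    return conflicts
```

A flagged pair does not prove that the actor is inconsistent; it may equally point to an extraction error or to an obfuscated stance, which is why such output would need expert review.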
In the political sphere, simple sentiment analysis methods typically fail. Except for explicit voting on a well-defined issue or answers to specific, clearly defined questions, norm acceptance and contestation are typically expressed in longer discussions. Here, methods from argument mining can be employed to narrow down the political views and normative stances. For complicated issues, extracting narratives around a certain norm could help to represent the various stances. Methods from representation learning as well as automatic summarization could be employed to position political actors in a kind of latent argument space regarding norms and political views. Progress in embedding methods and language models is also promising for capturing complex semantic information.
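The idea of positioning actors in a shared space can be illustrated with a deliberately simple stand-in: each actor is represented by a bag-of-words vector over all of their statements, and actors are compared by cosine similarity. This is not a learned latent argument space; a real system would replace `actor_vector` (a hypothetical helper) with embeddings from a trained model.

```python
import math
from collections import Counter


def actor_vector(statements: list[str]) -> Counter:
    """Sum word counts over all statements of one actor."""
    vector: Counter = Counter()
    for statement in statements:
        vector.update(statement.lower().split())
    return vector


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Word overlap captures topic far better than stance: two actors arguing for and against the same norm would appear close in this space, which is one reason why the text points to argument mining and richer representations.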
Many political actors and many citizens use social media both to get information and to spread their opinions. This has led to powerful influencers with millions of followers and to groups with high reach. Social network analysis, or more specifically information diffusion research, can identify highly influential accounts and track the spread of information through a social network. To analyze norm acceptance, it is of paramount interest to monitor how political actors and interest groups position themselves with regard to particular political topics. A large-scale analysis of these communication structures might uncover latent relations and influence among various groups.
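A very rough proxy for the influence of an account is the number of accounts a message could reach within a few forwarding hops. The sketch below computes this with a breadth-first search over a follower graph; it assumes that every follower forwards every message, which real diffusion models do not, and the names `reach` and `rank_by_reach` are hypothetical.

```python
from collections import deque


def reach(followers: dict[str, list[str]], seed: str, max_hops: int) -> set[str]:
    """Accounts that a message from `seed` can reach within `max_hops` hops.

    followers: account -> accounts that follow it (and would see its posts).
    """
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        account, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for follower in followers.get(account, ()):
            if follower not in seen:
                seen.add(follower)
                queue.append((follower, hops + 1))
    return seen - {seed}


def rank_by_reach(followers: dict[str, list[str]], max_hops: int) -> list[str]:
    """Order all accounts by reach, largest first; ties broken by name."""
    accounts = set(followers) | {f for fs in followers.values() for f in fs}
    return sorted(
        accounts,
        key=lambda account: (-len(reach(followers, account, max_hops)), account),
    )
```

Replacing the deterministic forwarding assumption with per-edge forwarding probabilities would turn this into a simulation in the spirit of cascade models from the information diffusion literature.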
All these computational models, representations, and analyses are of limited use without a proper way for political scientists to explore and investigate them. Therefore, a good visualization of the results, backed up by their grounding sources, and a clear depiction of communication structures are vital. Interpretation of the automatically generated results by political scientists is necessary, and capabilities to interact with the results, for example by browsing or exploring the representations and aggregated results, need to be supported. Even with the most advanced NLP tools, human experts need to draw the right conclusions from the data, put the results into both the political and the socio-economic context, and consider other factors that might be relevant for norm acceptance. To this end, retrieval functionality needs to be provided to make the resulting model and its representation accessible.
Large-scale processing and the fusion of multiple types of sources open up new possibilities for norm acceptance research beyond a mere increase in data volume. The global structure of information networks and advances in automatic translation make it possible to transfer results from one political region to others. Further, comparing regions and analyzing their influence on each other becomes possible. In a globalized world, goods and money are no longer bound by country borders, and neither are opinions and norm acceptance, which are influenced globally. Besides the spatial analysis, the temporal aspect of changing norm acceptance can be investigated at a much finer level using cross-domain fusion. In particular, the interplay between different norms, and how acceptance and contestation of particular norms lead to changing stances on other topics, could over time become a new field of study for political science.