In order to design the OM/SA approach of the EU-Community project, we studied the use of these technologies in recent related EU projects, all of which aim at exploiting various types of political content created by citizens on the Internet, with the main emphasis on social media (in order to develop knowledge models, recommendations, intelligent services, etc.). This section first reviews how OM/SA was used in these projects, and then describes the innovative features of the proposed OM/SA approach for the EU-Community project with respect to them.
3.1 Related Projects
+Spaces (2010–2012).
+Spaces (“Positive Spaces”) was a research project aiming at supporting the formulation of effective public policies by assessing the impacts of prospective policies, using political content created on Facebook and Twitter, alongside virtual spaces (VSs) like Open Wonderland. Modern VSs can be viewed as micro-societies, with dynamics resembling those of real-world societies, the most evolved ones having virtual economies as well as regulations analogous to real-life legislative frameworks. Moreover, VSs are controlled environments in which all parameters of users’ reactions and interactions can be tracked, so they can be very useful for assessing the impacts of various policy options. In this project, information retrieval mechanisms and OM/SA were applied in order to collect data from VSs and process them. Structured data from polls and petitions as well as unstructured data from VS blogs and debate logs were incorporated, together with relational information such as social networks derived from user tracing.
E-policy (2011–2014).
The E-Policy (“Engineering the POlicy-making LIfe CYcle”) project aimed at supporting policy makers in ‘engineering’ the policy-making life cycle, integrating both global and individual perspectives. Its objectives included the assessment of social impacts through opinion mining on e-participation data from various thematic web sites and Web 2.0 platforms that allowed users to express their opinions on energy-related topics through textual messages. In this project, opinion mining identified social impacts that should be considered at both the global and the individual level. At the global level, opinion mining aggregated individual opinions into a trend line in order to conduct policy evaluation. Finally, regression analysis was performed, in which text sentiment, estimated as a numeric score in the interval [−2, 2], with negative (positive) values indicating negative (positive) sentiment, was used as one of many independent variables in impact simulation modules.
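To make the aggregation step concrete, the following is a minimal sketch of how per-message sentiment scores in [−2, 2] could be averaged per time period to form a trend line. It is an illustration under our own assumptions, not the E-Policy implementation: the function names, the monthly period labels and the sample data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def clamp_score(score, low=-2.0, high=2.0):
    """Keep a sentiment score inside the [-2, 2] interval described in the text."""
    return max(low, min(high, score))


def sentiment_trend(messages):
    """Average per-message scores by period.

    `messages` is an iterable of (period, score) pairs; the result is a list of
    (period, mean_score) pairs sorted by period, i.e. the points of a trend line.
    """
    by_period = defaultdict(list)
    for period, score in messages:
        by_period[period].append(clamp_score(score))
    return [(period, mean(scores)) for period, scores in sorted(by_period.items())]


# Hypothetical sample data: (month, sentiment score of one message).
sample = [
    ("2013-01", 1.0),
    ("2013-01", -2.0),
    ("2013-02", 2.0),
    ("2013-02", 1.0),
    ("2013-02", 3.5),  # out-of-range value, clamped to 2.0
]
trend = sentiment_trend(sample)
```

Each point of `trend` could then serve as one observation of the sentiment variable in a regression, alongside the other independent variables.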
Render (2010–2013).
The Render (“Reflecting Knowledge Diversity”) project aimed at leveraging diversity (viewed as a valuable asset and a crucial source of innovation and adaptability) in information management, in order to allow for better communication and collaboration. Under this objective, the project addressed the problem of sentiment analysis in multiple domains and several languages, such as English, French, German, Italian, and Spanish. It exploited domain knowledge in the form of different sentiment lexicons, as well as the influence of various lexical surface features. Experimental results showed that the improvement resulting from using a two-layer model, sentiment lexicons, surface features and feature scaling was considerable, especially on social media textual datasets. In addition, a tool was developed that performs sentiment classification of short Twitter texts and enables the analysis and visualization of diversity in tweets.
Arcomem (2011–2014).
This project aimed to enable memory institutions such as archives, museums, and libraries to use and incorporate relevant social media content. A series of initial applications was developed for opinion mining from social media using GATE, a freely available toolkit for language processing. Building on the work described in Maynard and Funk (2011), which focused on identifying sentiment in tweets about political parties, the methodology was extended to a more generic analysis of sentiment about any kind of entity or event mentioned, focusing on two specific domains: the Greek financial crisis and the Rock am Ring rock festival in Germany in 2010. In both cases, a basic sentiment analysis was performed first, associating a positive, negative or neutral sentiment with each relevant opinion target, together with a polarity score. As a next step, entity and event extraction was performed. A modified version of ANNIE, the default Named Entity (NE) recognition system in GATE, was used to identify mentions of persons, locations, organizations, dates, times and financial concepts. Sentiment analysis was performed using a rule-based approach.
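As an illustration of what a lexicon-driven, rule-based polarity assignment can look like, the sketch below sums lexicon values over the tokens of a text and applies a single negation rule. It is a toy example under our own assumptions and does not use GATE or reproduce the Arcomem rules: the lexicon entries, the negator list and the function names are hypothetical.

```python
import re

# Hypothetical toy lexicon; real sentiment lexicons are far larger.
LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}
# Hypothetical negation words handled by the single rule below.
NEGATORS = {"not", "no", "never"}


def polarity_score(text):
    """Sum lexicon values over tokens; a negator flips only the next token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True
            continue
        if token in LEXICON:
            value = LEXICON[token]
            score += -value if negate else value
        # The negation applies to the immediately following token only.
        negate = False
    return score


def classify(text):
    """Return a (label, polarity score) pair for a piece of text."""
    score = polarity_score(text)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, score
```

For example, `classify("not good at all")` yields a negative label because the negator flips the value of "good". A realistic system would add target detection, so that each score is attached to a specific entity or event rather than to the whole text.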
TrendMiner (2011–2014).
TrendMiner dealt with large-scale, cross-lingual trend mining and summarization of real-time social media streams. This project was very similar to the Arcomem project described above, involving more or less the same OM/SA methodology and tools. Again, the overall strategy for performing OM/SA on social media comprised sentiment lexicons, special-purpose gazetteers and rule-based approaches on the GATE platform.
Padgets (2010–2012).
PADGETS (“Policy Gadgets Mashing Underlying Group Knowledge in Web 2.0 Media”) focused on multilingual SA of citizens’ postings in government social media accounts, made in response to government policy campaigns and postings on specific topics and policies of interest. The main research objective of this project was to develop a methodology and a technological platform for the systematic and centrally managed exploitation of the emerging Web 2.0 social media by government organizations in their policy and decision-making processes (Ferro et al. 2013). Citizens’ postings (opinions and comments) in government accounts on a variety of social media platforms, such as Twitter, Facebook, YouTube, Blogger, etc., were analyzed in order to identify citizens’ sentiments. Since texts in social media tend to be very short, a machine learning approach was followed, which required only limited linguistic resources. There was also a sentiment analysis module that incorporated sentiment lexicons, but it performed significantly worse than the machine learning models. Moreover, an attribute capturing emotional writing style was included in order to improve the performance of SA. From a technical point of view, feature selection was performed using a hybrid scheme of Support Vector Machines and genetic algorithms. The system was implemented using RapidMiner®.
Nomad (2012–2015).
The NOMAD project (“Policy Formulation and Validation through non-moderated crowd sourcing”) aimed at enabling government agencies to exploit the extensive political content created in social media beyond their own accounts, in multiple external sources (e.g. political blogs and forums, news websites, and various Twitter, Facebook, etc. accounts) (Loukis and Charalabidis 2014). For this purpose, extensive use was made of sophisticated OM/SA techniques, such as semantically driven textual data acquisition, sentiment analysis, thematic analysis, topic extraction, argument extraction, and summarization. The main technology used was polarity lexica, combined with some learning algorithms that exploit named entities.
3.2 Innovative Features of the OM/SA of the EU-Community Project
In order to meet its particular objectives and requirements, the EU-Community project has developed a novel approach to OM/SA (described in the following section), which includes several innovative features with respect to the similar projects reviewed above; these are shown in Table 1.
Table 1. Innovative Features of the OM/SA approach of the EU-Community project