A Knowledge Extraction and Management Component to Support Spontaneous Participation
- Cite this paper as:
- Porwol L., Hassan I., Ojo A., Breslin J. (2015) A Knowledge Extraction and Management Component to Support Spontaneous Participation. In: Tambouris E. et al. (eds) Electronic Participation. ePart 2015. Lecture Notes in Computer Science, vol 9249. Springer, Cham
Harnessing spontaneous contributions of citizens on Social Media and networking sites is a major feature of the next generation citizen-led e-Participation paradigm. However, extracting information of interest from Social Media streams is a challenging task and requires support from domain-specific language resources such as lexica. This work describes our efforts at developing a Knowledge Extraction and Management component which employs a lexicon for extracting information related to public services in Social Media contents or streams as part of a holistic technology infrastructure for citizen-led e-Participation. Our approach consists of three basic steps: (1) acquisition and refinement of public service catalogues, (2) organization of the public service names into a lexicon based on different semantic similarity measures and (3) development of a dictionary-based Named Entity Recognizer (NER) or “spotter” based on the lexicon. We evaluate the performance of the NER solution supported by contextual information generated by two well-known general-purpose NER tools (DBpedia Spotlight and Alchemy) on a dataset of tweets. Results show that our strategy for domain-specific information extraction from Social Media is effective. We conclude with a scenario on how our approach could be scaled up to extract other types of information from citizen discussions on Social Media.
Keywords: e-Participation; Citizen-led e-Participation; Information extraction (IE); Natural Language Processing (NLP); Public services; e-Government
e-Participation involves technology-mediated interaction between citizens and the political sphere. By leveraging information and communication technology (ICT), in particular contemporary social software technologies, e-Participation facilitates ubiquitous public participation and instant feedback capabilities. Certain contemporary e-Participation solutions have attempted to augment their discussion platforms with Social Media as an additional communication channel, though with very limited success. This is because the traditional conceptualizations of e-Participation as a consultative, democratic process involving citizens in policymaking do not sufficiently address the common spontaneous citizen participation on informal channels such as Social Media. Our experience so far is that state-of-the-art e-Participation solutions that integrate Social Media directly or, in particular, leverage generic, off-the-shelf analytical solutions to harness the vast information on Social Media, mainly through general-purpose NER tools, largely fail to achieve the desired performance level. We argue that this is due to a lack of specific mechanisms for dealing with information overload and a lack of domain-specific natural language processing tools to complement the popular general-purpose ones, which cover a relatively narrow range of general concepts such as the name of a person, organization, location, brand or product, or a numeric expression including time, date, money and percent. In particular, state-of-the-art solutions fall short in addressing e-Participation-specific concepts such as public service names or policies. Therefore, progress in this area is contingent on developing domain-specific tools for processing political discussion by citizens and e-Participation contents in general.
Building on our previous work, we show how the Knowledge Extraction & Management component of a holistic infrastructure for e-Participation could be implemented using a domain-specific lexical resource. We demonstrate and discuss the use of the component through a use case scenario. With our solution we thus exemplify the creation of an explicit technological bridge between the citizen political discussion sphere (Social Media) and the public services sphere. The paper is structured as follows: In Sect. 2 we elaborate on related work in the context of use of social software, in particular Social Media for e-Participation; Sect. 3 presents important concepts essential to understand the Knowledge Extraction and Management Component development process; In Sect. 3 we discuss the approach; Sect. 4 elaborates on the Knowledge Extraction and Management component creation and presents an example use case scenario. In Sect. 5 we discuss our contribution to the advancement of the e-Participation domain, with final conclusions presented in Sect. 6.
2 Related Work
2.1 Web 2.0 and Social Media in e-Participation
The last decade witnessed many examples of the use of social software as an infrastructure for realizing certain aspects of e-Participation. Social software usually refers to Web 2.0 software (or platforms) that enables social networking by offering capabilities for people to contact and interact with each other. The main principle of Web 2.0 is collective intelligence: collaborative content creation and composition by the user (here, the citizen), who contributes towards common knowledge. Many e-Participation projects including HUWY, U@MARENOSTRUM, VID, WAVE, VOICES, WEGOV, Puzzled by Policy, PADGETS and SPACES employed Web 2.0 tools such as digital forums, blogs, wikis and live chat to provide a dedicated e-Participation environment where citizens can express and discuss their needs, concerns and ideas. Those highly structured platforms, though supposedly well tuned to specific e-Participation needs, typically suffer from abysmally low citizen participation. In contrast, Social Media, a specific and highly popular sub-group of social software tools, are widely used by citizens for spontaneous political discussions, though without a direct link to formal e-Participation. This phenomenon is referred to in the literature as the Duality of e-Participation. In response to the challenges faced by the dedicated e-Participation platforms, some solutions have introduced explicit support for the popular Social Media platforms through feed integration (in rare cases content exchange in both directions is available) [9, 10]. Some more advanced solutions, such as the one presented by PADGETS, inject special widgets into Social Media to enable a direct feedback loop to the dedicated platform.
However, the big challenge remains unsolved: the prominent e-Participation solutions integrating Social Media largely address neither the volume nor the quality (lack of relevant selectivity) of the content produced, and therefore do not ensure sufficient innovation to enable the dual e-Participation observed by Macintosh. We are aware of certain original attempts to leverage the potential of spontaneous discussions on Social Media, such as the innovative approach presented in the WEGOV project. Nevertheless, the methodology applied in the project appears to rely on relatively generic Social Media analytics tools (for topic detection, topic popularity, sentiment analysis and seed user detection) without explicit, direct links to the government sphere such as references to governmental services, policy documents or newsletters. Moreover, the methodology does not seem to give much explicit thought to the essential synergy between current government-led solutions and processes, and citizen-led participation. Therefore the solution offered by the project, though advanced, appears to repeat the principles of the already available off-the-shelf, popular Social Media analytics solutions for businesses.
Considering Social Media and politics, it is important to recall past attempts to harness Social Media for e-Participation beyond e-Participation research projects. In particular, as it has been shown that a successful Social Media campaign can influence political popularity (and hence can have a significant impact on election results), many decision makers and government offices have employed Social Media as a direct campaign communication channel [14, 15]. Another important context for the use of Social Media in e-Participation has been improved Disaster and Crisis Management and Policy Development, derived from citizen reporting capabilities facilitated by Social Media [16, 17]. In particular, Social Media have been playing an increasing role as rapid crowdsourcing and rapid response tools, especially in the event of a crisis (including political crisis) or natural disaster. However, the applications of Social Media to e-Participation in the cases mentioned either use popular Social Media platforms directly or use common, off-the-shelf (not domain-specific) analytical methodologies and solutions to harness spontaneous political discussions, which results in moderate performance. Therefore, a solution that comprehensively addresses the specific analytical needs of the e-Participation context, such as effective methods for identifying political content on Social Media or contextual information clustering and linking, is yet to be developed.
2.2 Semantic Web
The Semantic Web (Web 3.0) provides a framework that allows data to be shared and reused across application, enterprise and community boundaries, advancing the website-based Web 2.0 to the Web of Data. The Semantic Web leverages ontologies for information modelling and knowledge representation. Ontologies provide a controlled vocabulary of terms that can collectively provide an abstract view of the domain. Semantic Web technologies and ontologies are being used to address data discovery, data interoperability, knowledge sharing and collaboration problems. Ontologies can be described in RDF (Resource Description Framework), which provides a flexible graph-based model used to describe and relate resources. The application of Semantic Web technologies to e-Government has gained significant momentum, with applications to several major areas including the use of ontologies to formally model different aspects of e-Government, structuring e-Participation research, enabling personalized service delivery, and enabling interoperability and integration of government resources and services.
In our work we leverage Semantic Web for constructing artefacts essential for Natural Language Processing and for storing all the data in a graph, in order to explicitly support better data discovery and interoperability for next generation e-Participation.
2.3 Natural Language Processing
2.3.1 Information Extraction
The goal of Information Extraction (IE) is to derive information structures directly from text, with emphasis on the following aspects: identifying relations from textual content, automatic instantiation of ontologies and building of knowledge bases. Common IE methods have focused on the use of supervised learning (SL) techniques, self-supervised (SSL) methods and rule learning. These techniques learn a language model or a set of rules from a set of manually tagged training documents and then apply the model or rules to new texts. The challenge for the SL approaches is the high cost of creating the labelled resources. In contrast, the unsupervised learning (UL) methods (also referred to as Open Information Extraction) attempt to fetch information automatically from the texts themselves.
2.3.2 Named Entity Recognition
A named entity can be defined as an entity classified according to a predefined set of categories, for instance: person, organization, location, brand, product, time, date etc. Named Entity Recognition (NER) applies the “classic” information extraction techniques listed above: SL, SSL and UL. However, certain contemporary NER solutions apply lexical resources (e.g. WordNet), lexical patterns and statistics computed on a large annotated corpus. The common processing pipeline for NER includes detecting named entities, assigning a type weighted by a numeric confidence score and providing a list of URIs for disambiguation. Lexical resources and terminological databases, consisting of a large number of highly detailed and curated entries, are an essential part of modern NLP systems. A lexicon can be developed to be domain independent or to support a specific domain. In our work we focus on the lexicon-based, domain-specific approach to NER.
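To make the lexicon-based approach concrete, the following is a minimal sketch of a dictionary-based spotter using greedy longest-match lookup. It is not the implementation used in this work: the lexicon entries, the `spot` function and the `max_len` parameter are illustrative assumptions, and a real spotter would also need normalisation for hashtags, misspellings and inflected forms.

```python
import re

# Hypothetical lexicon: lowercase surface form -> canonical public service name.
LEXICON = {
    "medical card": "Medical Card",
    "prenatal care": "Prenatal Care",
    "physiotherapy": "Physiotherapy",
}

def tokenize(text):
    """Lowercase the text and return (token, start, end) for each word token."""
    return [(m.group(0).lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]

def spot(text, lexicon, max_len=4):
    """Return (start, end, canonical_name) for greedy longest lexicon matches."""
    tokens = tokenize(text)
    mentions = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first so multi-word entries win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tok for tok, _, _ in tokens[i:i + n])
            if phrase in lexicon:
                mentions.append((tokens[i][1], tokens[i + n - 1][2], lexicon[phrase]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return mentions

tweet = "Applied for a Medical Card, still waiting on physiotherapy!"
mentions = spot(tweet, LEXICON)
```

Each mention carries character offsets into the original text, so the output of a generic NER tool (locations, organisations) could in principle be aligned with it by offset.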
Design science creates and evaluates artefacts that define ideas, practices, technical capabilities and products through which the analysis, design, implementation and use of information systems can be effectively accomplished. Given that the goal of this work is to construct a technical artefact, our research follows the Design Science Research (DSR) guidelines and process elaborated in [34, 35]. In particular, our objective is to develop a Knowledge Extraction and Management Component (KEMC) as part of a comprehensive infrastructure for e-Participation. To achieve this goal, we construct two technical artefacts: (1) a Lexicon of public service names and (2) a Named Entity Recogniser based on the lexicon and integrating generic NER solutions through dedicated APIs. The development of the lexicon is based on the national public service catalogues. The two datasets were employed as input into a process which automatically related public service names based on a set of semantic similarity and relatedness measures including Explicit Semantic Analysis (ESA) and WordNet-based measures. The resulting graph of Public Service Names is subsequently employed to develop a NER or spotter using an open source dictionary-based spotter framework. In line with the DSR process model described in .
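The step of relating public service names into a graph can be sketched as follows. This is a simplified illustration only: token-overlap (Jaccard) similarity is swapped in as a stand-in for the ESA and WordNet-based measures named above, and the service names, the `threshold` value and the function names are assumptions made for the example.

```python
from itertools import combinations

def jaccard(a, b):
    """Token-overlap similarity between two names (stand-in for ESA/WordNet)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def build_lexicon_graph(names, similarity=jaccard, threshold=0.2):
    """Link every pair of names whose similarity reaches the threshold."""
    graph = {name: {} for name in names}
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold:
            # Store the edge in both directions: the graph is undirected.
            graph[a][b] = score
            graph[b][a] = score
    return graph

# Hypothetical catalogue entries used only for illustration.
names = ["Medical Card", "GP Visit Card", "Driving Licence Renewal", "Driving Test"]
graph = build_lexicon_graph(names)
```

Because the similarity function is a parameter, a semantic measure could be substituted without changing the graph construction.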
4 Knowledge Extraction and Management Component (KEMC)
In this section we elaborate on Knowledge Extraction and Management Component implementation encapsulating two core building blocks: the public service lexicon and the NER solution leveraging the generated language resource. First we present the comprehensive infrastructure for e-Participation design to provide the context for KEMC development. Then we present the domain-specific lexicon creation process algorithm followed by application of the language resource to dedicated NER solutions combining the output of the generic NER solutions.
4.1 e-Participation Infrastructure Design
4.3 Named Entity Recogniser
NER performance comparison
The goal of this section is to demonstrate the use of the KEMC solution through a use case scenario: John Smith (a hypothetical character) is an Irish politician promoting legislation introducing restrictions on medical card applicants’ eligibility. John opens the e-Participation Analyser supported by KEMC. Based on his specific request, the interface generates a dynamic report of places in Ireland where citizens appear to express negative sentiment towards the healthcare services. From the information mined from Twitter (public service tweets detected by our NER solution) it is apparent that Galway City (location detected) has the highest rate of negative opinions (sentiment analysis), centred on University Hospital (UH) and Merlin Park Hospital (MPH) (organisation entities detected). Moreover, the common topics found (through topic analysis) are prenatal care, physiotherapy and medical card. John tries to identify the key arguments against his policy project, so he explores the posts and discussions of highest popularity rank with negative sentiment associated with the medical card and public healthcare. After following selected discussions (represented in Semantic Web format, where every post and discussion is identified by a unique URL), he realises that the negative opinions come mainly from UH and MPH not accepting the medical card for particular services (prenatal care and physiotherapy). He therefore engages in discussion with citizens on Twitter, explains that the issues mentioned by citizens are of a local character (but will be addressed) and assures citizens that the upcoming legislation will not bring any harm but will rather improve the current set of services covered.
Moreover, since John now knows that the “hot” topics detected around Public Healthcare Services in Ireland are closely related to the medical card (similarity measures and graph distance), he suggests a relevant common strategy that should be developed in order to facilitate a solution for these problems. The use case scenario presented will be leveraged for real-world experimentation with KEMC deployment. We believe that the direct implication of the use of KEMC will be to enable government to interact with citizen spaces on Social Media in a more selective, topic-relevant and efficient way, and that in the long term it can contribute significantly to enhancing the delivery of public services as a result of a better understanding of citizens’ needs and concerns, hence directly supporting the duality of e-Participation.
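The location report in the scenario can be illustrated with a small aggregation sketch. The records below are invented purely for illustration and are not results from this work; the field names and the `negative_rate_by_location` function are likewise assumptions about how spotted services, detected locations and sentiment labels might be combined.

```python
from collections import defaultdict

# Hypothetical annotated tweets: service from the domain-specific spotter,
# location and sentiment from generic tools. Not real data.
annotated = [
    {"service": "Medical Card", "location": "Galway", "sentiment": "negative"},
    {"service": "Medical Card", "location": "Galway", "sentiment": "negative"},
    {"service": "Medical Card", "location": "Galway", "sentiment": "positive"},
    {"service": "Medical Card", "location": "Dublin", "sentiment": "negative"},
    {"service": "Medical Card", "location": "Dublin", "sentiment": "positive"},
    {"service": "Physiotherapy", "location": "Cork", "sentiment": "negative"},
]

def negative_rate_by_location(records, service):
    """Rank locations by the share of negative tweets about one service."""
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for rec in records:
        if rec["service"] != service:
            continue
        totals[rec["location"]] += 1
        if rec["sentiment"] == "negative":
            negatives[rec["location"]] += 1
    rates = {loc: negatives[loc] / totals[loc] for loc in totals}
    # Highest negative share first.
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

report = negative_rate_by_location(annotated, "Medical Card")
```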
In this paper we have briefly introduced the state-of-the-art use of social software (in particular Social Media) for e-Participation, focusing on the main trends. The last decade has seen a shift in e-Participation from simple Web 2.0 forums to more advanced platforms integrating Social Media like Facebook or Twitter. Social Media are many times more widely used by citizens than any e-Participation solution. Moreover, many people have incorporated them into everyday activities as they are very easy to use, and indeed they have become a spontaneous place for everyday political discussions. Nevertheless, we argue that, to date, e-Participation solutions do not unleash the full potential of Social Media analytics in the context of e-Government, which is essential to deal with significant channel-specific obstacles like information overload and varying content quality. The main challenge, the lack of dedicated analytical tools for e-Participation, renders the performance of most contemporary e-Participation solutions insufficient to fully support the duality of e-Participation. Results from our work in creating the Knowledge Extraction and Management component as part of a holistic infrastructure for e-Participation, in the form of a domain-specific NER solution powered by a Public Service Lexicon, provide a first significant step towards building an explicit connection between the sphere of government and citizens’ spontaneous discussions on Social Media, thus delivering the base to support the duality of e-Participation. We show that a dedicated solution outperforms the generic, off-the-shelf analytical solutions; therefore further development of custom solutions (that can follow our universal methodology) is a viable option for the advancement of the e-Participation domain.
We claim better universality and scalability of the automatically generated lexical resources (based on similarity measures) in comparison to Supervised-Learning-based solutions, which demand significant manual effort in creating relevant resources. Moreover, we claim better alignment of the resource delivered to the specific e-Participation need. Nevertheless, we emphasise the limitation of the resource, which was created explicitly for the Irish and UK context. We demonstrate, through a specific use case scenario, that a combination of the dedicated NER with rich contextual information provided by generic NER solutions can be a very powerful tool for e-Participation information analytics. The example use of KEMC for identifying Social Media discussions and posts related to concrete public services opens up the possibility of a completely new set of capabilities for evaluating citizens’ perception of public services, hence supporting future improvement and public service integration. Apart from the extensive work carried out under the WeGov project, we are not aware of any other significant attempt at applying advanced Social Media analytics to e-Participation. We acknowledge innovative work by Hagen et al. on leveraging NLP technologies to analyse e-Petitioning content. However, we have not found any approach that tries to combine and apply dedicated, scalable NER, supported by rich, automatically annotated domain-specific lexical resources, Semantic Web and Natural Language Processing technologies to address the duality of e-Participation.
Motivated by the need to provide the necessary step towards supporting the duality of e-Participation, we have presented a technical component, KEMC, for extracting, consolidating and enriching (by linking) information from spontaneous discussions on Social Media as part of the implementation of the comprehensive infrastructure for e-Participation design, advancing the existing e-Participation methods and tools. Results from our work show immediate opportunities for developing and consolidating domain-specific lexical resources, Semantic Web and NLP technologies into an analytical infrastructure for application to the context of e-Participation. While we have demonstrated theoretically the usefulness of the KEMC component, using the example of public healthcare service information matching supported by miscellaneous contextual information, more detailed and formal evaluations in different contexts are yet to be conducted. Next steps for the research include the implementation of special, information-visualisation-rich dashboards building upon the developed analytical component to better explain the capabilities of the comprehensive infrastructure for e-Participation. This will be followed by a set of interviews with politicians and citizens in order to determine the usefulness of the presented solution in reaching information about public services and potentially supporting the process of public services improvement. Future steps should also bring a series of design and development iterations of the solution, with applications of the component to other contexts of e-Participation, to advance the domain of Social Media analytics for e-Government.
This work has been funded by EU Commission - Grant 645860 (ROUTE-TO-PA).