DomainSenticNet: An Ontology and a Methodology Enabling Domain-Aware Sentic Computing

In recent years, SenticNet and OntoSenticNet have represented important developments in the novel interdisciplinary field of research known as sentic computing, enabling the development of a variety of Sentic applications. In this paper, we propose an extension of the OntoSenticNet ontology, named DomainSenticNet, and contribute an unsupervised methodology to support the development of domain-aware Sentic applications. We developed an unsupervised methodology that, for each concept in OntoSenticNet, mines semantically related concepts from WordNet and Probase knowledge bases and computes domain distributional information from the entire collection of Kickstarter domain-specific crowdfunding campaigns. Subsequently, we applied DomainSenticNet to a prototype tool for Kickstarter campaign authoring and success prediction, demonstrating an improvement in the interpretability of sentiment intensities. DomainSenticNet is an extension of the OntoSenticNet ontology that integrates each of the 100,000 concepts included in OntoSenticNet with a set of semantically related concepts and domain distributional information. The defined unsupervised methodology is highly replicable and can be easily adapted to build similar domain-aware resources from different domain corpora and external knowledge bases. Used in combination with OntoSenticNet, DomainSenticNet may favor the development of novel hybrid aspect-based sentiment analysis systems and support further research on sentic computing in domain-aware applications.


Introduction
In recent decades, the Internet has become the preferred communication channel for novel forms of everyday human activities. As recently highlighted by the unfortunate global situation caused by the COVID-19 pandemic, people are now able to perform new activities online to replace or complement traditional behaviors. Popular examples of these new activity domains include e-learning, e-commerce, telehealth, telemedicine, social media, and e-government. Within this context, most of the above-mentioned sectors and fields of research benefit from analyses of the opinions and sentiments that are conveyed extensively over the Internet via user-generated content. To support this, researchers are investigating and developing methodologies for aspect-based sentiment analysis (ABSA). As reported by recent surveys [10,12,13], the literature on ABSA has identified many open challenges to be solved. The authors of [14] hold that state-of-the-art ABSA approaches can be broadly categorized into symbolic and sub-symbolic approaches. Symbolic approaches "include the use of lexicons, ontologies, and semantic networks to encode the polarity associated with words and multiword expressions." Sub-symbolic approaches, on the other hand, "consist of machine learning techniques that perform sentiment classification based on word co-occurrence frequencies." In both cases, ABSA "is a suitcase research problem" [10] that requires many natural language processing (NLP) challenges to be overcome.
In this paper, we introduce DomainSenticNet, an extension of the OntoSenticNet ontology [14] to aid the development of hybrid ABSA systems by leveraging the advantages of both symbolic and sub-symbolic approaches. DomainSenticNet is a resource written in the Web Ontology Language (OWL) standard that, for each of the 100,000 OntoSenticNet concepts, provides a set of semantically related concepts and domain distributional information. Specifically, to build DomainSenticNet, for each of the concepts in OntoSenticNet, we mined semantically related concepts from the knowledge bases WordNet [18] and Probase [34] and obtained domain distributional information by computing the distribution of occurrences and co-occurrences of the concept across domain-specific texts extracted from textual descriptions of the entire collection of Kickstarter crowdfunding campaigns.
The present paper describes the unsupervised methodology we designed to build our resource, which can be replicated to generate similar resources from different domain corpora and external knowledge bases. Therefore, DomainSenticNet, used in combination with OntoSenticNet, can support future investigations of sentic computing [7] for domain-aware research and applications. Moreover, in this paper, we discuss the practical usage of our resource and present an example of a real application that provides a high level of interpretability of sentiment intensities expressed for domain aspects.
The remainder of the paper is organized as follows. Section 2 states our research objectives. Section 3 describes DomainSenticNet and the unsupervised methodology we designed to construct it from the external knowledge bases WordNet [18] and Probase [34], and the textual description of Kickstarter crowdfunding campaigns. Section 4 describes an example of a real application that, drawing on DomainSenticNet, demonstrates improved interpretability of aspect-based sentiment analysis outcomes. Section 5 summarizes the existing literature related to our work. Finally, Section 6 provides concluding remarks.

Research Objectives
OntoSenticNet [14] is a commonsense ontology for sentiment analysis based on SenticNet, a semantic network of 100,000 concepts. In this paper, our main research objective is to provide an extension (not a substitution) of OntoSenticNet to:
- RO1: provide a wider coverage of domain-specific concepts (not yet included in SenticNet) to support the development of novel hybrid (symbolic and sub-symbolic) domain-specific SenticNet-based ABSA systems;
- RO2: include, for each concept, effective and human-readable information on the domain pertinence; and
- RO3: use a standard knowledge representation language to ease the adoption and reuse of our OntoSenticNet extension.
Additionally, with respect to the methodology, we had one further research objective:
- RO4: to define a replicable (and generalized) methodology that could be adapted with minimal effort to cover additional concepts and domains.
In Section 3, we describe the resource and the methodology.

DomainSenticNet Resource and Methodology
In this section, we introduce DomainSenticNet and describe the unsupervised methodology we defined to create the resource. DomainSenticNet is a resource that extends OntoSenticNet with:
1. additional related concepts harvested from external knowledge bases;
2. distributional information, i.e., occurrences and co-occurrences of each SenticNet concept and related concepts, in domain-related texts.
To illustrate the characteristics of our resource, in Fig. 1, we visually represent the original SenticNet concept "apple" as a graph. In this graph, nodes represent SenticNet concepts and edges represent semantic relatedness between pairs of concepts. Figure 2 shows a visual representation of the corresponding "apple" concept in DomainSenticNet. In this figure, additional nodes represent the semantically related concepts mined from external knowledge bases and edges are complemented by domain distributional information about occurrences and co-occurrences in domain texts. Figure 3 depicts the methodology workflow we designed and performed to generate the DomainSenticNet resource. The methodology included four main steps: 1) expansion; 2) mining of domain corpora; 3) domain weighting; and 4) OWL translation. In the following sections, we describe each of the four steps and, without loss of generality, make explicit reference to the external knowledge bases and corpora used to generate DomainSenticNet.

Expansion
To address our first research objective (see Section 2, RO1), in the first step of our workflow, for each concept in SenticNet, we searched for semantically related concepts in the external knowledge bases WordNet [18] and Probase [34]. In both knowledge bases, we first identified all concepts corresponding to those in SenticNet. Then, to collect all neighboring concepts, for each identified concept, we performed a 1-hop visit on the corresponding knowledge graph, following the hypernymy ("is a") and synonymy relationships. Figure 4 shows an excerpt of the semantically related concepts we found for the "apple" SenticNet concept. For this concept, we first identified the concepts "apple#1" and "apple#2" in WordNet and "apple" in Probase. Subsequently, we collected two synonyms (i.e., "malus pumila" and "orchard apple tree") and four hypernyms (i.e., "apple tree," "edible fruit," "false fruit," and "pome") from WordNet, and ∼4.6K hypernyms (e.g., "brand," "corporation," "company," "crop," "firm," "food," "fresh fruit," "fruit," "fruit tree," and "manufacturer") from Probase.
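The 1-hop visit described above can be pictured with a minimal sketch. It assumes the external knowledge base has already been loaded as (source, relation, target) triples; the toy triples, the relation labels, and the function names are illustrative assumptions rather than the interfaces of WordNet or Probase.

```python
from collections import defaultdict

# Toy stand-in for an external knowledge graph as (source, relation, target) triples.
# The entries loosely mirror the "apple" example in the text and are illustrative only.
TOY_TRIPLES = [
    ("apple", "synonym", "malus pumila"),
    ("apple", "synonym", "orchard apple tree"),
    ("apple", "hypernym", "edible fruit"),
    ("apple", "hypernym", "pome"),
    ("apple", "hypernym", "brand"),
    ("apple", "other_relation", "apple core"),  # hypothetical relation, not followed
    ("pome", "hypernym", "fruit"),              # two hops away from "apple"
]

# Only hypernymy ("is a") and synonymy edges are followed, as in the Expansion step.
FOLLOWED_RELATIONS = {"hypernym", "synonym"}


def build_index(triples):
    """Group outgoing edges by source concept for constant-time neighbor lookup."""
    index = defaultdict(list)
    for source, relation, target in triples:
        index[source].append((relation, target))
    return index


def expand_one_hop(concept, index, relations=FOLLOWED_RELATIONS):
    """Return the direct neighbors of `concept`, grouped by followed relation."""
    related = defaultdict(set)
    for relation, target in index.get(concept, []):
        if relation in relations:
            related[relation].add(target)
    return {relation: sorted(targets) for relation, targets in related.items()}


index = build_index(TOY_TRIPLES)
apple_neighbors = expand_one_hop("apple", index)
```

Because the visit stops after one hop, "fruit" (reachable only through "pome") is not collected for "apple" in this toy graph.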

Mining of Domain Corpora
Distributional information was at the base of our second research objective (see Section 2, RO2). To tackle this objective, we applied standard text mining techniques to domain-specific corpora to compute: i) the number of occurrences of concepts belonging to SenticNet and ii) the number of co-occurrences of each concept in SenticNet and the semantically related external concepts harvested in Step 1 (see Section 3.1). As a medium-sized collection of domain-specific texts, Kickstarter was chosen as a data source.⁴ Kickstarter, a popular source for data scientists, includes approximately 480K campaign descriptions⁵ in the form of hypertexts, including text, images, videos, and hyperlinks.⁶ To identify the domains of interest of each campaign, we leveraged the labels available on the Kickstarter platform to categorize each campaign description. In Table 1, we present an excerpt of the 15 main domain categories of Kickstarter, with related subcategories.⁷ The number of occurrences and co-occurrences was computed in four substeps:
- Step 2.1: Starting from the campaign uniform resource locators (URLs), we retrieved campaign textual descriptions by means of a custom-made crawler;
- Step 2.2: For each word w corresponding to one of the concepts generated in Step 1 (see Section 3.1) and for each textual campaign description t, we computed the number of occurrences occ(w, t) of word w in t;
- Step 2.3: For each campaign description t and for each pair of words {w_1, w_2} s.t. occ(w_1, t) > 0 and occ(w_2, t) > 0, we computed the number of co-occurrences co_occ(w_1, w_2, t) of words w_1 and w_2 in the description t as co_occ(w_1, w_2, t) = occ(w_1, t) * occ(w_2, t);
- Step 2.4: Since Kickstarter campaigns are labeled with two domain categories (i.e., a main category and an optional subcategory), we leveraged this labeling to compute the distributions of occurrences and co-occurrences of concepts across domains.
⁴ Monthly updated dataset of the Kickstarter campaign URLs is available at: https://webrobots.io/kickstarter-datasets/
⁵ Real-time statistics are accessible at: https://www.kickstarter.com/help/stats
⁶ We were able to crawl a total of ∼230K Kickstarter descriptions from the original ∼480K campaigns.
⁷ An overview of the respective domains and related statistics is available at: https://www.kickstarter.com/help/stats
Returning to the "apple" concept example, Fig. 5 depicts the distribution of occurrences of the word "apple" over each resulting domain corpus; Fig. 6 presents the co-occurrence distribution for the pair of words "apple" and "brand."
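Steps 2.2 to 2.4 can be illustrated with a small sketch. The mini-corpus, the vocabulary, and the regular-expression tokenizer are assumptions made for illustration; in particular, the sketch matches single-token words only and does not handle multiword concepts or the crawling of Step 2.1.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical mini-corpus of (domain label, campaign description) pairs.
CAMPAIGNS = [
    ("food", "Fresh apple cider from our apple orchard, a new brand of juice."),
    ("technology", "A case for your Apple phone from a trusted brand."),
    ("technology", "An open hardware board for makers."),
]
# Hypothetical vocabulary standing in for the concepts produced in Step 1.
VOCABULARY = {"apple", "brand"}


def occurrences(text, vocabulary):
    """Step 2.2: occ(w, t) for every vocabulary word w present in description t."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tok for tok in tokens if tok in vocabulary)


def co_occurrences(occ):
    """Step 2.3: co_occ(w1, w2, t) = occ(w1, t) * occ(w2, t) for words present in t."""
    return {
        (w1, w2): occ[w1] * occ[w2]
        for w1, w2 in combinations(sorted(occ), 2)
    }


# Step 2.4: accumulate the counts per domain label.
occ_by_domain = defaultdict(Counter)
cooc_by_domain = defaultdict(Counter)
for domain, text in CAMPAIGNS:
    occ = occurrences(text, VOCABULARY)
    occ_by_domain[domain].update(occ)
    cooc_by_domain[domain].update(co_occurrences(occ))
```

Pairs are stored with their words in alphabetical order, so ("apple", "brand") and ("brand", "apple") share one counter entry.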

Domain Weighting
Since most distributional methodologies perform better using normalized weights, to complete our second research objective (see Section 2, RO2), in the third step of our workflow we defined a transformation of the raw counts into domain distributional weights. To this end, we defined a domain relevance function that assigns each SenticNet concept w a domain relevance with respect to a corpus C_d. The function, domainOccScore(w, C_d), is defined as follows:
where C_d includes all textual descriptions of the Kickstarter campaigns labeled with a specific domain category d.
Additionally, in order to represent the domain relevance of a pair of related concepts {w_1, w_2}, we defined domainCooccScore(w_1, w_2, C_d):
Continuing the "apple" concept example, Fig. 7 shows the domain distribution of the domainOccScore for the concept "apple," and Fig. 8 presents the domain distribution of domainCooccScore for the two semantically related concepts "apple" and "brand." Finally, in Table 2, we provide the top 40 most co-occurring concepts with "apple" across domains.
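The closed forms of domainOccScore and domainCooccScore are not reproduced in this text, so the following sketch should not be read as the authors' definition. It only illustrates the general idea of turning raw per-domain counts into normalized weights, using a plain relative-frequency normalization chosen here as an assumption; the function name and the counts are hypothetical.

```python
def relative_frequency_scores(counts_by_domain):
    """Normalize the raw per-domain counts of one concept (or pair) so they sum to 1.

    This is an assumed stand-in normalization, not the paper's scoring function.
    """
    total = sum(counts_by_domain.values())
    if total == 0:
        # A concept that never occurs gets zero weight in every domain.
        return {domain: 0.0 for domain in counts_by_domain}
    return {domain: count / total for domain, count in counts_by_domain.items()}


# Made-up occurrence counts for one concept across three domains.
apple_counts = {"food": 6, "technology": 3, "games": 1}
apple_scores = relative_frequency_scores(apple_counts)
```

Under this assumed normalization, weights are comparable across concepts with very different raw frequencies, which is the motivation the text gives for normalizing.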

OWL Translation
To address the third research objective (see Section 2, RO3), in the fourth step of our workflow (see Fig. 3, block 4), we translated all collected domain distributional information into an OWL representation. As shown in the ontology schema depicted in Fig. 9, DomainSenticNet refers to the original definition of SenticConcept, thus enabling reference to all original OntoSenticNet facts.

As an example, in OntoSenticNet [14], the concept "apple" is defined as follows: ii) polarity is the overall sentiment polarity; iii) semantics are properties representing five semantically related concepts (e.g., adam_and_eve, fruit, garden, outdoor, and tree); and iv) primitiveURI refers to two primitive moods (e.g., admiration and interest).
To represent all of the concepts mined from the external knowledge bases in the first step (see Fig. 3, block 1), we defined the "ExternalConcept" class as follows:
The above class enables the model to reference concepts such as "malus pumila," which WordNet presents as a synonym of the SenticNet concept "apple." Instances of the "ExternalConcept" class have two annotation properties, namely provenance and text, which represent the source knowledge base and the lexeme, respectively:
As an example, the external concept "malus pumila" is defined as follows:
Fig. 6 Co-occurrence distribution (top 36 domains) of the words "apple" and "brand" in the domain corpora extracted from the Kickstarter campaigns
Fig. 7 The domainOccScore(w, C_d) distribution (d ∈ set of the top 36 domains) for the DomainSenticNet concept w = "apple"
Fig. 8 The domainCooccScore(w_1, w_2, C_d) distribution (d ∈ set of the top 36 domains) for the DomainSenticNet concepts w_1 = "apple" and w_2 = "brand"
To represent each of the 176 considered domains, we defined the following Domain class:
The 15 main categories and 161 subcategories were then defined as subclasses of the Domain class.
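To make the vocabulary above more concrete, the following sketch emits a few Turtle statements for an ExternalConcept instance and a Domain subclass. The class and property names (ExternalConcept, Domain, provenance, text) come from the text; the `dsn:` namespace IRI and the helper functions are hypothetical, and the output is not claimed to match the published ontology serialization.

```python
# Hypothetical namespace: the actual DomainSenticNet IRIs are not given in this text.
PREFIXES = (
    "@prefix dsn: <http://example.org/domainsenticnet#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)

# Declarations for the classes and annotation properties named in the text.
SCHEMA = (
    "dsn:ExternalConcept a owl:Class .\n"
    "dsn:Domain a owl:Class .\n"
    "dsn:provenance a owl:AnnotationProperty .\n"
    "dsn:text a owl:AnnotationProperty .\n"
)


def local_name(label):
    """Turn a lexeme or domain label into a simple local name (assumed convention)."""
    return label.strip().replace(" ", "_")


def external_concept(label, provenance):
    """Emit an ExternalConcept individual with its provenance and text annotations."""
    return (
        f"dsn:{local_name(label)} a dsn:ExternalConcept ;\n"
        f'    dsn:provenance "{provenance}" ;\n'
        f'    dsn:text "{label}" .\n'
    )


def domain_class(label):
    """Emit a domain category as a subclass of the Domain class."""
    return (
        f"dsn:{local_name(label)} a owl:Class ;\n"
        f"    rdfs:subClassOf dsn:Domain .\n"
    )


document = "\n".join([
    PREFIXES,
    SCHEMA,
    external_concept("malus pumila", "WordNet"),
    domain_class("Art"),
])
print(document)
```

Generating the statements as plain strings keeps the sketch dependency-free; a production pipeline would more likely rely on an RDF or OWL library to handle escaping and validation.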
As an example, the resulting definition for the domain "Ceramics" includes the annotation property subDomainOf, representing the fact that "Ceramics" is a subdomain of "Art."
To represent the domain weights described in Section 3.3, we provided the definitions for the DomainScore, DomainOccScore, and DomainCooccScore classes, as follows:
The datatype property score represents a numeric weight:
The following object property domain represents the domain related to a score:
Finally, the object properties referTo, source, and externalSource bind a DomainScore to one or more SenticConcepts or ExternalConcepts:
As an example, the domainOccScore("apple," D_food), defined in Section 3.3, is represented as follows:
Additionally, the domainCooccScore("apple," "company," D_technology), defined in Section 3.3, is represented as follows:

Results
DomainSenticNet was the result of our investigations aimed at achieving research objectives RO1, RO2, and RO3 (see Section 2).
The proposed methodology was the result of pursuing RO4 (see Section 2), which called for a generalized methodology that could be easily adapted to cover additional concepts and domains. In fact, the methodology can generate similar resources by simply using different domain corpora and external knowledge bases as input (see Fig. 3). Moreover, the methodology can be used to provide both domain distributional information and OWL representations for semantic networks other than OntoSenticNet, such as DBpedia and WebIsADB [17].
DomainSenticNet can be enhanced as a dynamic resource in two ways:
1. by integrating significant variations in the concept collections and in the domain distribution of occurrences and co-occurrences linked to future releases of the domain corpora and external knowledge bases; and
2. by including timestamps (e.g., campaign start times) of the domain corpora (e.g., dumps of Kickstarter campaign URLs⁹) or other time references, thereby adding a temporal dimension to the domain distributional information.
To address the above-mentioned dynamicity, we created a project Web page¹⁰ and established a maintenance schedule for the generation of time-based update releases.
⁹ https://webrobots.io/kickstarter-datasets/
¹⁰ https://github.com/needindex/domainsenticnet

Domain-Aware Kickstarter Campaign Success Prediction with DomainSenticNet
In this section, we present an example application of DomainSenticNet.
GameOn [16] is a prototype application designed to support the authoring of successful crowdfunding campaigns in Kickstarter.
The main characteristics of GameOn¹¹ are:
- It automatically induces (by means of clustering) a partition of semantically related domain aspects mined from user-generated product and service reviews, with each cluster representing an "influencing factor" for the campaign success;¹²
- It employs SenticNet to perform an ABSA and to identify emotional intensities expressed in textual campaign descriptions for the above-mentioned domain aspects;
- It aggregates the above-mentioned emotional intensities into a statistical index (NeedIndex), which: i) identifies the most influencing factors of the campaign success and ii) calibrates an objective and key result (OKR) scale to interpret NeedIndexes, through the identification of the low and high emotional intensity bounds, which delimit the low, medium, and high emotional intensity states;
- It leverages DomainSenticNet to further tune (for a given domain of interest) the OKR scale for the interpretation of the emotional intensities.
¹¹ https://github.com/needindex/gameon
¹² It is worth noting that the tool can also process human-crafted partitions of the domain aspects.
Finally, the application compares the computed NeedIndexes with the average of the corresponding indexes of the successful "mobile games" campaigns during the past three seasons (see Fig. 10, parts B and C). Therefore, in this application, NeedIndexes are used both to train the model for campaign success forecasting and to provide highly interpretable explanations of the prediction outcomes. NeedIndexes are thus effective indicators used by the application to suggest actions to be performed on the textual descriptions to refine the emotional intensities expressed with respect to influencing factors (i.e., clusters). Using DomainSenticNet, the application can also provide a domain adaptation (at a cluster level) of the NeedIndex OKR scales of interpretation, whereby the resulting states of emotional intensities are calibrated with respect to the domainOccScore (defined in Section 3) for the "mobile games" domain.
To convey the previously mentioned calibration of the OKR scales (see Fig. 11), Fig. 10, part C, shows both the "domain adapted NeedIndex bounds" and the "domain relevance." The domain-adapted emotional intensity states reflect both the average emotional intensity and the domain relevance for successful and unsuccessful campaigns, respectively.
In the "education" cluster, the medium emotional intensity state produced lower values for two main reasons: i) in the considered Kickstarter dataset, the emotional intensities provided for the corresponding influencing factors in the "mobile games" domain were lower than the average observed over the previous three seasons with respect to other aspects; and ii) the average domainOccScore of the corresponding aspects indicated a lower domain pertinence.
Fig. 10 A screenshot of the GameOn user interface

Related Works
In this paper, we have presented DomainSenticNet as a resource to extend OntoSenticNet, a state-of-the-art commonsense ontology [14].
OntoSenticNet is an ontological representation of SenticNet [11], which is a resource resulting from the combined application of symbolic and sub-symbolic artificial intelligence methodologies to automatically discover conceptual primitives from text and link them to commonsense concepts and named entities. SenticNet includes the definition of 100K concepts (called SenticConcepts).¹⁵ Each SenticConcept (see Fig. 1 for a visual representation of the concept "apple") is defined by: i) a multiword expression; ii) the weights for the four dimensions of the Hourglass of Emotions model [29] (i.e., pleasantness, attention, sensitivity, and aptitude); iii) primary and secondary mood labels (e.g., "#interest," "#admiration"); iv) a polarity score; and v) a collection of five semantically related SenticConcepts.
OntoSenticNet is an ontological definition of the semantic network induced by the 100K SenticConcepts. Its main characteristic is its ability to provide a precise conceptual hierarchy, including associated concepts and sentiment values. Hence, OntoSenticNet is a preferential resource for developing state-of-the-art applications of sentiment analysis based on SenticNet.
In recent years, SenticNet and OntoSenticNet have represented important research developments. In particular, the findings from Cambria's research group have enabled a novel interdisciplinary field of research known as sentic computing [7]. Within sentic computing, many successful investigations have generated novel insights in the domains of knowledge representation [2], deep learning-based ABSA [24], business intelligence [19], social media marketing [6], recommender systems [3], and financial forecasting [38], to name only a few.
In the remainder of this section, we summarize the relevant literature pertaining to the key aspects of the definition and construction of the DomainSenticNet resource.
In constructing the proposed resource, with the aim of collecting neighborhood semantically related concepts from external knowledge graphs, we applied basic graph mining techniques (as described in Section 3.1). In general, the task of collecting semantically related concepts from affordable or noisy automatically acquired external knowledge graphs can be performed by sophisticated approaches (e.g., [26]). As an example (see [27] for a recent survey), the authors of [30] experimented with similarity expansion-based techniques and obtained high levels of efficiency and precision with regard to the task of extending new concepts in a given knowledge base.
As already mentioned, the backbone of DomainSenticNet is the OntoSenticNet ontological description of SenticNet. One of the key characteristics of SenticNet is that all concepts are defined with valued attributes derived from the Hourglass of Emotions model [9].¹⁶ Therefore, SenticNet is considered an appropriate knowledge base for the development of human-interpretable sentiment analysis approaches.
The availability of the above-mentioned resources is beneficial for all ontology-driven sentiment analysis (ODSA)-based applications. Specifically, the authors of [4] recently surveyed studies applying ODSA to customer reviews. Furthermore, as an example of an ODSA-based approach, the authors of [25] presented a hybrid solution for sentence-level ABSA using a lexicalized domain ontology in combination with neural attention networks.
Researchers in this field are also exploring the creation of new resources to be leveraged in ODSA-based applications.
As an example, in [23], the authors presented a methodology to extend ontologies in the "Materials Science" domain. The presented approach leveraged the titles and abstracts of 600 domain publications and complemented a given ontology with additional concepts and axioms by means of a phrase-based topic model approach. In a similar direction, the authors of [39] proposed the addition of SOBA, a semiautomated methodology to generate ontologies, to ODSA applications. In contrast to the works mentioned above, our methodology (see Section 3) is unsupervised and can be easily adapted to include other external knowledge bases and multiple domain corpora. In this way, our approach automatically generates a high coverage of domain-relevant concepts (not included in OntoSenticNet) and related distributional information for an arbitrarily defined set of domains of interest. Additionally, in the present paper, we discuss a real application that benefited from the availability of DomainSenticNet, in terms of both sentiment analysis performance and ease of interpretation (see Section 4).
¹⁵ SenticNet 6 has recently been released. This updated resource now contains 200K concepts [8].
¹⁶ A recent model revision is described in [35].
As discussed in the Introduction (see Section 1), DomainSenticNet is suitable for use in domain-aware sentiment analysis applications. Such applications have recently been improved due to advancements in semisupervised learning [15] and, more specifically, in semisupervised learning for social data analysis [5,20]. Researchers are experimenting with semisupervised learning as a potentially more robust solution to problems such as word polarity disambiguation [37] and the extraction of actionable information from unstructured text [21]. As an example, in [22], the authors presented a deep learning approach named ConvNet-SVM_BoVW for fine-grained sentiment analysis. The model combined textual and visual features built on a convolutional neural network (ConvNet) enhanced with the contextual scoring mechanism of SentiCircle [31]. The proposed model performed sentiment polarity classification with 91% accuracy. Moreover, in [1], the authors recently provided a stacked ensemble-based methodology to assess the emotional intensities in texts related to a general domain and performed sentiment analysis in the financial domain. With respect to the two above-mentioned studies, and in line with the findings of [33], the distributional information of DomainSenticNet may be coupled with contextual semantic features to address the problem of word polarity disambiguation. Finally, our resource may also be leveraged to improve the interpretability and explainability of sentiment analysis outcomes (see Section 4, in which we discuss these two properties through a real application).

Conclusions
This paper has presented DomainSenticNet, a resource that extends the OntoSenticNet commonsense ontology with: i) additional related concepts harvested from external knowledge bases and ii) distributional information on the occurrences and co-occurrences of each OntoSenticNet concept and related concepts in domain corpora. The paper also describes the methodology we adopted to generate DomainSenticNet. This methodology can be easily adapted to process different domain corpora and external knowledge bases to generate domain-aware resources similar to ours and to extend semantic networks other than OntoSenticNet. Therefore, this methodology also enables the computation of domain-adapted scales of interpretation to benchmark domain ABSA application outcomes (as shown in Section 4).
To provide a concrete example of the benefit of DomainSenticNet to a variety of applications, we described a prototype tool for successful Kickstarter campaign authoring and campaign success prediction. Specifically, we discussed the high human interpretability level of both the prediction outcomes and the changes suggested for campaign descriptions to improve the likelihood of success. Moreover, the domain distributional information provided by DomainSenticNet enables it to produce domain-adapted scales of interpretation for predictive features at the level of influencing factors.
Regarding resource dynamicity (discussed in Section 3.5), we identify two opportunities: i) integrating updated releases (including new portions of the domain corpus) and ii) extending the current DomainSenticNet ontology schema with the inclusion of a time dimension. Additional dynamicity can be further leveraged by means of applying the proposed methodology (see Section 3) to other application-specific corpora. For instance, in the e-commerce domain, product and service reviews can be leveraged to capture the dynamics and trends of emotional intensities within customer opinion statements. Therefore, DomainSenticNet provides a basis for further interdisciplinary research within behavioral economics, applied data sciences and applied mathematics, with the aim of increasing the resource "dynamicity" to apply to an unlimited range of applications.
Additionally, to address the above-mentioned interdisciplinary investigations, we aim to study the effectiveness of causal inference approaches such as the DoWhy [32] framework. The DoWhy framework can be leveraged to gain insight into cause-and-effect relationships when domain adaptation is applied. Such insights can then support the development and the interpretation of calculated domain-aware emotional intensity weights. Specifically, we are interested in the ability of the DoWhy approach to identify the correlation magnitude of unexploited features in classification models [28], thus enabling, for example, the magnitude of missing domain concepts to be determined. The current version of DomainSenticNet does not include sentiment polarities for ExternalConcepts; instead, it references OntoSenticNet for SenticConcept sentiment polarities. Therefore, other possible future research might aim at "propagating" the Hourglass of Emotions dimension weights and polarities to a collection of added external concepts. In addition, similar to [8,11], our resource opens an avenue for further research on the generation of contextual domain embeddings in deep neural network-based applications. Finally, as discussed in Section 5, approaches such as [1,21,22] can leverage DomainSenticNet as an effective resource to improve the interpretability and explainability of domain-aware sentic applications.