1 Introduction

The fundamental principle underlying machine learning classifiers is generalization – the ability to form a decision boundary that assigns new input to known classes. When training a supervised classifier, it is common to assume that the classes to be recognized are present in both the training and test data [49]. However, in an open world, training on all conceivable classes of input is impractical. This limitation introduces the need for novelty detection – the task of spotting input classes that have not been seen before. The problem is particularly severe in text-based supervised classification due to the many-faceted nature of natural language, which gives rise to multiple application-dependent interpretations. Researchers have long tried to address novelty detection in natural language, yet no single best model has emerged; the success of each model depends on the properties of the particular dataset.

The problem of novelty detection arises in many tasks, such as fault detection [16] and handwritten alphabet recognition [54]. In general, one applies novelty detection when it is required to know whether a given input is similar to or significantly different from the training data. For natural language text, the novelty detector should discern that a text does not belong to a predefined set of topics. Several challenges make such novelty detection particularly difficult:

  1. Textual information tends to be diverse, composed of large vocabularies.

  2. Language and topics are typically evolving, making the novelty detection problem dynamic [21].

Lately, the aforementioned challenges have manifested when using supervised learning to build chatbots, an application area that is gaining traction. A chatbot typically needs to handle the language of a multitude of users with evolving information requirements. As such, it must be able to determine when it is capable of answering a query and when it faces a new topic.

The majority of the existing literature on text-based novelty detection addresses one of the following granularity levels:

  1. Event-level techniques [4] perform topic detection and tracking on a stream of documents.

  2. Document-level techniques [17] classify an incoming document as known or novel based on its content.

  3. Sentence-level techniques [6] look for novel sentences within a particular document.

Usually, the sentences/documents are ranked based on a similarity score obtained by comparing them with previously seen sentences/documents. For instance, the Maximal Marginal Relevance (MMR) model proposed in [14] assigns low scores to previously seen sentences/documents, while assigning high scores to novel ones.

Figure 1 illustrates the problem of novelty detection, contrasting it against anomaly and outlier detection. Anomaly detection [15] concerns discovering anomalies, which are invalid data points. Outlier detection [3, 29], on the other hand, flags legitimate data points that deviate significantly from the mean. Finally, novelty detection [43] is the discovery of entirely new types of data points.

Fig. 1 Visualization of outlier detection, anomaly detection and novelty detection

In contrast to prior work, we here focus on novelty detection at the word level. To this end, we propose a new interpretable machine learning approach for calculating novelty scores for the words within a sentence. The calculation is based on the linguistic patterns captured by a Tsetlin Machine (TM) in the form of AND-rules (i.e., conjunctive clauses). To the best of our knowledge, this is the first study of its kind on this problem.

Problem definition

In the supervised classification setting, a set of i pre-labeled data points D = {(v1,y1),(v2,y2),…, (vi,yi)} is used for training. Here, vi is the ith input example and yi is its class. The input vi is a t-dimensional real-valued vector \((x_{1}, x_{2}, \ldots , x_{t}) \in \mathbb {R}^{t}\), where xo refers to the oth element of the vector. The class yi ∈ Y = {1,2,…,Cl}, in turn, is an integer class index referring to one out of Cl classes. Learning a classifier entails constructing a classification function f(v;D), \(f: \mathbb {R}^{t} \rightarrow Y\), based on the data D. The function simply assigns a label y to the data point v. Our emphasis is novelty scoring, which can be seen as another function z(v;D), \(z: \mathbb {R}^{t} \rightarrow \mathbb {R}\). This function computes a real-valued novelty score for input data point v, with the purpose of discerning new classes not found in Y. In this way, a classifier can return the correct class label while flagging novel examples. Considering each element in v to represent a specific word, this paper further extends novelty detection by introducing a method for breaking down the overall score z(v;D) for v into the contribution of each element xo. By doing so, we break down novelty into interpretable phrases.
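
To make the interplay between f and z concrete, the following is a minimal Python sketch, using a hypothetical nearest-centroid classifier and distance-based novelty score as stand-ins (not the TM-based method developed later):

```python
import numpy as np

def classify_with_novelty(v, f, z, threshold):
    """f: R^t -> Y assigns a known class label; z: R^t -> R scores novelty.
    Returns the predicted label together with a flag marking v as novel."""
    return f(v), z(v) > threshold

# Toy stand-ins: two known classes with centroids in R^3
centroids = {1: np.zeros(3), 2: np.ones(3)}
f = lambda v: min(centroids, key=lambda y: np.linalg.norm(v - centroids[y]))
z = lambda v: min(np.linalg.norm(v - c) for c in centroids.values())

# A point far from both centroids still receives a label, but is flagged
print(classify_with_novelty(np.array([5.0, 5.0, 5.0]), f, z, threshold=2.0))
```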

Paper contributions

In this paper, we use the TM to construct conjunctive clauses in propositional logic. In this manner, we capture frequent patterns in the data D, which we then utilize to characterize the known classes Y comprehensively. The novelty score is then calculated based on examining the clauses that match the given input. By further looking into the composition of each clause, we are able to break down the novelty score into the contribution of the different phrases. This decomposition is based on training clauses for the novel data and then measuring the relative frequency of each word inside the clauses for the known classes, contrasted against the relative frequency obtained from the clauses of the novel class. These scores can, in turn, be adopted as input features to machine learning classifiers for novelty detection. Similarly, contextual scores can be calculated simply by inspecting each word’s clauses, providing a local perspective for both novel and known classes.

The remainder of the paper is organized as follows. In Section 2, we first summarize related work before we present the details of the TM in Section 3. This forms the basis for our novelty description architecture, covered in Section 4. In Section 5, we present our empirical results, concluding the work in the last section.

2 Related work

Several studies have been carried out on supervised multiclass classification in a closed-world setting [5]. There is a dearth of work addressing open-world settings [33], with distance-based methods being one of the earliest approaches [28]. These approaches rely on nearest neighbor search, which introduces scalability issues when dealing with larger datasets. Another class of methods is based on single-class classifiers, including One-Class SVM [50] and SVDD [55]. Further, the decision score from an SVM has been used to produce a probability distribution for novelty detection [44]. As no negative training samples are used, single-class classifiers struggle with maximizing the class margin. To overcome this problem of One-Class SVMs, a learning method named center-based similarity space (CBS) was proposed in [20], which transforms each document within a closed boundary into a central similarity vector that can be used in a binary classifier.

Probabilistic methods have also been utilized for novelty detection [43]. In [30], a technique that thresholds the entropy of the estimated class probability distribution is proposed. In that method, choosing the entropy threshold requires prior knowledge. Additionally, the class probability distribution can be misleading when novel data points fall far from the decision boundary. In [32] and [46], an active learning model is proposed to both discover and classify novel classes during training. However, the appearance of novel instances during testing is not considered.

DNNs have recently been used to address the problem of novelty detection. In [61], a two-class SVM classifier is adopted to categorize known and novel classes, with an adversarial sample generation (ASG) framework [23] used to generate positive and negative samples. Similarly, [37] employs generative adversarial networks (GANs), where the generator produces a mixture of known and novel data. The generator is trained with a so-called feature matching loss, and the discriminator performs simultaneous classification and novelty detection. In computer vision, the problem of novel image detection is addressed by introducing the concept of open space risk [49]. This is achieved by reducing the half-space of a binary SVM classifier with two parallel hyperplanes that bound the positive region. Although the binary SVM reduces the positive region to half-spaces, its open space risk is still infinite. In [5], a method called OpenMAX is proposed, which estimates the probability of an input belonging to a novel class. In general, the major weaknesses of these methods are high computational complexity and uninterpretable inference. Two state-of-the-art GAN-based methods for unsupervised outlier detection, Single-Objective Generative Adversarial Active Learning (SO-GAAL) and Multi-Objective Generative Adversarial Active Learning (MO-GAAL), were proposed in [41]. They are based on a min-max game between a generator and a discriminator. The training process of the generator is paused before convergence to synthesize outliers, which are subsequently used to train the discriminator to recognize outliers. However, these methods are primarily designed for high-dimensional data, requiring extensive problem-specific hyperparameter tweaking. The unsupervised learning method COPOD [40] is a more recent approach, inspired by copulas for modeling multivariate data distributions. In comparison to other methods, COPOD is computationally efficient, interpretable, and unaffected by feature dimension. However, it fails to handle complex features and intricate nonlinear relations.

Apart from the studies on document-level novelty detection, novelty detection at the event level arises from topic detection, which focuses on online event and story detection [38]. Work at the event level primarily consists of clustering algorithms that measure the closeness of incoming events or stories to one of the clusters, depending on a pre-defined threshold. Novelty detection at the sentence level was investigated in the Text Retrieval Conferences (TREC) by highlighting sentences that include novel information given a topic and a list of documents [52]. Based on TREC, many studies have been conducted on novelty detection at the sentence level [56, 63], employing term translations, Principal Component Analysis (PCA) vectors, Support Vector Machine (SVM) classification, named entity patterns, etc. Likewise, a few approaches have been introduced for learning sentence embeddings, including SkipThought [36], Conceptual Sentence Embedding [58], and FastSent [31]. However, these embedding approaches are highly dependent on domain-specific downstream tasks. Recently introduced powerful language models, such as ELMo [42] and BERT [18], have been successful for transfer learning and are able to learn dynamic sentence embeddings in an unsupervised manner.

In [22], a unified attention architecture is proposed to deal with vector representations of text input in NLP, investigating how information can be retrieved from attention. Further, [51] examines whether attention weights provide any interpretability by manipulating the weights in pretrained text classification models. Using an intermediate representation erasure method, the authors demonstrate that attention weights are unreliable predictors of the relative significance of specific inputs and thus do not accurately explain the model's decision-making. Additionally, [53] employed a novel approach for visualizing the attention score for each token, constituting the first study on interpretability analysis by visualizing and scoring at the word level. However, as explained in [34], the scoring acquired using attention methods does not provide a meaningful explanation. A more advanced scoring method based on the Masked Language Model (MLM) [48] uses a pretrained MLM to score sentences using pseudo-log-likelihood scores (PLLs), which involves masking each token one by one. This method becomes unsuitable for scoring all the tokens of a dataset as the computational complexity rises. Likewise, recent keyword extraction (KE) algorithms such as YAKE [13] and KeyBERT [26] are also used to extract the top-scoring tokens from a trained model. To the best of our knowledge, no existing novelty detection method measures each word's contribution to the novelty. In this study, we extend novelty detection with a method for scoring each word's contribution to the overall novelty, giving researchers a clear view of the reasoning behind, and interpretation of, the results the algorithm produces.

3 Tsetlin machine (TM) architecture

The TM, proposed in [24], is a recent approach to pattern classification, regression, and novelty detection [1, 8, 25]. It captures the frequent patterns of the learning problem using conjunctive clauses in propositional logic. Each clause is a conjunction of literals, where a literal is a propositional/Boolean variable or its negation. Recent research reports that the TM performs competitively with state-of-the-art deep learning networks in text classification [7, 47, 59, 60] along with parallel and asynchronous architecture [2] for faster learning across diverse tasks. Further, theoretical studies have uncovered robust convergence properties [35, 62].

A basic TM accepts a vector \(X= (x_{1}, \ldots , x_{o}) \in \{0,1\}^{o}\) of o Boolean features as input. For text input, it is typical to booleanize the text to form a Boolean set of words, as suggested in [7]. The input features, together with their negated counterparts, \(\bar {x} = \lnot x = 1-x\), form a literal set L = {x1,…,xo,¬x1,…,¬xo}. For classification problems, the sub-patterns associated with the classes are captured by the TM using m conjunctive clauses \(C_{j}^{+}\) or \(C_{j}^{-}\). The subscript j = 1,…,m/2 denotes the clause index, while the superscript indicates the polarity of a clause. In brief, half of the clauses are assigned positive polarity, i.e., \(C_{j}^{+}\), and the other half are assigned negative polarity, i.e., \(C_{j}^{-}\). The positive polarity clauses vote for the input belonging to the class favored by the TM, while the negative polarity clauses vote against that class, that is, for other classes.

A clause \(C_{j}^{\xi }, \xi \in \{-,+\},\) is formed by ANDing a subset \(L_{j}^{\xi } \subseteq L\) of the literal set. That is, clause \(C_{j}^{\xi }\) with polarity ξ and literal set \(L_{j}^{\xi }\) can be written as:

$$ C_{j}^{\xi} (X)=\bigwedge_{l \in L_{j}^{\xi}} l = \prod\limits_{l \in L_{j}^{\xi}} l. $$
(1)

The clause evaluates to 1 if and only if all the literals of the clause also evaluate to 1. For example, the clause \(C_{j}^{\xi }(X) = x_{1} x_{2}\) consists of the literals \(L_{j}^{\xi } = \{x_{1}, x_{2}\}\) and outputs 1, if x1 = x2 = 1. The final classification decision is obtained by subtracting the negative votes from the positive votes, and then thresholding the resulting sum using the unit step function u:

$$ \hat{y} = u\left( \sum\limits_{j=1}^{m/2} C_{j}^{+}(X) - \sum\limits_{j=1}^{m/2} C_{j}^{-}(X)\right). $$
(2)

For example, the classifier \(\hat {y} = u(x_{1} \bar {x}_{2} + \bar {x}_{1} x_{2} - x_{1} x_{2} - \bar {x}_{1} \bar {x}_{2})\) captures the XOR-relation.
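
To make (1) and (2) concrete, the following Python sketch evaluates the XOR classifier above, representing each clause by the indices of its plain and negated literals (a toy illustration of clause evaluation, not the TM learning procedure):

```python
def clause(X, plain, negated):
    """Eq. (1): a clause outputs 1 iff all plain literals are 1
    and all negated literals are 0."""
    return int(all(X[i] == 1 for i in plain) and
               all(X[i] == 0 for i in negated))

def classify(X, positive, negative):
    """Eq. (2): sum positive votes, subtract negative votes,
    and threshold the result with the unit step function."""
    votes = sum(clause(X, p, n) for p, n in positive) \
          - sum(clause(X, p, n) for p, n in negative)
    return int(votes >= 0)

# XOR: positive clauses x1 AND NOT x2, NOT x1 AND x2;
#      negative clauses x1 AND x2, NOT x1 AND NOT x2.
positive = [({0}, {1}), ({1}, {0})]
negative = [({0, 1}, set()), (set(), {0, 1})]
for X in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(X, classify(X, positive, negative))  # prints 0, 1, 1, 0
```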

For learning, the TM employs a team of Tsetlin Automata (TA), one TA per literal l ∈ L. Each TA performs one of two actions: either include or exclude its designated literal. Each clause statistically forwards feedback to its individual TAs. The TM employs Type I and Type II feedback. These feedback types control the reward, penalty, or inaction received by the TAs depending on six factors: (1) target output (y = 0 or y = 1), (2) clause polarity, (3) clause output (Cj = 0 or 1), (4) literal value (x = 1 or ¬x = 1), (5) vote sum, and (6) the current state of the TA. Type I feedback is designed to produce frequent patterns, while Type II feedback increases the discriminating power of the patterns (see [25] for details). The feedback guides the complete system of TAs towards a Nash equilibrium. At any point in the training process, we have m conjunctive clauses per class, half of them positive and half of them negative. These can be retrieved and deployed upon completion of training.

4 Novelty description

By novelty description, we mean the task of characterizing novel textual content at the word level. For instance, the known content may be reviews of mobile phones, while the novel content could be reviews of grocery stores. For this example, one may define the novel content using words associated with grocery stores. However, describing novelty at the word level is nontrivial because the meaning of words varies depending on the context they appear in. For example, consider the word “bat”: it typically manifests in two distinct contexts, denoting either “animal” or “sports”. Likewise, the word “bank” can refer to “river bank” or “cash bank”. That is, when contextual meaning is considered, the novelty of the words “bat” and “bank” can differ based on their respective uses. As a result, measuring and describing novel content is a challenging problem.

In general, one can detect and characterize novel content by considering the probability of observing textual content X, given that the content is known. We denote this probability distribution by pknown(X). Assume that the corresponding probability distribution pnovel(X) for novel content is also available. Then, the optimal novelty detection test for a given false positive rate (α) can be obtained by thresholding the likelihood ratio pnovel(X) / pknown(X) [39].

Since neither pknown(X) nor pnovel(X) is available to us, we must estimate them using training examples. Inspired by the work in [9] on Semi-Supervised Novelty Detection (SSND), we use two sets of examples. One set represents known content, while the other represents novel content. We obtain these sets by employing a binary classifier that can distinguish between known and novel content, such as the one we proposed in [8].

4.1 Identifying novel word candidates

In our approach, we begin by training a TM on input texts represented as Boolean bag-of-words, i.e., as word sets. A propositional variable represents each word in the vocabulary, capturing the presence/absence of the corresponding word in the input text. We group the texts into two classes, Known and Novel. The first represents known content, and the second represents novel content. Our task is to describe how the second group of text is novel at the word level. To this end, we begin by identifying novel word candidates, followed by scoring and ranking the words based on their contribution to novelty.

Figure 2 shows our architecture for identifying novel word candidates. As seen, upon training, we obtain the clauses of the two classes, Known and Novel. We extract all the words included in the clauses for each class. Each clause contains a combination of both plain (\(\mathcal {P}_{L}\)) and negated (\(\mathcal {N}_{L}\)) words. As such, the plain and the negated words serve two different roles: the plain words characterize the corresponding class, while the negated words characterize the other class. We exploit this property as follows, building two bags-of-words (BOW). The first is a bag of known words, referred to as \({\mathscr{B}}_{K}\), and the second is a bag of novel words, referred to as \({\mathscr{B}}_{N}\).

Fig. 2 Tsetlin Machine architecture for generating word sequences

For class Known, we perform the following procedure:

  • We consider the words included in positive clauses first. Here, the plain words \(\mathcal {P}_{L}\) are added to the bag of known words \({\mathscr{B}}_{K}\), while the negated words are placed in the bag of novel words \({\mathscr{B}}_{N}\).

  • For negative clauses, we do the opposite. The plain words \(\mathcal {P}_{L}\) are added to the novel words bag \({\mathscr{B}}_{N}\). The negated words \(\mathcal {N}_{L}\), on the other hand, are added to the known word bag \({\mathscr{B}}_{K}\).

The above procedure is inverted for class Novel, as sketched in code below:

  • For the positive clauses, the plain words \(\mathcal {P}_{L}\) are added to the novel word bag \({\mathscr{B}}_{N}\), while the negated words are added to the known word bag \({\mathscr{B}}_{K}\).

  • Conversely, for the negative clauses, the plain words are added to \({\mathscr{B}}_{K}\), characterizing the known class, while the negated words \(\mathcal {N}_{L}\) are added to \({\mathscr{B}}_{N}\).
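
The following is a compact sketch of the above routing rules, assuming each learned clause is available as its sets of plain and negated words (a hypothetical data layout; clause extraction depends on the TM implementation):

```python
from collections import Counter

def build_bags(clauses):
    """clauses: iterable of (cls, polarity, plain, negated) tuples, with
    cls in {"Known", "Novel"} and polarity in {"+", "-"}."""
    B_K, B_N = Counter(), Counter()
    for cls, polarity, plain, negated in clauses:
        # Plain words of positive clauses and negated words of negative
        # clauses characterize the clause's own class; the remaining
        # words characterize the opposite class.
        own, other = (B_K, B_N) if cls == "Known" else (B_N, B_K)
        if polarity == "+":
            own.update(plain); other.update(negated)
        else:
            other.update(plain); own.update(negated)
    return B_K, B_N
```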

4.2 Scoring word novelty

With the word bags \({\mathscr{B}}_{K}\) and \({\mathscr{B}}_{N}\) available, we calculate novelty scores at the word level as follows. From the unique words in the bags \({\mathscr{B}}_{K}\) and \({\mathscr{B}}_{N}\), we produce two corresponding word sets, \(\mathcal {S}_{K}\) and \(\mathcal {S}_{N}\). Assume these respectively contain K and N unique words:

$$ \begin{array}{@{}rcl@{}} \mathcal{S}_{K} &=& \{s_{1}, s_{2},\ldots, s_{k},\ldots, s_{K}\}, \\ \mathcal{S}_{N} &=& \{s_{1}, s_{2}, \ldots, s_{n},\ldots, s_{N}\}. \end{array} $$
(3)

Here, sk represents a specific word in the set \(\mathcal {S}_{K}\), while sn represents a specific word in the set \(\mathcal {S}_{N}\).

We next estimate the occurrence probability \(p_{s_{i}}\) of each word si in \(\mathcal {S}_{K}\) for the known class. The estimate is based on the relative frequency of si in the word bag \({\mathscr{B}}_{K}\), as given by (4):

$$ p_{s_{i}}^{\mathcal{K}} = \frac{\mathcal{F}_{i}^{\mathcal{K}}}{{\sum}_{k=1}^{K} \mathcal{F}_{k}^{\mathcal{K}}}. $$
(4)

Here, \(\mathcal {F}_{i}^{\mathcal {K}}\) is the frequency of word si in \({\mathscr{B}}_{K}\), i.e., the number of times that word si has the appropriate role in one of the clauses (as defined in the previous section). To prevent infinite or zero scores, we assume that every word has a minimum frequency of 1. In the following, we denote the set of relative frequencies for the words from \({\mathscr{B}}_{K}\) by \(p_{\mathcal {K}}\), while \(p_{\mathcal {N}}\) is the set of relative frequencies for the words from \({\mathscr{B}}_{N}\), as captured by (5):

$$ \begin{array}{@{}rcl@{}} p_{\mathcal{K}}&=& \{p_{s_{1}}^{\mathcal{K}}, p_{s_{2}}^{\mathcal{K}}, \ldots, p_{s_{K}}^{\mathcal{K}}\},\\ p_{\mathcal{N}}&=& \{p_{s_{1}}^{\mathcal{N}}, p_{s_{2}}^{\mathcal{N}},\ldots, p_{s_{N}}^{\mathcal{N}}\}. \end{array} $$
(5)

The calculation of the novelty score for each word depends on whether \(s_{i} \in \mathcal {S}_{K}\), \(s_{i} \in \mathcal {S}_{N}\), or both, as shown in (6):

$$ \mathit{Score}(s_{i}) =\begin{cases} \frac{p_{s_{i}}^{\mathcal{N}}}{p_{s_{i}}^{\mathcal{K}}}&\text{if } s_{i} \in \mathcal{S}_{K} \cap \mathcal{S}_{N},\\ 0&\text{if } s_{i} \in \mathcal{S}_{K} \setminus \mathcal{S}_{N},\\ \infty&\text{if } s_{i} \in \mathcal{S}_{N} \setminus \mathcal{S}_{K}. \end{cases} $$
(6)

Here, \(p_{s_{i}}^{\mathcal {N}}\) and \(p_{s_{i}}^{\mathcal {K}}\) denote the estimated occurrence probabilities of the word si from \(p_{\mathcal {N}}\) and \(p_{\mathcal {K}}\), respectively. The score quantifies how much a word contributes to making a sentence/document novel; a higher score signals higher novelty and vice versa. Figure 3 shows the resulting TM-based architecture and flow of information for the above scoring approach.

Fig. 3 Novelty scoring calculation for each word
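
The scoring of (4)–(6) reduces to a few lines of Python; a minimal sketch, assuming the bags from Section 4.1 are Counter objects and applying the minimum frequency of 1 described above (the smoothing keeps the 0 and ∞ cases of (6) finite; the exact normalization constants are an assumption):

```python
from collections import Counter

def word_scores(B_K: Counter, B_N: Counter) -> dict:
    """Novelty score per Eq. (6): ratio of the relative frequencies of
    Eq. (4), with every word given a minimum frequency of 1 in each bag."""
    vocab = set(B_K) | set(B_N)
    total_K = sum(max(B_K[w], 1) for w in vocab)
    total_N = sum(max(B_N[w], 1) for w in vocab)
    return {w: (max(B_N[w], 1) / total_N) / (max(B_K[w], 1) / total_K)
            for w in vocab}
```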

Additionally, we propose a contextual scoring approach to capture multiple word meanings determined by context. We presume that words appearing in the same clause are semantically related, and accordingly, we use clause co-occurrence of words to measure semantic relations. The intent is to differentiate between, for example, the meaning of “apple” in “apple phone” and its meaning in “apple fruit”. We achieve this by leveraging clauses that capture “apple” and “phone” in combination with other clauses that capture “apple” and “fruit”.

The scoring is again performed in two steps:

  1. Rather than measuring the frequency of individual words, we now measure the frequency of co-occurrence among the TM clauses. For instance, consider the word pair (s1,s2) and the novel class, associated with a total number of m clauses. The frequency of the word pair occurring together in the clauses is then given as:

    $$ p_{s_{1}, s_{2}}^{\mathcal{N}}=\frac{\mathcal{F}_{s_{1},s_{2}}^{\mathcal{N}}}{m}. $$
    (7)

    Here, \(\mathcal {F}_{s_{1}, s_{2}}^{\mathcal {N}}\) is the number of times the word pair occurs together across the m clauses of the novel class.

  2. Finally, the contextual score for the word pair (s1,s2) in class Novel can be defined as:

    $$ \mathit{Score}_{\mathit{context}}^{\mathcal{N}}(s_{1}, s_{2}) = \frac{p_{s_{1}, s_{2}}^{\mathcal{N}}}{p_{s_{1}}^{\mathcal{N}} \times p_{s_{2}}^{\mathcal{N}}}. $$
    (8)

    Above, \(p_{s_{1}}^{\mathcal {N}}\) and \(p_{s_{2}}^{\mathcal {N}}\) are the individual relative frequencies of the two words from the previous subsection.

Notice how the above score increases with lower individual frequencies and higher joint frequency, measuring dependence over the clauses. In the same way, we can calculate dependence over the clauses for the known class as well.
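
The two steps amount to a pointwise-dependence measure over the clauses; a minimal sketch, assuming each clause of a class is available as the set of words it contains (hypothetical layout, as in the earlier sketches):

```python
def contextual_score(s1, s2, clauses, p_word):
    """Eqs. (7)-(8): clause co-occurrence frequency of the pair divided
    by the product of the individual word probabilities.
    clauses: list of word sets, one per clause of the class.
    p_word: relative frequency of each word (Section 4.2)."""
    m = len(clauses)
    pairs = sum(1 for c in clauses if s1 in c and s2 in c)
    p_pair = pairs / m                          # Eq. (7)
    return p_pair / (p_word[s1] * p_word[s2])   # Eq. (8)
```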

4.3 Case study

We now demonstrate our novelty description approach, step-by-step, using two example sentences from the sports domain. For illustration purposes, we consider the class Cricket to be Known and the class Rugby to be Novel.

  • Class : Cricket (Known) Text: England won the cricket match by hitting six in the last ball. Words: “England”, “won”, “cricket”, “match”, “hit”, “six”, “ball”.

  • Class: Rugby (Novel) Text: England won the rugby match despite using old ball. Words: “England”, “won”, “rugby”, “match”, “despite”, “old”, “ball”.

We first create the set of 10 unique words W = {“England”, “won”, “cricket”, “match”, “hit”, “six”, “ball”, “rugby”, “despite”, “old”} from the words in the two sentences, each with a unique index o. From this set, we produce the input feature vector for the TM, X = [x1,x2,…,x10]. Each propositional input xo in X refers to a particular word. Jointly, the propositional inputs are used to represent an input text. If a word wo ∈ W is present in the document, the corresponding propositional input xo is set to 1; otherwise, it is set to 0.

After TM training, we obtain a set of clauses, as exemplified in Table 1. The clauses \((C_{1}^{+})_{\mathcal {K}}\), \((C_{2}^{+})_{\mathcal {K}}\), \((C_{1}^{-})_{\mathcal {N}}\), \((C_{2}^{-})_{\mathcal {N}}\) vote for class Known, while \((C_{1}^{-})_{\mathcal {K}},~(C_{2}^{-})_{\mathcal {K}}\), \((C_{1}^{+})_{\mathcal {N}}, (C_{2}^{+})_{\mathcal {N}}\) vote for class Novel. These clauses are then used to produce two bags-of-words, \({\mathscr{B}}_{K}\) and \({\mathscr{B}}_{N}\). All the plain words in \((C_{1}^{+})_{\mathcal {K}}\), \((C_{2}^{+})_{\mathcal {K}}\), \((C_{1}^{-})_{\mathcal {N}}\), \((C_{2}^{-})_{\mathcal {N}}\) are placed in \({\mathscr{B}}_{K}\), while all the negated words are placed in \({\mathscr{B}}_{N}\). Since none of the words are negated in the clauses, we now have \({\mathscr{B}}_{K}\) = (“England”, “cricket”, “match”, “hit”, “six”, “cricket”, “six”, “cricket”, “won”, “six”, “ball”, “cricket”, “hit”, “six”). Correspondingly, all the plain words in \((C_{1}^{-})_{\mathcal {K}},~(C_{2}^{-})_{\mathcal {K}},~(C_{1}^{+})_{\mathcal {N}}, (C_{2}^{+})_{\mathcal {N}}\) are placed in \({\mathscr{B}}_{N}\), while all the negated words are placed in \({\mathscr{B}}_{K}\).

Table 1 Clauses with conjunctive word patterns for known and novel class

Within each bag-of-words, each word occurs with a certain frequency. For instance, the word “match” occurs once in \({\mathscr{B}}_{K}\) and twice in \({\mathscr{B}}_{N}\). Notice that the total number of word occurrences is different for each class – 14 words in class Known and 13 words in class Novel. Hence, the relative frequency for “match” in class Known becomes \(p^{\mathcal {K}}_{\mathit {match}} = \frac {1}{14} = 0.071\), while for class Novel it becomes \(p^{\mathcal {N}}_{\mathit {match}} = \frac {2}{13} = 0.154\). Table 2 lists the frequencies of the words per class.

Table 2 Relative frequency and score for each word

We are now ready to calculate the novelty score for each word in W. Consider the word “rugby” from the novel word set and the word “cricket” from the known word set. For “rugby”, we first calculate its relative frequency (4). In the bag-of-words \({\mathscr{B}}_{N}\) for class Novel, “rugby” occurs four times, i.e., \(\mathcal {F}^{\mathcal {N}}_{\mathit {rugby}} = 4\). Since we assume that a word has a minimum frequency of 1, we further have \(\mathcal {F}^{\mathcal {K}}_{\mathit {rugby}} = 1\), despite “rugby” not appearing in the text from class Known.

From Table 2, we observe that the total word frequencies for the known and novel classes are 14 and 13, respectively. Hence, the relative frequencies for “rugby” become \(p_{rugby}(\mathcal {N}) = 0.307\) for class Novel and \(p_{rugby}(\mathcal {K})= 0.071\) for class Known (4).

Because the clauses characterize each of the classes Known and Novel, “rugby” gets the relatively high novelty score \(\mathit{Score}_{\mathit{rugby}} = 4.651\): its relative frequency is high in the novel class and low in the known class. Conversely, the word “cricket” is repeated four times in \({\mathscr{B}}_{K}\) and once in \({\mathscr{B}}_{N}\). Its relative frequencies thus become \(p_{\mathit {cricket}}(\mathcal {K}) = 0.28\) for class Known and \(p_{\mathit {cricket}}(\mathcal {N})= 0.076\) for class Novel. Accordingly, the novelty score becomes \(\mathit{Score}_{\mathit{cricket}} = 0.271\), a low score denoting a strong inclination of the word towards the known class.

Overall, Table 2 shows how the words characterizing class Known get a relatively low novelty score, while those characterizing class Novel obtain high scores.
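
Plugging the case-study frequencies into (4) and (6) gives a small self-contained check (the bag totals 14 and 13 are taken from the example; the exact smoothing and normalization in the full pipeline may shift the values slightly):

```python
# Word frequencies from the running example (cf. Table 2)
F_K = {"match": 1, "cricket": 4, "rugby": 1}  # known bag, 14 words in total
F_N = {"match": 2, "cricket": 1, "rugby": 4}  # novel bag, 13 words in total

def score(word):
    p_K = F_K[word] / 14          # Eq. (4), known class
    p_N = F_N[word] / 13          # Eq. (4), novel class
    return p_N / p_K              # Eq. (6)

print(round(score("cricket"), 2))  # ~0.27: characteristic of class Known
print(round(score("match"), 2))    # ~2.15: mildly novel
print(round(score("rugby"), 2))    # ~4.3: strongly novel
```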

5 Results and discussions

In this section, we evaluate our proposed novelty description approach on two publicly available datasets: BBC Sports and Twenty Newsgroups. The performance of the TM framework for novelty detection was previously investigated in [8] and is summarized in Table 3. Notably, as has been found across several datasets, a one-class SVM on simple mean embeddings establishes a strong baseline. Here, we further explore our model's effectiveness at producing discriminative novelty scores at the word level using TM clauses. To obtain robust performance and ensure that the results are not unduly influenced by the data split, we perform one-class classification using leave-one-out evaluation on the 20 Newsgroups dataset. While this paper deals with the post-processing step after novelty detection, namely novelty scoring at the word level, the leave-one-out evaluation is necessary because this study leverages the performance of the TM framework for novelty detection. We employ the ROC AUC to quantify novelty detection performance, using the ground truth labels during testing. Table 4 shows the performance comparison of our method and the baseline algorithms, including a one-class classifier. In the leave-one-out setup, one of the classes is considered known, while the remaining classes are treated as novel. Training is conducted on the known class, whereas testing is carried out on samples from the novel classes. The ROC AUC is computed during testing under the assumption that samples from the known class are labeled y = 0 and samples from the novel classes y = 1. Our method outperforms the baseline algorithms in five out of six evaluation setups by a significant margin.

Table 3 Performance comparison of TM framework with cluster and outlier-based novelty detection algorithms
Table 4 ROC AUC (%) of one-class classification with leave-one-out evaluation on 20 Newsgroup

In the following, we compare the scoring mechanism of our framework with attention and TF-IDF as baselines. To ensure a fair comparison, the attention score for each word is calculated as described in Section 5.1.1. For TF-IDF, we calculate TF separately for the known and novel classes, while IDF is calculated using all the documents from both classes (to suppress common words such as stop words). Unlike attention and TF-IDF, our scoring considers both relevance and context, even when a word is present in most documents. For example, if a word from class Novel is also present in class Known, our model can nevertheless assign more weight to that word. This happens when a word, while syntactically the same in both classes, acquires a novel meaning in the novel class due to its appearance in a novel context. The latter contextual information is captured through those clauses of the novel class that trigger for that word. Attention and TF-IDF, in contrast, are not context-aware. Nevertheless, these methods prove especially beneficial on more extensive datasets, such as 20 Newsgroups and BBC Sports, since they filter out general language contexts that are less discriminative for the characterization of a text corpus, making them strong baselines for performance comparison.

To provide a comparison, we plot the cumulative frequency distribution (CFD) for the scores of (1) the words only found in the novel dataset, (2) the words only found in the known dataset, and (3) the words shared by both datasets. In brief, the CFD demonstrates that the word scores generated by the baseline are relatively similar for both known and novel classes. Thus, the baseline methods lack the discriminatory power necessary to distinguish between the two categories of words.

5.1 Baseline

5.1.1 Attention mechanism

We utilize the weights of the attention layer's input representation \(\mathcal {A}\) from the trained model. The importance of each token is calculated based on the attention it receives. For instance, if the attention to token \(c \in \mathcal {A}\) is higher than that to token \(d \in \mathcal {A}\), then c is assumed to be “more significant” than d to the model's output. In our work, the scores are calculated using the scaled dot-product attention mechanism [57].

Let us consider an input sequence of length o, X = (x1,x2,…,xo), where xi represents the ith token whose representation in the attention layer is \(h_{i} \in \mathbb {R}^{t}\). The attention score for the ith token is as follows:

$$ \alpha_{i} = \frac{h_{i} \times V}{\beta}, $$
(9)

where the parameter β is the scaling factor, and \(V \in \mathbb {R}^{t}\) is the context vector, which can be seen as a fixed query requesting the “most important token” from the input. The token representation hi can be either the word embedding or the encoder's output. The attention weight can be expressed as:

$$ a_{i} = \frac{\exp(\alpha_{i})}{{\sum}_{i^{\prime}} \exp(\alpha_{i^{\prime}})}. $$
(10)

Finally, the complete input sequence is denoted as:

$$ h = \sum\limits_{i} (a_{i} h_{i}). $$
(11)

In our experiments, we retrieve the attention score and weights for each token using (9) and (10), respectively.
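
As a minimal NumPy sketch of (9)–(11) (the token representations H and context vector V are assumed to come from the trained encoder):

```python
import numpy as np

def attention(H: np.ndarray, V: np.ndarray, beta: float):
    """H: (o, t) token representations; V: (t,) context vector.
    Returns the raw scores of Eq. (9), the softmax weights of Eq. (10),
    and the weighted sequence representation of Eq. (11)."""
    alpha = H @ V / beta                  # scaled dot-product scores
    weights = np.exp(alpha - alpha.max())
    weights /= weights.sum()              # numerically stable softmax
    h = weights @ H                       # h = sum_i a_i * h_i
    return alpha, weights, h
```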

We conducted experiments using scaled dot-product attention (DP) and additive attention with varying scaling factors (β). The attention scores in our experiments are generated using a long short-term memory (LSTM) network with DP and an affine transformation layer as the input encoder. We used the Adagrad optimizer [19] for gradient descent and dropout as regularization to prevent over-fitting. To eliminate the influence of prior knowledge, we learn all parameters from scratch, initializing the word embeddings from a uniform distribution with dimension d = 100 rather than using pre-trained embeddings. A softmax function is applied over a linear layer to obtain the final classification output. The reader is referred to [53] for a detailed theoretical explanation of how the attention scores are generated.

5.1.2 Term frequency-inverse document frequency (TF-IDF)

A commonly used method to analyze the importance of a word is the term frequency-inverse document frequency (TF-IDF) [45]. TF-IDF weighs each word to statistically measure the significance of the word in a given document. To this end, TF-IDF consists of two factors: normalized term frequency (TF) and inverse document frequency (IDF). TF measures the frequency of the word in the document, whereas IDF measures the uniqueness of the word across documents:

$$ \mathit{TF}\text{-}\mathit{IDF}_{s} = \frac{\mathcal{F}_{s}}{\mathcal{F}} \times \log_{2} \frac{|D|}{|D_{s}| + 1}. $$
(12)

Here, \(\mathcal {F}_{s}\) is the frequency of the word s in the target document, \(\mathcal {F}\) is the sum of the target document word frequencies, |D| is the total number of documents, and |Ds| is the number of documents containing the word s.
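
Equation (12) is straightforward to compute; a minimal sketch (token lists are assumed to be preprocessed as described earlier):

```python
import math

def tf_idf(word, doc_tokens, corpus):
    """Eq. (12): normalized term frequency times smoothed inverse
    document frequency. doc_tokens: tokens of the target document;
    corpus: list of token lists, one per document."""
    tf = doc_tokens.count(word) / len(doc_tokens)
    df = sum(1 for doc in corpus if word in doc)
    return tf * math.log2(len(corpus) / (df + 1))
```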

5.1.3 Keyword extraction algorithms

Our method extracts keywords from known and novel classes based on the novelty scores. As a result, we also compare the significant words obtained by our method to those captured by existing keyword extraction (KE) algorithms. To do this, we first separate the text documents from known and novel classes before passing them to the KE algorithms. Additionally, we present the top 10 keywords captured by these algorithms. For the KE baselines mentioned below, we use the pke package [10]:

  • TopicRank [12]: This is a graph-based KE method that depends on the extraction of the top-ranked topic.

  • YAKE [13]: A lightweight statistical approach for KE.

  • MultipartiteRank [11]: An unsupervised KE method for encoding topical information in a multipartite graph structure.

  • BERT-MMR [26]: A KE method that leverages Bidirectional Encoder Representations from Transformers (BERT) embeddings and Maximal Marginal Relevance (MMR).

5.2 Evaluation measures

We use accuracy, the Receiver Operating Characteristic (ROC) curve, precision, and recall to evaluate the performance of novelty detection using word scores obtained from the proposed method. In general, accuracy is a well-known measure of the effectiveness of novelty detection models, indicating the percentage of correct predictions by a model on a test set. The accuracy is calculated by:

$$ Accuracy= \frac{T_{P} + T_{N}}{T_{P} + T_{N} + F_{P} + F_{N}}, $$
(13)

where TP, TN, FP, and FN denote the correctly identified novel, correctly identified normal, incorrectly identified novel, and incorrectly identified normal samples, respectively, while P and N denote the total numbers of novel and normal samples. Precision is the percentage of identified novel samples that are truly novel and is given as:

$$ Precision = \frac{T_{P}}{T_{P} + F_{P}}. $$
(14)

Recall is the percentage of the true novel samples that are identified and is given as:

$$ Recall = \frac{T_{P}}{T_{P} + F_{N}}. $$
(15)

In general, the higher the precision and the recall, the better the algorithm. However, precision and recall are mutually constrained. For example, if only one novel sample is detected, the precision is 100%, while the recall is very low; if all samples are detected as novel, the recall is 100%, while the precision tends to be very low. Therefore, we present precision-recall graphs in our evaluation.

The ROC is insensitive to the number of novel samples and is calculated by plotting all potential choices of the TP rate (the portion of novel data correctly ranked among the total novel data) against the FP rate (the portion of normal data incorrectly ranked among the total normal data). The ROC curve can be summarized using a single value, defined as the area under the ROC curve (AUC). The AUC ranges between 0 and 1 and can be regarded as an average of the recall over all thresholds. Perfect detection of all test samples results in an AUC of 1, whereas completely inverted detection results in an AUC of 0. In general, the greater the ROC AUC value, the better the algorithm. [27] established that the ROC AUC value corresponds to the probability of correctly ranking a pair (nov, nor), where nov is a true novel sample and nor is a true normal sample. The per-pair contribution to the ROC AUC can then be defined by:

$$ ROC~AUC = \left\{ \begin{array}{cl} 1, & \mathbf{if}~ score(nov) > score(nor), \\ 0, & \mathbf{if}~ score(nov) < score(nor), \\ 1/2, & \mathbf{if}~ score(nov)~=~score(nor). \end{array} \right. $$

Therefore, the ROC AUC has a direct probabilistic interpretation. The AUC can also be defined as:

$$ AUC = {{\int}_{0}^{1}} ROC(T)~d_{T}, $$
(16)

where T denotes a threshold to control which samples are flagged as novel. The ROC AUC is the most often used evaluation metric for novelty detection that provides a ranking. Therefore, in this paper, we report it alongside other metrics so that different aspects of performance are covered. To ensure fairness, effectiveness, and reproducibility of the evaluation results, we use the scikit-plot library to compute ROC and precision-recall graphs.
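
The same quantities can be computed directly with scikit-learn; a minimal sketch with toy labels and scores (0 = known, 1 = novel):

```python
from sklearn.metrics import roc_auc_score, roc_curve, precision_recall_curve

y_true = [0, 0, 1, 1]            # ground truth: 0 = known, 1 = novel
y_score = [0.2, 0.4, 0.8, 0.9]   # classifier scores for the novel class

print(roc_auc_score(y_true, y_score))             # area under the ROC curve
fpr, tpr, _ = roc_curve(y_true, y_score)          # points of the ROC curve
prec, rec, _ = precision_recall_curve(y_true, y_score)
```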

5.3 BBC sports dataset

The BBC Sports dataset comprises 737 documents from the BBC Sport website, organized into five sports article categories and collected from 2004 to 2005. The resulting vocabulary encompasses 4 613 terms. For our experiment, we consider the classes “cricket” and “football” to be known and the class “rugby” to be novel, thus creating an unbalanced dataset. For preprocessing, we perform tokenization, stopword removal, and lemmatization. We run the TM for 100 epochs with 10 000 clauses, a voting margin T of 50, and a sensitivity s of 25.0.
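
For reference, this setup can be reproduced along the following lines with the pyTsetlinMachine package (a hedged sketch: the constructor arguments follow that package's documented (clauses, T, s) interface, and the toy arrays stand in for the booleanized corpus of Section 4.1):

```python
import numpy as np
from pyTsetlinMachine.tm import MultiClassTsetlinMachine

# Toy stand-ins for the booleanized corpus (real X: documents x 4613 terms)
X_train = np.random.randint(2, size=(100, 4613)).astype(np.uint32)
y_train = np.random.randint(2, size=100).astype(np.uint32)  # 0 Known, 1 Novel

# 10 000 clauses, voting margin T = 50, sensitivity s = 25.0, 100 epochs
tm = MultiClassTsetlinMachine(10000, 50, 25.0)
tm.fit(X_train, y_train, epochs=100)
```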

We present overall novelty score statistics for the words captured by the clauses in Table 5. The table demonstrates that words in the class Novel have distinctively higher average scores than words in the class Known. Also, notice that the shared words have the highest mean and standard deviation. As analyzed further below, this is because the TM mainly uses those words when forming the decision boundary between the two classes. As a result, the shared words appear in more clauses as characterizing class features. That is, the clauses either single out the words in one class or suppress the words in the other class.

Table 5 Overall word statistics for BBC sport dataset

To gain further insight into the properties of the novelty score, we plot the CFD for the scores of the novel, known, and shared words in Fig. 4. We further compare these CFDs with the corresponding ones obtained using attention weights in Fig. 5 and TF-IDF in Fig. 10. As can be observed, our approach produces more distinctive novelty scores than both attention and TF-IDF. The novel words typically produce high scores, while the known words produce low scores. In particular, as shown in Fig. 4a, 85% of the known words have scores lower than 1.0. On the other hand, as seen in Fig. 4b, only approximately 45% of the words unique to the novel class have scores below 1. The majority of the uniquely novel words produce scores greater than 1.

Fig. 4 Cumulative frequency distribution (CFD) graph for word scores in different categories of BBC Sports using TM

Fig. 5 Cumulative frequency distribution (CFD) graph for word scores in different categories of BBC Sports using attention weights

We plot the TM and attention scores for each token in Fig. 6b and a, respectively. Due to the large span of the TM scores, the y-axis is plotted on a log scale. We note that the TM scores are structured in successive layers, with known scores at the bottom, novel scores at the top, and shared scores in the center. The attention scores also demonstrate a small degree of differentiation between the known and novel categories. However, their variability is quite low compared to the scores generated by the TM, as seen in the boxplot in Fig. 7. Additionally, the shared word scores produced by the attention mechanism exhibit a high degree of resemblance to the known word scores.

Fig. 6 Visualization of tokens in Known, Novel and Shared categories from BBC Sports

Fig. 7 Boxplot of scores in Known, Novel and Shared categories from BBC Sports

Fig. 8 ROC curve and precision-recall of known/novel class classification of BBC Sports using word scores obtained from TM

Finally, we plot the scores for words that are shared between the known and novel classes in Fig. 4c. As can be observed, the shared words produce both high and low scores. To cast further light on this observation, we investigate the shared words in Table 6. We see that words captured frequently by novel clauses have high scores, whereas words that are frequent in known clauses have low scores. Additionally, common words (e.g., stopwords) also have low scores. For example, the word “Rugby”, which is highly characteristic of class Novel, is repeated only 5 times in the clauses representing class Known, while it is repeated 215 times in the clauses representing class Novel. In other words, the shared words are words that are characteristic of either class Known or class Novel. This finding also suggests that the scores can be calculated accurately even when words are present in both categories. To obtain an intuition of the overall theme captured by the clauses, we analyze the most frequently used words, generating lists of the top words according to the highest scores from the known and novel classes. Such lists may assist a user in weighting and selecting relevant words in a specific application.

Table 6 Composition of shared words in BBC Sport

Tables 7 and 8 show examples of top word lists for each class, from which we make the following observations. First, our proposed method assigns low scores to words belonging to the known classes while assigning comparatively high scores to words belonging to the novel class. In general, words that appear in a novel context are boosted. Second, the words that are most representative of the respective classes are captured frequently by clauses, making them the most repeated ones. Third, the keywords captured by the other KE baselines are comparable to those extracted by our method and accurately characterize the corresponding classes. We observe that TopicRank, YAKE, and MultipartiteRank all yield words with a high degree of similarity to our approach. Additionally, we notice that BERT-MMR exhibits the worst performance. This might be because we utilized pre-trained sentence embeddings for BERT and the keywords are extracted from the overall documents. Even though the words are not highly relevant to the classes, BERT is capable of producing words relating to each class's general theme. For example, sports-related words are included in both classes.

Table 7 Example of top words extracted from KE baselines for the Known class in BBC Sports
Table 8 Example of top words extracted from KE baselines for the novel class in BBC Sports

We now investigate the degree of discrimination power our novelty scoring provides, and thereby how uniquely it describes novelty at the word level. To this end, we employ logistic regression to classify novel text based on the word scores obtained from our method. The ROC and precision-recall curves of the experiment are depicted in Fig. 8 for our novelty scoring mechanism. Our method provides a competitive ROC value due to its ability to discriminate novel samples based on their scores. This capability enables our method to acquire a higher true positive (TP) rate, since it analyzes both the correct novel samples, i.e., true positives (TP), and the correct normal samples, i.e., true negatives (TN), separately. Figures 9a, 10, and 11 contain the corresponding curves when TF-IDF and attention scores are used instead. We see that the classification performance for our novelty scores is substantially better than that obtained with TF-IDF. The attention score outperforms our approach on the BBC Sports dataset by a small percentage. However, our approach outperforms attention on 20 Newsgroups, which can be attributed to its capability to deal with larger datasets.

Fig. 9 ROC curve and precision-recall of known/novel class classification of BBC Sports using attention scores

Fig. 10 Cumulative frequency distribution (CFD) graph for TF-IDF scores in different categories of BBC Sports

Fig. 11 ROC curve and precision-recall of known/novel class classification of BBC Sports using TF-IDF scores

5.4 20 newsgroups dataset

The 20 Newsgroups dataset contains a total of 18 828 documents partitioned equally into 20 separate classes. In our experiments, we treat the two classes “comp.graphics” and “talk.politics.guns” as Known topics, and then use the class “rec.sport.baseball” to represent a Novel topic. Again, we train a TM to produce our clause-based novelty scores. The overall statistics of the resulting word scores are shown in Tables 9 and 10, where we observe similar behavior to that observed with the BBC Sports dataset.

Table 9 Overall word statistics for 20 Newsgroups dataset
Table 10 Composition of shared words in 20 Newsgroups dataset

The CFD plot in Fig. 12 presents the score distribution among words per group (known, novel, shared). For known words, in Fig. 12a, we find that 90% of the word scores are below approximately 1.3. In Fig. 12b, however, only 45% of the novel word scores fall below approximately 1.3. From the plots, it is evident that the majority of the novel words have considerably higher scores than the known words. Note that some of the novel words' low scores are attributable to the presence of common words (e.g., stop words) in the novel bag-of-words. Since such common words do not signify novelty, the TM clauses do not frequently capture them. As a result, they receive relatively low scores despite their appearance among the novel documents.

Fig. 12 Cumulative frequency distribution (CFD) graph for word scores in different categories of 20 Newsgroups using TM

The CFD plots for attention and TF-IDF both exhibit behaviour similar to that on BBC Sports, as seen in Figs. 13 and 18, respectively. Finally, we again observe that the clauses have used the shared words for discrimination (cf. Table 10), resulting in a mix of low and high novelty scores, as shown in Fig. 12c.

Fig. 13 Cumulative frequency distribution (CFD) graph for word scores in different categories of 20 Newsgroups using attention weights

Tables 11 and 12 provide examples of the highest-scoring words captured by the KE baselines, including the TM, for both classes. The visualizations of the scores are presented in Figs. 14 and 15. Again, we observe behavior similar to that for the BBC Sports dataset. The ROC and precision-recall curves for our novelty scoring mechanism are illustrated in Fig. 16, while Figs. 17, 18, and 19 contain the corresponding graphs when attention and TF-IDF scores are used instead. Our method outperforms the ROC value obtained from attention by a wide margin because of its ability to identify more correct novel samples, i.e., true positives (TP). However, TF-IDF surprisingly outperforms both of the other methods on this dataset because of its straightforward scoring system and the dataset's moderate size.

Table 11 Example of top words extracted from KE baselines for the known class in 20 Newsgroups
Table 12 Example of top words extracted from KE baselines for the novel class in 20 Newsgroups
Fig. 14 Visualization of tokens in Known, Novel and Shared categories from 20 Newsgroups

Fig. 15 Boxplot of scores in Known, Novel and Shared categories from 20 Newsgroups

Fig. 16 ROC curve and precision-recall of known/novel class classification of 20 Newsgroups using word scores obtained from TM

Fig. 17 ROC curve and precision-recall of known/novel class classification of 20 Newsgroups using attention scores

Fig. 18 Cumulative frequency distribution (CFD) graph for TF-IDF scores in different categories of 20 Newsgroups

Fig. 19 ROC curve and precision-recall of known/novel class classification of 20 Newsgroups using TF-IDF scores

5.5 Contextual scoring

We also implement a context-based scoring approach to investigate how multiple words interact to capture novelty. As detailed in Section 4, we compute the combined novelty score by measuring word co-occurrence in clauses. That is, we intend to demonstrate how context can help uncover novelty when words have multiple meanings. Context-based scoring is critical since context can transform a word from novel to known, as with the meaning of the word “apple” in “apple fruit” versus “apple phone”. For demonstration, we calculate our proposed context-based novelty score for five words (i.e., two known, two novel, and one common word) in both datasets. For the BBC Sports dataset, the pairwise co-occurrence scores are presented in Table 13. We see a significant degree of correspondence between words such as “Manchester” and “Chelsea” from class Known. Similarly, there is a high correspondence between words such as “Rugby” and “Flyhalf” from class Novel. The common word “Particular”, on the other hand, shows similar correspondence with words from both of the classes. Similarly, for the 20 Newsgroups dataset, the co-occurrence scores for five words selected from the known, novel, and common word types are shown in Table 14. The words “Guns” and “Weapon” are from class Known and manifest strong co-occurrence. Additionally, the words “Baseball” and “Player” from class Novel correspond strongly as well. The common word “Gather”, on the other hand, co-occurs within both of the classes. These examples demonstrate that words that are most likely to appear in the same context have a high co-occurrence score. This can be explained by the fact that many clauses capture words that frequently occur together in a similar context.

Table 13 Co-occurrence matrix showing the information gain between words in BBC Sports
Table 14 Co-occurrence matrix showing the information gain between words in 20 Newsgroup

We compare the contextual scores obtained from our method with Word2Vec similarity scores. To do this, we utilize the Gensim library to train custom Word2Vec models on both datasets. Gensim enables us to create word embeddings by training our own Word2Vec models on a custom corpus using either the CBOW or the skip-gram algorithm. Parameter-wise, we used an embedding size of 200 and a window size of 5. We compute the cosine similarity between words using their word vectors (embeddings). The findings are included in Tables 15 and 16. We notice a significant degree of resemblance between the corresponding words from the known and novel classes. However, unlike our method, the similarity scores are less distinct, and the common words are not discernible score-wise.
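
A minimal sketch of this baseline with Gensim (version 4 API, where the embedding-size parameter is vector_size; the two token lists are toy stand-ins for the preprocessed corpora):

```python
from gensim.models import Word2Vec

sentences = [["england", "won", "cricket", "match"],
             ["england", "won", "rugby", "match"]]  # toy corpus

# Embedding size 200, window size 5; sg=1 selects skip-gram (sg=0: CBOW)
model = Word2Vec(sentences, vector_size=200, window=5, sg=1, min_count=1)
print(model.wv.similarity("cricket", "rugby"))  # cosine similarity
```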

Table 15 Co-occurrence matrix showing the similarity between words in BBC Sports using Word2Vec
Table 16 Co-occurrence matrix showing the similarity between words in 20 Newsgroup using Word2Vec

6 Conclusion

In this work, we propose a Tsetlin Machine (TM)-based solution for word-level novelty description. First, we employ the clauses of a trained TM to capture how the most significant words set a group of novel documents apart from a group of known documents. Then, we calculate a score for each word based on the role it plays in the clauses. The analysis of our empirical results for BBC Sports and 20 Newsgroups demonstrates significantly better novelty discrimination power compared with attention and TF-IDF. Our empirical results also show that we can capture word relations through a contextual scoring mechanism that measures co-occurrence within TM clauses. By capturing non-linear relationships among words, we enhance the capability of measuring novelty at the word level. However, training a TM is computationally more expensive than calculating TF-IDF, particularly for large datasets with an extensive vocabulary. We will address computation speed in future work, employing indexing mechanisms and exploiting feature space sparsity.