Introduction

Text semantic representation is one of the core topics of natural language processing and plays an indispensable role in applications such as text classification [1], sentiment analysis [2] and information extraction [3]. Existing text semantic representation models can be divided into vector space models [4] and neural net-based methods. The former include the latent semantic analysis (LSA) model [5] and latent Dirichlet allocation (LDA) [6]; the latter include the word2vec [7] and doc2vec [8] methods. However, vector space models and traditional topic models cannot model the contextual semantic information of text with high precision. Although neural network methods have achieved relatively better precision, their interpretability is very poor, which seriously limits their application scope.

In recent years, knowledge graph technology has been widely introduced in the field of text analysis. A knowledge graph is a tool for describing knowledge and modelling the relationships between things based on a graph structure [9], and it has shown strong practical value in intelligent question answering [10, 11], natural language understanding [12], big data analysis [13,14,15], interpretability enhancement of machine learning [16], semantic search [17, 18], etc. When a knowledge graph is used to model text semantic information, entities in the text are represented as nodes in the network, and edges represent the relationships between entities. In traditional knowledge graph construction methods, whether based on handcrafted rules [19] or deep learning [20], the relationships among knowledge concepts are all assumed to carry explicit semantic labels (such as 'belongs to', 'is a' or 'located at'). However, this assumption differs markedly from the basic computing process of the brain's neural network: there is no label difference among the neural connections between human brain neurons. Consequently, existing knowledge graph models are not general enough for some text semantic analysis applications such as error detection in text writing.

For error correction, there are commonly two types of errors: word spelling errors, and grammatical or syntactic errors. Word spelling error correction methods are mainly based on word dictionaries and generally do not consider whether the contextual semantic relations of words are reasonable. In contrast, grammatical or syntactic error correction is relatively complex, and it is a difficult issue in high-quality text writing even for people. Four types of such problems can be distinguished: redundant words, missing words, word selection errors, and word ordering errors [21]. These errors can only be found by means of semantic analysis of the text context. However, current Chinese text error correction methods mainly focus on spelling errors and rarely address semantic errors.

Based on the above motivations and inspired by the associative computing mechanism of the human brain, we propose a new model, the associative knowledge network model, which uses the neighbour relationships between noun entities in the text to model their semantic relationships. Different from existing knowledge graph construction methods, the proposed model no longer assigns semantic labels to knowledge relationships; only one type of relationship between knowledge concepts is considered, namely a unified associative relationship with strength. The performance of the new model is then studied by solving the problem of checking the semantic coherence of noun context. This constitutes a new text error detection method at word granularity, whose main function is to detect and locate noun entities with inconsistent contextual semantics in texts.

The main contributions of this study can be summarized as follows:

(1) An interpretable text semantic representation model of noun context, named the associative knowledge network, is proposed, in which an improved associative strength computing equation is newly designed.

(2) An interpretable method for checking the semantic coherence of noun context is designed by taking the learned associative knowledge network as a background knowledge network. The new method realizes Chinese text word error detection using multilevel contextual word semantic relations.

(3) The experimental results indicate that the proposed method has not only good interpretability but also excellent detection performance compared with the latest state-of-the-art neural network methods.

Related work

Text semantic representation modelling

Recently, great progress has been made in research on modelling text semantic representation. On the knowledge graph side, Etaiwi et al. [22] proposed a graph-based semantic representation model for Arabic text. The core idea is to use predefined rules to identify the semantic relationships between words and build the final semantic graph. Wei et al. [23] proposed a multilevel text representation model with background knowledge, which captures the semantic content of the text at three levels: machine surface code, machine text base and machine situational model. Furthermore, external background knowledge is introduced to enrich the text representation so that the machine can better understand the semantic content of the text. Geeganage et al. [24] proposed a semantic-based topic representation using frequent semantic patterns, in which text semantics are captured by matching the words in each topic with concepts in the Probase ontology.

In recent years, in addition to using knowledge graphs to model text semantic representations, an increasing number of researchers have devoted themselves to studying text semantic representations combined with deep neural networks to extract deeper text semantic features. Chen et al. [25] proposed a neural knowledge graph evaluator to effectively predict the reliability of answers in an automatic question answering system, in which prediction performance is mainly improved by jointly encoding structural and semantic features of a knowledge graph. Wang et al. [26] proposed a novel text-enhanced knowledge graph representation model. They introduced a mutual attention mechanism between the knowledge graph and text to mutually reinforce the knowledge graph representation and the textual relation representation. Wang et al. [27] proposed a graph-based neural network model for early fake news detection based on enhanced text representations. They modelled the global pairwise semantic relations between sentences as a complete graph and learned global sentence representations via a graph convolutional network with a self-attention mechanism. Although deep neural networks have shown clear advantages in text semantic representation learning, one of their well-known problems is that their learned representations are difficult to interpret. Accordingly, Xie et al. [28] proposed a novel neural sparse topic model called the semantic reinforcement neural variational sparse topic model for explainable and sparse latent text representation learning. Ennajari et al. [29] proposed a Bayesian embedded spherical topic model that combines knowledge graph and word embeddings in a non-Euclidean curved space, the hypersphere, for better topic interpretability and discriminative text representation. These works all enhance the interpretability of neural networks by adding interpretable semantic modules, but the resulting models are still not completely interpretable.

When using a knowledge graph to model the semantic representation of text, measuring the semantic relationships between knowledge items is also an essential task. To compensate for the limited ability of co-occurrence frequency and mutual information methods to quantify the relevance relation between words, Zhong et al. [30] proposed a quantitative computing method for the relationship between words that integrates co-occurrence frequency and mutual information. Wang et al. [31] proposed a new semantic relationship measurement method based on the number and intensity of knowledge co-occurrences in the text. Li et al. [32] proposed a lightweight algorithm for learning single-meaning word embeddings, which enhances the accuracy of semantic relatedness measurement by exploiting WordNet synsets and Doc2vec document embeddings.

Chinese text error correction

Chinese text error correction is an important technology for the automatic checking and correction of Chinese writing. Its importance in the fields of automatic question answering [33], machine translation [34] and summary generation [35] is self-evident. To solve the problems of mismatched words and unsmooth sentences in text paragraphs, many text error correction techniques have been developed.

Cui et al. [36] proposed a new pre-trained model called MacBERT that mitigates the gap between the pre-training and fine-tuning stages by masking words with similar words, which has proven effective on downstream tasks. Liu et al. [37] proposed a pre-trained masked language model with misspelled knowledge (PLOME) for Chinese spelling correction, which jointly learns to understand text semantics and correct spelling errors. Zhang et al. [38] proposed a new BERT-based neural network model for Chinese spelling error correction, which consists of two networks for error detection and error correction, respectively. This model can check the correctness of every position in a Chinese sentence and is an effective application extension of the original BERT model.

The proposed method

Overview

This study proposes a new interpretable semantic representation model of text corpora, the associative knowledge network model, and studies its performance by developing a new method for checking the semantic coherence of noun context. The whole framework is divided into two parts: the modelling process of the associative knowledge network, and the process of checking the semantic coherence of noun context. The whole framework of this study is shown in Fig. 1.

Fig. 1
figure 1

The framework of this study

As shown in Fig. 1, the left part is the modelling realization of the associative knowledge network. The text corpus is first preprocessed to generate noun nodes in "Noun entity node creation". Next, the associative relationships between knowledge nodes are created in "Associative relationship creation", and their associative strengths are computed in "Associative strength computation". Then, the relationships with strengths are incrementally updated into the network in "Incremental updating of associative relationship". Finally, the whole network is reduced and reconstructed to form the constructed associative knowledge network. In addition, extra cycles can be performed to learn more texts. The right part introduces a novel interpretable method for the practical problem of checking the semantic coherence of noun context. Here, an associative knowledge network constructed on a given text corpus is first taken as a background knowledge network. Next, for a given document to be checked, all noun words are extracted in "Current document preprocessing", and their multilevel contextual relationships are extracted in "Multilevel contextual relationships acquiring". Then, a group of interpretable semantic features is computed according to the coupling degree from the prior knowledge network to the multilevel contextual relationships of the current document in "Associative coupling degree computing". Finally, a classification method is employed to realize the non-coherence error detection of noun context.

Associative knowledge network modelling

Next, from the perspective of the associative memory of the human brain, the construction process of the associative knowledge network is described in detail. Associative memory is a basic mode of human brain thinking; it is a process of forming, deleting, and changing the relationships between information neurons. Accordingly, we consider that the main process of associative knowledge network modelling includes the creation of noun entity nodes and associative relationships, the computation of associative strength, and the incremental updating of associative relationships.

Noun entity node creation

The main function of this part is to extract noun entities from the given text and map them to network nodes in the associative knowledge network. First, noun words satisfying certain conditions are extracted from the text corpus, and the extracted noun entities are directly added as nodes to the associative knowledge network. Before noun entities are extracted, the text corpus is preprocessed by word segmentation tools, including sentence extraction, Chinese word segmentation and part-of-speech tagging. Then, only noun entities are extracted from the results of word segmentation and part-of-speech tagging.
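For illustration, the following is a minimal sketch of this preprocessing step. The paper does not name a specific segmentation tool, so the use of the jieba toolkit for Chinese word segmentation and part-of-speech tagging is our assumption.

```python
# A minimal sketch of noun entity extraction, assuming jieba is the
# segmentation tool (the paper leaves the tool unspecified).
import re
import jieba.posseg as pseg

def extract_noun_entities(text):
    """Split text into natural sentences and keep only noun words, in order."""
    # Split on Chinese sentence-final punctuation.
    sentences = [s for s in re.split(r"[。！？]", text) if s.strip()]
    noun_sentences = []
    for sent in sentences:
        # jieba POS flags beginning with 'n' denote noun categories.
        nouns = [p.word for p in pseg.cut(sent) if p.flag.startswith("n")]
        if nouns:
            noun_sentences.append(nouns)
    return noun_sentences
```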

Associative relationship creation

Similarly, the main function of this part is to extract associative relationships from the given text and map them to network relationships in the associative knowledge network. Concretely, for the extracted noun entities, direct pointing relationships are created according to the front and back positions of the noun entities in sentences. If entity \(a_{1}\) precedes entity \(a_{2}\) in a sentence, a direct associative relationship from \(a_{1}\) to \(a_{2}\) is created in the associative knowledge network. Here, we assume that an entity appearing later in the same sentence is produced associatively by the previous entity, and only when there is a direct pointing relationship between two entities can there be a direct associative relationship. \(\left\langle {a_{1} ,a_{2} } \right\rangle\) is used to represent a directed direct associative relationship pair, which means that there is a directed edge from \(a_{1}\) to \(a_{2}\) in the associative knowledge network.
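A minimal sketch of this step follows; generating one directed pair for every ordered noun co-occurrence in a sentence is our reading of the text, and the recorded positions feed the strength computation of the next subsection.

```python
# A sketch of associative relationship creation: each noun points to every
# noun appearing after it in the same sentence. Positions are kept so the
# strength computation can use the relative distance I_b - I_a.
from itertools import combinations

def create_relationship_pairs(noun_sentence):
    """noun_sentence: list of nouns in their order of appearance.
    Returns directed pairs <a, b> together with their position indices."""
    pairs = []
    for i, j in combinations(range(len(noun_sentence)), 2):
        # The earlier noun (index i) points to the later noun (index j).
        pairs.append((noun_sentence[i], noun_sentence[j], i, j))
    return pairs
```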

Associative strength computation

The main function of this part is to compute the associative strength when a new associative relationship is extracted. We consider that the associative strength between two adjacent knowledge nodes in an associative knowledge network is related to their co-occurrence frequencies and co-occurrence positions in the text. On this basis, a more reasonable computing method for the direct associative strength between knowledge nodes is designed by developing the quantitative computing method of semantic relevance given by Zhong et al. [30] and the definition of associative weight given by Wang et al. [31]. Concretely, we propose the following Definition 1.

Definition 1

In the text corpus with a given statistical window size, direct associative strength \(R_{ab}\) between any two noun entities \(a\) and \(b\) is defined as:

$$ R_{ab} = \log \frac{p(a,b)}{{p_{a} * p_{b} }}/\log \frac{2}{{p_{a} + p_{b} }} $$
(1)

\(p\left( {a,b} \right)\) in the above formula represents the neighbour probability of noun entities \(a\) and \(b\) in the statistical window; \(p_{a}\) and \(p_{b}\) represent the probabilities of noun entities \(a\) and \(b\) appearing in the statistical window. The neighbour probability is further defined as follows:

$$ p\left( {a,b} \right) = \frac{{u_{ab} }}{{u_{all} }} $$
(2)
$$ u_{ab} = \sum\limits_{1 \le k \le q} {\frac{1}{{I_{b} \left( {\left\langle {a^{k} ,b^{k} } \right\rangle } \right) - I_{a} \left( {\left\langle {a^{k} ,b^{k} } \right\rangle } \right)}}} $$
(3)
$$ u_{all} = \sum\limits_{xy \in M} {u_{xy} } $$
(4)

\(\left\langle {a^{k} ,b^{k} } \right\rangle\) indicates a direct associative relationship pair in the statistical window; \(q\) represents the total number of co-occurrences of knowledge items \(a\) and \(b\) in the statistical window; \(I_{a}\) and \(I_{b}\) represent the relative position index values of the two noun entities. Obviously, in the same statistical window, the minimum difference between them is 1. \(M\) is the set of all associative relationship pairs in the statistical window. When building the general associative knowledge network model, the statistical window is naturally taken to be a natural sentence.

Different from Zhong's semantic relevance measurement method, when calculating the neighbour probability of two noun entities in the statistical window, we consider not only the frequency of their co-occurrence but also the relative proximity of the two co-occurring entities in the window. That is, the closer two entities are, the closer their relationship and the greater their strength; conversely, the farther apart they are, the smaller their strength. In the experimental part of this paper, a comparative study of the above two computing strategies is also carried out.
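The following sketch transcribes Eqs. (1)-(4) under a stated assumption: the paper does not spell out how \(p_{a}\) and \(p_{b}\) are estimated, so here they are taken as the fraction of statistical windows (sentences) containing each noun.

```python
# A sketch of Definition 1. The estimation of p_a and p_b as per-window
# occurrence fractions is our assumption.
import math
from collections import defaultdict

def associative_strengths(noun_sentences):
    """noun_sentences: list of sentences, each a list of nouns in order."""
    u = defaultdict(float)           # u_ab, Eq. (3)
    window_count = defaultdict(int)  # number of windows containing each noun
    n_windows = len(noun_sentences)
    for sent in noun_sentences:
        for noun in set(sent):
            window_count[noun] += 1
        for i in range(len(sent)):
            for j in range(i + 1, len(sent)):
                # Closer co-occurrences contribute more: 1 / (I_b - I_a).
                u[(sent[i], sent[j])] += 1.0 / (j - i)
    u_all = sum(u.values())          # Eq. (4)
    R = {}
    for (a, b), u_ab in u.items():
        p_ab = u_ab / u_all          # Eq. (2)
        p_a = window_count[a] / n_windows
        p_b = window_count[b] / n_windows
        # Eq. (1): a normalized, PMI-style direct associative strength.
        R[(a, b)] = math.log(p_ab / (p_a * p_b)) / math.log(2 / (p_a + p_b))
    return R
```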

Incremental updating of associative relationship

Furthermore, the main function of this part is to incrementally update the associative strength when a new associative relationship is extracted. Like the process of human brain knowledge updating, an associative knowledge network should have dynamic updating ability; that is, the knowledge network can be updated incrementally as the learning corpus grows. However, from the perspective of human brain memory, this incremental updating involves not only the addition and enhancement of associative relationships but also the weakening, deletion, or forgetting of associative relationships. Nevertheless, existing knowledge graph construction strategies lack a clear overall treatment of this, an issue also present in our previous research on knowledge network modelling [31]. Therefore, this study further considers the incremental updating mechanism of associative relationships; that is, as material texts in the corpus increase, knowledge nodes can be inserted incrementally, and the strengths of new and existing associative relationships can be updated effectively.

Regarding the principle of associative relationship updating between neurons in the human brain, Donald Hebb, a famous Canadian physiologist, proposed the Hebb learning rule [39]. He posited that the learning process of the human brain's neural network occurs at the synapses between neurons, that the strength of synaptic connections changes with the neuronal activity before and after the synapse, and that the amount of change is proportional to the total activity of the two neurons. That is, within a certain period, the connection between activated neurons is strengthened, while the connection between two neurons that have not been activated for a long time is weakened. Combining the above ideas, if knowledge nodes in the associative knowledge network correspond to neurons and associative relationships between knowledge nodes correspond to the synapses connecting neurons, we can give the following strategies for updating knowledge and associative relationships:

(a) When neurons in the human brain are "stimulated" and "activated", the connections between them are strengthened. Accordingly, when nodes in the associative knowledge network are added or triggered, the associative strengths on the corresponding node edges are also enhanced. The corresponding strategy schematic is given in Fig. 2, in which shaded nodes are newly inserted knowledge nodes, and the edge thickness indicates the size of the associative strength.

(b) If neurons in the human brain are not "stimulated" for a long time, the "connections" between them are weakened or even "forgotten". Accordingly, in every learning period, global attenuation of associative strength and reduction of associative relationships are each carried out once. Global attenuation simulates the process of "memory weakening" in the human brain, while associative relationship reduction simulates the process of "memory forgetting". The corresponding strategy schematic is given in Fig. 3, where the thickness of the edges indicates the size of the associative strength. After global attenuation of associative strength, the strength values of network edges decrease, while associative relationship reduction deletes those edges with lower strength from the network.

(c) In addition, according to the chain reaction characteristics of neurons and from the perspective of information dissemination in complex networks [40], we further consider that in the dynamic "learning" process of the associative knowledge network, when a node is "activated", its neighbouring nodes are also "activated", and the corresponding associative strengths are also enhanced. The corresponding process schematic is given in Fig. 4, where the thickness of the edges indicates the size of the associative strength. When nodes a and b are activated, not only is the associative strength between a and b enhanced, but the associative strengths of b's direct associative relationships are also enhanced. Here, \(R_{{\text{ab}}}\) is the strength of the edge generated after nodes a and b are activated.

Fig. 2
figure 2

Node addition or trigger enhancement process in an associative knowledge network

Fig. 3
figure 3

Global attenuation of associative strength and associative relationship reduction in an associative knowledge network

Fig. 4
figure 4

Chain enhancement process of associative strength between nodes in an associative knowledge network

Summarizing the above discussions, we give the following associative knowledge network construction algorithm.

figure a
figure b

In Algorithm 1, the global attenuation of associative strength between nodes is executed after learning each incremental batch of material texts. Concretely, the strengths of all associative edges in the network are multiplied by an attenuation value to simulate the process of memory decline caused by long-term non-stimulation of neurons in the human brain, where \(\gamma\) is the attenuation rate and 0.95 is taken as the default according to our empirical analysis.

In a large-scale knowledge graph, there will be many associative edges with weak associative strength (close to 0), and the existence of these edges leads to unnecessary costs in the knowledge querying process. Therefore, in Algorithm 1, we set the scale of network edges as a constraint capacity value \(T\) to simulate the "forgetting" of connections between neurons in the human brain. The specific rule is that, after learning a batch of material texts, if the total number of edges exceeds the pre-set constraint capacity, the edges with the smallest associative strengths are deleted directly, so that the total number of associative edges stays below the constraint capacity value. This process corresponds to steps 14 to 16 of Algorithm 1.

In Algorithm 2, when a new associative edge \(e_{ij}\) is generated, if the edge already exists in the network, the corresponding associative strength is updated according to formula (6). If edge \(e_{ij}\) is not in the network, the algorithm adds it to the network and directly sets the associative strength to \(R_{ij}\) according to formula (7).

In addition, when edge \(e_{ij}\) in the network is updated, because nodes \(v_{i}\) and \(v_{j}\) are activated, the direct associative knowledge of node \(v_{j}\) is also considered to be activated and enhanced. The corresponding update is shown in formula (8), where \(x_{i}\) and \(y_{i}\) are learning rates, and \(y_{i} = 0.95\) and \(x_{i} = 0.85\) are taken as default values according to our empirical analysis on complex networks.
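To make the update cycle concrete, here is a hedged sketch of the strategies above. The exact forms of formulas (6)-(8) appear only inside Algorithms 1 and 2, so the specific blending rules below (reinforcement by learning rate \(y\), chain enhancement scaled by \(x\)) are illustrative stand-ins, not the paper's exact equations.

```python
# A hedged sketch of incremental updating: edge insert/reinforce, chain
# enhancement, then global attenuation and capacity-constrained forgetting.
GAMMA = 0.95   # attenuation rate (paper default)
X, Y = 0.85, 0.95  # learning rates x_i, y_i (paper defaults)

def update_edge(graph, i, j, r_ij):
    """graph: dict mapping (src, dst) -> associative strength."""
    if (i, j) in graph:
        # Edge exists: reinforce it (stand-in for formula (6)).
        graph[(i, j)] = Y * graph[(i, j)] + r_ij
    else:
        # New edge: insert with its computed strength (formula (7)).
        graph[(i, j)] = r_ij
    # Chain enhancement (stand-in for formula (8)): activating v_j also
    # strengthens v_j's outgoing direct associative relationships.
    for (src, dst) in list(graph):
        if src == j and dst != i:
            graph[(src, dst)] += X * r_ij

def end_of_batch(graph, capacity):
    """Global attenuation ("memory weakening") and pruning ("forgetting")."""
    for edge in graph:
        graph[edge] *= GAMMA
    if len(graph) > capacity:
        # Keep only the strongest edges, up to the constraint capacity T.
        keep = sorted(graph.items(), key=lambda kv: kv[1], reverse=True)[:capacity]
        graph.clear()
        graph.update(keep)
```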

Method of checking the semantic coherence of noun context

In this section, we take the associative knowledge network as background knowledge and judge whether a noun entity is semantically coherent by analysing the differences in its context information between the background knowledge and the current document.

In text writing, improper use of words in sentences is a common problem, and checking their semantic coherence is an effective aid for such problems. Table 1 gives some representative simulated sentence examples, in which correctly used noun entities are marked with a shaded background and wrong noun entities are marked with a double underline. The examples in Table 1 include ① redundant words, ② word selection errors, and ③ word ordering errors. To effectively find such errors, in this study, we carry out context analysis on each noun entity. That is, we take our associative knowledge network as a background knowledge network providing empirical knowledge and then take a word as the observation perspective to analyse whether the context words of this word in the current document can effectively support it semantically or interpret it associatively.

Table 1 Examples of sentences with incoherence context semantic of noun entities

Combined with the previous discussions, the method of checking the semantic coherence of noun context is given below. Specifically, the criterion is whether the contextual relationships of noun entities in the current document have good associative characteristics in the background knowledge network. That is, if the contextual relationships of some noun entities in the current document do not have good associative characteristics in the background knowledge network, the contextual semantics of these noun entities can be considered inconsistent or mismatched. According to this principle, our coherence checking method can accurately locate individual noun entities with inconsistent context semantics, instead of giving a rough coherence score for a whole sentence or paragraph. Combined with the overall technical framework given in Fig. 1, the following process for checking the contextual semantic coherence of nouns based on an associative knowledge network can be given.

figure c

Current document preprocessing

This part preprocesses the document to be checked. First, sentence extraction, Chinese word segmentation and part-of-speech tagging are performed on the current document, and then noun entities are extracted. Different from the natural sentence extraction in the building module of the associative knowledge network, this module uses a short sentence extraction method to avoid the inaccurate contextual entity relationships caused by overly long sentences. That is, comma splitting is added to traditional sentence extraction.
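A minimal sketch of this short sentence extraction, adding Chinese and ASCII commas to the usual sentence-final delimiters:

```python
# Short sentence extraction at detection time: split on sentence-final
# punctuation plus commas, so long sentences do not yield spurious
# contextual relationships.
import re

def split_short_sentences(document):
    return [s for s in re.split(r"[。！？，,]", document) if s.strip()]
```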

Acquisition of multilevel contextual relationships of noun entities

This part extracts the multilevel contextual relationships of noun entities in the given detection document. In general, the semantic coherence of a noun entity in a document is related to its location and the other entities in its context. To evaluate the semantic coherence of a noun entity in a document, it is first necessary to obtain the contextual relationships of this noun entity, that is, to determine its context-related entities from different perspectives and to form multi-perspective correlation pairs related to it. When obtaining context-related entities, we consider the following three perspectives (a code sketch covering all three follows the list):

(1) Intra-sentence relevance. An associative contextual relational network is constructed inside the current sentence to obtain the context-related nouns of noun entities. Consider a short sentence sequence \(S = a,b,c,d,e\) in the detection document: the intra-sentence associative contextual relational network constructed from S is shown in Fig. 5a, the contextual relationships obtained for noun entity c are shown in Fig. 5b, and the corresponding correlation pair set is \(Pair_{c} = \left\{ {\left\langle {a,c} \right\rangle ,\left\langle {c,e} \right\rangle ,\left\langle {b,c} \right\rangle ,\left\langle {c,d} \right\rangle } \right\}\).

Fig. 5
figure 5

Schematic diagram of extracting correlation pair of intra-sentences

(2) Inter-sentence relevance. First, the two short sentences before and after the current sentence are taken to construct an associative contextual relational network. For a short sentence sequence \(S_{q} = a,b,c,d,e\) in the current detection document, Fig. 6a shows the two short sentences before and after \(S_{q}\), Fig. 6b shows the associative contextual relational network based on inter-sentences, and Fig. 6c shows the contextual relationships obtained for noun entity e. The correlation pair set of e is then \(Pair_{e} = \left\{ {\left\langle {e,k} \right\rangle ,\left\langle {e,r} \right\rangle ,\left\langle {e,f} \right\rangle ,\left\langle {t,e} \right\rangle ,\left\langle {a,e} \right\rangle ,\left\langle {b,e} \right\rangle ,\left\langle {c,e} \right\rangle ,\left\langle {d,e} \right\rangle } \right\}\).

Fig. 6
figure 6

Schematic diagram of extracting correlation pair of inter-sentences

(3) Intra-paragraph relevance. First, the paragraph containing the target noun entity is located; then, the other nouns in the paragraph are taken as the context of the target noun, and several groups of contextual relationships of the target noun in the paragraph are extracted. Let \(Pks = \left\{ {k_{1} ,k_{2} ,...,k_{n} ,...,k_{m} } \right\}\) represent the set of noun entities in a paragraph. For a noun entity \(k_{n} \in Pks\), we can obtain multiple groups of correlation pairs \(Mu(Pair_{{k_{n} }} ) = \left\{ {\left\{ {\left( {k_{n} ,k_{i} } \right)} \right\}_{1 \le i \le m,i \ne n} } \right\}\) based on the intra-paragraph perspective; contextual relationships within a paragraph do not consider the directionality of edges.
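The sketch below covers all three perspectives. The ±2 intra-sentence window is inferred from the example in Fig. 5, and the treatment of same-sentence nouns in the inter-sentence case follows the \(Pair_{e}\) example in Fig. 6; both are our readings rather than explicitly stated rules.

```python
# Multilevel correlation pair extraction for a target noun.
def intra_sentence_pairs(sent, idx):
    """Directed pairs between sent[idx] and nouns within distance 2 (Fig. 5)."""
    target, pairs = sent[idx], []
    for j in range(max(0, idx - 2), min(len(sent), idx + 3)):
        if j < idx:
            pairs.append((sent[j], target))   # earlier noun points to target
        elif j > idx:
            pairs.append((target, sent[j]))   # target points to later noun
    return pairs

def inter_sentence_pairs(sentences, s_idx, idx):
    """Nouns in the two sentences before (and earlier in the current
    sentence) point to the target; the target points to nouns after (Fig. 6)."""
    target, pairs = sentences[s_idx][idx], []
    for s in range(max(0, s_idx - 2), s_idx):
        pairs += [(w, target) for w in sentences[s]]
    cur = sentences[s_idx]
    pairs += [(w, target) for w in cur[:idx]]
    pairs += [(target, w) for w in cur[idx + 1:]]
    for s in range(s_idx + 1, min(len(sentences), s_idx + 3)):
        pairs += [(target, w) for w in sentences[s]]
    return pairs

def intra_paragraph_pairs(paragraph_nouns, target):
    """Undirected pairs between the target and other nouns in the paragraph."""
    return [(target, w) for w in paragraph_nouns if w != target]
```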

Associative coupling degree computing

To quantitatively evaluate the semantic coherence of noun entities in a document given a background knowledge network, this part introduces the associative coupling degree computing strategy. The previous section described how to extract the correlation pairs of target noun entities; next, we discuss how to compute the multilevel associative coupling degree features of target noun entities in the background knowledge network \(G\). Concretely, the correlation pairs of the target noun entity are mapped to the background knowledge network, and we query whether these relation pairs have direct associative relationships there. Obviously, if a direct associative relationship exists, this correlation pair has good associative experience in the background knowledge network; that is, the target noun is more coherent in its context. Figure 7 shows the associative computing process of a noun entity in the background network, in which the correlation pair set of noun entity f is \(Pair_{f} = \left\{ {\left\langle {f,k} \right\rangle ,\left\langle {f,h} \right\rangle ,\left\langle {f,v} \right\rangle ,\left\langle {e,f} \right\rangle ,\left\langle {b,f} \right\rangle } \right\}\), and the bold edges \(e_{{\text{ef}}}\), \(e_{{\text{fv}}}\) and \(e_{{\text{fh}}}\) indicate that these correlation pairs of entity f have direct associative relationships in background network \(G\). \(Ra_{ef}\), \(Ra_{fv}\) and \(Ra_{fh}\) are the associative strength values on the corresponding edges. In the above computing process, edge directionality is not considered for intra-paragraph correlation pairs.

Fig. 7
figure 7

Associative computing process of the correlation pair of noun entity f in the background knowledge network

Furthermore, to quantitatively evaluate the semantic coherence of the target noun in context, a computing method of the multilevel associative coupling degree features is further designed as follows.

Definition 2

Let a correlation pair of a noun entity \(k_{i}\) be \(Pair_{{k_{i} }} = \left\{ {\left\langle {k_{1} ,k_{i} } \right\rangle ,\left\langle {k_{2} ,k_{i} } \right\rangle ,...,\left\langle {k_{m} ,k_{i} } \right\rangle } \right\},i \ne m\). Then, its associative coupling degree feature in the background knowledge network G is computed as follows:

$$ \begin{aligned} Vacd\left( {k_{i} } \right) &= \frac{{\sum\nolimits_{{\left\langle {k_{n} ,k_{i} } \right\rangle \in \left( {Pair_{{k_{i} }} \cap G} \right)}} {Ra_{{k_{n} k_{i} }} } }}{{\sum\nolimits_{{\left\langle {k_{n} ,k_{i} } \right\rangle \in \left( {Pair_{{k_{i} }} \cap G} \right)}} 1 }}\\&\quad * \log_{2} \left( {1 + \sum\nolimits_{{\left\langle {k_{n} ,k_{i} } \right\rangle \in \left( {Pair_{{k_{i} }} \cap G} \right)}} 1 } \right)\end{aligned} $$
(9)

In the formula, \(\left\langle {k_{n} ,k_{i} } \right\rangle \in \left( {Pair_{{k_{i} }} \cap G} \right)\) indicates that the correlation pair of entity \(k_{i}\) has a direct associative relationship in background network G, and \(Ra_{{k_{n} k_{i} }}\) represents the associative strength value of edge \(e_{{k_{n} k_{i} }}\) in the background knowledge network.
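A minimal sketch of Eq. (9) follows, assuming the background network is stored as a dict from directed edges to strengths; returning 0 when no pair matches is our assumption, consistent with the zero values reported for wrong entities in Tables 3 and 4.

```python
# Definition 2: mean strength of the correlation pairs found in the
# background network G, scaled by log2(1 + number of hits).
import math

def vacd(pairs, G, directed=True):
    """pairs: correlation pairs of a target noun; G: dict edge -> strength."""
    hits = []
    for a, b in pairs:
        if (a, b) in G:
            hits.append(G[(a, b)])
        elif not directed and (b, a) in G:
            # Intra-paragraph pairs ignore edge direction.
            hits.append(G[(b, a)])
    if not hits:
        return 0.0  # no associative support in the background network
    return (sum(hits) / len(hits)) * math.log2(1 + len(hits))
```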

Multilevel associative coupling degree features

This part further develops the multilevel associative coupling degree computation. By acquiring the contextual relationships of noun entities at multiple levels, we obtain multiple groups of correlation pairs for each noun entity: inside a sentence, between sentences, and within the same paragraph. By mapping each group of correlation pairs to the background network for associative computing, we obtain multiple groups of associative coupling degree features for the noun entity. Assuming there are multiple groups of correlation pairs for noun entity k, the associative coupling degree feature \(Vacd\left( k \right)\) inside the sentence is denoted \(Vacd_{inside}\), and the associative coupling degree feature \(Vacd\left( k \right)\) between sentences is denoted \(Vacd_{between}\); we call \(Vacd_{inside}\) and \(Vacd_{between}\) the basic features. Additionally, the associative coupling degree features \(\left\{ {Vacd_{1} ,Vacd_{2} ,Vacd_{3} ,...,Vacd_{n} ,...,Vacd_{m} } \right\}\) can be obtained from the multiple groups of intra-paragraph correlation pairs, where the values are sorted from largest to smallest and \(m\) is related to the number of noun entities in the paragraph. We take the top \(n\) \(Vacd\) values as paragraph features. In summary, the multilevel associative coupling degree features \(\left\{ {Vacd_{inside} ,Vacd_{between} ,Vacd_{1} ,Vacd_{2} ,Vacd_{3} ,...,Vacd_{n} } \right\}\) of noun entity k can be used. In the experiment, we study the influence of the number n on the method performance.
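A small sketch of assembling this feature vector; zero-padding when a paragraph yields fewer than n values is our assumption, and the default n = 4 anticipates the experimental finding below.

```python
# Multilevel feature vector for one noun: two basic features plus the
# top-n paragraph Vacd values, sorted largest to smallest.
def feature_vector(vacd_inside, vacd_between, paragraph_vacds, n=4):
    top = sorted(paragraph_vacds, reverse=True)[:n]
    top += [0.0] * (n - len(top))  # pad if fewer than n values (assumption)
    return [vacd_inside, vacd_between] + top
```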

Coherence checking using interpretable classification

For coherence checking with an interpretable classification decision, we simply use an interpretable classification method, the decision tree [41], to judge coherence based on the multilevel associative coupling features. More details are discussed in the experimental part.
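As a concrete illustration, here is a minimal sketch using scikit-learn's DecisionTreeClassifier; the paper specifies the decision tree method and (in the experiments) the entropy criterion, but not a particular library.

```python
# Coherence classifier sketch: X holds multilevel coupling feature vectors,
# y marks coherent (1) vs. incoherent (0) noun entities.
from sklearn.tree import DecisionTreeClassifier

def train_coherence_checker(X_train, y_train):
    # Entropy as the attribute selection measure, per the experimental setup.
    clf = DecisionTreeClassifier(criterion="entropy")
    clf.fit(X_train, y_train)
    return clf
```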

Experimental methods

Method parameters

The method parameters mainly include the constraint capacity value T of associative knowledge network and the number n of paragraph features. In the following experimental analysis, we will discuss the influence of these parameters on the method performance.

Evaluation metrics

In this study, precision (P), recall (R) and F1-score (F1) are considered as performance evaluation metrics, and the corresponding definitions are as follows:

$$ P = \frac{{\left| {M \cap B} \right|}}{\left| M \right|}*100 $$
(10)
$$ R = \frac{{\left| {M \cap B} \right|}}{\left| B \right|}*100 $$
(11)
$$ F1 = \frac{P \times R}{{(P + R)/2}} $$
(12)

where M is the output result of the classification method, and B is the ground-truth result of the test sample. P measures the precision of the model's error detection, R measures the information coverage of the model's error detection, and F1 balances the influence of P and R. Moreover, time complexity and space complexity are also used as metrics to measure model performance in the subsequent analysis.
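For concreteness, the three metrics can be computed directly from Eqs. (10)-(12); treating M and B as sets of detected and ground-truth error positions is our assumption.

```python
# Direct transcription of Eqs. (10)-(12). Note F1 = 2PR/(P+R) is already a
# percentage when P and R are percentages.
def metrics(M, B):
    p = len(M & B) / len(M) * 100
    r = len(M & B) / len(B) * 100
    f1 = (p * r) / ((p + r) / 2) if p + r else 0.0
    return p, r, f1
```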

Experimental datasets

In this study, we introduce two experimental corpus datasets. The first dataset consists of 10,797 diet-related texts on the topics "healthy knowledge", "dietary nutrition" and "dietary errors", crawled from "Meishi-Baike" [42] and "Foodbk" [43], and is recorded as Corpus I. The second dataset consists of 7249 texts provided by Yozosoft, which come from the party and government corpus of various provinces in the "National Learning Platform Exhibition and Broadcast" module crawled from the official website of "Xuexi.cn" [44], and is recorded as Corpus II.

As the research method includes both constructing a background knowledge network and the coherence checking application, the experimental data quantities of these two parts are shown in Table 2.

Table 2 Dataset size used in the experiments

After constructing the associative knowledge networks, the network on Corpus I contains 102,942 nodes and 5,024,139 edges, and the network on Corpus II contains 43,576 nodes and 4,136,888 edges.

In addition, for the coherence checking experiment, we also need to build incorrect samples. We randomly insert 1800 noun entities into 100 documents of each of the two corpora as nouns with inconsistent context semantics, namely, negative sample data; the randomly inserted nouns are uniformly drawn from the noun set of the background knowledge network. To ensure that the semantic information of the nouns in the original text is not changed by the random insertion, a noun entity is inserted every 2-3 sentences. The corpora after insertion are called Dataset I and Dataset II. Figure 8 shows a typical result of randomly inserting noun entities into text paragraphs from the corpus [43, 44], in which the shaded parts are noun entities already existing in the text, and the double underline denotes the noun entities we randomly inserted. The left sample text in the figure is taken from Dataset I, and the right sample text is taken from Dataset II.

Fig. 8
figure 8

Two examples obtained by randomly inserting noun words in text paragraphs
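A hedged sketch of this negative sample construction follows; the exact insertion positions within a sentence are not specified in the paper, so uniform random placement is our assumption.

```python
# Negative sample construction: insert nouns drawn from the background
# network's node set, roughly one noun every 2-3 sentences.
import random

def insert_error_nouns(sentences, candidate_nouns, rng=random):
    """sentences: list of token lists; returns corrupted copy and inserted nouns."""
    corrupted, inserted = [], []
    gap = rng.randint(2, 3)  # one insertion every 2-3 sentences
    for i, sent in enumerate(sentences):
        sent = list(sent)
        if i % gap == gap - 1:
            noun = rng.choice(candidate_nouns)
            sent.insert(rng.randrange(len(sent) + 1), noun)  # random position
            inserted.append(noun)
        corrupted.append(sent)
    return corrupted, inserted
```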

Numerical results and discussions

In this part, a group of numerical results is reported. In the simulation experiments, based on Dataset I and Dataset II constructed above and the multilevel coupling degree feature extraction method, the decision tree model is introduced to judge the semantic coherence of noun context. Apart from the materials used to construct the background knowledge network, for training the classification model, positive samples come from the noun entities already existing in the original texts, and negative samples are constructed by randomly inserting noun entities into the original texts. Because the number of original positive samples in Dataset I and Dataset II is far greater than the number of negative samples, random under-sampling is adopted on the positive sample data to maintain the balance between positive and negative samples.

In the experimental analysis of checking the semantic coherence of noun context, all comparative experiments are performed with five-fold cross-validation. The reported model performance is the mean and standard deviation of the five experimental performance metrics.

Tables 3 and 4 show some results of the multilevel associative coupling degree features of noun entities in the example texts of Fig. 8. The double underlined parts in the tables are error entities with inconsistent context semantics. By analysing the data in Tables 3 and 4, it can be found that the associative coupling degree features of semantically correct noun entities in a document are usually greater than 0, while the associative coupling degree features of noun entities with wrong, semantically incoherent context usually contain more 0 values. This result accords well with our intuitive understanding. That is, for a noun entity with correct semantics in a text, its contextual words can effectively support its meaning, and it also has good associative interpretation ability in the background network. In contrast, for a wrong entity with incoherent context semantics, the semantic support from its contextual words is weak, and it usually cannot obtain good associative characteristics in the background knowledge network.

Table 3 Multilevel associative coupling degree features of some noun entities in the left example text in Fig. 8
Table 4 Multilevel associative coupling degree features of some noun entities in the right example text in Fig. 8

Based on the above experimental results, we further quantitatively analyse the performance impact of different parameters and carry out performance comparisons with existing methods. For convenience of description, our proposed method of checking the semantic coherence of noun context is named AssoCheck.

Performance analysis on different paragraph feature numbers

In this section, we analyse the performance influence of the paragraph feature number n. The F1-score difference value is used to evaluate whether the comprehensive performance of the model improves after paragraph features are added; it is the F1-score of the model with paragraph features minus the F1-score of the model with only basic features. The experimental results are shown in Fig. 9, and the detailed analysis is as follows.

Fig. 9
figure 9

Influence of the number of paragraph features n on model performance

Figure 9 reflects the change in the F1-score difference value before and after adding paragraph features, with the abscissa showing the number of paragraph features. As Fig. 9 shows, compared with using only basic features, the error detection performance of the model improves to varying degrees after paragraph features are added. The comprehensive performance is best when the number of paragraph features is 4 on both datasets. When there are too many paragraph features, model performance instead declines, which may be due to random interference caused by an excessive number of features. Therefore, we suggest setting the number of paragraph features n to 4 by default.

Furthermore, Table 5 shows that model performance improves after adding paragraph features on both Dataset I and Dataset II. On Dataset I, the F1-score increases by 0.82 percentage points after paragraph features are added, and on Dataset II it increases by 0.65 percentage points. Thus, paragraph features are valuable in the coherence checking method, and the comprehensive performance is highest when n is set to 4.

Table 5 F1-score values obtained by two compared models with no and added paragraph features

From the above analysis, we conclude that a proper introduction of paragraph features enables noun entities to acquire richer contextual semantic information, thus improving the error detection performance of the method.

Performance influence under different capacity scales of background knowledge network

In the next experiment, we consider the influence of the constraint capacity ratio r, defined as the current network edge capacity T divided by the original scale of the background knowledge network, where the original scale refers to the network formed without any connecting edge deletion. The experiment is also carried out on Dataset I and Dataset II, and the F1-score difference value is again used to evaluate the performance influence. The experimental results are shown in Fig. 10.

Fig. 10
figure 10

Performance comparison with different constraint capacity ratios for background knowledge network

Figure 10 shows the performance changes under different constraint capacity ratios r for Dataset I and Dataset II. From the curve for Dataset I, as the constraint capacity ratio r gradually decreases, the F1-score difference value initially increases; that is, the model attains better comprehensive performance. However, when the constraint capacity ratio r is reduced further, the F1-score difference value begins to decrease, eventually turning into a negative effect. We interpret this as follows: properly restricting the scale of the background knowledge network's edges removes some meaningless connections and thereby improves the comprehensive performance of the model, but if the scale is limited too severely, some necessary connecting edges are discarded, necessary semantic connections are ignored, and the semantic representation ability of the model is reduced. As shown in Fig. 10, the model achieves the best modelling performance on Corpus I when the constraint capacity ratio \(r\) is 0.5, and on Dataset II when the constraint capacity ratio is 0.7.

Performance analysis using different relationship measurements

Here, the performance impact of different knowledge relationship measurement strategies is further analysed. For comparison, the two related measurements of Zhong [30] and Wang [31] are considered; their relationship strength computing equations are used to compute the associative relationship strengths of our model. The corresponding experimental results are reported in Tables 6 and 7 for Dataset I and Dataset II, respectively.

Table 6 Performance comparison using different relationship measurements on Dataset I
Table 7 Performance comparison using different relationship measurements on Dataset II

The above results clearly indicate that our proposed measurement strategy gains slightly better performance than the two compared strategies. Accordingly, we believe that our measurement strategy captures the semantic relationships between noun entities more effectively.

Performance analysis of comparable methods

In this part, the performance of AssoCheck is compared with the following two neural network methods. Five-fold cross-validation is also used in these experiments.

1. ERNIE [45]. In 2019, Baidu put forward the ERNIE 1.0 pretraining model inspired by the masking strategy of BERT, in which BERT's random masking strategy is replaced by an entity-level or phrase-level masking strategy. In our experiments, original text sentences are extracted from Dataset I and Dataset II as positive samples, and negative sentences are constructed by inserting entities with inconsistent context semantics into positive sample sentences. Based on these samples and the ERNIE 1.0 training model, the semantic inconsistency of every sentence can be recognized. In the experiment, we set the batch size to 4, the learning rate to 5e-5 and the number of epochs to 1.

2. SoftMB [38]. Researchers from ByteDance and Fudan University proposed a new model framework for Chinese spelling error correction in 2020, soft-masked BERT. We adopt the first part of the framework, the detection network, as a comparison method. The detection network is a bidirectional GRU model: its input is a sentence sequence, and its output is a classification label. It encodes each sentence sequence bidirectionally to obtain bidirectional hidden states; the hidden states of the two directions are then concatenated and fed to a fully connected layer to obtain a probability value between 0 and 1. In the experiment, we consider a probability value greater than 0.5 to indicate a wrong word. The batch size is set to 16, the embedding size to 256, and the number of layers to 2. Based on the above settings, Tables 8 and 9 give comparative experimental results on Dataset I and Dataset II, respectively.

Table 8 Comparison of error detection performance of different methods on Dataset I
Table 9 Comparison of error detection performance of different methods on Dataset II

According to the results in Tables 8 and 9, our method shows the best comprehensive performance. In addition, our method is an interpretable semantic modelling method, and compared with the advanced neural network semantic modelling methods, it has no disadvantage in performance either.

Besides, although the comprehensive performance of ERNIE is relatively good, its checking can only give a judgment for a whole sentence and cannot locate which words are inconsistent in contextual semantics. For SoftMB, although semantic consistency can be judged at every position in a sentence, its overall performance is significantly lower than that of the other methods. In contrast, AssoCheck can not only check the semantic coherence of words at each specific position but also accurately locate misused words, and its overall performance results are also very high.

Complexity analysis

In this section, we further study the computational complexity of our method AssoCheck compared with the neural network-based method SoftMB, using Dataset I. In the experiment, we measure the computing requirements of dynamically updating 3000 texts with 340,770 words in the training stage, and of checking 10 texts in the detection stage. Detailed results are shown in Table 10.

Table 10 Computational complexity comparison of two methods by using Dataset I

According to the results in Table 10, the computational cost of our method is higher than that of the neural network method. However, deep neural networks are very compact computational structures that have been effectively optimized on GPUs, whereas our proposed method has undergone no further computational optimization; even so, the computational complexities of the two methods are at the same level.

Extended experimental analysis

In the previous experiments, the decision tree was used to realize the judgment of noun coherence checking. Here, we further consider the influence of different classification algorithms on the error detection performance of our proposed detection method. The compared classification algorithms include SVM [46], KNN [47], Random Forest (RF) [48], Multilayer Perceptron (MLP) [49], and the decision tree (DT) method. Based on the same datasets and experimental methods as the previous experiments, the results are shown in Table 11.

Table 11 Error detection performance F1 (%) within different classification algorithms

In the experimental settings, for the decision tree, entropy is used as the attribute selection measure. For SVM, a linear kernel function is adopted. For KNN, the number of nearest neighbours is set to 3. For Random Forest, entropy is also used as the attribute selection measure, and the number of trees in the forest is 100. For MLP, the batch size is set to 16, the learning rate to 0.001 and the numbers of hidden neurons to 128, 64 and 32, respectively.

From Table 11, the classification methods based on the decision tree and MLP obtain better classification performance on both datasets. However, from the perspective of interpretable semantic modelling, we think the decision tree-based classification method is more in line with our proposed task. Besides, according to the results in Table 11, all classification algorithms achieve good performance, which again demonstrates the effectiveness of our proposed semantic coherence checking method.

Discussions on meta-heuristic algorithm to enhance associative knowledge network modelling performance

The nodes and edges in the associative knowledge network are crucial to the acquisition and dissemination of semantic information. At present, meta-heuristic algorithms [50] are widely used in practical problems. For associative knowledge network modelling, we believe that meta-heuristic algorithms will be effective in optimizing the representation and dissemination of the semantic information of network nodes and edges. This will be further explored in our future work.

Conclusion and future work

Inspired by the human brain's strong capacity for pure associative computing, an associative knowledge network is proposed for the semantic representation of noun context. Moreover, a completely novel and highly interpretable method is proposed for checking the contextual semantic coherence of noun words in a document, which is very valuable for error detection in text writing. Extensive experimental comparisons with existing related methods show that the proposed method achieves better F1-score performance than the methods based on deep neural networks. In addition, the proposed model has incomparable advantages in natural interpretability and incremental learning ability.

Even so, in the construction of the associative knowledge network, the computation of edge strength has no strong theoretical basis and is mainly supported by research experience. In the future, we will explore a stricter logical basis for computing associative relationship strength. In addition, the proposed coherence checking method can only detect and locate noun words with incoherent context but cannot correct them; this will also be studied further.