Introduction

Auditing, as a systematic process, aims to independently obtain and evaluate evidence to ensure the accuracy of an entity’s financial statements, records, operations, or performance, and to fairly reflect its financial position, operating results, and compliance with laws, regulations, and industry standards. Auditing is divided into two main categories: internal and external [1]. Internal auditing is conducted by internal audit departments within organizations to improve efficiency and effectiveness, safeguard compliance, security, and efficiency [2]. External auditing is conducted by independent third-party organizations, focusing primarily on the accuracy and fairness of financial reports and information [3]. In an environment where cloud-edge collaboration reduces computing latency, auditing can be conducted more conveniently and efficiently [4]. Cloud technology enables auditing teams to achieve real-time collaboration, cross-regional, cross-departmental collaboration, significantly reducing information transmission and communication delays, and improving auditing efficiency. At the same time, the cloud environment provides elastic and scalable computing resources to meet the demands of large-scale data processing and analysis, providing stronger technical support for auditing. Under the framework of cloud-edge collaboration, auditing teams can flexibly adjust resource allocation, expand on demand, to adapt to audits of different scales and complexities, enhancing auditing quality and efficiency [5].

Traditional approaches in the field of auditing primarily rely on auditors manually examining ledgers, financial statements, and other relevant documents to assess the financial condition and operational efficiency of an organization or project [6]. This method emphasizes on-site investigations, face-to-face interviews, and sample inspections to ensure the accuracy and compliance of records [7]. In the context of power engineering projects, this involves auditing aspects such as project progress, cost control, contract execution, and project management. Traditional auditing methods exhibit the following characteristics:

  • On-site investigations and face-to-face interviews: Auditors visit the site to observe and engage in direct communication with project managers and staff.

  • Sample inspections: Since manually auditing all records is impractical, random or targeted sample inspections are typically employed.

  • Manual record analysis: Auditors need to manually inspect and analyze ledger records and financial statements.

  • Reliance on historical data: Traditional auditing methods primarily rely on historical data and records of completed tasks [8].

With the development of technology and changes in the business environment, traditional auditing methods face various challenges:

  • Sharp increase in data volume: As project scales expand, the volume of data involved also increases significantly, making it challenging for traditional methods to efficiently handle large datasets .

  • Increased complexity of technology: Power engineering projects involve various new technologies and complex engineering management processes, making it difficult for traditional auditing methods to comprehensively assess the effectiveness of technology implementation and project management.

  • Need for higher efficiency and accuracy: With increased regulatory requirements and stakeholders’ demands for transparency and accountability, auditing requires higher efficiency and accuracy.

  • Application of intelligent auditing technologies: The application of technologies such as big data, artificial intelligence (AI), and knowledge graphs presents new opportunities and challenges for auditing [9].

In power engineering projects, intelligent auditing refers to the comprehensive and real-time monitoring and analysis of various stages such as design, procurement, construction, and operation using modern information technologies such as big data analytics, AI, cloud computing, and knowledge graphs. This approach can enhance auditing efficiency and accuracy, enable timely identification and prevention of risks, and optimize resource allocation.

A knowledge graph is a technology that represents knowledge through a graphical structure, using nodes (entities) and edges (relations) to store and represent information related to entities and their relations. The core advantage of knowledge graphs lies in their ability to present complex data and information in a structured and interconnected manner, thus providing support for data analysis, information retrieval, and intelligent decision-making. Knowledge graphs can handle and analyze large-scale, heterogeneous datasets, which is particularly important for auditing to deal with complex financial data and unstructured information (such as contract texts, communication records, etc.) [10]. They can quickly identify correlations between data, improving the efficiency and accuracy of data processing. By constructing knowledge graphs involving companies, individuals, transactions, and other relevant entities and their relations, auditors can more easily identify abnormal patterns, hidden risks, and potential fraudulent behavior. This structured approach to data analysis can assist auditors in conducting more in-depth risk assessment and management. Knowledge graphs can provide a comprehensive perspective, helping auditors understand complex business environments and transaction relations. This deeper insight supports more precise audit decisions, enhancing audit quality and efficiency [11]. They can be used to monitor and analyze an organization’s compliance by quickly identifying potential compliance issues through comparing entity behavior and transactions with relevant regulations and standards. Combined with AI technologies such as machine learning and natural language processing (NLP), knowledge graphs can automate data analysis and insight discovery, providing powerful technical support for intelligent auditing [12]. This not only reduces manual workload during the audit process but also improves the quality and effectiveness of auditing. Knowledge graph technology demonstrates enormous potential in intelligent audit data analysis, enhancing audit process efficiency and accuracy, strengthening risk management, supporting wiser decision-making, and driving audit work towards greater efficiency and intelligence. With technology continually advancing, knowledge graphs are expected to play an increasingly important role in the field of intelligent auditing.

In the reality of limited resources for internal audit investment, it is imperative for internal auditing to enhance audit efficiency and focus on core risks within the enterprise, while expanding audit coverage. This has become an important trend in the current innovation and quality improvement of internal auditing [13]. Accelerating the pace of enterprise digital transformation, strengthening the reuse of data audit methods, and achieving intelligent auditing on this basis, transitioning from “human audit” to “machine audit” and then to “smart audit”, is gradually becoming a new trend. In this process, the cloud-edge collaboration environment plays a crucial role by reducing computing latency and improving the response speed and efficiency of auditing teams. Through the introduction of intelligent auditing and cloud-edge collaboration environments, internal auditing can better coordinate the relationships between power enterprise managers, audited business departments, and external auditing, thereby reducing the daily workload of internal auditors [14]. At the same time, utilizing advanced digital management methods and cloud-edge collaboration environments can enhance the level of intelligent auditing for engineering projects, achieve the intelligence and standardization of internal auditing, and thus improve the quality and efficiency of data analysis. This not only provides intelligent support for subsequent decision-making and advanced applications but also has profound theoretical and practical significance for preventing and reducing the operational risks of power enterprises [15]. Therefore, incorporating cloud-edge collaboration environments into the technical innovation and quality improvement strategies of internal auditing can effectively enhance audit efficiency, expand audit coverage, and provide support for the digital transformation of power enterprises.

The rest of the paper is organised as follows: “Knowledge graphs: fundamentals and technologies” section outlines the foundations and technical concepts of knowledge graphs. Then, “Application of knowledge graphs in intelligent audit” section analyses the application of knowledge graphs in smart auditing. “Challenges in adopting knowledge graphs for audit” section describes the challenges of applying knowledge graph to auditing. “Future trends and opportunities” section analyses future trends and challenges in the field of knowledge graphs and auditing. Finally, we conclude the paper in “Conclusion” section.

Knowledge graphs: fundamentals and technologies

Basic concepts of knowledge graphs

The concept of knowledge graph was first introduced by Google in its Knowledge Graph [16] project in 2012, aimed at enhancing the search quality and user experience of the Google search engine. A knowledge graph is a multi-relational directed graph composed of entities as nodes and relations as different types of edges. It describes various pieces of information about the real world in the form of knowledge triples, represented as (hrt), where h and t correspond to the head and tail entities, and r represents the relation between them. An entity can be an objectively existing object or an abstract concept. The connections between entities are described using relations, which include pre-defined types and properties. Semantic descriptions and relations between entities form a networked structure of knowledge.

One key advantage of knowledge graphs is their ability to express complex knowledge systems in a highly intuitive and flexible manner. This graphical representation not only allows people to easily understand the relations between information but also enables computers to efficiently process large amounts of data. In recent years, due to the outstanding advantage of knowledge graphs in representing structured data, they have played an increasingly important role in variousAI tasks, injecting new vitality into intelligent question answering [17], intelligent recommendation [18], and information retrieval [19]. Numerous large-scale knowledge graphs such as DBPedia [20], Freebase [21], WordNet [20], and Wikidata [22] are widely used in various fields. Table 1 shows the large-scale knowledge bases that are now well known. With the continuous discovery of new knowledge and the reinterpretation of old knowledge, knowledge graphs can be easily updated by adding new nodes and edges. This dynamism makes knowledge graphs an evolving knowledge base that can reflect the latest developments in the field of knowledge.

Table 1 Large-scale knowledge tables

Furthermore, knowledge graphs excel in task analysis mainly due to their semantic relationship models, rich knowledge representations, semantic reasoning capabilities, structured representations, and support for large-scale data. These features enable knowledge graphs to better understand tasks and provide accurate, comprehensive information support for them [27]. Users can query the knowledge graph to discover hidden relations between different entities or explore the knowledge structure within a specific domain. For example, by analyzing a medical knowledge graph, researchers can uncover associations between a certain medication and specific diseases, or identify connections between certain symptoms and particular health conditions. The construction and application of knowledge graphs are interdisciplinary fields that combine research achievements from various fields such as computer science, linguistics, and information science. With the continuous advancement ofAI technology, knowledge graphs will play an increasingly important role in intelligent information processing and knowledge discovery, providing people with richer, more accurate, and personalized information services [28].

Key technologies for knowledge graphs

Knowledge extraction

Knowledge extraction was first proposed in the late 1970s, with its core task being the automated discovery and extraction of relevant information from text. Knowledge extraction is a crucial technology for automatically constructing large-scale knowledge graphs, aiming to extract knowledge from various sources and structures of data and store it in a knowledge graph. The data sources for knowledge extraction can include structured data (such as linked data, databases), semi-structured data (such as tables, lists in web pages), or unstructured data (pure text data). Depending on the type of data source, knowledge extraction involves different key technologies and technical challenges that need to be addressed. Knowledge extraction models are classified into pipeline models and joint extraction models, as illustrated in Fig. 1.

Fig. 1
figure 1

Information extraction model classification diagram

In the pipeline model, Named Entity Recognition (NER) aims to identify strict indicators in the text belonging to predefined semantic types such as person names, locations, organizations, etc [29]. NER not only serves as an independent tool in Information Extraction (IE) but also plays a crucial role in various NLP applications including text understanding, information retrieval, automatic text summarization, question answering systems, machine translation, and knowledge base construction. The development of NER traces back to the Sixth Message Understanding Conference (MUC-6), where its purpose was to identify organization, personnel, and geographical location names, as well as currency, time, and percentage expressions in text. Since MUC-6, NER has garnered increasing attention and undergone in-depth research in various scientific activities. In its initial stages, NER primarily relied on expert-crafted rules and domain dictionaries [29]. These rules encompassed morphological rules, punctuation, statistical information, etc., while dictionaries included domain-specific vocabularies and common sense knowledge. The emergence and evolution of machine learning further propelled NER research. These methods leverage large volumes of annotated data for training, optimizing parameters to ultimately generate optimal models. Common machine learning approaches for entity recognition include Support Vector Machines, Hidden Markov Models, Maximum Entropy Models, and Conditional Random Fields. The advent of deep learning provided new avenues for entity recognition compared to earlier rule-based or machine learning-based approaches. Bengio et al. [30] proposed establishing a three-layer neural network structure to train language models and generate distributed word vectors. Subsequently, researchers continuously optimized word vectors, fostering rapid development in NLP. More updated techniques applied to NER subsequently achieved state-of-the-art performance. Collobert et al. [31] introduced a method based on sentences and windows, marking the classical inception of applying neural network models and word vectors to NER. This work also marked the first attempt to obtain general word embeddings from unlabeled data. Huang et al. [32] proposed the BiLSTM-CRF model, which combines bidirectional LSTM and CRF models for sequence labeling tasks, becoming the mainstream NER model at the time and still widely used as a baseline model. Chiu et al. [33] proposed the BiLSTM-CRF-CNNS model, combining RNN and CNN based on the work of Bengio et al. [30]. In 2015, Lu et al. [34] introduced the concept of hypergraphs, presenting a model based on hypergraph representation called Mention Hypergraph. This model uses nodes and directed edges together to represent named entities and their combinations. This representation efficiently handles nested named entities, solving the problem of difficult detection. In 2018, Wang et al. [35] proposed a new method called Segmental Hypergraphs, addressing structural ambiguity issues present in the previous Mention Hypergraph method.

Relation extraction (RE) is a core task in the field of NLP, aiming to extract unknown relation facts from human language text and organize unstructured information into structured formats [36]. By leveraging deep learning techniques such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory networks (LSTMs), researchers can accurately extract relations between entities from complex language environments [37, 38]. This process is crucial for applications like constructing knowledge graphs, understanding text content, and supporting question-answering systems, as it automates the handling and analysis of large-scale textual data, revealing implicit connections between information.Mainstream RE methods are categorized into supervised, unsupervised, and distantly supervised extraction methods based on the need for annotated data. Supervised RE methods are typically modeled as multi-class classification tasks, with much research focusing on extracting text features for relation classification. Early classic supervised methods for RE were based on feature vectors and kernel functions. For instance, Kambhatla [39] proposed a method to extract semantic relations between nouns by combining shallow and deep syntactic processing with semantic information, using kernel functions for calculation, and reducing dependence on semantic trees. Additionally, Giuliano et al. [40] integrated various lexical, syntactic, and semantic features in text using Support Vector Machine (SVM) models and Maximum Entropy models, achieving good results.Although methods based on feature vectors and kernel functions perform well in the extraction process, feature extraction and kernel function design often rely on the modeler’s expertise. Deep learning has seen wide application in various fields, with many neural network-based models such as CNNs, RNNs, and LSTMs performing well on entity RE tasks [41]. Unsupervised RE methods involve automatically identifying entities and their relations from text without pre-annotated corpora. Some clustering-based methods perform well in unsupervised RE. For example, Zhao et al. [42] utilized pre-defined relation labeled data to learn a relation-oriented representation and form a clustering structure by aggregating instances of the same relation into corresponding relation centers to discover new relation types in unlabeled data. Additionally, pattern matching-based methods are also popular, where Tran et al. [43] extracted relations between entities based on manually defined rules or patterns such as regular expressions and syntactic parsing. Furthermore, some deep learning techniques have been applied to unsupervised RE. Yuan et al. [44] proposed an unsupervised RE technique based on Variational Autoencoders (VAEs), connecting the decoder and encoder without imposing restrictions on the classification distribution, thereby improving training stability.

In the context of joint extraction models, methods based on Seq2Seq refer to treating triples as token sequences, transforming the triple extraction task into a generation task of producing triples in a certain order. Addressing the issue of overlapping triples in complex contexts, Zeng et al. [45] introduced the CopyRE model, which utilizes a copying mechanism. This model first generates relations and then generates all triples in the text to resolve the problem of relation overlap. Fu et al. [46] and Nayak et al. [47] also tackled this problem, where Fu et al. [46] proposed a Graph Convolutional Network (GCN)-based method for relation triple extraction, while Nayak et al. [47] introduced a Seq2Seq-based approach where each time step extracts a word representation. In 2021, Ye et al. [48] proposed a Transformer-based generative contrastive learning framework for RE, aiming to address the issue of recurrent neural networks failing to capture long-term dependency relations, resulting in unreliable triple generation. Additionally, word replacement operations can also be conducted to achieve different purposes. Zheng et al. [49] employed a novel annotation strategy, simultaneously labeling entity information and relation types, transforming the tasks of NER and RE into sequence labeling tasks. However, this model overlooks the scenarios of SEO and EPO.

The fill-in-the-blank method involves models typically maintaining tables corresponding to each relation, where each entry indicates whether the corresponding relation exists for a given token pair. Entries with labels usually represent the start and end positions of the two entities involved in the relation. This method represents relation information by filling in tables. In 2020, Wang et al. [50] proposed a method that utilizes mutual interaction between sequence encoders and relation table encoders, combined with attention mechanisms, to capture the interaction information between entity recognition and RE tasks, thereby enhancing model performance. In the same year, Wang et al. [51] also introduced the TPLinker model, which unifies the annotation framework into a linking problem of word pairs, capable of addressing issues like entity overlap relations and nested entities. In 2021, Ren et al. [52] first considered the global correlation between word pairs and various types of relations in fill-in-the-blank-based methods to overcome the limitations of relying solely on local features of individual word pairs. Additionally, Miwa et al. [53] proposed an end-to-end model for extracting entity relations on word sequences and dependency tree structures. This model employs bidirectional sequences and bidirectional tree structures to jointly model entities and relations. It first detects entities and then uses a single incremental decoding vector structure to extract relations between entities, while jointly updating vector parameters for entities and relation labels, achieving better entity RE results. Eberts et al. [54] proposed the SPERT (span-based joint entity and RE) model, which utilizes a Transformer network as the foundational unit to perform lightweight embedding, entity recognition, and filtering for entities within spans. This model also conducts word replacement operations.

The token-based method typically refers to models predicting relations based on tokens. In these models, binary token sequences are often used to determine the start and end positions of entities and sometimes to identify relations between two entities. Additionally, word replacement operations are commonly applied in these methods to increase model flexibility and adaptability. Yu et al. [55] proposed a span-based annotation strategy, where they first label the positions and types of head entities, then label the positions of tail entities, and predict the relation between head and tail entities. However, this method overlooks the potential relations between head and tail entities. Li et al. [56] formalized the joint RE task as a multi-turn question-answering task, extracting triplets based on question-answer templates, but the high computational complexity due to multiple sentence encodings is a limitation. Dai et al. [57] introduced a word-position-based annotation strategy, generating content vectors by combining position-based attention mechanisms. To better explore the interaction between entity recognition and RE, Wu et al. [58] employed two isomorphic bidirectional type attention LSTMs and enhanced the dependency between entity types and relation types through cross-type attention mechanisms. In recent years, researchers have also proposed the PRGC [59] model and bidirectional extraction framework to improve model efficiency and reduce entity extraction omissions. Additionally, word replacement operations can be performed in these methods to increase model flexibility and adaptability.

Knowledge graphs completion

Knowledge graph completion aims to automatically discover and fill in missing entities or relations by utilizing existing knowledge and reasoning techniques. Currently, there are numerous methods available for constructing knowledge graphs and inferring incomplete triplets [60]. These methods can be broadly categorized into tensor factorization models and translation models. In the context of intelligent auditing, tensor decomposition models can aid auditors in identifying potential irregularities or anomalies by transforming financial data into tensor form and leveraging tensor decomposition techniques to uncover underlying patterns and factors, thus providing valuable decision support. On the other hand, translation models can be utilized for translating and understanding cross-language audit information, enhancing audit team efficiency and communication quality.

Models based on tensor decomposition represent the combination of incomplete triplets and missing parts as a three-way tensor, which is then decomposed to obtain embeddings for head entities, relations, and tail entities. Among these are some lightweight models, such as the one proposed by Nickel et al. [61], which leverages hypercomplex space to learn knowledge graph embeddings, enhancing the generalization compared to ComplEx [62]. Unlike the standard vector space with a single component i, each quaternion embedding is a vector in the hypercomplex space H with imaginary components i,j, and k, embedding the relation quaternion via the Hamilton product into a new scoring function. Tucker [63] utilizes a different decomposition model called Tucker decomposition to compute a smaller core tensor and a sequence composed of three matrices, each representing embeddings for entities and relations. Overall, these models offer effective approaches for handling incomplete triplets and missing parts in knowledge graphs, which are crucial for reasoning and completion tasks.

The translation model interprets relations as simple translations of hidden entity representations. The translation distance model measures the plausibility of facts and utilizes a distance-based scoring function. Translation-based models aim to find low-dimensional vector representations of entities related to entity translations. TransW [64] proposes using word embeddings for knowledge graph embedding to better handle unseen entities or relations. Unlike previous works that overlook word-level details in triplets, TransW aims to enrich the knowledge graph by using word embeddings to represent missing entities and relations. RotatE [65] is another translation-based knowledge graph representation learning method. This model can infer different relation patterns, whether symmetric or asymmetric. The rotation model defines each relation as a rotation from the source entity to the target entity in the complex vector space. HAKE [66] is a translation distance model that explicitly models modulus information, with tree depth considered as modulus, while the distance function only considers the modulus part.

Knowledge graphs reasoning

With the further development of the Internet and big data technology, the types of knowledge graphs are constantly increasing, and the amount of data is continuously growing. During the inference process, complex question types are constrained, and answers are derived through multi-hop relations between entities. For instance, for the question “Which company did the chairman of xx company invest in?”, we can find Tom through the chairman relation of the subject entity “xx company”, and then find the answer entity “yy company” through Tom’s investment relation. The intermediate relations and entities constitute the reasoning path. Our goal is to automatically and accurately learn such reasoning paths. The simplified reasoning process is illustrated in Fig. 2. Multi-source knowledge graph reasoning holds significant importance in real-world application scenarios. With the advancement of multi-source knowledge graph reasoning, numerous methods have emerged in recent years, including inference based on traditional rule representation, distributed inference, and neural network-based inference.

Fig. 2
figure 2

Simple inference graph

Rule-based inference primarily applies simple rules or statistical features on some early manually constructed knowledge graphs. For instance, the NELL knowledge graph internally utilizes simple first-order relation learning algorithms for inference [67]. Wang et al. [68] introduced the first-order probabilistic language model ProPPR for knowledge graph inference. Lao et al. [69, 70] proposed the PRA algorithm, which treats inference paths as features to predict whether specific relations exist between entities. Gardner et al. [71] introduced the simpler and more effective SFE algorithm for generating feature matrices from knowledge graphs, which traverses adjacent entities of head entities using a breadth-first strategy, extracts features from local structures, and then performs inference. Liu et al. [72] proposed the hierarchical random walk inference algorithm, where the upper layer corresponds to a global learning perspective, and the lower layer corresponds to local learning inference within the knowledge graph. Additionally, some models utilize rule-based inference based on different types of paths. For example, Guo et al. [73] employ angle-soft rules, while Zhang et al. [74] utilize axioms.

The process of distributed-based inference involves obtaining low-dimensional vector representations through models and then using vector operations for inference on the corresponding knowledge graph. Bordes et al. [75] proposed the first transfer-based representation model, TransE, which serves as the foundation of distributed-based models and initiated the research trend of the Trans series. Wang et al. [76] proposed TransH, which learns one more mapping vector for each relation on top of TransE for mapping entities to the hyperplane specified by the relation, which somewhat alleviates the problem of not being able to deal with multi-mapped attribute relations well. Lin et al. [77] proposed TransR and CTransR. TransR establishes representations for entities and relations in separate spaces, with each relation corresponding to a space and a mapping matrix. After mapping relations to relation space, the relation vector can be transformed into a transition between two entity vectors. Lin et al. [78] also introduced PTransE, which distinguishes between different paths between entities. PTransE extends TransE by modeling relation path constraints, constructing paths through relation combination operations, and then weighting multiple paths between entities to improve the accuracy of inference.

With the rapid development of neural network technology, research on knowledge graphs based on neural networks has received widespread attention. Neural networks offer advantages in knowledge graph reasoning, including self-learning capabilities, fast computation speed, and high accuracy. Socher et al. [79] proposed NTN, which uses a bilinear tensor layer instead of traditional neural network layers to connect head and tail entities, enabling the representation of complex semantic relations between entities in different dimensions. Shi et al. [80] introduced the ProjE shared-variable neural network model in 2017. It learns joint embeddings of entities and edges in the knowledge graph and fills in missing information by modifying standard loss functions. Neelakantan et al. [81] trained an RNN for each relation type to obtain variable-length combination representations. They generate the next combination vector from input relation vectors and the current path vector on the path, with the output of the last step serving as the representation of the path vector. The training objective of this model is to maximize the probability of correct triplets. Graves et al. [82] proposed DNC, which includes an LSTM neural network controller and an addressable external storage matrix. Additionally, Trivedi et al. [83] proposed Know-Evolve, a novel deep evolutionary knowledge network capable of learning nonlinear evolving entity representations over time. Xu et al. [84] introduced GNN into TEA-GNN to capture long-term dependency relationships in temporal knowledge graphs. Bai et al. [85] employed a pruning strategy to obtain temporal logical rules and calculate their confidence scores.

Knowledge graphs and intelligent auditing

The application of knowledge graphs in intelligent auditing is increasingly widespread. Its core lies in utilizing graph databases to construct and represent enterprise information, associated entities, and their interactions, thereby supporting tasks such as information integration, data mining, and risk identification during the auditing process [86]. Through knowledge graphs, auditors can swiftly access comprehensive information about audit targets. Leveraging the querying and analytical capabilities of graph databases enables data correlation, pattern recognition, and trend analysis, thus enhancing audit efficiency and accuracy [87]. Additionally, knowledge graphs can be combined with natural language processing techniques to achieve semantic understanding and intelligent inference of audit documents and reports, providing further support and reference for audit decision-making [88].

While the application of knowledge graphs in intelligent auditing brings many benefits, it also faces several challenges. Firstly, constructing knowledge graphs requires a significant amount of data, including structured and semi-structured data, which can be a major challenge for enterprises [9]. Secondly, updating and maintaining knowledge graphs is also a concern as enterprise information and relationships may change frequently, requiring timely updates to ensure accuracy. Additionally, the scale and complexity of knowledge graphs increase the complexity of queries and reasoning, necessitating efficient algorithms and technologies to support them [89]. Lastly, the incompleteness and uncertainty of data in knowledge graphs may affect auditors’ judgment and decision-making regarding risks. Therefore, addressing these challenges requires a comprehensive consideration of issues such as data quality, update mechanisms, query performance, and uncertain reasoning [90].

The development of knowledge graphs in the field of intelligent auditing holds vast prospects for the future. Several trends are anticipated: Firstly, the construction of knowledge graphs will become increasingly automated and intelligent. With advancements in natural language processing, machine learning, and graph databases, enterprises can more easily extract knowledge from large-scale audit data and autonomously build and update knowledge graphs. Secondly, knowledge graphs will be integrated with other intelligent technologies such as machine learning, data mining, and predictive analytics to achieve more precise and efficient risk identification and decision support [91]. Furthermore, the application scope of knowledge graphs will expand beyond financial auditing to areas such as compliance auditing, internal controls, and risk management. Lastly, the openness and sharing of knowledge graphs will be enhanced, enabling different enterprises and institutions to share and exchange knowledge graphs, thereby further improving audit efficiency and accuracy [8].

Application of knowledge graphs in intelligent audit

Knowledge graphs, as a form of structured semantic knowledge repository, are designed to store information about entities (such as individuals, locations, organizations, etc.) and their relations in a graphical format from various forms of data [92,93,94]. This approach significantly simplifies the process of knowledge comprehension and retrieval for both machines and humans. The organization of information in graphs not only enhances the accessibility of data but also strengthens the interconnectedness of information, thereby making knowledge more easily explorable and exploitable.

Furthermore, the integration of knowledge graphs with cloud-edge computing could further enhance the utility of this technology [95]. Cloud-edge computing provides a decentralized processing infrastructure, which could not only speed up the processing and retrieval of data from large-scale knowledge graphs but also reduce the risk of data loss as the data can be stored on local devices. Moreover, this could facilitate real-time data analysis, as data can be processed at the edge of the network, closer to the source, thus reducing latency times significantly [15, 96,97,98,99,100,101].

In various fields, the application of knowledge graphs has become increasingly important, spanning domains such as finance, healthcare, education, information and communication technology, scientific engineering, social politics, and tourism.

In these domains, knowledge graph technology is utilized to integrate information from diverse sources, demonstrating its potent capabilities. For instance, in the healthcare sector, knowledge graphs constructed by integrating heterogeneous resources have been successfully employed to establish unified question-answering systems and recommendation systems, which provide accurate medical information and personalized health advice [10, 102, 103]. Additionally, knowledge graphs are utilized for uncovering intricate relations within biological data [104], predicting ecotoxicological effects in environmental engineering [36], detecting fraudulent activities in the financial domain [105], as well as discovering and visualizing political relations [29, 106]. These applications showcase how knowledge graphs enable the integration of knowledge from disparate sources and effectively utilize this knowledge for conceptualization and problem-solving in specific domains.

While general-purpose and open-world knowledge repositories have been widely adopted for handling a variety of cross-domain tasks, the construction of knowledge bases focusing on specific domain issues is particularly crucial. This is because such domain-specific knowledge graphs not only provide data directly relevant to the domain’s problems but also encompass semantically interrelated applications, which are essential for a deeper understanding and resolution of specific issues within the domain [107]. Specifically, these knowledge graphs enrich and extend the underlying domain ontology, enabling the resolution of specific problems from domain corpora. Although domain-specific knowledge graphs remain a relatively novel and underexplored area, lacking a unified and comprehensive definition [108], some studies have begun to regard domain-specific knowledge graphs as a particular type of knowledge repository representing specific and complex domains [11, 109, 110]. Jain et al. [111] proposed a report indicating that domain-specific knowledge bases result from the process of enriching the underlying domain ontology.

In efforts to provide an inclusive definition for domain-specific knowledge graphs, Abu-Salih et al. [112] proposed a comprehensive definition outlining three core aspects. (1) Formal Conceptualization: This refers to the logical design of the knowledge graph, which is depicted through a specific and predefined domain ontology aimed at capturing the generalization (higher-level) meaning of the domain of interest or the content of specific subdomains. (2) Thematic Domain: Ensuring that the knowledge graph for a specific domain is constructed around specific thematic knowledge, firmly situated within the context of that particular thematic knowledge. (3) Semantically Interrelated Entities and Relations: Emphasizing the physical design of domain-specific knowledge graphs, which are presented in the form of a labeled graph where the semantics of the data are enriched through specific conceptual representations of entities and the relations between them. This comprehensive definition not only provides a profound understanding of domain-specific knowledge graphs but also underscores their significant potential for information integration and knowledge discovery across various domains.

In this chapter, we will specifically delve into the achievements of knowledge graphs in the field of auditing, primarily focusing on the following aspects: the specific applications of knowledge graphs in the auditing domain, how to leverage knowledge graphs to enhance the efficiency of collecting and analyzing audit evidence, and the role of knowledge graphs in detecting audit risks and abnormal behaviors.

Specific applications of knowledge graphs in auditing

The task of auditing involves identifying risk points from complex structured and unstructured data and reporting significant errors. Therefore, the application of big data in the auditing domain is of great significance for achieving audit objectives [113]. Betti et al. [114] discussed the evolution of internal audit models in the context of digitization and intelligence, laying the foundation for integrating knowledge graphs and deep learning in audits. In recent years, the advent of big data has led to a shift in the media of audit targets, with the focus of audits shifting towards electronic data, such as financial data and other business data in various well-structured databases [115]. Lv et al. [116] pointed out that unstructured data from external audit network resources can be extracted using web crawler technologies like Nutch [117] and Heritrix [118], then transformed and stored in a structured manner in audit cloud platforms. In terms of data analysis, “SAMPLE = Total” data analysis models, software, and model-based streaming analysis methods have been used to enhance the accuracy of analysis and effectively improve audit efficiency [119]. Despite the widespread adoption and application of technologies such as data mining and data analysis in the auditing domain, the impact of the big data era on auditing remains insufficient [120]. Therefore, in the field of auditing, the application of knowledge graphs is rapidly becoming an indispensable part, fundamentally changing traditional methods by analyzing large and complex datasets to discover hidden relations and insights. This helps auditors identify potential risks and anomalies.

The general process of using a knowledge graph to accurately represent intelligent audit issues is as follows:

  • Knowledge graph modeling: Firstly, it is necessary to construct an appropriate knowledge graph model based on the domain expertise of intelligent auditing. This model should include entities, attributes, and relationships related to auditing. For example, entities such as “enterprise,” “financial statements,” “auditor,” attributes such as “enterprise registration time,” “period of financial statements,” and relationships such as “enterprise owns financial statements,” “auditor audits financial statements” can be defined.

  • Knowledge extraction and import: Knowledge related to intelligent auditing in the form of auditing reports, financial documents, regulatory files, and other sources is extracted and transformed into the format of a knowledge graph. This can be achieved through techniques such as natural language processing and information extraction. For example, information such as auditing conclusions and financial indicators from auditing reports can be extracted and mapped to entities, attributes, and relationships in the knowledge graph.

  • Knowledge inference and querying: Inference and querying operations are performed using the relationships and rules in the knowledge graph to answer specific intelligent audit questions. Graph databases and query languages like SPARQL can be used for implementation. For example, it is possible to query whether the financial statements of a particular enterprise comply with relevant regulations or find audit cases related to a specific auditor.

However, literature on knowledge graph analysis regarding the current state of research in internal auditing, big data auditing, intelligent auditing, performance auditing, and other auditing fields is scarce. Practical research utilizing knowledge graph technology for off-site audits and designing intelligent expert models for industry auditing is even rarer. By illustrating the practical applications of knowledge graphs in auditing work, we delve deeper into the contribution of knowledge graphs to improving audit quality and efficiency. The specific applications of knowledge graphs in the auditing domain include two aspects: auditing and information system applications, and network threat analysis and detection.

  • Audit and information systems applications: These applications aim to utilize semantic networks and graph technologies to better integrate, interpret, and apply data in the context of auditing and financial analysis.

  • Network threat analysis and detection: These applications focus on identifying and analyzing network security threats through systematic audit record examination. Techniques such as kernel audit record analysis and recommendation-guided threat analysis are employed. These efforts emphasize the development of methods for hunting network threats and strengthening security measures.

In the field of audit and information systems applications, Liu et al. [89] proposed a preliminary innovative approach to constructing an enterprise-level, knowledge graph-based information audit platform. The conceptual model is depicted in Fig. 3. Specifically, this platform gathers information from diverse data sources, including structured and semi-structured enterprise data from audit databases and commercial databases, as well as unstructured general information from sources such as encyclopedias. During the data collection and standardization phase, the platform employs a standardized terminology dictionary to normalize the data and transfers it to relational databases and object storage services. Subsequently, the construction of the knowledge graph encompasses knowledge extraction, storage, and fusion. Ultimately, leveraging the developed knowledge graph, this platform facilitates data-driven decision-making and efficient data analysis. It is important to note that this depiction represents a conceptual model, and an actual company information auditing platform may exhibit greater complexity. This integration significantly enhances the capability for data processing, analysis, and decision-making by leveraging various data sources to construct comprehensive knowledge graphs, thereby improving the efficiency and effectiveness of the audit process [89]. It underscores the transformative potential of knowledge graphs in auditing, providing a cutting-edge solution to the inherent complexity in audit data management and analysis. Additionally, the research by Zhu et al. [5] elaborately discussed the multifaceted applications of knowledge graphs in Economic Responsibility Auditing (ERA), emphasizing their utility in trend identification, hotspot identification in ERA data, as well as elucidating key themes, patterns, and their evolution over time. This approach not only deepens audit analysis by addressing emerging issues and predicting the future development of electronic reverse auctions but also utilizes a comprehensive analysis of literature on electronic reverse auctions from 1986 to 2022. Chen et al. [8] demonstrated an innovative approach that combines knowledge graphs with deep learning technologies to enhance the audit process, mapping structured and unstructured data to uncover hidden relations and insights. This significantly improves internal audit efficiency, highlighting the potential of knowledge graphs in transforming traditional audit practices. Dai et al. [9] introduced an innovative method for conducting audit queries through an intelligent question-answering system based on knowledge graphs and semantic similarity, emphasizing the integration of knowledge graphs to effectively handle and respond to complex audit-related queries. Their research underscores the transformative potential of combining knowledge graphs with AI technologies in auditing, providing insights for practical applications and improving audit efficiency.

Fig. 3
figure 3

The conceptual model of the knowledge graph-based enterprise information audit platform

In the realm of network threat analysis and detection, Zeng et al. [1] utilized system audit records and data source technologies, along with graph neural networks and recommendation systems, to identify and analyze network threats. This approach not only enhances threat detection accuracy but also reduces reliance on expert knowledge, showcasing the potential of knowledge graphs in automating and enhancing audit intelligence. Similarly, Wu et al. [3] developed a method for detecting financial fraud risks using an audit information knowledge graph, particularly for companies listed on the Growth Enterprise Market in China, as illustrated in Fig. 4. When detecting fraudulent companies, the workflow of utilizing a knowledge graph is as follows: firstly, they collect and preprocess audit opinion data, including information about companies, audit firms, auditors, and so on. Next, they perform information extraction and data analysis to gain insights into the financial condition of companies and audit opinions. Subsequently, they construct a knowledge graph that describes the associations between companies, audit firms, and auditors. With the knowledge graph in place, they proceed with model training and feature mining to enhance the ability to detect fraudulent behavior. Finally, they apply the trained model to a test set to validate its performance on new data. This workflow leverages the structure and relationships within the knowledge graph to improve the detection of potential fraudulent companies. This method integrates audit information into structured knowledge graphs for advanced analysis and inference, providing insights into identifying potential financial fraud. It offers a new tool for auditors and financial analysts to enhance risk analysis by leveraging interconnected data in knowledge graphs, thereby improving the accuracy and efficiency of fraud detection processes. Yang et al. [6] proposed a flexible approach to track network threats through kernel audit record analysis using knowledge graphs. This method organizes audit data into structured formats for detecting complex threat patterns and behaviors, thereby enhancing investigation and understanding of security events. This approach helps identify abnormal activities and potential security vulnerabilities, demonstrating the application of knowledge graphs in improving audit processes in the field of network security. A notable example of the application of knowledge graphs in the auditing domain is their use in financial audits of power grid enterprises. In addition, in the financial audit of power grid enterprises, the integration of cloud-edge data into a knowledge graph can offer a more comprehensive perspective, aiding auditors in evaluating the financial condition and operational performance of power grid enterprises.

Fig. 4
figure 4

The main workflow of using knowledge graphs to detect fraud corporations

Methodology and practices

After exploring the specific applications of knowledge graphs in the auditing domain, it is evident that these advanced data structures provide profound improvements to traditional auditing methods. Knowledge graphs further revolutionize the auditing process by significantly enhancing the efficiency of evidence collection and analysis. From identifying potential risks and anomalies through knowledge graphs to leveraging these systems for more detailed audit evidence tasks, a promising path is paved for auditing. By optimizing data integration, analysis, and insight generation, knowledge graphs offer an innovative approach that not only simplifies evidence collection but also enhances the analytical processes that support effective audit practices. This advancement highlights a broader trend of data-driven decision-making in auditing, where knowledge graphs act as catalysts and pathways to more complex and efficient auditing methods. Liu et al. [89], through collecting data from various databases and external internet resources, constructed, updated, and expanded knowledge graphs using comprehensive data. These graphs enhance data accessibility and support data-driven decision-making, providing flexible and adaptive tools for sustainable auditing. Additionally, Zhu et al. [5] utilized knowledge graph analysis to improve the efficiency of collecting and analyzing audit evidence in Economic Responsibility Auditing (ERA). Through CiteSpace analysis of ERA research, the study mapped the evolution of ERA themes, identified research hotspots, and elucidated trends. Chen et al. [8] proposed a method to enhance the efficiency of evidence collection and analysis in power grid enterprise audits using knowledge graphs.

In the realm of network threat analysis and detection, Zeng et al. [1] transformed system audit records into source graphs, which, in conjunction with system entities, form a knowledge graph. Wu et al. [3] outlined a method that utilizes a knowledge graph constructed based on audit information to enhance the efficiency and accuracy of detecting financial fraud. They focus on mining characteristic paths in the knowledge graph to identify potential fraudulent companies by analyzing abnormal relations between potential fraudulent companies and known fraudulent entities. This approach enhances the interpretability of audit data, allowing for a more comprehensive analysis of audit evidence and potential fraudulent activities, thereby aiding auditors and regulatory agencies in more effectively monitoring and identifying fraud risks. Yang et al. [6] discussed how to use a knowledge graph constructed based on kernel audit logs to improve the efficiency of network threat hunting. It simplifies the process by integrating threat intelligence and expert knowledge into the graph. This method enables security analysts to trim searches and quickly summarize large volumes of formatted data for anomaly detection, thereby aiding in identifying unknown threats. Dai et al. [9] constructed a knowledge graph from audit-related documents, allowing the system to better understand the context and semantics of user queries. They propose using this graph to classify issues, identify intentions, and match queries with relevant entities or information. This approach enables more accurate and timely responses to audit queries, significantly improving the efficiency of handling audit-related tasks. The overview of knowledge graph approaches in Audit domain is shown in Table 2. By significantly enhancing the efficiency of audit evidence collection and analysis, knowledge graphs have further revolutionized the audit process. From identifying potential risks and anomalies through knowledge graphs to leveraging these systems for more detailed audit evidence tasks, they provide a promising path for auditing. By optimizing data integration, analysis, and insight generation, knowledge graphs offer an innovative approach that not only simplifies evidence collection but also enhances the analytical processes that support effective audit practices. This advancement highlights a broader trend of data-driven decision-making in auditing, where knowledge graphs act as catalysts and pathways to more complex and efficient audit methods. Various research and practical cases demonstrate that by integrating and analyzing large amounts of data, knowledge graphs not only enhance data accessibility and interpretability but also improve the efficiency of risk identification and decision support in the audit process. Therefore, the application of knowledge graphs in the auditing domain showcases their significant potential in enhancing audit quality and efficiency, bringing about revolutionary changes in audit practices.

Table 2 Overview of knowledge graph approaches in audit domain

Intelligent exploration and response

In the identification of audit risks and abnormal behaviors, knowledge graphs play a pivotal role in integrating and analyzing complex cloud-edge data and relationships. Efficient internal auditing enables enterprises to better assess and improve themselves, enhance their risk management capabilities, and mitigate audit risks [121, 122]. Hou et al. [86] proposed an intelligent financial accounting and financial risk monitoring and early warning model based on knowledge graph and deep learning technologies, aiming to address the inefficiency, time consumption, and low level of intelligence in existing computerized financial data prediction systems. Additionally, Zehra et al. [123] explored the construction of domain knowledge graphs and their application in financial auditing, elucidating the practical application of knowledge graphs in the auditing process. Therefore, knowledge graphs play an increasingly important role in identifying audit risks, as they help auditors effectively identify potential audit risks and abnormal behaviors by structuring and analyzing large amounts of data. The application of knowledge graphs can significantly improve audit efficiency and quality. Auditors can use knowledge graphs to monitor changes in key indicators and transaction activities, identify new risks and abnormal behaviors in real-time, and thus achieve more effective risk management.

In auditing, abnormal behavior refers to actions that significantly deviate from normal business or accounting processes, which may indicate errors or fraud. Abnormal behavior can be unintentional errors, such as calculation mistakes or data input errors, or intentional fraudulent activities, such as falsifying transactions, concealing liabilities, or overstating income. Huang et al. [124] proposed CoDetect for detecting financial fraud, aiming to utilize both network and feature information simultaneously. Hilal et al. [125] explored the concept of anomalies in the context of financial fraud detection and reviewed the effectiveness of various anomaly detection techniques in identifying such fraud. Bakumenko et al. [126] introduced the application of machine learning techniques in identifying anomalies in general ledger data, which deviate from standard financial transaction patterns. These anomalies may signify errors or potential fraudulent activities. The role of knowledge graphs in detecting audit anomalies mainly lies in their capability to integrate and analyze vast amounts of financial data and their interrelationships to identify unusual patterns or behaviors. This approach enhances the efficiency and quality of auditing, enabling auditors to better understand complex financial data and transaction contexts, thereby more effectively identifying and responding to audit anomalies.

In summary, the role of knowledge graphs in identifying audit risks and anomalies lies in their ability to integrate and analyze complex financial data and their relations, thus identifying unusual patterns or behaviors that point to potential risks. Through entity relations, auditors can swiftly and accurately pinpoint areas of concern, conducting in-depth analyses to uncover the underlying reasons for risks. This enhances audit efficiency and quality, enabling auditors to better understand financial data and transaction contexts, and effectively identify and respond to abnormal behaviors.

The application of knowledge graphs in auditing, information system applications, and network threat analysis and detection holds vast potential. However, it also faces certain limitations. These limitations encompass challenges related to data quality and completeness, effective representation and integration of diverse knowledge, scalability and performance issues, complexities in knowledge acquisition and maintenance, challenges in interpretability and explainability, as well as concerns regarding privacy and security. Nevertheless, through continuous research and innovation, these limitations can be gradually overcome. Progress in improving data quality and completeness, developing more effective knowledge representation and integration techniques, enhancing system scalability and performance, automating knowledge acquisition and maintenance, bolstering interpretability and explainability, as well as ensuring privacy and security, will contribute to greater success in the application of knowledge graphs in these domains.

Challenges in adopting knowledge graphs for audit

We will explore the challenges of integrating knowledge graphs with auditing, including issues with data quality, scalability of knowledge graphs, and the integration of domain expertise and technological implementation in intelligent auditing.

Data quality concerns

The introduction of knowledge graph technology has brought unprecedented efficiency and convenience to the audit industry, significantly driving innovation and optimization in audit work [3]. The application of edge computing makes the audit process faster and more flexible, as it allows data to be processed close to where it is generated, reducing transmission latency and bandwidth requirements [14, 14, 127]. However, the issue of data quality is a major barrier to maximizing its effectiveness [128, 129]. The accuracy, consistency, timeliness, completeness, trustworthiness, and availability of data have a decisive impact on the reliability and effectiveness of audit outcomes [130], Table 3 shows the sources of these evaluation indicators. Each evaluation indicators for knowledge graph quality can provide objective standards for the data quality of knowledge graphs. In addition, edge computing can improve the real-time monitoring of data quality by processing data in real time near the data source, ensuring the real-time and accuracy of the data during the audit process [127]. It is noteworthy that these standards influence and interrelate with each other. Figure 5 offers insights into the correlations among these metrics and their roles in the workflow, thereby assisting in controlling data quality in the construction and maintenance of knowledge graphs to meet the specific requirements of the audit domain [130].

Table 3 The sources of these evaluation indicators
Fig. 5
figure 5

The role of data quality evaluation indicators in workflow

In the audit process, the reliance on accurate and complete data is self-evident. The construction and application of knowledge graphs are directly impacted by errors, omissions, or incompleteness in the original data, which can significantly compromise the quality of the knowledge graph and the accuracy of audit decisions. Initially, during the creation process, knowledge graphs can be optimized through these evaluation indicators at the stages of dataset selection, knowledge extraction, and knowledge integration. Issa et al. [128] focused on providing a systematic literature review to assess the completeness of knowledge graphs and collected existing methods from the literature for qualitative and quantitative analysis. By adopting comprehensive data quality management strategies and advanced knowledge extraction techniques, data quality can be significantly enhanced, thereby offering solid support for audit work. Once a knowledge graph has been created, its data quality can still be improved through evaluation indicators with updates and iterations. Identifying incorrect entities, relations, and attributes in the knowledge graph can resolve erroneous information, thus enhancing accuracy. Research by Xue et al. [145] found that low-quality data might contain inaccurate or outdated entries and not cover sufficient facts, limiting their credibility and further utility. Inaccurate financial data input or missing transaction information can lead to errors in audit outcomes. In addition, knowledge graph may also encounter the problem of missing data, and enhancing the missing information can improve the data quality of knowledge graph. Knowledge graph completion can be used to solve the problem of missing data. Knowledge graph completion can be divided into link prediction and attribute completion. Link prediction [146] can identify implicit relationships between entities. Link prediction predicts the missing relationship by determining whether there is an edge between two entities. It can deal with large-scale graph data, and it is suitable for short and medium term forecast, and the prediction effect of approximate exponential growth is good. However, it is only suitable for static networks, not for dynamic networks. At the same time, it is difficult to deal with complex graph structure. Cai et al. [147] proposes a linear graph neural network to realize link prediction and solve the problem of missing data. Trouillon et al. [62] enables link prediction through complex embedding to fill in the missing data. Attribute completion identifies the missing attribute values of the entity. Attribute completion can predict the missing attribute value based on the information of the existing attribute. It can solve the problems of low historical data and low sequence integrity, and is suitable for short and medium term forecasting. And it can generate regular sequences from irregular original data. However, attribute completion is also only applicable to predictions that approximate exponential growth and is not suitable for long-term predictions. Chen et al. [148] designed a novel GNN to perform attribute completion on graphs with missing attributes through distribution matching. Jin et al. [149] completes both attribute completion and learning embedding by generating adversarial networks. Handling information from multiple data sources poses a significant challenge due to the lack of uniformity in data formats and standards, increasing the complexity of data processing and potentially leading to misinformation and incorrect audit assessments. Timely updates of data and the reliability of its sources are equally important in auditing, as outdated or unclear data sources can significantly weaken the effectiveness of audit decisions. Optimizing knowledge graphs through six evaluation indicators in their updates and iterations can greatly enhance their quality. Therefore, with the widespread application of knowledge graph technology in the audit field, addressing data quality issues becomes particularly important. This not only can improve the accuracy and efficiency of auditing but also can enhance the credibility and effectiveness of audit outcomes, thereby providing a strong data guarantee for the stable operation and continuous development of enterprises.

Scalability issues in knowledge graphs

The scalability of knowledge graphs is a crucial feature in their design and application, enabling them to continuously absorb new information, entities, concepts, and their relations over time, thus significantly expanding their knowledge base and enhancing the richness and accuracy of the information they provide [150]. By processing data at the edge of the network, edge computing can absorb and analyze real-time data from various devices and sensors in real time, further enhancing the real-time updating ability and response speed of the knowledge graph. This scalability is evident in several dimensions: from the structural flexibility that allows for easy integration of new data without affecting existing structures, ensuring adaptability to evolving information needs and knowledge accumulation, to the semantic depth that enables the expression of complex concepts and relations, enhancing data interpretability and application intelligence. Knowledge graphs were designed from the outset to integrate with external data sources seamlessly, utilizing standardized formats and interfaces to amalgamate diverse data types, including open datasets, professional databases, and internet data, thereby enriching their content continuously. The integration of edge computing, especially in iot environments, enables faster data processing and analysis, enabling the knowledge graph to more effectively adapt and reflect the dynamic changes in the real world [127, 151]. Advances in AI, machine learning, and NLP have evolved the construction, updating, and querying processes of knowledge graphs, enabling more efficient handling of large-scale data and supporting complex analyses and applications. As knowledge graphs grow and refine, their impact in various applications-from search engine optimization to intelligent question answering [152, 153], recommendation systems [154], and risk management [155]-continues to expand, making them a powerful tool for linking different knowledge domains and supporting intelligent services and decision-making. In addition, the implementation of edge computing can improve the efficiency and accuracy of knowledge graphs when processing large amounts of data, especially in scenarios that require rapid decision making and automation. The ongoing development of technology and its applications promises to deepen the construction and utilization of knowledge graphs, contributing significantly to the informatization and intelligentization of human society.

In the audit domain, the application of knowledge graphs has introduced innovative pathways for data organization and analysis, delivering deep insights [156]. The rapid growth and diversification of audit data, however, challenge the scalability of knowledge graphs. This challenge is acutely felt across several dimensions: The sheer volume of data, encompassing financial information, transaction records, and audit logs, necessitates the continual integration of new data sources into knowledge graphs, raising the bar for their architecture and storage capacity to keep pace with data expansion. Effectively scaling knowledge graphs to accommodate this growth is paramount. Furthermore, the diversity and complexity of audit data demand that knowledge graphs can amalgamate a variety of data types, including structured, semi-structured, and unstructured data. This integration must not only ensure data accuracy and consistency but also maintain scalability, striking a balance between accommodating the wide array of audit-related data and preserving the integrity and utility of the knowledge graph. The need for immediacy in audit activities compels knowledge graphs to support real-time updates and data processing, facilitating rapid response and decision-making. This necessitates that knowledge graphs be both scalable and nimble, capable of swiftly incorporating and processing new information as it becomes available. Despite the support modern technology provides for expanding knowledge graphs, practical challenges due to technological and resource limitations remain. To effectively implement scalable knowledge graphs that can manage the growing and diversifying audit data, advanced technologies, alongside significant resources for development, maintenance, and expansion, are required. Addressing these challenges is crucial for the effective use of knowledge graphs in auditing. Solutions may involve advanced data integration techniques, more flexible knowledge graph architectures, and the application of cutting-edge technologies such as AI and machine learning for real-time data processing and analysis. Overcoming these hurdles will further cement the role of knowledge graphs as a vital resource in the audit industry, equipped to navigate the complexities of modern data-driven auditing environments.

To address these challenges and enhance the scalability of knowledge graphs, several strategies can be adopted: employing distributed storage and computing technologies enables knowledge graphs to store and process data across multiple servers, significantly improving their scalability and processing efficiency. Xu et al. [157] expanded knowledge graph methodologies to systematically and comprehensively review distributed ledger technology on the Internet of Things, achieving high-performance, sustainable, and highly scalable IoT systems. Developing efficient data updating and maintenance mechanisms ensures that knowledge graphs can promptly reflect data changes and support real-time processing. Jia et al. [158] proposed an adaptive incremental update embedding framework for dynamic knowledge graphs, dynamically updating and maintaining knowledge graphs based on a performance review mechanism. Utilizing cloud computing resources to dynamically adjust resource allocation according to demand supports the dynamic expansion of knowledge graphs. Mitropoulou et al. [98] combined knowledge graphs with cloud computing, using knowledge graphs to represent computing and storage resources and illustrating their relations with the applications that utilize them, thus achieving anomaly detection in cloud computing. By implementing these approaches, the scalability of knowledge graphs can be enhanced, thereby addressing the challenges posed by the rapid growth and diversification of audit data in the audit domain.

Data privacy and security issues

Data privacy and security are two paramount concepts in the field of information technology, pivotal to the protection, access control, legitimate use, and privacy rights of users. Data privacy primarily focuses on the lawful use and processing of personal or sensitive data, with its essence being the protection of users’ personal identifying information, such as names, phone numbers, and email addresses, to ensure these are not misused or unlawfully processed. On the other hand, the core of data security lies in safeguarding data from unauthorized access, disclosure, alteration, or destruction, aiming to preserve the integrity, confidentiality, and availability of data. Edge computing enhances the confidentiality and security of data by processing and analyzing it close to where it is generated, as it reduces the need for data to travel across the network, thereby reducing the risk of data breach or interception [99, 159]. As knowledge graph technology gains deeper integration into the audit domain, issues of data privacy and security become increasingly pronounced. The audit process involves handling vast amounts of sensitive financial data and personal information, where safeguarding this information is crucial for maintaining client trust and compliance with regulatory requirements. The construction and application phases of knowledge graphs pose risks of financial details, personal, and business-sensitive information leakage, potentially leading to legal liabilities and reputational damage. The open and interconnected nature of knowledge graphs further exposes data to external security threats, such as data tampering and unauthorized access. Edge computing enables stricter security measures, such as encryption and access control, to be implemented in a localized environment, providing stronger data protection [2]. In the face of stringent legal regulations like the GDPR [12], ensuring compliance in data processing activities while utilizing knowledge graph technology presents a significant challenge. Moreover, existing knowledge graph construction and querying technologies often lack considerations for privacy protection, devoid of effective privacy and security mechanisms. The introduction of edge computing can provide an additional layer of security to the knowledge graph system, ensuring the safe and compliant handling of sensitive data through real-time privacy protection and security monitoring at the data source point. Addressing these challenges requires a comprehensive approach, integrating advanced privacy-preserving techniques and robust security measures into knowledge graph systems to ensure the secure and compliant handling of sensitive data.

To protect data privacy and security, implementing data de-identification and anonymization measures is crucial. Processing sensitive information before its integration into knowledge graphs reduces the risk of privacy breaches. Enforcing strict data access control to ensure that only authorized users can access sensitive data, using Role-Based Access Control [160] strategies to manage permissions, is essential. Long et al. [161] constructed a financial knowledge graph by de-identifying stock transaction records and public market information. They then used the financial knowledge graph to explore correlations between stocks and market trends, ultimately forecasting stock price movements. Zhang et al. [162] built a large-scale knowledge graph for mental and physical disorders detection through data de-identification, initially encrypting personal information with encryption isolation technology, then using content replacement for data de-identification to protect data privacy and security. Xia et al. [163] created a knowledge graph by de-identifying MOOC static registration records and log information to safeguard user privacy and security. By analyzing the knowledge graph, they implemented interest dissemination on course videos to predict user preferences. Encrypting data storage and transmission, utilizing advanced encryption technologies to prevent data from unauthorized access or tampering, is also vital. Throughout the entire process of constructing and applying knowledge graphs, strict adherence to data protection regulations and industry standards is mandatory to ensure the legality and compliance of data processing activities. Introducing privacy protection technologies, such as differential privacy [164] and homomorphic encryption [165], can further enhance the capability to protect privacy when handling sensitive information. Differential privacy [166] is a technique aimed at providing robust privacy protection by adding a certain amount of random noise to the data query results. Its core idea is to allow for useful statistical information to be provided from database queries without disclosing any sensitive information about individual records. Homomorphic encryption [167] allows for computations to be performed on encrypted data, yielding an encrypted result that, when decrypted, matches the result of the same operations performed on the unencrypted data. This means data can be processed and analyzed without exposing the original data’s content, thereby enhancing privacy and security in knowledge graph applications.

Facing the challenges of data privacy and security within the audit domain, adopting these effective protection measures and technologies not only mitigates risks of privacy breaches and data security but also ensures the compliance and security of audit activities. This provides a solid foundation of privacy protection and data security for the application of knowledge graph technology in the audit field. Such strategies enhance trust among stakeholders, maintain the integrity of the audit process, and comply with stringent regulatory requirements. By integrating advanced privacy-preserving and security-enhancing technologies, organizations can navigate the complex landscape of data privacy and security in auditing, leveraging the full potential of knowledge graphs to derive insights while safeguarding sensitive information. This comprehensive approach to privacy and security is essential in today’s data-driven audit practices, enabling auditors to harness the power of knowledge graphs effectively and responsibly.

Future trends and opportunities

In this section, we will explore the future trends and opportunities of knowledge graph technology. Firstly, we will focus on the prospects of machine learning in the construction and analysis of knowledge graphs, discussing its potential applications in this field. Secondly, we will delve into the long-term potential of knowledge graphs in enhancing audit quality and efficiency, particularly in the context of intelligent audit applications in the field of electrical engineering projects. Finally, we will discuss the opportunities of knowledge graph technology in cross-disciplinary applications, ranging from audit to financial reporting and risk management, showcasing its widespread utilization across various domains.

Prospects for machine learning in knowledge graphs construction and analysis

Machine learning demonstrates vast potential in constructing and analyzing knowledge graphs, spanning various aspects from automated data mining [168] to deep knowledge exploration [169] and intelligent decision support [170]. Technological advancements have enabled machine learning not only to enhance the efficiency of constructing knowledge graphs but also to strengthen their analytical capabilities, bringing innovative breakthroughs to different fields. The integration of edge computing, through the machine learning processing at the data source point, can achieve faster data analysis and instant knowledge extraction, thus accelerating the construction and update process of knowledge graph. In terms of automated construction and continuous updates, machine learning automates the extraction of entities, attributes, and relations from texts and data sources, simplifying the manual data organization process. This includes text analysis, entity recognition, and RE, significantly improving the efficiency of building and maintaining knowledge graphs. Machine learning is also applied in quality control and knowledge enhancement, using algorithms to assess data quality, identify inaccuracies, eliminate duplicates, and complete missing information, thereby increasing the accuracy and richness of knowledge graphs. At the same time, edge computing can apply these machine learning algorithms at the point where the data is generated, further improving the real-time quality of data control and the dynamic updating of the knowledge graph [99]. Through knowledge graph completion technology, machine learning algorithms can deeply analyze existing knowledge graphs and identify patterns and associations in them. In this way, machine learning helps fill in the gaps in the knowledge graph, adding new entities and relationships to make the knowledge graph more comprehensive and accurate. Martinez-Rodriguez et al. [171] utilized OpenIE to generate binary relations for constructing specific knowledge graphs. They proposed to facilitate the extraction of named entities and individuals in knowledge graph and the completion of knowledge graph, and to mine more useful data information for the construction of knowledge graph. Wu et al. [172] employed a BERT-BiLSTM-CRF entity extraction model to extract data from safety incidents to build a safety incident knowledge graph, aiming to prevent accidents and improve construction safety management. Han et al. [173] used a residual dense block convolutional neural network to mine entities in the Vietnamese corpus. By establishing a Vietnamese corpus knowledge graph from the mined information, they conducted an in-depth analysis of Vietnamese grammar and morphology. With the addition of edge computing, these machine learning techniques can be processed more efficiently in real time at the original location of the data, providing a more rapid and accurate updating mechanism for the knowledge graph. These examples highlight how machine learning not only accelerates the creation and update of knowledge graphs but also enriches their utility and application across various domains, making it an indispensable tool for advanced data analysis and intelligent decision-making in the era of big data.

Deep analysis and knowledge discovery, especially with deep learning technologies, have made it possible to perform advanced analysis tasks on knowledge graphs, such as semantic search, powering recommendation systems, and predicting trends [174]. This deepens the understanding of complex relations between entities, enabling more accurate and personalized services. Ko et al. [175] built knowledge of design rules for additive manufacturing based on machine learning and knowledge graphs. By analyzing and reasoning over the knowledge graph, they enhanced the automation and autonomous construction and improvement of design rules for additive manufacturing, better exploring the impact of additive manufacturing on part quality. Lovera et al. [176] conducted sentiment analysis from Twitter data using deep learning classification and sentiment knowledge graphs. By predicting emotions in short texts, they analyzed and mined user profiles, aiding in decision-making and preference recommendations for users. Furthermore, machine learning has facilitated the integration and application of cross-domain knowledge, linking data and knowledge from different fields through knowledge graphs to support the formulation of complex problem-solving strategies. This holds immense value for multidisciplinary research and complex decision support, illustrating the transformative impact of machine learning and deep learning in enhancing the functionality and applicability of knowledge graphs across various sectors. Through such integration, knowledge graphs become not only a repository of interconnected data but also a dynamic tool for insight generation, pushing the boundaries of how information is analyzed and utilized in the digital age.

The advancements in machine learning and AI technologies herald the emergence of more innovative methods and models to optimize the construction and analysis of knowledge graphs. Graph neural networks show excellent ability in processing graph structured data. Because graph data often contains complex relationships and connections that traditional neural networks cannot handle directly, graph neural networks are specifically designed to handle graph data. Graph neural networks can capture the complex relationships between nodes directly and effectively on the graph structure. Graph neural network can analyze the hidden meaning of key nodes in knowledge graph by aggregating the information of neighbor nodes. This enables graph neural networks to make predictions in node-level, edge-level, and graph-level tasks. Li et al. [177] designed a novel graph neural network, which uses attention mechanism to embed knowledge graph learning, so as to mine hidden information of knowledge graph. Li et al. [178] performs representation learning of knowledge graph by simplifying heterogeneous graph neural network, and fully explores the potential connections between nodes in knowledge graph. Therefore, knowledge graph related technology based on graph neural network has become the future development trend. The application of machine learning in the knowledge graph will further advance the technology towards more intelligent, automated and efficient, providing more powerful and flexible knowledge management and decision support.

The potential of knowledge graphs in enhancing audit quality and efficiency

Knowledge graphs exhibit significant long-term potential in the audit domain, especially for the intelligent auditing of power engineering projects, which are characterized by their high complexity, numerous stakeholders, and stringent compliance requirements. By processing and analyzing data at the location where it is generated, edge computing can speed up data integration and improve the efficiency and real-time performance of the audit process [179]. By integrating and analyzing diverse data types from multiple sources, such as project management tools, financial systems, and compliance databases, knowledge graphs offer a comprehensive view of projects, enabling auditors to conduct thorough and nuanced audits. The technology facilitates the mapping and analysis of complex relations between various project entities, automates risk identification through machine learning algorithms, and supports real-time monitoring and updates. This capability not only accelerates the detection of potential risks, fraud, or compliance issues but also enhances decision support, allowing for more informed recommendations. Furthermore, the application of advanced analytics to knowledge graphs enables predictive insights, forecasting potential project delays, cost overruns, or compliance breaches before they occur, thereby allowing for preemptive risk mitigation measures. Edge computing further enhances this predictive power by analyzing data instantly at the project site, providing auditors with more timely insights to help them develop more effective risk management strategies. As knowledge graph technology continues to evolve, its sophisticated application in auditing power engineering projects promises deeper insights, heightened audit efficiency, and contributes to the successful management and completion of complex projects, representing a transformative approach to elevating audit quality, efficiency, and effectiveness.

In enhancing audit quality, knowledge graphs can integrate various data sources from power engineering projects, including but not limited to financial data, contract documents, progress reports, supply chain information, and compliance materials [180]. This comprehensive data integration provides auditors with a holistic view, enabling a more complete understanding of project status, thereby improving the accuracy and depth of audits. Additionally, by analyzing relations within the knowledge graph, auditors can identify potential risk points, audit anomalies, and non-compliant patterns. For instance, knowledge graphs can reveal potential reasons for project cost overruns or inconsistencies in contract execution. Knowledge graphs also enhance audit efficiency. Using machine learning algorithms in conjunction with knowledge graphs can automate risk assessment and rapidly identify high-risk areas. This automation not only significantly reduces the need for human resources but also shortens the audit cycle. Moreover, knowledge graph technology enables real-time monitoring of power engineering projects, providing timely risk alerts. Audit teams can adjust their audit focus and resource allocation based on real-time data, thereby increasing work efficiency. Meng et al. [181] proposed a BERT-based EPAT-BERT model to improve the quality and efficiency of auditing in the power engineering domain. By predicting words and entities in power-related texts through word-level and entity-level masked language models, it learns the rich morphology and semantics related to electricity. This model classifies audit texts in power engineering projects to unearth anomalies and risk alerts within the audit texts. Li et al. [182] applied knowledge graph algorithms to trace the data ownership systems in power engineering, efficiently and accurately aiding business personnel in identifying the data owners of unknown data fields. Once data owners are identified, they can quickly determine the source and responsibility from the origin, govern the data from its source, and control data quality, effectively reducing the costs associated with data source management, data governance, and data operations.

Knowledge graphs have demonstrated long-term potential in enhancing audit quality and efficiency within the intelligent audit of power engineering projects. By comprehensively integrating project-related data, uncovering relations and patterns, automating risk assessments, and enabling real-time monitoring, knowledge graphs not only help audit teams understand project situations more accurately and in-depth but also significantly improve the efficiency of audit work. As technology advances and its application deepens, knowledge graphs are set to play an increasingly important role in the field of intelligent auditing, transforming how audits are conducted and offering innovative solutions to complex audit challenges.

Cross-domain applications

Knowledge graph technology has successfully expanded its application scope beyond the audit domain to key areas such as financial reporting [161, 183] and risk management [155], demonstrating its exceptional ability to handle complex data and enhance decision support. By aggregating multi-source data, this technology has constructed a unified and comprehensive information platform for these fields, significantly aiding businesses and institutions in risk identification, compliance monitoring, decision optimization, and improving operational efficiency. This evolution showcases the versatility and impact of knowledge graphs, making them an invaluable tool for navigating the complexities of modern business landscapes and fostering informed, strategic decision-making across various sectors. Figure 6 illustrates the application of knowledge graphs in financial reporting and risk management.

Fig. 6
figure 6

Application of knowledge graph in financial reporting and risk management

In financial reporting, knowledge graphs integrate numerous internal and external data resources, such as financial information, market trends, regulatory standards, and historical records, enabling financial analysts to gain deeper insights into financial health and business performance [184]. Liang et al. [185] developed a graph network model to establish holistic connections between decentralized information, filtering through large, complex datasets in financial reports to unearth key information applicable to diverse decision-making scenarios. Knowledge graphs are utilized to analyze competitors’ financial performance and market movements, thereby precisely positioning market status and formulating strategies. Given the complexity and verbosity of financial reporting information, manual information extraction could diminish the accuracy and timeliness of investment decisions. Zehra et al. [123] proposed a financial knowledge graph through semantic modeling of annual financial reports, using the mined information from the financial knowledge graph for decision-making. Hou et al. [86] combined relevant financial reports and financial risk characteristics to construct an intelligent financial accounting model based on knowledge graphs. They explored financial report risk features using various knowledge graph analysis techniques, integrating financial risk characteristics with traditional financial reporting cases to achieve financial report risk prediction and decision-making. Wen et al. [186] designed a knowledge graph that employs analysis techniques to mine relations between managers and related institutions through financial reports, analyzing, and predicting to prevent financial fraud events.

In the realm of risk management, knowledge graphs offer businesses a comprehensive risk profile, enabling the identification and assessment of a wide range of internal and external risk factors, including financial, operational, market, and compliance risks. By delving into these risk elements and their interrelationships, businesses can more effectively mitigate risks, implement risk management strategies, and respond swiftly in emergency situations. Wu et al. [3] utilized audit relations among companies, audit firms, and auditors to build an audit information knowledge graph, proposing a knowledge graph reasoning framework based on sub-feature extraction methods capable of detecting potentially fraudulent enterprises. Kosasih et al. [187] designed a knowledge graph for supply chain risk management, integrating graph neural networks and knowledge graph reasoning techniques to proactively identify hidden risks within the supply chain. Liu et al. [188] constructed an accident knowledge graph using historical reports. Initially, entity information within the reports was mined using a Bi-LSTM-CRF model, followed by classification with random forests to establish the accident knowledge graph. Ultimately, the accident knowledge graph was used to uncover potential relations between hazards, failures, and accidents, assisting in the formulation of railway risk prevention measures.

These cross-domain applications highlight the immense value of knowledge graphs in enabling enterprises to conduct financial reporting and risk management more effectively. Through the comprehensive analysis and integration of various types of data, knowledge graphs not only optimize decision quality and process efficiency but also provide strong support for businesses to meet challenges in complex and ever-changing commercial environments. This demonstrates the transformative potential of knowledge graphs in enhancing the strategic capabilities of enterprises, underscoring their role as a crucial tool in navigating the intricacies of modern business practices.

Conclusion

In this paper, we conducted a meticulous investigation and analysis of the key technologies related to knowledge graphs. Subsequently, we delved into the applications of knowledge graphs in the field of intelligent auditing and the challenges they face, proposing directions for future research. Through comprehensive analysis of the use of knowledge graphs in intelligent auditing processes, we highlighted their immense potential in enhancing audit efficiency, accuracy, and depth. Future research should focus on several aspects. Firstly, researchers should develop more efficient algorithms and tools to increase the automation of knowledge graph construction, reduce manual intervention, and optimize their performance in a cloud environment. Secondly, there is a need to explore more effective knowledge representation and reasoning mechanisms to enhance the application effectiveness and accuracy of knowledge graphs. Additionally, researching how to better integrate and utilize heterogeneous data from multiple sources, including data stored in the cloud, to enrich the content and scope of knowledge graphs is also an important direction for future research. Finally, considering the importance of data privacy and security, addressing how to protect the privacy of individuals and businesses in intelligent auditing applications should be a key focus of future research.