Knowledge Extraction and Discovery Based on BIM: A Critical Review and Future Directions

In the past, knowledge in the Architecture, Engineering and Construction (AEC) industry mainly came from experience and was documented in hard copies or specific electronic databases. To make use of this knowledge, many studies have focused on retrieving and storing it in a systematic and accessible way. Building Information Modeling (BIM) technology has proved to be a valuable medium for extracting data because it provides physical and functional digital models of all the facilities within the life cycle of a project. Therefore, combining knowledge science with BIM shows great potential for constructing a knowledge map of the AEC industry. Based on a literature review, this article summarizes the latest achievements in knowledge science and BIM in terms of (1) knowledge description, (2) knowledge discovery, (3) knowledge storage and management, (4) knowledge inference and (5) knowledge application, in order to present the state of the art and suggest future directions for applying knowledge science and BIM technology in the AEC industry. The review indicates that BIM is capable of providing information for knowledge extraction and discovery through semantic networks, knowledge graphs and other related methods. It also shows that such knowledge is helpful in the design, construction, and operation and maintenance periods of AEC projects, although its application is still at an early stage.


Introduction
Over the past decades, knowledge in the Architecture, Engineering and Construction (AEC) industry has mainly come from experience and has commonly been documented in hard copies or specific electronic databases. Such documents include regulations, standards, manuals and textbooks [1,2]. Retrieving information from documents and specific electronic media is, however, difficult and error-prone [3], which makes it hard to extract and reuse the knowledge [4]. With developments in information technologies such as Building Information Modeling (BIM), Geographic Information Systems (GIS), the Internet of Things (IoT) and cloud computing, more and more data are being collected in the AEC industry. However, data are not information unless they are useful, let alone knowledge. Therefore, many studies have focused on extracting knowledge from data via various methods, including ontologies, semantic networks and data mining algorithms. For example, ontologies are adopted to define and represent the categories, properties and relationships among concepts in the building industry. The concept of the semantic network was proposed in the early 1990s when Berners-Lee et al. [5] extended the concept of the World Wide Web (WWW) to handle the data accumulated in network communications without manual work. More specifically, an ontology helps transform natural-language-based network data into information that computers can recognize, which facilitates queries and inferences by computers because they "know" the concepts and the relationships among them [6]. The semantic network, on the other hand, extracts new knowledge in the form of "facts" from the internet via different approaches. These facts are related to each other and can thus be organized into a series of relationship graphs known as a Knowledge Graph (KG).
The KG combines data and discovers knowledge from different sources by analyzing the grammar, vocabulary and structural characteristics of texts. Meanwhile, data mining algorithms such as clustering and pattern recognition are used to retrieve hidden information from big data. All these methods are useful for providing new knowledge to the AEC industry.
The concept of BIM technology was proposed in the 1970s [7]. Over the past two decades, it has had a great positive impact on the AEC industry [8]. A BIM model comprises physical and functional digital models of all the facilities of a building; it applies digital modeling and associated technologies (DMAT) to collect data [9] and adopts a unified, readable data standard to support data sharing among participants in different phases of the project life cycle. It also improves collaboration among participants from different aspects of the project. BIM can also integrate domain knowledge and expert methodologies for automated and intelligent applications [10]. As BIM has matured, a considerable number of BIM systems and corresponding data platforms have been applied in the AEC industry [11]. With the growth in data volume, participants have gradually realized the potential of discovering new knowledge from the data accumulated in BIM. For example, researchers found that the value of BIM can be increased by adopting semantic networks to support data integration and complex search requirements across different data sources [12]. This trend has been recognized by buildingSMART, a highly influential organization, which revised its technical roadmap into three long-term levels and introduced a fourth layer encompassing the "semantic search" and "cloud database" domains. Moreover, studies on the application of BIM to knowledge generation and extraction continue to be reported, covering ontology-based data management and sharing, knowledge fusion across domains, automatic logic inference and knowledge retrieval.
Nevertheless, much work remains to be done in BIM-based knowledge research to improve the generality and efficiency of knowledge management. To provide an overview of the state of the art in knowledge extraction and discovery based on the data accumulated in BIM, to identify possible challenges and to indicate future directions, this paper conducts a systematic review of the relevant literature. The remainder of the paper is organized as follows. Section 2 describes the methodology and research framework. Sections 3 to 7 discuss current research topics in knowledge description, knowledge discovery, knowledge storage and management, knowledge inference and knowledge application, respectively. The last section discusses the challenges and proposes future directions accordingly.

Methodology and Framework
The relevant literature is reviewed systematically in five steps. First, the scope of the review is set as knowledge extraction and discovery based on BIM. Second, the existing literature in scholarly databases is searched by combining keywords as TITLE-ABS-KEY (("construction industry") OR ("BIM") OR ("AEC") OR ("construction management")) AND TITLE-ABS-KEY (("ontology*") OR ("semantic web") OR ("knowledge") OR ("linked data")). Third, less relevant studies in the raw results are filtered out by examining titles and abstracts manually. Fourth, the highly cited authors, journals and keywords are summarized. Finally, the mainstream literature is studied in depth to identify the state-of-the-art research topics and future directions.
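As a small illustration of how the Boolean search string above is composed, the Python sketch below builds it from two keyword groups: terms within a group are joined with OR, and the two groups are joined with AND. The helper names (any_of, build_query) are hypothetical; the sketch only reproduces the query text and does not call any database API.

```python
# Illustrative sketch: assemble the review's TITLE-ABS-KEY query from two keyword groups.
# Helper names are hypothetical; no bibliographic database is queried here.

DOMAIN_TERMS = ["construction industry", "BIM", "AEC", "construction management"]
KNOWLEDGE_TERMS = ["ontology*", "semantic web", "knowledge", "linked data"]


def any_of(terms):
    """Quote each term and join them with OR inside one TITLE-ABS-KEY clause."""
    ored = " OR ".join(f'("{t}")' for t in terms)
    return f"TITLE-ABS-KEY ({ored})"


def build_query(domain, knowledge):
    """Require at least one term from each group (AND between the two clauses)."""
    return f"{any_of(domain)} AND {any_of(knowledge)}"


query = build_query(DOMAIN_TERMS, KNOWLEDGE_TERMS)
print(query)
```

Keeping the keyword groups as lists makes it easy to rerun the same search with an extended vocabulary.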

Research Scope
To focus on BIM-based knowledge extraction and discovery, this review mainly covers studies closely related to BIM, the semantic web, ontologies, the Resource Description Framework, the Web Ontology Language, knowledge management, etc. In particular, the combination of domain knowledge and BIM has brought great changes to the AEC industry [13]. In the early twenty-first century, domain-knowledge-based semantic technologies were introduced to the AEC industry [14]. For example, Pan et al. [15] and Elghamrawy and Boukamp [16] discussed the added value that the semantic web could bring to construction projects. At the same time, some recent reviews discussed knowledge in BIM [17] and its application to architectural design, energy simulation, intelligent optimization, safety management [18], the design and analysis of city spaces, the integration of BIM and geographic information systems, design code compliance [19], facility management [20], etc. These topics were summarized through Latent Semantic Analysis (LSA) of the themes. Because BIM itself emphasizes the information within a digital model, the reviewed studies mostly focused on how to provide the information essential to specific domains or applications. For clarity, the studies are grouped into five mainstream topics, i.e., knowledge description, knowledge discovery, knowledge storage and management, knowledge inference and knowledge application, as listed in Table 1.
Each research topic listed above relies on building information in a particular form, and some even rely on information from different data sources. BIM technology can provide the information framework, including data standards, data management and data platforms, to integrate these essential data. Meanwhile, the semantic web provides a technical framework for knowledge description, as well as query languages and inference engines for extracting and discovering knowledge from BIM-based data platforms. The integration of these two technologies therefore has significant potential for knowledge engineering in the AEC industry, as summarized in Fig. 1.

Statistics of the Literature
This research mainly examines the literature published from 2009 to 2019 in the Web of Science (WoS) [21] and Scopus [22] databases. The numbers of reviewed papers are shown in Fig. 2. The number of papers in the field has generally grown year by year, especially after 2014, implying that BIM-based knowledge management has become an important topic for scholars working on information technologies in the AEC industry.
According to the filtered literature, the most highly cited and hence most influential scholars, such as C. Eastman, J. Beetz and P. Pauwels, and international academic journals, such as Automation in Construction and Advanced Engineering Informatics, are summarized in Fig. 3. Figure 4 shows that the most frequent keywords include BIM, ontology and semantic web.

Research Topic 1: Knowledge Description
Hannus et al. [23] used the term "island" to describe isolated bodies of information. For the islands to exchange information, a public data model, rather than specific models, should be established to describe objects in a common form, regardless of professions, processes and systems. Industry Foundation Classes (IFC) related standards are the typical models for building information description, while the Resource Description Framework (RDF) [24] and Web Ontology Language (OWL) [25] are the models for semantic description.
Figure: Key words related to knowledge extraction and discovery on building information modeling.

IFC Standard Framework for Building Information Description
The IFC standard framework, proposed by buildingSMART as an open standard framework for the delivery and support of assets in the built environment, consists of IFC, the buildingSMART Data Dictionary (bSDD), the Information Delivery Manual (IDM), Model View Definitions (MVD) and the BIM Collaboration Format (BCF) [26]. IFC is an open and neutral data format for describing a BIM model and all the elements inside it, providing a single data exchange standard. Nowadays, much BIM-compatible software adopts IFC as one of its data exchange formats [27]. The bSDD, developed from the International Framework for Dictionaries (IFD), constitutes a library of objects and their attributes that identifies objects in the built environment and their specific properties regardless of language. The IDM presents a method of defining information exchange requirements through process modeling, so that all participants in the organization know which kinds of information have to be communicated and when. An MVD is an IFC view definition that defines a subset of the IFC schema to meet the needs of one or more exchange requirements within the IDM. MVDs are encoded in the "mvdXML" format and define allowable values for particular attributes of particular data types. The BCF is an open XML file format ("bcfXML") that supports workflow communication in BIM processes, and it provides a web service ("bcfAPI") for software developers to exchange BCF data. Together, these standards form the most important and widely accepted BIM standard framework for storing information and knowledge related to the AEC industry.
Some studies have developed an object-oriented information model [28] or common applications based on the IFC framework, independent of platform, machine or data source [29]. These studies commonly adopted the data modeling language EXPRESS to describe an Entity-Relationship (E-R) model comprising several hundred object definitions organized in a tree structure. The currently released IFC schema has four layers, i.e., the resource layer, kernel layer, shared layer and domain layer, each of which contains several modules. The resource layer consists of definitions for basic information resources such as materials, geometries and topologies. These definitions cannot be used on their own without being linked to entities in other layers because they do not carry a Globally Unique Identifier (GUID), whereas the entities in the other three layers do. The kernel layer defines the core data models, including object definitions (such as the position and geometric appearance of an engineering object and the relationships among objects). The shared layer defines common entities that can be used by various professional domains or processes (such as walls, beams, doors and windows). The domain layer defines domain-dependent entities for information exchange within a profession (such as steam boilers, fans and dampers in the Heating, Ventilation and Air Conditioning (HVAC) domain).
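To make the GUID more concrete: in IFC, the identifier of a rooted entity is stored in its GlobalId attribute as a 22-character string that encodes a 128-bit UUID with a 64-character alphabet. The sketch below assumes that alphabet (digits, upper-case letters, lower-case letters, then "_" and "$") and treats the encoding as a plain big-endian base-64 number; the function names compress_guid and expand_guid are hypothetical.

```python
import uuid

# 64-character alphabet assumed for compressed IFC GlobalId strings.
IFC_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$"


def compress_guid(u: uuid.UUID) -> str:
    """Encode a 128-bit UUID as 22 base-64 digits (22 * 6 = 132 bits, top 4 bits zero)."""
    n = u.int
    digits = []
    for _ in range(22):
        n, r = divmod(n, 64)          # peel off the least significant 6 bits
        digits.append(IFC_CHARS[r])
    return "".join(reversed(digits))  # most significant digit first


def expand_guid(global_id: str) -> uuid.UUID:
    """Decode a 22-character GlobalId back into a UUID."""
    if len(global_id) != 22:
        raise ValueError("IFC GlobalId must have 22 characters")
    n = 0
    for ch in global_id:
        n = n * 64 + IFC_CHARS.index(ch)  # raises ValueError for characters outside the alphabet
    return uuid.UUID(int=n)               # raises ValueError if n exceeds 128 bits


new_id = compress_guid(uuid.uuid4())
print(new_id, len(new_id))
```

Because the encoding is reversible, tools can keep the compact 22-character form in the model while converting to the conventional UUID form when linking to external databases.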

RDF and OWL for Semantic Description
RDF was proposed by the World Wide Web Consortium (W3C). Supporting a variety of syntax notations and data serialization formats, it provides a metadata data model for semantic web descriptions. Specifically, RDF uses subject-predicate-object expressions, known as "triples", to describe resources. It is capable of describing semantic knowledge, and is widely adopted for this purpose, because triples can be presented as an RDF graph, which is a directed labeled graph. In an RDF graph, each node refers to a concept or an object in the real world and is identified by a Uniform Resource Identifier (URI). Through the directed links between these nodes, the information becomes readable and reusable by computers.
Built on RDF, OWL [30] is a family of knowledge representation languages for authoring ontologies. RDF graphs built with OWL concepts are called OWL ontologies. The current version of OWL is OWL 2, whose profile is summarized in Fig. 5. The ovals in the upper part represent abstract concepts in the ontology and can therefore be considered an abstract structure or an RDF graph. In fact, any OWL ontology can also be regarded as an RDF graph. The relationship between ontology
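The triple model can be sketched without any RDF library: below, a graph is a set of (subject, predicate, object) tuples, and a query is a pattern in which None acts as a wildcard. The "ex:" names are hypothetical; a production system would use full URIs, an RDF store and SPARQL rather than this standard-library-only sketch.

```python
from typing import Optional

# A triple is (subject, predicate, object); the "ex:" names are hypothetical URIs.
Triple = tuple[str, str, str]

graph: set[Triple] = {
    ("ex:Wall_01", "rdf:type", "ex:Wall"),
    ("ex:Wall_01", "ex:containedIn", "ex:Storey_1"),
    ("ex:Door_07", "rdf:type", "ex:Door"),
    ("ex:Door_07", "ex:hostedBy", "ex:Wall_01"),
    ("ex:Storey_1", "rdf:type", "ex:BuildingStorey"),
}


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return the triples that fit a pattern; None works as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Each triple is a directed, labelled edge: subject --predicate--> object.
walls = [t[0] for t in match(p="rdf:type", o="ex:Wall")]

# Two-hop traversal: Door_07 --hostedBy--> host --containedIn--> storey.
host = match(s="ex:Door_07", p="ex:hostedBy")[0][2]
storey = match(s=host, p="ex:containedIn")[0][2]
print(walls, host, storey)
```

The two-hop query is the kind of graph traversal that a SPARQL basic graph pattern expresses declaratively.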

ifcXML and ifcOWL
Because the EXPRESS language lacks semantic information, logic-based languages such as OWL are believed to have advantages in knowledge description, semantic data sharing, reuse of existing ontologies and interoperability among software [31]. As a result, to improve universality and accessibility, IFC provides two other formats for building information besides EXPRESS, i.e., ifcXML and ifcOWL. ifcXML files, with an ".ifcXML" suffix, follow the rules of the Extensible Markup Language (XML) and the constraints of IFC, the ifcXML schema and XML Schema. The ifcOWL language extends the OWL standards to provide a BIM-based ontology language and thus alleviates the scalability problems caused by EXPRESS. Furthermore, to integrate the descriptions of both building information and semantics, some scholars proposed platform-independent frameworks to transform IFC data into semantic representations [32]. Such platforms provide semantically rich, human-readable information by exchanging data between different product information models. Schevers and Drogemuller [33] presented a transition diagram, and Beetz et al. [31] suggested a semi-automatic method to convert EXPRESS-based IFC data into OWL ontologies. Barbau et al. also proposed rules for such a transformation and developed an OWL plugin based on Protégé [34]. Zhang and Issa [35] asserted that by converting IFC to OWL, the information model can be restructured using information technologies, and information can be retrieved from IFC more efficiently. Pauwels et al. [36] demonstrated the use of the Semantic Web Rule Language (SWRL) to enrich the OWL version of IFC and create a semantic rule-checking environment. They also suggested that specific rule ontologies should be developed based on SWRL and connected to the existing core ontologies of the AEC industry.
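As a rough illustration of transforming IFC data into a semantic representation, the sketch below turns one entity-instance line of an IFC-SPF (.ifc) file into triples. It is a simplified assumption, not the official ifcOWL conversion, which has its own property naming and typing conventions: the schema table is hand-written for a single entity (attribute order as in IFC2x3 IfcWall), and the "inst:", "ifc:" and "ex:" prefixes are hypothetical.

```python
import re

# Hypothetical, hand-written schema table: SPF keyword -> (class name, attribute order).
# The order follows the IFC2x3 IfcWall entity; a real converter derives it from the schema.
SCHEMA = {
    "IFCWALL": ("IfcWall", ["GlobalId", "OwnerHistory", "Name", "Description",
                            "ObjectType", "ObjectPlacement", "Representation", "Tag"]),
}

# Matches an entity instance line such as  #12=IFCWALL(...);
INSTANCE = re.compile(r"^#(\d+)\s*=\s*([A-Z0-9_]+)\((.*)\);$")


def split_args(body):
    """Split top-level comma-separated arguments, respecting quoted strings and nested lists."""
    args, buf, depth, in_str = [], [], 0, False
    for ch in body:
        if ch == "'":
            in_str = not in_str   # an escaped '' toggles twice, so the state is unchanged
        elif not in_str and ch == "(":
            depth += 1
        elif not in_str and ch == ")":
            depth -= 1
        elif not in_str and ch == "," and depth == 0:
            args.append("".join(buf).strip())
            buf = []
            continue
        buf.append(ch)
    args.append("".join(buf).strip())
    return args


def to_triples(line):
    """Turn one SPF instance line into simplified (subject, predicate, object) triples."""
    m = INSTANCE.match(line.strip())
    if m is None:
        raise ValueError("not an entity instance line")
    number, keyword, body = m.groups()
    cls, attributes = SCHEMA[keyword]
    subject = f"inst:{number}"
    triples = [(subject, "rdf:type", f"ifc:{cls}")]
    for name, raw in zip(attributes, split_args(body)):
        if raw == "$":                                   # unset optional attribute: no triple
            continue
        if raw.startswith("#"):                          # reference to another instance
            value = f"inst:{raw[1:]}"
        elif raw.startswith("'") and raw.endswith("'"):  # string literal
            value = raw[1:-1].replace("''", "'")
        else:                                            # numbers, enumerations, lists
            value = raw
        triples.append((subject, f"ex:{name}", value))
    return triples


line = "#12=IFCWALL('2O2Fr$t4X7Zf8NOew3FLOH',#5,'Wall-001',$,$,#30,#40,$);"
for triple in to_triples(line):
    print(triple)
```

Instance references (#5, #30, #40) become links to other nodes, which is what turns a flat SPF file into a traversable graph.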

Research Topic 2: Knowledge Discovery
Building information and the content behind big data are easier for people than for computers to understand; a long-standing goal is therefore to make information machine-readable and exchangeable so that computers can discover knowledge themselves [37].

Knowledge Item and KG
Knowledge can be divided into two kinds [38]: explicit and implicit. Explicit knowledge is usually recorded in natural language or readable symbols for communication, while implicit knowledge is gained through incidental activities or without awareness [39]. In a project, solutions and technical frameworks often rely on experienced participants, and such knowledge is also considered implicit [40]. Most information-based knowledge management studies focus on capturing implicit knowledge in construction projects [41,42]. Ugwu et al. [43] introduced ontologies to support the mining, representation and reuse of knowledge for the constructability assessment of steel structures, demonstrating that ontologies can capture domain knowledge and turn implicit knowledge into explicit knowledge. However, capturing implicit knowledge requires a huge amount of manual work and lacks a common bottom-up method for building up knowledge. KGs and Artificial Intelligence (AI) techniques, including deep learning models and Natural Language Processing (NLP) tools, are therefore possible means of making knowledge discovery easier.
The name KG was first used for Google's search engine in 2012. Since then it has developed into a general concept referring to network-based semantic databases [44]. A KG, which contains a series of entities together with their attributes and the relationships between them, can be extracted from various data sources (such as websites) by analyzing text syntax, words and phrases, and the structural characteristics of the text. Compared with a semantic network, a KG requires less manual intervention because it integrates algorithms for information acquisition and processing, and it can be extended by automatically extracting knowledge from the internet. Besides Google's KG, there are now generic KGs such as DBpedia [45] and YAGO [46], as well as domain KGs such as a geoscience KG [47]. KGs have been widely applied and affect people's daily lives, particularly in information retrieval [48] and knowledge inference [49]. In the AEC industry, research has been carried out to construct KGs for managing interrelated project information from multiple participants [50] and identifying hazards on construction sites [51]. However, current KGs are far from sufficient for the AEC industry. One challenge in generating a domain-specific KG is integrating different information sources with a universal method.

Extraction of AEC Entities
The objective of entity extraction is to recognize AEC domain entities in text automatically. Here, entities are AEC terms, which become the nodes of a KG.
In unstructured building information, entity extraction is a Named Entity Recognition (NER) task in NLP, for which several algorithms have been proposed. Early NER methods were rule-based [52] or dictionary-based [53]. Although simple, these methods have found applications in specific fields of the AEC industry, such as recognizing affected infrastructure and contractor entities in failure reports [54]. Machine-learning methods were later employed to recast entity recognition as a sequence labeling task, in which labels are assigned according to their probabilities. Typical machine-learning models for NER include the Hidden Markov Model (HMM) [55] and Conditional Random Fields (CRF) [56]. More recently, NER accuracy has been improved by deep learning, which also labels sequences but with more complex models. The Recurrent Neural Network (RNN) is one successful example [57]. The neurons in the same layer of an RNN are connected in a directed cycle, so that dependencies between the words within a sentence can be analyzed more easily. Moreover, an RNN can be combined with traditional machine-learning models such as CRF to further improve accuracy [58]. Attention mechanisms can also be combined with neural networks to improve prediction performance [59]. Evidence shows that the bidirectional long short-term memory network (Bi-LSTM), a variant of the RNN, overcomes the inability of a standard RNN to retain information over long sequences and achieves the highest accuracy and performance [60]; it is therefore expected to be promising for entity extraction in the AEC domain as well [61].
In structured BIM models, building entities are already defined and represented in an object-oriented manner. BIM software provides geometric (such as length, width and depth) and non-geometric (such as color and fireproofing grade) parameters for various basic building objects. However, these default parameters only describe an object rather than providing knowledge directly. Thus, some studies encourage BIM users to define their own parameters to attach knowledge to building objects or projects in the BIM model [62,63]. For example, in Autodesk Revit, users can define two types of parameters, i.e., project parameters and shared parameters, but only the latter can be shared between families or projects through Open Database Connectivity (ODBC). Peng et al. [64] used this feature to generate evacuation entities by defining evacuation parameters in Revit models for public building safety management. Deshpande, Azhar and Amireddy [65] developed a BIM-based knowledge system that lets users define parameters representing human experience so that the knowledge can be extracted.
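A dictionary-based extractor, the simplest of the approaches above, can be sketched as a greedy longest-match lookup that emits BIO (begin/inside/outside) labels, the same label format that sequence-labeling models such as CRF or Bi-LSTM-CRF predict. The lexicon and entity types below are hypothetical examples.

```python
# Dictionary-based NER sketch: greedy longest match of AEC terms, emitting BIO tags.
# The lexicon and the entity types are hypothetical examples.

LEXICON = {
    ("steel", "beam"): "ELEMENT",
    ("beam",): "ELEMENT",
    ("tower", "crane"): "EQUIPMENT",
    ("fall", "from", "height"): "HAZARD",
}
MAX_LEN = max(len(k) for k in LEXICON)


def tag(tokens):
    """Label each token B-<TYPE>, I-<TYPE> or O using the longest lexicon match."""
    lowered = [t.lower() for t in tokens]
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):  # try longest spans first
            entity = LEXICON.get(tuple(lowered[i:i + n]))
            if entity:
                labels[i] = f"B-{entity}"
                for j in range(i + 1, i + n):
                    labels[j] = f"I-{entity}"
                i += n
                break
        else:                     # no lexicon entry starts at this token
            i += 1
    return labels


sentence = "Risk of fall from height near the tower crane and steel beam".split()
print(list(zip(sentence, tag(sentence))))
```

Longest-match-first keeps "steel beam" as one entity instead of tagging only "beam"; a statistical tagger replaces the lookup with per-token label probabilities but keeps the same output format.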

Extraction of Relationships Between AEC Entities
The objective of relationship extraction is to recognize the different kinds of relationships between entities. These relationships become the directed edges that connect the nodes of a KG, forming a network that depicts how entities are related logically and physically.
For unstructured text, researchers have proposed algorithms similar to those used for entity extraction, including rule-based and deep-learning models. In particular, the Residual Convolutional Neural Network (ResCNN) is widely adopted because of its high accuracy and low demand for manual labeling [66]. The structure of ResCNN, illustrated in Fig. 6, consists of convolutional layers, pooling layers, a fully connected layer and a softmax output. ResCNN first transforms word sequences into vectors. It then adds shortcut connections across several convolutional layers to form residual blocks, so that the input is passed directly into later computations. The network is therefore more stable because each block learns a residual rather than the full predicted output [67]. Finally, ResCNN predicts the probability of each relationship, and the one with the highest probability is taken as the extracted relationship.
Besides pipeline systems that extract entities and relationships in succession, end-to-end knowledge extraction, which performs entity recognition and relationship discovery within one neural network, has emerged in recent years. End-to-end methods also adopt deep neural models such as Bi-LSTM-CRF networks, while the encoding and decoding processes are carefully designed to consider both entity and relationship information [68]. The end-to-end model avoids the error propagation that is common in pipeline systems and has been shown to achieve state-of-the-art performance.
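The two ideas that matter most in the ResCNN description, the residual shortcut y = ReLU(F(x) + x) and the softmax choice of the most probable relation, can be shown in a toy forward pass. Everything here is an assumption for illustration: scalar 1-D convolutions stand in for convolutions over word-embedding matrices, the weights are untrained, and the relation labels are hypothetical.

```python
import math

# Toy forward pass with untrained, hypothetical weights.


def conv1d_same(seq, kernel):
    """1-D convolution (cross-correlation) with zero padding, output as long as the input."""
    k = len(kernel) // 2
    padded = [0.0] * k + seq + [0.0] * k
    return [sum(w * padded[i + j] for j, w in enumerate(kernel)) for i in range(len(seq))]


def relu(seq):
    return [max(0.0, v) for v in seq]


def residual_block(seq, kernel):
    """y = ReLU(F(x) + x): the shortcut adds the block's input to its transformed output."""
    fx = relu(conv1d_same(seq, kernel))
    return relu([a + b for a, b in zip(fx, seq)])


def softmax(scores):
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


RELATIONS = ["hosts", "connects_to", "contained_in"]   # hypothetical relation labels

x = [0.2, -0.1, 0.7, 0.4]                   # stand-in for an embedded word sequence
h = residual_block(residual_block(x, [0.1, 0.5, 0.1]), [0.0, 0.3, 0.0])
pooled = max(h)                              # max pooling to a single feature
scores = [pooled * w for w in (1.5, -0.5, 0.2)]   # fully connected layer, one input feature
probs = softmax(scores)
best = RELATIONS[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```

If the convolution weights are all zero, the block simply passes ReLU(x) through, which is why stacking residual blocks does not make the network harder to train than a shallower one.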
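Decoding the output of an end-to-end model can be sketched as follows, assuming a joint tagging scheme in which each token's tag carries its entity position, relation type and role (head or tail). The tag format, the relation name and the pairing rule are assumptions for illustration; published end-to-end models differ in detail.

```python
# Assumed joint tag format: "B-<Relation>-<Role>", "I-<Relation>-<Role>" or "O",
# where Role "1" marks the head entity and Role "2" the tail entity of the relation.
# Relation names and the example sentence are hypothetical.


def decode(tokens, tags):
    """Rebuild entity spans from joint tags and pair them into (head, relation, tail)."""
    spans = []                      # [text, relation, role] in sentence order
    current = None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            current = None
            continue
        position, relation, role = tag.split("-")
        if position == "B" or current is None or current[1:] != [relation, role]:
            current = [token, relation, role]
            spans.append(current)
        else:                       # "I" tag continuing the same span
            current[0] += " " + token
    heads, tails = {}, {}
    for text, relation, role in spans:
        (heads if role == "1" else tails).setdefault(relation, []).append(text)
    # Simplification: the n-th head of a relation is paired with its n-th tail.
    return [(h, rel, t) for rel, hs in heads.items() for h, t in zip(hs, tails.get(rel, []))]


tokens = "The tower crane is located on the south slab".split()
tags = ["O", "B-LocatedOn-1", "I-LocatedOn-1", "O", "O", "O", "O",
        "B-LocatedOn-2", "I-LocatedOn-2"]
print(decode(tokens, tags))
```

Because entities and the relation come from one tag sequence, there is no separate relation classifier whose errors could compound those of the entity tagger.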
Beyond semantic logic, the extraction of spatial relationships is also important in building management [69]. BIM-supported knowledge management increasingly concerns extracting spatial relationships from the geometric information in BIM models [70]. Most BIM models can be converted to IFC data, and IFC describes geometric information using Curve2D, GeometricSet and GeometricCurveSet as basic elements. IFC also uses SurfaceModel and SolidModel to describe 3D models in surface and solid modes. SolidModel can be further divided into types such as SweptSolid, B-rep and CSG. Based on this decomposition of geometric information, Borrmann discussed spatial data structures and the definitions of analytic operators [71], and summarized the topology of points, lines, surfaces and volumes in terms of boundaries, interiors, exteriors and closures. Table 2 compares different kinds of operators, their theoretical basis and their relationships. Logic chains can also be generated automatically from the spatial information in a BIM model and predefined identification rules [72].
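A simplified version of such spatial operators can be sketched for axis-aligned bounding boxes, a common approximation of element geometry. This is an assumption for illustration, not Borrmann's operator set: exact operators work on B-rep or CSG geometry, boxes are assumed to be non-degenerate, and "within"/"contains" here also cover cases where boundaries coincide.

```python
from dataclasses import dataclass

# Simplified topological classification of two axis-aligned bounding boxes (AABBs).


@dataclass(frozen=True)
class Box:
    lo: tuple[float, float, float]   # minimum x, y, z
    hi: tuple[float, float, float]   # maximum x, y, z


def relation(a: Box, b: Box) -> str:
    # A gap along any axis: the boxes share no points at all.
    if any(a.hi[i] < b.lo[i] or b.hi[i] < a.lo[i] for i in range(3)):
        return "disjoint"
    # Intervals meet in a single value on some axis: boundaries touch, interiors do not.
    if any(a.hi[i] == b.lo[i] or b.hi[i] == a.lo[i] for i in range(3)):
        return "touch"
    if a == b:
        return "equal"
    if all(b.lo[i] <= a.lo[i] and a.hi[i] <= b.hi[i] for i in range(3)):
        return "within"
    if all(a.lo[i] <= b.lo[i] and b.hi[i] <= a.hi[i] for i in range(3)):
        return "contains"
    return "overlap"


slab = Box((0, 0, 0), (10, 10, 0.3))
wall = Box((0, 0, 0.3), (0.2, 10, 3.0))
room = Box((0, 0, 0.3), (5, 5, 3.0))
column = Box((1, 1, 0.3), (1.4, 1.4, 3.0))
pipe = Box((4, 2, 2), (8, 2.2, 2.2))
print(relation(wall, slab), relation(column, room), relation(pipe, room))
```

Results such as "wall touches slab" or "column within room" can then be stored as relationship triples alongside the semantic ones.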

Research Topic 3: Knowledge Storage and Management
Knowledge should be shared among members of an organization and between organizations. To this end, knowledge is best stored in a computer-readable form.
The storage strategy has a great impact on the efficiency of knowledge retrieval and reuse. Only when knowledge is appropriately represented can it be effectively stored.

Storage Medium for Knowledge
Wang and Meng [75] reviewed BIM-supported knowledge bases and concluded that BIM has multiple benefits, including 3D visualization and collaborative work, compared with other IT-based knowledge bases. However, the data and even some of the knowledge stored in a BIM database are mostly useful for a specific project rather than for general purposes. To reuse such knowledge, some studies tried to establish independent knowledge bases linked to BIM models through information standards (such as IFC). For example, Motawa and Almarshad [63] developed a system that separated the knowledge base from the BIM model. Ding et al. [76] also presented a system in which knowledge could be collected and integrated in a platform and reused in other relevant projects. An ontology uses shared formats to conceptualize domain knowledge, but it does not normally support knowledge exchange between domains [40]. By combining ontologies with semantic technology, domain knowledge can become meaningful to other domains, provided that the data are properly interpreted. For example, the knowledge management model developed by Lee and Jeong [77] can transform a domain-specific format into a neutral one, which is the opposite direction to that of the semantic mechanism. Even within the same knowledge domain, ontology definitions are heterogeneous [78]. Besides ontologies, linked data [79] and fuzzy multi-criteria models [80] can also serve as media for linking cross-domain knowledge. The development of ontology frameworks has brought new approaches to knowledge retrieval in BIM environments. One example is the storage and management of knowledge about construction risk based on both ontology and BIM [81]. Usually, however, an ontology reflects a set of concepts and the relationships among them for a specific knowledge domain. Therefore, some efforts have been made to develop a shared ontology that embeds various common domain concepts in the BIM environment [82].
The shared ontology can be considered a semantic intermediary for aligning domain knowledge, so that users can create their own ontologies based on it.
To achieve semantic collaboration across domain knowledge, a common semantic mechanism should be established by integrating different ontologies in the BIM environment [83]. As a result, the integration of BIM and knowledge-based systems has become a new technical trend. Deshpande et al. [65] proposed a method to acquire, retrieve and store information and knowledge within BIM, as well as a framework for knowledge classification and propagation. Ho, Tserng and Jan [84] developed a BIM-based knowledge sharing management system for managers and engineers to share knowledge and experience within the BIM environment. Efforts have also been made to link heterogeneous BIM data by constructing an ontology-based vault database and to promote data sharing among different domains, including architecture, engineering, construction and facility management [85].

Framework of Knowledge Storage
As summarized in Fig. 7, there are 4 typical BIM knowledge storage frameworks, i.e., file-based, central database, single server and cloud server [86].
Most file-based frameworks are built on the IFC standard, and their file formats are frequently IFC-based, such as IFC-SPF with a .ifc suffix [87]. In the central database framework, researchers have adopted all kinds of databases as storage media, including relational, object-oriented, key-value and object-relational databases. However, neither the file-based nor the central database framework contains a generic layer for using the model and its knowledge; instead, users have to develop their own applications according to their functional requirements. One way to improve data sharing among participants is to set up a BIM server for users and their applications. Such a server should provide extra functions based on the BIM data, such as model review, 3D visualization, version control and collision detection. Three representative BIM servers are the IFC model server developed by VTT Building and SECOM Co. Ltd. [88], the EDM model server by Jotne EPM [89] and Bimserver.org by TNO Netherlands and Eindhoven University of Technology [90]. Since cloud computing has the potential to reshape information management in the AEC industry, cloud computing platforms for BIM model management, which help reduce cost while providing higher performance [91], have been proposed in the past few years and are becoming increasingly mature. The idea can be traced back a decade, to when Zhang proposed a BIM-based construction Information Integration Platform (BIMIIP) [92]. Building on further research on data collaboration within BIM systems [93], Zhang et al. developed a prototype system (BIMDISP) to establish a multi-server data-sharing environment [86]. Typical commercial cloud computing platforms for BIM include Autodesk BIM 360, Cadd Force, BIM9, BIMServer, BIMx and STRATUS.
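The central database framework can be sketched with Python's built-in sqlite3 module: building objects are stored in relational tables keyed by IFC GlobalId, with properties as rows. The table layout and sample values are hypothetical. As noted above, such a layer only stores data; functions like model review, versioning or collision detection would still have to be built on top of it.

```python
import sqlite3

# Sketch of a central relational store for BIM objects (hypothetical layout and data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE element (
    global_id TEXT PRIMARY KEY,   -- 22-character IFC GlobalId
    ifc_class TEXT NOT NULL,
    name      TEXT
);
CREATE TABLE property (
    global_id TEXT REFERENCES element(global_id),
    pset      TEXT,               -- property set name
    prop      TEXT,
    value     TEXT
);
""")
conn.executemany("INSERT INTO element VALUES (?, ?, ?)", [
    ("2O2Fr$t4X7Zf8NOew3FLOH", "IfcWall", "Wall-001"),
    ("1kTvXnbbzCWw8lcMd1dR4o", "IfcDoor", "Door-007"),
])
conn.executemany("INSERT INTO property VALUES (?, ?, ?, ?)", [
    ("2O2Fr$t4X7Zf8NOew3FLOH", "Pset_WallCommon", "FireRating", "REI60"),
    ("1kTvXnbbzCWw8lcMd1dR4o", "Pset_DoorCommon", "FireRating", "EI30"),
])

# Each application queries the shared tables directly; there is no generic service layer.
rows = conn.execute("""
    SELECT e.ifc_class, e.name, p.value
    FROM element e JOIN property p ON p.global_id = e.global_id
    WHERE p.prop = 'FireRating'
    ORDER BY e.name
""").fetchall()
print(rows)
```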

Linking Domain Knowledge
How to manage the domain knowledge that depends on semantic ontology is an emerging research area because such technologies provide collaborative representation for domain knowledge such as data from BIM, GIS, sensors and building automation systems (BAS), etc. [94]. Besides, they also provide linkages between data [95]. These technologies often adopt ifcOWL to link building data from a number of data sources because ifcOWL brings the advantages of (1) providing RDF to present any data type; (2) allowing extensions of logic inference by adopting OWL and (3) linking information graph in a network. The framework of building data linkage across domain knowledge based on ifcOWL is illustrated in Fig. 8.
Integrating the data in BIM, GIS and CAD platforms is also an ongoing research topic, which in some respects drives the integration of knowledge management [96]. GIS is widely applied in infrastructure projects throughout their lifecycle. Data standards for GIS, such as CityGML, are maintained by the Open Geospatial Consortium (OGC). The GML model, which is closely related to RDF, is appropriate for linking data. Taking advantage of this characteristic, a large number of software products support efficient storage and spatial querying of large GIS data sets by introducing the Well-Known Text (WKT) geometry encoding [97] and the query language GeoSPARQL [98].
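To make the combination of WKT geometry and spatial querying concrete, the sketch below parses a single-ring WKT `POLYGON` and answers a point-in-polygon question with even-odd ray casting. It is a hand-rolled illustration, not GeoSPARQL itself: a GeoSPARQL-enabled store would evaluate an equivalent filter function (such as `geof:sfWithin`) over WKT literals. The function names and coordinates are hypothetical, and polygons with holes are deliberately out of scope.

```python
def parse_wkt_polygon(wkt: str) -> list:
    """Parse a single-ring WKT POLYGON, e.g. 'POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))'."""
    text = wkt.strip()
    if not text.upper().startswith("POLYGON"):
        raise ValueError("this sketch only handles POLYGON")
    inner = text[text.index("((") + 2 : text.rindex("))")]
    if "(" in inner or ")" in inner:
        raise ValueError("polygons with holes are out of scope for this sketch")
    ring = []
    for pair in inner.split(","):
        x, y = pair.split()
        ring.append((float(x), float(y)))
    return ring


def point_in_polygon(x: float, y: float, ring: list) -> bool:
    """Even-odd rule: cast a ray to the right of (x, y) and count edge crossings."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge straddles the ray's horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical site boundary and two equipment positions (coordinates in metres).
site = parse_wkt_polygon("POLYGON ((0 0, 40 0, 40 25, 0 25, 0 0))")
print(point_in_polygon(12.5, 8.0, site))  # True: inside the boundary
print(point_in_polygon(55.0, 8.0, site))  # False: outside the boundary
```

A production system would delegate this to a spatial index and a standards-compliant geometry library; the point here is only that WKT gives a plain-text geometry that a query engine can parse and test.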
Particularly in large infrastructure projects, integrating building data and geographic data is essential to support project knowledge management and exchange among participants from multiple disciplines [99,100]. Mignard et al. [101] developed the Siga3D system to integrate BIM data and GIS data and achieve city-scale infrastructure management. Beetz and Borrmann [102] introduced linked data, discussed spatial semantic data exchange, management and related applications for the design, construction and operation management of road projects, and developed a system to integrate various data sources. Zhao, Liu and Mbachu [103] linked similar concepts between BIM and GIS ontologies by ontology mapping and used the integrated BIM-GIS data to support the highway planning process.

Building Energy Consumption and Facility Maintenance Knowledge
Knowledge integration management also focuses on integrating information across different phases of the building lifecycle [104]. A typical example is constructing a domain ontology schema for building information and Mechanical, Electrical and Plumbing (MEP) data to provide knowledge for building energy analysis and optimization, especially during the design, operation and maintenance periods.
In the design period, Korman, Fischer and Tatum [105] established an MEP knowledge base that represents the complex compositions of MEP systems by retrieving, analyzing and summarizing related knowledge; on this basis, they developed a knowledge inference method and a prototype system that uses such knowledge for MEP design and conflict analysis. Through a case study, Olofsson et al. [81] discussed the integrated application of BIM and Virtual Design and Construction (VDC) to the collaborative design of MEP systems, and proposed an implementation routine for the design and installation process according to the roles of contractors and sub-contractors.

Fig. 8 Linking building data across domain knowledge (recoverable panel labels: Regulatory Data, with rules/codes for regulation compliance checking; Product Data, with bsDD product concepts and manufacturer data; linked via RDF)

During the operation and maintenance period, research focuses on information integration for building performance evaluation and optimization. For example, integrated delivery technologies for intelligent MEP management in the operation and maintenance phase were proposed based on BIM models [106]; S. Jiang, Wang and Wu [107] achieved integrated building energy management based on the semantic network; and Wu [108] proposed an intelligent evaluation system for green buildings, also based on ontology. Dibley et al. [109] developed OntoFM, a multi-agent system for real-time building monitoring, in which the domain ontology is built on ifcOWL and OntoSensor, an ontology describing the sensors themselves. Zhang et al. [110] proposed a comprehensive data model for building performance monitoring by reusing ifcOWL, the Semantic Sensor Network (SSN) ontology and the Building Topology Ontology (BOT) as references. The model integrates static facility information with dynamic monitoring data to support a performance management platform. A similar sensing-ontology-based analysis by Marroquin, Dubois and Nicolle [111] was applied to identify the relationships between indoor occupants and building energy consumption; notably, their research employed knowledge learning and inference to provide intelligent analyses. Besides sensing data, construction material data can also support building performance analysis, provided that the materials of each element can be queried accurately [112]. Knowledge can be regarded as the core that supports information retrieval algorithms for energy consumption calculations [113], and most energy analyses are based on core building data, such as IFC-described models, together with energy consumption data [114].
A typical energy data model here is SimModel, which is widely adopted in energy simulation software for data exchange and sharing. SimModel is now included in the building domain ontology [115] and can be transformed into other data models, such as RDF graphs [116]. Transforming energy data into an RDF model has advantages for information acquisition and analysis, but further research is needed on data exchange standards between IFC and SimModel.

Multi-layer Building Knowledge Integration and Management
In large-scale buildings, because of the huge workload of modeling every detail, some researchers prefer to define multi-scale information models that provide knowledge at various levels of detail. In these studies, the term LOD may refer to Level of Detail or Level of Development. The former focuses on the detail of the model elements, especially geometric information, while the latter focuses on the detail of additional information attached to the model. The lower the LOD, the fewer details the model provides. Higher-LOD models are mostly built on top of lower-LOD ones, so a model should be developed gradually from one LOD to the next. In GIS, LOD is also adopted for rapid, multi-scale visualization of city-level models [117]. Given the differences between GIS and BIM models, mapping and reusing models across these two kinds of systems is essential for large-scale projects. For example, Kang and Hong [117] proposed a multi-scale mapping method for BIM and GIS, including mapping rules, based on semantics and multiprocessing-based screen-buffer scanning. Hu et al. [118] also presented a multi-scale management framework based on a multi-scale model for both construction and facility management of large public buildings. The proposed multi-scale model consists of several macro-, micro- and schematic-scale models. The management framework makes use of the multi-scale model and embeds a data management mechanism, as well as algorithms that transform BIM models into GIS map models according to multi-scale management requirements.
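Because higher-LOD models build on lower-LOD ones, a multi-scale store can answer a view's request with the most detailed representation that does not exceed the requested LOD, falling back to a coarser one when the finer one has not been modeled yet. The sketch below shows that lookup; the LOD numbers follow the common 100/200/300 style, and the element and representation names are hypothetical rather than taken from the cited frameworks.

```python
from bisect import bisect_right
from typing import Dict, Optional


def pick_representation(reps: Dict[int, str], target_lod: int) -> Optional[str]:
    """Return the most detailed representation whose LOD does not exceed target_lod.

    Coarser representations are acceptable fallbacks because finer LODs
    are built on top of them.
    """
    levels = sorted(reps)
    idx = bisect_right(levels, target_lod)  # number of levels <= target_lod
    return reps[levels[idx - 1]] if idx > 0 else None


# Hypothetical column modeled at two LODs only.
column_reps = {100: "bounding box for city-scale GIS view", 300: "detailed solid for BIM view"}
print(pick_representation(column_reps, 200))  # falls back to the LOD 100 box
print(pick_representation(column_reps, 400))  # best available: the LOD 300 solid
print(pick_representation(column_reps, 50))   # nothing coarse enough: None
```

A city-scale GIS view would request a low LOD and a detailed BIM view a high one, so both can be served from the same element record.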

Query Language for Domain Knowledge
Because IFC is built on EXPRESS, which provides limited query and analysis support for large data sets, few query languages have been established for IFC [119]. To address this problem, studies have focused on XML- or RDF-based frameworks, which offer better support for query languages. Within such frameworks, Structured Query Language (SQL) can be used for queries in relational databases, the XQuery language for XML, and SPARQL for RDF. XML is one of the standard representations of structured knowledge. In most situations, XML follows an object-oriented model and can be used in instance files for data exchange. Nowadays, XML and XML Schema are widely adopted in BIM environments to record AEC knowledge, and an XML mapping of IFC models (ifcXML) has also been adopted [120]. Furthermore, to analyze and filter XML data, the XQuery language was proposed and accepted as a W3C standard [121].
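XQuery is not part of the Python standard library, so the sketch below uses a stand-in: the limited XPath subset of `xml.etree.ElementTree` to filter an XML building model by a property value. The XML fragment is simplified and hypothetical; real ifcXML is namespaced and follows its own schema, so the element and attribute names here are illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import List

# Simplified, hypothetical fragment (not the real ifcXML schema).
MODEL_XML = """
<building>
  <wall id="w1" name="Exterior wall"><property name="FireRating" value="REI60"/></wall>
  <wall id="w2" name="Partition"><property name="FireRating" value="EI30"/></wall>
  <door id="d1" name="Entrance"><property name="FireRating" value="EI30"/></door>
</building>
"""


def ids_with_property(xml_text: str, tag: str, prop: str, value: str) -> List[str]:
    """Return the ids of <tag> elements that carry <property name=prop value=value>."""
    root = ET.fromstring(xml_text.strip())
    hits = []
    for elem in root.iter(tag):
        # ElementTree's XPath subset supports [@attr='value'] predicates.
        for p in elem.findall(f"property[@name='{prop}']"):
            if p.get("value") == value:
                hits.append(elem.get("id"))
    return hits


print(ids_with_property(MODEL_XML, "wall", "FireRating", "EI30"))  # ['w2']
```

An XQuery engine would express the same filter declaratively (a FLWOR expression over the walls); the Python version just makes the data flow explicit.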
SPARQL, an RDF query language, operates on graph-structured resource descriptions and was developed for the semantic network. SPARQL uses SELECT-WHERE expressions similar to relational database queries; in BIM applications it is combined with ifcOWL. Zhang et al. presented a more BIM-compliant query language based on SPARQL, named BimSPARQL [122]. As shown in Fig. 9, SPARQL is a public interface language with extensive functions for querying data from outside the data source; it has been adopted by W3C [123] and has a series of Application Programming Interfaces (APIs) for RDF applications.
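The core of a SPARQL SELECT-WHERE query is a basic graph pattern: a set of triple patterns with `?variables` that must all match the graph under one consistent binding. The toy evaluator below illustrates that semantics with nested-loop joins over an in-memory triple list. It is not a SPARQL engine, and the instance data is hypothetical; in particular, `ex:inStorey` is a simplified shortcut, whereas real ifcOWL links elements to storeys through relationship objects such as `IfcRelContainedInSpatialStructure`.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical instance data with simplified, ifcOWL-style class names.
GRAPH: List[Triple] = [
    ("inst:wall_1", "rdf:type", "ifc:IfcWall"),
    ("inst:wall_2", "rdf:type", "ifc:IfcWall"),
    ("inst:door_1", "rdf:type", "ifc:IfcDoor"),
    ("inst:wall_1", "ex:inStorey", "inst:storey_1"),
    ("inst:wall_2", "ex:inStorey", "inst:storey_2"),
    ("inst:door_1", "ex:inStorey", "inst:storey_1"),
]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Try to extend binding so that pattern (which may contain ?variables) equals triple."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.setdefault(p, t) != t:  # variable already bound to a different term
                return None
        elif p != t:
            return None
    return new


def select(variables: List[str], where: List[Triple], graph: List[Triple]) -> List[Tuple[str, ...]]:
    """Evaluate SELECT variables WHERE { where } as a basic graph pattern."""
    bindings: List[Binding] = [{}]
    for pattern in where:  # join each pattern against the bindings found so far
        extended = []
        for b in bindings:
            for t in graph:
                nb = match(pattern, t, b)
                if nb is not None:
                    extended.append(nb)
        bindings = extended
    return [tuple(b[v] for v in variables) for b in bindings]


# Roughly: SELECT ?w WHERE { ?w rdf:type ifc:IfcWall . ?w ex:inStorey inst:storey_1 }
rows = select(["?w"], [("?w", "rdf:type", "ifc:IfcWall"), ("?w", "ex:inStorey", "inst:storey_1")], GRAPH)
print(rows)  # [('inst:wall_1',)]
```

Real engines reorder patterns and use indexes instead of scanning the whole graph for every pattern, but the result set is defined by exactly this kind of consistent binding.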
OWL is another ontology language; it is based on RDF and provides the framework for ontology information. Generally, an ontology makes data machine-readable, so that computers can make deductions from known knowledge and defined rules. However, RDF and OWL support only low-level inference. When complicated inference is needed, more specialized languages are required for rule definitions. Such languages can define custom rules that express complicated logic and support the inference process using these rules. Currently, SWRL, the Rule Interchange Format (RIF) and N3Logic are such rule languages. SWRL is a common semantic network rule language based on OWL and RuleML [124] and can link different data models [125]. SWRL has been applied in BIM-based knowledge systems for complicated analysis tasks, such as checking whether masonry units belong to the same wall by comparing their laying sequences and topologies [126].
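SWRL rules are Horn-style implications (body atoms joined by AND imply a head atom). The sketch below applies two such rules by naive forward chaining until no new fact appears. The predicates `adjacentTo`, `inSequence` and `sameWall` are hypothetical stand-ins for the topology and laying-sequence checks mentioned above, not the vocabulary used in [126], and a real system would run SWRL through an OWL reasoner rather than this hand-written loop.

```python
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, str, str]     # (subject, predicate, object); "?x" marks a variable
Rule = Tuple[List[Atom], Atom]  # (body atoms joined by AND, head atom)


def unify(atom: Atom, fact: Atom, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    new = dict(binding)
    for a, f in zip(atom, fact):
        if a.startswith("?"):
            if new.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return new


def forward_chain(facts: Set[Atom], rules: List[Rule]) -> Set[Atom]:
    """Apply the rules repeatedly until no new fact is derived (a fixpoint)."""
    facts = set(facts)
    while True:
        derived = set()
        for body, head in rules:
            bindings = [{}]
            for atom in body:
                next_bindings = []
                for b in bindings:
                    for f in facts:
                        nb = unify(atom, f, b)
                        if nb is not None:
                            next_bindings.append(nb)
                bindings = next_bindings
            for b in bindings:
                new_fact = tuple(b.get(term, term) for term in head)
                if new_fact not in facts:
                    derived.add(new_fact)
        if not derived:
            return facts
        facts |= derived


RULES: List[Rule] = [
    # adjacentTo(?a, ?b) ^ inSequence(?a, ?s) ^ inSequence(?b, ?s) -> sameWall(?a, ?b)
    ([("?a", "adjacentTo", "?b"), ("?a", "inSequence", "?s"), ("?b", "inSequence", "?s")],
     ("?a", "sameWall", "?b")),
    # sameWall(?a, ?b) ^ sameWall(?b, ?c) -> sameWall(?a, ?c)
    ([("?a", "sameWall", "?b"), ("?b", "sameWall", "?c")], ("?a", "sameWall", "?c")),
]

# Hypothetical masonry units m1..m4 and laying sequences s1, s2.
FACTS = {
    ("m1", "adjacentTo", "m2"), ("m2", "adjacentTo", "m3"), ("m3", "adjacentTo", "m4"),
    ("m1", "inSequence", "s1"), ("m2", "inSequence", "s1"),
    ("m3", "inSequence", "s1"), ("m4", "inSequence", "s2"),
}
closure = forward_chain(FACTS, RULES)
print(sorted(f for f in closure if f[1] == "sameWall"))
```

The second rule only fires on facts produced by the first, which is why the loop runs until a fixpoint instead of applying each rule once.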
It should be emphasized that formalisms for retrieving information should be established before a query is performed. Liu et al. [127] developed the lexicon and syntax of a domain-specific query language, as well as a set of mechanisms that help users formulate query statements, parse the query and retrieve the information. Their work focused on HVAC systems, but the approach appears to be generic.

CWA and OWA for Knowledge Description
When describing the real world, people use two kinds of knowledge description, i.e., the Closed World Assumption (CWA) and the Open World Assumption (OWA). Under the CWA, any fact that is not known to be true is considered false. When the knowledge in the knowledge base is complete, or when decisions made on incomplete knowledge carry no risk, CWA is a good assumption for knowledge inference. In contrast, OWA handles incomplete knowledge more openly: it returns "unknown" for undetermined results.
Most existing AEC applications, including BIM systems and public databases, adopt CWA to make decisions according to existing knowledge. Semantic web technologies, however, depend on OWA, because the semantic network is a system with incomplete knowledge. For example, the semantic network may state that A is a subsystem of B in MEP engineering. Under CWA, one may then infer that A is the only subsystem of B simply because the knowledge base does not show any other subsystems of B, and this inference may not hold in many cases. Thus, some studies discussed how to map information under CWA to information under OWA and extended OWL with integrity constraints [128], so that both CWA and OWA can serve AEC software [129].
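The difference between the two assumptions shows up only for facts the knowledge base does not mention. The sketch below answers the same query under both assumptions, using the MEP subsystem example above; the equipment names are hypothetical.

```python
from enum import Enum
from typing import Set, Tuple

Fact = Tuple[str, str, str]


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


def ask(fact: Fact, asserted_true: Set[Fact], asserted_false: Set[Fact], assumption: str) -> Truth:
    """Answer a ground query under the closed ("CWA") or open ("OWA") world assumption."""
    if fact in asserted_true:
        return Truth.TRUE
    if fact in asserted_false:
        return Truth.FALSE
    if assumption == "CWA":
        return Truth.FALSE    # absence of evidence is treated as falsity
    if assumption == "OWA":
        return Truth.UNKNOWN  # absence of evidence says nothing
    raise ValueError("assumption must be 'CWA' or 'OWA'")


# Hypothetical knowledge base: only one subsystem of AHU_1 has been recorded.
KB_TRUE = {("SupplyFan_1", "subsystemOf", "AHU_1")}
query = ("ReturnFan_1", "subsystemOf", "AHU_1")
print(ask(query, KB_TRUE, set(), "CWA"))  # Truth.FALSE
print(ask(query, KB_TRUE, set(), "OWA"))  # Truth.UNKNOWN
```

Under CWA, the second fan is silently ruled out, which is exactly the faulty "only subsystem" conclusion described above; OWA keeps the question open until more knowledge arrives.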

CBR, RBR and KG for Knowledge Retrieval
Case-Based Reasoning (CBR) and Rule-Based Reasoning (RBR) are adopted in BIM-based knowledge retrieval [130,131]. For example, CBR-based knowledge management systems were developed in a BIM environment to carry out schedule evaluation [132] and building maintenance [62]. In these systems, the BIM model provides element parameters to CBR through the IFC standard. Zhang et al. [131] developed an RBR-based system for safety checking, also within the BIM environment. This system checked the model contents against predefined rules and then automatically provided constructability and safety optimization reports. GhaffarianHoseini et al. [133] combined CBR and RBR to support BIM-integrated facility failure management, where CBR captures experience from past problems and RBR gives explicit solutions. BIM visualization proved essential in these CBR and RBR systems. It was also demonstrated that, by combining CBR-BIM and RBR-BIM, a BIM-based knowledge management system can also help with safety and security recognition within the lifecycle of an AEC project [134].
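The retrieval step of CBR can be sketched as a weighted nearest-neighbour search: element parameters taken from the BIM model describe the new problem, and the most similar past case supplies a candidate solution. The features, weights and cases below are hypothetical and assumed to be normalized to [0, 1]; the cited systems may use different similarity measures, and the reuse, revise and retain steps of the CBR cycle are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Case:
    case_id: str
    features: Dict[str, float]  # normalized element parameters, e.g. from IFC properties
    solution: str


def similarity(a: Dict[str, float], b: Dict[str, float], weights: Dict[str, float]) -> float:
    """1 minus the weighted mean absolute difference; 1.0 means identical features."""
    total = sum(weights.values())
    dist = sum(w * abs(a[k] - b[k]) for k, w in weights.items())
    return 1.0 - dist / total


def retrieve(query: Dict[str, float], case_base: List[Case],
             weights: Dict[str, float], k: int = 1) -> List[Case]:
    """Return the k past cases most similar to the query."""
    return sorted(case_base, key=lambda c: similarity(query, c.features, weights), reverse=True)[:k]


# Hypothetical maintenance case base for air handling units.
CASES = [
    Case("c1", {"age": 0.8, "load": 0.3}, "replace fan bearing"),
    Case("c2", {"age": 0.2, "load": 0.9}, "rebalance fan"),
]
WEIGHTS = {"age": 2.0, "load": 1.0}
best = retrieve({"age": 0.75, "load": 0.35}, CASES, WEIGHTS)[0]
print(best.case_id, "->", best.solution)  # c1 -> replace fan bearing
```

An RBR component would sit next to this, turning explicit rules into solutions when no sufficiently similar case exists.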
KG shows great potential for knowledge retrieval in general domains. Its main tasks include link prediction [135] and entity resolution [136]. The former predicts possible relationships between entities in the KG, while the latter recognizes and fuses entities in cases where different entity names represent the same object or a single entity name represents several different objects. For example, Socher, Chen, Manning and Ng [137] proposed a machine learning and knowledge retrieval mechanism based on Neural Tensor Networks (NTN). Wang, Wang and Guo [138] embedded the KG in a low-dimensional vector space and added both physical and logical rules as constraints, to improve the learning behavior of the KG and the accuracy of knowledge retrieval.
Fig. 9 SPARQL-supported domain ontology retrieval extended flow chart
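Embedding-based link prediction scores a candidate triple by how well the vectors of its head, relation and tail fit together. The sketch below swaps in TransE, a widely used translation-based embedding model, rather than the NTN model of [137] or the rule-constrained method of [138]: a triple is plausible when head + relation is close to tail. The two-dimensional embeddings are hand-picked and hypothetical; in practice they are learned from the KG's existing triples.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical, hand-picked embeddings (learned from data in a real system).
ENTITY: Dict[str, Sequence[float]] = {
    "AHU_1": (0.0, 0.0),
    "Room_101": (1.0, 1.0),
    "Room_102": (3.0, -2.0),
}
RELATION: Dict[str, Sequence[float]] = {"serves": (1.0, 1.0)}


def transe_score(head: str, relation: str, tail: str) -> float:
    """TransE plausibility -||h + r - t||; higher (closer to 0) is more plausible."""
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def predict_tail(head: str, relation: str, candidates: List[str]) -> List[str]:
    """Link prediction: rank candidate tails for the incomplete triple (head, relation, ?)."""
    return sorted(candidates, key=lambda c: transe_score(head, relation, c), reverse=True)


print(predict_tail("AHU_1", "serves", ["Room_102", "Room_101"]))  # ['Room_101', 'Room_102']
```

The top-ranked tail is a suggested new edge for the KG, which a domain expert or a consistency check would still need to confirm.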

Ontology Reasoning Machine for Knowledge Inference
Ontology reasoning is a key technique for realizing knowledge inference in the semantic network, and thus a number of ontology reasoning machines (reasoners), such as the test reasoner by W3C and the one embedded in the semantic web framework by HP Labs [139], have been developed as basic supporting tools for ontology creation and application. These reasoners provide the following two main functions, through which they can retrieve information hidden in the ontology and generate new knowledge. (1) Consistency testing of the ontology. The consistency test ensures that all the logic among entities and instances, and within the entities themselves, is consistent, and that there are no conflicts with the axiom constraints of the entities, properties and instances. (2) Inference of implicit knowledge. Ontologies are usually created following the principle that entities and their relationships should be as simple as possible while still carrying sufficient information. Thus, when an ontology is used, inferring implicit knowledge is important for providing knowledge. Some typical reasoners are listed in Table 3. All of them provide the above-mentioned functions, and each has its own advantages and disadvantages. Among them, Racer [140], Pellet [141] and FaCT++ [142] are specialized ontology reasoners. Their inference methods rely on traditional description logics and are optimized with tableau algorithms. As a result, they are efficient at inference but are limited to specific ontology languages and offer limited extensibility and customization. Specifically, Racer does not provide inference for enumerations or user-defined data types, Pellet supports only a few ontology query languages, and Jena supports only simple inference rules and cannot support OWL inference. Jess [143] is an open source engine that connects more easily with other applications and uses a generic inference engine to support cross-domain inference, but its inference efficiency is low.
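The two reasoner functions can be illustrated on a tiny class hierarchy: inferring implicit types by following subclass links transitively, and testing consistency against disjointness axioms. The sketch below does only that. It is not a tableau reasoner and covers a very small fragment of what Racer, Pellet or FaCT++ handle, and the MEP taxonomy is hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical taxonomy: class -> set of direct superclasses.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "Fan": {"HVACEquipment"},
    "HVACEquipment": {"MEPComponent"},
    "Pipe": {"PlumbingComponent"},
    "PlumbingComponent": {"MEPComponent"},
}
DISJOINT: List[Tuple[str, str]] = [("HVACEquipment", "PlumbingComponent")]


def superclasses(cls: str) -> Set[str]:
    """All direct and indirect superclasses of cls (transitive closure of subClassOf)."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(asserted: Set[str]) -> Set[str]:
    """Function (2): implicit types of an individual from its asserted types."""
    types = set(asserted)
    for cls in asserted:
        types |= superclasses(cls)
    return types


def is_consistent(asserted: Set[str]) -> bool:
    """Function (1): an individual may not belong to two disjoint classes."""
    types = inferred_types(asserted)
    return not any(a in types and b in types for a, b in DISJOINT)


print(sorted(inferred_types({"Fan"})))  # ['Fan', 'HVACEquipment', 'MEPComponent']
print(is_consistent({"Fan", "Pipe"}))   # False: HVAC and plumbing classes are disjoint
```

Only `Fan` is asserted, yet the individual is also an `MEPComponent`; keeping ontologies minimal and letting the reasoner supply such facts is the principle described above.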

Research Topic 5: Knowledge Application
There are many knowledge applications in the AEC industry throughout the project lifecycle. Rivera et al. [144] proposed a methodological-technological framework for the emerging concept of Construction 4.0, which takes BIM-based knowledge application as a core process to achieve automation and digitization. In this section, some typical and effective applications are summarized according to the design, construction, and operation and maintenance phases.

Design Phase
Intelligent building design based on domain knowledge is a fast-growing field in the application of knowledge management to the AEC industry. Computer-aided design (CAD) and engineering (CAE) are combined with computational intelligence (CI) to provide integrated applications in the design phase. In 1990, MacCallum asked "Does intelligent CAD exist?" [145], which shows that applying algorithms such as machine learning to building design was already under way in the early 1990s [146]. The rise of BIM provides an interactive visualization platform and a powerful knowledge management system for building design [147], and knowledge-based AI technologies have been applied to optimization and configuration [148][149][150], patterns and philosophies [151][152][153], intelligent design [154,155] and interactive design [156], with the goals of automating manual tasks, providing personalized support for domain specialists and offering professional guidance to amateurs. Moreover, Merrell et al. [157] applied a Bayesian network based on domain knowledge and real data to stochastically generate a series of layouts and even a complete 3D residential building. Also taking a Bayesian model as a kernel element, Fisher et al. [158] presented an approach to arranging 3D furniture objects within a building based on existing examples. These studies explored the possibilities of intelligent building design based on domain knowledge.
Automatic rule-based checking in the AEC industry means assessing building designs against various criteria by computer programs [159]. The objective is to check the designed model automatically, by interpreting the rules and encoding the standards, and to report results such as "pass", "failed", "warning" or "unknown" [160]. The rules and standards considered in rule-based checking are mostly the minimum criteria that a building or an infrastructure should meet to ensure safety and proper function. At least two steps are needed in such a checking process: (1) formalize the building regulations and the BIM models into rule models and representation models, respectively, and (2) ensure that the computer program can parse these two models and then check the designed object against the rules [161]. Many achievements have been reported recently. For instance, regulation compliance checking for buildings has been proposed [162,163]. Solihin [164] developed an automated code checking system that adopts the IFC model and Express Data Manager (EDM) to evaluate code compliance in Singapore. Autocodes, developed by Fiatech [165], is capable of code compliance checking for American buildings according to existing BIM standards and guidelines. Besides, some special checks have also been considered, including circulation problems, spatial requirements and special site needs.
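The two steps and the four possible outcomes can be sketched with a single formalized rule. Below, a minimum door clear-width rule is encoded as a function (step 1), and a list of door records standing in for parsed BIM objects is checked against it (step 2). The 850 mm threshold, the 50 mm warning margin and the `ClearWidth` property name are hypothetical and not taken from any specific code; a missing property yields "unknown" because the rule cannot be evaluated.

```python
from typing import Dict, List


def check_door_width(door: Dict, min_width_mm: float = 850.0, margin_mm: float = 50.0) -> str:
    """Formalized rule (hypothetical threshold): door clear width >= min_width_mm."""
    width = door.get("ClearWidth")
    if width is None:
        return "unknown"  # the model lacks the data needed to evaluate the rule
    if width < min_width_mm:
        return "failed"
    if width < min_width_mm + margin_mm:
        return "warning"  # compliant, but within the safety margin
    return "pass"


def run_checks(doors: List[Dict]) -> Dict[str, str]:
    """Apply the rule to every door parsed from the model."""
    return {d["id"]: check_door_width(d) for d in doors}


# Hypothetical door records extracted from a BIM model.
model_doors = [
    {"id": "D1", "ClearWidth": 900.0},
    {"id": "D2", "ClearWidth": 800.0},
    {"id": "D3", "ClearWidth": 870.0},
    {"id": "D4"},
]
print(run_checks(model_doors))
# {'D1': 'pass', 'D2': 'failed', 'D3': 'warning', 'D4': 'unknown'}
```

Keeping "unknown" separate from "failed" matters in practice: it signals an incomplete model rather than a non-compliant design.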
Han, Kunz and Law [166] proposed a hybrid approach to compliance analysis for disabled access that combines rule coding with a performance-based method. Lee et al. [167] combined a CBR method with the rule-checking process to give recommendations after checking. Kadolsky et al. [168] presented a case illustrating how to define building information management rules based on ontology, and Hu [169] presented automatic fire safety checking in building design based on BIM and ontology. Baumgärtel et al. [170] discussed how rules can be used to address heat preservation in building design.

Construction Phase
Knowledge provides an innovative way to better support construction management in subdivisions such as cost estimation [171], safety management [134,172] and computer-aided construction [173,174]. Specifically, when AI technologies are introduced into construction management, safety, cost and schedule can be optimized. Sigalov and König [175] presented a process pattern recognition method for BIM-based construction schedules that estimates the similarities between construction schedules, solving the problem of manually defining proper, application-specific process templates. Genetic algorithms (GA) [176], together with resource constraints [177][178][179] and spatial constraints [175,180], can also be adopted to generate and optimize construction schedules based on the knowledge provided by the BIM model and environment. Schedule prediction is also possible for specific construction activities [181]. Ganbat et al. [182] carried out a systematic review of BIM-enabled risk management applications and concluded that timely uploading, recording and checking of data is the key to reducing potential risk.
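Precedence and resource constraints can be turned into a schedule with a serial schedule generation scheme: activities are taken in a precedence-feasible order and each is placed at the earliest day on which its predecessors are finished and enough crews are free. In one common GA design, the chromosome encodes this activity order and a decoder like the one below turns it into a schedule; the sketch shows only the decoder, with a single hypothetical crew resource and made-up activities, not the specific methods of [176] to [180].

```python
from typing import Dict, List, Tuple

Activity = Tuple[int, int, List[str]]  # (duration in days, crews required, predecessors)


def serial_sgs(activities: Dict[str, Activity], order: List[str], capacity: int) -> Dict[str, int]:
    """Place activities one by one at their earliest precedence- and resource-feasible day.

    `order` must list every predecessor before its successors.
    """
    start: Dict[str, int] = {}
    finish: Dict[str, int] = {}
    usage: Dict[int, int] = {}  # day -> crews already committed
    for name in order:
        duration, demand, preds = activities[name]
        if demand > capacity:
            raise ValueError(f"{name} requires more crews than are available")
        t = max((finish[p] for p in preds), default=0)  # precedence constraint
        # Resource constraint: shift right until every day in [t, t + duration) has room.
        while any(usage.get(day, 0) + demand > capacity for day in range(t, t + duration)):
            t += 1
        start[name], finish[name] = t, t + duration
        for day in range(t, t + duration):
            usage[day] = usage.get(day, 0) + demand
    return start


# Hypothetical activities, e.g. derived from BIM elements and their dependencies.
ACTIVITIES: Dict[str, Activity] = {
    "Foundation": (3, 2, []),
    "Columns": (2, 1, ["Foundation"]),
    "MEP_rough": (2, 2, ["Foundation"]),
    "Slab": (2, 2, ["Columns"]),
}
print(serial_sgs(ACTIVITIES, ["Foundation", "Columns", "MEP_rough", "Slab"], capacity=2))
# {'Foundation': 0, 'Columns': 3, 'MEP_rough': 5, 'Slab': 7}
```

MEP_rough and Slab are both delayed by the two-crew limit rather than by precedence, which is the kind of trade-off a GA explores by trying different activity orders.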
Construction cost is considered the fifth dimension (5D) of BIM [183], and knowledge also substantiates the results of model-based cost estimation and process optimization. For example, de Soto and Adey [171] presented a resource requirement prediction model to estimate construction cost from detailed material prices. In some studies, BIM and knowledge were combined to select appropriate tower cranes [173] and plan a reasonable tower crane layout [174] during construction. Equipment travel paths can also be optimized to avoid obstacles and reduce construction cost based on BIM and construction knowledge [184]. Most of these studies adopted existing AI algorithms to extract the knowledge hidden in the data or in routine activities.
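Obstacle-avoiding travel paths are a classic search problem. The sketch below uses A* on a site grid in which `#` cells are obstacles (for example, occupied zones taken from the BIM model), with unit-cost moves in four directions and the Manhattan distance as an admissible heuristic. The grid and coordinates are hypothetical, and this is a generic illustration rather than the method of [184].

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def astar(grid: List[str], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Shortest 4-connected path avoiding '#' cells, or None if the goal is unreachable."""
    rows, cols = len(grid), len(grid[0])

    def h(p: Cell) -> int:  # Manhattan distance never overestimates with unit moves
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    came_from: Dict[Cell, Optional[Cell]] = {start: None}
    g: Dict[Cell, int] = {start: 0}
    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cur == goal:  # walk back through predecessors to rebuild the path
            path = []
            node: Optional[Cell] = cur
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        if cost > g[cur]:
            continue  # stale heap entry superseded by a cheaper one
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_cost = cost + 1
                if new_cost < g.get(nxt, float("inf")):
                    g[nxt] = new_cost
                    came_from[nxt] = cur
                    heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None


# Hypothetical site grid: '#' marks stockpiles or structures the equipment must avoid.
SITE = [
    "....",
    ".##.",
    "....",
]
print(astar(SITE, (0, 0), (2, 3)))
```

Unequal terrain or turning costs could be modeled by replacing the unit step cost with a per-cell cost taken from the site model.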
Moreover, in the area of safety and risk management, Zhang et al. [185] built on semantic regulation checking to propose a construction safety knowledge model for job hazard analysis, which determines safety issues in terms of tasks, activities and resources by transforming the Tekla structural model into an RDF graph and combining the ontology with SWRL rules. Ding et al. [76] constructed a semantic network based on an ontology-described, BIM-supported management framework for construction safety knowledge, in order to generate safety mappings and their relationships. By searching the semantics, the application knowledge is then linked to BIM objects. With these ideas, knowledge-based BIM systems can be developed to capture and store various types of information and knowledge from different participants during construction [62].

Operation and Maintenance Phase
Knowledge-based applications are popular and highly anticipated in the operation and maintenance of AEC projects, particularly in safety management [186], automatic control [187], energy consumption management [188,189] and decision-making support [62,190]. Three aspects of application are discussed in detail below.
An intelligent answering system is a common knowledge-based application in the operation and maintenance period. A typical intelligent answering system has at least three modules, i.e., information handling, question indexing and answer recommending. The system first carries out semantic analysis of the queries sent by users: based on the semantic knowledge base, the sentences are preprocessed through lexical, syntactic and grammatical analysis to extract their semantics in the form of statements and text collections. Motawa [191] constructed a BIM-based dialogue system with which users can seek answers to building maintenance problems. In such a system, the domain KG is essential for providing comprehensive and deeply related summaries in the answers. Corry et al. [190] proposed an approach that uses semantic web technologies to access soft AEC data from social media, personal communications, mobile networks, indoor locations, financial reports, etc., to build a knowledge base. Through the integration of BIM and the semantic network, building information can be accessed together with other open data sources. This building information includes material, system and occupancy information, and even the weather. Such information helps predict energy consumption and future material needs, and reveal the relationships among environment, people and cost [189]. Particularly in the energy consumption domain, more research has been carried out on human behavior modeling for energy saving in residential buildings [192], and on semantic analysis [193] and decision support models [194] for energy efficiency in smart home environments.
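The three modules can be sketched with plain bag-of-words retrieval: tokenization stands in for information handling, a TF-IDF index over stored questions for question indexing, and cosine similarity for answer recommending. This is a deliberately simple stand-in for the semantic analysis and KG support described above, and the maintenance question-answer pairs are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical maintenance question-answer pairs.
FAQ: Dict[str, str] = {
    "How do I reset the air handling unit filter alarm?": "Replace the filter, then reset the alarm on the BAS panel.",
    "Why is room 101 too cold?": "Check the VAV damper position and the zone temperature setpoint.",
    "When is the fire pump tested?": "The fire pump is tested weekly by the maintenance team.",
}


def tokens(text: str) -> List[str]:
    """Information handling: lower-case and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(questions: List[str]) -> Tuple[Dict[str, float], List[Dict[str, float]]]:
    """Question indexing: TF-IDF vectors with smoothed inverse document frequency."""
    docs = [Counter(tokens(q)) for q in questions]
    n = len(docs)
    df = Counter(term for d in docs for term in d)
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return idf, [{t: tf * idf[t] for t, tf in d.items()} for d in docs]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def answer(query: str, faq: Dict[str, str] = FAQ) -> str:
    """Answer recommending: return the answer of the most similar stored question."""
    questions = list(faq)
    idf, vectors = build_index(questions)
    q = {t: tf * idf[t] for t, tf in Counter(tokens(query)).items() if t in idf}
    best = max(range(len(questions)), key=lambda i: cosine(q, vectors[i]))
    return faq[questions[best]]


print(answer("room 101 is cold"))
```

Words the index has never seen are simply dropped, which is one reason real systems add semantic analysis and a domain KG on top of this kind of lexical matching.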
Fault detection and diagnostics (FDD), which usually consists of fault detection, fault diagnosis and fault recognition, is another knowledge-based application in this period. FDD methods can be divided into two categories, i.e., model-based FDD and data-driven FDD [195]. Model-based FDD calculates the deviations between the model's predictions and the measured values, and then compares these deviations with predefined thresholds to identify faults. Data-driven FDD, in contrast, extracts characteristics from accumulated historical data and uses them as prior knowledge to detect faults. However, as MEP facilities become more advanced, their complexity is an obstacle to fault detection and requires skilled professionals to deal with the data [196]. Recently, some studies have taken advantage of the information integration in BIM to feed the FDD model with information stored in BIM models, so as to reduce the dependence on professionals. For example, Liu et al. [197] proposed an integrated information framework for automated performance analysis of HVAC systems and identified the requirements for a framework of self-managing HVAC systems [198]. The framework builds on existing standards such as IFC, the Sensor Model Language (SensorML) and BACnet to establish a mechanism for automatically retrieving and integrating related information. Combined with existing algorithms, automatic FDD can then be achieved. Besides integrated frameworks, BIM-based information knowledge bases for HVAC systems have also been proposed to improve the efficiency and accuracy of the FDD process [199,200]. Dong et al. [201] focused on the data modeling process of a BIM-based quantified FDD method and built an information model for FDD by retrieving and integrating static information on facilities, topologies, rooms and spaces from IFC and Green Building XML (gbXML), together with dynamic monitoring information from sensors via the BACnet protocol. A standalone FDD system was developed to locate defective facilities in the BIM environment according to the monitored indoor and outdoor temperatures of the rooms and the minimum message length principle [202]. These studies show that knowledge, particularly integrated information, is useful for facility management and FDD.
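The model-based branch of FDD reduces to comparing residuals with a threshold. The sketch below adds one common refinement as an explicit assumption: a fault is reported only when the residual stays above the threshold for several consecutive samples, which suppresses one-off sensor noise. The temperature series, threshold and persistence values are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def model_based_fdd(predicted: Sequence[float], measured: Sequence[float],
                    threshold: float, persistence: int = 3) -> List[Tuple[int, int]]:
    """Return (first, last) sample indices of runs where |measured - predicted| > threshold
    for at least `persistence` consecutive samples."""
    n = min(len(predicted), len(measured))
    alarms: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    for i in range(n):
        residual = abs(measured[i] - predicted[i])
        if residual > threshold:
            if run_start is None:
                run_start = i  # a new out-of-bounds run begins
        else:
            if run_start is not None and i - run_start >= persistence:
                alarms.append((run_start, i - 1))
            run_start = None
    if run_start is not None and n - run_start >= persistence:
        alarms.append((run_start, n - 1))  # run still open at the end of the data
    return alarms


# Hypothetical supply-air temperatures (deg C): model prediction vs. sensor readings.
predicted = [22.0] * 8
measured = [22.1, 22.0, 24.5, 24.8, 25.0, 22.2, 24.0, 22.1]
print(model_based_fdd(predicted, measured, threshold=1.0))  # [(2, 4)]
```

The single spike at index 6 is ignored, while the three-sample run at indices 2 to 4 is flagged; in a BIM-integrated setting, the flagged sensor would then be traced back to its facility and room through the model.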
Building energy management is another area that relies on knowledge. A Building Energy Management System (BEMS) is a system developed to improve building energy efficiency by collecting and analyzing current energy consumption data. Technically, therefore, a BEMS should integrate the BAS and related operating and energy parameters, as well as the human behaviors and environmental parameters within the building, to provide knowledge for decision-making. A BEMS has at least three physical layers, i.e., a sensor layer, a computing layer and an application layer [203], and four modules, i.e., the sensor and driver module, the middleware, the handling engine and the user interface [204]. The sensor and driver module exchanges information between the digital infrastructure and the physical environment; the middleware provides a generic data interface; the handling engine retrieves and analyzes the collected data, which include environmental data (such as temperature, humidity and carbon dioxide) and human behavior (such as working, having a meeting and taking a break) [205]; and the user interface interacts with the user after transforming the analysis results into knowledge. BIM usually integrates domain knowledge of energy management and can serve as a knowledge source for constructing a BEMS. A transformation workflow from IFC-based BIM to BEMS has been proposed [206].
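The four-module structure can be sketched as a small pipeline. In the example below, the driver returns hard-coded raw points in place of live BAS or BACnet reads, the middleware maps vendor-specific point names and units onto one generic interface, the handling engine applies simple rules per room, and the user interface formats the findings. All point names, room IDs and thresholds (1000 ppm CO2, 26 deg C) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Reading:
    room: str
    quantity: str  # generic name: "temperature_C", "co2_ppm" or "occupied"
    value: float


def sensor_driver() -> List[Tuple[str, str, float]]:
    """Sensor and driver module: hard-coded raw points stand in for BAS reads."""
    return [("R101", "temp_F", 80.6), ("R101", "co2", 1250.0), ("R101", "occ", 1.0),
            ("R102", "temp_F", 68.0), ("R102", "co2", 600.0), ("R102", "occ", 0.0)]


# Middleware mapping: vendor point name -> (generic quantity, unit conversion).
UNIT_MAP: Dict[str, Tuple[str, Callable[[float], float]]] = {
    "temp_F": ("temperature_C", lambda f: (f - 32.0) * 5.0 / 9.0),
    "co2": ("co2_ppm", lambda v: v),
    "occ": ("occupied", lambda v: v),
}


def middleware(raw: List[Tuple[str, str, float]]) -> List[Reading]:
    """Middleware: normalize point names and units into one generic data interface."""
    return [Reading(room, UNIT_MAP[point][0], UNIT_MAP[point][1](value)) for room, point, value in raw]


def handling_engine(readings: List[Reading], co2_limit_ppm: float = 1000.0,
                    max_temp_c: float = 26.0) -> List[Tuple[str, str]]:
    """Handling engine: group readings by room and apply simple rules."""
    rooms: Dict[str, Dict[str, float]] = {}
    for r in readings:
        rooms.setdefault(r.room, {})[r.quantity] = r.value
    findings = []
    for room, q in sorted(rooms.items()):
        if q.get("occupied", 0.0) >= 1.0:
            if q.get("co2_ppm", 0.0) > co2_limit_ppm:
                findings.append((room, "increase ventilation"))
            if q.get("temperature_C", 0.0) > max_temp_c:
                findings.append((room, "check cooling"))
        else:
            findings.append((room, "switch to unoccupied setpoint"))
    return findings


def user_interface(findings: List[Tuple[str, str]]) -> List[str]:
    """User interface: turn analysis results into readable advice."""
    return [f"{room}: {action}" for room, action in findings]


for line in user_interface(handling_engine(middleware(sensor_driver()))):
    print(line)
```

Keeping the unit and naming conversion in the middleware means the rules in the handling engine never see vendor-specific points, which is the main benefit of the layered design.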

Discussions and Future Directions
Even though some of the reviewed publications do not explicitly use the terms "knowledge" or "BIM", the ideas of data integration, data-driven approaches, semantic analysis, knowledge discovery, etc. have been widely accepted, showing a growing interest in combining knowledge science with BIM technology to better support the AEC industry. However, development seems to be at a preliminary stage and lacks integral, systematic and generic achievements. The review reveals that we still face at least the following four challenges.

1. Lack of fundamental innovation. Fundamental innovation drives the technological revolutions that level up an industry. However, knowledge science is a generic term in most disciplines, and its fundamental principles mainly come from biology and computer science. That is one reason why AI, KG, etc. have a stronger influence on the information and biological fields. BIM serves building and civil engineering, but the idea comes from the machinery industry. Compared with producing industrial equipment or components, the lifecycle management of buildings and infrastructures is far more complicated. As a result, it is a common view that BIM does not go beyond the product information model (PIM) of the machinery industry. The AEC industry adopts many new technologies, but which one it most urgently needs remains unclear.
2. Lack of accurate on-site data. BIM platforms and systems provide an integrated medium to store and manage the big data collected within the lifecycle of a project, and they have raised people's awareness of AEC data. However, because of the low level of informatization and the inaccuracy of collected data, the AEC industry lacks accurate on-site data, let alone data-driven project management. This situation is gradually changing as more advanced information technologies are applied, but until these technologies have run on the right track for some time, the data cannot be fully relied upon, and neither can the ways of managing the data or discovering knowledge from it.
3. Lack of consolidated knowledge. Ontology, semantic network and KG provide managers with knowledge, but decisions in the AEC industry are still mostly based on intuition or experience, because the knowledge is non-unique, incomplete and hard to use. Experience can be considered a kind of knowledge, but one without clear expression. How to accumulate knowledge and represent it in a computer-readable and human-reusable way to aid project decisions still lacks a good solution. Accumulated, well-organized, accurately represented, retrieval-supported knowledge is not yet ready to revolutionize activities within the project lifecycle.
4. Insufficient level of application. Combining knowledge with BIM has great potential, because BIM provides the environment and tools to integrate and manage data, while knowledge science can not only discover knowledge from BIM models and their data but also promote ways to use that knowledge. Many trials have been run in diverse phases and aspects, but partial applications to projects, without a clear and systematic roadmap or solution, do not yet fully show this potential. In fact, most applications merely prove the effectiveness of the information technology rather than raising buildings or infrastructures to a more intelligent level.
Facing these challenges, much future work remains to be carried out. Some future directions are summarized below.
1. Expand the semantic network and KG. The larger the semantic network and the KG are, the more intelligence people can be benefited from. Thus, one essential way to improve the intelligence level of the AEC industry is to expand the semantic network and KG by creating and managing various domain ontologies, regulation sets and open them to the public. However, manual generation of knowledge is unacceptable. Therefore, new methods on automatic generation and update of semantic network and KG are of great importance. Furthermore, more accurate NLP and more efficient knowledge enquiries should be taken into account considering the explosion of the knowledge. 2. Connected with other advanced technologies. Knowledge science and BIM are considered sub-divisions of information technology. They can be further integrated with other advanced technologies. For example, AI algorithms such as deep learning and data mining are helpful to generate new knowledge from existing data. Cloud computing can enhance the efficiency of knowledge storage, as well as the computational ability of knowledge retrieval and analysis. IoT supports data acquisition and immediate feedback to the environment, VR/AR/MR technologies provide more immersive environment to sense the new world, etc. Only with the combination of these technologies, the knowledge science has the potential to explode into a new generation. 3. Improve the information platform. Another way to make progress is to improve the knowledge management systems and BIM systems. Most current knowledge management systems focus on the storage and retrieval of knowledge, lacking of the ability to generate new knowledge according to accumulated big data with reliable rules. Moreover, knowledge management systems are still yet popular in real project management. 
Many BIM systems are now offered by software companies and research institutes. However, because AEC projects are so diverse, these systems tend to be either generic but shallow in supporting management, or deep in management support but narrow in scope. In addition, BIM systems are still maturing with respect to data distribution, information integration and the centralization of management activities.

4. Move towards highly intelligent buildings and infrastructure. BIM integrates, shares and manages the data of buildings and infrastructure, while knowledge science broadens the ways these data can be applied. The combination of the two technologies also indicates the development trend of intelligence in the AEC industry. Intelligence here includes not only more intelligent ways of using buildings and infrastructure, but also more perceptible and controllable environments and more knowledge-driven methods for managing the project lifecycle. With these developments, automated design, construction robotics and smart operations could become reality in the coming years, bringing great improvements to the industry and to people's daily lives.

5. Vitalize the potential of the accumulated knowledge.
It is widely recognized that knowledge does not yet play a crucial role in the AEC industry, even though information technologies are now highly advanced. With the development of BIM over the past decade, all project participants have come to recognize the value of information; as a result, data are gradually being accumulated, information managed and knowledge generated. Just as accumulated data provide new knowledge to the industry, accumulated knowledge can reshape collaboration, activities and many other aspects of the industry. A knowledge-driven, intelligent era is approaching.

Conclusions
In past decades, knowledge in the AEC industry has mainly come from experience and has mostly been recorded in paper or electronic documents. To make use of this knowledge, many studies have focused on retrieving it with various methods, including ontologies, semantic networks and data mining algorithms. These methods rely on valuable data. BIM appears to be a valuable medium for providing such information, because it provides physical and functional digital models of all facilities across the project lifecycle based on a unified, readable data standard. Therefore, combining knowledge science with BIM shows great potential. Based on a review of existing publications, this study summarizes the latest research achievements combining these two technologies in terms of knowledge description, knowledge discovery, knowledge storage and management, knowledge inference and knowledge application. A thorough study of these publications shows that ideas such as data integration, data-driven approaches, semantic analysis and knowledge discovery have been widely accepted as ways to better support the AEC industry. However, development appears to be at a preliminary stage and lacks integrated, systematic and generic achievements. This study identifies four major challenges in the current situation: lack of fundamental innovation, lack of accurate on-site data, lack of consolidated knowledge, and an insufficient level of application.
Finally, this study outlines future directions for developing a knowledge-driven, intelligent AEC industry, including expanding the semantic network and KG, connecting with other advanced technologies, improving the information platform, moving towards highly intelligent buildings and infrastructure, and vitalizing the potential of accumulated knowledge.
Based on the review and discussion above, research on combining knowledge science with BIM, particularly in knowledge extraction and discovery based on BIM, is still at a very early stage, with many challenges to overcome. Nevertheless, further research and development show great potential to transform the AEC industry and provide a more intelligent environment for people.