1 Introduction

In modern industries, effectively managing and utilizing standard components is crucial for achieving high-quality products, ensuring cost-effectiveness, and meeting project deadlines. The foundation is a comprehensive standard components database. Because collecting comprehensive component data is difficult to achieve, this paper focuses on developing a query system for the database that can be expanded to integrate new data with little effort. Logical filtering and semantic searching are integrated to enhance the system’s functionality. Data on SKF [1] hydraulic seals is used as an illustrative example throughout the system development process.

A robust standard components database acts as a centralized repository, promoting consistency across projects, reducing duplication, and facilitating team collaboration. MEGARes 2.0 [2] plays a comparable role in its own field, aiding the identification of antimicrobial resistance genes in metagenomic data for epidemiological investigations. Such a database speeds up the design process by eliminating the need for manual handbook searches, making it indispensable for the application of AI in industrial settings.

Creating a database for standard components, akin to PubChem’s [3] inter-linked Substance, Compound, and BioAssay databases, requires a well-organized data structure, robust search functionality (including logical and semantic filtering), and accessibility through APIs for programmatic use.

The query system for the database should contain the following parts as shown in Fig. 1. The user interface is the entry point for users to interact with the query system. It comprises two main components:

Fig. 1
A flow chart of the Query system's functional architecture. It presents a user interface consisting of user input and result presentation, a query processor consisting of a logical filtering module and semantic retrieval module, and standard component data consisting of structured data and unstructured data.

The query system functional architecture

  • User Input: Takes two forms, structured queries and content-based queries. Structured queries allow users to specify attributes such as dimensions, materials, and performance metrics in a structured format. Content-based queries leverage natural language input, enabling users to describe their needs in more intuitive terms.

  • Result Presentation: Showcases the retrieved standard components. Users can explore the results, compare components, and select the most suitable ones for their projects.

The query processor is responsible for handling user inputs and transforming them into requests that the database can process. It comprises two key modules:

  • Logical Filtering Module: Allows users to filter components based on attributes such as size, material, or other technical specifications. Supports cross-logical filtering, combining criteria from different data tables to identify components that meet complex requirements.

  • Semantic Retrieval Module: Leverages advanced natural language processing, as shown in vitrivr [4] and SOSRepair [5]. Instead of relying on precise keyword matches, it interprets query descriptions based on content, delivering results that match the user’s intended context more closely. It is used to explore standard component descriptions and related textual materials.

The standard components database is the core of the architecture, housing a collection of standardized parts, specifications, and related data. It is structured to accommodate different data types:

  • Structured Data: Includes basic data tables, which store the structured information about standard components, such as technical specifications, part numbers, and dimensions.

  • Unstructured Data: Contains additional information about the components in various formats, such as documentation, images, CAD drawings, and other multimedia elements.

2 Standard Components Database

2.1 Structured Data

The standard components database contains three types of structured data tables: basic data tables, associated data tables, and multilevel data tables. Figure 2 provides a visual representation of how these three types of data tables are interconnected, covering all the structured data related to the components in this comprehensive framework.

Fig. 2
A diagram of the structured data tables. It presents structured data consisting of a basic data table and an associated data table, and a multilevel data table consists of the basic data table.

The structured data tables

Basic Data Tables: The fundamental building blocks of the database, containing essential structured information about individual standard components.

Associated Data Tables: Provide additional context for components across different basic data tables, improving component cross-referencing and component filtering based on associated data.

Multilevel Data Tables: Capture the hierarchical relationships within standard components data, which makes it possible to represent assemblies, sub-components, and other intricate structures, and to retrieve information at various levels of detail.

2.1.1 Basic Data Tables

A basic data table is designed as a simple two-dimensional table, in which columns stand for distinct features or attributes, while rows correspond to individual standard components. For instance, Table 1 displays a basic data table illustrating the types and basic features of seal parts.

Table 1 Seal types and basic features
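As a minimal sketch of how such a basic data table might be held in SQL, the following uses Python's built-in sqlite3 module. The table name seal_types, its columns, and every row value (SEAL-A, MAT-X, and so on) are hypothetical placeholders chosen for illustration; they are not taken from Table 1 or from the SKF catalogue.

```python
import sqlite3

# In-memory database; all names and rows below are hypothetical placeholders.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seal_types ("
    " seal_type TEXT PRIMARY KEY,"  # one row per standard component
    " material TEXT,"               # attribute column
    " acting TEXT)"                 # attribute column
)
conn.executemany(
    "INSERT INTO seal_types VALUES (?, ?, ?)",
    [
        ("SEAL-A", "MAT-X", "single"),
        ("SEAL-B", "MAT-Y", "double"),
        ("SEAL-C", "MAT-X", "double"),
    ],
)

# Select the components (rows) whose material attribute has a given value.
rows = conn.execute(
    "SELECT seal_type FROM seal_types WHERE material = ? ORDER BY seal_type",
    ("MAT-X",),
).fetchall()
matching = [row[0] for row in rows]
print(matching)
```

Each column is one attribute and each row is one component, so adding a component is a single INSERT and leaves the schema unchanged.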

2.1.2 Associated Data Tables

An associated data table is a set of basic data tables, each focusing on a specific set of attributes that are related during component filtering.

For example, Table 1 records seal types and materials, while Table 2 records hydraulic fluids and seal material compatibility. The synergy between these tables is crucial in choosing an appropriate seal for a specific application, as the compatibility of hydraulic fluids with seal materials (from Table 2) and the corresponding seal types (from Table 1) collectively determine the optimal choice. When a particular hydraulic fluid is specified, the system utilizes associated data from both Tables 1 and 2 to deliver logical filtering outcomes. This ensures that the chosen seal not only aligns with the hydraulic fluid but also correlates with other attributes detailed in the associated data tables.

Table 2 Hydraulic fluids and seal material compatibility

2.1.3 Multilevel Data Tables

A multilevel data table consists of a basic data table and its sub-tables, capturing how different attributes of a component interact with one another.

For instance, consider a scenario where Table 3 represents basic seal installation dimensions, while Table 4 serves as a sub-table, capturing installation dimensions that are specifically related to pressure considerations. While Table 3 provides fundamental installation dimensions, it is Table 4 that refines these dimensions in the context of pressure requirements. Tables 3 and 4 collaboratively define the complete installation dimensions of a seal under varying pressure conditions.

Table 3 Basic seal installation dimensions
Table 4 Maximum extrusion gap

2.2 Unstructured Data

The standard components database contains two main types of unstructured data: component descriptions and multimedia assets. Component descriptions consist of textual narratives that offer detailed information about the characteristics, features, and possible uses of standard components. Multimedia assets encompass visual and multimedia resources such as pictures, videos, and CAD drawings, which help users visualize the physical attributes of standard components.

2.2.1 Component Descriptions

JSON is employed to store textual descriptions that cannot fit into SQL data tables. Python can be used to manipulate the JSON files, supporting programmatic interaction and the implementation of advanced semantic search. The following example describes the temperature conditions of the hydraulic seals.

A set of programming codes. The written text consists of components, objects, and descriptions.
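To illustrate how such a record might be read from Python, here is a minimal sketch using the standard json module. The field names component, object, and description are assumptions based on the figure description above, and the description text is a placeholder rather than real catalogue content. The helper find_descriptions is hypothetical.

```python
import json

# Hypothetical JSON document; field names and text are illustrative only.
raw = """
[
  {
    "component": "hydraulic seal",
    "object": "temperature condition",
    "description": "Placeholder text describing the operating temperature range."
  },
  {
    "component": "hydraulic seal",
    "object": "installation",
    "description": "Placeholder text describing installation notes."
  }
]
"""

records = json.loads(raw)


def find_descriptions(records, component, obj):
    """Return the description texts for one component and one described object."""
    return [
        r["description"]
        for r in records
        if r["component"] == component and r["object"] == obj
    ]


hits = find_descriptions(records, "hydraulic seal", "temperature condition")
print(hits)
```

Free text stays in the description field, while the component and object fields give the query system something structured to filter on before any semantic analysis is applied.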

2.2.2 Multimedia Assets (Images, CAD Drawings, Etc.)

Document-oriented NoSQL databases are well-suited for handling multimedia assets. They excel in managing and storing unstructured data in flexible, JSON-like documents. Storing multimedia files alongside associated metadata and unique identifiers ensures easy retrieval, categorization, and access control.
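The idea of storing a file reference together with metadata and a unique identifier can be sketched as follows. A plain Python dictionary stands in for the document store here; the AssetDocument fields, the file paths, and the helper names are all hypothetical, and a real deployment would use an actual document database instead.

```python
import uuid
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class AssetDocument:
    """Hypothetical metadata document for one multimedia asset."""
    component_id: str
    media_type: str  # e.g. "image", "cad", "video"
    path: str        # location of the binary file
    tags: List[str] = field(default_factory=list)
    asset_id: str = field(default_factory=lambda: uuid.uuid4().hex)


store = {}  # asset_id -> JSON-like dict; stand-in for a document collection


def put(doc: AssetDocument) -> str:
    """Store the document and return its unique identifier."""
    store[doc.asset_id] = asdict(doc)
    return doc.asset_id


def find(component_id: str, media_type: Optional[str] = None) -> list:
    """Return stored documents for a component, optionally by media type."""
    return [
        d for d in store.values()
        if d["component_id"] == component_id
        and (media_type is None or d["media_type"] == media_type)
    ]


put(AssetDocument("SEAL-A", "image", "assets/seal_a.png", ["section view"]))
put(AssetDocument("SEAL-A", "cad", "assets/seal_a.step"))
put(AssetDocument("SEAL-B", "image", "assets/seal_b.png"))
print(find("SEAL-A", "cad"))
```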

3 Query Processor

3.1 Logical Filtering Module

In the query system design, all structured information finds its place in basic data tables. These tables comprise ‘Attributes’ as column names, ‘Entries’ as unique row values, and ‘Values’ as the contents within. In essence, structured data can be represented as triples denoted as (A, E, V), where A stands for Attributes, E for Entries, and V for Values. Logical filtering is the process of completing a triple from these tables based on user-query conditions, such as (A, E, ?), (A, ?, V), and (?, E, V).

For logical filtering of a basic data table, the missing element of an (A, E, V) triple can be retrieved from a single table using SQL queries.
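The three query patterns can be sketched against a single SQLite table as follows. This is an illustrative sketch, not the system's actual implementation: the table seals, its columns, its rows, and the three helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical basic data table: "entry" identifies the row (E),
# the other columns are attributes (A), and cell contents are values (V).
conn.execute("CREATE TABLE seals (entry TEXT PRIMARY KEY, material TEXT, acting TEXT)")
conn.executemany(
    "INSERT INTO seals VALUES (?, ?, ?)",
    [("SEAL-A", "MAT-X", "single"),
     ("SEAL-B", "MAT-Y", "double"),
     ("SEAL-C", "MAT-X", "double")],
)

ATTRIBUTES = ("material", "acting")  # whitelist of valid column names


def _check(attribute):
    # Column names cannot be bound as SQL parameters, so validate them first.
    if attribute not in ATTRIBUTES:
        raise ValueError(f"unknown attribute: {attribute}")


def value_of(attribute, entry):
    """Complete (A, E, ?): the value of one attribute for one entry."""
    _check(attribute)
    row = conn.execute(
        f"SELECT {attribute} FROM seals WHERE entry = ?", (entry,)
    ).fetchone()
    return row[0] if row else None


def entries_with(attribute, value):
    """Complete (A, ?, V): all entries whose attribute equals the value."""
    _check(attribute)
    rows = conn.execute(
        f"SELECT entry FROM seals WHERE {attribute} = ? ORDER BY entry", (value,)
    )
    return [r[0] for r in rows]


def attributes_with(entry, value):
    """Complete (?, E, V): attributes of one entry that hold the value."""
    row = conn.execute(
        "SELECT material, acting FROM seals WHERE entry = ?", (entry,)
    ).fetchone()
    return [a for a, v in zip(ATTRIBUTES, row) if v == value] if row else []


print(value_of("material", "SEAL-A"))
print(entries_with("acting", "double"))
print(attributes_with("SEAL-B", "MAT-Y"))
```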

For an associated data table, logical filtering requires a series of (A, E, V) triples to be completed across the different basic data tables in the set. Take, for example, selecting the seal types that are compatible with a given hydraulic fluid according to Tables 1 and 2, as described in Sect. 2.1.2. As shown in Fig. 3, one basic table corresponds to one triple, where the user-query condition is (A2, ?, V2), the user-query target is (A1, E1, V1), and the associated table’s matching condition is E2 \(\iff \) (A1, V1). The mathematical description of the logical filtering is given in Eq. (1).

Fig. 3
A diagram of the associated data table. It presents Basic Data Tables 1 and 2. Basic Data Table 2 has 2 columns labeled A and A 2. Basic Data Table 1 has 3 columns and 3 rows.

Associated data table logical filtering

(1)
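One possible way to realize this cross-table matching is a SQL join, sketched below with sqlite3. The sketch assumes the compatibility table is stored in a normalized form with one row per (material, fluid) pair, which is a layout choice made here for illustration. All table names, column names, and rows are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for Table 1: seal types and their materials (hypothetical rows).
conn.execute("CREATE TABLE seal_types (seal_type TEXT PRIMARY KEY, material TEXT)")
conn.executemany(
    "INSERT INTO seal_types VALUES (?, ?)",
    [("SEAL-A", "MAT-X"), ("SEAL-B", "MAT-Y"), ("SEAL-C", "MAT-X")],
)
# Stand-in for Table 2: fluid/material compatibility (hypothetical rows).
conn.execute("CREATE TABLE compatibility (material TEXT, fluid TEXT, rating TEXT)")
conn.executemany(
    "INSERT INTO compatibility VALUES (?, ?, ?)",
    [("MAT-X", "FLUID-1", "suitable"),
     ("MAT-Y", "FLUID-1", "unsuitable"),
     ("MAT-X", "FLUID-2", "unsuitable"),
     ("MAT-Y", "FLUID-2", "suitable")],
)


def compatible_seals(fluid, rating="suitable"):
    """Query condition (A2, ?, V2): which materials have this rating for the fluid.
    Matching condition: the material found (E2) must equal the material
    value (A1, V1) of a seal type. Target: the seal types (E1)."""
    rows = conn.execute(
        "SELECT s.seal_type FROM seal_types AS s"
        " JOIN compatibility AS c ON c.material = s.material"
        " WHERE c.fluid = ? AND c.rating = ?"
        " ORDER BY s.seal_type",
        (fluid, rating),
    )
    return [r[0] for r in rows]


print(compatible_seals("FLUID-1"))
print(compatible_seals("FLUID-2"))
```

The join condition plays the role of the matching condition between the two triples, so the user only has to supply the fluid and the required rating.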

For a multilevel data table, logical filtering also involves completing a series of (A, E, V) triples, but the matching condition between basic tables is different. As exemplified by parent Table 3 and sub-table Table 4 in Sect. 2.1.3, Fig. 4 shows that the matching condition is (A3′, V3′) \(\Rightarrow \) E4 and V4 \(\Rightarrow \) (A3, V3). The goal is to obtain the correct value for Table 3 from Table 4. Therefore, the user-query condition is (A3, E3, ?) and (A4, ?, ?), the user-query target is (A3, E3, V3), and the mathematical description of the logical filtering is given in Eq. (2).

(2)
Fig. 4
An illustration of a multilevel data table. It presents Basic Data Tables 3 and 4. The Basic Data Table 3 has 3 columns labeled A, A 3 dash, and A 3. The Basic Data Table 4 has 3 columns labeled A, A 4, and an empty column.

Multilevel data table logical filtering
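The two-step lookup between a parent table and its sub-table can be sketched with plain dictionaries. This is one reading of the matching condition, under stated assumptions: the attribute name profile_key (playing the role of A3′), the pressure column names, and all numeric values are invented placeholders, not data from Tables 3 and 4.

```python
# Hypothetical stand-in for Table 3 (parent): entry E3 -> attribute values.
# "profile_key" plays the role of A3'; its value V3' names a sub-table row.
table3 = {
    "SEAL-A": {"bore_diameter_mm": 50.0, "profile_key": "P1"},
    "SEAL-B": {"bore_diameter_mm": 80.0, "profile_key": "P2"},
}

# Hypothetical stand-in for Table 4 (sub-table): entry E4 -> value per
# pressure column A4. The numbers are made up for illustration.
table4 = {
    "P1": {"pressure_low": 0.4, "pressure_high": 0.2},
    "P2": {"pressure_low": 0.6, "pressure_high": 0.3},
}


def resolve_from_subtable(entry3, pressure_column):
    """Fill in a parent-table value by following the link into the sub-table."""
    v3_prime = table3[entry3]["profile_key"]  # read V3' from the parent row
    sub_row = table4[v3_prime]                # (A3', V3') => E4
    return sub_row[pressure_column]           # V4 => (A3, V3)


print(resolve_from_subtable("SEAL-A", "pressure_high"))
```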

In the query system, all the basic data tables should be connected using the described relationships, forming a complete ontology. This means that starting from any point within the ontology, the user should be able to obtain a clear query result with sufficient conditions. Figure 5 illustrates an example ontology from Tables 1 to 4.

Fig. 5
An illustration of the structured data ontology. It presents an associated data table consisting of Basic Tables 1 and 2, and a multilevel data table consisting of Basic Tables 2, 3, and 4.

Structured data ontology

3.2 Semantic Retrieval Module

In the query system design, the semantic retrieval module is used for content-based query analysis and textual description analysis.

Content-based query analysis aims to convert unstructured textual queries into a structured format for the query system’s comprehension. This involves initial text parsing to identify relevant elements like keywords, phrases, and entities, using techniques like tokenization, part-of-speech tagging, and named entity recognition. The next steps include structuring the query by extracting subject, predicate, and object information, which often results in the creation of (A, E, V) triples. Entity resolution is then performed to link entities to specific database tables or records, ensuring the system knows where to retrieve data. Finally, the structured query conditions and targets are used to generate a formal query, typically in SQL or a similar query language, for execution against the database. Logical filtering can be applied to further refine the results based on user-defined criteria or constraints.
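The pipeline above can be sketched in a heavily simplified form. In this sketch a regular-expression tokenizer and a small lookup vocabulary stand in for part-of-speech tagging and named entity recognition, so it illustrates only the flow from text to triple to SQL, not a real NLP stack. The VOCAB mapping, the table and column names, and the helper functions are all hypothetical.

```python
import re

# Hypothetical entity vocabulary: known token -> (table, attribute, value).
# This plays the role of entity resolution against the database.
VOCAB = {
    "fluid-1": ("compatibility", "fluid", "FLUID-1"),
    "mat-x": ("seal_types", "material", "MAT-X"),
}


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9-]+", text.lower())


def to_triples(text):
    """Turn recognized entities into (table, A, V) conditions; E is the unknown."""
    return [VOCAB[tok] for tok in tokenize(text) if tok in VOCAB]


def to_sql(triple):
    """Build a parameterized query for one (A, ?, V) condition.
    Table and column names come only from the trusted vocabulary."""
    table, attribute, value = triple
    return f"SELECT * FROM {table} WHERE {attribute} = ?", (value,)


triples = to_triples("Which seals work with FLUID-1?")
sql, params = to_sql(triples[0])
print(triples)
print(sql, params)
```

Only the value is passed as a bound parameter. Table and attribute names are taken from the vocabulary rather than from user text, which keeps the generated SQL safe from injection in this sketch.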

Textual description analysis is used to analyse the component descriptions described in Sect. 2.2.1. It can extract valuable information from text and generate meaningful natural language responses. It involves identifying key information, such as facts, entities, and relationships, through techniques like sentiment analysis and topic modelling. It enables the query system to handle unstructured data and generate contextually appropriate responses by assembling information and providing human-like answers.
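As a rough illustration of retrieving the most relevant description for a free-text question, the sketch below ranks descriptions by bag-of-words cosine similarity. This simple lexical measure is a stand-in chosen for brevity and is not the semantic model used in the paper. The description texts are placeholders and the helper names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical component descriptions keyed by topic; placeholder text only.
descriptions = {
    "temperature": "Placeholder note about operating temperature limits of the seal material",
    "installation": "Placeholder note about groove dimensions and installation steps",
}


def bag_of_words(text):
    """Count word occurrences in lowercase text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(query, docs):
    """Return the key of the description most similar to the query."""
    q = bag_of_words(query)
    return max(docs, key=lambda k: cosine(q, bag_of_words(docs[k])))


print(best_match("What temperature limits apply?", descriptions))
```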

4 User Interface

An example of the user interface for the query system is shown in Fig. 6, featuring logical filtering and semantic search as inputs, along with a preview of structured data and unstructured data (models and figures) from the standard components database.

Fig. 6
A screenshot of the user interface. It presents a condition, value, function, position, code, pressure, and value. An inverted U shaped illustration is also highlighted, along with a figure.

User interface