Tribo-informatics: Concept, architecture, and case study

Friction plays a vital role in energy dissipation, device failure, and even energy supply in modern society. After years of research, data and information on tribology research are becoming increasingly available. Because of the strong systematic and multi-disciplinary coupling characteristics of tribology, tribology information is scattered in various disciplines with different patterns, e.g., technical documents, databases, and papers, thereby increasing the information entropy of the system, which is inconducive to the preservation and circulation of research information. With the development of computer and information science and technology, many subjects have begun to be combined with information technology, and multi-disciplinary informatics has been born. This paper describes the combination of information technology with tribology research, presenting the connotation and architecture of tribo-informatics, and providing a case study on implementing the proposed concept and architecture. The proposal and development of tribo-informatics described herein will improve the research efficiency and optimize the research process of tribology, which is of considerable significance to the development of this field.


Introduction
In recent years, the field of tribology has experienced significant developments in both scope and depth [1]. Tribology has not only achieved many new developments in areas such as traditional lubrication [2], surface engineering [3], biotribology [4], and computational tribology [5,6], but has also found essential applications in new fields such as superlubricity [7] and triboelectric nanogenerators [8]. Significant research achievements have been gained in various tribology-related disciplines, including biology, materials science, and engineering. As a result, information from tribology research has rapidly accumulated. However, the multidisciplinary coupling and system characteristics of tribology research [9,10] have led to the formation of many information islands. This scattered information increases the information entropy in academia and isolates the branches of tribology from one another, which is not conducive to the innovation and development of the discipline. Researchers have made many efforts to mitigate the entropy of information, such as writing review papers covering different fields or proposing unified models. However, owing to the sharp increase in research methods and discipline information, the effect of these traditional entropy reduction methods is limited. Therefore, a more efficient method to mitigate information entropy is needed.
The development of information technology and computer science has provided an efficient and reliable method for reducing the entropy of information. By collecting, recording, and analyzing data, and outputting the results, along with other approaches, such technologies offer a large amount of data and suggestions for researchers in various fields, including health information technology [11], supply chain information technology [12], and educational information technology [13]. With these information technologies, the information in various fields can be summarized and sorted, which reduces the information entropy of the system and improves the efficiency of information utilization. However, tribology has been strongly systematic ever since it was proposed as an interdisciplinary field. At the same time, it is significantly time dependent. Therefore, it is difficult to classify and sort tribology data by directly applying the informatics technologies available in other disciplines.
In this study, a framework of tribo-informatics is established based on a combination of information technology and tribology. This framework reduces the information entropy in tribology research and improves the research efficiency by facilitating the circulation of tribology information. The establishment of a database can also improve the inheritance of tribology data and improve the utilization and reusability of the research results. In addition, a case study is carried out in this paper, and the entire process of building and using a tribo-informatics database, function database, and application software is introduced in detail.

Development of tribology
Friction is the resistance produced by the relative motion of objects. Friction was recognized and utilized by human beings thousands of years ago. In that era, there were two important uses: friction heating and rolling friction. The utilization of friction has dramatically promoted the levels of human production and living. Tribology is defined as a science and technology studying the theory and practice of dual surfaces with relative motion and interaction [14]. Therefore, tribological research focuses on friction, wear, and lubrication.
With the development of other basic disciplines, the multidisciplinary characteristics of tribology are increasingly significant (Fig. 1). For example, the combination of materials science and tribology can use preparation methods [15] and performance parameters of different materials [16] to design an interface and achieve good tribological properties. The combination of physics and tribology can guide the design of an interface texture [17,18] and explain the mechanism of friction reduction of 2D materials [19]. The combination of chemistry and tribology has promoted the development of tribochemistry, which can be used to study the influence of the friction process on the chemical properties of the interface and the effect of surface chemical modification on the friction properties [20,21]. At the same time, the development of computer science has promoted the development of calculation and design methods for tribology systems, including the development of simulation methods [22,23], as well as the verification of theoretical models [24].
With the development of tribology research, new tribological technologies, such as superlubricity and triboelectric nanogenerators, have been developed. These new technologies have the characteristics of multidisciplinary coupling. For example, the realization of superlubricity can be achieved by designing new materials (material science), surface texture (physics), and chemical surface modifications (chemistry). The performance of triboelectric nanogenerators can also be improved by changing friction pair materials and through physical and chemical modifications. In addition, computer science can also simulate the implementation of these new technologies. However, the development of such technologies has been greatly restricted owing to the difficulty in the communication of tribological information among different disciplines. Therefore, it is necessary to establish a tribology database to summarize and integrate tribological information from multiple disciplines, so as to better serve the development of new tribology technologies.

Development of informatics
Benefitting from the development of computing tools, information technology has developed and can process more and more data. In fact, a variety of technologies for managing and processing information are developing rapidly, such as artificial intelligence, machine learning, database technology, and cloud computing. In the era of exploding amounts of data, information technology extends the ability of humans to handle information.
When information technology is applied to many fields of research, a variety of informatics disciplines are established. For instance, the development of next-generation sequencing (NGS) technologies in bioinformatics has accelerated the research of complex biological systems and provided a technical basis for basic research and medical applications of biology [25]. Materials informatics was first proposed in 2003 and has experienced significant developments, with applications including furniture products, anti-corrosion materials, aerospace materials, and nanomaterials [26]. Materials informatics can accurately summarize and describe the properties of materials and provide the basis for the development of new materials. In addition, the rapid development of chemical informatics [27], music informatics [28], health informatics [29], and safety informatics [30] reflects the integration of information technology and multiple fields.
With the rapid development of information technology, efficient methods of collection, classification, storage, retrieval, analysis, extraction, and dissemination of research information in multiple disciplines have emerged. However, owing to the strong systematic and multi-disciplinary coupling of tribology, combining it with informatics is extremely difficult. As a result, the research results on tribo-informatics have been limited.

Application of information technology to the field of tribology
Since the first recognition of friction, the study of tribology has gone through four stages to date: empirical science based on phenomena, theoretical science based on simplified models, computational science based on computational tools, and information science based on big data (Fig. 2). The application of information technology in the field of tribology mainly includes three aspects: the acquisition, analysis, and application of tribology data. Tribological information is divided into tribological signals (such as friction, wear, and other tribological parameters) and derivative signals (such as image, noise, vibration, temperature, and electrical signals). For tribological signals, neural network methods are mainly used to predict the wear, friction, and other tribological performance parameters [31-33]. In addition, many derivative signals are generated during the process of friction (Fig. 3). These signals can also be used to monitor the running state of a tribology system or predict its tribological performance. For example, Xue et al. [34] studied the influence of the surface structure and lubricant on the friction vibration and noise of GCr15 bearing steel. In addition, Kwang-Hua [35] studied the temperature dependence of the friction properties of ultra-lubricated molybdenum disulfide films.
The establishment and application of a tribology database mainly include research on the storage and extraction rules of various types of tribology data. For instance, Jia et al. [36] studied the database building method of lubricating materials and predicted the relationship between tribology and oxidation resistance by using a machine learning method. These results provide conditions for the rapid design, preparation, and application of lubricating materials. However, owing to the strong systematic characteristics of tribology, a complete tribology database has yet to be established. In addition, at the beginning of studies on tribology, some researchers mentioned the establishment of a tribology literature database [37,38]. However, based on a lack of studies regarding the analysis of such technology, this type of database has not been thoroughly considered.
The continuous combination of information technology and tribology provides more possibilities for research into the field of tribology, and the establishment of a database creates favorable conditions for the preservation and circulation of tribology research data. However, there are no descriptions of the connotation and structure of tribology information systems, and thus the focus of this paper is to build a systematic tribology information framework.

Concept of tribo-informatics
To provide a complete definition of tribo-informatics, the concepts of information entropy and tribology systems need to be clarified. First, information entropy is the average amount of information after eliminating redundancy in the system. Therefore, when the value of the signal source is uniquely determined, the information entropy of the system is the lowest. The information entropy can be calculated as follows:

$$H(U) = E[-\log p_i] = -\sum_{i} p_i \log p_i$$

where $U$ is the set of all possible results of the source value, $p_i$ is the probability of the $i$-th value in the source, and $E$ denotes the expectation, here taken of $-\log p_i$.
Take the study of superlubricity as an example. When "superlubricity" is used as a keyword in a literature search, it can be seen that, from 2016 to June 2020, a large number of studies have been published, resulting in an increase in information entropy in the field of superlubricity research (Fig. 4). It is assumed that every published paper is equally likely to be retrieved by a researcher, i.e., $p_i = 1/n$, and thus the information entropy ($H$) can be calculated as follows:

$$H = -\sum_{i=1}^{n} \frac{1}{n} \log \frac{1}{n} = \log n$$

where $n$ is the total number of published papers. In general, there are two ways to reduce information entropy. One is to put forward a new unified theory, which requires the creativity of researchers. The other is to sort out the published research results, for example through the publication of review articles and the establishment of databases.
Second, tribology research is extremely systematic; a tribology system mainly includes structural elements (E), element performance (P), the relationship between elements (R), and historical information (H) [9,10,39]. Tribology systems mainly include subsystems (such as coatings and lubricating oils), current systems (friction pairs), and super systems (bearings or other friction scenarios) (Fig. 5). Tribo-informatics therefore differs from bioinformatics, chemical informatics, and materials informatics, which are based on subsystem-level information. As a result, the establishment of a database in those fields is simpler than in tribo-informatics.
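The entropy argument above can be checked numerically. The following is a minimal sketch of the Shannon entropy $H = -\sum_i p_i \log p_i$ and of the uniform case $H = \log n$; the function names, the logarithm base of 2, and the paper counts are illustrative assumptions and are not taken from the original study.

```python
import math


def shannon_entropy(probabilities, base=2.0):
    """Return H = -sum(p_i * log(p_i)) for a discrete distribution."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p_i = 0 contribute nothing (limit of p * log p as p -> 0).
    return sum(-p * math.log(p, base) for p in probabilities if p > 0)


def uniform_entropy(n, base=2.0):
    """Entropy when each of n papers is equally likely to be retrieved."""
    return math.log(n, base)


# Hypothetical paper counts, chosen only to illustrate the trend:
# the more papers there are, the higher the entropy.
for n in (100, 1000):
    uniform = [1.0 / n] * n
    print(n, shannon_entropy(uniform), uniform_entropy(n))

# A uniquely determined source has the lowest possible entropy.
print(shannon_entropy([1.0]))
```

Under the equal-probability assumption, the entropy grows only logarithmically with the number of papers, but it never decreases unless the literature is reorganized, which is the role assigned to reviews and databases in the text.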
Through the above analysis of the information entropy and tribology system, the following conclusion of the concept of tribo-informatics can be drawn: Tribo-informatics improves the research efficiency and process of tribology by establishing tribology standards, building tribology databases, and using information technology to collect, classify, store, retrieve, analyze, and disseminate tribology information.

Architecture of tribo-informatics
The architecture of tribo-informatics is shown in Fig. 6. It mainly includes the connection of the database, the establishment of the database, the establishment of a functional database, and the retrieval and dissemination of information.
1) Connection of the database. Because of the systematic nature of tribology research, the input conditions of the tribology model can be connected with other databases, such as material, manufacturing, and chemical databases. These databases constitute the input conditions of the tribology system. It should be noted that the information unit of the tribology database is composed of "input-environment-interface-output", and the tribology information is recorded in the tribology database in this form (Fig. 7).
2) Establishment of the database. During this process, it is necessary to pay attention to data classification. From the perspective of data sources, the data mainly include original data, processed data, and literature and report data. Among them, the original data come from real standard tests and have the highest reliability. The processed data come from simulation and theoretical prediction, and their reliability is relatively weak. In addition, literature and report data mainly include hot words, which can be added to analyze the changes in research hot spots in the field of tribology. From the perspective of data characteristics, the data can be divided into time-series data and relational data, in which time-series data represent the relationship between the performance of a tribology system and time, and relational data represent the relationship between the input variables and the output of the tribology system.
3) Establishment of a functional database. The functions of the tribology database include visualization, retrieval, and analysis. Among them, the most critical is the analysis function, which mainly includes theoretical models, simulation models, and artificial intelligence prediction models. These models can be verified using the information in the underlying database, and can also be used to predict tribology information. The prediction results can then be stored as processed data. It must be noted that the processed data are stored only temporarily, for cases in which the corresponding experimental conditions cannot yet be achieved. Once those experimental conditions can be achieved, the original data can overwrite the processed data.
4) The retrieval and dissemination of information. In this part, industrial applications are developed to provide rich retrieval and visualization functions.
To some extent, the development of these applications can draw lessons from the relatively mature methods of bioinformatics and materials informatics. In terms of the application of tribo-informatics, basic researchers can extract information from the underlying database for simulation and theoretical model research. Application researchers can extract simulation models, theoretical models, and artificial intelligence prediction models from the functional database to predict the output of a tribology application system. Enterprise staff members can directly extract the output information of the tribology system from the database for industrial applications.
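The "input-environment-interface-output" information unit and the rule that original data take precedence over processed data can be sketched as follows. This is only an illustration under stated assumptions: the class names, field names, and key construction are hypothetical and do not describe the authors' implementation, and only the original and processed sources are modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    ORIGINAL = "original"    # standard tests, highest reliability
    PROCESSED = "processed"  # simulation or model prediction


@dataclass
class TriboRecord:
    """One information unit: input, environment, interface, output."""
    inputs: dict       # e.g., material identifiers from linked databases
    environment: dict  # e.g., temperature, load
    interface: dict    # e.g., contact geometry, lubrication state
    output: dict       # e.g., friction coefficient, wear
    source: Source

    def key(self):
        # Records that describe the same conditions share the same key.
        return repr((sorted(self.inputs.items()),
                     sorted(self.environment.items()),
                     sorted(self.interface.items())))


class TriboDatabase:
    def __init__(self):
        self._records = {}

    def store(self, record):
        """Store a record; a prediction never replaces a measurement."""
        existing = self._records.get(record.key())
        if (existing is not None
                and existing.source is Source.ORIGINAL
                and record.source is Source.PROCESSED):
            return False
        self._records[record.key()] = record
        return True

    def get(self, record):
        return self._records.get(record.key())
```

In this sketch, a processed record acts as a placeholder: storing an original record under the same conditions overwrites it, whereas a later prediction for those conditions is rejected.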

Case study
Background
Variable stator vane (VSV) assemblies are used for compressor airflow control and play a significant role in the performance of aircraft engines. The bushing, a key component of VSV assemblies, extends between the variable vane and the casing to prevent direct contact between them, as shown in Fig. 8. Because of its severe working conditions, such as high temperature and exposure to a water/air mist mixture, accurate predictions of its wear life are greatly needed. Therefore, tribological experiments are conducted in this case to obtain the wear data under different working conditions.

Generation of data
In this section, the influencing factors, including the temperature, radial load, amplitude of reciprocating rotation, and clearance between the bushing and spindle, are analyzed. Before the experiments, the central composite design method was employed to investigate the mapping relations between these factors and the bushing wear mass loss, which enhances the prediction precision with no significant increase in the cost of experimental resources. The parameters of the design of experiment (DoE) are listed in Table 1. Specifically, the frequency of reciprocation was set to 2 Hz and the test duration was 2 h for every group.
1) Preparation of specimens. Three types of bushings made from different polymers and two types of spindles made from different alloys were produced, which are called B1, B2, B3, S1, and S2, respectively. For sequence numbers of less than 22 in Table 1, the friction pairs formed by bushings B1, B2, and B3 with spindle S1 were tested. For the remaining sequence numbers, the friction pairs formed by the three types of bushing with spindle S2 were tested.
2) Preparation of the experimental apparatus. Owing to the lack of commercial tribometers that satisfy the working conditions of the bushing, an experimental set-up was built that provides reciprocating rotation, radial load application, and a high-temperature environment [40]. To obtain the wear mass loss after the experiment, an electric balance MS105 was adopted. Before and after the experiment, the samples were dried in an oven to reduce the systematic error caused by moisture.
3) Experimental procedure. First, the bushing samples were cleaned ultrasonically in isopropanol for 5 min to remove any stains. Then, the cleaned sample was dried in an oven at a fixed temperature of 150 °C for 3 h. After that, the sample was cooled in a desiccator and weighed using the electric balance, and the mass was recorded as $m_0$. Once the tribometer and tested samples were prepared, the motor ran at a reciprocating frequency of 2 Hz for 2 h, during which the frictional torque, radial load, and temperature data were recorded. After the wear test, the bushing was unloaded, and ultrasonic cleaning, drying, and weighing were successively applied to measure the mass $m_1$ after the test. Finally, the wear mass loss under the given working conditions was calculated as $\Delta m = m_0 - m_1$. After all sequences in Table 1 had been completed for the three types of bushing in the above manner, a dataset containing the results of the 126 groups was generated.
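The bookkeeping of the test campaign described above can be sketched in a few lines. The sketch assumes that Table 1 contains 42 sequence numbers (inferred here from 126 groups divided by three bushing types, not stated explicitly in the text) and that the wear mass loss is taken as the mass before the test minus the mass after it, so that a loss is positive. The masses in the example are hypothetical.

```python
BUSHINGS = ("B1", "B2", "B3")
N_SEQUENCES = 42        # assumption: 126 groups / 3 bushing types
LAST_S1_SEQUENCE = 21   # sequence numbers below 22 use spindle S1
FREQUENCY_HZ = 2.0
DURATION_S = 2 * 3600   # 2 h per group


def spindle_for(sequence):
    """Spindle used for a given sequence number of Table 1."""
    return "S1" if sequence <= LAST_S1_SEQUENCE else "S2"


def wear_mass_loss(m0, m1):
    """Mass before the test minus mass after it (positive for a loss)."""
    return m0 - m1


groups = [(bushing, spindle_for(seq), seq)
          for bushing in BUSHINGS
          for seq in range(1, N_SEQUENCES + 1)]

print(len(groups))                        # number of test groups
print(FREQUENCY_HZ * DURATION_S)          # reciprocating cycles per group
print(wear_mass_loss(1.25040, 1.24815))   # hypothetical masses in grams
```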

Data storage
Raw data must be processed according to the uniform tribological storage format before being inserted into the database. It is important that all data be converted into the International System of Units (SI). Although such operations can be accomplished using a script, the rationality and validity of the data should be verified by experts. Two main types of raw information are collected: material information and tribological system information, as shown in Fig. 9. The former can be crawled from the Internet or the literature and is recorded in the form of a table listing all physical characteristics that can be found and their respective values, such as the elastic modulus, density, hardness, and thermal conductivity. Each physical attribute of a material occupies a separate line. The latter is the more interesting component for the tribological database and is organized with respect to the system. This part consists of four types of entries: tribological pairs, tribological behavior, working conditions, and results. Each of these four components begins and ends with special characters that include its name. The tribological pairs entry records the names of the materials that participate in the tribological behavior, including any lubricating oil. If no lubricant is used, the corresponding string is labeled as "lacked". The tribological behavior entry notes the system information of the tested pairs, including the relative motion and shape of the tested pair, such as a pin on a disk with reciprocating rotation, a ball on a disk with reciprocating linear motion, or a ring on a block with rotation. In addition, a description of the data is contained in this entry, which is the main area scanned during a search. The working conditions entry records the exact values of the load, temperature, velocity, and duration for every group in the experiment, which is operated under the defined tribological behavior and pairs. The results entry records the tribological outputs, such as friction and wear rate, which are arranged in the sequence of the working conditions.
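A record organized into the four named components could be read back with a short script such as the one below. The paper does not specify the delimiter characters or field names, so the `<name> ... </name>` markers, the column names, and all numerical values here are hypothetical and serve only to illustrate the idea; the values are assumed to have been converted to SI units already.

```python
import re

# Hypothetical record; markers, names, and values are made up.
SAMPLE = """\
<pairs>
bushing: B1
spindle: S1
lubricant: lacked
</pairs>
<behavior>
motion: reciprocating rotation
description: polymer bushing on alloy spindle, VSV wear test
</behavior>
<conditions>
load_N, temperature_K, frequency_Hz, duration_s
100, 473.15, 2, 7200
150, 573.15, 2, 7200
</conditions>
<results>
wear_mass_loss_kg
2.1e-6
3.4e-6
</results>
"""


def parse_record(text):
    """Split a record into its four named blocks (lists of lines)."""
    blocks = {name: body.strip().splitlines()
              for name, body in re.findall(r"<(\w+)>\n(.*?)</\1>", text, re.S)}
    missing = {"pairs", "behavior", "conditions", "results"} - blocks.keys()
    if missing:
        raise ValueError(f"missing blocks: {sorted(missing)}")
    return blocks


def key_values(lines):
    """Parse 'key: value' lines into a dictionary."""
    return {k.strip(): v.strip()
            for k, v in (line.split(":", 1) for line in lines)}


def table(lines):
    """Parse a comma-separated table whose first line is the header."""
    header = [h.strip() for h in lines[0].split(",")]
    return [dict(zip(header, (float(x) for x in row.split(","))))
            for row in lines[1:]]


def runs(text):
    """Pair each working condition with the result in the same position."""
    blocks = parse_record(text)
    conditions, results = table(blocks["conditions"]), table(blocks["results"])
    if len(conditions) != len(results):
        raise ValueError("results must follow the sequence of working conditions")
    return [{**c, **r} for c, r in zip(conditions, results)]


print(key_values(parse_record(SAMPLE)["pairs"]))
print(runs(SAMPLE))
```

Keeping the results in the same order as the working conditions, as the storage format requires, is what allows the two tables to be joined by position rather than by an explicit key.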

Analysis and data utilization
In the past few decades, a large amount of experimental data has remained unavailable to the public even after the relevant reports were published. Hence, these data have only been investigated by a few researchers, which reduces their utilization rate. Furthermore, because of limited resources, previous research has not always obtained an appropriate model to interpret the regularity behind these data. In this program, the database is available to researchers, and numerous models can be applied in the future to analyze the data and extract their implicit regularity. Therefore, the utilization of data is improved, and the preferred model can be continually updated. In this section, two numerical models built by two researchers are applied to capture the rules of the experimental data; they are called model A and model B herein.
For both models, the lookup of features and the data extraction from the tribological database are the first two steps to obtain an analysis object. The former is based on a keyword search, which is executed in terms of the tribological behavior and pairs. Among the search results, the interesting ones are chosen and downloaded as packets, which include the material characteristics and tribological system information. For data extraction, any scripting language can be used to reprocess the packets and split them into their components according to the storage rules. Once the analysis object is built, a statistical analysis and model description follow.
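The keyword lookup over the tribological behavior and pairs can be illustrated with a very small ranking function. This is a sketch only: the entry layout, the scoring by keyword counts, and the sample entries are assumptions, and the actual platform may rank results differently.

```python
# Hypothetical database entries used only for illustration.
ENTRIES = [
    {"pairs": ["polymer bushing B1", "alloy spindle S1"],
     "behavior": "bushing on spindle with reciprocating rotation",
     "description": "VSV bushing wear test at high temperature"},
    {"pairs": ["steel ball", "steel disk", "base oil"],
     "behavior": "ball on disk with reciprocating linear motion",
     "description": "lubricated friction test"},
]


def relevance(query, entry):
    """Count occurrences of the query keywords in the searchable fields."""
    text = " ".join([entry["behavior"], entry["description"],
                     *entry["pairs"]]).lower()
    return sum(text.count(word) for word in query.lower().split())


def search(query, entries):
    """Return the matching entries, most relevant first."""
    scored = [(relevance(query, entry), entry) for entry in entries]
    hits = [(score, entry) for score, entry in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in hits]


for entry in search("bushing wear", ENTRIES):
    print(entry["description"])
```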
Model A is based on mathematical statistics and implements a quadratic function fitting process for every tribological pair. At the initial stage, the prediction function with four influence factors, namely, $A_1, A_2, A_3, A_4$, is hypothesized as follows:

$$\hat{y} = \sum_{i}\sum_{j} a_{ij} A_i A_j$$

where $a_{ij}$ is the coefficient to be optimized for each term $A_i A_j$, and $\hat{y}$ is the approximated value of the predicted parameter.
Before giving the final prediction equation, the correlation between every term $A_i A_j$ and the experimental results $y$ is calculated, which determines whether the term $A_i A_j$ is retained in the function. After all terms are tested and the final form of the prediction function is identified, the coefficients $a_{ij}$ can be optimized using various algorithms, such as the least-squares method. Then, the equation is assessed based on the p-value, where a smaller value suggests stronger evidence in favor of the established equation. The prediction results for this case are presented in Fig. 10.
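The two steps of model A, correlation screening of the quadratic terms followed by a least-squares fit, can be sketched as follows. Several points are assumptions made for illustration: a constant $A_0 = 1$ is prepended so that the constant and linear terms appear as products $A_0 A_j$, the screening threshold of 0.1 is arbitrary, the data are synthetic with two factors instead of four, and the p-value assessment is not reproduced.

```python
import math
from itertools import combinations_with_replacement


def quadratic_terms(factors):
    """Return {(i, j): A_i * A_j} with A_0 = 1 (assumed convention)."""
    a = [1.0, *factors]
    return {(i, j): a[i] * a[j]
            for i, j in combinations_with_replacement(range(len(a)), 2)}


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_terms(X, y, threshold):
    """Keep the constant plus every term with |r| >= threshold."""
    keep = [(0, 0)]
    for term in X[0]:
        column = [row[term] for row in X]
        if term != (0, 0) and abs(pearson(column, y)) >= threshold:
            keep.append(term)
    return keep


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit(X, y, terms):
    """Least-squares coefficients a_ij via the normal equations."""
    cols = [[row[t] for row in X] for t in terms]
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    moment = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return dict(zip(terms, solve(gram, moment)))


# Synthetic example: y = 1 + 2*A1 + 3*A2^2 on a 3 x 3 grid.
grid = [(a1, a2) for a1 in (-1.0, 0.0, 1.0) for a2 in (-1.0, 0.0, 1.0)]
X = [quadratic_terms(point) for point in grid]
y = [1.0 + 2.0 * a1 + 3.0 * a2 ** 2 for a1, a2 in grid]
kept = screen_terms(X, y, threshold=0.1)
coefficients = fit(X, y, kept)
print(kept)
print(coefficients)
```

Solving the normal equations directly is adequate for a handful of well-scaled terms; for the real dataset, a library least-squares routine would usually be the more robust choice.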
Model B is based on the neural network method and was implemented with the MATLAB neural network toolbox to predict the wear loss. The network consists of four layers, in which the input, hidden 1, hidden 2, and output layers have 7, 3, 5, and 1 neurons, respectively, as shown in Fig. 11. The transfer functions of the two hidden layers and the output layer are set as logsig, logsig, and purelin, respectively. The proportions of the training, validation, and testing data are 70%, 15%, and 15%, respectively. Unlike the above mathematical statistics method, the samples of all the different pairs are used together as input, with parameters including the compression modulus, heat transfer coefficient, and density of the respective polymers. The performance of the network in this case is presented in Fig. 12.
Two types of models are thus created based on the same dataset to predict the results under different working conditions. In this way, the utilization of the VSV experimental data is increased, more alternative models with high precision are established, and VSV users or test personnel can choose the most satisfactory model among them. In other words, both researchers concentrating on model studies and experimental testers benefit from this platform: the former have more opportunities to develop and popularize their models, and the data of the latter are utilized more fully.
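The structure of model B can be illustrated outside MATLAB with a plain forward pass. This sketch only reproduces the 7-3-5-1 layout, the logsig and purelin transfer functions, and a 70/15/15 split; the weights are random and untrained, the helper names are hypothetical, and the training algorithm used by the toolbox is not reproduced.

```python
import math
import random


def logsig(x):
    """Log-sigmoid transfer function, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def purelin(x):
    """Linear (identity) transfer function."""
    return x


LAYER_SIZES = [7, 3, 5, 1]             # input, hidden 1, hidden 2, output
TRANSFERS = [logsig, logsig, purelin]  # one per non-input layer


def init_network(rng):
    """Random weights and biases for each pair of consecutive layers."""
    return [
        {"W": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
         "b": [rng.uniform(-1, 1) for _ in range(n_out)]}
        for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:])
    ]


def forward(network, x):
    """Propagate one 7-element input vector to the scalar output."""
    for layer, transfer in zip(network, TRANSFERS):
        x = [transfer(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(layer["W"], layer["b"])]
    return x[0]


def split_indices(n, rng, train=0.70, val=0.15):
    """Shuffle sample indices and split them into train/validation/test."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_train, n_val = round(n * train), round(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


rng = random.Random(0)
network = init_network(rng)
print(forward(network, [0.5] * 7))  # untrained output for a normalized input
train_idx, val_idx, test_idx = split_indices(126, rng)
print(len(train_idx), len(val_idx), len(test_idx))
```

With these layer sizes, the network has 50 adjustable parameters (weights plus biases), which is worth keeping in mind relative to the 126 available samples.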

Data visualization
After keywords are entered in the search box, several related items are listed in order of relevance. Once the item of interest is selected, data visualization is performed in one of two modes: plots based on raw datasets or plots based on processed data. If the non-model option is chosen, the raw datasets are displayed in report form in the following sequence: title, tribological pairs, tribological behavior, data graph, data table, and material description, as shown in Fig. 13(a). It is worth noting that the data graph is plotted according to the sequence of working conditions, and the data table lists both the controlled working conditions and the respective tribological outputs. If the regularity of the raw data is of interest, the analysis function is recommended, which offers the models that have been developed for such data. For example, if a neural network model is chosen during the data visualization, the model is executed and returns both the predicted and raw data, which are also shown in report form. In general, such a model provides a prediction service for a given input, and a default value is used whenever a value is missing from the input block. As shown in Fig. 13(b), compared to the visualization of raw data, the report for processed datasets additionally contains the prediction function module, model description, and performance information.

Conclusions
To improve the efficiency of tribology research and optimize the process of tribology research, the term tribo-informatics is proposed in this paper. Tribo-informatics improves the research efficiency and process of tribology by establishing tribology standards, building tribology databases, and using information technology to collect, classify, store, retrieve, analyze, and disseminate tribology information.
Tribo-informatics research systems mainly include the connection of databases, the establishment of databases, the establishment of functional databases, and the retrieval and dissemination of information. To describe the research process of tribo-informatics in detail, the specific process of data generation, data storage, data analysis, and data visualization is described using the friction and wear research conducted on variable stator vane assemblies as an example.
The application of information technology can significantly reduce the information entropy of the system, whereas a decrease in information entropy can increase the order degree of research information, thus reducing the research time. Therefore, the establishment and popularization of a complete tribo-informatics system will certainly shorten the research cycle of tribology and promote the research results more widely.