Introduction

Catalase (EC 1.11.1.6), which is present in all aerobic organisms and catalyzes the decomposition of hydrogen peroxide into water and oxygen, acts as an oxidoreductase enzyme to eliminate reactive oxygen species (ROS) produced through intracellular metabolism of molecular oxygen (Kaushal et al. 2018; Yamamoto et al. 2019). Aside from resistance to oxidative stress (Pradhan et al. 2017), catalase is also involved in responses to other environmental stresses including low temperature (Zhang et al. 2021), drought (Ma et al. 2017), and salt (Gondim et al. 2012). Extracellularly, catalases are effective candidates for eliminating residual H2O2 during many production processes, and therefore are used in various industrial fields including bleaching, food preservation, bioremediation, and wastewater treatment (Jia et al. 2016; Kaushal et al. 2018). Additionally, various industries employ catalases with specific enzymatic properties, such as thermostable catalases in textile bleaching and papermaking (Ebara and Shigemori 2008; Paar et al. 2001).

Low temperature helps to reduce microbial contamination and raw material deterioration during food processing, and some psychrophilic enzymes are widely used in food production as cost-effective and environmentally friendly additives. In contrast to the large amount of research available on alkali-stable and thermostable catalases (Calandrelli et al. 2008; Fall et al. 2023; Fu et al. 2014; Paar et al. 2001; Shaeer et al. 2019; Thompson et al. 2003), few psychrophilic catalases have been reported, including a catalase from Antarctic Bacillus with an optimum temperature of 25 °C (Wang et al. 2008) and a catalase from Vibrio rumoiensis sp. nov. with an optimum temperature of 30 °C (Yumoto et al. 2000). Demand for psychrophilic catalases with high catalytic activity and good stability at low temperatures is extensive.

Traditionally, the genes encoding catalases in various organisms have been identified via polymerase chain reaction (PCR), a time-consuming and laborious process that involves the isolation, enrichment culturing, and identification of strains, extraction of chromosomal DNA, and PCR amplification of the target gene. More importantly, numerous microbes cannot be grown under laboratory culture conditions, and thus the tremendous genetic resources of these unculturable microbes cannot be accessed via the classical procedures of gene acquisition. The many valuable genes harbored in unculturable microbes (Datta et al. 2020) have remained largely unexplored (Wu et al. 2020). Fortunately, with the development of cultivation-independent methods such as metagenomics based on DNA sequencing technology, genes of interest can be efficiently mined from samples containing unculturable microbes (Berini et al. 2017; Ngara and Zhang 2018).

Soil is a diverse and complex ecosystem (Jansson and Hofmockel 2020) that functions as a huge pool of microbes (Banerjee and van der Heijden 2023), biochemical gene library (Daniel 2005), and protein bank (Galhardi et al. 2020). Large numbers of genes identified from soil samples containing diverse unculturable microorganisms have been explored (Xu et al. 2022), revealing the microbial diversity of soil samples. Psychrophiles are diverse and widely distributed on Earth (Piette et al. 2011), and antioxidative enzymes such as catalases and superoxide dismutases are abundant in psychrophiles due to the increased solubility of gases such as O2 at low temperatures, which allows for more ROS production during normal metabolism. Thus, soil samples from extremely cold and high-elevation areas serve as excellent pools for mining of genes encoding psychrophilic catalases.

Many natural proteins are unsuitable for industrial application due to low thermostability, narrow optimum temperature or pH ranges, or low catalytic activity, and such proteins can be redesigned via protein engineering to improve their properties. In recent decades, error-prone PCR coupled with direct screening or selection was commonly employed to engineer proteins. This process was not dependent on protein structure but required a large amount of effort for the screening of numerous mutants. With the growth of databases containing protein sequence, structure, and function data and rapidly emerging applications of deep learning algorithms, site-directed mutagenesis mediated by rational design is playing increasing roles in protein engineering (Yang et al. 2019) and has been used to improve biochemical characteristics such as thermostability (Yoshida et al. 2021) and alkaline stability (Suplatov et al. 2014) as well as to design new proteins (Callaway 2022). Overall, protein engineering guided by machine-learning models can greatly promote the progress of protein evolution.

In this study, genes encoding psychrophilic catalases were identified from high-elevation soil samples through the combination of metagenomics with the optimum temperature prediction software Preoptem, and then a psychrophilic catalase encoded by the gene designated soiCat1 was characterized. The optimum temperature range of soiCAT1 was extended through site-directed mutagenesis guided by position-specific amino acid probabilities (PSAP) calculation.

Materials and methods

Mining of psychrophilic catalases via the deep learning model Preoptem

Soil sample collection and soil metagenomic analysis of Tianshan No. 1 Glacier were performed previously (Xu et al. 2022), and the data were deposited with the National Center for Biotechnology Information (NCBI) as BioProject PRJNA658179. The genes for catalases were identified via retrieval of DNA sequences from the NCBI database. The optimum temperatures of catalases were predicted using the Preoptem model, which was previously developed in our lab (http://www.elabcaas.cn/pird/preoptem.html).

Expression and purification of catalases

The pET-28a(+) vector was stored in our laboratory. Escherichia coli Top10, used as the cloning host, and E. coli BL21, used as the expression host, were obtained from TransGen Biotech Co., Ltd. (Beijing, China). The catalase-encoding genes were synthesized by GenScript Biotech Co. (Nanjing, China) and ligated into pET-28a(+) to generate the plasmid pET-28a/cat. pET-28a/cat was transformed into E. coli BL21 to develop the catalase-expressing strain. Expression and purification of catalases were routinely performed according to the pET system manual, with induction at 16 °C and purification at 4 °C. Finally, the purified proteins were analyzed via 12% sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE).

Catalase activity assay

Catalase activity was determined as described previously (Martins and English 2014) with some modifications. Briefly, 225 µl of 50 mM NaH2PO4–Na2HPO4 buffer (pH 7.0) containing H2O2 at a final concentration of 30 mM was added as the substrate to a 1.5 ml tube and preheated at 20 °C for 3 min, followed by the addition of 25 µl of enzyme solution and incubation at 20 °C for 3 min. The reaction was terminated through addition of 250 µl sulfuric acid (2 M). The absorbance of the reaction solution was measured at 240 nm. Enzyme activity (U) was defined as the amount of enzyme required to decompose 1 µmol of hydrogen peroxide in 1 min.
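
The unit definition above can be turned into a short calculation. The following is a minimal sketch rather than the procedure used in this study: it assumes that residual H2O2 is quantified from the absorbance at 240 nm via the Beer-Lambert law, that a no-enzyme blank is measured in parallel, and that the molar extinction coefficient of H2O2 at 240 nm is 43.6 M^-1 cm^-1 (a commonly cited value that is not stated in the text). The function name and default arguments are hypothetical.

```python
# Assumed molar extinction coefficient of H2O2 at 240 nm (M^-1 cm^-1).
# This value is not given in the text; replace it with the one used in your lab.
EPSILON_240 = 43.6


def catalase_units(a240_blank, a240_sample, measured_volume_ul=500.0,
                   minutes=3.0, path_cm=1.0, epsilon=EPSILON_240):
    """Return enzyme units (µmol H2O2 decomposed per min) in one assay tube.

    a240_blank  -- absorbance of the stopped reaction without enzyme
    a240_sample -- absorbance of the stopped reaction with enzyme
    measured_volume_ul -- 250 µl reaction + 250 µl sulfuric acid by default
    """
    if a240_sample > a240_blank:
        raise ValueError("sample absorbance exceeds the blank")
    # Beer-Lambert: concentration difference in mol/L
    delta_molar = (a240_blank - a240_sample) / (epsilon * path_cm)
    # mol/L * µl * (1e-6 L/µl) * (1e6 µmol/mol) simplifies to delta_molar * µl
    micromol_decomposed = delta_molar * measured_volume_ul
    return micromol_decomposed / minutes
```

Dividing the returned value by the mass of enzyme added (in mg) gives the specific activity in U/mg.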

Characterization of recombinant catalases

The optimum temperature for catalase activity was measured across the range of 4–50 °C in 50 mM NaH2PO4–Na2HPO4 buffer (pH 7.0). Enzyme activity was also determined at pH 5.0 to 10.0 to identify the optimum pH for enzyme activity. The pH stabilities of catalases were investigated using the residual activities after incubation in buffers with various pH values for 1 h. Enzyme thermostability was assessed by measuring the residual activity after incubation at various temperatures and durations.

Kinetic parameters including Km, Vmax, and kcat were obtained by fitting the data to the Michaelis–Menten equation with the software GraphPad Prism (London, UK).
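
For readers without access to GraphPad Prism, the estimation of Km and Vmax can be illustrated in a few lines of Python. Note that this sketch swaps in a different technique: instead of the nonlinear regression used in this study, it fits a straight line to the Hanes-Woolf transform (S/v against S), which is simpler but weights errors differently. The function names are hypothetical, and the kcat helper assumes that the subunit mass (in kDa) is supplied by the user.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def hanes_woolf_fit(substrate, rates):
    """Estimate (vmax, km) from a least-squares line through (S, S/v).

    The Hanes-Woolf form is S/v = S/Vmax + Km/Vmax, so the slope is
    1/Vmax and the intercept is Km/Vmax.
    """
    if len(substrate) != len(rates) or len(substrate) < 2:
        raise ValueError("need at least two paired observations")
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


def kcat_per_second(vmax_umol_per_mg_min, subunit_kda):
    """Convert Vmax (µmol/mg/min) to kcat (s^-1) per subunit.

    One mg of a protein of mass M kDa contains 1/M µmol of subunits,
    so kcat in min^-1 equals Vmax * M.
    """
    return vmax_umol_per_mg_min * subunit_kda / 60.0
```

With noise-free rates generated by `michaelis_menten`, `hanes_woolf_fit` recovers the input parameters; with real data, a nonlinear fit remains the more reliable choice.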

Site-directed mutagenesis of wild-type catalase based on rational design

Proteins with amino acid sequences similar to soiCAT1 were collected from the UniParc database. The optimum temperatures of these proteins were first predicted using Preoptem, and the proteins were then classified into two groups, one with optimum temperatures above 50 °C and the other with optimum temperatures below 50 °C. The amino acid frequency at each site of soiCAT1 was analyzed using the Parepro program (http://www.elabcaas.cn/pird/premuse.html), and the amino acids at specific sites of soiCAT1 were mutated into amino acids found only in catalases with optimum temperatures above 50 °C.
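
The selection rule described above can be sketched as follows. This is a simplified stand-in for illustration, not the Parepro program itself: it assumes that the homologs have already been aligned to the query so that every sequence has the same length as the query, it ignores gap characters, and it ranks candidates only by their frequency in the high-temperature group. All function names and the toy sequences in the usage note are hypothetical.

```python
from collections import Counter


def position_frequencies(aligned_seqs):
    """Return, for each alignment column, a dict of residue -> frequency."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all sequences must have the same aligned length")
    freqs = []
    for i in range(length):
        column = [seq[i] for seq in aligned_seqs if seq[i] != "-"]
        counts = Counter(column)
        total = len(column)
        freqs.append({aa: c / total for aa, c in counts.items()} if total else {})
    return freqs


def high_group_only_mutations(query, high_seqs, low_seqs):
    """List substitutions seen in the high group but never in the low group.

    Each candidate is (label, frequency in high group), where the label
    uses the usual notation, e.g. 'S2K' for Ser at position 2 to Lys.
    Candidates are sorted by decreasing frequency in the high group.
    """
    high = position_frequencies(high_seqs)
    low = position_frequencies(low_seqs)
    candidates = []
    for i, wild_type in enumerate(query):
        for aa, freq in high[i].items():
            if aa != wild_type and aa not in low[i]:
                candidates.append((f"{wild_type}{i + 1}{aa}", freq))
    return sorted(candidates, key=lambda c: (-c[1], c[0]))
```

For a toy query `"MSAT"` with high-group sequences `["MKAT", "MKGT", "MSAT"]` and low-group sequences `["MSAT", "MSGT", "MTAT"]`, the only candidate is S2K, because Lys at position 2 appears in the high group and never in the low group, whereas Gly at position 3 appears in both.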

Results

Psychrophilic catalases mined via metagenomics and a deep learning model

Numerous microbes survive in the low-temperature environment of glaciers, many of which remain uncultured under laboratory conditions. The genes of interest harbored in unculturable microorganisms cannot be cloned via the classical cultivation-dependent process. Because they are independent of culturing, metagenomic sequencing and genome assembly are useful strategies for investigating the overall number of microbial species and genetic diversity in glaciers (Liu et al. 2022).

In total, 161 genes encoding catalases were predicted using metagenomic data from a glacier, and the optimum temperatures of these catalases were predicted using the established online deep learning model Preoptem (http://www.elabcaas.cn/pird/preoptem.html). The optimum temperature of an enzyme predicted by Preoptem is positively correlated with the optimum temperature determined experimentally (Zhang et al. 2022). The predicted optimum temperature was therefore corrected to match the experimentally determined optimum temperature using a linear equation (Zhang et al. 2022) (Fig. S1). The 16 genes encoding catalases with corrected optimum temperatures below 20 °C are listed in Table 1.
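
The correction and filtering step can be expressed compactly. The sketch below is illustrative only: the slope and intercept of the linear equation are reported in Fig. S1 and are not reproduced here, so they are left as required arguments, and the gene identifiers and coefficients in the usage note are hypothetical placeholders.

```python
def correct_optimum_temperature(predicted, slope, intercept):
    """Apply a linear correction to a predicted optimum temperature (°C)."""
    return slope * predicted + intercept


def select_cold_active(predictions, slope, intercept, cutoff=20.0):
    """Return (gene, corrected temperature) pairs below the cutoff.

    predictions -- dict mapping gene identifier to predicted optimum (°C)
    The result is sorted from the lowest corrected temperature upward.
    """
    corrected = {
        gene: correct_optimum_temperature(temp, slope, intercept)
        for gene, temp in predictions.items()
    }
    hits = [(gene, temp) for gene, temp in corrected.items() if temp < cutoff]
    return sorted(hits, key=lambda item: item[1])
```

For example, with the placeholder coefficients `slope=1.0` and `intercept=-12.0`, the predictions `{"gene_A": 30.0, "gene_B": 45.0, "gene_C": 25.0}` yield gene_C (13 °C) and gene_A (18 °C), while gene_B (33 °C) is excluded.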

Table 1 The predicted and corrected optimum temperatures of catalases by Preoptem

The integrity of these 16 genes was analyzed via comparison of the proteins encoded by them with a protein database, and most of the genes were found to be incomplete (Table 2). The catalase encoded by the gene PI-H_1, designated soiCat1, had the lowest optimum temperature, and its gene sequence is complete (Table 2 and S1). Sequence alignment indicated that soiCat1 shows 73% identity to a catalase-encoding gene from Mucilaginibacter rubeus strain P1, while soiCAT1 shows 96.8% identity and 99% similarity in amino acid sequence to a catalase from Pedobacter cryoconitis (GenBank accession number WP_183867858.1).

Table 2 Sequence integrity analysis of catalase-encoding genes by sequence alignment

To further elucidate the phylogenetic relationships between soiCAT1 and psychrophilic catalases of various species, phylogenetic analysis was conducted. From the BRENDA Enzyme Database, 7 catalases with optimum temperatures below 25 °C were obtained; meanwhile, 64 catalases were acquired by searching the protein database of NCBI using "psychrophilic catalase" as the query. Additionally, the top 54 proteins homologous to soiCAT1 were identified using the Basic Local Alignment Search Tool for Proteins (BLASTP) with the sequence of soiCAT1 as the query. Of these 125 proteins, 111 remained after running the clustering tool CD-HIT with a similarity threshold of 0.9, and a phylogenetic tree of low-temperature catalases was constructed (Fig. 1). The closest phylogenetic relationship of soiCAT1 was with the catalase of Pedobacter sp. Most members of the genus Pedobacter are psychrophilic strains that grow at low temperatures of 1–25 °C (He et al. 2020; Margesin et al. 2003), and therefore soiCAT1 is most likely a psychrophilic catalase.

Fig. 1 Phylogenetic tree of catalase sequences constructed using the Neighbor-Joining method in MEGA11

Characterization of soiCAT1

To confirm that soiCAT1 is a psychrophilic catalase with experimental data, soiCAT1 was expressed in E. coli and purified using Ni–NTA resin. The purity of proteins was checked using SDS-PAGE, and a single band corresponding to an expected molecular weight of approximately 68 kDa was observed (Fig. S2). Purified recombinant soiCAT1 showed optimal activity at the lowest test temperature (4 °C), which markedly decreased with increasing temperature from 4 to 50 °C (Fig. 2A). Notably, the activity of soiCAT1 at 20 °C was 60% of that at 4 °C. The optimal pH for soiCAT1 activity was 9 (Fig. 2B). soiCAT1 exhibited excellent stability at low temperatures of 10 °C and 20 °C (Fig. 2C), but rapidly lost stability at temperatures above 30 °C (data not shown). Additionally, soiCAT1 exhibited better stability at pH 9 than other pH levels tested (Fig. 2D). These results suggest that soiCAT1 is indeed a psychrophilic catalase.

Fig. 2 Characterization of soiCAT1. A optimum temperature; B optimum pH; C thermal stability; D pH stability

Engineering soiCAT1 through rational design

The narrow optimum temperature range of soiCAT1 means that a slight temperature shift would significantly decrease its activity, which greatly limits its use in industrial applications. Ideally, soiCAT1 should have high and steady enzymatic activity across a wide temperature range. To broaden the optimum temperature range of soiCAT1 and thus reduce its dependence on temperature, soiCAT1 was rationally designed based on the coevolution of residues within protein sequences. First, over 14,000 proteins similar to soiCAT1 were collected from the UniParc database. Preoptem was used to predict the optimum temperatures of the proteins, which were then divided into two groups: H and L. The H group contained 5565 proteins with optimum temperatures above 50 °C (Table S2), while the L group comprised 8865 proteins with optimum temperatures below 50 °C (Table S3). Position-specific amino acid probabilities (PSAPs) of the proteins in the two groups corresponding to each amino acid of soiCAT1 were analyzed using the Parepro program (Meng et al. 2021; Tian et al. 2007) (Figs. S3 and S4), and candidate mutations of soiCAT1 were screened to identify amino acids that occur only in the H group. Fifty-nine amino acid sites were found (Table S4), and the top 14 mutations were selected as candidates for experimental assessment (Table 3).

Table 3 The amino acids of soiCAT1 to be mutated to extend optimum temperature range

Properties of the mutant soiCAT1 S205K

Site-directed mutagenesis was employed to develop 14 mutants of soiCAT1, and the optimum temperatures of these mutants were determined after expression and purification. Among the 14 mutants, only soiCAT1S205K exhibited an extended optimum temperature range compared to soiCAT1 and the other 13 mutants, and it was therefore selected for detailed characterization. Like soiCAT1, soiCAT1S205K had its highest activity at 4 °C. However, in marked contrast to soiCAT1, the activity of soiCAT1S205K remained high and steady from 4 to 20 °C and decreased with increasing reaction temperature only above this range (Fig. 3A). soiCAT1S205K activity showed a pH dependence similar to that of soiCAT1, with an optimum pH of 9 (Fig. 3B). In terms of the effects of temperature and pH on catalase stability, soiCAT1S205K exhibited similar characteristics to soiCAT1 (Fig. 3C and D).

Fig. 3 Characterization of soiCAT1S205K. A optimum temperature; B optimum pH; C thermal stability; D pH stability

To examine the catalytic activity changes of soiCAT1 and soiCAT1S205K at different temperatures, the removal efficiency of H2O2 was assessed at 4 °C and 20 °C. At the same protein concentration (20 µg/L), nearly identical amounts of H2O2 were decomposed at 4 °C by soiCAT1 and soiCAT1S205K. At 20 °C, the amount of H2O2 removed by soiCAT1S205K was similar to that removed by soiCAT1 or soiCAT1S205K at 4 °C and markedly higher than the amount removed by soiCAT1 at 20 °C (Fig. 4). These results indicate that the temperature range over which soiCAT1S205K maintains high catalytic activity was extended by site-directed mutagenesis via rational design. Kinetic parameters of the wild type and mutant at 20 °C were also analyzed by nonlinear regression in GraphPad Prism 6 (Fig. 5). The Km values of soiCAT1 and soiCAT1S205K were 46.31 mM and 32.07 mM, respectively, while the Vmax values were 1.39 × 10⁶ µmol/mg/min and 1.54 × 10⁶ µmol/mg/min, respectively. Overall, soiCAT1S205K exhibited a wider optimum temperature range than soiCAT1, without great alteration of other important properties.
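
The reported kinetic parameters can be compared directly with a short calculation. The sketch below uses only the Km and Vmax values given above for 20 °C and assumes that both enzymes follow the Michaelis-Menten equation over the substrate range of interest; the Vmax/Km ratio is used here as a simple proxy for catalytic efficiency at equal enzyme mass, which is an interpretation added for illustration and not a result stated in this study. The function names are hypothetical.

```python
# Kinetic parameters reported in the text for 20 °C
# (Km in mM, Vmax in µmol/mg/min).
KINETICS = {
    "soiCAT1": {"km_mm": 46.31, "vmax": 1.39e6},
    "soiCAT1S205K": {"km_mm": 32.07, "vmax": 1.54e6},
}


def rate(name, substrate_mm):
    """Michaelis-Menten rate (µmol/mg/min) at a given H2O2 concentration."""
    p = KINETICS[name]
    return p["vmax"] * substrate_mm / (p["km_mm"] + substrate_mm)


def vmax_over_km(name):
    """Vmax/Km, a proxy for efficiency at low substrate concentration."""
    p = KINETICS[name]
    return p["vmax"] / p["km_mm"]


def fold_change_efficiency():
    """Ratio of Vmax/Km for the mutant relative to the wild type."""
    return vmax_over_km("soiCAT1S205K") / vmax_over_km("soiCAT1")


def fold_change_rate(substrate_mm):
    """Ratio of predicted rates (mutant / wild type) at one concentration."""
    return rate("soiCAT1S205K", substrate_mm) / rate("soiCAT1", substrate_mm)
```

Under these assumptions, `fold_change_efficiency()` is roughly 1.6, and `fold_change_rate(30.0)` (the 30 mM H2O2 used in the activity assay) is roughly 1.4, consistent with the higher H2O2 removal by soiCAT1S205K at 20 °C.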

Fig. 4 Amounts of H2O2 removed by soiCAT1 and soiCAT1S205K at different temperatures

Fig. 5 The kinetic curves of soiCAT1 and soiCAT1S205K for H2O2 at 20 °C, fitted using nonlinear regression in GraphPad Prism 6.0

Discussion

Psychrophilic catalases are considered effective candidates for food processing due to their high catalytic activity and stability at low temperature, which can strongly reduce the deterioration of raw materials as well as the risk of microbial contamination. Psychrophiles are the main providers of psychrophilic catalases. However, psychrophiles generally live under extremely cold conditions including glaciers, the deep sea and ice lakes, and most such microbes remain unculturable under artificial conditions, complicating the isolation of genes encoding psychrophilic catalases from these microbes. Fortunately, metagenomics methods are independent of culturing, providing a useful strategy for mining of microbial and genetic resources (Acinas et al. 2021; Daniel 2005; Xu et al. 2022), and a large number of genes have been mined via metagenomics. For these reasons, we aimed to identify psychrophilic catalase-encoding genes in a glacier sample from Tianshan, China, via metagenomics. The results suggest that this glacier harbors abundant catalase-encoding genes, and demonstrate that metagenomics is a useful high-throughput technique for identifying novel genes of interest from complex samples.

Efficient screening of desirable genes from enormous metagenomic datasets is another serious challenge. Optimum temperature is one of the key parameters characterizing an enzyme, and therefore accurate prediction of the optimum temperature of enzymes is useful for preliminary screening of desirable enzymes. We used the deep learning model Preoptem for high-throughput screening of potential psychrophilic catalases with optimum temperatures below 30 °C from among 161 catalases. soiCAT1 was predicted as a psychrophilic catalase via sequence alignment and this property was confirmed through enzymatic property analysis, validating the use of the deep learning model Preoptem as an effective tool to screen desired proteins from a large metagenomic dataset.

The catalytic activity of soiCAT1 peaked at 4 °C and then decreased with increasing temperature, although the enzyme remained relatively stable at temperatures up to 20 °C. Notably, soiCAT1 had a narrow optimum temperature range, which may limit its application. To broaden this range, specific amino acids were mutated, guided by PSAP calculation, to match the sequences of catalases with optimum temperatures above 50 °C. The mutant soiCAT1S205K exhibited the expected property, with steady catalytic activity from 4 to 20 °C, indicating that site-directed mutagenesis via rational design is an efficient method for protein engineering.

In summary, we identified numerous catalase-encoding genes through analysis of soil metagenomic data and efficiently isolated putative psychrophilic catalases using the deep learning model Preoptem. One psychrophilic catalase was then engineered to improve specific properties through rational design guided by PSAP calculation. The strategies used in this study are suitable for mining and rational design of other enzymes from large metagenomic datasets.