KINtaro: protein kinase-like database

Baranowski, Bartosz; Krysińska, Marianna; Gradowski, Marcin

doi:10.1186/s13104-024-06713-y


Research Note
Open access
Published: 16 February 2024

Volume 17, article number 50 (2024)

BMC Research Notes


Bartosz Baranowski¹,
Marianna Krysińska² &
Marcin Gradowski²


Abstract

Objective

The superfamily of protein kinases features a common Protein Kinase-like (PKL) three-dimensional fold. Proteins with PKL structure can also possess enzymatic activities other than protein phosphorylation, such as AMPylation or glutamylation. PKL proteins play a vital role in the world of living organisms, contributing to the survival of pathogenic bacteria inside host cells, as well as being involved in carcinogenesis and neurological diseases in humans. The superfamily of PKL proteins is constantly growing. Therefore, it is crucial to gather new information about PKL families.

Results

To this end, the KINtaro database (http://bioinfo.sggw.edu.pl/kintaro/) has been created as a resource for collecting and sharing such information. KINtaro combines protein sequence information and additional annotations for more than 70 PKL families, including 32 families not associated with PKL superfamily in established protein domain databases. KINtaro is searchable by keywords and by protein sequence and provides family descriptions, sequences, sequence alignments, HMM models, 3D structure models, experimental structures with PKL domain annotations and sequence logos with catalytic residue annotations.


Introduction

Kinases are among the most crucial enzymes found in all living organisms. They facilitate phosphorylation reactions, transferring phosphate groups from high-energy compounds like ATP to specific target molecules. Within the PKL superfamily, best known are protein kinases responsible for phosphorylating proteins [1]. Additionally, in the PKL superfamily there are small molecule kinases whose substrates include antibiotics and sugars [2], as well as lipid kinases that target membrane lipids like phospholipids and sphingolipids [3,4,5].

PKL proteins play critical roles in various biological processes, including cell growth, differentiation, and apoptosis. Dysregulation of these proteins can contribute to numerous diseases and to tumorigenesis [6]. Moreover, PKL proteins can act as promoters of antibiotic resistance [2], aid pathogen survival within host cells [5, 7], and serve as effectors influencing cellular processes in affected cells [8]. Consequently, blocking their activity with inhibitors can be crucial for preventing diseases and infections and for treating cancer [9], providing alternative treatments.

Pseudokinases were initially considered to be non-functional relatives of protein kinases that lost their enzymatic activity due to mutations [10, 11]. However, recent studies have revealed that pseudokinases can exhibit alternative enzymatic activities. For example, the coronavirus NiRAN pseudokinase domain transfers nascent RNA to GDP via an RNA–protein intermediate, ultimately forming the core RNA cap structure, GpppA-RNA [12]. The SelO pseudokinase performs AMPylation of proteins involved in redox homeostasis [13]. The bacterial pseudokinase effector SidJ polyglutamylates SidE effectors, blocking their activity (phosphoribosyl ubiquitination of host Rab GTPases to evade phagocytosis) [14] and thus modulating their effect on the host cell. Pseudokinases can also serve as allosteric regulators of protein kinases, influencing their activity [15] or stasis for other proteins (for example as part of the secretion system of bacteria) [16].

Several databases related to protein kinases exist, e.g., the best-known database of human kinases, which follows Manning's classification [17], and KinG, a database of protein kinases in genomes [18] that is based on Pfam [19] domains. Pfam domain boundaries are not always well defined: for example, the PIP49_C family does not cover the entire PKL fold [20], and the Pan3_PK pseudokinase family lacks the kinase N-lobe [21]. Moreover, the Pfam clan (superfamily) Pkinase does not include all known PKL families, e.g., the SelO pseudokinase family, involved in redox homeostasis [13], or the FAM198 family, recently identified as potentially cancer-associated [22]. Other examples are Pox_E2-like, a pseudokinase found in Poxviridae [23], and the CLU pseudokinase [24] present in eukaryotes. In addition, many PKL families are not recognized as domains in Pfam at all, for example, the pseudokinase SidJ [14] or the viral pseudokinase NiRAN [12].

The InterPro database, which absorbed Pfam, is still missing many known PKL families [25].

Other databases dedicated to protein kinases are specialized, e.g., KLIFS, a structure-based database for navigating the space of kinase-ligand interactions [26]; KinaseMD, which collects up-to-date information on kinase mutations, annotations of drug response (especially drug resistance) and functional sites [27]; and BYKdb, the Bacterial tYrosine-Kinase database [28]. There is no specialized database collecting information on all proteins that share the common PKL structure.

Earlier, we studied the pan-proteome of the Legionella genus bioinformatically. Some of the Legionella PKL families seem to be unique to this bacterium [29].

Combining information from our own research, databases and the literature, our database contains 72 updated and carefully prepared PKL families from all domains of life (Additional file 1: Table S1), with basic information about each family. The available 3D structure models and domain structures can help in search strategies for further PKL homologs [30].

We believe that our semi-automatic approach of constructing the PKL domain family sequence models based on the protein structure model is better than automatic approaches used in other protein domain databases.

The main value of the database lies in its searchable presentation of 32 novel annotated families, previously unrecognized as PKL, along with the assignment of active sites to each family.

Methods and materials

KINtaro protein family model

To define protein kinase families, we adopted an approach similar to that of the protein database Pfam [19], now part of InterPro [25]. However, as mentioned above, Pfam's “PKinase (CL0016)” clan has not been adequately updated, and its family models are not always accurate [19]. In our pipeline, the definition of a new family started with a representative sequence. These sequences were obtained from existing PKL families in Pfam and, for families missing in Pfam, from known 3D structures possessing the PKL fold, from novel PKL families described in the literature or from our own sequence/structure searches. Each representative sequence served as a query for 3D structure modeling. A model was created from the representative sequence (Fig. 1, arrow A) using ColabFold (AlphaFold2 using MMseqs2) or ESMfold; the final model was chosen based on the pLDDT score [31, 32].
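The pLDDT-based choice between candidate models can be illustrated with a minimal sketch. The text only says that the final model was chosen "based on the pLDDT score"; comparing the mean pLDDT over C-alpha atoms is an assumption made here for illustration, as is the premise that both tools wrote pLDDT into the B-factor column of the PDB file on the same 0 to 100 scale. The function names `mean_plddt` and `pick_model` are hypothetical.

```python
from statistics import mean
from typing import Mapping


def mean_plddt(pdb_text: str) -> float:
    """Mean pLDDT over C-alpha atoms, read from the B-factor field.

    Assumption: the prediction tool stored per-residue pLDDT in the
    B-factor column (PDB columns 61-66), as AlphaFold2-style output does.
    """
    scores = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB format: atom name in columns 13-16 (1-based).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no C-alpha atoms found")
    return mean(scores)


def pick_model(models: Mapping[str, str]) -> str:
    """Return the label of the model with the highest mean pLDDT."""
    return max(models, key=lambda label: mean_plddt(models[label]))
```

Called as `pick_model({"colabfold": colabfold_pdb_text, "esmfold": esmfold_pdb_text})`, this returns the label of the higher-confidence model. If one tool reports pLDDT on a 0 to 1 scale, the values would need rescaling before comparison.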

To find all members of a PKL family, the representative sequence also served as a query for phmmer [33] against the NR database [34] with an E-value threshold of 0.0001 (Fig. 1, arrow B). Next, we filtered out homologous sequences shorter than 100 amino acids and clustered the remaining sequences at 90% sequence identity [35] (Fig. 1, arrow C). The clustered sequences were then aligned using the ClustalO program [36] to build the family's hidden Markov model (HMM) [33] (Fig. 1, arrows D and E). The alignment was then collapsed onto the representative sequence, i.e., alignment columns in which the representative sequence has a gap were removed (Fig. 1, arrow F). A sequence logo was generated from the collapsed alignment using Weblogo [37] (Fig. 1, arrow G). In the final optional step, an iterative approach was used to enhance the family model by adjusting the domain boundaries, where we evaluated the collapsed logo and structure model (Fig. 1, red arrows). For convenience, in the database in the "Family" tab (Fig. 2), the "origin" of the family is recorded, which includes the parameters used and information about any customized steps used in family model construction.
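Two of the steps above, the length filter and the collapsing of the alignment onto the representative sequence, can be sketched in a few lines. This is an illustration rather than the authors' script: the function names are hypothetical, sequences are assumed to be held in plain dictionaries keyed by sequence name, and "collapsing" is read here as dropping every alignment column in which the representative sequence has a gap.

```python
def filter_by_length(seqs, min_len=100):
    """Keep only sequences with at least `min_len` residues."""
    return {name: seq for name, seq in seqs.items() if len(seq) >= min_len}


def collapse_alignment(aligned, ref_name, gap_chars="-."):
    """Drop alignment columns where the representative sequence has a gap.

    `aligned` maps sequence names to aligned rows of equal length.
    After collapsing, every row is as long as the ungapped representative.
    """
    ref = aligned[ref_name]
    if any(len(row) != len(ref) for row in aligned.values()):
        raise ValueError("all aligned rows must have the same length")
    # Indices of columns occupied by a residue in the representative.
    keep = [i for i, ch in enumerate(ref) if ch not in gap_chars]
    return {name: "".join(row[i] for i in keep) for name, row in aligned.items()}
```

Because the collapsed rows share the coordinates of the representative sequence, positions in a logo built from them map directly onto residues of the representative sequence and its structure model.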

Two large and highly similar Pfam families, PF00069 (Pkinase) and PF07714 (PK_Tyr_Ser-Thr), were combined into one family of classical kinases, PKLF000033 (Pkinase). Here, instead of phmmer, we used HMMsearch (with an E-value threshold of 0.0001) with an HMM [33] built from the Pfam seed alignments of PF00069 and PF07714 [19]. The homologs gathered with this HMM were then clustered at a 30% sequence identity level.

Each family is assigned a unique identifier (Additional file 1: Table S1; Fig. 2) consisting of the prefix “PKLF” (“Protein Kinase-Like”, PKL, plus F) followed by the family's ordinal number, e.g., PKLF000033. Additionally, each family has its own distinctive name.
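As a small illustration of this identifier scheme, the sketch below formats and parses family identifiers. The six-digit zero padding is an assumption inferred from the single example in the text (PKLF000033); the helper names are hypothetical and not part of KINtaro.

```python
import re

# Assumed pattern: "PKLF" followed by a six-digit ordinal number.
_FAMILY_ID = re.compile(r"^PKLF(\d{6})$")


def format_family_id(ordinal: int) -> str:
    """Build an identifier such as PKLF000033 from an ordinal number."""
    if not 1 <= ordinal <= 999_999:
        raise ValueError("ordinal must be between 1 and 999999")
    return f"PKLF{ordinal:06d}"


def parse_family_id(family_id: str) -> int:
    """Return the ordinal number encoded in a family identifier."""
    match = _FAMILY_ID.match(family_id)
    if match is None:
        raise ValueError(f"not a PKLF identifier: {family_id!r}")
    return int(match.group(1))
```

For example, `format_family_id(33)` gives `"PKLF000033"`, and `parse_family_id("PKLF000033")` gives `33`.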

Results

Database implementation

All PKL families and their relevant information were deposited into a local PostgreSQL database. The KINtaro database website (http://bioinfo.sggw.edu.pl/kintaro/) was developed with the Django framework on a Linux machine. All KINtaro data are accessible to all users without registration or login. Users can register to keep a history of their sequence searches.

What KINtaro provides

KINtaro offers concise descriptions in family cards (Fig. 2) along with sequence logos collapsed to representative sequences [36], annotated where possible with the catalytic residues corresponding to canonical kinase catalytic residues. The active site assignments (as originally described by Hanks) are based on literature [1], family sequence logos, 3D structure models, known structures and homology. Family structure models, generated using either AlphaFold2 [31] or ESMfold [32], are provided. Additionally, curated representative protein structures from PDB and individual PKL domain structures are provided [38]. For every family, the database also includes an HMM sequence model, sets of full and clustered sequences of family members accompanied by their alignments, full sequences containing the PKL domain and links to external databases. Family HMMs can be used, for example, to enrich genomic annotations. The provided sets of PKL sequences can be used, for example, to find new families (e.g., by cluster analysis through quasi-distances between sequences [39]). Structures and models, as mentioned earlier, can be used to search for distant kinase homologs [30]. Such a well-curated dataset can support research into novel (pseudo)enzymatic PKL families.

PKL family search in KINtaro

KINtaro enables users to conduct PKL domain searches with their own sequences using HMMscan (HMMER [33]). KINtaro is also searchable by keywords.
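For readers who want to run a comparable search locally with downloaded family HMMs, the sketch below shows one way to post-process the result. It parses the whitespace-delimited table that HMMER writes with its `--tblout` option and keeps, for each query sequence, the best-scoring family. Several things here are assumptions rather than details from the article: the E-value cutoff reuses the 0.0001 threshold from the Methods (the website's own cutoff is not stated), the HMM names are assumed to be KINtaro family identifiers, and the function name is hypothetical.

```python
def best_family_hits(tblout_text, max_evalue=1e-4):
    """Pick the best-scoring family per query from hmmscan --tblout output.

    Columns used (0-based, whitespace-delimited): 0 = target (profile)
    name, 2 = query (sequence) name, 4 = full-sequence E-value,
    5 = full-sequence bit score. Lines starting with '#' are comments.
    Returns {query: (family, evalue, score)}.
    """
    best = {}
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        family, query = fields[0], fields[2]
        evalue, score = float(fields[4]), float(fields[5])
        if evalue > max_evalue:
            continue  # hit is not significant at the chosen cutoff
        if query not in best or score > best[query][2]:
            best[query] = (family, evalue, score)
    return best
```

Queries with no hit under the cutoff are simply absent from the returned dictionary, which makes it easy to list sequences without a recognizable PKL domain.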

Conclusions

The family of proteins with the PKL fold is continuously expanding. In 2020, we counted over 50 families [40], and in 2022, nearly 70 [29]. Primarily composed of kinases, this group also includes proteins with diverse enzymatic functions and proteins with non-enzymatic roles [15, 16]. To summarize, our database represents a meticulously curated compilation of PKL proteins, serving as a comprehensive and up-to-date resource for information on this rapidly expanding protein superfamily.

Limitations

For some novel families, the PKL assignment is not experimentally confirmed but only predicted by sequence and structure similarities.

Data availability

The database and all information can be found at http://bioinfo.sggw.edu.pl/kintaro/. In addition, the authors are happy to share any data required by reviewers and other researchers.

Abbreviations

PDB: Protein Data Bank
PKL: Protein Kinase-like
pLDDT: Per-residue confidence metric of structure models

References

1. Hanks SK, Hunter T. The eukaryotic protein kinase superfamily: kinase (catalytic) domain structure and classification. FASEB J. 1995;9(8):576–96.
2. Kannan N, Taylor SS, Zhai Y, Venter JC, Manning G. Structural and functional diversity of the microbial kinome. PLOS Biol. 2007;5(3):e17.
3. Itoh T, Ishihara H, Shibasaki Y, Oka Y, Takenawa T. Autophosphorylation of type I phosphatidylinositol phosphate kinase regulates its lipid kinase activity. J Biol Chem. 2000;275(25):19389–94.
4. Heath CM, et al. Lipid kinases play crucial and multiple roles in membrane trafficking and signaling. Histol Histopathol. 2003;18:989–98.
5. Li G, Liu H, Luo ZQ, Qiu J. Modulation of phagosome phosphoinositide dynamics by a Legionella phosphoinositide 3-kinase. EMBO Rep. 2021;22(3):e51163.
6. Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S. The protein kinase complement of the human genome. Science. 2002;298(5600):1912–34.
7. Dong N, Niu M, Hu L, Yao Q, Zhou R, Shao F. Modulation of membrane phosphoinositide dynamics by the phosphatidylinositide 4-kinase activity of the Legionella LepB effector. Nat Microbiol. 2016;2:16236.
8. St. Louis BM, Quagliato SM, Lee PC. Bacterial effector kinases and strategies to identify their target host substrates. Front Microbiol. 2023. https://doi.org/10.3389/fmicb.2023.1113021.
9. Castelo-Soccio L, Kim H, Gadina M, Schwartzberg PL, Laurence A, O’Shea JJ. Protein kinases: drug targets for immunological disorders. Nat Rev Immunol. 2023;15:1–20.
10. Murphy JM, Mace PD, Eyers PA. Live and let die: insights into pseudoenzyme mechanisms from structure. Curr Opin Struct Biol. 2017;47:95–104.
11. Murphy JM, Farhan H, Eyers PA. Bio-Zombie: the rise of pseudoenzymes in biology. Biochem Soc Trans. 2017;45(2):537–44.
12. Park GJ, Osinski A, Hernandez G, Eitson JL, Majumdar A, Tonelli M, et al. The mechanism of RNA capping by SARS-CoV-2. Nature. 2022;609(7928):793–800.
13. Sreelatha A, Yee SS, Lopez VA, Park BC, Kinch L, Pilch S, et al. Protein AMPylation by an evolutionarily conserved pseudokinase. Cell. 2018;175(3):809-821.e19.
14. Black MH, Osinski A, Gradowski M, Servage KA, Pawłowski K, Tomchick DR, et al. Bacterial pseudokinase catalyzes protein polyglutamylation to inhibit the SidE-family ubiquitin ligases. Science. 2019;364(6442):787–92.
15. Zhang H, Zhu Q, Cui J, Wang Y, Chen MJ, Guo X, et al. Structure and evolution of the Fam20 kinases. Nat Commun. 2018;23(9):1218.
16. Tassinari M, Doan T, Bellinzoni M, Chabalier M, Ben-Assaya M, Martinez M, et al. The antibacterial type VII secretion system of Bacillus subtilis: structure and interactions of the Pseudokinase YukC/EssB. MBio. 2022;13(5):e0013422.
17. Kinase.com. 2023. http://kinase.com/web/current/. Accessed 10 May 2023.
18. Krupa A, Abhinandan KR, Srinivasan N. KinG: a database of protein kinases in genomes. Nucleic Acids Res. 2004;32(1):D153–5.
19. Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer ELL, et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 2021;49(D1):D412–9.
20. Tennant-Eyles AJ, Moffitt H, Whitehouse CA, Roberts RG. Characterisation of the FAM69 family of cysteine-rich endoplasmic reticulum proteins. Biochem Biophys Res Commun. 2011;406(3):471–7.
21. Christie M, Boland A, Huntzinger E, Weichenrieder O, Izaurralde E. Structure of the PAN3 pseudokinase reveals the basis for interactions with the PAN2 deadenylase and the GW182 proteins. Mol Cell. 2013;51(3):360–73.
22. Zheng X, Chen J, Nan T, Zheng L, Lan J, Jin X, et al. FAM198B promotes colorectal cancer progression by regulating the polarization of tumor-associated macrophages via the SMAD2 signaling pathway. Bioengineered. 2023;13(5):12435–45.
23. Gao WND, Gao C, Deane JE, Carpentier DCJ, Smith GL, Graham SC. The crystal structure of vaccinia virus protein E2 and perspectives on the prediction of novel viral protein folds. J Gen Virol. 2022;103(1):001716.
24. Schaeffer RD, Zhang J, Kinch LN, Pei J, Cong Q, Grishin NV. Classification of domains in predicted structures of the human proteome. Proc Natl Acad Sci. 2023;120(12):e2214069120.
25. Paysan-Lafosse T, Blum M, Chuguransky S, Grego T, Pinto BL, Salazar GA, et al. InterPro in 2022. Nucleic Acids Res. 2023;51(D1):D418–27.
26. Kanev GK, de Graaf C, Westerman BA, de Esch IJP, Kooistra AJ. KLIFS: an overhaul after the first 5 years of supporting kinase research. Nucleic Acids Res. 2021;49(D1):D562–9.
27. Hu R, Xu H, Jia P, Zhao Z. KinaseMD: kinase mutations and drug response database. Nucleic Acids Res. 2021;49(D1):D552–61.
28. Jadeau F, Grangeasse C, Shi L, Mijakovic I, Deléage G, Combet C. BYKdb: the Bacterial protein tYrosine Kinase database. Nucleic Acids Res. 2012;40(D1):D321–4.
29. Krysińska M, Baranowski B, Deszcz B, Pawłowski K, Gradowski M. Pan-kinome of Legionella expanded by a bioinformatics survey. Sci Rep. 2022;12(1):21782.
30. Black MH, Gradowski M, Pawłowski K, Tagliabracci VS. Methods for discovering catalytic activities for pseudokinases. Methods Enzymol. 2022;667:575–610.
31. Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: making protein folding accessible to all. Nat Methods. 2022;19(6):679–82.
32. Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv. 2022. https://doi.org/10.1101/2022.07.20.500902v1.
33. Finn RD, Clements J, Eddy SR. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011;39(2):W29-37.
34. Sayers EW, Bolton EE, Brister JR, Canese K, Chan J, Comeau DC, et al. Database resources of the national center for biotechnology information. Nucleic Acids Res. 2021;50(D1):D20–6.
35. Li W, Jaroszewski L, Godzik A. Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics. 2002;18(1):77–82.
36. Sievers F, Higgins DG. Clustal Omega for making accurate alignments of many protein sequences. Protein Sci. 2018;27(1):135–45.
37. Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14(6):1188–90.
38. Burley SK, Bhikadiya C, Bi C, Bittrich S, Chao H, Chen L, et al. RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning. Nucleic Acids Res. 2023;51(D1):D488-508.
39. Frickey T, Lupas A. CLANS: a Java application for visualizing protein families based on pairwise similarity. Bioinformatics. 2004;20(18):3702–4.
40. Gradowski M, Baranowski B, Pawłowski K. The expanding world of protein kinase-like families in bacteria: forty families and counting. Biochem Soc Trans. 2020;48(4):1337–52.


Acknowledgements

The authors thank Dr Krzysztof Pawłowski for consultations and critical reading of the manuscript.

Funding

Marcin Gradowski was supported by the Polish National Science Centre Grant 2019/35/N/NZ2/02844.

Author information

Authors and Affiliations

Laboratory of Plant Pathogenesis, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
Bartosz Baranowski
Department of Biochemistry and Microbiology, Warsaw University of Life Sciences (SGGW), Warsaw, Poland
Marianna Krysińska & Marcin Gradowski

Contributions

BB designed and created the KINtaro web database. MK checked the correctness of the data and designed the database. MG prepared scripts and data and wrote the manuscript. All the authors approved the final draft.

Corresponding author

Correspondence to Marcin Gradowski.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing Interests

The authors declare there are no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Table S1. KINtaro families. Columns: ID_family—KINtaro family id, name_family—KINtaro family name, pfam_id—pfam family id, interpro_id—interpro family id.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Baranowski, B., Krysińska, M. & Gradowski, M. KINtaro: protein kinase-like database. BMC Res Notes 17, 50 (2024). https://doi.org/10.1186/s13104-024-06713-y


Received: 21 October 2023
Accepted: 01 February 2024
Published: 16 February 2024
DOI: https://doi.org/10.1186/s13104-024-06713-y
