Abstract
Posttranslational modifications (PTMs) of proteins play a significant role in human cellular functions ranging from localization to signal transduction. Hundreds of PTMs act in a human cell, but only a select few are well established and documented. PubMed includes thousands of papers on these PTMs, and it is a challenge for biomedical researchers to assimilate the useful information manually. Alternatively, text mining approaches and machine learning algorithms can automatically extract the relevant information from PubMed. Protein phosphorylation is a well-established PTM that remains under active study, and many systems already exist for extracting protein phosphorylation information. A recent hybrid approach combines text mining and machine learning to extract protein phosphorylation information from PubMed. Other common PTMs, namely glycosylation, acetylation, methylation, hydroxylation, and ubiquitination, share similar features in terms of the entities involved in the PTM process, that is, the substrate, the enzymes, and the amino acid residues. This has motivated us to repurpose and extend the text mining protocol and machine learning information extraction methodology developed for protein phosphorylation to these PTMs. In this chapter, the chemistry behind each of these PTMs is briefly outlined, and the adaptation of the text mining protocol and machine learning algorithm to each is explained.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Arumugam, K., Sellappan, M., Anand, D., Anand, S., Radhakrishnan, S.V. (2022). A Text Mining and Machine Learning Protocol for Extracting Posttranslational Modifications of Proteins from PubMed: A Special Focus on Glycosylation, Acetylation, Methylation, Hydroxylation, and Ubiquitination. In: Raja, K. (eds) Biomedical Text Mining. Methods in Molecular Biology, vol 2496. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2305-3_10
DOI: https://doi.org/10.1007/978-1-0716-2305-3_10
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2304-6
Online ISBN: 978-1-0716-2305-3
eBook Packages: Springer Protocols