Handling gene and protein names in the age of bioinformatics: the special challenge of secreted multimodular bacterial enzymes such as the cbhA/cbh9A gene of Clostridium thermocellum

  • Wolfgang H. Schwarz
  • Roman Brunecky
  • Jannis Broeker
  • Wolfgang Liebl
  • Vladimir V. Zverlov


An increasing number of researchers in biology, biochemistry, biotechnology, bioengineering, bioinformatics and related fields work with biological molecules. Because the scientific backgrounds of these communities are more diverse than ever before, a growing number of scientists are unfamiliar with the rules for the unambiguous designation of genetic elements. Yet as biological molecules gain importance in biotechnology, their functional and unambiguous designation becomes vital. Unfortunately, naming genes and proteins is not an easy task. Moreover, traditional bioinformatic concepts are challenged by proteins composed of several modules, each with its own function. This article highlights basic rules and novel naming solutions recently adopted within the community of bacterial geneticists, and discusses the present-day handling of gene and protein designations. As an example, we use a recent case of misapplied gene nomenclature. We make suggestions for better handling of names in future literature as well as in databases and annotation projects. Our emphasis is on multimodular bacterial genes encoding extracellular hydrolytic enzymes.
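The basic naming rules mentioned above can be made concrete in a few lines of code. The sketch below is a minimal illustration, not a tool described in this article. It assumes the classic Demerec convention for bacterial genetics (a gene symbol of three lowercase letters plus an uppercase locus letter, such as cbhA, with the protein written CbhA) and the family-based scheme of Henrissat, Teeri and Warren (activity code, glycoside hydrolase family number and a letter, as in Cbh9A). The `Module` record and the rule that only catalytic modules should determine an enzyme's name are assumptions introduced here to show how a multimodular protein might be annotated module by module; all function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Gene symbol: three lowercase letters, an optional glycoside hydrolase (GH)
# family number, and an uppercase letter ("cbhA", "cbh9A"). Italics cannot be
# encoded in a plain string, so only the letter pattern is checked.
GENE_RE = re.compile(r"[a-z]{3}\d*[A-Z]")
# Protein name: same pattern with the first letter capitalized ("CbhA", "Cbh9A").
PROTEIN_RE = re.compile(r"[A-Z][a-z]{2}\d*[A-Z]")
# Family-based name: activity code, GH family number, letter ("Cbh9A").
FAMILY_RE = re.compile(r"([A-Za-z][a-z]{2})(\d+)([A-Z])")


def is_gene_symbol(s: str) -> bool:
    return GENE_RE.fullmatch(s) is not None


def is_protein_name(s: str) -> bool:
    return PROTEIN_RE.fullmatch(s) is not None


def gene_to_protein(gene: str) -> str:
    """Derive the protein name by capitalizing the first letter of the gene symbol."""
    if not is_gene_symbol(gene):
        raise ValueError(f"not a gene-style symbol: {gene!r}")
    return gene[0].upper() + gene[1:]


def parse_family_name(name: str):
    """Split 'Cbh9A' into ('Cbh', 9, 'A'); return None if there is no family number."""
    m = FAMILY_RE.fullmatch(name)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), m.group(3)


@dataclass(frozen=True)
class Module:
    label: str       # module type, e.g. "GH9", "CBM", "dockerin"
    catalytic: bool  # True only for modules with enzymatic activity


def naming_families(modules: list[Module]) -> list[str]:
    """Return the labels of catalytic modules, the only ones a name should reflect (assumed rule)."""
    return [m.label for m in modules if m.catalytic]


# Hypothetical module list for illustration only; not a curated architecture.
example = [
    Module("CBM", catalytic=False),
    Module("GH9", catalytic=True),
    Module("dockerin", catalytic=False),
]

print(gene_to_protein("cbhA"))     # CbhA
print(parse_family_name("Cbh9A"))  # ('Cbh', 9, 'A')
print(naming_families(example))    # ['GH9']
```

Keeping modules as separate records prevents a hit to a non-catalytic module, such as a carbohydrate-binding module, from being mistaken for the enzyme's activity. A real annotation pipeline would take module boundaries and family assignments from curated resources such as CAZy or InterPro rather than from hand-written lists.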


Keywords: Gene annotation; Gene naming; Database handling; Nomenclature; Multimodular protein; Gene sequencing; Record tracking; Non-catalytic modules



A. Angelov, P. Kornberger, A. Ehrenreich and M. Mechelke are thanked for valuable comments. Financial support from the German Federal Ministry of Food and Agriculture (FNR Grant Number FKZ 22021715) for WHS, WL and VVZ, and from the European Commission (Collaborative FP7-KBBE Project Valor Plus, No. 613802) for JB is gratefully acknowledged.



Copyright information

© Springer Science+Business Media B.V., part of Springer Nature 2018

Authors and Affiliations

  1. Department of Microbiology, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
  2. Biosciences Center, National Renewable Energy Laboratory, Golden, USA
  3. Institute of Molecular Genetics, Russian Academy of Science, Moscow, Russia
