pp 1-45

Representation of Proteins with Posttranslational Modifications in the HL7 SPL Standard

  • Yulia Borodina
  • Gunther Schadow
Part of the Methods in Pharmacology and Toxicology book series


The Health Level Seven (HL7) Structured Product Labeling (SPL) is an ANSI-accredited data exchange standard, which was adopted by the US Food and Drug Administration (FDA) for the exchange of health and regulatory product and facility data. We describe an extension of this standard for exchanging structural characteristics of substances used as ingredients in medicinal products, particularly in biopharmaceuticals. The chapter covers basics of the abstract SPL data model, its specialization for substances, and its further specialization for proteins with posttranslational modifications. The standard utilizes the XML syntax framework, which allows combining specialized substance-related standards, such as the IUPAC International Chemical Identifier (InChI), with coded terminologies and quantitative parameters important for substance identification. The key elements of the data model for substances are structural units connected in a specified manner or related to each other as mixtures. Small molecules are represented by chemical structures and are uniquely defined using InChI. Macromolecules are represented in two different ways depending on whether they were synthesized in a template-driven chemical/biochemical process (e.g., proteins synthesized on ribosomes) or in a non-template-driven process (e.g., synthetic polymers). In the case of proteins, the arrangement of repeating units is described using the conventional amino acid letter notation. In the case of synthetic polymers, the explicit chemical structures of repeating units are provided. Finally, layers of modifications to the chains are described consistently by substituting the standard structural repeating units with special structural units whose structures are provided in the same XML document. The InChI canonicalization algorithm and the InChI atom numbering schema are used to ensure that the relationships between structural units are represented canonically. 
Bridging “bioinformatical” and “chemoinformatical” approaches in this way allows describing structures of very complex biochemical objects such as proteins with posttranslational modifications.
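The substitution idea described above can be illustrated with a small Python sketch. It is not the SPL XML schema and not the authors' implementation: the class names `StructuralUnit` and `ProteinChain`, the `substitute` method, the example sequence, and the unit identifier `SEP` are all hypothetical, chosen only to show a letter-notation chain in which a standard residue is replaced by a special structural unit defined alongside it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class StructuralUnit:
    """Hypothetical special structural unit, e.g. a modified residue."""
    unit_id: str
    description: str
    # Placeholder for a chemical structure (such as an InChI string) that a
    # real document would define; deliberately left empty in this sketch.
    structure: str = ""


@dataclass
class ProteinChain:
    """A chain in one-letter amino acid notation plus a layer of substitutions."""
    sequence: str
    substitutions: Dict[int, StructuralUnit] = field(default_factory=dict)

    def substitute(self, position: int, expected: str, unit: StructuralUnit) -> None:
        """Replace the standard residue at a 1-based position with a special unit."""
        if not 1 <= position <= len(self.sequence):
            raise IndexError(f"position {position} is outside the chain")
        if self.sequence[position - 1] != expected:
            raise ValueError(
                f"expected {expected!r} at position {position}, "
                f"found {self.sequence[position - 1]!r}"
            )
        self.substitutions[position] = unit

    def units(self) -> List[str]:
        """Return the chain with substituted positions shown by unit identifier."""
        return [
            self.substitutions[i].unit_id if i in self.substitutions else residue
            for i, residue in enumerate(self.sequence, start=1)
        ]


# Illustrative data only: a made-up four-residue chain with one modified serine.
chain = ProteinChain("MKSA")
phosphoserine = StructuralUnit("SEP", "O-phosphoserine (illustrative)")
chain.substitute(3, "S", phosphoserine)
print(chain.units())
```

Keeping the unmodified sequence separate from the substitution layer mirrors the layered description in the abstract: the letter notation stays readable, while each modified position points to a unit whose structure is given once.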


Electronic standard; Substance; Protein; Therapeutic protein; Modified protein; Posttranslational modification; Biosimilar; Data exchange; Informatics; IDMP; SPL; HL7



This publication targets the scientific chemoinformatics community only and should not be regarded as a guidance for industry.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. U.S. Food and Drug Administration, Silver Spring, USA
  2. Pragmatic Data, LLC, Indianapolis, USA
