Introduction

Participation in biological processes is what makes particular molecular entities uniquely important and interesting (Footnote 1). However, the very importance of biological processes has fostered piecemeal approaches to the description of functional relationships between biological activities and the chemical substances that express them. This document attempts to define biological activity in a way that is biologically informative and that enables development of the quantitative measurements needed to fully exploit the new knowledge. We identify the biological activity (Footnote 2) of any entity by its ability to effect a change in a biological process. We then devise a framework for defining and measuring biological activity based on the tenets of modern measurement science, but which, at the same time, is practical enough for use during the course of the discovery and characterization of new biologically active entities (Footnote 3).

A fundamental dichotomy in the perspective regarding biological substances exists between chemistry and biology. In chemistry, an entity is identified by its molecular structure and the amount of the entity is typically measured in moles or grams. In biology, the ability of an entity to effect a change in a biological-process-based assay (bioassay; note i) identifies the entity. Quantification of the activity is obtained by an empirical dose–response relationship. More often than not, the dose is varied by diluting a sample of a biological material that includes the activity-expressing substance(s). While both disciplines may seek a biochemical-mechanism-based description of the process, a biologist is mainly concerned about what the entity does, whereas a chemist is mainly interested in what the entity is and how much of it is present. Recognition of this dichotomy and the incorporation of substance, substance amount, and the property responsible for the expression of function into a new definition for biological activity is the primary goal of this document.

The dichotomy between chemistry and biology has resulted in confusion regarding the measurements of biological entities and has confounded efforts to improve the comparability, traceability, and equivalence of the results of many biological assays. The quantitative measurement of biological materials by physical and chemical means is sometimes used to infer or predict biological activity, although the measurements themselves provide no information about the activity. For example, macromolecules such as proteins may be chemically measurable, but functionally inactive. Conversely, biological activity measurements are used to infer the amount of the entity that is present, but these measurements are fraught with potential for serious bias. Measurements made in the presence of interfering or inhibitory entities will frequently underestimate the amount of the biologically active entity. Similarly, an apoenzyme will be inactive in the absence of an obligatory cofactor. Activity measurements can also overestimate the amount of an associated molecular entity if other entities are present that enhance the activity, such as pro-enzyme activation. A distinction between biological and chemical entities has long been recognized in some World Health Organization (WHO) documents, where biological entities are specifically described as those that “cannot be characterized adequately by physicochemical means alone” [1]. While true, this stance offers no hint at a solution to the problem.

Strategy and goals for the proposed definition of biological activity

Our strategy for relating biological activity to the amount of a molecular entity begins by proposing a definition for biological activity that merges insights provided from both biological and chemical approaches to measurement. A common definition for biological activity is highly desirable for communicating information, particularly to those who make life-saving or life-threatening decisions on the basis of the reported values of markers of biological dysfunction.

To be most useful, the definition for biological activity should be applicable to both the simplest and the most complex reaction systems and molecules. It should also provide a means for refinement as knowledge of the biological process advances, a property we describe as extensibility (note ii). We implicitly use proteins as the archetype macromolecule; in subsequent documents, we will relate the definition to other macromolecules and to relatively low molecular mass entities. The value of this definition will be realized if, in medicine, it: (1) facilitates direct comparison of the potency (activity) of different biological entities that are being used therapeutically or being measured for their diagnostic value; (2) allows estimation of the extent of their equivalence; and (3) decreases the likelihood of incorrect diagnoses.

Several goals have been established for this definition of biological activity. The definition is intended first to provide a framework for communicating, discussing, and expanding knowledge of biological activity as it relates to chemical structure. A second goal is to separate the measurement of the structure-derived biological properties from that of the structure, physical properties, and amounts of the entities involved. In this regard, the definition follows the long-standing biochemical approach of relating structure and function. The third goal is to formulate a definition that is metrologically sound and, thereby, to facilitate use of measurements of biological activity without the confusion that is inevitable when arbitrary units or impermanent references are used. It is also intended that the definition be useful to bioinformatic efforts to make the rapidly expanding knowledgebase of biology more readily accessible.

The framework and a general definition for biological activity (note iii)

Several truisms form the basis of the proposed definition. While perhaps “obvious” to some, we believe that these axioms establish the framework and the logic that underlies the definition. In a highly simplified way, the focus on a molecular entity with biological activity can be stated to be: (1) what it is; (2) what it does, where “what” is clearly plural; and (3) how much of it is present, both entity amount and expressed activity.

Axiom 1

Biological macromolecules are the predominant agents of biological activity. Many biological macromolecules express more than one definable activity or function.

Axiom 2

A particular function of a macromolecule is a property determined by structural attributes (note iv) of the macromolecule.

Axiom 3

The function(s) of biological entities is (are) modifiable. These modifications may be the consequence of interaction(s) (note v) with other molecules, the composition of the solution in which they are found, and temperature. Commonly recognized functional properties include: ligand-binding and binding site affinity, efficiency of expression of the activity, and specificity. An obvious example is the interaction of a protein with an allosteric effector.

Axiom 4

The biological activities of small molecules (ligands) are reciprocally related to the macromolecules to which they bind (acceptors). That is, the expression of a biological activity that can be ascribed to a small molecule is the consequence of the linkage (note vi) between the small molecule bound to the macromolecule and the effect on the macromolecule [2, 3].

Axiom 5

Each distinguishable activity of a biological entity must be represented by the simplest possible set of chemical equations (note vii). Multiple chemical equations are expected for most macromolecules; complexities should be introduced parsimoniously.

Metrological principles

The goal of measurement science (metrology) is “to achieve comparability of results over space and time” [4]. The four interrelated measurement principles that form the basis for achieving comparability are: fitness for purpose, validation, uncertainty, and traceability. That is: a measurement system must be designed to provide measurement results of adequate quality for the task(s) at hand; the implementation of the design must be shown to indeed provide results of adequate quality; the measurement results must explicitly state what the expected measurement quality is; and the measurement results from a particular measurement system must be relatable to results obtained from other measurement systems through a common set of primary references.

Two tools of proven worth in the pursuit of comparability (note viii) in physical and chemical metrology are the common system of units provided by the International System of Units (SI; note ix) and the common nomenclature provided by the International Vocabulary of Metrology (VIM) [5]. These internationally accepted systems of units and vocabulary facilitate the achievement of measurement comparability and the evaluation and description of the extent of its achievement.

The rigorous application of these principles and tools to the measurement of biological activity has been limited to small molecules and “procedure-defined measurands” (note x), e.g., enzymes [6]. We believe that these principles can be applied generally to the measurement of other biological entities, including macromolecules, and can be directly related to their activity in biological systems. The proposed definition for biological activity provides a means for applying metrology to all biological substances.

The algebraic definition of biological activity

The first step in applying metrological principles to the measurement of a biological entity requires the separation of entity and entity amount from the entity’s expression of function. Although it has always been evident that biological activity is dependent on what the entity is and its amount, explicit separation of amount and activity is not commonly made. Because such a separation cannot be made when activity is assigned in arbitrary units to a complex mixture, one component of which is assumed to express the activity, this “traditional” approach will always be severely limited. We propose a parameter f which links entity, entity amount, and biological activity to achieve this separation.

Based on the axioms stated above and an imperfect analogy with thermodynamic activity (Footnote 4), the following simple algebraic equation is proposed to define the biological activity of an entity:

$$A = cf$$

where A is the biological activity, c is the amount-of-substance concentration of the entity of interest, and f is a parameter designated as inherent activity (note xi). The description of f as “inherent” can be legitimately applied without ambiguity only for an idealized reference material in which all of the molecular entities present are identical (note xii). However, materials that approximate this requirement are rapidly becoming available (Footnote 5).

This equation emphasizes that the measurement of the concentration alone does not suffice to describe the functional capability of a biological entity, nor does the measurement of activity alone infer unambiguously the entity concentration.

With regard to this definition, the aforementioned biological perspective may be regarded as a focus on A without recognizing it as the product of c and f, whereas the chemical perspective is focused on c without considering that the functional significance of c lies in the value f. The proposed definition enables harmonizing the two perspectives by separating the chemical variable and biological variable into two terms that, only together, disclose functional activity.

The definition of biological activity, A = cf, describes a linear dose–response curve of biological activity as a function of concentration. The intercept on the ordinate is zero. Ideally, concentration is an entity amount traceable to a certified reference material (CRM) of high purity. When no suitable CRM is available and a suitably stable and homogeneous reference material cannot be identified or developed, f′ is used to indicate that the relationship is empirical (Fig. 1).

Fig. 1 Dependence of biological activity on concentration

When the entity concentration c is known, the definition describes a straight line that passes through the origin and has a slope f. The parameter f is a property of the biological entity for which the defined function is being measured and contains the relevant information about the ability of the particular attribute of the molecular entity to express its activity. Although the equation itself does not demand that the concentration be expressed as mol L−1, inferences about the function that can be related to molecular structure follow from concentrations expressed on that scale. In this regard, information related to the structure of the molecular entity and its related properties (e.g., molecular mass) is intended to be linked to c without reference to the function(s) expressed by the entity.
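As a minimal sketch of how f might be estimated from a dilution series, the following Python fragment fits the slope of a straight line constrained to pass through the origin. The function name and the example values are hypothetical and are not part of the proposed definition; unweighted least squares is assumed, and the units are arbitrary but must be consistent.

```python
def slope_through_origin(concentrations, activities):
    """Least-squares estimate of f in A = c * f, with the intercept fixed at zero."""
    if len(concentrations) != len(activities):
        raise ValueError("concentrations and activities must have the same length")
    sum_cc = sum(c * c for c in concentrations)
    if sum_cc == 0:
        raise ValueError("at least one non-zero concentration is required")
    sum_ca = sum(c * a for c, a in zip(concentrations, activities))
    return sum_ca / sum_cc


# Hypothetical dilution series (illustrative values only).
concentrations = [1.0, 2.0, 4.0]
activities = [0.5, 1.0, 2.0]
f_estimate = slope_through_origin(concentrations, activities)
```

Fixing the intercept at zero mirrors the statement that the line passes through the origin; a free intercept would be a different model and would need its own justification.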

It must be noted that attributes (e.g., sites) of biological macromolecules limit the site occupancy to the concentration of the ligand that binds to the site, its stoichiometry, and the binding constant of the site for the ligand. Consequently, the process of binding is intrinsically non-linear and is commonly described by a rectangular hyperbolic function. A strictly linear dependence of A upon c is, thus, only observed in the limiting slope as c approaches zero. In this regard, f differs from the classical activity coefficient, which approaches one as c approaches zero.
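The limiting-slope argument can be illustrated with a short sketch. It assumes a single-site rectangular hyperbola, A = a_max * c / (k_half + c); the symbols a_max and k_half and the function names are illustrative assumptions, not quantities defined in this document.

```python
def hyperbolic_activity(c, a_max, k_half):
    """Activity for an assumed single-site rectangular hyperbola."""
    return a_max * c / (k_half + c)


def apparent_f(c, a_max, k_half):
    """Ratio A / c for the assumed hyperbola; it falls as c increases."""
    return a_max / (k_half + c)


def limiting_f(a_max, k_half):
    """Slope of A versus c in the limit c -> 0."""
    return a_max / k_half


# Illustrative values: the apparent f equals the limiting slope at c = 0
# and has dropped to half of it when c equals k_half.
f0 = limiting_f(2.0, 4.0)
f_at_k_half = apparent_f(4.0, 2.0, 4.0)
```

Under this assumed model, A / c is only approximately constant while c is small compared with k_half, which is the sense in which the linear definition holds as a limiting slope.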

An additional, and possibly the greatest, advantage of this definition is that the variable f can be interpreted using contemporary knowledge of biochemical reaction mechanisms (Footnote 6). In fact, it is the ability to interpret f mechanistically that makes it useful for harmonizing the biological and chemical perspectives. It is, moreover, the capability of the mechanistic descriptions of the biological process, e.g., kinetic equations describing enzymatic reactions, that enables inherently non-linear processes to be transformed into forms that permit such simple description.

For f to be interpreted as an inherent activity of the macromolecule, the process that is being measured must be the process described by the defining chemical equation. When a measurement procedure is not, or cannot be, limited exclusively to a single attribute, separate chemical equations must be written to describe each attribute and reaction that occurs in the measurement procedure in order to avoid confusion in interpreting f. If the measured biological activity comprises the expression of multiple functions, then f is unlikely to be related to a single or particular attribute of the macromolecule in a readily discernible way, unless all of the individual functions are specified and their combined effects are taken into account. Succinctly, the utility of the proposed definition demands appropriate definition of the measurand.

In situations where the definition seemingly suffices, but doubt remains about the adequacy of the chemical equation(s) used to define the function, the inherent activity should be defined as f′ to indicate the doubt. This is expected to be the prevalent situation. If, as is inevitable in the earliest stages of discovery and characterization, the measurement procedure is necessarily empirical, then the functional capability parameter should again be designated as f′ to signify its lack of a proven chemical basis. As information regarding the entity and its measurement increases, refinements to the definition of the function, the measured value, and the extent of interpretation of f can be made. Moreover, when the limitations of the initial estimate for f are stated, the changes and the causes that demand change can be recognized and insight gained from the refinement process. The approach based on this proposed definition promotes meeting the goals of metrology by acknowledging uncertainty, first qualitatively and subsequently quantitatively, and in any particular measurement procedure, it fosters a clear definition of the measurand, the effects of interactants and influence quantities, and a rational discussion of fitness for purpose.

Although it is universally recognized that biological activity does not appear in a hypothetical isolated state, i.e., as a description only on paper or as a depiction of the 3D structure of a macromolecule, this seems to be easily forgotten when actually describing a measurand. Biological activity is a reflection of interactions between molecules noted in the defining equation(s) and their transformation(s) in the milieu provided by the medium (solution) in which the reactions occur. Further, in this regard, interaction between a substrate and an enzyme or a ligand and a receptor implies a reciprocal relationship that requires the consideration of both entities. Each molecular entity will possess its own attributes and their associated inherent functional capabilities that are represented by the value of f; but for simple low molecular mass entities, f may simply be unity. The linkage relationship is analogous to the linked functions [2, 3] and reciprocal relationships of thermodynamics.

Entity identity, entity amount, structural attributes, and the expression of activity—practical considerations for the measurement and interpretation of f

The separation of biological activity into chemical and biological terms via c and f forms the basis for resolving the dichotomy of chemical and biological perspectives. In this regard, the differences in the chemical and biological perspective to which we attribute much of the confusion surrounding the measurement of biological activity now become the means for eliminating that confusion.

Macromolecular entities such as proteins can be treated as chemical entities in the same ways as simpler molecules. Although macromolecules are much more complex, technological advances now make it feasible to approach their characterization in the same ways as is done for relatively low molecular mass entities. The characterization of protein molecules and the determination of their concentrations in the SI unit of mol L−1 is becoming practical; many commercial, governmental, and academic organizations exist to provide the necessary measurements of amino acid composition, post-ribosomal modification, prosthetic group content, etc. The measurement of c for proteins can be rigorous, although the uncertainty of the measured value can be very large because of calibration bias, unless the procedure used is directly related to the particular protein and its amino acid composition. However, when accompanied by an appropriate uncertainty estimate (note xiii), c will be suitable for use in this definition for biological activity, in part, because the approach anticipates refinement.
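One way to see how the uncertainty of c carries over to f is the first-order propagation rule for a quotient. The sketch below assumes that f is obtained as A / c, that A and c are uncorrelated, and that their relative standard uncertainties are small; the function name and the numerical values are hypothetical.

```python
import math


def inherent_activity_with_uncertainty(a, u_a, c, u_c):
    """Return (f, u_f) for f = a / c, assuming uncorrelated a and c.

    u_a and u_c are standard uncertainties in the same units as a and c.
    """
    if a == 0 or c == 0:
        raise ValueError("a and c must be non-zero for relative uncertainties")
    f = a / c
    # Relative uncertainties of a quotient combine in quadrature (first order).
    rel_u_f = math.hypot(u_a / a, u_c / c)
    return f, abs(f) * rel_u_f


# Illustrative values: 3 % relative uncertainty in A and 4 % in c.
f_value, u_f = inherent_activity_with_uncertainty(6.0, 0.18, 2.0, 0.08)
```

With these illustrative inputs the relative uncertainty of f is about 5 %, larger than either contribution alone, which is why a large calibration bias in c limits what can be said about f.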

Estimation of the uncertainty of c must recognize and consider the intrinsic heterogeneity of biological macromolecules. Macromolecules that have been derived from biological sources, even highly purified preparations without detectable contaminants or observable heterogeneity, will inevitably be heterogeneous to some extent. Materials of biological origin are heterogeneous because of genetic mutations, polymorphisms, and variability in post-translational modification. In practice, this is a consequence of the pooling of tissues (e.g., blood plasma) from multiple individuals prior to purification of the entity. Such micro-heterogeneity can be very small, and, in many cases, of little or no importance. However, values for the measured properties of these “real” materials are averages of the properties of the individual molecules.
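The statement that measured properties are averages over the individual molecules can be sketched as a concentration-weighted mean. This assumes that each molecular variant contributes to the measured activity independently and additively, which need not hold when variants interact; the function name and values are hypothetical.

```python
def mixture_inherent_activity(variants):
    """Concentration-weighted mean of f over variants given as (c_i, f_i) pairs.

    Assumes additive, non-interacting contributions: A_total = sum(c_i * f_i).
    """
    total_c = sum(c for c, _ in variants)
    if total_c <= 0:
        raise ValueError("total concentration must be positive")
    total_a = sum(c * f for c, f in variants)
    return total_a / total_c


# Illustrative preparation: 90 % fully active variant, 10 % inactive variant.
f_mixture = mixture_inherent_activity([(0.9, 10.0), (0.1, 0.0)])
```

In this illustration the apparent f of the preparation is lower than that of the active variant alone, even though the total concentration c is unchanged.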

Failure to recognize such sources of heterogeneity can lead to unproductive discussion and inappropriate estimation of the true uncertainty of the inherent activity f to be assigned to the reference material. For this discussion, however, it is conceptually helpful to assume a hypothetical reference material that consists of molecules, all of which are identical in structure. Such an idealized macromolecular entity provides an advantage in that the functions of the macromolecule can be directly related to its sequence and to sites on the macromolecule that are altered post-transcriptionally or post-translationally. Moreover, without unnecessary discussion, the structures can be assumed to be represented by their primary sequences and 3D structures. A real example that may approximate the idealized reference material is a recombinant protein produced in a system with complete fidelity in transcription, translation, and post-translational modification. In the context of metrological traceability, such an idealized material might be considered to be an approximation to a primary, pure-entity reference material. Entity identity is thus defined, and the entity amount and the uncertainty of its measurement are included in the assessment of c in the defining equation. All secondary batches of the entity are traced to the primary material through the values of the substance amount, the variations in the nature and extent of post-translational modifications, and the effects of these variations on the value of f for the batch.

It is conceptually important to distinguish intrinsic heterogeneity as described above from heterogeneity resulting from the reference material of biological origin being a mixture of different entities. The consequences of this latter type of heterogeneity, i.e., the presence of other entities (contaminants) in mixtures of biological materials, such as blood plasma, must be treated with regard to the influences they exert on the measurements of the biological activity of the idealized reference material or its practical equivalent. Because some such entities, isolated in conjunction with the macromolecule, are likely to be interactants (activity modifiers), they must be explicitly measured whenever they are known. Matrix-based reference materials in which the measurand is a pure entity can be characterized with respect to the effects of matrix components on the measurand, whereas reference materials in which activity is defined by the measurand native to the matrix cannot be so characterized and, thus, are predisposed to the effects of interactants and influence quantities that are probably unknowable.

The specific interpretation of f is intentionally and obligatorily linked to the process and chemical equations that define and describe the function. As previously noted, the definition for A is analogous to the definition of activity (a) in classical thermodynamics, but with appropriate caveats. Thus, f can be considered to be analogous to γ, the thermodynamic activity coefficient. Interpretation of γ for ions is based on the Debye–Hückel–Onsager theory of electrolytes [8, 9] (Footnote 7). In the context of enzyme biological activity, the obvious model for interpreting f is the Michaelis–Menten equation (including its forms that include the effects of modifiers).

By way of illustration that this is an established approach for characterizing enzymes and their activity, the chemical equation that describes the catalytic action of an enzyme in its simplest form is:

$$\begin{aligned}{\text{Enzyme}} + {\text{Substrate}} &\overset{K_{\text{m}}}{\leftrightarrows} {\text{Enzyme:Substrate}}\\ {\text{Enzyme:Substrate}} &\xrightarrow{k_{\text{c}}} {\text{Enzyme}} + {\text{Product}}\end{aligned}$$

The two processes involved in the expression of enzyme activity are substrate recognition and binding, represented by K_m (the Michaelis constant), and chemical transformation, represented by k_c. Interpretation thus links biological activity (A) to the concentration of the enzyme, c_enzyme, through:

$$A = - \frac{{{\text{d}}c_{{{\text{substrate}}}} }}{{{\text{d}}t}} = c_{{{\text{enzyme}}}} k_{{\text{c}}} \frac{{c_{{{\text{substrate}}}} }}{{K_{{\text{m}}} + c_{{{\text{substrate}}}} }}$$

A linear dependence on c_enzyme is usually observed, but there are situations in which this generalization does not hold (Footnote 8). If the independent variable is c_substrate, then linearity is only apparent when c_substrate << K_m. The selection of appropriate conditions for using an enzyme-catalyzed reaction as a means for the measurement of c_enzyme (c_substrate > K_m) and of c_substrate (c_substrate << K_m) is well and long established [10]. The single parameter that informs the catalytic efficiency of an enzyme for the particular substrate is the ratio k_c/K_m; conventionally, this is given with dimensions mol−1 L s−1 or a decimal multiple, e.g., μmol−1 L s−1. Since 1999, the SI name for the unit of catalytic activity (A) has been the katal (kat), having dimensions mol s−1. The SI expression for catalytic activity concentration is katal per cubic meter, kat m−3. Based on the SI unit, the katal, the units for f are L s−1. The efficiencies of enzyme-catalyzed processes and the low concentrations of the enzymes and substrates in such reactions lead to measured values reported in μkat L−1 or nkat L−1.
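The reading of f suggested by the Michaelis–Menten equation can be sketched numerically. The fragment below takes f to be the factor multiplying c_enzyme in the rate equation above, f = k_c * c_substrate / (K_m + c_substrate); the function names and parameter values are hypothetical, and any consistent set of units may be used.

```python
def mm_inherent_activity(c_substrate, k_c, k_m):
    """Factor multiplying c_enzyme in the Michaelis-Menten rate equation."""
    return k_c * c_substrate / (k_m + c_substrate)


def mm_activity(c_enzyme, c_substrate, k_c, k_m):
    """Rate A = c_enzyme * f for the simple one-substrate scheme."""
    return c_enzyme * mm_inherent_activity(c_substrate, k_c, k_m)


# Illustrative parameters (arbitrary consistent units).
k_c, k_m = 10.0, 1.0

# Substrate far above K_m: f approaches k_c, so A reports on c_enzyme.
f_saturating = mm_inherent_activity(1e6, k_c, k_m)

# Substrate far below K_m: f approaches (k_c / K_m) * c_substrate,
# so A is linear in c_substrate at fixed c_enzyme.
f_dilute = mm_inherent_activity(1e-6, k_c, k_m)
```

At a fixed substrate concentration, A remains proportional to c_enzyme under this model, which is the linear dependence the definition A = cf relies on.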

A similar chemical equation would apply for a system consisting of a macromolecular receptor and a small drug molecule that alters the biological behavior of the receptor. Likewise, if the molecular entity were an oligonucleotide, then binding alone might be the biological activity. However, if oligonucleotide binding were the first step in the process of gene transcription, then additional chemical equations and reactions would be included.

Discussion

The results from the measurement of entities of biological origin constitute a substantial part of the information upon which medical diagnoses are made. Moreover, it has been estimated that 60–70% of medical diagnoses are based on laboratory tests [11]. Assays, or more specifically, measurement procedures, provide values of biomarkers that underpin diagnostic decisions and therapeutic interventions. The comparability and equivalence of measurement results for biomarkers are necessary to support appropriate diagnoses and avoid misdiagnoses.

The utility of genomic information ultimately derives from its ability to predict phenotype as exhibited in resistance or susceptibility to disease. Although historically applied to a trait exhibited by an organism, the concept of phenotype can be, and now is, applied to both cells and macromolecules. Phenotype expression by a macromolecule is, in essence, what we call biological activity. Genetic mutations and polymorphisms can, and frequently do, alter function; hence the need for measurement of the functional consequences of the structural changes.

The complexity of biological systems and biological macromolecules creates a formidable challenge for the development of traceable reference materials and procedures that provide equivalent values for the measurand. This is particularly so for proteins because of the multiple functions that a single protein frequently expresses, and also because of the presence in the biological sample of different proteins that nominally express the same or a similar function. A practical but limited solution to this problem has been achieved for the measurement of enzymatic activity under highly optimized and rigidly defined conditions, a solution that designates enzymatic activity measurements as catalytic activity concentrations and procedure-defined measurands [6] (note x). The “procedure-defined measurand” approach does not lend itself to the acquisition of new information about the function of the measurand and does not address the challenge that originates from the need for translating genotype to phenotype at the level of the molecule and its structure.

The proposed definition can also be viewed as an attempt to improve on the “procedure-defined measurand” approach; we have devised a simple equation that separates the more definable entity and entity amount, the pertinent parameters as perceived by a chemist or chemical metrologist, from the expression of the biological function of the entity. To do so, we have further attempted to take advantage of much that is already known. However, at this stage, the approach has not been applied generally to the measurement of entities of biological origin and their properties that imbue them with biological function (Footnote 9).

A benefit that derives from the proposed description of biological function and the particular structural attribute responsible extends beyond the definition of the measurand. The proposed definition can also focus attention on the fitness for purpose of the measurement procedure (Footnote 10). Functions, reactants, and interferents that cannot be described completely may well be important in identifying the limitations of particular measurement procedures. Although incompletely described procedures might be suitable for some purposes, perhaps because there are no alternative procedures, they will be unfit for reference measurement procedures that are intended to be included in a metrological traceability chain.

The utility of a general definition extends beyond the measurement of biomarkers used in medical diagnosis. The estimation of the potency and, thus, dose of therapeutic agents derived from biological materials is another obvious area of applicability. If interactants and interferences are identified during the development and validation of a measurement procedure based on the general definition, the procedure should prove beneficial for the monitoring of therapeutic, as well as adverse, effects of drugs.

As already noted, the lack of comparability and absence of a high degree of equivalence of biological activity measurements create opportunities for inappropriate diagnostic inferences to be made. When different measurement procedures (e.g., different manufacturers’ kits for nominally the same entity or activity) do not produce equivalent results, the clinician can be misled. Similarly, when the results of large, multiple-site epidemiological studies are interpreted, bias can obscure relationships and also lead to invalid inferences. The proposed definition for biological activity attempts to reduce the incidence of judgment errors that are derived from the ambiguity and inherent complexity of the measurement of biological activity and biological entities. It does so by separating the strictly chemical properties of a measurand from the functional consequences of those properties—biological or, more strictly, biochemical function. Moreover, because the definition is intended to be simple yet rigorous, its application can be used to identify and correct for interferents and influence quantities that could otherwise go unnoticed because this definition embodies predictability and testability [12].

The authors recognize that all change is, itself, initially prone to be a source of confusion and mistakes. In an attempt to minimize this and also to avoid needless change, we have drawn on the insights of individuals whose interests and contributions to laboratory medicine derive from biological perspectives, as well as from those whose insights are based on the measurement of simpler chemical entities. Similarly, we have adapted, by analogy, a definition for biological activity that has a precedent in chemical thermodynamics—the concept of chemical concentration and an activity coefficient that can be interpreted in molecular terms to explain why the behavior of even simple molecular entities is not always compatible with 100% efficiency of function. To the extent that readers of this proposed definition recognize familiar principles being applied to a more complex system, we hope that this approach will have a good chance of achieving its intended effect.

On a more technical level, the most valuable aspect of the proposed definition derives from its separation of the “chemical” and “biological” aspects of the defining chemical equation(s). This in itself is not novel, but neither the utility nor even the necessity of this separation seems to have been recognized. Inferences from the measurement of activity regarding entity amount or activity from the measurement of entity are now clearly related to a defined parameter, f. Because the value for f is related to the composition of the solution in which the measurement procedure occurs, it is constrained to be measured under well-controlled conditions—a long-recognized prerequisite for all procedure-defined measurands. Moreover, as illustrated by the Michaelis–Menten equation for an enzyme, f is interpretable in the same fashion as charge effects on ion properties in the Debye–Hückel–Onsager theory for electrolytes. The appropriate mechanistic equation is determined by the chemical equations that describe the biological activity.

The definition also aids in harmonizing the empirical and mechanistic interpretation of f because of its simple linearity, a feature that, for intrinsically non-linear biological processes and phenomena, can be obtained by mathematical transformation of the mechanistic equation(s) or, as already noted, by extrapolation of c to zero. However, the behavior of the function that mechanistically defines biological activity does not have to be linear, but it must be explicitly stated. Another evident benefit of this simple definition is that it is consistent with practices in pharmacology and other biological fields, although some descriptive terms used in other areas may not be strictly equivalent to f.

The focus on attributes as “agents” that produce or express function and the requirement that the attribute be included in the chemical equation should be useful in databases that catalog properties of biological macromolecules. The separation of c and f provides for extensibility; structure and substance amount are included in c and the expressed function in f. Structural attributes are related to motifs, domains, and other descriptors of structural features of macromolecules, particularly proteins. The prescription of an “idealized macromolecule,” moreover, provides a path between the entity and its 3D structure as provided by crystallographic and magnetic resonance methods. Such a path is, in our minds, necessary if the information from all disciplines is to be used to devise and develop measurement procedures that are suitable for all intended purposes. When limitations exist, they can be known and be used as caveats to prevent inappropriate inferences and decisions.