Beyond categorical definitions of life: a data-driven approach to assessing lifeness


The concept of “life” certainly is of some use to distinguish birds and beavers from water and stones. This pragmatic usefulness has led to its construal as a categorical predicate that can sift out living entities from non-living ones depending on their possessing specific properties—reproduction, metabolism, evolvability etc. In this paper, we argue against this binary construal of life. Using text-mining methods across over 30,000 scientific articles, we defend instead a degrees-of-life view and show how these methods can contribute to experimental philosophy of science and concept explication. We apply topic-modeling algorithms to identify which specific properties are attributed to a target set of entities (bacteria, archaea, viruses, prions, plasmids, phages and the molecule of adenine). Eight major clusters of properties were identified together with their relative relevance for each target entity (two that relate to metabolism and catalysis, one to genetics, one to evolvability, one to structure, and—rather unexpectedly—three that concern interactions with the environment broadly construed). While aligning with intuitions—for instance about viruses being less alive than bacteria—these quantitative results also reveal differential degrees of performance that have so far remained elusive or overlooked. Taken together, these analyses provide a conceptual “lifeness space” that makes it possible to move away from a categorical construal of life by empirically assessing the relative lifeness of more-or-less alive entities.


  1. This paper builds on the conceptual idea of “lifeness signatures” as proposed by Malaterre (2010b). One aspect of our contribution is to show that such a conceptual construal of life can be operationalized and rendered measurable. As noted by one referee for Synthese, our views bear some resemblance to Godfrey-Smith’s Darwinian space (2009) in that both are multidimensional. The two projects, however, pursue different objectives: a characterization of more-or-less paradigmatically Darwinian populations for Godfrey-Smith, a characterization of more-or-less alive entities in our case. The methods also differ: relatively qualitative in the case of Godfrey-Smith, more quantitative and data-driven in our case. Both instances show the value of thinking multi-dimensionally for conceptual explication.

  2. For lists of definitions, see e.g. Popa (2004), Pályi et al. (2002). See also Tirard et al. (2010) for an historical perspective on definitions of life.

  3. The analysis of how defining impacts (DL) is beyond the scope of this paper, but see Malaterre (2010c).

  4. Investigating whether life delineates a natural kind or, possibly, imposes adjustments to existing accounts of natural kinds is beyond the scope of this paper, but see e.g. Lange (1996), Khalidi (1998), Diéguez (2013), Ferreira Ruiz and Umerez (2018).

  5. The question whether natural kinds should partition the entities of the universe into unambiguous and non-intersecting sets is the object of debate; see e.g. Boyd (1999), Ellis (2001). For the sake of the present discussion, suffice it to say that at least some authors argue simultaneously for a positive answer to (DL-O) and a binary partitioning.

  6. For guidance, one may look at the 100+ definitions listed in Pályi et al. (2002) and Popa (2004), most of which take the form of biconditionals delineating life in a binary fashion and without further justification of that binary view. Some authors, though, appeal to phase-transitions or emergent properties that would sharply distinguish life from non-life, e.g. Lange (1996), Luisi (2006); for a critical assessment of these, see Malaterre (2010a).

  7. Along the same lines, see also the operational role of definitions of life in the search for life elsewhere than on Earth (be it on Mars, on the Jovian moons or, much further away, on exoplanets circling other suns in our galaxy) with the mediation of “biosignatures” (Raulin 2010; Seager et al. 2016), as well as in the search for life in the test-tube to assess whether experimental attempts at creating life have been successful or not (Blain and Szostak 2014).

  8. As pointed out to us by a referee, were the categories “alive” and “not-alive” to be considered as determinate membership categories (i.e., once something has been found to be a member, that thing is a full member), entities of the gray-zone of lifeness could be considered as borderline cases with indeterminate membership (neither members nor non-members of the two categories “alive” and “not-alive”). As a result, they could be considered as delineating a third determinate membership category. Alternatively, they could be taken as evidence that the categories “alive” and “not-alive” are not determinate membership categories but categories with degrees of membership. Independently of which interpretation one chooses, our point is to stress that construing the two categories “alive” and “not-alive” as mutually exclusive and collectively exhaustive categories with determinate membership does not do justice to the variety of entities that populate the world. Hence our argument against (Bi).

  9. This synchronic view can contribute, for instance, to arguments about the lack of clear-cut delineation of biodiversity’s scope when considering ever smaller biological entities (Malaterre 2013).

  10. As an illustration of this point, see the debate on the role of viruses in the origin of DNA in Forterre (2006).

  11. See for instance the debate about whether viruses should be included or not in the tree of life (Moreira and López-García 2009; Forterre 2010).

  12. In Bedau’s framework, the position on the scale of lifeness is a function of the number of interactions. As a consequence, a value of 1 interaction can be achieved either by having 1 functionality and 1 retroactive interaction from this functionality onto itself, or by having 2 functionalities linked by 1 interaction. With the same reasoning, one can figure out that step 2 entities necessarily have 2 or 3 functionalities, and that step 5 entities necessarily have all 3 functionalities. From step 5 to step 9, all entities necessarily have all 3 functionalities and are only distinguished by their number of interactions. Functionalities therefore play a secondary role compared to that of interactions.

  13. A referee pointed out to us the importance of interactions between functions. Interactions do indeed capture a very relevant feature of living organisms. This is a point emphasized notably by Gánti (2003), and that is central in Bedau’s framework and in other perspectives that are largely focused on designing or engineering protocells. Yet, we argue that characterizing the lifeness of entities concerns, above all, identifying what these entities do (this is a project that is different from the one pursued by Bedau, and complementary). Consider a simple analogy: a sidecar motorbike can be depicted as possessing the functions of ‘motorized propulsion’, of ‘providing seating for a driver’ and of ‘providing seating for a passenger’. This third function clearly is what differentiates a sidecar motorbike from a regular motorbike. The fact that there is an interaction between the sidecar itself and the motorbike to which it is attached is of course crucial, but it does not spontaneously arise in the top list of properties that differentiate sidecar motorbikes from regular motorbikes. In biology, the characterization of microorganisms through the identification of genes belonging to different clusters of orthologous groups also clearly illustrates the importance of this perspective (Galperin et al. 2015). Indeed, the significant differentiating factors between microorganisms are taken to be genes that correspond to specific functions, not genes that correspond to interactions between functions.

  14. Note however that not all entities that are classified within the domain Bacteria would appear as uncontroversially alive: as mentioned earlier, the status of the endosymbiont Carsonella ruddii and others clearly is disputed.

  15. This provides an additional element of answer to (FG0), namely that the functional dimensions of common bacteria are the ones to be considered as functional dimensions of lifeness.

  16. One anonymous referee pointed out to us the risk that we might be introducing a form of circularity by adopting bacteria as the reference for the 1-value of lifeness (somewhat similar to the circularity of Bruylants and colleagues). The difference between the two approaches is that, in our case, we take common bacteria such as E. coli as reference point, and then assess the lifeness of any other entity by comparison to this reference point, whereas Bruylants et al. consider that any entity of the tree of life automatically deserves to be at the 1-value of lifeness (relegating the question whether entities ought to be part of the tree of life to scientists’ classificatory practices).

  17. Ideally, we would have liked to conduct our analyses at the granularity of species, so as to clearly identify, in particular, the performance of E. coli and use it as reference point for comparing other species (such as C. ruddii and others). In practice, this was unfortunately not possible: first, it is quite difficult to identify which particular species generic words such as ‘bacteria’ or ‘virus’ refer to; second, species names are rare throughout the corpus, and therefore cannot be reliably used for text-mining purposes (this is a limitation of the methodology we discuss in Sect. 7). We therefore decided to conduct our analyses at a coarser-grained level (hence the choice of target entities such as ‘bacteria’, ‘virus’ etc.). This implies that the lifeness of particular species could not be assessed. It also implies that the reference point for the 1-value of lifeness is in fact the averaged performance of all bacterial species present in the corpus (and not exclusively of E. coli). This is an area where further research could be conducted, on larger corpora and with more sophisticated text-mining tools.

  18. As pointed out to us by one referee, there are other reasons why properties could be strongly present in relationship to entities throughout the corpus. This could be the case, for instance, for properties of well-studied organisms such as Drosophila melanogaster or Escherichia coli compared to lesser-studied ones. In our analyses, we did not seek to investigate lifeness at the level of specific species but adopted a much coarser-grained perspective (bacteria, archaea, viruses etc.). Such a perspective contributes to averaging out such possible biases and to justifying our second premise.

  19. One of the reasons for choosing the BioMed Central collection was its open-access availability for text-mining via a dedicated API, as well as its diversified content in biology-related journals in which we had strong reasons to suspect that the target entities we were interested in would be mentioned. See also discussion in Sect. 7.

  20. In short, this metric measures the distance between the average vector that represents the set of topics belonging to a given category, and the vector that represents a given entity in the semantic vector space constructed on the basis of the corpus. Note that this approach neutralizes—to a certain extent—possible biases due to the differing frequencies with which the target entities are mentioned throughout the corpus: what matters is how often entities and categories are jointly present in the same text excerpts (paragraphs), independently of how often entities are present in the corpus.

  21. One anonymous referee pointed out to us a risk of circularity, since lifeness scores are assessed along dimensions that refer to humans, plants and animals, which are themselves living systems. But this circularity is only apparent: lifeness scores along these dimensions do not depend on whether we consider humans, plants or animals alive or not. What matters is the extent to which entities (whose lifeness is to be assessed) interact or not with other entities that are labeled humans, plants or animals, independently of whether humans, plants or animals are themselves considered alive or not. A second concern is whether our approach might be based too heavily on life-as-we-know-it, in its existing context. As a result, lifeness dimensions and scores would change depending on the properties of newly-found entities (e.g. Martian bacteria) or on modifications in the environment (e.g. humans being present or not, as captured by dimension F). We do not see this as a problem: the best perspective we can have on lifeness, we argue, is one that emerges from the scientific practice in its recent state (hence without Martian bacteria and with humans around). Yet, as the state of science changes, lifeness may also change (Martian bacteria might be discovered that have unheard-of properties or humans may go extinct while robots still pursue the project of assessing lifeness). The methodology we propose makes it possible to include novel entities and contexts, and revise our construal of lifeness by re-running the analyses on updated scientific corpora (see also the discussion in Sect. 7).

  22. The overall lower performance of archaea compared to bacteria could come from two factors. (1) It could be the case that archaea have been less studied than bacteria, and therefore that less is currently known about what archaea do (compared to bacteria); in the future, therefore, as publications on archaea increase in number and research themes, one will likely see an increase in archaea lifeness. On the other hand (2), it could be the case that archaea are simply less sophisticated in many respects when compared to bacteria. If this is the case, their relative lifeness compared to bacteria will not change in the future. In any case, one should bear in mind that the entities’ performance is relative to the current state of knowledge of the scientific community (as sampled by the corpus).

  23. Note that an extension of the methodology to entities more complex than bacteria, for instance unicellular or pluricellular eukaryotes, could place such entities at a higher level of performance than bacteria (extending the methodology would raise a number of practical issues, such as extending the corpus to properly cover the entities in question, but we see no principled reason why it would not work). We would see no problem extending the dimensions in such a way as to reflect this state of affairs, and thereby confer an even stronger degree of lifeness to such organisms, were it to be the case. Note that viruses already score higher than 1 with regard to some dimensions. Here our objective was to undermine the binary assumption (Bi) and to re-engineer the concept of life so as to better capture the existence of different degrees and modes of lifeness as suggested by the current state of science. Hence our focus on entities that can intuitively be characterized as belonging to the grey-zone of lifeness.

  24. Note however that studying lifeness at the level of species may raise other methodological difficulties, in particular linked to the fact that certain species of organisms are more studied than others because they are model organisms, because they are easier to study (e.g. micro-organisms that can be cultivated as opposed to those that cannot), or because they affect human health or the economy. In this respect, adopting a more aggregated level averages out these differences (see also footnote 17).

  25. Imagine a corpus that would have included journals in astronomy or physics, or even on origins of life, synthetic biology or theoretical biology: other topics specific to these disciplines would have emerged (e.g., about ‘stars’, ‘planets’, ‘elementary particles’ or about ‘prebiotic chemistry’, ‘engineered micro-organisms’ or ‘theoretical models’). These additional topics could then result in either creating novel topic-categories or modifying existing ones. Yet one also has to consider whether these new topics are correlated or not to the target entities: while one rarely talks about ‘stars’ and ‘bacteria’ in the same paragraphs, some articles may consider ‘microbial contamination’ in the context of space exploration or investigate ‘genetically engineered properties’ of bacteria. Hence the possibility for novel topics to influence the lifeness space.

  26. Interestingly, adding texts that specifically focus on the question of defining life is unlikely to affect the results. Indeed, such texts typically weigh the relative significance of different criteria for life, yet rarely mention target entities. As a result, few of their paragraphs would be retained.
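The counting argument of footnote 12 can be checked with a short sketch. It rests on two assumptions that the footnote suggests but does not state outright: interactions are directed links between functionalities (self-links included, so n functionalities allow at most n × n interactions), and every functionality counted takes part in at least one interaction (so k interactions involve at most 2k functionalities). The function names are illustrative and are not part of Bedau's framework.

```python
def max_interactions(n_functionalities: int) -> int:
    """Upper bound on directed interactions, self-links included (assumption)."""
    return n_functionalities ** 2


def possible_functionality_counts(step: int, total: int = 3) -> list[int]:
    """Numbers of functionalities compatible with `step` interactions.

    Assumes each counted functionality participates in at least one
    interaction, so `step` interactions involve at most 2 * step of them.
    """
    upper = min(total, 2 * step)
    return [n for n in range(1, upper + 1) if max_interactions(n) >= step]


if __name__ == "__main__":
    for step in range(1, 10):
        print(step, possible_functionality_counts(step))
```

Under these assumptions the enumeration agrees with the footnote: one interaction is compatible with 1 or 2 functionalities, two interactions with 2 or 3, and every step from 5 to 9 requires all 3.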

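The metric described in footnote 20 can be illustrated with a minimal sketch. The footnote does not say which distance is used, so cosine distance is an assumption here, and the three-dimensional vectors are invented toy values rather than data from the corpus. The function names are hypothetical.

```python
import numpy as np


def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    """1 - cosine similarity. The choice of cosine is an assumption."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def category_entity_distance(topic_vectors: np.ndarray, entity_vector: np.ndarray) -> float:
    """Distance between the average vector of a category's topics and an entity vector."""
    centroid = topic_vectors.mean(axis=0)  # average over the topics of the category
    return cosine_distance(centroid, entity_vector)


# Toy vectors in a 3-dimensional "semantic space", invented for illustration only.
metabolism_topics = np.array([[1.0, 0.0, 0.0],
                              [0.8, 0.2, 0.0]])
bacteria = np.array([0.9, 0.1, 0.0])
prion = np.array([0.0, 0.1, 0.9])

d_bacteria = category_entity_distance(metabolism_topics, bacteria)
d_prion = category_entity_distance(metabolism_topics, prion)
```

In this toy setting the entity whose vector points in the same direction as the category average gets the smaller distance. Because cosine distance ignores vector length, it also illustrates one way in which a metric can be insensitive to how often an entity is mentioned overall.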

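Footnote 26 turns on the fact that only paragraphs mentioning a target entity are retained. A rough sketch of such a filter is given below. The entity list comes from the abstract, but the regular expressions, the function names, and the rule "keep a paragraph if it mentions at least one target entity" are assumptions made for illustration, not the authors' actual pipeline.

```python
import re

# Target entities named in the abstract; the patterns are illustrative guesses.
TARGET_PATTERNS = {
    "bacteria": r"\bbacteri(?:a|um)\b",
    "archaea": r"\barchae(?:a|on)\b",
    "virus": r"\bvirus(?:es)?\b",
    "prion": r"\bprions?\b",
    "plasmid": r"\bplasmids?\b",
    "phage": r"\b(?:bacterio)?phages?\b",
    "adenine": r"\badenine\b",
}


def entities_mentioned(paragraph: str) -> set[str]:
    """Return the target entities that appear in a paragraph (case-insensitive)."""
    return {
        name
        for name, pattern in TARGET_PATTERNS.items()
        if re.search(pattern, paragraph, re.IGNORECASE)
    }


def retain_paragraphs(paragraphs: list[str]) -> list[str]:
    """Keep only paragraphs that mention at least one target entity."""
    return [p for p in paragraphs if entities_mentioned(p)]


paragraphs = [
    "Any definition of life must weigh metabolism against reproduction.",
    "Viruses rely on host cells to replicate their genomes.",
]
retained = retain_paragraphs(paragraphs)
```

Here the first paragraph, which discusses criteria for life without naming any target entity, is dropped, while the second is kept. This mirrors the footnote's point that texts about defining life would contribute few paragraphs.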

Access to the BioMed Central Collection is gratefully acknowledged. The authors thank Marc Bedau, Mark Ereshefsky, Michel Morange, Kepa Ruiz-Mirazo and Eran Tal for comments on earlier versions or parts of the manuscript. They also thank the audience of the 2015 “Origins” conference organized by COST Action TD 1308, as well as the participants in the UQAM and McGill 2017 conferences where this work was presented. Thanks are also extended to three anonymous reviewers for Synthese for thoughtful suggestions. Research conducted with funding from Canada Research Chair Program (Grant 950-230795), Canada Foundation for Innovation (Grant 34555), and Canada Social Sciences and Humanities Research Council (Grant 430-2018-00899).

Author information

Corresponding author

Correspondence to Christophe Malaterre.

Electronic supplementary material

Supplementary material 1 (PDF 1,004 kb)

About this article

Cite this article

Malaterre, C., Chartier, JF. Beyond categorical definitions of life: a data-driven approach to assessing lifeness. Synthese 198, 4543–4572 (2021).


  • Definition of life
  • Lifeness space
  • Topic-modeling
  • Text-mining
  • Experimental philosophy of science
  • Concept explication