Introduction

The NOW database just turned 25 years old. Its colorful history is described in Chap. 2 (Fortelius et al. 2023). Started as a compilation of the European Neogene fossil record, the NOW database is now fully global, covering all continents, as well as the Paleogene and Quaternary in addition to the Neogene, although not equally densely.

NOW stands for New and Old Worlds. The name was changed in 2012 from Neogene of the Old World to reflect expanding geographic and temporal coverage. The database collects, curates, and presents the global fossil record of mammals from roughly 66 million years ago to the very recent past. The primary source of the data in NOW is the published scientific literature. Harmonizing data from different sources and reconciling diverging interpretations take most of the curatorial effort. For integrity and consistency, NOW strictly follows the scientific interpretations of dedicated experts – the Advisory Board members.

While NOW is not the only database covering fossil mammals (Uhen et al., 2013), it is the only global fossil database dedicated solely to mammals. It aims to include all the classical and well-studied fossil mammal localities known to research. Currently, we believe it encompasses about two-thirds of the entire Cenozoic mammalian fossil record in terms of species occurrences known to research. It does not, of course, record all the individual specimens ever found, as that would be an impossible task, but focuses on the temporal and geographic distribution of taxa and their ecological characteristics. The database does not include mammals or mammalian ancestors from the Mesozoic.

From the start of operation, NOW has put a very strong emphasis on the Advisory Board and the curation of data. Each taxonomic group, geographic area or geological time has dedicated experts, whose roles are similar to those of associate editors in scientific journals. This system allows us to incorporate new scientific insights into the database while ensuring consistency throughout. Rather than being an archival repository of static data snapshots, NOW aims at integration and continuous incorporation of the evolving and expanding knowledge of the research community. In that sense, NOW is not only a database of fossil data, but equally a database of current scientific interpretations. The consistency of curatorial treatment is important not only in taxonomic assignments, but equally in the level of conservatism when compiling faunal lists, age estimation, and selection of localities for reporting. NOW puts a major emphasis on data curation and global integrity of the fossil record. Records will never all reach the same level of accuracy and reliability, but NOW aims to apply the same curatorial principles in handling uncertainties, especially taxonomic uncertainties.

“Why study the fossil record?” is an eternal question that has many answers, which may be obvious to members of the paleontological and related research communities. For outsiders, the main take-away message is that the fossil record allows us to see alternative scenarios of how living worlds could be and to infer principles for how the living world works in general. The fossil record makes it possible to understand patterns of evolution and ecosystem structure that cannot be inferred from the study of living organisms alone. It also provides evidence of ancient climate and geography, and helps to estimate the age of geological formations. Last but not least, the fossil record is the key to understanding how current global biodiversity and ecology developed over geological time.

The greatly impoverished biodiversity of today provides only a very limited view of what was typical of most of the Cenozoic. Paleontological research is thus essential for studying the principles of how the living world works and how continental ecosystems once supported higher mammalian biodiversity. For example, tropical African savannas and forests today include one or two species of proboscideans and rhinocerotids, while over the last 25 million years it was not unusual for many species of equids (Janis, 2023), proboscideans (Huang et al., 2023), and rhinocerotids to occur together across Eurasia, Africa, and North America.

A wide spectrum of scientific questions can be addressed using fossil databases, including inquiries about global and local environmental dynamics in the past, mass and background extinctions, ancestral relationships of species and their diversity, and the response of life to major changes in our planet’s geological past. Global fossil data compilations can be used for analyzing evolutionary patterns, studying ways of life, and reconstructing the evolutionary contexts of individual species, including humans. They also have potential applications in conservation, which define the relatively new field of conservation paleobiology (Barnosky et al., 2017; Dietl & Flessa, 2011; Kiessling et al., 2019). Certainly, many new research questions are yet to be formulated.

The main goal of the NOW database is to summarize and represent the current scientific knowledge of the global research community about the mammalian fossil record in a way that researchers, students, journalists, policy makers and the public all over the world can use. In this way, NOW contributes to recording global geoheritage. It also gives coordinates and information that can help policy makers to locate and delimit important fossiliferous areas for protection.

The Nature of NOW Data

The fossil record is as much about the contexts of remains of past organisms found in sedimentary rocks as it is about the fossils themselves. The NOW database compiles and harmonizes secondary data – data from publications identifying what has been found, where and in what geological contexts. Specimens that we see on display in museums are usually exceptionally well preserved or well restored and often include nearly complete skeletons. However, the majority of mammalian fossil specimens are only fragments of organisms that once lived. Identifying them to taxonomic level and placing them into relevant geological contexts requires extensive specialist training and expertise.

Collection of primary data in paleontology is extremely labor intensive. Finding fossils in the first place requires skill acquired through training and practice, as well as a certain amount of luck. Fossil finds primarily come from targeted searches, and expertise is needed to know where and how to search. Often the target is an area of known exposures, but within that, paleontologists normally range over the outcrops looking for fossils or concentrations of them. Sometimes they target particular taxonomic groups. Collecting methods also differ; for example, most small mammal collections come from screen-washing fossiliferous sediments in field campaigns rather than from systematic excavation.

Not uncommonly though, fossil finds can result accidentally from roadworks and other building activities, probably an increasing trend as human land use intensifies. The excavation and preparation of fossil specimens that may be broken or encased in rock is another labor intensive activity that requires special training. Next, fossils and their geological contexts are studied by experts, identified, interpreted, and described in publications. It may take years or decades for a fossil specimen to become a quality data point with accurate temporal, spatial and taxonomic information. Such data points can be analyzed on their own or as part of larger scale analyses covering extensive geographic areas and time spans.

When many data points are available, regional- to global-scale studies become possible only if scientific interpretations of, e.g., taxonomic assignments, classification of taxa or age estimation of a locality, are reliable and consistent. The NOW database collects, integrates, and harmonizes the mammalian fossil record, aiming for consistency of interpretations. The data in NOW are a precious collection of evidence and knowledge generated over many years by research communities worldwide. They are not just a sample dataset, but to a large extent embrace the known record of mammals in the past.

Is the Fossil Record Biased?

Yes, of course it is. Fossils do not represent all times and all places of the past equally well. In On the Origin of Species, Charles Darwin dedicated more than a full chapter to the imperfection of the geological record (Darwin, 1859): “I look at the natural geological record, as a history of the world imperfectly kept, and written in a changing dialect; of this history we possess the last volume alone, relating only to two or three countries. Of this volume, only here and there a short chapter has been preserved; and of each page, only here and there a few lines.” (pp. 310–311). The fossil record known in Darwin’s day represented a very small part of potentially available material, and he understood that these fossils reflected a very incomplete and fragmentary record of past life (Newell, 1959). With over a century of active collection and documentation of fossil evidence, the global fossil record is now broader and more abundant, and includes increasingly rich contextual information, but still, most certainly, we have only a glimpse of a subset of the former life on Earth that was preserved and is available for study. Even though concerns about the quality of the fossil record are voiced frequently (Benton et al., 2000; Kalmar & Currie, 2010; Kidwell & Flessa, 1996; Saarinen et al., 2010; Seddon et al., 2014; Turner, 2007; Valentine et al., 2006), others argue that as knowledge is cumulative and sampling biases are gradually corrected, gaps in the fossil record are better understood (Benton & Storrs, 1994; Currie, 2019; Newell, 1959; Plotnick, 1993), and major patterns stabilize. Preservational patterns in vertebrate fossils are to some extent consistent over time within known preservation modes (Behrensmeyer et al., 1992), which brings good and bad news. The good news is that potential biases can be quantified and accounted for. The bad news is that what rarely or never gets preserved will remain virtually absent from the record.

Understanding that the fossil record will never be complete raises a broader question: how representative is the knowledge that we derive from this record and how robust are our conclusions about how the living world works in general? Surely, the fossil record does not cover the complete history of life. Even within the mammalian fossil record, which has received focused attention over centuries, not only new species, but new genera of mammals are still being discovered by fieldwork (e.g., Madden et al., 2010; Ríos et al., 2017; Turvey et al., 2018) and even on the shelves of museum collections (e.g., Borths & Stevens, 2019). Yet, the existing fossil record contains pieces of evidence of different kinds – momentary disaster snapshots and long-term attritional accumulations, fissure fillings and lagerstätten (sites with exceptionally rich accumulation or preservation), carnivore traps or cave deposits, anthropogenic base camp and kill site accumulations and even past environments that are otherwise unfavorable to fossilization. Given the variety and vast number of such fragments, even if the fossil record is incomplete in a historical sense, the record is expected to be relatively more complete in representing the past diversity of function and forms. Thus, if many different pieces of evidence are available and those pieces represent different environmental circumstances and preservation contexts, we can hope to reconstruct the functional ecosystems of the past by drawing on overlaps among those pieces.

Data Curation and the NOW Community

At the time of writing, the NOW database contains records on around 16,200 mammalian taxa (not all of which are identified to the species rank), about 6,400 localities, and approximately 68,000 species-by-locality occurrences as publicly available open data, licensed under the Creative Commons Attribution 4.0 license (CC BY 4.0 by The NOW Community).

Species information in NOW is always public. New localities may be initially entered in private mode and released as public data after the faunal lists and the contextual information are curated. Inevitably, some entries in the database will contain errors. Existing data are curated either via targeted revisions of the database or following notifications from users about potential errors. New data chunks are added following larger research projects or data releases in the community, or on an ad hoc basis when new publications come out, typically under initiatives or following pointers flagged by the Advisory Board members. An active community of users helps to keep the database up to date as much as possible. The database does not strive to react immediately to major taxonomic updates or other revisions, but rather takes a conservative long-term approach and allows some time before major revisions within the database, as new perspectives settle in the research community.

The primary authority for any data interpretation and treatment is the NOW Advisory Board, listed on https://nowdatabase.org/now/board/. The Advisory Board consists of the General Coordination and the Management team, the Steering Group, the Coordinators and the Emeritus Board. The General Coordination and the Management team runs NOW on a daily basis. The NOW management team is headed by the General Coordinator and includes the Steering Group, Associate Coordinators, specialists for database infrastructure, and junior data curators. The Steering Group makes strategic decisions and appoints Coordinators. Coordinators have individual dedicated areas of responsibility. The Emeritus Board Members have no dedicated tasks but advise when needed on a range of issues. NOW strictly follows the scientific interpretations and perspectives of the coordinators responsible for taxonomic groups, regions, times, and geological and ecological contexts. Currently NOW has about 100 dedicated coordinator roles. Coordinators are appointed by the Steering Group for a minimum of five years, based on experience, expertise and reputation in the research community. The work of coordinators is similar to that of editors of scientific journals. The main responsibilities of coordinators are twofold: answering queries from data curators when interpretations or edits for small snapshots of data are needed, and monitoring their own areas of responsibility in NOW, flagging potential issues with data and editing data when necessary.

The NOW database is run as a community service on a voluntary basis. The database has no dedicated funding. Over the years, most of the funding used to build and maintain the technical infrastructure and user interface and to carry out large-scale data preparation has come from regular research projects of the NOW Community members.

Appendix 3.1 provides more details on the working procedures via Frequently Asked Questions.

The Database and Data Infrastructure

The database is physically hosted in Helsinki, Finland. The history chapter by Fortelius et al. (2023) provides details on how and why the database came to be hosted there, where it has operated for the last 25 years. The infrastructure consists of the relational database itself and the user interface through which users can access and query the database. The NOW website acts as a gateway to the database and also provides news, lists of advisory personnel, background information, instructions for usage and pointers to the user interface (https://nowdatabase.org/). Currently, the user interface of the database runs on a container platform at the University of Helsinki. The user interface code is written in PHP. The database itself is currently hosted by the Finnish Museum of Natural History. For many years the database ran on MySQL; it now runs on MariaDB.

At the center of the database are the species table, the locality table and the relational table between the two. There are over twenty accompanying tables, of which the most important are shown in Fig. 3.1. Treatment of time in NOW is modular. Only in rare cases when absolute time in years is available (such as from radiometric dating) is the age of a locality entered as a number. In most cases, localities are assigned to time units. Time units can be continental or regional mammal biozones, magnetochrons, geochronological units and more. When the boundaries of time units are updated globally, locality ages in the database adjust automatically. Each locality has a minimum and a maximum age estimate, which do not have to come from the same basis of age: for example, the upper age bound of a locality can be derived from magnetostratigraphy and the lower bound from an assignment to a mammalian biozone.
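The modular treatment of time can be illustrated with a minimal Python sketch. It is not the NOW implementation: the class names, field names, unit names and boundary values below are all hypothetical, and the rule that an absolute date takes precedence over a time unit is an assumption made only for this example. The point it shows is that a locality stores a reference to a time unit rather than a number, so revising a unit boundary once changes the age of every locality assigned to that unit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimeUnit:
    """A named time unit; boundaries are in millions of years ago (Ma)."""
    name: str
    older_ma: float    # start of the unit (larger number)
    younger_ma: float  # end of the unit (smaller number)


@dataclass
class Locality:
    """A locality whose age bounds point at time units or absolute dates."""
    name: str
    max_unit: Optional[TimeUnit] = None  # basis for the maximum (oldest) age
    min_unit: Optional[TimeUnit] = None  # basis for the minimum (youngest) age
    max_abs_ma: Optional[float] = None   # absolute date, if one is available
    min_abs_ma: Optional[float] = None

    @property
    def max_age(self) -> float:
        # Assumption of this sketch: an absolute date wins over a time unit.
        if self.max_abs_ma is not None:
            return self.max_abs_ma
        if self.max_unit is None:
            raise ValueError(f"{self.name}: no basis for maximum age")
        return self.max_unit.older_ma

    @property
    def min_age(self) -> float:
        if self.min_abs_ma is not None:
            return self.min_abs_ma
        if self.min_unit is None:
            raise ValueError(f"{self.name}: no basis for minimum age")
        return self.min_unit.younger_ma


# Hypothetical units with made-up boundaries.
biozone = TimeUnit("Biozone A", older_ma=11.0, younger_ma=9.0)
chron = TimeUnit("Chron X", older_ma=10.5, younger_ma=10.0)

# The two bounds rest on different bases: a chron and a biozone.
site = Locality("Example Site", max_unit=chron, min_unit=biozone)
ages_before = (site.max_age, site.min_age)

# Revising the unit boundary once updates every locality that refers to it.
biozone.younger_ma = 9.3
ages_after = (site.max_age, site.min_age)
```

Here `ages_before` is `(10.5, 9.0)` and `ages_after` is `(10.5, 9.3)`, although the locality record itself was never edited.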

Fig. 3.1 A simplified database scheme, a map of public NOW localities over all ages, and the NOW logo

All entries or edits to the database are referenced and most of the references are scientific publications. However, NOW allows referencing personal communications or unpublished sources if necessary and if those sources give the most reliable information according to the opinion of the responsible coordinator. The opinion of a coordinator within his or her jurisdiction in NOW can be used as a reference. Strictly following the authority of individual coordinators allows consistent treatment and data harmonization within NOW, which, in case of inconsistencies, is prioritized over information derived verbatim from the publications.

The policy of NOW is to provide a conservative species list per locality. This means that in a case where two occurrences of the same genus are reported in the literature, one of which is identified to the species level and the other is species indeterminate, NOW assumes that unless there is strong evidence otherwise, both represent the same taxon. If there is strong evidence that the second occurrence represents a different species, this can be recorded in NOW with the second occurrence highlighted in the species list. Similar conservative treatment applies at higher taxonomic ranks. NOW allows the recording of yet unnamed species at localities. NOW also has an extensive infrastructure for handling synonyms of taxa and localities. Data entry conventions and treatment instructions are available on the NOW website (https://nowdatabase.org/now/conventions/).
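The conservative species list rule described above can be sketched in Python as follows. This is an illustration only, not code used by NOW: the `Occurrence` type, the `distinct` flag standing in for "strong evidence of a different species", and the taxon names are all hypothetical.

```python
from typing import List, NamedTuple

INDET = "indet."  # marker for a record not identified to species level


class Occurrence(NamedTuple):
    genus: str
    species: str
    distinct: bool = False  # True if strong evidence says it is a separate taxon


def conservative_list(records: List[Occurrence]) -> List[Occurrence]:
    """Merge genus-only records into a named species of the same genus."""
    # Genera that have at least one record identified to species level.
    named = {r.genus for r in records if r.species != INDET}
    kept: List[Occurrence] = []
    for r in records:
        # A genus-only record is assumed to be the named species
        # unless it has been flagged as distinct.
        redundant = r.species == INDET and r.genus in named and not r.distinct
        if not redundant and r not in kept:
            kept.append(r)
    return kept


# Hypothetical reports for one locality.
reported = [
    Occurrence("Alphatherium", "primus"),
    Occurrence("Alphatherium", INDET),
    Occurrence("Betamys", INDET),
]
faunal_list = conservative_list(reported)
```

With these inputs the faunal list keeps *Alphatherium primus* and *Betamys* indet., and drops the genus-only *Alphatherium* record. Setting `distinct=True` on that record would keep it in the list as a separate entry.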

Users can export selected data as a flat table, wherein each row is one occurrence of a species at a locality, and columns represent attributes of localities, attributes of species in general and attributes of species by locality. Selected data can be viewed via integrated mapping services. Table 3.1 outlines what major information in addition to species occurrences is available in the database. Users can also get information on the references associated with each entry in the database as well as the history of updates for each entry.
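The flat export corresponds to a join of the locality table, the species table and the relational table between them. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the actual MariaDB instance. The table names, column names and rows are hypothetical and chosen only for illustration; they are not the NOW schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locality (lid INTEGER PRIMARY KEY, loc_name TEXT, country TEXT);
    CREATE TABLE species (sid INTEGER PRIMARY KEY, genus TEXT, species TEXT);
    -- Relational table: one row per occurrence of a species at a locality.
    CREATE TABLE locality_species (
        lid INTEGER REFERENCES locality(lid),
        sid INTEGER REFERENCES species(sid),
        id_status TEXT,  -- hypothetical attribute of the species at this locality
        PRIMARY KEY (lid, sid)
    );
""")

# Made-up example rows.
conn.executemany("INSERT INTO locality VALUES (?, ?, ?)",
                 [(1, "Site One", "Country A"), (2, "Site Two", "Country B")])
conn.executemany("INSERT INTO species VALUES (?, ?, ?)",
                 [(10, "Alphatherium", "primus"), (11, "Betamys", "indet.")])
conn.executemany("INSERT INTO locality_species VALUES (?, ?, ?)",
                 [(1, 10, "certain"), (1, 11, "certain"), (2, 10, "uncertain")])

# Flat table: locality attributes, species attributes and
# species-by-locality attributes side by side, one occurrence per row.
flat_rows = conn.execute("""
    SELECT l.loc_name, l.country, s.genus, s.species, ls.id_status
    FROM locality_species AS ls
    JOIN locality AS l ON l.lid = ls.lid
    JOIN species AS s ON s.sid = ls.sid
    ORDER BY l.lid, s.sid
""").fetchall()

for row in flat_rows:
    print(row)
```

Locality attributes repeat on every row for that locality. That redundancy is the price of a flat table, and it is what makes the export convenient for spreadsheet and statistical tools.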

Table 3.1 Major information available in the NOW database in addition to species occurrences at localities. Stars indicate mandatory fields, which means this information is available for all the entries. Information in the remaining (optional) fields may be sparse

The Future

The future remains to be seen. It is clear that we are entering an era in which the study of large and integrative data sets is becoming an increasingly common practice in science. In paleontology, the treatment of large collections of data has become as important as the careful excavation and study of individual fossil assemblages. Considering that the NOW initiative started at a time in which we could not foresee current developments, the database is surprisingly well-equipped to meet contemporary challenges. The focus on integrity and consistency of data, rather than attempting to catch each new development, has proven to be a successful formula. Because of this down-to-earth approach, the database continues to support research and inquiries into the world of fossil mammals. The NOW database not only provides data but acts as an authority for scientific interpretations of the mammalian fossil record.

From the curatorial perspective NOW is fundamentally like a physical fossil collection at a museum, growing and developing in many ways, some of them goal-oriented and others more random. And like a museum collection, it is indispensable as a reference for scientists. NOW summarizes the knowledge of a research community and the database is curated and refined as knowledge of the community accumulates. In that way NOW is like a “living monograph” of information. More than a simple collection of data, NOW is a community.