Introduction

Detailed understanding of the environment surrounding an archaeological site has always been a major goal in archaeobotanical analyses. Various methods have been applied that take the individual species as a starting point. Indicator species can be defined to make a judgement upon specific environmental properties (e.g. Salicornia europaea: salinity, Sphagnum spp: acidity). Other approaches apply indicator values (e.g. Ellenberg et al. 1991) to all species in a spectrum.

From an ecological perspective, however, a disadvantage of this individualistic approach is that it largely ignores the interplay, or sociology, between plant species. The spatial manifestation and co-occurrence of plant species is known as vegetation. A basic understanding of vegetation is often acquired by grouping species in ‘ecological groups’ based on individual labelling of species such as, for example, ‘arable weed’, ‘grassland species’, or ‘ruderal’ (e.g. Arnolds and van der Maarel 1979).

Another approach is phytosociology, the study of plant communities (syntaxa). Plant communities are defined on the basis of field observations of co-occurrence of species. Although substantial variation in field methods and research density exists, numerous systematic vegetation recordings (relevés) are available in most countries (Schaminée et al. 1995a). In the Netherlands, this adds up to more than 600,000 at present.

The ASSOCIA software package (van Tongeren et al. 2008) assigns plots from field observations to pre-defined and well-established plant communities. Schepers et al. (2013) developed a method to divide an archaeobotanical sample into overlapping species groups. PALAEOASSOCIA more or less treats archaeobotanical datasets as modern plots. A major challenge to overcome is the fact that the vast majority of archaeobotanical assemblages contain plant species from various environmental origins. Essentially, what the method aspires to, is to split this environmental mixture into various groups of taxa that may have grown together, and then assign these ‘subgroups’ to one or more vegetation types in varying degrees of ‘fitting’, which may result in more than one possible syntaxon for one group of species (Fig. 1). However, only the most likely syntaxon for each species group has been used in this study.

Fig. 1
figure 1

Simplified visualisation of the three main steps being used in PALAEOASSOCIA (after Schepers 2014)

The manual formation of groups as seen in the centre of Fig. 1 proved very time-consuming in the 2013 version of the computer program, required substantial practice and experience, and was therefore potentially subject to interpersonal differences. Thus, we felt the need to develop an automated, user-friendly, and faster version and compare the results with manual formation of groups by two experienced and one unexperienced person. A rich dataset from the Southern Netherlands village of Best provided an excellent test set for this further development.

In this paper we describe the developments in the methods and the results of the different methods are compared.

Site and sample description

The village of Best is located in the province North Brabant (Fig. 2). The samples were obtained in preparation of major restoration works on a monumental farm building, ‘De Armenhoef’. Dendrochronological research revealed that the oldest parts of the building date back to 1263, the oldest farm still standing in the Netherlands and probably its wider surroundings. Detailed corings in the barn area revealed the presence of dung-rich layers at various depths (de Kort et al. 2016).

Fig. 2
figure 2

Location of Best on a soil map of the Netherlands

Sampling

A total of 15 samples for archaeobotanical research were obtained from these dung-rich layers in seven corings by archaeologist J.W. de Kort (Cultural Heritage Agency, The Netherlands). Methodological details can be found in de Kort et al. (2016).

As was expected considering the context of the samples, the species found in each individual sample originated from various vegetation types and became mixed during deposition of material from various sources in the sunken byre. The dataset with a large number of species from a large variety of vegetation types was ideally suited for a more detailed analysis by means of an improved automated version of PALAEOASSOCIA.

Methods

Archaeobotanical analyses

The analyses resulted in a rich and diverse dataset, representing 130 unique taxa.

Statistical analyses

The species lists of the 15 samples, different in species richness, were analysed with PALAEOASSOCIA. ASSOCIA, the original program (van Tongeren et al. 2008), as well as PALAEOASSOCIA are based on Bayesian statistics. These are used to estimate the conditional probability that a species list is a random sample from a certain syntaxon. The basic assumption is that this conditional probability is sufficient to discriminate between syntaxa, thus linking a species list to a certain syntaxon. The so-called prior probability of finding a syntaxon in a certain area has, before this study, been neglected in (Palaeo-)ASSOCIA. The conditional probabilities are transformed by taking the natural logarithm and multiplying by -2. Under certain, seldom fulfilled, conditions -2ln(likelihood) values that follow the Chi-squared distribution can be used for statistical tests.

Data preparation

The dataset resulting from the archaeobotanical analyses displays several characteristics typical of archaeobotany, but not occurring as such in the Dutch vegetation data. Uncertain identifications (‘cf’) were accepted as true identifications (and not deleted). Identification groups (e.g. Atriplex patula/prostrata) must be split or deleted as well. In such cases, we decided to split in two species with identical distribution over the samples, combinations of three or more taxa were deleted. Splitting in separate taxa has the advantage that the attribution to plant communities may render one of the two possible species very unlikely. Finally, species can only occur once in a sample, so various plant parts are combined. All numerical values were converted to presence/absence data.

Development of automated methods

The manual construction of species groups, based on the estimated associations between species, following the procedures from Schepers et al. (2013), was complex and time-consuming. This step was thus expected to result in mistakes and differences in outcomes between individuals. To test if this expectation is correct, species lists from four out of 15 samples were processed by manual data processing by the three authors independently. The samples were selected to cover the variation in taxa richness and diversity. The instructions were not to spend more than eight hours for the four samples and to make the largest possible groups of species with the restriction that no pair of species within the groups was associated less than -0.25.

The species groups were assigned to plant communities by the original ASSOCIA program, using minimum weirdness as the criterion (van Tongeren et al. 2008; Schepers et al. 2013). The species groups were checked in two steps. In the first step the association matrix between all species in each species group was checked for values below the threshold. If these were present, species were removed from the group until no pair of species within a group showed an association value below the threshold. In the second step the species belonging to the sample but not to the group under consideration were evaluated; if the association to all species in the group was above threshold the species were added to the group. After this correction the analysis with ASSOCIA was repeated.

All samples were subsequently processed by the program using the first cluster method that was added to the program, single linkage with a constraint on the diameter of clusters, to form species groups. All implemented cluster methods are described in detail later in this section. All evaluations of species groups were done by the original criteria, developed by van Tongeren et al. (2008) and implemented in ASSOCIA. The species groups were compared with all (sub)associations described in Schaminée et al. (1995b, 1996, 1998) and Stortelder et al. (1999), except basal and derivative communities. The (sub)association best fitting to the species group in terms of minimum of -2ln(likelihood), or one of its components, weirdness or incompleteness, can be chosen.

Please note: The minimum of -2ln(likelihood) is reached at maximum likelihood. As defined by van Tongeren et al. (2008) the -2ln(likelihood) can be split into two complementary values, called weirdness and incompleteness.

Data processing

The data processing for use in a palaeobotanical context, partly added to the PALAEOASSOCIA program, consists of four steps of which the last one is not yet implemented in the program and the first one is not necessary for all automated clustering methods:

  1. 1.

    Computation of the association between species.

  2. 2.

    Construction of overlapping clusters around all species.

  3. 3.

    Assignment of all clusters found in subsequent steps of the clustering to vegetation units, (sub)associations of The Vegetation of the Netherlands (Schaminée et al. 1995a, b, 1996, 1998; Stortelder et al. 1999).

  4. 4.

    Evaluation of the results is done in an Excel file. Because steps two and four of this process have been developed or changed substantially, they are described in more detail in the following.

Step 2: clustering the species

Clusters are constructed using modifications of the well-known clustering algorithms single linkage and complete linkage as well as a modification of single linkage in which the diameter of clusters is constrained (van Tongeren 1986). Each species is used as a starting point for a growing cluster and the other species are added to this cluster in the order determined by the definitions of distance between a cluster and single objects (species) in single linkage (nearest neighbour) or complete linkage (furthest neighbour). This process is illustrated in Fig. 3.

Fig. 3
figure 3

Description of the clustering process in diagrams. a. Nine species, numbered in the order in the dendrograms (b, c), are displayed in a two-dimensional diagram, like an ordination diagram. Associations between species are transformed into distances. Short distances represent co-occurrence, long distances avoidance. Each species is surrounded by a circle, representing a threshold value. b. Single linkage dendrogram. c. Complete linkage dendrogram

The threshold value is represented by a dashed line in the dendrograms. See text for full description of the clustering process and of the dashed magenta circles

In the tedious process of manual clustering (Schepers et al. 2013), theoretically all species combinations showing overlapping areas between two or more circles can be found, for example {1,2,3}, {3,4,5}, {6,7,9}, {7,8,9} etc.

One of the nine dendrograms for single linkage clustering is displayed in Fig. 3b. This example dendrogram of single linkage clustering (starts at species number 8 (red)). Using the threshold within cluster distance between species as displayed, the final result, starting at any of the species, will be that all species are in one cluster. However, applying a smaller threshold, representing a higher between-species association value, could separate {1,2} from {3,4,5,6,7,8,9} or even further split the larger one. Single linkage may lead to chaining, i.e. the formation of long clusters.

Six dendrograms of complete linkage (Fig. 3c) start at species 1 (blue) or 2 (purple); 3 (black) or 4 (grey); 6 (olive) or 7 (light green); and 8 (red) or 9 (brown). During the formation of the cluster starting at 5 (orange), species 3 and 4 are also added, but the order of the fusions and the distances are not correctly displayed. The distance between a cluster and a species is the largest of the distances between that species and the species of the cluster. Fusions at distances above the threshold (at lower between-species association values) are not allowed, therefore 7 (olive green) nor any other of the remaining species can be added to cluster {3,4,5} for example. Complete linkage using the threshold in the example therefore leads to the overlapping clusters {1,2,3}, {3,4,5,7}, {6,7,3} and {7,8,9}. Complete linkage generally leads to more or less spherical clusters.

The third method (constrained single linkage, not shown separately, is like single linkage, but, when a fusion of a species with the group is not allowed because of the distance between this species and another species in the group, tries to find another species, further from the cluster according to the definition of distance, but close enough to all other species in the group) was the first to be added to the program and was used for the comparison with the manual clustering. In this third method, species leading to exceedance of the maximum cluster diameter are excluded from addition to the cluster. This may produce different clusters or lead to a different order in the addition of the species when compared to the other two methods.

None of these three cluster methods, however, leads to the species groups indicated by the dashed magenta circles in Fig. 3: {4,3,7} and {6,7,9}. These are valid groups because they meet the criteria about the threshold value. These two groups could have been found by applying the manual method.

Finally, a new and completely different clustering algorithm has been added, where each choice for species to be added is based on a modified maximum likelihood criterion. Starting at each species again, all possible additions of species are evaluated by choosing the one that leads to the best group in terms of likelihood. In this modified maximum likelihood [minimum -2ln(likelihood)] criterion, species not present in the sample are not contributing to the incompleteness value for the species groups. In the context of PALAEOASSOCIA this method is called likelihood clustering. This method is designed to minimize -2ln(likelihood), and thereby minimizes weirdness, but also incompleteness as far as species in the sample are involved. This algorithm is expected to perform better than the three mentioned above.

Evaluation of the results of the subsequent steps during the formation of clusters starting at each species

All species groups subsequently found in all cluster procedures can be evaluated by the standard procedures of ASSOCIA by setting weights for the contributions of present and absent species to the combined index. During the growth of the clusters various (sub)associations may be assigned. The best solution, usually based on normalised weirdness, is chosen as the (sub)association related to the species that started the cluster. Several species may lead to the same (sub)association. The sample is considered to originate from a mixture of the (sub)associations related to its species.

Problems leading to new developments

Using these methods we found syntaxa that are not very likely to have occurred in the surroundings of Best. Synbiosys (Hennekens et al. 2010) was used to produce distribution maps of the Netherlands for all observed syntaxa. Based on these maps we excluded several syntaxa from our analysis (Fig. 4). This is a simple way to let roughly estimated prior probabilities (value either 0.0 or 1.0), influence the outcome of the analyses.

Fig. 4
figure 4

Spatial distribution of all syntaxa excluded from the analysis. Note that soil, see Fig. 2, is one of the most important factors determining where syntaxa can occur

Comparing the four automated methods

The four final groupings were compared using a simple test based on the binomial distribution. For important parameters in each of the samples the minimum and maximum value was determined. For each method the number of times its value was equal to the minimum or the maximum was counted. The null hypothesis for the tests is either that the number of times one of the four methods shows the maximum value or the number of times this method shows the minimum value is a binomially distributed number in the range 0–15. Since we have four methods the probability for each method, assuming randomness, is 0.25 and the expected number of times the method leads to the maximum or minimum value is 3.75. If there is a clear idea if either the maximum value or the minimum value expresses the best result, one of the two one-sided tests is chosen. However, the tests for both sides were always done to be able to generate hypotheses for future research. The hypotheses tested are:

  • The group sizes are larger for Likelihood-clustering (L) and smaller for the cluster methods based on the association matrices (modified complete linkage, C, modified single linkage with constraint, SC, and modified Single linkage, S).

  • The average (mean) values of the absolute values of the normalised indices are smaller for L-clustering and larger for the methods based on the association matrices of the species. The averages of the absolute values are used in this case, because the expected value of the mean of the absolute values is always 0.

  • The standard deviations of the normalised indices are smaller for L-clustering than for the methods based on the association matrix. Here the absolute values are not used, because the values including the sign lead to a better indication of the standard deviation (spread) in the scores on these parameters.

Results and discussion

A full overview of the identified syntaxa can be found in ESM 1. In the following sections, we discuss the results of the methodological elements explored in detail in this paper.

Manual grouping and clustering with constrained single linkage

The manual clustering led to different solutions for the three authors, especially in samples rich in species. None of the authors achieved a flawless result in the context of our agreements on how to make groups of species within restricted time. Rather often more species could be added to the groups. Also other errors, like adding species with low association to other species in the group, occurred. Notwithstanding the differences observed, the general outcome of the three authors as well as the automated ASSOCIA results are highly comparable in terms of syntaxa recovered. The largest variation is the addition of extra species to groups, still leading to the same syntaxa. Weirdness can increase in this process, when the extra species makes the group more atypical for the syntaxon identified. The automatized output, however, is no longer subjective and saves hours of labour in manually sorting.

Table 1 shows the differences between the results of the manual processing by the three authors and the results of single linkage with constraint (SC), using the arbitrary threshold of -0.25 as used by Schepers et al. (2013)

Table 1 Suggested syntaxa for the four samples (8, 25, 35 and 43) by method and summarized for the four samples. Syntaxa are arranged according to their syntaxon number. The colour scheme in the column with the syntaxon names follows the field guide of Dutch plant communities (Schaminée et al. 2010): blue = wetlands; light green = grasslands; orange = pioneer; dark green = woodlands. The numbers show the number of different species groups that were assigned to the syntaxa in each of the samples. The grey shaded numbers show the number of samples in which a syntaxon appears for each method. Methods: (1) M with errors, the manual method before corrections, (2) M corrected, after correction of the species groups, (3) SC, single linkage with constraints; the scores for the individual samples are included in the ESM

Note that most associations were found with both methods. The manual method, after correction, resulted in the same number of types as Single Linkage with constraint (SC). After the correction, each of these methods leads to one unique syntaxon. All other syntaxa are found with both methods. However there are differences in the samples where the syntaxa have been found.

Comparing subsequent steps during clustering

Addition of the next species to a species group generally leads to higher weirdness and lower incompleteness. If, however, the assigned (sub)association is different from the (sub)association assigned at the previous step, the contributions of each species to weirdness and incompleteness change, because their presence is different in different types. In case of such a change in type it is possible that either weirdness decreases or incompleteness increases. Single linkage, constrained single linkage and complete linkage start at the association matrix of the species, which is biased as a consequence of the way it is estimated. Likelihood clustering was developed to optimise the likelihood criteria and therefore is expected to perform better in terms of these criteria, which are also used to judge the results in the end.

Table 2 shows the results of the comparison between automated methods. The complete results, including the values for each sample are provided as ESM 2.

Table 2 Results of the comparison between the automated methods

The hypothesis tested for different parameters (white cells in the test results) is that L-clustering performs better than the cluster methods based on the between species association matrix. Column group size mean shows that it is very unlikely that the number of cases where L-clustering shows the largest groups is so large by chance. The conclusion is that method L produces larger clusters, when evaluated by the same criterion, normalised weirdness, than the other methods.

There was not any prior idea about the standard deviation of the group sizes. Based on the results of the tests the hypothesis is generated that the groups of likelihood clustering are more equal in size than those of the other clusterings or that the same groups are formed when starting from different species. This is confirmed, but not proven, by the fact that likelihood clustering has the maximum number of types that are found more than once and the lowest number of syntaxa that are found once.

Also for -2ln(likelihood) there is no a priori hypothesis because -2ln(likelihood) generally increases with increasing group size, because of the increase in weirdness, and gets lower when a species is added that is frequently occurring in the syntaxon that is identified. There is no explanation for the only significant value in the column of mean (-2ln(likelihood)). One could think that the standard deviation of -2ln(likelihood) is lower for constrained single linkage than for the likelihood clustering.

The six columns on the right side of the table are about parameters that have been normalised and the standard deviations in their values (absolute values for the mean, the value including its sign for the standard deviations).

General discussion

The results make clear that taking every individual sample as a starting point, and not the assemblage as a whole, allows for a more detailed and nuanced understanding of the vegetation diversity represented by the archaeobotanical samples. This is particularly valuable in cases where a larger number of samples with a high number of taxa is available, as was the case for Best. As mentioned in the introduction, the earlier version of PALAEOASSOCIA was very time consuming, which makes it problematic to apply in larger projects.

Another disadvantage of the earlier version was that manually constructing species groups by ordering elaborate association matrices is a likely source of personal errors, and resulted in differences when carried out by different people. These complex steps have now been fully automated. At the same time, the association matrices, displaying co-occurrence values between the different taxa identified in a sample, are still available as output of the software. These allow for a more in-depth direct visual understanding of the calculated associations between taxa.

PALAEOASSOCIA explicitly uses the uniformitarian assumption, or actualism, applying extensive modern vegetation data as a key to the past (see Schepers et al. 2013 and references therein). A major strength of the approach is the vast amount of reference data available for the analyses. The systematics within phytosociology, however, and the currently accepted syntaxa, are not static in themselves because of ongoing research. This mainly affects lower syntaxonomical levels, such as subassociations and basal and derivative plant communities. The latter two are supposed not to have existed in the past and therefore were excluded from the results from the very beginning of this study. The level to which using present-day plant communities is ‘safe’ or valid cannot objectively be defined. The individual researcher using PALAEOASSOCIA should decide which level to use. However, using lower, more precise levels does have a higher potential for showing differences between samples.

Using the manual method there is no way to decide beforehand which threshold should be chosen for the minimum required association between species. Every additional chosen threshold can be evaluated afterwards, but requires more time for the manual processing of each sample. When using clustering methods it is feasible to evaluate the clustering step by step and therefore to decide which species group, starting at a certain species of the sample, is the optimal choice (depending on the chosen criterion for evaluation) for the group starting at that species.

Generally, the species list of archaeobotanical samples is incomplete, which led to the decision to use weirdness, instead of maximum likelihood, to find the most optimal solution in Schepers et al. (2013). The newly developed method of likelihood clustering also optimises the incompleteness of the subsamples. The criterion for the choice of the next species that should be added is not yet available for evaluation criteria. For reasons of comparability weirdness was chosen as optimality criterion again. However, because weirdness increases with the addition of species, the normalised version of this criterion was used in this study to choose which of the growing groups starting from each species is the best group for this species. Non normalised weirdness would have led to species poor subgroups and to incomparability between nested species groups that are assigned to different syntaxa.

PALAEOASSOCIA does not replace other methods to address the ecological implications of archaeobotanical data. It serves as an additional tool to reach a more detailed interpretation. The results generated by the analysis should always be interpreted within the archaeological and physical-geographical context of the data, and preferably in combination with other bioarchaeological proxies. There is no guarantee that all possible syntaxa are found with any of the methods, even not when the results of all methods are combined.

Conclusions

The revised form of the PALAEOASSOCIA package makes it easier to use and removes the most complex manual steps previously included in the process. Thus, a large number of samples can be processed much faster and the outcomes are no longer dependent on personal experience with the package.

However, there is no guarantee that every feasible species group will be found. A thorough inspection of the results, for example by interpretation of the extended diagnoses the program can give, is still necessary. The program should be provided with the tools to find the best subgroups automatically and with the possibility to feed it with prior probabilities. Thus, it will make a complex form of vegetation analyses available to archaeobotanists with less experience in phytosociology.