GenTag: a package to improve animal color tagging protocol

The individual identification of animals by means of tagging is a common methodological approach in ornithology. However, several studies suggest that specific colors may affect animal behavior and disrupt sexual selection processes. Thus, methods to choose color tagging combinations should be carefully evaluated, yet this information is usually not reported. Here, we introduce GenTag, an R package developed to help biologists create color tag sequence combinations through a random process. First, a single color tag sequence is generated by an algorithm selected by the user, and then the uniqueness of the combination is verified. We provide three methods to produce color tag sequences. GenTag provides accessible and simple methods to generate color tag sequences. The use of a random process to define the color tags applied to each animal is the best way to deal with the influence of tag color on behavior and life history parameters.


INTRODUCTION
The individual marking of animals in natural populations is a widespread methodological approach for field ecologists and provides the foundation for several methods to determine population size, lifespan, animal movements in the landscape, and migration patterns, among other possibilities (Sutherland 2006, McCrea & Morgan 2014). For birds, the most popular marking method is the application of color rings (Calvo & Furness 1992), in which individuals receive unique combinations of color bands that allow researchers to identify them individually by recapture or by observation at a distance.
Since the early 1980s, several studies have reported an influence of color tags on bird social behavior (e.g., Burley 1986). Because of methodological limitations, most of these studies were carried out with captive populations (e.g., Burley 1986, Jennions 1998). The few studies conducted in the field suggest that some patterns of color tagging influence social behavior (e.g., Zann 1994, Johnsen et al. 1997, 2000). Tag colors between 600 and 700 nm, in the warm range of the spectrum (e.g., yellow, orange, and red), appear to influence behavioral contexts associated with conspecific preferences, reproductive investment (Zann 1994, Gil et al. 1999), offspring sex ratio (Burley 1986), dominance behavior (Cuthill et al. 1997), mate-guarding (Johnsen et al. 1997), and levels of cuckoldry (Johnsen et al. 2000).
Despite the negative effects of tags on animal behavior and survival (Calvo & Furness 1992), and although 40% of research projects use color rings to identify birds, 98% of publications do not mention the possibility of injury and reduced survival due to tags (Alisauskas & Lindberg 2002). The potential for injury differs between taxonomic groups (Sedgwick & Klus 1997, Pierce et al. 2007, Nietmann & Ha 2018) and with animal body size (Griesser et al. 2012). Notwithstanding these issues, color tags are still very popular, mainly because of their lower cost compared with other identification methods such as radio transmitters or PIT tags (Schlicht & Kempenaers 2018). On the other hand, there is no evidence of changes in predation rate due to tags (Cresswell et al. 2007), and even leg flags do not substantially increase predation (Weiser et al. 2018).
Given these considerations, we emphasize that the methodology used to choose color tagging combinations should be carefully evaluated during project development and subsequently reported in the methods section of publications. However, this information is rarely reported. When developing their tagging methodology, investigators may unconsciously select more conspicuous colors, because these allow faster identification. To avoid such biases, field ecologists should adopt a randomized strategy to determine tag colors and their combinations. We suggest that the best way to deal with the possibility of tag color influence is to generate a list of color tag combinations before tag application and to follow this list regardless of the individual characteristics of each animal (e.g., body size, percentage of feather coverage, color).

PACKAGE DESCRIPTION
The genseq function is the main function of the GenTag package for creating a list of color tag sequences (functions are summarized in Table 1). First, a single sequence is generated by an algorithm, and then its uniqueness is confirmed. Previously used sequences can also be included in the uniqueness test. Users can request sequences that include specific tags, such as metal or flag bands for numbered tagging. This function was designed to generate sequences with equal numbers of tags. To create sequences with different numbers of tags, it is necessary to use "EMPTY" as a special color code representing the absence of a tag, and to set the parameter emptyused to TRUE (see the genseq help for details on usage). In this scenario, genseq takes into account synonymous combinations involving the "EMPTY" code; for example, "EMPTY"-"Red"-"Blue" is synonymous with "Red"-"EMPTY"-"Blue" and "Red"-"Blue"-"EMPTY".
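The uniqueness test with "EMPTY" synonyms can be sketched in base R as follows. This is not GenTag's internal code: canonical_key(), is_synonym(), and is_unique() are hypothetical helpers, and the sketch assumes that two sequences are synonyms exactly when, within each group of positions (e.g., one leg), they contain the same real tags in the same relative order, differing only in where the "EMPTY" codes sit.

```r
# Hypothetical helpers (not part of GenTag).
# Canonical form: within each group, drop EMPTY codes, keep the remaining
# tags in their original order, then pad the group with EMPTY at the end.
canonical_key <- function(tags, groups, emptyname = "EMPTY") {
  parts <- vapply(groups, function(pos) {
    in_group <- tags[pos]
    real <- in_group[in_group != emptyname]
    paste(c(real, rep(emptyname, length(pos) - length(real))), collapse = "-")
  }, character(1))
  paste(parts, collapse = "|")
}

# Two sequences are synonyms when their canonical keys match
is_synonym <- function(a, b, groups, emptyname = "EMPTY") {
  identical(canonical_key(a, groups, emptyname),
            canonical_key(b, groups, emptyname))
}

# A candidate passes the uniqueness test only if no previously used
# sequence (one per row of 'used') shares its canonical key
is_unique <- function(candidate, used, groups, emptyname = "EMPTY") {
  used_keys <- apply(used, 1, canonical_key, groups = groups,
                     emptyname = emptyname)
  !(canonical_key(candidate, groups, emptyname) %in% used_keys)
}

legs <- list(c(1, 2), c(3, 4))   # positions 1-2 = left leg, 3-4 = right leg
is_synonym(c("EMPTY", "Green", "Red", "Blue"),
           c("Green", "EMPTY", "Red", "Blue"), legs)     # TRUE
used <- rbind(c("Green", "EMPTY", "Red", "Blue"))
is_unique(c("EMPTY", "Green", "Red", "Blue"), used, legs)  # FALSE
```

Comparing canonical keys, rather than raw sequences, is what lets a single lookup catch every synonym at once.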
Although there is no evidence of conspecific preferences based on number of tags (Jennions 1998), we recommend avoiding applying different numbers of tags to individuals in the same population, as this may generate confusion in identification. Some animals may actively remove tags, and some colors appear to have a higher rejection rate than others (Kosinski 2004), leading researchers to misidentify individuals in the field.
Sequences are created by a user-selectable algorithm that samples among tag colors. We provide three algorithms: "All equal" creates combinations of tags in which all colors have the same probability of being sampled; "Variable frequency" creates combinations of tags using a different, user-defined probability for each color; and "Life expectancy" restricts sampling based on previously used color combinations, so that all colors become represented at similar frequencies in the natural population under study. The latter algorithm requires information on all previously used combinations and the dates on which the tags were applied, and it can be improved by providing an estimate of survival probability and lifespan. The routine first estimates the number of tags of each color remaining in the natural population; when survival probability and lifespan are provided, they are used to discount the tags lost through individual mortality. The sampling ratio for each color is then determined by Equation 1. The speed parameter ranges from 0 to 1 and can be used to relax the restriction in the sampling procedure. When speed is set to 1 (the default), the most used color is not sampled in any combination, and other colors that occur in many combinations are only rarely sampled. Intermediate values allow combinations with commonly used colors, but with some degree of restriction. When speed is set to 0, no adjustment is made. The user can also select colors to be ignored in the sample adjustment (see the lifexp help for more details).
In Equation 1, r is the sampling ratio for a given color, c is the estimated number of remaining tags of that color, m is the estimated number of remaining tags of the most used color, and s is the speed.
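The "Life expectancy" idea can be sketched in base R as below. This is an illustration, not GenTag's lifexp code. Two points are assumptions: remaining_tags() (hypothetical) weights each bird's tags by yearsurvival^age and drops birds older than lifespan; and, because Equation 1 is not reproduced here, sampling_ratio() (hypothetical) uses a linear form r = 1 - s * c / m. That form was chosen only because it matches the behavior described above (s = 1 gives the most used color r = 0; s = 0 leaves every r = 1) and may differ from the package's formula.

```r
# Hypothetical sketch of the "Life expectancy" adjustment (not GenTag's code).
# used: one row per tagged bird, one column per tag position
# year: tagging year of each row
remaining_tags <- function(used, year, colors, currentyear,
                           yearsurvival = 1, lifespan = Inf) {
  age <- currentyear - year
  # Expected fraction of each bird still alive; zero beyond the lifespan.
  # Defaults (survival 1, infinite lifespan) mean animals never die.
  alive <- ifelse(age > lifespan, 0, yearsurvival^age)
  vapply(colors, function(col) sum(rowSums(used == col) * alive), numeric(1))
}

# ASSUMED linear form of Equation 1: r = 1 - s * c / m
sampling_ratio <- function(c_remaining, speed = 1) {
  m <- max(c_remaining)
  if (m == 0) return(rep(1, length(c_remaining)))  # nothing to adjust
  1 - speed * c_remaining / m
}

used <- rbind(c("Red", "Red", "Blue", "Green"),
              c("Red", "Blue", "Green", "Green"))
year <- c(2017, 2018)
rem <- remaining_tags(used, year, c("Red", "Blue", "Green", "Yellow"),
                      currentyear = 2019, yearsurvival = 0.8, lifespan = 5)
rem                   # Red 2.08, Blue 1.44, Green 2.24, Yellow 0
sampling_ratio(rem)   # Green (most used) gets 0, unused Yellow gets 1
```

The resulting ratios could then be passed as the prob argument of sample() to bias new sequences toward underrepresented colors.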
Our algorithms were developed for three situations in which researchers: i) require an unbiased color tag generator, in which all colors are equally represented in the combinations produced; ii) have restricted availability of some colors, a common occurrence, for example, when researchers receive donated color tags; and iii) have ongoing studies, detect a possible bias in color representation, and need a quick adjustment so that all colors become equally represented in the natural study population.
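The difference between situations i) and ii), that is, between the "All equal" and "Variable frequency" algorithms, can be sketched with base R's sample(). This is a minimal illustration, not GenTag's implementation. The helper draw_sequence() is hypothetical, and it assumes each tag position is drawn independently, so a color may repeat within a sequence.

```r
# Hypothetical helper (not part of GenTag): draw one color tag sequence.
# weights = NULL    -> "All equal": every color has the same probability.
# numeric weights   -> "Variable frequency": colors drawn in proportion to weights.
draw_sequence <- function(colors, ntag, weights = NULL) {
  # sample() rescales 'prob' internally, so raw tag counts can be used as weights
  sample(colors, size = ntag, replace = TRUE, prob = weights)
}

tcol <- c("Black", "Blue", "Red", "Yellow")
set.seed(42)
draw_sequence(tcol, ntag = 4)                            # All equal
draw_sequence(tcol, ntag = 4, weights = c(1, 1, 5, 1))   # Red favored

# Over many draws, observed frequencies approach the weights' proportions
draws <- replicate(2000, draw_sequence(tcol, 4, weights = c(1, 1, 5, 1)))
round(prop.table(table(draws)), 2)
```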

RECOMMENDATIONS
For new studies, we recommend the "All equal" algorithm, because it ensures that all colors are equally represented in the study population. For ongoing studies, we recommend either the "All equal" or the "Life expectancy" method. Over a long period, "All equal" will gradually balance tag color representation as previously tagged animals die. For a fast adjustment, "Life expectancy" is more appropriate, since it changes the sampling probabilities based on differences in previously used color frequencies. Neither method assumes limited color tag availability. Whenever color tag availability is limited, we recommend the "Variable frequency" algorithm, to obtain the maximum number of combinations from the tags currently available. Using different numbers of tags is a way to save tags, since it increases the number of possible combinations while using fewer tags. We recommend avoiding this procedure, because it may result in misidentification if an individual loses or removes a tag (Kosinski 2004). Furthermore, when the same number of tags is used on every individual, tag weight is equivalent across animals, regardless of any possible effect of color.
In Appendix I, we provide a tutorial on how to generate the list of color tag combinations for both new and ongoing studies, and we illustrate how to apply the three methods to generate color tag sequences.
The GenTag package provides accessible and simple methods for ecologists and field researchers to generate color tag sequences. The use of a random process to define the color tags to be applied to each animal is the best way to deal with the influence of tag color upon behavior and life history parameters in general. We highlight that the method used to choose color tagging combinations should be carefully evaluated and reported in the methodology section of publications. The GenTag package provides a straightforward and flexible way to deal with tagging effects on natural populations under study.
GenTag is written in the R programming language (version 3.5.0) and can be run on Windows, Mac OS X, and Linux systems. There are no package dependencies in the current stable version (version 1.0). It can be installed from CRAN (https://cran.r-project.org/web/packages/GenTag/), and a development version can be found on GitHub (https://github.com/biagolini/GenTag).

INTRODUCTION
This tutorial illustrates how to use the GenTag package to improve bird color tagging protocols. We provide examples and advice based on our experience with bird field surveys. The theoretical background of the available methods, presented in the main paper, should be consulted before following this tutorial. This tutorial was written for R beginners; however, it requires a minimal knowledge of how R works (the user must know what an object and a working directory are, and how to apply functions).

Choose parameters to generate sequences
The first step is to determine three fundamental parameters: i) the number of tags that each bird will receive; ii) the colors to be used; and iii) the algorithm to be used. The first two parameters determine the number of possible combinations that can be created for color tagging. The maximum number of color tag combinations is given by the formula:

$$M_{comb} = N_{colors}^{Tag}$$

where M_comb is the maximum number of unique color tag combinations, N_colors is the number of available colors, and Tag is the number of tags used on each animal.
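The formula can be checked numerically in base R. The sketch below assumes, as the formula does, that every tag position can hold any of the available colors (repeats allowed). It compares the closed form with a brute-force enumeration built with expand.grid(). The function name max_combinations() and the seven-color vector are illustrative.

```r
# Maximum number of unique combinations: M_comb = N_colors ^ Tag
max_combinations <- function(ncolors, ntag) ncolors^ntag

max_combinations(7, 4)    # 2401
max_combinations(10, 4)   # 10000 (ten colors, as in the tcol object below)

# Brute-force check: enumerate every ordered sequence of 4 positions,
# each position taking any of 7 colors
colors7 <- c("Black", "Blue", "Green", "Red", "White", "Yellow", "Pink")
all_seqs <- expand.grid(rep(list(colors7), 4), stringsAsFactors = FALSE)
nrow(all_seqs)            # 2401
```

Because the count grows as a power of the number of tags, adding one tag multiplies the number of combinations by N_colors, while adding one color raises every factor of the product.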
Thus, each additional color has a large impact on the number of possible combinations. Therefore, to achieve a large number of possible combinations, the researcher should use as many colors as possible. The choice of colors depends on several factors. First, similar colors, such as white and light blue, should be avoided, because under natural conditions (e.g., sunlight, dust) tags of similar colors can become impossible to tell apart during focal observations, even with binoculars. Conspicuous bands make visual identification easy; however, they can affect social behavior and the probability of the bird being detected by a predator. Using band colors similar to the bird's plumage or leg color reduces this impact.
The number of tags used on each animal also has a large impact on the number of possible combinations. However, it should be kept to a minimum, because color tags generally have a negative effect on birds. There is no rule of thumb for the number of tags to use on each animal. The decision should be based on the number of tags needed to cover the expected sample size, while taking into account the effects on bird behavior and survival. For instance, too many tags can impair flight, color rings can catch on vegetation and lead to the bird's death, and tag colors may disrupt social behavior. We suggest that a good starting point is to use four tags per animal, two on each leg. Using four tags with seven available colors yields 7^4 = 2,401 unique combinations.
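Basing the number of tags on the expected sample size can be made concrete with a small calculation. The helper min_tags() below is hypothetical. It assumes the combination formula above and returns the smallest number of tags per animal whose combinations cover the number of animals to be tagged. It does not account for special tags such as metal bands, or for combinations reserved for future field seasons.

```r
# Hypothetical helper: smallest number of tags per animal such that the
# maximum number of combinations (ncolors^ntag) covers the expected
# number of animals to be tagged.
min_tags <- function(ncolors, nanimals) {
  stopifnot(ncolors >= 2, nanimals >= 1)   # with 1 color the loop never ends
  ntag <- 1
  while (ncolors^ntag < nanimals) ntag <- ntag + 1
  ntag
}

min_tags(7, 300)    # 3, since 7^2 = 49 < 300 <= 7^3 = 343
min_tags(7, 2000)   # 4, since 7^3 = 343 < 2000 <= 7^4 = 2401
min_tags(10, 2000)  # 4, since 10^3 = 1000 < 2000 <= 10^4
```

An integer loop is used instead of ceiling(log(n) / log(ncolors)) because the logarithm ratio can land just above an exact integer in floating point and overshoot by one tag.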
In terms of the best algorithm for color sampling, we recommend the "All equal" method. This method is designed to produce unbiased color sequences, in which all colors are equally represented in the combinations generated. It is recommended for both new and ongoing studies because, over the long term, it ensures that all colors are equally represented in the study population. However, if new tag colors are introduced into the study, researchers should consider the "Life expectancy" method. Finally, to obtain the maximum number of combinations from a restricted number of available tags, we recommend the "Variable frequency" method. This latter method is useful when color tags were donated, a common occurrence in laboratories with several ongoing field surveys. Each sampling method is described in the main paper.

Generate color tag sequences
In this section, we show how to apply the main functions of GenTag and provide all the code necessary to use it. For a minimal working knowledge of R, we recommend Crawley (2012). Make sure commands are typed exactly as shown, because R is case sensitive. The first step is to install and load GenTag. You can follow this tutorial by typing the following commands at the R prompt:

install.packages("GenTag")
library("GenTag")
You must create an object to hold the names/codes of the available colors. Make sure that all color names are typed exactly the same way throughout your database. For instance, if you type "Green", "green", and "GREEN", R will treat these as different colors/codes.

tcol <- c("Black", "Blue", "Brown", "Gray", "Green", "Pink", "Purple", "Red", "White", "Yellow")

At this point you can create your first list of color tag sequences. For this first example, we use the "All equal" algorithm. Use the function genseq to create combinations: the argument ncombinations determines the number of combinations to be produced, ntag the number of tags used on each animal, and colorsname the available colors to be sampled (i.e., the object created in the previous step):
genseq(ncombinations = 30, ntag = 4, colorsname = tcol)

If you have any difficulty applying a function, consult its help documentation with the help command:

help(genseq)
# or just type ?genseq

Note that in this example we did not specify the algorithm used to generate the color sequences. In this situation, genseq automatically uses the default "All equal" algorithm. If a different algorithm is desired, it must be specified with the argument gen_method, as shown below. Note also that in this example, previously used combinations were not taken into account in the uniqueness test. Thus, the call above can regenerate previously used combinations, leading to duplicates in your database.
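When combinations have been generated without previously used records, one way to check them afterwards is to compare each generated row with those records. The sketch below uses only base R and small hypothetical data frames (the column names V1 to V4 are illustrative). It compares sequences position by position and, unlike genseq with emptyused = TRUE, does not recognize "EMPTY" synonyms.

```r
# Hypothetical check (base R only): one key string per row, e.g. "Red-Green-Blue-White"
seq_key <- function(m) apply(as.matrix(m), 1, paste, collapse = "-")

used <- data.frame(V1 = c("Red", "Blue"),  V2 = c("Green", "Red"),
                   V3 = c("Blue", "Black"), V4 = c("White", "Pink"),
                   stringsAsFactors = FALSE)
new  <- data.frame(V1 = c("Red", "Black"), V2 = c("Green", "Blue"),
                   V3 = c("Blue", "Red"),   V4 = c("White", "Pink"),
                   stringsAsFactors = FALSE)

# TRUE marks a generated sequence that was already used
already_used <- seq_key(new) %in% seq_key(used)
already_used              # TRUE FALSE
new[!already_used, ]      # keep only the new, unused sequences
```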
There are several ways to import data into R; one of them is shown in Fig. 1. In this tutorial, we use simulated data of previously used combinations provided within the GenTag package:

data(pre_used) # Load example data

In this example, the data are stored in an object named pre_used, which is a data frame. Information in a data frame can be accessed in various ways. To see what the pre_used object contains, type the following code to display the first rows of the data frame:

head(pre_used)
This data frame contains five columns: the first four are the colors in each sequence (in the order upper left, bottom left, upper right, bottom right), and the last is the year in which each combination was used. These records can be used to avoid regenerating previously used sequences. Set the argument usedcombinations to the object (data frame or matrix) that contains the color tag records (columns 1 to 4 in the example):
genseq(ncombinations = 30, ntag = 4, colorsname = tcol, usedcombinations = pre_used[, 1:4])

To create sequences that contain special codes, such as a metal band for numbered tagging, set the argument nspecial to the number of special codes, and use the arguments name1 and location1 to specify the tag code and the positions where that special tag can be placed. In the following example, one metal tag is used on every bird, in position 2 or 4 (bottom left or bottom right):
genseq(ncombinations = 30, ntag = 4, colorsname = tcol, nspecial = 1, name1 = "Metal", location1 = c(2, 4))

Special codes can also be used to create combinations with different numbers of tags. In this situation, a special "color" named "EMPTY" serves as a proxy for the absence of a tag. Using different numbers of tags raises two problems: i) misidentification of individuals in the field, since some animals can actively remove tags; and ii) many synonymous combinations; for example, with two tags on each leg, "EMPTY"-"Green"-"Red"-"Blue" is synonymous with "Green"-"EMPTY"-"Red"-"Blue". To adjust the uniqueness test for codes with "EMPTY" data, set the argument emptyused to TRUE, specify the code used as the non-tag proxy in the argument emptyname, and define which positions belong to the same group (e.g., tags applied on the same leg) with the arguments g1, g2, …, g6 (in the example, g1 represents the left leg and g2 the right leg):
genseq(ncombinations = 30, ntag = 4, colorsname = tcol, usedcombinations = pre_used[, 1:4], emptyused = TRUE, emptyname = "EMPTY", g1 = c(1, 2), g2 = c(3, 4))

So far, the combinations have only been displayed in the R console. To export them, assign the combinations to an object and then write that object to a .txt or .csv file:

setwd(choose.dir()) # Choose a working directory to save your data (choose.dir() is Windows-only)
combinations <- genseq(100, 4, tcol) # Save a set of sequences in an object
# Export the object to a csv file
write.csv(combinations, file = "Color_sequences.csv", row.names = FALSE)
# Export the object to a txt file
write.table(combinations, file = "Color_sequences.txt", sep = "\t", row.names = FALSE)

The tools presented above make it possible to adjust combinations to fit any particular study, and all of these specifications apply equally to every sampling algorithm. As mentioned before, the sampling method is changed with the argument gen_method. The "Variable frequency" method creates combinations of tags using a different sampling probability for each color, so it also requires a sampling ratio for each available color. The ratios are given as a vector in the same order as the color names (the tcol object in our example):
# Create an object to hold the ratio for sampling
p <- c(1, 2, 5, 1, 2, 2, 4, 5, 8, 5)
# Generate sequences with the Variable frequency algorithm
genseq(ncombinations = 30, ntag = 4, colorsname = tcol, gen_method = "vfrequency", colorsf = p)

For those who decide to use this method, a good practice is to create a spreadsheet with two columns, the first containing the color names and the second the number of available tags. Then import the table (as shown in Fig. 1), and use the first column for the color names (colorsname argument) and the second column as the sampling ratio (colorsf argument).
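The spreadsheet approach can be sketched with a small inline table in place of an imported file. The data frame stock and its column names are hypothetical. The sketch assumes genseq treats colorsf as relative weights, as the integer vector p above suggests, so raw tag counts can be passed directly. The genseq call is left as a comment so that the block runs without GenTag installed.

```r
# Hypothetical stock table: color names and number of tags in hand.
# In practice this table would be imported from a spreadsheet (Fig. 1).
stock <- data.frame(color = c("Black", "Blue", "Red", "Yellow"),
                    available = c(40, 80, 200, 20),
                    stringsAsFactors = FALSE)

tcol_stock <- stock$color       # for the colorsname argument
p_stock <- stock$available      # for the colorsf argument

# Sampling proportions implied by the counts
round(p_stock / sum(p_stock), 3)   # 0.118 0.235 0.588 0.059

# With GenTag loaded:
# genseq(ncombinations = 30, ntag = 4, colorsname = tcol_stock,
#        gen_method = "vfrequency", colorsf = p_stock)
```

Sampling in proportion to stock means the rarest colors run out at roughly the same time as the common ones, which is what lets the available tags yield the most combinations.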
For a quick adjustment in color representation, we recommend the "Life expectancy" method. This algorithm restricts sampling based on previously used color combinations: the sampling ratio of each color is adjusted according to an estimate of how many tags of that color still exist in nature. This allows colors to be balanced in the population faster than with the "All equal" method. To apply this method, it is necessary to specify when each combination was used (yearusedcombinations argument). To improve accuracy, you can also provide an estimate of the yearly survival rate (yearsurvival argument) and lifespan (lifespan argument), which are used to estimate the number of color tags remaining in nature based on the ringing date. If yearsurvival and lifespan are not defined, the algorithm assumes that animals never die, so that the number of tags in the natural population equals the total number of tags used. In a long-term survey, it is reasonable to disregard old tag records.

# Generate sequences with the Life expectancy algorithm
genseq(ncombinations = 30, ntag = 4, gen_method = "lifexp", colorsname = tcol, usedcombinations = pre_used[, 1:4], yearusedcombinations = pre_used[, 5], yearsurvival = 0.8, lifespan = 5, currentyear = 2019)

Figure 1. General overview of how to import pre-used sequences into R. There are several ways to import data into R; this is just one approach. A) Use spreadsheet software (e.g., Microsoft Excel, LibreOffice Calc, Apple Numbers) to type your pre-used combinations. In the example, the first row is the header, and five columns hold the color tag information: columns 1, 2, 3, and 4 denote the upper left, bottom left, upper right, and bottom right positions, respectively, and the last column denotes the year/breeding season in which the bird was color tagged. B) Export your spreadsheet as a ".txt" file. C) Import your pre-used records and store them in an object by typing the following command at the R prompt:

pre_used <- read.table(choose.files(), header = TRUE)
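The functions choose.files() and choose.dir() are only available in R for Windows. On macOS and Linux, file.choose() offers an interactive alternative, or an explicit file path can be given instead. The sketch below mirrors steps B and C of Fig. 1 without any dialog. It writes a small hypothetical table (the column names UL, BL, UR, BR, and Year are illustrative) to a temporary file and reads it back with read.table().

```r
# Hypothetical pre-used records with the layout of Fig. 1
pre_used_demo <- data.frame(
  UL = c("Red", "Blue"), BL = c("Green", "Red"),
  UR = c("Blue", "Black"), BR = c("White", "Pink"),
  Year = c(2017, 2018), stringsAsFactors = FALSE)

# A temporary file stands in for the user's exported ".txt" file
path <- tempfile(fileext = ".txt")
write.table(pre_used_demo, file = path, sep = "\t", row.names = FALSE)

# Portable import: an explicit path instead of choose.files()
pre_used_back <- read.table(path, header = TRUE, sep = "\t",
                            stringsAsFactors = FALSE)
str(pre_used_back)
```

In an interactive session on any platform, read.table(file.choose(), header = TRUE) plays the same role as the Windows-only command in Fig. 1.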