
Period Estimation and Rhythm Detection in Timeseries Data Using BioDare2, the Free, Online, Community Resource


Part of the Methods in Molecular Biology book series (MIMB, volume 2398)

Abstract

One of the key objectives of data analysis in circadian research is to quantify the rhythmic properties of the experimental data. BioDare2 is a free, online service which provides fast timeseries analysis, attractive visualizations, and data sharing. This chapter outlines how to describe an experiment in BioDare2 and how to upload and analyze numerical timeseries data.

Key words

  • Circadian rhythms
  • Biological clocks
  • Research data management
  • Period analysis
  • Rhythmicity
  • Data repository
  • Data sharing
  • Metadata


1 Introduction

Circadian research touches many fields from agriculture to biomedicine. Circadian timeseries over several days are costly to obtain, and mathematical analysis is required to measure the timing features in the data. Analysis of, and access to, timeseries data are therefore critical across the diverse rhythm research community. However, we still encounter barriers in making data fair and open, because high-quality data deposition takes time and substantial effort. Linking the process of data sharing to routine data analysis can compensate users for the time spent on data deposition and thus promote data sharing. This is the main principle behind BioDare2 (https://biodare2.ed.ac.uk).

BioDare2 provides fast timing analysis, summary statistics, and attractive data visualizations in a modern web interface, all to ensure the best user experience. Access to BioDare2 is provided on condition that data will be made public. Thus, public data sharing is a “side effect” of using our analysis tools and is not perceived as an additional burden. On the contrary, BioDare2 increases research productivity as its analysis methods are faster and easier to use than any alternative. This approach has successfully attracted data from worldwide users, gaining over 400,000 data timeseries in the 2 years since inception, with continuing fast growth.

Here, we describe how BioDare2 can be used to visualize and pre-process data, perform rhythm detection, and estimate period, phase, and amplitude values of individual timeseries.

2 Materials

  1. 1.

    Access to a modern web browser with JavaScript enabled; we use Google Chrome (there are currently no known compatibility issues with other up-to-date browsers).

  2. 2.

    An account in BioDare2 (new accounts can be registered at the BioDare2 website: https://biodare2.ed.ac.uk).

  3. 3.

    Numerical, timeseries data file: typically an Excel file in a tabular form, with one column containing time values followed by one or more columns of measured data values (see Note 1).

3 Methods

The basic entity in BioDare2 is an experiment. Although a timeseries is the basic data unit to be processed and analyzed, many timeseries are typically collected in parallel in contemporary circadian research. The behavior of the circadian system is greatly influenced by the environmental conditions under which the data were generated, which typically vary between experiments. Thus, it is important to store not only the timeseries data in BioDare2 but also the context of the measurement, the experimental details.

The procedure for processing data with BioDare2 is a multistage operation as follows:

  1. 1.

    Creating and describing the new experiment in BioDare2.

  2. 2.

    Importing and labeling numerical timeseries data.

  3. 3.

    Visualizing the timeseries.

  4. 4.

    Analyzing timeseries data to estimate period, phase, and amplitude values or test for rhythmicity.

Conventions. Text in bold and in quotation marks, e.g., “New experiment,” refers to buttons, links, and menu options found on BioDare2 screens, which trigger specific actions. Text in italics refers to the visual elements on the screen, for example, dialog titles or labels for input fields.

Disclaimer. The layout of the screens or action labels may have changed since the publication of this book, as new functionalities are continually added to BioDare2. However, the features described below are “core” functions of BioDare2; the typical processing steps and their mechanics will not change even if the user interface is modified in the future.

3.1 Creation of a New Experiment in BioDare2

  1. 1.

    In your web browser, navigate to the BioDare2 home page (https://biodare2.ed.ac.uk) and login.

  2. 2.

    Click “New experiment” in the top menu (Fig. 1a), which will load the page shown in Fig. 1.

  3. 3.

    Complete the form provided to describe the details of the experiment, by filling in the free text fields, selecting from the options list, or using the autocompletion fields (Fig. 1b). At the minimum, the following items should be provided:

    1. (a)

      Descriptive name of the experiment which helps to distinguish this data set from others.

    2. (b)

      Purpose/hypothesis which summarizes the main aims of the experiment.

    3. (c)

      Description which outlines the specifics of the experiment, for example, environmental conditions and biological materials.

    4. (d)

      Species.

    5. (e)

      Data category that distinguishes between experimental data types, usually on the basis of the assay method.

    The form is divided into sections, and each section has an “Accept” button that stores its state (Fig. 1c). The fields contain hints on how to complete them (Fig. 1d), and uncompleted fields are marked in red (Fig. 1e).

    At the bottom of the form, the “Create” button will create a new record in BioDare2 with the information provided.

  4. 4.

    After the creation form is submitted, the experiment dashboard screen is presented, which is the launch pad of most of the activities undertaken in BioDare2.

Fig. 1

Screenshot of the Create Experiment screen. (a) Menu item that opens the form. (b) Input form elements. (c) Accept button to store the section entries. (d) Input fields contain hints about the desired content. (e) Erroneous fields are marked in red and accompanied by an error description

3.2 Import of Numerical Data

  1. 1.

    Start adding the numerical data by clicking the “Import data” button.

  2. 2.

    Select the appropriate file format from the list and drag the data file into the upload area on the screen or click inside to open the file selection dialog. Press “Upload selected.”

  3. 3.

    The next step of the description process will expand in the form (here we limit the procedure only to the Excel table format, see Note 1).

  4. 4.

    Define the layout of the data, for example, whether the timeseries data points are in rows and whether they are already labeled in the file.

  5. 5.

    In the next step (Fig. 2), define time column properties by clicking on the cell containing the time of the first measurement and then selecting the time unit. It is also possible to define the time offset which should be applied during import (see Note 2).

  6. 6.

    Labeling the data is the most laborious step in BioDare2, so two options are available to streamline the process:

    1. (a)

      Importing existing labels from the file.

      Very often, the data table already contains the labels. The labels might have been provided during the quantification step of the image processing, written by a PCR machine, or the user may simply find it easier to label the data in Excel rather than using the web UI. In that case, the row containing labels must be clicked, and the labels will be read during import.

    2. (b)

      Selecting columns for labeling.

      Click on a column header (cells containing letters at the top of the table) to select one column or click and drag to select a column range. The “Label column(s)” popup will appear, which allows the user to enter the label text. Clicking “Assign” or pressing “Enter” assigns the label to the selected columns and then automatically moves focus to the next column on the right, so that the next block of data can be described.

      Data content is presented in pages; in order to describe all the data, the next page must be loaded using the “>” of the pagination widget on top. The number of columns loaded can be set with the “Columns per page” drop-down.

  7. 7.

    Optionally select which labels signify “Background noise”; columns with these labels will be used for data pre-processing (see Note 3).

  8. 8.

    Once the time column has been defined and the data columns are labeled, press “Import timeseries” to initiate conversion of the selected data from the file into a timeseries. After the timeseries is successfully imported, the “Show timeseries” screen is displayed, which presents the data visualization dashboard.

Fig. 2

Screenshot of the Timeseries import screen. (a) The current step of the import process. (b) The first time point selected by clicking. (c) Each step contains hints on how it can be completed. (d) Column headings can be selected to label their content

3.3 Visualizing Timeseries

The timeseries can be visualized on traditional line plots via the “Show data” button (Fig. 3c).

Fig. 3

Data visualization. (a) Pre-processing options for timeseries plotting. (b) Pagination and sorting panel. (c) Line plot of a data set. (d) Heatmap chart of the same data set

Options available for customizing the timeseries visualizations are (Fig. 3a):

  1. 1.

    Scaling data using the “from” and “to” fields (leaving at the default of 0 simply scales the timeseries according to their minimum and maximum data values, respectively).

  2. 2.

    Detrending and normalizing data by one of the available methods (see Note 4).

  3. 3.

    Organizing data on the plots by applying sorting (see Note 5), trace alignments, and pagination.

The individual traces plotted within each chart can be hidden or revealed by clicking on the data labels beneath the x-axis on each chart.

Alternatively, data can be visualized in a heatmap by pressing the “Heatmap” button (Fig. 3d). This visualization is especially useful when analyzing large data sets. The heatmap screen offers customization options similar to those for the line plots. Hovering over a heatmap cell shows the value for the corresponding data point, while hovering over the widgets on the left of the graph displays the data label.

The version of the processed data used for plotting (e.g., cubic detrended timeseries normalized to the maximum) can be downloaded in the numerical CSV format. Two download buttons are available: “current view,” which retrieves the currently viewed data subset, and “full,” which retrieves the entire data set. This output can be useful for plotting and analysis with other software.

3.4 Period Estimation

A rhythmic signal can be characterized by its period, amplitude, and phase (see Note 6), and BioDare2 provides six methods for estimating period values: FFT-NLLS [1], mFourFit [2], MESA [3], Enright Periodogram [4], Lomb-Scargle periodogram [5], and Spectrum resampling [6].

3.4.1 Starting a New Analysis

  1. 1.

    Click on “Period analysis” to open the “Start period analysis” screen, which contains a form for setting analysis parameters and plots of the input data to be analyzed.

  2. 2.

    Define the “Data window” by specifying the two time points between which the timeseries data will be analyzed. The default values of 0 “from” and 0 “to” indicate the first and last time points in the data, respectively, so the entire timeseries is analyzed.

  3. 3.

    In “Input data,” select the detrending method that should be applied prior to analysis. The plots are automatically refreshed to show the detrended data. The default detrending method is linear, but stronger detrending methods such as cubic or baseline can be used. Stronger detrending should be used if individual oscillations are not clearly visible on the plots due to the presence of a trend (i.e., the trend level is larger than the oscillation amplitude). The impact of trends on period analysis is discussed in Zielinski et al. (2014) [7].

  4. 4.

    Populate the “from” and “to” fields for “Expected periods,” which define the range of periods that are considered to be circadian. This range affects the behavior of certain analytical methods as described in Note 7. The default range is 18 to 34 h and should be changed only if the input data suggest periods outside this range.

  5. 5.

    Select an option from the “Analysis Method” drop-down list. The strengths and limitations of each method are described in our paper [7]. We recommend using two distinct methods, FFT-NLLS and MESA, as they are based on completely different principles, which permits cross-validation of the results. FFT-NLLS is a well-established and commonly used analysis method in the field of circadian studies, while MESA proved to be the most accurate in our method evaluations.

  6. 6.

    Press “Analyse,” which will initiate the analysis with the specified parameters, and advance to the “Period analyses” screen which contains details of all analysis jobs and their results.

  7. 7.

    The newly submitted analysis (“job”) is displayed at the top of the screen, and its status is automatically refreshed until all calculations are completed (a typical analysis takes a few seconds). Once completed, the status of the job is updated to FINISHED, and the results are available for examination.

3.4.2 Examining Results of Period Estimates

Each analysis has its own pane, which contains a summary of its parameters at the top, followed by action buttons and sections with result graphs and the numerical output values (Fig. 4). The results pane of the most recent analysis is expanded by default, while other analysis panes can be accessed by clicking their associated headers.

  1. 1.

    Period estimates are illustrated on a box-and-whisker plot (see Note 8, Fig. 4a) which shows period distributions in each group of replicates (grouping is based on the label values). The groups can be kept in the original input order by selecting “None,” or ordered alphabetically by data “Labels” or by the “Period” median value (Fig. 4b). Sorting by median is probably the most useful during interpretation of the results.

    The individual groups can be hidden by clicking on their labels in the legend below the graph (Fig. 4c). The plot can be downloaded by pressing the button featuring the downward-pointing arrow icon (Fig. 4d), and it is saved in .svg format, enabling further editing in vector graphics software such as the free Inkscape application.

  2. 2.

    Phase estimates are illustrated on a polar plot (Fig. 4e) showing a representation of a 24-h clock face, with hands pointing to the average phase value for each data group (with values grouped by label). Individual phase values can be drawn by pressing the “Ind.” button. The values can be reported relative to time point 0 by pressing “Zero” or relative to the start of the analysis window by pressing “Window” (Fig. 4f). The phases can be represented in “Circadian” units (in the range 0–24, see Note 9) or “Absolute,” in which case the clock face runs from 0 to the maximum phase value in the data set (see Note 10). There are four ways of calculating phases: “Fit,” “Method,” “First,” and “Avg.,” which are described in detail in Note 11 and in the BioDare2 documentation (Fig. 4f). We recommend that users calculate phase using the “by fit” method and report it in circadian time units (CT; 24ths of the period). As with the period plot, there is a button to download the graph in .svg format (Fig. 4d).

  3. 3.

    The average values for the period, phase, and amplitude parameters of each data group are available after clicking on “Summary statistics” (Fig. 4g), which opens a table with the mean and standard deviation values for each parameter. Calculation of amplitude values is based on the same methods as for phase (see Note 12). Changing the switches next to the phase plot (e.g., selecting absolute units, or the method for phase) also updates the values for phase and amplitude in the summary table. The number reported (“N”) is the number of individual timeseries included in the statistics, which may be less than the number submitted for analysis.

  4. 4.

    “Individual results” opens a table with period, phase, and amplitude estimates for each data trace. Again, the method and units for phase and amplitude calculations are set using the switches next to the polar plot. There is also an ERR column with an analysis error as determined by each method (for FFT-NLLS, it is the relative amplitude error, RAE) and a GOF column containing goodness of fit values as described in Note 13. Clicking on the “fit” cell value in the final column opens a popup displaying a plot of the original timeseries along with the fitted line generated by the analysis method, which is a practical way of verifying the soundness of the fit-based methods of analysis (see Note 14).

  5. 5.

    All the numerical values (for individuals and data groups) can be downloaded in tabular csv format using the download button (arrow down) at the top of the analysis pane (Fig. 4d).

  6. 6.

    The “Select periods” button at the top of the analysis pane navigates to a new screen, “Select periods from job.” On this screen, all the individual results are presented with radio button switches labeled “ignored,” allowing certain results to be excluded from the summary statistics. We would recommend using this screen to remove outliers from the results. This screen is also used to deal with multiple periodic components identified by the FFT-NLLS method; see Note 15 for details.
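The effect of the “ignored” switches on the summary statistics can be illustrated with a short Python sketch. It is not BioDare2 code: the record layout, the function name `summarize_periods`, and the use of the sample standard deviation are assumptions made purely for illustration.

```python
import statistics
from collections import defaultdict

def summarize_periods(results):
    """Group period estimates by label and return {label: (N, mean, sd)}.

    `results` is a list of (label, period_h, ignored) tuples; this layout is
    hypothetical and only mirrors the columns discussed in the text.
    """
    groups = defaultdict(list)
    for label, period_h, ignored in results:
        if not ignored:  # ignored results do not contribute to the statistics
            groups[label].append(period_h)
    summary = {}
    for label, periods in groups.items():
        mean = statistics.fmean(periods)
        # Sample SD is an assumption; it is undefined for a single value.
        sd = statistics.stdev(periods) if len(periods) > 1 else None
        summary[label] = (len(periods), mean, sd)
    return summary

results = [
    ("WT", 24.0, False),
    ("WT", 25.0, False),
    ("WT", 40.0, True),      # outlier marked as ignored
    ("mutant", 22.0, False),
]
summary = summarize_periods(results)
print(summary)
```

In this example the “WT” group reports N = 2 rather than 3, which is why N can be smaller than the number of timeseries submitted for analysis.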

Fig. 4

Screenshot of the job pane presenting period analysis results. (a) Period values represented on a box plot. (b) Switches for ordering the plot content. (c) Clicking on a data label can hide the corresponding timeseries from the graph. (d) Download buttons for downloading numerical data or plots in SVG format. (e) Polar plot with phase values. (f) Switches that control methods for calculating phase/amplitude and their units

3.5 Rhythm Detection

In short timeseries (e.g., 48 h), there may be only one full period within the data, so any estimate of the period value makes little sense. Typical examples are -omics data which are sparsely sampled (e.g., every 4 h) over a relatively short duration (e.g., 1 or 2 days). For these types of data, rhythmicity tests were developed. At present, BioDare2 provides an implementation of the classic JTK_Cycle test [8] and its more robust version: the empirical JTK_Cycle method [9].

3.5.1 Starting a New Rhythmicity Test

  1. 1.

    Click on “Rhythmicity” to open the “Start rhythmicity test” screen which contains a form for setting analysis parameters and plots of the input data being analyzed.

  2. 2.

    Define the “Data window,” i.e., the start and end time between which the timeseries will be analyzed. A value of 0 in “from” or “to” denotes from the beginning and to the end of the data, respectively.

  3. 3.

    In “Input data,” linear detrending can be performed prior to the analysis. By default, no detrending is used, which is the typical approach for rhythmicity tests of -omics data.

  4. 4.

    The “test method” currently offers the classic JTK and empirical JTK (BD2 eJTK) methods; an ARSER [10] implementation is under development.

  5. 5.

    “Analysis Presets” is used to define the standard curves to which the data are compared in the rhythmicity test. The default “eJTK Classic” pre-set analysis tests against a set of asymmetric cosine waveforms with a 24 h period, as in the original eJTK publication [9], and is recommended for typical -omics experiments. “BD2 Classic” is a pre-set analysis that can be used for testing against a wider period range or longer timeseries, but it has a higher rate of false negatives for typical 24-h rhythmic data.

  6. 6.

    Press “Test Rhythmicity” to start a new analysis, and the screen will update to display “Rhythmicity tests.”

  7. 7.

    The newly submitted analysis (“job”) is displayed at the top of the screen, and its status is automatically refreshed until all calculations are completed. Rhythmicity tests are generally slower than period estimates and can take up to 10 min. Once completed, the job status is updated to SUCCESS, and the results are available for examination.

3.5.2 Examining Rhythmicity Test Results

Each analysis has its own pane, which contains a summary of its parameters at the top, followed by action buttons and the sections with the analysis results. The pane for the most recent analysis is expanded by default, while other analysis panes can be accessed by clicking the associated headers.

There is a “P-value threshold” parameter switch for specifying the threshold value above which the timeseries data should be rejected as arrhythmic.

At the bottom of the test pane, there is a table with individual results for each data trace. The “Rhythmic” column is populated with true or false values according to the test result p-value and the set threshold. “emp-p” (or “p”) contains the empirical p-value as determined by the eJTK method (or p for classic JTK); “Tau” contains the Kendall τ rank correlation coefficient. False discovery rate can be controlled with Benjamini-Hochberg correction of p-values. “Show pattern” generates a description of the standard curve that best matches the data trace, such as its period or peak time (see Note 16). The results can be sorted by data labels, p-values, or pattern properties. The numerical values can be downloaded in a csv format using the download button (the downward-pointing arrow symbol).
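For readers unfamiliar with the Benjamini-Hochberg correction, the standard procedure can be sketched in a few lines of Python. This is a generic textbook implementation, not BioDare2 code; the function names are hypothetical, and applying the threshold to the corrected p-values is an assumption about how the two settings are combined.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def call_rhythmic(p_values, threshold=0.05):
    """Flag a trace as rhythmic when its adjusted p-value is within the threshold."""
    return [p <= threshold for p in benjamini_hochberg(p_values)]

p_values = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(p_values)
flags = call_rhythmic(p_values, threshold=0.03)
print(adjusted, flags)
```

Traces whose corrected p-value exceeds the threshold are treated as arrhythmic in this sketch.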

4 Notes

  1. 1.

    Timeseries data can be stored in an Excel, CSV, or TSV file in the following form: one of the columns holds the measurement time for each data point, while the other columns contain recordings of measured values at each time. The time column can represent either the actual time from the experiment start or a serial number (such as an image number) which will be converted to the time during import, depending on the value of the parameter “Images interval.”

    Alternatively, data can be recorded in rows rather than columns following similar principles.

    Details and Constraints

    1. (a)

      Only the first data sheet is read from an uploaded Excel file.

    2. (b)

      The time column should be located before (to the left of) the first data column, although it does not have to be the first column in the sheet.

    3. (c)

      There may be missing data points in timeseries; they should be represented as blank cells in the sheet.

    4. (d)

      For time in hours, the time is expressed in decimal hours, so 1.25 means 1 h and 15 min.

    5. (e)

      Where time values are provided as a series of image numbers, they must be integers starting from 1 (first picture taken), and they must be taken at equal intervals as indicated by the “Images interval” property.

      BioDare2 can also read raw data files produced by the Packard (now PerkinElmer) TopCount reader for multi-well plates. The TopCount can write data in multiple ways, so please see the current documentation of BioDare2 for details.
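As an illustration of the column layout described in this note, the following Python sketch writes a small table with a time column in decimal hours, one column per timeseries, and blank cells for missing points. The header text, the labels “WT” and “mutant,” and the function name `build_table` are hypothetical; the BioDare2 documentation remains the reference for accepted formats.

```python
import csv
import io

def build_table(times_h, series):
    """Return CSV text: a time column followed by one column per labeled series."""
    labels = list(series)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Time (h)"] + labels)  # label row (hypothetical header text)
    for i, t in enumerate(times_h):
        row = [t]
        for label in labels:
            value = series[label][i]
            row.append("" if value is None else value)  # missing point -> blank cell
        writer.writerow(row)
    return buffer.getvalue()

table = build_table(
    [0.0, 1.25, 2.5],  # 1.25 means 1 h 15 min
    {"WT": [10.1, None, 12.3], "mutant": [8.0, 8.4, 9.1]},
)
print(table)
```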

  2. 2.

    In BioDare2, time 0 should coincide with the beginning of the experimental conditions/ZT/subjective dusk/dawn. For that reason, when raw data are uploaded, the time offset parameter can be used to adjust the time values accordingly. For example, if recording started at 12:00 and the first point in the raw data file is recorded as time 0, but subjective dawn was at 9:00, then the first data point should be corrected to have a time of 3 h. To do so, a time offset of +3 h must be specified when the raw data are uploaded.
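The arithmetic of the time offset can be written out as a short Python sketch. The function names are hypothetical and clock times are given as decimal hours on the same day, which is a simplifying assumption.

```python
def offset_from_clock(recording_start_h, reference_h):
    """Hours elapsed between the reference event (time 0) and the first recording."""
    return recording_start_h - reference_h

def apply_time_offset(raw_times_h, offset_h):
    """Shift every raw time value by the offset."""
    return [t + offset_h for t in raw_times_h]

offset = offset_from_clock(12.0, 9.0)                  # recording at 12:00, dawn at 9:00
shifted = apply_time_offset([0.0, 1.0, 2.0], offset)   # raw file starts at time 0
print(offset, shifted)
```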

  3. 3.

    One way of accounting for technical noise in the data (e.g., caused by detector dark current, light leakage, etc.) is by placing black tokens within the imaging field. The recorded values for those tokens should in principle represent a constant level of 0, so the actual measured values can be used for correction of the background noise. The timeseries described as “Background noise” are treated as the background levels. All “Background noise” columns are averaged, and the averaged background timeseries is then subtracted from all the other timeseries during the raw data import.
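The background correction described in this note amounts to a point-by-point average followed by a subtraction. A minimal Python sketch is given below; the dictionary layout and the function name are hypothetical, and missing values are not handled.

```python
def subtract_background(series, background_labels):
    """Average the background columns and subtract the result from every other column."""
    background = [series[label] for label in background_labels]
    n_points = len(background[0])
    mean_background = [
        sum(column[i] for column in background) / len(background)
        for i in range(n_points)
    ]
    return {
        label: [value - bg for value, bg in zip(values, mean_background)]
        for label, values in series.items()
        if label not in background_labels
    }

series = {
    "bg1": [1.0, 2.0],
    "bg2": [3.0, 2.0],
    "WT": [10.0, 12.0],
}
corrected = subtract_background(series, ["bg1", "bg2"])
print(corrected)
```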

  4. 4.

    The options for detrending are the following:

    • Linear detrending is the least intrusive detrending method; it simply applies a linear regression model to the data and subtracts the optimum fitted, straight line from the data points.

    • Cubic detrending fits a polynomial model of degree 3 to the data and subtracts the resulting curve from the data. It may introduce artifacts for short data series or even remove the oscillations for very short data (e.g., timeseries with only one full oscillation).

    • Baseline detrending is based on kernel smoothing. Kernel smoothing can be explained as a more sophisticated form of moving average. Once the smoothed version of the data (the trend) is found, it is subtracted from the original. This detrending works well, with the exception that the detrended data points towards the beginning and end of the timeseries—usually the first and last half cycle of data—must be discarded.

    • Amp&baseline detrending removes the trend not only in the baseline but also in the amplitude. It first finds the baseline trend using the kernel smoothing procedure above and then uses it as the reference to establish the amplitude change in the data. The change in amplitude is then smoothed, giving the amplitude trend, which is used to scale the original data. The beginning and end of the data suffer from strong artifacts, but it is the only available method that can remove amplitude dampening.
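Linear detrending, the simplest of these options, can be sketched in plain Python as an ordinary least-squares line fit followed by a subtraction. The function name is hypothetical and the code is a generic illustration, not the BioDare2 implementation; cubic detrending follows the same idea with a degree-3 polynomial in place of the straight line.

```python
def linear_detrend(times, values):
    """Subtract the least-squares straight line from the data points."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return [v - (slope * t + intercept) for t, v in zip(times, values)]

times = [0.0, 1.0, 2.0, 3.0]
pure_trend = [1.0, 3.0, 5.0, 7.0]          # a straight line with no oscillation
residuals = linear_detrend(times, pure_trend)
print(residuals)
```

A series that is nothing but a linear trend is reduced to zeros, while any oscillation riding on the trend is left in the residuals.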

    The options for normalization are the following:

    • To [−1,1] rescales the timeseries values to produce a mean value of 0 and oscillations in the range [−1,1]. Rescaling is achieved by first subtracting the mean value from each data point and then dividing by either the peak or the trough height, whichever is larger.

    • Fold change normalizes the data set by dividing each data point by the minimum value; if the minimum value is less than or equal to 0, an empty timeseries is produced.

    • Z-Score calculates the number of standard deviations by which the point is above or below the mean value.

    • To extreme normalizes the timeseries data to 1 by dividing each data point by either the peak value or the absolute value of the trough (i.e., the positive distance between the trough and 0), whichever is greater.
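The four normalizations can be written as short Python functions. These follow the verbal definitions above; the function names are hypothetical, the use of the population standard deviation for the Z-score is an assumption, and constant timeseries (which would cause a division by zero) are not handled.

```python
import statistics

def to_unit_range(values):
    """'To [-1,1]': subtract the mean, then divide by the larger of peak and trough height."""
    mean = statistics.fmean(values)
    centered = [v - mean for v in values]
    scale = max(abs(c) for c in centered)
    return [c / scale for c in centered]

def fold_change(values):
    """Divide by the minimum; an empty series results if the minimum is <= 0."""
    lowest = min(values)
    if lowest <= 0:
        return []
    return [v / lowest for v in values]

def z_score(values):
    """Number of standard deviations above or below the mean (population SD assumed)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def to_extreme(values):
    """Divide by the peak value or the absolute trough value, whichever is greater."""
    scale = max(max(values), abs(min(values)))
    return [v / scale for v in values]

data = [2.0, 4.0, 6.0]
print(to_unit_range(data), fold_change(data), z_score(data), to_extreme([-4.0, 2.0]))
```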

  5. 5.

    In addition to standard sorting by label or ID, once data has been analyzed, data traces can be sorted according to the results of a period analysis or rhythmicity test, for example, by the estimated period values or significance of the rhythms.

  6. 6.

    The concept of phase and amplitude is consistently defined and “mathematically sound” only for cosine-like functions, y = A*cos(x − φ), where φ is the phase parameter and A is the signal amplitude. For other waveforms, phase is linked to the peak time; amplitude is computed as half the distance between the peaks and the troughs.

  7. 7.

    The period range defines which period values should be considered (e.g., as circadian) by the analysis methods. Firstly, some of the methods (e.g., MESA, or periodograms) scan and discretize period values within the period range (in BioDare2 with a step of 0.1 h), calculating some statistic and then selecting a period value with the best statistic. Period range therefore determines which period values will be tested during analysis. Secondly, methods such as FFT-NLLS can find multiple period components that together describe the data. Only those periods within the defined period range will be reported and included in summary statistics. Periods outside this range may be detected but are ignored unless the user selects them (see Note 15).
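The idea of scanning a discretized period range can be illustrated with a toy Python example. The statistic used here, the squared error of a least-squares cosine with a fixed period, is chosen only for simplicity and is not the statistic used by MESA or by the periodograms in BioDare2; all function names are hypothetical.

```python
import math

def cosine_fit_error(times, values, period):
    """Sum of squared residuals of a least-squares cosine with a fixed period."""
    mean = sum(values) / len(values)
    y = [v - mean for v in values]  # remove the mean level before fitting
    w = 2 * math.pi / period
    c = [math.cos(w * t) for t in times]
    s = [math.sin(w * t) for t in times]
    cc = sum(ci * ci for ci in c)
    ss = sum(si * si for si in s)
    cs = sum(ci * si for ci, si in zip(c, s))
    yc = sum(yi * ci for yi, ci in zip(y, c))
    ys = sum(yi * si for yi, si in zip(y, s))
    det = cc * ss - cs * cs
    a = (yc * ss - ys * cs) / det  # solve the 2x2 normal equations
    b = (ys * cc - yc * cs) / det
    return sum((yi - a * ci - b * si) ** 2 for yi, ci, si in zip(y, c, s))

def scan_periods(times, values, start=18.0, stop=34.0, step=0.1):
    """Test every period in the range and return the one with the smallest error."""
    n_steps = int(round((stop - start) / step))
    candidates = [round(start + k * step, 10) for k in range(n_steps + 1)]
    return min(candidates, key=lambda p: cosine_fit_error(times, values, p))

# Synthetic hourly data over 5 days with a 24 h period.
times = [float(t) for t in range(120)]
values = [3.0 * math.cos(2 * math.pi * (t - 6.0) / 24.0) + 10.0 for t in times]
best = scan_periods(times, values)
print(best)
```

A period outside the scanned range can never be returned by such a scan, which is why the range should be widened when the data suggest shorter or longer periods.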

  8. 8.

    A box-and-whisker plot is a standard way of showing multiple values from biological replicates. With the period values sorted, the box encompasses the interquartile range (IQR), from the lower quartile (25th percentile) to the upper quartile (75th percentile); in other words, the half of the data points centered around the median. The median is the middle dotted line in the box, and the mean is drawn as a solid line (they coincide if the distribution of values is symmetrical). The whiskers are at the lowest value still within 1.5 × IQR of the lower quartile and the highest value still within 1.5 × IQR of the upper quartile. The circled points beyond the whiskers represent outliers, which in general should be removed before looking at Summary statistics.
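The quantities drawn on such a plot can be computed with the Python standard library, as sketched below. Quartile definitions differ between software packages; the “inclusive” method used here is an assumption and may not match the values plotted by BioDare2, and the function name is hypothetical.

```python
import statistics

def box_plot_summary(values):
    """Quartiles, mean, whisker ends, and outliers using the 1.5 x IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": statistics.fmean(data),
        "whiskers": (inside[0], inside[-1]),  # most extreme values within the fences
        "outliers": outliers,
    }

periods = [23.0, 23.5, 24.0, 24.5, 25.0, 30.0]
summary = box_plot_summary(periods)
print(summary)
```

In this example the 30 h estimate lies beyond the upper fence and is reported as an outlier, pulling the mean above the median.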

  9. 9.

    Circadian units are scaled to 24 h using the estimated free-running period. Imagine two traces: one with a period of 24 h and a peak time of 6 h and another with a period of 32 h and a peak time of 8 h. In absolute units, it looks as if the phase of the second clock was delayed by 2 h (to 8 instead of 6), while in reality its clock simply runs slower. The peak time is still at 1/4 of the whole cycle (8/32), which is why circadian units are better for comparing phases, since they can accommodate different speeds of the underlying clocks.
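The conversion from absolute to circadian units is a simple rescaling by the estimated period, shown here as a Python sketch with a hypothetical function name.

```python
def to_circadian_units(phase_h, period_h):
    """Express a phase as a fraction of the cycle, scaled to a 24 h clock face."""
    return phase_h * 24.0 / period_h

first = to_circadian_units(6.0, 24.0)    # period 24 h, peak at 6 h
second = to_circadian_units(8.0, 32.0)   # period 32 h, peak at 8 h
print(first, second)
```

Both traces from the example above have the same phase in circadian units, because each peaks a quarter of the way through its own cycle.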

  10. 10.

    Phases in BioDare2 are always reported in the range from 0 to period: phase ∈ [0, period), so values should roughly coincide with the time of the peak (for circadian units, phase ∈ [0, 24)). This differs from some mathematical approaches in which phase is defined in the range [−period/2, period/2], but our approach is more natural in chronobiology and easier to interpret. For example, if a data timeseries peaks at 18, 42, 66, and 90 h, then the phase would be 18 h (not −6 h).
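Mapping a peak time, or a phase expressed as a negative value, onto a range that starts at 0 is a modulo operation. The Python sketch below uses a hypothetical function name and relies on the fact that Python's `%` operator returns a non-negative result for a positive divisor.

```python
def wrap_phase(time_h, period_h):
    """Map any time or signed phase onto the range starting at 0 and ending before the period."""
    return time_h % period_h

peaks = [18.0, 42.0, 66.0, 90.0]
wrapped = [wrap_phase(t, 24.0) for t in peaks]
from_negative = wrap_phase(-6.0, 24.0)
print(wrapped, from_negative)
```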

  11. 11.

    As only the analysis methods based on cosine fitting define the method of calculating phase, BioDare2 provides a set of options that can be used to obtain phase estimates from all period analysis methods.

    In the phase “by fit” method, after obtaining the period value, one cosine function having that period is fitted to the data. The fitting procedure finds phase and amplitude parameters of the cosine that follows the data the most closely and reports those values as phase and amplitude. Such a phase value is well defined, but it may not coincide with the peaks in the data if the signals are not symmetrical.

    As the name suggests, phase “by first peak” reports the time of the first peak (potentially back-calculated to 0 using the estimated period if the timeseries data do not start at 0). However, the peak is found in the fit that has been generated by the analysis method rather than a peak in the original data. Phase “by average peak” finds one peak per period in the generated fit and uses their times to calculate circular averages (modulo period and wrapping around zero).

    Phase “method”-specific reports the phase using the original approach of the analysis algorithm. For FFT-NLLS, it is the phase of the main circadian component found by the method. If FFT-NLLS finds only one component, the value should be the same as phase by fit; otherwise they may be slightly different. MESA does not define phase, so phase by fit is reported here instead.
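Two of the calculations in this note, the cosine fit with a fixed period and the circular average of peak times, can be sketched in Python as follows. This is a generic least-squares illustration under simplifying assumptions (the mean level is removed before fitting, and the function names are hypothetical); it is not the BioDare2 implementation.

```python
import math

def phase_by_fit(times, values, period):
    """Fit A*cos(2*pi*(t - phase)/period) by least squares; return (phase, amplitude)."""
    mean = sum(values) / len(values)
    y = [v - mean for v in values]
    w = 2 * math.pi / period
    c = [math.cos(w * t) for t in times]
    s = [math.sin(w * t) for t in times]
    cc = sum(ci * ci for ci in c)
    ss = sum(si * si for si in s)
    cs = sum(ci * si for ci, si in zip(c, s))
    yc = sum(yi * ci for yi, ci in zip(y, c))
    ys = sum(yi * si for yi, si in zip(y, s))
    det = cc * ss - cs * cs
    a = (yc * ss - ys * cs) / det   # coefficient of cos(w*t)
    b = (ys * cc - yc * cs) / det   # coefficient of sin(w*t)
    amplitude = math.hypot(a, b)
    phase = (math.atan2(b, a) / w) % period  # peak time of the fitted cosine
    return phase, amplitude

def circular_mean_phase(peak_times, period):
    """Average peak times on a circle of circumference `period`."""
    angles = [2 * math.pi * (t % period) / period for t in peak_times]
    mean_sin = sum(math.sin(x) for x in angles) / len(angles)
    mean_cos = sum(math.cos(x) for x in angles) / len(angles)
    return (math.atan2(mean_sin, mean_cos) * period / (2 * math.pi)) % period

times = [float(t) for t in range(96)]
values = [2.0 * math.cos(2 * math.pi * (t - 18.0) / 24.0) + 5.0 for t in times]
phase, amplitude = phase_by_fit(times, values, 24.0)
average = circular_mean_phase([23.0, 3.0], 24.0)
print(phase, amplitude, average)
```

The circular average of peaks at 23 h and 3 h is 1 h, not the arithmetic mean of 13 h, which illustrates why averaging must wrap around zero.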

  12. 12.

    Amplitude depends on the estimation method selected for the phase. Phase “by fit” reports the amplitude of the fitted cosine. Phase “by first peak” amplitude is calculated using the first trough and peak (of the method fit). Phase “by average peak” amplitude is calculated using trough and peak values in each periodic cycle.

  13.

    Goodness of fit (GOF) is defined as the ratio of two errors: the method fit error, i.e., the error between the original timeseries and the curve predicted by the user-selected algorithm, and the “polynomial fit error,” i.e., the error between the original timeseries and a polynomial (cubic) curve fitted to the timeseries. The ratio can vary from 0 (where the model provides a perfect fit) to a large number, indicating that the model is no better than (or is worse than) a cubic fit to the data.

  14.

    MESA is not a fit-based method; it does not fit a curve to the data. It finds a model that generalizes the input data as combinations of preceding and subsequent points. BioDare2 uses this model to simulate a fit; however, it may produce artifacts at the first and last few data points.

  15.

    The analysis pane may contain a message like “5 results needs attention,” which means that for five data traces, the method found period values but these were not automatically included in the Summary statistics. The “Select periods” button allows inspection of the result details and manual corrections. Timeseries that yielded a period outside the specified circadian range, or that have multiple circadian periods, are listed first (in red). To reproduce a complex signal shape, FFT-NLLS often finds multiple oscillating components that must be combined to give a good data fit. If more than one of them has a period inside the circadian range, then the user must decide which should be treated as the “correct” component. It is possible to view the period analysis fit graph, which shows the original timeseries, the method fit, and the cosines for each of the proposed periods. That is the easiest way to decide which period value to select: it should be the cosine that most closely follows the input data on the graph.

  16.

    The periodic properties (such as period and peak time) of the pattern curves used for testing with JTK methods are often reported as period values of the data. We strongly discourage such an approach. Data sets collected from experiments of shorter durations (i.e., less than 48 h) cannot give meaningful period estimates, and it would be even more inappropriate for such “period” values to be reported for data collected over only 24 h. Our tests also demonstrated that for typical cases of timeseries with a period in the range of 22–26 h and measured over 48 h, the rhythmicity tests performed better when using the smaller pre-set of cosines with only a 24-h period than a pre-set spanning 22–26 h. Naturally, JTK always reports only the 24-h period when using the smaller pre-set.
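
As an illustration of the phase conventions in Notes 10 and 11, the following Python sketch maps a phase onto the 0-to-period range and computes a circular average of peak times. The function names are hypothetical and this is not BioDare2's own code; it only shows the arithmetic, assuming that peak times and the period are given in hours.

```python
import math

def wrap_phase(phase, period):
    """Map any phase value onto the 0-to-period range (e.g. -6 h becomes 18 h)."""
    return phase % period

def circular_mean_phase(peak_times, period):
    """Circular average of peak times, taken modulo the period.

    Each peak time is converted to an angle on a circle, the angles are
    averaged as unit vectors, and the mean angle is converted back to time.
    """
    angles = [2 * math.pi * (t % period) / period for t in peak_times]
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    mean_angle = math.atan2(s, c)
    return (mean_angle * period / (2 * math.pi)) % period

# Peaks from the example in Note 10, with an assumed period of 24 h.
print(wrap_phase(-6, 24))
print(circular_mean_phase([18, 42, 66, 90], 24))
```

With peaks at 18, 42, 66, and 90 h, both calls give a phase of about 18 h. The circular average matters when peaks straddle the wrap-around point: for peaks at 23 h and 1 h, an arithmetic mean would give 12 h, whereas the circular average lies at the 0/24 h boundary.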
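
The phase “by fit” idea from Notes 11 and 12 can be sketched as a linear least-squares problem: once the period is fixed, a cosine with unknown phase and amplitude is a linear combination of a cosine and a sine term. The code below is a minimal sketch under that assumption, with hypothetical function names; the actual fitting procedure used by BioDare2 is not described in this chapter and may differ.

```python
import math

def solve_linear(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def phase_by_fit(times, values, period):
    """Fit values ~ mean + A*cos(2*pi*(t - phase)/period) for a fixed period.

    Uses A*cos(w*(t - phase)) = c*cos(w*t) + s*sin(w*t) with
    c = A*cos(w*phase) and s = A*sin(w*phase), which is linear in c and s.
    """
    w = 2 * math.pi / period
    basis = [[1.0, math.cos(w * t), math.sin(w * t)] for t in times]
    # Normal equations of the least-squares problem.
    a = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    b = [sum(row[i] * v for row, v in zip(basis, values)) for i in range(3)]
    _mean, c, s = solve_linear(a, b)
    amplitude = math.hypot(c, s)
    phase = (math.atan2(s, c) / w) % period  # reported in the 0-to-period range
    return phase, amplitude

# Synthetic hourly data over 4 days: peak at 18 h, amplitude 3 (illustrative values only).
times = list(range(96))
values = [10 + 3 * math.cos(2 * math.pi * (t - 18) / 24) for t in times]
phase, amplitude = phase_by_fit(times, values, 24)
print(phase, amplitude)
```

For this symmetrical synthetic signal the fitted phase coincides with the peak time. For asymmetrical waveforms the fitted phase remains well defined but can differ from the visible peak, as Note 11 points out.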
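
The goodness-of-fit ratio described in Note 13 can be sketched in the same way. The chapter does not state which error measure is used, so the sum of squared residuals is assumed here; the function names are hypothetical and the cubic is fitted by ordinary least squares.

```python
import math

def sum_sq_error(observed, predicted):
    """Sum of squared residuals (assumed error measure)."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

def solve_linear(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def cubic_fit(times, values):
    """Least-squares cubic polynomial, evaluated at the sample times."""
    t0 = min(times)
    span = (max(times) - t0) or 1.0
    u = [(t - t0) / span for t in times]  # rescale time to [0, 1] for numerical stability
    a = [[sum(x ** (i + j) for x in u) for j in range(4)] for i in range(4)]
    b = [sum(v * x ** i for x, v in zip(u, values)) for i in range(4)]
    coef = solve_linear(a, b)
    return [sum(c * x ** i for i, c in enumerate(coef)) for x in u]

def goodness_of_fit(times, values, method_fit):
    """Ratio of the method fit error to the cubic polynomial fit error.

    Undefined (division by zero) if the cubic fits the data exactly.
    """
    method_err = sum_sq_error(values, method_fit)
    poly_err = sum_sq_error(values, cubic_fit(times, values))
    return method_err / poly_err

# Synthetic hourly data over 4 days (illustrative values only).
times = list(range(96))
values = [10 + 3 * math.cos(2 * math.pi * (t - 18) / 24) for t in times]
flat = [sum(values) / len(values)] * len(values)
print(goodness_of_fit(times, values, values))  # method fit identical to the data
print(goodness_of_fit(times, values, flat))    # flat line used as a poor "method fit"
```

A method fit that reproduces the data exactly gives a ratio of 0, while a flat line cannot beat the cubic (a constant is a special case of a cubic), so its ratio is at least 1.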

Change history

  • 10 April 2022

    The original version of the chapter “Period Estimation and Rhythm Detection in Timeseries Data Using BioDare2, the Free, Online, Community Resource” was previously published non-open access. This has now been changed to open access under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), and the copyright holder updated to “The Author(s)”. For further details, see license information in the chapter. The chapter and the book have been updated with the change.

References

  1. Plautz JD, Straume M, Stanewsky R, Jamison CF, Brandes C, Dowse HB, Hall JC, Kay SA (1997) Quantitative analysis of Drosophila period gene transcription in living animals. J Biol Rhythms 12:204–217. Erratum in: J Biol Rhythms 1999 14:77

  2. Edwards KD, Akman OE, Knox K, Lumsden PJ, Thomson AW et al (2010) Quantitative analysis of regulatory flexibility under changing environmental conditions. Mol Syst Biol 6:424

  3. Burg JP (1972) The relationship between maximum entropy spectra and maximum likelihood spectra. Geophysics 37:375–376

  4. Enright JT (1965) The search for rhythmicity in biological time-series. J Theoret Biol 8:426–468

  5. Lomb NR (1976) Least-squares frequency analysis of unequally spaced data. Astrophys Space Sci 39:447–462

  6. Costa MJ, Finkenstädt B, Roche V, Lévi F, Gould PD et al (2013) Inference on periodicity of circadian time series. Biostatistics 14(4):792–806

  7. Zielinski T, Moore AM, Troup E, Halliday KJ, Millar AJ (2014) Strengths and limitations of period estimation methods for circadian data. PLoS One 9:e96462. https://doi.org/10.1371/journal.pone.0096462

  8. Hughes M, Hogenesch J, Kornacker K (2010) JTK CYCLE: an efficient non-parametric algorithm for detecting rhythmic components in genome-scale datasets. J Biol Rhythms 25:372–380. https://doi.org/10.1177/0748730410379711

  9. Hutchison AL, Maienschein-Cline M, Chiang AH et al (2015) Improved statistical methods enable greater sensitivity in rhythm detection for genome-wide data. PLoS Comput Biol 11(3):e1004094. https://doi.org/10.1371/journal.pcbi.1004094

  10. Yang R, Su Z (2010) Analyzing circadian expression data by harmonic regression based on autoregressive spectral estimation. Bioinformatics 26(12):i168–i174. https://doi.org/10.1093/bioinformatics/btq189

Acknowledgments

Funded by the European Commission through FP7 Integrated Project TiMet (award 245143), by the Wellcome Trust (award 204804/Z/16/Z) and by the Biotechnology and Biological Sciences Research Council (BBSRC) through the Centre for Systems Biology at Edinburgh [BB/D019621] and UK Centre for Mammalian Synthetic Biology [BB/M018040].

Author information

Corresponding author

Correspondence to Andrew J. Millar.

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Copyright information

© 2022 The Author(s)

Cite this protocol

Zieliński, T., Hay, J., Millar, A.J. (2022). Period Estimation and Rhythm Detection in Timeseries Data Using BioDare2, the Free, Online, Community Resource. In: Staiger, D., Davis, S., Davis, A.M. (eds) Plant Circadian Networks. Methods in Molecular Biology, vol 2398. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1912-4_2

  • DOI: https://doi.org/10.1007/978-1-0716-1912-4_2

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-1911-7

  • Online ISBN: 978-1-0716-1912-4

  • eBook Packages: Springer Protocols