Objective

Maize (Zea mays subsp. mays L.) plays an important role in the global economy. As a crop, it has a variety of uses, including food, feed, and fuel. Because of its versatility and relevance, maize has been widely studied. The Genomes to Fields (G2F) Initiative is a collaborative effort involving public-sector scientists that supports growers, consumers, and society. G2F researchers generate phenotypic, genotypic, environmental, and metadata datasets to facilitate the understanding of the potential and challenges of maize production in different environments.

Individual genotype performances differ across environments, and the magnitude of this difference dictates the importance of the Genotype by Environment (G × E) interaction. Understanding and harnessing G × E interactions improves the efficiency of resource use and allocation, and it facilitates the identification of genotypes with higher stability across a range of locations, the identification of locations where the effect of G × E is minimized, and the identification of mechanisms affecting the differential response of phenotypes to variable environments. Furthermore, advances in our understanding of the fundamental components contributing to the differential response of plants to environmental cues will also improve genomic and phenotypic predictabilities for traits of interest. Therefore, this data release provides a unique resource of combined agronomic, phenological, and morphological information to dissect G × E interaction.
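
One common way to formalize the G × E interaction is the two-way linear model sketched below. It is given only as an illustration and is not a model specified by this data release; the symbols are assumptions introduced here: y_ijk is the phenotype of genotype i in environment j and replicate k, μ is the overall mean, G_i and E_j are the genotype and environment main effects, (GE)_ij is the interaction term, and ε_ijk is the residual.

```latex
y_{ijk} = \mu + G_i + E_j + (GE)_{ij} + \varepsilon_{ijk}
```

Under this formulation, the interaction matters more as the variation captured by the (GE)_ij term grows relative to the variation captured by the genotype main effect G_i.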

In the 2018 and 2019 experiments, 1153 publicly available hybrids were evaluated through a network of collaborators in 32 different locations. The main group of hybrids was produced by crossing doubled-haploid (DH) inbred lines from a collection of three biparental populations, which share one parent (PHW65) and have PHN11, Mo44, or MoG as the alternative parent, to two ex-PVP inbred testers: LH195 in Midwest to Southern locations and PHT69 in Northern locations.

Data description

The 2018 and 2019 datasets are publicly available via CyVerse/iPlant and structured as described in Table 1. Briefly, the datasets included here are:

  • Phenotypic dataset: Phenotypic measurements follow a standard set of instructions, available on the G2F webpage [1]. Standard traits include days to anthesis, days to silking, ear height, plant height, stand count, stalk lodging, root lodging, grain moisture, test weight, plot weight, and estimated grain yield. Both raw and quality-controlled data are reported. Out-of-range observations were set to missing following the rules described in the readMe and data description files.

  • Genotypic dataset: Inbred parents of the tested hybrids were genotyped using the Practical Haplotype Graph (PHG) [2, 3]. The data is minimally filtered, allowing the public to perform their own quality control steps prior to using it. The raw sequencing reads are available under BioProject ID PRJNA530187 [4]. The code used to create the genotypic data is also available at https://bitbucket.org/bucklerlab/g2f_2018_phg_genotyping/src/master/.

  • Environmental dataset: WatchDog 2700 weather stations (Spectrum Technologies) were placed at each field site. Data was collected at 30-min intervals from planting through harvest at each location. The geographic locations of the experiments are not identical across years due to crop rotation management practices; thus, the locations of the weather stations vary across years. Each station measured wind speed, direction, and gust; air temperature, dew point, and relative humidity; soil temperature and moisture; rainfall; and solar radiation. Additional measurements taken at selected sites included soil electrical conductivity, ultraviolet light, carbon dioxide, and photosynthetically active radiation. Instructions for weather station maintenance activities, including pre-season tasks, field setup, maintenance throughout the growing season, and removal, are available on the G2F webpage [5].

  • Soil dataset: Soil samples representative of the experimental field were collected at each location. Collaborators followed the sample collection instructions available on the G2F webpage [5].

  • Supplemental dataset: Supplemental information consists of metadata (any field-level data collected at planting, in season, and/or at harvest), agronomic information (list of pesticides, nutrients, and irrigation applied), and cooperator list (collaborators responsible for the field locations in 2018 and 2019).
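
The quality-control step noted for the phenotypic dataset, in which out-of-range observations are set to missing, can be illustrated with a minimal sketch. The column names and numeric bounds below are hypothetical placeholders chosen for illustration; the actual rules are those given in the readMe and data description files and are not reproduced here.

```python
# Hypothetical plausibility ranges (assumptions for illustration only).
# The real thresholds are defined in the G2F readMe and data description files.
ASSUMED_RANGES = {
    "plant_height_cm": (50.0, 400.0),
    "grain_moisture_pct": (5.0, 45.0),
}


def mask_out_of_range(records, ranges=ASSUMED_RANGES):
    """Return copies of the records with out-of-range trait values set to None."""
    cleaned = []
    for record in records:
        new_record = dict(record)  # copy so the raw record is left untouched
        for trait, (low, high) in ranges.items():
            value = new_record.get(trait)
            if value is None:
                continue  # already missing
            # NaN also fails this comparison, so it is treated as missing too
            if not (low <= value <= high):
                new_record[trait] = None
        cleaned.append(new_record)
    return cleaned


# Made-up example plots, not values from the G2F datasets
raw = [
    {"plot": 1, "plant_height_cm": 245.0, "grain_moisture_pct": 18.2},
    {"plot": 2, "plant_height_cm": 2450.0, "grain_moisture_pct": 17.9},
    {"plot": 3, "plant_height_cm": 230.0, "grain_moisture_pct": -1.0},
]
qc = mask_out_of_range(raw)
```

Keeping the raw records unchanged and returning a separate cleaned copy mirrors the release's practice of reporting both raw and quality-controlled data.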

Table 1 Overview of data files and dataset for 2018 and 2019 planting seasons

Limitations

These datasets contain missing data. Missing data includes values not reported by collaborators and values determined to be erroneous under the rules in the data description files. In 2019, pedigree information for some locations was set to missing because of packaging problems, and only the plot number was reported in the phenotypic dataset to reduce misinterpretation.