Introduction

Sugarcane is a complex autopolyploid and outbred species with a high level of heterozygosity. Modern sugarcane varieties are derived from the interspecific hybridization of Saccharum officinarum (2n = 80) and Saccharum spontaneum (2n = 40–128), resulting in highly polyploid and aneuploid plants. The introduction of this hybridization scheme represented a major breakthrough in modern sugarcane breeding: it solved disease problems, increased yields, broadened adaptability to a range of environmental conditions and improved ratooning ability (Paterson et al. 2013).

Genetic linkage map construction and QTL mapping provide information about the genetic architecture of traits, linkage and pleiotropy (Zeng et al. 1999). However, the construction of genetic maps in sugarcane is more complicated and laborious than in diploid species because (i) the high level of polyploidy and aneuploidy results in complex chromosomal segregation patterns during meiosis (Heinz and Tew 1987); (ii) mapping progeny are derived from bi-parental crosses between highly heterozygous outbred parents, in which there are different numbers of alleles per locus, resulting in a mixture of marker segregation patterns in the progeny (Wu et al. 2002; Lin et al. 2003; Pastina et al. 2010); and (iii) the linkage phase between markers and between markers and QTL is unknown (Pastina et al. 2012; Gazaffi et al. 2014).

Wu et al. (1992) proposed the development of genetic linkage maps based solely on a segregation analysis of single-dose markers (SDM). These markers represent alleles that are present in one copy in one of the parents and segregate in a 1:1 ratio in the progeny, or in one copy in both parents and segregate in a 3:1 ratio. The first genetic maps were published by da Silva et al. (1993) and Al-Janabi et al. (1993) using a population derived from a cross between the double-haploid ‘ADP85-0068’ and S. spontaneum ‘SES208’, chosen to maximize SDM detection. In general, most published sugarcane linkage maps have been constructed based on SDM and on the double pseudo-testcross strategy (Grattapaglia and Sederoff 1994), which results in two individual maps, one for each parent (Daugrois et al. 1996; Ming et al. 2001, 2002a; Hoarau et al. 2002; Reffay et al. 2005; Aitken et al. 2006, 2008; Al-Janabi et al. 2007; as reviewed by Pastina et al. 2010; Shing et al. 2013). Garcia et al. (2006) presented an integrated sugarcane linkage map constructed with the methodology proposed by Wu et al. (2002), which combines the information provided by markers segregating in 1:1 and 3:1 ratios into a single map for both parents. Integrated linkage maps are advantageous because they are more saturated, allowing for better estimates of QTL locations and more accurate estimation of linkage and linkage phases. This approach was also used for the sugarcane linkage maps presented by Oliveira et al. (2007), Palhares et al. (2012) and Pastina et al. (2012).
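The 1:1 and 3:1 expectations for single-dose markers follow from random chromosome segregation in a polyploid parent. The Python sketch below computes gamete and progeny dose distributions under simplifying assumptions that real aneuploid sugarcane genomes only approximate: an even ploidy level, strictly polysomic (random) chromosome pairing and no double reduction. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def gamete_dose_probs(ploidy: int, dose: int) -> dict[int, Fraction]:
    """P(gamete carries k copies) of an allele present in `dose` copies in a parent
    of even `ploidy`, assuming random chromosome segregation and no double
    reduction (hypergeometric sampling of ploidy/2 chromosomes)."""
    half = ploidy // 2
    total = comb(ploidy, half)
    return {
        k: Fraction(comb(dose, k) * comb(ploidy - dose, half - k), total)
        for k in range(0, min(dose, half) + 1)
    }


def offspring_dose_probs(ploidy: int, dose_p: int, dose_q: int) -> dict[int, Fraction]:
    """Distribution of allele copies in full-sib progeny of parents P and Q."""
    gp = gamete_dose_probs(ploidy, dose_p)
    gq = gamete_dose_probs(ploidy, dose_q)
    progeny: dict[int, Fraction] = {}
    for kp, prob_p in gp.items():
        for kq, prob_q in gq.items():
            progeny[kp + kq] = progeny.get(kp + kq, Fraction(0)) + prob_p * prob_q
    return progeny


def presence_ratio(ploidy: int, dose_p: int, dose_q: int) -> Fraction:
    """Fraction of progeny in which a dominant (presence/absence) marker is scored present."""
    return 1 - offspring_dose_probs(ploidy, dose_p, dose_q).get(0, Fraction(0))


if __name__ == "__main__":
    print(offspring_dose_probs(8, 1, 0))  # simplex x nulliplex
    print(offspring_dose_probs(8, 1, 1))  # simplex x simplex
```

For a simplex allele in an octoploid parent crossed with a nulliplex parent, half of the progeny carry the allele (1:1). When both parents are simplex, a quarter of the progeny lack it, so presence/absence scoring gives 3:1, whereas dosage-aware SNP calling can separate the underlying 1:2:1 classes. A duplex allele, by contrast, is already absent from only 3/14 of the gametes of an octoploid, which is why higher-dose markers deviate from these ratios.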

To date, approximately 22 linkage maps exist for sugarcane (reviewed by Alwala and Kimbeng 2010 and Pastina et al. 2010). Most of these maps were constructed using molecular markers that are scored as dominant markers in polyploids, such as restriction fragment length polymorphism (RFLP), random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP) and simple sequence repeat (SSR) markers. SSRs behave like dominant markers in complex polyploids because they do not allow the identification of the different alleles or the marker dosage (Garcia et al. 2013). Modern technologies, such as the Sequenom iPLEX MassARRAY® (Sequenom Inc., San Diego, CA, USA), allow for the evaluation of single-nucleotide polymorphisms (SNPs) throughout the genome considering the relative abundance of each allele, i.e., the allelic dosage (Serang et al. 2012; Garcia et al. 2013; Bargary et al. 2014; Mollinari and Serang 2015). Knowledge of the dosage and ploidy level of an SNP can significantly increase the information provided by each locus and offers several advantages for genetic analysis. None of the published sugarcane genetic linkage maps is believed to be saturated, and the only map made with SNP markers uses only single-dose markers and does not consider ploidy level (Aitken et al. 2014).

Most economically important agronomic traits are quantitative, with polygenic inheritance (Falconer and Mackay 1996; Lynch and Walsh 1998), and are highly influenced by environmental conditions. The first QTL mapping study in sugarcane was performed by Sills et al. (1995). Other studies that made important contributions to QTL mapping include Ming et al. (2001, 2002a, b), Hoarau et al. (2002), McIntyre et al. (2005), da Silva and Bressiani (2005), Reffay et al. (2005), Aitken et al. (2006, 2008), Piperidis et al. (2008), Pastina et al. (2012) and Shing et al. (2013). Specifically, Pastina et al. (2012) proposed a mixed model for QTL mapping in sugarcane that includes complex G × E interactions. Unlike previously reported studies, these authors considered the presence of genetic correlations between locations and harvests by testing specific variance–covariance matrices for the genetic effects. In this way, it was possible to consider genotype-by-environment, QTL-by-location, QTL-by-harvest and QTL-by-harvest-by-location interactions and thus to distinguish stable-effect QTL from those with interaction effects, contributing to a better understanding of the genetic basis of important traits of sugarcane.

Recently, Gazaffi et al. (2014) developed an approach for composite interval mapping (CIM) in full-sib crosses. In this method, linkage phase and segregation patterns are determined in addition to QTL location, allowing for the simultaneous analysis of QTLs with different segregation patterns. Additionally, Gazaffi et al. (2014) included markers as cofactors into the mapping procedure, enabling more precise estimates of putative QTL locations in the genome. This provides a useful framework for QTL mapping in sugarcane based on integrated maps.

In this study, we constructed an integrated genetic linkage map from a bi-parental cross between two Brazilian commercial sugarcane varieties using SSRs, AFLPs and SNPs. The SNP calling approach provides information about ploidy level, which could be used to infer the size of the identified homology groups. Some SNPs also segregate as codominant markers (1:2:1 ratio), which had never before been included in a sugarcane genetic map; these markers are more informative than those segregating in 1:1 or 3:1 patterns. We then performed QTL mapping using CIM after analyzing the field trials with mixed models. This methodology permits the reporting of the segregation pattern, linkage phase, and additive and dominance effects of each putative QTL.

Materials and methods

Mapping population and field experiments

Full-sib progeny were obtained from a bi-parental cross between the elite clone IACSP95-3018 (female parent) and the variety IACSP93-3046 (male parent), both developed by the Sugarcane Breeding Program at the Instituto Agronômico de Campinas (IAC). IACSP95-3018 is a promising clone that is also used as a parent in the IAC Sugarcane Breeding Program, whereas IACSP93-3046 has a high sucrose content, resistance to rust, good tillering and an erect stool habit and is recommended for mechanical harvest. The full-sib progeny were planted in 2007 at Sales de Oliveira (São Paulo, Brazil) and in 2011 at Ribeirão Preto (São Paulo, Brazil) in a randomized complete block design with four (2007) and three (2011) replicates and plots consisting of 2-m rows spaced 1.5 m apart. Both parents and two varieties (SP81-3250 and RB835486) were included in each replicate as checks. All full-sib progeny were first screened with ten SSRs to exclude individuals that did not originate from the bi-parental cross.

Molecular marker data and map construction

A total of 187 individuals from the full-sib progeny were screened with SSRs, AFLPs and SNPs. Of the 140 SSRs used, 105 were EST-SSRs that had been developed from sequences in the Brazilian Sugarcane EST Project (SUCEST) database and presented in Pinto et al. (2004, 2006), Oliveira et al. (2007, 2009) and Marconi et al. (2011). The other 35 were genomic SSRs that had been developed either by CIRAD (Centre de Coopération Internationale en Recherche Agronomique pour le Développement, Montpellier, France) and described in Rossi et al. (2003) or by Cordeiro et al. (2000). A total of 25 AFLP primer combinations (EcoRI/MseI) were screened according to Vos et al. (1995). The 531 SNP markers that were genotyped had also been developed from the SUCEST database, as described in Garcia et al. (2013). The SSR and AFLP amplification products were separated by electrophoresis on 6 % denaturing polyacrylamide gels and visualized by silver staining (Creste et al. 2001).

SSR and AFLP markers were scored based on their presence (1) or absence (0) in the parents and segregating progeny. The 1:1 and 3:1 marker segregation ratios were verified with a chi-square test using a Bonferroni correction at a significance level of 0.05 to control type I errors across multiple tests (Province 1999). The SNP genotyping data were generated using a Sequenom iPLEX MassARRAY® (Sequenom Inc., San Diego, CA, USA) as described in Garcia et al. (2013), and each SNP locus was assessed for ploidy level and allelic dosage (number of copies of each allele) using the SuperMASSA software (Serang et al. 2012). This software simultaneously considers all of the available information and the genetic constraints that the derived results must fulfill, i.e., the possible genotypes to be observed given the ploidy level and the parental genotypes, the ratio between allele intensities, and the expected complete polysomic segregation (Garcia et al. 2013). Loci were classified as single dose when one parent had only a single copy of one of the SNP alleles and the other parent was nulliplex (segregating in a 1:1 ratio in the progeny), or when both parents had a single copy of the same allele (segregating in a 1:2:1 ratio).
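A minimal sketch of this segregation filter, assuming Pearson's chi-square goodness-of-fit test and a Bonferroni-corrected level equal to 0.05 divided by the number of markers tested. It uses the closed-form chi-square tail probabilities for 1 and 2 degrees of freedom, so only the Python standard library is needed; the function names and counts are hypothetical.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail chi-square probability, closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles 1:1, 3:1 (df = 1) and 1:2:1 (df = 2)")


def segregation_test(observed: list[int], ratio: tuple[int, ...]) -> tuple[float, float]:
    """Pearson chi-square goodness of fit of class counts to an expected ratio."""
    n = sum(observed)
    total = sum(ratio)
    expected = [n * part / total for part in ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_sf(stat, len(ratio) - 1)


def passes_bonferroni(observed, ratio, n_tests: int, alpha: float = 0.05) -> bool:
    """Keep the marker unless the expected ratio is rejected at alpha / n_tests."""
    _, p_value = segregation_test(observed, ratio)
    return p_value > alpha / n_tests


if __name__ == "__main__":
    # Hypothetical presence/absence counts for one marker in 187 progeny.
    print(segregation_test([98, 89], (1, 1)))
```

With many tests, the Bonferroni division makes the filter conservative about rejecting the expected ratio, so only markers with strong distortion are discarded.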

The linkage map was constructed using the OneMap package (Margarido et al. 2007). To avoid false-positive linkages, a LOD score of 5.8 and a recombination fraction of 0.50 were used to determine the linkage groups (LGs). To determine the marker order and linkage phases in each linkage group with five or fewer markers, all possible orders were compared, and the most likely order was selected. For LGs with more than five markers, all possible orders of the five most informative markers were compared and the most likely order identified; the remaining markers were then added sequentially at the position of highest likelihood with respect to that order. Distances in the genetic map were expressed in centimorgans (cM) based on the Kosambi mapping function (Kosambi 1944).
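The Kosambi function converts an estimated recombination fraction r into map distance, d = ¼ ln[(1 + 2r)/(1 − 2r)] Morgans. The sketch below pairs it with a two-point LOD score for the simplest case only: two markers each segregating 1:1 from the same parent, with R recombinants among N progeny and the linkage phase already chosen so that R/N ≤ 0.5. OneMap's full-sib likelihoods cover more marker types and unknown phases, so this is a simplified illustration, and the function names are hypothetical.

```python
import math


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in centimorgans for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return 100.0 * 0.25 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def two_point_lod(recombinants: int, n: int) -> tuple[float, float]:
    """MLE of r and the LOD for linkage between two 1:1 markers of the same parent.

    LOD = log10[r^R (1 - r)^(N - R) / 0.5^N], evaluated at r = R / N.
    Assumes the phase has been chosen so that R / N <= 0.5.
    """
    r = recombinants / n
    log_l1 = 0.0
    if recombinants > 0:
        log_l1 += recombinants * math.log10(r)
    if n - recombinants > 0:
        log_l1 += (n - recombinants) * math.log10(1.0 - r)
    log_l0 = n * math.log10(0.5)
    return r, log_l1 - log_l0


if __name__ == "__main__":
    r_hat, lod = two_point_lod(10, 187)   # hypothetical counts
    print(r_hat, lod, lod >= 5.8)         # 5.8 is the LOD threshold used here
    print(kosambi_cm(r_hat))
```

The Kosambi function accounts for partial crossover interference, so for small r it stays close to 100r cM and grows faster as r approaches 0.5.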

Phenotypic data analysis

The field experiments were evaluated in 2008 (plant cane) and 2009 (ratoon cane) at Sales de Oliveira (São Paulo, Brazil) and in 2012 (plant cane), 2013 (ratoon cane) and 2014 (ratoon cane) at Ribeirão Preto (São Paulo, Brazil).

Important agronomic and economic traits in sugarcane production were measured: soluble solid content (Brix), fiber percentage (Fiber), sucrose content (Pol), stalk weight (SW), stalk diameter (SD) and stalk height (SH). Each measurement was obtained from a sample of 10 stalks harvested from each individual plot according to the methods described in Consecana (2006) for both plant cane and ratoon crops. Brix denotes the total dissolved solids in cane juice, whereas Fiber refers to the water-insoluble matter that is present in the stalk.

To evaluate the phenotypic data obtained in different locations and harvests, an appropriate mixed model was fitted by comparing different structures for the variance–covariance (VCOV) matrices of genetic (G) and non-genetic (R) effects. The statistical model used for each trait separately was (underlined terms indicate random effects):

$$\underline{Y}_{ijrkm} = \mu + L_{k} + H_{m} + LH_{km} + \underline{B}_{j(km)} + \underline{G}_{i(km)} + \underline{e}_{ijrkm}$$

where \(\underline{Y}_{ijrkm}\) is the phenotype of the ith genotype (i = 1, …, n) in the jth block (j = 1, 2, 3) of the rth replicate (r = 1, …, 4), the kth location (k = 1, 2) and the mth harvest (m = 1, 2); μ is the trait mean; \(L_{k}\) is the location effect; \(H_{m}\) is the harvest effect; \(LH_{km}\) is the location-by-harvest interaction; \(\underline{B}_{j(km)}\) is the effect of the jth block in the kth location and mth harvest; \(\underline{G}_{i(km)}\) is the effect of the ith individual in the kth location and mth harvest; and \(\underline{e}_{ijrkm}\) is the non-genetic effect. The individuals can be separated into two groups, \(n = n_{g} + n_{c}\), where \(n_{g}\) is the number of genotypes (\(i = 1, \ldots, n_{g}\)) and \(n_{c}\) is the number of checks (\(i = n_{g} + 1, \ldots, n_{g} + n_{c}\)). Following Pastina et al. (2012), \(\underline{G}_{i(km)}\) is modeled as:

$$\underline{G}_{i(km)} = \begin{cases} \underline{g}_{i(km)}, & i = 1, \ldots, n_{g} \\ c_{i(km)}, & i = n_{g} + 1, \ldots, n_{g} + n_{c} \end{cases}$$

where \(\underline{g}_{i(km)}\) is the random genetic effect of the ith genotype at the kth location and mth harvest, and \(c_{i(km)}\) is the fixed effect of the ith check at the kth location and mth harvest.

The VCOV matrix of genetic effects was obtained as the Kronecker product \(G = G_{M} \otimes I_{n_g}\), as also described in Pastina et al. (2012), where \(G_{M} = G^{L} \otimes G^{H}\), ⊗ denotes the Kronecker product, and \(I_{n_g}\) is an identity matrix of order \(n_g\) for the genotypes. To compare the factorial combination of locations and harvests, we also considered \(G_{M} = G^{LH}\). Likewise, the \(R_{M}\) matrix was obtained as the Kronecker product of the \(R^{L}\) (residual effects between locations), \(R^{H}\) (residual effects between harvests) and \(R^{B}\) (residual effects between blocks) matrices; thus, \(R_{M} = R^{L} \otimes R^{H} \otimes R^{B}\). It was assumed that \(\mathbf{e} \sim \mathrm{MVN}(\mathbf{0}, R)\) and \(\mathbf{g} \sim \mathrm{MVN}(\mathbf{0}, G)\), where \(\mathbf{e} = (e_{11111}, \ldots, e_{IJRKM})'\) and \(\mathbf{g} = (g_{1(11)}, \ldots, g_{I(KM)})'\), respectively.
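To make the Kronecker structure concrete, the sketch below builds \(G = G^{L} \otimes G^{H} \otimes I_{n_g}\) for an unstructured location matrix combined with a first-order autoregressive harvest matrix, using two locations, two harvests and three genotypes. All numeric values are hypothetical, and fixing the AR1 variance at 1 (so that \(G^{H}\) is a correlation matrix and the scale sits in \(G^{L}\)) is an assumed identifiability constraint; in practice these matrices are estimated by REML, not written by hand.

```python
def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def ar1(n, rho, variance=1.0):
    """First-order autoregressive matrix: entry (i, j) = variance * rho**|i - j|."""
    return [[variance * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def identity(n):
    """Identity matrix of order n."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


# Hypothetical unstructured location matrix: location-specific variances and a covariance.
G_L = [[1.2, 0.4],
       [0.4, 2.0]]
# Hypothetical AR1 correlation between two harvests.
G_H = ar1(2, rho=0.5)

G_M = kron(G_L, G_H)        # 4 x 4: location-harvest combinations
G = kron(G_M, identity(3))  # 12 x 12: genotypes nested within each combination

# Row/column index of (location l, harvest h, genotype i) is l*6 + h*3 + i.
```

Because the genotype identity matrix is the innermost factor, different genotypes are uncorrelated, while the same genotype is correlated across locations (through \(G^{L}\)) and across harvests (through the AR1 decay).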

Different models for the VCOV structures were compared via the Akaike information criterion (AIC) (Akaike 1974) and the Bayesian information criterion (BIC) (Schwarz 1978). For \(G_{M} = G^{L} \otimes G^{H}\), different VCOV structures were tested for \(G^{L}\) and \(G^{H}\), together with the model for \(R_{M}\). For the factorial combination \(G_{M} = G^{LH}\), the comparisons of VCOV structures were performed in a single step. The structures tested for \(G^{L}\), \(G^{H}\) and \(G^{LH}\) were identity, diagonal, uniform, heterogeneous uniform, first-order factor analytic, first-order autoregressive and unstructured, analogous to the modeling performed by Pastina et al. (2012). The same VCOV structures were compared for \(R^{L}\) and \(R^{H}\), whereas the first-order factor analytic and first-order autoregressive models were not considered for \(R^{B}\). Once the best model was selected, joint adjusted means were obtained for each trait via best linear unbiased prediction (BLUP), combining the information from different harvests and locations, and the variance components were estimated by residual maximum likelihood (REML). All of the phenotypic analyses were performed in GenStat (v. 16.1) (VSN International, 2011).
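The selection rule can be sketched with the standard definitions AIC = −2 log L + 2k and BIC = −2 log L + k ln(n), where k is the number of variance–covariance parameters and n the number of observations. The log-likelihoods, parameter counts and n below are hypothetical. With REML, such comparisons are only valid between models that share the same fixed effects, and software packages differ in the n they use for BIC, so this is an illustration rather than GenStat's exact computation.

```python
import math


def information_criteria(log_lik: float, n_params: int, n_obs: int) -> tuple[float, float]:
    """Return (AIC, BIC) for a fitted model; smaller is better for both."""
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + n_params * math.log(n_obs)
    return aic, bic


def select_structure(models: dict[str, tuple[float, int]], n_obs: int):
    """Return the structure with the lowest AIC, the one with the lowest BIC, and all scores."""
    scores = {name: information_criteria(ll, k, n_obs) for name, (ll, k) in models.items()}
    best_aic = min(scores, key=lambda name: scores[name][0])
    best_bic = min(scores, key=lambda name: scores[name][1])
    return best_aic, best_bic, scores


# Hypothetical REML log-likelihoods and numbers of variance parameters for G_M.
candidate_models = {
    "ID": (-1520.4, 1),
    "Uns-AR1": (-1490.2, 4),
    "UN (4 environments)": (-1488.9, 10),
}
best_aic, best_bic, scores = select_structure(candidate_models, n_obs=2000)
```

When the two criteria disagree, BIC tends to favor the more parsimonious structure because its penalty grows with ln(n) rather than staying at 2 per parameter.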

The broad-sense heritability coefficient was calculated for each trait. Genetic correlations were estimated for each pair of traits based on the BLUP means and using the Pearson correlation coefficient as implemented in the R software (R Core Team 2013).
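A minimal standard-library sketch of the two quantities in this paragraph. The correlation is the ordinary Pearson coefficient between two vectors of BLUP means. The heritability helper assumes the common ratio \(h^{2} = \sigma_{G}^{2}/\sigma_{P}^{2}\); the exact expression used for \(\sigma_{P}^{2}\) is not given here, so treat that form as an assumption. Function names are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def broad_sense_h2(var_g: float, var_p: float) -> float:
    """Broad-sense heritability as the genotypic share of phenotypic variance (assumed form)."""
    return var_g / var_p
```

Because BLUPs are shrunken predictions, correlations computed from them are usually treated as approximations to the underlying genetic correlations rather than exact estimates.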

QTL mapping

To test the association between phenotype and genotype, each trait was analyzed separately using the joint adjusted means obtained via BLUP, applying the composite interval mapping (CIM) model (Zeng 1993) as extended by Gazaffi et al. (2014). QTL mapping was performed as follows: (i) conditional multipoint QTL genotype probabilities were estimated via hidden Markov models (HMMs) implemented in the OneMap package (Margarido et al. 2007) for all positions on a discrete grid of evaluation points spaced 1 cM apart along the genome; (ii) a genome scan was performed on the same 1-cM grid to detect QTLs; and (iii) once QTLs were identified, the significance of the additive (\(\alpha_p\), \(\alpha_q\)) and dominance (\(\delta_{pq}\)) effects was tested, and the segregation patterns were estimated (Gazaffi et al. 2014). Cofactors were selected by multiple linear regression using stepwise selection with the AIC. We considered a maximum of 27 cofactors and a window size of 20 cM.
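The regression behind steps (ii) and (iii) can be illustrated with a deliberately simplified sketch: at one test position, phenotypes are regressed on a full-sib QTL model with two additive effects (\(\alpha_p\), \(\alpha_q\)) and a dominance effect (\(\delta_{pq}\)), and the LOD is \((n/2)\log_{10}(\mathrm{RSS}_0/\mathrm{RSS}_1)\). Here the QTL genotypes are assumed known and coded as ±1 for the allele inherited from each parent, whereas the actual method uses HMM-based conditional probabilities and adds marker cofactors. The coding, function names and data are hypothetical.

```python
import math


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(design: list[list[float]], y: list[float]) -> tuple[list[float], float]:
    """Least-squares coefficients (via normal equations) and residual sum of squares."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return beta, rss


def fullsib_lod(y, x_p, x_q):
    """LOD of y = mu + a_p*x_p + a_q*x_q + d_pq*x_p*x_q + e against y = mu + e.

    x_p and x_q code the allele each individual received from parent P and Q
    (hypothetical +1/-1 coding, genotypes treated as known).
    """
    full = [[1.0, a, b, a * b] for a, b in zip(x_p, x_q)]
    null = [[1.0] for _ in y]
    beta, rss1 = ols(full, y)
    _, rss0 = ols(null, y)
    return beta, (len(y) / 2.0) * math.log10(rss0 / rss1)


if __name__ == "__main__":
    noise = [0.1, -0.1, 0.05, -0.05, 0.0]
    x_p, x_q, y = [], [], []
    for a in (1, -1):
        for b in (1, -1):
            for e in noise:
                x_p.append(a)
                x_q.append(b)
                y.append(10 + 2 * a - 1 * b + 0.5 * a * b + e)
    print(fullsib_lod(y, x_p, x_q))
```

Under this coding, a QTL with only \(\alpha_p\) significant segregates 1:1 in parent P, whereas significant effects in both parents point to more complex patterns; the actual classification rules are those of Gazaffi et al. (2014).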

The threshold for QTL detection was calculated from 1000 permutations at a significance level of 0.05 (Churchill and Doerge 1994), considering the distribution of the second-highest peak among the linkage groups (Chen and Storey 2006). For positions with evidence of a putative QTL, significant marginal effects for \(\alpha_p\), \(\alpha_q\) and \(\delta_{pq}\) were verified using LOD scores equivalent to a significance level of 0.05 (Gazaffi et al. 2014). The proportion of the phenotypic variance (R2) explained by each detected QTL was obtained for all of its effects simultaneously. All of the analyses were performed in the R environment (R Core Team 2013) using an R package under development.
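The permutation threshold can be sketched as follows: phenotypes are shuffled relative to genotypes, the highest LOD within each linkage group is recorded, the second-highest of these linkage-group peaks is kept for each permutation, and the empirical (1 − α) quantile of those values is the threshold. A single-marker LOD on 0/1 marker classes stands in for the CIM scan to keep the example short; all names and data are hypothetical.

```python
import math
import random


def marker_lod(y, g):
    """Single-marker LOD comparing a two-class (0/1) genotype model with the null mean model."""
    n = len(y)
    mean = sum(y) / n
    rss0 = sum((v - mean) ** 2 for v in y)
    rss1 = 0.0
    for cls in (0, 1):
        vals = [v for v, gi in zip(y, g) if gi == cls]
        if vals:
            m = sum(vals) / len(vals)
            rss1 += sum((v - m) ** 2 for v in vals)
    return (n / 2.0) * math.log10(rss0 / rss1)


def second_peak(y, groups):
    """Second-highest of the per-linkage-group maximum LODs (needs >= 2 groups)."""
    peaks = sorted((max(marker_lod(y, g) for g in lg) for lg in groups), reverse=True)
    return peaks[1]


def permutation_threshold(y, groups, n_perm=1000, alpha=0.05, seed=1):
    """Empirical (1 - alpha) quantile of the second-highest LG peak under permutation."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)  # break the genotype-phenotype association
        stats.append(second_peak(shuffled, groups))
    stats.sort()
    idx = min(len(stats) - 1, max(0, math.ceil((1 - alpha) * len(stats)) - 1))
    return stats[idx]
```

Using the second-highest peak instead of the genome-wide maximum gives a less stringent threshold, which trades some control of false positives for power when several QTLs segregate.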

Results

Linkage map

Using 140 SSR and 25 AFLP primer combinations, we scored 1102 polymorphic markers in the mapping population. Of these markers, 634 (57.5 %) segregated as single-dose markers (SDMs) in 1:1 (377) and 3:1 (257) ratios. Moreover, 531 SNP markers underwent dosage and ploidy level estimation using the SuperMASSA software (Serang et al. 2012). Of these, 54 (10.17 %) were SDMs with segregation ratios of 1:1 (30) and 1:2:1 (24) (Supplementary Material, Table S1). This low number of SDMs agrees with the results reported by Garcia et al. (2013) and suggests that SDMs are not the most abundant markers in the sugarcane genome when modern genotyping technologies and appropriate analytical methodologies are used. In total, 688 polymorphic SDMs were used to construct the genetic linkage map.

The estimated linkage map has 421 markers (61.2 %) on 118 LGs, with 267 (38.8 %) markers remaining unlinked. The LGs vary in length from 1.0 to 142.9 cM, with an average length of 38.2 cM per LG and a total length of 4512.6 cM. The average distance between markers is 10.7 cM, with an average of 3.6 markers per LG, distributed irregularly along the chromosomes. Fourteen gaps, ranging from 30 to 39.7 cM, are present, showing that parts of the genome are only partially covered by the markers. The final linkage map is shorter than those constructed for R570 (5849 cM; Hoarau et al. 2001), Q165 (9058.3 and 9774.4 cM; Aitken et al. 2005, 2014) and SP80-180 × SP80-4966 (6261.1 cM; Oliveira et al. 2007).

The 118 LGs were used to establish putative homology groups (HGs) based on the presence of at least one pair of shared SSR-derived markers. In total, 87 LGs were organized into 16 putative HGs, each containing between 2 and 17 LGs (Supplementary Material, Figure S1). In accordance with previous studies (Hoarau et al. 2001; Aitken et al. 2005, 2014; Raboin et al. 2006; Garcia et al. 2006; Oliveira et al. 2007), the markers were not distributed equally among the different HGs. The largest HG contains 54 markers distributed along 11 LGs, and the smallest comprises six markers distributed between two LGs. The remaining 31 LGs had no SSR-derived markers in common with other LGs and therefore could not be assigned to HGs. In some cases, the marker order was maintained among LGs. For example, in HGI, the markers scb060, SMC1047HA and scb262 occur in the same order on LG4 and LG8. In HGIII, the order of the cv038, Cir001 and Cir012 markers is the same on LG30 and LG40. In HGIV, the order of the Cir067, Cir036 and scb312 markers on LG42 is preserved on LG44. Gardiner et al. (1993), Aitken et al. (2005, 2014) and Oliveira et al. (2007) also reported the preservation of marker order in some LGs belonging to the same HG.
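One simple way to implement this grouping is a union-find over LGs, sketched below under the assumption that sharing alleles of a single SSR locus is enough to join two LGs and that such joins are transitive. Groups with at least two LGs become putative HGs, and the rest stay unassigned. The LG-to-locus table is a toy input built from marker names mentioned in this section plus one hypothetical locus; it is not the study's data.

```python
def homology_groups(lg_loci: dict[str, set[str]]) -> tuple[list[list[str]], list[str]]:
    """Group LGs that share SSR loci (transitively); return (HGs, unassigned LGs)."""
    parent = {lg: lg for lg in lg_loci}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen: dict[str, str] = {}
    for lg, loci in lg_loci.items():
        for locus in loci:
            if locus in first_seen:
                union(first_seen[locus], lg)
            else:
                first_seen[locus] = lg

    components: dict[str, list[str]] = {}
    for lg in lg_loci:
        components.setdefault(find(lg), []).append(lg)
    hgs = [sorted(c) for c in components.values() if len(c) > 1]
    unassigned = sorted(c[0] for c in components.values() if len(c) == 1)
    return hgs, unassigned


# Toy input: SSR loci whose alleles were mapped on each LG.
toy = {
    "LG4": {"scb060", "SMC1047HA", "scb262"},
    "LG8": {"scb060", "scb262"},
    "LG30": {"cv038", "Cir001"},
    "LG40": {"Cir001", "Cir012"},
    "LG42": {"Cir067", "Cir036"},
    "LG44": {"Cir036", "scb312"},
    "LG99": {"hypothetical_locus"},
}
hgs, unassigned = homology_groups(toy)
```

Transitive joining means that a single spurious shared allele can merge two groups, which is one reason why HG assignments may change as more multi-locus markers are added.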

This is the first sugarcane genetic map that includes SNP markers segregating in a 1:2:1 fashion, and therefore codominant, together with information about ploidy level. Of the 54 SDM SNPs, nine are linked on the map, with segregation ratios of 1:2:1 (five) and 1:1 (four) and ploidy levels ranging from 6 to 12 (Table 1). These ploidy levels agree with estimates of the number of chromosomes per homology group in modern sugarcane cultivars, which fall between 6 and 14 (D’Hont et al. 1996; D’Hont 2005; Garcia et al. 2013).

Table 1 Description of the single-dose marker SNPs linked on the genetic map in the full-sib sugarcane progeny (IACSP95-3018 vs. IACSP93-3046), presenting the segregation ratio, ploidy level, linkage group (LG) and homology group (HG) for each SNP

Phenotypic analysis

Different VCOV models for the G and R matrices were compared in the phenotypic analysis, and the best one, i.e., that with the lowest AIC and BIC values, was selected (Table 2). For SD, SW, SH and Fiber, the best model for the \(G_{M}\) matrix was Uns–AR1, which is based on the Kronecker product of the \(G^{L}\) and \(G^{H}\) matrices for locations and harvests, respectively. For Brix and Pol, the best model had a uniform structure, treating each harvest–location combination as an environment. The models selected for SD, SW, SH and Fiber assume a specific genetic variance for each location and a specific covariance between locations, whereas the genetic correlation between harvests decays over time, i.e., consecutive harvests are more strongly correlated than harvests further apart. In contrast, the model selected for Brix and Pol assumes homogeneous genetic variances across environments and a common genetic covariance between pairs of environments. Different models for the \(R_{M}\) (non-genetic effects) matrix were also compared for all of the traits evaluated in this study. For SD, SW, SH and Fiber, \(R_{M} = R^{L} \otimes R^{H} \otimes R^{B}\), whereas for Brix and Pol, \(R_{M} = R^{LH} \otimes R^{B}\). In this way, both adjusted BLUP means and variance components were obtained for each trait, allowing for the estimation of genetic parameters (Tables 3, 4).

Table 2 Best-fitting structures selected for the variance–covariance matrices of genetic (G M matrix) and non-genetic (R M matrix) effects used in the mixed model analysis of each trait
Table 3 Estimation of genetic parameters: broad-sense heritability coefficient (h 2), genotypic variance (\(\sigma_{G}^{2}\)), phenotypic variance (\(\sigma_{P}^{2}\)), residual coefficient of variation (CV) and mean (µ), based on BLUP values for each trait separately in the full-sib sugarcane progeny (IACSP95-3018 vs. IACSP93-3046)
Table 4 Correlation matrix for all of the traits that were evaluated in the full-sib sugarcane progeny (IACSP95-3018 vs. IACSP93-3046) based on BLUP values

Broad-sense heritability coefficients were moderate to high: 80.02 % for SD, 71.98 % for SW, 69.58 % for SH, 54.48 % for Fiber, 53.99 % for Brix and 53.20 % for Pol. The heritability estimates were consistent with the residual coefficients of variation (CV), indicating that the field experiments were well conducted. The highest CV was observed for SW (21.03 %), as expected because this trait is strongly influenced by environmental conditions. The CV values for the other traits were low, ranging from 5.57 % for Brix to 12.14 % for SH.

A total of nine significant pairwise correlation coefficients were found. The highest significant correlation was found between Brix and Pol (0.91). Intermediate significant correlations were found for SD-SW (0.66), SH-SW (0.55), SD-Fiber (−0.39), SW-Fiber (−0.25), Fiber-Brix (0.25) and SW-Pol (0.22). The lowest significant correlation was observed between SD and Pol (0.18) (Table 4).

QTL mapping

In total, 421 single-dose markers (1:1, 3:1 and 1:2:1) were considered in the QTL mapping procedure. Based on the information from these markers, a total of 25 QTLs were detected for SD, SW, SH, Fiber, Pol and Brix using the CIM approach and the integrated genetic map constructed in this study (Table 5, Fig. 1). For all of the evaluated traits in plant cane and ratoon cane, 1000 permutation tests were performed, resulting in LOD score thresholds of 4.40 for SD, 3.98 for SW, 4.31 for SH, 4.89 for Fiber, 5.75 for Pol, and 6.34 for Brix. QTLs were identified in 22 linkage groups and 10 distinct homology groups. As expected for sugarcane, the proportion of the phenotypic variance (R2) explained by each QTL was low, ranging from 0.069 to 3.87 %. Mapped QTL exhibited 1:2:1, 3:1 and 1:1 segregation ratios, which are also the marker segregation patterns available in our genetic map.

Table 5 Quantitative trait loci estimated with the composite interval mapping (CIM) model considering each trait separately in the full-sib sugarcane progeny (IACSP95-3018 vs. IACSP93-3046)
Fig. 1

QTLs identified (inverted triangles) and associated with stalk diameter (SD), stalk weight (SW), stalk height (SH), fiber, Pol and Brix in the full-sib sugarcane progeny (IACSP95-3018 vs. IACSP93-3046)

Of these 25 QTLs, five were detected for SW, two for Pol, three for Brix, six for SD, four for SH and five for Fiber (Fig. 1; Table 5). The highest LOD score (9.48) was associated with QTL SW.1, and the lowest (4.49) with QTL SW.5. Most of the QTLs had a 1:1 segregation pattern, and the total proportion of phenotypic variance (R2) explained by all QTLs was higher for fiber content (7.11 %) than for the remaining traits. QTL B.2 had the lowest R2 value (0.069 %), and QTL F.2 had the highest (3.87 %). QTL F.2 accounts for more than half of the phenotypic variance explained by the mapped QTL for fiber content and exhibits a 1:2:1 segregation pattern. This QTL was detected by the SNP marker SugSNP_0729, which was estimated to be hexaploid, i.e., to have a ploidy level of 6 (Table 1).

Discussion

QTL mapping is a useful tool for evaluating genotypes of importance for breeding programs because the inheritance of most quantitative traits in sugarcane is complex. However, successful QTL mapping depends on the construction of reliable genetic linkage maps. In general, most of the 118 LGs in our genetic map were short (average length of 38.2 cM) and had few markers (average of 3.57 per LG). Combined with the 267 markers that remained unlinked, this indicates that the genetic linkage map is not yet saturated. This lack of saturation can probably be attributed to the low level of polymorphism in some regions of the S. officinarum genome, from which a large part of the genome of modern sugarcane varieties originated as a consequence of the nobilization process (Alwala and Kimbeng 2010). Grivet et al. (1996), Ming et al. (1998), Hoarau et al. (2001), Aitken et al. (2005), Garcia et al. (2006), Oliveira et al. (2007), Palhares et al. (2012) and Pastina et al. (2012) also observed small LGs with few linked markers in their genetic maps. Although the genetic map is not completely saturated, the number of LGs agrees with the chromosome number expected in modern sugarcane varieties derived from S. officinarum (2n = 80) and S. spontaneum (2n = 40–128), whose genome is composed of approximately 70–80 % S. officinarum, 10–20 % S. spontaneum and 5–17 % recombinant chromosomes (D’Hont et al. 1996; Grivet et al. 1996; D’Hont and Glaszmann 2001).

When only single-dose polymorphisms are considered, gaps are expected. In our linkage map, gaps ranging from 30 to 39.7 cM were observed; these values are smaller than the 40 cM gaps reported by Hoarau et al. (2001). Another possible explanation for the large number of unlinked markers is the use of progeny derived from a cross between two commercial varieties. This type of cross has complex meiotic behavior, including aneuploidy and non-pairing of chromosomes, which could result in a large proportion of unlinked markers on the genetic map (Garcia et al. 2006). Using alleles derived from the same SSR or EST-SSR locus mapped on multiple LGs, we identified 16 HGs. This is considerably more than the expected basic chromosome number for the genus Saccharum, which falls between x = 8 and x = 10 (D’Hont et al. 1998; Irvine 1999; Grivet and Arruda 2001). Five HGs contained only two LGs. The small size of the linkage groups may reflect chromosome fragmentation that hinders correct HG grouping, further suggesting that the map is not saturated and reinforcing the need to use multiplex markers.

It is extremely difficult to determine the ploidy level of polymorphisms using AFLPs and SSRs, because AFLPs are dominant markers and SSRs behave as dominant markers in complex polyploids. Garcia et al. (2013) suggested that only 30.5 % of all markers in the sugarcane genome are SDMs when SNPs of all ploidy levels classified by the SuperMASSA software are considered. This result contrasts with those of other studies, such as Aitken et al. (2014), who reported that 83 % of all SNP markers were SDMs when ploidy level was not considered. Of the 531 SNPs genotyped in our mapping population, 54 (10.17 %) were SDMs, and only nine of these were linked on the genetic map. Even when the analysis is restricted to SDM SNPs, however, knowing the ploidy level of the SNPs is advantageous for mapping the sugarcane genome because the ploidy level can indicate the number of chromosomes in the homology group to which the SNP was mapped. This ploidy information can now be ascertained, improving the genetic mapping of sugarcane.

As shown in Table 1, SugSNP_0032 was estimated to have a ploidy level of 8 and mapped to LG47 in HGIV (Fig. 2). This HG comprises seven LGs; the ploidy level of 8 for SugSNP_0032 suggests that the HG is not yet saturated and will likely contain eight LGs when the map is saturated. SugSNP_0503 and SugSNP_0729 are found in HGII on LG25 (11.3 cM) and LG26 (8.3 cM), respectively, and have ploidy levels of 8 (SugSNP_0503) and 6 (SugSNP_0729). However, this HG has 17 LGs (Fig. 2). The LGs to which the SNP markers were mapped are very small, suggesting that this HG will be reorganized as the map becomes more saturated.

Fig. 2

Two homology groups (HGs) with SNP markers (in red) mapped in the full-sib sugarcane progeny (IACSP95-3018 vs. IACSP93-3046). In HG IV, SugSNP_0032 was found on LG47. In HG II, SugSNP_0729 and SugSNP_0503 were mapped to LG25 and LG26, respectively

The variation in ploidy level (Table 1) of the SNPs mapped to LGs inside the HGs agrees with previously published data indicating that the sugarcane homology groups have different numbers of chromosomes (Grivet and Arruda 2001). An estimation of the ploidy level for each SNP is essential for further analysis because saturated linkage maps cannot be obtained for sugarcane if molecular markers with higher doses are not considered. The use of these multiplex markers widely distributed throughout the genome will probably contribute to increased linkage map coverage.

Because genotype-phenotype associations in sugarcane are based on data from trials at different locations across multiple harvests, appropriate assumptions must be made regarding the variance-covariance (VCOV) matrices for genotype effects and residual effects (Smith et al. 2007). The VCOV structure fitted to the genetic effects matrix for SD, SW, SH and Fiber was the Uns-AR1 model (see Table 3 in Results), indicating that the genetic variances and covariances of the genotype × location × harvest interaction are specific to each location, whereas the genetic correlation between harvests decays over time. By applying mixed models to data from trials performed in different environments, it is possible to detect heterogeneity in genetic variances and in correlations between environments (Malosetti et al. 2013), which gives a more realistic picture of genotypic effects. Analyses based on jointly adjusted means for different environments, obtained via BLUPs, should improve the detection of significant marker-phenotype associations.
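
One way to picture an Uns-AR1 structure is as the Kronecker product of an unstructured location matrix (free genetic variances and covariances between locations) and a first-order autoregressive correlation matrix across harvests, in which the correlation between harvests t and s is ρ^|t − s|. The sketch below assumes that form, orders the rows as locations (outer) by harvests (inner) and uses hypothetical variance values; it only illustrates the structure and is not the fitted model.

```python
def ar1_corr(n_harvests, rho):
    """AR1 correlation matrix: correlation rho**|i - j| between harvests i and j."""
    return [[rho ** abs(i - j) for j in range(n_harvests)] for i in range(n_harvests)]


def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    rows_b, cols_b = len(b), len(b[0])
    return [[a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
             for j in range(len(a[0]) * cols_b)]
            for i in range(len(a) * rows_b)]


def uns_ar1(loc_cov, n_harvests, rho):
    """Genetic VCOV for locations x harvests: unstructured (locations) (x) AR1 (harvests)."""
    return kron(loc_cov, ar1_corr(n_harvests, rho))


# Hypothetical values: two locations, three harvests.
V = uns_ar1([[4.0, 1.5], [1.5, 2.0]], n_harvests=3, rho=0.5)
```

In this example V is 6 × 6; the covariance between adjacent harvests at location 1 is 4.0 × 0.5 = 2.0, and between harvests two years apart it drops to 4.0 × 0.25 = 1.0, which is the decay over time described above.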

Composite interval mapping (CIM) offers several advantages over single-marker and interval mapping approaches (Zeng 1993, 1994; Jansen and Stam 1994). However, few studies have used CIM to map QTLs for sugarcane yield-related traits (Aitken et al. 2008; Shing et al. 2013). Gazaffi et al. (2014) presented a model for outcrossing species based on integrated maps. QTL mapping results are more informative when CIM is performed on an integrated genetic linkage map and the QTL genotypes are obtained via multipoint conditional probabilities. This method was employed here and proved to be a well-suited approach to QTL mapping in outbred species using full-sib progeny from two non-inbred parents. Our QTL mapping results provided estimates of additive effects, dominance effects, segregation patterns and linkage phases for the QTLs. With this approach, the segregation pattern of QTLs can be obtained even for QTLs located in less informative regions. In addition, our results are useful for marker-assisted selection, because the alleles that increase or decrease the value of a given trait can be identified in each parent.
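
The idea behind conditional QTL genotype probabilities can be illustrated with a simplified two-point version. Our analysis used multipoint probabilities, which draw on all markers in a linkage group. This sketch instead assumes fully informative flanking markers, known linkage phase, Haldane's map function (no interference) and the same integrated-map distances in both parents. Allele and genotype labels (P1/P2, Q1/Q2) and all function names are hypothetical.

```python
import math


def haldane_r(d_cm):
    """Recombination fraction from a map distance in cM (Haldane's map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def p_qtl_allele_1(d_left, d_right, left_allele, right_allele):
    """P(gamete carries allele 1 at the QTL | flanking marker alleles), alleles coded 1 or 2."""
    r1, r2 = haldane_r(d_left), haldane_r(d_right)

    def weight(q):  # joint probability of (left, q, right), up to a common constant
        left = (1 - r1) if q == left_allele else r1
        right = (1 - r2) if q == right_allele else r2
        return left * right

    return weight(1) / (weight(1) + weight(2))


def qtl_genotype_probs(p_flank, q_flank, d_left, d_right):
    """Full-sib QTL genotype probabilities: gametes from the two parents are independent."""
    pp = p_qtl_allele_1(d_left, d_right, *p_flank)
    pq = p_qtl_allele_1(d_left, d_right, *q_flank)
    return {("P1", "Q1"): pp * pq, ("P1", "Q2"): pp * (1 - pq),
            ("P2", "Q1"): (1 - pp) * pq, ("P2", "Q2"): (1 - pp) * (1 - pq)}
```

For a QTL midway between markers 10 cM apart, a non-recombinant gamete (flanking alleles 1 and 1) carries QTL allele 1 with probability above 0.99, whereas a recombinant gamete (alleles 1 and 2) gives exactly 0.5. Multipoint methods refine such probabilities by also using markers beyond the flanking pair.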

Because our study involved field experiments with the same mapping population at two locations, a validation process ensured that the genotypes at both sites were consistent. Although relatively few QTLs were detected, confirming the genotype-phenotype relationship at both field locations gives us more confidence in the mapped QTLs. A total of 25 QTLs were identified: six for SD, five for SW, four for SH, five for Fiber, two for Pol and three for Brix. All QTLs for a given trait were located in different linkage groups. The moderate heritability estimates for Brix (53.99 %) and Pol (53.20 %) may have reduced the statistical power of QTL mapping. Comparing QTL mapping results across sugarcane studies is difficult because of differences in parental varieties, experimental design and environment, among other factors. Nevertheless, several studies have also detected few QTLs associated with cane yield and sugar yield traits, because the genetic maps used in sugarcane rely only on single-dose markers and are not yet saturated (Sills et al. 1995; Ming et al. 2002a, b; Jordan et al. 2004; da Silva and Bressiani 2005; Aitken et al. 2006; Piperidis et al. 2008). Including additional markers as cofactors in the mapping model reduced both the variation associated with QTLs located outside the mapping interval and the detection of false-positive QTLs. Note that all QTLs were detected using the average over environments (location × harvest). Because we used BLUPs in the QTL mapping procedure, we cannot draw conclusions about QTL stability across environments (i.e., combinations of location and harvest). Such conclusions require mapping models that account for the QTL × location × harvest interaction, as suggested by Pastina et al. (2012).
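
The role of cofactors can be written schematically. In the form below, y_j is the adjusted phenotype (BLUP) of individual j; the x* terms are expected values of ±1 indicators for the alleles inherited from parents p and q (and their product for dominance), computed from the conditional QTL genotype probabilities; C is the set of cofactor markers outside the window around the tested position; and L_0 and L_1 are the likelihoods without and with the QTL effects. This notation and coding are our assumptions for illustration; the exact parametrization, window size and cofactor selection follow Gazaffi et al. (2014) and are not reproduced here.

```latex
\begin{aligned}
y_j &= \mu + \alpha_p x^{*}_{pj} + \alpha_q x^{*}_{qj} + \delta_{pq} x^{*}_{pqj}
      + \sum_{k \in \mathcal{C}} \left( \alpha_{p,k}\, x_{pjk} + \alpha_{q,k}\, x_{qjk} + \delta_{pq,k}\, x_{pqjk} \right)
      + \varepsilon_j, \quad \varepsilon_j \sim N(0, \sigma^2), \\
\mathrm{LOD} &= \frac{\mathrm{LR}}{2 \ln 10}, \quad \mathrm{LR} = -2 \ln \frac{L_0}{L_1}.
\end{aligned}
```

Because the cofactor terms absorb variation due to QTLs elsewhere in the genome, the residual variance σ² at the tested position is smaller, which is how cofactors reduce background noise and spurious peaks.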

All QTLs showed significant additive and/or dominance effects (predominantly additive), indicating that the associated alleles contribute to the variation observed in the investigated traits. In particular, such effects were detected for IACSP95-3018, which is relevant because it is a promising clone used as a parent in the IAC Sugarcane Breeding Program. The significant dominance effects detected for QTLs SD.1, SD.5, P.2, B.1 and B.3 are important for understanding the genetic complexity of sugarcane. The percentage of phenotypic variation explained by each QTL was low, ranging from 0.069 % (B.2) to 3.87 % (F.2). Given the high ploidy level of sugarcane, individual QTLs are expected to have limited effects on phenotype. Hoarau et al. (2002) found R² values ranging from 3 to 7 %, Aitken et al. (2006) reported values between 2 and 8 %, and Aitken et al. (2008) observed values between 2 and 10 %. Ming et al. (2002a) reported QTLs with larger effects on the evaluated traits, with R² values ranging from 3.8 to 16.2 %. Including more molecular markers and using mapping models that consider multiple traits, multiple environments and QTL dosage would likely increase the proportion of phenotypic variation explained by each QTL.
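
The link between estimated effects and the percentage of variation explained can be made explicit under a simple assumption: four equally frequent full-sib genotype classes with deviations ±α_p ± α_q ± δ_pq, where the dominance term is the product of the two ±1 parental indicators. The three contrasts are then uncorrelated, so the QTL variance is α_p² + α_q² + δ_pq², and dividing by the phenotypic variance gives the percentage explained. The coding, function names and example values are hypothetical, and mapping software may compute R² differently.

```python
from statistics import pvariance


def qtl_class_deviations(a_p, a_q, d_pq):
    """Deviations of the four equally frequent genotype classes (assumed +/-1 coding)."""
    return [a_p * x_p + a_q * x_q + d_pq * x_p * x_q
            for x_p in (1, -1) for x_q in (1, -1)]


def qtl_variance(a_p, a_q, d_pq):
    # Population variance over the four classes; equals a_p**2 + a_q**2 + d_pq**2.
    return pvariance(qtl_class_deviations(a_p, a_q, d_pq))


def percent_variance_explained(a_p, a_q, d_pq, phenotypic_variance):
    """Percentage of phenotypic variance attributable to a single QTL."""
    return 100.0 * qtl_variance(a_p, a_q, d_pq) / phenotypic_variance
```

For example, hypothetical effects of 0.5, 0.3 and 0.1 units on a trait with a phenotypic variance of 35 explain about 1 % of the variation: even moderately sized effects translate into small R² values when the phenotypic variance is large.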

The QTLs with a 1:1 segregation pattern each showed only one significant effect (α_p, α_q or δ_pq), except for SH.3. Six QTLs displayed a 1:2:1 segregation pattern; three of these had two significant effects and three had only one. The QTLs with a 3:1 segregation pattern showed significant α_p, α_q and δ_pq effects, consistent with the classification of Gazaffi et al. (2014). This information helps us understand the behavior of the QTL alleles in the progeny.
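
How combinations of effects translate into segregation patterns can be sketched by computing the four genotype-class means. We assume a common ±1 coding in which the classes have means μ ± α_p ± α_q ± δ_pq, with the dominance term given by the product of the parental indicators; this may differ in detail from the parametrization of Gazaffi et al. (2014). Classes with equal means cannot be told apart, so a single nonzero effect yields 1:1, |α_p| = |α_q| with δ_pq = 0 yields 1:2:1, and |α_p| = |α_q| = |δ_pq| yields 3:1. Function names are hypothetical; here equality is exact, whereas in practice it is decided by statistical tests.

```python
from collections import Counter

# Sorted class-count signatures mapped to conventional segregation labels.
PATTERNS = {
    (1, 1, 1, 1): "1:1:1:1",
    (2, 1, 1): "1:2:1",
    (2, 2): "1:1",
    (3, 1): "3:1",
    (4,): "no segregation",
}


def class_means(mu, a_p, a_q, d_pq):
    """Means of the four full-sib QTL genotype classes under the assumed +/-1 coding."""
    return [mu + a_p * x_p + a_q * x_q + d_pq * x_p * x_q
            for x_p in (1, -1) for x_q in (1, -1)]


def segregation_pattern(mu, a_p, a_q, d_pq, ndigits=8):
    """Group classes with equal means (after rounding) and name the resulting ratio."""
    counts = Counter(round(m, ndigits) for m in class_means(mu, a_p, a_q, d_pq))
    signature = tuple(sorted(counts.values(), reverse=True))
    return PATTERNS[signature]
```

For instance, segregation_pattern(10, 1, 1, 1) returns "3:1" because three classes share the mean 9 and one has the mean 13.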

QTLs for correlated traits that map to neighboring or overlapping regions of the same linkage group are important for future investigations of linkage versus pleiotropy. SD and SW have a significant positive phenotypic correlation, and QTLs for both traits (SD.5 and SW.3) were detected close to each other on LG75 (HGXI); these may in fact be a single pleiotropic QTL. Brix and Pol also have a significant positive phenotypic correlation, and their QTLs (P.2 and B.1) mapped to the same linkage group (LG46, HGIV). The LOD profiles of these QTLs were similar, possibly indicating a second pleiotropic QTL. QTLs for Fiber and Brix (F.2 and B.2), two traits with a significant positive phenotypic correlation, mapped to LG61 (HGVII).

Several factors hinder the dissection of polyploid genomes by QTL mapping, including the small number of markers, the use of SDMs only, the large proportion of unlinked markers (which results in poorly saturated maps) and the absence of inbred lines, which reduces mapping accuracy. We applied mixed models to the phenotypic data, which allowed us to model VCOV structures for genetic effects, predict genotypic values and handle unbalanced data. Moreover, QTL genotypes were estimated via multipoint conditional probabilities, which increases the statistical power of QTL detection. We also mapped QTLs on an integrated genetic linkage map via a CIM approach that included additive and dominance effects, allowing us to estimate segregation patterns and linkage phases for all 25 QTLs. The limited power of QTL detection could be overcome by single-nucleotide polymorphism (SNP) technology combined with statistical innovations for interpreting data from polyploids (Debibakas et al. 2014), moving from dominant markers to bi-allelic SNPs with ploidy and dosage information (Garcia et al. 2013). The SuperMASSA software (Serang et al. 2012) inferred the ploidy level and allelic dosage of all SNPs. However, only the SNPs classified as SDMs were used for linkage map construction (Supplementary Material, Table S1). The linkage map did not provide full coverage of the genome, and the number of QTLs was likely underestimated.

To the best of our knowledge, this study is the first to include SNPs segregating in a codominant pattern (1:2:1) in a sugarcane map, allowing better map integration and more informative statistical tests for QTLs. Presumably, the use of multiple-dose markers will lead to more precise QTL localization and better estimates of QTL effects, segregation patterns and interactions, in addition to a more saturated genetic linkage map. Therefore, new approaches that include ploidy and dosage information are needed to better understand complex polyploid genomes. This knowledge could then serve as a first step toward improved methods for understanding the genetic and genomic mechanisms associated with agronomic traits.