Xu et al. BMC Genomics (2021) 22:26
https://doi.org/10.1186/s12864-020-07321-3

RESEARCH ARTICLE  Open Access

Genome-wide association analysis reveals genetic variations and candidate genes associated with salt tolerance related traits in Gossypium hirsutum

Peng Xu1,2, Qi Guo2, Shan Meng2, Xianggui Zhang2, Zhenzhen Xu2, Wangzhen Guo1* and Xinlian Shen2*

Abstract

Background: Cotton is more resistant to salt and drought stresses than other field crops, which makes it a pioneer industrial crop in saline-alkali lands. However, abiotic stresses still significantly impair its growth and development. It is therefore important to breed salt tolerant varieties, which can help accelerate the improvement of cotton production. The development of molecular markers linked to causal genes has provided an effective and efficient approach for improving salt tolerance.

Results: In this study, a genome-wide association study (GWAS) of salt tolerance related traits at the seedling stage was performed based on years of phenotype identification for 217 representative upland cotton cultivars genotyped on a genotyping-by-sequencing (GBS) platform. A total of 51,060 single nucleotide polymorphisms (SNPs) unevenly distributed among 26 chromosomes were screened across the cotton cultivars, and 25 associations with 27 SNPs scattered over 12 chromosomes were significantly (−log10p > 4) associated with three salt tolerance related traits in 2016 and 2017. Among these, the associations on chromosomes A13 and D08 for relative plant height (RPH), A07 for relative shoot fresh matter weight (RSFW), and A08 and A13 for relative shoot dry matter weight (RSDW) were detected in both environments, indicating that they were likely to be stable quantitative trait loci (QTLs). A total of 12 salt-induced candidate genes were identified as differentially expressed by combining GWAS and transcriptome analysis. Three promising genes were selected for preliminary function verification of salt
tolerance. The increase in salt related trait values of GH_A13G0171-silenced plants under salt stress indicated that this gene negatively regulates the salt stress response.

Conclusions: These results provide important genetic variations and candidate genes for accelerating the improvement of salt tolerance in cotton.

Keywords: Gossypium hirsutum, Genome-wide association study, Genotyping-by-sequencing, Salt tolerance, Virus-induced gene silencing assay

* Correspondence: moelab@njau.edu.cn; xlshen68@126.com
State Key Laboratory of Crop Genetics & Germplasm Enhancement, Hybrid Cotton R & D Engineering Research Center, Ministry of Education, Nanjing Agricultural University, Nanjing 210095, China
Provincial Key Laboratory of Agrobiology, Institute of Industrial Crops, Jiangsu Academy of Agricultural Sciences, Key Laboratory of Cotton and Rapeseed, Ministry of Agriculture and Rural Affairs, Nanjing 210014, China

© The Author(s) 2021 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Background

The competition for arable land between food crops and cotton (Gossypium spp.) has existed for a long time in China. Because of the growing population, the emphasis on food security has inadvertently moved cotton production to more marginal soils in saline-alkali areas and coastal beaches. Cotton, the largest source of textile fiber in the world, is more resistant to abiotic stresses such as high salt and drought than other crops. However, excessive salt in the soil can severely affect the growth and development of cotton plants [1], reducing fiber yield by as much as 60% [2]. Therefore, breeding cotton varieties with improved salt tolerance could alleviate the conflict between food crops and cotton by reclaiming and utilizing saline-alkali coastal lands for production. Similarly, in the northwestern inland cotton production region of China, the availability of salt tolerant varieties would expand the area of cotton production by promoting the synchronous growth of food crops and cotton.

The genetic architecture of salt tolerance is one of the most important subjects in plant science. Salt tolerance is a complex quantitative trait that is controlled by multiple genes and involves a variety of physiological and biochemical metabolic pathways in cotton. In addition, the expression of each gene is sensitive to the external environment. At present, the main conventional breeding approaches for developing salt tolerant varieties are screening and collecting salt tolerant germplasm resources and then transferring elite loci by hybridization, composite hybridization and backcrossing. The progress of conventional breeding for salt tolerance in upland cotton is impeded by the lack of highly salt tolerant genetic resources [3], low heritability [4], genetic complexity, and the difficulties in phenotyping [5]. The development of molecular markers linked to causal genes for a trait has provided an
effective and efficient approach for improving quantitative traits. Once identified, markers linked to a quantitative trait locus (QTL), such as one for salt tolerance, can serve as a selection tool for rapid and efficient marker-assisted selection (MAS). QTL mapping via bi-parental populations is an important method for quantitative trait research and has been widely employed to map a number of traits, including salt tolerance, in various crops [6]. In addition to QTL mapping, association mapping based on linkage disequilibrium (LD) is another approach for detecting molecular markers tightly linked with quantitative traits in a natural germplasm population. Association analysis is time- and cost-effective and can mine the genetic variation existing in a natural population. More importantly, it takes advantage of the recombination events that have accumulated in the natural population over long-term evolution, thus achieving a higher mapping resolution and possibly identifying the causative genes [7, 8]. Because of its effectiveness in QTL identification, association mapping has been widely used in crop species and plays an important role in molecular breeding [9–14].

Genotyping-by-sequencing (GBS) is a reduced-representation genotyping platform with broad application in crop genetics and breeding [15–18]. In general, the GBS approach begins with an enzymatic digestion using restriction enzymes to reduce genome complexity, followed by multiplex sequencing of the barcoded DNA fragments on a high-throughput next-generation sequencing platform. A bioinformatics analysis of the indexed sequence reads then identifies genetic variants. Finally, genetic diversity analysis is carried out based on a sample-by-variant matrix [19]. GBS is a powerful and cost-effective tool for assessing variation across populations.

Since the first molecular marker-based genetic map of cotton was published in 1994 [20], cotton scientists have identified a large
number of QTLs regulating important agronomic traits, including fiber quality, yield, and disease resistance. However, only limited reports have been published on salt tolerance in cotton [21–25]. Herein, we report a genome-wide association study (GWAS) of salt tolerance QTLs at the seedling stage, performed over years of phenotyping on 217 representative upland cotton cultivars. These results provide important genetic variations and candidate genes for accelerating the improvement of salt tolerance in cotton.

Results

Phenotypic diversity analysis

In order to evaluate the phenotypic variation of salt tolerance in the GWAS population of 217 upland cotton cultivars (Additional file 1: Table S1), three traits related to salt tolerance, namely relative plant height (RPH), relative shoot fresh matter weight (RSFW) and relative shoot dry matter weight (RSDW), were determined.

Table 1 Statistics of various traits related to salt tolerance

Trait    Range         Mean    SD      CV%     H2%
RPH16    0.189–0.760   0.440   0.145   0.330   92.802
RPH17    0.169–0.780   0.445   0.143   0.321
RSFW16   0.136–0.762   0.411   0.150   0.365   91.873
RSFW17   0.129–0.794   0.420   0.167   0.398
RSDW16   0.137–0.756   0.497   0.143   0.288   83.870
RSDW17   0.172–0.776   0.521   0.158   0.303

RPH16: relative plant height in 2016; RPH17: relative plant height in 2017; RSFW16: relative shoot fresh matter weight in 2016; RSFW17: relative shoot fresh matter weight in 2017; RSDW16: relative shoot dry matter weight in 2016; RSDW17: relative shoot dry matter weight in 2017; SD: standard deviation; CV: coefficient of variation; H2: broad-sense heritability

The mean values, ranges, standard deviations (SD), coefficients of variation (CV) and broad-sense heritability (H2) for these salt tolerance related traits are shown in Table 1. The CVs of the three traits were high in both years, ranging from 0.288 to 0.398. Overall, the upland cotton cultivars in this GWAS panel exhibited considerable natural variation in the three traits related to
salt tolerance and displayed very high genetic diversity.

Genetic diversity analysis

GBS produced 53,321 high-quality polymorphic single nucleotide polymorphisms (SNPs) among the 217 upland cotton cultivars, with 47,133 SNPs located in intergenic intervals and 6188 SNPs in coding regions. Of these, 95.8% of the loci (51,060 SNPs) were mapped onto the 26 chromosomes of the cotton genome and were selected for the GWAS analysis (Additional file 2: Table S2). These SNPs were not evenly distributed (Additional file 3: Fig. S1), with an average of 1964 SNPs per chromosome and a range from 863 to 5209. Chromosome A08 had the most SNPs (5209) and the highest SNP density (199 kb/SNP). D04 had the fewest SNPs (863), but A02 had the lowest SNP density (680 kb/SNP). The gene diversity of this population varied from 0.224 (A08) to 0.389 (A13), and the polymorphism information content (PIC) varied from 0.192 (A08) to 0.305 (A13) (Table 2). These results indicated a relatively large span in gene diversity index and PIC across the cotton genome.

Table 2 Summary of polymorphic SNPs mapped in 26 chromosomes of Gossypium hirsutum

Chr     Chr length   No. of   SNP density   Gene        PIC     LD decay (kb)
        (Mb)         SNPs     (kb/SNP)      diversity           r2 = 0.1   r2 = 0.2
A01     99.9         2111     473           0.294       0.235   1145       990
A02     83.5         1228     680           0.344       0.275   1215       903
A03     100.3        1490     673           0.321       0.26    1385       1035
A04     62.9         1138     553           0.314       0.255   1104       974
A05     92.1         2202     418           0.346       0.277   1354       1060
A06     103.2        3924     263           0.232       0.199   1377       1157
A07     78.3         1897     413           0.319       0.259   1258       959
A08     103.7        5209     199           0.224       0.192   1136       891
A09     75           1639     458           0.312       0.252   1299       1028
A10     100.9        2335     432           0.308       0.254   1191       957
A11     93.3         2027     460           0.257       0.215   1194       970
A12     87.5         1498     584           0.306       0.247   1183       915
A13     80           2996     267           0.389       0.305   1358       1058
D01     61.5         2385     258           0.318       0.261   880        435
D02     67.3         1914     352           0.321       0.26    712        599
D03     46.7         934      500           0.27        0.225   1046       803
D04     51.5         863      597           0.331       0.266   953        844
D05     61.9         1206     513           0.308       0.25    1204       816
D06     64.3         2454     262           0.281       0.235   1275       917
D07     55.3         2293     241           0.324       0.264   1124       817
D08     61.9         2463     251           0.306       0.248   909        523
D09     51           1643     310           0.286       0.239   1002       732
D10     63.4         1517     418           0.294       0.241   733        663
D11     66.1         1080     612           0.317       0.257   846        801
D12     59.1         1407     420           0.276       0.228   913        754
D13     60.5         1207     501           0.287       0.236   1325       988
Total   1931.1       51,060   427           0.303       0.248   1120       869

Chr: chromosome; PIC: polymorphism information content

Population structure and LD analysis

It is important for GWAS analysis to control the effect of population structure, because population stratification can produce spurious associations between genotypes and phenotypes [26, 27]. STRUCTURE software was used to calculate the Bayesian clustering from K = to 10 for five repetitions. The LnP(D) value continued to increase from K = to K = 10 without a significant inflection point (Fig. 1a). However, there was an obvious spike at the value of ΔK = (Fig. 1b), suggesting that the population could be divided into subgroups (Fig. 1c). Taking the corresponding Q matrix at K = as the covariate could reasonably eliminate spurious association effects and improve the GWAS accuracy.

The LD distribution among chromosomes of the 217 upland cotton cultivars is shown in Table 2. The LD decay distance was 869 kb and 1120 kb when r2 dropped to 0.2 and 0.1, respectively. The LD decay distance varied among chromosomes, ranging from 435 kb (D01) to 1157 kb (A06) at r2 = 0.2 and from 712 kb (D02) to 1385 kb (A03) at r2 = 0.1. The overall LD decay distance in the At subgenome was markedly greater than that in the Dt subgenome.

Association analysis

In order to explore the genetic factors underlying salt tolerance, a GWAS was conducted using mixed linear models (MLMs) that simultaneously accounted for population structure and the relative kinship matrix. A total of 25 significant associations (−log10p > 4) with 27 significant SNPs located on chromosomes A05, A07, A08, A09, A10, A11, A12, A13, D02, D03, D06, and D08 were detected for the
three salt tolerance related traits in the 2016 and 2017 datasets. Eleven associations with 12 SNPs were detected in 2016 (Fig. 2) and nine associations with SNPs were detected in 2017 (Fig. 3). In addition, five associations with SNPs were detected in both the 2016 and 2017 datasets. The phenotypic variance explained (PVE) by individual QTLs ranged from 1.29 to 7.00% (Table 3).

For RPH, nine significant associations with 10 SNPs were identified on chromosomes A07, A09, A10, A12, A13, D02, D03, and D08. The PVE ranged from 1.50 to 7.00%. Moreover, the association with one SNP on chromosome A13 and the association with two SNPs on chromosome D08 were detected in both years. For RSDW, eight significant associations with SNPs on A05, A08, A13, and D08 were detected. The PVE ranged from 1.29 to 4.85%. The association with one SNP on chromosome A08 and the association with one SNP on chromosome A13 were detected in both years. For RSFW, eight associations with significant SNPs on A07, A08, A10, A11 and D06 were detected in 2016 and 2017. The PVE ranged from 2.35 to 5.25%. The association with three SNPs on chromosome A07 was detected in both years.

Fig. 1 Population structure analysis of 217 cotton cultivars. a Estimated LnP(K) of possible clusters (K) from to 10. b Delta K based on the rate of change of LnP(K) between successive K. c Population structure of 217 upland cotton accessions based on STRUCTURE when K =

Fig. 2 Manhattan and Q–Q plots for salt tolerance related traits in 2016. The horizontal dotted black lines in the Manhattan plots represent the genome-wide significance threshold of 0.0001

Pleiotropy was also found in our GWAS results. For example, the significant SNP A07_90,682,411 was associated with RPH in 2017 and with RSFW in both environments. The significant SNP A10_84,786,908 was associated with both RPH and RSFW in 2016. The significant SNPs D08_49,014,753 and
D08_49,080,865 were associated with RPH in both 2016 and 2017 and with RSDW in 2016. In addition, the associations on A08 for RSFW and RSDW in 2016 were only about 70 kb apart, and their confidence intervals overlapped.

Identification and preliminary function verification of candidate genes

In general, the LD decay distance can be used as the confidence interval for identifying candidate genes. Because the cotton genome has a large LD decay distance [28, 29], we extracted potential candidate genes within 100 kb flanking the significant markers on the basis of the published upland cotton genome sequence [30]. A total of 156 genes were identified in these intervals (Additional file 4: Table S3). Gene ontology (GO) enrichment analysis indicated that “dioxygenase activity”, “oxidoreductase activity, acting on single donors with incorporation of molecular oxygen, incorporation of two atoms of oxygen” and “oxidoreductase activity, acting on single donors with incorporation of molecular oxygen” were significantly enriched using a false discovery rate (FDR) adjusted P-value of ≤ 0.05 as the cutoff. Of the 156 genes, 12 were differentially expressed between the salt tolerant variety Miscott7913–83 and the salt sensitive variety Su 12 according to previous transcriptome sequencing results [31] (Additional file 4: Table S3). Some of these genes may be associated with salt stress, such as GH_A08G0488 and GH_A10G1620, which encode protein kinases. Protein kinases have been shown to play an important role in salt tolerance in cotton [32]. Another gene, GH_A13G0171, which encodes an aquaporin (AQP), was also likely to regulate the salt stress response [33]. The confidence interval containing GH_A13G0171 was detected in both years. We found that the salt tolerance of upland cotton cultivars with the G haplotype was significantly higher than that of cultivars with the C haplotype in both years according to a t test (Fig. 4).

Fig. 3 Manhattan and Q–Q
plots for salt tolerance related traits in 2017. The horizontal dotted black lines in the Manhattan plots represent the genome-wide significance threshold of 0.0001

The three promising genes (GH_A08G0488, GH_A10G1620 and GH_A13G0171) were selected for preliminary function verification of salt tolerance in cotton. Analysis of gene expression patterns can provide important clues to gene function. Quantitative real-time PCR (qRT-PCR) was performed to analyze the expression levels of GH_A08G0488, GH_A10G1620 and GH_A13G0171 in roots and leaves under salt stress treatment in the salt tolerant variety Miscott7913–83 and the salt sensitive variety Su 12. As shown in Fig. 5, the three genes responded to salt stress and displayed distinct expression patterns in the salt tolerant variety Miscott7913–83. The three genes had a much higher expression level in roots than in leaves. GH_A13G0171 was significantly down-regulated in both root and leaf tissues after salt stress, whereas GH_A08G0488 was significantly up-regulated in both tissues. The expression level of GH_A10G1620 increased in leaves and showed no significant change in roots. As shown in Fig. 6, the three genes also displayed distinct expression patterns in response to salt stress in the salt sensitive variety Su 12. GH_A13G0171 exhibited an identical expression pattern in Miscott7913–83 and Su 12. The expression levels of GH_A10G1620 and GH_A08G0488 were not significantly different.

To confirm the functional roles of GH_A08G0488, GH_A10G1620 and GH_A13G0171 under salt stress, a virus-induced gene silencing (VIGS) assay was used to repress expression of these genes in plants of the salt tolerant variety Miscott7913–83. The inoculated seedlings were grown in three light incubators at 23 °C under a 16-h light and 8-h dark cycle as three biological replicates. At the developmental period when two leaves had
formed, the pTRV2::GH_A08G0488, pTRV2::GH_A10G1620, pTRV2::GH_A13G0171 and pTRV2::00 inoculated plants were treated with 350 mM NaCl. After 15 days, plant height and fresh and dry shoot matter weight were determined and the corresponding relative values were calculated. The transcripts of the three genes in the VIGS leaves were significantly reduced compared with the negative control pTRV2::00 inoculated plants, indicating that they were effectively silenced (Additional file 5: Fig. S2, Fig. 7a). Compared with the control pTRV2::00, no significant effect on

Table 3 Summary of SNPs significantly associated with salt tolerance related traits

Traits   Chr   Site          Allele   MAF        -Log10(P)    PVE (%)      Environment
RPH      A10   84,786,908    C/T      0.15(T)    4.27         5.67         2016
         A10   84,851,927    G/A      0.20(A)    4.10         4.31         2016
         A13   1,869,056     C/G      0.33(C)    4.22, 4.36   4.41, 4.93   2016, 2017
         A07   90,682,411    C/T      0.12(T)    4.35         4.58         2017
         A09   62,356,818    A/G      0.19(A)    4.13         7.00         2017
         A12   74,098,493    G/A      0.49(A)    4.58         5.51         2017
         D02   38,449,756    T/A      0.47(T)    4.01         1.50         2017
         D03   18,961,034    T/G      0.11(G)    4.18         2.54         2017
         D08   49,014,753    C/T      0.13(T)    4.34, 4.16   4.35, 3.83   2016, 2017
         D08   49,080,865    G/A      0.13(A)    4.34, 4.16   4.35, 3.83   2016, 2017
RSFW     A08   5,515,194     C/T      0.10(T)    4.79         4.23         2016
         A10   84,786,908    C/T      0.15(T)    4.19         2.95         2016
         D06   19,966,743    C/G      0.5(G)     4.09         3.56         2016
         A07   90,532,061    C/G      0.13(G)    4.73         5.23         2017
         A07   90,682,411    C/T      0.12(T)    4.13, 4.73   2.40, 5.25   2016, 2017
         A08   41,804,418    T/A      0.09(T)    4.23         2.74         2017
         A08   53,916,411    G/A      0.088(G)   4.15         2.35         2017
         A11   117,938,337   G/A      0.32(A)    4.09         3.08         2017
RSDW     A05   94,406,377    C/T      0.33(T)    4.17         2.92         2016
         A08   5,589,045     G/A      0.19(A)    4.67         4.85         2016
         A08   6,254,890     T/G      0.22(G)    4.36, 4.66   3.52, 4.33   2016, 2017
         A13   42,992,196    A/T      0.48(T)    4.27, 4.28   2.79, 3.03   2016, 2017
         D08   47,703,450    C/T      0.096(T)   4.35         1.93         2016
         D08   48,527,413    C/A      0.12(A)    4.23         1.29         2016
         D08   48,911,757    C/G      0.13(G)    4.44         2.62         2016
         D08   49,014,753    C/T      0.13(T)    5.20         3.88         2016
         D08   49,080,865    G/A      0.13(A)    5.20         3.88         2016

Chr: chromosome; RPH: relative plant height; RSFW: relative shoot fresh matter weight; RSDW: relative shoot dry matter weight; MAF: minor allele frequency; PVE: phenotypic variance explained. Rows with two values give the data for the two years

Fig. 4 Box plots for the phenotypic values of the association for RPH on chromosome A13. ** indicates statistical significance at the 0.01 probability level. C: C haplotype; G: G haplotype
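
The statement in the Results that the two chromosome A08 associations detected in 2016 have overlapping confidence intervals can be checked directly from the SNP positions in Table 3. The following is a minimal Python sketch, assuming that the candidate interval of a SNP is simply its position plus or minus the 100 kb flank used for candidate gene extraction. The function names candidate_window and overlap_bp are illustrative only and are not part of the original analysis pipeline.

```python
FLANK_BP = 100_000  # flank on each side of a significant SNP (100 kb, as stated in the text)


def candidate_window(position, flank=FLANK_BP):
    """Return the (start, end) interval searched around one SNP.

    Assumption: the interval is symmetric around the SNP and is
    clipped at 0 so that it never extends past the chromosome start.
    """
    return max(0, position - flank), position + flank


def overlap_bp(window_a, window_b):
    """Return the number of base pairs shared by two intervals (0 if disjoint)."""
    start = max(window_a[0], window_b[0])
    end = min(window_a[1], window_b[1])
    return max(0, end - start)


# Positions of the two chromosome A08 SNPs detected in 2016 (Table 3).
rsfw_snp = 5_515_194  # associated with RSFW
rsdw_snp = 5_589_045  # associated with RSDW

rsfw_window = candidate_window(rsfw_snp)
rsdw_window = candidate_window(rsdw_snp)

distance = abs(rsdw_snp - rsfw_snp)
shared = overlap_bp(rsfw_window, rsdw_window)

print(f"distance between SNPs: {distance} bp")
print(f"RSFW window: {rsfw_window}")
print(f"RSDW window: {rsdw_window}")
print(f"shared interval: {shared} bp")
```

Under this assumption the two SNPs lie 73,851 bp apart and their windows share 126,149 bp, which is consistent with the reported distance of about 70 kb and with the overlapping intervals.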