A chickpea genetic variation map based on the sequencing of 3,366 genomes

We performed whole-genome sequencing (WGS) of 2,967 Cicer accessions from the global composite collection using the HiSeq 2500 platform at the Center of Excellence in Genomics and Systems Biology (ICRISAT). By including sequence data for 399 lines from an earlier study (ref. 2), we analysed 3,366 accessions (3,171 cultivated and 195 wild species accessions) (Supplementary Notes).

We aligned the sequencing data from the 3,366 chickpea accessions to the CDC Frontier reference genome (ref. 11) using BWA-MEM (ref. 31) v.0.7.15. SNP calling was performed using GATK (ref. 32) v.3.7 following the GATK best practices, which produced the basic SNP set. We defined two further SNP sets: (i) Set-A: SNPs with less than 30% missing calls and biallelic calls; and (ii) Set-B: SNPs with less than 30% missing calls, biallelic calls, and LD-pruned using PLINK (ref. 33) v.1.90 (“--indep-pairphase 50 10 0.2” parameter). Set-B SNPs were used only to depict the population genetic structure.

To determine the private and population-specific SNPs, the frequency of alleles within a given population was determined using VariantsToTable (ref. 34) of GATK v.3.8.1. We defined ‘private alleles’ as those present in at least four accessions within a population and absent in other populations, and ‘population-enriched alleles’ as those present in a given population (≥20%) and less frequent in other populations (≤2%) (ref. 5).

LD decay was determined using PopLDdecay (ref. 35) v.3.29 with the parameter “-MaxDist 1000”. Nucleotide diversity (π) was calculated in 100-kb sliding windows with a 10-kb step using VCFtools (ref. 36) v.0.1.13. The average over all valid windows was taken as the population genetic diversity. The fixation index (FST) was calculated in 100-kb non-overlapping windows using VCFtools. The global weighted FST was used to measure the differentiation of populations.

The chickpea draft genome of CDC Frontier (ref. 11; a kabuli variety, considered as the foundation genome), together with ICC 4958 (refs. 12, 37; a desi genome sequence), a C. reticulatum genome (ref. 13), and de-novo-assembled sequences from 3,171 cultivated and 28 C. reticulatum accessions, were used to guide the assembly of the chickpea pan-genome using a conservative approach (ref. 38). Following the alignment of reads from each accession to the reference, unmapped and dangling mapped read pairs were extracted using SAMtools (ref. 39) v.1.2 based on the FLAG field.
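
As a rough illustration of FLAG-based read extraction, the Python sketch below classifies a read from its SAM FLAG value. The bit values come from the SAM specification. Treating a "dangling" read as a mapped read whose mate is unmapped is an assumption, as are the function names, and this is not the authors' SAMtools command.

```python
# SAM FLAG bits as defined in the SAM specification.
PAIRED = 0x1         # read is part of a pair
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the mate of this read is unmapped


def classify_read(flag: int) -> str:
    """Return 'unmapped', 'dangling' or 'mapped' for one SAM FLAG value.

    Assumption: a 'dangling' read is a mapped read in a pair whose mate
    failed to map. The source text does not define the term precisely.
    """
    if flag & UNMAPPED:
        return "unmapped"
    if (flag & PAIRED) and (flag & MATE_UNMAPPED):
        return "dangling"
    return "mapped"


def keep_for_assembly(flag: int) -> bool:
    """Hypothetical helper: keep reads that would feed the de novo assembly."""
    return classify_read(flag) != "mapped"
```

For example, FLAG 77 (paired, read unmapped, mate unmapped, first in pair) is classified as unmapped, FLAG 73 (paired, mate unmapped, first in pair) as dangling, and FLAG 99 (a properly paired, mapped read) is not kept.
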
The extracted reads were de-novo-assembled using MEGAHIT (ref. 40) v.1.2.9 with default parameters. To identify possible redundancies among assembled contigs that were already present in the foundation genome, the assembled contigs were aligned to the foundation genome using NUCmer (ref. 41) v.4.0.0beta2 with the parameters “-l 20 -c 65”, and the alignments with length ≥ 500 bp and identity of greater than 80% were extracted to be added into the intermediate pan-genome. The processes were performed one by one in the following order: ICC 4958, de-novo-assembled sequences from 3,171 cultivated accessions, the C. reticulatum genome, and de-novo-assembled sequences from 28 C. reticulatum accessions. Further, to identify redundancy among the ‘novel’ sequences, all-versus-all alignment was performed using CD-HIT (ref. 42) v.4.81. The same process was performed for the next iteration until no sequence was left. Finally, we removed potential contaminants from vectors, bacteria, viruses, animals, fungi and organelle sequences using BLASTN (ref. 43) v.2.2.31 against the corresponding NT databases and obtained the final pan-genome. As a result, the CDC Frontier genome (ref. 11) and the novel assembled sequences were combined to construct the chickpea pan-genome.

A total of 2,258 cultivated and 22 C. reticulatum accessions (with a sequencing depth of greater than 10×) were used to identify structural variations against the CDC Frontier reference genome (ref. 11), such as large insertions, deletions, inversions, and intra- and inter-chromosomal translocations. The insertions, deletions and inversions were identified using a dual calling strategy with BreakDancer (ref. 44) v.1.1.2 and Pindel (ref. 45) v.0.2.5b9. First, BreakDancer was used to detect structural variations with the parameters “-q 20 -y 20 -r 1”. Second, the output of BreakDancer was used as an input for Pindel with the parameters “-x 4 -breakdancer” to increase the sensitivity and specificity.
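To merge the results from BreakDancer and Pindel, two structural variants whose breakpoints were less than 100 bp apart were considered the same structural variation and merged. Because Pindel cannot detect intra- and inter-chromosomal translocations, only BreakDancer was used for their analysis. Furthermore, a structural variation was retained only if it was present in at least 5% of the individuals in a given population.

For CNVs, we first generated a GC-content profile using gccount (http://bioinfo-out.curie.fr/projects/freec/src/gccount.tar.gz) with the parameters “window = 1000 step = 1000” to normalize non-uniform read coverage across genomic positions. Then, Control-FREEC (ref. 46) v.11.0 was used to detect CNVs in 1-kb non-overlapping windows (bins) with the parameters “ploidy = 2 window = 1000 step = 1000 mateOrientation=FR” for each high-depth individual (sequencing depth >10×). Next, the sample-level copy numbers were combined to generate a cohort-level matrix of copy numbers for each bin. To further reduce false positives, we filtered out bins with a CNV rate of less than 1%. Affected genes were identified by their overlap with regions in which CNVs were present.

To estimate divergence times, the 195 wild species accessions were assembled individually using MEGAHIT (ref. 40) v.1.2.9 with default parameters. Then, the ‘fabales’ genes were downloaded from the BUSCO (ref. 17) database (odb10), which contains 5,366 single-copy orthologues, to predict the genes for the 195 wild species accessions, the CDC Frontier genome (ref. 11) and the M. truncatula genome (ref. 18; as outgroup) using GeneWise (ref. 47) v.2.4.1 with the parameters “-both -sum -genesf”. On the basis of the gene annotation of the 195 wild species accessions, the sample with the longest average coding sequence (CDS) length was selected for each wild species. The CDS sequences of single-copy genes were extracted for the seven wild species, CDC Frontier and M. truncatula. For each single-copy family, multiple sequence alignment was performed using MUSCLE (ref. 48) v.3.8.31 with default parameters, and poorly aligned and divergent regions were removed using Gblocks (ref. 49) v.0.91b with the parameter “-t=c”. The alignments of all single-copy families were concatenated to construct a super-alignment matrix. The maximum likelihood tree was constructed using RAxML (ref. 50) v.8.2.12 with the parameters “-f a -x 12345 -p 12345 -# 1000 -m GTRCATX”. Finally, divergence times were estimated using MCMCTree (ref. 51) v.4.4 with three time calibration points (0.007–0.013 Ma for C. reticulatum–C. arietinum, 12.2–17.4 Ma for C. arietinum–C. pinnatifidum, and 30.0–54.0 Ma for C. arietinum–M. truncatula).

To assess the relatedness among the 195 wild accessions and the 3,171 cultivated lines, an identity-by-state (IBS)-based genetic distance matrix was calculated with the parameter “--distance 1-ibs” using LD-pruned SNPs (“--indep-pairwise 50 10 0.2”). On the basis of the distance matrix, a neighbour-joining phylogenetic tree was then constructed using ‘neighbor’ in PHYLIP (ref. 55) v.3.6.

PCA was performed to investigate the relatedness and clustering among cultivated chickpea accessions. The first 20 principal components (PCs) of the variance-standardized relationship matrix were estimated using EIGENSOFT (ref. 56) v.7.2.0 with default parameters on the LD-pruned SNPs on pseudomolecules. The PCA results were plotted using the R package ‘rworldmap’ (ref. 57).

To characterize the variation between populations, the population differentiation statistic (FST) was calculated in 10-kb sliding windows with a 2-kb step using VCFtools v.0.1.13. A series of pairwise FST values was calculated for the same combinations as in the ROD calculation. Tajima’s D was calculated in 100-kb non-overlapping windows using VCFtools (“--TajimaD 100000”). A window was considered a selection window if it was above the 90th percentile of the ROD and FST statistics and had a negative Tajima’s D value (less than −2). Genes located in the selection windows were identified, and functional enrichment of these candidate genes was performed with Fisher’s exact test using the enrichment pipeline (ref. 58) (https://sourceforge.net/projects/enrichmentpipeline/).

To determine the history of population size and split times, the SMC++ program (ref. 59) v.1.13.1 was used. Individuals with more than 20% missing data were removed. We constructed 20 random datasets of 150 genotypes each. For each of the 20 datasets, SMC++ was run with a generation time of one year and a mutation rate of 6.5 × 10^-9 (ref. 60). To avoid potential bias in the estimates due to long runs of homozygosity, we filtered out homozygous regions longer than 5 kb in the 150 samples. For each of the 20 estimates, we used 5 different lineage combinations, as described previously (ref. 59). We then calculated the median of the 20 independent estimates at each time point.

SweeD (v.3.3.1) analysis was performed on chromosomes Ca1 to Ca8 as described previously (ref. 61). To keep computation time and resources at a reasonable level while remaining conservative about the genomic regions potentially under positive selection, we considered random subsamples of 251 landraces, drawn in proportion to each geographical region from the 2,439 landraces. The analysis computes a composite likelihood ratio (CLR) along the genome for each subsample. We used a grid value of 10,000 for each chromosome, which corresponds roughly to one CLR value every 9 kb. We considered the top 1% of CLR values for each sample and retained, as candidate SNPs under positive selection, the positions detected in both samples. Owing to linkage disequilibrium, a high CLR value detected at a SNP may result from selection acting on a nearby gene. We therefore computed, from the list of SNPs detected as under selection, a list of intervals likely to be under selection; these intervals do not point to a specific SNP but instead encompass all detected SNPs that lie close to one another.

The effects of nucleotide variants on protein function were predicted using SIFT 4G (ref. 21) v.2.0.0. Putative deleterious mutations were identified as those with a SIFT score of less than 0.05. The Medicago genome was used as an outgroup to identify derived alleles in the chickpea genome. As described previously, the mutation burden was calculated by counting the number of derived deleterious alleles present in the constrained regions of each genome.

Genome-wide association study (GWAS) analysis was performed using 3.94 million genome-wide SNPs and phenotypic data generated for 16 traits across 2 seasons and 6 locations. Only biallelic SNPs in cultivated genotypes were used in the GWAS analysis. In addition, SNPs were filtered with a minor allele frequency (MAF) cut-off of 0.05, a missing rate of 0.8 and a heterozygosity rate of 0.1. Marker–trait association (MTA) analysis was then performed using a mixed linear model with the filtered HapMap file and the phenotypic data. The first three PCs were used to control for population structure. Manhattan and Q–Q plots were generated from the GWAS results. A P-value threshold of 3.16 × 10^-7 was used to declare an MTA significant.

For haplotype analysis, we retained a set of SNPs for the 3,171 cultivated chickpea lines according to the following criteria: (i) MAF > 0.001; and (ii) proportion of missing calls per SNP < 30%. The haplotypes present within trait-associated genes were examined, and only homozygous calls were considered for haplotype analysis. The identified haplotypes were visualized in Flapjack (ref. 62) v.1.19.09.04.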
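
The two SNP retention criteria for haplotype analysis (MAF > 0.001 and fewer than 30% missing calls per SNP) can be illustrated with a minimal Python sketch. The genotype encoding (alternate-allele counts 0, 1 or 2, with `None` for a missing call) and the function names are assumptions made for illustration; the source does not say which tool applied the filter.

```python
from typing import Optional, Sequence, Tuple


def snp_stats(genotypes: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (missing_rate, maf) for one biallelic SNP.

    genotypes holds one entry per accession: the alternate-allele count
    (0, 1 or 2), or None when the call is missing (assumed encoding).
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called:
        return missing_rate, 0.0
    alt_freq = sum(called) / (2 * len(called))
    return missing_rate, min(alt_freq, 1 - alt_freq)


def passes_haplotype_filter(
    genotypes: Sequence[Optional[int]],
    min_maf: float = 0.001,
    max_missing: float = 0.30,
) -> bool:
    """Apply the two thresholds stated in the text: MAF > 0.001, missing < 30%."""
    missing_rate, maf = snp_stats(genotypes)
    return maf > min_maf and missing_rate < max_missing
```

A monomorphic site such as `[0, 0, 0, 0]` fails on MAF, and a site with half of its calls missing fails on the missing-call threshold.
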
For the haplo–pheno analysis, haplotypes carrying only one genotype were removed from the analysis. The accessions were categorized on the basis of haplotype groups and, together with phenotypic data, superior haplotypes were identified (ref. 63). Haplotype-wise means for 100-seed weight (100SW), days to flowering (DF) and yield per plant (YPP) were compared to define superior haplotypes. Duncan’s multiple range test was used to assess statistical significance.

We used genomic estimated breeding values (GEBVs) from the genomic prediction section for key production traits (YPP, 100SW, DF and days to maturity (DM)) to generate a genomic relationship matrix based on 754,576 SNPs. We used the breeding program implementation platform MateSel v.6.3 (http://matesel.une.edu.au) to generate an optimized mating design within desi, kabuli and intermediate types. The relative emphasis on the mean index versus co-ancestry was set by choosing the target degrees on the response surface (ref. 24). We chose a target of 60 degrees to minimize the increase in population co-ancestry (maximize population genetic diversity) while achieving an acceptable rate of genetic gain. As this study aimed to maintain a diverse pre-breeding pool while making economic improvements, we followed the conservative approach for ‘evolving gene banks’ (ref. 23).

We generated unique economic indices for desi and kabuli chickpea, which were calculated on a US$ per ha basis and included yield (average GEBV for YPP over 9 sites) with a bonus price for large seeds (when the average GEBV for 100SW over 9 sites exceeded the kabuli average of +5.9 g) and for earliness (average GEBV for DF and DM over 9 sites < 0 days). The base price for chickpea was assumed to be US$400 per tonne, and YPP was converted to an equivalent grain yield per hectare by assuming that the mean YPP of 18 g per plant is equivalent to 1.8 tonnes per hectare. The index was also adjusted for a price bonus for large seeds and earliness as follows.
The starting values for GEBV for 100SW are low in desi candidates (mean −4.0 g) and high in kabuli candidates (mean +5.9 g). Hence, the price bonus for 100SW begins at a GEBV of +5.9 g, and there is no bonus below this value. The price bonus (GEBV 100SW > 5.9 g) is US$35 per gram, which is added to the base price. Similarly, a bonus in price per tonne was provided for GEBV earliness (the average of GEBV DF and GEBV DM). The average GEBV earliness was −1.6 days in the desi group and +2.4 days in the kabuli group. The price bonus for earliness begins at an average GEBV of 0 days; there is a bonus of US$10 per day for negative values, added to the base price, and no bonus for positive values.

As described previously (ref. 25), three models were used: a basic model (E + L) with main effects of environments (E) and lines (L), a model (E + L + G) that also includes the main effects of markers, and a genomic-by-environment interaction model (E + L + G + GE). Three different SNP datasets (G1, cultivated accessions; G2, wild accessions; and G3, G1 + G2) were used as the genomic matrix (G), after conventional quality control on missing values (<20%) and MAF (>0.05). Phenotyping data for nine traits across 12 different year × location combinations were used. Pearson’s correlation coefficient between the observed phenotype and the predicted genomic breeding value was used to estimate the accuracy of genomic prediction. Three different random cross-validation (CV) schemes were used: CV1 (evaluates the prediction accuracy of models when a certain percentage of lines are not observed in any environment), CV2 (estimates the prediction accuracy of models when some lines are evaluated in some environments but not in others) and CV0 (predicts an unobserved environment using the remaining environments as a training set).
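
The difference between the three cross-validation schemes is easiest to see in how records are assigned to folds. The sketch below is a minimal Python illustration under stated assumptions: the function names, the random fold assignment and the `(line, environment)` record layout are hypothetical and are not taken from the authors' pipeline.

```python
import random
from typing import Dict, List, Sequence, Tuple

Record = Tuple[str, str]  # (line, environment)


def cv1_folds(lines: Sequence[str], envs: Sequence[str],
              k: int = 5, seed: int = 0) -> Dict[Record, int]:
    """CV1: a held-out line is unobserved in every environment,
    so all records of one line share the same fold."""
    rng = random.Random(seed)
    shuffled = list(lines)
    rng.shuffle(shuffled)
    fold_of_line = {line: i % k for i, line in enumerate(shuffled)}
    return {(line, env): fold_of_line[line] for line in lines for env in envs}


def cv2_folds(lines: Sequence[str], envs: Sequence[str],
              k: int = 5, seed: int = 0) -> Dict[Record, int]:
    """CV2: single (line, environment) records are held out, so a line in the
    test fold can still be observed in other environments."""
    rng = random.Random(seed)
    records = [(line, env) for line in lines for env in envs]
    rng.shuffle(records)
    return {rec: i % k for i, rec in enumerate(records)}


def cv0_split(envs: Sequence[str], target_env: str) -> Tuple[List[str], List[str]]:
    """CV0: predict one environment from all remaining environments."""
    train = [env for env in envs if env != target_env]
    return train, [target_env]
```

With CV1 every record of a given line lands in the same fold, whereas CV2 spreads the records of a line across folds, which is what lets the model borrow information from the environments in which that line was observed.
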
CV1 and CV2 were implemented with fivefold cross-validation to generate the training and testing sets, and the prediction accuracy was assessed for each testing set. The permutation of the five subsets led to five possible training and validation datasets. This procedure was repeated 20 times, so that 100 runs were performed for each trait–environment combination in each population. The same partition was used for the analysis of all the GS models. For CV0, each environment was predicted using the remaining environments. For fitting the GS models, the R package Bayesian Generalized Linear Regression (BGLR) (ref. 64) v.1.0.7 was used.

For WhoGEM analysis, 1,318 accessions with a validated geographical location were selected and used as a reference dataset. The SNP dataset was filtered for missing rate (>0.1) and MAF (<0.01) and used for a detailed search with ADMIXTURE (ref. 65) v.1.3.0 between K = 19 and K = 30 to identify the most likely number of admixture components. To confirm the admixture value, another method, DAPC (discriminant analysis of principal components), was used. The optimal number of admixture components in the WhoGEM method was obtained by comparing the predicted and recorded locations (ProvenancePredictor algorithm; ref. 26) and fixed at K = 23.

A general linear model was used to explore the relationships between the phenotypes and the admixture components and land types. A forward–backward algorithm was used to reduce the set of predictors to the most significant ones. A negative control (a model without any genetics, called environment-only) was also fitted to the data. The models were fitted on the whole dataset, and the significant factors were identified and retained.

A test of WhoGEM significance is given by a likelihood ratio test comparing the WhoGEM-based model and the environment-only model.
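The performances of the three models (full WhoGEM-based model, additive model and environment-only model) were then evaluated using 100–300 replicates of a fivefold cross-validation scheme.

The SNP set was filtered, first by excluding all markers with more than two called alleles, and then by missing rate (>10%) and MAF (<5%). For further analysis, a subset of 124,833 (20%) of the 2.4 million high-quality SNPs was randomly selected. These SNPs were used to construct LD blocks and to estimate local GEBVs for the haplotypes of these LD blocks. Details of the method used to calculate local GEBVs of LD-block haplotypes are described in a previous report (ref. 27).

We also ran a ridge-regression best linear unbiased prediction (BLUP) model in the R package rrBLUP (ref. 66) v.4.6.0 to predict marker effects for seven agronomic traits, and then summed the predicted allele effects for each observed haplotype of all LD blocks genome-wide. Finally, we estimated the variance among the local GEBVs of the haplotypes within each LD block to highlight regions of the genome showing molecular variation associated with the observed phenotypic variation of the agronomic traits measured in field trials.

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.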
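
The local GEBV calculation for LD-block haplotypes can be sketched in a few lines of Python. This is a simplified illustration, not the method of ref. 27: it assumes haplotypes coded as tuples of 0/1 alleles, one predicted effect per marker for the allele coded 1, and a population variance taken over the distinct haplotypes observed in the block. All names are hypothetical.

```python
from statistics import pvariance
from typing import Sequence, Tuple

Haplotype = Tuple[int, ...]  # one 0/1 allele code per marker in the LD block


def local_gebv(haplotype: Haplotype, effects: Sequence[float]) -> float:
    """Sum the predicted marker effects carried by one haplotype."""
    if len(haplotype) != len(effects):
        raise ValueError("haplotype and effects must have the same length")
    return sum(allele * effect for allele, effect in zip(haplotype, effects))


def block_variance(haplotypes: Sequence[Haplotype],
                   effects: Sequence[float]) -> float:
    """Variance among the local GEBVs of the distinct haplotypes in one block."""
    unique = sorted(set(haplotypes))
    values = [local_gebv(h, effects) for h in unique]
    return pvariance(values) if len(values) > 1 else 0.0
```

Blocks in which the distinct haplotypes have very different local GEBVs receive a high variance, which is the signal used to flag candidate genomic regions.
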

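This article was contributed by the author [qingdaomobile] and does not represent the views of 青鸟号. If you repost it, please credit the source: https://www.qingdaomobile.com/zskp/202506-27375.html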
