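Our X chromosome study enabled us to develop an integrated set of methods for rapid sequencing and analysis of whole human genomes. We sequenced the genome of a male Yoruba from Ibadan, Nigeria (YRI, sample NA18507). This sample was originally collected for the HapMap project17,18 through a process of community engagement and informed consent, and has also been studied in other projects20,21. We were therefore able to compare our results with publicly available data from the same sample. We constructed two libraries: one of short inserts (200 bp) with similar properties to the previous X chromosome library, and one from long fragments (2 kb) to provide longer-range read-pair information (see Supplementary Fig. 11 for size distributions). We generated 135 Gb of sequence (4 billion paired 35-base reads; see Supplementary Table 6) over a period of 8 weeks (December 2007 to January 2008) on six GA1 instruments averaging 3.3 Gb per production run (see Supplementary Table 1 for an example). The approximate consumables cost (based on the full list price of reagents) was $250,000. We aligned 97% of the reads using MAQ and found that 99.9% of the human reference (NCBI build 36.1) was covered by one or more reads, at an average depth of 40.6-fold. Using ELAND, we aligned 91% of the reads over 93% of the reference sequence at sufficient depth to call a strong consensus (>three Q30 bases). The distribution of mapped read depth was close to random, with slight over-dispersion, as seen for the X chromosome data. We observed comprehensive representation across a wide range of G+C content, dropping only at the very extreme ends, but with a different pattern of distribution compared to the X chromosome (see Supplementary Fig. 12).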
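The statement that read depth is "close to random, with slight over-dispersion" can be made concrete with a small sketch. It assumes that "random" means a Poisson model with mean equal to the reported 40.6-fold depth; the toy depth values and the function names are illustrative and are not taken from the study.

```python
import math
from statistics import mean, pvariance

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of depth k when depth is Poisson-distributed with mean lam."""
    # Computed in log space to avoid overflow for large k.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def dispersion_index(depths: list[int]) -> float:
    """Variance-to-mean ratio: about 1 for Poisson data, above 1 if over-dispersed."""
    return pvariance(depths) / mean(depths)

MEAN_DEPTH = 40.6  # average depth reported in the text

# Fraction of positions expected to have no reads under the idealized model.
p_uncovered = poisson_pmf(0, MEAN_DEPTH)

# Fraction of positions expected to have fewer than 10 reads.
p_below_10 = sum(poisson_pmf(k, MEAN_DEPTH) for k in range(10))

# Made-up per-position depths, used only to exercise dispersion_index.
toy_depths = [38, 41, 45, 36, 52, 29, 40, 44, 33, 48]

print(f"P(depth = 0)  = {p_uncovered:.3e}")
print(f"P(depth < 10) = {p_below_10:.3e}")
print(f"dispersion index of toy depths = {dispersion_index(toy_depths):.2f}")
```

Under a pure Poisson model almost no position would be uncovered at this depth, so the 0.1% of the reference without reads is better explained by regions where reads cannot be placed than by sampling chance alone.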
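We identified 4 million SNPs, 74% of which matched previous entries in dbSNP (Fig. 3). We found excellent agreement between our SNP calls and genotyping results: sequence-based SNP calls covered almost all of the 552,710 loci of HM550, with >99.5% concordance of sequencing versus genotyping calls (Table 1 and Supplementary Table 7a). The few disagreements were mostly under-calls of heterozygous positions (GT>Seq) in areas of low sequence depth, giving a false-negative rate of <0.35% from the ELAND analysis (see Table 1). The other disagreements (0.09% of all genotypes) included errors in genotyping plus apparent tri-allelic SNPs (Supplementary Table 7a). The main cause of genotype error (0.05% of all genotypes) is the existence of a second ‘hidden’ SNP close to the assayed locus that disrupts the genotyping assay, leading to loss of one allele and an erroneous homozygous genotype (Supplementary Figs 13 and 14).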
To examine the accuracy of SNP calling in more detail, we compared our sequence-based SNP calls with 3.7 million genotypes (HM-All) generated for this sample during the HapMap project (Table 1 and Supplementary Table 7b)18 and found excellent concordance between the data sets. Disagreements included sequence-based under-calls of heterozygous positions in regions of low read depth. The slightly higher level of other disagreements (0.76%) seen in this analysis compared to that of the HM550 data (0.09%) is in line with the higher underlying genotype error rate of 0.7% for the HapMap data18. To refine this analysis further, we generated a set of 530,750 very high confidence reference genotypes comprising concordant calls in both the HM550 and HM-All genotype data sets. Comparing the results of the MAQ analysis to this high confidence set (see Table 1), we found 130 heterozygote under-calls, GT>Seq (that is, a false-negative rate of 0.025%). There were also 130 heterozygote over-calls, Seq>GT, but most of these are probably genotyping errors, because 82 have a nearby ‘hidden’ SNP and 3 have a nearby indel. A further 41 are tri-allelic loci, leaving at most 4 potential wrong calls by sequencing (that is, a false-positive rate of 4 per 529,589 positions). Finally, we selected a subset of novel SNP calls from the sequence data and tested them by genotyping. We found 96.1% agreement between sequence and genotype calls (Supplementary Table 8). However, the 47 disagreements included 10 correct sequence calls (genotype under-calls owing to hidden SNPs) and 7 sequencing under-calls. On this basis, the false-positive discovery rate for the one million novel SNPs is 2.5% (30 out of 1,206). For the entire data set of 4 million SNPs detected in this analysis, the false-positive and false-negative rates both average <1%.
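The error rates in this paragraph follow from simple counting, and the sketch below reproduces that arithmetic from the quoted figures. One assumption is made explicit: the 47 disagreements are taken to be out of the same 1,206 tested novel SNP calls used for the 2.5% figure. Variable names are illustrative only.

```python
def rate(numerator: int, denominator: int) -> float:
    """Plain proportion; kept as a function so each rate reads the same way."""
    return numerator / denominator

# High-confidence reference genotypes (concordant in HM550 and HM-All).
HIGH_CONF_GENOTYPES = 530_750
het_under_calls = 130  # GT>Seq
false_negative_rate = rate(het_under_calls, HIGH_CONF_GENOTYPES)

# Heterozygote over-calls (Seq>GT) and the portions explained by other causes.
het_over_calls = 130
near_hidden_snp = 82
near_indel = 3
tri_allelic = 41
residual_wrong_calls = het_over_calls - near_hidden_snp - near_indel - tri_allelic

# Novel SNP calls re-tested by genotyping.
# Assumption: 1,206 is the number of tested calls behind both percentages.
novel_tested = 1_206
disagreements = 47
correct_seq_calls = 10  # genotyping under-calls caused by hidden SNPs
seq_under_calls = 7     # sequencing missed an allele; not false positives
false_positives = disagreements - correct_seq_calls - seq_under_calls
novel_fdr = rate(false_positives, novel_tested)
agreement = 1 - rate(disagreements, novel_tested)

print(f"false-negative rate:      {false_negative_rate:.4%}")
print(f"residual wrong calls:     {residual_wrong_calls}")
print(f"novel SNP agreement:      {agreement:.1%}")
print(f"novel SNP false positives: {false_positives} ({novel_fdr:.1%})")
```

The computed false-negative rate comes out slightly below the quoted 0.025%, which is consistent with rounding in the text.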
This genome from a Yoruba individual contains significantly more polymorphism than a genome of European descent. The autosomal heterozygosity (π) of NA18507 is 9.94 × 10⁻⁴ (1 SNP per 1,006 bp), higher than previous values for Caucasians (7.6 × 10⁻⁴, ref. 12). Heterozygosity in the pseudoautosomal region 1 (PAR1) is substantially higher (1.92 × 10⁻³) than the autosomal value. PAR1 (2.7 Mb) at the tip of the short arm of chromosomes X and Y undergoes obligatory recombination in male meiosis, which is equivalent to 20× the autosome average. This illustrates a clear correlation between recombination and nucleotide diversity. By contrast, the 0.33-Mb PAR2 region has a much lower recombination rate than PAR1; we observed that heterozygosity in PAR2 is identical to that of the autosomes in NA18507. Heterozygosity in coding regions is lower (0.54 × 10⁻³) than the total autosome average, consistent with the model that some coding changes are deleterious and are lost as the result of natural selection22. Nevertheless, the 26,140 coding SNPs (Supplementary Fig. 15) include 5,361 non-conservative amino acid substitutions plus 153 premature termination codons (Supplementary Table 9), many of which are expected to affect protein function.
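The heterozygosity figures can be compared on a common scale with a short calculation. The sketch below uses only the values quoted in this paragraph; the dictionary keys and helper name are illustrative.

```python
# Heterozygosity (pi) values quoted in the text, as heterozygous sites per base.
PI = {
    "NA18507 autosomes": 9.94e-4,
    "European (ref. 12)": 7.6e-4,
    "PAR1": 1.92e-3,
    "coding regions": 0.54e-3,
}

def bp_per_snp(pi: float) -> float:
    """Average spacing between heterozygous sites, in base pairs."""
    return 1.0 / pi

autosomal = PI["NA18507 autosomes"]
for region, pi in PI.items():
    # Ratio to the NA18507 autosomal value shows relative diversity.
    print(f"{region:20s} 1 SNP per {bp_per_snp(pi):7.0f} bp, "
          f"{pi / autosomal:.2f}x autosomal")
```

On these figures PAR1 carries roughly twice the autosomal diversity while coding sequence carries roughly half, which is the contrast the paragraph draws between recombination and selection.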
We performed a genome-wide survey of structural variation in this individual and found excellent correlation with variants that had been reported in previous studies, as well as detecting many new variants. We found 0.4 million short indels (1–16 bp; Supplementary Fig. 16), most of which are length polymorphisms in homopolymeric tracts of A or T. Half of these events are corroborated by entries in dbSNP, and 95 of 100 examined were present in amplicons sequenced from this individual in ENCODE regions, confirming the high specificity of this method of short indel detection. For larger structural variants (detected by anomalously spaced paired ends) we found that some were detected by both long and short insert data sets (Supplementary Fig. 17a), but most were unique to one or other data set. We observed two reasons for this: first, small events (<400 bp) are within the normal size variance of the long insert data; second, nearby repetitive structures can prevent unique alignment of read pairs (see Supplementary Fig. 17b, c). In some cases, the high resolution of the short insert data permits detection of additional complexity in a structural rearrangement that is not revealed by the long insert data. For example, where the long insert data indicate a 1.3-kb deletion in NA18507 relative to the reference, the short insert data reveal an inversion accompanied by deletions at both breakpoints (Fig. 4). We carried out de novo assembly of reads in this region and constructed a single contig that defines the exact structure of the rearrangement (data not shown).
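To illustrate why an event can be visible in one library and not the other, here is a minimal sketch of flagging anomalously spaced read pairs. It is not the authors' pipeline: the median/MAD threshold, the `n_mads` cutoff, the function names and the toy spans are all assumptions chosen for illustration.

```python
from statistics import median

def mad(values: list[int], center: float) -> float:
    """Median absolute deviation: a robust estimate of spread."""
    return median([abs(v - center) for v in values])

def classify_pairs(spans: list[int], n_mads: float = 5.0) -> list[str]:
    """Label each read pair by how its mapped span compares to the library.

    A pair that maps farther apart on the reference than expected suggests
    sequence missing from the sample (deletion); a pair that maps closer
    together suggests extra sequence in the sample (insertion).
    """
    center = median(spans)
    spread = mad(spans, center) or 1.0  # guard against zero spread
    labels = []
    for span in spans:
        if span > center + n_mads * spread:
            labels.append("deletion_candidate")
        elif span < center - n_mads * spread:
            labels.append("insertion_candidate")
        else:
            labels.append("normal")
    return labels

# Toy short-insert library (about 200 bp); one pair spans a 1.3-kb deletion.
short_spans = [195, 202, 198, 210, 190, 205, 200, 1500, 199, 203]
# Toy long-insert library (about 2 kb, wide spread); the pair at 2300
# spans a hypothetical 300-bp deletion.
long_spans = [1800, 2100, 1950, 2250, 2000, 1700, 2300, 2050, 1900, 2350]

print(classify_pairs(short_spans))
print(classify_pairs(long_spans))
```

In the toy data the tight short-insert distribution exposes the outlying pair, whereas the 300-bp shift in the long-insert data stays inside the normal spread, mirroring the first reason given above for events being unique to one data set.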
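We discovered 5,704 structural variants ranging from 50 bp to >35 kb where sequence is absent from the genome of NA18507 compared to the reference genome. We observed a steady reduction in the number of events of this type with increasing size, except for two peaks (Supplementary Fig. 18). The large peak at 300–350 bp represents mostly events containing a sequence of the AluY family. This is consistent with insertion of short interspersed nuclear elements (SINEs) that are present in the reference genome but missing from the genome of NA18507. Similarly, the second, smaller peak at 6–7 kb is in many cases the consequence of insertion of the long interspersed nuclear element (LINE) L1 Homo sapiens (L1Hs). We found good correspondence between our results and the data of ref. 23, which reported 148 deletions of <100 kb in this individual on the basis of abnormal fosmid paired-end spacing. We found supporting evidence for 111 of these events. We detected a further 2,345 insertions in the range 60–160 bp, which are sequences present in the genome of NA18507 and absent from the reference genome (Supplementary Fig. 19). One example is shown in Supplementary Fig. 20. The ‘singleton’ reads on either side of the event, whose partners do not align to the reference, form part of a de novo assembly that precisely defines the novel sequence and breakpoints (Supplementary Fig. 21).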