Objective · To integrate multiple next-generation sequencing (NGS) analysis methods into an automated tool for gene-panel sequencing data that performs quality control, gene mutation detection and visualization. Methods · FastQC, preprocessing and information of sequences (Prinseq) and other methods were integrated into an R package for quality control and visualization of both the raw sequencing reads and the final mutation results. Sequencing reads in FASTQ format were mapped to the reference genome with the Burrows-Wheeler Alignment Tool (BWA) or the Torrent Mapping Alignment Program (TMAP). Lofreq, Varscan2, the Genome Analysis Toolkit (GATK) and the Torrent Variant Caller (TVC) were used to detect gene mutations and produce variant call format (VCF) files, and the mutation sites were annotated with Annovar. Results · Sequencing data from 36 patients with acute myeloid leukemia, generated on the Ion Torrent Personal Genome Machine (PGM) platform, were analyzed with this tool. In 2 demo samples, 10 high-confidence mutation sites were found in DNMT3A, TET2, JAK2, PHF6, ASXL1, NPM1 and CEBPA, all of which were validated by Sanger sequencing. Conclusion · By integrating and developing a set of tools for gene-panel sequencing data analysis, the method effectively detects gene mutations in gene-panel sequencing data, reduces the false-positive rate, and improves detection efficiency and sensitivity, providing practical support for the analysis of gene-panel sequencing data.
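The quality-control step can be illustrated with a read-level filter similar in spirit to what Prinseq offers. The following is a minimal sketch, not the authors' R package. It assumes plain 4-line FASTQ records, Phred+33 quality encoding, and an illustrative mean-quality cutoff of 20. The helper names `parse_fastq`, `mean_phred` and `filter_reads` are hypothetical.

```python
"""Hypothetical read-level quality filter (illustration only).

Assumptions: 4-line FASTQ records, Phred+33 encoding, and an
illustrative mean-quality cutoff of 20.
"""
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 by default)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(records: Iterable[Record], min_mean_q: float = 20.0) -> List[Record]:
    """Keep only reads whose mean base quality is at least min_mean_q."""
    return [r for r in records if mean_phred(r[2]) >= min_mean_q]


# Toy usage: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
fastq = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "####",
]
kept = filter_reads(parse_fastq(fastq))
print([h for h, _, _ in kept])  # ['@read1']
```

A real pipeline would also consider per-position quality, read length and adapter content, which FastQC reports. Averaging the Phred score per read is used here only because it is the simplest filter to show.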
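The abstract reports a lower false-positive rate but does not say how several callers (Lofreq, Varscan2, GATK, TVC) are combined. One common approach, shown here only as an assumption, keeps a variant when at least a minimum number of callers report it. This is a sketch that compares VCF data lines by (CHROM, POS, REF, ALT). The function names and the `min_callers` threshold are hypothetical, and the VCF records are toy data.

```python
"""Hypothetical multi-caller consensus filter (illustration only).

Assumption: requiring agreement between callers is shown as one possible
way to reduce false positives; the source does not state the mechanism.
"""
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def vcf_keys(vcf_lines: Iterable[str]) -> Set[VariantKey]:
    """Extract (CHROM, POS, REF, ALT) keys from VCF data lines.

    Header lines ('#') are skipped; comma-separated ALT alleles are split
    into one key per allele. A set avoids double-counting within a caller.
    """
    keys: Set[VariantKey] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def consensus(calls: Dict[str, Iterable[str]],
              min_callers: int = 2) -> Dict[VariantKey, List[str]]:
    """Map each variant reported by >= min_callers callers to those callers."""
    support: Dict[VariantKey, List[str]] = defaultdict(list)
    for caller, lines in calls.items():
        for key in vcf_keys(lines):
            support[key].append(caller)
    return {k: sorted(v) for k, v in support.items() if len(v) >= min_callers}


# Toy usage with made-up records (not data from the study).
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
calls = {
    "Lofreq": [header, "chr1\t100\t.\tC\tT\t50\tPASS\t.",
               "chr1\t200\t.\tG\tA\t30\tPASS\t."],
    "Varscan2": [header, "chr1\t100\t.\tC\tT\t45\tPASS\t."],
    "GATK": [header, "chr1\t100\t.\tC\tT\t60\tPASS\t.",
             "chr1\t300\t.\tA\tG\t20\tPASS\t."],
}
print(consensus(calls, min_callers=2))
# {('chr1', 100, 'C', 'T'): ['GATK', 'Lofreq', 'Varscan2']}
```

Exact key matching assumes the callers' VCFs are normalized in the same way. In practice indels often need left-alignment and allele normalization before they can be compared, and this sketch does not do that.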