Journal of Shanghai Jiao Tong University (Medical Science), 2017, Vol. 37, Issue (11): 1575-. doi: 10.3969/j.issn.1674-8115.2017.11.022

• Techniques and Methods •

Analysis method based on gene-panel sequencing data

LI Jian-feng1, YAN Tian-qi2, CUI Bo-wen1, KONG Jie3, WANG Shu1, CHEN Bing1, HUANG Jin-yan1

  1. State Key Laboratory of Medical Genomics, Shanghai Institute of Hematology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China;  2. Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China;  3. Institute of Health Sciences, Shanghai Institutes for Biological Sciences of the Chinese Academy of Sciences / Shanghai Jiao Tong University School of Medicine, Shanghai 200031, China
  • Online: 2017-11-28  Published: 2018-01-10
  • Corresponding author: HUANG Jin-yan, E-mail: jinyan@shsmu.edu.cn
  • About the first author: LI Jian-feng (born 1993), male, master's degree candidate; E-mail: lee_jianfeng@sjtu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (81570122, 81770205); Shanghai Municipal Education Commission Gaofeng Clinical Medicine Grant Support (20161303)


Abstract:

Objective · To establish an integrated, automated method for gene-panel sequencing data that performs quality control, gene mutation detection and visualization.  Methods · FastQC, PRINSEQ (preprocessing and information of sequences) and other methods were integrated into an R package for quality control and visualization of both the raw sequencing reads and the final mutation results. Reads were mapped to the reference genome with the Burrows-Wheeler Aligner (BWA) or the Torrent Mapping Alignment Program (TMAP). LoFreq, VarScan2, the Genome Analysis Toolkit (GATK) and the Torrent Variant Caller (TVC) were used to detect gene mutations and produce variant call format (VCF) files. The mutation sites were then annotated with ANNOVAR.  Results · Sequencing data from 36 patients with acute myeloid leukemia, generated on the Ion Torrent Personal Genome Machine (PGM) platform, were analyzed with this tool. In 2 demonstration samples, 10 high-confidence mutation sites were identified in DNMT3A, TET2, JAK2, PHF6, ASXL1, NPM1 and CEBPA, all of which were validated by Sanger sequencing.  Conclusion · By integrating and developing a set of tools for gene-panel data analysis, the method detects gene mutations in gene-panel sequencing data effectively, lowers the false-positive rate and improves detection efficiency. It therefore provides practical support for the analysis of gene-panel sequencing data.
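The Methods paragraph above names a read-level quality-control step (FastQC, PRINSEQ) but does not give the filtering criteria. The sketch below is a minimal, hypothetical illustration of that kind of filter, similar in spirit to PRINSEQ's length and mean-quality filters. It is not the authors' R package. The function names, the Phred+33 encoding, the unwrapped four-line FASTQ layout and the cutoffs (`min_len`, `min_mean_q`) are all assumptions made for the example.

```python
import io
from dataclasses import dataclass
from typing import Iterable, Iterator, List, TextIO


@dataclass
class FastqRecord:
    name: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped FASTQ stream (exactly four lines per record)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:].rstrip("\n"), seq, qual)


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed by default)."""
    return [ord(c) - offset for c in qual]


def mean_quality(rec: FastqRecord) -> float:
    """Mean Phred score of one read; 0.0 for an empty read."""
    scores = phred_scores(rec.qual)
    return sum(scores) / len(scores) if scores else 0.0


def filter_reads(records: Iterable[FastqRecord], min_len: int = 50,
                 min_mean_q: float = 20.0) -> List[FastqRecord]:
    """Keep reads that are long enough and whose mean quality reaches the cutoff."""
    return [r for r in records
            if len(r.seq) >= min_len and mean_quality(r) >= min_mean_q]


# Toy input: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
demo = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"   # long enough, high quality -> kept
    "@r2\nACGTACGT\n+\n########\n"   # long enough, low quality  -> dropped
    "@r3\nACG\n+\nIII\n"             # high quality, too short    -> dropped
)
kept = filter_reads(read_fastq(demo), min_len=5, min_mean_q=20)
print([r.name for r in kept])  # ['r1']
```

The parser accepts a different read length for each record, which matters for Ion Torrent PGM data because its reads are not of fixed length.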

Key words:  next-generation sequencing, gene-panel sequencing, quality control, detection of mutations, visualization
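According to the abstract, several callers (LoFreq, VarScan2, GATK, TVC) each produce a VCF file, and the pipeline is credited with a lower false-positive rate. The abstract does not say how the callers' outputs are reconciled. The sketch below assumes one common strategy, a simple vote that keeps a variant only if at least `min_callers` callers report it. It should not be read as the authors' actual rule. `parse_vcf_variants`, `consensus_calls` and the toy coordinates are hypothetical. The only VCF facts relied on are the standard column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and comma-separated multi-allelic ALT values.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# A variant is keyed by (CHROM, POS, REF, ALT): VCF columns 1, 2, 4 and 5.
Variant = Tuple[str, int, str, str]


def parse_vcf_variants(lines: Iterable[str], pass_only: bool = True) -> Set[Variant]:
    """Collect variant keys from VCF text lines, splitting multi-allelic ALT fields."""
    variants: Set[Variant] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta-information (##) and header (#CHROM) lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts, filt = fields[0], int(fields[1]), fields[3], fields[4], fields[6]
        if pass_only and filt not in ("PASS", "."):
            continue  # drop records the caller itself flagged as failing a filter
        for alt in alts.split(","):
            if alt != ".":  # "." in ALT means no alternate allele
                variants.add((chrom, pos, ref.upper(), alt.upper()))
    return variants


def consensus_calls(calls: Dict[str, Set[Variant]], min_callers: int = 2) -> Dict[Variant, int]:
    """Return variants reported by at least `min_callers` callers, mapped to their support count."""
    support = Counter(v for variants in calls.values() for v in variants)
    return {v: n for v, n in support.items() if n >= min_callers}


# Toy VCF bodies with made-up coordinates, standing in for three callers' outputs.
HEADER = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
toy_vcfs = {
    "lofreq": HEADER + "chr1\t1000\t.\tA\tG\t50\tPASS\t.\nchr1\t2000\t.\tC\tT\t30\tPASS\t.\n",
    "varscan2": HEADER + "chr1\t1000\t.\tA\tG\t.\tPASS\t.\nchr1\t3000\t.\tG\tA\t.\tPASS\t.\n",
    "gatk": HEADER + "chr1\t1000\t.\tA\tG,C\t60\tPASS\t.\nchr1\t2000\t.\tC\tT\t20\tLowQual\t.\n",
}
calls = {name: parse_vcf_variants(text.splitlines()) for name, text in toy_vcfs.items()}
print(consensus_calls(calls, min_callers=2))  # {('chr1', 1000, 'A', 'G'): 3}
```

Raising `min_callers` removes more caller-specific false positives but can also discard true low-frequency variants that only the most sensitive caller detects. Different callers may also write the same indel at different positions. A real pipeline would therefore normally left-align and normalize the VCFs (for example with `bcftools norm`) before comparing keys.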
