Journal of Shanghai Jiao Tong University (Medical Science)

• Original article (Basic research) •

RNA-Seq based analysis on cSNP and gene expression level

LI Shao-bo (李少波), FU Guo-hui (傅国辉)

  1. Department of Pathology, Basic Medical College, Shanghai Jiao Tong University, Shanghai 200025, China
  • Online: 2014-02-28 Published: 2014-03-25
  • Corresponding author: FU Guo-hui, E-mail: fuguhu@263.net
  • About the author: LI Shao-bo (born 1987), male, master's student; E-mail: smss@sjtu.edu.cn
  • Supported by:

    National Natural Science Foundation of China (81171939)

Abstract:

Objective To establish a method for analyzing coding-region single nucleotide polymorphisms (cSNPs) and differentially expressed genes in transcriptome sequencing (RNA-Seq) data, and to screen for single nucleotide polymorphism (SNP) loci that may alter protein function and for genes that are differentially expressed between cells of different phenotypes. Methods RNA-Seq was performed on the gastric cancer cell lines MKN28 and SGC7901 grown under normal culture conditions. The sequencing data were aligned to the reference genome, and statistical analysis was performed on the number of reads, the number of genes detected, the number of genes upregulated in MKN28 and in SGC7901, the number of SNPs, and the alternative splicing patterns. Online software and databases were combined with computer programming to screen the SNPs in the transcriptome sequencing data of the two cell lines and to predict their functional effects; the GO clustering results of the differentially expressed genes in the two cell lines were analyzed and compared. Results SNPs in 709 genes belonging to 8 categories were screened and their effects predicted, and 6 SNP loci predicted to alter protein function were identified. Analysis of the differentially expressed genes revealed the expression of serine/threonine protein kinases in the two cell lines. Some of the analytical results were verified by Western blotting and PCR. Conclusion A method for analyzing cSNP data from transcriptome sequencing was established; it can efficiently screen and analyze large volumes of SNP data. Clustering analysis followed by comparison yielded a group of protein kinase genes that are highly expressed in MKN28 and lowly expressed in SGC7901. These results provide a basis for subsequent experiments.

Key words: cSNP, transcriptome, RNA-Seq, differentially expressed genes, gastric cancer
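
The abstract does not specify the software, thresholds, or file formats used, so the following Python sketch only illustrates the general shape of the two screening steps it describes: keeping coding-region SNPs predicted to alter protein function, and listing genes expressed more highly in one cell line than in the other. The record fields, the region and effect labels, the `predicted_damaging` flag, the fold-change cutoff, the pseudocount, and the gene names are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    gene: str
    region: str               # hypothetical annotation label, e.g. "exonic"
    effect: str               # hypothetical label, e.g. "nonsynonymous"
    predicted_damaging: bool  # hypothetical flag from a prediction tool


def screen_csnps(snps):
    """Keep exonic, nonsynonymous SNPs that were flagged as damaging."""
    return [
        s for s in snps
        if s.region == "exonic"
        and s.effect == "nonsynonymous"
        and s.predicted_damaging
    ]


def genes_up_in_first(expr_a, expr_b, min_fold=2.0, pseudocount=1.0):
    """Return genes whose expression in A is at least min_fold times that in B.

    Only genes present in both tables are compared; the pseudocount
    (an assumption of this sketch) avoids division by zero.
    """
    up = []
    for gene in sorted(expr_a.keys() & expr_b.keys()):
        fold = (expr_a[gene] + pseudocount) / (expr_b[gene] + pseudocount)
        if fold >= min_fold:
            up.append(gene)
    return up


# Toy input with made-up values, for illustration only.
toy_snps = [
    Snp("GENE_A", "exonic", "nonsynonymous", True),
    Snp("GENE_B", "exonic", "synonymous", False),
    Snp("GENE_C", "intronic", "nonsynonymous", True),
]
toy_mkn28 = {"GENE_A": 30.0, "GENE_B": 5.0, "GENE_C": 3.0}
toy_sgc7901 = {"GENE_A": 4.0, "GENE_B": 6.0, "GENE_C": 3.0}

kept = screen_csnps(toy_snps)
up_in_mkn28 = genes_up_in_first(toy_mkn28, toy_sgc7901)
```

In this toy input only `GENE_A` passes both screens. A real analysis would read annotated variant calls and normalized expression values from the pipeline's own output files, and would typically apply a statistical test rather than a bare fold-change cutoff.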