Journal of Shanghai Jiao Tong University (Medical Science), 2021, Vol. 41, Issue 3: 285-296. doi: 10.3969/j.issn.1674-8115.2021.03.001

• Innovative Team Achievements Column •

Construction of prognostic risk score model of colorectal cancer gene signature based on transcriptome dysregulation

Ru-juan BAO, Hui-fang CHEN, Yu DONG, You-qiong YE, Bing SU

  1. Shanghai Institute of Immunology, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
  • Received: 2020-11-16 Online: 2021-03-28 Published: 2021-04-06
  • Corresponding authors: You-qiong YE, Bing SU; E-mail: brj@shsmu.edu.cn; youqiong.ye@shsmu.edu.cn; bingsu@sjtu.edu.cn
  • About the first author: Ru-juan BAO (born 1991), female, master's degree; E-mail: brj@shsmu.edu.cn
  • Supported by:
    Innovative Research Team of High-Level Local Universities in Shanghai (SSMU-ZDCX20180300)

Abstract:

Objective·To construct a prognostic risk score model for colorectal cancer (CRC), to analyze the cancer hallmark signaling pathways or biological processes that differ significantly between CRC patients with different risk scores, and to assess whether the model predicts immunotherapy outcomes in patients with other cancers.

Methods·Eight independent CRC microarray datasets and two CRC RNA-seq datasets were collected from public databases, and differentially expressed genes (DEGs) were identified in each dataset. Among the DEGs shared across datasets, a univariate Cox regression model was used to screen for genes associated with poor prognosis. LASSO (least absolute shrinkage and selection operator) regression and a multivariate Cox regression model were then used to construct the CRC prognostic risk score model. Patients were divided into a high-risk group and a low-risk group according to their risk scores. Model performance was evaluated with the area under the receiver operating characteristic curve (AUC) and Kaplan-Meier (KM) survival analysis. A multivariate Cox regression model was used to test whether the risk score was an independent prognostic factor for CRC. Gene set enrichment analysis (GSEA) was used to compare cancer hallmark gene set-related pathways between CRC patients in the high-risk and low-risk groups. Finally, KM survival analysis and the chi-square test were used to assess how well the model predicted immunotherapy outcomes in patients with other cancers, in order to evaluate the broader application value of the CRC prognostic risk score model.
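The risk-scoring and survival-evaluation steps in the Methods can be sketched as follows. The abstract does not give the risk score formula or the cutoff used to form the two groups, so this is a minimal sketch under two stated assumptions: the score is the linear predictor commonly used for LASSO/multivariate Cox signatures (sum of coefficient times expression over the signature genes), and patients are split at the median score. The gene names `GENE_A`, `GENE_B`, `GENE_C` and their coefficients are hypothetical placeholders, not the paper's 8 genes, and the LASSO and Cox fitting steps themselves are not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical signature: placeholder genes and coefficients for illustration only.
# The abstract does not report the 8 genes or their multivariate Cox coefficients.
COEFS: Dict[str, float] = {"GENE_A": 0.42, "GENE_B": -0.31, "GENE_C": 0.18}


def risk_score(expr: Dict[str, float], coefs: Dict[str, float] = COEFS) -> float:
    """Assumed linear predictor: sum of coefficient * expression over signature genes."""
    return sum(beta * expr[gene] for gene, beta in coefs.items())


def split_by_median(scores: Sequence[float]) -> List[str]:
    """Label each patient 'high' if the score exceeds the median, else 'low' (assumed cutoff)."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return ["high" if s > median else "low" for s in scores]


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate: (time, survival probability) at each distinct event time.

    events[i] is 1 if patient i died (event) at times[i], 0 if censored there.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share this time (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Everyone observed at t (dead or censored) leaves the risk set afterwards.
        at_risk -= removed
    return curve
```

For example, `kaplan_meier([5, 8, 12, 20, 25, 30], [1, 1, 0, 1, 0, 1])` steps down only at times 5, 8, 20 and 30 (survival 5/6, 2/3, 4/9, then 0); the censored patients at 12 and 25 simply leave the risk set. In practice, the curves of the high- and low-risk groups would be compared with a log-rank test using an established package such as R's `survival` or Python's `lifelines`, rather than hand-written code.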

Results·Univariate Cox regression analysis identified 16 genes associated with poor prognosis among the DEGs shared across datasets. On this basis, a CRC prognostic risk score model containing an 8-gene signature was constructed. The model showed moderate accuracy in the training set (AUCmax = 0.788) and in the internal and external validation sets (mean AUC > 0.600), and in all of these sets, patients in the low-risk group had significantly higher survival rates than those in the high-risk group. Multivariate Cox regression analysis showed that the risk score was an independent prognostic factor for CRC. GSEA showed that cancer hallmark gene set-related pathways were significantly enriched in CRC patients in the high-risk group. KM survival analysis and the chi-square test showed that patients with other cancers in the low-risk group had higher survival rates and better responses to immunotherapy.
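The two summary statistics reported in the Results, AUC and the chi-square test, can be illustrated with a short sketch. The abstract does not say how the ROC curves were built (time-dependent ROC analysis is common for survival models), so `auc_mann_whitney` is a simplified stand-in: it treats the outcome as binary (for example, an event within a fixed follow-up horizon) and computes AUC as the probability that a randomly chosen positive patient has a higher score than a randomly chosen negative one. `chi_square_2x2` is a plain Pearson test without continuity correction for a 2x2 table such as risk group by immunotherapy response; the counts in the usage note are hypothetical, since the abstract reports none.

```python
import math
from typing import Sequence, Tuple


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, chi2 = Z**2, so P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

As a hypothetical example, 30 responders and 10 non-responders in a low-risk group versus 15 and 25 in a high-risk group give `chi_square_2x2(30, 10, 15, 25)`, a statistic of 80/7 (about 11.43) and a p-value well below 0.01. For real analyses, SciPy's `scipy.stats.chi2_contingency` or R's `chisq.test` would be the usual choice.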

Conclusion·A CRC prognostic risk score model based on an 8-gene signature was successfully constructed. It may serve as a reference for improving the prognosis of CRC patients and for predicting immunotherapy outcomes in patients with other cancers.

Key words: colorectal cancer (CRC), LASSO regression, Cox regression model, gene signature, prognosis model

中图分类号: