Journal of Shanghai Jiao Tong University (Medical Science), 2021, Vol. 41, Issue (2): 159-165. doi: 10.3969/j.issn.1674-8115.2021.02.006

• Original article · Basic research •

Establishment and validation of prognostic prediction model of colorectal cancer based on single-cell RNA sequencing

Yan-ru MA1, Lin-hua JI2, Tian-ying TONG1, Yu-qing YAN1, Chao-qin SHEN1, Xin-yu ZHANG1, Ying-ying CAO1, Jie HONG1, Hao-yan CHEN1

  1. Department of Gastroenterology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200001, China
  2. Department of Gastrointestinal Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200001, China
  • Received: 2020-05-21 Online: 2021-02-28 Published: 2021-02-28
  • Contact: Hao-yan CHEN E-mail: mayanru0213@163.com; haoyanchen@shsmu.edu.cn
  • About the author: Yan-ru MA (born 1995), female, master's student; e-mail: mayanru0213@163.com
  • Supported by:
    Shanghai Municipal Education Commission-Gaofeng Clinical Medicine Grant Support (20161309); Innovative Research Team of High-Level Local Universities in Shanghai (SSMU-ZLCX20180200)

Abstract:

Objective·To establish a prognostic prediction model for colorectal cancer (CRC) based on single-cell RNA sequencing (scRNA-seq).

Methods·scRNA-seq datasets of CRC samples were obtained from the Gene Expression Omnibus (GEO) database, and differentially expressed genes related to CRC metastasis were selected as candidate genes for the model. Least absolute shrinkage and selection operator (LASSO) regression, logistic regression and Kaplan-Meier survival analysis were then applied in The Cancer Genome Atlas (TCGA) database to select and validate a prognosis-related gene set and to build the prognostic prediction model. Decision curve analysis and receiver operating characteristic (ROC) curves were used to assess the clinical value of the model.

Results·Thirty differentially expressed genes were identified from the scRNA-seq data downloaded from the GEO database, and 9 hub genes were then selected by LASSO regression in the TCGA database. A score based on hub-gene expression was calculated for each patient. The scores differed significantly between patients with and without recurrence in both the training set and the validation set (P<0.05). Logistic regression analysis was used to incorporate two independent clinical variables, primary tumor stage (T stage) and distant metastasis status (M stage), into an integrated score-clinical variable model. The area under the ROC curve of the integrated model was 0.775 in the training set and 0.705 in the validation set.

Conclusion·A relatively stable prognostic prediction model for CRC was constructed based on the scRNA-seq results. It may help guide treatment decisions and serve as a reference for the clinical assessment of patient prognosis.

Key words: single cell RNA sequencing (scRNA-seq), colorectal cancer, prognosis, bioinformatics
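
As a rough illustration of the scoring and ROC evaluation mentioned in the Results, the sketch below computes a per-patient score as a weighted sum of gene-expression values and then the area under the ROC curve for separating recurrence from no recurrence. It is only a sketch under stated assumptions: the abstract does not give the scoring formula, the nine hub genes or their coefficients, so the weighted-sum form, the three `WEIGHTS` values and the `PATIENTS` data are all hypothetical, and the function names `risk_score` and `roc_auc` are illustrative.

```python
from typing import Sequence


def risk_score(expression: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of gene-expression values (assumed form of the score)."""
    if len(expression) != len(weights):
        raise ValueError("expression and weights must have the same length")
    return sum(x * w for x, w in zip(expression, weights))


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as one half (the Mann-Whitney U formulation of the AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical weights for three genes; the paper's coefficients are not given here.
WEIGHTS = [0.5, -0.25, 1.0]

# Toy expression profiles; label 1 = recurrence, 0 = no recurrence.
PATIENTS = [
    ([2.0, 1.0, 1.0], 1),
    ([1.0, 2.0, 0.5], 0),
    ([3.0, 0.0, 0.5], 1),
    ([0.5, 1.0, 2.0], 0),
]

scores = [risk_score(expr, WEIGHTS) for expr, _ in PATIENTS]
labels = [label for _, label in PATIENTS]
auc = roc_auc(scores, labels)
print("scores:", scores)
print("AUC:", auc)
```

In a real analysis the weights would come from a fitted model and the AUC would be computed separately on the training and validation sets, as the abstract reports.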

CLC number: