JOURNAL OF SHANGHAI JIAOTONG UNIVERSITY (MEDICAL SCIENCE), 2021, Vol. 41, Issue (2): 159-165. doi: 10.3969/j.issn.1674-8115.2021.02.006

• Basic research

Establishment and validation of prognostic prediction model of colorectal cancer based on single-cell RNA sequencing

Yan-ru MA¹, Lin-hua JI², Tian-ying TONG¹, Yu-qing YAN¹, Chao-qin SHEN¹, Xin-yu ZHANG¹, Ying-ying CAO¹, Jie HONG¹, Hao-yan CHEN¹

  1. Department of Gastroenterology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200001, China
  2. Department of Gastrointestinal Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200001, China
  • Received: 2020-05-21; Online: 2021-02-28; Published: 2021-02-28
  • Contact: Hao-yan CHEN
  • Supported by:
    Shanghai Municipal Education Commission-Gaofeng Clinical Medicine Grant Support (20161309); Innovative Research Team of High-Level Local Universities in Shanghai (SSMU-ZLCX20180200)

Abstract: Objective

·To establish a model for predicting the prognosis of patients with colorectal cancer (CRC) based on single-cell RNA sequencing (scRNA-seq).


Methods

·scRNA-seq data from patients with CRC were obtained from the Gene Expression Omnibus (GEO) database and used to screen for candidate genes associated with metastatic CRC. In The Cancer Genome Atlas (TCGA) database, least absolute shrinkage and selection operator (LASSO) regression, logistic regression and Kaplan-Meier analysis were used to select hub genes from these candidates, evaluate their significance, and develop a prognostic prediction model for CRC. Decision curve analysis and receiver operating characteristic (ROC) curves were used to assess the clinical utility of the model.
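Decision curve analysis plots a model's net benefit across threshold probabilities and compares it with the "treat all" and "treat none" strategies. The Python sketch below computes the standard net-benefit formula, NB = TP/n - FP/n × pt/(1 - pt). It assumes binary recurrence labels and model-predicted probabilities. The function names and the toy data are hypothetical illustrations, not the authors' code or study data.

```python
"""Decision curve analysis: net benefit at given threshold probabilities.

A minimal sketch. `labels` are hypothetical observed outcomes
(1 = recurrence, 0 = no recurrence) and `probs` are hypothetical
model-predicted probabilities; neither comes from the study.
"""


def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    flagged = [y for y, p in zip(labels, probs) if p >= threshold]
    tp = sum(1 for y in flagged if y == 1)  # true positives among flagged
    fp = len(flagged) - tp                  # false positives among flagged
    return tp / n - fp / n * threshold / (1.0 - threshold)


def treat_all_net_benefit(labels, threshold):
    """Reference strategy that treats every patient (treat-none is always 0)."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def decision_curve(labels, probs, thresholds):
    """Rows of (threshold, model net benefit, treat-all net benefit)."""
    return [
        (t, net_benefit(labels, probs, t), treat_all_net_benefit(labels, t))
        for t in thresholds
    ]


if __name__ == "__main__":
    # Toy data for illustration only.
    labels = [1, 0, 1, 1, 0, 0, 1, 0]
    probs = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.3, 0.5]
    for t, nb_model, nb_all in decision_curve(labels, probs, [0.1, 0.3, 0.5]):
        print(f"pt={t:.1f}  model={nb_model:.3f}  treat-all={nb_all:.3f}")
```

Evaluate these functions over a grid of thresholds to draw the decision curve. A model is clinically useful over the threshold range where its net benefit exceeds both reference strategies.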


Results

·Thirty candidate genes were screened from the GEO scRNA-seq data, and 9 hub genes were then selected from them by LASSO regression in the TCGA database. A hub-gene expression score was calculated for each patient. The scores differed significantly between patients with and without recurrence in both the training set and the validation set (P<0.05). Logistic regression was then used to combine the score with two independent clinical variables, primary tumor stage (T stage) and metastasis status (M stage), into an integrated score-clinical variable model. The areas under the ROC curve (AUCs) in the training and validation sets were 0.775 and 0.705, respectively.
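The abstract does not state how the hub-gene score was computed. This sketch assumes a choice common in LASSO-based signatures: a weighted sum of each hub gene's expression, with its LASSO coefficient as the weight. The Python below implements that assumed score. It also computes the ROC AUC in its Mann-Whitney form, which is the probability that a randomly chosen recurrence case scores higher than a randomly chosen non-recurrence case. Gene names, coefficients and patient values are hypothetical placeholders, not the nine hub genes or their weights.

```python
"""Hypothetical hub-gene score and ROC AUC (a minimal sketch).

Assumption: score = sum(coefficient * expression) over the hub genes.
All gene names, coefficients and expression values are placeholders.
"""


def gene_score(expression, coefficients):
    """Weighted sum of one patient's expression over the genes in `coefficients`."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def roc_auc(labels, scores):
    """AUC via the Mann-Whitney statistic; tied scores count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical LASSO coefficients for three placeholder genes.
    coefficients = {"GENE_A": 0.8, "GENE_B": -0.5, "GENE_C": 0.3}
    # (expression profile, recurrence label) pairs; toy values only.
    patients = [
        ({"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.5}, 1),
        ({"GENE_A": 0.5, "GENE_B": 2.0, "GENE_C": 1.0}, 0),
        ({"GENE_A": 1.5, "GENE_B": 0.5, "GENE_C": 1.0}, 1),
        ({"GENE_A": 1.0, "GENE_B": 0.4, "GENE_C": 2.0}, 0),
    ]
    scores = [gene_score(expr, coefficients) for expr, _ in patients]
    labels = [label for _, label in patients]
    print("scores:", [round(s, 3) for s in scores])
    print("AUC:", roc_auc(labels, scores))
```

An integrated score-clinical variable model of the kind described would add T stage and M stage as further logistic-regression covariates alongside this score. The same AUC function applies unchanged to that model's predicted probabilities.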


Conclusion

·A relatively stable prognostic prediction model for CRC was constructed based on scRNA-seq results. It may provide some guidance for treatment decisions and prognostic prediction.

Key words: single-cell RNA sequencing (scRNA-seq), colorectal cancer, prognosis, bioinformatics
