Journal of Shanghai Jiao Tong University (Medical Science), 2025, Vol. 45, Issue (8): 1009-1016. doi: 10.3969/j.issn.1674-8115.2025.08.008

• Original article · Clinical research •

Development and clinical application of a machine learning-driven model for metabolite-based diagnosis of small cell lung cancer

HUANG Xin1,2, LIU Jiahui1,2, YE Jingwen1,2, QIAN Wenli1,2, XU Wanxing1,2, WANG Lin1,2

  1. Clinical Laboratory Medicine Center, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200080, China
  2. College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
  • Received: 2024-11-25 Accepted: 2025-02-28 Online: 2025-08-28 Published: 2025-08-28
  • Contact: WANG Lin, Associate Researcher, PhD; E-mail: wanglin987654321@126.com
  • Supported by:
    National Natural Science Foundation of China (82273418); Medical Innovation Project of Science and Technology Commission of Shanghai Municipality (22Y11902800)

Abstract:

Objective · To develop an early diagnostic model for small cell lung cancer (SCLC) using machine learning algorithms, based on differences in serum metabolite profiles between patients with SCLC and patients with benign pulmonary diseases. Methods · Data from 29 patients with SCLC and 67 patients with benign pulmonary diseases at Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, formed the training cohort; data from 20 patients with SCLC and 40 patients with benign pulmonary diseases at Gansu Provincial Cancer Hospital formed an independent external test cohort. Sixty-nine serum metabolites were measured by absolute quantification using liquid chromatography-tandem mass spectrometry (LC-MS/MS). An XGBoost classifier was used to rank the metabolites by importance, and a sequential forward selection strategy combined with XGBoost was used to select a set of key metabolites. Three conventional machine learning models, AdaBoost, random forest (RF), and light gradient boosting machine (LGBM), were built on the training cohort. Model performance was evaluated and compared using receiver operating characteristic (ROC) curves and the area under the curve (AUC), and further validated on the independent external test cohort. Results · Principal component analysis (PCA) and orthogonal projections to latent structures-discriminant analysis (OPLS-DA) of the targeted metabolomics data from the training cohort showed that patients with SCLC and patients with benign pulmonary diseases had clearly distinct metabolic profiles. Six key metabolites were selected according to the importance ranking and used to train the MTB-6 (metabolite-6) diagnostic model with AdaBoost, RF, and LGBM. AdaBoost performed best in the training cohort, with an AUC of 0.943, a sensitivity of 75.0%, and a specificity of 90.9% for diagnosing SCLC. In the external test cohort, it achieved an AUC of 0.921, a sensitivity of 80.0%, and a specificity of 87.5%.
Conclusion · The MTB-6 model, based on six serum metabolites and the AdaBoost algorithm, shows excellent diagnostic performance and has potential value for the differential diagnosis of SCLC and benign pulmonary diseases.
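
The sensitivity and specificity values reported for the external test cohort follow from simple confusion-matrix arithmetic. The Python sketch below illustrates that calculation only. The abstract gives percentages rather than raw counts, so the counts used here (16 of 20 SCLC patients and 35 of 40 benign patients classified correctly) are an assumption: they are the whole numbers that reproduce 80.0% and 87.5% for the stated cohort sizes, provided the metrics were computed on the full external cohort. The function names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true SCLC cases that the model flags as SCLC."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of benign cases that the model correctly labels benign."""
    return tn / (tn + fp)


# Assumed counts (not stated in the abstract): 20 SCLC and 40 benign patients.
tp, fn = 16, 4   # SCLC patients: correctly detected vs. missed
tn, fp = 35, 5   # benign patients: correctly cleared vs. false alarms

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)

print(f"sensitivity = {sens:.1%}")  # sensitivity = 80.0%
print(f"specificity = {spec:.1%}")  # specificity = 87.5%
```

The same arithmetic does not yield whole-number counts for the training-cohort values (75.0% of 29, 90.9% of 67), so those figures were presumably computed on a subset or by resampling; the abstract does not say which.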

Key words: small cell lung cancer (SCLC), diagnosis model, machine learning, metabolomics, liquid chromatography-tandem mass spectrometry (LC-MS/MS)
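
The feature-selection step described in the Methods (rank metabolites by importance, then add them with a forward selection strategy) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that features are added in importance order and that the best-scoring subset is kept; the abstract does not specify whether each step instead tests every remaining candidate. `forward_select`, `score_fn`, `toy_auc`, and the metabolite names are hypothetical. In practice `score_fn` would be something like a cross-validated AUC from an XGBoost classifier; a toy lookup table stands in for it here so that the example runs without extra dependencies.

```python
from typing import Callable, List, Sequence, Tuple


def forward_select(
    ranked: Sequence[str],
    score_fn: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Grow the feature set in importance order and return the best-scoring subset."""
    best_subset: List[str] = []
    best_score = float("-inf")
    current: List[str] = []
    for name in ranked:
        current.append(name)
        score = score_fn(current)
        # Strict '>' keeps the smaller feature set when scores tie.
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy stand-in for a cross-validated AUC, keyed by subset size.
# These values are made up for illustration and are not from the paper.
toy_auc = {1: 0.80, 2: 0.86, 3: 0.91, 4: 0.90, 5: 0.91}
ranked = ["m1", "m2", "m3", "m4", "m5"]  # placeholder metabolite names

subset, score = forward_select(ranked, lambda feats: toy_auc[len(feats)])
print(subset, score)  # ['m1', 'm2', 'm3'] 0.91
```

Preferring the smaller subset on ties is a design choice in this sketch: a panel with fewer metabolites is cheaper to measure when the score is unchanged.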
