
Development and clinical application of a machine learning-driven model for metabolite-based diagnosis of small cell lung cancer
Received date: 2024-11-25
Accepted date: 2025-02-28
Online published: 2025-08-28
Supported by
National Natural Science Foundation of China (82273418); Medical Innovation Project of Science and Technology Commission of Shanghai Municipality (22Y11902800)
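Keywords: small cell lung cancer; diagnostic model; machine learning; metabolomics; liquid chromatography-tandem mass spectrometry
HUANG Xin, LIU Jiahui, YE Jingwen, QIAN Wenli, XU Wanxing, WANG Lin. Development and clinical application of a machine learning-driven model for metabolite-based diagnosis of small cell lung cancer[J]. Journal of Shanghai Jiao Tong University (Medical Science), 2025, 45(8): 1009-1016. DOI: 10.3969/j.issn.1674-8115.2025.08.008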
Objective · To develop an early diagnostic model for small cell lung cancer (SCLC) using machine learning algorithms, based on differences in serum metabolite profiles between patients with SCLC and patients with benign pulmonary diseases.
Methods · Serum samples were collected from 29 SCLC patients and 67 patients with benign pulmonary diseases at Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, to form the training cohort. An independent external test cohort comprised 20 SCLC patients and 40 patients with benign pulmonary diseases from Gansu Provincial Cancer Hospital. Absolute concentrations of 69 serum metabolites were measured by liquid chromatography-tandem mass spectrometry (LC-MS/MS). An XGBoost classifier was used to rank the metabolites by importance, and sequential forward selection combined with XGBoost was then used to select a subset of key metabolites. Three diagnostic models were trained on the training cohort: AdaBoost, random forest (RF), and light gradient boosting machine (LGBM). Model performance was compared using receiver operating characteristic (ROC) curves and the area under the curve (AUC), and then verified on the independent external test cohort.
Results · Principal component analysis (PCA) and orthogonal projections to latent structures-discriminant analysis (OPLS-DA) of the targeted metabolomics data from the training cohort showed clear separation between SCLC patients and patients with benign pulmonary diseases. Six key metabolites were selected according to the importance ranking and used to train the MTB-6 (metabolite-6) diagnostic model with AdaBoost, RF, and LGBM. AdaBoost performed best in the training cohort, with an AUC of 0.943, a sensitivity of 75.0%, and a specificity of 90.9%. In the external test cohort, it achieved an AUC of 0.921, a sensitivity of 80.0%, and a specificity of 87.5%.
Conclusion · The MTB-6 model, built on six serum metabolites and the AdaBoost algorithm, showed strong diagnostic performance (AUC above 0.92 in both cohorts) and has potential value for differentiating SCLC from benign pulmonary diseases.
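The metrics and the feature selection step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the `score_fn` callback (which in practice might return a cross-validated AUC for a candidate metabolite subset), and the greedy acceptance rule are all assumptions made for illustration. The counts 16/20 and 35/40 are inferred from the reported external-cohort percentages and cohort sizes; the abstract does not state them.

```python
from typing import Callable, List, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels: 1 = SCLC, 0 = benign."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def forward_select(ranked: Sequence[str], score_fn: Callable[[List[str]], float]) -> List[str]:
    """Walk features in importance order; keep a feature only if it raises the score.

    This is one simple greedy variant of forward selection (an assumption),
    and score_fn is a hypothetical caller-supplied evaluation function.
    """
    selected: List[str] = []
    best = float("-inf")
    for feature in ranked:
        candidate = selected + [feature]
        score = score_fn(candidate)
        if score > best:
            selected, best = candidate, score
    return selected


# Hypothetical external-cohort predictions: 20 SCLC (16 detected), 40 benign (35 correct).
y_true = [1] * 20 + [0] * 40
y_pred = [1] * 16 + [0] * 4 + [0] * 35 + [1] * 5
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Under these assumed counts the sketch reproduces the reported 80.0% sensitivity and 87.5% specificity. The pairwise definition of AUC is quadratic in cohort size, which is acceptable for cohorts of this scale; a rank-based implementation would be preferable for large datasets.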