Clinical research

Development and clinical application of a machine learning-driven model for metabolite-based diagnosis of small cell lung cancer

  • HUANG Xin
  • LIU Jiahui
  • YE Jingwen
  • QIAN Wenli
  • XU Wanxing
  • WANG Lin
  • 1. Clinical Laboratory Medicine Center, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200080, China
  • 2. College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
WANG Lin, E-mail: wanglin987654321@126.com.

Received date: 2024-11-25

Accepted date: 2025-02-28

Online published: 2025-08-28

Supported by

National Natural Science Foundation of China (82273418); Medical Innovation Project of Science and Technology Commission of Shanghai Municipality (22Y11902800)

Abstract

Objective · To develop an early diagnostic model for small cell lung cancer (SCLC) based on differences in serum metabolite expression profiles between patients with SCLC and those with benign pulmonary diseases, using machine learning algorithms.
Methods · Serum samples were collected from 29 SCLC patients and 67 patients with benign lung diseases at Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, as the training cohort. An independent external validation cohort included 20 SCLC patients and 40 patients with benign lung diseases from Gansu Provincial Cancer Hospital. A total of 69 serum metabolites were quantitatively analyzed using liquid chromatography-tandem mass spectrometry (LC-MS/MS). The XGBoost Classifier was employed to rank metabolite importance, and a forward feature selection strategy based on XGBoost was used to identify a subset of key metabolites. Diagnostic models were constructed using AdaBoost, random forest (RF), and light gradient boosting machine (LGBM) algorithms. Model performance was assessed using receiver operating characteristic (ROC) curves and the area under the curve (AUC), and validated on the external test cohort.
Results · Principal component analysis (PCA) and orthogonal projections to latent structures-discriminant analysis (OPLS-DA) of the training cohort revealed distinct metabolic profiles between SCLC and benign lung disease patients. Based on feature importance rankings, six key metabolites were selected to construct the MTB-6 diagnostic model. Among the models, AdaBoost achieved the best performance, with an AUC of 0.943, sensitivity of 75.0%, and specificity of 90.9% in the training cohort. In the external test cohort, the model demonstrated robust performance with an AUC of 0.921, sensitivity of 80.0%, and specificity of 87.5%.
Conclusion · The MTB-6 model, based on six serum metabolites and the AdaBoost algorithm, exhibits excellent diagnostic performance and holds potential for the differential diagnosis of SCLC and benign pulmonary diseases.
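
As a reading aid for the metrics reported above, the following minimal Python sketch shows how sensitivity, specificity, and AUC are computed. It is not the authors' code. The cohort sizes (20 SCLC, 40 benign) come from the abstract; the counts 16 and 35 are the values implied by the reported 80.0% and 87.5%, assuming both percentages are computed over the full external cohort. The function names and the toy scores are hypothetical and serve only to illustrate the calculation.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# External cohort sizes from the abstract: 20 SCLC, 40 benign.
# Assumption: 16 true positives and 35 true negatives, the counts implied
# by the reported 80.0% sensitivity and 87.5% specificity.
sens, spec = sensitivity_specificity(tp=16, fn=4, tn=35, fp=5)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")

# Hypothetical toy scores, used only to exercise roc_auc; not study data.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
print(f"toy AUC={roc_auc(toy_labels, toy_scores):.3f}")
```

The pairwise definition of AUC used here is equivalent to the area under the ROC curve and needs no plotting library, although it scales quadratically with cohort size.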

Cite this article

HUANG Xin, LIU Jiahui, YE Jingwen, QIAN Wenli, XU Wanxing, WANG Lin. Development and clinical application of a machine learning-driven model for metabolite-based diagnosis of small cell lung cancer[J]. Journal of Shanghai Jiao Tong University (Medical Science), 2025, 45(8): 1009-1016. DOI: 10.3969/j.issn.1674-8115.2025.08.008
