Journal of Shanghai Jiao Tong University (Medical Science), 2025, Vol. 45, Issue (8): 1009-1016. doi: 10.3969/j.issn.1674-8115.2025.08.008

Clinical research

Development and clinical application of a machine learning-driven model for metabolite-based diagnosis of small cell lung cancer

HUANG Xin1,2, LIU Jiahui1,2, YE Jingwen1,2, QIAN Wenli1,2, XU Wanxing1,2, WANG Lin1,2

  1. Clinical Laboratory Medicine Center, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200080, China
    2. College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
  • Received: 2024-11-25; Accepted: 2025-02-28; Online: 2025-08-28; Published: 2025-08-28
  • Contact: WANG Lin, E-mail: wanglin987654321@126.com
  • Supported by:
    National Natural Science Foundation of China (82273418); Medical Innovation Project of Science and Technology Commission of Shanghai Municipality (22Y11902800)

Abstract:

Objective · To use machine learning algorithms to develop a model for the early diagnosis of small cell lung cancer (SCLC), based on differences in serum metabolite expression profiles between patients with SCLC and patients with benign pulmonary diseases.
Methods · Serum samples from 29 patients with SCLC and 67 patients with benign lung diseases, collected at Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, formed the training cohort. An independent external validation cohort comprised 20 patients with SCLC and 40 patients with benign lung diseases from Gansu Provincial Cancer Hospital. A total of 69 serum metabolites were quantified by liquid chromatography-tandem mass spectrometry (LC-MS/MS). An XGBoost classifier was used to rank metabolite importance, and a forward feature selection strategy based on XGBoost identified a subset of key metabolites. Diagnostic models were then built with the AdaBoost, random forest (RF), and light gradient boosting machine (LGBM) algorithms. Model performance was assessed with receiver operating characteristic (ROC) curves and the area under the curve (AUC), and was verified in the external validation cohort.
Results · Principal component analysis (PCA) and orthogonal projections to latent structures-discriminant analysis (OPLS-DA) of the training cohort revealed distinct metabolic profiles in patients with SCLC and patients with benign lung diseases. Six key metabolites, selected according to the feature importance ranking, were used to construct the MTB-6 diagnostic model. AdaBoost performed best among the algorithms, with an AUC of 0.943, a sensitivity of 75.0%, and a specificity of 90.9% in the training cohort. In the external validation cohort, the model remained robust, with an AUC of 0.921, a sensitivity of 80.0%, and a specificity of 87.5%.
Conclusion · The MTB-6 model, which combines six serum metabolites with the AdaBoost algorithm, exhibits excellent diagnostic performance and has potential for the differential diagnosis of SCLC versus benign pulmonary diseases.

Key words: small cell lung cancer (SCLC), diagnosis model, machine learning, metabolomics, liquid chromatography-tandem mass spectrometry (LC-MS/MS)
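The evaluation and selection steps summarized in the abstract can be illustrated with a minimal pure-Python sketch. The abstract does not give implementation details, so everything below is an assumption for illustration, not the authors' pipeline. `roc_auc` computes the AUC in its rank (Mann-Whitney) form. `sensitivity_specificity` derives the two reported rates from binary predictions. `forward_select` assumes one plausible reading of "forward feature selection": walk the importance ranking from the top and keep a metabolite only if it strictly raises a validation score. The metabolite names `m1` to `m3`, the toy data, and the scorer `toy_evaluate` (an unweighted feature sum standing in for an XGBoost or AdaBoost model) are all hypothetical.

```python
"""Pure-Python sketch: rank-based ROC AUC, sensitivity/specificity, and a
ranked forward feature selection. Toy data and names are hypothetical."""
from typing import Callable, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive (1 = SCLC) scores higher than a
    random negative (0 = benign); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels: Sequence[int],
                            preds: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def forward_select(ranked: Sequence[str],
                   evaluate: Callable[[List[str]], float]) -> Tuple[List[str], float]:
    """Walk features from most to least important (assumed selection rule);
    keep a feature only if it strictly raises the evaluation score."""
    selected: List[str] = []
    best = float("-inf")
    for feature in ranked:
        score = evaluate(selected + [feature])
        if score > best:
            selected.append(feature)
            best = score
    return selected, best


# Hypothetical toy data: three standardized metabolite levels per sample.
LABELS = [1, 1, 1, 0, 0, 0]  # 1 = SCLC, 0 = benign
ROWS = [
    {"m1": 0.9, "m2": 0.1, "m3": 0.0},
    {"m1": 0.5, "m2": 0.2, "m3": 0.5},
    {"m1": 0.2, "m2": 0.6, "m3": 0.0},
    {"m1": 0.4, "m2": 0.2, "m3": 0.5},
    {"m1": 0.1, "m2": 0.1, "m3": 0.0},
    {"m1": 0.3, "m2": 0.0, "m3": 0.0},
]


def toy_evaluate(subset: List[str]) -> float:
    """Hypothetical scorer: in-sample AUC of the unweighted sum of the chosen
    features (a stand-in for a cross-validated classifier AUC)."""
    scores = [sum(row[f] for f in subset) for row in ROWS]
    return roc_auc(LABELS, scores)


if __name__ == "__main__":
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
    # Counts implied by the reported external-cohort figures:
    # 16/20 SCLC detected (80.0%), 35/40 benign correctly excluded (87.5%).
    ext_labels = [1] * 20 + [0] * 40
    ext_preds = [1] * 16 + [0] * 4 + [0] * 35 + [1] * 5
    print(sensitivity_specificity(ext_labels, ext_preds))  # (0.8, 0.875)
    print(forward_select(["m1", "m2", "m3"], toy_evaluate))  # (['m1', 'm2'], 1.0)
```

In the toy run, `m3` is dropped because adding it lowers the AUC from 1.0 to 7/9. Selecting and scoring on the same samples, as this sketch does, gives optimistic estimates. A real workflow would instead score each candidate subset with the cross-validated AUC of the chosen classifier and would confirm the final model on an untouched cohort, as the external validation cohort does here.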
