李晨曦,1, 王子瑞1, 金恬昊1, 周曾同1, 唐国瑶,1,2, 施琳俊,1

1.上海交通大学医学院附属第九人民医院口腔黏膜病科,上海交通大学口腔医学院,国家口腔医学中心,国家口腔疾病临床医学研究中心,上海市口腔医学重点实验室,上海市口腔医学研究所,上海 200011

2.上海交通大学医学院附属新华医院口腔科,上海 200092

Correlation between computer-assisted quantitative autofluorescence imaging results and the pathological grading of oral epithelial dysplasia in oral leukoplakia

LI Chenxi1, WANG Zirui1, JIN Tianhao1, ZHOU Zengtong1, TANG Guoyao1,2, SHI Linjun1

1.Department of Oral Mucosal Diseases, Shanghai Ninth People′s Hospital, Shanghai Jiao Tong University School of Medicine; College of Stomatology, Shanghai Jiao Tong University; National Center for Stomatology; National Clinical Research Center for Oral Diseases; Shanghai Key Laboratory of Stomatology; Shanghai Research Institute of Stomatology, Shanghai 200011, China

2.Department of Stomatology, Shanghai Xin Hua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200092, China

关键词: 自体荧光图像 ; 口腔白斑病 ; 上皮异常增生 ; 有序多元Logistic回归模型 ; 混淆矩阵


Objective ·To explore the correlation between the quantitative results of autofluorescence imaging under computer assistance and the grade of epithelial dysplasia in oral leukoplakia. Methods ·From April 2016 to January 2024, 357 patients with oral leukoplakia who visited the Department of Oral Mucosal Diseases at Shanghai Ninth People′s Hospital, Shanghai Jiao Tong University School of Medicine, were included. Autofluorescence images of the lesions were obtained using a handheld autofluorescence device. These images were converted to grayscale images to obtain quantitative metrics. An ordered multinomial Logistic regression model was fitted in Python, and cumulative probability plots were generated. The dataset was divided into training and testing sets, and a decision tree was generated. Different hyperparameters were adjusted to achieve optimal model performance. Accuracy, precision, and F1 scores were calculated. The model performance was visualized using a confusion matrix. Results ·As the degree of epithelial dysplasia increased, the relative mean color level showed a declining trend. In the binary classification of epithelial dysplasia, there was no overlap between the cumulative probability curves of different categories. In the four-category classification, only severe epithelial dysplasia overlapped with other category curves, indicating good discriminative ability of the model. In binary pathological grading, when the training and testing set ratio was 4∶1 and the maximum depth was 2, the accuracy, precision, and F1 scores were 0.792, 0.801, and 0.795, respectively. In the four-category pathological grading, when the training and testing set ratio was 9∶1 and the maximum depth was 4, the accuracy, precision, and F1 scores were 0.611, 0.537, and 0.569, respectively. 
Conclusion ·Computer-assisted quantitative analysis of autofluorescence images can be used by oral mucosal specialists as a reference to predict the degree of epithelial dysplasia in patients with oral leukoplakia and to monitor their risk of cancer.

Keywords: autofluorescence imaging ; oral leukoplakia ; epithelial dysplasia ; ordered multinomial Logistic regression model ; confusion matrix

LI Chenxi, WANG Zirui, JIN Tianhao, ZHOU Zengtong, TANG Guoyao, SHI Linjun. Correlation between computer-assisted quantitative autofluorescence imaging results and the pathological grading of oral epithelial dysplasia in oral leukoplakia. Journal of Shanghai Jiao Tong University (Medical Science)[J], 2024, 44(9): 1146-1154 doi:10.3969/j.issn.1674-8115.2024.09.009

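Oral potentially malignant disorders (OPMDs) are a group of diseases that may progress to oral squamous cell carcinomas (OSCCs) [1]. Based on the degree of epithelial dysplasia, the World Health Organization divides OPMDs into a high-grade category (moderate and severe dysplasia) and a low-grade category (no dysplasia and mild dysplasia) to distinguish their risk of malignant transformation [2]. According to the literature [3], about 3.5% of OPMDs transform into OSCCs; the rate is higher for high-grade OPMDs, reaching 25.4%, and it rises with the severity of epithelial dysplasia. Oral leukoplakia is one of the most common OPMDs in clinical practice, with a 5-year malignant transformation rate of about 9.5% that increases with longer follow-up [4-5].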


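1 Subjects and methods

1.1 Study subjects

A total of 357 patients with oral leukoplakia who visited the Department of Oral Mucosal Diseases, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, from April 2016 to January 2024 were included. Inclusion criteria: ① clinical and histopathological diagnosis of oral leukoplakia; ② age 18 to 75 years; ③ ability to cooperate with autofluorescence examination and biopsy of the leukoplakia lesion. Exclusion criteria: ① pregnant or lactating women; ② patients with concurrent other tumors or psychiatric disorders; ③ pathological diagnosis accompanied by malignant transformation of the oral mucosa.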

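1.2 Diagnostic criteria

The following diagnostic criteria for oral leukoplakia were used [13-14]: ① A predominantly white lesion of the oral mucosa that cannot be wiped off and cannot be diagnosed, clinically or histopathologically, as any other definable lesion. ② A provisional diagnosis of leukoplakia may be made when a predominantly white lesion cannot be diagnosed as any other disease or condition on clinical examination. ③ Biopsy is mandatory. ④ A definitive diagnosis is made when any etiological factor has been excluded, including the use of tobacco or betel nut, and histopathology has not confirmed another specific disease.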

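1.3 Study procedure

The study proceeded as follows: ① The patient was informed of the purpose and process of the study and signed an informed consent form. ② Epidemiological information was recorded for eligible patients. ③ A conventional oral examination was performed, and the site, size, and type of the lesion were recorded. ④ Autofluorescence imaging of the lesion was performed, and the images were saved. ⑤ A biopsy was taken at the same site during the same period, and the histopathological diagnosis was completed. Hematoxylin-eosin (H-E) stained histopathological photographs of the oral lesions were provided by the Department of Oral Pathology of our hospital.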

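1.4 Autofluorescence examination

Autofluorescence was examined with the handheld device VELscope® Vx (LED Medical Diagnostics Inc, Burnaby, Canada). The device emits visible light of a specific wavelength (400 to 460 nm) that excites the endogenous fluorophores of biological tissue. A device attached to the instrument (an iPod in this study) captures and stores the fluorescence images of the tissue for subsequent quantitative analysis.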

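1.5 Quantification of autofluorescence images

1.5.1 Image conversion

The autofluorescence image was opened in Photoshop CS5, and the color image was converted to grayscale mode (Fig 1A, B).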
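
The conversion step can be illustrated with a minimal Python sketch. It assumes the common Rec. 601 luma weights (0.299, 0.587, 0.114); Photoshop's own grayscale conversion depends on its color settings and may give different values, so the function `to_grayscale` and the sample pixels are hypothetical and only show the idea of mapping each RGB pixel to one level between 0 and 255.

```python
def to_grayscale(rgb_pixels):
    """Convert rows of (R, G, B) tuples (0-255) into rows of gray levels (0-255)."""
    gray = []
    for row in rgb_pixels:
        gray.append([
            # Rec. 601 luma weights: an assumption, not Photoshop's exact conversion
            min(255, round(0.299 * r + 0.587 * g + 0.114 * b))
            for r, g, b in row
        ])
    return gray


# Hypothetical 2 x 2 color image
image = [
    [(0, 0, 0), (255, 255, 255)],
    [(0, 255, 0), (30, 60, 20)],
]
gray_image = to_grayscale(image)
print(gray_image)
```

Green contributes most to the gray level under these weights, which is relevant here because healthy mucosa typically appears green under this kind of excitation light.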


图1   自体荧光图像的量化

Note: A. Original autofluorescence image. B. Converted grayscale image. C. Grayscale image was used to obtain a histogram of grayscale levels, which was used to obtain the mean grayscale value. D. Selection of two negative regions as background controls. E. Obtaining the mean color level of the negative regions. F. Selection of one positive region. G. Obtaining the MCL of the positive region.

Fig 1   Quantification of autofluorescence imaging

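1.5.2 Selection of the analysis metric

A histogram of levels can be obtained from each grayscale image (Fig 1C). In a histogram, the horizontal axis marks the value of the measured characteristic and the vertical axis marks the frequency, with the height of each bar indicating the frequency of its group. For a digital image, the levels histogram shows the tonal distribution of the pixels in the photograph: the horizontal axis represents the 256 levels, numbered 0 to 255, where 0 is black and 255 is white, and the vertical axis represents the frequency of each level, that is, the number of pixels at that level. This study used the mean color level (MCL) within a selected region as the analysis metric to digitize the image of that region.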
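
As a rough illustration of the metric, the sketch below builds a 256-bin levels histogram for a region and derives the MCL from it as the count-weighted mean of the levels. The function names and the sample region are hypothetical; in the study these values were read from the image editor rather than computed in code.

```python
def level_histogram(gray_pixels):
    """Count how many pixels fall on each of the 256 gray levels."""
    counts = [0] * 256
    for row in gray_pixels:
        for level in row:
            counts[level] += 1
    return counts


def mean_color_level(counts):
    """Mean level of a region, computed from its histogram."""
    total = sum(counts)
    if total == 0:
        raise ValueError("the region contains no pixels")
    return sum(level * n for level, n in enumerate(counts)) / total


# Hypothetical region of six pixels
region = [[10, 20, 20], [30, 30, 30]]
hist = level_histogram(region)
mcl = mean_color_level(hist)
print(round(mcl, 3))
```

Computing the mean from the histogram gives the same value as averaging the pixels directly; the histogram form simply mirrors how the levels panel presents the data.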

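1.5.3 Acquisition of autofluorescence image data


1.5.4 Normalization of quantitative data

Because lesion sites differ, the color of a subject's positive region in the autofluorescence image is affected by the background color of the surrounding tissue. To remove this interference, this study used the relative mean color level (RMCL) for statistical analysis, which makes comparisons between subjects more meaningful. RMCL = MCL of the positive region (MCL_positive) - MCL of the negative region (MCL_negative).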
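
A small worked example may make the normalization concrete. It assumes that RMCL is the MCL of the positive (lesion) region minus the MCL of the negative (background) regions, and that the two background regions shown in Fig 1D are combined by averaging their MCLs; the averaging rule and all pixel values are assumptions made for illustration.

```python
def mean_level(region):
    """Mean gray level of a region given as rows of levels (0-255)."""
    pixels = [level for row in region for level in row]
    return sum(pixels) / len(pixels)


def relative_mcl(positive_region, negative_regions):
    """RMCL = MCL of the positive region minus the background MCL."""
    # Assumption: the background MCL is the mean of the negative regions' MCLs
    background = sum(mean_level(r) for r in negative_regions) / len(negative_regions)
    return mean_level(positive_region) - background


positive = [[40, 50], [60, 50]]        # hypothetical dark lesion area, MCL 50
negatives = [
    [[120, 130], [110, 120]],          # hypothetical background 1, MCL 120
    [[100, 100], [100, 100]],          # hypothetical background 2, MCL 100
]
rmcl = relative_mcl(positive, negatives)  # 50 - 110 = -60
print(rmcl)
```

Under this sign convention a darker lesion gives a more negative RMCL, which matches the reported decline of RMCL as dysplasia becomes more severe.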

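1.5.5 Scoring of histopathological results


1.6 Statistical analysis

The quantitative autofluorescence data were used to fit an ordered multinomial Logistic regression model in Python 3.12.2, and cumulative probability plots were drawn. The data were divided into a training set and a testing set, a decision tree was generated, and hyperparameters such as the ratio of training set to testing set and the depth of the decision tree were adjusted to obtain the best model performance. Accuracy, precision, and F1 score were calculated. The model performance was visualized with a confusion matrix.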
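
The evaluation step can be sketched in plain Python as follows. The sketch covers only the confusion matrix and the three scores, not the fitting of the decision tree, and it assumes support-weighted averaging of the per-class precision and F1; the source does not state which averaging scheme was used. The labels and predictions are invented for illustration.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def weighted_scores(matrix):
    """Accuracy plus support-weighted precision and F1 (averaging is an assumption)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(n)) / total
    precision = f1 = 0.0
    for i in range(n):
        tp = matrix[i][i]
        predicted = sum(matrix[r][i] for r in range(n))  # column sum
        actual = sum(matrix[i])                          # row sum (support)
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precision += (actual / total) * p
        f1 += (actual / total) * f
    return accuracy, precision, f1


# Hypothetical binary grading: low-grade vs high-grade dysplasia
labels = ["low", "high"]
y_true = ["low", "low", "low", "low", "high", "high", "high", "high"]
y_pred = ["low", "low", "low", "high", "high", "high", "high", "low"]
matrix = confusion_matrix(y_true, y_pred, labels)
accuracy, precision, f1 = weighted_scores(matrix)
print(matrix, accuracy, precision, f1)
```

The same functions work unchanged for a four-class grading by passing four labels, in which case off-diagonal cells show which grades the model confuses.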

2 Results



图2   RMCL分布图

Note: A decreasing trend in RMCL was observed as the degree of epithelial dysplasia increased.

Fig 2   Distribution of RMCL


图3   不同级别上皮异常增生的4例口腔白斑病

Fig 3   Four cases of oral leukoplakia with different levels of epithelial dysplasia



图4   有序多元Logistic回归模型预测病理结果的累积概率图

Fig 4   Cumulative probability plot for the prediction of pathological results using an ordinal multivariate Logistic regression model
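
To show what a cumulative probability curve represents, the sketch below uses the proportional-odds form of ordinal logistic regression, in which P(Y <= j | x) = 1 / (1 + exp(-(theta_j - beta * x))). This is one common formulation and is assumed here; the source does not give the model equations. The thresholds, the coefficient, and the four grade indices (0 = none, 1 = mild, 2 = moderate, 3 = severe) are hypothetical values, not fitted results.

```python
import math


def cumulative_probabilities(x, thresholds, beta):
    """P(Y <= j | x) for each threshold of a proportional-odds model."""
    return [1.0 / (1.0 + math.exp(-(theta - beta * x))) for theta in thresholds]


def category_probabilities(x, thresholds, beta):
    """P(Y = j | x), obtained as differences of successive cumulative probabilities."""
    cumulative = cumulative_probabilities(x, thresholds, beta) + [1.0]
    probabilities, previous = [], 0.0
    for c in cumulative:
        probabilities.append(c - previous)
        previous = c
    return probabilities


thresholds = [-1.0, 0.5, 2.0]  # hypothetical cut points between the four grades
beta = -0.05                   # hypothetical; with this sign, lower RMCL favors higher grades

for rmcl in (0.0, -30.0, -60.0):
    probs = category_probabilities(rmcl, thresholds, beta)
    print(rmcl, [round(p, 3) for p in probs])
```

Plotting each cumulative probability against RMCL gives one curve per threshold; well-separated curves correspond to grades that the predictor distinguishes clearly.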


表1   二分类病理结果的准确度、精确度和F1分值

Tab 1  Accuracy, precision, and F1 scores of binary classification of pathological results

Maximum depth of decision tree | Index | The proportion of the test set in all samples
F1 score | 0.669 | 0.744 | 0.795 | 0.547 | 0.534 | 0.522 | 0.556 | 0.586 | 0.578
F1 score | 0.640 | 0.690 | 0.755 | 0.508 | 0.596 | 0.663 | 0.551 | 0.554 | 0.554
F1 score | 0.667 | 0.707 | 0.755 | 0.709 | 0.576 | 0.650 | 0.566 | 0.562 | 0.557

图5   模型性能最佳时的混淆矩阵和决策树

Note: A. Confusion matrix of the test set prediction model for binary classification of pathological grade. Darker color blocks along the diagonal indicate a higher probability of correct predictions. B. Confusion matrix of the test set prediction model for four-class classification of pathological grade. Dark color blocks are primarily concentrated in the second and third columns, indicating that the model tends to predict the degree of epithelial dysplasia as mild or moderate. C. Decision tree of the training set for binary classification of pathological grade, showing the prediction process of each sample. The red circle indicates the root node, and the red squares indicate the leaf nodes, with different classification results. D. Decision tree of the training set for four-class classification of pathological grade, showing the prediction process of each sample. The red circle indicates the root node, and the red squares indicate the leaf nodes, with different classification results.

Fig 5   Confusion matrix and decision tree with optimal model performance

表2   四分类病理结果的准确度、精确度和F1分值

Tab 2  Accuracy, precision, and F1 scores of four-class classification of pathological results

Maximum depth of decision tree | Index | The proportion of the test set in all samples
F1 score | 0.384 | 0.419 | 0.556 | 0.425 | 0.424 | 0.414 | 0.421 | 0.433 | 0.423
F1 score | 0.569 | 0.533 | 0.407 | 0.454 | 0.427 | 0.438 | 0.465 | 0.489 | 0.455
F1 score | 0.422 | 0.446 | 0.417 | 0.460 | 0.467 | 0.453 | 0.423 | 0.506 | 0.460

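3 Discussion

When a biomolecule absorbs an external photon, it is promoted from the ground state to an excited state. The excited biomolecule loses energy through several pathways and eventually returns to the ground state. Usually this energy is dissipated to the surroundings as heat, but in some cases it is released by emission of a photon, which is the phenomenon of fluorescence. Some biomolecules in human tissue, such as porphyrins, collagen, amino acids, elastin, and vitamins, fluoresce when excited by light of a specific wavelength. Normal and cancerous tissues differ in molecular composition and structure, so their fluorescence spectra also differ. On this basis, an instrument (such as the handheld autofluorescence device VELscope® used in this study) can be used clinically to emit visible light of a specific wavelength, with images captured and stored (on the iPod in this study) and then interpreted to distinguish normal, precancerous, and cancerous tissue, and thereby to diagnose early cancer or the progression of precancerous lesions. This is the basis of autofluorescence diagnostic technology [15].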


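However, the autofluorescence examinations in the above studies all used qualitative criteria, which are highly subjective and lead to widely varying results. Some researchers [23] have questioned the value of this technique in disease screening, arguing that it is a useful aid only for clinicians with relevant experience and that its role in screening by inexperienced community practitioners remains inconclusive. In addition, the study by PENTENERO et al [24] showed that, in judging autofluorescence results, both intra-observer and inter-observer agreement were unsatisfactory for experienced oral medicine practitioners (OMPs) and general dental practitioners (GDPs) alike, an important reason being the lack of objective criteria. Therefore, although autofluorescence imaging has good prospects, the lack of quantitative criteria makes interpretation highly subjective and dependent on the clinician's experience. This makes the technique difficult to use in secondary and primary-care hospitals for screening patients at high risk of malignant transformation, which limits its adoption.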

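According to the principle of autofluorescence diagnosis, the content of biomolecules excited by visible light varies continuously, so the brightness of the resulting autofluorescence is also a continuous variable, which provides a theoretical basis for quantitative autofluorescence diagnosis. In recent years, researchers have proposed several quantitative autofluorescence methods. HUANG et al [25] confirmed that light intensity and heterogeneity differ markedly between cancerous and normal regions, but they did not go on to define and validate a diagnostic threshold, nor did they explore the relationship between OPMDs and quantified fluorescence. QUANG et al [26] digitized dark fluorescence regions with the red to green intensity ratio and reached a diagnostic accuracy of 85% for malignant transformation, but did not clarify whether the ratio is associated with the degree of epithelial dysplasia. CHERRY et al [27] also used the red to green intensity ratio to grade the risk of lesions and verified during follow-up that the grading agreed well with biopsy results, but likewise did not clarify its relationship with epithelial dysplasia.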


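Besides the uneven distribution of epithelial dysplasia grades, the limited accuracy of the model may also be related to lesion site and type: ① Normal mucosa of the tongue dorsum has a high MCL in autofluorescence images, that is, it is generally bright, whereas lesions of the tongue dorsum are usually accompanied by atrophy of the lingual papillae and appear dark, with a low MCL. As a result, RMCL is small regardless of the degree of epithelial dysplasia, which affects the accuracy of the model. ② When the lesion is verrucous, the thick keratin layer shows a high MCL in autofluorescence images, so the model tends to predict no or mild epithelial dysplasia although the actual grade may be higher. ③ When the lesion is accompanied by infection, autofluorescence images are prone to false positives, that is, the MCL is far below that of the surrounding normal region, giving a low RMCL and a prediction of more severe epithelial dysplasia that may not match the actual degree. In future studies, increasing the sample size and analyzing lesions separately by site and type may further improve the accuracy of the model.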



