Journal of Shanghai Jiao Tong University (Medical Science), 2022, Vol. 42, Issue (7): 911-918. doi: 10.3969/j.issn.1674-8115.2022.07.010

• Techniques and methods •

Effects of different expression matrices on screening differential lncRNAs

WEI Hao, QIU Jiajun, YAN Jingbin

  1. Shanghai Children's Hospital, Shanghai Institute of Medical Genetics, Shanghai Jiao Tong University School of Medicine, Shanghai Key Laboratory of Embryo and Reproduction Engineering, Shanghai 200040, China
  • Received: 2022-03-24 Accepted: 2022-07-14 Online: 2022-07-28 Published: 2022-09-04
  • Contact: YAN Jingbin E-mail: 1187383951@qq.com; m18917128323@163.com
  • Supported by:
    National Key R&D Plan (2019YFA0801402); National Natural Science Foundation of China (81971421); Shanghai Key Clinical Specialty Project (shslczdzk05705)

Abstract: Objective

·To compare the effects of two methods for differential analysis of long non-coding RNA (lncRNA) expression levels on screening differential lncRNAs based on whole transcriptome sequencing data.

Methods

·Two whole transcriptome sequencing datasets, comprising 10 samples in total, were downloaded from the NCBI GEO database. Group A consisted of universal human reference RNA samples, and Group B consisted of human brain reference RNA samples. Each sample contained a set of synthetic RNAs (spike-in RNA) at known concentrations from the External RNA Control Consortium (ERCC). Reads from the processed sequencing data were counted against the reference annotations for mRNA, lncRNA, and total RNA, respectively, yielding three expression matrices, each of which included the spike-in RNA annotation. The R packages DESeq2 and edgeR were used to perform differential expression analysis between groups for all expression matrices. At a threshold of P<0.05, the false positive rate and false negative rate of the differential expression results were determined from the known concentrations of spike-in RNA in each group, and the receiver operating characteristic (ROC) curve of spike-in RNA was used to show the specificity and sensitivity of differential expression analysis for each expression matrix. The study focused mainly on the differences between the total RNA expression matrix and the lncRNA expression matrix. Differential lncRNA expression analysis was then performed within groups on the total RNA expression matrix and the lncRNA expression matrix, and the P value distribution was calculated to compare the false positive rates of the two expression matrices.
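
The error-rate step described above can be sketched as follows. This is a minimal illustration in Python, not the authors' R workflow. The function name `error_rates`, the spike-in identifiers, and the P values are all hypothetical. The sketch assumes that a spike-in is truly differential when its known concentration differs between groups, that the false positive rate is FP/(FP+TN), and that the false negative rate is FN/(FN+TP); the abstract does not state the exact definitions used.

```python
import math


def error_rates(p_values, truly_different, alpha=0.05):
    """Return (false positive rate, false negative rate) for spike-in calls.

    p_values: dict mapping spike-in ID -> P value from a differential test.
    truly_different: dict mapping spike-in ID -> True if the known
        concentrations differ between the two groups (assumed ground truth).
    """
    tp = fp = tn = fn = 0
    for name, p in p_values.items():
        called = p < alpha  # called differential at the chosen threshold
        if truly_different[name]:
            if called:
                tp += 1
            else:
                fn += 1
        else:
            if called:
                fp += 1
            else:
                tn += 1
    fpr = fp / (fp + tn) if (fp + tn) else math.nan
    fnr = fn / (fn + tp) if (fn + tp) else math.nan
    return fpr, fnr


# Hypothetical toy data: six spike-ins, three with truly different concentrations.
toy_p = {"s1": 0.001, "s2": 0.20, "s3": 0.03, "s4": 0.60, "s5": 0.04, "s6": 0.50}
toy_truth = {"s1": True, "s2": True, "s3": True, "s4": False, "s5": False, "s6": False}

fpr, fnr = error_rates(toy_p, toy_truth)
print(f"FPR={fpr:.2f}, FNR={fnr:.2f}")
```

In practice the P values would come from the DESeq2 or edgeR results for the spike-in rows of each expression matrix, so the same function could be applied once per matrix to compare them.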

Results

·At a threshold of P<0.05, the false positive rate and false negative rate of spike-in RNA between groups A and B were 0.52 and 0.14 when analyzed with the total RNA expression matrix, and 0.30 and 0.17 when analyzed with the lncRNA expression matrix, which indicated that the false positive rate was lower with the lncRNA expression matrix. The area under the curve (AUC) of spike-in RNA in the expression matrices was generally consistent across the two R packages: AUC (total RNA)≈AUC (mRNA)<AUC (lncRNA), which indicated that the screening performance of the lncRNA expression matrix was better than that of the total RNA expression matrix. In the intra-group differential lncRNA expression analysis at P<0.05, there were 9 and 7 differentially expressed lncRNAs in the lncRNA expression matrix and total RNA expression matrix, respectively, in group A, and 15 and 17 in group B. The numbers did not differ significantly between expression matrices.
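
As an illustration of how an AUC of this kind can be obtained from spike-in P values, the sketch below uses the rank-based (Mann-Whitney) formulation: the AUC equals the fraction of (truly differential, truly unchanged) spike-in pairs in which the truly differential one has the smaller P value, with ties counted as one half. The function name `auc_from_p_values` and the toy data are hypothetical, and the abstract does not say which software or scoring the authors used to draw their ROC curves.

```python
def auc_from_p_values(p_values, truly_different):
    """Rank-based AUC, treating a smaller P value as stronger evidence.

    p_values: dict mapping spike-in ID -> P value.
    truly_different: dict mapping spike-in ID -> True if the known
        concentrations differ between groups (assumed ground truth).
    """
    pos = [p for name, p in p_values.items() if truly_different[name]]
    neg = [p for name, p in p_values.items() if not truly_different[name]]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative spike-in")
    wins = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos < p_neg:
                wins += 1.0   # correctly ranked pair
            elif p_pos == p_neg:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three truly differential and three unchanged spike-ins.
toy_p = {"s1": 0.001, "s2": 0.20, "s3": 0.03, "s4": 0.60, "s5": 0.04, "s6": 0.50}
toy_truth = {"s1": True, "s2": True, "s3": True, "s4": False, "s5": False, "s6": False}

auc = auc_from_p_values(toy_p, toy_truth)
print(f"AUC={auc:.3f}")
```

Computing this value separately for the total RNA, mRNA, and lncRNA matrices would give the kind of comparison summarized above. The pairwise loop is quadratic in the number of spike-ins, which is acceptable for a small control set such as this one.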

Conclusion

·In the differential expression analysis of known lncRNAs in whole transcriptome sequencing data, the specificity and sensitivity of analysis with the lncRNA expression matrix are better than those obtained with the total RNA expression matrix.

Key words: long non-coding RNA, differential expression analysis, spike-in RNA, receiver operating characteristic curve
