Techniques and methods

Effects of different expression matrices on screening differential lncRNAs

  • Hao WEI,
  • Jiajun QIU,
  • Jingbin YAN
  • Shanghai Children's Hospital, Shanghai Institute of Medical Genetics, Shanghai Jiao Tong University School of Medicine, Shanghai Key Laboratory of Embryo and Reproduction Engineering, Shanghai 200040, China
YAN Jingbin, E-mail: m18917128323@163.com.

Received date: 2022-03-24

Accepted date: 2022-07-14

Online published: 2022-09-04

Supported by

National Key R&D Plan (2019YFA0801402); National Natural Science Foundation of China (81971421); Shanghai Key Clinical Specialty Project (shslczdzk05705)

Abstract

Objective

·To compare two approaches to differential analysis of long non-coding RNA (lncRNA) expression, one using the total RNA expression matrix and one using the lncRNA expression matrix, in screening differentially expressed lncRNAs from whole transcriptome sequencing data.

Methods

·Two whole transcriptome sequencing datasets comprising 10 samples in total were downloaded from the NCBI GEO database. Group A consisted of universal human reference RNA samples, and Group B consisted of human brain reference RNA samples. Each sample contained a set of synthetic RNAs (spike-in RNAs) at known concentrations from the External RNA Control Consortium (ERCC). Reads from the processed sequencing data were counted against the annotated reference genomes of mRNA, lncRNA, and total RNA, respectively, yielding three expression matrices that each included the spike-in RNA annotations. The R packages DESeq2 and edgeR were used to perform differential expression analysis between groups on all expression matrices. With P<0.05 as the threshold, the false positive rate and false negative rate of the differential expression results were determined from the true concentrations of the spike-in RNAs in each group, and receiver operating characteristic (ROC) curves of the spike-in RNAs were used to show the specificity and sensitivity of the analysis for each expression matrix. The study focused mainly on the differences between the total RNA expression matrix and the lncRNA expression matrix. Differential lncRNA expression analysis was then performed within groups on the total RNA and lncRNA expression matrices, and the P value distributions were calculated to compare the false positive rates of the two matrices.
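
The false positive and false negative rates described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which used the R packages DESeq2 and edgeR. The function name `fpr_fnr`, the example P values, and the truth labels are hypothetical. The sketch assumes that a spike-in RNA counts as truly different when its known concentration differs between groups, and that a spike-in is called differential when P<0.05.

```python
from typing import Sequence, Tuple


def fpr_fnr(
    p_values: Sequence[float],
    truly_different: Sequence[bool],
    alpha: float = 0.05,
) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) at threshold alpha.

    p_values[i] is the P value of spike-in i from a differential expression
    test; truly_different[i] is True when its known concentration differs
    between the two groups (an assumption made for this illustration).
    """
    if len(p_values) != len(truly_different):
        raise ValueError("p_values and truly_different must have equal length")

    tp = fp = tn = fn = 0
    for p, is_diff in zip(p_values, truly_different):
        called = p < alpha  # called differential at the chosen threshold
        if is_diff and called:
            tp += 1
        elif is_diff and not called:
            fn += 1
        elif not is_diff and called:
            fp += 1
        else:
            tn += 1

    # FPR = FP / (FP + TN); FNR = FN / (FN + TP). NaN if a class is empty.
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    fnr = fn / (fn + tp) if (fn + tp) else float("nan")
    return fpr, fnr


# Hypothetical toy data: six spike-ins, three of which truly differ.
example_p = [0.001, 0.20, 0.03, 0.60, 0.01, 0.04]
example_truth = [True, True, True, False, False, False]
example_fpr, example_fnr = fpr_fnr(example_p, example_truth)
```

Applying the same function to the P values obtained from each expression matrix would give one (false positive rate, false negative rate) pair per matrix, which is the kind of comparison reported in the Results.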

Results

·Under the condition of P<0.05, the false positive rate and false negative rate of spike-in RNA between groups A and B were 0.52 and 0.14, respectively, when analyzed with the total RNA expression matrix, and 0.30 and 0.17 when analyzed with the lncRNA expression matrix, which indicated that the false positive rate of differential analysis using the lncRNA expression matrix was lower. The area under the curve (AUC) of spike-in RNA in the expression matrices was generally consistent across the two R packages: AUC (total RNA) ≈ AUC (mRNA) < AUC (lncRNA), which indicated that the screening effect of the lncRNA expression matrix was better than that of the total RNA expression matrix. The intra-group lncRNA differential expression analysis showed that, under the condition of P<0.05, there were 9 and 7 differentially expressed lncRNAs in the lncRNA expression matrix and the total RNA expression matrix, respectively, in group A, and 15 and 17 in group B. The numbers were not significantly different between expression matrices.
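
As an illustration of how an AUC like those compared above can be obtained from spike-in P values, the following Python sketch uses the rank-based (Mann-Whitney) definition of the AUC. It is not the authors' code. The function name `roc_auc_from_p`, the toy P values, and the truth labels are hypothetical, and the sketch assumes that a smaller P value means stronger evidence of differential expression.

```python
from typing import Sequence


def roc_auc_from_p(p_values: Sequence[float], truly_different: Sequence[bool]) -> float:
    """Area under the ROC curve when spike-ins are ranked by P value.

    Equals the probability that a randomly chosen truly different spike-in
    has a smaller P value than a randomly chosen unchanged spike-in, with
    ties counted as one half.
    """
    if len(p_values) != len(truly_different):
        raise ValueError("p_values and truly_different must have equal length")

    pos = [p for p, d in zip(p_values, truly_different) if d]
    neg = [p for p, d in zip(p_values, truly_different) if not d]
    if not pos or not neg:
        raise ValueError("need at least one spike-in in each class")

    wins = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos < p_neg:
                wins += 1.0  # correctly ranked pair
            elif p_pos == p_neg:
                wins += 0.5  # tie contributes half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six spike-ins, three of which truly differ.
toy_p = [0.001, 0.20, 0.03, 0.60, 0.01, 0.04]
toy_truth = [True, True, True, False, False, False]
toy_auc = roc_auc_from_p(toy_p, toy_truth)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a larger AUC for one expression matrix means that its P values separate the truly different spike-ins from the unchanged ones more cleanly.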

Conclusion

·In the differential expression analysis of known lncRNAs in whole transcriptome sequencing data, the specificity and sensitivity of analysis with the lncRNA expression matrix are better than those of analysis with the total RNA expression matrix.

Cite this article

Hao WEI, Jiajun QIU, Jingbin YAN. Effects of different expression matrices on screening differential lncRNAs[J]. Journal of Shanghai Jiao Tong University (Medical Science), 2022, 42(7): 911-918. DOI: 10.3969/j.issn.1674-8115.2022.07.010
