• Original article (Basic research)

Establishment of an AdaBoost classifier model and evaluation of harmful mutations in non-coding regions of liver cancer cells

XU Li-ping1, LI Jia2, FANG Lin2   

  1. Department of General Surgery, Ningbo First Hospital, Ningbo 315010, China; 2. Department of Thyroid and Breast, Shanghai Tenth People's Hospital, Tongji University, Shanghai 200072, China
  • Online: 2015-06-28 Published: 2015-07-30

Abstract:

Objective To establish an AdaBoost classifier model for estimating the probability that mutations in non-coding regions of liver cancer cells are disease related, and to identify harmful mutations in these regions. Methods A total of 13 108 disease-related mutations in non-coding regions were selected from the HGMD database as the case set, and neutral SNPs were used as controls. The AdaBoost classifier was built from regulatory features of non-coding regions, including conserved regions, evolutionarily conserved RNA structures, highly expressed genes, DNase I hypersensitive sites, transcription factor binding sites (TFBSs), histone modifications, and early replicated genes. The value of each feature for predicting harmful mutations in non-coding regions was analyzed. The receiver operating characteristic (ROC) curve was plotted and the area under the ROC curve (AUCROC) was calculated. Variants from genome-wide association studies (GWAS) and the ClinVar disease-associated variants database were used to verify the model. Results Ranked by importance for identifying disease-related mutations, the features were conserved regions, early replicated genes, untranslated regions (UTRs), promoters, highly expressed regions, H3K36me3, and conserved TFBSs. The ROC curve constructed from the prediction probabilities of the AdaBoost classifier had an AUCROC of 0.90. The average scores of GWAS and ClinVar disease-associated variants were significantly higher than those of neutral SNPs (P<0.05). Conclusion The AdaBoost classifier helps evaluate the probability that mutations in non-coding regions of liver cancer cells are harmful and is an accurate prediction tool.

Key words: liver cancer, non-coding variant, AdaBoost classifier
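
The workflow outlined in the Methods (train a boosted classifier on annotation features, score variants, then summarize discrimination with the area under the ROC curve) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the four feature names and their overlap probabilities are invented for illustration, and the functions `make_toy_data`, `fit_adaboost`, `score`, and `auc` are hypothetical helpers. The sketch assumes binary annotations and uses discrete AdaBoost with one-feature decision stumps.

```python
import math
import random

# Hypothetical binary annotations per variant (1 = variant overlaps the feature).
FEATURES = ["conserved_region", "early_replicated", "utr", "promoter"]

def make_toy_data(n, seed=0):
    """Synthetic data only: label 1 = disease-related, 0 = neutral SNP."""
    rng = random.Random(seed)
    # Assumed overlap probabilities, invented for illustration.
    p_pos = [0.8, 0.6, 0.5, 0.5]
    p_neg = [0.2, 0.4, 0.3, 0.3]
    X, y = [], []
    for i in range(n):
        label = i % 2
        probs = p_pos if label else p_neg
        X.append([1 if rng.random() < p else 0 for p in probs])
        y.append(label)
    return X, y

def fit_adaboost(X, y, rounds=10):
    """Discrete AdaBoost with one-feature decision stumps."""
    n = len(X)
    w = [1.0 / n] * n
    signs = [1 if t == 1 else -1 for t in y]
    model = []  # list of (feature index, polarity, alpha)
    for _ in range(rounds):
        best = None
        for j in range(len(X[0])):
            for pol in (1, -1):
                # Stump predicts +pol when the feature is present, -pol otherwise.
                err = sum(wi for wi, xi, s in zip(w, X, signs)
                          if (pol if xi[j] else -pol) != s)
                if best is None or err < best[0]:
                    best = (err, j, pol)
        err, j, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((j, pol, alpha))
        # Reweight: misclassified variants gain weight for the next round.
        w = [wi * math.exp(-alpha * s * (pol if xi[j] else -pol))
             for wi, xi, s in zip(w, X, signs)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def score(model, x):
    """Weighted vote of all stumps; larger means more likely disease-related."""
    return sum(a * (pol if x[j] else -pol) for j, pol, a in model)

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

X_train, y_train = make_toy_data(400, seed=1)
X_test, y_test = make_toy_data(200, seed=2)
model = fit_adaboost(X_train, y_train)
test_auc = auc([score(model, x) for x in X_test], y_test)
print(f"AUC on synthetic held-out variants: {test_auc:.2f}")
```

The pairwise form of `auc` equals the probability that a randomly chosen disease-related variant scores higher than a randomly chosen neutral one, which is the quantity the AUCROC reports. The summed `alpha` values per feature give a rough analogue of the feature ranking described in the Results, though the study's actual importance measure is not specified here.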