• Original article (Basic research)

Literature mining for non-coding base sequence

AN Jian-fu, MENG Li-li   

  1. Information Center, Renji Hospital, Shanghai Jiaotong University School of Medicine, Shanghai 200127, China
  • Online: 2013-10-28 Published: 2013-10-31


Objective: To improve the recall and precision of literature retrieval on non-coding base sequences using a neural network algorithm. Methods: Related articles were retrieved from PubMed as samples. After the sample articles were processed, terms were selected using term frequency (TF) and inverse document frequency (IDF), and a retrieval model based on the back-propagation (BP) neural network algorithm was built. Results: With 100 selected terms, the precision, recall, area under the receiver operating characteristic curve (ROC AUC), specificity, sensitivity and accuracy were 91.49%, 71.23%, 0.823, 93.37%, 71.23% and 82.30%, respectively. Conclusion: Compared with common methods such as keyword and MeSH retrieval, the neural network retrieval model can effectively retrieve literature related to a particular topic.
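The term selection and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact weighting formula, tokenization, or network architecture, so everything here is an assumption: the function names `tfidf_scores`, `select_terms` and `evaluate` are hypothetical, the TF-IDF variant (relative term frequency times the natural log of inverse document frequency, summed over the corpus) is only one common choice, and the BP network itself is not shown.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Sum TF-IDF per term over a corpus of tokenized documents.

    Assumed weighting (not taken from the paper):
    tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[term])
            scores[term] += tf * idf
    return scores


def select_terms(docs, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = tfidf_scores(docs)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def evaluate(tp, fp, tn, fn):
    """Compute retrieval metrics from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),  # identical to sensitivity
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

In a pipeline of this kind, the selected terms (100 in the reported result) would form the input feature vector of the classifier. Recall and sensitivity are the same quantity, TP / (TP + FN), which is why the abstract reports 71.23% for both.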

Key words: non-coding base sequence, neural network, back-propagation algorithm, term occurrence frequency and inverse document frequency, literature mining