Guangxi Sciences

Cite this article:

ZHANG Qian, SUN Yijia, BAI Lin, LI Taoshen. A New Hybrid Deep Learning Model for Homologous Protein Detection[J]. Guangxi Sciences, 2019, 26(3): 283-290.

A New Hybrid Deep Learning Model for Homologous Protein Detection
ZHANG Qian^1, SUN Yijia^2,3, BAI Lin^2,3, LI Taoshen^2,3
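(1. The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, 530021, China; 2. School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi, 530004, China; 3. Guangxi Colleges and Universities Key Laboratory of Parallel and Distributed Computing Technology, Nanning, Guangxi, 530004, China)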

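Abstract:

Detecting homologous proteins from a protein's amino acid chain, and from them predicting the protein's function, is an important challenge in bioinformatics and a basic research topic in many biomedical fields, with significant scientific value and broad application demand. The difficulties are: (1) how to learn protein feature information that is effective and useful for homologous protein prediction; (2) how to make better use of that feature information to detect and recognize homologous proteins. To address these key difficulties, this paper proposes a homologous protein detection and recognition model based on a hybrid deep learning architecture (HDLM-PHP). A unified "pipelined" deep learning architecture combines protein feature learning with detection and recognition into a single whole, improving the effectiveness of homologous protein detection and recognition. Multiple parallel deep convolutional neural networks learn the various attributes of proteins, with the aim of obtaining rich high-level correlation features between the protein to be detected and the target protein, and a fully connected multi-layer restricted Boltzmann machine (RBM) structure fuses and refines these correlation features into global correlation features. Through this unified deep network connection, guided by the detection and recognition task, the model learns the protein feature information that is most effective and most comprehensive for homologous protein prediction. The performance and efficiency of the proposed model were evaluated on the standard dataset SCOPe. The results show that the model effectively learns task-oriented protein feature data, improves the accuracy and recall of homologous protein detection and recognition, and outperforms existing models and algorithms.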
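The architecture described in the abstract (parallel convolutional branches whose outputs are fused by stacked fully connected RBM layers) can be pictured with a minimal sketch. This is not the authors' implementation: the one-hot encoding, the kernel widths, the layer sizes, and every function name below are assumptions chosen for illustration, the weights are random and untrained, and each RBM layer is shown only as its deterministic upward pass.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq):
    """Encode an amino acid string as one 20-dimensional row per residue."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]


def conv_branch(rows, kernel):
    """One branch: 1D convolution over positions, ReLU, global max pooling."""
    width = len(kernel)
    best = 0.0  # starting at 0 folds the ReLU into the max
    for start in range(len(rows) - width + 1):
        s = sum(kernel[k][c] * rows[start + k][c]
                for k in range(width) for c in range(len(AMINO_ACIDS)))
        best = max(best, s)
    return best


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rbm_up(visible, weights, bias):
    """Upward pass of one RBM layer: p(h_j = 1 | v) for each hidden unit."""
    return [sigmoid(b + sum(w * v for w, v in zip(row, visible)))
            for row, b in zip(weights, bias)]


def random_matrix(rng, n_rows, n_cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(n_cols)]
            for _ in range(n_rows)]


def homology_score(query, target, seed=0):
    """Score a (query, target) pair with untrained, randomly set weights."""
    rng = random.Random(seed)
    widths = (3, 5, 7)  # assumed kernel widths, one per parallel branch
    kernels = [random_matrix(rng, w, len(AMINO_ACIDS)) for w in widths]
    # Every branch sees both proteins; the pooled outputs are concatenated.
    features = [conv_branch(one_hot(s), k)
                for k in kernels for s in (query, target)]
    hidden = features
    for size in (4, 2):  # assumed sizes of the stacked fusion layers
        weights = random_matrix(rng, size, len(hidden))
        hidden = rbm_up(hidden, weights, [0.0] * size)
    out_w = [rng.uniform(-0.1, 0.1) for _ in hidden]
    return sigmoid(sum(w * h for w, h in zip(out_w, hidden)))
```

Calling `homology_score("MKTAYIAKQR", "MKTAYLAKQR")` returns a value strictly between 0 and 1. With random weights the number carries no biological meaning; the sketch only shows how data would flow from the parallel branches through the fusion layers to a single score.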

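Keywords: hybrid deep learning; homologous proteins; deep convolutional neural network; protein feature extraction; deep learning model; machine learning algorithm

DOI: 10.13656/j.cnki.gxkx.20190618.009

Funding: Supported by the Natural Science Foundation of Guangxi (2018GXNSFAA138085).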

A New Hybrid Deep Learning Model for Homologous Protein Detection

ZHANG Qian^1, SUN Yijia^2,3, BAI Lin^2,3, LI Taoshen^2,3

(1. The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, 530021, China; 2. School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi, 530004, China; 3. Guangxi Colleges and Universities Key Laboratory of Parallel and Distributed Computing Technology, Nanning, Guangxi, 530004, China)

Abstract:

Detecting the homologs of a protein from its amino acid chain, and from them predicting the protein's function, is an important challenge in bioinformatics. It is also a basic research topic in many biomedical fields, with significant scientific value and broad application demand. The main difficulties are how to learn protein feature information that is effective and useful for homologous protein prediction, and how to make better use of that feature information to detect and recognize homologous proteins. To address these difficulties, this paper proposes a homologous protein detection and recognition model based on a hybrid deep learning architecture (HDLM-PHP). A unified "pipelined" deep learning architecture combines protein feature learning with detection and recognition into a single whole, improving the effectiveness of homologous protein detection and recognition. The model uses multiple parallel deep convolutional neural networks to learn various protein attributes, aiming to obtain rich high-level correlation features between the protein to be detected and the target protein. A fully connected multi-layer restricted Boltzmann machine (RBM) structure then fuses and refines these correlation features into global correlation features. Through this unified deep network connection, guided by the detection and recognition task, the model learns the protein feature information that is most effective and comprehensive for homologous protein prediction. The performance and efficiency of the proposed model were evaluated on the standard dataset SCOPe. The experimental results show that the model effectively learns task-oriented protein feature data, improves the accuracy and recall of homologous protein detection and recognition, and outperforms existing models and algorithms.
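The two reported metrics, accuracy and recall, can be made concrete with a small example. The sketch below uses the standard textbook definitions for binary labels (1 = homologous pair, 0 = non-homologous pair); the function name and the toy labels are illustrative assumptions and are not taken from the paper's SCOPe evaluation.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall) for binary labels, where 1 means homologous."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # homologs found
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-homologs rejected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # homologs missed
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Toy labels, made up for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(accuracy_and_recall(y_true, y_pred))  # (0.75, 0.75)
```

Accuracy counts every correct decision, while recall counts only how many true homologs were recovered, so a detector can score well on one and poorly on the other when homologous pairs are rare.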

Key words: hybrid deep learning; homologous proteins; deep convolutional neural network; protein feature learning; deep learning model; machine learning algorithm
