Using multiple windows scanning and natural language processing techniques to study electron transport proteins

DOCUMENT INFORMATION

Number of pages: 5
File size: 193.45 KB

Student: Ho Quang Thai (胡光泰)
Advisor: Dr. Yu-Yen Ou (歐昱言)
Department of Computer Science and Engineering, College of Informatics, Yuan Ze University

A Dissertation Submitted to the Department of Computer Science and Engineering, Yuan Ze University, in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Computer Science and Engineering
April 2022
Chungli, Taiwan, Republic of China

Abstract

Nature is an infinite source of inspiration for people to discover and recreate wonderful inventions. Inspired by the way neurons work in the human brain, Convolutional Neural Networks (CNNs) were proposed and have become a powerful and widely used tool for imaging-related tasks. CNNs and their structural variants have been increasingly developed and have achieved many cutting-edge results, not only in computer vision but also in many other fields. CNNs are also known to be effective at extracting hidden information from visual data. In bioinformatics, CNNs have rapidly gained interest over the past decade, especially in biomedical imaging. However, the problem of applying CNNs to non-visual data, such as protein sequences, is still not fully resolved. Unlike images, protein sequences cannot be decomposed into color channels, which provide much of the useful information behind the pattern recognition capabilities of CNNs. Another problem that makes it harder to apply CNNs to protein chains is the input layer: it has a fixed dimension, whereas the length of a protein chain is usually variable. Different filters are applied to identify sequence motifs within the input protein sequence, which is represented by multiple sequence alignment profiles. The model uses a number of filters to capture many different sequence motifs and then uses multiple window sizes to capture additional feature motifs, and thereby more significant patterns, for protein prediction problems.

Recently, the natural language processing (NLP) field has achieved many state-of-the-art results by applying transformers, which help computers attend to a word's position and context rather than relying only on how often the word appears in a sentence. Natural language and protein sequences have several points in common, such as how they are formed and how they are presented. This study treats protein sequences as an unknown language, with each amino acid as a “word” in a biological vocabulary. We use NLP models pre-trained on an extensive corpus of natural language data to determine whether an association exists between natural language and this undiscovered language inside our bodies.

Our study analyzed electron transport proteins using multiple windows scanning and natural language processing techniques in three works. First, we used the multiple windows scanning technique to identify electron transport proteins among transport proteins. On the independent data set, our model achieved an average sensitivity of 92.59%, specificity of 98.19%, accuracy of 97.41%, and Matthews correlation coefficient (MCC) of 0.89. Additionally, our method can identify complexes with different molecular functions in electron transport proteins; across five independent data sets, the MCCs were 0.86, 0.80, 0.88, 1.00, and 0.92, respectively.

In the second work, we combined a feature set extracted from a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model with Position-Specific Scoring Matrix (PSSM) profiles and the Amino Acid Index database (AAIndex) to identify Flavin Adenine Dinucleotide (FAD) binding sites in electron transport proteins. On the independent data set, this approach achieved an average sensitivity of 85.19%, specificity of 85.62%, accuracy of 85.60%, and MCC of 0.35.

In the last work, we applied the multiple windows scanning technique to the FAD binding site identification problem. Because only a modest amount of such data exists in nature, we first trained the model on PSSM profiles of FAD binding sites in transporters and then used it to predict FAD binding sites in electron transport proteins. On the independent data set, the model achieved an average sensitivity of 92.59%, specificity of 98.19%, accuracy of 97.41%, and MCC of 0.89. Our method outperforms other published methods on all measurement metrics. The proposed technique may help researchers gain a deeper understanding of transport proteins, and it can also serve as a powerful web tool. Moreover, the results of this study pave the way for further deep learning research that enriches the bioinformatics field.

Keywords: machine learning; deep learning; convolutional neural network; electron transport proteins; position specific scoring matrix; FAD binding site; natural language processing; multiple windows scanning
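The multiple windows scanning idea summarized in the abstract (several filters per window size, several window sizes slid over a sequence profile) can be illustrated with a short sketch. It is a minimal, hypothetical illustration, not the dissertation's implementation: the function names `make_filters` and `multi_window_features`, the 20-column PSSM rows, the window sizes (3, 5, 7), the four filters per size, and the random untrained weights are all assumptions. In a real model the filters would be learned during CNN training. The sketch also shows global max pooling over sequence positions, one common way to turn a variable-length protein into a fixed-length feature vector, which is relevant to the fixed-size input layer problem mentioned above.

```python
import random

# Hypothetical sketch: names, sizes, and random (untrained) filters are
# illustrative assumptions, not the dissertation's implementation.


def make_filters(window_sizes, n_filters, n_features, seed=0):
    """Return {window_size: [kernel, ...]}, each kernel a w x n_features grid of weights."""
    rng = random.Random(seed)
    return {
        w: [
            [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(w)]
            for _ in range(n_filters)
        ]
        for w in window_sizes
    }


def multi_window_features(profile, filters):
    """Scan an L x n_features profile (e.g. a PSSM) with every window size.

    For each kernel, the ReLU response is computed at every window position
    and only the maximum is kept (global max pooling), so the output length
    depends on the number of kernels, not on the sequence length L.
    """
    length, n_features = len(profile), len(profile[0])
    features = []
    for w, kernels in sorted(filters.items()):
        if length < w:
            raise ValueError(f"sequence of length {length} is shorter than window {w}")
        for kernel in kernels:
            best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
            for start in range(length - w + 1):
                score = sum(
                    kernel[i][j] * profile[start + i][j]
                    for i in range(w)
                    for j in range(n_features)
                )
                best = max(best, score)  # max(0, max score) equals the max ReLU response
            features.append(best)
    return features


if __name__ == "__main__":
    rng = random.Random(42)
    filters = make_filters(window_sizes=(3, 5, 7), n_filters=4, n_features=20)
    short_seq = [[rng.random() for _ in range(20)] for _ in range(30)]
    long_seq = [[rng.random() for _ in range(20)] for _ in range(250)]
    print(len(multi_window_features(short_seq, filters)))  # 12
    print(len(multi_window_features(long_seq, filters)))   # 12
```

Because each window size contributes one max-pooled value per filter, the 30-residue and the 250-residue inputs both yield 3 × 4 = 12 features. A trained network would typically pass such a vector to fully connected layers for classification.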

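All results in the abstract are reported as sensitivity, specificity, accuracy, and MCC. The sketch below computes these standard metrics from confusion-matrix counts (TP, FP, TN, FN). The function name `binary_metrics` and the example counts are hypothetical and do not come from the dissertation's data.

```python
import math
from typing import NamedTuple


class Metrics(NamedTuple):
    sensitivity: float
    specificity: float
    accuracy: float
    mcc: float


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Metrics:
    """Sensitivity, specificity, accuracy, and MCC from confusion-matrix counts.

    Assumes at least one positive (tp + fn > 0) and one negative (tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)  # fraction of actual positives recovered
    specificity = tn / (tn + fp)  # fraction of actual negatives recovered
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; 0.0 when a marginal count is zero (a common convention)
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return Metrics(sensitivity, specificity, accuracy, mcc)


if __name__ == "__main__":
    # Hypothetical, imbalanced counts: 50 positives and 950 negatives
    m = binary_metrics(tp=40, fp=50, tn=900, fn=10)
    print(f"Sn={m.sensitivity:.2%} Sp={m.specificity:.2%} "
          f"Acc={m.accuracy:.2%} MCC={m.mcc:.2f}")
    # prints "Sn=80.00% Sp=94.74% Acc=94.00% MCC=0.57"
```

With these made-up counts, positives make up only 5% of the samples, so accuracy (94%) looks much better than MCC (0.57). This gap is why MCC is the more informative figure on imbalanced tasks such as binding site prediction, where binding residues are a small minority.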
Posted: 29/10/2022, 05:11
