
Comparing receptor binding properties of 2019-nCoV virus with those of SARS-CoV virus using computational biophysics approach





VIETNAM NATIONAL UNIVERSITY, HANOI
VIETNAM JAPAN UNIVERSITY

CONG PHUONG CAO

COMPARING RECEPTOR BINDING PROPERTIES OF 2019-nCoV VIRUS WITH THOSE OF SARS-CoV VIRUS USING COMPUTATIONAL BIOPHYSICS APPROACH

MASTER'S THESIS

MAJOR: NANOTECHNOLOGY
CODE: 8440140.11 QTD

RESEARCH SUPERVISOR: Associate Prof. Dr. NGUYEN THE TOAN

Hanoi, 2021

Acknowledgements

It could be said that without Prof. Nguyen The Toan, I could not have come this far on my scientific research path, much less completed this master's thesis. Therefore, first of all, I want to express my sincere thanks to Prof. Nguyen The Toan, my beloved master's thesis supervisor at the VNU Key Laboratory on Multiscale Simulation of Complex Systems and the Faculty of Physics, VNU University of Science, Vietnam National University. I also wish to thank Dr. Pham Trong Lam, who guided me in my very first steps in the machine learning field and gave me precious advice for my research during my internship period and thesis defense preparation. I would like to thank the lecturers in the VJU Master's Program in Nanotechnology for many inspirational discussions and helpful knowledge from classes. I would also like to thank all the staff, lecturers, and my good friends at VJU for helping me a lot during my memorable studies at VJU.

This research is funded by Vietnam National University under grant number QG.20.82.

Hanoi, 17 July 2021
Cong Phuong Cao

Contents

Acknowledgements
List of Tables
List of Figures
List of Abbreviations
1 INTRODUCTION
2 MOLECULAR DYNAMICS SIMULATION
  2.1 Molecular Dynamics
    2.1.1 Integration Algorithm
    2.1.2 Force field
  2.2 Materials and Models
  2.3 Simulation Details
    2.3.1 Thermostat and Barostat
    2.3.2 Periodic Boundary Conditions
3 ANALYSES METHODS
  3.1 Sequence Alignment
  3.2 Root Mean Square Deviation
  3.3 Root Mean Square Fluctuation
  3.4 Principal Component Analysis
  3.5 Variational Autoencoder
4 RESULTS AND DISCUSSION
  4.1 Preliminary Sequence Alignments of The Viral RBDs
  4.2 Deviations and Fluctuations of The Structural Backbone Atoms
    4.2.1 Root Mean Square Deviations
    4.2.2 Root Mean Square Fluctuations
  4.3 Principal Component Analysis
  4.4 Machine Learning on 6M0J System
CONCLUSIONS
REFERENCES
A IN-HOUSE SOURCE CODE
  A.1 Data Pre-processing Source Code
  A.2 Autoencoder Source Code
B ADDITIONAL VAE RESULTS

List of Tables

2.1 The molecules simulated for each system
3.1 The detailed parameters of the VAE model
4.1 The trace of the covariance matrix of the projections of the protein backbones on the two largest principal components

List of Figures

1.1 The binding of coronavirus spike protein to human ACE2 receptor
1.2 Antibodies neutralizing SARS-CoV-2 virus by blocking its interaction with human ACE2 receptor
2.1 A 2-dimensional PBC view along the z-axis direction of the 6VW1 system. The primitive system is surrounded by and interacts with its images
2.2 A typical snapshot of the 6M0J system after being simulated for 800 ns, showing the arrangement of RBD and ACE2 fluctuating in water
3.1 Illustration of the VAE structure used for protein datasets
4.1 The sequence alignments of the viral RBD of 6VW1 and 6M0J for two variants of SARS-CoV-2 virus, and of 2AJF for SARS-CoV virus
4.2 The location of the four discovered significant mutations of the viral RBD
4.3 The root mean square deviations of the backbone of the human ACE2 receptor and of the viral RBD protein
4.4 The root mean square fluctuations of the backbone of the human ACE2 receptor and of the viral RBD protein
4.5 The location of residue 113 of the viral RBD in the 6VW1 system
4.6 The location of residue 50 of the viral RBD in the 2AJF system
4.7 The probability density in the plane of the two largest principal components from the PCA of the backbone structure of the proteins
4.8 Latent space projection of variational autoencoder trained on the distance matrix of RBD-ACE2 complex of 6M0J
B.1 Latent space projection of variational autoencoder trained on the distance matrix of RBD-ACE2 complex of 6M0J
B.2 Latent space projection of variational autoencoder trained on the distance matrix of RBD-ACE2 complex of 6M0J
B.3 Latent space projection of variational autoencoder trained on the distance matrix of RBD-ACE2 complex of 6M0J

List of Abbreviations

SARS                    Severe Acute Respiratory Syndrome
SARS-CoV-2              Severe Acute Respiratory Syndrome CoronaVirus 2
2019-nCoV               2019 Novel CoronaVirus, colloquial name of SARS-CoV-2
SARS-CoV or SARS-CoV-1  Severe Acute Respiratory Syndrome CoronaVirus (caused the epidemic of 2002–2003, different from 2019-nCoV)
RBD                     Receptor-Binding Domain
ACE2                    Angiotensin-Converting Enzyme 2
MD                      Molecular Dynamics
EOM                     Newton's Equations of Motion
RCSB PDB                The Research Collaboratory for Structural Bioinformatics Protein Data Bank
PBC                     Periodic Boundary Conditions
PME                     Particle Mesh Ewald
RMSD                    Root Mean Square Deviation
RMSF                    Root Mean Square Fluctuation
PCA                     Principal Component Analysis
VAE                     Variational Autoencoder
DAE                     Deep Autoencoder

Chapter 1
INTRODUCTION

By the end of 2019, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), also known as 2019-nCoV, was detected in Wuhan city, China, and spread rapidly to countries and regions all over the world, forcing the World Health Organization to declare a public health emergency only three months later [1]. Because of the extremely fast spread, the fast mutation rate, and the virulence of SARS-CoV-2, scientists are rushing to find a cure for the severe acute respiratory syndrome caused by the virus. It turns out that the genome of SARS-CoV-2 is very similar to the genomes of other coronaviruses, and the virus can be classified as a variant of the severe acute respiratory syndrome coronavirus (SARS-CoV), which caused the SARS epidemic of 2002–2003. The
structure of a coronavirus can be divided into two parts, namely the core and the shell. The viral core is the single-stranded RNA viral genome. The viral shell is a combination of lipids, envelope proteins, and spike proteins, in which the spike proteins play an important role in the entry of the viral RNA genome into the host cell. The receptor-binding domain (RBD) is a subunit of the spike glycoprotein (also known as protein S) attached to the viral outer shell [2], [3]. The RBD recognizes and binds to human cells through a receptor called angiotensin-converting enzyme 2 (ACE2), like a key being inserted into a lock (illustrated in Figure 1.1) [4]. After that, the coronavirus is incorporated into the host cell and releases the viral RNA into the cytoplasm.

According to [6]–[10], the RBDs of SARS-CoV and SARS-CoV-2 have significant similarities in genome sequence and also use the same cellular entry receptor, namely ACE2. Because of the close relation between SARS-CoV and SARS-CoV-2, an important question arises: what are the significant differences (mutations) between them that make SARS-CoV-2 much more contagious and dangerous?
It is hypothesized that the mutations in the RBD of SARS-CoV-2 with respect to that of SARS-CoV can affect the binding affinity for the ACE2 receptor [8], [11]. In this study, we aim to answer the above question by analyzing the structural differences in the binding of the RBDs of two variants of SARS-CoV-2 and of SARS-CoV to the human ACE2 receptor.

FIGURE 1.1: The binding of coronavirus spike protein to human ACE2 receptor (the figure is from [5]).

One approach is to study the interactions of coronaviruses (including SARS-CoV-2) with the human ACE2 receptor using computational biophysics methods, such as molecular dynamics and unsupervised machine learning techniques. In this study, we use both. To investigate the binding mechanism of the complex of the RBD protein and the ACE2 receptor, conventional molecular dynamics is used to simulate the molecular interactions. The trajectories obtained from the molecular dynamics simulations are then used as input for principal component analysis (PCA) and a variational autoencoder (both unsupervised learning methods) to extract features (knowledge) of the binding. It is expected that, by understanding the binding mechanism between the viral RBDs and the ACE2 receptor, one can design and develop antibodies or antiviral drugs based on the binding features of the RBD of the SARS-CoV-2 spike protein. The SARS-CoV-2 spike protein is the main target for antibody and antiviral drug design throughout the ...

... system is the largest and that of the 2AJF system is the smallest. This implies that the ACE2 receptor is more flexible in the 6M0J system than in the 2AJF system. In contrast to the trace of the ACE2 receptor, the traces of the viral RBD decrease from the 2AJF system to the 6M0J system. By a similar argument, this result is expected. It also shows that the viral RBDs of the two new viruses are more stable than the SARS-CoV RBD.

TABLE 4.1: The trace of the covariance matrix of the
projections of the protein backbones on the two largest principal components.

Trace (nm^2)     2AJF      6VW1      6M0J
ACE2 receptor    10.118    15.500    19.197
viral RBD         3.012     2.729     1.820

Figure 4.7 shows the probability density in the plane of the two largest principal components from the PCA of the backbone structure of the proteins. The projections on the third-largest principal component follow a simple normal distribution and are therefore not shown in this thesis. In Figure 4.7, the brightness of an area indicates the likelihood that the system configuration is localized in that area. The sharper and brighter a region is, the more the system prefers to stay in that region, and vice versa.

In the first row of Figure 4.7 (three subfigures), the sharpest and brightest areas for the ACE2 receptor appear in the 2AJF system. Despite the large number of spots, the ACE2 receptor is comparatively stable. The two remaining subfigures, for the 6VW1 and 6M0J systems, show dim and blurred spots, implying that even when only the two most prominent motion modes are considered, the systems still tend to vibrate freely.

The second row of Figure 4.7 (three subfigures) tells the opposite story. The probability density of the RBD protein backbone of the 2AJF system is very dim and unclear. On the other hand, the probability density in the two remaining subfigures, for 6VW1 and 6M0J, is bright and very sharp. This means that the RBD protein in the 6VW1 and 6M0J systems is strongly localized and stable throughout the simulation. These results are in good agreement with the previous observations that the RBDs of the SARS-CoV-2 viruses are more stable than that of the SARS-CoV virus, while their ACE2 receptors vibrate more strongly. A reasonable explanation is that the RBD protein binds more strongly and more stably to ACE2 in the 6M0J and 6VW1 systems, so that the two vibrate together.

FIGURE 4.7: The probability density in the plane of the two largest principal components from the PCA of the backbone structure of the proteins. Notably, the colorbar scale
is the same for all RBD panels and for all ACE2 panels, but differs between RBD and ACE2. The same holds for the axis ranges.

4.4 Machine Learning on 6M0J System

Figure 4.8 shows the latent space projection of the variational autoencoder (VAE) trained on the distance matrix of the RBD-ACE2 complex of the 6M0J system. Initially, the latent space was expected to group the system configurations into clusters according to their potential energy. However, the results shown in Figure 4.8 are surprising. The first observation from running the VAE on the 6M0J system is that the latent space arranges the protein structures linearly according to their simulation time instead of their potential energy (Figures (b) and (d)). In MD simulations of biophysical protein systems, a simulation time of 1 ns is not very long but is still long enough for proteins to undergo significant structural changes. During this VAE training, the input protein structures are 1 ns apart in simulation time and are shuffled. Nevertheless, the VAE somehow learns to organize the data representation linearly in time in the latent space (Figure (b)). The application of this result is promising, such as predicting the next system configurations based on some simulated configurations. Nevertheless, this result is not yet complete and still needs further investigation.

FIGURE 4.8: Latent space projection of variational autoencoder trained on the distance matrix of RBD-ACE2 complex of 6M0J. Training was stopped after 60 epochs. Figure (a) shows the projection without data labels. Figures (b) and (d) show the projection with the data labeled by time frame and potential energy, respectively. Figure (c) shows the histogram of the projection points from Figure (a).

The second observation is that there are clearly two clusters of the representative data in the latent space. Figure (c) shows two distinct regions of the representative data (colored deep dark blue). Moreover, these two regions exactly match two
different simulation stages according to Figure (b). Combining these results, we can conclude that during the simulation, the 6M0J system moves from one local minimum state to another. This transition was not observed in any of the previous MD analyses. Moreover, it is natural that thermal fluctuations within a local minimum still cause different values of the potential energy of the system (shown in Figure (d)). The upper region is bigger and covers about two-thirds of the color spectrum or, equivalently, two-thirds of the simulation time. Therefore, this transition between local minimum states happened only recently in the simulation, and further simulations and investigations are needed.

In addition, some additional results of the VAE training are shown in Appendix B. The difference between these results is the number of epochs. As the number of epochs increases, the distribution of representative points in the latent space tends to form a line. This is not a good result, because the first latent dimension appears to depend linearly on the second latent dimension at epoch 100, so the representation in the latent space would be meaningless. It is presumed that the model is overfitted, and this needs more investigation in the future.

CONCLUSIONS

In this thesis, three systems of the human ACE2 receptor interacting with the viral RBDs of the SARS-CoV virus and of two variants of the SARS-CoV-2 virus are modeled and simulated using MD. According to several sequence and structural analyses, the two variants of SARS-CoV-2 bind considerably more strongly and more stably to the human ACE2 receptor than the SARS-CoV virus does. Moreover, the stronger binding can affect the structure of the human receptor, making it vibrate more and become less stable.

From the sequence alignments, four important mutations in the RBDs of the SARS-CoV-2 viruses are detected. These four mutations are all near the binding interface and play important roles in making the SARS-CoV-2 viruses bind more strongly and more stably to the human
ACE2 receptors. The RMSD and RMSF analyses of the protein backbone show the detailed differences in structure between the SARS-CoV virus and the other two new viruses. The results indicate that the four significant mutations make the RBD fluctuate less and bind more strongly to the ACE2 receptor. In particular, the mutation from -PP to GVE makes the backbone more flexible, allowing it to move closer to the ACE2 receptor and attach to it more tightly. This mutation has also been confirmed by other methods [30].

The PCA is in good agreement with the previous results. Indeed, the PCA visually displays the projections of the backbone structure of the proteins on the two largest principal components, which correspond to the principal motion modes. The VAE learning does not point in the same direction as the previous analyses and therefore does not support the above results. However, the VAE can, interestingly, learn to arrange the shuffled protein structures in order of simulation time in the latent space. The application of this result is promising, but further investigation is still needed.

The discovery of the four important mutation positions in the SARS-CoV-2 viruses is valuable information. One can exploit the changes in the viral sequence and structure to design and develop rational antibody or antiviral drug candidates from the molecular dynamics point of view, to address the critical demands of ongoing pandemics.

REFERENCES

[1] Chen Wang, Peter Horby, Frederick Hayden, et al. “A novel coronavirus outbreak of global health concern”. In: The Lancet 395 (Jan. 2020), pp. 470–473. DOI: 10.1016/S0140-6736(20)30185-9
[2] Sandrine Belouzard, Jean Millet, Beth Licitra, et al. “Mechanisms of Coronavirus Cell Entry Mediated by the Viral Spike Protein”. In: Viruses (June 2012), pp. 1011–1033. DOI: 10.3390/v4061011
[3] Sara Sieczkarski and Gary Whittaker. “Dissecting virus entry via endocytosis”. In: The Journal of General Virology 83 (Aug. 2002), pp. 1535–1545. DOI: 10.1099/0022-1317-83-7-1535
[4] Wenhui Li, Michael Moore, Natalya Vasilieva, et al. “Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus”. In: Nature 426 (Dec. 2003), pp. 450–454. DOI: 10.1038/nature02145
[5] Owen Wiese, Annalise Zemlin, and Tahir Pillay. “Molecules in pathogenesis: angiotensin converting enzyme 2 (ACE2)”. In: Journal of Clinical Pathology 74 (Aug. 2020), pp. 285–290. DOI: 10.1136/jclinpath-2020-206954
[6] Peng Zhou, Xinglou Yang, Xian-Guang Wang, et al. “A pneumonia outbreak associated with a new coronavirus of probable bat origin”. In: Nature 579 (Mar. 2020), pp. 270–273. DOI: 10.1038/s41586-020-2012-7
[7] Jun Lan, Jiwan Ge, Jinfang Yu, et al. “Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor”. In: Nature 581 (May 2020), pp. 1–. DOI: 10.1038/s41586-020-2180-5
[8] Yushun Wan, Jian Shang, Rachel Graham, et al. “Receptor recognition by novel coronavirus from Wuhan: An analysis based on decade-long structural studies of SARS”. In: Journal of Virology 94 (Jan. 2020), e00127–20. DOI: 10.1128/JVI.00127-20
[9] Wanbo Tai, Lei He, Xiujuan Zhang, et al. “Characterization of the receptor-binding domain (RBD) of 2019 novel coronavirus: implication for development of RBD protein as a viral attachment inhibitor and vaccine”. In: Cellular & Molecular Immunology 17 (Mar. 2020), pp. 1–8. DOI: 10.1038/s41423-020-0400-4
[10] Markus Hoffmann, Hannah Kleine-Weber, Simon Schroeder, et al. “SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor”. In: Cell 181 (Mar. 2020), pp. 271–280. DOI: 10.1016/j.cell.2020.02.052
[11] Kristian Andersen, Andrew Rambaut, W. Lipkin, et al. “The proximal origin of SARS-CoV-2”. In: Nature Medicine 26 (Mar. 2020), pp. 1–3. DOI: 10.1038/s41591-020-0820-9
[13] Kumaran Baskaran. “Protein Data Bank: the single global archive for 3D macromolecular structure data”. In: Nucleic Acids Research 47 (Oct. 2018), pp. D520–D528. DOI: 10.1093/nar/gky949
[14] Fang Li, Wenhui Li, Michael Farzan, et al. “Structure of SARS Coronavirus Spike Receptor-Binding Domain Complexed with Receptor”. In: Science (New York, N.Y.) 309 (Oct. 2005), pp. 1864–1868. DOI: 10.1126/science.1116480
[15] Jun Lan, Jiwan Ge, Jinfang Yu, et al. “Crystal structure of the 2019-nCoV spike receptor-binding domain bound with the ACE2 receptor”. In: bioRxiv (Feb. 2020). DOI: 10.1101/2020.02.19.956235
[16] Jian Shang, Gang Ye, Ke Shi, et al. “Structural basis for receptor recognition by the novel coronavirus from Wuhan”. In: Research Square (Feb. 2020). DOI: 10.21203/rs.2.24749/v1
[17] Julie Thompson, Toby Gibson, and Des Higgins. “Multiple Sequence Alignment Using ClustalW and ClustalX”. In: Current Protocols in Bioinformatics / editorial board, Andreas D. Baxevanis [et al.] Chapter (Sept. 2002), pp. 2–3. DOI: 10.1002/0471250953.bi0203s00
[18] David Mount. “Using BLOSUM in sequence alignments”. In: CSH Protocols 2008 (June 2008), pdb.top39. DOI: 10.1101/pdb.top39
[19] Sunhwan Jo, Taehoon Kim, Vidyashankara Iyer, et al. “CHARMM-GUI: a web-based graphical user interface for CHARMM”. In: Journal of Computational Chemistry 29 (Aug. 2008), pp. 1859–1865. DOI: 10.1002/jcc.20945
[20] Herman Berendsen, David van der Spoel, and Rudi Drunen. “GROMACS: A message-passing parallel molecular dynamics implementation”. In: Computer Physics Communications 91 (Sept. 1995), pp. 43–56. DOI: 10.1016/0010-4655(95)00042-E
[21] Jing Huang and Alexander MacKerell. “CHARMM36 all-atom additive protein force field: Validation based on comparison to NMR data”. In: Journal of Computational Chemistry 34 (Sept. 2013), pp. 2135–2145. DOI: 10.1002/jcc.23354
[22] Karl Kirschner, Austin Yongye, Sarah Tschampel, et al. “GLYCAM06: A generalizable biomolecular force field”. In: Journal of Computational Chemistry 29 (Mar. 2008), pp. 622–655. DOI: 10.1002/jcc.20820
[23] Yaxiong Sun and Peter Kollman. “Hydrophobic Solvation of Methane and Nonbond Parameters of the TIP3P Water Model”. In: Journal of Computational Chemistry 16 (Sept. 1995), pp. 1164–1169. DOI: 10.1002/jcc.540160910
[25] Diederik P. Kingma and Max
Welling. “Auto-encoding variational Bayes”. In: arXiv preprint arXiv:1312.6114 (Dec. 2013)
[26] Matteo Degiacomi. “Coupling Molecular Dynamics and Deep Learning to Mine Protein Conformational Space”. In: Structure 27 (Apr. 2019), pp. 1034–1040. DOI: 10.1016/j.str.2019.03.018
[27] Naveen Michaud-Agrawal, Elizabeth Denning, Thomas Woolf, et al. “MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations”. In: Journal of Computational Chemistry 32 (July 2011), pp. 2319–2327
[28] François Chollet et al. “Keras: Deep learning library for Theano and TensorFlow”. In: URL: https://keras.io/k 7.8 (2015), T1
[30] Ashwaq Omar, Pratibha Manickavasagam, and Sk Haque. “V483A: an emerging mutation hotspot of SARS-CoV-2”. In: Future Virology 16 (June 2021). DOI: 10.2217/fvl-2020-0384

Appendix A
IN-HOUSE SOURCE CODE

A.1 Data Pre-processing Source Code

import MDAnalysis as mda
from MDAnalysis.analysis import align, rms
from MDAnalysis import transformations
import numpy as np
from numpy.linalg import norm
from sklearn.preprocessing import MinMaxScaler

# Load the topology and the trajectory of the 6M0J system.
trj = mda.Universe('6m0j/newtpr.tpr', '6m0j/md_fit_dt200.xtc')

# Write the first frame to disk and load it back as the alignment reference.
trj.trajectory[0]
W = trj.select_atoms('all')
W.write('6m0j/tmp.gro')
ref = mda.Universe('6m0j/tmp.gro')

# Align the trajectory to the reference on the backbone and C-beta atoms.
selection = 'backbone or name CB'
alignment = align.AlignTraj(trj, ref, select=selection)
alignment.run()


def neighbor_distance(positions, n_neibor=3):
    # Distances from each atom to the next n_neibor atoms in selection order.
    dis = []
    n = positions.shape[0]
    for i in range(n - 1):
        for j in range(i + 1, i + n_neibor + 1):
            if i == j or j > n - 1:
                continue
            dis.append(norm(positions[i] - positions[j]))
    return dis


dis_data = []
for iframe in range(trj.trajectory.n_frames):
    trj.trajectory[iframe]
    time_ = trj.trajectory.time
    # Keep only frames whose time stamp is a multiple of 1000 (one per ns if time is in ps).
    if time_ % 1000 != 0:
        continue
    ts = trj.select_atoms(selection)
    pos = ts.positions
    dis = neighbor_distance(pos)
    dis_data.append(dis)

# Scale every distance feature to the range [0, 1] and save the dataset.
scaler = MinMaxScaler()
dis_data = scaler.fit_transform(dis_data)
np.savetxt('dis_data_2001.data', dis_data)

A.2 Autoencoder Source Code
import numpy as np
from numpy.linalg import norm
from matplotlib import pyplot as plt
from matplotlib import gridspec as grs
import matplotlib
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split as tts
import keras
import tensorflow as tf
from keras import layers
from keras import backend as K
from tensorflow.keras.initializers import GlorotUniform
from tensorflow.keras import regularizers

# Hyperparameters: hidden layer widths, number of epochs, batch size.
layer = [1340, 153, 18]
nepoch = 100
nbatch = 64
seedid = None   # value missing in the source text; the variable is not used below
latent_dim = 2  # value missing in the source text; assumed 2 from the two latent axes plotted below

tf.random.set_seed(2022)
np.random.seed(23)
init = GlorotUniform(seed=15062021)
opti = tf.keras.optimizers.Adam(1e-3)
kregularizer = None

# Load the scaled distance features and the (time frame, potential energy) labels.
x_data = np.loadtxt('dis_data_2001.data')
y_data = []
with open('6m0j/energy_every1000.xvg') as f:
    for line in f:
        content = line.split()
        timeframe = float(content[0])
        potential = float(content[1])
        y_data.append([timeframe, potential])
y_data = np.array(y_data)
print(y_data[:, 0], y_data[:, 1])

x_train, x_test, y_train, y_test = tts(x_data, y_data, test_size=0.3)

# Encoder: stacked dense layers followed by the two latent heads.
original_dim = x_train.shape[1]
inputs = keras.Input(shape=(original_dim,))
h = layers.Dense(layer[0], activation='relu', kernel_initializer=init,
                 kernel_regularizer=kregularizer)(inputs)
for nnode in layer[1:]:
    h = layers.Dense(nnode, activation='relu', kernel_initializer=init,
                     kernel_regularizer=kregularizer)(h)
z_mean = layers.Dense(latent_dim, kernel_initializer=init,
                      kernel_regularizer=kregularizer)(h)
z_log_sigma = layers.Dense(latent_dim, kernel_initializer=init,
                           kernel_regularizer=kregularizer)(h)


def sampling(args):
    # Reparameterization: draw z from z_mean and z_log_sigma with Gaussian noise.
    z_mean, z_log_sigma = args
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim),
                              mean=0, stddev=0.1, seed=2021)
    return z_mean + K.exp(z_log_sigma) * epsilon


z = layers.Lambda(sampling)([z_mean, z_log_sigma])
encoder = keras.Model(inputs, [z_mean, z_log_sigma, z], name='encoder')

# Decoder: mirror of the encoder, ending in a sigmoid output layer.
latent_inputs = keras.Input(shape=(latent_dim,), name='z_sampling')
x = layers.Dense(layer[-1], activation='relu', kernel_initializer=init,
                 kernel_regularizer=kregularizer)(latent_inputs)
for i in range(len(layer) - 2, -1, -1):
    x = layers.Dense(layer[i], activation='relu', kernel_initializer=init,
                     kernel_regularizer=kregularizer)(x)
outputs = layers.Dense(original_dim, activation='sigmoid', kernel_initializer=init,
                       kernel_regularizer=kregularizer)(x)
decoder = keras.Model(latent_inputs, outputs, name='decoder')

# Full VAE: encode, sample z, decode.
outputs = decoder(encoder(inputs)[2])
vae = keras.Model(inputs, outputs, name='vae_mlp')

# Loss: reconstruction term plus the KL regularizer
# (the constant 1 is missing in the source text and is restored here).
reconstruction_loss = keras.losses.binary_crossentropy(inputs, outputs)
reconstruction_loss *= original_dim
kl_loss = 1 + z_log_sigma - K.square(z_mean) - K.exp(z_log_sigma)
kl_loss = K.sum(kl_loss, axis=-1)
kl_loss *= -0.5
vae_loss = K.mean(reconstruction_loss + kl_loss)
vae.add_loss(vae_loss)
vae.compile(optimizer=opti, metrics=['mse', 'mae'])
vae.summary()

history = vae.fit(x_train, x_train, epochs=nepoch, batch_size=nbatch,
                  validation_data=(x_test, x_test))

# Project the test set into the latent space; index 0 selects z_mean.
x_test_encoded = encoder.predict(x_test, batch_size=nbatch)
x_test_encoded = np.array(x_test_encoded)

plt.figure(figsize=(18, 12))
plt.scatter(x_test_encoded[0, :, 0], x_test_encoded[0, :, 1], s=100)
plt.grid()

sns.set(color_codes=True)
plt.figure(figsize=(20, 12))
ax = sns.kdeplot(x_test_encoded[0, :, 0], x_test_encoded[0, :, 1],
                 n_levels=15, cmap="Blues", shade=True, shade_lowest=False)

Appendix B
ADDITIONAL VAE RESULTS

FIGURE B.1: Latent space projection of variational autoencoder trained on the distance matrix of RBD-ACE2 complex of 6M0J. Training was stopped after 10 epochs. Figure (a) shows the projection without data labels. Figures (b) and (d) show the projection with the data labeled by time frame and potential energy, respectively. Figure (c) shows the histogram of the
projection points from Figure (a).

FIGURE B.2: Latent space projection of variational autoencoder trained on the distance matrix of RBD-ACE2 complex of 6M0J. Training was stopped after 50 epochs. Figure (a) shows the projection without data labels. Figures (b) and (d) show the projection with the data labeled by time frame and potential energy, respectively. Figure (c) shows the histogram of the projection points from Figure (a).

FIGURE B.3: Latent space projection of variational autoencoder trained on the distance matrix of RBD-ACE2 complex of 6M0J. Training was stopped after 100 epochs. Figure (a) shows the projection without data labels. Figures (b) and (d) show the projection with the data labeled by time frame and potential energy, respectively. Figure (c) shows the histogram of the projection points from Figure (a).
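
The tendency discussed in Section 4.4, that the latent points collapse onto a line as the number of epochs grows (Figures B.1 to B.3), could also be checked numerically rather than by eye. The following is a minimal sketch and not part of the in-house code: it assumes the encoder output is available as an array of shape (n_points, 2), and the function name latent_linearity and the small synthetic point sets are hypothetical. The score is one minus the ratio of the smaller to the larger eigenvalue of the 2x2 covariance matrix of the latent points, so it is close to 1 when the points lie on a straight line and close to 0 when they spread equally in both directions.

```python
import numpy as np


def latent_linearity(z):
    """Score in [0, 1]: near 1 if the 2-D points lie on a line, near 0 if isotropic.

    Hypothetical helper for illustration; z has shape (n_points, 2).
    """
    z = np.asarray(z, dtype=float)
    if z.ndim != 2 or z.shape[1] != 2 or z.shape[0] < 2:
        raise ValueError("expected an array of shape (n_points, 2) with n_points >= 2")
    # 2x2 covariance matrix of the two latent coordinates.
    cov = np.cov(z, rowvar=False)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    lam_min, lam_max = np.linalg.eigvalsh(cov)
    if lam_max <= 0.0:
        raise ValueError("all points coincide; linearity is undefined")
    # Clip tiny negative round-off in the smaller eigenvalue.
    return 1.0 - max(lam_min, 0.0) / lam_max


if __name__ == "__main__":
    # Synthetic stand-ins for encoder outputs, not data from the thesis.
    on_a_line = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
    isotropic = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    print(latent_linearity(on_a_line), latent_linearity(isotropic))
```

Applied to the z_mean projections saved after 10, 50, 60, and 100 epochs, a score that rises toward 1 would be consistent with the collapse described above. Unlike a plain correlation coefficient, this score also handles lines parallel to one of the latent axes.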

Date posted: 12/12/2021, 21:01
