2.2 RESEARCH METHODS
2.2.2 Preparing the input files
Separating the protein and the ligand from the complex into separate files
Open PDB entry 3WZD in Chimera.
Menu Select -> Structure -> Protein
File -> Save PDB -> protein_extracted_from3WZD.pdb (save selected atoms only)
Menu Select -> Residue -> LEV
File -> Save PDB -> LEV_extracted_from3WZD.pdb (save selected atoms only)
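The same separation can be approximated outside Chimera with a few lines of Python. This is only a sketch: it assumes standard fixed-column PDB records, that the protein sits in ATOM records, and that the ligand appears as HETATM records with residue name LEV. The function name split_complex and the example coordinates are made up for illustration, and Chimera's own selection logic may differ.

```python
def split_complex(pdb_text, ligand_resname="LEV"):
    """Split PDB text into (protein_lines, ligand_lines).

    Assumes fixed-column PDB format: record name in columns 1-6,
    residue name in columns 18-20.
    """
    protein, ligand = [], []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        resname = line[17:20].strip()
        if record == "ATOM":
            protein.append(line)
        elif record == "HETATM" and resname == ligand_resname:
            ligand.append(line)
    return protein, ligand


# Made-up three-line example (coordinates are illustrative only).
example = "\n".join([
    "ATOM      1  N   ASP A 814      10.000  10.000  10.000  1.00 20.00           N",
    "HETATM    2  C1  LEV A1201      12.000  12.000  12.000  1.00 20.00           C",
    "HETATM    3  O   HOH A1301      15.000  15.000  15.000  1.00 20.00           O",
])
protein_lines, ligand_lines = split_complex(example)
print(len(protein_lines), len(ligand_lines))
```

Water and other hetero groups are dropped from both outputs, which mirrors saving only the selected atoms.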
Removing atoms with alternate locations from protein.pdb
Open protein.pdb in PyMOL and enter the following commands:
remove not alt ''+A
alter all, alt=''
Save the structure as 3WZD_protein_A_alternateonly.pdb
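For readers without PyMOL, the effect of the two commands can be sketched in plain Python: keep only atoms whose alternate-location field is blank or A, then blank that field. The sketch assumes fixed-column PDB records with the altLoc indicator in column 17. The function name keep_altloc_a and the sample atoms are hypothetical.

```python
def keep_altloc_a(pdb_lines):
    """Keep atoms whose altLoc (column 17) is blank or 'A', then blank the field."""
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            altloc = line[16]
            if altloc not in (" ", "A"):
                continue  # drop conformers B, C, ...
            line = line[:16] + " " + line[17:]  # clear the altLoc flag
        kept.append(line)
    return kept


# Made-up atoms: one residue with conformers A and B, one without altLoc.
sample = [
    "ATOM      1  CA ASER A 900      10.000  10.000  10.000  0.50 20.00           C",
    "ATOM      2  CA BSER A 900      10.200  10.100  10.000  0.50 20.00           C",
    "ATOM      3  N   GLY A 901      11.000  11.000  11.000  1.00 20.00           N",
]
cleaned = keep_altloc_a(sample)
print(len(cleaned))
```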
Adding the missing residues with Modeller
Goal: add residues 814-819, 842-845, 1051-1066 and 1168-1172 (see the MISSING RESIDUES section, REMARK 465, of the 3WZD.pdb file downloaded from the Protein Data Bank).
Create the file createtemplate.py:
from modeller import *
# Get the sequence of the PDB file, and write to an alignment file
code = '3WZD_protein_A_alternateonly'
e = environ()
m = model(e, file=code)
aln = alignment(e)
aln.append_model(m, align_codes=code)
aln.write(file=code + '.seq')
Run the script:
python createtemplate.py
This produces the file 3WZD_protein_A_alternateonly.seq
Converting the three-letter amino acid codes to one-letter codes
residue 814-819
asp glu his cys glu arg
>results for sequence "Untitled" starting "ASPGLUHISCYS"
DEHCER
residue 842-845
arg gly ala phe
>results for sequence "Untitled" starting "argglyalaphe"
RGAF
residue 1051-1066
arg asp ile tyr lys asp pro asp tyr val arg lys gly asp ala arg
>results for sequence "Untitled" starting "argaspiletyr"
RDIYKDPDYVRKGDAR
residue 1168-1172
arg ala gln gln asp
>results for sequence "Untitled" starting "argalaglngln"
RAQQD
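The conversion above can also be reproduced with a small lookup table, which is convenient for double-checking the four segments. This is a minimal sketch covering only the 20 standard amino acids; the function name to_one_letter is hypothetical.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_one_letter(three_letter_seq):
    """Convert a whitespace-separated three-letter sequence to one-letter codes."""
    return "".join(THREE_TO_ONE[code.upper()] for code in three_letter_seq.split())


segments = {
    "814-819": "asp glu his cys glu arg",
    "842-845": "arg gly ala phe",
    "1051-1066": "arg asp ile tyr lys asp pro asp tyr val arg lys gly asp ala arg",
    "1168-1172": "arg ala gln gln asp",
}
for residues, seq in segments.items():
    print(residues, to_one_letter(seq))
```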
Insert the one-letter sequences above at the corresponding positions marked with / in 3WZD_protein_A_alternateonly.seq.
Note: if the insertion positions need to be computed by character index (as a double check), the deleted residues 941-990 must be taken into account. Also note that the .seq file contains LF line-break characters when opened (use Geany to open it).
Create align.ali from the contents of the .seq file above.
Delete all TER records in 3WZD_protein_A_alternateonly.pdb.
Create the file addresidue.py:
from modeller import *
from modeller.automodel import *    # Load the automodel class
log.verbose()
env = environ()
# directories for input atom files
env.io.atom_files_directory = ['.', './atom_files']
class MyModel(automodel):
    def select_atoms(self):
        # Only the inserted residues are selected for optimization
        return selection(self.residue_range('1', '6'),
                         self.residue_range('29', '32'),
                         self.residue_range('188', '203'),
                         self.residue_range('305', '309'))
a = MyModel(env, alnfile='align.ali',
            knowns='3WZD_protein_A_alternateonly',
            sequence='3WZD_fill')
a.starting_model = 1
a.ending_model = 1
a.make()
Run the script:
python addresidue.py
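The residue ranges passed to select_atoms are positions in the model sequence, not PDB residue numbers. The sketch below shows one way to cross-check them. It assumes the model sequence starts at PDB residue 814 (inferred from the first range, 1-6) and that residues 941-990 are absent from the construct, as stated in the note on deleted residues. The function name model_index is hypothetical.

```python
def model_index(pdb_resnum, first_resnum=814, deleted=(941, 990)):
    """Map a PDB residue number to its 1-based position in the model sequence."""
    del_start, del_end = deleted
    if del_start <= pdb_resnum <= del_end:
        raise ValueError("residue lies inside the deleted segment")
    index = pdb_resnum - first_resnum + 1
    if pdb_resnum > del_end:
        index -= del_end - del_start + 1  # skip the 50 deleted residues
    return index


missing = [(814, 819), (842, 845), (1051, 1066), (1168, 1172)]
ranges = [(model_index(start), model_index(end)) for start, end in missing]
print(ranges)  # [(1, 6), (29, 32), (188, 203), (305, 309)]
```

Under these assumptions the computed ranges match the four residue_range calls in addresidue.py.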
Preparing the ligand
Adding hydrogen atoms with Avogadro
Open LEV_extracted_from3WZD.pdb in Avogadro, add hydrogens, and save the result as LEV_protonated.pdb. Open the saved file and edit the chain and residue names that Avogadro assigned to the added hydrogens so that they match the original residue name, LEV.
Generating ligand parameters with acpype
Generate the ligand parameters with the GAFF force field and AM1-BCC (bcc) charges.
acpype.py -i LEV_protonated.pdb -c bcc -n 0 -m 1 -a gaff -o gmx
Open the output file LEV_protonated_GMX.itp in a text editor and check whether any hydrogen atom names look abnormal (for example, four characters long).
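The manual check of atom names can be supported by a short script. The sketch below assumes the usual GROMACS [ atoms ] layout, in which the atom name is the fifth column, and flags any name of four or more characters. The function name long_atom_names and the sample topology fragment (including its charges) are made up for illustration.

```python
def long_atom_names(itp_text, max_len=3):
    """Return atom names in the [ atoms ] section longer than max_len characters."""
    names, in_atoms = [], False
    for raw in itp_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments
        if not line:
            continue
        if line.startswith("["):
            in_atoms = line.replace(" ", "") == "[atoms]"
            continue
        if in_atoms:
            fields = line.split()
            if len(fields) >= 5 and len(fields[4]) > max_len:
                names.append(fields[4])
    return names


# Made-up fragment in the style of an acpype .itp file.
sample_itp = """
[ atoms ]
;   nr  type  resi  res  atom  cgnr     charge      mass
     1    c3     1  LEV    C1     1  -0.100000  12.01000
     2    hc     1  LEV    H1     2   0.050000   1.00800
     3    hc     1  LEV  H101     3   0.050000   1.00800
[ bonds ]
     1     2   1
"""
flagged = long_atom_names(sample_itp)
print(flagged)  # ['H101']
```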
Preparing the input files and running the MD calculation in Gromacs
Generate topology
gmx pdb2gmx -f 3WZD_fill.B99990001.pdb -o 3wzd_protein.gro -water spc
Select force field number 6: amber99sb-ILDN
Create the file 3wzd_complex.gro
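The text does not say how 3wzd_complex.gro is assembled. One common approach, sketched here as an assumption rather than as the procedure actually used, is to append the ligand atoms from LEV_protonated_GMX.gro to the protein atoms from 3wzd_protein.gro and update the atom count on the second line. The function name merge_gro and the tiny input files are hypothetical.

```python
def merge_gro(protein_gro, ligand_gro, title="Protein-ligand complex"):
    """Concatenate two .gro files: protein atoms first, then ligand atoms."""
    def atoms_and_box(text):
        lines = text.strip("\n").split("\n")
        atoms, box = lines[2:-1], lines[-1]
        if len(atoms) != int(lines[1]):
            raise ValueError("atom count on line 2 does not match the atom lines")
        return atoms, box

    p_atoms, p_box = atoms_and_box(protein_gro)
    l_atoms, _ = atoms_and_box(ligand_gro)
    total = len(p_atoms) + len(l_atoms)
    # Atom numbers are left unchanged in this sketch; the protein box is kept.
    return "\n".join([title, f"{total:5d}", *p_atoms, *l_atoms, p_box]) + "\n"


# Tiny made-up inputs (coordinates are illustrative only).
protein = (
    "Protein\n"
    "    2\n"
    "  814ASP      N    1   1.000   1.000   1.000\n"
    "  814ASP     CA    2   1.100   1.000   1.000\n"
    "   5.00000   5.00000   5.00000\n"
)
ligand = (
    "Ligand\n"
    "    1\n"
    "    1LEV     C1    1   2.000   2.000   2.000\n"
    "   0.00000   0.00000   0.00000\n"
)
merged = merge_gro(protein, ligand)
print(merged)
```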
Define box and solvate
gmx editconf -f 3wzd_complex.gro -o newbox.gro -bt
Add ions
gmx genion -s ions.tpr -o solv_ions.gro -p topol.top -pname NA -nname CL -nn 0
Energy minimization
The contents of em.mdp are as follows:
; LINES STARTING WITH ';' ARE COMMENTS
title = Minimization ; Title of run
; Parameters describing what to do, when to stop and what to save
integrator = steep ; Algorithm (steep = steepest descent minimization)
emtol = 1000.0 ; Stop minimization when the maximum force < 1000.0 kJ/mol/nm
emstep = 0.01 ; Energy step size
nsteps = 50000 ; Maximum number of (minimization) steps to perform
energygrps = Protein LEV ; Which energy group(s) to write to disk
; Parameters describing how to find the neighbors of each atom and how to calculate the interactions
nstlist = 1 ; Frequency to update the neighbor list and long range forces
cutoff-scheme = Verlet
ns_type = grid ; Method to determine neighbor list (simple, grid)
rlist = 1.0 ; Cut-off for making neighbor list (short range forces)
coulombtype = PME ; Treatment of long range electrostatic interactions
rcoulomb = 1.0 ; long range electrostatic cut-off
rvdw = 1.0 ; long range Van der Waals cut-off
pbc = xyz ; Periodic Boundary Conditions
Run the commands:
gmx grompp -f em.mdp -c solv_ions.gro -p topol.top -o em.tpr
gmx mdrun -deffnm em -v
Equilibration
Restraining the ligand
gmx genrestr -f LEV_protonated_GMX.gro -o posre_LEV.itp -fc 1000 1000 1000
Insert the following into topol.top:
; Include forcefield parameters
#include "amber99sb-ildn.ff/forcefield.itp"
; Include ligand topology file
#include "LEV_protonated_GMX.itp"
; Ligand position restraints
#ifdef POSRES
#include "posre_LEV.itp"
#endif
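The three values given to -fc are the harmonic force constants along x, y and z, in kJ mol^-1 nm^-2. The restraint energy of one atom is half the sum of each force constant times the squared displacement from the reference position along that axis. The sketch below evaluates that expression to give a feel for the restraint strength. The function name posre_energy and the sample displacement are illustrative assumptions.

```python
def posre_energy(displacement_nm, fc=(1000.0, 1000.0, 1000.0)):
    """Harmonic position-restraint energy (kJ/mol) for one atom.

    displacement_nm: (dx, dy, dz) from the reference position, in nm.
    fc: force constants along x, y, z in kJ mol^-1 nm^-2.
    """
    return 0.5 * sum(k * d * d for k, d in zip(fc, displacement_nm))


# An atom displaced by 0.1 nm (1 angstrom) along x from its reference position.
energy = posre_energy((0.1, 0.0, 0.0))
print(f"{energy:.2f} kJ/mol")
```

With a force constant of 1000 kJ mol^-1 nm^-2, a displacement of 0.1 nm costs about 5 kJ/mol per atom. The restraints only take effect when POSRES is defined, which the equilibration .mdp files do through define = -DPOSRES.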
Thermostat and barostat
The contents of the two files nvt.mdp and npt.mdp are as follows:
nvt.mdp:
title = Protein-ligand complex NVT equilibration
define = -DPOSRES ; position restrain the protein and ligand
; Run parameters
integrator = md ; leap-frog integrator
nsteps = 100000 ; 2 fs * 100000 = 200 ps
dt = 0.002 ; 2 fs
; Output control
nstxout = 0 ; suppress .trr output
nstvout = 0 ; suppress .trr output
nstenergy = 500 ; save energies
nstlog = 500 ; update log file
energygrps = Protein LEV
; Bond parameters
continuation = no ; first dynamics run
constraint_algorithm = lincs ; holonomic constraints
; Neighborsearching
cutoff-scheme = Verlet
ns_type = grid ; search neighboring grid cells
nstlist = 40 ; 80 fs, largely irrelevant with Verlet
rcoulomb = 1.0 ; short-range electrostatic cutoff (in nm)
rvdw = 1.0 ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
pme_order = 4 ; cubic interpolation
fourierspacing = 0.1 ; grid spacing for FFT
; Temperature coupling
tcoupl = V-rescale ; modified Berendsen thermostat
tc-grps = Protein_LEV Water ; two coupling groups - more accurate
tau_t = 0.1 0.1 ; time constant, in ps
ref_t = 300 300 ; reference temperature, one for each group, in K
; Pressure coupling
pcoupl = no ; no pressure coupling in NVT
; Periodic boundary conditions
pbc = xyz ; 3-D PBC
; Dispersion correction
DispCorr = EnerPres ; account for cut-off vdW scheme
; Velocity generation
gen_vel = yes ; assign velocities from Maxwell distribution
gen_temp = 300 ; temperature for Maxwell distribution
gen_seed = -1 ; generate a random seed
npt.mdp:
title = Protein-ligand complex NPT equilibration
define = -DPOSRES ; position restrain the protein and ligand
; Run parameters
integrator = md ; leap-frog integrator
nsteps = 1000000 ; 2 fs * 1000000 = 2 ns
dt = 0.002 ; 2 fs
; Output control
nstxout = 0 ; suppress .trr output
nstvout = 0 ; suppress .trr output
nstenergy = 500 ; save energies every 1.0 ps
nstlog = 500 ; update log file every 1.0 ps
energygrps = Protein LEV
; Bond parameters
continuation = yes ; continuing from NVT
constraint_algorithm = lincs ; holonomic constraints
constraints = all-bonds ; all bonds (even heavy atom-H bonds) constrained
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 4 ; also related to accuracy
; Neighborsearching
cutoff-scheme = Verlet
ns_type = grid ; search neighboring grid cells
nstlist = 40 ; 80 fs, largely irrelevant with Verlet
rcoulomb = 1.0 ; short-range electrostatic cutoff (in nm)
rvdw = 1.0 ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
pme_order = 4 ; cubic interpolation
fourierspacing = 0.1 ; grid spacing for FFT
; Temperature coupling
tcoupl = V-rescale ; modified Berendsen thermostat
tc-grps = Protein_LEV Water ; two coupling groups - more accurate
tau_t = 0.1 0.1 ; time constant, in ps
ref_t = 300 300 ; reference temperature, one for each group, in K
; Pressure coupling
pcoupl = Parrinello-Rahman ; pressure coupling is on for NPT
pcoupltype = isotropic ; uniform scaling of box vectors
tau_p = 2.0 ; time constant, in ps
ref_p = 1.0 ; reference pressure, in bar
compressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1
; Dispersion correction
DispCorr = EnerPres ; account for cut-off vdW scheme
; Velocity generation
gen_vel = no ; velocity generation off after NVT
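The run length implied by an .mdp file is nsteps multiplied by dt, and the output interval is nstenergy multiplied by dt. The sketch below parses the key = value lines of an .mdp file and computes both. It is a minimal illustration, not a full .mdp parser, and the helper names parse_mdp and run_length_ps are hypothetical. The sample text reuses the NVT values listed above.

```python
def parse_mdp(text):
    """Parse 'key = value ; comment' lines into a dict of strings."""
    params = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # strip comments
        if "=" in line:
            key, value = line.split("=", 1)
            # Treat dashes and underscores in option names as equivalent.
            params[key.strip().replace("-", "_")] = value.strip()
    return params


def run_length_ps(params):
    """Total simulated time in ps: number of steps times the time step."""
    return int(params["nsteps"]) * float(params["dt"])


nvt = parse_mdp("""
integrator = md ; leap-frog integrator
nsteps = 100000
dt = 0.002 ; 2 fs
nstenergy = 500 ; save energies
""")
total_ps = run_length_ps(nvt)
energy_interval_ps = int(nvt["nstenergy"]) * float(nvt["dt"])
print(f"{total_ps:.0f} ps total, energies every {energy_interval_ps:.1f} ps")
```

For the NVT settings this gives 200 ps of simulation with energies written every 1 ps.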
Run the commands:
gmx make_ndx -f em.gro -o index.ndx
At the make_ndx prompt, merge groups 1 and 13 (protein and ligand) into the Protein_LEV group referenced by tc-grps, then quit:
1|13
q
gmx grompp -f nvt.mdp -c em.gro -p topol.top -n index.ndx -o nvt.tpr
gmx mdrun -deffnm nvt -v
gmx grompp -f npt.mdp -c nvt.gro -t nvt.cpt -p topol.top -n index.ndx -o npt.tpr
gmx mdrun -deffnm npt -v
Production MD
The contents of md.mdp are as follows:
title = Protein-ligand complex MD simulation
; Run parameters
integrator = md ; leap-frog integrator
nsteps = 50000000 ; 2 fs * 50000000 = 100 ns
dt = 0.002 ; 2 fs
; Output control
nstxout = 0 ; suppress .trr output
nstvout = 0 ; suppress .trr output
nstenergy = 10000 ; save energies every 20.0 ps
nstlog = 10000 ; update log file every 20.0 ps
nstxout-compressed = 10000 ; write .xtc trajectory every 20.0 ps
compressed-x-grps = System
energygrps = Protein LEV
; Bond parameters
continuation = yes ; continuing from NPT
constraint_algorithm = lincs ; holonomic constraints
constraints = all-bonds ; all bonds (even heavy atom-H bonds) constrained
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 4 ; also related to accuracy
; Neighborsearching
cutoff-scheme = Verlet
ns_type = grid ; search neighboring grid cells
rvdw = 1.0 ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
pme_order = 4 ; cubic interpolation
fourierspacing = 0.1 ; grid spacing for FFT
; Temperature coupling
tcoupl = V-rescale ; modified Berendsen thermostat
tc-grps = Protein_LEV Water ; two coupling groups - more accurate
tau_t = 0.1 0.1 ; time constant, in ps
ref_t = 300 300 ; reference temperature, one for each group, in K
; Pressure coupling
pcoupl = Parrinello-Rahman ; pressure coupling is on for NPT
pcoupltype = isotropic ; uniform scaling of box vectors
tau_p = 2.0 ; time constant, in ps
ref_p = 1.0 ; reference pressure, in bar
compressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1
; Periodic boundary conditions
pbc = xyz ; 3-D PBC
; Dispersion correction
DispCorr = EnerPres ; account for cut-off vdW scheme
; Velocity generation
gen_vel = no ; velocity generation off (continuing from NPT)
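The output settings determine how many trajectory frames the production run yields, which is worth knowing before planning the analysis. The sketch below works this out from nsteps, dt and nstxout-compressed. It assumes a frame is also written at step 0, and the function name trajectory_summary is hypothetical.

```python
def trajectory_summary(nsteps, dt_ps, nstxout_compressed):
    """Return (total time in ns, frame spacing in ps, number of .xtc frames)."""
    total_ns = nsteps * dt_ps / 1000.0
    interval_ps = nstxout_compressed * dt_ps
    n_frames = nsteps // nstxout_compressed + 1  # +1 for the frame at step 0
    return total_ns, interval_ps, n_frames


total_ns, interval_ps, n_frames = trajectory_summary(
    nsteps=50_000_000, dt_ps=0.002, nstxout_compressed=10_000
)
print(f"{total_ns:.0f} ns, one frame every {interval_ps:.0f} ps, {n_frames} frames")
```

For the values in md.mdp this corresponds to 100 ns sampled every 20 ps.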
Run the commands:
gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -n index.ndx -o md.tpr
gmx mdrun -deffnm md -v
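The NVT, NPT and production stages follow the same grompp/mdrun pattern: each stage starts from the previous stage's coordinates, and all but the first also read its checkpoint. The sketch below only builds the command lines as argument lists so the pattern is explicit. The helper names grompp_command and mdrun_command are hypothetical, and nothing is executed.

```python
def grompp_command(stage, previous, use_checkpoint=True, index="index.ndx"):
    """Build the gmx grompp argument list for one stage."""
    cmd = ["gmx", "grompp", "-f", f"{stage}.mdp", "-c", f"{previous}.gro"]
    if use_checkpoint:
        cmd += ["-t", f"{previous}.cpt"]  # continue from the previous state
    cmd += ["-p", "topol.top", "-n", index, "-o", f"{stage}.tpr"]
    return cmd


def mdrun_command(stage):
    """Build the gmx mdrun argument list for one stage."""
    return ["gmx", "mdrun", "-deffnm", stage, "-v"]


# (stage, previous stage, read checkpoint?) in the order used in this section.
pipeline = [("nvt", "em", False), ("npt", "nvt", True), ("md", "npt", True)]
for stage, previous, checkpoint in pipeline:
    print(" ".join(grompp_command(stage, previous, checkpoint)))
    print(" ".join(mdrun_command(stage)))
```

Each list could be passed to subprocess.run(cmd, check=True) on a machine where gmx is installed, so that a failed stage stops the chain.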
Chapter 3.
RESULTS AND DISCUSSION
The results were analyzed with VMD, Chimera, Gromacs, Grace, and custom scripts written in Python.