Steps for preparing the input files

2.2 RESEARCH METHODS

2.2.2 Steps for preparing the input files

Separating the protein and the ligand from the complex into separate files

Open PDB entry 3WZD in Chimera.

Menu Select -> Structure -> Protein

File -> Save PDB -> protein_extracted_from3WZD.pdb (save selected atoms only)

Menu Select -> Residue -> LEV

File -> Save PDB -> LEV_extracted_from3WZD.pdb (save selected atoms only)

Removing alternate-location atoms from protein.pdb

Open the file protein.pdb in PyMOL and enter the following commands:

remove not alt ''+A
alter all, alt=''

Save the structure as 3WZD_protein_A_alternateonly.pdb
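For readers without PyMOL, the effect of the two commands can be approximated in plain Python. The sketch below assumes the standard PDB column layout, in which column 17 of an ATOM/HETATM record holds the alternate-location indicator; the function name `keep_primary_altloc` is hypothetical and is not part of the original workflow.

```python
def keep_primary_altloc(pdb_lines, keep="A"):
    """Keep atoms whose altLoc is blank or `keep`, then blank the altLoc column.

    Roughly mirrors PyMOL's `remove not alt ''+A` followed by `alter all, alt=''`.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) > 16:
            alt = line[16]  # PDB column 17 (0-based index 16) = altLoc indicator
            if alt not in (" ", keep):
                continue    # drop the other conformers (B, C, ...)
            line = line[:16] + " " + line[17:]
        kept.append(line)
    return kept
```

Non-atom records (TER, REMARK, and so on) pass through unchanged, so the result can be written back out line by line.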

Adding the missing residues with Modeller

Goal: add residues 814-819, 842-845, 1051-1066, and 1168-1172 (see the MISSING RESIDUES section of the 3WZD.pdb file downloaded from the Protein Data Bank).
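The missing-residue ranges can also be extracted programmatically from the REMARK 465 records of the PDB file. The following is a minimal sketch, assuming single-model entries whose data rows have the form `REMARK 465     ASP A   814` and residue numbers without insertion codes; the function name and the optional `chain` filter are illustrative assumptions.

```python
def missing_residue_ranges(pdb_lines, chain=None):
    """Return contiguous (start, end) ranges of residues listed in REMARK 465."""
    numbers = []
    for line in pdb_lines:
        if not line.startswith("REMARK 465"):
            continue
        fields = line.split()
        # Data rows: REMARK 465 <resname> <chain> <number>; header rows differ in length.
        if len(fields) != 5 or not fields[4].lstrip("-").isdigit():
            continue
        if chain is not None and fields[3] != chain:
            continue
        numbers.append(int(fields[4]))

    ranges = []
    for n in sorted(set(numbers)):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1][1] = n          # extend the current contiguous run
        else:
            ranges.append([n, n])      # start a new run
    return [tuple(r) for r in ranges]
```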

Create the file createtemplate.py:

from modeller import *

# Get the sequence of the PDB file, and write to an alignment file
code = '3WZD_protein_A_alternateonly'

e = environ()
m = model(e, file=code)
aln = alignment(e)
aln.append_model(m, align_codes=code)
aln.write(file=code + '.seq')

Run the file:

python createtemplate.py

This produces the file 3WZD_protein_A_alternateonly.seq

Converting the three-letter amino acid codes to one-letter codes

residue 814-819

asp glu his cys glu arg

>results for sequence "Untitled" starting "ASPGLUHISCYS"

DEHCER

residue 842-845

arg gly ala phe

>results for sequence "Untitled" starting "argglyalaphe"

RGAF

residue 1051-1066

arg asp ile tyr lys asp pro asp tyr val arg lys gly asp ala arg

>results for sequence "Untitled" starting "argaspiletyr"

RDIYKDPDYVRKGDAR

residue 1168-1172

arg ala gln gln asp

>results for sequence "Untitled" starting "argalaglngln"

RAQQD
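The same conversion can be reproduced without an online tool. The sketch below uses the standard 20-residue code table; the function name `three_to_one` is hypothetical, and non-standard residues are deliberately not handled.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def three_to_one(residues):
    """Convert a whitespace-separated string of three-letter codes to one-letter codes."""
    return "".join(THREE_TO_ONE[name.upper()] for name in residues.split())


print(three_to_one("asp glu his cys glu arg"))  # DEHCER
```

An unknown code raises `KeyError`, which makes a typo in the input visible instead of silently dropping a residue.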

Insert the one-letter sequences above at the corresponding positions marked with / in the file 3WZD_protein_A_alternateonly.seq

Note: if the insertion positions need to be calculated by character position (as a double check), the deleted residues 941-990 must be taken into account. Also note that the .seq file contains LF line-break characters (open it with Geany to see them).
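The position check described in the note can be written as a small function. This sketch assumes that the completed sequence starts at residue 814 and that residues 941-990 are the only gap in the numbering; both assumptions are inferred from the residue ranges used in this section, and the function name is hypothetical.

```python
def sequence_index(resnum, first=814, deleted=(941, 990)):
    """Map a PDB residue number to its 1-based position in the filled sequence."""
    lo, hi = deleted
    if lo <= resnum <= hi:
        raise ValueError(f"residue {resnum} lies in the deleted range {lo}-{hi}")
    index = resnum - first + 1
    if resnum > hi:
        index -= hi - lo + 1  # skip the 50 deleted residues
    return index


for start, end in [(814, 819), (842, 845), (1051, 1066), (1168, 1172)]:
    print(start, end, "->", sequence_index(start), sequence_index(end))
```

Under these assumptions the four missing segments map to positions 1-6, 29-32, 188-203, and 305-309.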

Create the file align.ali based on the contents of the .seq file above

Delete all TER records in the file 3WZD_protein_A_alternateonly.pdb
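Removing the TER records by hand is easy to get wrong in a long file, so a short script may help. This is a minimal sketch that drops every line beginning with `TER`; the function name is an assumption, and the file handling in the trailing comment is only one possible usage.

```python
def strip_ter_records(pdb_text):
    """Return the PDB text with all TER records removed."""
    kept = [line for line in pdb_text.splitlines() if not line.startswith("TER")]
    return "\n".join(kept) + "\n"


# Possible usage (overwrites the file in place):
# path = "3WZD_protein_A_alternateonly.pdb"
# with open(path) as handle:
#     cleaned = strip_ter_records(handle.read())
# with open(path, "w") as handle:
#     handle.write(cleaned)
```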

Create the file addresidue.py:

from modeller import *
from modeller.automodel import *    # Load the automodel class

log.verbose()
env = environ()

# directories for input atom files
env.io.atom_files_directory = ['.', './atom_files']

class MyModel(automodel):
    def select_atoms(self):
        # Only the inserted residues are selected for optimization
        return selection(self.residue_range('1', '6'),
                         self.residue_range('29', '32'),
                         self.residue_range('188', '203'),
                         self.residue_range('305', '309'))

a = MyModel(env, alnfile='align.ali',
            knowns='3WZD_protein_A_alternateonly', sequence='3WZD_fill')
a.starting_model = 1
a.ending_model = 1

a.make()

Run the file:

python addresidue.py

Ligand preparation

Adding hydrogen atoms with Avogadro

Open LEV_extracted_from3WZD.pdb in Avogadro, add hydrogens, and save the result as LEV_protonated.pdb. Open the saved file and edit the chain and residue names of the hydrogen atoms that Avogadro added so that they match the original residue name, LEV.

Generating the ligand parameters with acpype

Generate the ligand parameters with the GAFF force field and AM1-BCC (bcc) charge assignment.

acpype.py -i LEV_protonated.pdb -c bcc -n 0 -m 1 -a gaff -o gmx

Open the output file LEV_protonated_GMX.itp in a text editor and check whether any hydrogen atom names look abnormal (for example, four characters long).
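The visual check of atom names can be automated. The sketch below assumes the usual GROMACS `[ atoms ]` layout, in which the fifth column is the atom name, and flags names longer than three characters; the function name and the `max_len` threshold are illustrative assumptions.

```python
def long_atom_names(itp_text, max_len=3):
    """Return atom names from the [ atoms ] section that exceed max_len characters."""
    flagged = []
    in_atoms = False
    for raw in itp_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments
        if not line:
            continue
        if line.startswith("["):
            in_atoms = line.strip("[] ").lower() == "atoms"
            continue
        if in_atoms:
            fields = line.split()
            # Columns: nr, type, resnr, residue, atom, cgnr, charge, mass
            if len(fields) >= 5 and len(fields[4]) > max_len:
                flagged.append(fields[4])
    return flagged


# Possible usage:
# with open("LEV_protonated_GMX.itp") as handle:
#     print(long_atom_names(handle.read()))
```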

Preparing the input files and running the MD calculations in Gromacs

Generate topology

gmx pdb2gmx -f 3WZD_fill.B99990001.pdb -o 3wzd_protein.gro -water spc

Select force field number 6: amber99sb-ILDN

Create the file 3wzd_complex.gro
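The text does not say how 3wzd_complex.gro was assembled. One common approach, shown here purely as an assumption, is to append the ligand atom lines to the protein .gro file and update the atom count on the second line. The function name `merge_gro` is hypothetical; atom serial numbers are left unchanged, and the box line of the protein file is kept.

```python
def merge_gro(protein_gro, ligand_gro, title="Protein-ligand complex"):
    """Concatenate two .gro structures (title, count, atoms, box) into one."""
    p = protein_gro.rstrip("\n").splitlines()
    l = ligand_gro.rstrip("\n").splitlines()
    p_atoms, l_atoms = p[2:-1], l[2:-1]  # skip title, count, and box lines
    if int(p[1]) != len(p_atoms) or int(l[1]) != len(l_atoms):
        raise ValueError("atom count on line 2 does not match the number of atom lines")
    total = len(p_atoms) + len(l_atoms)
    lines = [title, f"{total:5d}", *p_atoms, *l_atoms, p[-1]]
    return "\n".join(lines) + "\n"


# Possible usage with the file names from this section:
# with open("3wzd_protein.gro") as f, open("LEV_protonated_GMX.gro") as g:
#     merged = merge_gro(f.read(), g.read())
# with open("3wzd_complex.gro", "w") as out:
#     out.write(merged)
```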

Define box and solvate

gmx editconf -f 3wzd_complex.gro -o newbox.gro -bt

Add ions

gmx genion -s ions.tpr -o solv_ions.gro -p topol.top -pname NA -nname CL -nn 0

Energy minimization

The contents of the file em.mdp are as follows:

; LINES STARTING WITH ';' ARE COMMENTS
title         = Minimization   ; Title of run

; Parameters describing what to do, when to stop and what to save
integrator    = steep          ; Algorithm (steep = steepest descent minimization)
emtol         = 1000.0         ; Stop minimization when the maximum force < 1000.0 kJ/mol/nm
emstep        = 0.01           ; Energy step size
nsteps        = 50000          ; Maximum number of (minimization) steps to perform
energygrps    = Protein LEV    ; Which energy group(s) to write to disk

; Parameters describing how to find the neighbors of each atom and how to calculate the interactions
nstlist       = 1              ; Frequency to update the neighbor list and long range forces
cutoff-scheme = Verlet
ns_type       = grid           ; Method to determine neighbor list (simple, grid)
rlist         = 1.0            ; Cut-off for making neighbor list (short range forces)
coulombtype   = PME            ; Treatment of long range electrostatic interactions
rcoulomb      = 1.0            ; long range electrostatic cut-off
rvdw          = 1.0            ; long range Van der Waals cut-off
pbc           = xyz            ; Periodic Boundary Conditions

Run the commands:

gmx grompp -f em.mdp -c solv_ions.gro -p topol.top -o em.tpr
gmx mdrun -deffnm em -v

Equilibration

Restraining the ligand

gmx genrestr -f LEV_protonated_GMX.gro -o posre_LEV.itp -fc 1000 1000 1000

Insert the following into the file topol.top:

; Include forcefield parameters

#include "amber99sb-ildn.ff/forcefield.itp"

; Include ligand topology file

#include "LEV_protonated_GMX.itp"

; Ligand position restraints

#ifdef POSRES

#include "posre_LEV.itp"

#endif

Thermostat and barostat

The contents of the two files nvt.mdp and npt.mdp are as follows:

title                = Protein-ligand complex NVT equilibration
define               = -DPOSRES    ; position restrain the protein and ligand

; Run parameters
integrator           = md          ; leap-frog integrator
nsteps               = 100000      ; 2 fs * 100000 = 200 ps
dt                   = 0.002       ; 2 fs

; Output control
nstxout              = 0           ; suppress .trr output
nstvout              = 0           ; suppress .trr output
nstenergy            = 500         ; save energies
nstlog               = 500         ; update log file
energygrps           = Protein LEV

; Bond parameters
continuation         = no          ; first dynamics run
constraint_algorithm = lincs       ; holonomic constraints

; Neighborsearching
cutoff-scheme        = Verlet
ns_type              = grid        ; search neighboring grid cells
nstlist              = 40          ; 80 fs, largely irrelevant with Verlet
rcoulomb             = 1.0         ; short-range electrostatic cutoff (in nm)
rvdw                 = 1.0         ; short-range van der Waals cutoff (in nm)

; Electrostatics
coulombtype          = PME         ; Particle Mesh Ewald for long-range electrostatics
pme_order            = 4           ; cubic interpolation
fourierspacing       = 0.1         ; grid spacing for FFT

; Temperature coupling
tcoupl               = V-rescale   ; modified Berendsen thermostat
tc-grps              = Protein_LEV Water   ; two coupling groups - more accurate
tau_t                = 0.1 0.1     ; time constant, in ps
ref_t                = 300 300     ; reference temperature, one for each group, in K

; Pressure coupling
pcoupl               = no          ; no pressure coupling in NVT

; Periodic boundary conditions
pbc                  = xyz         ; 3-D PBC

; Dispersion correction
DispCorr             = EnerPres    ; account for cut-off vdW scheme

; Velocity generation
gen_vel              = yes         ; assign velocities from Maxwell distribution
gen_temp             = 300         ; temperature for Maxwell distribution
gen_seed             = -1          ; generate a random seed

title                = Protein-ligand complex NPT equilibration
define               = -DPOSRES    ; position restrain the protein and ligand

; Run parameters
integrator           = md          ; leap-frog integrator
nsteps               = 1000000     ; 2 fs * 1000000 = 2 ns
dt                   = 0.002       ; 2 fs

; Output control
nstxout              = 0           ; suppress .trr output
nstvout              = 0           ; suppress .trr output
nstenergy            = 500         ; save energies every 1.0 ps
nstlog               = 500         ; update log file every 1.0 ps
energygrps           = Protein LEV

; Bond parameters
continuation         = yes         ; continuing from the NVT run
constraint_algorithm = lincs       ; holonomic constraints
constraints          = all-bonds   ; all bonds (even heavy atom-H bonds) constrained
lincs_iter           = 1           ; accuracy of LINCS
lincs_order          = 4           ; also related to accuracy

; Neighborsearching
cutoff-scheme        = Verlet
ns_type              = grid        ; search neighboring grid cells
nstlist              = 40          ; 80 fs, largely irrelevant with Verlet
rcoulomb             = 1.0         ; short-range electrostatic cutoff (in nm)
rvdw                 = 1.0         ; short-range van der Waals cutoff (in nm)

; Electrostatics
coulombtype          = PME         ; Particle Mesh Ewald for long-range electrostatics
pme_order            = 4           ; cubic interpolation
fourierspacing       = 0.1         ; grid spacing for FFT

; Temperature coupling
tcoupl               = V-rescale   ; modified Berendsen thermostat
tc-grps              = Protein_LEV Water   ; two coupling groups - more accurate
tau_t                = 0.1 0.1     ; time constant, in ps
ref_t                = 300 300     ; reference temperature, one for each group, in K

; Pressure coupling
pcoupl               = Parrinello-Rahman   ; pressure coupling is on for NPT
pcoupltype           = isotropic   ; uniform scaling of box vectors
tau_p                = 2.0         ; time constant, in ps
ref_p                = 1.0         ; reference pressure, in bar
compressibility      = 4.5e-5      ; isothermal compressibility of water, bar^-1

; Dispersion correction
DispCorr             = EnerPres    ; account for cut-off vdW scheme

; Velocity generation
gen_vel              = no          ; velocity generation off after NVT
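The simulated time of each stage follows from nsteps multiplied by dt, and it is worth checking against the comments in the .mdp files. The sketch below is a simplified .mdp reader (it is not the GROMACS parser) that strips `;` comments and treats dashes and underscores in parameter names as equivalent; the function names are hypothetical.

```python
def parse_mdp(text):
    """Parse 'key = value ; comment' lines into a dict of strings."""
    params = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip().replace("-", "_").lower()] = value.strip()
    return params


def run_length_ps(params):
    """Simulated time in picoseconds: nsteps * dt (dt is given in ps)."""
    return int(params["nsteps"]) * float(params["dt"])


example = "nsteps = 100000 ; number of steps\ndt = 0.002 ; 2 fs\n"
print(run_length_ps(parse_mdp(example)))
```

With dt = 0.002 ps, nsteps = 100000 corresponds to 200 ps, nsteps = 1000000 to 2 ns, and nsteps = 50000000 to 100 ns.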

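Run the commands:

gmx make_ndx -f em.gro -o index.ndx

At the make_ndx prompt, enter 1|13 to merge groups 1 and 13 into the Protein_LEV group referenced by tc-grps, then enter q to save and quit.

gmx grompp -f nvt.mdp -c em.gro -p topol.top -n index.ndx -o nvt.tpr
gmx mdrun -deffnm nvt -v
gmx grompp -f npt.mdp -c nvt.gro -t nvt.cpt -p topol.top -n index.ndx -o npt.tpr
gmx mdrun -deffnm npt -v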

MD production

The contents of the file md.mdp are as follows:

title                = Protein-ligand complex MD simulation

; Run parameters
integrator           = md          ; leap-frog integrator
nsteps               = 50000000    ; 2 fs * 50000000 = 100 ns
dt                   = 0.002       ; 2 fs

; Output control
nstxout              = 0           ; suppress .trr output
nstvout              = 0           ; suppress .trr output
nstenergy            = 10000       ; save energies every 20.0 ps
nstlog               = 10000       ; update log file every 20.0 ps
nstxout-compressed   = 10000       ; write .xtc trajectory every 20.0 ps
compressed-x-grps    = System
energygrps           = Protein LEV

; Bond parameters
continuation         = yes         ; continuing from the NPT run
constraint_algorithm = lincs       ; holonomic constraints
constraints          = all-bonds   ; all bonds (even heavy atom-H bonds) constrained
lincs_iter           = 1           ; accuracy of LINCS
lincs_order          = 4           ; also related to accuracy

; Neighborsearching
cutoff-scheme        = Verlet
ns_type              = grid        ; search neighboring grid cells
rvdw                 = 1.0         ; short-range van der Waals cutoff (in nm)

; Electrostatics
coulombtype          = PME         ; Particle Mesh Ewald for long-range electrostatics
pme_order            = 4           ; cubic interpolation
fourierspacing       = 0.1         ; grid spacing for FFT

; Temperature coupling
tcoupl               = V-rescale   ; modified Berendsen thermostat
tc-grps              = Protein_LEV Water   ; two coupling groups - more accurate
tau_t                = 0.1 0.1     ; time constant, in ps
ref_t                = 300 300     ; reference temperature, one for each group, in K

; Pressure coupling
pcoupl               = Parrinello-Rahman   ; pressure coupling is on for NPT
pcoupltype           = isotropic   ; uniform scaling of box vectors
tau_p                = 2.0         ; time constant, in ps
ref_p                = 1.0         ; reference pressure, in bar
compressibility      = 4.5e-5      ; isothermal compressibility of water, bar^-1

; Periodic boundary conditions
pbc                  = xyz         ; 3-D PBC

; Dispersion correction
DispCorr             = EnerPres    ; account for cut-off vdW scheme

; Velocity generation
gen_vel              = no          ; velocities are continued from the NPT run

Run the commands:

gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -n index.ndx -o md.tpr
gmx mdrun -deffnm md -v
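The grompp/mdrun pairs used for minimization, equilibration, and production all follow the same pattern, so they can be generated by one helper. This is a sketch under the file-naming conventions of this section; `stage_commands` and `run_pipeline` are hypothetical names, and `run_pipeline` requires a working `gmx` executable and the prepared input files.

```python
import subprocess


def stage_commands(stage, previous, checkpoint=False, index=None):
    """Build the grompp and mdrun argument lists for one simulation stage."""
    grompp = ["gmx", "grompp", "-f", f"{stage}.mdp", "-c", f"{previous}.gro"]
    if checkpoint:
        grompp += ["-t", f"{previous}.cpt"]  # continue from the previous stage
    grompp += ["-p", "topol.top"]
    if index:
        grompp += ["-n", index]
    grompp += ["-o", f"{stage}.tpr"]
    mdrun = ["gmx", "mdrun", "-deffnm", stage, "-v"]
    return [grompp, mdrun]


def run_pipeline():
    """Run EM, NVT, NPT, and production MD in order (the index file must already exist)."""
    stages = [
        ("em", "solv_ions", False, None),
        ("nvt", "em", False, "index.ndx"),
        ("npt", "nvt", True, "index.ndx"),
        ("md", "npt", True, "index.ndx"),
    ]
    for stage, previous, checkpoint, index in stages:
        for command in stage_commands(stage, previous, checkpoint, index):
            subprocess.run(command, check=True)  # stop at the first failure
```

Passing `check=True` stops the chain as soon as one step fails, which avoids starting a long production run from a broken equilibration.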

Chapter 3.

RESULTS AND DISCUSSION

The results were analyzed using VMD, Chimera, Gromacs, Grace, and custom scripts written in Python.

Energy minimization - EM