INDEXING PROTEIN STRUCTURE DATABASES

Phan Mạnh Thường¹, Lâm Thị Hoà Bình¹, Đặng Như Toàn¹, Đoàn Thiện Minh¹, Trần Văn Lăng²

¹ Faculty of Information Technology, Lạc Hồng University, 10 Huỳnh Văn Nghệ, Biên Hoà, Đồng Nai
{thuong,binh,dangnhutoan,dtminh}@lhu.edu.vn
² Vietnam Academy of Science and Technology, Mạc Đĩnh Chi, TP Hồ Chí Minh
tvlang@vast-hcm.ac.vn

Abstract. Searching a large protein structure database for proteins with similar tertiary structure is a complex and time-consuming problem. The number of known protein structures is growing rapidly, and indexing the proteins in a structure database makes structure search and comparison faster and more efficient. This paper presents a method for indexing a protein structure database: each structure is analysed to extract feature vectors, and a tree built from those feature vectors serves as the index. On an indexed database, searching for a protein structure, or for a substructure within a protein, becomes faster and more accurate.

Keywords: protein tertiary structure, protein database indexing.

1. Problem statement

A protein is a polypeptide chain built from amino acids. Proteins are an important subject of study because they take part in all biological processes, including enzymatic catalysis (nearly all chemical reactions in living cells are catalysed by protein enzymes), the transport of substances such as oxygen and ions, and signalling. To understand the relationship between protein structure and function, researchers need to retrieve proteins from structure databases and classify them into families. An important task is to group proteins by structural similarity in order to:
o detect evolutionary relationships;
o identify motifs (recurring segments), that is, structures formed by the arrangement of amino acids in three-dimensional space;
o detect relationships between protein structure and function;
o support drug design;
o detect sequences related to cancer and other diseases.

With technological innovation and the rapid development of structure-determination methods such as X-ray crystallography and NMR spectroscopy, a large number of three-dimensional structures of new protein molecules have been determined. These structures are stored in many databases on the Internet and are freely available to researchers, including:
o Protein Data Bank (PDB) [1], of the Research Collaboratory for Structural Bioinformatics (RCSB): 73153 structures;
o SCOP, Structural Classification of Proteins [2]: 38221 structures;
o CATH, Protein Structure Classification [3]: 104238 structures;
o ModBase, Database of Comparative Protein Structure Models (Sali Lab, UCSF): 41140 structures.

Searching an ever larger protein structure database for tertiary-structure similarity to a given protein, or to an arbitrary substructure of a protein, is difficult and time-consuming. Biologists therefore need a tool that searches protein structure databases quickly and efficiently, in the way BLAST [5] searches sequence databases. Protein search and classification usually has two stages: extracting features that describe each protein, and measuring the similarity between the features of proteins in order to classify them.

Many algorithms exist for extracting features from a protein structure. CTSS [6] approximates the Cα backbone of a protein by a smooth spline of minimal curvature, then stores the curvature, torsion and secondary structure of each Cα atom in a hash-based index. ProGreSS [5] is a more recent method that extracts features from structure combined with sequence through a sliding window over the protein backbone. Its structural features are similar to those of CTSS (curvature, torsion and secondary-structure information), and its sequence features are computed with a scoring matrix such as PAM or BLOSUM. As with CTSS, the features extracted by ProGreSS are not local features.

PSIST [7] is one of the more effective algorithms because of its relatively high accuracy. Its approach converts the local structural information of a protein into a "sequence" and builds a suffix tree over the set of such "sequences" to support search. Compared with extracting local features from a single amino acid, the sliding-window feature extraction of PSIST is better because the feature vector implicitly contains both translational and rotational information. After the feature vectors are normalized, the protein structure is converted into a string of discretized symbols called the structure-feature sequence.

Search on the suffix tree, however, is not yet very fast. PSISA [8] extracts feature vectors in the same way as PSIST but indexes them with a suffix array instead of a suffix tree in order to speed up search. The experimental results for PSISA show that suffix-array indexing speeds up search but at the same time increases memory usage relative to the suffix tree used in PSIST.

This paper presents a method for indexing a protein structure database that inherits the feature-vector extraction of PSIST and, from the set of feature vectors, builds an index tree by merging the branches of feature-vector sequences. The tree both limits memory usage and allows search over the whole space of structures from different protein families, which makes the search for a protein structure or a substructure within a protein faster and more accurate.

The rest of the paper is organized as follows. Section 2 presents the indexing method: feature vector extraction, feature vector normalization and construction of the index tree. Section 3 reports experiments on protein structure data sources and queries over them. The final section gives an evaluation and conclusions.

2. Indexing protein structure data

a) Feature vector extraction

Each protein is an ordered chain of amino acids (residues) linked by peptide bonds. Each residue contains a Cα atom, an N atom, a C atom and other atoms. Bond lengths, bond angles and torsion angles completely determine the conformation and geometry of a protein. A bond length is the distance between two bonded atoms, measured in ångströms (Å); a bond angle is the angle between two covalent bonds of the same atom. For example, the N-C bond length is 1.33 Å and the angle between the Cα-N and N-C bonds is 122°.

Figure 1. Bond lengths and bond angles between atoms.

Torsion angles describe the conformations that can be reached by rotation about bonds. Given four atoms connected by three bonds B_{i-1}, B_i and B_{i+1}, the torsion angle of bond B_i is defined as the smallest angle between the projections of B_{i-1} and B_{i+1} onto the plane perpendicular to B_i.

Figure 2. The torsion angles φ, ψ and ω between atoms.

To capture local features more accurately, features have to be extracted from a set of neighbouring residues. To build a local feature vector, each residue is first described individually, and then the relationship between a pair of residues and among a set of residues is determined. For every residue, the Cα-N bond length is 1.46 Å, the Cα-C bond length is 1.51 Å, and the angle between Cα-N and Cα-C is 116°. The triangles formed by the N-Cα-C atoms of all residues are therefore congruent, and each residue can be represented by one triangle.

The distance d between a pair of residues is the Euclidean distance between their Cα atoms, computed by formula (1), where (x_i, y_i, z_i) are the coordinates of the Cα atom of residue p_i:

$$ d(p_i, p_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2} \tag{1} $$

The angle θ between a pair of residues is the angle between the two planes defined by the N-Cα-C atoms of each residue.

Figure 3. Distance and angle between two residues.

Distance and angle are invariant under translation and rotation of the protein. The Euclidean distance between two Cα atoms is computed directly from their three-dimensional coordinates. The angle between the two N-Cα-C planes is computed from the angle between their normal vectors, each anchored at the Cα atom of its plane. The normal vector is given by formula (2):

$$ \vec{n} = \overrightarrow{C_\alpha N} \times \overrightarrow{C_\alpha C} \tag{2} $$

The angle between two normal vectors n_1 and n_2 is given by formula (3):

$$ \cos\theta = \frac{\vec{n}_1 \cdot \vec{n}_2}{\lVert \vec{n}_1 \rVert \, \lVert \vec{n}_2 \rVert} \tag{3} $$

To describe local features over a set of residues, a window of size w is slid along the Cα backbone of the protein. The distances and angles between the first residue of the window and each of the remaining residues in the window are computed and appended to the feature vector; each window position yields one feature vector.

Let $P = \{p_1, p_2, \ldots, p_n\}$ represent a protein, where $p_i$ is the i-th residue of the backbone. The feature vector of the protein is defined as $P^v = \{p^v_1, p^v_2, \ldots, p^v_{n-w+1}\}$, where w is the width of the sliding window and

$$ p^v_i = \big(d(p_i, p_{i+1}), \cos\theta(p_i, p_{i+1}), \ldots, d(p_i, p_{i+w-1}), \cos\theta(p_i, p_{i+w-1})\big) $$

Here $d(p_i, p_j)$ is the distance between residues i and j, and $\cos\theta(p_i, p_j)$ is the cosine of the angle between them. With a window of size w, each feature vector $p^v_i$ has dimension 2(w-1).

b) Normalizing the feature vectors

Because the feature vectors hold distances and angles, which are measured in different units, they have to be normalized. Normalization also restricts the range of the vector components. The angle θ lies in [0, π], so cos θ ∈ [-1, 1]. To normalize distances, an upper bound on the distance between residue i and residue (i+w-1) of a protein is needed. All distances and angles are normalized to integers in the range [0, b-1], where b is a given parameter.

Each distance d in a feature vector is normalized by formula (4):

$$ \hat{d} = \left\lfloor \frac{d \cdot b}{4.025 \cdot (w - 1)} \right\rfloor \tag{4} $$

In this formula the constant 4.025 is the average distance between two Cα atoms and w is the width of the sliding window.

The angles in a feature vector are normalized by formula (5):

$$ \widehat{\cos\theta} = \left\lfloor \frac{(\cos\theta + 1) \cdot b}{2} \right\rfloor \tag{5} $$

After normalization, a protein structure is represented by a "sequence" of discrete-valued feature vectors, in which the i-th vector represents the features of the i-th residue of the protein backbone.

c) Building the index

To index a set of protein structures, the paper proposes building a multi-branch tree with the algorithm shown in Figure 4.

The algorithm first reads the structure of each protein in the database and extracts its features as described above, so that the three-dimensional structure of each protein is "sequenced" as a set of feature vectors along its backbone. After the feature vectors are normalized, each protein structure "sequence" is added to the index tree for later lookup.

Figure 4. Algorithm for building the index tree from protein structure features.

Example: build an index tree from a set of six sequenced protein structures. Each protein sequence is represented by a set of symbols, and each symbol stands for one normalized feature vector.

P1={a,b,d,f,a,h}; P2={b,a,d,b,d}; P3={a,b,c,b,d,s,f}; P4={c,a,b,a,b,c}; P5={c,a,b,c,c,b}; P6={a,c,b,a,d};

The result is the tree shown in Figure 5.

Figure 5. Index tree based on the structural features of the proteins.

d) Querying the index tree
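The queries in this subsection run on the normalized feature sequences and the index tree of subsections a) to c). For orientation, those steps might be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' C++ implementation: the names `feature_vectors`, `normalize`, `Node`, `insert` and `build_index` are hypothetical, a residue is assumed to be a triple of (N, Cα, C) coordinates, the normal is taken as the cross product of the Cα→N and Cα→C vectors, normalized values are clamped into [0, b-1] (an assumption, since the floor formulas can reach b at the upper bound), and the "branch merging" of the index is read here as a plain prefix tree in which sequences with a common prefix share a branch.

```python
import math

CA_SPACING = 4.025  # constant of formula (4): average distance between two C-alpha atoms


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def plane_normal(residue):
    """Normal of the N-CA-C plane, anchored at CA. residue = (N, CA, C) coordinates."""
    n, ca, c = residue
    return _cross(_sub(n, ca), _sub(c, ca))


def distance(r1, r2):
    """Euclidean distance between the CA atoms of two residues, as in formula (1)."""
    return math.dist(r1[1], r2[1])


def cos_theta(r1, r2):
    """Cosine of the angle between the two plane normals, as in formula (3)."""
    n1, n2 = plane_normal(r1), plane_normal(r2)
    return _dot(n1, n2) / (math.sqrt(_dot(n1, n1)) * math.sqrt(_dot(n2, n2)))


def feature_vectors(residues, w):
    """One vector of 2*(w-1) values per window position (n-w+1 vectors in total)."""
    vectors = []
    for i in range(len(residues) - w + 1):
        v = []
        for j in range(i + 1, i + w):
            v += [distance(residues[i], residues[j]), cos_theta(residues[i], residues[j])]
        vectors.append(tuple(v))
    return vectors


def normalize(vector, w, b):
    """Map distances (even slots) and cosines (odd slots) to integers in [0, b-1]."""
    out = []
    for k, x in enumerate(vector):
        if k % 2 == 0:
            q = math.floor(x * b / (CA_SPACING * (w - 1)))  # formula (4)
        else:
            q = math.floor((x + 1) * b / 2)                 # formula (5)
        out.append(min(b - 1, max(0, q)))  # clamping is an assumption of this sketch
    return tuple(out)


class Node:
    """Hypothetical index-tree node: one child per symbol."""

    def __init__(self):
        self.children = {}     # symbol -> Node
        self.proteins = set()  # proteins whose sequence passes through this node


def insert(root, name, symbols):
    node = root
    for s in symbols:
        node = node.children.setdefault(s, Node())
        node.proteins.add(name)


def build_index(sequences):
    root = Node()
    for name, symbols in sequences.items():
        insert(root, name, symbols)
    return root


# The six example sequences of subsection c), one character per symbol.
EXAMPLE = {
    "P1": "abdfah", "P2": "badbd", "P3": "abcbdsf",
    "P4": "cababc", "P5": "cabccb", "P6": "acbad",
}
root = build_index(EXAMPLE)
```

With the six example sequences, the root of this sketch has three children (a, b and c), and P4 and P5 share the branch c, a, b. In a real run the symbols would be the tuples returned by `normalize`, which are hashable and can serve directly as dictionary keys.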
Given a query Q, the feature vectors of structure Q are first extracted and converted into a "sequence" as described in sections 2a and 2b. The lookup then proceeds in three stages: search, ranking and selection of the best matches. The search stage lists the structures in the database that match Q within a distance threshold ε between vectors. The second stage ranks all proteins that contain a matching segment. The final stage uses the Smith-Waterman algorithm [9] to find the best local structural similarity between the query Q and the selected proteins.

The algorithm for searching a query pattern Q in the index tree is as follows.

Input: a protein structure segment Q; minimum match threshold ε.
Output: the set of protein structures that satisfy the search condition, sorted by decreasing number of matched residues.

Function Search(tree Root, level i, query sequence Q, threshold ε) {
  While (i <= tree height - length of Q) {
    - Group the branches at level i
    - For each node at level i
        o If node N[j] matches Q[0]:
            For each child branch of N[j]:
              If the branch matches the rest of Q within threshold ε:
                - Add the branch to the result set
                - Remove the branch from the tree
            Return Search(Root, i+1, Q[0], ε);
        o Otherwise:
            Return Search(N[j], i+1, Q[i+1], ε);
  } end while
} end function

Function Query(tree Root, query pattern Q, top k patterns to select, threshold ε) {
  - Initialize an empty result set
  - Extract features and build the structure sequence for query Q
  - Build the index tree
  - Search(Root, i=0, Q, ε);
  - Sort the result set by decreasing match count m;
  - Select the k best patterns in the result set and apply the Smith-Waterman algorithm to find the best local structural alignment
} end function

Example: search for the query pattern Q={b,c,d,b} in the index built from the sequenced protein structures below, with threshold ε=3. The set is P1={a,b,d,f,a,h}; P2={b,a,d,b,c}; P3={a,b,c,d,b,s,f}; P4={c,b,c,a,b,c}; P5={c,b,c,c,d,b}; P6={a,c,b,a,d}.
o Query at the root level → result set {P2 (match count m)}
o Query at level 1 → result set {P4 (match count m), P3 (m=4)}
o Query at level 2 → result set {P5 (match count m)}

3. Experimental results

a) Protein structure data sources

Most tertiary protein structures are stored in the Protein Data Bank (PDB) [1], the main repository of experimentally determined tertiary structures of proteins. PDB was created at Brookhaven National Laboratory (BNL) in the USA. Its structures were determined by crystallography. PDB currently holds more than 73153 protein structures, and new structures are deposited every year.

The SCOP database [2] is maintained at the Laboratory of Molecular Biology of the Medical Research Council.

The CATH database [3] is maintained at UCL, London, and currently holds 104238 structures. It classifies proteins with automatic methods, supplemented by expert curation where the automatic methods do not give reliable results. CATH is built with the SSAP structure comparison tool, which uses a two-level dynamic programming technique to match two proteins and find their optimal alignment.

The FSSP database [4] was created with the DALI classification method and is maintained at the European Bioinformatics Institute (EBI). It provides a sophisticated classification of protein structures, in which the similarity between two proteins is determined from their structures.

The experimental data set D comprises proteins from all four classes: α, β, α+β and α/β. It takes the same number of proteins from each of a selection of SCOP superfamilies. Query patterns are drawn at random from D in the experiments. The experiments have five parameters: w, the window width; b, the normalization value; ε, the minimum distance threshold between two vectors; l, the minimum length required of the longest matching segment; and k, the number of top-scoring proteins returned. The algorithm was implemented in C++ and run on Windows on a machine with a dual 1.6 GHz CPU and 2 GB of RAM.

The number of proteins shown in the plots is the average number of proteins found in the same superfamily over the experiments.

Figure 8. Number of proteins found in the same superfamily as a function of the cutoff k (w=3, b=10, ε=3, l=10).
Figure 9. Number of proteins found in the same superfamily as a function of the window size w (b=10, ε=3, l=15).
Figure 10. Number of proteins found in the same superfamily as a function of the distance threshold ε (w=3, b=10, l=15).
Figure 11. Number of proteins found in the same superfamily as a function of the normalization value b (w=3, ε=2.5, l=15).

d) Evaluation and remarks

Figure 8 shows the average number of proteins found in the same superfamily over the range of cutoffs tested; the search effectiveness is close to that of PSIST. Figure 9 shows that the algorithm is stable for small window sizes starting from 3; beyond this range, effectiveness drops markedly because of errors introduced during feature extraction and vector normalization. This can be mitigated by increasing the normalization value, as Figure 11 shows, but doing so increases processing time and the storage needed for feature vectors.

The results show that the performance of the proposed algorithm is close to that of PSIST and somewhat better than that of ProGreSS. In terms of storage, PSIST needs more space for its suffix tree on large data sets and its search operation is more complex, but it is more accurate than the algorithm proposed here.

The proposed algorithm has the following strengths:
o The index tree is built once and adjusted many times during search. Searching for a sequence Q of length l in an index tree of height h has complexity O(k*(h-l)*b), where k is the average number of branches with the same value at level i and b is the number of branches at the root.
o Merging branches when the tree is adjusted lets many structures matching the query be found at once. A branch that has been found is removed from the tree, which shrinks the search space at higher levels.
o The algorithm can search the entire space of structure data.

4. Conclusion

This paper presented an approach to indexing databases of protein tertiary structure, based on feature extraction as in the PSIST algorithm, and proposed an algorithm for searching the resulting index tree. It also surveyed sources of protein tertiary structure data and proposed a database model for storage that supports indexing and lookup of protein structures. The experimental data were drawn from SCOP superfamilies, and the results show relatively high accuracy and efficiency for the proposed algorithms on the test data.

References

[1] H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, and P.E. Bourne, "The Protein Data Bank," Nucleic Acids Research, vol. 28, 2000, pp. 235-242.
[2] A.G. Murzin, S.E. Brenner, T. Hubbard, and C. Chothia, "SCOP: A Structural Classification of Proteins Database for the Investigation of Sequences and Structures," J. Mol. Biol., pp. [...]-540.
[3] C.A. Orengo, A.D. Michie, D.T. Jones, M.B. Swindells, and J.M. Thornton, "CATH - A Hierarchic Classification of Protein Domain Structures," Structure, vol. 5, no. 8, 1997, pp. 1093-1108.
[4] L. Holm and C. Sander, "The FSSP Database: Fold Classification Based on Structure-Structure Alignment of Proteins," Nucleic Acids Research, 1996, pp. 206-210.
[5] T. Can, T. Kahveci, A.K. Singh, and Y.F. Wang, "ProGreSS: Simultaneous searching of protein databases by sequence and structure," Pacific Symp. Bioinformatics, pp. 264-275, 2004.
[6] T. Can and Y. Wang, "CTSS: a robust and efficient method for protein structure alignment based on local geometrical and biological features," IEEE Computer Society Bioinformatics Conference (CSB), pp. 169-179, 2003.
[7] Mohammed J. Zaki and Feng Gao, "PSIST: Indexing Protein Structures using Suffix Trees," in IEEE Computational Systems Bioinformatics Conference, Palo Alto, CA, August 2005.
[8] A. Salah, Tarek F. Gharib, and Abdel-Badeeh M. Salem, "PSISA: an Algorithm for Indexing and Searching Protein Structure using Suffix Arrays," in The WSEAS International Conference on Computers, pp. 775-780, 2008.
[9] F. Smith and M. Waterman, "Identification of common molecular subsequences," J. Mol. Biol., (147):195-197, 1981.