
SEQUENCE ALIGNMENT




DOCUMENT INFORMATION

Pages: 29
File size: 303.38 KB

Content

- Alignment of two sequences
- Multiple alignment

Fundamental steps of the procedure leading to the optimal alignment of two sequences

The two sequences are shifted against each other one position at a time. At each step the identical positions in the overlapping region are counted and expressed as a percentage of the overlap length. With n and m the lengths of the two sequences, there are n + m - 1 steps.

Upper sequence: R V C P K I L M E C K K D S D C L A E C I C L E H G Y C G
Lower sequence: M V C P K I L M K C K H D S D C L L D C V C L E D I G Y C G V S

Step         Identical positions   Identity (%)
1            0                     0.0
2            0                     0.0
3            0                     0.0
4            1                     25.0
5            0                     0.0
...
n - 1        1                     3.6
n            18                    62.1
n + 1        5                     17.2
n + 2        2                     6.9
...
n + m - 3    1                     33.3
n + m - 2    0                     0.0
n + m - 1    0                     0.0

Introducing a single gap into the alignment found at step n raises the number of identical positions to 22 (73%):

R V C P K I L M E C K K D S D C L A E C I C L E H - G Y C G
M V C P K I L M K C K H D S D C L L D C V C L E D I G Y C G V S

Comparison of fragments of the 1st and 2nd domains of chicken ovomucoid using the unitary matrix, the genetic code matrix (GCM), PAM250 and the algorithm of genetic semihomology

1) GTTAATTGCAGCCTGTATGCCAGCGGCATCGGCAAGGATGGGACGAGTTGGGTAGCC
   V N C S L Y A S G I G K D G T S W V A
2) ATTGATTGCTCTCCGTACCTCCAA---GTTGTAAGAGATGGTAACACCATGGTAGCC
   I D C S P Y L Q - V V R D G N T M V A

UNITARY MATRIX
   V N C S L Y A S G I G K D G T S W V A
   I D C S P Y L Q - V V R D G N T M V A
   0 0 1 1 0 1 0 0 0 0 0 0 1 1 0 0 0 1 1     7/19 = 36.8%

GENETIC CODE MATRIX (identical nucleotides per codon)
   GTT AAT TGC AGC CTG TAT GCC AGC GGC ATC GGC AAG GAT GGG ACG AGT TGG GTA GCC
   ATT GAT TGC TCT CCG TAC CTC CAA --- GTT GTA AGA GAT GGT AAC ACC ATG GTA GCC
    2   2   3   0   2   2   1   0   0   1   1   1   3   2   1   1   1   3   3      29/57 = 50.9%

PAM250 SCORING
   V N C S L Y A S G I G K D G T S W V A     42/97 = 43.3%
   I D C S P Y L Q - V V R D G N T M V A     42/89 = 47.2%
   1 1 2 2 0 2 0 0 0 1 0 1 2 2 1 1 0 2 2     20/38 = 52.6%

GENETIC SEMIHOMOLOGY SCORE
   V N C S L Y A S G I G K D G T S W V A
   I D C S P Y L Q - V V R D G N T M V A
   2 2 3 3 2 3 0 0 0 2 1 2 3 3 1 1 0 3 3     34/57 = 59.6%

Regions highlighted in the four panels, in order: L Q V V R, CAA, Q, Q.

What is important in the protein similarity search?
1) Contribution (%) of identical positions

   P K I L M E C K K D
   P K I L M K C K H D     8 identical, 80%: similar

   P K I L M E C K K D
   S D C L L D C V C L     2 identical, 20%: not similar

2) Length of the compared strings (sequences)

   L C E
   W C G     1 identical, 33.3%: casual

   M V E I C I E P K I R C I K V C T K D E R I T C L I L D E T
   M V Y W C P R R F M H C V H L K A G G C T C W C L R L D Y Y     8 identical, 26%: probably similar

3) Distribution of the identical positions along the analyzed sequence

   M V E M I C I E P K I R C I K V C T K D E R I T L
   H V Y Y W R P E R F M H T V K L K A G G C R C W L     5 identical, 20%, scattered: casual

   M V E M I M A G D A R C I K V C T K D E R I T C L
   H H Y Y W M A G D A H T V Q L K A G G C W C W A G     5 identical, 20%, contiguous: similar

4) Residues at conservative positions

   M V C P K I L M K C K H D S D C L L D C V C L E D
   E D E G K R R T K R E H F K E S N L A A A F K E Q     not similar

   M V C P K I L M K C K H D S D T L L D C V C L E D
   Q N C P G P R E W C F T T R M N D S S C A C P Q T     similar

5) Structural/genetic similarity of the amino acids at non-conservative positions

   The same pair is compared in three ways: identity only, structural similarity and genetic similarity.

   M V C P K I L M K C K H D S D C L L D C V C L E D
   R L C R R L V K R C R K E T E C I V E C I C I D E

The sequence identity estimation procedure

The probability that an identity match of at least a positions (a equal to the declared value or higher) occurs at random is:

$$P = \sum_{k=a}^{n} \binom{n}{k} \left(\frac{1}{x}\right)^{k} \left(1 - \frac{1}{x}\right)^{n-k}$$

Where:
x: the number of unit types in the sequence (20 for proteins; 4 for nucleic acids)
n: the sequence length (the number of compared position pairs)
a: the number of identical positions

Genetic conditioning of the amino acid replacement probabilities and spectrum in molecular evolution

Do the amino acids possess their pedigree? Or, in other words, do they contain information about their history (genealogy)?

Can amino acid mutational replacements be described as Markovian processes?
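The percent-identity count and the identity estimation formula above can be tried out with a short Python sketch. It is only an illustration under stated assumptions: the function names `identity_at` and `chance_probability` are hypothetical, the two sequences are the ones from the stepwise alignment example, and every unit type is assumed to be equally likely (probability 1/x per position).

```python
from math import comb

# Sequences from the stepwise alignment example in this document.
UPPER = "RVCPKILMECKKDSDCLAECICLEHGYCG"
LOWER = "MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS"


def identity_at(upper, lower, offset):
    """Count identical positions when lower[0] is placed under upper[offset].

    Returns (identical positions, overlap length). Negative offsets shift
    the lower sequence to the left of the upper one.
    """
    start = max(0, offset)
    stop = min(len(upper), offset + len(lower))
    identical = sum(upper[i] == lower[i - offset] for i in range(start, stop))
    return identical, stop - start


def chance_probability(a, n, x=20):
    """Probability of at least `a` identical positions out of `n` by chance.

    Assumption: each of the x unit types is equally likely, so a single
    position matches with probability 1/x (binomial tail).
    """
    p = 1 / x
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a, n + 1))


if __name__ == "__main__":
    identical, overlap = identity_at(UPPER, LOWER, 0)
    print(f"{identical}/{overlap} = {100 * identical / overlap:.1f}%")
    print(f"chance probability: {chance_probability(identical, overlap):.3e}")
```

With offset 0 the sketch reproduces the 18 identical positions out of 29 (62.1%) listed for step n. Short overlaps at the ends of the scan can show high percentages from one or two matches, which is why the chance probability is evaluated alongside the percentage.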
The Markov model assumes that the probability of substituting amino acid AA_1 by AA_2 is the same regardless of which residue (AA_x or AA_y) the initial residue AA_1 itself arose from.

The currently used statistical algorithms are based on the Markovian model of amino acid replacement (they directly use stochastic matrices of replacement frequency indices).

AA_x -> AA_1 -> AA_2, with probability P_a for the step AA_1 -> AA_2
AA_y -> AA_1 -> AA_2, with probability P_b for the step AA_1 -> AA_2
P_a = P_b

PAM250 matrix of amino acid replacements

C   12
S    0   2
T   -2   1   3
P   -3   1   0   6
A   -2   1   1   1   2
G   -3   1   0  -1   1   5
N   -4   1   0  -1   0   0   2
D   -5   0   0  -1   0   1   2   4
E   -5   0   0  -1   0   0   1   3   4
Q   -5  -1  -1   0   0  -1   1   2   2   4
H   -3  -1  -1   0  -1  -2   2   1   1   3   6
R   -4   0  -1   0  -2  -3   0  -1  -1   1   2   6
K   -5   0   0  -1  -1  -2   1   0   0   1   0   3   5
M   -5  -2  -1  -2  -1  -3  -2  -3  -2  -1  -2   0   0   6
I   -2  -1   0  -2  -1  -3  -2  -2  -2  -2  -2  -2  -2   2   5
L   -6  -3  -2  -3  -2  -4  -3  -4  -3  -2  -2  -3  -3   4   2   6
V   -2  -1   0  -1   0  -1  -2  -2  -2  -2  -2  -2  -2   2   4   2   4
F   -4  -3  -3  -5  -5  -5  -4  -6  -5  -5  -2  -4  -5   0   1   2  -1   9
Y    0  -3  -3  -5  -3  -5  -2  -4  -4  -4   0  -4  -4  -2  -1  -1  -2   7  10
W   -8  -2  -5  -6  -6  -7  -4  -7  -7  -5  -3   2  -3  -4  -5  -2  -6   0   0  17
     C   S   T   P   A   G   N   D   E   Q   H   R   K   M   I   L   V   F   Y   W

Why is tryptophan the most conservative residue here?

BLOSUM62 matrix of amino acid replacements

A    4
R   -1   5
N   -2   0   6
D   -2  -2   1   6
C    0  -3  -3  -3   9
Q   -1   1   0   0  -3   5
E   -1   0   0   2  -4   2   5
G    0  -2   0  -1  -3  -2  -2   6
H   -2   0   1  -1  -3   0   0  -2   8
I   -1  -3  -3  -3  -1  -3  -3  -4  -3   4
L   -1  -2  -3  -4  -1  -2  -3  -4  -3   2   4
K   -1   2   0  -1  -3   1   1  -2  -1  -3  -2   5
M   -1  -1  -2  -3  -1   0  -2  -3  -2   1   2  -1   5
F   -2  -3  -3  -3  -2  -3  -3  -3  -1   0   0  -3   0   6
P   -1  -2  -2  -1  -3  -1  -1  -2  -2  -3  -3  -1  -2  -4   7
S    1  -1   1   0  -1   0   0   0  -1  -2  -2   0  -1  -2  -1   4
T    0  -1   0  -1  -1  -1  -1  -2  -2  -1  -1  -1  -1  -2  -1   1   5
W   -3  -3  -4  -4  -2  -2  -3  -2  -2  -3  -2  -3  -1   1  -4  -3  -2  11
Y   -2  -2  -2  -3  -2  -1  -2  -3   2  -1  -1  -2  -1   3  -3  -2  -2   2   7
V    0  -3  -3  -3  -1  -2  -2  -3  -3   3   1  -2   1  -1  -2  -2   0  -3  -1   4
     A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V

[...]

[YHS][K] [DN][M] GA EKRA C RKE PLQE KERD [ISV][H] [VG][PT] 66 [MEK][PS] !
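As an illustration of how a replacement matrix such as those above is used, the following Python sketch scores an ungapped pair of aligned fragments. It rests on several assumptions: only the first five BLOSUM62 rows (A, R, N, D, C) from this document are included, the two test fragments are made up for the example, and the helper names `parse_lower_triangular` and `alignment_score` are hypothetical.

```python
# First five rows of the BLOSUM62 table in this document (A, R, N, D, C).
BLOSUM62_SUBSET = """\
A 4
R -1 5
N -2 0 6
D -2 -2 1 6
C 0 -3 -3 -3 9
"""


def parse_lower_triangular(text):
    """Turn lower-triangular rows into a symmetric {(aa1, aa2): score} dict."""
    rows = [line.split() for line in text.strip().splitlines()]
    residues = [row[0] for row in rows]
    matrix = {}
    for row in rows:
        # Row k holds scores against the first k residues, itself included.
        for column, value in zip(residues, row[1:]):
            matrix[(row[0], column)] = int(value)
            matrix[(column, row[0])] = int(value)
    return matrix


def alignment_score(seq_a, seq_b, matrix):
    """Sum the per-position scores of two equal-length, ungapped fragments."""
    if len(seq_a) != len(seq_b):
        raise ValueError("fragments must have the same length")
    return sum(matrix[(a, b)] for a, b in zip(seq_a, seq_b))


def identity_count(seq_a, seq_b):
    """Unitary-matrix score: 1 for an identical pair, 0 otherwise."""
    return sum(a == b for a, b in zip(seq_a, seq_b))


if __name__ == "__main__":
    matrix = parse_lower_triangular(BLOSUM62_SUBSET)
    # Made-up fragments that differ only by swapping N and D.
    print(identity_count("ARNDC", "ARDNC"))
    print(alignment_score("ARNDC", "ARDNC", matrix))
```

The unitary score counts only the three identical positions, while the matrix score also rewards the two N/D pairs (+1 each), which is the kind of difference between scoring schemes that the ovomucoid comparison shows.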
What part of the codon contains the information about the previous amino acid that occurred at a given position of the protein sequence?

At most 2/3 of the entire codon.

Ala -> Val
GCG -> GUG

How long is the information about the codons of preceding amino acids stored?

The shortest storage period is 3 transitions/transversions.

Ala -> Val -> Met
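The "at most 2/3 of the codon" statement can be checked on the Ala -> Val example with a minimal Python sketch. The helper names are hypothetical, and the Met codon AUG is taken from the standard genetic code rather than from the slides.

```python
def differing_positions(codon_a, codon_b):
    """Return the 0-based codon positions at which two codons differ."""
    if len(codon_a) != 3 or len(codon_b) != 3:
        raise ValueError("codons must be three nucleotides long")
    return [i for i, (a, b) in enumerate(zip(codon_a, codon_b)) if a != b]


def retained_fraction(codon_a, codon_b):
    """Fraction of codon positions left unchanged by the replacement."""
    return (3 - len(differing_positions(codon_a, codon_b))) / 3


if __name__ == "__main__":
    # Ala (GCG) -> Val (GUG): one substitution, two positions retained.
    print(differing_positions("GCG", "GUG"), retained_fraction("GCG", "GUG"))
    # Val (GUG) -> Met (AUG): AUG is assumed from the standard genetic code.
    print(differing_positions("GUG", "AUG"), retained_fraction("GUG", "AUG"))
```

Each single-nucleotide replacement leaves two of the three positions untouched, so the codon of the new amino acid still carries up to 2/3 of the previous codon. Only after three substitutions, one per position, can every nucleotide of the original codon have been replaced.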

Posted: 16/03/2014, 02:20
