SEQUENCE ALIGNMENT
Two-sequence alignment
Multiple alignment

Fundamental steps of the procedure leading to the optimal alignment of two sequences
Sequence 1 (length n): R V C P K I L M E C K K D S D C L A E C I C L E H G Y C G
Sequence 2 (length m): M V C P K I L M K C K H D S D C L L D C V C L E D I G Y C G V S

The sequences are shifted against each other one position per step; at each step the identical positions in the overlap are counted.

Step         Identical positions   Identity
1            0                     0.0%
2            0                     0.0%
3            0                     0.0%
4            1                     25.0%
5            0                     0.0%
...
n - 1        1                     3.6%
n            18                    62.1%
n + 1        5                     17.2%
n + 2        2                     6.9%
...
n + m - 3    1                     33.3%
n + m - 2    0                     0.0%
n + m - 1    0                     0.0%

Best alignment after introducing one gap: 22 identical positions, 73%

R V C P K I L M E C K K D S D C L A E C I C L E H - G Y C G
M V C P K I L M K C K H D S D C L L D C V C L E D I G Y C G V S
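The stepwise shifting shown above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `sliding_identity` and the way the offset is derived from the step number are assumptions made here, not part of the original slides. It counts identical residues in the overlap at each of the n + m - 1 steps.

```python
SEQ1 = "RVCPKILMECKKDSDCLAECICLEHGYCG"       # sequence 1, length n
SEQ2 = "MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS"    # sequence 2, length m


def sliding_identity(a, b):
    """Return (step, identical, overlap) for every relative shift of a and b.

    Assumed convention: at step k, a[i] is compared with b[i + k - len(a)],
    so step len(a) places both sequences start to start.
    """
    n, m = len(a), len(b)
    results = []
    for step in range(1, n + m):
        shift = step - n
        start = max(0, -shift)        # first index of a inside the overlap
        stop = min(n, m - shift)      # one past the last index of a in the overlap
        identical = sum(a[i] == b[i + shift] for i in range(start, stop))
        results.append((step, identical, stop - start))
    return results


if __name__ == "__main__":
    for step, identical, overlap in sliding_identity(SEQ1, SEQ2):
        print(f"step {step:2d}: {identical:2d}/{overlap:2d} = {100 * identical / overlap:.1f}%")
```

The step with the highest count is only a starting point; gaps (as in the last alignment above) are introduced afterwards.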
Comparison of the fragments of the 1st and 2nd domain of chicken ovomucoid using the unitary matrix, GCM, PAM250 and the algorithm of genetic semihomology
1) GTTAATTGCAGCCTGTATGCCAGCGGCATCGGCAAGGATGGGACGAGTTGGGTAGCC
   V N C S L Y A S G I G K D G T S W V A
2) ATTGATTGCTCTCCGTACCTCCAA GTTGTAAGAGATGGTAACACCATGGTAGCC
   I D C S P Y L Q - V V R D G N T M V A

UNITARY MATRIX
V N C S L Y A S G I G K D G T S W V A
I D C S P Y <L Q V V R> D G N T M V A
0 0 1 1 0 1 0 0 0 0 0 0 1 1 0 0 0 1 1
Score: 7/19 (36.8%)

GENETIC CODE MATRIX
GTTAATTGCAGCCTGTATGCCAGCGGCATCGGCAAGGATGGGACGAGTTGGGTAGCC
ATTGATTGCTCTCCGTACCTC <CAA> GTTGTAAGAGATGGTAACACCATGGTAGCC
2 2 3 0 2 2 1 0 0 1 1 1 3 2 1 1 1 3 3
Score: 29/57 (50.9%)

PAM250 SCORING
V N C S L Y A S G I G K D G T S W V A
I D C S P Y L <Q> V V R D G N T M V A
Score: 42/97 (43.3%); 42/89 (47.2%)
1 1 2 2 0 2 0 0 0 1 0 1 2 2 1 1 0 2 2
Score: 20/38 (52.6%)

GENETIC SEMIHOMOLOGY
V N C S L Y A S G I G K D G T S W V A
I D C S P Y L <Q> V V R D G N T M V A
2 2 3 3 2 3 0 0 0 2 1 2 3 3 1 1 0 3 3
Score: 34/57 (59.6%)
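The first two scores of this comparison (unitary matrix and genetic code matrix) can be reproduced with a short Python sketch. The function names are hypothetical, and the treatment of a gap as scoring 0 is an assumption that matches the score rows of the slide; the PAM250 and semihomology scores are not reproduced here.

```python
# Aligned fragments from the slide; "-" marks the gap position.
PEP1 = "VNCSLYASGIGKDGTSWVA"
PEP2 = "IDCSPYLQ-VVRDGNTMVA"
DNA1 = "GTTAATTGCAGCCTGTATGCCAGCGGCATCGGCAAGGATGGGACGAGTTGGGTAGCC"
DNA2 = "ATTGATTGCTCTCCGTACCTCCAA---GTTGTAAGAGATGGTAACACCATGGTAGCC"


def unitary_scores(p1, p2):
    """1 for an identical residue pair, 0 otherwise (a gap scores 0)."""
    return [int(a == b and a != "-") for a, b in zip(p1, p2)]


def codon_scores(d1, d2):
    """Number of identical nucleotides (0 to 3) in each aligned codon pair."""
    scores = []
    for i in range(0, len(d1), 3):
        c1, c2 = d1[i:i + 3], d2[i:i + 3]
        scores.append(sum(x == y and x != "-" for x, y in zip(c1, c2)))
    return scores


if __name__ == "__main__":
    u = unitary_scores(PEP1, PEP2)
    g = codon_scores(DNA1, DNA2)
    print(f"unitary: {sum(u)}/{len(u)} = {100 * sum(u) / len(u):.1f}%")
    print(f"genetic code: {sum(g)}/{3 * len(g)} = {100 * sum(g) / (3 * len(g)):.1f}%")
```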
What is important in the protein similarity search?
1) Contribution (%) of identical positions

P K I L M E C K K D
P K I L M K C K H D
8 identical, 80%: similar

P K I L M E C K K D
S D C L L D C V C L
2 identical, 20%: not similar

2) Length of the compared strings (sequences)

L C E
W C G
1 identical, 33.3%: coincidental

M V E I C I E P K I R C I K V C T K D E R I T C L I L D E T
M V Y W C P R R F M H C V H L K A G G C T C W C L R L D Y Y
8 identical, 26%: probably similar

3) Distribution of the identical positions along the analyzed sequence

MVEMICIEPKIRCIKVCTKDERITL
HVYYWRPERFMHTVKLKAGGCRCWL
5 identical, 20%: coincidental

MVEMIMAGDARCIKVCTKDERITCL
HHYYWMAGDAHTVQLKAGGCWCWAG
5 identical, 20%: similar

4) Residues at conservative positions

M V C P K I L M K C K H D S D C L L D C V C L E D
E D E G K R R T K R E H F K E S N L A A A F K E Q
not similar

M V C P K I L M K C K H D S D T L L D C V C L E D
Q N C P G P R E W C F T T R M N D S S C A C P Q T
similar

5) Structural/genetic similarity of the amino acids at non-conservative positions

Identity only
M V C P K I L M K C K H D S D C L L D C V C L E D
R L C R R L V K R C R K E T E C I V E C I C I D E

Structural
M V C P K I L M K C K H D S D C L L D C V C L E D
R L C R R L V K R C R K E T E C I V E C I C I D E

Genetic
M V C P K I L M K C K H D S D C L L D C V C L E D
R L C R R L V K R C R K E T E C I V E C I C I D E
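Criterion 3 can be illustrated with a small Python sketch: two pairs with the same percentage of identical positions differ in how those positions are distributed. The function name `identity_stats` and the use of the longest uninterrupted run as a measure of distribution are assumptions made for illustration only; the slides do not prescribe a specific measure.

```python
def identity_stats(a, b):
    """Return (identical positions, longest run of consecutive identical positions)."""
    matches = 0
    longest = 0
    run = 0
    for x, y in zip(a, b):
        if x == y:
            matches += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return matches, longest


# The two pairs from point 3: both have 5 identical positions out of 25.
SCATTERED = ("MVEMICIEPKIRCIKVCTKDERITL", "HVYYWRPERFMHTVKLKAGGCRCWL")
CLUSTERED = ("MVEMIMAGDARCIKVCTKDERITCL", "HHYYWMAGDAHTVQLKAGGCWCWAG")

if __name__ == "__main__":
    for name, (s1, s2) in (("scattered", SCATTERED), ("clustered", CLUSTERED)):
        matches, longest = identity_stats(s1, s2)
        print(f"{name}: {matches}/{len(s1)} identical, longest run {longest}")
```

In the second pair the five identities form one contiguous block (M A G D A), which is why the slide rates it as similar despite the same 20% identity.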
The sequence identity estimation procedure
The probability that an identity match at least as high as the declared one (the number of identical positions is equal to a or higher) occurs at random is:

$$P = \sum_{k=a}^{n} \binom{n}{k} \frac{x^{k}\,(x^{2}-x)^{n-k}}{x^{2n}}$$

where:
x: the number of unit types in the sequence (20 for proteins; 4 for nucleic acids)
n: the sequence length (the number of compared position pairs)
a: the number of identical positions
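As a worked sketch, the probability defined above can be evaluated numerically. The code below assumes that all x unit types are equally frequent, so each position pair is identical with probability 1/x and the sum reduces to a binomial tail; the function name `chance_identity_probability` is hypothetical.

```python
from math import comb


def chance_identity_probability(a, n, x):
    """Probability of at least a identical positions among n compared pairs.

    Assumes x equally likely unit types, so a single pair is identical
    with probability 1/x (a binomial tail with p = 1/x).
    """
    p = 1.0 / x
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(a, n + 1))


if __name__ == "__main__":
    # 18 identical positions out of 29 compared pairs of a protein (x = 20).
    print(chance_identity_probability(18, 29, 20))
    # 1 identical position out of 3 (the short LCE / WCG example).
    print(chance_identity_probability(1, 3, 20))
```

For short strings the probability of a chance match is high, which is why the length of the compared sequences matters.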
Genetic conditioning of the amino acid replacement probabilities and spectrum in molecular evolution

Do the amino acids possess their pedigree?
or
Do they contain the information about their history (genealogy)?
or
Can the amino acid mutational replacements be described as Markovian processes?

The Markov model assumes that the probability of substituting amino acid AA1 by AA2 is the same regardless of which residue (AAx or AAy) AA1 itself was transformed from.

The currently used statistical algorithms are based on the Markovian model of amino acid replacement (they directly use stochastic matrices of replacement frequency indices).

AAx -> AA1 -> AA2   (probability Pa)
AAy -> AA1 -> AA2   (probability Pb)
Pa = Pb
PAM250 matrix of amino acid replacements

C  12
S   0   2
T  -2   1   3
P  -3   1   0   6
A  -2   1   1   1   2
G  -3   1   0  -1   1   5
N  -4   1   0  -1   0   0   2
D  -5   0   0  -1   0   1   2   4
E  -5   0   0  -1   0   0   1   3   4
Q  -5  -1  -1   0   0  -1   1   2   2   4
H  -3  -1  -1   0  -1  -2   2   1   1   3   6
R  -4   0  -1   0  -2  -3   0  -1  -1   1   2   6
K  -5   0   0  -1  -1  -2   1   0   0   1   0   3   5
M  -5  -2  -1  -2  -1  -3  -2  -3  -2  -1  -2   0   0   6
I  -2  -1   0  -2  -1  -3  -2  -2  -2  -2  -2  -2  -2   2   5
L  -6  -3  -2  -3  -2  -4  -3  -4  -3  -2  -2  -3  -3   4   2   6
V  -2  -1   0  -1   0  -1  -2  -2  -2  -2  -2  -2  -2   2   4   2   4
F  -4  -3  -3  -5  -5  -5  -4  -6  -5  -5  -2  -4  -5   0   1   2  -1   9
Y   0  -3  -3  -5  -3  -5  -2  -4  -4  -4   0  -4  -4  -2  -1  -1  -2   7  10
W  -8  -2  -5  -6  -6  -7  -4  -7  -7  -5  -3   2  -3  -4  -5  -2  -6   0   0  17
    C   S   T   P   A   G   N   D   E   Q   H   R   K   M   I   L   V   F   Y   W
Why is tryptophan the most conservative residue here?
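A minimal Python sketch of how such a lower-triangular matrix can be stored and queried is given below. The values are the PAM250 entries as they appear in the slide; the names `ORDER`, `ROWS`, `PAM250` and `most_conserved` are hypothetical. The lookup shows that tryptophan has the highest diagonal (self-replacement) score in this table, which is the observation behind the question above.

```python
ORDER = "CSTPAGNDEQHRKMILVFYW"

# Lower triangle, one row per residue in ORDER (values as listed in the slide).
ROWS = [
    [12],
    [0, 2],
    [-2, 1, 3],
    [-3, 1, 0, 6],
    [-2, 1, 1, 1, 2],
    [-3, 1, 0, -1, 1, 5],
    [-4, 1, 0, -1, 0, 0, 2],
    [-5, 0, 0, -1, 0, 1, 2, 4],
    [-5, 0, 0, -1, 0, 0, 1, 3, 4],
    [-5, -1, -1, 0, 0, -1, 1, 2, 2, 4],
    [-3, -1, -1, 0, -1, -2, 2, 1, 1, 3, 6],
    [-4, 0, -1, 0, -2, -3, 0, -1, -1, 1, 2, 6],
    [-5, 0, 0, -1, -1, -2, 1, 0, 0, 1, 0, 3, 5],
    [-5, -2, -1, -2, -1, -3, -2, -3, -2, -1, -2, 0, 0, 6],
    [-2, -1, 0, -2, -1, -3, -2, -2, -2, -2, -2, -2, -2, 2, 5],
    [-6, -3, -2, -3, -2, -4, -3, -4, -3, -2, -2, -3, -3, 4, 2, 6],
    [-2, -1, 0, -1, 0, -1, -2, -2, -2, -2, -2, -2, -2, 2, 4, 2, 4],
    [-4, -3, -3, -5, -5, -5, -4, -6, -5, -5, -2, -4, -5, 0, 1, 2, -1, 9],
    [0, -3, -3, -5, -3, -5, -2, -4, -4, -4, 0, -4, -4, -2, -1, -1, -2, 7, 10],
    [-8, -2, -5, -6, -6, -7, -4, -7, -7, -5, -3, 2, -3, -4, -5, -2, -6, 0, 0, 17],
]

# Expand the triangle into a symmetric lookup table keyed by residue pairs.
PAM250 = {}
for i, row in enumerate(ROWS):
    for j, value in enumerate(row):
        PAM250[ORDER[i], ORDER[j]] = value
        PAM250[ORDER[j], ORDER[i]] = value

# Residue with the highest score for replacement by itself.
most_conserved = max(ORDER, key=lambda r: PAM250[r, r])

if __name__ == "__main__":
    print(most_conserved, PAM250[most_conserved, most_conserved])
```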
BLOSUM62 matrix of amino acid replacements

A   4
R  -1   5
N  -2   0   6
D  -2  -2   1   6
C   0  -3  -3  -3   9
Q  -1   1   0   0  -3   5
E  -1   0   0   2  -4   2   5
G   0  -2   0  -1  -3  -2  -2   6
H  -2   0   1  -1  -3   0   0  -2   8
I  -1  -3  -3  -3  -1  -3  -3  -4  -3   4
L  -1  -2  -3  -4  -1  -2  -3  -4  -3   2   4
K  -1   2   0  -1  -3   1   1  -2  -1  -3  -2   5
M  -1  -1  -2  -3  -1   0  -2  -3  -2   1   2  -1   5
F  -2  -3  -3  -3  -2  -3  -3  -3  -1   0   0  -3   0   6
P  -1  -2  -2  -1  -3  -1  -1  -2  -2  -3  -3  -1  -2  -4   7
S   1  -1   1   0  -1   0   0   0  -1  -2  -2   0  -1  -2  -1   4
T   0  -1   0  -1  -1  -1  -1  -2  -2  -1  -1  -1  -1  -2  -1   1   5
W  -3  -3  -4  -4  -2  -2  -3  -2  -2  -3  -2  -3  -1   1  -4  -3  -2  11
Y  -2  -2  -2  -3  -2  -1  -2  -3   2  -1  -1  -2  -1   3  -3  -2  -2   2   7
V   0  -3  -3  -3  -1  -2  -2  -3  -3   3   1  -2   1  -1  -2  -2   0  -3  -1   4
    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
[...]... [YHS][K] [DN][M] GA EKRA C RKE PLQE KERD [ISV][H] [VG][PT] 66 [MEK][PS] !

What part of the codon contains the information about the previous amino acid that occurred at a certain position of the protein sequence?
At most 2/3 of the entire codon (Ala -> Val: GCG -> GUG).

How long is the information about the codons of preceding amino acids stored?
The shortest storage period is 3 transitions/transversions (Ala -> Val -> Met).
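The "2/3 of the codon" statement can be checked with a tiny Python sketch. The function name `retained_fraction` is hypothetical, and the Met codon AUG is added here as an assumption (it is the standard methionine codon; the slide text shown gives only the Ala and Val codons).

```python
from fractions import Fraction


def retained_fraction(ancestor, descendant):
    """Fraction of codon positions that still carry the ancestor's nucleotide."""
    same = sum(a == b for a, b in zip(ancestor, descendant))
    return Fraction(same, len(ancestor))


# Ala (GCG) -> Val (GUG) -> Met (AUG): one point mutation per step.
PATH = ["GCG", "GUG", "AUG"]

if __name__ == "__main__":
    for codon in PATH[1:]:
        print(PATH[0], "->", codon, retained_fraction(PATH[0], codon))
```

After one point mutation two of the three positions still match the Ala codon; each further mutation at a new position removes another third, so three such changes can erase the trace completely.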