CHAPTER 1

SEQHUNT

A Program to Screen Aligned Nucleotide and Amino Acid Sequences

George Johnson, Tai Te Wu, and Elvin A. Kabat

From: Methods in Molecular Biology, Vol. 51: Antibody Engineering Protocols. Edited by S. Paul. Humana Press Inc., Totowa, NJ

1. Introduction

We have been collecting nucleotide and amino acid sequences of proteins of immunological interest, and aligning them in order to understand the structure and function relations of these proteins (1). To aid in organizing and analyzing this collection, a computer program, called SEQHUNT, was written.

The SEQHUNT program is written in PL/PROPHET (2,3). SEQHUNT uses a preprocessed form of the database as its search data. SEQHUNT can pattern match nucleotide and amino acid sequences with the aligned data, pattern match phrases in the annotation fields of the sequences, and compare specified regions in similarly aligned sequences. The SEQHUNT program can be used only on a machine with the PL/PROPHET environment present and with the PL/PROPHET table representation of the database present. To allow greater accessibility to the matching capabilities of the program, a partial implementation of SEQHUNT is available via electronic mail.

2. Materials

The variable and constant regions of immunoglobulins and T-cell receptors for antigen, and the various domains of MHC class I and class II molecules have been aligned (1). These aligned sequences and sequences of related proteins (1), together with new sequences published recently, have been stored in the NIH-supported PROPHET computer system (2,3) in the form of PL/PROPHET data tables. SEQHUNT uses this Kabat database (1) for its searching and region analysis.

3. Methods

SEQHUNT is a computer program written in PL/PROPHET for use in the PL/PROPHET environment. The program performs three main types of analyses. The first is matching.
Given a nucleotide or amino acid sequence and restrictions on the number of allowable mismatches and data tables to search through, SEQHUNT will return aligned matches of all sequences with mismatches equal to or less than the allowable number.

The second function of SEQHUNT allows searching for specified patterns in the sequence annotations. Name, antibody specificity, T-cell receptor classification, and reference fields may be searched for the desired pattern.

Moreover, the full implementation of SEQHUNT allows region analysis of any one or a number of sequence stretches in similarly aligned sequences, e.g., all immunoglobulin heavy (H) chains. The program queries for the given region, such as the entire light (L) chain variable region (positions 1-107) or a combination of several complementarity determining regions (CDRs), for example, CDRL1, CDRL2, and CDRL3 together. All sequences are called as search patterns, and the entire set of sequences is used as the search pool. Redundant matching is eliminated to reduce output. Any number of mismatches may be specified, although the output for mismatches above 1 or 2 is usually massive.

These three types of searches may be performed on nucleotide or amino acid data, and matching and annotation searches also may be performed on unaligned data.

SEQHUNT, as written, must be called from the PL/PROPHET environment. To allow greater access to the program, an interface has been developed that allows specially formatted queries to be sent via electronic mail for processing. The interface supports all functions of the original SEQHUNT, except region analysis.

3.1. Sequence Pattern Match

The nucleotide sequence pattern-matching capabilities of SEQHUNT are shown in Fig. 1.
In this example, the nucleotide sequence to match (TARGET SEQUENCE) is the H-chain variable region of the IgM BALB/c murine monoclonal antibody (MAb) PR1 (4), which has specificity for the PR1 antigen on human prostate cancer cells and normal human prostate cells. This SEQHUNT search was restricted to 12 or fewer mismatches among the sequences of all H-chain variable regions of all species currently in the database. In Fig. 1, several sequences with 6, 7, or 11 mismatches are shown. They are listed in order of increasing mismatches. An upper-case base is a mismatch, and all lower-case bases are matches. Dashes are for alignment (1). To save space, several other sequences with fewer than 12 mismatches are not listed (see Notes 1 and 2 for other examples).

Figure 2 shows the results of a search of all H-chain variable regions for matches with a segment of the human D-minigene D2 (5). Human D-minigenes sometimes match segments other than the third CDR of human H chains (6). As shown in Fig. 2, a segment of 14 nucleotides from human D2 is found in the second CDR of human, mouse, and rabbit H chains (see Note 3). For nucleotide sequences in the human CDRH3 region, additional matches are found on both sides of the 14 nucleotides. RF-SJ2 matches human D2 for 24 bases, ttgtagtggtggtagctgctactc, and L42 for 28 bases, ggatattgtagtggtggtagctgctact. The 14 matches in Fig. 2 are underlined. Usually, only short segments of human D-minigenes are incorporated into CDRH3s (7). When some of these short segments of the human D-minigenes, e.g., aactgg, a segment of DHQ52 (5), are searched, identical matches occur frequently over the entire H-chain variable region (Fig. 3).

3.2. Antibody Specificity Search

An example of antibody specificity searching is shown in Fig. 4 for a SEQHUNT search called with the specified pattern "HIV," the abbreviation for Human Immunodeficiency Virus. Only a few of the matches are shown.
The search was restricted to all H-chain variable region sequences in the database. SEQHUNT scans the antibody specificities and looks for exact matches with the "HIV" pattern. This search (Fig. 4) found antibodies directed against p24, gp120, and gp41. Even for the same protein, the numbers of dashes in the last three lines of the sequences are different, indicating that the length of H-chain CDR3 can vary more extensively than those of CDR1 and CDR2 (8). Most likely, the antibodies are directed toward different parts of one of the HIV proteins. Searches of the name and reference fields are also allowed.
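The three search types described in Section 3 can be sketched in a few lines of Python. This is only an illustration, not SEQHUNT itself, which is written in PL/PROPHET. The names `Entry`, `compare_aligned`, `sequence_match`, `specificity_search`, and `region_analysis` are hypothetical, and the toy sequences are invented rather than taken from the database. Three choices are assumptions: a gap opposite a base counts as a mismatch, an "exact match" is a case-sensitive substring test, and "redundant matching" is avoided by visiting each unordered pair of sequences once.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Entry:
    name: str
    sequence: str     # aligned sequence; '-' marks an alignment gap
    specificity: str  # free-text antibody specificity annotation


def compare_aligned(target, candidate):
    """Count mismatches between two aligned strings of equal length.

    The display string follows the convention of Fig. 1: matching
    positions in lower case, mismatching positions in upper case.
    """
    if len(target) != len(candidate):
        raise ValueError("aligned sequences must have equal length")
    mismatches = 0
    shown = []
    for t, c in zip(target.lower(), candidate.lower()):
        if t == c:
            shown.append(c)
        else:
            mismatches += 1  # assumption: a gap opposite a base is a mismatch
            shown.append(c.upper())
    return mismatches, "".join(shown)


def sequence_match(target, entries, max_mismatches):
    """Return (mismatches, name, display) for every entry within the limit,
    ordered by increasing number of mismatches."""
    hits = []
    for entry in entries:
        n, shown = compare_aligned(target, entry.sequence)
        if n <= max_mismatches:
            hits.append((n, entry.name, shown))
    hits.sort(key=lambda hit: hit[0])  # stable: ties keep database order
    return hits


def specificity_search(pattern, entries):
    """Return entries whose specificity field contains the pattern
    (assumption: case-sensitive substring test)."""
    return [entry for entry in entries if pattern in entry.specificity]


def region_analysis(entries, start, end, max_mismatches):
    """Compare the slice [start:end] of every entry with every other entry.

    Each unordered pair is visited once, so A-versus-B is not reported
    again as B-versus-A.
    """
    pairs = []
    for a, b in combinations(entries, 2):
        n, _ = compare_aligned(a.sequence[start:end], b.sequence[start:end])
        if n <= max_mismatches:
            pairs.append((a.name, b.name, n))
    return pairs


if __name__ == "__main__":
    # Invented toy data, for illustration only.
    toy = [
        Entry("toy1", "gaggtgaag", "ANTI-HIV TYPE 1"),
        Entry("toy2", "gacgtcaag", "UNKNOWN"),
        Entry("toy3", "gaggtgaac", "ANTI-DNA"),
    ]
    for n, name, shown in sequence_match("gaggtgaag", toy, max_mismatches=1):
        print(n, name, shown)
```

Here `region_analysis` uses plain string indices for the region boundaries. The database positions include insertion letters such as 100B, so a real implementation would need a mapping from those position labels to column indices.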
Fig. 1. Matching a nucleotide sequence of an H-chain variable region, positions 1-94. The sequence, PR1, is shown in row 1 labeled as TARGET SEQUENCE. Some of the sequences in the database with 6, 7, or 11 mismatches are listed in the order of increasing mismatches as shown in column 2. Names of these sequences are given in column 1. Columns 4-8 indicate species, beginning position, ending position, antibody specificity, and reference, respectively.

Fig. 2. Matching segments of D-minigenes. The format of this figure is identical to that of Fig. 1. The TARGET SEQUENCE pattern consists of 14 bp from the human D2 minigene. It matches identically to nucleotide segments in the CDRH2 region of 914 (mouse), RVH720 (rabbit), and 1819 (human) as shown in rows 2, 3, and 4, respectively. It also matches human CDRH3 segments of RF-SJ2 and L42 shown in rows 5 and 6. For details, see Section 3.1.

Fig. 3. Matching of a segment of the human D-minigene, DHQ52 (TARGET SEQUENCE) with different regions of mouse and human H chains.

[Fig. 4. Table contents and caption not recoverable. Column headings: name, sequence, species, specificity, reference.]

[...]

... anti-N-(p-cyanophenyl)-N'-(diphenylmethyl)guanidineacetic acid antibody (1CGS) is also provided (2). The method used to model this antibody is based on the CAMAL algorithm (3-9) that combines structural and ab initio approaches to determine antibody structure, and is embodied in the commercial version of the program AbM (10) ...
Jr., Hillson, J. L., and Perlmutter, R. M. (1987) Early restriction of the human antibody repertoire. Science 238, 791-793.

CHAPTER 2

Molecular Modeling of Antibody-Combining Sites

David M. Webster and Anthony R. Rees

1. Introduction

Antibodies possess a vast repertoire of specificity and affinity. To understand the molecular basis of antibody function, we require high-resolution X-ray crystallographic structures...

1.1. Antibody Structure

1.1.1. The Antibody Fold

Antibodies have a distinctive structure often depicted as a Y or T shape with the two distal arms (Fab) containing the sites for antigen binding (Fig. 1). An antibody consists of two identical light (L) chains and two identical heavy (H) chains that fold into...

coworkers (12). This fold and its variants have also been observed in nonantibody molecules, including T-cell receptors. The Fab contains a variable domain (VL/VH) and a constant domain (CL/CH1), with the two halves of each domain formed from the L and H chains. Since the antibody contains two Fab arms, two antigen molecules may be bound by the same antibody. The constant domains for L and H chains are constant for...

all of the CDR loops may be involved in antigen binding. Three classes of antibody-combining-site topology are recognized: a cavity type that typically binds haptens, a groove type that binds peptides, carbohydrates, or nucleic acids, and a planar type that binds proteins (16).

Fig. 1. A cartoon of an IgG antibody displaying two L chains and two H chains. These two chains fold...
domains. The hinge and elbow angles for the Fab are different, illustrating potential flexibility in this region.

2. Antibody Modeling

2.1. General Methods

Antibody modeling has attracted increased interest over the past few years (3,8,9,17-20), in part owing to an explosion in the number of published antibody sequences (21), a gradual increase in availability of good crystal structures (1), and availability...

anti-N-(p-cyanophenyl)-N'-(diphenylmethyl)guanidineacetic acid antibody (1CGS) (2) was chosen for modeling. Its structure has been solved at an average resolution of 2.6 Å, and it contains different types of hypervariable loops that may be modeled using the canonical loop, database, and combined database/CONGEN approaches. This antibody has been chosen here to demonstrate the need for an understanding of antibody structure when interpreting...

orientation affects CDR takeoff points on the Fv. L and H chains are chosen based on greatest homology from a database of antibody structures. Where these chains are not derived from the same antibody, a fitting procedure is used to reconstruct a new Fv framework. The L and H chains of the 1CGS antibody show greatest homology with the corresponding subunits derived from two different antibodies. Therefore, the...

α, γ, δ, ε, and μ. The two Fabs are attached to the Fc region by a flexible hinge, giving the antibody an intrinsic flexibility.

1.1.2. The Variable Domain

The variable domains (VL/VH) associate noncovalently to form a twisted antiparallel β-sheet structure. Although the framework is well conserved between known antibody structures (Table 1A and B), variations in the packing of β-sheets and strands do occur...
regions of the antibody both in sequence and in structure. These regions are known as the hypervariable or complementarity determining regions (CDRs). Each CDR interconnects a β-strand, with three CDRs (L1, L2, L3) derived from the L chain and three from the H chain (H1, H2, H3). This interconnection of the antiparallel β-strands brings the CDRs close together in space at the distal end of the antibody. Some...

tool for studying the underlying mechanisms of antibody specificity. Based on the idea of random assortment of the six CDRs generating the antibody repertoire, a given CDR, e.g., CDRL1, should