Algorithms in Bioinformatics 2002


Preface

We are pleased to present the proceedings of the Second Workshop on Algorithms in Bioinformatics (WABI 2002), which took place on September 17-21, 2002 in Rome, Italy. The WABI workshop was part of a three-conference meeting which, in addition to WABI, included ESA 2002 and APPROX 2002. The three conferences are jointly called ALGO 2002, and were hosted by the Faculty of Engineering, University of Rome "La Sapienza". See http://www.dis.uniroma1.it/~algo02 for more details.

The Workshop on Algorithms in Bioinformatics covers research in all areas of algorithmic work in bioinformatics and computational biology. The emphasis is on discrete algorithms that address important problems in molecular biology, genomics, and genetics, that are founded on sound models, that are computationally efficient, and that have been implemented and tested in simulations and on real datasets. The goal is to present recent research results, including significant work in progress, and to identify and explore directions of future research.

Original research papers (including significant work in progress) or state-of-the-art surveys were solicited on all aspects of algorithms in bioinformatics, including, but not limited to: exact and approximate algorithms for genomics, genetics, sequence analysis, gene and signal recognition, alignment, molecular evolution, phylogenetics, structure determination or prediction, gene expression and gene networks, proteomics, functional genomics, and drug design.

We received 83 submissions in response to our call for papers, and were able to accept about half of them. In addition, WABI hosted two invited, distinguished lectures, given to the entire ALGO 2002 conference, by Dr. Ehud Shapiro of the Weizmann Institute and Dr. Gene Myers of Celera Genomics. An abstract of Dr. Shapiro's lecture, and a full paper detailing Dr. Myers' lecture, are included in these proceedings.
We would like to sincerely thank all the authors of submitted papers, and the participants of the workshop. We also thank the program committee for their hard work in reviewing and selecting the papers for the workshop. We were fortunate to have on the program committee the following distinguished group of researchers:

Pankaj Agarwal (GlaxoSmithKline Pharmaceuticals, King of Prussia)
Alberto Apostolico (Università di Padova and Purdue University, Lafayette)
Craig Benham (University of California, Davis)
Jean-Michel Claverie (CNRS-AVENTIS, Marseille)
Nir Friedman (Hebrew University, Jerusalem)
Olivier Gascuel (Université de Montpellier II and CNRS, Montpellier)
Misha Gelfand (Integrated Genomics, Moscow)
Raffaele Giancarlo (Università di Palermo)
David Gilbert (University of Glasgow)
Roderic Guigó (Institut Municipal d'Investigacions Mèdiques, Barcelona, co-chair)
Dan Gusfield (University of California, Davis, co-chair)
Jotun Hein (University of Oxford)
Inge Jonassen (Universitetet i Bergen)
Giuseppe Lancia (Università di Padova)
Bernard M.E. Moret (University of New Mexico, Albuquerque)
Gene Myers (Celera Genomics, Rockville)
Christos Ouzounis (European Bioinformatics Institute, Hinxton Hall)
Lior Pachter (University of California, Berkeley)
Knut Reinert (Celera Genomics, Rockville)
Marie-France Sagot (Université Claude Bernard, Lyon)
David Sankoff (Université de Montréal)
Steve Skiena (State University of New York, Stony Brook)
Gary Stormo (Washington University, St. Louis)
Jens Stoye (Universität Bielefeld)
Martin Tompa (University of Washington, Seattle)
Alfonso Valencia (Centro Nacional de Biotecnología, Madrid)
Martin Vingron (Max-Planck-Institut für Molekulare Genetik, Berlin)
Lusheng Wang (City University of Hong Kong)
Tandy Warnow (University of Texas, Austin)

We also would like to thank the WABI steering committee, Olivier Gascuel, Jotun Hein, Raffaele Giancarlo, Erik Meineche-Schmidt, and Bernard Moret, for inviting us to co-chair this program committee, and for their help in carrying out that task. We are particularly indebted to Terri Knight of the University of California, Davis, Robert Castelo of the Universitat Pompeu Fabra, Barcelona, and Bernard Moret of the University of New Mexico, Albuquerque, for the extensive technical and advisory help they gave us. We could not have managed the reviewing process and the preparation of the proceedings without their help and advice. Thanks again to everyone who helped to make WABI 2002 a success. We hope to see everyone again at WABI 2003.

July, 2002
Roderic Guigó and Dan Gusfield

Table of Contents

Simultaneous Relevant Feature Identification and Classification in High-Dimensional Spaces (p. 1)
  L.R. Grate (Lawrence Berkeley National Laboratory), C. Bhattacharyya, M.I. Jordan, and I.S. Mian (University of California Berkeley)
Pooled Genomic Indexing (PGI): Mathematical Analysis and Experiment Design (p. 10)
  M. Csűrös (Université de Montréal) and A. Milosavljevic (Human Genome Sequencing Center)
Practical Algorithms and Fixed-Parameter Tractability for the Single Individual SNP Haplotyping Problem (p. 29)
  R. Rizzi (Università di Trento), V. Bafna, S. Istrail (Celera Genomics), and G. Lancia (Università di Padova)
Methods for Inferring Block-Wise Ancestral History from Haploid Sequences (p. 44)
  R. Schwartz, A.G. Clark (Celera Genomics), and S. Istrail (Celera Genomics)
Finding Signal Peptides in Human Protein Sequences Using Recurrent Neural Networks (p. 60)
  M. Reczko (Synaptic Ltd.), P. Fiziev, E. Staub (metaGen Pharmaceuticals GmbH), and A. Hatzigeorgiou (University of Pennsylvania)
Generating Peptide Candidates from Amino-Acid Sequence Databases for Protein Identification via Mass Spectrometry (p. 68)
  N. Edwards and R. Lippert (Celera Genomics)
Improved Approximation Algorithms for NMR Spectral Peak Assignment (p. 82)
  Z.-Z. Chen (Tokyo Denki University), T. Jiang (University of California, Riverside), G. Lin (University of Alberta), J. Wen (University of California, Riverside), D. Xu, and Y. Xu (Oak Ridge National Laboratory)
Efficient Methods for Inferring Tandem Duplication History (p. 97)
  L. Zhang (National University of Singapore), B. Ma (University of Western Ontario), and L. Wang (City University of Hong Kong)
Genome Rearrangement Phylogeny Using Weighbor (p. 112)
  L.-S. Wang (University of Texas at Austin)
Segment Match Refinement and Applications (p. 126)
  A.L. Halpern (Celera Genomics), D.H. Huson (Tübingen University), and K. Reinert (Celera Genomics)
Extracting Common Motifs under the Levenshtein Measure: Theory and Experimentation (p. 140)
  E.F. Adebiyi and M. Kaufmann (Universität Tübingen)
Fast Algorithms for Finding Maximum-Density Segments of a Sequence with Applications to Bioinformatics (p. 157)
  M.H. Goldwasser (Loyola University Chicago), M.-Y. Kao (Northwestern University), and H.-I. Lu (Academia Sinica)
FAUST: An Algorithm for Extracting Functionally Relevant Templates from Protein Structures (p. 172)
  M. Milik, S. Szalma, and K.A. Olszewski (Accelrys)
Efficient Unbound Docking of Rigid Molecules (p. 185)
  D. Duhovny, R. Nussinov, and H.J. Wolfson (Tel Aviv University)
A Method of Consolidating and Combining EST and mRNA Alignments to a Genome to Enumerate Supported Splice Variants (p. 201)
  R. Wheeler (Affymetrix)
A Method to Improve the Performance of Translation Start Site Detection and Its Application for Gene Finding (p. 210)
  M. Pertea and S.L. Salzberg (The Institute for Genomic Research)
Comparative Methods for Gene Structure Prediction in Homologous Sequences (p. 220)
  C.N.S. Pedersen and T. Scharling (University of Aarhus)
MultiProt – A Multiple Protein Structural Alignment Algorithm (p. 235)
  M. Shatsky, R. Nussinov, and H.J. Wolfson (Tel Aviv University)
A Hybrid Scoring Function for Protein Multiple Alignment (p. 251)
  E. Rocke (University of Washington)
Functional Consequences in Metabolic Pathways from Phylogenetic Profiles (p. 263)
  Y. Bilu and M. Linial (Hebrew University)
Finding Founder Sequences from a Set of Recombinants (p. 277)
  E. Ukkonen (University of Helsinki)
Estimating the Deviation from a Molecular Clock (p. 287)
  L. Nakhleh, U. Roshan (University of Texas at Austin), L. Vawter (Aventis Pharmaceuticals), and T. Warnow (University of Texas at Austin)
Exploring the Set of All Minimal Sequences of Reversals – An Application to Test the Replication-Directed Reversal Hypothesis (p. 300)
  Y. Ajana, J.-F. Lefebvre (Université de Montréal), E.R.M. Tillier (University Health Network), and N. El-Mabrouk (Université de Montréal)
Approximating the Expected Number of Inversions Given the Number of Breakpoints (p. 316)
  N. Eriksen (Royal Institute of Technology)
Invited Lecture – Accelerating Smith-Waterman Searches (p. 331)
  G. Myers (Celera Genomics) and R. Durbin (Sanger Centre)
Sequence-Length Requirements for Phylogenetic Methods (p. 343)
  B.M.E. Moret (University of New Mexico), U. Roshan, and T. Warnow (University of Texas at Austin)
Fast and Accurate Phylogeny Reconstruction Algorithms Based on the Minimum-Evolution Principle (p. 357)
  R. Desper (National Library of Medicine, NIH) and O. Gascuel (LIRMM)
NeighborNet: An Agglomerative Method for the Construction of Planar Phylogenetic Networks (p. 375)
  D. Bryant (McGill University) and V. Moulton (Uppsala University)
On the Control of Hybridization Noise in DNA Sequencing-by-Hybridization (p. 392)
  H.-W. Leong (National University of Singapore), F.P. Preparata (Brown University), W.-K. Sung, and H. Willy (National University of Singapore)
Restricting SBH Ambiguity via Restriction Enzymes (p. 404)
  S. Skiena (SUNY Stony Brook) and S. Snir (Technion)
Invited Lecture – Molecule as Computation: Towards an Abstraction of Biomolecular Systems (p. 418)
  E. Shapiro (Weizmann Institute)
Fast Optimal Genome Tiling with Applications to Microarray Design and Homology Search (p. 419)
  P. Berman (Pennsylvania State University), P. Bertone (Yale University), B. DasGupta (University of Illinois at Chicago), M. Gerstein (Yale University), M.-Y. Kao (Northwestern University), and M. Snyder (Yale University)
Rapid Large-Scale Oligonucleotide Selection for Microarrays (p. 434)
  S. Rahmann (Max-Planck-Institute for Molecular Genetics)
Border Length Minimization in DNA Array Design (p. 435)
  A.B. Kahng, I.I. Măndoiu, P.A. Pevzner, S. Reda (University of California at San Diego), and A.Z. Zelikovsky (Georgia State University)
The Enhanced Suffix Array and Its Applications to Genome Analysis (p. 449)
  M.I. Abouelhoda, S. Kurtz, and E. Ohlebusch (University of Bielefeld)
The Algorithmic of Gene Teams (p. 464)
  A. Bergeron (Université du Québec à Montréal), S. Corteel (CNRS - Université de Versailles), and M. Raffinot (CNRS - Laboratoire Génome et Informatique)
Combinatorial Use of Short Probes for Differential Gene Expression Profiling (p. 477)
  L.L. Warren and B.H. Liu (North Carolina State University)
Designing Specific Oligonucleotide Probes for the Entire S. cerevisiae Transcriptome (p. 491)
  D. Lipson (Technion), P. Webb, and Z. Yakhini (Agilent Laboratories)
K-ary Clustering with Optimal Leaf Ordering for Gene Expression Data (p. 506)
  Z. Bar-Joseph, E.D. Demaine, D.K. Gifford (MIT LCS), A.M. Hamel (Wilfrid Laurier University), T.S. Jaakkola (MIT AI Lab), and N. Srebro (MIT LCS)
Inversion Medians Outperform Breakpoint Medians in Phylogeny Reconstruction from Gene-Order Data (p. 521)
  B.M.E. Moret (University of New Mexico), A.C. Siepel (University of California at Santa Cruz), J. Tang, and T. Liu (University of New Mexico)
Modified Mincut Supertrees (p. 537)
  R.D.M. Page (University of Glasgow)
Author Index (p. 553)

Simultaneous Relevant Feature Identification and Classification in High-Dimensional Spaces

L.R. Grate (1), C. Bhattacharyya (2,3), M.I. Jordan (2,3), and I.S. Mian (1)

(1) Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley CA 94720
(2) Department of EECS, University of California Berkeley, Berkeley CA 94720
(3) Department of Statistics, University of California Berkeley, Berkeley CA 94720

Abstract. Molecular profiling technologies monitor thousands of transcripts, proteins, metabolites or other species concurrently in biological samples of interest. Given two-class, high-dimensional profiling data, nominal Liknon [4] is a specific implementation of a methodology for performing simultaneous relevant feature identification and classification. It exploits the well-known property that minimizing an l1 norm (via linear programming) yields a sparse hyperplane [15,26,2,8,17]. This work (i) examines computational, software and practical issues required to realize nominal Liknon, (ii) summarizes results from its application to five real world data sets, (iii) outlines heuristic solutions to problems posed by domain experts when interpreting the results, and (iv) defines some future directions of the research.

1 Introduction

Biologists and clinicians are adopting high-throughput genomics, proteomics and related technologies to assist in interrogating normal and perturbed systems such as unaffected and tumor tissue specimens. Such investigations can generate data having the form D = {(x_n, y_n), n ∈ (1, …, N)}, where x_n ∈ R^P and, for two-class data, y_n ∈ {+1, −1}. Each element of a data point x_n is the absolute or relative abundance of a molecular species monitored. In transcript profiling, a data point represents transcript (gene) levels measured in a sample using cDNA, oligonucleotide or similar microarray technology.
A data point from protein profiling can represent Mass/Charge (M/Z) values for low molecular weight molecules (proteins) measured in a sample using mass spectroscopy. In cancer biology, profiling studies of different types of (tissue) specimens are motivated largely by a desire to create clinical decision support systems for accurate tumor classification, and to identify robust and reliable targets, "biomarkers", for imaging, diagnosis, prognosis and therapeutic intervention [14,3,13,27,18,23,9,25,28,19,21,24]. Meeting these biological challenges includes addressing the general statistical problems of classification and prediction, and relevant feature identification.

R. Guigó and D. Gusfield (Eds.): WABI 2002, LNCS 2452, pp. 1-9, 2002. © Springer-Verlag Berlin Heidelberg 2002

Support Vector Machines (SVMs) [30,8] have been employed successfully for cancer classification based on transcript profiles [5,22,25,28]. Although mechanisms for reducing the number of features to more manageable numbers include discarding those below a user-defined threshold, relevant feature identification is usually addressed via a filter-wrapper strategy [12,22,32]. The filter generates candidate feature subsets whilst the wrapper runs an induction algorithm to determine the discriminative ability of a subset. Although SVMs and the newly formulated Minimax Probability Machine (MPM) [20] are good wrappers [4], the choice of filtering statistic remains an open question.

Nominal Liknon is a specific implementation of a strategy for performing simultaneous relevant feature identification and classification [4]. It exploits the well-known property that minimizing an l1 norm (via linear programming) yields a sparse hyperplane [15,26,2,8,17]. The hyperplane constitutes the classifier whilst its sparsity, a weight vector with few non-zero elements, defines a small number of relevant features.
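For contrast, the two-stage filter-wrapper baseline mentioned above can be sketched as follows. This is a generic illustration, not the pipeline used in any of the cited studies: the filtering statistic (a t-statistic-like class-separation score) and the wrapper (leave-one-out accuracy of a nearest-centroid classifier) are placeholder choices, and the data are synthetic.

```python
import numpy as np

def filter_scores(X, y):
    """Filter: score each feature by the gap between class means,
    normalized by the pooled spread (a t-statistic-like score)."""
    pos, neg = X[y == 1], X[y == -1]
    gap = np.abs(pos.mean(axis=0) - neg.mean(axis=0))
    spread = pos.std(axis=0) + neg.std(axis=0) + 1e-9
    return gap / spread

def loo_accuracy(X, y):
    """Wrapper: leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        Xtr, ytr = X[mask], y[mask]
        c_pos = Xtr[ytr == 1].mean(axis=0)
        c_neg = Xtr[ytr == -1].mean(axis=0)
        pred = 1 if np.linalg.norm(X[i] - c_pos) < np.linalg.norm(X[i] - c_neg) else -1
        correct += (pred == y[i])
    return correct / len(y)

# Toy two-class profiling data: feature 0 is informative, the rest are noise.
rng = np.random.default_rng(0)
N, P = 20, 50
y = np.array([1] * 10 + [-1] * 10)
X = rng.normal(0.0, 0.3, size=(N, P))
X[:, 0] += 2.0 * y

order = np.argsort(filter_scores(X, y))[::-1]   # best-scoring features first
results = {k: loo_accuracy(X[:, order[:k]], y) for k in (1, 5, 20)}
```

On this toy data the informative feature is ranked first and the k = 1 subset typically already classifies well, which illustrates the point made in the text: the wrapper is straightforward, and the open question is the choice of filtering statistic.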
Nominal Liknon is computationally less demanding than the prevailing filter-(SVM/MPM) wrapper strategy, which treats the problems of feature selection and classification as two independent tasks [4,16]. Biologically, nominal Liknon performs well when applied to real world data generated not only by the ubiquitous transcript profiling technology, but also by the emergent protein profiling technology.

2 Simultaneous Relevant Feature Identification and Classification

Consider a data set D = {(x_n, y_n), n ∈ (1, …, N)}. Each of the N data points (profiling experiments) is a P-dimensional vector of features (gene or protein abundances) x_n ∈ R^P (usually N ~ 10^1 to 10^2; P ~ 10^3 to 10^4). A data point n is assigned to one of two classes y_n ∈ {+1, −1}, such as a normal or tumor tissue sample. Given such two-class high-dimensional data, the analytical goal is to estimate a sparse classifier, a model which distinguishes the two classes of data points (classification) and specifies a small subset of discriminatory features (relevant feature identification).

Assume that the data D can be separated by a linear hyperplane in the P-dimensional input feature space. The learning task can be formulated as an attempt to estimate a hyperplane, parameterized in terms of a weight vector w and bias b, via a solution to the following N inequalities [30]:

    y_n z_n = y_n (w^T x_n − b) ≥ 0   for all n = 1, …, N.   (1)

The hyperplane satisfying w^T x − b = 0 is termed a classifier. A new data point x (abundances of P features in a new sample) is classified by computing z = w^T x − b. If z > 0, the data point is assigned to one class; otherwise it belongs to the other class. Enumerating relevant features at the same time as discovering a classifier can be addressed by finding a sparse hyperplane, a weight vector w in which most components are equal to zero.
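As a concrete illustration of the decision rule, the following sketch classifies one sample with a hand-made sparse weight vector; all numbers are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
P = 1000
x = rng.normal(size=P)            # abundances of P features in one new sample

# A sparse weight vector: only 3 of the 1000 features are "relevant".
w = np.zeros(P)
w[[3, 17, 42]] = [0.8, -1.2, 0.5]
b = 0.1

z_full = w @ x - b                # decision value using all P features
relevant = np.flatnonzero(w)
z_sparse = w[relevant] @ x[relevant] - b   # using only the relevant features

label = 1 if z_full > 0 else -1   # sign of z decides the class
```

The two computations agree, which is the sense in which zero-weight features are irrelevant to the classification.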
The rationale is that zero elements do not contribute to determining the value of z:

    z = Σ_{p=1}^{P} w_p x_p − b.

If w_p = 0, feature p is "irrelevant" with regard to deciding the class. Since only the non-zero elements w_p ≠ 0 influence the value of z, they can be regarded as "relevant" features. The task of defining a small number of relevant features can be equated with that of finding a small set of non-zero elements. This can be formulated as an optimization problem, namely that of minimizing the l0 norm ||w||_0 = |{p : w_p ≠ 0}|, the number of non-zero elements of w. Thus we obtain:

    min_{w,b} ||w||_0
    subject to y_n (w^T x_n − b) ≥ 0   for all n = 1, …, N.   (2)

Unfortunately, problem (2) is NP-hard [10]. A tractable, convex approximation to this problem can be obtained by replacing the l0 norm with the l1 norm ||w||_1 = Σ_{p=1}^{P} |w_p|, the sum of the absolute magnitudes of the elements of the vector [10]:

    min_{w,b} ||w||_1 = Σ_{p=1}^{P} |w_p|
    subject to y_n (w^T x_n − b) ≥ 0   for all n = 1, …, N.   (3)

A solution to (3) yields the desired sparse weight vector w.

Optimization problem (3) can be solved via linear programming [11]. The ensuing formulation requires the imposition of constraints on the allowed ranges of variables. The introduction of new variables u, v ∈ R^P such that |w_p| = u_p + v_p and w_p = u_p − v_p ensures non-negativity. The range of w_p = u_p − v_p is unconstrained (positive or negative) whilst u_p and v_p remain non-negative; u_p and v_p are designated the "positive" and "negative" parts respectively. Similarly, the bias b is split into positive and negative components, b = b+ − b−. Given a solution to problem (3), either u_p or v_p will be non-zero for feature p [11]:

    min_{u,v,b+,b−} Σ_{p=1}^{P} (u_p + v_p)
    subject to y_n ((u − v)^T x_n − (b+ − b−)) ≥ 1,
    u_p ≥ 0; v_p ≥ 0; b+ ≥ 0; b− ≥ 0,
    for all n = 1, …, N and all p = 1, …, P.   (4)

A detailed description of the origins of the ≥ 1 constraint can be found elsewhere [30]. If the data D are not linearly separable, misclassifications (errors in the class labels y_n) can be accounted for by the introduction of slack variables ξ_n. Problem (4) can be recast, yielding the final optimization problem:

    min_{u,v,b+,b−,ξ} Σ_{p=1}^{P} (u_p + v_p) + C Σ_{n=1}^{N} ξ_n
    subject to y_n ((u − v)^T x_n − (b+ − b−)) ≥ 1 − ξ_n,
    u_p ≥ 0; v_p ≥ 0; b+ ≥ 0; b− ≥ 0; ξ_n ≥ 0,
    for all n = 1, …, N and all p = 1, …, P.   (5)

C is an adjustable parameter weighing the contribution of misclassified data points: larger values of C cause fewer misclassifications to be tolerated, with C → ∞ giving the hard-margin limit, whereas C = 0 allows misclassifications to be ignored entirely.

3 Computational, Software and Practical Issues

Learning the sparse classifier defined by optimization problem (5) involves minimizing a linear function subject to linear constraints. Efficient algorithms for solving such linear programming problems involving ~10,000 variables (N) and ~10,000 constraints (P) are well-known. Standalone open-source codes include lp_solve and PCx. Nominal Liknon is an implementation of the sparse classifier (5). It incorporates routines written in Matlab and a system utilizing perl and lp_solve. The code is available from the authors upon request.

The input consists of a file containing an N × (P + 1) data matrix in which each row represents a single profiling experiment. The first P columns are the feature values, abundances of molecular species, whilst column P + 1 is the class label y_n ∈ {+1, −1}. The output comprises the non-zero values of the weight vector w (relevant features), the bias b, and the number of non-zero slack variables ξ_n. The adjustable parameter C in problem (5) can be set using cross-validation techniques. The results described here were obtained by choosing C = 0.5 or C = 1.
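Problem (5) can be assembled directly and handed to an off-the-shelf LP solver. The following is a minimal sketch using scipy.optimize.linprog rather than the lp_solve/Matlab toolchain the authors describe; the toy data, C = 1, and the 1e-6 sparsity threshold are assumptions of the sketch.

```python
import numpy as np
from scipy.optimize import linprog

def sparse_linear_classifier(X, y, C=1.0):
    """Solve problem (5). Variables are [u (P), v (P), b+, b-, xi (N)],
    all non-negative; the classifier is w = u - v, b = b+ - b-."""
    N, P = X.shape
    n_vars = 2 * P + 2 + N
    # Objective: sum(u) + sum(v) + C * sum(xi); bias parts are free of cost.
    c = np.concatenate([np.ones(2 * P), [0.0, 0.0], C * np.ones(N)])

    # Margin constraints y_n((u - v).x_n - (b+ - b-)) >= 1 - xi_n,
    # rewritten in linprog's A_ub @ vars <= b_ub form with b_ub = -1.
    A = np.zeros((N, n_vars))
    A[:, :P] = -y[:, None] * X          # -y_n x_n . u
    A[:, P:2 * P] = y[:, None] * X      # +y_n x_n . v
    A[:, 2 * P] = y                     # +y_n b+
    A[:, 2 * P + 1] = -y                # -y_n b-
    A[:, 2 * P + 2:] = -np.eye(N)       # -xi_n
    res = linprog(c, A_ub=A, b_ub=-np.ones(N), bounds=(0, None), method="highs")

    sol = res.x
    w = sol[:P] - sol[P:2 * P]
    b = sol[2 * P] - sol[2 * P + 1]
    return w, b

# Toy separable data: only feature 0 carries class information.
rng = np.random.default_rng(0)
N, P = 20, 30
y = np.array([1.0] * 10 + [-1.0] * 10)
X = rng.normal(0.0, 0.3, size=(N, P))
X[:, 0] += 2.0 * y

w, b = sparse_linear_classifier(X, y, C=1.0)
n_relevant = int(np.sum(np.abs(w) > 1e-6))
```

On data like this the l1 objective drives almost all weights to exactly zero, so the solver typically retains little beyond the single informative feature, and the training points are classified correctly.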
4 Application of Nominal Liknon to Real World Data

Nominal Liknon was applied to five data sets in the size range (N = 19, P = 1,987) to (N = 200, P = 15,154). A data set D yielded a sparse classifier, w and b, and a specification of the l relevant features (P ≫ l). Since the profiling studies produced only a small number of data points (N ≪ P), the generalization error of a nominal Liknon classifier was determined by computing the leave-one-out error for l-dimensional data points. A classifier trained using N − 1 data points was used to predict the class of the withheld data point; the procedure was repeated N times. The results are shown in Table 1. Nominal Liknon performs well in terms of simultaneous relevant feature identification and classification. In all five transcript and protein profiling data [...]

Footnotes:
1. http://www.netlib.org/ampl/solvers/lpsolve/
2. http://www-fp.mcs.anl.gov/otc/Tools/PCx/
3. http://www.mathworks.com
4. http://www.perl.org/

[...]

15. [...] lp-machines. In Ninth International Conference on Artificial Neural Networks, volume 470, pages 304-309. IEE, London, 1999.
16. L.R. Grate, C. Bhattacharyya, M.I. Jordan, and I.S. Mian. Integrated analysis of transcript profiling and protein sequence data. Mechanisms of Ageing and Development, in press, 2002.
17. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.

[...] a clone in each cell (R_{a,b}, C_{x,y}) for which ax + b = y, where the arithmetic is carried out over the finite field F_m. This design results in a sparse array by the following reasoning. Considering the affine plane of order m, rows correspond to lines, and columns correspond to points. A cell contains a clone only if the column's point lies on the row's line. Since there are no two distinct lines going through [...]
[...] is 1, using the simple rule that the z-th element of F_q is replaced by a binary vector in which the z-th coordinate is 1. The resulting binary code C has length qn, and each binary codeword has weight n. Using the binary code vectors as clone signatures, the binary code defines a pooling design with K = qn pools and N = |C| clones, for which each clone is included in n pools. If d is the minimum distance [...]

[...] n array, in which the j-th column vector equals φ(c_j) for all j. Enumerating the entries of φ(c) in any fixed order gives a binary vector, giving the signature for the clone corresponding to c. Let f : F_q^n → F_2^{qn} denote the mapping of the original codewords onto binary vectors defined by φ and the enumeration of the matrix entries.

Designs from Linear Codes. A linear code [...] minimum in Equation (3b) is attained for q^{k − w(x)} choices of singleton clone sets. (Proof in Appendix.) For instance, a design based on the RS(6, 3) code has the following properties:
– 343 clones are pooled in 42 pools;
– each clone is included in 6 pools; if at least 3 of those are included in an index to the clone, the index can be deconvoluted unambiguously;
– each pool contains 49 clones;
– signatures [...]

[...] following pooling design method using error-correcting block codes. Let C be a code of block length n over the finite field F_q. In other words, let C be a set of length-n vectors over F_q. A corresponding binary code is constructed by replacing the elements of F_q in the codewords with binary vectors of length q. The substitution uses binary vectors of weight one, i.e., vectors in which exactly one coordinate is [...]
[...] turned into an index into the yet-to-be-sequenced homologous clones across species. As we will see below, this basic pooling scheme is somewhat modified in practice in order to achieve correct and unambiguous mapping. PGI constructs comparative BAC-based physical maps at a fraction (on the order of 1%) of the cost of full genome sequencing. PGI requires only minor changes in the BAC-based sequencing pipeline [...]

[...] Σ_{t=n_min}^{n} (n choose t) p0^t (1 − p0)^{n−t}, where p0 ≈ 1 − e^{−c/M}. (Proof in Appendix.) In simple arrayed pooling, n_min = n = 2. In the pooling based on the RS(6, 3) code, n_min = 3, n = 6. When is the probability of success larger with the latter indexing method, at a fixed coverage? The probabilities can be compared using the following analogy. Let 6 balls be colored with red and green independently, each ball colored randomly [...]

[...] specific chromosome. There exist different combinatorial versions of haplotyping problems. In particular, the problem of haplotyping a population (i.e., a set of individuals) has been extensively studied, under many objective functions [4,5,3], while haplotyping for a single individual has been studied in [8] and in [9]. Given complete DNA sequence, haplotyping an individual would consist of a trivial check [...]

[...] sequencing, which represents the bulk of the effort involved in a PGI project, provides useful information irrespective of the pooling scheme. In other words, pooling by itself does not represent a significant overhead, and yet produces a comprehensive and accurate comparative physical map. Our reason for proposing PGI is motivated by recent advances in sequencing technology [3] that allow shotgun sequencing separately. Because of the pooling in PGI, the individual shotgun [...]

R. Guigó and D. Gusfield (Eds.): WABI 2002, LNCS 2452, pp. 10-28, 2002. © Springer-Verlag Berlin Heidelberg 2002
