Methods in Molecular Biology™

Series Editor
John M. Walker
School of Life Sciences
University of Hertfordshire
Hatfield, Hertfordshire, AL10 9AB, UK

For other titles published in this series, go to www.springer.com/series/7651

Data Mining in Proteomics
From Standards to Applications

Edited by
Michael Hamacher
Lead Discovery Center GmbH, Dortmund, Germany
Martin Eisenacher and Christian Stephan
Medizinisches Proteom-Center, Ruhr-Universität Bochum, Bochum, Germany

Editors
Dr Michael Hamacher, Lead Discovery Center GmbH, Dortmund, Germany, hamacher@lead-discovery.de
Dr Martin Eisenacher, Medizinisches Proteom-Center, Ruhr-Universität Bochum, Bochum, Germany, martin.eisenacher@rub.de
Dr Christian Stephan, Medizinisches Proteom-Center, Ruhr-Universität Bochum, Bochum, Germany, christian.stephan@rub.de

ISSN 1064-3745
e-ISSN 1940-6029
ISBN 978-1-60761-986-4
e-ISBN 978-1-60761-987-1
DOI 10.1007/978-1-60761-987-1

Springer New York Dordrecht Heidelberg London

© Springer Science+Business Media, LLC 2011

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Humana Press, c/o Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. While the advice and information in this book are believed to be true and accurate at the date of going to press, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher
makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Humana Press is part of Springer Science+Business Media (www.springer.com)

Preface

Inspired by the enormous impact of Genomics and the hopes that came along with it, biochemistry and its methods slowly evolved into what is now widely known as Proteomics. Scientists dedicated to mass spectrometry and gel-based technologies became aware of the powerful tools they hold in hand, dreaming of the quantitative analyses of proteins in cells, tissues, and diseases. Thus, Proteomics soon went from a shooting-star in the life science field to a must-have in each larger wet-lab group. Methods and technology developed rapidly, often much faster than the awareness of the special needs of the tools in use and even faster than standard protocols and standard formats could mature. Soon proteomics techniques created more and more data, while meaningful approaches for data handling, interpretation, and exchange sometimes clearly lagged behind, resulting in misinterpreted studies and frustrated colleagues from time to time.

However, the know-how generated and the experience gained, especially in the last several years, caused a rethinking of strategy design and data interpretation. Moreover, the elaboration of standards by such voluntarily driven groups as the Proteomics Standards Initiative within the Human Proteome Organisation or the US institutions, Institute for Systems Biology (ISB) and National Institute of Standards and Technology (NIST), ushered in a new era of understanding and quality, proving how powerful Proteomics is when the technology can be controlled through data generation, handling, and mining.

This book reflects these new insights within the Proteomics community, taking the historical evolution as well as the most important international standardization projects into account so that the reader gets a feeling for the dynamism and openness in this field. Basic and
sophisticated overviews are given in regard to proteomics technologies, standard data formats, and databases – both local laboratory databases and public repositories. There are chapters dealing with detailed information concerning data interpretation strategies, including statistics, spectra interpretation, and analysis environments. Other chapters describe the HUPO initiatives or cover more specialized tasks, such as data annotation, peak picking, phosphoproteomics, spectrum libraries, LC/MS imaging, and splice isoforms. This volume also includes in-depth descriptions of tools for data mining and visualization of Proteomics data, leading to modeling and Systems Biology approaches. To look beyond the Proteomics tasks and challenges, some chapters present insights into protein interaction network evolution, text mining, and random matrix approaches.

All in all, we believe that this book is a well-balanced compendium for beginners and experts, offering a broad scope of data mining topics but always focusing on the current state-of-the-art and beyond. Enjoy!
Michael Hamacher, Dortmund, Germany
Martin Eisenacher, Bochum, Germany
Christian Stephan, Bochum, Germany

Contents

Preface
Contributors

Part I Data Generation and Result Finding

1 Instruments and Methods in Proteomics
Caroline May, Frederic Brosseron, Piotr Chartowski, Cornelia Schumbrutzki, Bodo Schoenebeck, and Katrin Marcus

2 In-Depth Protein Characterization by Mass Spectrometry
Daniel Chamrad, Gerhard Körting, and Martin Blüggel

3 Analysis of Phosphoproteomics Data
Christoph Schaab

Part II Databases

4 The Origin and Early Reception of Sequence Databases
Joel B. Hagen

5 Laboratory Data and Sample Management for Proteomics
Jari Häkkinen and Fredrik Levander

6 PRIDE and “Database on Demand” as Valuable Tools for Computational Proteomics
Juan Antonio Vizcaíno, Florian Reisinger, Richard Côté, and Lennart Martens

7 Analysing Proteomics Identifications in the Context of Functional and Structural Protein Annotation: Integrating Annotation Using PICR, DAS, and BioMart
Philip Jones

8 Tranche Distributed Repository and ProteomeCommons.org
Bryan E. Smith, James A. Hill, Mark A. Gjukich, and Philip C. Andrews

Part III Standards

9 Data Standardization by the HUPO-PSI: How has the Community Benefitted?
Sandra Orchard and Henning Hermjakob

10 mzIdentML: An Open Community-Built Standard Format for the Results of Proteomics Spectrum Identification Algorithms
Martin Eisenacher

11 Spectra, Chromatograms, Metadata: mzML-The Standard Data Format for Mass Spectrometer Output
Michael Turewicz and Eric W. Deutsch

12 imzML: Imaging Mass Spectrometry Markup Language: A Common Data Format for Mass Spectrometry Imaging
Andreas Römpp, Thorsten Schramm, Alfons Hester, Ivo Klinkert, Jean-Pierre Both, Ron M.A. Heeren, Markus Stöckli, and Bernhard Spengler

13 Tandem Mass Spectrometry Spectral Libraries and Library Searching
Eric W. Deutsch

Part IV Processing and Interpretation of Data

14 Inter-Lab Proteomics: Data Mining in Collaborative Projects on the Basis of the HUPO Brain Proteome Project’s Pilot Studies
Michael Hamacher, Bernd Gröttrup, Martin Eisenacher, Katrin Marcus, Young Mok Park, Helmut E. Meyer, Kyung-Hoon Kwon, and Christian Stephan

15 Data Management and Data Integration in the HUPO Plasma Proteome Project
Gilbert S. Omenn

16 Statistics in Experimental Design, Preprocessing, and Analysis of Proteomics Data
Klaus Jung

17 The Evolution of Protein Interaction Networks
Andreas Schüler and Erich Bornberg-Bauer

18 Cytoscape: Software for Visualization and Analysis of Biological Networks
Michael Kohl, Sebastian Wiese, and Bettina Warscheid

19 Text Mining for Systems Modeling
Axel Kowald and Sebastian Schmeier

20 Identification of Alternatively Spliced Transcripts Using a Proteomic Informatics Approach
Rajasree Menon and Gilbert S. Omenn

21 Distributions of Ion Series in ETD and CID Spectra: Making a Comparison
Sarah R. Hart, King Wai Lau, Simon J. Gaskell, and Simon J. Hubbard

Part V Tools

22 Evaluation of Peak-Picking Algorithms for Protein Mass Spectrometry
Chris Bauer, Rainer Cramer, and Johannes Schuchhardt

23 OpenMS and TOPP: Open Source Software for LC-MS Data Analysis
Andreas Bertsch, Clemens Gröpl,
Knut Reinert, and Oliver Kohlbacher

24 LC/MS Data Processing for Label-Free Quantitative Analysis
Patricia M. Palagi, Markus Müller, Daniel Walther, and Frédérique Lisacek

Part VI Modelling and Systems Biology

25 Spectral Properties of Correlation Matrices – Towards Enhanced Spectral Clustering
Daniel Fulger and Enrico Scalas

26 Standards, Databases, and Modeling Tools in Systems Biology
Michael Kohl

27 Modeling of Cellular Processes: Methods, Data, and Requirements
Thomas Millat, Olaf Wolkenhauer, Ralf-Jörg Fischer, and Hubert Bahl

Index

Contributors

Philip C. Andrews • Departments of Biological Chemistry, Bioinformatics and Chemistry, University of Michigan, Ann Arbor, MI, USA
Hubert Bahl • Division of Microbiology, Institute of Biological Sciences, University of Rostock, Rostock, Germany
Chris Bauer • MicroDiscovery GmbH, Berlin, Germany
Andreas Bertsch • Division for Simulation of Biological Systems, WSI/ZBIT, Eberhard-Karls-Universität Tübingen, Tübingen, Germany
Martin Blüggel • Protagen AG, Dortmund, Germany
Erich Bornberg-Bauer • Bioinformatics Division, Institute for Evolution and Biodiversity, School of Biological Sciences, University of Muenster, Münster, Germany
Jean-Pierre Both • Commissariat à l’Énergie Atomique, Saclay, France
Frederic Brosseron • Department of Functional Proteomics, Medizinisches Proteom-Center, Ruhr-Universität Bochum, Bochum, Germany
Daniel Chamrad • Protagen AG, Dortmund, Germany
Piotr Chartowski • Department of Functional Proteomics, Medizinisches Proteom-Center, Ruhr-Universität Bochum, Bochum, Germany
Richard Côté • European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
Rainer Cramer • The BioCentre and Department of Chemistry, The University of Reading, Whiteknights, Reading, UK
Eric W. Deutsch • Institute for Systems Biology, Seattle, WA, USA
Martin Eisenacher • Medizinisches Proteom-Center, Ruhr-Universität Bochum, Bochum, Germany
Ralf-Jörg Fischer • Division of Microbiology,
Institute of Biological Sciences, University of Rostock, Rostock, Germany
Daniel Fulger • Department of Chemistry and WZMW, Computer Simulation Group, Philipps-University Marburg, Marburg, Germany; Complex Systems Lagrange Lab, Institute for Scientific Interchange, Torino, Italy
Simon J. Gaskell • Michael Barber Centre for Mass Spectrometry, School of Chemistry, Manchester Interdisciplinary Biocentre, University of Manchester, Manchester, UK
Mark A. Gjukich • Departments of Biological Chemistry, Bioinformatics and Chemistry, University of Michigan, Ann Arbor, MI, USA
Clemens Gröpl • Division for Simulation of Biological Systems, WSI/ZBIT, Eberhard-Karls-Universität Tübingen, Tübingen, Germany
Bernd Gröttrup • Medizinisches Proteom-Center, Ruhr-Universität Bochum, Bochum, Germany
Joel B. Hagen • Department of Biology, Radford University, Radford, VA, USA

... expressed in a cell or tissue at a defined time point (1). Indeed, not only qualitative analysis resulting in a defined “protein inventory” ...