PREDICTIVE TOXICOLOGY

edited by
Christoph Helma
University of Freiburg, Germany

Published in 2005 by
Taylor & Francis Group
6000 Broken Sound Parkway NW
Boca Raton, FL 33487–2742

© 2005 by Taylor & Francis Group, LLC

No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number-10: 0–8247–2397–X (Hardcover)

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC) 222 Rosewood Drive, Danvers, MA 01923, 978–750–8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.
Library of Congress Cataloging-in-Publication Data
Catalog record is available from the Library of Congress

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com

Contents

Contributors . . . . ix

1. A Brief Introduction to Predictive Toxicology . . . . 1
Christoph Helma
What Is Predictive Toxicology? . . . . 1
Ingredients of a Predictive Toxicology System . . . . 3
Concluding Remarks . . . . 7

2. Description and Representation of Chemicals . . . . 11
Wolfgang Guba
Introduction . . . . 11
Fragment-Based and Whole Molecule Descriptor Schemes . . . . 13
Fragment Descriptors . . . . 14
Topological Descriptors . . . . 19
3D Molecular Interaction Fields . . . . 23
Other Approaches . . . . 27

3. Computational Biology and Toxicogenomics . . . . 37
Kathleen Marchal, Frank De Smet, Kristof Engelen, and Bart De Moor
Introduction . . . . 37
Microarrays . . . . 41
Analysis of Microarray Experiments . . . . 46
Conclusions and Perspectives . . . . 74

4. Toxicological Information for Use in Predictive Modeling: Quality, Sources, and Databases . . . . 93
Mark T. D. Cronin
Introduction . . . . 93
Requirements for Toxicological Data for Predictive Toxicity . . . . 98
High Quality Data Sources for Predictive Modeling . . . . 104
Databases Providing General Sources of Toxicological Information . . . . 104
Databases Providing Sources of Toxicological Information for Specific Endpoints . . . . 110
Sources of Chemical Structures . . . . 119
Sources of Further Toxicity Data . . . . 121
Conclusions . . . . 123

5. The Use of Expert Systems for Toxicology Risk Prediction . . . . 135
Simon Parsons and Peter McBurney
Introduction . . . . 136
Expert Systems . . . . 137
Expert Systems for Risk Prediction . . . . 147
Systems of Argumentation . . . . 153
Summary . . . . 167

6. Regression- and Projection-Based Approaches in Predictive Toxicology . . . . 177
Lennart Eriksson, Erik Johansson, and Torbjörn Lundstedt
Introduction . . . . 178
Characterization and Selection of Compounds: Statistical Molecular Design . . . . 179
Data Analytical Techniques . . . . 182
Results for the First Example: Modeling and Predicting In Vitro Toxicity of Small Haloalkanes . . . . 190
Results for the Second Example: Lead Finding and QSAR-Directed Virtual Screening of Hexapeptides . . . . 203
Discussion . . . . 211

7. Machine Learning and Data Mining . . . . 223
Stefan Kramer and Christoph Helma
Introduction . . . . 223
Descriptive DM . . . . 231
Predictive DM . . . . 239
Literature and Tools/Implementations . . . . 246
Summary . . . . 249

8. Neural Networks and Kernel Machines for Vector and Structured Data . . . . 255
Paolo Frasconi
Introduction . . . . 255
Supervised Learning . . . . 258
The Multilayered Perceptron . . . . 268
Support Vector Machines . . . . 279
Learning in Structured Domains . . . . 288
Conclusion . . . . 299

9. Applications of Substructure-Based SAR in Toxicology . . . . 309
Herbert S. Rosenkranz and Bhavani P. Thampatty
Introduction . . . . 309
The Role of Human Expertise . . . . 311
Model Validation: Characterization and Interpretation . . . . 316
Congeneric vs. Non-congeneric Data Sets . . . . 335
Complexity of Toxicological Phenomena and Limitations of the SAR Approach . . . . 343
Mechanistic Insight from SAR Models . . . . 345
Application of SAR to a Dietary Supplement . . . . 348
SAR in the Generation of Mechanistic Hypotheses . . . . 354
Mechanisms: Data Mining Approach . . . . 355
An SAR-Based Data Mining Approach to Toxicological Discovery . . . . 357
Conclusion . . . . 361

10. OncoLogic: A Mechanism-Based Expert System for Predicting the Carcinogenic Potential of Chemicals . . . . 385
Yin-Tak Woo and David Y. Lai
Introduction . . . . 385
Mechanism-Based Structure–Activity Relationships Analysis . . . . 387
The OncoLogic Expert System . . . . 390

11. META: An Expert System for the Prediction of Metabolic Transformations . . . . 415
Gilles Klopman and Aleksandr Sedykh
Overview of Metabolism Expert Systems . . . . 415
The META Expert System . . . . 416
META Dictionary Structure . . . . 417
META Methodology . . . . 418
META_TREE . . . . 419

12. MC4PC: An Artificial Intelligence Approach to the Discovery of Quantitative Structure–Toxic Activity Relationships . . . . 423
Gilles Klopman, Julian Ivanov, Roustem Saiakhov, and Suman Chakravarti
Introduction . . . . 423
The MCASE Methodology . . . . 427
Recent Developments: The MC4PC Program . . . . 433
BAIA Plus . . . . 438
Development of Expert System Predictors Based on MCASE Results . . . . 443
Conclusion . . . . 451

13. PASS: Prediction of Biological Activity Spectra for Substances . . . . 459
Vladimir Poroikov and Dmitri Filimonov
Introduction . . . . 459
Brief Description of the Method for Predicting Biological Activity Spectra . . . . 461
Application of Predicted Biological Activity Spectra in Pharmaceutical Research and Development . . . . 471
Future Trends in Biological Activity Spectra Prediction . . . . 474

14. lazar: Lazy Structure–Activity Relationships for Toxicity Prediction . . . . 479
Christoph Helma
Introduction . . . . 479
Problem Definition . . . . 482
The Basic lazar Concept . . . . 484
Detailed Description . . . . 485
Results . . . . 491
Learning from Mistakes . . . . 493
Conclusion . . . . 495

Contributors

Suman Chakravarti  Case Western Reserve University, Cleveland, Ohio, U.S.A.
Mark T. D. Cronin  School of Pharmacy and Chemistry, John Moores University, Liverpool, U.K.
Bart De Moor  ESAT-SCD, K.U. Leuven, Leuven, Belgium
Frank De Smet  ESAT-SCD, K.U. Leuven, Leuven, Belgium
Kristof Engelen  ESAT-SCD, K.U. Leuven, Leuven, Belgium
Lennart Eriksson  Umetrics AB, Umeå, Sweden
Dmitri Filimonov  Institute of Biomedical Chemistry of Russian Academy of Medical Sciences, Moscow, Russia
Paolo Frasconi  Dipartimento di Sistemi e Informatica, Università degli Studi di Firenze, Firenze, Italy
Wolfgang Guba  F. Hoffmann-La Roche Ltd, Pharmaceuticals Division, Basel, Switzerland
Christoph Helma  Institute for Computer Science, Universität Freiburg, Georges Köhler Allee, Freiburg, Germany
Julian Ivanov  MULTICASE Inc., Beachwood, Ohio, U.S.A.
Erik Johansson  Umetrics AB, Umeå, Sweden
Gilles Klopman  MULTICASE Inc., Beachwood, Ohio, and Department of Chemistry, Case Western Reserve University, Cleveland, Ohio, U.S.A.
Stefan Kramer  Institut für Informatik, Technische Universität München, Garching, München, Germany
David Y. Lai  Risk Assessment Division, Office of Pollution Prevention and Toxics, U.S. Environmental Protection Agency, Washington, D.C., U.S.A.
Torbjörn Lundstedt  Acurepharma AB and BMC, Uppsala, Sweden
Peter McBurney  Department of Computer Science, University of Liverpool, Liverpool, U.K.
Kathleen Marchal  ESAT-SCD, K.U. Leuven, Leuven, Belgium
Simon Parsons  Department of Computer and Information Science, Brooklyn College, City University of New York, Brooklyn, New York, U.S.A.
Vladimir Poroikov  Institute of Biomedical Chemistry of Russian Academy of Medical Sciences, Moscow, Russia
Herbert S. Rosenkranz  Department of Biomedical Sciences, Florida Atlantic University, Boca Raton, Florida, U.S.A.
Roustem Saiakhov  MULTICASE Inc., Beachwood, Ohio, U.S.A.
Aleksandr Sedykh  Department of Chemistry, Case Western Reserve University, Cleveland, Ohio, U.S.A.
Bhavani P. Thampatty  Department of Environmental and Occupational Health, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, U.S.A.
Yin-Tak Woo  Risk Assessment Division, Office of Pollution Prevention and Toxics, U.S. Environmental Protection Agency, Washington, D.C., U.S.A.

[...]... In: Helma C, ed. Predictive Toxicology. New York: Marcel Dekker, 2005.
6. Woo Y, Lai DY. OncoLogic: a mechanism-based expert system for predicting the carcinogenic potential of chemicals. In: Helma C, ed. Predictive Toxicology. New York: Marcel Dekker, 2005.
7. Eriksson L, Johansson E, Lundstedt T. Regression- and projection-based approaches in predictive toxicology. In: Helma C, ed. Predictive Toxicology. New York:...

www.predictive-toxicology.org. You can contact me by e-mail at helma@informatik.uni-freiburg.de if you wish to comment on the book or intend to contribute to the website.

b Please note that the specific meaning or usage of the terms may vary from chapter to chapter.

REFERENCES
1. Guba W. Description and representation of chemicals. In: Helma C, ed. Predictive Toxicology...

relationships (QSTAR). In: Helma C, ed. Predictive Toxicology. New York: Marcel Dekker, 2005.
11. Poroikov VV, Filimonov D. PASS: prediction of biological activity for substance. In: Helma C, ed. Predictive Toxicology. New York: Marcel Dekker, 2005.
12. Helma C. lazar: Lazy structure–activity relationship for toxicity prediction. In: Helma C, ed. Predictive Toxicology. New York: Marcel Dekker, 2005.
13. Rosenkranz HS, Thampatty...

decades. Eriksson et al. (7) describe statistical techniques in the chapter entitled Regression- and Projection-Based Approaches in Predictive Toxicology. More recently, techniques originating from artificial intelligence research have been used in predictive toxicology. These computer-science oriented developments are summarized in two chapters: Machine Learning and Data Mining by Kramer and Helma (8)...
and toxicogenomics. In: Helma C, ed. Predictive Toxicology. New York: Marcel Dekker, 2005.
3. Cronin MTD. Toxicological information for use in predictive modeling: quality, sources, and databases. In: Helma C, ed. Predictive Toxicology. New York: Marcel Dekker, 2005.
4. Parsons S, McBurney P. The use of expert systems for toxicology risk prediction. In: Helma C, ed. Predictive Toxicology. New York: Marcel Dekker,...

overlooked but, in my opinion, very important feature of many predictive toxicology systems is their applicability as a tool for the generation and verification of scientific hypotheses. The chapter Applications of Substructure-Based SAR in Toxicology by Rosenkranz and Thampatty (13) provides some examples for the creative use of a predictive toxicology system for scientific purposes as well as some mainstream...

ed. Predictive Toxicology. New York: Marcel Dekker, 2005.
9. Frasconi P. Neural networks and kernel machines for vector and structured data. In: Helma C, ed. Predictive Toxicology. New York: Marcel Dekker, 2005.
10. Klopman G, Ivanov J, Saiakhov R, Chakravarti S. MC4PC: an artificial intelligence approach to the discovery of quantita-...

This is essentially a predictive strategy (Fig. 1): Toxicologists study the action of chemicals in simplified biological systems (e.g., cell cultures, laboratory animals) and try to use these results to predict the potential impact on human or environmental health.

Figure 1. Abstraction of the predictive toxicology process.

Predictive toxicology, as we understand...
also distinguish between data (input and output) and algorithms.

2 INGREDIENTS OF A PREDICTIVE TOXICOLOGY SYSTEM

2.1 Chemical, Biological, and Toxicological Data

Most of the research in predictive toxicology has been devoted to the development of algorithms, but for a good performance, the data aspect is at least equally...

1 A Brief Introduction to Predictive Toxicology

CHRISTOPH HELMA
Institute for Computer Science, Universität Freiburg, Georges Köhler Allee, Freiburg, Germany

1 WHAT IS PREDICTIVE TOXICOLOGY?

The public demand for the protection of human and environmental health has led to the establishment of toxicology as the science of the action of chemicals...