PREDICTIVE TOXICOLOGY - CHAPTER 2

2 Description and Representation of Chemicals

WOLFGANG GUBA
F. Hoffmann-La Roche Ltd, Pharmaceuticals Division, Basel, Switzerland

© 2005 by Taylor & Francis Group, LLC

1. INTRODUCTION

Biological effects are mediated by intermolecular interactions, for instance, through the binding of a ligand to a receptor, which triggers a signaling event in a signal transduction pathway. Three-dimensional (3D) structures of receptor–ligand complexes are of great value for rationalizing the pharmacological or toxicological effects of small molecule ligands. However, due to experimental constraints such as purity and homogeneity of the protein, crystallizability, solubility, size of the protein–ligand complex, etc., an X-ray or NMR-based structure determination is often not feasible. In those cases empirical models have to be developed that deduce biological effects from the 2D or 3D molecular structures of small molecule ligands only, and as a result structure–activity relationships (SAR) are formulated. These SARs may be either qualitative [i.e., molecular features (substructures, functional groups) are associated with activity] or quantitative [QSARs, defined by correlating molecular structures with biological effects via mathematical equations]. QSAR models require the translation of molecular structures into numerical scales, i.e., molecular descriptors. These descriptors are used by various linear [partial least squares (PLS) (1), etc.] or non-linear [neural networks (2)] regression algorithms to predict biological effects of molecules which have not been tested or not even synthesized.

The core of empirical model building by QSAR is the similarity principle (3), which states that similar chemical structures should have similar biological activities. The converse is not true, since similar biological activities may be displayed by chemically diverse molecules (4).
Within the context of QSAR, the similarity principle implies that small changes in molecular structure cause correspondingly slight variations in biological activity, which allows the interpolation of biological activity from a calibration set to a structurally related prediction set of compounds. Thus, molecular descriptors have a pivotal role in quantifying the similarity principle, and their usefulness can be ranked by the following criteria:

• relevance for the biological effect to be described
• interpretability
• speed of calculation

The biological relevance of molecular descriptors can be checked readily via the stability and predictivity of the generated mathematical QSAR models, and speed is (at least for up to 1000–10,000 compounds in most cases) no longer a limiting factor. Only for virtual screening campaigns with >10^5 compounds does this issue need to be taken into consideration. The most critical factor is the interpretability of molecular descriptors, because a clear understanding of the correlation of molecular structures with toxicological effects is crucial for correctly associating structural features with toxic liabilities and for optimizing the biological profile of a compound.

This chapter will describe how molecules are transformed into numerical descriptors. Fragment-based and whole molecule descriptor schemes will be discussed, followed by examples of 1D, 2D, and 3D molecular descriptors. The focus will not be on reviewing algorithms for descriptor generation but rather on illustrating strategies for dealing with homogeneous and diverse sets of molecules and on outlining the scope of commonly used descriptor schemes. For more detailed information about molecular descriptors and algorithms, the reader is referred to the references and to the encyclopedic Handbook of Molecular Descriptors (5).
The quest for a universal set of descriptors which can be generally applied to structure–activity modeling is ongoing and will probably never succeed. The choice of molecular descriptors is determined by the biological phenomenon to be analyzed, and very often experience in descriptor selection is a critical success factor in QSAR model building.

2. FRAGMENT-BASED AND WHOLE MOLECULE DESCRIPTOR SCHEMES

Molecular descriptors are usually classified in terms of the dimensionality of the molecular representation (1D, 2D, and 3D) from which they are derived. However, before selecting the dimensionality of the molecular descriptor scheme, the following question needs to be answered: do the molecules in the dataset contain an invariant substructure, a common scaffold, with one or more substitution sites to which variable building blocks are attached, or is there no common substructure?

A QSAR analysis correlates the variation in molecular structures with the variation in biological activities. In the case of an invariant scaffold the obvious strategy is to establish a relationship between the structural variation of the substituents (R groups), the substitution site (R1, R2, etc.), and the resulting biological effects. Since the R groups also influence the scaffold (e.g., via electronic effects), another approach would be to compare the effects of the substituent groups on the common set of scaffold atoms (Fig. 1). Hybrid approaches are also feasible, correlating both the variation of building blocks with respect to a substitution site and the modified properties of common scaffold atoms with biological activities.

If no common substructure can be identified, whole molecule descriptors have to be calculated. These heterogeneous datasets are more challenging than series with common scaffolds.
It cannot be assumed a priori that each molecule in the dataset interacts with the biological target in the same way, and it is usually not a trivial task to identify those structural features which cause a biological effect. Later it will be illustrated how topological and 3D descriptor schemes attempt to tackle this problem.

3. FRAGMENT DESCRIPTORS

3.1. Homogeneous Dataset with a Common Scaffold

Drugs or toxic agents interact with their macromolecular targets (enzymes, receptors, ion channels, etc.) via hydrophobic, steric, polar, and electrostatic forces. The classical Hansch analysis (6) assigns physicochemical property constants to each substituent, and a correlation between the substituent constants and the biological effect is established. Commonly used fragment values are hydrophobic constants (π), molar refractivity (MR), and the electronic Hammett constant (σ) (7).

Figure 1 In datasets with a common, invariant scaffold (here, biphenyl) molecular descriptors can be generated both for the variation of substituents R (marked by circles) and for the substitution sites (marked by squares).

The hydrophobic constant π has been derived from the difference of the octanol–water partition coefficients of a substituted molecule and the unsubstituted parent compound:

\[ \pi_{\text{substituent}} = \log P_{\text{substituted compound}} - \log P_{\text{parent compound}} \]

The octanol–water partition coefficient is defined by

\[ \log P = \log \frac{\text{concentration of solute in octanol phase}}{\text{concentration of solute in aqueous phase}} \]

and assumes positive values for lipophilic compounds favoring the octanol phase and negative values for polar molecules with a preference for the aqueous phase. There is a general trend linking lipophilicity to the toxicity of xenobiotics, which is caused by an enrichment in hydrophobic body compartments (membranes, body fat, etc.) and by extensive metabolism leading to reactive species.
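
The two definitions above can be turned into a few lines of arithmetic. The following is a minimal sketch rather than a validated implementation: the function names `log_p` and `pi_substituent` are hypothetical, the logarithm is taken to base 10, and the concentrations in the usage section are invented purely for illustration.

```python
import math


def log_p(conc_octanol: float, conc_water: float) -> float:
    """Decadic logarithm of the octanol/water concentration ratio."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


def pi_substituent(log_p_substituted: float, log_p_parent: float) -> float:
    """Hydrophobic constant: log P (substituted) minus log P (parent)."""
    return log_p_substituted - log_p_parent


if __name__ == "__main__":
    # Hypothetical equilibrium concentrations (same units in both phases).
    parent = log_p(conc_octanol=100.0, conc_water=1.0)
    substituted = log_p(conc_octanol=500.0, conc_water=1.0)
    print(f"log P (parent)      = {parent:.2f}")
    print(f"log P (substituted) = {substituted:.2f}")
    print(f"pi (substituent)    = {pi_substituent(substituted, parent):.2f}")
```

A positive π in this sketch means that the substituent shifts the compound toward the octanol phase relative to the parent; a negative π means that it makes the compound more polar.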
The octanol–water partition coefficient (log P) is a highly relevant measure and it can be calculated via a battery of atom- and fragment-based methods (8).

Molar refractivity is determined from the Lorentz–Lorenz equation (9) and is a function of the refractive index (n), density (d), and molecular weight (MW):

\[ \mathrm{MR} = \frac{n^2 - 1}{n^2 + 2} \cdot \frac{\mathrm{MW}}{d} \]

Molar refractivity is related to the molar volume (MW/density), but the refractive index correction also accounts for polarizability. However, molar refractivity does not discriminate between substituents with different shapes (10); e.g., the MR values of butyl and tert-butyl are 19.61 and 19.62, respectively. Therefore, Verloop (11) developed the STERIMOL parameters, which are based on 3D models of fragments generated with standard bond lengths and angles. Topologically based approaches to derive shape descriptors will be described below.

The Hammett constant is a measure of the electron-withdrawing or electron-donating properties of substituents and was originally derived from the effect of substituents on the ionization of benzoic acids (12,13). Hammett defined the parameter σ as follows, with pKa being the negative decadic logarithm of the ionization constant:

\[ \sigma = \mathrm{p}K_a(\text{benzoic acid}) - \mathrm{p}K_a(\text{meta- or para-substituted benzoic acid}) \]

Positive values of σ correspond to electron withdrawal by the substituent from the aromatic ring (σ for para-nitro = 0.78), whereas negative σ values indicate an electron-donating substituent (σ for para-methoxy = −0.27). Electronic effects can be categorized into field-inductive and resonance effects. Field-inductive effects consist of two components: (1) a σ- or π-bond mediated inductive effect and (2) an electrostatic field effect which is transmitted through solvent space.
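
As a small worked illustration of the Lorentz–Lorenz equation and the Hammett definition given above, the sketch below evaluates both formulas. The function names are hypothetical. The benzene constants are approximate handbook values supplied here as an assumption, and the pKa inputs are illustrative numbers chosen so that the results match the σ values quoted in the text; none of these inputs are taken from the chapter.

```python
def molar_refractivity(n: float, mw: float, density: float) -> float:
    """Lorentz-Lorenz equation: MR = (n^2 - 1) / (n^2 + 2) * MW / d."""
    if density <= 0:
        raise ValueError("density must be positive")
    return (n ** 2 - 1.0) / (n ** 2 + 2.0) * mw / density


def hammett_sigma(pka_benzoic_acid: float, pka_substituted: float) -> float:
    """Hammett sigma: pKa(benzoic acid) - pKa(substituted benzoic acid)."""
    return pka_benzoic_acid - pka_substituted


if __name__ == "__main__":
    # Assumed, approximate constants for benzene (n, g/mol, g/cm^3).
    print(f"MR(benzene) ~ {molar_refractivity(1.501, 78.11, 0.8765):.1f} cm^3/mol")

    # Illustrative pKa inputs, chosen to reproduce the sigma values in the text.
    print(f"sigma (para-nitro)   = {hammett_sigma(4.20, 3.42):+.2f}")
    print(f"sigma (para-methoxy) = {hammett_sigma(4.20, 4.47):+.2f}")
```

The sign convention follows directly from the definition: a substituent that makes the benzoic acid stronger (lower pKa) yields a positive σ, i.e., it withdraws electrons.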
Resonance effects energetically stabilize a molecule by the delocalization of π electrons or by hyperconjugation (delocalization of σ electrons in a π orbital aligned with the σ bond). Swain and Lupton (14) introduced the factoring of σ values into field and resonance effects, which were more consistently redefined by Hansch and Leo (9). A compilation of electronic substituent constants has been published for 530 substituents (15). Although these tabulated electronic substituent constants are of great value, their main drawback is the limited number of available data. Therefore, quantum-chemical calculations to derive electronic whole molecule and fragment descriptors are becoming increasingly common in QSAR/QSPR modeling (16,17).

Finally, indicator variables have to be mentioned as a special case of fragment descriptors. Indicator variables show whether a particular substructure or feature is present (I = 1) or absent (I = 0) at a given substitution site on a scaffold. Typical applications of indicator variables are the description of ortho effects, cis/trans isomerism, chirality, different parent skeletons, charge state, etc. (6).

The original Hansch analysis (6) correlates the above-mentioned physicochemical descriptors of the substituents of a congeneric series with a biological activity. From a more general perspective, each substitution site (R1, R2, etc.) is characterized with respect to principal physicochemical properties (steric bulk, lipophilicity, hydrogen bonding, electronics), and the presence of special structural features is encoded by indicator variables. This descriptor matrix is correlated with the ligand concentration C which causes a biological effect (e.g., EC50: the concentration of an agonist that produces 50% of the maximal response). The regression coefficients are often determined by multiple linear regression (MLR).
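
The layout of such a descriptor matrix can be sketched as follows. This is only an illustration of the bookkeeping, not of the regression itself: the names `CONSTANTS`, `descriptor_row`, and `neg_log_c` are hypothetical, the substituent labels "A" and "B" are invented, and the numbers in the table are placeholders, not tabulated substituent constants. The only convention taken from the definitions above is that hydrogen, as the reference substituent, has π = 0 and σ = 0.

```python
import math

# Per-substituent constants as (pi, MR, sigma).
# Placeholder numbers for illustration only, NOT tabulated constants.
CONSTANTS = {
    "H": (0.0, 1.0, 0.0),
    "A": (0.5, 5.0, -0.2),
    "B": (0.7, 6.0, 0.2),
}


def descriptor_row(substituents, indicators):
    """Build one matrix row: (pi, MR, sigma) per site R1, R2, ..., then 0/1 flags."""
    row = []
    for group in substituents:  # ordered by substitution site
        row.extend(CONSTANTS[group])
    for flag in indicators:
        if flag not in (0, 1):
            raise ValueError("indicator variables must be 0 or 1")
        row.append(flag)
    return row


def neg_log_c(conc_molar: float) -> float:
    """Response variable -log C for a molar concentration C (e.g., an EC50)."""
    return -math.log10(conc_molar)


if __name__ == "__main__":
    # Hypothetical two-site series: (R1, R2), one indicator, EC50 in mol/L.
    compounds = [(("H", "H"), (0,), 1e-5), (("A", "B"), (1,), 1e-7)]
    for groups, flags, ec50 in compounds:
        print(descriptor_row(groups, flags), round(neg_log_c(ec50), 2))
```

Every row has the same length and the same column meaning, which is what allows a regression method to assign one coefficient per property and substitution site.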
However, MLR assumes that the descriptors are uncorrelated, which is rarely the case. Therefore, PLS is recommended as the general statistical engine since it does not suffer from the drawbacks of MLR:

\[ -\log C = \underbrace{a_0 + a_1 \pi + a_2\,\mathrm{MR} + a_3 \sigma}_{R_1} + \cdots + \underbrace{b_0 + b_1 \pi + b_2\,\mathrm{MR} + b_3 \sigma}_{R_2} + \cdots + \underbrace{k_1 I_1 + \cdots}_{\text{indicator variables}} \]

In addition, quadratic terms can be introduced to account for the frequently observed non-linear correlation of physicochemical properties, such as log P, with bioactivity. For example, let us assume that the lipophilicity within a compound series is positively correlated with a biological response. Even if an increase in lipophilicity is initially accompanied by an analogous rise of activity, there will be a lipophilicity optimum beyond which activity drops. Possible causes for this deviation from a linear correlation model are a decreasing solubility in aqueous body fluids or an enrichment in biological membranes or body fat, which would reduce the effective concentration of the ligand at the site of action. This can be described by the following parabolic model that defines a log P optimum beyond which activity is reduced:

\[ -\log C = a_0 - a_1 (\log P)^2 + a_2 \log P + \cdots \]

If a stable and predictive QSAR can be derived, the analysis of the statistical model allows one to determine which substitution site or structural feature has the largest impact on activity and what physicochemical property profile is required for the optimization of activity. However, this strategy is confined to a congeneric series with a common scaffold.

3.2. Heterogeneous Dataset

In heterogeneous datasets no common molecular substructure can be defined. A QSAR model of a molecular descriptor matrix consisting of n rows (molecules) and k descriptor columns requires that the column entries of the descriptor matrix denote the same property for each molecule. For heterogeneous datasets atomic descriptors cannot be compared directly due to the different number of atoms in each molecule. In the following two sections, van der Waals surface area descriptors and autocorrelation functions will be introduced as a means to allow for QSAR/QSPR modeling of compound sets with no common core structures.

3.2.1. van der Waals Surface Area Descriptors

The Hansch concept of correlating biological effects with principal physicochemical properties of substituents has been extended to whole molecules by Paul Labute (18). In a first step, the van der Waals surface area (VSA) is calculated for each atom of a molecule from a topological connection table with a predefined set of van der Waals radii and ideal bond lengths. Thus, V_i, the contribution of atom i to the VSA of a molecule, is a conformationally independent approximate 3D property which only requires 2D connectivity information. In a second step, steric, lipophilic, and electrostatic properties are calculated for each atom by applying the algorithms of Wildman and Crippen (19) for determining log P and molar refractivity and by assigning the partial charges of Gasteiger and Marsili (20). Each of the three properties is divided into a set of predefined ranges (10 bins for log P, 8 bins for MR, and 14 bins for partial charges), and for each property the atomic VSA contributions in a given property range bin are added up. Thus, the VSA descriptors correspond to a subdivision of the total molecular surface area into surface patches that are assigned to ranges of steric, lipophilic, and electrostatic properties.
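
The binning step described above, adding up the atomic surface contributions V_i that fall into each property range, can be sketched as follows. This is an assumption-laden illustration: the function name `vsa_descriptor` is hypothetical, the per-atom areas and charges are invented, and the two bin boundaries are arbitrary stand-ins for the predefined ranges of the published scheme, which are not reproduced here.

```python
from bisect import bisect_right


def vsa_descriptor(atom_areas, atom_props, bin_edges):
    """Sum per-atom surface areas into property-range bins.

    bin_edges holds the sorted interior boundaries, so k edges define
    k + 1 bins; a value equal to an edge is placed in the upper bin.
    """
    if len(atom_areas) != len(atom_props):
        raise ValueError("need one property value per atomic area")
    bins = [0.0] * (len(bin_edges) + 1)
    for area, prop in zip(atom_areas, atom_props):
        bins[bisect_right(bin_edges, prop)] += area
    return bins


if __name__ == "__main__":
    # Hypothetical per-atom VSA contributions and partial charges.
    areas = [10.0, 5.0, 7.5]
    charges = [-0.3, 0.05, 0.4]
    # Arbitrary boundaries: negative, roughly neutral, positive.
    descriptor = vsa_descriptor(areas, charges, bin_edges=[-0.1, 0.1])
    print(descriptor)  # [10.0, 5.0, 7.5]
    print(sum(descriptor) == sum(areas))  # the bins partition the total area
```

Because every atom lands in exactly one bin, the bin values for one property always add up to the total surface area, and the vector length depends only on the number of bins, not on the number of atoms.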
Summing up, each molecule is transformed into a 10 + 8 + 14 = 32 dimensional vector. Linear combinations of VSA descriptors correlate well with many widely used descriptors such as connectivity indices, physicochemical properties, atom counts, polarizability, etc. However, the interpretation of VSA-based QSAR models with respect to proposing chemical modifications for the optimization of compound properties is not straightforward.

3.2.2. Autocorrelation Transforms of 2D Descriptors

As mentioned above, the analysis of heterogeneous datasets requires the comparison of molecules with different numbers of atoms. Descriptor vectors of varying length can be transformed into directly comparable vectors of uniform length by an autocorrelation transform. Moreau and Broto (21,22) were the first to add or average atom pair properties separated by a predefined number of bonds. However, due to the summation or averaging of atom pair properties for a given distance bin, the interpretation of biological effects with respect to individual atom pair properties is no longer possible.

4. TOPOLOGICAL DESCRIPTORS

Topological descriptors are derived entirely from 2D structural formulas and, therefore, missing parameters, conformational flexibility, or molecular alignment do not have to be taken into account. The pros and cons of 2D vs. 3D descriptors will be briefly discussed in the following section. Whereas topological descriptors can be easily calculated from molecular graphs, the interpretation of topological indices with respect to molecular structures is often far from obvious. There is still a highly controversial debate about the utility of topological indices, which peaked in provocative statements like "connectivity parameters are artificial parameters that are worthless in real quantitative structure–activity relationships" (23).
Nevertheless, the interested reader should develop his/her own opinion and, therefore, the electrotopological state (E-state) indices developed by Kier and Hall (24) will be introduced as one of the more intuitive examples of topological indices.

The general concept of the E-state indices is to characterize each atom of a molecule in terms of its potential for electronic interactions, which is influenced by the bound neighboring atoms. Kier and Hall describe the topological environment of an atom by the δ value, which is defined as the number of adjacent atoms minus the number of bound hydrogens:

\[ \delta = \sigma - h \]

In other words, the δ parameter characterizes the number of sigma electrons or bonds around each non-hydrogen atom, not counting the bonds to hydrogen. In addition, the valence delta value, δ^v, is introduced as

\[ \delta^v = \sigma + \pi + n - h \]

with π being the number of electrons in pi orbitals and n being the number of lone-pair electrons. Thus, the valence delta value δ^v indicates the total number of sigma, pi, and lone-pair electrons for each atom, excluding those used in bonds to hydrogen atoms. As an example, an sp3 hybridized ether oxygen has a δ value of 2 (2 sigma bonds) and δ^v equals 6 (2 sigma and 4 lone-pair electrons). An sp2 hybridized carbonyl oxygen, however, has a δ value of 1 (1 sigma bond) and δ^v equals 6 (1 sigma, 1 pi, and 4 lone-pair electrons). From the parameters δ and δ^v the term δ^v − δ is derived:

\[ \delta^v - \delta = \pi + n \]

Thus, the term δ^v − δ is the total count of pi and lone-pair electrons for each atom in a molecule. It provides quantitative information about the potential of an atom for intermolecular interactions and, in addition, it is correlated with electronegativity (25).

The intrinsic state I combines the information about the topological environment of an atom and the availability of electrons for intermolecular interactions.
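
The delta values defined above are simple electron counts, so the two oxygen examples from the text can be checked with a short sketch. The class name `AtomElectrons` and its field names are hypothetical; the electron counts for the ether and carbonyl oxygen are those stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomElectrons:
    """Electron counts for one non-hydrogen atom."""

    sigma: int  # sigma electrons (one per sigma bond, including bonds to H)
    pi: int     # electrons in pi orbitals
    lone: int   # lone-pair electrons (n in the text)
    h: int      # number of bound hydrogens

    @property
    def delta(self) -> int:
        """delta = sigma - h"""
        return self.sigma - self.h

    @property
    def delta_v(self) -> int:
        """delta^v = sigma + pi + n - h"""
        return self.sigma + self.pi + self.lone - self.h

    @property
    def pi_plus_lone(self) -> int:
        """delta^v - delta, which reduces to pi + n"""
        return self.delta_v - self.delta


if __name__ == "__main__":
    ether_oxygen = AtomElectrons(sigma=2, pi=0, lone=4, h=0)
    carbonyl_oxygen = AtomElectrons(sigma=1, pi=1, lone=4, h=0)
    for name, atom in (("ether O", ether_oxygen), ("carbonyl O", carbonyl_oxygen)):
        print(name, atom.delta, atom.delta_v, atom.pi_plus_lone)
```

Both oxygens share δ^v = 6 but differ in δ, so the difference δ^v − δ (4 for the ether oxygen, 5 for the carbonyl oxygen) is what distinguishes them.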
This is achieved by multiplying the electronegativity-related term δ^v − δ with [...]

[...] electronegativity of higher than second-row atoms) yield the general definition of the intrinsic state value I as:

\[ I = \frac{(2/N)^2\,\delta^v + 1}{\delta} \]

Finally, the influence of the molecular environment on the I-states of each atom within a molecule is determined by summing up pairwise atomic interactions. These interactions represent the perturbation of the I-state of a given atom by the differences in I-states with all the other [...]

[...] therefore, the hydrogen E-state is entirely based on the Kier–Hall relative electronegativities (KHE) (25):

\[ \mathrm{KHE} = \frac{\delta^v - \delta}{N^2} = \frac{\pi + n}{N^2} \]

Thus, the perturbation term for the calculation of the hydrogen E-state is:

\[ \Delta HI_{ij} = \frac{\mathrm{KHE}_i - \mathrm{KHE}_j}{r_{ij}^2} \]

with a predefined KHE of −0.2 for hydrogen atoms. Small numerical E-state values for polar hydrogen atoms indicate a low electron density on the hydrogen atom and, therefore, [...]

[...] Although both I-states and E-states cannot be translated back to molecular structures directly, they can, nevertheless, be interpreted in terms of electronic and topological features. For instance, I-states for sp3 carbon atoms decrease from primary to quaternary carbon atoms (2.000–1.250), which reflects the reduced steric accessibility. The sp3 hybridized terminal [...]

[...] paths of 2–20 atoms are recorded and their occurrences are counted. Atoms may be assigned to pharmacophoric classes (H-bond donor or accep[...]

[...] schemes are derived from GRID (27,28), where the interaction energies between chemical probes and the target molecule(s) are calculated at each single grid point in a 3D cage. These probes represent van der Waals, H-bonding, electrostatic, and hydrophobic properties, and commonly used probes are H2O (H-bond donor/acceptor), carbonyl oxygen (H-bond acceptor only), amide nitrogen (H-bond donor only), and the [...] into descriptors as described below.

5.1. VolSurf

The VolSurf (28–30) procedure transforms polar and hydrophobic 3D interaction maps into a quantitative scale by calculating the volume or the surface of the interaction contours. This is illustrated in Fig. 2, where the H2O and [...]

Figure 2 The GRID probes H2O (left) and DRY (right) have been used to sample energetically [...]

REFERENCES

[...] Sci 1998; 38:1214–1217.
18. Labute P. A widely applicable set of descriptors. J Mol Graphics Mod 2000; 18:464–477.
19. Wildman SA, Crippen GM. Prediction of physicochemical parameters by atomic contributions. J Chem Inf Comput Sci 1999; 39:868–873.
20. Gasteiger J, Marsili M. Iterative partial equalization of orbital electronegativity: a rapid access to atomic charges. Tetrahedron 1980; 36:3219–3222.
21. Moreau G, Broto [...] Nouv J Chim 1980; 4:359–360.
22. Moreau G, Broto P. Autocorrelation of molecular structures. Application to SAR studies. Nouv J Chim 1980; 4:757–764.
23. Kubinyi H. The physicochemical significance of topological parameters. A rebuttal. Quant Struct-Act Relat 1995; 14:149–150.
24. Kier LB, Hall LH. Molecular Structure Description. The Electrotopological State. San Diego: Academic Press, 1999.
25. Kier LB, Hall LH. Molecular [...]
[...] permeation from three-dimensional molecular structure. J Med Chem 2000; 43:2204–2216.
32. Zamora I, Oprea T, Cruciani G, Pastor M, Ungell AL. Surface descriptors for protein–ligand affinity prediction. J Med Chem 2003; 46:25–33.
33. Oprea T, Zamora I, Ungell AL. Pharmacokinetically based mapping device for chemical space navigation. J Comb Chem 2002; 4:258–266.
34. Pastor M, Cruciani G, McLay I, Pickett S, Clementi S [...] alignment-independent three-dimensional molecular descriptors. J Med Chem 2000; 43:3233–3243.
35. Cramer RD III, Patterson DE, Bunce JD. Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. J Am Chem Soc 1988; 110:5959–5967.
36. Trohalaki S, Pachter R. Quantum descriptors for predictive toxicology of halogenated aliphatic hydrocarbons. SAR QSAR Environ Res 2003; [...]
[...] relationships (QSARs). SAR QSAR Environ Res 2003; 14:223–231.
41. Carhart RE, Smith DH, Venkataraghavan R. Atom pairs as molecular features in structure–activity studies: definition and applications. J Chem Inf Comput Sci 1985; 25:64–73.
42. Ramaswamy N, Bauman N, Dixon JS, Venkataraghavan R. Topological torsion: a new molecular descriptor for SAR applications. Comparison with [...]
