Journal of Chromatography A 1660 (2021) 462666
https://doi.org/10.1016/j.chroma.2021.462666

Prediction of the chromatographic hydrophobicity index with immobilized artificial membrane chromatography using simple molecular descriptors and artificial neural networks

Krzesimir Ciura a,b,∗, Strahinja Kovačević c, Monika Pastewska a, Hanna Kapica a, Martyna Kornela a, Wiesław Sawicki a

a Department of Physical Chemistry, Medical University of Gdańsk, Aleja Gen. Hallera 107, Gdańsk 80-416, Poland
b QSAR Lab Ltd., Trzy Lipy 3 St., Gdańsk 80-172, Poland
c Department of Applied and Engineering Chemistry, Faculty of Technology Novi Sad, University of Novi Sad, Bulevar cara Lazara 1, Novi Sad 21000, Serbia
∗ Corresponding author at: Department of Physical Chemistry, Medical University of Gdańsk, Aleja Gen. Hallera 107, Gdańsk 80-416, Poland. E-mail address: krzesimir.ciura@gumed.edu.pl (K. Ciura).

Article history: Received 22 September 2021; Revised 27 October 2021; Accepted 28 October 2021; Available online November 2021

Keywords: Quantitative structure–retention relationships; Chemometrics; IAM-HPLC; Artificial neural networks

Abstract

Screening of physicochemical properties should be considered one of the essential steps in the drug discovery pipeline. Among the available methods, biomimetic chromatography with an immobilized artificial membrane is a powerful tool for simulating interactions between a molecule and a biological membrane. This study developed a quantitative structure–retention relationship model to predict the chromatographically determined affinity of xenobiotics to phospholipids, expressed as a chromatographic hydrophobicity index determined using immobilized artificial membrane chromatography. To this end, a heterogeneous set of 261 molecules, mostly showing pharmacological activity or toxicity, was analyzed chromatographically. The chromatographic analysis was performed using the fast gradient protocol proposed by Valko, with acetonitrile as the organic modifier. Next, quantitative structure–retention relationship modeling was performed using multiple linear regression (MLR) and artificial neural networks (ANNs) coupled with genetic algorithm (GA)-inspired selection. Subsequently, the selection of the best ANN was supported by statistical parameters, the sum of ranking differences approach with the comparison of rank by random numbers, and hierarchical cluster analysis.

© 2021 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

In the early stages of drug discovery, each drug candidate's physicochemical properties should be determined in addition to biological activity screening [1]. Biomimetic chromatography with an immobilized artificial membrane (IAM) can be used to assess affinity for phospholipids because phosphatidylcholine head groups are present on the surface of the stationary phase [2,3]. Therefore, IAM–high-performance liquid chromatography (HPLC) can mimic the lipid membrane monolayer. The first HPLC columns with an IAM were introduced by Pidgeon et al. in 1989 [4]. Nowadays, only one type of IAM column, IAM.PC.DD2, is provided by Regis Technologies.

IAM-HPLC has been successfully applied in phospholipid affinity studies of various drug classes, including beta blockers [37], calcium channel blockers [38], local anaesthetics [39], biogenic amines [40], and sets of structurally unrelated basic, acidic and neutral drugs [5,31]. IAM-HPLC has also been applied to the prediction of complex biological properties, such as blood–brain barrier permeability [5], oral absorption [6], volume of distribution [7], skin permeation [8], and cardiotoxicity [9]. Furthermore, IAM-HPLC plays an essential role in toxicity and ecotoxicity studies [10]. It is worth emphasizing that among
non-cell-based methods, chromatographic approaches are well suited to high-throughput screening because modern HPLC systems are highly automated and widely available in academia and pharmaceutical companies [11,12].

This study proposes quantitative structure–retention relationship (QSRR) models that allow the prediction of the chromatographically determined affinity of xenobiotics to phospholipids. A chromatographic hydrophobicity index with an immobilized artificial membrane (CHI_IAM) was used as the retention parameter, so that the models provide easily interpretable information and allow a quick comparison of phospholipid affinities with those of commercially available drugs. A heterogeneous set of 261 molecules, mostly showing pharmacological activity or toxicity, was analyzed under IAM-HPLC conditions. QSRR models were constructed using multiple linear regression (MLR) and artificial neural networks (ANNs) coupled with genetic algorithm (GA)-inspired selection.

ANN regression modeling was carried out using the Statistica version 12 program with an automated network search (ANS) approach. The modeling yielded feedforward multilayer perceptron (MLP) and radial basis function (RBF) networks. The training algorithm was Broyden–Fletcher–Goldfarb–Shanno (BFGS). The range of hidden units was set between and 30. The inputs were the same as the independent variables in the MLR model. In addition, the data were split into training and test sets in the same way as in the MLR model. Identity, logistic, tanh, exponential and sine functions were used as activation functions during network training. The number of training cycles varied depending on when the best network configuration was found. The total number of trained networks was 10 0. ANN modeling was performed on the raw data because modeling of the normalized data did not result in any acceptable ANN model.

Besides the formation of the training set and external test set, the ANN modeling required splitting the training set into an additional test set and a validation set. This division was carried out by the software, with random sample sizes of 70% for the training set, 15% for the test set, and 15% for the validation set. The generalization error was determined using the test set, whereas the purpose of the validation set is to find the best ANN configuration and training parameters by comparing the validation set error and training set error during the training procedure [17]. A global sensitivity analysis (GSA) was carried out to determine the significance of each input variable in each ANN model based on the value of the GSA index. Generally, if the GSA index is greater than 1, the variable should be kept in the model [18].

Rigorous internal and external validation of the ANN models was carried out by calculating the following statistical parameters: determination coefficient (R²), adjusted determination coefficient (R²adj), cross-validation determination coefficient (R²cv), F-test, RMSE, and standard deviation of the cross-validation (SDPRESS). For the external validation, the following additional parameters were calculated: determination coefficient of the distribution of the residuals (R²res), predictive squared correlation coefficients (Q²F1 and Q²F2), concordance correlation coefficient (CCC), RMSEP, average absolute error (AAE), and standard deviation (SD). The external validation parameters were calculated using the XternalValidationPlus_1.2 program (http://teqip.jdvu.ac.in/QSAR_Tools/).

The networks' similarities were examined using hierarchical cluster analysis (HCA) based on Ward's algorithm and Euclidean distances. The HCA was carried out on the whole dataset containing the raw data predicted by each ANN model. To rank, group and select the established ANN models, the sum of ranking differences (SRD) approach with the comparison of rank by random numbers (CRRN) and a 7-fold cross-validation procedure was applied [19]. The SRD analysis of the ANNs was based on the row average (reference ranking, consensus) of the values predicted by each ANN model. The experimental values were also included in the analysis to determine which ANN models can be considered acceptable (models with smaller SRD values than the experimental values are acceptable). Including the experimental values in the ranking provides a simple way to separate "good" from "bad" models. "Bad" models rationalize the information in the data worse than the experimental values do, so their existence and application are not justified [19–21]. A detailed explanation of the SRD methodology can be found elsewhere [19–21].

2. Materials and methods

2.1. Reagents

Analytical reagents were used without further purification. Ammonium acetate and acetonitrile (suitable for HPLC, gradient grade, ≥ 99.9%) were obtained from Sigma Aldrich (Steinheim, Germany). Ultrapure water (18.2 MΩ·cm) used to prepare the mobile phase was purified and deionised in our laboratory with a Millipore Direct-Q UV Water Purification System (Millipore Corporation, Bedford, MA, USA). In our study, 261 compounds (listed in Table S1) were selected as a model set of solutes. All substances were of appropriate purity; information about the suppliers is given in a supplemental datasheet file. All investigated substances were dissolved in dimethyl sulfoxide (DMSO), water or hexane at a mg/mL concentration level and stored at 2–8 °C between analyses.

2.2. IAM chromatography

All IAM-HPLC experiments were performed using a Prominence-1 LC-2030C 3D HPLC system (Shimadzu, Japan) equipped with a diode-array detector (DAD) and controlled by the LabSolution system (version 5.90, Shimadzu, Japan).
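As a brief aside, the SRD ranking with a row-average reference that was outlined earlier can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the software used in the study: the function names, the toy numbers, and the choice to build the consensus from the model columns only (ranking the experimental column as an extra entry) are all hypothetical. It ranks the compounds within each column, ranks them by the row average, and sums the absolute rank differences; the CRRN comparison against random rankings and the 7-fold cross-validation are not included.

```python
# Minimal SRD sketch in pure Python. All names and toy data are hypothetical;
# the consensus here is the row average of the model columns only (assumption).

def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srd_against_row_average(models, experimental):
    """Return the SRD of each model column and of the experimental column.

    models: dict mapping a model name to its predicted values (one per compound).
    experimental: list of measured values in the same compound order.
    """
    names = list(models)
    n = len(experimental)
    # Reference (consensus) column: row-wise mean of the model predictions.
    reference = [sum(models[name][r] for name in names) / len(names) for r in range(n)]
    ref_ranks = average_ranks(reference)

    columns = dict(models)
    columns["EXP"] = experimental
    return {
        name: sum(abs(a - b) for a, b in zip(average_ranks(col), ref_ranks))
        for name, col in columns.items()
    }


# Toy example with four compounds and two hypothetical models.
toy_models = {"ANN_a": [12, 19, 33, 38], "ANN_b": [30, 12, 20, 25]}
toy_exp = [10, 20, 30, 40]
srd = srd_against_row_average(toy_models, toy_exp)
print(sorted(srd.items(), key=lambda item: item[1]))
```

A smaller SRD means that a column orders the compounds more like the consensus does. In practice the values would be read against the distribution of SRD values obtained for random rankings before any model is accepted or rejected.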
The stock solutions of solutes were diluted to obtain 100 μg/mL concentrations, and the injected volume was μL. The chromatographic analyses were carried out on an IAM.PC.DD2 column (10 × 4.6 mm; particle size 10.0 μm, with an IAM guard column; Regis Technologies, USA) according to procedures proposed by Valko and co-workers [11,13,14]. Briefly, a linear gradient of 0–85% phase B (where phase A was a 50 mM ammonium acetate buffer adjusted to pH 7.4 and phase B was acetonitrile) was used at a flow rate of 1.5 mL/min. The temperature of the chromatographic column was kept constant at 30.0 °C, and the analysis time was 6.5 min. The CHI_IAM indices of the target solutes were obtained using a calibration set of reference substances. Each IAM-HPLC analysis was run in triplicate. All collected data are presented in a supplemental datasheet file.

2.3. Advanced Chemistry Development/Labs descriptors

The Advanced Chemistry Development (ACD)/Labs software from the Percepta Platform (PhysChem Module) was used to characterize the molecular structures of the investigated substances. Considering the character of the predicted endpoint, CHI_IAM, the following ACD/Labs descriptors were selected: polar surface area (PSA), molar volume (MV), hydrogen bond donors (HDo), hydrogen bond acceptors (HAc), and polarisability. The ACD/Labs software was accessed on the chemspider.com website (20.05.2021). The calculated molecular descriptors of the target solutes are listed in the supplemental datasheet file.

2.4. Chemometric analysis

GA-MLR modeling was done using QSARINS version 2.2.4 software [15,16]. Before QSRR analysis, the analyzed solutes were divided into two groups: a training group (70%) and a testing group (30%). The assignment of each compound to the training or test subset is given in the supplemental datasheet file. The models were assessed for fit, robustness and predictive ability using R², Q² and the root mean square error (RMSE) of cross-validation (RMSECV) coming from the
leave-one-out cross-validation technique, as well as the RMSE in prediction (RMSEP) derived from external validation.

3. Results and discussion

The relationship between retention and the chemical structure of analytes has attracted attention since the beginning of chromatographic research. Kaliszan initiated and introduced a particular type of QSPR analysis, namely the QSRR approach, in 1977 [22]. Since then, QSRR has proven to be a powerful tool in chromatographic research [23,24]. Nevertheless, a recent paper published by Wiczling pointed out the main limitations of QSRR studies [25]. Usually, relatively small datasets of compounds (frequently congeneric molecules) are used for QSRR modeling. Furthermore, the data matrix often includes a large collection of calculated theoretical descriptors. These descriptors are often difficult or impossible to interpret, particularly for analytically oriented researchers. Moreover, since the molecules in congeneric groups are highly similar, the relative differences in the theoretical molecular descriptors among the congeners are generally small. Consequently, descriptors should be calculated with great care because the calculation error must be noticeably lower than the real variance of that descriptor among the congeners.

Considering all these aspects, we revised the approach to building QSRR models. First, we used only descriptors provided by the ACD/Labs Percepta Platform (PhysChem Module), which can be easily calculated and interpreted. According to a study presented by Kubik and Wiczling, ACD-based descriptors showed precision and applicability in QSRR modeling similar to those of quantum chemistry–based descriptors [4]. It should be emphasized, however, that ACD-based descriptors are significantly more user friendly than the quantum chemistry approach. Moreover, the developer regularly improves the accuracy and precision of the ACD/Labs software. Analysis of previously published QSRR models for IAM chromatography suggested that the ACD-based descriptors should cover the molecule–stationary phase interaction, which is mainly governed by the lipophilic character of the solutes [5,26–29]. Still, several QSRR models indicated that H-bond descriptors and other descriptors related to PSA and molecular volume affect IAM partitioning [30–32].

The retention of analytes in IAM-HPLC can be measured in various ways, depending on the elution method. The two most commonly used approaches are extrapolated log kwIAM parameters in the case of isocratic elution and IAM chromatographic hydrophobicity indices (CHI_IAM) determined in gradient elution. CHI parameters were introduced by Valko et al. for lipophilicity assessment and later adapted to IAM-HPLC conditions. Briefly, CHI/CHI_IAM values determined using a fast organic-phase gradient rest on the assumption that analytes do not move in the chromatographic system until a suitable organic phase concentration reaches the column, at which point the analytes elute practically within the dead time. The CHI_IAM depends linearly on the retention time and ranges from to 100, corresponding to the acetonitrile concentration in the mobile phase. One of the essential advantages of the fast gradient protocol over the isocratic approach is the rapid determination of (phospho)lipophilicity because it avoids multiple isocratic measurements and extrapolation procedures [33]. Another study published by Valko and co-workers indicated that CHI_IAM procedures showed excellent batch-to-batch repeatability [34]. For these reasons, we determined the affinities to phospholipids of pharmaceutically and toxicologically relevant compounds using the CHI_IAM approach and applied this parameter in QSRR modeling.

The first step of QSRR model construction was the selection of ACD descriptors using the GA-MLR approach. This procedure aims to select the molecular descriptors that have the strongest influence on the retention behavior in IAM-HPLC. The model set was randomly split into training and examination groups. The best model had four theoretical descriptors, as follows:

\[
\mathrm{CHI_{IAM}} = 3.781(\pm 1.567)\,\mathrm{HDo} + 3.079(\pm 0.947)\,\log D_{7.4} - 0.206(\pm 0.063)\,\mathrm{PSA} + 0.097(\pm 0.026)\,\mathrm{MV} + 12.021(\pm 5.482)
\]

\[
R^2 = 0.550;\quad Q^2_{\mathrm{loo}} = 0.513;\quad \mathrm{RMSECV} = 11.648;\quad \mathrm{RMSE_{ext}} = 10.277
\]

The ACD descriptors included in this model have a clear physicochemical meaning. The most crucial parameter is the lipophilicity-related descriptor, log D at pH 7.4. This descriptor captures the influence of ionization on lipophilic character under physiological conditions, which are the same as the experimental conditions. The number of H-bond donors can be related to the interaction between analytes and the phosphate groups present in the phosphatidylcholine structure. Similarly, PSA and molecular volume descriptors are frequently applied in modeling the molecular mechanism of retention in IAM chromatography. Although the MLR model met the Tropsha criteria in terms of the Q² value, it did not exceed the required threshold of R². Nevertheless, both RMSECV and RMSEext have acceptable values of 11.648 and 10.277, respectively. To obtain QSRR models with improved prediction ability, a non-linear approach was applied. The same input data that were used in the MLR modeling served as the input variables in the ANN modeling. The only difference concerned the division of the examination set in the case of the ANN, which was divided into two subsets: the test and validation sets.

3.1. Non-linear QSRR modeling: the ANN approach

The ANN modeling resulted in 10 0 networks; among these, 11 networks were distinguished, comprising six MLP networks and five RBF networks. The most reliable networks were selected based on the statistical parameters calculated by the program Statistica version 12, and they were submitted for further statistical validation. The statistical parameters, network architectures, algorithms and activation functions of the distinguished ANNs are presented in Table 1. The networks differ in the number of hidden neurons, whereas the number of neurons is the same in the input layer (4 independent variables: HDo, LogD7.4, PSA and MV) and in the output layer (1 dependent variable: CHI_IAM).

Table 1. Statistical parameters of the obtained ANNs (a.f., activation function).

Training set (n = 197)
Parameter       ANN1    ANN2    ANN3    ANN4    ANN5    ANN6    ANN7    ANN8    ANN9    ANN10   ANN11
Architecture    4–29–1  4–8–1   4–10–1  4–9–1   4–23–1  4–25–1  4–29–1  4–6–1   4–29–1  4–8–1   4–8–1
Algorithm       RBF     MLP     MLP     MLP     RBF     RBF     RBF     MLP     RBF     MLP     MLP
Cycles No.      –       105     83      77      –       –       –       113     –       53      54
Hidden a.f.     Gauss   Log     Tanh    Log     Gauss   Gauss   Gauss   Tanh    Gauss   Tanh    Tanh
Output a.f.     Ident   Exp     Log     Exp     Ident   Ident   Ident   Log     Ident   Exp     Exp
R²              0.7401  0.7682  0.7612  0.7462  0.7065  0.7102  0.7468  0.7604  0.7405  0.7456  0.7790
R²adj           0.7387  0.7670  0.7600  0.7449  0.7050  0.7087  0.7455  0.7591  0.7391  0.7443  0.7697
R²cv            0.7350  0.7634  0.7567  0.7413  0.7004  0.7038  0.7416  0.7559  0.7350  0.7409  0.7663
F-test          555.2   646.1   621.6   573.2   469.4   477.7   575.1   618.8   556.3   571.6   656.0
RMSE            7.44    7.60    7.33    7.74    7.63    7.75    7.41    7.56    7.56    7.76    7.38
SDPRESS         7.47    7.64    7.36    7.76    7.67    7.79    7.44    7.59    7.60    7.79    7.41

External test set (n = 64)
Parameter       ANN1    ANN2    ANN3    ANN4    ANN5    ANN6    ANN7    ANN8    ANN9    ANN10   ANN11
R²              0.6247  0.5231  0.5850  0.6769  0.6468  0.6164  0.5986  0.6411  0.6087  0.6604  0.6545
R²adj           0.6186  0.5153  0.5782  0.6714  0.6415  0.6100  0.5922  0.6346  0.6024  0.6547  0.6490
R²cv            0.5961  0.4899  0.5581  0.6562  0.6216  0.5890  0.5732  0.6178  0.5820  0.6380  0.6317
R²res           0.2556  0.1689  0.2756  0.1192  0.3039  0.2579  0.2594  0.1959  0.2736  0.1324  0.2169
Q²F1            0.9097  0.8765  0.9019  0.9206  0.9177  0.9085  0.9048  0.9143  0.9074  0.9168  0.9184
Q²F2            0.6076  0.4633  0.5737  0.6548  0.6424  0.6045  0.5863  0.6275  0.5974  0.6381  0.6451
CCC             0.7780  0.7226  0.7554  0.8220  0.7898  0.7744  0.7650  0.7968  0.7690  0.8119  0.8030
RMSEP           9.18    10.74   9.57    8.61    8.77    9.24    9.43    8.95    9.30    8.82    8.73
SD              5.85    7.19    6.21    6.02    5.52    5.90    6.02    6.02    6.11    6.11    5.24
AAE             7.12    8.02    7.33    6.20    6.84    7.15    7.30    6.66    7.05    6.40    7.02
F-test          103.2   68.0    87.3    129.7   113.7   99.6    92.5    110.4   96.4    120.5   117.5
RMSE            7.93    9.95    8.25    8.21    7.39    8.02    8.22    8.14    8.01    8.34    7.83
SDPRESS         8.11    10.13   8.38    8.33    7.53    8.17    8.34    8.27    8.15    8.48    7.95

To evaluate the importance of all input variables, the GSA indices were calculated. As can be seen from Fig. 1, all the input variables have a GSA index greater than 1, meaning that all are justifiably included in all the ANN models. In most of the ANNs, the PSA descriptor has the highest GSA index, meaning that in those networks it has the strongest influence on the network's parameters. Another descriptor with a notably high GSA index is MV. Therefore, comparing the average values of the GSA indices (Fig. 1), it can be said that in the set of established ANNs, the PSA and MV descriptors have a dominant influence on the parameters of the network.

Fig. 1. GSA indices of the input variables for each ANN model and average values of GSA indices of each input variable.

The statistical parameters given in Table 1 indicate that all the ANNs have quite good statistical performance. Considering the statistical parameters of the training set, calculated by the NCSS 2007 program, all the networks have considerably high R², R²adj and R²cv coefficients and satisfactorily low values of the RMSE and SDPRESS parameters. The F-test values indicate that all the ANNs achieved a very good fit between the experimental and predicted data. Considering all parameters, ANN11 can be selected as the network that fits the data best. However, the external validation parameters give a somewhat different picture. Some of the models failed to fulfill some criteria, such as CCC > 0.8, Q²F1 and Q²F2 > 0.6, and R²cv > 0.6 [35]. The error metrics (RMSEP, SD, RMSE, SDPRESS) are higher for the external validation set than for the training set. The fit between the experimental and predicted data is somewhat worse for the external dataset than for the training dataset; however, it is in the acceptable range. Considering all the prediction statistics and based on consensus, ANN4 can be considered the model that allowed the best fit of the data from the external dataset, generating the lowest prediction error (RMSEP).

The comparison between the experimental data and the data predicted by the ANN4 and ANN11 models, as well as the distribution of the residuals for these models, is presented in Fig. 2. The graphs for the rest of the models are given in the supplementary data as Fig. 1S. The predictions for the external dataset are consistent with those for the training set. The amplitude of the residuals is in the acceptable range, and their random distribution around the zero axis implies that the prediction error is random rather than systematic. This is also confirmed by the quite low R²res values for each ANN model except ANN5, where R²res is considerably high (R²res > 0.3).

Fig. 2. Comparison between the experimental CHI_IAM parameters and CHI_IAM parameters predicted by ANN4 and ANN11 and the distribution of the residuals for each model (•, training set; external test set marked separately).

3.2. Network similarity and ranking

The comparison of the ANNs, their ranking, and their selection is a challenging but not impossible task. To preliminarily compare the models (predicted CHI_IAM values) together with the experimental (EXP) values, a box-whisker plot was generated (Fig. 3). Based on the presented plot, it is quite difficult to estimate similarities and dissimilarities among the models because they seem to have the same distribution, and there is no ANN that can be considered significantly better or worse than the other networks. In addition, it is worth stressing that no outliers or extremes were detected on the box-whisker plot.

Fig. 3. Box-whisker plot of the experimental CHI_IAM values and CHI_IAM values predicted by the ANNs.

In the next step of analyzing the networks' similarity, HCA was conducted. The results are presented in the form of a dendrogram in Fig. 4. The first thing to notice on the dendrogram is that the experimental values (EXP) are outside of any cluster and can be considered outliers. Therefore, there is a considerable difference between the experimental CHI_IAM values and the CHI_IAM values predicted by the ANNs. The dendrogram shows two main clusters. The first cluster contains ANN11, ANN3, ANN8, ANN10, ANN4 and ANN2, whereas the second cluster comprises ANN6, ANN9, ANN7, ANN5 and ANN1. This separation into two clusters is particularly interesting because there are only MLP networks in the first cluster and only RBF networks in the second. This separation can be an indicator of the crucial differences in the prediction ability of these two types of networks. Indeed, MLP networks can use any non-linear function as an activation function, whereas in RBF networks the activation function is a function of the Euclidean distance between inputs and weights, usually a Gaussian; there can also be more than one hidden layer in MLP networks, whereas there is only one hidden and one output layer in RBF networks [17,35]. The main advantage of RBF networks is that they make more robust predictions than MLP networks do; however, they have more limited applications. In contrast, MLP networks are more vulnerable to adversarial noise and can sometimes make quite wrong predictions, unlike RBF networks [35]. Considering the number of hidden neurons, the architecture of the RBF networks in the present study is more complex than the architecture of the MLP networks.

Fig. 4.
HCA of the experimental CHIIAM values and CHIIAM values predicted by the ANNs Table The SRD ranking of the ANNs based on row average and p% intervals SRD-CRRN results The data in Table indicate that the smallest SRD value is in ANN4, and this is the closest to the reference ranking, whereas ANN6 has the highest SRD value, and it is placed the furthest from the reference All the networks can be considered acceptable because they have SRD values smaller than the SRD value of the experimental data In addition, the probability that the models are of a random character is negligible (p% intervals are between 2.82E09 and 4.79E-09) The separation of the MLP and RBF networks is also observable in the graph in Fig 5, as evident in the HCA dendrogram Here, the MLP networks (ANN11, ANN4, ANN8, ANN3, ANN2, ANN10) are placed closer to the reference ranking compared with the RBF networks (ANN1, ANN7, ANN5, ANN9) As an RBF network, ANN6 is clearly separated from the others and can be considered an outlier All the networks are significantly distinguished from the experimental data on the SRD graph Although the HCA and SRD methodologies have very different basics, the results are very similar Considering the statistical parameters of the training and external test sets of the established networks (Table 1), the networks ANN4 and ANN11 were previously suggested as the networks that would be the most suitable for predicting CHIIAM parameters in the analyzed set of compounds under the applied chromatographic conditions The results of SRD analysis pointed out that these two networks are closest to the reference ranking and confirmed previous assumptions about their selection as the best ones To estimate the uncertainties of the SRD values of the ANNs, 7-fold cross-validation was applied, so one-seventh of the objects were left out, and the ranking was carried out on the remaining six-sevenths of the objects The results of the cross-validation of the SRD procedure are given in the form 
of a box-whisker plot in Fig. In the presented plot, the same separation of the networks is observable as in Fig. The MLP networks are closer to the reference ranking than the RBF networks; the two groups are separated by a vertical dashed line. The ANN4 and ANN11 networks have the lowest median SRD values and are therefore the best choice for predicting CHIIAM parameters. Given its high SRD value, the use of ANN6 should be avoided. The cross-validation confirmed the reliability of the conducted SRD procedure.

Networks    SRD       p% (x < SRD < y)
                      x           y
ANN4        2918      2.82E-09    2.82E-09
ANN11       2944      2.84E-09    2.84E-09
ANN8        2962      2.86E-09    2.86E-09
ANN3        3104      3.00E-09    3.00E-09
ANN2        3172      3.06E-09    3.06E-09
ANN10       3272      3.16E-09    3.16E-09
ANN1        3688      3.56E-09    3.56E-09
ANN7        3742      3.61E-09    3.61E-09
ANN5        3858      3.72E-09    3.73E-09
ANN9        3968      3.83E-09    3.83E-09
ANN6        4958      4.79E-09    4.79E-09
EXP         6436      6.21E-09    6.22E-09
XX1         21,138    4.96        5.03
Q1          22,042    24.82       25.06
Med         22,652    49.76       50.06
Q3          23,190    74.99       75.23
XX19        24,094    94.97       95.05

To rank and group the ANNs, as well as to choose the most reliable ones, the SRD method was applied in the following step. The reference ranking was the row average, which represents a consensus. The average provides the most probable ranking; however, it is not necessarily a bias-free solution [19]. Rather, it is a solution that has less bias than a ranking based on any other reference [19,36]. The ANNs were ranked using a matrix whose columns contained the CHIIAM values predicted by each ANN, with the row average serving as the reference ranking. The experimental CHIIAM values were included as an additional column so that acceptable models could be identified: a network is considered acceptable if its SRD value is smaller than that of the experimental values. The results of the ranking are presented in Table and Fig.

Fig. Ranking of ANNs by SRD and comparison of ranks by random numbers with row average as a reference ranking. The statistical characteristics of the Gaussian fit are as follows: first icosaile (5%), XX1 = 21,138; first quartile, Q1 = 22,042; median, Med = 22,652; last quartile, Q3 = 23,190; last icosaile (95%), XX19 = 24,094.

Fig. Box-whisker plot of the seven-fold cross-validation of the SRD procedure using row average as a reference ranking (consensus ranking). The Y-axis represents the SRD values with uncertainties.

Conclusion

Although several free and commercial programs can be applied for lipophilicity prediction, estimation of the affinity to phospholipids remains a serious gap. The proposed models are intended for the early steps of the drug discovery pipeline, when high throughput matters more than accuracy. Non-linear modeling is better suited to predicting affinity to phospholipids. Non-linear models can be used for fast screening of phospholipid affinity, which represents a significant gap in the current modeling of physicochemical properties of drug candidates. Furthermore, extensive sets of CHIIAM values determined for active pharmaceutical ingredients are available in the literature [7,9,33], so the CHIIAM value calculated for a designed or newly synthesized molecule can be compared with those of well-known drugs aimed at the same therapeutic goals. The proposed ANN4 and ANN11 networks allow for a better selection of drug candidates, reducing the costs of late-stage attrition. They are also the first step in creating a tool for assessing affinity to phospholipids as a more biomimetic feature than classical lipophilicity.

CRediT authorship contribution statement

Krzesimir Ciura: Conceptualization, Writing – original draft, Methodology, Supervision, Project administration, Formal analysis, Investigation. Strahinja Kovačević: Visualization, Software, Writing – original draft, Formal analysis, Investigation, Methodology. Monika Pastewska: Investigation. Hanna Kapica: Investigation. Martyna Kornela: Investigation. Wiesław Sawicki: Funding acquisition.

Acknowledgements

This research was funded by the Ministry of Science and Higher Education by means of ST3 02–0 03/07/518 statutory funds. We also thank Prof. Paola Gramatica for free academic licences for the use of the QSARINS software.

Declaration of Competing Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.chroma.2021.462666.

References

[24] R. Kaliszan, QSRR: quantitative structure-(chromatographic) retention relationships, Chem. Rev. 107 (2007) 3212–3246, doi:10.1021/cr068412z.
[25] P. Wiczling, A. Kamedulska, Ł. Kubik, Application of Bayesian multilevel modeling in the quantitative structure–retention relationship studies of heterogeneous compounds, Anal. Chem. 93 (2021) 6961–6971, doi:10.1021/acs.analchem.0c05227.
[26] L. Grumetto, C. Carpentiero, P. Di Vaio, F. Frecentese, F. Barbato, Lipophilic and polar interaction forces between acidic drugs and membrane phospholipids encoded in IAM-HPLC indexes: their role in membrane partition and relationships with BBB permeation data, J. Pharm. Biomed. Anal. 75 (2013) 165–172, doi:10.1016/j.jpba.2012.11.034.
[27] A. Taillardat-Bertschinger, C.A.M. Martinet, P.A. Carrupt, M. Reist, G. Caron, R. Fruttero, B. Testa, Molecular factors influencing retention on immobilized artificial membranes (IAM) compared to partitioning in liposomes and n-octanol, Pharm. Res. 19 (2002) 729–737, doi:10.1023/a:1016156927420.
[28] L. Grumetto, C. Carpentiero, F. Barbato, Lipophilic and electrostatic forces encoded in IAM-HPLC indexes of basic drugs: their role in membrane partition and their relationships with BBB
passage data, Eur. J. Pharm. Sci. 45 (2012) 685–692, doi:10.1016/j.ejps.2012.01.008.
[29] G. Russo, L. Grumetto, F. Barbato, G. Vistoli, A. Pedretti, Prediction and mechanism elucidation of analyte retention on phospholipid stationary phases (IAM-HPLC) by in silico calculated physico-chemical descriptors, Eur. J. Pharm. Sci. 99 (2017) 173–184, doi:10.1016/j.ejps.2016.11.026.
[30] L. Grumetto, C. Carpentiero, F. Barbato, Lipophilic and electrostatic forces encoded in IAM-HPLC indexes of basic drugs: their role in membrane partition and their relationships with BBB passage data, Eur. J. Pharm. Sci. 45 (2012) 685–692, doi:10.1016/j.ejps.2012.01.008.
[31] G. Russo, L. Grumetto, F. Barbato, G. Vistoli, A. Pedretti, Prediction and mechanism elucidation of analyte retention on phospholipid stationary phases (IAM-HPLC) by in silico calculated physico-chemical descriptors, Eur. J. Pharm. Sci. 99 (2017) 173–184, doi:10.1016/j.ejps.2016.11.026.
[32] C. Giaginis, A. Tsantili-Kakoulidou, Alternative measures of lipophilicity: from octanol-water partitioning to IAM retention, J. Pharm. Sci. (2008), doi:10.1002/jps.21244.
[33] K. Valko, S. Nunhuck, C. Bevan, M.H. Abraham, D.P. Reynolds, Fast gradient HPLC method to determine compounds binding to human serum albumin. Relationships with octanol/water and immobilized artificial membrane lipophilicity, J. Pharm. Sci. 92 (2003) 2236–2248, doi:10.1002/jps.10494.
[34] K.L. Valko, S. Rava, S. Bunally, S. Anderson, Revisiting the application of immobilized artificial membrane (IAM) chromatography to estimate in vivo distribution properties of drug discovery compounds based on the model of marketed drugs, ADMET DMPK (2020) 78–97, doi:10.5599/admet.757.
[35] V. Goncalves, K. Maria, A.B.F. da Silva, Applications of artificial neural networks in chemical problems, Artificial Neural Network Architecture Applied, InTech, 2013, doi:10.5772/51275.
[36] S.Z. Kovačević, S.O. Podunavac-Kuzmanović, L.R. Jevrić, E.A. Djurendić, J.J. Ajduković, S.B. Gadžurić, M.B. Vraneš, How to rank and discriminate artificial neural networks? Case study: prediction of anticancer activity of 17-picolyl and 17-picolinylidene androstane derivatives, J. Iran. Chem. Soc. 13 (2016) 499–507, doi:10.1007/s13738-015-0759-9.
[37] Masucci, Caldwell, Foley, Comparison of the retention behavior of β-blockers using immobilized artificial membrane chromatography and lysophospholipid micellar electrokinetic chromatography, Journal of Chromatography A (1998), doi:10.1016/S0021-9673(98)00219-2.
[38] Barbato, La Rotonda, Quaglia, Chromatographic indices determined on an immobilized artificial membrane (IAM) column as descriptors of lipophilic and polar interactions of 4-phenyldihydropyridine calcium-channel blockers with biomembranes, Eur. J. Med. Chem. (1996), doi:10.1016/s0014-827x(98)00082-
[39] Demare, Roy, Legendre, Factors governing the retention of solutes on chromatographic immobilized artificial membranes: application to anti-inflammatory and analgesic drugs, J. Liq. Chromatogr. Relat. Technol. (1999), doi:10.1081/JLC-100102051.
[40] Amato, Barbato, Morrica, Quaglia, Rotonda, Interactions between amines and phospholipids: a chromatographic study on immobilized artificial membrane (IAM) stationary phases at various pH values, Helvetica Chimica Acta (2000), doi:10.1002/1522-2675(20001004)83:10%3C2836::AID-HLCA2836%3E3.0.CO;2-G
[1] F. Tsopelas, C. Giaginis, A. Tsantili-Kakoulidou, Lipophilicity and biomimetic properties to support drug discovery, Expert Opin. Drug Discov. 12 (2017) 885–896, doi:10.1080/17460441.2017.1344210.
[2] A. Tsantili-Kakoulidou, How can we better realize the potential of immobilized artificial membrane chromatography in drug discovery and development?
Expert Opin. Drug Discov. 15 (2020) 273–276, doi:10.1080/17460441.2020.1718101.
[3] F. Tsopelas, T. Vallianatou, A. Tsantili-Kakoulidou, Advances in immobilized artificial membrane (IAM) chromatography for novel drug discovery, Expert Opin. Drug Discov. 11 (2016) 473–488, doi:10.1517/17460441.2016.1160886.
[4] C. Pidgeon, U.V. Venkataram, Immobilized artificial membrane chromatography: supports composed of membrane lipids, Anal. Biochem. 176 (1989) 36–47, doi:10.1016/0003-2697(89)90269-
[5] L. Grumetto, C. Carpentiero, P. Di Vaio, F. Frecentese, F. Barbato, Lipophilic and polar interaction forces between acidic drugs and membrane phospholipids encoded in IAM-HPLC indexes: their role in membrane partition and relationships with BBB permeation data, J. Pharm. Biomed. Anal. 75 (2013) 165–172, doi:10.1016/j.jpba.2012.11.034.
[6] L. Grumetto, G. Russo, F. Barbato, Relationships between human intestinal absorption and polar interactions drug/phospholipids estimated by IAM-HPLC, Int. J. Pharm. 489 (2015) 186–194, doi:10.1016/j.ijpharm.2015.04.062.
[7] S. Teague, K. Valko, How to identify and eliminate compounds with a risk of high clinical dose during the early phase of lead optimization in drug discovery, Eur. J. Pharm. Sci. 110 (2017) 37–50, doi:10.1016/j.ejps.2017.02.017.
[8] M. Hidalgo-Rodríguez, S. Soriano-Meseguer, E. Fuguet, C. Ràfols, M. Rosés, Evaluation of the suitability of chromatographic systems to predict human skin permeation of neutral compounds, Eur. J. Pharm. Sci. 50 (2013) 557–568, doi:10.1016/j.ejps.2013.04.005.
[9] C. Stergiopoulos, F. Tsopelas, K. Valko, Prediction of hERG inhibition of drug discovery compounds using biomimetic HPLC measurements, ADMET DMPK (2021), doi:10.5599/admet.995.
[10] F. Tsopelas, C. Stergiopoulos, L.A. Tsakanika, M. Ochsenkühn-Petropoulou, A. Tsantili-Kakoulidou, The use of immobilized artificial membrane chromatography to predict bioconcentration of pharmaceutical compounds, Ecotoxicol. Environ. Saf. 139 (2017) 150–157, doi:10.1016/j.ecoenv.2017.01.028.
[11] K.L. Valko, Application
of biomimetic HPLC to estimate in vivo behavior of early drug discovery compounds, Futur. Drug Discov. (2019) FDD11, doi:10.4155/fdd-2019-0004.
[12] K.L. Valkó, Lipophilicity and biomimetic properties measured by HPLC to support drug discovery, J. Pharm. Biomed. Anal. 130 (2016) 35–54, doi:10.1016/j.jpba.2016.04.009.
[13] K. Valkó, Chromatographic hydrophobicity index by fast-gradient RP-HPLC: a high-throughput alternative to log P/log D, Anal. Chem. 69 (1997) 2022–2029, doi:10.1021/ac961242d.
[14] K.L. Valkó, Biomimetic chromatography to accelerate drug discovery: part I, J. LC-GC N. Am. 36 (2018) 397–405.
[15] P. Gramatica, N. Chirico, E. Papa, S. Cassani, S. Kovarich, QSARINS: a new software for the development, analysis, and validation of QSAR MLR models, J. Comput. Chem. 34 (2013) 2121–2132, doi:10.1002/jcc.23361.
[16] P. Gramatica, S. Cassani, N. Chirico, QSARINS-chem: Insubria datasets and new QSAR/QSPR models for environmental pollutants in QSARINS, J. Comput. Chem. 35 (2014) 1036–1044, doi:10.1002/jcc.23576.
[17] K.L. Priddy, P.E. Keller, Artificial Neural Networks: An Introduction, SPIE, 2009, doi:10.1117/3.633187.
[18] StatSoft, Inc., Electronic Statistics Textbook, StatSoft, Tulsa, OK, 2011. WEB: http://www.statsoft.com/textbook/
[19] K. Héberger, Sum of ranking differences compares methods or models fairly, TrAC 29 (2010) 101–109, doi:10.1016/j.trac.2009.09.009.
[20] K. Kollár-Hunek, K. Héberger, Method and model comparison by sum of ranking differences in cases of repeated observations (ties), Chemom. Intell. Lab. Syst. 127 (2013) 139–146, doi:10.1016/j.chemolab.2013.06.007.
[21] K. Héberger, K. Kollár-Hunek, Sum of ranking differences for method discrimination and its validation: comparison of ranks with random numbers, J. Chemom. 25 (2011) 151–158, doi:10.1002/cem.1320.
[22] R. Kaliszan, Correlation between the retention indices and the connectivity indices of alcohols and methyl esters with complex cyclic structure, Chromatographia 10 (1977) 529–531, doi:10.1007/BF02262911.
[23] P. Žuvela, M.
Skoczylas, J. Jay Liu, T. Bączek, R. Kaliszan, M.W. Wong, B. Buszewski, Column characterization and selection systems in reversed-phase high-performance liquid chromatography, Chem. Rev. 119 (2019) 3674–3729, doi:10.1021/acs.chemrev.8b00246.