New spherical harmonic based descriptors to efficiently fuel QSAR methodology: Endocrine disruptor case study
Two-dimensional quantitative structure-activity relationship (2D QSAR) modeling has been a standard methodology for the last decade, whereas multiple three-dimensional (3D) descriptors have been tested with mixed success. In the present study, we propose a new set of highly informative and compact 3D descriptors derived from spherical harmonic (SH) based representations that cover both the geometrical shape and the pharmacophoric features of a molecule. The process consists of placing a molecule on three different axes, each one captured by its own set of spherical harmonics. SH-related expansions are then used to create compact, rotation-independent descriptors (e.g. 32 floating-point coefficients) that describe a conformer of the molecule. These descriptors were applied to a QSAR model of toxicity built from the reference dataset of the CERAPP project, a collaborative project that developed a consensus model of toxicity for endocrine disruption. The QSAR model was trained with SH-based descriptors, and this study considered binding activity to the estrogen receptor. The resulting model yielded a balanced accuracy of 0.87 on the evaluation dataset. Furthermore, when SH descriptors were combined with 2D descriptors from the RDKit suite, the resulting QSAR model reached a balanced accuracy of 0.91 on the evaluation dataset, placing its performance at the level of the consensus model obtained by the CERAPP project.
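For readers unfamiliar with the metric reported above, balanced accuracy is the mean of sensitivity and specificity, which makes it less sensitive to class imbalance than plain accuracy. The sketch below is a minimal illustration on invented toy labels; the function name and the data are hypothetical and are not taken from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of true actives recovered
    specificity = tn / (tn + fp)  # fraction of true inactives recovered
    return (sensitivity + specificity) / 2.0


# Hypothetical toy data: 2 actives and 8 inactives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
ba = balanced_accuracy(y_true, y_pred)
```

On this toy set, plain accuracy is 0.8 while balanced accuracy is lower, because one of the two actives is missed.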
GESSE: The “magic triangle”
The “magic triangle” of “drugs, targets, side effects” (SEs) is the new “holy grail” of the pharmaceutical industry. This figure shows a subset of a triangular matrix associating the most significant drug–target relationships predicted by the authors’ GES algorithm with the SEs for those drugs predicted by the same authors’ GESSE approach. Combining GES with GESSE allows the physicochemical space of drugs, the polypharmacologically relevant biological subspace of drug targets, and the phenotypic space of SEs to be related computationally.
GES Polypharmacology Fingerprints: A Novel Approach for Drug Repositioning
Polypharmacology is now recognized as an increasingly important aspect of drug design. We previously introduced the Gaussian ensemble screening (GES) approach to predict relationships between drug classes rapidly without requiring thousands of bootstrap comparisons as in current promiscuity prediction approaches. Here we present the GES “computational polypharmacology fingerprint” (CPF), the first target fingerprint to encode drug promiscuity information. The similarity between the 3D shapes and chemical properties of ligands is calculated using PARAFIT and our HPCC programs to give a consensus shape-plus-chemistry ligand similarity score, and ligand promiscuity for a given set of targets is quantified using the GES fingerprints. To demonstrate our approach, we calculated the CPFs for a set of ligands from DrugBank that are related to some 800 targets. The performance of the approach was measured by comparing our CPF with an in-house “experimental polypharmacology fingerprint” (EPF) built using publicly available experimental data for the targets that comprise the fingerprint. Overall, the GES CPF gives very low fall-out while still giving high precision. We present examples of polypharmacology relationships predicted by our approach that have been experimentally validated. This demonstrates that our CPF approach can successfully describe drug–target relationships and can serve as a novel drug repurposing method for proposing new targets for preclinical compounds and clinical drug candidates.
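As an illustration of the two statistics quoted above, the following sketch compares a predicted fingerprint with an experimental one, treating both as equal-length binary vectors over the same list of targets. That binary encoding, the function name, and the toy vectors are assumptions made for this example; the actual CPF and EPF formats are not described here.

```python
def fallout_and_precision(predicted, experimental):
    """Fall-out (false positive rate) and precision of a binary fingerprint.

    Both inputs are equal-length sequences of 0/1 flags, one per target
    (an assumed encoding chosen for illustration).
    """
    if len(predicted) != len(experimental):
        raise ValueError("fingerprints must have the same length")
    tp = fp = tn = 0
    for p, e in zip(predicted, experimental):
        if p and e:
            tp += 1
        elif p and not e:
            fp += 1
        elif not p and not e:
            tn += 1
    fallout = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return fallout, precision


# Hypothetical fingerprints over eight targets.
cpf = [1, 0, 1, 0, 0, 0, 1, 0]
epf = [1, 0, 1, 0, 0, 0, 0, 0]
fallout, precision = fallout_and_precision(cpf, epf)
```

Low fall-out means few targets are wrongly flagged among those with no experimental relationship, while high precision means most flagged targets are confirmed.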
A highly specific and sensitive pharmacophore model for identifying CXCR4 antagonists. Comparison with docking and shape-matching virtual screening performance
HIV infection is initiated by fusion of the virus with the target cell through binding of the viral gp120 protein with the CD4 cell surface receptor protein and the CXCR4 or CCR5 coreceptors. There is currently considerable interest in developing novel ligands that can modulate the conformations of these coreceptors and, hence, ultimately block virus–cell fusion. Herein, we present a highly specific and sensitive pharmacophore model for identifying CXCR4 antagonists that could potentially serve as HIV entry inhibitors. Its performance was compared with docking and shape-matching virtual screening (VS) approaches, using the 3OE6 CXCR4 crystal structure and high-affinity ligands as query molecules, respectively. The methods were compared by virtually screening a library that we assembled, consisting of 228 known high-affinity CXCR4 inhibitors from 20 different chemotype families and 4696 similar, presumed inactive molecules. The area under the ROC curve (AUC), the enrichment factors, and the diversity of the resulting virtual hit lists were analyzed. The results show that our pharmacophore model achieves the highest VS performance among all the docking and shape-based scoring functions used. Its high selectivity and sensitivity make our pharmacophore a very good filter for identifying CXCR4 antagonists.
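The AUC and enrichment factor mentioned above can be computed from a ranked screening list as sketched below. This is a generic illustration, not the evaluation code used in the study: the function names, the rule that higher scores are better, and the toy scores are all assumptions.

```python
def roc_auc(scores, labels):
    """Probability that a random active outscores a random inactive.

    Ties count as half a win; higher score means "more likely active".
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for a in actives:
        for d in inactives:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


def enrichment_factor(scores, labels, fraction):
    """Hit rate in the top `fraction` of the ranked list over the overall hit rate."""
    n_total = len(scores)
    n_top = max(1, int(round(fraction * n_total)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(y for _, y in ranked[:n_top])
    return (hits_top / n_top) / (sum(labels) / n_total)


# Hypothetical screen: 3 actives among 10 compounds.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
auc = roc_auc(scores, labels)
ef20 = enrichment_factor(scores, labels, 0.2)
```

The pairwise AUC loop is quadratic in the number of compounds, which remains practical for a library of a few thousand molecules such as the one described above.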
Benchmarking of HPCC: A novel 3D molecular representation combining shape and pharmacophoric descriptors for efficient molecular similarity assessments
Since 3D molecular shape is an important determinant of biological activity, designing accurate 3D molecular representations remains of high interest. Several chemoinformatic approaches have been developed to describe molecular shapes accurately.
Here, we present a novel 3D molecular description, namely harmonic pharma chemistry coefficient (HPCC), combining a ligand-centric pharmacophoric description projected onto a spherical harmonic based shape of a ligand. The performance of HPCC was evaluated by comparison to the standard ROCS software in a ligand-based virtual screening (VS) approach using the publicly available directory of useful decoys (DUD) data set comprising over 100,000 compounds distributed across 40 protein targets.
Our results were analyzed using commonly reported statistics such as the area under the curve (AUC) and the normalized sum of logarithms of ranks (NSLR). Overall, our HPCC 3D method is as efficient as the state-of-the-art ROCS software in terms of enrichment, and slightly better for more than half of the DUD targets. Since it is widely accepted that VS results depend strongly on the nature of the protein families, we believe that the present HPCC solution is a valuable alternative to current ligand-based VS methods.
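The abstract does not give the NSLR formula. The sketch below assumes one common formulation, in which the sum of the negative log relative ranks of the actives is divided by the value it would take if the actives occupied the top positions of the list. The function name and the example ranks are hypothetical.

```python
import math


def nslr(ranks_of_actives, n_compounds):
    """Normalized sum of logarithms of ranks (assumed formulation).

    ranks_of_actives: 1-based ranks of the active compounds in the screened list.
    n_compounds: total number of ranked compounds (actives plus decoys).
    Returns 1.0 when all actives are ranked first, and smaller values otherwise.
    """
    n_actives = len(ranks_of_actives)
    slr = -sum(math.log(r / n_compounds) for r in ranks_of_actives)
    # Best case: the actives occupy ranks 1..n_actives.
    slr_max = -sum(math.log(i / n_compounds) for i in range(1, n_actives + 1))
    return slr / slr_max


# Hypothetical rankings of 3 actives among 100 compounds.
perfect = nslr([1, 2, 3], 100)
weaker = nslr([10, 20, 30], 100)
```

Because of the logarithm, this statistic rewards early recognition: moving an active from rank 10 to rank 1 changes the score far more than moving it from rank 60 to rank 51.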