Contact us

Any questions? Contact jcaballero@utalca.cl directly.

Docking results analysis

Input file:

Excel

Output file:

CSV

Find the computationally predicted binding energies that correlate best (highest R2) with the experimentally reported activity of your congeneric ligands. To do so, fill in our Excel template with the results obtained from your preferred docking software.
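As a rough illustration of the quantity being maximized, the sketch below computes R2 for one set of ligands. It assumes R2 is the squared Pearson correlation of a simple linear fit between activity and energy; the page does not state the exact formula used by the server, and the function name and numeric values are hypothetical.

```python
from math import fsum


def r_squared(activities, energies):
    """Squared Pearson correlation between two equal-length sequences.

    Assumption: this equals the R2 of a simple linear regression of one
    variable on the other; the server's own formula is not documented here.
    """
    n = len(activities)
    if n != len(energies) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = fsum(activities) / n
    mean_e = fsum(energies) / n
    cov = fsum((a - mean_a) * (e - mean_e) for a, e in zip(activities, energies))
    var_a = fsum((a - mean_a) ** 2 for a in activities)
    var_e = fsum((e - mean_e) ** 2 for e in energies)
    if var_a == 0 or var_e == 0:
        return 0.0  # a constant series carries no correlation information
    return cov * cov / (var_a * var_e)


# Hypothetical data: one experimental activity and one docking energy per ligand.
activity = [6.1, 6.8, 7.4, 8.0, 8.9]
energy = [-6.5, -7.1, -7.0, -8.2, -9.0]
print(round(r_squared(activity, energy), 3))
```

Choosing a different pose for a ligand changes its entry in the energy list, which is why the pose selection affects R2.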

Glide results analysis

Input file:

ZIP containing .mae/.maegz

Output files:

CSV and MAE

Submit your Glide docking results to find the best correlation (R2) between experimental activity and computationally calculated energy. You can also view, in 3D, the ligands and the binding-site amino acids in the different protein conformations. This will help you understand how ligands recognize the protein binding site.

Information

The genetic algorithm

From your molecular docking results, identify the binding pose of each congeneric ligand and the protein conformations involved in molecular recognition. This is achieved with a genetic algorithm that runs for M iterations.

The algorithm optimizes combinations of molecular docking poses. A combination pairs the reported experimental activity of each congeneric ligand with the score of one of its binding poses, and it is evaluated with a scoring function that calculates the coefficient of determination (R2) between the experimental activity values and the computationally predicted energies. The process begins with the generation of N combinations, each of which is scored by its R2. The combinations with the highest R2 values are selected and proceed to the next iteration cycle. Within this same cycle, new combinations are generated by crossover between existing combinations. These new combinations may or may not undergo point mutations, which mostly change the indices of the ligand binding poses and therefore yield new energy values to be correlated. Optionally, the number of protein conformations can be reduced to simulate the conformational selection theory. New combinations are generated in this way until the population again contains N combinations. The process is repeated until a predefined convergence criterion is met, and the combination with the highest R2 value is reported as optimal.
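The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the server's implementation: a combination is a list holding one pose index per ligand, the fitness is the squared Pearson correlation, crossover is single-point, a fixed iteration count stands in for the convergence criterion, and protein trimming is left out. All function names, default values, and data are hypothetical.

```python
import random


def r_squared(xs, ys):
    """Squared Pearson correlation (assumed fitness measure)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)


def fitness(combo, poses, activities):
    """R2 between activities and the energies of the selected poses."""
    energies = [poses[i][p] for i, p in enumerate(combo)]
    return r_squared(activities, energies)


def crossover(a, b, rng):
    """Single-point crossover (needs at least two ligands)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(combo, poses, rate, rng):
    """Point mutation: replace a ligand's pose index with probability `rate`."""
    return [rng.randrange(len(poses[i])) if rng.random() < rate else p
            for i, p in enumerate(combo)]


def run_ga(poses, activities, n_combos=50, n_iter=100,
           elite_frac=0.2, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)

    def score(combo):
        return fitness(combo, poses, activities)

    # Start from N random combinations (one pose index per ligand).
    population = [[rng.randrange(len(p)) for p in poses] for _ in range(n_combos)]
    for _ in range(n_iter):
        population.sort(key=score, reverse=True)
        elite = population[:max(2, int(elite_frac * n_combos))]
        children = []
        while len(elite) + len(children) < n_combos:  # refill up to N
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), poses, mutation_rate, rng))
        population = elite + children
    best = max(population, key=score)
    return best, score(best)


# Hypothetical input: poses[i] lists the docking energies of ligand i's poses.
activities = [6.1, 6.8, 7.4, 8.0, 8.9]
poses = [[-6.0, -7.5, -5.2], [-6.9, -8.1], [-7.3, -6.1, -9.0],
         [-8.2, -7.0], [-9.1, -6.6, -7.8]]
best_combo, best_r2 = run_ga(poses, activities)
print(best_combo, round(best_r2, 3))
```

Because the elite is carried over unchanged, the best R2 in the population never decreases from one iteration to the next.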

The genetic algorithm operates as a random search engine within a vast space of possible combinations. As a result, repeated runs with the same parameters can return different combinations with maximum R2. To address this, the prune and search method was implemented. This technique groups the poses of each ligand to reduce the search space, which makes it possible to identify the unique combination of ligand poses and binding energies that yields the maximum R2 value.
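The page does not describe how poses are grouped, so the following is only one plausible sketch of the grouping step: each ligand's poses are sorted by energy and split into contiguous groups, and one representative per group is searched first. The grouping criterion, the choice of representative, and the function names are all assumptions.

```python
def group_poses(energies, n_groups):
    """Split one ligand's pose indices into energy-sorted contiguous groups.

    Assumption: poses are grouped by docking energy; the server may use a
    different criterion.
    """
    order = sorted(range(len(energies)), key=energies.__getitem__)
    n_groups = max(1, min(n_groups, len(order)))
    size, extra = divmod(len(order), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        end = start + size + (1 if g < extra else 0)  # spread the remainder
        groups.append(order[start:end])
        start = end
    return groups


def representatives(groups):
    """Pick the middle pose of each energy-sorted group as its representative."""
    return [grp[len(grp) // 2] for grp in groups]


# Hypothetical energies for the six poses of one ligand.
energies = [-7.2, -5.1, -9.0, -6.3, -8.4, -4.8]
groups = group_poses(energies, 3)
print(groups)                   # pose indices per group
print(representatives(groups))  # one candidate pose per group
```

Searching over three representatives instead of six poses shrinks this ligand's contribution to the search space by half. Once the best group is known, the search can be repeated using only the poses inside that group.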

The parameters of the genetic algorithm are as follows:

Number of iterations: Number of cycles or iterations to search for the combination with the maximum R2 value between experimental activity and computationally predicted energy.

Number of possible combinations: Total number of combinations to be evaluated with R2 in each iteration.

Mutation rate: Percentage of mutations that change the ligand binding pose, resulting in new computationally predicted energy values to be correlated.

Percentage of crossing: Crossover percentage between existing combinations to generate new combinations.

Offspring elitism rate: Determines which of the current combinations (those with the highest R2 scores) are selected as offspring for the next iteration cycle.

Offspring random rate: Controls how much of the offspring is generated randomly.

Penalty coefficient per protein: Penalty coefficient that controls the extent to which the number of protein conformations affects the individual score. Fewer protein conformations are always preferred.

Protein trim rate: Trimming a gene means randomly selecting one of its protein conformations and excluding it, which reduces the number of protein conformations in the gene by one. New valid ligand poses are then selected so that they belong to the protein conformations of the current gene minus the excluded one.

Grouping ligand poses: Enables the iterative procedure that reduces the combinatorial search space of docking poses using the prune and search method. This greatly improves the convergence of the search, but the result may be suboptimal.

Number of groups: Number of groups used at each iteration in the iterative search. Note that a larger value would require a greater number of GA iterations to converge.
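To make the parameter list concrete, the sketch below collects the parameters in one structure and illustrates the protein penalty and the trimming step. Every field name and default value is hypothetical, and the penalty formula (R2 minus the coefficient times the number of protein conformations used) is an assumption; the page only states that fewer conformations are preferred.

```python
import random
from dataclasses import dataclass


@dataclass
class GAParams:
    """Hypothetical names and defaults for the parameters listed above."""
    iterations: int = 200
    combinations: int = 100
    mutation_rate: float = 0.05
    crossover_rate: float = 0.8
    elitism_rate: float = 0.2
    random_rate: float = 0.1
    protein_penalty: float = 0.01
    protein_trim_rate: float = 0.05
    group_poses: bool = False
    n_groups: int = 3


def proteins_in(gene, pose_protein):
    """Protein conformations used by a gene (one pose index per ligand)."""
    return {pose_protein[i][p] for i, p in enumerate(gene)}


def penalized_score(r2, gene, pose_protein, params):
    """Assumed penalty: subtract the coefficient once per conformation used."""
    return r2 - params.protein_penalty * len(proteins_in(gene, pose_protein))


def trim(gene, pose_protein, rng):
    """Exclude one conformation and re-pick poses for the affected ligands."""
    used = proteins_in(gene, pose_protein)
    if len(used) < 2:
        return list(gene)  # nothing to trim
    dropped = rng.choice(sorted(used))
    allowed = used - {dropped}
    new_gene = []
    for i, p in enumerate(gene):
        if pose_protein[i][p] != dropped:
            new_gene.append(p)
            continue
        options = [q for q, prot in enumerate(pose_protein[i]) if prot in allowed]
        if not options:
            return list(gene)  # this ligand has no pose in the remaining set
        new_gene.append(rng.choice(options))
    return new_gene


# Hypothetical mapping: pose_protein[i][p] is the conformation behind pose p of ligand i.
pose_protein = [["A", "B"], ["A", "B"], ["A", "B", "C"]]
gene = [0, 1, 2]  # uses conformations A, B and C
trimmed = trim(gene, pose_protein, random.Random(1))
print(trimmed, sorted(proteins_in(trimmed, pose_protein)))
```

With this form of the penalty, two genes with equal R2 are ranked by how few protein conformations they need, which matches the stated preference for fewer conformations.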

Correlation Result Example

Sample correlation and visualization of Ligand-Binding Site Interactions

Correlation graph

Information about Our Correlation Examples

Our case studies

The genetic algorithm was trained with 6 protein systems and their respective congeneric ligand series. These are as follows:

P38alpha:

13 ligands derived from 4-(3-benzoylamino-6-methyl-anilino)pyrimidines

Heat shock protein (HSP90):

46 ligands derived from benzimidazolone

Heat shock protein (HSP90):

36 ligands derived from aminopyrimidine

Heat shock protein (HSP90):

20 ligands derived from benzophenone

β-Thrombin (PDB):

77 ligands derived from 3-amidinophenylalanine

β-Thrombin (GaMD):

86 ligands derived from 3-amidinophenylalanine

Descriptions

The results show R2 values above 0.6 for all case studies, except for β-Thrombin with the 77 ligands. This is due to the low structural variability of the β-Thrombin binding site. To address this, new conformations were generated using Gaussian Accelerated Molecular Dynamics (GaMD).

You can run our 6 case studies in both "Docking results analysis" and "Glide results analysis". To do this, click on "Submit a job" (top of the page). Then, choose between the two buttons "Excel file analysis" or "Glide file analysis" and a window will open. Click on "Load example" and select any of the examples. Finally, upload the Excel or ZIP file to the "UPLOAD" field and click "Run correlation".

Comparison Table of Analysis Methods

Key differences between "Docking Results Analysis" and "Glide Results Analysis" methods for identifying ligand poses and binding energies that best fit with experimental activity

Methods | Find the best correlation between activities | Analysis for results of any docking software | 3D visualization of protein-ligand complexes | Results in tables and graphs | Download results in CSV format | Download results in MAE format
Docking Results Analysis
Glide Results Analysis

Developers

About the members

Motivation

This work arises from the need to provide a tool that helps elucidate the relationship between the molecular structure and the experimental activity of a series of compounds. The researchers involved are all professionals in the field of bioinformatics, each specializing in molecular design and/or modeling.

Sergio Alfaro

Bioinformatician

Francisco Adasme

PhD

Fabian Gonzalez

Bioinformatician

Jose Luis Velazquez

PhD

Julio Caballero

PhD

Acknowledgments

Information about institutions/universities

ANID folio

#21211296

Fondecyt regular

#1210131

Study program

PhD

Center for Bioinformatics

Simulations and Modelling

Universidad de Talca

Chile