Team:Paris Bettencourt/Result

RESULTS

Testing AMP Forest on a random peptide library

We tested 47,938 peptides with the AMP Forest 100 model (100 decision trees). The predicted scores are closely proportional to the experimentally measured scores, with a correction coefficient of ~2.1419 and a Pearson correlation coefficient of ~0.982 (Figure 1, where the predicted score has already been scaled by the correction coefficient).
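As an illustration of the two statistics quoted above, here is a minimal sketch of how a proportionality (correction) coefficient and a Pearson correlation coefficient can be computed. It assumes the coefficient is a least-squares slope through the origin, which the text does not state; the function names and the toy data are hypothetical and are not the team's measurements.

```python
import math

def correction_coefficient(predicted, measured):
    """Least-squares slope k through the origin, so that k * predicted ~ measured.

    Assumption: the source does not say how its coefficient was fitted.
    """
    numerator = sum(p * m for p, m in zip(predicted, measured))
    denominator = sum(p * p for p in predicted)
    return numerator / denominator

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy data, made up for illustration: measured is exactly 2 x predicted.
predicted = [-0.4, -0.2, 0.0, 0.1, 0.3]
measured = [-0.8, -0.4, 0.0, 0.2, 0.6]

k = correction_coefficient(predicted, measured)
corrected = [k * p for p in predicted]   # predictions rescaled by the coefficient
r = pearson_r(corrected, measured)
```

Note that rescaling by a positive constant does not change the Pearson coefficient, so the correction only affects how the points line up against the diagonal in a plot, not the reported correlation.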

Figure 1: AMP Forest 100 model's predictions vs. experimentally measured bacterial survival scores for the 47,938 tested samples.

Limitation of AMP Forest: bias and error analysis at the boundary condition

First, the AMP Forest model describes the landscape relating sequence to antimicrobial efficiency, based on the peptides detected in the random library. Therefore, there must exist a manifold that mathematically describes the boundary of the model.

When a peptide sequence falls inside the manifold, the model can provide an accurate estimate of its antimicrobial efficiency. However, a peptide may also lie outside the manifold, where the model loses its predictive power. Here we show a diagram that describes this scenario in a 2D simplification, whereas the real N-mer peptide sequence space is 20*N dimensional.
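To make the 20*N figure concrete, the sketch below one-hot encodes a peptide: each of the N positions becomes a block of 20 values, one per standard amino acid. Whether AMP Forest uses this exact encoding is not stated here, so treat it as an assumption; `one_hot_encode` and the example peptide are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter codes

def one_hot_encode(peptide):
    """Return a flat list of length 20 * len(peptide) with a single 1 per position."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    vector = []
    for residue in peptide:
        block = [0] * len(AMINO_ACIDS)   # one block of 20 entries per position
        block[index[residue]] = 1        # mark which amino acid occupies it
        vector.extend(block)
    return vector

encoded = one_hot_encode("KLAK")  # a 4-mer maps to a point in 80 dimensions
```

In such a space, the peptides seen in the library occupy only a tiny fraction of the possible points, which is why a sequence can easily fall outside the region the model was trained on.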

When we consider the error of the predictive model, it is clear that the error becomes larger and more biased as the score moves further from 0, i.e. towards the edge of the model.

Figure 3: Prediction error of AMP Forest 100, where the prediction error is calculated as predicted score minus measured score.
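The error definition above (predicted score minus measured score) can be turned into a simple bias check by averaging the error within bins of the measured score. The following is a minimal sketch with hypothetical function names and made-up toy values; the direction and size of the bias in the toy data are illustrative only.

```python
import math

def prediction_errors(predicted, measured):
    """Prediction error as defined in the text: predicted score minus measured score."""
    return [p - m for p, m in zip(predicted, measured)]

def mean_error_by_bin(measured, errors, bin_width):
    """Average the error inside bins of the measured score (keys are lower bin edges)."""
    bins = {}
    for m, e in zip(measured, errors):
        key = math.floor(m / bin_width)
        bins.setdefault(key, []).append(e)
    return {key * bin_width: sum(v) / len(v) for key, v in sorted(bins.items())}

# Toy data, made up for illustration: accurate near 0, biased at the extremes.
measured = [-1.0, -0.9, -0.1, 0.0, 0.1, 0.9, 1.0]
predicted = [-0.6, -0.5, -0.1, 0.0, 0.1, 0.6, 0.7]

errors = prediction_errors(predicted, measured)
bias = mean_error_by_bin(measured, errors, bin_width=0.5)
```

A mean error close to zero in the central bins and clearly non-zero in the outer bins is the pattern the text describes as the model becoming biased towards its edge.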

In particular, the optimization in the AMP Evolver makes the sequence more and more efficient. As a result, the estimates AMP Forest gives become less and less accurate, and the sequence eventually reaches the edge of the model, or even jumps out of the model due to randomness. Therefore, the outcomes of AMP Designer must be further screened in wet-lab experiments.

The results from wet-lab experiments can be used to further update AMP Forest and expand its edge. As a consequence, the software itself will evolve through this design-build-test cycle.

Positive-cluster modification design principle found with AMP Designer

First, we confirmed the power of AMP Designer using random mutagenesis as the mutation generator. The random mutagenesis function first draws a random number of positions to mutate, then distributes them randomly along the peptide, and finally replaces the parent amino acid at each chosen position with a randomly chosen amino acid, which also means there is a small chance (1/20) that a given position is left unchanged.
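The three steps of the random mutagenesis function can be sketched as follows. This is not the AMP Designer source code: the function name, the `max_mutations` parameter, and the uniform choice of the number of mutations are assumptions made for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def random_mutagenesis(parent, rng, max_mutations=None):
    """Return a child peptide with a random number of randomly placed substitutions."""
    if max_mutations is None:
        max_mutations = len(parent)
    max_mutations = min(max_mutations, len(parent))
    # Step 1: draw how many positions to mutate (assumed uniform on 1..max_mutations).
    n_mutations = rng.randint(1, max_mutations)
    # Step 2: distribute those mutations over distinct positions of the peptide.
    positions = rng.sample(range(len(parent)), n_mutations)
    # Step 3: replace each chosen residue with one drawn from all 20 amino acids,
    # so the same residue is re-drawn (no change) with probability 1/20.
    child = list(parent)
    for pos in positions:
        child[pos] = rng.choice(AMINO_ACIDS)
    return "".join(child)

rng = random.Random(0)                     # seeded for reproducibility
parent = "KLAKLAKKLA"                      # hypothetical parent peptide
child = random_mutagenesis(parent, rng, max_mutations=3)
```

Passing an explicit `random.Random` instance keeps each in silico replicate reproducible and independent of the global random state.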

In only 3 cycles of evolution, all 10 replicates had already converged to the same local optimum (Figure 4, left), which is consistent with previously reported experience of how difficult it is to escape from a local optimum of the fitness landscape.

We then compared random mutagenesis to a semi-rational method we developed, in which we find the positively charged amino acids in the parent and change the charges around them. Although this engineering method does not necessarily produce better AMPs every time (data not shown here), it is an improved mutation generator for AMP Designer compared to random mutagenesis.

  • Positive-cluster modification escapes local optima more easily and generates more diverse results: it discovered 7 optima in 10 replicates, whereas random mutagenesis always converged to the same peptide.
  • The average efficiency score generated by positive-cluster modification (0.4857) is about 2.8-fold that of random mutagenesis (0.1753).
  • As a cost of this further exploration, positive-cluster modification also needs a longer time to converge under the same parameter setup.
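The semi-rational generator is described only as finding the positively charged amino acids in the parent and changing the charges around them, so the sketch below is one possible reading rather than the team's implementation. The charge classes, the `window` parameter, the single substitution per call, and all names are assumptions.

```python
import random

POSITIVE = "KR"                  # assumption: lysine and arginine are the positive residues
NEGATIVE = "DE"                  # assumption: aspartate and glutamate are the negative ones
NEUTRAL = "ACFGHILMNPQSTVWY"     # everything else is treated as uncharged here

def charge_class(residue):
    """Return the charge class (one of the three strings above) of a residue."""
    if residue in POSITIVE:
        return POSITIVE
    if residue in NEGATIVE:
        return NEGATIVE
    return NEUTRAL

def positive_cluster_mutation(parent, rng, window=1):
    """Change the charge class of one residue next to a positively charged residue."""
    anchors = [i for i, aa in enumerate(parent) if aa in POSITIVE]
    if not anchors:
        return parent                        # nothing to build a cluster around
    anchor = rng.choice(anchors)
    neighbours = [i for i in range(anchor - window, anchor + window + 1)
                  if 0 <= i < len(parent) and i != anchor]
    if not neighbours:
        return parent
    target = rng.choice(neighbours)
    current = charge_class(parent[target])
    other_classes = [c for c in (POSITIVE, NEGATIVE, NEUTRAL) if c != current]
    new_residue = rng.choice(rng.choice(other_classes))  # pick a class, then a residue
    return parent[:target] + new_residue + parent[target + 1:]

rng = random.Random(1)           # seeded for reproducibility
parent = "GLKAD"                 # hypothetical parent with one positive residue (K)
child = positive_cluster_mutation(parent, rng)
```

Compared with fully random substitution, a generator like this concentrates its changes where charge is already present, which is one plausible reason for the more diverse optima reported above.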

Figure 4: Comparison between two mutation generators: random mutagenesis and positive-cluster modification (semi-rational design). Here we show representative in silico evolutionary experiments with 10 different replicates (color-labeled), using either the random or the semi-rational design method.

Positive-cluster modification principle validation in wet-lab experiment

Using the positive-cluster design principle we found with AMP Designer, we generated a library of ~12,000 peptides from natural sequences. Through experimental characterization, we confirmed a log-normal-like distribution of bacterial survival in the presence of the AMP-protein fusion. In addition, we found that all the top-efficiency AMPs shared a similar pattern: reduced charge rather than increased charge.

This shows a trade-off between the efficiency of a single AMP and the AMP's impact on protein expression and folding. According to our experimental data, protein expression seems to be the more important parameter to optimize for the performance of an AMP fused to a scaffold protein. (See details in: )

    Centre for Research and Interdisciplinarity (CRI)
    Faculty of Medicine Cochin Port-Royal, South wing, 2nd floor
    Paris Descartes University
    24, rue du Faubourg Saint Jacques
    75014 Paris, France
    paris-bettencourt-2018@cri-paris.org