RESULTS
Testing AMP Forest on a random peptide library
We tested 47,938 peptides with the AMP Forest 100 model (an ensemble of 100 decision trees). The predicted scores are highly proportional to the experimentally measured scores, with a correction coefficient of ~2.1419 and a Pearson correlation coefficient of ~0.982 (Figure 1; the predicted scores shown there are already scaled by the correction coefficient).
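The section does not say how the correction coefficient was fitted. The sketch below assumes it is the least-squares slope of a line through the origin (measured ≈ k × predicted) and pairs it with a plain Pearson correlation. The function names are hypothetical.

```python
import math

def correction_coefficient(predicted, measured):
    """Least-squares slope k for measured ≈ k * predicted, with the line forced through the origin."""
    num = sum(p * m for p, m in zip(predicted, measured))
    den = sum(p * p for p in predicted)
    return num / den

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Illustrative numbers only, not experimental data.
predicted = [1.0, 2.0, 3.0, 4.0]
measured = [2.1, 4.3, 6.4, 8.6]
k = correction_coefficient(predicted, measured)
corrected = [k * p for p in predicted]
print(k, pearson_r(corrected, measured))
```

Pearson's r does not change when one variable is multiplied by a positive constant. Applying the correction coefficient therefore affects only the scale of the predictions, not the reported correlation.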
Limitations of AMP Forest: bias and error at the model boundary
First, the AMP Forest model describes the sequence-to-antimicrobial-efficiency landscape learned from the peptides detected in the random library. Therefore, there must exist a manifold that mathematically describes the boundary of the model.
When a peptide sequence falls inside this manifold, the model can provide an accurate estimate of its antimicrobial efficiency. However, a peptide may also fall outside the manifold, where the model loses its predictive power. Here we show a diagram of this scenario as a 2D simplification. The real sequence space of an N-mer peptide is 20N-dimensional, with 20 possible amino acids at each of the N positions.
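One way to see where the 20N dimensions come from is to one-hot encode each position. The sketch below does that and adds a crude applicability-domain check. A query counts as "inside" if some training peptide of the same length lies within a small Hamming distance. This nearest-neighbour rule and its `max_distance` threshold are hypothetical stand-ins; the section does not say how the manifold is actually characterized.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def one_hot(peptide):
    """Flatten an N-mer into a 20*N binary vector (one block of 20 entries per position)."""
    vec = []
    for residue in peptide:
        block = [0] * len(AMINO_ACIDS)
        block[AMINO_ACIDS.index(residue)] = 1
        vec.extend(block)
    return vec

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def min_distance_to_training(peptide, training_set):
    """Smallest Hamming distance to any same-length training peptide (len(peptide) if none)."""
    return min(
        (hamming(peptide, t) for t in training_set if len(t) == len(peptide)),
        default=len(peptide),
    )

def inside_domain(peptide, training_set, max_distance=2):
    """Hypothetical rule: trust a prediction only if a close training neighbour exists."""
    return min_distance_to_training(peptide, training_set) <= max_distance

training = ["KLAKLAKK", "KLAKLAKR"]
print(len(one_hot("KLAKLAKW")), inside_domain("KLAKLAKW", training))
```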
When we consider the error of the predictive model, it becomes larger and more biased the further the score lies from 0, i.e., towards the edge of the model.
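The claim that error grows and becomes biased away from 0 can be checked by binning residuals by the magnitude of the prediction. The sketch below assumes residuals are defined as measured minus predicted. It reports the count, mean residual (bias) and population standard deviation (spread) per bin. The bin edges are a hypothetical choice.

```python
import statistics

def binned_error(predicted, measured, edges):
    """Group residuals (measured - predicted) by |predicted| and report
    (count, bias, spread) for each bin [lo, hi) defined by consecutive edges."""
    bins = {}
    for p, m in zip(predicted, measured):
        for lo, hi in zip(edges[:-1], edges[1:]):
            if lo <= abs(p) < hi:
                bins.setdefault((lo, hi), []).append(m - p)
                break
    report = {}
    for key, residuals in sorted(bins.items()):
        report[key] = (
            len(residuals),
            statistics.mean(residuals),    # bias: systematic over- or under-prediction
            statistics.pstdev(residuals),  # spread: size of the error around that bias
        )
    return report

# Illustrative numbers only, not experimental data.
print(binned_error([0.1, 0.2, 2.0, 2.5], [0.1, 0.2, 1.0, 1.5], edges=[0, 1, 3]))
```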
In particular, the optimization in AMP Evolver makes the sequences more and more efficient, so the predictions of AMP Forest become less and less accurate. Eventually the sequences reach the edge of the model, or even jump outside it due to randomness. Therefore, the outputs of AMP Designer must be further screened in wet-lab experiments.
Results from wet-lab experiments can then be used to update AMP Forest and expand its boundary. As a consequence, the software itself evolves through this design-build-test cycle.
Positive-cluster modification: a design principle found with AMP Designer
First, we confirmed the power of AMP Designer using random mutagenesis as the mutation generator. The random mutagenesis function first draws a random number of positions to mutate, then distributes these positions randomly along the peptide. Finally, it replaces the parent amino acid at each selected position with one chosen at random from the 20 amino acids. This means there is a small chance (1/20) that a selected position remains unchanged.
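The three steps of the random mutagenesis function can be sketched as below. The upper bound on the number of mutated positions (`max_mutations`) is a hypothetical parameter, since the section does not give the range of the random count.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def random_mutagenesis(parent, max_mutations=3, rng=random):
    """Mutate a random number of randomly placed positions to random amino acids."""
    # Step 1: draw how many positions to mutate (hypothetical range 1..max_mutations).
    n_mut = rng.randint(1, min(max_mutations, len(parent)))
    # Step 2: distribute those positions randomly along the peptide, without repeats.
    positions = rng.sample(range(len(parent)), n_mut)
    child = list(parent)
    # Step 3: replace each selected residue with one of the 20 amino acids.
    # Because the parent residue can be redrawn, each position stays unchanged with p = 1/20.
    for pos in positions:
        child[pos] = rng.choice(AMINO_ACIDS)
    return "".join(child)

print(random_mutagenesis("KLAKLAKKLAKLAK", rng=random.Random(42)))
```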
In only 3 cycles of evolution, all 10 replicates had already converged to the same local optimum (Figure, left). This is consistent with the previously reported difficulty of escaping local optima in a fitness landscape.
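The section does not describe how AMP Evolver selects sequences between cycles. The sketch below assumes a simple elitist loop. Each cycle mutates the current best peptide many times, and a child replaces its parent only if it scores strictly higher. Greedy selection of this kind is what tends to trap runs in a local optimum. `point_mutant`, `toy_score` (a crude net-charge stand-in for an AMP Forest prediction), and the cycle and offspring counts are all hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def point_mutant(parent, rng):
    """Hypothetical mutator: replace one random position with a random residue."""
    pos = rng.randrange(len(parent))
    return parent[:pos] + rng.choice(AMINO_ACIDS) + parent[pos + 1:]

def toy_score(peptide):
    """Hypothetical stand-in for an AMP Forest prediction: crude net charge."""
    return sum(peptide.count(a) for a in "KR") - sum(peptide.count(a) for a in "DE")

def evolve(parent, score, cycles=3, offspring=200, rng=None):
    """Elitist loop: each cycle mutates the current best and keeps the top child
    only if it scores strictly higher. Returns the best peptide and the score history."""
    if rng is None:
        rng = random.Random()
    best, best_score = parent, score(parent)
    history = [best_score]
    for _ in range(cycles):
        children = [point_mutant(best, rng) for _ in range(offspring)]
        top = max(children, key=score)
        if score(top) > best_score:
            best, best_score = top, score(top)
        history.append(best_score)
    return best, history

# Ten replicates from the same starting peptide, each with its own seed.
results = [evolve("GLFDIIKKIAESF", toy_score, rng=random.Random(seed)) for seed in range(10)]
for best, history in results:
    print(best, history)
```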
We then compared random mutagenesis with a semi-rational method we developed, positive-cluster modification. This method locates the positively charged amino acids in the parent and changes the charges of the residues around them. It does not always output a better AMP (data not shown). Even so, it is an improved mutation generator for AMP Designer compared with random mutagenesis.
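A possible reading of "find the positively charged amino acids and change the charges around them" is sketched below. Several details here are assumptions, not the authors' specification:

- only K and R count as positive (His is treated as neutral);
- only residues within `window` positions of a K/R are eligible;
- each eligible neighbour is switched, with probability `rate`, to a residue from a different charge class;
- the positive residues themselves are left in place.

```python
import random

POSITIVE = "KR"                 # assumption: His counted as neutral
NEGATIVE = "DE"
NEUTRAL = "ACFGHILMNPQSTVWY"    # remaining 16 standard amino acids

def charge_class(residue):
    """Return the charge class (as a string of residues) that a residue belongs to."""
    if residue in POSITIVE:
        return POSITIVE
    if residue in NEGATIVE:
        return NEGATIVE
    return NEUTRAL

def positive_cluster_mutant(parent, window=1, rate=0.5, rng=random):
    """Hypothetical sketch: around each K/R, switch neighbours to a different charge class."""
    child = list(parent)
    for i, residue in enumerate(parent):
        if residue not in POSITIVE:
            continue
        for j in range(max(0, i - window), min(len(parent), i + window + 1)):
            # Skip the anchor itself, other positive residues, and neighbours not drawn.
            if j == i or parent[j] in POSITIVE or rng.random() >= rate:
                continue
            current = charge_class(parent[j])
            options = [c for c in (POSITIVE, NEGATIVE, NEUTRAL) if c != current]
            child[j] = rng.choice(rng.choice(options))  # pick a new class, then a residue in it
    return "".join(child)

print(positive_cluster_mutant("GLFDIIKKIAESF", rng=random.Random(7)))
```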
As a cost of this broader exploration, positive-cluster modification also takes longer to converge under the same parameter settings.
Wet-lab validation of the positive-cluster modification principle
Using the positive-cluster design principle found with AMP Designer, we generated a library of ~12,000 peptides derived from natural sequences. Experimental characterization confirmed a log-normal-like distribution of bacterial survival in the presence of the AMP-protein fusions. In addition, we found that all the top-efficiency AMPs shared a similar pattern: reduced charge rather than increased charge.
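To test whether top variants carry less charge than their parents, one can compare the net charge of each parent-variant pair. The sketch below uses a crude charge model, which is an assumption: K and R count +1, D and E count -1, and His and the termini are ignored. It also interprets "reduced charge" as a lower net charge. The helper names are hypothetical.

```python
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}  # crude model: His and termini ignored

def net_charge(peptide):
    """Approximate net charge at neutral pH under the simple K/R/D/E model."""
    return sum(CHARGE.get(aa, 0) for aa in peptide)

def charge_change(parent, variant):
    """Net charge of the variant minus that of its parent (negative = reduced charge)."""
    return net_charge(variant) - net_charge(parent)

def reduced_charge_fraction(pairs):
    """Fraction of (parent, variant) pairs in which the variant has a lower net charge."""
    return sum(charge_change(p, v) < 0 for p, v in pairs) / len(pairs)

# Illustrative pairs only, not library data.
pairs = [("KLAKLAK", "KLAELAK"), ("KLAKLAK", "KLAKLAKK")]
print([charge_change(p, v) for p, v in pairs], reduced_charge_fraction(pairs))
```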
This shows a trade-off between the efficiency of an individual AMP and the AMP's impact on protein expression and folding. According to our experimental data, protein expression appears to be the more important parameter to optimize for the performance of an AMP fused to a scaffold protein. (See details in: )