NCTU_Formosa 2017 built a peptide prediction model that can predict peptides with new functions, using the Scoring Card Method (SCM) as its machine learning approach. This year, NCTU_Formosa 2018 continued to use the same method to predict antimicrobial peptides, in order to find more candidates for our project.
We started from UniProt, selecting peptides annotated with the keyword "antimicrobial". After removing peptides that shared more than 50% similarity with one another, 425 sequences remained. These served as the positive data for our model.
For negative data, we chose sequences that are not antimicrobial peptides, with lengths of roughly 30 to 300 amino acids. We randomly selected as many of these as there are positive sequences, so the ratio of positive to negative data is 1:1.
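
The negative-set selection described above can be sketched as follows. This is a minimal illustration and not the team's actual script: the function name `sample_negatives`, the fixed random seed, and the toy candidate pool are assumptions made for the example, and the length bounds are treated as a hard filter even though the text says "around".

```python
import random

def sample_negatives(candidates, n_positive, min_len=30, max_len=300, seed=0):
    """Pick as many negatives as there are positives from length-filtered candidates."""
    # Keep only sequences whose length falls inside the assumed bounds.
    eligible = [seq for seq in candidates if min_len <= len(seq) <= max_len]
    if len(eligible) < n_positive:
        raise ValueError("not enough eligible negative sequences")
    # Sample without replacement so the positive:negative ratio is 1:1.
    return random.Random(seed).sample(eligible, n_positive)

# Toy pool (hypothetical): only the sequences of length 30, 120 and 300 are eligible.
toy_pool = ["A" * 10, "C" * 30, "D" * 120, "E" * 300, "F" * 301]
negatives = sample_negatives(toy_pool, n_positive=2)
print(len(negatives))  # 2
```

In the real workflow `n_positive` would be 425, matching the positive set.
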
Next, all the data were mixed together and split, with 2/3 used for training and 1/3 for testing. Training yielded a threshold and the score of each amino acid sequence. With the resulting scoring card, we can calculate the score of any peptide.
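
A rough sketch of the split and of scoring a peptide with a scoring card is given below. Several things here are assumptions rather than details from this page: the helper names `split_two_thirds` and `scm_score`, the fixed seed, and above all the form of the card. The sketch assumes the card maps dipeptides to propensity scores and that a peptide's score is the average over its overlapping dipeptides, which is how SCM is commonly described; the toy card values are made up.

```python
import random

def split_two_thirds(records, seed=0):
    """Shuffle the records and return (training, testing) as a 2/3 : 1/3 split."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]

def scm_score(sequence, card):
    """Average the card scores of all overlapping dipeptides (assumed card format)."""
    dipeptides = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    if not dipeptides:
        raise ValueError("sequence must contain at least two residues")
    return sum(card[dp] for dp in dipeptides) / len(dipeptides)

# 425 positive and 425 negative placeholders (label 1 = antimicrobial, 0 = not).
records = [(f"pos_{i}", 1) for i in range(425)] + [(f"neg_{i}", 0) for i in range(425)]
train, test = split_two_thirds(records)
print(len(train), len(test))  # 566 284

# Hypothetical two-entry card; a trained card would cover every dipeptide.
toy_card = {"AK": 600.0, "KA": 400.0}
print(round(scm_score("AKAK", toy_card), 2))  # 533.33
```

The split here is a plain shuffle; whether the original split was stratified by class is not stated.
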
Finally, by comparing a peptide's score with the threshold, we can determine whether an unknown peptide might inhibit bacterial growth.
Bacteriocin | Score |
---|---|
Leucocyclicin Q | 438.38 |
Enterocin B | 464.36 |
Enterocin 96 | 464.06 |
Lacticin Z | 450.92 |
Bovicin HJ50 | 459.87 |
Durancin TW-49M | 478.97 |
Threshold | 431.65 |
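
The comparison of the listed scores with the threshold can be checked with a short script. The scores and threshold are copied from the table; the rule that a score above the threshold means "predicted antimicrobial" is an assumption based on how the values are presented, and the function name `is_predicted_antimicrobial` is made up for this sketch.

```python
THRESHOLD = 431.65  # threshold obtained from training, as listed in the table

# Scores as listed in the table.
scores = {
    "Leucocyclicin Q": 438.38,
    "Enterocin B": 464.36,
    "Enterocin 96": 464.06,
    "Lacticin Z": 450.92,
    "Bovicin HJ50": 459.87,
    "Durancin TW-49M": 478.97,
}

def is_predicted_antimicrobial(score, threshold=THRESHOLD):
    """Assumed decision rule: a score above the threshold is a positive prediction."""
    return score > threshold

for name, score in scores.items():
    label = "above" if is_predicted_antimicrobial(score) else "not above"
    print(f"{name}: {score:.2f} is {label} the threshold ({score - THRESHOLD:+.2f})")
```

Under this rule all six bacteriocins score above the threshold, with Leucocyclicin Q closest to it and Durancin TW-49M furthest above it.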