Theoretical support
The raw BLAST results could not be used directly for database docking. To keep only the matches with sufficient similarity, we developed a screening model based on a logistic regression classifier. We trained the model on a manually validated and labeled training set; in three-fold cross-validation on that training set, the classifier's accuracy is about 98%.
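As a rough illustration of this kind of screening model, the sketch below trains a logistic regression classifier with plain gradient descent and scores it with three-fold cross-validation. It is not our production code: the two features (alignment identity and coverage), the toy data, the learning rate, and all function names are assumptions chosen only for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, labels, lr=1.0, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(model, x):
    """Return 1 (keep the hit) if the predicted probability is at least 0.5."""
    weights, bias = model
    return int(sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5)


def cross_val_accuracy(rows, labels, k=3):
    """Mean accuracy over k folds; fold f holds out indices f, f+k, f+2k, ..."""
    scores = []
    for fold in range(k):
        held_out = set(range(fold, len(rows), k))
        train_x = [r for i, r in enumerate(rows) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        model = train_logistic(train_x, train_y)
        correct = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k


# Hypothetical toy data: (identity, coverage) per BLAST hit; 1 = accept, 0 = reject.
hits = [
    (0.95, 0.90), (0.90, 0.95), (0.98, 0.85), (0.92, 0.88), (0.97, 0.99), (0.89, 0.93),
    (0.30, 0.20), (0.40, 0.35), (0.25, 0.50), (0.50, 0.30), (0.35, 0.40), (0.45, 0.25),
]
accepted = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

print("3-fold CV accuracy:", cross_val_accuracy(hits, accepted))
```

The toy data is deliberately easy to separate, so the printed accuracy says nothing about real BLAST output; the point is only the train, hold out, and score loop.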
The ROC curve reflects the performance of the classifier: the larger the area under the ROC curve (AUC), the better the classifier performs. We plotted the ROC curve for this classifier, and its AUC reaches 0.99, indicating that the classifier performs extremely well.
Fig. 1. ROC curve and AUC value
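For readers unfamiliar with AUC, the sketch below computes it directly from its probabilistic meaning: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name and the six example scores are made up for illustration and are not taken from our classifier.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and classifier scores.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(roc_auc(example_labels, example_scores))
```

An AUC of 1.0 means every positive outranks every negative, while 0.5 corresponds to random ordering.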
We also manually spot-checked a sample of the classifier's output and found the classification accuracy to be very high.
We evaluated our prediction model on an independent validation set. For E. coli Sigma70 promoter prediction, the sensitivity reached 89.1%, the specificity reached 95.2%, and the accuracy reached 92.2%. The performance in eukaryotes is as follows:
Table 1. Performance compared with other methods under Fickett & Hatzigeorgiou's evaluation criteria.
Using Fickett & Hatzigeorgiou's datasets and evaluation criteria, we evaluated our prediction tool on human genes. Since our tool does not have strand specificity, we treated all approaches as “not strand specific”. It performs better than most of the selected tools.
Table 2. Performance in eukaryotes on independent validation sets.
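The sensitivity, specificity, and accuracy values reported in this section follow the standard confusion-matrix definitions. The sketch below shows how they are derived; the counts passed in are hypothetical and are not our validation data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # share of real promoters that were found
    specificity = tn / (tn + fp)                 # share of non-promoters correctly rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # share of all calls that were correct
    return sensitivity, specificity, accuracy


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
sens, spec, acc = classification_metrics(tp=90, fp=5, tn=95, fn=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```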
Database testing
Introduction
To help synthetic biologists query and understand biobricks more quickly and easily, BioMaster is built as a web application. You can use it by visiting our website: http://igem.uestc.edu.cn/biomaster/
Information expansion
Many biobricks in the iGEM Registry are poorly described and lack sufficient information. For example, browsing BBa_K209410 in the iGEM Registry and in BioMaster gives the following results:
In iGEM Registry
In BioMaster
From the results above, we can see that the iGEM Registry has no description of this biobrick. In BioMaster, however, you can find not only its function, species, and Feature Key, but also its GO annotations and some references. With this information, you can make better use of existing biobricks and even create new ones.
Search
When you want to search for a biobrick with a certain function, such as cellulose synthase, BioMaster gives the following results:
You can also search the wiki by keywords. For example, if you search for biosensor, BioMaster gives the following results:
Finally, you can also search directly by sequence. For the input sequence, the two sites return different results:
In iGEM Registry
In BioMaster
BioMaster uses sequence matching directly to find all biobricks that match the input sequence and sorts them by score. In addition, BioMaster gives more detailed information about the matching sites, the E-values, and the biobricks themselves.
When users cannot find a suitable biobrick, we also offer promoters predicted in the E. coli genome as a reference.
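The ranking step described above can be sketched roughly as follows. This is not BioMaster's actual implementation: the `Hit` record, the placeholder part IDs, the scores, and the E-value cutoff are all assumptions made for the example.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One sequence match (hypothetical record layout)."""
    part_id: str
    bit_score: float
    e_value: float


def rank_hits(hits, max_e_value=1e-5):
    """Drop hits above the E-value cutoff, then sort by score, best first."""
    kept = [h for h in hits if h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: h.bit_score, reverse=True)


# Placeholder hits; the IDs and numbers are invented for illustration.
raw_hits = [
    Hit("BBa_EXAMPLE_A", 52.8, 2e-6),
    Hit("BBa_EXAMPLE_B", 310.0, 1e-80),
    Hit("BBa_EXAMPLE_C", 21.1, 0.4),
]

for hit in rank_hits(raw_hits):
    print(hit.part_id, hit.bit_score, hit.e_value)
```

Filtering on the E-value before sorting keeps chance matches out of the list even when their raw score happens to be moderate.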
Wet-lab Validation
We worked with UESTC-China to validate our predictor. They provided us with the FRE sequence they used; we ran our predictor on this sequence and optimized the predicted promoter to remove unnecessary parts. They constructed plasmids with the predicted promoter, selecting red fluorescent protein as the reporter gene. After the vector was constructed and verified by sequencing, they transformed it into the host DH5.
Verification showed that the red fluorescent protein was expressed normally, which indicates that the predicted promoter is very likely correct.
Feedback
We invited 2018-NKU_CHINA, 2018-UESTC-China, and 2018-USTC-Software to use our database. We also invited some former iGEMers and professionals to try it. They responded positively and gave us advice on how to improve the database so that it better serves synthetic biology.
“The search results page can add a description of the search terms. And the search results can be sorted in chronological order.
Database compatibility with browsers needs to be improved and some pages have Chinese words.” —NKU_CHINA