Team:UESTC-Software/Validation

Document

Theoretical support

  The raw BLAST results could not be used directly for database docking. To keep only the matches with sufficient similarity, we developed a screening model based on a logistic regression classifier. We trained the model on a manually validated and labeled training set; its three-fold cross-validation accuracy on that set is about 98%.
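
  The screening idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our production code: the two features (identity and coverage of a BLAST hit), the toy data, and the helper names `train_logistic`, `predict`, and `cross_val_accuracy` are hypothetical, and the model is fitted with plain batch gradient descent.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, x, threshold=0.5):
    """Return 1 (keep the match) or 0 (discard it)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) >= threshold else 0

def cross_val_accuracy(X, y, k=3):
    """Mean accuracy over k folds; fold i holds out every k-th sample starting at i."""
    scores = []
    for fold in range(k):
        train_idx = [i for i in range(len(X)) if i % k != fold]
        test_idx = [i for i in range(len(X)) if i % k == fold]
        w, b = train_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(w, b, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k

# Toy, hand-made data (NOT our training set): (identity, coverage) per BLAST hit,
# label 1 = manually accepted match, 0 = manually rejected match.
X = [(0.98, 0.95), (0.35, 0.40), (0.92, 0.90), (0.30, 0.55), (0.95, 0.99), (0.45, 0.30),
     (0.90, 0.85), (0.40, 0.50), (0.97, 0.92), (0.25, 0.35), (0.93, 0.97), (0.50, 0.45)]
y = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

if __name__ == "__main__":
    print("three-fold CV accuracy:", cross_val_accuracy(X, y, k=3))
```

  Holding out every third sample keeps both classes present in each fold of this toy set; a real training set would normally be shuffled and stratified before splitting.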

  The ROC curve reflects the performance of the classifier: the larger the area under the ROC curve (AUC), the better the classifier performs. We plotted the ROC curve for this classifier; its AUC is as high as 0.99, which indicates that the classifier performs extremely well.

Fig. 1. ROC curve and AUC value
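
  For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way; the function name `roc_auc` and the example scores are illustrative assumptions, not our evaluation script or data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Made-up classifier scores for four matches (1 = true match, 0 = false match).
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

  An AUC of 1.0 means every positive outranks every negative, while 0.5 corresponds to random ranking.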

  We also manually checked a sample of the classifier's output and found the classification accuracy to be very high.

  We tested our prediction model on an independent validation set. For E. coli Sigma70 promoter prediction, the sensitivity reached 89.1%, the specificity 95.2%, and the accuracy 92.2%. The performance in eukaryotes is as follows:

Table 1. Performance compared with other methods under Fickett & Hatzigeorgiou's evaluation criteria.

  Based on Fickett & Hatzigeorgiou's datasets and evaluation criteria, we evaluated our prediction tool on human genes. Since our tool is not strand specific, we treated all approaches as “not strand specific”. It performs better than most of the tools selected for comparison.

Table 2. Performance in eukaryotes on independent validation sets.
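
  The sensitivity, specificity, and accuracy figures reported above follow the usual confusion-matrix definitions. A minimal sketch is given below; the function name `classification_metrics` and the counts in the example are hypothetical and are not the counts from our validation sets.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity and accuracy from confusion-matrix counts.

    tp/fn: true promoters predicted as promoter / non-promoter
    tn/fp: non-promoters predicted as non-promoter / promoter
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return {
        "sensitivity": tp / (tp + fn),            # fraction of true promoters recovered
        "specificity": tn / (tn + fp),            # fraction of non-promoters rejected
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

if __name__ == "__main__":
    # Illustrative counts only.
    print(classification_metrics(tp=90, fn=10, tn=95, fp=5))
```

  When the positive and negative sets are the same size, the accuracy is simply the mean of the sensitivity and the specificity.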

Database testing

Introduction

  To help synthetic biologists query and understand biobricks more quickly and easily, BioMaster is built as a web application. You can use it by visiting our website: http://igem.uestc.edu.cn/biomaster/

Information expansion

  Many biobricks in the iGEM Registry are poorly described and lack sufficient information. For example, browsing BBa_K209410 in the iGEM Registry and in BioMaster gives the following results:

In iGEM Registry

In BioMaster

  The results above show that the iGEM Registry has no description of this biobrick. In BioMaster, however, you can find not only its function, species, and Feature Key, but also its GO annotations and some references. With this information, you can make better use of existing biobricks and even create new ones.

Search

  When you search for a biobrick with a certain function, such as cellulose synthase, BioMaster gives the following results:

  You can also search the wiki by keywords. For example, if you search for biosensor, BioMaster gives the following results:

  Finally, you can also search directly with an input sequence.

  The results differ between the two sites:

In iGEM Registry

In BioMaster

  BioMaster uses sequence matching directly to find all biobricks that match the input sequence and sorts them by score. In addition, it gives more detailed information about the matching sites, E-values, and biobricks.
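
  The ranking step can be pictured with the short sketch below. It is an illustration under assumptions rather than BioMaster's actual code: the `Hit` record, the `rank_hits` helper, the E-value cutoff, and the placeholder part names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    part_id: str    # identifier of the matched biobrick (placeholder names below)
    score: float    # alignment score reported by the sequence search
    e_value: float  # expected number of chance matches with at least this score

def rank_hits(hits, max_e_value=1e-5):
    """Drop weak matches, then sort by score (descending), breaking ties by E-value."""
    kept = [h for h in hits if h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: (-h.score, h.e_value))

if __name__ == "__main__":
    hits = [
        Hit("part_A", 180.0, 1e-40),
        Hit("part_B", 250.0, 1e-60),
        Hit("part_C", 30.0, 0.5),     # too weak, filtered out by the assumed cutoff
        Hit("part_D", 180.0, 1e-45),
    ]
    for h in rank_hits(hits):
        print(h.part_id, h.score, h.e_value)
```

  The E-value cutoff is an assumed parameter; whether to filter at all, and at what threshold, depends on how permissive the search should be.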

  When users cannot find a suitable biobrick, we also offer promoters predicted in the E. coli genome as a reference.

Wet-lab Validation

  We worked with UESTC-China to validate our predictor. They provided us with the FRE sequence they used; we ran our prediction on this sequence and performed promoter optimization to remove unnecessary parts. They constructed plasmids with the predicted promoter, selecting red fluorescent protein as the reporter gene. After the vector was constructed and verified by sequencing, they transferred it into the host DH5.

  The red fluorescent protein was expressed and worked normally, which indicates that the predicted promoter is very likely correct.

Feedback

  We invited 2018-NKU_CHINA, 2018-UESTC-China and 2018-USTC-Software to use our database. We also invited some previous iGEMers and professionals to try it. They spoke positively of our database and gave us advice on how to improve it and better serve synthetic biology.

“The database is detailed and comprehensive, with relatively high practicability. However, it needs to be more aesthetically pleasing.
The search results page can add a description of the search terms. And the search results can be sorted in chronological order.
Database compatibility with browsers needs to be improved and some pages have Chinese words.” —NKU_CHINA
“BioMaster is of great excellence. It indeed provides us many useful statistics for us so that we can refer to it. BioMaster brings conveniences to our experiments. If some relevant references are provided and it can display biobrick information in a visualized way, it would be more user-friendly.” —UESTC-China
“The UI surface is very user-friendly. However, there’s some details you should be more careful. I suggest that you should provide what medal the team won on the searching of Team Wiki.” — Prof. Zhu Lvyun in NUTD
“Your information is comprehensive and the UI surface is user-friendly. However, the ranking of search results is to be optimized when searching with iGEM_ID, such as searching for BBa_K104000, the first display of BBa_K1040001, and BBa_K104000 in the back position. You’d better give out a guidance to each information, for example, new user may not know what EPD& RegulonDB contains of.” —USTC-Software
