Team:SKLMT-China/Principle

Principle

We built our research into a website so that users can interact with our model.

First, we built the data update module. We use Python to process the input DNA sequences and encode each core sequence by its base triplets. The resulting 65 values (the occurrence frequencies of the triplets AAA through GGG, plus the promoter strength) are then written to the database through the “pymysql” library.
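The triplet encoding described above can be sketched as follows. This is a minimal illustration, not our production code: the base ordering "ATCG" (chosen so that the triplets run from AAA to GGG), the overlapping sliding window, and the function names triplet_frequencies and encode_record are all assumptions made for the example.

```python
from itertools import product

# Assumed base ordering, chosen so the 64 triplets run from AAA to GGG.
BASES = "ATCG"
TRIPLETS = ["".join(p) for p in product(BASES, repeat=3)]  # 4**3 = 64 entries


def triplet_frequencies(sequence):
    """Return the 64 triplet frequencies of a DNA sequence.

    Assumption: triplets are counted with an overlapping sliding window.
    Windows containing characters other than A, T, C, G are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(TRIPLETS, 0)
    windows = 0
    for i in range(len(seq) - 2):
        triplet = seq[i:i + 3]
        if triplet in counts:
            counts[triplet] += 1
            windows += 1
    if windows == 0:
        return [0.0] * len(TRIPLETS)
    return [counts[t] / windows for t in TRIPLETS]


def encode_record(sequence, strength):
    """Build one 65-value record: 64 frequencies followed by the strength."""
    return triplet_frequencies(sequence) + [float(strength)]


record = encode_record("AAAA", 1.5)  # toy input, not measured data
```

Each record has a fixed length of 65 regardless of the length of the core sequence, which is what lets every sequence be stored in the same table layout.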

Next, we handled the deep learning part. After the user sets the learning generation on the website, our program automatically retrieves all the data stored in the database, calculates all parameters according to the supervised training model, and finally writes the results back to the database for future use.

Finally, we provide an interface for predicting promoter strength. The user only needs to input the core sequence; our program then reads all the parameters obtained from deep learning out of the database and calculates the strength of the promoter through the neural network structure we present. If the user has added new data beforehand, the deep learning step should be run again before prediction so that the added data is reflected in the results.

Python program execution process

Set Database

Based on pymysql, a Python extension library, this program sets up four tables in the blank database "promoters" that we will use next: data, status, layer 1 weights, and layer 2 weights.
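A rough sketch of this setup step is shown below. To keep the example runnable without a MySQL server, it uses Python's built-in sqlite3 module as a stand-in for pymysql. The table names layer1_weights and layer2_weights and every column name are hypothetical, chosen only for illustration. With pymysql, the connection would instead come from pymysql.connect(...) and query placeholders would be written as %s.

```python
import sqlite3


def create_tables(conn):
    """Create the four storage areas: data, status, and two weight tables.

    All column names here are hypothetical.
    """
    cur = conn.cursor()
    # Measured core sequences and their promoter strengths.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS data ("
        "id INTEGER PRIMARY KEY, sequence TEXT NOT NULL, strength REAL NOT NULL)"
    )
    # Training status, for example the learning generation.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS status ("
        "id INTEGER PRIMARY KEY, generation INTEGER NOT NULL)"
    )
    # One row per weight, addressed by its position in the matrix.
    for name in ("layer1_weights", "layer2_weights"):
        cur.execute(
            f"CREATE TABLE IF NOT EXISTS {name} ("
            "row_idx INTEGER, col_idx INTEGER, value REAL, "
            "PRIMARY KEY (row_idx, col_idx))"
        )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
create_tables(conn)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
)
```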

Data Input

The purpose of this program is to import the data measured by our team into our database as the initial data for the promoter prediction.

DNAInsert

This program helps other teams import their data into our database. Before importing, please check the input data carefully and make sure it is as accurate as possible, because a single set of wrong data can strongly interfere with our results.
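Some input errors can be caught automatically before a record reaches the database. The following is a minimal sketch of such a check, assuming that a valid core sequence contains only the letters A, C, G and T and that the strength is a finite number. The function name validate_record is hypothetical and the rules are illustrative, not the exact checks our program performs.

```python
import math


def validate_record(sequence, strength):
    """Return (cleaned_sequence, strength) or raise ValueError.

    Whitespace is removed and the sequence is upper-cased before checking.
    """
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("sequence is empty")
    unexpected = sorted(set(cleaned) - set("ACGT"))
    if unexpected:
        raise ValueError(f"unexpected characters in sequence: {unexpected}")
    try:
        value = float(strength)
    except (TypeError, ValueError):
        raise ValueError(f"strength is not a number: {strength!r}")
    if not math.isfinite(value):
        raise ValueError("strength must be a finite number")
    return cleaned, value


cleaned, value = validate_record(" ttg aca\n", "0.8")  # toy input
```

A check like this only catches malformed input. It cannot tell whether a measured strength is actually correct, so manual review of the data is still needed.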

Obtain Generation

This program is mainly responsible for extracting the existing generation information from the database and providing preset values for the deep learning part.
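As a sketch of this lookup, the snippet below reads the most recent generation value from a status table and falls back to a preset when the table is empty. It uses sqlite3 as a stand-in for pymysql so that it runs without a server. The table layout, the function name get_generation, and the fallback value of 1000 are all assumptions for the example.

```python
import sqlite3

DEFAULT_GENERATION = 1000  # hypothetical preset used when nothing is stored


def get_generation(conn, default=DEFAULT_GENERATION):
    """Return the most recently stored generation, or the default if none."""
    row = conn.execute(
        "SELECT generation FROM status ORDER BY id DESC LIMIT 1"
    ).fetchone()
    return row[0] if row is not None else default


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE status (id INTEGER PRIMARY KEY, generation INTEGER NOT NULL)"
)
empty_result = get_generation(conn)  # no rows yet, so the default is returned

conn.execute("INSERT INTO status (generation) VALUES (?)", (500,))
conn.execute("INSERT INTO status (generation) VALUES (?)", (2000,))
latest = get_generation(conn)  # the row inserted last wins
```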

Deep Learning

This is the heart of the whole system. The encoded values and strength values are read from the database, and all parameters of the hidden layer and output layer are generated by the neural network algorithm. The database is then updated with these parameters immediately.
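To illustrate what such a training step can look like, here is a minimal pure-Python sketch of a network with one hidden layer and one output, trained by gradient descent on the squared error. The sigmoid activation, the hidden layer size, the learning rate, and the toy samples are all assumptions for the example. They are not our actual architecture or measured data.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(n_in, n_hidden, seed=0):
    """Random initial weights; the last entry of each row is the bias."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return w1, w2


def forward(w1, w2, x):
    """Sigmoid hidden layer followed by a linear output unit."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row[:-1], x)) + row[-1]) for row in w1]
    output = sum(w * h for w, h in zip(w2[:-1], hidden)) + w2[-1]
    return hidden, output


def train(samples, n_hidden=4, generations=500, lr=0.1, seed=0):
    """Per-sample gradient descent on 0.5 * (output - target) ** 2."""
    n_in = len(samples[0][0])
    w1, w2 = init_weights(n_in, n_hidden, seed)
    for _ in range(generations):
        for x, target in samples:
            hidden, output = forward(w1, w2, x)
            error = output - target
            # Hidden-layer gradients, computed before w2 is changed.
            deltas = [error * w2[j] * hidden[j] * (1.0 - hidden[j]) for j in range(n_hidden)]
            for j in range(n_hidden):
                w2[j] -= lr * error * hidden[j]
            w2[-1] -= lr * error
            for j in range(n_hidden):
                for i in range(n_in):
                    w1[j][i] -= lr * deltas[j] * x[i]
                w1[j][-1] -= lr * deltas[j]
    return w1, w2


def mean_squared_error(w1, w2, samples):
    return sum((forward(w1, w2, x)[1] - t) ** 2 for x, t in samples) / len(samples)


# Synthetic toy samples: (encoded input, strength). Not measured data.
samples = [([0.0, 1.0], 0.2), ([1.0, 0.0], 0.8), ([0.5, 0.5], 0.5)]
before = mean_squared_error(*init_weights(2, 4, seed=0), samples)
w1, w2 = train(samples)
after = mean_squared_error(w1, w2, samples)
```

In the real system each input would have 64 entries rather than 2, and the returned w1 and w2 correspond to the layer 1 and layer 2 weights that are written back to the database.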

Promoter Strength

This program reflects our team's research. Given a core sequence, it uses the results of deep learning to predict the strength level of the promoter and reports it to the user.
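The prediction step amounts to rebuilding the stored weights and running one forward pass. The sketch below assumes that each weight is stored as a (row_idx, col_idx, value) tuple, that the last column of each matrix holds the bias, and that the hidden layer uses a sigmoid activation. These details, along with the function names and toy parameters, are hypothetical and serve only to make the example concrete.

```python
import math


def rebuild_matrix(rows):
    """Turn (row_idx, col_idx, value) tuples into a list-of-lists matrix."""
    n_rows = max(r for r, _, _ in rows) + 1
    n_cols = max(c for _, c, _ in rows) + 1
    matrix = [[0.0] * n_cols for _ in range(n_rows)]
    for r, c, value in rows:
        matrix[r][c] = value
    return matrix


def predict_strength(encoded, layer1, layer2):
    """Forward pass: sigmoid hidden layer, then a single linear output."""
    hidden = []
    for row in layer1:
        z = sum(w * x for w, x in zip(row[:-1], encoded)) + row[-1]
        hidden.append(1.0 / (1.0 + math.exp(-z)))
    out_row = layer2[0]
    return sum(w * h for w, h in zip(out_row[:-1], hidden)) + out_row[-1]


# Toy stored parameters: 2 hidden units, 64 inputs plus one bias column.
layer1_rows = [(j, i, 0.0) for j in range(2) for i in range(65)]
layer2_rows = [(0, 0, 1.0), (0, 1, 1.0), (0, 2, 0.25)]

encoded = [1.0 / 64] * 64  # placeholder for a 64-value triplet encoding
strength = predict_strength(
    encoded, rebuild_matrix(layer1_rows), rebuild_matrix(layer2_rows)
)
```

Because prediction only reads the stored parameters, it is fast, but it reflects newly added data only after the deep learning step has been run again.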