Team:UESTC-Software/Notebook

Document

2018.3 Team building.

2018.4 Team meeting and topic selection meeting.

2018.5 We selected our topic and built up the project's overall framework. We analyzed current problems in the iGEM Registry, developed solutions and assessed the feasibility of the program.

2018.6.3-6.16 We held a large-scale science outreach activity at the Sichuan Science and Technology Museum and prepared for the iGEM Southwest League.
2018.6.17 The iGEM Southwest League meeting was held at the Ya'an campus of Sichuan Agricultural University.
2018.6.18 We went to the Sichuan Science and Technology Museum to organize the science activity.
2018.7.15-7.22 We decided to use BLAST for matching and integration and began to set up BLAST locally. Based on existing research, we collected a large number of related documents and selected a prediction method.

2018.7.18-7.22 This week we collected data sets for training. The prokaryotic promoters came from RegulonDB and the eukaryotic promoters came from EPD. Non-promoter samples were selected from the antisense strand of CDS regions.
2018.7.26-7.30 We built the CNN (convolutional neural network) model. We based it on LeNet-5 and improved on it.
2018.7.31-8.1 We interviewed the scholar Junbiao Dai. We collected methods for promoter recognition at genome scale and chose the following one:
The window moves 1-2 bp each time. If more than 10 consecutive windows are predicted as promoters, we call that region a promoter region, and the segment with the highest score is selected as the promoter in this region.
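
The sliding-window rule described in this entry can be sketched roughly as follows. This is only an illustration under stated assumptions, not the team's actual code: the function names, the window size, the score threshold and the toy scoring function are all hypothetical, and in the real project the score would come from the trained CNN.

```python
from typing import Callable, List, Tuple

Region = Tuple[int, int, int, float]  # (region_start, region_end, best_window_start, best_score)


def find_promoter_regions(
    sequence: str,
    score_window: Callable[[str], float],  # hypothetical stand-in for the CNN predictor
    window_size: int = 10,                 # assumed value, not stated in the notebook
    step: int = 1,                         # the notebook moves the window 1-2 bp each time
    threshold: float = 0.5,                # assumed cutoff for "predicted as promoter"
    min_run: int = 10,                     # more than 10 consecutive windows are required
) -> List[Region]:
    """Scan a sequence and report regions with long runs of positive windows."""
    regions: List[Region] = []
    run: List[Tuple[int, float]] = []  # (start, score) of the current positive run

    def flush() -> None:
        # Keep the run only if it has more than `min_run` consecutive windows.
        if len(run) > min_run:
            best_start, best_score = max(run, key=lambda item: item[1])
            regions.append((run[0][0], run[-1][0] + window_size, best_start, best_score))
        run.clear()

    for start in range(0, len(sequence) - window_size + 1, step):
        score = score_window(sequence[start:start + window_size])
        if score >= threshold:
            run.append((start, score))
        else:
            flush()
    flush()  # close a run that reaches the end of the sequence
    return regions


def toy_score(window: str) -> float:
    """Toy scorer for demonstration only: fraction of 'A' bases in the window."""
    return window.count("A") / len(window)


if __name__ == "__main__":
    demo = "G" * 20 + "A" * 30 + "G" * 20
    for region in find_promoter_regions(demo, toy_score):
        print(region)
```

Each reported tuple gives the bounds of the promoter region plus the start and score of its highest-scoring window, which is the segment the entry says is selected as the promoter.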

2018.8.2-3 We went to Life baseline Chengdu (a life science company) for an exchange visit. We also manually sampled the matching results and added labels.
2018.8.4-5 We downloaded databases including RegulonDB, Promoc and Phage. We also integrated databases including GO and EMBL.
2018.8.6-8 Using logistic regression, the accuracy of the model reached 98.80%. The trained model was used to screen the matching results. The matching results were manually checked and were used to expand the BioBrick information.
2018.8.9-10 We completed building the global search model. Meanwhile, we used the sliding window method to continuously search long sequences, recording each position and score. After that, we used the above two methods to analyze the results and obtain the final promoter regions.
2018.8.11 We collected interaction information from STRING. We designed two different methods for promoter recognition at genome scale. At the same time, our game Bio-Chess was being designed. We compared the results of the global search with those of other prediction software and with data in GenBank.
2018.8.12-13 We decided to use a MySQL relational database as our foundation and made a preliminary design of the relationship structure in the database. We also checked the contents of the database we had previously built and started to build the backend database.
2018.8.14 We optimized the parameters of the E. coli global search.
2018.8.15-16 We designed different models for different sigma factors. We decided to update our classifier for eukaryotic promoters. We used three negative sample sets: exons, introns and 3' UTRs.
2018.8.20-22 We finished work on the CCiC posters, PPT, speech draft and web page. We collected the topics and abstracts of all related references in NCBI. We also collected projects and wikis from previous teams.
2018.8.23-25 We used the Microsoft Text Analytics Service to extract keywords and then manually filtered them to keep the most important ones.
2018.8.26-31 We went to Shanghai to attend CCiC.

2018.9.1-3 We adjusted all parameters in our CNN and updated our datasets. The method of randomly extracting a 251 bp sequence from exons, introns and 3' UTRs was improved in order to obtain more representative data. We used CD-HIT to remove redundant sequences from each sample set and retrained the model with the new datasets.
2018.9.4-6 The whole process above was repeated for commonly used eukaryotes such as Arabidopsis thaliana.
2018.9.7-9 We designed the final structure of our database.
2018.9.11-13 We made BioMaster a web application. We determined the display form of the information in the database and did some UI beautification.
2018.9.12-18 We drew the sequence diagrams, plasmid maps and interaction scatter plots. We also established a database of predicted promoters. We obtained all the promoter sequences of E. coli using the CNN predictor. Then we matched the sequences against the EMBL genome annotation information, expanded their information and established our own database.
2018.9.19-23 We sorted all our files and ran our first internal database test to fix bugs. Then we opened our database to the public at its web site: http://igem.uestc.edu.cn/database.
2018.9.24-28 We kept on writing drafts and designed the web page layout.
2018.9.29-30 We started to write drafts for our presentation and to design our PPT. The PC version of Bio-Chess was finished.
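
As an aside on the 2018.9.1-3 entry, the basic step of randomly extracting a fixed-length negative sample from an exon, intron or 3' UTR can be sketched as below. This is a minimal illustration only: the function name and parameters are hypothetical, the notebook does not say how the method was improved, and the CD-HIT redundancy removal is not reproduced here.

```python
import random
from typing import List, Optional


def sample_windows(
    region: str,
    length: int = 251,                    # sample length given in the notebook
    count: int = 1,                       # hypothetical: samples drawn per region
    rng: Optional[random.Random] = None,  # pass a seeded generator for reproducibility
) -> List[str]:
    """Draw `count` random windows of `length` bp from one annotated region."""
    rng = rng or random.Random()
    if len(region) < length:
        return []  # region too short to yield a full-length sample
    last_start = len(region) - length
    starts = [rng.randint(0, last_start) for _ in range(count)]
    return [region[s:s + length] for s in starts]


if __name__ == "__main__":
    exon = "ACGT" * 100  # toy 400 bp sequence, for demonstration only
    for window in sample_windows(exon, count=3, rng=random.Random(0)):
        print(len(window), window[:20])
```

Regions shorter than 251 bp are skipped in this sketch. Whether the team skipped, padded or otherwise handled such regions is not stated in the notebook.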

2018.10.2-4 We finished translating our drafts. We started to upload our web pages and our PPT was still being designed.
2018.10.5 We recorded our video and beautified our PPT.
2018.10.6-8 We kept uploading our web pages and finished our judging form. The PC version of Bio-Chess was finally released.
2018.10.9 We started to design our posters and do some web debugging.
2018.10.10 We held a rehearsal.
2018.10.11-13 We checked our work and beautified our web pages. We also did a final check of our judging form.
2018.10.14-17 We debugged our web pages and uploaded our code to GitHub.
2018.10.17-20 We tested our database and practiced for our presentation.
2018.10.21-22 The final version of our poster came out and we prepared for our trip to Boston.
2018.10.23 Our journey began!
