Difference between revisions of "Team:SJTU-software/Document"

Latest revision as of 08:06, 17 October 2018

SJTU-software

Metlab: a metabolic network research tool box

Project —— Document

Met Differ

Data format

We designed a data format, the .met format, to describe a metabolic network simply. The .met format is easy to read and write.
If a line starts with ‘##’, it begins a new subgraph. The text following the ‘##’ is the name of the subgraph, and the lines that follow belong to that subgraph.
If a line starts with ‘#’, it describes a metabolite. The line should contain the metabolite ID, name, and other information such as SMILES.
If a line starts with neither symbol, it describes a reaction. The reaction ID, reactants, and products should be included, along with enzyme information if it exists.
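
The line rules above can be sketched as a small reader. This is only an illustration: the text does not state how fields within a line are separated, so the tab delimiter, the `Subgraph` class, and the `parse_met` function are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Subgraph:
    name: str
    metabolites: list = field(default_factory=list)
    reactions: list = field(default_factory=list)


def parse_met(text):
    """Parse .met text into a list of Subgraph objects.

    Assumption (not from the format description): fields are tab-separated.
    """
    subgraphs = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("##"):
            # '##' opens a new subgraph; the rest of the line is its name.
            current = Subgraph(name=line[2:].strip())
            subgraphs.append(current)
            continue
        if current is None:
            # Records that appear before any '##' header go into an unnamed subgraph.
            current = Subgraph(name="")
            subgraphs.append(current)
        if line.startswith("#"):
            # '#' marks a metabolite: ID, name, then other information.
            current.metabolites.append(line[1:].strip().split("\t"))
        else:
            # No leading symbol: a reaction record.
            current.reactions.append(line.split("\t"))
    return subgraphs
```

Each record is kept as a plain list of fields, so the sketch does not commit to a fixed number of columns per metabolite or reaction.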

Algorithm

In this part, we use a merge-and-mine method. First, we merge the pathway graph and the network graph into one graph, called the align graph, according to the similarity coefficient of the nodes. Then the nodes in the align graph are linked according to the topology structure. Finally, we search for the maximal connected subgraph as the alignment result.

Similarity coefficient

When we align the pathway to the network, similarity must be quantified. We use a similarity coefficient to measure the similarity between the pathway and the network. We consider similarity from two aspects: nodes and topology structure.

Node
There are two kinds of nodes in the metabolic network, metabolites and reactions. We consider them separately.

For metabolites, we mainly compare their structural information. A chemical structural formula can be described with SMILES, so we use a software package to extract a feature vector from the SMILES. Then we calculate the similarity coefficient of the two feature vectors.
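
As an illustration of comparing two feature vectors, the sketch below computes the Tanimoto coefficient of two binary vectors. The text does not name the software package or the coefficient that is actually used, so the choice of Tanimoto and the function name `tanimoto` are assumptions.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length binary feature vectors.

    Assumption: the vectors are binary fingerprints; the source does not
    specify the feature type or the coefficient.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both vectors
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    # Two all-zero vectors are treated as identical.
    return both / either if either else 1.0
```

The result lies between 0 (no shared features) and 1 (identical feature sets).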

For reactions, we mainly consider the enzyme information. Because each enzyme has an EC number, we can simply normalize the EC number into the enzyme’s feature vector. For reactions without an EC number, we use the information of their reactants and products as the reaction’s information.
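
One possible way to turn EC numbers into comparable vectors is sketched below: each EC number is split into its four levels, and two enzymes are scored by how many leading levels they share. The text does not describe the actual normalization, so this scoring rule and the names `ec_levels` and `ec_similarity` are assumptions.

```python
def ec_levels(ec):
    """Split an EC number such as '2.7.1.1' into its four levels ('-' marks unknown)."""
    parts = ec.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four EC levels, got {ec!r}")
    return tuple(parts)


def ec_similarity(ec_a, ec_b):
    """Fraction of leading EC levels shared by both numbers (0.0 to 1.0).

    Assumption: similarity is the length of the common prefix divided by four;
    an unknown level ('-') ends the comparison.
    """
    shared = 0
    for a, b in zip(ec_levels(ec_a), ec_levels(ec_b)):
        if a != b or a == "-":
            break
        shared += 1
    return shared / 4
```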

Topology structure
After merging the pathway and the network into an align graph, we add edges to the align graph according to topology structure. Let the pathway be graph X, the network be graph Y, and the align graph be Z. Let x1 and x2 be two nodes from X, and y1 and y2 be two nodes from Y, such that x1 and y1 are similar enough to be merged into a node z1 in Z, and x2 and y2 are similar enough to be merged into a node z2. If the node pairs (x1,x2) and (y1,y2) are both connected and the edges have the same direction, we give the edge (z1,z2) a high weight; if the edge (x1,x2) has the opposite direction to (y1,y2), we give the edge (z1,z2) a lower weight; if there is no edge (x1,x2) or no edge (y1,y2), we give the edge (z1,z2) the lowest weight.
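
The three cases above can be sketched as a single function. The numeric weights are placeholders chosen for this example (the text gives only their ordering), and `edge_weight` is a hypothetical name.

```python
HIGH, LOWER, LOWEST = 1.0, 0.5, 0.1  # placeholder values; only the ordering comes from the text


def edge_weight(x_edges, y_edges, x1, x2, y1, y2):
    """Rough weight for the align-graph edge (z1, z2), with z1 = (x1, y1) and z2 = (x2, y2).

    x_edges and y_edges are sets of directed edges written as (source, target).
    """
    x_fwd, x_rev = (x1, x2) in x_edges, (x2, x1) in x_edges
    y_fwd, y_rev = (y1, y2) in y_edges, (y2, y1) in y_edges
    if not (x_fwd or x_rev) or not (y_fwd or y_rev):
        return LOWEST  # at least one of the two pairs is not connected
    if (x_fwd and y_fwd) or (x_rev and y_rev):
        return HIGH    # connected in both graphs with the same direction
    return LOWER       # connected in both graphs, but in opposite directions
```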

These weights are not accurate enough to evaluate the edges, so we calculate the ELI (extended local interactome) between each pair of nodes. ELI is a coefficient that evaluates the connection between two nodes and their surrounding nodes. It is calculated as follows:

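1. Calculate Ek(x): the set of paths connecting node x and its neighbors at distance k.

2. Calculate Sk(x,y): [formula shown as an image: T--SJTU-software--calculate1.jpg]

3. Calculate ELI(x,y): [formula shown as an image: T--SJTU-software--calculate2.jpg]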

We can now use ELI to replace the rough weight given to each edge. Since the align graph is dense, we can prune it with ELI: for a node zi with edges (z1,zi), (z2,zi), ..., suppose the edge with the highest ELI is (zj,zi). We then prune the edges with ELI < k*ELI(zj,zi), where k < 1. This deletes edges with low ELI, making the align graph sparser and the maximal connected subgraph search easier.
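
The pruning rule can be sketched as follows, assuming the ELI scores are already available as a dictionary. The function name `prune_by_eli`, the default `k=0.5`, and the choice to drop an edge when it falls below the threshold at either endpoint are assumptions made for this example. The threshold factor k here is unrelated to the distance k in Ek(x).

```python
def prune_by_eli(eli, k=0.5):
    """Keep only edges whose ELI is at least k times the best ELI at both endpoints.

    eli maps an undirected edge (u, v) to its ELI score; k must satisfy 0 < k < 1.
    Assumption: an edge is pruned if it is below the threshold at either endpoint.
    """
    if not 0 < k < 1:
        raise ValueError("k must be between 0 and 1")
    best = {}  # highest ELI among the edges incident to each node
    for (u, v), score in eli.items():
        best[u] = max(best.get(u, score), score)
        best[v] = max(best.get(v, score), score)
    return {
        (u, v): score
        for (u, v), score in eli.items()
        if score >= k * best[u] and score >= k * best[v]
    }
```

A smaller k keeps more edges; a k close to 1 keeps only the edges that are nearly the best at both of their endpoints.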

SBML Drawer & Differ

With the development of synthetic biology, more and more computational methods have been applied to reduce researchers’ workload. The Systems Biology Markup Language (SBML), a free and open interchange format for computer models, is widely used. It is important to be able to compare both SBML models of different situations and different versions of the same model. Many other engineering disciplines rely to a great extent on version control to track the designs produced at each stage of the iterative design cycle, often together with a file-difference tool that compares versions directly and identifies the changes. However, comparing two SBML models directly as text is unsatisfactory, because it is difficult to find the significant features in the output. Moreover, many textual changes are not significant (e.g., changes in whitespace or in the ordering of elements), while a change to the ID of a single species appears in many places and produces a large textual difference.

We propose model-diff, a tool that reads two or more metabolic network models in SBML format and generates images showing the differences. The default view depicts each metabolite as an ellipse and each reaction as a rectangle. By default, elements in different models are treated as the same entity if they have the same id attribute. Color indicates whether each node and edge is shared by both models (gray) or present in only one model (red or blue). A dashed node border indicates that the component is shared between models but its properties differ: a rectangle with a dashed border indicates that not all models have the same kinetic law for the reaction; an ellipse with a dashed double border indicates that not all models have the same "is boundary" property.

Model-diff reads the metabolic models in SBML format and produces output in DOT format, which can be converted to an image using Graphviz or other compatible software. It can be used as a Python package, as a standalone command-line tool, or through a form on our website.
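
To illustrate the shape and color conventions described above, the sketch below emits DOT text for nodes only. It is not the tool's own code: `diff_to_dot` is a hypothetical function, it takes already-extracted sets of ids instead of parsing SBML, it omits edges and dashed borders, and the assignment of red to the first model and blue to the second is an assumption.

```python
def diff_to_dot(species_a, species_b, reactions_a, reactions_b):
    """Return DOT text that colors each id by which model contains it.

    Each argument is a set of ids from model A or model B.
    Gray: in both models; red: only in A; blue: only in B (color assignment assumed).
    """
    def color(item, in_a, in_b):
        if item in in_a and item in in_b:
            return "gray"
        return "red" if item in in_a else "blue"

    lines = ["digraph model_diff {"]
    # Metabolites are drawn as ellipses.
    for sid in sorted(species_a | species_b):
        lines.append(f'  "{sid}" [shape=ellipse, color={color(sid, species_a, species_b)}];')
    # Reactions are drawn as rectangles (Graphviz shape "box").
    for rid in sorted(reactions_a | reactions_b):
        lines.append(f'  "{rid}" [shape=box, color={color(rid, reactions_a, reactions_b)}];')
    lines.append("}")
    return "\n".join(lines)
```

If the returned text is saved to a file such as `diff.dot`, Graphviz can render it with `dot -Tpng diff.dot -o diff.png`.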

SMILES Drawer

SMILES (simplified molecular-input line-entry system) is a specification that describes molecular structure explicitly as ASCII strings. Smiles-differ provides two functions: SMILES alignment and visualization. Users provide the SMILES of the two molecules to be compared, and the software scores their similarity based on molecular structure. In theory, the lowest score is 0 and the highest score is 1. The user can also enter a SMILES string to preview the molecules to be compared.

DNA Editor

DNA Editor is an everyday lab tool for handling sequences, with many functions that are convenient for researchers. Its basic DNA/RNA operations include reverse-complementing DNA, removing non-IUPAC letters, and changing upper/lower case. It also shows basic information about the sequence. The Restriction Sites module finds restriction sites, lets users select them, and highlights the selected sites in the sequence so that users can locate them. The Digest module draws graphic maps: restriction maps with or without features, for linear or circular sequences. It can also visualize a digest as a gel picture, either simulating band intensity or drawing all bands black. The Translate module translates DNA sequences in one, three, or six frames and shows the peptide sequence. The Features module handles GenBank features, which can be used to annotate regions in the DNA. DNA Editor gives you full control: all calculation happens on your machine. You can load and save sequences in FASTA or GenBank format, and pictures can be saved in HTML or SVG format.
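
Two of the basic operations mentioned above, removing non-IUPAC letters and reverse-complementing DNA, can be sketched as follows. This is an independent illustration, not DNA Editor's code; the function names are hypothetical.

```python
# IUPAC nucleotide codes (DNA alphabet) mapped to their complements, in both cases.
_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)
_IUPAC = set("ACGTRYSWKMBDHVN")


def remove_non_iupac(seq):
    """Drop every character that is not an IUPAC DNA letter, keeping the original case."""
    return "".join(ch for ch in seq if ch.upper() in _IUPAC)


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Cleaning the input first with `remove_non_iupac` ensures that `reverse_complement` only sees letters it knows how to complement.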

@@ Line 18: / Line 18: @@
              padding-bottom: 40px;
              border-bottom: 1px solid #6b3b25;
+            text-align:justify;
              }
              #posts-list article:last-child {
@@ Line 66: / Line 67: @@
              font-size: 26px;
              font-weight: 700;
+            padding: 0px 0px 8px 0px;
              }
              #posts-list article .subtitle{
@@ Line 73: / Line 75: @@
                  font-size: 18px;
                  font-weight: 550;
-                 padding: 15px 0px 0px 0px;
+                 padding: 10px 0px 0px 0px;
              }
              #posts-list article .excerpt {
@@ Line 200: / Line 202: @@
                 <article class="cf">
                                      <div class="entry-title" id="Met">
-                                         <div class="post-heading" >Met differ
+                                         <div class="post-heading" >Met Differ</div>
                                      </div>
                                      <div class="subtitle" >Data format</div>
@@ Line 214: / Line 216: @@
                          <img src="https://static.igem.org/mediawiki/2018/4/4d/T--SJTU-software--format.jpg" alt="Thumbnail" />
                      </div>
-                                 <div class="excerpt">
-                                            This game is basically a maze and mainly contains five parts: glycolysis, the citric acid cycle, fatty acid metabolism, biosynthesis of steroid and urea cycle. You can choose different path at nodes(middle metabolites) by clicking on different walls, where the requirement of each path will be shown. The wall will turn red if you don't meet the requirement.
-                                            ATP is the "currency" in the game and will decline as time goes by. The game fails when ATP drops to zero, so try to make the right choice and survive. It can also be used to exchange some important middle metabolites. You can always check the remaining time as well as your current major metabolite in the top left corner of your version while the repository sysmtem is in the top right corner.
-                                            We provide both Engish and Chinese version of this game.
-                                    </div>
-                           <div class="subtitle" id="VRpop">VR game popularization</div>
+                    <div class="subtitle"> Algorithm</div>
-                                  <div class="excerpt">
-                                 On Sept. 16th, we came to Youth Center of Minhang District with our ‘Met Journey’, to introduce metabolism and synthetic biology. First, we explained about the metabolic system in the human body to them, leading them to recognize the importance of metabolism. Then we introduced our game, ‘Met Journey’ and taught them to play the game. They were all very interested with the game. At first, they were not so familiar with the system, so they failed once and again. As they became more familiar, they could complete the system, and had fun in the process of exploring. With the fun of playing the game, they realized the complexity of metabolic system. Then we introduced synthetic biology to them, explaining the close relationship between synthetic biology and our daily life. We were all proud to see their interests in synthetic biology were aroused.
+                                 <div class="excerpt" width="80%">
+In this part, we use merge-and-mine method. First, we merge the pathway graph and the network graph into one graph called align graph, according to the similarity coefficient of nodes. Then the nodes in the align graph are lined according to topology structure. Finally, we search for maximal connected subgraph as the alignment result.
                                   </div>
-                  </article>
+                     <div class="subtitle">Similarity coefficient</div>
-                <article class="cf">
+                                 <div class="excerpt" width="80%">
+When we align the pathway to the network, similarity should be qualified. We use similarity coefficient to value the similarity between the pathway and the network. Here we consider the similarity from two aspects, node and topology structure.
-                    <div class="entry-title">
+                                 </div>
-                        <div class="meta">April 7 2018</div>
+                                 <div class="feature-image">
-                        <div class="post-heading" >2018 SJTUer’s Festival——Popularization of Biology</div>
+                                    <img src="https://static.igem.org/mediawiki/2018/2/21/T--SJTU-software--similarity.jpg" alt="Thumbnail" />
-                    </div>
+                                 </div>
-                    <div class="excerpt">
+                                  <div class="excerpt" width="80%">
-                        SJTUer’s festival is an annual Youth creative carnival in Shanghai Jiao Tong University. In 2018 SJTUer’s festival, we set up a booth in section of Science and technology to provide public with scientific biology knowledge and promoting their scientific literacy.
+<strong>Node</strong></br>There are two kinds of nodes in the metabolic network, metabolites and reactions. We consider them separately.
-                    </div>
+                                 </div>
-                    <div class="subtitle" id="Microscopic">Microscopic observation</div>
+                                 <div class="feature-image">
-                    <div class="excerpt">
+                                    <img src="https://static.igem.org/mediawiki/2018/9/9b/T--SJTU-software--node.jpg" alt="Thumbnail" />
-                        We demonstrated the public how to use a microscope and stereoscope. From Moss slice，blood cell to petal, ant…… Everyone have enjoyed the wonder of microworld. Especially, the activity attracted many children and stimulated them the exploration of the biological world!
+                                 </div>
-                    </div>
+                                  <div class="excerpt" width="80%">
+For metabolites, we mainly compare their structure information. Chemical structural formula can be described with SMILES. So we use a software package to extract a feature vector from the SMILES. Then we calculate the similarity coefficient of two feature vectors.
+                                 </div>
+                                 <div class="feature-image">
+                                    <img src="https://static.igem.org/mediawiki/2018/6/65/T--SJTU-software--reaction.jpg" alt="Thumbnail" />
+                                 </div>
+                                <div class="excerpt" width="80%">
+For reactions, we mainly consider the enzyme information. For each enzyme has an EC number, we can simply regularize the EC number to be the enzyme’s feature vector. For those reactions without an EC number, we use their reactors’ and products’ information to be the reaction’s information.
+                                 </div>
+                                 <div class="excerpt" width="80%">
+<strong>Topology structure</strong></br>After merging the pathway and the network into an align graph, we add edges to the align graph according to topology structure. We define the pathway as graph X, the network as graph Y, the align graph as Z. And x1, x2 are two nodes from X; y1, y2 are nodes from Y. x1, y1 are similar enough to be merged into a node z1 in Z, and x2, y2 are similar enough to be merged into a node z2. If node pairs (x1,x2) and (y1,y2) are both connected, and the edges have the same direction, we give the edge (z1,z2) a high weight; if the edge (x1,x2) is opposite to (y1,y2), we give the edge (z1,z2) a lower weight; if there is no edge (x1,x2) or (y1,y2), we give the edge (z1,z2) the lowest weight.
+                                 </div>
+                                 <div align="center">
+                                    <img src="https://static.igem.org/mediawiki/2018/e/e8/T--SJTU-software--align.jpg" alt="Thumbnail" />
+                                 </div>
+                                 <div class="excerpt" width="80%">
+Now the weight is not accurate enough to evaluate the edges, so we calculate ELI(extended local interactome) between each pair of nodes. ELI is a coefficient to evaluate the connection between two nodes and the surrounding nodes. The calculation of ELI is as follows:
+                                 </div>
+                                  <div class="excerpt" width="80%">
+. Calculate Ek(x): The set of paths connecting node x and its neighbors at distance k.
+                                 </div>
+                                 <div class="excerpt" width="80%">
+. Calculate Sk(x,y):
+                                     <img src="https://static.igem.org/mediawiki/2018/1/16/T--SJTU-software--calculate1.jpg" alt="Thumbnail" />
+                                 </div>
+                                 <div class="excerpt" width="80%">
+. Calculate ELI(x,y):
+                                     <img src="https://static.igem.org/mediawiki/2018/4/4e/T--SJTU-software--calculate2.jpg" alt="Thumbnail" />
+                                 </div>
+                                 <div class="excerpt" width="80%">
+Now we can use ELI to replace the rough weight we gave to each edge. Since the align graph is a dense graph, we can prune it with ELI: for a node zi, the edges from zi are (z1,zi), (z2,zi), ... ,and suppose the edge with the highest ELI is (zj,zi).Then we prune the edges with ELI &lt; k*ELI(zj,zi), and k &lt; 1. In this way, we delete some edges with low ELI, making the align graph not so dense and easy to de the maximal connected subgraph search.
+                                 </div>
+<br/>
+            <div class="entry-title" id="SBML">
+                                        <div class="post-heading" >SBML Drawer & Differ</div>
+            </div>
+                                 <div class="excerpt" width="80%">
+With the development of synthetic biology, more and more computational methods were applied to reduce the researchers’ workload. The Systems Biology Markup Language (SBML), which is a free and open interchange format for computer models, is widely used. The abilities to compare different SBML models of different situations and different versions of the same model are both important. Many other engineering disciplines rely a great extent on version control to track designs that are produced at each stage of the iterative design cycle. This is often accompanied by using the File Differences tool to compare different versions directly and determine the changes. However, it is not satisfactory when comparing two models of the SBML format as text directly. Because it is difficult to find the significant features in its output. What’s more, many textual changes are not significant (e.g., changes in whitespace or the ordering of elements), and if the ID of a species is changed, this change will occur in many places and has a large impact.
+                                 </div>
+                                  <div class="excerpt" width="80%">
+We propose model-diff, a tool that can read two or more metabolic network models in SBML format and generate images to show the differences. The default view depicts the metabolites as an ellipse and the reaction as a rectangle. By default, elements in both models will be treated as the same entity if they have the same id attribute. Shading is used to indicate whether each node and edge are shared by two models (gray), a single model rather than two models (red or blue). The dotted node edge indicates that the component is shared between models but its properties are different: a rectangle with a dashed border indicates that not all models have the same kinetic law response; an ellipse with a dashed double border indicates that not all models have the same is boundary property.
+                                 </div>
                      <div class="feature-image">
-                         <img src="https://static.igem.org/mediawiki/2018/4/4d/T--SJTU-software--format.jpg" alt="Thumbnail" />
+                         <img src="https://static.igem.org/mediawiki/2018/d/d0/T--SJTU-software--SBML.jpg" alt="Thumbnail" />
-                    </div>
-                    <div class="subtitle" id="DNA">DNA DIY workshop</div>
-                    <div class="excerpt">
-                        We also provided public a chance to build DNA models with their own hands. In this DIY process, everyone not only feel the sense of accomplishment of building a model by hand, but also increased the knowledge. They were impressed with the structure of double stranded DNA and understood the mysteries of molecular biology.
-                    </div>
-                    <div class="feature-image">
-                       <img src="https://raw.githubusercontent.com/sjtusoftware2018/2018iGEM_wiki/master/images/HP/DNA.jpg" alt="Thumbnail" />
                      </div>
-                    <div class="subtitle" id="synthetic">Introduction to synthetic biology</div>
+                                  <div class="excerpt" width="80%">
-                    <div class="excerpt">
+Models-diff reads the metabolic models in SBML format and produces the output in DOT format, which can be converted to an image using GraphViz or other compatible software. It can be used as a python package, as a standalone command line tool, or through a form on our website.
-                        We gave a brief introduction of synthetic biology to public, then we encouraged them write down their hope for the future development of synthetic biology. Someone wrote” I hope synthetic biology can help cure for cancer in the future.” Another wrote” I hope synthetic biology can play a role in promoting plant growth and raising the yield of crops.’’…… We were greatly inspired by these public opinions.
+                </div>
-                    </div>
-                    <div class="feature-image">
-                        <img src="https://raw.githubusercontent.com/sjtusoftware2018/2018iGEM_wiki/master/images/HP/Introduction.jpg" alt="Thumbnail" />
-                    </div>
-                </article>
-   <article class="cf">
+<br/>
+                 <div class="entry-title" id="SMILES">
-                   <div class="entry-title" id="Camp">
+                            <div class="post-heading" >SMILES Drawer</div>
-                           <div class="meta">July 2018</div>
+                 </div>
-                           <div class="post-heading" >Science summer camp </div>
+                               <div class="excerpt" width="80%">
-                   </div>
+SMILES (Simplified molecular input line entry specification) is a specification that explicitly describes the molecular structure in ASCII strings. Smiles-differ provides two functions of smiles alignment and visualization. Users need to provide two molecules of SMILES to be compared. The software classifies the similarity of two molecules based on molecular structure. In theory, the lowest score is 0 and the highest score is 1 point. At the same time, the user can enter SMILE to preview the molecules to be compared.
+                                 </div>
-                   <div class="excerpt">
+<br/>
-                         During the summer vacation, we took advantage of the experiment center for life-science teaching and conducted a series of basic experiments facing to the middle school students nationwide. These experiments including the identification of GM foods were aimed at giving students a better understanding of genes and life science. We hope we could intrigue their interest in life science and make them be devoted in synthetic biology in the near feature.
+                 <div class="entry-title" id="Editor">
-                   </div>
+                            <div class="post-heading" >DNA Editor</div>
+                 </div>
-                    <div class="feature-image">
+                               <div class="excerpt" width="80%">
-                          <img src="https://static.igem.org/mediawiki/2018/5/56/T--SJTU-software--camp.jpg" alt="Thumbnail" />
+SDNA Editor is an everyday lab tool for handling sequences, it has many functions that can be used by researchers very conveniently. The basic function is that it can do basic DNA/RNA operations, such as reverse-Complement DNA, remove non-IUPAC letters and change upper/lower case. Also, it can show the basic information of the sequence. Restriction sites module has a function to find restriction sites, select them and highlight the selected in sequence so that users can find out the locations of restriction sites. Digest module can draws graphic maps. Draw restriction maps with or without features for linear circular sequences. Besides, it can visualize a digest as gel picture. Simulate band intensity or draw all bands black. Translate module can translates DNA sequences in one, three or six frames and show out the peptide sequence features module can handles GenBank features. Use features to annotate regions in the DNA. DNA Editor gives you full control. All calculation happens on your machine. You can load and save sequences in FASTA or GenBank format. Pictures can be saved as html or SVG format.
-                    </div>
+                                 </div>
-            </article>
-                <article class="cf">
-                   <div class="entry-title" id="College">
-                           <div class="meta">October 10 2018</div>
-                           <div class="post-heading" > College Open Day——Poster Presentation </div>
-                   </div>
-                   <div class="feature-image">
-                           <img src="https://raw.githubusercontent.com/sjtusoftware2018/2018iGEM_wiki/master/images/HP/College1.jpg" alt="Thumbnail" />
-                   </div>
-                   <div class="excerpt">
-                                In desire of raising more students and faculty’s awareness of synthetic biology and IGEM, our team elaborately prepared a presentation on October 10th, the college open day. We introduced the iGEM competition, 2017&2018’s projects of our team, as well as the status quo and prospect of synthetic biology to the freshmen in SJTU. They showed great interest in iGEM and the projects we did, and some of them even showed willingness to join us someday.
-                   </div>
-                    <div class="feature-image">
-                          <img src="https://raw.githubusercontent.com/sjtusoftware2018/2018iGEM_wiki/master/images/HP/College2.jpg" alt="Thumbnail" />
-                    </div>
              </article>
@@ Line 309: / Line 314: @@
                  <ul>
                      <li class="block">
-                         <h4 class="heading"  align="center">Met &nbsp; Journey</h4>
+                         <h4 class="heading"  align="center">Document</h4>
                          <ul>
-                             <li class="cat-item"><a href="#VRintro" >VR game introduction</a></li>
+                             <li class="cat-item"><a href="#Met" >Met Differ</a></li>
-                             <li class="cat-item"><a href="#VRpop" >VR game popularization</a></li>
+                             <li class="cat-item"><a href="#SBML" >SBML Drawer & Differ</a></li>
-                        </ul>
+                             <li class="cat-item"><a href="#SMILES" >SMILES Drawer</a></li>
-                    </li>
+                             <li class="cat-item"><a href="#Editor" >DNA Editor</a></li>
-                    <li class="block">
-                        <h4 class="heading"  align="center">Popularization</h4>
-                        <ul>
-                             <li class="cat-item"><a href="#Microscopic" >Microscopic observation</a></li>
-                             <li class="cat-item"><a href="#DNA" >DNA DIY workshop</a></li>
-                            <li class="cat-item"><a href="#synthetic" >Introduction to synthetic biology</a></li>
                          </ul>
                      </li>
-                    <li class="block">
-                                <h4 class="heading"  align="center">Science camp</h4>
-                                <ul>
-                                    <li class="cat-item"><a href="#Camp">Basic experiment instruction</a></li>
-                                </ul>
-                     </li>
-                    <li class="block">
-                                <h4 class="heading"  align="center">College Open Day</h4>
-                                <ul>
-                                    <li class="cat-item"><a href="#College">Poster Presentationn</a></li>
-                                </ul>
-                     </li>
                  </ul>
