Difference between revisions of "Team:Madrid-OLM/Computationaimprovement"

 
(4 intermediate revisions by the same user not shown)
Line 3: Line 3:
 
     <style>
 
     <style>
 
         .tittle-secc{
 
         .tittle-secc{
             padding-top: 12em !important;
+
             padding-top: 8em !important;
 
         }
 
         }
 
         .figureimage{
 
         .figureimage{
Line 44: Line 44:
 
                             <h1 id="Teamtittle">Computational improvement of the aptamer</h1>
 
                             <h1 id="Teamtittle">Computational improvement of the aptamer</h1>
 
                             <p class="lead">
 
                             <p class="lead">
                                 The target of this protocol is to improve the affinity of the union between the protein and the aptamer, obtained by the process that had been explained previously, by bioinformatics methods.
+
                                 The goal of this section is to use bioinformatics methods to improve the binding affinity between the protein and the aptamer obtained by the process explained previously.
 
                             </p>
 
                             </p>
 
                             <p class="lead">
 
                             <p class="lead">
                                 To carry out this affinity improvement, you can start from only 2 elements: the DNA aptamer sequence and the name or preferably the amino acid protein sequence. However, any extra information that one could find in previous studies could be so helpful as to save time and reduce the margin of cumulative error. The complete protocol has four different sections:   
+
                                 To carry out this affinity improvement, you can start from only two elements: the DNA aptamer sequence and the name or, preferably, the amino acid sequence of the protein. However, any extra information found in previous studies can save time and reduce the margin of cumulative error. The complete path has four different sections:
 
                             </p>
 
                             </p>
 
                             <ul>
 
                             <ul>
 
                                 <li>
 
                                 <li>
                                     <p class="lead">• Obtaining the 3D protein structure.
+
                                     <p class="lead">• Obtaining the 3D protein structure  
 
                                     </p>       
 
                                     </p>       
 
                                 </li>
 
                                 </li>
Line 63: Line 63:
 
                                 </li>
 
                                 </li>
 
                                 <li>
 
                                 <li>
                                     <p class="lead">• Study of the union and proposal of mutation in the sequence.  
+
                                     <p class="lead">• Study of the union and proposal of mutation in the sequence.
 
                                     </p>       
 
                                     </p>       
 
                                 </li>
 
                                 </li>
Line 69: Line 69:
 
                              
 
                              
 
                             <p class="lead">
 
                             <p class="lead">
                                 <br/>It is very important to understand that this process does not guarantee a better result. However it gives the opportunity to improve the aptamer obtained. Although there are many works with proteins structures and the interaction between them, there are almost no studies that work with nucleic acids in simulation or structure prediction of the structures beyond double strand helix. Therefore any final result obtained should be checked in the laboratory, either to confirm or discard the new sequence obtained. important to understand that this process not guarantee a better result. However it gives the opportunity of improve the aptamer obtained. Although there are so many works with proteins structures and the interaction between each other’s, there are almost no studies that work with nucleic acids in simulation or structure prediction terms beyond double strand helix. Therefore any final result obtained should be checked in the laboratory, either to confirm or discard the new sequence obtained.
+
                                 <br/>It is very important to understand that this process does not guarantee a better result; it only gives the opportunity to improve the aptamer obtained. Although there are many studies of protein structures and of the interactions between proteins, there are almost no studies that simulate or predict nucleic acid structures beyond the double-stranded helix. Therefore, any final result should be checked in the laboratory, either to confirm or to discard the new sequence obtained.
 
                             </p>
 
                             </p>
 
                             <p class="lead">
 
                             <p class="lead">
 
                                 <br/>You can find all the servers that we mention in the links section:
 
                                 <br/>You can find all the servers that we mention in the links section:
 
                             </p>
 
                             </p>
                             <a class="btn btn--primary type--uppercase " href="#links" class="inner-link">
+
                             <a class="btn btn--primary-2 type--uppercase " href="#links" class="inner-link">
 
                                 <span class="btn__text">
 
                                 <span class="btn__text">
 
                                     Links of interest
 
                                     Links of interest
Line 92: Line 92:
 
                             <h2>Obtaining the 3D protein structure</h2>
 
                             <h2>Obtaining the 3D protein structure</h2>
 
                             <p class="lead">
 
                             <p class="lead">
                                 The final result of this section should be a file in .pdb format where we can find the coordinates of every single atom of the protein. That file could be interpreted by bioinformatics programs like Phymol or Chimera and represent the protein in different graphically. To obtain it, there are two different paths depending on one fact: has the protein been previously crystallographed?
+
                                 The final result of this section should be a file in .pdb format containing the coordinates of every atom of the protein. That file can be read by bioinformatics programs such as PyMOL or Chimera, which can represent the protein in different graphical ways. To obtain it, there are two different paths depending on one fact: has the structure of the protein already been solved by crystallography?
 
                             </p>
 
                             </p>
                             <h4>A structure obtained experimentally exists</h4>
+
                             <h4>A crystallographic structure exists</h4>
 
                             <p class="lead">
 
                             <p class="lead">
                                 We will simply use this proven structure. This will be the most reliable method due to it being the result of a group's investigation. To check if there is any previous structure we recommend the following simple steps:
+
                                 We will simply use this proven structure. This is the most reliable option, as it is the experimental result of a research group. To check whether a crystallographic structure already exists, we recommend the following simple steps:
 
                             </p>
 
                             </p>
 
                             <ul>
 
                             <ul>
Line 118: Line 118:
 
                                 <br/>Clarification: the PDB is a universal database that holds the 3D structures of proteins whose structure has been obtained experimentally. That is, any structure that we find there is quite reliable and proven. Predicted structures, however, are not included, because prediction is not an experimental procedure.
 
                                 <br/>Clarification: the PDB is a universal database that holds the 3D structures of proteins whose structure has been obtained experimentally. That is, any structure that we find there is quite reliable and proven. Predicted structures, however, are not included, because prediction is not an experimental procedure.
 
                             </p>
 
                             </p>
                             <h4>A structure obtained experimentally don’t exists.</h4>
+
                             <h4>A crystallographic structure does not exist</h4>
 
                             <p class="lead">
 
                             <p class="lead">
                                 In this case is necessary to start with a structure prediction process. Although this is not the optimal method because we could start to accumulate error, we can obtain very reliable results. Several different methods can be used for this purpose. We recommend the following two and in this order:
+
                                 In this case it is necessary to start with a structure prediction process. Although this is not the optimal method because we could start to accumulate error, we can obtain very reliable results. Several different methods can be used for this purpose. We recommend the following two and in this order:
 
                             </p>
 
                             </p>
 
                             <ul>
 
                             <ul>
 
                                 <li>
 
                                 <li>
                                     <p class="lead">1. <b>Homology methods:</b> This method looks for sequences of proteins that are included in the PDB that have a high similarity with our protein and adopt its 3D structure. If the result shows a similarity of more than 30% in the entire sequence, we could be certain the structure. For this step we recommend the Swiss-Model server. Otherwise, if it gives an inferior value, we should move on to the next step.
+
                                     <p class="lead">1. <b>Homology methods:</b> This method looks for proteins in the PDB whose sequences are highly similar to that of our protein and adopts their 3D structure as a template. If the result shows a similarity of more than 30% over the entire sequence, we can be reasonably confident about the structure. For this step, we recommend the Swiss-Model server. Otherwise, if it gives a lower value, we should move on to the next step.
 
                                     </p>       
 
                                     </p>       
 
                                 </li>
 
                                 </li>
 
                                 <li>
 
                                 <li>
                                     <p class="lead">2. <b>I-Tasser (Iterative Threading ASSEmbly Refinement): </b> This server uses several methods, as they define themselves: “a hierarchical approach to protein structure and function prediction. It first identifies structural templates from the PDB by multiple threading approach LOMETS, with full-length atomic models constructed by iterative template fragment assembly simulations.” This method is currently the most reliable and they have been the winners in predicting structure in the last 6 editions of the CAP tests.
+
                                     <p class="lead">2. <b>I-TASSER (Iterative Threading ASSEmbly Refinement): </b> This server combines several methods. Its authors describe it as “a hierarchical approach to protein structure and function prediction. It first identifies structural templates from the PDB by multiple threading approach LOMETS, with full-length atomic models constructed by iterative template fragment assembly simulations.” This method is currently the most reliable and has ranked first in structure prediction in the last 6 editions of the CASP experiments.
 
                                     </p>
 
                                     </p>
 
                                 </li>
 
                                 </li>
 
                                 <li>
 
                                 <li>
                                     <p class="lead">3. As an extra option we mention <b>Robetta: </b> It is an ab-initio simulation method, where it tries to create the protein from 0, using physical simulation and gross computational power. It works best for short sequences and can take a long time to get reliable results.
+
                                     <p class="lead">3. As an extra option we mention <b>Robetta: </b> It is an ab initio simulation method whose aim is to build the protein from scratch, using physical simulation and brute force. It works best for short sequences and may take a long time to give reliable results.
 
                                     </p>       
 
                                     </p>       
 
                                 </li>
 
                                 </li>
Line 139: Line 139:
 
                             <h4>Energy minimization.</h4>
 
                             <h4>Energy minimization.</h4>
 
                             <p class="lead">
 
                             <p class="lead">
                                 This last step is completely recommendable whatever the origin of the file with the structure be. It is necessary to have the structure in a state, as stable as possible. To minimize the structure we can employ 2 strategies:
+
                                 This last step is strongly recommended whatever the origin of the structure file. The structure needs to be in a state that is as stable as possible. To minimize the structure we can use two strategies:
 
                             </p>
 
                             </p>
 
                             <ul>
 
                             <ul>
 
                                 <li>
 
                                 <li>
                                     <p class="lead">1. <b>Quick vacuum minimization</b> This is the simplest method. For this purpose we will use the Chimera program and within its tools we will choose the "Energy minimization". Then we will adjust the parameters to what is most convenient for us
+
                                     <p class="lead">1. <b>Quick vacuum minimization: </b> This is the simplest method. For this purpose, we will use the Chimera program and choose the "Energy minimization" option. Then we will adjust the parameters to suit our requirements.
 
                                     </p>       
 
                                     </p>       
 
                                 </li>
 
                                 </li>
 
                                 <li>
 
                                 <li>
                                     <p class="lead">2. <b>Minimization in aqueous medium: </b> Complex method, for which we recommend looking for a tutorial to understand the management of the programs used and the necessary concepts. In this, more complex case, we will use the VMD program (Visual Molecular Dynamics). It will be necessary to build a structure around the protein with the "Automatic PSF builder" tool, which includes water molecules and ions. Subsequently a simulation of energy minimization will be carried out with the tool "AutoIMD" which in turn calls an external program (NAND) that must be installed on the PC. Finally, the resulting file should be saved and opened in Chimera to eliminate all water molecules and leave only the structure of the protein.
+
                                     <p class="lead">2. <b>Minimization in aqueous medium: </b> This is a complex method. We recommend looking for a tutorial to understand how to use the programs and some of the basic concepts involved. We will use the VMD program (Visual Molecular Dynamics). It will be necessary to build a structure around the protein with the "Automatic PSF builder" tool, which includes water molecules and ions. Subsequently, an energy minimization simulation will be carried out with the "AutoIMD" tool, which in turn calls an external program (NAMD) that must be installed on the PC. Finally, the resulting file should be saved and opened in Chimera to remove all water molecules and leave only the structure of the protein.
 +
 
 
                                     </p>
 
                                     </p>
 
                                 </li>
 
                                 </li>
Line 156: Line 157:
 
                             </p>
 
                             </p>
 
                             <img class= "figureimage" alt="Figure2" src="https://static.igem.org/mediawiki/2018/2/23/T--Madrid-OLM--Aptamer--Modelization--OleE1ribbsurf.png" style="width:75%;"/>
 
                             <img class= "figureimage" alt="Figure2" src="https://static.igem.org/mediawiki/2018/2/23/T--Madrid-OLM--Aptamer--Modelization--OleE1ribbsurf.png" style="width:75%;"/>
                             <p class="lead" style="margin-left:10%; margin-right:10%;">Figure 2. Ole E1 after an energy minimization process represented as ribbon and as ASA (Accessible Surface Area). Made by Chimera tool..
+
                             <p class="lead" style="margin-left:10%; margin-right:10%;">Figure 2. Ole E1 after an energy minimization process represented as ribbon and as ASA (Accessible Surface Area). Made by Chimera tool.
 
                             </p>
 
                             </p>
 
                              
 
                              
Line 172: Line 173:
 
                             <h2>Obtaining the 3D aptamer structure</h2>
 
                             <h2>Obtaining the 3D aptamer structure</h2>
 
                             <p class="lead">
 
                             <p class="lead">
                                 Previously there were two options, that the protein had a known structure or not. In this case it is a DNA chain that we have personally obtained in the laboratory, so the possibility of finding a structure published for that exact sequence it's almost zero. In this way the only case to consider is to try to predict its structure. This action presents a series of drawbacks nowadays. While for proteins there are numerous methods and servers with high reliability, for nucleic acids there are hardly any servers and prediction methods. This is due to the structure of proteins has always been very important and varied for its function, but from the genetic material, the important thing has always been its code, whether in the form of DNA or RNA. For the case of RNAs, there is more than one function, reason why in certain occasions if it has been attended to the need to know their structures but it has not been developed correctly. Even so, we propose a series of steps to predict the structure of the aptamer. Although this predicted structure will not be reliable and it is important to understand this, it can serve us for our objective of proposing an improvement in the initial sequence.
+
                                 Previously there were two options, that the protein had a known structure or not. In this case it is a DNA chain that we have personally obtained in the laboratory, so the possibility of finding a structure published for that exact sequence is almost zero.  
 +
                            </p>
 +
                            <p class="lead">
 +
                                 The only possible way is to try to predict its structure. Nowadays, this presents a series of drawbacks. While for proteins there are numerous methods and servers with high reliability, for nucleic acids there are hardly any servers or prediction methods. We must consider that the structure of proteins has always been very important and varies depending on their function. For the genetic material, on the other hand, the most important part has always been its code, whether in the form of DNA or RNA.
 +
                            </p>
 +
                            <p class="lead">
 +
                                 RNAs have more than one function, which is why the need to know their structures has occasionally been taken into account, although the methods are not yet well developed.
 +
                            </p>
 +
                            <p class="lead">
 +
                                 We propose a series of steps to predict the structure of the aptamer. This predicted structure will not be reliable, and it is important to understand this, but it can still serve our objective of proposing an improvement to the initial sequence.
 
                             </p>
 
                             </p>
 
                             <h4>Step 1: Obtaining the secondary structure</h4>
 
                             <h4>Step 1: Obtaining the secondary structure</h4>
 
                             <p class="lead">
 
                             <p class="lead">
                                 As there is no single server as in proteins to obtain a final result giving only the sequence, we have to divide the process into different stages. In this first stage we need to obtain a secondary structure of the single strand of DNA. The server we recommend is the MFold server of the University of Albany. It provides both a graphic map of the links between bases, and the secondary structure in text format (in the Vienna format that we need later), as we show in this example:
+
                                 As there is no single server, as there is for proteins, that gives a final result from the sequence alone, we have to divide the process into different stages. In this first stage, we need to obtain the secondary structure of the single strand of DNA. The server we recommend is the MFold server of the University of Albany. It provides both a graphic map of the pairings between bases and the secondary structure in text format (in the Vienna format that we need later), as we show in this example:
 
                             </p>
 
                             </p>
 
                             <p class="lead" style="text-align: center;">
 
                             <p class="lead" style="text-align: center;">
Line 183: Line 193:
 
                             </p>
 
                             </p>
 
                             <img class= "figureimage" alt="Figure3" src="https://static.igem.org/mediawiki/2018/5/59/T--Madrid-OLM--Aptamer--Modelization--MapAptamer2Structure.png" style="width:40%;"/>
 
                             <img class= "figureimage" alt="Figure3" src="https://static.igem.org/mediawiki/2018/5/59/T--Madrid-OLM--Aptamer--Modelization--MapAptamer2Structure.png" style="width:40%;"/>
                             <p class="lead" style="margin-left:20%; margin-right:10%;">Figure 3. Example image of the links between base pairs that the server provides.
+
                             <p class="lead" style="margin-left:20%; margin-right:10%;">Figure 3. Example image of the links between base pairs that the server provides
 
                             </p>
 
                             </p>
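The Vienna (dot-bracket) text can be turned back into an explicit list of base pairs, which is handy for checking it against the graphic map. The following is a minimal sketch, not part of the original protocol; the function name and the example hairpin are hypothetical and serve only as an illustration.

```python
def parse_dot_bracket(structure):
    """Return the sorted (i, j) base pairs (0-based) encoded in a dot-bracket string."""
    open_positions = []  # stack of indices of '(' still waiting for a partner
    pairs = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(index)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {index}")
            pairs.append((open_positions.pop(), index))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {index}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


# Hypothetical 10-nt hairpin, used only as an illustration.
sequence = "GGGAAATCCC"
structure = "(((....)))"
for i, j in parse_dot_bracket(structure):
    # Report positions 1-based, as sequence positions are usually numbered.
    print(f"{sequence[i]}{i + 1} pairs with {sequence[j]}{j + 1}")
```

Only plain parentheses and dots are handled here; pseudoknot annotations using other bracket types would need extra stacks.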
 
                             <h4>Step 2: Prediction of 3D structure</h4>
 
                             <h4>Step 2: Prediction of 3D structure</h4>
 
                             <p class="lead">
 
                             <p class="lead">
                                 The next step is to predict the tertiary structure. The only server capable of doing that nowadays is ROSIE and it requires the secondary structure that we have previously obtained in Vienna format in addition to the sequence. This server predicts RNA structures (not DNA), so it will be necessary change the T for the U in the sequence.
+
                                 The next step is to predict the tertiary structure. The only server capable of doing that nowadays is ROSIE, and it requires, in addition to the sequence, the secondary structure that we previously obtained in Vienna format. This server predicts RNA structures (not DNA), so it will be necessary to replace every T with a U in the sequence.
 
                             </p>
 
                             </p>
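The T to U replacement described above can be scripted so that it is applied consistently and the sequence is checked against the secondary structure before submission. This is a minimal sketch; the function names and the example sequence are hypothetical, and the exact input format expected by the server is not assumed here.

```python
def dna_to_rna(dna_sequence):
    """Return the RNA version of a DNA sequence (every T replaced by U)."""
    sequence = dna_sequence.strip().upper()
    invalid = set(sequence) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters in DNA sequence: {sorted(invalid)}")
    return sequence.replace("T", "U")


def prepare_rna_input(dna_sequence, dot_bracket):
    """Pair the converted sequence with its dot-bracket structure after a length check."""
    rna_sequence = dna_to_rna(dna_sequence)
    if len(rna_sequence) != len(dot_bracket):
        raise ValueError("sequence and secondary structure have different lengths")
    return rna_sequence, dot_bracket


# Hypothetical aptamer fragment, used only as an illustration.
rna, structure = prepare_rna_input("GGGAAATCCC", "(((....)))")
print(rna)
print(structure)
```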
 
                             <p class="lead">
 
                             <p class="lead">
                                 The result obtained will be a series of three-dimensional RNA structure proposals in .pdb format, so a posteriori it will be necessary to make file modifications to eliminate the excess atoms and modify the links so that it can be identified as a DNA chain.
+
                                 The result will be a series of proposed three-dimensional RNA structures in .pdb format, so afterwards the files must be modified to remove the excess atoms and adjust the bonds so that the molecule can be identified as a DNA chain.
 
                             </p>
 
                             </p>
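Part of this file modification can be automated. The sketch below is one possible approach under stated assumptions, not the procedure actually used by the team: it assumes standard fixed-column PDB records, RNA residues named A, C, G and U, and 2'-hydroxyl atoms named O2' and HO2' (or the older O2* and 2HO*). It removes the 2'-hydroxyl atoms and renames the residues to DA, DC, DG and DT. A uracil renamed to DT still lacks the methyl group on C5 of thymine, so those residues are returned separately and must be completed with a modelling tool.

```python
# Assumed naming conventions; adjust them to match the file being converted.
RNA_TO_DNA = {"A": "DA", "C": "DC", "G": "DG", "U": "DT"}
HYDROXYL_2PRIME = {"O2'", "O2*", "HO2'", "2HO*"}


def rna_pdb_lines_to_dna(lines):
    """Drop 2'-hydroxyl atoms and rename RNA residues in PDB ATOM/HETATM lines.

    Returns (converted_lines, uracil_residues), where uracil_residues lists the
    (chain, residue number) pairs that still need the thymine methyl group.
    """
    converted = []
    uracil_residues = set()
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            converted.append(line)  # keep headers, TER, END, etc. untouched
            continue
        atom_name = line[12:16].strip()  # PDB columns 13-16
        res_name = line[17:20].strip()   # PDB columns 18-20
        if res_name not in RNA_TO_DNA:
            converted.append(line)
            continue
        if atom_name in HYDROXYL_2PRIME:
            continue  # DNA has no 2'-hydroxyl group
        if res_name == "U":
            uracil_residues.add((line[21], int(line[22:26])))
        new_name = RNA_TO_DNA[res_name].rjust(3)
        converted.append(line[:17] + new_name + line[20:])
    return converted, sorted(uracil_residues)
```

To use it, read the predicted file with `open(path).readlines()`, pass the lines to `rna_pdb_lines_to_dna`, and write the converted lines to a new file. The result should be inspected in a viewer before the energy minimization step.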
 
                             <p class="lead">
 
                             <p class="lead">
                                 We must take into account that the algorithm used by this server is not as tight as in proteins and usually commit some deviations in the ab initio simulation. It is particularly very important to know if our aptamer has incorporated some ion because this method is not taken into account by aptamers that incorporate external molecules and in those cases, it usually deviates a lot in the final result.
+
                                 We must take into account that the algorithm used by this server is not as refined as those for proteins and usually introduces some deviations in the ab initio simulation. It is particularly important to know whether our aptamer incorporates any ions, because this method does not account for aptamers that incorporate external molecules. In these cases, the predicted structure deviates a lot from the real one.
 
                             </p>
 
                             </p>
 
                             <p class="lead">
 
                             <p class="lead">
                                 Subsequently it will be necessary to perform a process of energy minimization as explained in the previous section, choosing the method that is preferred to have the most stable molecule.
+
                                 Subsequently, it will be necessary to perform a process of energy minimization as explained in the previous section, choosing the method that is preferred to have the most stable molecule.
 
                             </p>
 
                             </p>
 
                             <img class= "figureimage" alt="Figure4" src="https://static.igem.org/mediawiki/2018/e/e9/T--Madrid-OLM--Aptamer--Modelization--Aptamer3Dcompare.png" style="width:80%;"/>
 
                             <img class= "figureimage" alt="Figure4" src="https://static.igem.org/mediawiki/2018/e/e9/T--Madrid-OLM--Aptamer--Modelization--Aptamer3Dcompare.png" style="width:80%;"/>
                             <p class="lead" style="margin-left:10%; margin-right:10%;">Figure 4. Trombine aptamer structure comparison between the crystallographed in a lab (blue) and the prediction one (grey). We can see that the blue one have a ion that conditions the structure.
+
                             <p class="lead" style="margin-left:10%; margin-right:10%;">Figure 4. Comparison of the thrombin aptamer structure solved by crystallography (blue) with the predicted one (grey). We can see that the blue one has an ion that conditions the structure.
 
                             </p>
 
                             </p>
 
                             <p class="lead">
 
                             <p class="lead">
                                 <b>Fact: </b>We tried to fine-tune this step with an aptamer already characterized by thrombin. Although theoretically, the protocol was logical, the results obtained with the set-up led us to conclude that we did not have it prepared yet due to the high error rate. We tried to fine-tune this step with an aptamer already characterized by thrombin. Although theoretically, the protocol was logical, the results obtained with the set-up led us to conclude that we did not have it ready yet due to the high error rate.
+
                                 <b>Fact: </b>We tried to fine-tune this step with an aptamer already characterized for thrombin. Although the protocol was logical in theory, the results obtained with this set-up led us to conclude that it is not ready yet, due to the high error rate.
 
                             </p>
 
                             </p>
 
                         </div>
 
                         </div>
Line 217: Line 227:
 
                             <h2>Docking process and mutation proposal</h2>
 
                             <h2>Docking process and mutation proposal</h2>
 
                             <p class="lead">
 
                             <p class="lead">
                                 After obtaining the structures of the protein and the aptamer, we need to know in which area and with which orientation the interaction between them takes place; Studying which pairs of bases and which amino acids are involved in this interaction and propose a change of 1 or 2 bases in order to make this interaction stronger.
+
                                 After obtaining the structures of the protein and the aptamer, we need to know in which area and with which orientation the interaction between them takes place. To do so, we will study which pairs of bases and which amino acids are involved in this interaction, and then we will propose a change of 1 or 2 bases in order to make this interaction stronger.
 
                             </p>
 
                             </p>
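The set of candidate sequences with 1 or 2 changed bases is small enough to enumerate explicitly, which gives an idea of how many variants would have to be evaluated. The following is a minimal sketch; the function name and the example sequence are hypothetical, and it only lists candidates without ranking them.

```python
from itertools import combinations, product

BASES = "ACGT"


def point_mutants(sequence, max_changes=2):
    """Yield (mutant, changes) for every variant differing at 1..max_changes positions.

    Each change is written as original base, 1-based position, new base (e.g. 'A3G').
    """
    sequence = sequence.upper()
    for n_changes in range(1, max_changes + 1):
        for positions in combinations(range(len(sequence)), n_changes):
            # For each chosen position, the three bases other than the original.
            alternatives = [[b for b in BASES if b != sequence[p]] for p in positions]
            for replacement in product(*alternatives):
                mutant = list(sequence)
                for position, base in zip(positions, replacement):
                    mutant[position] = base
                changes = [
                    f"{sequence[p]}{p + 1}{b}" for p, b in zip(positions, replacement)
                ]
                yield "".join(mutant), changes


# Hypothetical 10-nt sequence, used only as an illustration.
candidates = list(point_mutants("GGGAAATCCC"))
print(len(candidates), "candidate sequences")
print(candidates[0])
```

For a sequence of length n there are 3n single mutants and 9·n(n-1)/2 double mutants, so the list grows quickly with aptamer length. In practice the candidates would be restricted to the bases found at the interface.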
 
                             <h4>Docking</h4>
 
                             <h4>Docking</h4>
 
                             <p class="lead">
 
                             <p class="lead">
                                 The docking is the computational simulation method in which the ideal orientation of an organic molecule is calculated when interacting with another one forming a stable complex (if it exists). Generally this process is designed for protein-protein interactions, the reason why most servers are optimized for these cases. However, we recommend using the NPDock server. This server allows the simulation of the interaction between a protein and a single-stranded nucleic acid molecule, having to identify whether it is DNA or RNA before sending the work.
+
                                 Docking is the computational simulation method in which the ideal orientation of a molecule is calculated when it interacts with another one to form a stable complex (if one exists). Generally this process is designed for protein-protein interactions, which is why most servers are optimized for these cases. However, we recommend using the NPDock server. This server can simulate the interaction between a protein and a single-stranded nucleic acid molecule; whether it is DNA or RNA must be specified before submitting the job.
 
                             </p>
 
                             </p>
 
                             <img class= "figureimage" alt="Figure5" src="https://static.igem.org/mediawiki/2018/a/a6/T--Madrid-OLM--Aptamer--Modelization--NPDockResult.png" style="width:80%;"/>
 
                             <img class= "figureimage" alt="Figure5" src="https://static.igem.org/mediawiki/2018/a/a6/T--Madrid-OLM--Aptamer--Modelization--NPDockResult.png" style="width:80%;"/>
                             <p class="lead" style="margin-left:10%; margin-right:10%;">Figure 5. Interaction between thrombine and an aptamer simulated by the server NPDock.
+
                             <p class="lead" style="margin-left:10%; margin-right:10%;">Figure 5. The interaction between thrombin and an aptamer simulated by the server NPDock.
 
                             </p>
 
                             </p>
 
                             <h4>Proposal of mutation</h4>
 
                             <h4>Proposal of mutation</h4>
 
                             <p class="lead">
 
                             <p class="lead">
                                 With the .pdb file obtained from the previous step, we only have the most subjective step. In this case we have to perform a study of the interaction and see which atoms interact with each other. For this, the most useful thing is to use a visualizer like the ones we mentioned at the beginning (Chimera or phymol). With the evaluation of the different options will have to interpret if making any change in the aptamer, these interactions could improve
+
                                 With the .pdb file obtained from the previous step, only the most subjective step remains. We have to study the interaction and see which atoms interact with each other. For this purpose, the most useful approach is a visualizer like the ones mentioned at the beginning (Chimera or PyMOL). After evaluating the different options, we have to judge whether any change in the aptamer could improve these interactions.
 
                             </p>
 
                             </p>
 
                             <p class="lead">
 
                             <p class="lead">
Line 249: Line 259:
 
                                 <li>
 
                                 <li>
 
                                     <p class="lead">
 
                                     <p class="lead">
                                         <a href="https://www.uniprot.org/" >UniProt:</a>Database with all proteins and the related information.
+
                                         <a href="http://www.uniprot.org/" >UniProt:</a> Database with all proteins and the related information.
 
                                     </p>   
 
                                     </p>   
 
                                 </li>
 
                                 </li>
 
                                 <li>
 
                                 <li>
 
                                     <p class="lead">
 
                                     <p class="lead">
                                         <a href="https://www.rcsb.org/" >PDB:</a>Database with all 3D shapes of protein, nucleic acids, and complex assemblies.
+
                                         <a href="http://www.rcsb.org/" >PDB:</a> Database with all 3D shapes of proteins, nucleic acids, and complex assemblies.
 
                                     </p>
 
                                     </p>
 
                                 </li>
 
                                 </li>
 
                                 <li>
 
                                 <li>
 
                                     <p class="lead">
 
                                     <p class="lead">
                                         <a href="https://pymol.org/2/" >Phymol:</a>Is a user-sponsored molecular visualization system on an open-source foundation, maintained and distributed by Schrödinger.
+
                                         <a href="http://pymol.org/2/" >PyMOL:</a> A user-sponsored molecular visualization system on an open-source foundation, maintained and distributed by Schrödinger.
 
                                     </p>
 
                                     </p>
 
                                 </li>
 
                                 </li>
 
                                 <li>
 
                                 <li>
 
                                     <p class="lead">
 
                                     <p class="lead">
                                         <a href="https://www.cgl.ucsf.edu/chimera/" >Chimera:</a>A highly extensible program for interactive visualization and analysis of molecular structures and related data, including density maps, supramolecular assemblies, sequence alignments, docking results, trajectories, and conformational ensembles.
+
                                         <a href="http://www.cgl.ucsf.edu/chimera/" > Chimera:</a> A highly extensible program for interactive visualization and analysis of molecular structures and related data, including density maps, supramolecular assemblies, sequence alignments, docking results, trajectories, and conformational ensembles
 
                                     </p>
 
                                     </p>
 
                                 </li>
 
                                 </li>
 
                                 <li>
 
                                 <li>
 
                                     <p class="lead">
 
                                     <p class="lead">
                                         <a href="https://www.ks.uiuc.edu/Research/vmd/" >VMD:</a>A molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting.
+
                                         <a href="http://www.ks.uiuc.edu/Research/vmd/" > VMD:</a> A molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting.
 
                                     </p>
 
                                     </p>
 
                                 </li>
 
                                 </li>
 
                                 <li>
 
                                 <li>
 
                                     <p class="lead">
 
                                     <p class="lead">
                                         <a href="https://swissmodel.expasy.org/" >Swiss-Model:</a>A fully automated protein structure homology-modelling server, accessible via the ExPASy web server, or from the program DeepView (Swiss Pdb-Viewer).
+
                                         <a href="http://swissmodel.expasy.org/" > Swiss-Model:</a> A fully automated protein structure homology-modelling server, accessible via the ExPASy web server, or from the program DeepView (Swiss Pdb-Viewer).
 
                                     </p>
 
                                     </p>
 
                                 </li>
 
                                 </li>
 
                                 <li>
 
                                 <li>
 
                                     <p class="lead">
 
                                     <p class="lead">
                                         <a href="https://zhanglab.ccmb.med.umich.edu/I-TASSER/" >I-tasser:</a>A hierarchical approach to protein structure and function prediction.
+
                                         <a href="http://zhanglab.ccmb.med.umich.edu/I-TASSER/" > I-tasser:</a> A hierarchical approach to protein structure and function prediction.
 
                                     </p>
 
                                     </p>
 
                                 </li>
 
                                 </li>
 
                                 <li>
 
                                 <li>
 
                                     <p class="lead">
 
                                     <p class="lead">
                                         <a href="http://robetta.bakerlab.org/" >Robbeta:</a>   Ab-inition full chain protein structures prediction server.
+
                                         <a href="http://robetta.bakerlab.org/" > Robetta:</a> Ab initio full-chain protein structure prediction server.
 
                                 </li>
 
                                 </li>
 
                                 <li>
 
                                 <li>
 
                                     <p class="lead">
 
                                     <p class="lead">
                                         <a href="http://unafold.rna.albany.edu/?q=mfold/DNA-Folding-Form" >Albany University MFold:</a>     Web server for the prediction of secundary structures in DNA single-strand chains.
+
                                         <a href="http://unafold.rna.albany.edu/?q=mfold/DNA-Folding-Form" > Albany University MFold:</a> Web server for the prediction of secondary structures in DNA single-strand chains.
 
                                     </p>
 
                                     </p>
 
                                 </li>
 
                                 </li>
 
                                 <li>
 
                                 <li>
 
                                     <p class="lead">
 
                                     <p class="lead">
                                         <a href="http://rosie.rosettacommons.org/rna_denovo/submit" >Rossie:</a>   Web server for ab-initio RNA single-stand structure prediction.
+
                                         <a href="http://rosie.rosettacommons.org/rna_denovo/submit" > ROSIE:</a> Web server for ab initio RNA single-strand structure prediction.
 
                                     </p>
 
                                     </p>
 
                                 </li>
 
                                 </li>
 
                                 <li>
 
                                 <li>
 
                                     <p class="lead">
 
                                     <p class="lead">
                                         <a href="http://genesilico.pl/NPDock" >NPDock:</a> Web server for modeling of RNA-protein and DNA-protein complex structures.
+
                                         <a href="http://genesilico.pl/NPDock" > NPDock:</a> Web server for modelling of RNA-protein and DNA-protein complex structures.  
                                    </p>
+
 
                                 </li>
 
                                 </li>
 
                             </ul>
 
                             </ul>

Latest revision as of 00:51, 18 October 2018

Madrid-OLM

Computational improvement of the aptamer


The goal of this section is to use bioinformatics methods to improve the binding affinity between the protein and the aptamer obtained by the process explained previously.

To carry out this affinity improvement, you can start from only 2 elements: the DNA aptamer sequence and the name or, preferably, the amino acid sequence of the protein. However, any extra information found in previous studies can save time and reduce the margin of cumulative error. The complete path has four sections:

  • Obtaining the 3D protein structure.

  • Obtaining the 3D aptamer structure (most critical point).

  • Searching for the binding site between protein and aptamer through a docking process.

  • Study of the binding and proposal of a mutation in the sequence.


It is very important to understand that this process does not guarantee a better result; it only offers an opportunity to improve the aptamer obtained. Although there are many studies on protein structures and protein-protein interactions, almost none address the simulation or structure prediction of nucleic acids beyond the double-stranded helix. Therefore, any final result should be checked in the laboratory, either to confirm or to discard the new sequence.


All the servers that we mention are listed in the links section:

Links of interest

Obtaining the 3D protein structure

The final result of this section should be a file in .pdb format containing the coordinates of every single atom of the protein. Bioinformatics programs like PyMOL or Chimera can read that file and represent the protein in different graphical ways. To obtain it, there are two different paths depending on one fact: has the structure of the protein already been determined by crystallography?

A crystallographic structure exists

We will simply use this proven structure. This is the most reliable method, as it is the experimental result of a research group. To check whether a crystallographic structure already exists, we recommend the following simple steps:

  • 1. Access the UniProt server and search for the protein of interest.

  • 2. Search for the structure description area and check if it has an entry to the Protein Data Bank (PDB) database.

  • 3. In the positive case, access that PDB entry and download the file that is needed.

  • 4. In the case of no entry, the steps explained in the next section must be performed.
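
The check in steps 1 to 3 can also be scripted. The following is only a minimal sketch: it assumes that UniProt's REST service returns entries as JSON at the URL below and that cross-references are stored under a `uniProtKBCrossReferences` key with `database` and `id` fields. Both the URL and the field names are assumptions to verify against the current UniProt documentation, and the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: endpoint and JSON layout of the UniProt REST service.
UNIPROT_URL = "https://rest.uniprot.org/uniprotkb/{accession}.json"


def pdb_ids_from_entry(entry):
    """Return the sorted PDB identifiers cross-referenced in a UniProt entry dict."""
    refs = entry.get("uniProtKBCrossReferences", [])
    return sorted(ref["id"] for ref in refs if ref.get("database") == "PDB")


def fetch_entry(accession):
    """Download one UniProt entry as a dict (requires network access)."""
    url = UNIPROT_URL.format(accession=accession)
    with urllib.request.urlopen(url) as response:
        return json.load(response)


# Example use (not executed here because it needs the network):
#   ids = pdb_ids_from_entry(fetch_entry("P00734"))  # human prothrombin
#   An empty list means we must predict the structure instead.
```

An empty result corresponds to step 4: no experimental structure is linked, so the prediction route is needed.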


Clarification: the PDB is a universal database that holds the 3D structures of all proteins whose structure has been determined experimentally. Any structure found there is therefore quite reliable and proven. Predicted structures are not included, because prediction is not an experimental procedure.

A crystallographic structure does not exist

In this case it is necessary to start with a structure prediction process. This is not optimal, because prediction errors start to accumulate here, but the results can still be very reliable. Several methods can be used for this purpose. We recommend the following, in this order:

  • 1. Homology methods: This method searches the PDB for proteins whose sequences are highly similar to ours and adopts their 3D structure. If the result shows a similarity of more than 30% over the entire sequence, we can be reasonably confident about the structure. For this step, we recommend the Swiss-Model server. If the similarity is lower, we should move on to the next step.

  • 2. I-Tasser (Iterative Threading ASSEmbly Refinement): This server combines several methods. In its authors' words, it is “a hierarchical approach to protein structure and function prediction. It first identifies structural templates from the PDB by multiple threading approach LOMETS, with full-length atomic models constructed by iterative template fragment assembly simulations.” This method is currently the most reliable and has won the last 6 editions of the CASP tests.

  • 3. As an extra option we mention Robetta: It is an ab-initio simulation method, whose aim is to create the protein from scratch, using physical simulation and brute force. It works better for short sequences as it may take a long time to get reliable results.
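
The 30% rule in step 1 can be written down as a small decision helper. This is a sketch under stated assumptions: it uses percent identity over the full query length as a simple stand-in for the "similarity" reported by the server, it expects two sequences that are already aligned (gaps written as "-"), and the function names are hypothetical.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent of query residues identical to the template in a gapped alignment."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    # Normalise by the ungapped query length, i.e. "the entire sequence".
    query_length = sum(1 for residue in aligned_query if residue != "-")
    if query_length == 0:
        raise ValueError("query sequence is empty")
    matches = sum(
        1
        for q, t in zip(aligned_query, aligned_template)
        if q == t and q != "-"
    )
    return 100.0 * matches / query_length


def choose_method(identity, threshold=30.0):
    """Apply the rule from the text: above the threshold use homology modelling."""
    if identity > threshold:
        return "homology modelling (Swiss-Model)"
    return "threading (I-Tasser)"


identity = percent_identity("AC-DE", "ACKDF")
print(identity, choose_method(identity))
```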


Energy minimization

This last step is strongly recommended whatever the origin of the structure file, because the structure needs to be in a state that is as stable as possible. To minimize the structure we can use two strategies:

  • 1. Quick vacuum minimization: This is the simplest method. We use the Chimera program and choose the "Energy minimization" option, then adjust the parameters to suit our requirements.

  • 2. Minimization in aqueous medium: This is a more complex method. We recommend following a tutorial to understand how to use the programs and some of the basic concepts involved. We use the VMD program (Visual Molecular Dynamics). First, a system that includes water molecules and ions is built around the protein with the "Automatic PSF builder" tool. Then an energy minimization simulation is run with the "AutoIMD" tool, which in turn calls an external program (NAMD) that must be installed on the PC. Finally, the resulting file is saved and opened in Chimera to remove all water molecules and leave only the structure of the protein.
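
The final clean-up of strategy 2 (removing the solvent) can also be done with a short script instead of Chimera. This is a minimal sketch, not the procedure used by the team. The residue names in `SOLVENT_RESNAMES` are an assumption about how waters and ions are labelled in the output file (for example `TIP3`, `SOD`, `CLA` in CHARMM-style files, `HOH` or `WAT` elsewhere) and should be adapted to your own file.

```python
# Assumption: residue names used for water and ions in the minimized file.
SOLVENT_RESNAMES = {"HOH", "WAT", "TIP3", "SOD", "CLA"}


def strip_solvent(pdb_lines, solvent=SOLVENT_RESNAMES):
    """Return the PDB lines without water and ion atom records."""
    kept = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        # Columns 18-21 hold the residue name (4 characters in CHARMM-style files).
        if is_atom and line[17:21].strip() in solvent:
            continue
        kept.append(line)
    return kept


# Example use with hypothetical file names:
#   with open("minimized.pdb") as handle:
#       cleaned = strip_solvent(handle.read().splitlines())
#   with open("protein_only.pdb", "w") as handle:
#       handle.write("\n".join(cleaned) + "\n")
```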


Figure1

Figure 1. Comparison of the structure of Ole E1 before (purple) and after (blue) an energy minimization process.

Figure2

Figure 2. Ole E1 after an energy minimization process, represented as ribbon and as ASA (Accessible Surface Area). Rendered with Chimera.

Obtaining the 3D aptamer structure

For the protein there were two options: its structure was either already known or not. The aptamer, however, is a DNA chain that we have obtained ourselves in the laboratory, so the chance of finding a published structure for that exact sequence is almost zero.

The only possible way is to try to predict its structure, which currently has a series of drawbacks. While there are numerous highly reliable methods and servers for proteins, there are hardly any servers or prediction methods for nucleic acids. We must consider that the structure of proteins has always been regarded as very important, and it varies widely depending on function. For genetic material, on the other hand, the most important part has always been its code, whether in the form of DNA or RNA.

RNAs have more than one function, which is why the need to know their structures has occasionally been addressed, although the methods are not yet well developed.

We propose a series of steps to predict the structure of the aptamer. It is important to understand that this predicted structure will not be reliable, but it can still serve our objective of proposing an improvement to the initial sequence.

Step 1: Obtaining the secondary structure

Unlike for proteins, no single server produces a final result from the sequence alone, so we have to divide the process into stages. In this first stage, we need to obtain the secondary structure of the single strand of DNA. We recommend the MFold server of the University of Albany. It provides both a graphic map of the pairings between bases and the secondary structure in text format (the Vienna format that we need later), as shown in this example:

GTGACGTAGGTTGGTGTGGTTGGGGCGTCAC
(((((((.................)))))))
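
The Vienna (dot-bracket) notation above is easy to decode: each "(" pairs with its matching ")", and "." marks an unpaired base. The following sketch lists the base pairs encoded in the example; the function names are hypothetical and pseudoknot symbols are not handled.

```python
def base_pairs(structure):
    """Return sorted (i, j) index pairs (0-based) encoded in a dot-bracket string."""
    stack = []
    pairs = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % index)
            pairs.append((stack.pop(), index))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return sorted(pairs)


def paired_bases(sequence, structure):
    """Label each pair with its bases and 1-based positions, e.g. 'G1-C31'."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return [
        "%s%d-%s%d" % (sequence[i], i + 1, sequence[j], j + 1)
        for i, j in base_pairs(structure)
    ]


SEQUENCE = "GTGACGTAGGTTGGTGTGGTTGGGGCGTCAC"
STRUCTURE = "(((((((.................)))))))"
print(paired_bases(SEQUENCE, STRUCTURE))
```

For this example the stem is formed by the first seven and the last seven bases, and the 17 central bases are left unpaired.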

Figure3

Figure 3. Example image of the links between base pairs that the server provides

Step 2: Prediction of 3D structure

The next step is to predict the tertiary structure. The only server we know of that can currently do this is ROSIE. In addition to the sequence, it requires the secondary structure obtained previously in Vienna format. This server predicts RNA structures (not DNA), so every T in the sequence must be replaced with a U.
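
The T to U replacement is a one-line string operation; a small sketch with a basic sanity check on the input is shown here (the function name is hypothetical).

```python
def dna_to_rna(sequence):
    """Return the RNA version of a DNA sequence by replacing every T with U."""
    sequence = sequence.upper()
    invalid = set(sequence) - set("ACGT")
    if invalid:
        raise ValueError("not a DNA sequence, found: %s" % sorted(invalid))
    return sequence.replace("T", "U")


rna = dna_to_rna("GTGACGTAGGTTGGTGTGGTTGGGGCGTCAC")
print(rna)
```

The dot-bracket string from step 1 is reused unchanged, since the conversion does not alter the length or the pairing positions.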

The result is a series of proposed three-dimensional RNA structures in .pdb format, so afterwards the file must be edited to remove the excess atoms and modify the bonds so that it can be identified as a DNA chain.
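
Part of this file editing can be sketched in a few lines. The code below is only an illustration under several assumptions: residues are named `A`, `C`, `G`, `U` and the ribose hydroxyl atoms are named `O2'` and `HO2'` (naming varies between programs), and the helper name is hypothetical. It removes the 2'-hydroxyl atoms and renames the residues to their DNA names. It does not add the thymine methyl group or the second 2' hydrogen, and it does not renumber atoms, so the result still has to be completed and minimized in a modelling program.

```python
# Assumption: standard PDB residue and atom names in the predicted RNA file.
RNA_TO_DNA = {"A": "DA", "C": "DC", "G": "DG", "U": "DT"}
RIBOSE_ONLY_ATOMS = {"O2'", "HO2'"}


def rna_pdb_to_dna(pdb_lines):
    """Drop 2'-hydroxyl atoms and rename RNA residues to DNA residue names."""
    converted = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            atom_name = line[12:16].strip()   # columns 13-16
            residue_name = line[17:20].strip()  # columns 18-20
            if residue_name in RNA_TO_DNA:
                if atom_name in RIBOSE_ONLY_ATOMS:
                    continue  # atom that exists in RNA but not in DNA
                new_name = RNA_TO_DNA[residue_name].rjust(3)
                line = line[:17] + new_name + line[20:]
        converted.append(line)
    return converted
```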

We must take into account that the algorithm used by this server is not as refined as those for proteins, and the ab initio simulation usually introduces some deviations. It is particularly important to know whether our aptamer incorporates ions, because this method does not account for aptamers that incorporate external molecules. In such cases, the predicted structure deviates a lot from the real one.

Subsequently, an energy minimization must be performed as explained in the previous section, using whichever method is preferred, to obtain the most stable molecule.

Figure4

Figure 4. Comparison of the thrombin aptamer structure obtained by crystallography in a lab (blue) with the predicted one (grey). The blue one contains an ion that conditions the structure.

Fact: We tried to fine-tune this step with an aptamer already characterized for thrombin. Although the protocol was logical in theory, the results obtained with this set-up led us to conclude that it was not ready yet, due to the high error rate.

Docking process and mutation proposal

After obtaining the structures of the protein and the aptamer, we need to know in which area and with which orientation the interaction between them takes place. To do so, we will study which pairs of bases and which amino acids are involved in this interaction, and then we will propose a change of 1 or 2 bases in order to make this interaction stronger.

Docking

Docking is a computational simulation method that calculates the ideal orientation of one molecule when it interacts with another to form a stable complex (if one exists). Generally this process is designed for protein-protein interactions, which is why most servers are optimized for those cases. However, we recommend using the NPDock server. It simulates the interaction between a protein and a single-stranded nucleic acid molecule; you must indicate whether it is DNA or RNA before submitting the job.

Figure5

Figure 5. The interaction between thrombin and an aptamer simulated by the server NPDock.

Proposal of mutation

With the .pdb file obtained from the previous step, only the most subjective step remains. We have to study the interaction and see which atoms interact with each other. For this purpose, the most useful approach is a visualizer like the ones mentioned at the beginning (Chimera or PyMOL). After evaluating the different options, we have to judge whether any change in the aptamer could improve these interactions.
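
Before opening the visualizer, a short script can list which nucleotides and amino acids are close to each other in the docked complex. This is a rough sketch, not a substitute for visual inspection. It assumes a standard fixed-column .pdb file that contains only protein and nucleic acid atoms, identifies nucleotides by residue name, and uses a 4.0 Å distance cutoff; the cutoff and the function names are assumptions chosen for illustration.

```python
import math

# Assumption: residue names that identify nucleotides in the docked file.
NUCLEIC_RESNAMES = {"DA", "DC", "DG", "DT", "A", "C", "G", "U"}


def parse_atoms(pdb_lines):
    """Read atom records from fixed-column PDB lines."""
    atoms = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "resname": line[17:20].strip(),
                "resseq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms


def interface_contacts(pdb_lines, cutoff=4.0):
    """Return sorted (nucleotide, amino acid) pairs with any atoms within cutoff (in Å)."""
    atoms = parse_atoms(pdb_lines)
    nucleic = [a for a in atoms if a["resname"] in NUCLEIC_RESNAMES]
    protein = [a for a in atoms if a["resname"] not in NUCLEIC_RESNAMES]
    contacts = set()
    for n in nucleic:
        for p in protein:
            if math.dist(n["xyz"], p["xyz"]) <= cutoff:
                contacts.add(((n["resname"], n["resseq"]), (p["resname"], p["resseq"])))
    return sorted(contacts)
```

The nucleotides that appear in this list are the natural candidates to examine when proposing a mutation.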

The proposed mutation cannot be accepted blindly. It will be necessary to synthesize the proposed aptamer and check in the laboratory if its affinity is superior to that of the original aptamer.
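
To get a feeling for how many candidate sequences a change of 1 or 2 bases produces, the variants can be enumerated. This is an illustrative sketch with a hypothetical function name; it generates every sequence that differs from the original in at most `max_changes` positions and does not rank them in any way.

```python
from itertools import combinations, product


def point_mutants(sequence, max_changes=1, alphabet="ACGT"):
    """Return all sequences differing from `sequence` in 1..max_changes positions."""
    sequence = sequence.upper()
    mutants = set()
    for count in range(1, max_changes + 1):
        for positions in combinations(range(len(sequence)), count):
            # At each chosen position, try every base except the original one.
            choices = [
                [base for base in alphabet if base != sequence[pos]]
                for pos in positions
            ]
            for bases in product(*choices):
                mutated = list(sequence)
                for pos, base in zip(positions, bases):
                    mutated[pos] = base
                mutants.add("".join(mutated))
    return sorted(mutants)


APTAMER = "GTGACGTAGGTTGGTGTGGTTGGGGCGTCAC"
single = point_mutants(APTAMER)
print(len(single))  # 3 alternatives at each of the 31 positions
```

Even for this short aptamer the number of double mutants is in the thousands, which is why the study of the docked interface is needed to narrow the choice before anything is synthesized.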