Team:HebrewU/Software

HebrewU HujiGEM 2018






Several tools, such as IDT's codon optimization tool and COOL, address codon usage bias when optimizing a DNA sequence. However, these tools do not allow users to optimize for more than one organism at a time. As a result, iGEM teams and researchers who want to transform multiple organisms with the same gene must either use a gene optimized for a single organism (which may not suit all of the organisms being tested) or spend time and resources testing differently optimized versions of the same gene.




MOOLTi optimizes the codons of a DNA sequence for multiple organisms simultaneously. It has a number of uses, but it is especially beneficial for labs and iGEM teams working with plants: plant genetic engineering is on the rise, and in recent years a growing number of iGEM teams have chosen to work with plants.

Modifying organisms takes considerable time and resources, which many iGEM teams cannot afford to waste. Being able to test proteins and enzymes in microorganisms, or even in model plants, before the final plant transformation is therefore a major advantage.

MOOLTi was not simple to develop, but it makes optimizing for several organisms as easy as optimizing for one.

MOOLTi is programmed to optimize genes for efficient translation in multiple organisms, based on recent studies of tRNA and codon usage. It also offers several useful features: giving priority to one of the organisms (if necessary), custom limits on the lowest codon usage (bias) percentage you are willing to allow, and removal of restriction sites so that genes are BioBrick compatible.
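How the priority option is implemented is not described here. One plausible approach, shown below as a minimal sketch under that assumption, is a weighted average of the per-organism usage tables in which the prioritized organism receives a larger weight. The helper name `combined_usage`, the organism labels and all usage values are hypothetical.

```python
from typing import Dict

# codon -> fraction of its amino acid's codons, for one organism
UsageTable = Dict[str, float]


def combined_usage(tables: Dict[str, UsageTable],
                   weights: Dict[str, float]) -> UsageTable:
    """Weighted average of per-organism codon usage (hypothetical helper).

    Equal weights reduce to a plain average across organisms; a larger
    weight pulls the combined usage towards that organism's preferences.
    """
    total = sum(weights[org] for org in tables)
    codons = set().union(*(t.keys() for t in tables.values()))
    return {
        c: sum(weights[org] * t.get(c, 0.0) for org, t in tables.items()) / total
        for c in codons
    }


# Made-up lysine usage for two unnamed organisms (illustration only).
LYS = {
    "org_A": {"AAA": 0.8, "AAG": 0.2},
    "org_B": {"AAA": 0.4, "AAG": 0.6},
}

balanced = combined_usage(LYS, {"org_A": 1, "org_B": 1})     # AAA ≈ 0.6, AAG ≈ 0.4
prioritised = combined_usage(LYS, {"org_A": 3, "org_B": 1})  # AAA ≈ 0.7, AAG ≈ 0.3
```

A weighted average keeps the combined values on the same scale as the input tables, so the same threshold and sampling steps can be applied afterwards.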


Mechanism

MOOLTi parses the input protein and the codon usage tables and builds internal data structures from them. It computes the average usage of each codon across the selected organisms and enforces a minimum threshold (at least 0.05% usage in each individual organism). It then creates a codon pool for each amino acid in which each codon is represented in proportion to its usage by the organisms. When the final DNA sequence is constructed, each codon is drawn at random from the pool for its amino acid, yielding an mRNA with a balanced mix of codons that is ready for translation. The resulting sequence is validated by internal checks, screened for restriction sites, and exported to a FASTA file.
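The averaging, threshold and pool steps can be sketched as follows. This is a minimal illustration, not MOOLTi's source code: the toy usage tables, the `MIN_USAGE` and `POOL_SIZE` constants and the function names are assumptions, and the 0.05% threshold is written as the fraction 0.0005.

```python
import random
from typing import Dict, List

# Made-up usage tables (fraction of each amino acid's codons) for two
# unnamed organisms. Real tables would cover all 61 sense codons.
TABLES: Dict[str, Dict[str, float]] = {
    "org_A": {"ATG": 1.0, "AAA": 0.8, "AAG": 0.2,
              "GGT": 0.6, "GGC": 0.4, "GGA": 0.0},
    "org_B": {"ATG": 1.0, "AAA": 0.4, "AAG": 0.6,
              "GGT": 0.3, "GGC": 0.5, "GGA": 0.2},
}
SYNONYMS: Dict[str, List[str]] = {
    "M": ["ATG"], "K": ["AAA", "AAG"], "G": ["GGT", "GGC", "GGA"],
}

MIN_USAGE = 0.0005  # 0.05% expressed as a fraction
POOL_SIZE = 100     # hypothetical pool resolution


def build_pool(aa: str) -> List[str]:
    """Codon pool for one amino acid, proportional to average usage.

    Codons below MIN_USAGE in any organism are left out of the pool.
    """
    allowed = [c for c in SYNONYMS[aa]
               if all(t[c] >= MIN_USAGE for t in TABLES.values())]
    if not allowed:
        raise ValueError(f"no codon for {aa} passes the threshold")
    avg = {c: sum(t[c] for t in TABLES.values()) / len(TABLES) for c in allowed}
    total = sum(avg.values())
    pool: List[str] = []
    for c in allowed:
        # Keep at least one copy so every allowed codon can be drawn.
        pool += [c] * max(1, round(POOL_SIZE * avg[c] / total))
    return pool


def back_translate(protein: str, rng: random.Random) -> str:
    """Draw one codon per residue from that amino acid's pool."""
    pools = {aa: build_pool(aa) for aa in set(protein)}
    return "".join(rng.choice(pools[aa]) for aa in protein)


print(back_translate("MKGK", random.Random(0)))
```

Here GGA never enters the glycine pool because one organism does not use it, while AAA and AAG both remain available in roughly a 60:40 ratio; a different seed generally gives a different, equally valid sequence.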

As a result, the final DNA sequence may differ between runs with the same input, while still encoding the same protein. Like the optimizers offered by IDT [1] and OPTIMIZER [2], our tool chooses codons randomly with a bias that parallels the natural codon bias observed in the genomes of the selected organisms. By using this approach, rather than mistaking the most frequent codon for the “best” codon and using it exclusively, we avoid the translational inefficiencies caused by an imbalanced spread of codons.
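Because each run samples codons afresh, one simple way to combine randomness with validation is to re-sample until the sequence translates back to the input protein and contains none of the EcoRI, XbaI, SpeI and PstI sites used in the BioBrick (RFC[10]) prefix and suffix, and then write it out as FASTA. The sketch below assumes this resample-until-valid strategy and uses made-up, pre-averaged codon weights; MOOLTi's own validation may work differently. All four recognition sites are palindromic, so checking one strand also covers the reverse complement.

```python
import random
from typing import Dict, List, Optional

# Made-up averaged codon weights for the residues of a toy protein.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "M": {"ATG": 1.0},
    "E": {"GAA": 0.6, "GAG": 0.4},
    "F": {"TTT": 0.5, "TTC": 0.5},
    "S": {"TCT": 0.5, "AGT": 0.5},
    "R": {"AGA": 0.5, "CGT": 0.5},
}
CODON_TO_AA = {c: aa for aa, table in WEIGHTS.items() for c in table}

# BioBrick RFC[10] prefix/suffix enzymes; all four sites are palindromic.
BIOBRICK_SITES = {"EcoRI": "GAATTC", "XbaI": "TCTAGA",
                  "SpeI": "ACTAGT", "PstI": "CTGCAG"}


def sample(protein: str, rng: random.Random) -> str:
    """Pick one codon per residue, weighted by usage."""
    return "".join(
        rng.choices(list(WEIGHTS[aa]), weights=list(WEIGHTS[aa].values()))[0]
        for aa in protein
    )


def translate(dna: str) -> str:
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def sites_found(dna: str) -> List[str]:
    return [name for name, site in BIOBRICK_SITES.items() if site in dna]


def optimise(protein: str, seed: Optional[int] = None,
             max_tries: int = 1000) -> str:
    """Re-sample until the DNA encodes `protein` and has no BioBrick sites."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        dna = sample(protein, rng)
        if translate(dna) == protein and not sites_found(dna):
            return dna
    raise RuntimeError("no valid sequence found; try relaxing the thresholds")


def to_fasta(name: str, dna: str, width: int = 60) -> str:
    body = "\n".join(dna[i:i + width] for i in range(0, len(dna), width))
    return f">{name}\n{body}\n"


# GAA followed by TTC would create an EcoRI site, so such draws are rejected.
print(to_fasta("MEFSR_optimised", optimise("MEFSR")))
```

Calling `optimise` repeatedly without a seed will usually return different DNA sequences, all of which translate to the same protein and pass the restriction-site check.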



Maximizing the minimum score

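The basis of this optimization is selecting, for each amino acid, the codon with the highest average frequency across organisms, while making sure that its frequency in every individual organism stays at or above 0.05%.

In other words, let $C = \{c_1, \dots, c_n\}$ with $n = 61$ be the set of amino-acid-coding codons, $A = \{a_1, \dots, a_m\}$ with $m = 20$ the set of amino acids, $C_a \subseteq C$ the synonymous codons encoding amino acid $a$, $O$ the set of selected organisms, and $U_o(c)$ the usage of codon $c$ in organism $o$, expressed as a frequency fraction. With $f : C \to \mathbb{Q}$ the average usage $f(c) = \frac{1}{|O|} \sum_{o \in O} U_o(c)$, the codon chosen for amino acid $a$ is

$$c^*(a) = \arg\max_{c \in C_a} f(c) \quad \text{subject to} \quad \min_{o \in O} U_o(c) \ge \tau,$$

where $\tau$ is the minimum usage threshold (0.05%).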
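The constrained selection above translates directly into a few lines of code. In this minimal sketch the helper name `best_codon`, the organism labels and the leucine usage values are all made up; the threshold is passed in as a fraction (0.05% = 0.0005).

```python
from typing import Dict, List, Optional


def best_codon(synonyms: List[str],
               tables: Dict[str, Dict[str, float]],
               tau: float) -> Optional[str]:
    """Return the codon with the highest mean usage across organisms,
    considering only codons whose usage is >= tau in every organism
    (the min-over-organisms constraint). Returns None if none qualify."""
    feasible = [c for c in synonyms
                if min(t[c] for t in tables.values()) >= tau]
    if not feasible:
        return None
    return max(feasible,
               key=lambda c: sum(t[c] for t in tables.values()) / len(tables))


# Made-up usage for a subset of leucine codons in two unnamed organisms.
LEU = ["CTG", "TTA", "CTT"]
TABLES = {
    "org_A": {"CTG": 0.90, "TTA": 0.05, "CTT": 0.05},
    "org_B": {"CTG": 0.00, "TTA": 0.60, "CTT": 0.40},
}

print(best_codon(LEU, TABLES, tau=0.0))     # CTG: highest mean, no threshold
print(best_codon(LEU, TABLES, tau=0.0005))  # TTA: CTG is unused in org_B
```

The example shows why the constraint matters: CTG has the highest average usage (0.45), but because one organism does not use it at all, the threshold excludes it and TTA (average 0.325) is chosen instead.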




References

1. Hans Packer et al., "Codon optimization tool makes synthetic gene design easy", IDT website (2016).
2. Pere Puigbò et al., "OPTIMIZER: a web server for optimizing the codon usage of DNA sequences", Nucleic Acids Research, Volume 35, Issue suppl_2, Pages W126–W131 (2007).