Team:Vilnius-Lithuania-OG/Software

Software Motivation

Introduction

The computational work we present in these software sections is directly responsible for the motivation and creation of CAT-Seq!

Just as the iGEM year started and our team was formed, we identified the main “big picture” problem we wanted to solve in synthetic biology: the difficulty of creating novel, synthetic parts. In particular, we were interested in synthetic catalytic biomolecules.

Catalytic biomolecules are a cleaner and greener substitute for the chemical catalysts used all over the world. They can accelerate chemical reactions by up to 17 orders of magnitude and offer excellent stereo-, chemo- and regio-selectivity in aqueous environments. Yet one of the main drawbacks of using enzymes is that, for many important chemical reactions, efficient enzymes have not yet been discovered or engineered.

That said, identifying an enzyme amino acid sequence that catalyzes a required novel or optimized reaction is a challenging task, because the sequence space is incomprehensibly large.
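To put a number on how large the sequence space is, here is a minimal sketch that counts the possible sequences of a given length. It assumes only the 20 canonical amino acids; the function names and the example lengths are ours, chosen for illustration.

```python
import math

# Assumption: only the 20 canonical amino acids are considered.
NUM_AMINO_ACIDS = 20


def sequence_space_size(length: int) -> int:
    """Return the number of distinct amino acid sequences of a given length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return NUM_AMINO_ACIDS ** length


def orders_of_magnitude(length: int) -> float:
    """Return log10 of the sequence space size, which is easier to read."""
    return length * math.log10(NUM_AMINO_ACIDS)


if __name__ == "__main__":
    # Illustrative lengths only; real enzymes vary widely in size.
    for length in (10, 100, 300):
        exponent = orders_of_magnitude(length)
        print(f"length {length}: about 10^{exponent:.0f} possible sequences")
```

Even a modest 100-residue protein has roughly 10^130 possible sequences, so exhaustively testing them is out of the question.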

The discovery of Generative Adversarial Networks

While exploring possible ways of solving this problem, we stumbled upon an article describing Generative Adversarial Networks (GANs). As we were not familiar with these relatively new neural networks at the time, we studied examples of what they had already achieved. The main idea of a GAN is that it learns the patterns in a given dataset in order to generate new examples that look as if they came from that dataset. For example, if shown enough cat images, it will not reproduce an exact, specific cat image from the dataset; instead, it will learn what a cat should look like in general. The result is a GAN that can produce cat images that have no analogs in the dataset. A completely new cat!

We also found articles describing how GANs were used to generate novel, previously untested anti-cancer drug molecules that were later shown to be effective.

Deep down we already knew the question we wanted to ask - can we generate novel enzymes using the Generative Adversarial Networks?

Setting the goals

Despite the recent success of GANs in various fields, to the best of our knowledge, no one had attempted to apply this novel technique to proteins. In a way, this can be explained by the great complexity of the task. In order to proceed, one would need to redesign GAN architectures that work well with images and audio into an architecture that works well with sequences of amino acids.
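One reason image-oriented architectures do not carry over directly is that a protein is a string of discrete symbols, not a grid of continuous pixel values. A common way to present such sequences to a neural network is one-hot encoding, sketched below. This is an illustrative assumption, not necessarily the representation our networks use; the alphabet ordering and function names are hypothetical.

```python
# Assumption: the 20 canonical amino acids in an arbitrary fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence: str) -> list[list[int]]:
    """Turn an amino acid string into a (length x 20) matrix of 0s and 1s."""
    matrix = []
    for residue in sequence.upper():
        if residue not in AA_TO_INDEX:
            raise ValueError(f"unknown amino acid: {residue!r}")
        row = [0] * len(AMINO_ACIDS)
        row[AA_TO_INDEX[residue]] = 1
        matrix.append(row)
    return matrix


def decode(matrix: list[list[float]]) -> str:
    """Map each row back to the amino acid with the highest score.

    Taking the argmax also works on real-valued rows, such as the
    per-position scores a generator network might output.
    """
    return "".join(
        AMINO_ACIDS[max(range(len(row)), key=row.__getitem__)]
        for row in matrix
    )


if __name__ == "__main__":
    encoded = one_hot_encode("MKV")
    print(len(encoded), "positions x", len(encoded[0]), "channels")
    print(decode(encoded))
```

Because the decoder takes the highest-scoring amino acid at each position, the same function can turn a generator's continuous output back into a discrete sequence.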

This is why we decided to create the world's first Generative Adversarial Network for enzymatic sequence generation.

We believe that the development of such a framework will facilitate the discovery of novel and useful enzymes and greatly accelerate the fields of synthetic biology and protein engineering.
