Difference between revisions of "Team:Vilnius-Lithuania-OG/ProteinGAN"

Revision as of 00:26, 18 October 2018 (view source)

Donatas repecka (Talk | contribs)

Latest revision as of 02:52, 18 October 2018 (view source)

LaurynasK (Talk | contribs)

(15 intermediate revisions by 2 users not shown)

Line 36:

−

~~<script src="https://2018.igem.org/common/MathJax-2.5-latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">~~

Line 93:

Line 92:

−

~~<a href="index.php" class="retina-logo" data-dark-logo="images/logo-dark@2x.png"><img src="images/logo@2x.png"></a>~~

+

</div>

Line 116:

Line 115:

−

+

−

<h2 data-caption-animate="fadeInUp" style="color:aliceblue; ">~~ProteinGAN~~</h2>

+

<h2 data-caption-animate="fadeInUp" style="color:aliceblue; ">Protein GAN</h2>

</div>

Line 151:

Line 150:

<li><a href="#" data-href="#section-building">Building ProteinGAN</a></li>

<li><a href="#" data-href="#section-results">Results</a></li>

+

<li><a href="#" data-href="#section-deeper">Deeper look at ProteinGAN</a></li>

</ul>

</nav>

Line 184:

<p>After seeing multiple successful applications of GANs (generative adversarial networks) in numerous fields, we decided to apply them to the <strong>field of synthetic biology</strong> for the <strong>creation of novel biological parts</strong> with useful functions. More specifically, we were interested in the world’s cleanest and most environmentally friendly catalysts: <strong>enzymes</strong>. For many important reactions used in research or industry <strong>we don’t have the appropriate enzymes</strong> to catalyze them, and have no other option but to use chemical catalysts.</p>

−

<p>Thus, we have decided to build the <strong>world’s first Protein Generative Adversarial Network</strong> (ProteinGAN) which would be capable to learn “what makes protein a protein”. We have started by acquiring and standardizing large number of protein sequences from public databases, which all had a specific class attributed to them.</p>

+

<p>Thus, we have decided to build the <strong>world’s first Protein sequence Generative Adversarial Network</strong> (ProteinGAN) which would be capable to learn “what makes protein a protein”. We have started by acquiring and standardizing large number of protein sequences from public databases, which all had a specific class attributed to them.</p>

<p><br />After in-depth literature analysis and a large number of in-silico prototypes we have built the appropriate GAN architecture for protein work. Finally - we have trained the neural networks with specific classes of enzymes. We have hoped they would learn how to generate the class of enzymes they were trained for, yet also deliver unique protein sequences for that class.</p>

−

<p>All important technical details, architectural choices and detailed explanation of how ProteinGAN works can be found at the end of the page.</p>

+

<p>All important technical details, architectural choices and detailed explanation of how ProteinGAN works can be found at the <a href="https://2018.igem.org/Team:Vilnius-Lithuania-OG/ProteinGAN#section-deeper">end of the page.</a></p>

−

<p>In addition to that, we also provided <strong>a short guide on how to build and train your own ProteinGAN</strong>!</p>

+

<p>In addition to that, we also provided <strong> <a href="https://2018.igem.org/Team:Vilnius-Lithuania-OG/ReactionGAN#section-run">a short guide on how to build and train your own ProteinGAN </a></strong>!</p>

Line 355:

−

+

<h2>Deeper look at ProteinGAN</h2>

Line 464:

<p>Using the scores from discriminator, each part of the GAN is evaluated using loss function. </p>

−

~~<p> \text{Discriminator loss} = min(0, 1 - \sum_{i=0}^n D(real_i)) + min(0, 1 + \sum_{i=0}^n D(G(noise_i)) ) \(T(t)\)</p>~~

+

+
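The removed formula resembles a hinge loss on the discriminator scores. A minimal sketch of that idea follows, assuming the standard hinge formulation (per-example `max(0, ...)` terms averaged over the batch); the source line writes `min` with the sum inside, and the replacement formula images are not reproduced here, so this may differ from what ProteinGAN actually uses. The function names are hypothetical.

```python
def discriminator_hinge_loss(real_scores, fake_scores):
    """Standard hinge loss for a discriminator (assumed formulation).

    real_scores: discriminator outputs D(real_i) for real sequences.
    fake_scores: discriminator outputs D(G(noise_i)) for generated sequences.
    """
    # Real examples are penalised when their score falls below +1.
    real_term = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    # Generated examples are penalised when their score rises above -1.
    fake_term = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def generator_hinge_loss(fake_scores):
    """Generator loss paired with the hinge discriminator loss (assumed).

    The generator is rewarded when the discriminator scores its samples highly.
    """
    return -sum(fake_scores) / len(fake_scores)
```

Under this formulation, a discriminator that already separates real and generated examples by a margin of 1 receives zero loss, so it stops being pushed further once it is confident enough.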

Line 527:

Line 528:

</div>

−

<p>Given original GAN formulation, there is nothing to prevent generator from generating a single, very realistic example to fool the discriminator. Such scenario is known as mode collapse. It happens when generator learns to ignore the input (random numbers). Logically, it is an efficient way for generator to start generating examples that could fool discriminator. However, it is not desirable behaviour and it eventually cripples the training as discriminator can easily remember generated examples. While working with proteins, we observed that this issue is even more severe in comparison to images. In scientific community, a lot of different approaches were proposed to address the mode collapse issue: Unrolled GAN (Metz et al., 2017), Dual Discriminator (Nguyen et al., 2017), Mini batch Discriminator (Salimans et al., 2016) to name a few. We preferred Mini Batch Discriminator approach due to its simplicity and minimal overhead. Mini Batch Discriminator works an extra layer in the network that computes the standard deviation across the batch of examples (batch contains only real, or only fake sequences). If the batch contains a small variety of examples standard deviation will be low and discriminator will be able to use this information to lower the final score for each example in the batch. ProteinGAN ~~and ReactionGAN follow~~ the approach proposed by authors of Progressively growing GAN (Karras et al., 2018).

+

<p>Given the original GAN formulation, there is nothing to prevent the generator from producing a single, very realistic example to fool the discriminator. Such a scenario is known as mode collapse. It happens when the generator learns to ignore its input (random numbers). Logically, it is an efficient way for the generator to start producing examples that could fool the discriminator. However, it is not desirable behaviour, and it eventually cripples the training because the discriminator can easily memorize the generated examples. While working with proteins, we observed that this issue is even more severe than with images. Many approaches have been proposed in the scientific community to address mode collapse: Unrolled GAN (Metz et al., 2017), Dual Discriminator (Nguyen et al., 2017), and Mini Batch Discriminator (Salimans et al., 2016), to name a few. We preferred the Mini Batch Discriminator approach due to its simplicity and minimal overhead. It works as an extra layer in the network that computes the standard deviation across the batch of examples (a batch contains only real or only fake sequences). If the batch contains a small variety of examples, the standard deviation will be low, and the discriminator will be able to use this information to lower the final score for each example in the batch. ProteinGAN follows the approach proposed by the authors of Progressively Growing GAN (Karras et al., 2018).

</p>
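The batch-level standard deviation described in the paragraph can be sketched in plain Python. This is a simplified illustration, not the ProteinGAN implementation: it assumes each example is a flat list of feature values, computes one standard deviation per feature across the batch, averages them into a single scalar, and appends that scalar to every example. The function names are hypothetical.

```python
import math


def minibatch_stddev_feature(batch):
    """Return one scalar summarising how varied the batch is.

    batch: list of examples, each a list of floats of equal length.
    """
    n = len(batch)
    num_features = len(batch[0])
    stds = []
    for j in range(num_features):
        column = [example[j] for example in batch]
        mean = sum(column) / n
        # Population variance of feature j across the batch.
        variance = sum((x - mean) ** 2 for x in column) / n
        stds.append(math.sqrt(variance))
    # Average the per-feature deviations into a single statistic.
    return sum(stds) / num_features


def append_minibatch_stddev(batch):
    """Append the batch statistic to every example as an extra feature."""
    stat = minibatch_stddev_feature(batch)
    return [list(example) + [stat] for example in batch]
```

A collapsed batch such as `[[1.0, 2.0], [1.0, 2.0]]` yields a statistic of zero, which gives the later discriminator layers a direct signal that the examples lack variety.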

−

+

<p><a href="https://2018.igem.org/Team:Vilnius-Lithuania-OG/ReactionGAN"> Click here to find what we did next </a></p>

BLAST hit number	Identity	E-value	Enzyme class
1	89%	2e-117	DNA Helicase
2	89%	2e-117	DNA Helicase
3	89%	1e-116	DNA Helicase


Protein GAN

Introduction

Gathering the Protein Sequence data

Building the ProteinGAN

ProteinGAN results

Conclusions

Deeper look at ProteinGAN

Data preprocessing

Training process of ProteinGAN

ProteinGAN Architecture details

The addition of Dilation into ProteinGAN

Self-Attention

Spectral normalization

Mode collapse

@@ Line 36: / Line 36: @@
 	<script src="js/custom.js"></script>
 	<link rel="stylesheet" href="https://2018.igem.org/Template:Vilnius-Lithuania-OG/css/custom?action=raw&amp;ctype=text/css"/>
-<script src="https://2018.igem.org/common/MathJax-2.5-latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
 <style> .firstHeading{ display: none;} .mw-body{ display: none;} #globalWrapper{ margin-top:-1%;} #header.sticky-header #header-wrap {top: 13px;} #page-menu.sticky-page-menu #page-menu-wrap {margin-top: 13px}</style>
@@ Line 93: / Line 92: @@
 							<div id="logo">
 								<a href="index.php" class="standard-logo" data-dark-logo="images/logo-dark.png"><img src="https://static.igem.org/mediawiki/2018/1/18/T--Vilnius-Lithuania-OG--logo-black.png"></a>
-								<a href="index.php" class="retina-logo" data-dark-logo="images/logo-dark@2x.png"><img src="images/logo@2x.png"></a>
 							</div><!-- #logo end -->
@@ Line 116: / Line 115: @@
 				<div class="swiper-container swiper-parent">
 					<div class="swiper-wrapper">
-						<div class="swiper-slide" style="background-image: url('images/softwaremotivation/header.jpg');">
+						<div class="swiper-slide" style="background-image: url('https://static.igem.org/mediawiki/2018/d/d7/T--Vilnius-Lithuania-OG--protgan.jpg');">
 							<div class="container clearfix">
 								<div class="slider-caption slider-caption-center">
-									<h2 data-caption-animate="fadeInUp" style="color:aliceblue; ">ProteinGAN</h2>
+									<h2 data-caption-animate="fadeInUp" style="color:aliceblue; ">Protein GAN</h2>
 									<p class="d-none d-sm-block" data-caption-animate="fadeInUp" data-caption-delay="400"></p>
 								</div>
@@ Line 151: / Line 150: @@
 							<li><a href="#" data-href="#section-building">Building ProteinGAN</a></li>
 							<li><a href="#" data-href="#section-results">Results</a></li>
+<li><a href="#" data-href="#section-deeper">Deeper look at ProteinGAN</a></li>
 						</ul>
 					</nav>
@@ Line 184: / Line 184: @@
 			<p>After seeing multiple successful applications of GANs (generative adversarial networks) in numerous fields, we decided to apply them to the <strong>field of synthetic biology</strong> for the <strong>creation of novel biological parts</strong> with useful functions. More specifically, we were interested in the world&rsquo;s cleanest and most environmentally friendly catalysts: <strong>enzymes</strong>. For many important reactions used in research or industry <strong>we don&rsquo;t have the appropriate enzymes</strong> to catalyze them, and have no other option but to use chemical catalysts.</p>
-			<p>Thus, we have decided to build the <strong>world&rsquo;s first Protein Generative Adversarial Network</strong> (ProteinGAN) which would be capable to learn &ldquo;what makes protein a protein&rdquo;. We have started by acquiring and standardizing large number of protein sequences from public databases, which all had a specific class attributed to them.</p>
+			<p>Thus, we have decided to build the <strong>world&rsquo;s first Protein sequence Generative Adversarial Network</strong> (ProteinGAN) which would be capable to learn &ldquo;what makes protein a protein&rdquo;. We have started by acquiring and standardizing large number of protein sequences from public databases, which all had a specific class attributed to them.</p>
 			<p><br />After in-depth literature analysis and a large number of in-silico prototypes we have built the appropriate GAN architecture for protein work. Finally - we have trained the neural networks with specific classes of enzymes. We have hoped they would learn how to generate the class of enzymes they were trained for, yet also deliver unique protein sequences for that class.</p>
-			<p>All important technical details, architectural choices and detailed explanation of how ProteinGAN works can be found at the end of the page.</p>
+			<p>All important technical details, architectural choices and detailed explanation of how ProteinGAN works can be found at the <a href="https://2018.igem.org/Team:Vilnius-Lithuania-OG/ProteinGAN#section-deeper">end of the page.</a></p>
-			<p>In addition to that, we also provided <strong>a short guide on how to build and train your own ProteinGAN</strong>!</p>
+			<p>In addition to that, we also provided <strong> <a href="https://2018.igem.org/Team:Vilnius-Lithuania-OG/ReactionGAN#section-run">a short guide on how to build and train your own ProteinGAN </a></strong>!</p>
@@ Line 355: / Line 355: @@
+                        <section id="section-deeper" class="page-section">
 			<div class="fancy-title title-border-color">
 				<h2>Deeper look at ProteinGAN</h2>
@@ Line 464: / Line 464: @@
 			<p>Using the scores from discriminator, each part of the GAN is evaluated using loss function. </p>
-			<p> \text{Discriminator loss} = min(0, 1 - \sum_{i=0}^n D(real_i)) + min(0, 1 + \sum_{i=0}^n D(G(noise_i)) ) \(T(t)\)</p>
+			<img style=" width: 65%; display: block; margin-left: auto; margin-right: auto; margin-bottom: 5%;" src="https://static.igem.org/mediawiki/2018/e/ef/T--Vilnius-Lithuania-OG--Formula.gif">
+<img style=" width: 35%; display: block; margin-left: auto; margin-right: auto; margin-bottom: 5%;" src="https://static.igem.org/mediawiki/2018/4/40/T--Vilnius-Lithuania-OG--Formula2.gif">
@@ Line 527: / Line 528: @@
 					</div>
-			<p>Given original GAN formulation, there is nothing to prevent generator from generating a single, very realistic example to fool the discriminator. Such scenario is known as mode collapse. It happens when generator learns to ignore the input (random numbers). Logically, it is an efficient way for generator to start generating examples that could fool discriminator. However, it is not desirable behaviour and it eventually cripples the training as discriminator can easily remember generated examples. While working with proteins, we observed that this issue is even more severe in comparison to images. In scientific community, a lot of different approaches were proposed to address the mode collapse issue: Unrolled GAN (Metz et al., 2017), Dual Discriminator (Nguyen et al., 2017), Mini batch Discriminator (Salimans et al., 2016) to name a few. We preferred Mini Batch Discriminator approach due to its simplicity and minimal overhead. Mini Batch Discriminator works an extra layer in the network that computes the standard deviation across the batch of examples (batch contains only real, or only fake sequences). If the batch contains a small variety of examples standard deviation will be low and discriminator will be able to use this information to lower the final score for each example in the batch. ProteinGAN and ReactionGAN follow the approach proposed by authors of Progressively growing GAN (Karras et al., 2018).
+			<p>Given the original GAN formulation, there is nothing to prevent the generator from producing a single, very realistic example to fool the discriminator. Such a scenario is known as mode collapse. It happens when the generator learns to ignore its input (random numbers). Logically, it is an efficient way for the generator to start producing examples that could fool the discriminator. However, it is not desirable behaviour, and it eventually cripples the training because the discriminator can easily memorize the generated examples. While working with proteins, we observed that this issue is even more severe than with images. Many approaches have been proposed in the scientific community to address mode collapse: Unrolled GAN (Metz et al., 2017), Dual Discriminator (Nguyen et al., 2017), and Mini Batch Discriminator (Salimans et al., 2016), to name a few. We preferred the Mini Batch Discriminator approach due to its simplicity and minimal overhead. It works as an extra layer in the network that computes the standard deviation across the batch of examples (a batch contains only real or only fake sequences). If the batch contains a small variety of examples, the standard deviation will be low, and the discriminator will be able to use this information to lower the final score for each example in the batch. ProteinGAN follows the approach proposed by the authors of Progressively Growing GAN (Karras et al., 2018).
 				</p>
+<p><a href="https://2018.igem.org/Team:Vilnius-Lithuania-OG/ReactionGAN"> Click here to find what we did next </a></p>
 				<div class="toggle toggle-bg" style="margin-top: 10%;">