Latest revision as of 03:57, 18 October 2018
Project Nanopore
Motivation
After interviewing medical professionals, we recognized that the high cost of RNA sequencing is a major hurdle to providing affordable gene therapy based on RNA editing. In addition, current next-generation sequencing methods do not provide unbiased, direct reads of the transcriptome and overlook important modifications on the nucleobases. To make RNA sequencing more affordable and accurate in the future, we therefore set out to develop a high-throughput, unbiased, and modification-sensitive RNA sequencing method based on nanopore technology.
Background
The human transcriptome contains not only the four canonical nucleobases – adenine (A), uracil (U), cytosine (C), and guanine (G) – but also non-canonical ones such as inosine (I), pseudouridine (Ψ), 5-methylcytosine (m5C), and many others. These non-canonical nucleobases occur naturally in our cells and help regulate gene expression. However, some modified bases are not supposed to occur in healthy cells, and their presence could lead to certain diseases.
Figure 1. Common Modified Bases in RNA
To treat diseases caused by modified bases at the transcriptome level, we need to know the sequence of all the RNA in the cell. Because these diseases are caused by modified nucleobases, the sequencing technology must also be able to identify them. Illumina sequencing can locate modified bases in the transcriptome, but its short reads do not capture how the modified bases on the same RNA strand correlate with one another. Mass spectrometry of RNA is another way to locate modified bases. It does provide this correlation information, but its low throughput makes it impractical for sequencing the entire human transcriptome.
Figure 2. Mechanism of Nanopore Sequencing
In this project, we explored nanopore technology for identifying non-canonical nucleobases in RNA. Nanopore sequencing is a high-throughput, direct sequencing technique that can also reveal correlations among the modified bases on a strand. During nanopore sequencing, each RNA strand passes through the pore starting from its 3’ end, producing a change in ionic current. At any moment, this electrical signal is determined by the 5-mer of the RNA sequence that occupies the pore. We then compare the signals generated by RNA containing modified bases with those generated by unmodified RNA.
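Because each current level reflects a 5-mer, a single modified base lies inside up to five overlapping 5-mers, so its effect on the signal can spread over neighboring positions. The plain-Python sketch below lists which 5-mer windows contain a given base. It assumes each 5-mer is indexed by the position of its first base; the function names and the toy transcript are hypothetical, and real signal-alignment tools may index k-mers differently.

```python
def kmer_windows(seq: str, k: int = 5) -> list[str]:
    """All overlapping k-mers of `seq`, indexed by the position of their first base."""
    return [seq[j:j + k] for j in range(len(seq) - k + 1)]


def kmers_covering(position: int, seq_len: int, k: int = 5) -> list[int]:
    """Start indices of every k-mer window (j .. j+k-1) that contains `position`."""
    first = max(0, position - k + 1)
    last = min(position, seq_len - k)
    return list(range(first, last + 1))


# Hypothetical toy transcript with a single inosine at index 7.
rna = "AUCAUCAIUCAUCAU"
mod = rna.index("I")
windows = kmer_windows(rna)
for j in kmers_covering(mod, len(rna)):
    # Prints five windows, each of which contains the I.
    print(j, windows[j])
```

Under this indexing, a signal change up to 4 positions away from the modified base is still consistent with the 5-mer model; which offset dominates in practice depends on the pore chemistry and on how the analysis assigns k-mers to positions.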
Identification of Inosine in RNA
Synthetic RNA samples were produced from PCR-amplified DNA gBlocks with predefined sequences. The forward primer carried a T7 promoter overhang, while the reverse primer carried a poly(A) tail to allow adapter binding for nanopore sequencing. In vitro transcription (IVT) of the amplified DNA was performed with inosine (I) as the modified nucleotide replacing every canonical guanosine (G), while A, U, and C were kept unchanged. As a negative control, we also produced a synthetic RNA sample with exactly the same sequence, containing only canonical nucleobases.
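The expected sequences of the two IVT products, together with the ground-truth inosine positions needed later to score the signal, can be written down in silico. This is a minimal sketch, assuming the input is the sense-strand sequence that the T7 transcript copies (so the RNA equals the DNA with T replaced by U); the function name and the `polya_len` parameter are hypothetical.

```python
def expected_transcripts(sense_dna: str, polya_len: int = 0):
    """Return (control_rna, inosine_rna, modified_positions) for a sense-strand template.

    Assumption: `sense_dna` is the sequence copied by T7 RNA polymerase, so the
    transcript is the same sequence with T -> U. `polya_len` is a hypothetical
    length for the poly(A) tail added by the reverse primer.
    """
    control = sense_dna.upper().replace("T", "U") + "A" * polya_len
    # Every G in the control transcript becomes an I in the modified sample.
    modified_positions = [i for i, base in enumerate(control) if base == "G"]
    inosine = control.replace("G", "I")  # A, U and C are left unchanged
    return control, inosine, modified_positions


ctrl, ino, pos = expected_transcripts("ATGGCTAG", polya_len=3)
print(ctrl, ino, pos)
```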
The DNA gBlock sequences were designed so that consecutive guanosine sites were separated by 10 to 11 nucleotides other than G, with a total length of around 1 kb. We tested different G motifs in the gBlocks, such as xxGxx, xxGGxx, xxGGGxx, and xxGxGxx, where x is any canonical nucleotide other than G, to find out whether our method could distinguish between these motifs.
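A design of this kind can be checked in silico by locating each G motif and counting the non-G spacer before the next one. The helper below is a hypothetical sketch, not the team's design tool: it treats G's separated by exactly one non-G base as a single motif, so xxGxGxx is reported as `GxG`, and the test sequence is a made-up toy, not an actual gBlock.

```python
import re

# A motif is a run of G's, optionally joined to further G runs by exactly one
# non-G base, so "G", "GG", "GGG" and "GxG" are each captured as one motif.
MOTIF = re.compile(r"G+(?:[ACT]G+)*")


def g_motifs(dna: str):
    """Return (start, label, spacer_to_next) for each G motif in a DNA sequence."""
    dna = dna.upper()
    hits = list(MOTIF.finditer(dna))
    motifs = []
    for n, m in enumerate(hits):
        label = re.sub(r"[ACT]", "x", m.group())  # e.g. "GAG" -> "GxG"
        # Motifs are maximal, so everything between two hits is non-G.
        spacer = hits[n + 1].start() - m.end() if n + 1 < len(hits) else None
        motifs.append((m.start(), label, spacer))
    return motifs


toy = "A" * 10 + "G" + "A" * 11 + "GG" + "C" * 10 + "GAG" + "T" * 10
motifs = g_motifs(toy)
print(motifs)
print(all(10 <= s <= 11 for _, _, s in motifs if s is not None))
```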
We were not sure whether I would produce a signal much different from that of G, so we also prepared RNA samples labeled with acrylonitrile. The acrylonitrile attached only to the Is and Gs in the RNA samples, and we hoped the labeled bases would produce signals more distinct from those of normal guanosines, making it easier to distinguish I from G in the RNA strands.
Figure 3. Inosine labelling by acrylonitrile
The current signals produced by all samples were compared with the negative control, that is, the unmodified RNA sample containing G without acrylonitrile. The electrical signal data were analyzed with machine learning to obtain a % modification value at each position of the RNA samples. The % modification at a position is the percentage of nanopore signals from the modified-base-containing sample at that position that differ from those of the unmodified RNA. A peak marks a position where this difference is much more pronounced than at the surrounding positions.
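The % modification and peak calling described above can be sketched in a few lines. This is a stand-in, not the team's pipeline: their machine-learning classifier is replaced here by a simple per-position z-score rule, and the sketch assumes the raw signal has already been aligned to reference positions so that each read contributes one summary current value per position. The function names, the z-score cutoff of 3, and the peak rule (a local maximum at least 20 percentage points above the local median) are all hypothetical choices, and the data are simulated.

```python
import numpy as np


def percent_modification(sample, control, z_cutoff=3.0):
    """Percent of sample reads per position whose current deviates from the control.

    sample, control: arrays of shape (n_reads, n_positions), one summary current
    value (e.g. mean current of the aligned event) per read and position.
    A read counts as "different" when |z| > z_cutoff relative to the control reads.
    """
    mu = control.mean(axis=0)
    sd = control.std(axis=0, ddof=1)
    z = np.abs(sample - mu) / sd
    return 100.0 * (z > z_cutoff).mean(axis=0)


def local_peaks(pct, min_excess=20.0, half_window=5):
    """Local maxima that exceed the median of the surrounding window by `min_excess` points."""
    peaks = []
    for i in range(1, len(pct) - 1):
        lo, hi = max(0, i - half_window), min(len(pct), i + half_window + 1)
        is_max = pct[i] > pct[i - 1] and pct[i] >= pct[i + 1]
        if is_max and pct[i] - np.median(pct[lo:hi]) >= min_excess:
            peaks.append(i)
    return peaks


# Simulated data only: 200 reads x 20 positions, with a current shift at position 7.
rng = np.random.default_rng(0)
control = rng.normal(100.0, 1.0, size=(200, 20))
sample = rng.normal(100.0, 1.0, size=(200, 20))
sample[:, 7] += 10.0
pct = percent_modification(sample, control)
print(local_peaks(pct))  # → [7]
```

A trained classifier can pick up differences that a single per-position threshold misses, so this rule is only meant to make the definition of % modification concrete.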
Figure 4. Percentage of modification in xxGxx variation with inosine as the modified base
We define the peaks indicating the presence of inosine as those within 4 positions of an actual inosine position. For xxGxx sequences, we found that 61% of these peaks are located 1 or 2 positions behind the actual position. As expected, the sample with inosine labeled with acrylonitrile (I with ACN) also produced a higher % modification than the sample with unlabeled inosine (I no ACN) at most positions.
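Scoring peaks against the known inosine positions amounts to computing, for each peak, its signed offset from the nearest modified base and keeping only those within the 4-position window. This is a hypothetical sketch with toy positions. The text does not say which direction "behind" refers to in the plots, so the sketch reports signed offsets (peak minus modified position) rather than assuming a sign.

```python
from collections import Counter


def peak_offsets(peaks, mod_positions, window=4):
    """Signed offset (peak - nearest modified position) for peaks within `window` positions."""
    offsets = []
    for p in peaks:
        nearest = min(mod_positions, key=lambda m: abs(p - m))
        if abs(p - nearest) <= window:
            offsets.append(p - nearest)
    return offsets


def fraction_at(offsets, allowed):
    """Fraction of offsets whose value is in `allowed` (0.0 if there are none)."""
    return sum(o in allowed for o in offsets) / len(offsets) if offsets else 0.0


# Toy positions, not the team's data.
offsets = peak_offsets([8, 12, 24, 40, 55], [10, 22, 39, 70])
print(offsets)                       # → [-2, 2, 2, 1]  (the peak at 55 is outside the window)
print(Counter(offsets))
print(fraction_at(offsets, {1, 2}))  # → 0.75
```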
Figure 5. Percentage of modification in xxGGxx variation with inosine as the modified base
For the xxGGxx variation, we found that 45% of the peaks corresponding to inosine were located on the first inosine and 2 positions behind the first inosine. We speculate that the peak 2 positions behind the first inosine corresponds to the first inosine, and the peak on the first inosine corresponds to the second inosine. However, at some positions there is only 1 peak, which means only 1 of the two modified bases was detected there.
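Whether a double-inosine motif gives two peaks or only one can be tallied with a small helper. This is a hypothetical sketch: it assumes the peak positions and the positions of each inosine pair are already known, and it reuses the 4-position window from the inosine peak definition. With 10 to 11 non-G nucleotides between motifs, the windows around neighboring motifs do not overlap, so no peak is counted twice.

```python
def peaks_per_motif(peaks, motifs, window=4):
    """Number of peaks within `window` positions of each motif.

    motifs: list of (first, last) modified positions, e.g. (22, 23) for an II pair.
    """
    return [sum(first - window <= p <= last + window for p in peaks)
            for first, last in motifs]


# Toy example: three II pairs.
counts = peaks_per_motif([20, 22, 47, 52, 80], [(22, 23), (50, 51), (78, 79)])
print(counts)  # → [2, 2, 1]
single = sum(c == 1 for c in counts) / len(counts)
print(f"{single:.0%} of motifs show only one peak")
```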
Figure 6. Percentage of modification in xxGGGxx variation with inosine as the modified base
In the xxGGGxx motifs, over 70% have a peak at the middle inosine position. However, the other peaks near the inosines show no obvious pattern.
Figure 7. Percentage of modification in xxGxGxx variation with inosine as the modified base
For the xxGxGxx variation, 60% of the xxGxGxx sequences have a peak between the two inosines, and about half of these also have another peak behind the first inosine. In this case, the second inosine is identified more easily than the first.
Identification of Pseudouridine in RNA
We also tried to identify another non-canonical nucleobase, pseudouridine (Ψ), by nanopore sequencing. The methods were the same as for detecting inosine, except that we used different predefined DNA gBlocks containing xxTxx and xxTTxx variations.
Figure 8. Percentage of modification in xxTxx and xxTTxx variation with pseudouridine as the modified base
In our Ψ-containing RNA samples, across both the single-T and double-T variations, we observed that at 47% of the modified positions the peak is located 2 positions behind the Ψ (or behind the first Ψ in the double-T variation).
Conclusion
In conclusion, our current analysis detects signal changes at positions near the modified bases, indicating that the modifications do have a structural effect on the nanopore electrical signals. However, our current model is still oversimplified, as shown by the fact that most peaks in the graphs are not located at the exact positions of the modified bases. The analysis needs further refinement to account for other factors that affect the nanopore electrical signals.